[onedpl][ranges][doc] + Range-based API description #1596

Merged on Sep 17, 2024 (38 commits)
Commits
976dbb8
[onedpl][doc] + Range-based API description
MikeDvorskiy May 21, 2024
1c38faa
[oneDPL][doc] + introduction for oneapi::dpl::ext::ranges
MikeDvorskiy May 21, 2024
525a3ca
[onedpl][doc][ranges] + supported std ranges description
MikeDvorskiy May 21, 2024
daa301e
[onedpl][doc][ranges] + algo list
MikeDvorskiy May 21, 2024
8594c37
[onedpl][doc][ranges] + minor change in the wording
MikeDvorskiy May 21, 2024
e712352
[oneDPL][doc][ranges] + changes in the wording
MikeDvorskiy May 21, 2024
dcb74e4
[onedpl][doc][ranges] + wording about ranges support in "Pass Data.."…
MikeDvorskiy May 21, 2024
3931b76
[onedpl][doc][ranges] + minor changes in formatting
MikeDvorskiy May 21, 2024
5f3b885
[onedpl][doc][ranges] + example
MikeDvorskiy May 21, 2024
56ffa81
[oneDPL][doc][ranges] minor change in an example
MikeDvorskiy May 21, 2024
a661f97
[oneDPL][doc][ranges] views::reverse() => views::reverse
MikeDvorskiy May 21, 2024
15bc630
[oneDPL][doc][ranges] + minor change in an exemlple
MikeDvorskiy May 21, 2024
9d6a770
[oneDPL][doc][ranges] + minor change in Parallel API topic
MikeDvorskiy May 21, 2024
47368f0
+ fix a typo
MikeDvorskiy May 22, 2024
d0cb76d
+ fix a typo
MikeDvorskiy May 22, 2024
3b77301
+ fix a typo
MikeDvorskiy May 22, 2024
96b0558
[oneDPL][ranges][doc] minor changes
MikeDvorskiy Jul 17, 2024
358ddb3
Merge branch 'main' into std_ranges_support_as_ext
akukanov Sep 10, 2024
6f30bb7
Split experimental and production APIs into separate pages
akukanov Sep 10, 2024
ed18aba
Add missed backquotes, split long lines
akukanov Sep 11, 2024
5a67152
Add a short description to the Parallel API page
akukanov Sep 12, 2024
9aab1a4
Reworked the new page for range algorithms
akukanov Sep 13, 2024
8a63b1c
Rework the page for experimental range APIs
akukanov Sep 13, 2024
d9b06e6
Merge branch 'main' into std_ranges_support_as_ext
akukanov Sep 13, 2024
5d270e9
Address review feedback
akukanov Sep 13, 2024
07d3c99
Address feedback to a previous documentation patch
akukanov Sep 13, 2024
724b0ad
Consistently use angle brackets around header file names
akukanov Sep 14, 2024
bf80964
Fix the exanples
akukanov Sep 16, 2024
e31aaf7
Reorganize the page on data passing, add information for range algori…
akukanov Sep 16, 2024
58118a9
Fix labels, address feedback
akukanov Sep 16, 2024
4d951e8
Fix rendering and improve code examples
akukanov Sep 16, 2024
f776759
Refer to host policies as standard-aligned almost everywhere
akukanov Sep 16, 2024
7bf3c17
Add another note and a cross-reference from 'Parallel range algorithms'
akukanov Sep 16, 2024
85b2029
Avoid the `USM memory' tautology
akukanov Sep 16, 2024
6cd8a4e
Reduce ambiguity between host memory and host-allocated USM
akukanov Sep 17, 2024
6d00f35
Fix a typo
akukanov Sep 17, 2024
ed681eb
Fix rendering
akukanov Sep 17, 2024
64f9f63
Address review comments
akukanov Sep 17, 2024
@@ -2,7 +2,7 @@ Utility Function Object Classes
##################################

The definitions of the utility function objects are available through the
``oneapi/dpl/functional`` header. All function objects are implemented in the ``oneapi::dpl`` namespace.
``<oneapi/dpl/functional>`` header. All function objects are implemented in the ``oneapi::dpl`` namespace.

* ``identity``: A function object type where the operator() returns the argument unchanged.
It is an implementation of ``std::identity`` that can be used prior to C++20.
7 changes: 5 additions & 2 deletions documentation/library_guide/introduction.rst
@@ -82,12 +82,15 @@ to build the standard C++ code for execution on a SYCL device:
icpx -fsycl -fsycl-pstl-offload=gpu program.cpp -o program

This option redirects C++ parallel algorithms invoked with the ``std::execution::par_unseq`` policy
to |onedpl_short| algorithms. It does not change the behavior of the |onedpl_short| execution policies and algorithms
that are directly used in the code.
to |onedpl_short| algorithms. It does not change the behavior of the |onedpl_short| algorithms and
execution policies that are directly used in the code.

Useful Information
==================

.. _library-restrictions:


Difference with Standard C++ Parallel Algorithms
************************************************

2 changes: 1 addition & 1 deletion documentation/library_guide/kernel_templates_main.rst
@@ -10,7 +10,7 @@ It is recommended to use kernel templates when there is an opportunity to custom
for a particular workload (for example, the number of elements and their type),
or for a specific device (for example, based on the available local memory).

To use the API, include the ``oneapi/dpl/experimental/kernel_templates`` header file.
To use the API, include the ``<oneapi/dpl/experimental/kernel_templates>`` header file.
The primary API namespace is ``oneapi::dpl::experimental::kt``, and nested namespaces are used to further categorize the templates.

* :doc:`Kernel Configuration <kernel_templates/kernel_configuration>`. Generic structure for configuring a kernel template.
5 changes: 3 additions & 2 deletions documentation/library_guide/macros.rst
@@ -27,6 +27,8 @@ Macro Description
``_PSTL_VERSION_PATCH`` ``_PSTL_VERSION % 10``: The patch number.
================================= ==============================

.. _feature-macros:

Feature Macros
==============
Use these macros to test presence of specific |onedpl_short| functionality.
@@ -40,8 +42,7 @@ Macro Macro values and the functionality
---------------------------------- -----------------------------------------------
``ONEDPL_HAS_RANGE_ALGORITHMS`` Parallel range algorithms.

* ``202409L`` - for_each, transform, find, find_if, find_if_not, any_of, all_of, none_of, adjacent_find, search, search_n,
count, count_if, equal, is_sorted, sort, stable_sort, min_element, max_element, copy, copy_if, merge
* ``202409L`` - see :ref:`available algorithms <range-algorithms-202409L>`.
================================== ===============================================

Additional Macros
@@ -1,7 +1,7 @@
Additional Algorithms
######################

The definitions of the algorithms listed below are available through the ``oneapi/dpl/algorithm``
The definitions of the algorithms listed below are available through the ``<oneapi/dpl/algorithm>``
header. All algorithms are implemented in the ``oneapi::dpl`` namespace.

* ``reduce_by_segment``: performs partial reductions on a sequence's values and keys. Each
22 changes: 11 additions & 11 deletions documentation/library_guide/parallel_api/execution_policies.rst
@@ -4,23 +4,23 @@ Execution Policies
According to `the oneAPI specification
<https://uxlfoundation.github.io/oneAPI-spec/spec/elements/oneDPL/source/index.html>`_,
|onedpl_long| (|onedpl_short|) provides execution policies semantically aligned with the C++ standard,
also referred to as *standard aligned* or *host execution policies*, and the *device execution policies*
referred to as *standard-aligned* or *host execution policies*, as well as *device execution policies*
to run data parallel computations on heterogeneous systems.

The execution policies are defined in the ``oneapi::dpl::execution`` namespace and provided
in the ``oneapi/dpl/execution`` header. The policies have the following meaning:
in the ``<oneapi/dpl/execution>`` header. The policies have the following meaning:

====================== =====================================================
Policy Value or Type Description
Policy Name / Type Description
====================== =====================================================
``seq`` The standard aligned policy for sequential execution.
``seq`` The standard-aligned policy for sequential execution.
---------------------- -----------------------------------------------------
``unseq`` The standard aligned policy for unsequenced SIMD execution.
``unseq`` The standard-aligned policy for possible unsequenced SIMD execution.
This policy requires user-provided functions to be SIMD-safe.
---------------------- -----------------------------------------------------
``par`` The standard aligned policy for parallel execution by multiple threads.
``par`` The standard-aligned policy for possible parallel execution by multiple threads.
---------------------- -----------------------------------------------------
``par_unseq`` The standard aligned policy with the combined effect of ``unseq`` and ``par``.
``par_unseq`` The standard-aligned policy with the combined effect of ``unseq`` and ``par``.
---------------------- -----------------------------------------------------
``device_policy`` The class template to create device policies for data parallel execution.
---------------------- -----------------------------------------------------
@@ -54,9 +54,9 @@ Follow these steps to add Parallel API to your application:
- ``#include <oneapi/dpl/memory>``

#. Pass a |onedpl_short| execution policy object as the first argument to a parallel algorithm
to specify the desired execution behavior.
to indicate the desired execution behavior.

#. If you use the C++ standard aligned execution policies:
#. If you use the standard-aligned execution policies:

- Compile the code with options that enable OpenMP parallelism and/or SIMD vectorization pragmas.
- Compile and link with the |onetbb_short| or |tbb_short| library for TBB-based parallelism.
@@ -197,8 +197,8 @@ The code below assumes you have added ``using namespace oneapi::dpl::execution;`
Error Handling with Device Execution Policies
=============================================

The SYCL error handling model supports two types of errors: Synchronous errors cause the SYCL host
runtime libraries throw exceptions. Asynchronous errors may only be processed in a user-supplied error handler
The SYCL error handling model supports two types of errors. Synchronous errors cause the SYCL API functions
to throw exceptions. Asynchronous errors may only be processed in a user-supplied error handler
associated with a SYCL queue.

For algorithms executed with device policies, handling all errors, synchronous or asynchronous, is a
2 changes: 1 addition & 1 deletion documentation/library_guide/parallel_api/iterators.rst
@@ -1,7 +1,7 @@
Iterators
#########

The definitions of the iterators are available through the ``oneapi/dpl/iterator``
The definitions of the iterators are available through the ``<oneapi/dpl/iterator>``
header. All iterators are implemented in the ``oneapi::dpl`` namespace.

* ``counting_iterator``: a random-access iterator-like type whose dereferenced value is an integer
111 changes: 111 additions & 0 deletions documentation/library_guide/parallel_api/parallel_range_algorithms.rst
@@ -0,0 +1,111 @@
Parallel Range Algorithms
#########################

C++20 introduces the `Ranges library <https://en.cppreference.com/w/cpp/ranges>`_ and
`range algorithms <https://en.cppreference.com/w/cpp/algorithm/ranges>`_ as a modern paradigm for expressing
generic operations on data sequences.

|onedpl_long| (|onedpl_short|) extends the standard range algorithms with *parallel range algorithms*, which work
with the standard range classes and can execute in parallel on both the host computer and data parallel devices.

|onedpl_short| supports only random access ranges, because they provide constant-time access to elements
at any position in the range, so different elements can be accessed simultaneously. This enables efficient
workload distribution among multiple threads or processing units, which is essential for achieving high
performance in parallel execution.

.. Note::

   The use of parallel range algorithms requires C++20 and a C++ standard library shipped with GCC 10 (or higher),
   Clang 16 (or higher), or Microsoft* Visual Studio* 2019 16.10 (or higher).

Supported Range Views
---------------------

`Views <https://en.cppreference.com/w/cpp/ranges/view>`_ are lightweight ranges typically used to describe
data transformation pipelines. The C++20 standard defines two categories of standard range views, called
*factories* and *adaptors*:

* A range factory generates its data elements on access via an index or an iterator to the range.
* A range adaptor transforms its underlying data range(s) or view(s) into a new view with modified behavior.

The following C++ standard random access adaptors and factories can be used with the |onedpl_short|
parallel range algorithms:

* ``std::ranges::views::all``: A range adaptor that returns a view that includes all elements of a range
(only with standard-aligned execution policies).
* ``std::ranges::subrange``: A utility that produces a view from an iterator and a sentinel or from a range.
* ``std::span``: A view to a contiguous data sequence.
* ``std::ranges::iota_view``: A range factory that generates a sequence of elements by repeatedly incrementing
an initial value.
* ``std::ranges::single_view``: A view that contains exactly one element of a specified value.
* ``std::ranges::transform_view``: A range adaptor that produces a view that applies a transformation to each element
of another view.
* ``std::ranges::reverse_view``: A range adaptor that produces a reversed sequence of elements provided by another view.
* ``std::ranges::take_view``: A range adaptor that produces a view of the first N elements from another view.
* ``std::ranges::drop_view``: A range adaptor that produces a view excluding the first N elements from another view.

Visit :doc:`pass_data_algorithms` for more information, especially on the :ref:`use of range views <use-range-views>`
with device execution policies.

Supported Algorithms
--------------------

The ``<oneapi/dpl/algorithm>`` header defines the parallel range algorithms in the ``oneapi::dpl::ranges`` namespace.
All algorithms work with both standard-aligned (host) and device execution policies.

The ``ONEDPL_HAS_RANGE_ALGORITHMS`` :ref:`feature macro <feature-macros>` may be used to test for the presence of
parallel range algorithms.

.. _range-algorithms-202409L:

If ``ONEDPL_HAS_RANGE_ALGORITHMS`` is defined to ``202409L`` or a greater value, the following algorithms are provided:

* ``for_each``
* ``transform``
* ``find``
* ``find_if``
* ``find_if_not``
* ``adjacent_find``
* ``all_of``
* ``any_of``
* ``none_of``
* ``search``
* ``search_n``
* ``count``
* ``count_if``
* ``equal``
* ``sort``
* ``stable_sort``
* ``is_sorted``
* ``min_element``
* ``max_element``
* ``copy``
* ``copy_if``
* ``merge``

Usage Example for Parallel Range Algorithms
-------------------------------------------

.. code:: cpp

    {
        // Host execution: copy a reversed view of vec_in into vec_out
        std::vector<int> vec_in = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        // Parentheses select the size constructor; braces would select the initializer-list constructor
        std::vector<int> vec_out(vec_in.size());

        auto view_in = std::ranges::views::all(vec_in) | std::ranges::views::reverse;
        oneapi::dpl::ranges::copy(oneapi::dpl::execution::par, view_in, vec_out);
    }
    {
        // Device execution: the data resides in USM shared memory
        using usm_shared_allocator = sycl::usm_allocator<int, sycl::usm::alloc::shared>;
        // Allocate for the queue used by the execution policy
        usm_shared_allocator alloc{oneapi::dpl::execution::dpcpp_default.queue()};

        std::vector<int, usm_shared_allocator> vec_in{{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, alloc};
        std::vector<int, usm_shared_allocator> vec_out(vec_in.size(), alloc);

        auto view_in = std::ranges::subrange(vec_in.begin(), vec_in.end()) | std::ranges::views::reverse;
        oneapi::dpl::ranges::copy(oneapi::dpl::execution::dpcpp_default, view_in, std::span(vec_out));
    }

.. rubric:: See also:

:doc:`range_based_api`