In this exercise, you will solve the axpy problem (Y=Y+a*X) while correctly imposing the dependencies between the various tasks. All operations, including initialization, must be performed on the device. You should start from the vector addition exercise solutions or from the skeleton codes in this folder.
Structure of the Code:
- define a SYCL queue
- declare the variables
- fill the variables with data using 2 kernels, one for each array
- do the final axpy computation in another kernel
- copy data to host to check the results
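The final check on the host can be sketched in plain C++ as shown here. The helper names `axpy_reference` and `all_close` and the use of `float` are assumptions for illustration and are not taken from the skeleton codes. With the values used in this exercise (X=1, Y=2, a=1), every element of Y should equal 3 after the computation.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host reference for axpy: y[i] = y[i] + a * x[i].
void axpy_reference(float a, const std::vector<float>& x, std::vector<float>& y) {
    for (std::size_t i = 0; i < y.size(); ++i) {
        y[i] += a * x[i];
    }
}

// Hypothetical check: true when every element of y matches `expected`
// within the tolerance `tol`.
bool all_close(const std::vector<float>& y, float expected, float tol = 1e-6f) {
    for (std::size_t i = 0; i < y.size(); ++i) {
        if (std::fabs(y[i] - expected) > tol) {
            return false;
        }
    }
    return true;
}
```

The same `all_close` check can be applied to the array copied back from the device, whichever dependency mechanism is used.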
There are several ways to enforce dependencies.
When managing memory with buffers and accessors, dependencies are handled automatically. Each accessor declares how a kernel uses a buffer (read, write, or both), and the runtime uses this information to delay a kernel until earlier kernels with conflicting accesses to the same buffer have completed. As a result, kernels that use the same buffer execute in the correct order.
Steps
- Start from the vector_add example or use the skeleton axpy_buffer.cpp
- initialize the arrays X and Y with two separate kernels. Use initial values X=1 and Y=2 at the beginning.
- compute Y=Y+a*X using a 3rd kernel with a=1
- copy the final result back to the host to validate
SYCL queues are out-of-order by default, meaning kernels can execute concurrently. When using USM, submitting multiple kernels without explicit synchronization can result in incorrect execution order. To enforce task order, you can define the queue as in-order, ensuring tasks are executed sequentially.
Steps
- Start from the vector_add example or use the skeleton axpy_usm_queue_sync.cpp.
- Modify the queue definition
sycl::queue queue{sycl::default_selector_v, sycl::property::queue::in_order{}};
- initialize X and Y on the device using two separate kernels (no need for .memcpy calls).
- submit a third kernel to compute Y = Y + a * X with a = 1.
- copy the result back to the host and validate it
Instead of using an in-order queue, you can use sycl::event objects to explicitly set the order of execution. Each kernel submission returns an event, which can be used to ensure that subsequent tasks wait for the completion of preceding tasks.
Steps
- Start from the vector_add example or use the skeleton axpy_usm_events.cpp
- Keep the default out-of-order queue definition
- Initialize arrays X and Y on the device using two separate kernels. Capture the events from these kernel submissions:
auto event_x = queue.submit([&](sycl::handler &h) {
    h.parallel_for(sycl::range{N}, [=](sycl::id<1> idx) { X[idx] = 1; });
});
auto event_y = queue.submit([&](sycl::handler &h) {
    h.parallel_for(sycl::range{N}, [=](sycl::id<1> idx) { Y[idx] = 2; });
});
- submit the axpy kernel with an explicit dependency on the two initialization events
queue.submit([&](sycl::handler &h) {
    h.depends_on({event_x, event_y});
    h.parallel_for(sycl::range{N}, [=](sycl::id<1> idx) { Y[idx] += a * X[idx]; });
});
or, equivalently, with the queue shortcut:
queue.parallel_for(sycl::range{N}, {event_x, event_y}, [=](sycl::id<1> idx) { Y[idx] += a * X[idx]; });
- as an exercise, you can synchronize the host with the events
sycl::event::wait({event_x, event_y});
- copy the final result back to the host for validation