Dependencies, axpy example

In this exercise, you will solve the axpy problem (Y = Y + a*X) while correctly imposing dependencies between the tasks. All operations, including initialization, must be performed on the device. You can start from the vector addition exercise solutions or from the skeleton codes in this folder.

Structure of the Code:

  1. define a SYCL queue
  2. declare the variables
  3. fill the variables with data using 2 kernels, one for each array
  4. do the final axpy computation in another kernel
  5. copy data to host to check the results
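The check in step 5 can be a plain host-side comparison once the data is back on the host. The following is a minimal sketch, assuming `float` data and arrays that were filled with uniform values before the axpy kernel; the helper name `count_mismatches` and the tolerance are hypothetical and not part of the exercise skeletons.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts the entries of y that differ from the
// expected axpy result y0 + a * x0 by more than tol.
// Assumes X was filled uniformly with x0 and Y with y0 before the axpy kernel.
std::size_t count_mismatches(const std::vector<float>& y, float a,
                             float x0, float y0, float tol = 1e-6f) {
    const float expected = y0 + a * x0;
    std::size_t mismatches = 0;
    for (float value : y) {
        if (std::fabs(value - expected) > tol) ++mismatches;
    }
    return mismatches;
}
```

A return value of zero means every element matches; any other value usually points to a missing dependency, since the axpy kernel may have run before one of the initialization kernels finished.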

There are several ways to enforce dependencies.

I. Automatic dependencies using Buffer and Accessors API

When managing memory with buffers and accessors, dependencies are handled automatically. Each accessor tells the runtime which buffer a kernel uses and whether it reads or writes it. From this information the runtime orders the kernels that touch the same buffer, so a kernel that reads a buffer starts only after the kernel that writes it has finished.

Steps

  1. Start from the vector_add example or use the skeleton axpy_buffer.cpp
  2. initialize the arrays X and Y with two separate kernels, using the initial values X=1 and Y=2
  3. compute Y=Y+a*X using a 3rd kernel with a=1
  4. copy the final result back to the host to validate

II. Dependencies using USM

IIa) Use of in-order queues

SYCL queues are out-of-order by default, meaning kernels can execute concurrently. When using USM, submitting multiple kernels without explicit synchronization can result in incorrect execution order. To enforce task order, you can define the queue as in-order, ensuring tasks are executed sequentially.

Steps

  1. Start from the vector_add example or use the skeleton axpy_usm_queue_sync.cpp.
  2. Modify the queue definition
sycl::queue queue(sycl::default_selector{}, sycl::property::queue::in_order{});
  3. initialize X and Y on the device using two separate kernels (no need for .memcpy calls).
  4. submit a third kernel to compute Y = Y + a * X with a = 1.
  5. copy the result back to the host and validate it.

IIb) Use sycl::events

Instead of using an in-order queue, you can use sycl::event objects to set the order of execution explicitly. Each kernel submission returns an event, which later submissions can depend on so that they start only after the preceding tasks have completed.

Steps

  1. Start from the vector_add example or use the skeleton axpy_usm_events.cpp
  2. Keep the default out-of-order queue definition
  3. Initialize arrays X and Y on the device using two separate kernels. Capture the events from these kernel submissions:
auto event_x = queue.submit([&](sycl::handler &h) {
    h.parallel_for(sycl::range{N}, [=](sycl::id<1> idx) { X[idx] = 1; });
});
auto event_y = queue.submit([&](sycl::handler &h) {
    h.parallel_for(sycl::range{N}, [=](sycl::id<1> idx) { Y[idx] = 2; });
});
  4. submit the axpy kernel with an explicit dependency on the two initialization events:
queue.submit([&](sycl::handler &h) {
   h.depends_on({event_x, event_y});
   h.parallel_for(sycl::range{N}, [=](sycl::id<1> idx) { Y[idx] += a * X[idx]; });
});

or, using the queue shortcut that takes the dependencies as an argument:

queue.parallel_for(sycl::range{N}, {event_x, event_y}, [=](sycl::id<1> idx) { Y[idx] += a * X[idx]; });
  5. as an exercise, you can synchronize the host with the initialization events using sycl::event::wait({event_x, event_y});
  6. copy the final result back to the host for validation