Write a device kernel that calculates the single-precision BLAS operation saxpy, i.e. y = a * x + y.
Initialise the vectors with some values on the CPU and confirm the correctness of the calculation, e.g. by comparing to reference values calculated on the CPU or by printing out the results.
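The CPU-side check described above could look roughly like the following sketch. It is host-only C++ and makes no assumption about the device framework. The names saxpy_reference and count_mismatches and the default tolerance of 1e-5 are illustrative choices, not part of the provided skeleton. A tolerance is used instead of exact equality because a device compiler may fuse the multiply and add into a single instruction, which can change the last bits of the result.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU reference (hypothetical helper): y[i] = a * x[i] + y[i] for every element.
void saxpy_reference(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// Counts the elements where the device result deviates from the reference
// by more than rel_tol, scaled by max(1, |reference|).
// The tolerance value is an assumption; adjust it to your input data.
std::size_t count_mismatches(const std::vector<float>& result,
                             const std::vector<float>& reference,
                             float rel_tol = 1e-5f) {
    assert(result.size() == reference.size());
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < result.size(); ++i) {
        const float diff = std::fabs(result[i] - reference[i]);
        const float scale = std::fmax(1.0f, std::fabs(reference[i]));
        if (diff > rel_tol * scale) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

To use it, keep a copy of the initial y on the host, run saxpy_reference on that copy, copy the device result back, and pass both vectors to count_mismatches. A return value of 0 indicates that the kernel agrees with the reference within the tolerance.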
You may start from the skeleton code provided in saxpy.cpp.