Commit

Replacing hipLaunchKernel with hipLaunchKernelGGL and removing hipLaunchParm
tcs committed Oct 25, 2019
1 parent 9f6efd6 commit e836b62
Showing 15 changed files with 45 additions and 47 deletions.
2 changes: 1 addition & 1 deletion FAQ/FAQ_HIP.rst
@@ -14,7 +14,7 @@ HIP provides the following:
* Memory management (hipMalloc(), hipMemcpy(), hipFree(), etc.)
* Streams (hipStreamCreate(), hipStreamSynchronize(), hipStreamWaitEvent(), etc.)
* Events (hipEventRecord(), hipEventElapsedTime(), etc.)
* Kernel launching (hipLaunchKernel is a standard C/C++ function that replaces <<< >>>)
* Kernel launching (hipLaunchKernelGGL is a standard C/C++ function that replaces <<< >>>)
* HIP Module API to control when and how code is loaded.
* CUDA-style kernel coordinate functions (threadIdx, blockIdx, blockDim, gridDim)
* Cross-lane instructions including shfl, ballot, any, all
2 changes: 1 addition & 1 deletion Inputs/hip_runtime_api.h
@@ -328,7 +328,7 @@ hipError_t hipDeviceReset(void);
* - Any device memory subsequently allocated from this host thread (using hipMalloc) will be
* allocated on device.
* - Any streams or events created from this host thread will be associated with device.
* - Any kernels launched from this host thread (using hipLaunchKernel) will be executed on device
* - Any kernels launched from this host thread (using hipLaunchKernelGGL) will be executed on device
* (unless a specific stream is specified, in which case the device associated with that stream will
* be used).
*
6 changes: 3 additions & 3 deletions Installation_Guide/HIP.rst
@@ -56,7 +56,7 @@ Simple Example
################

The HIP API includes functions such as hipMalloc, hipMemcpy, and hipFree.
Programmers familiar with CUDA will also be able to quickly learn and start coding with the HIP API. Compute kernels are launched with the "hipLaunchKernel" macro call. Here is a simple example showing a snippet of HIP API code:
Programmers familiar with CUDA will also be able to quickly learn and start coding with the HIP API. Compute kernels are launched with the "hipLaunchKernelGGL" macro call. Here is a simple example showing a snippet of HIP API code:
::
hipMalloc(&A_d, Nbytes);
hipMalloc(&C_d, Nbytes);
@@ -65,7 +65,7 @@ Programmers familiar with CUDA will also be able to quickly learn and start codi
const unsigned blocks = 512;
const unsigned threadsPerBlock = 256;
hipLaunchKernel(vector_square, /* compute kernel*/
hipLaunchKernelGGL(vector_square, /* compute kernel*/
dim3(blocks), dim3(threadsPerBlock), 0/*dynamic shared*/, 0/*stream*/, /* launch config*/
C_d, A_d, N); /* arguments to the compute kernel */
@@ -128,7 +128,7 @@ Tour of the HIP Directories
* **Include:**

* **hip_runtime_api.h** : Defines HIP runtime APIs and can be compiled with many standard Linux compilers (hcc, GCC, ICC, CLANG, etc), in either C or C++ mode.
* **hip_runtime.h** : Includes everything in hip_runtime_api.h PLUS hipLaunchKernel and syntax for writing device kernels and device functions. hip_runtime.h can only be compiled with hcc.
* **hip_runtime.h** : Includes everything in hip_runtime_api.h PLUS hipLaunchKernelGGL and syntax for writing device kernels and device functions. hip_runtime.h can only be compiled with hcc.
* **hcc_detail/**, **nvcc_detail/**: Implementation details for specific platforms. HIP applications should not include these files directly.
* **hcc.h** : Includes interop APIs for HIP and HCC
* **bin**: Tools and scripts to help with hip porting
2 changes: 1 addition & 1 deletion Programming_Guides/CUDAAPIHIPTEXTURE.rst
@@ -160,7 +160,7 @@ CUDA Runtime API functions supported by HIP
+--------------------------------+---------------------------+
| cudaGetParameterBufferV2 | |
+--------------------------------+---------------------------+
| cudaLaunchKernel | hipLaunchKernel |
| cudaLaunchKernel | hipLaunchKernelGGL |
+--------------------------------+---------------------------+
| cudaSetDoubleForDevice | |
+--------------------------------+---------------------------+
2 changes: 1 addition & 1 deletion Programming_Guides/HIP-FAQ.rst
@@ -13,7 +13,7 @@ HIP provides the following:
* Memory management (hipMalloc(), hipMemcpy(), hipFree(), etc.)
* Streams (hipStreamCreate(), hipStreamSynchronize(), hipStreamWaitEvent(), etc.)
* Events (hipEventRecord(), hipEventElapsedTime(), etc.)
* Kernel launching (hipLaunchKernel is a standard C/C++ function that replaces <<< >>>)
* Kernel launching (hipLaunchKernelGGL is a standard C/C++ function that replaces <<< >>>)
* HIP Module API to control when and how code is loaded.
* CUDA-style kernel coordinate functions (threadIdx, blockIdx, blockDim, gridDim)
* Cross-lane instructions including shfl, ballot, any, all
19 changes: 9 additions & 10 deletions Programming_Guides/HIP-GUIDE.rst
@@ -39,7 +39,7 @@ Supported __global__ functions are
* Executed on the device
* Called ("launched") from the host

HIP __global__ functions must have a void return type, and the first parameter to a HIP __global__ function must have the type hipLaunchParm.see :ref:`Kernel`
HIP __global__ functions must have a void return type.

HIP lacks dynamic-parallelism support, so __global__ functions cannot be called from the device.

@@ -61,8 +61,8 @@ Calling __global__ Functions

__global__ functions are often referred to as kernels, and calling one is termed launching the kernel. These functions require the caller to specify an "execution configuration" that includes the grid and block dimensions. The execution configuration can also include other information for the launch, such as the amount of additional shared memory to allocate and the stream where the kernel should execute. HIP introduces a standard C++ calling convention to pass the execution configuration to the kernel (this convention replaces the Cuda <<< >>> syntax). In HIP,

* Kernels launch with the "hipLaunchKernel" function
* The first five parameters to hipLaunchKernel are the following:
* Kernels launch with the "hipLaunchKernelGGL" function
* The first five parameters to hipLaunchKernelGGL are the following:
* symbol kernelName: the name of the kernel to launch. To support template kernels which contain ",", use the HIP_KERNEL_NAME macro. The hipify tools insert this automatically.
* dim3 gridDim: 3D-grid dimensions specifying the number of blocks to launch.
* dim3 blockDim: 3D-block dimensions specifying the number of threads in each block.
@@ -71,16 +71,16 @@ __global__ functions are often referred to as kernels, and calling one is termed
:ref:`Synchronization-Functions`).
* Kernel arguments follow these first five parameters ::
//Example pseudo code introducing hipLaunchKernel
__global__ MyKernel(hipLaunchParm lp, float *A, float *B, float *C, size_t N)
//Example pseudo code introducing hipLaunchKernelGGL
__global__ MyKernel(float *A, float *B, float *C, size_t N)
{
...
}
//Replace MyKernel<<<dim3(gridDim), dim3(gridDim), 0, 0>>> (a,b,c,n);
hipLaunchKernel(MyKernel, dim3(gridDim), dim3(groupDim), 0/*dynamicShared*/, 0/*stream*/, a, b, c, n)
hipLaunchKernelGGL(MyKernel, dim3(gridDim), dim3(groupDim), 0/*dynamicShared*/, 0/*stream*/, a, b, c, n)


The hipLaunchKernel macro always starts with the five parameters specified above, followed by the kernel arguments. The Hipify script automatically converts Cuda launch syntax to hipLaunchKernel, including conversion of optional arguments in <<< >>> to the five required hipLaunchKernel parameters. The :ref:`dim3` constructor accepts zero to three arguments and will by default initialize unspecified dimensions to 1. See dim3. The kernel uses the coordinate built-ins (hipThread*, hipBlock*, hipGrid*) to determine coordinate index and coordinate bounds of the work item that’s currently executing.
The hipLaunchKernelGGL macro always starts with the five parameters specified above, followed by the kernel arguments. The Hipify script automatically converts Cuda launch syntax to hipLaunchKernelGGL, including conversion of optional arguments in <<< >>> to the five required hipLaunchKernelGGL parameters. The :ref:`dim3` constructor accepts zero to three arguments and will by default initialize unspecified dimensions to 1. See dim3. The kernel uses the coordinate built-ins (hipThread*, hipBlock*, hipGrid*) to determine coordinate index and coordinate bounds of the work item that’s currently executing.

.. _Kernel:

@@ -98,8 +98,7 @@ Kernel-Launch Example

__global__
void
MyKernel (hipLaunchParm lp, /*lp parm for execution configuration */
const float *a, const float *b, float *c, unsigned N)
MyKernel (const float *a, const float *b, float *c, unsigned N)
{
unsigned gid = hipThreadIdx_x; // <- coordinate index function
if (gid < N) {
@@ -111,7 +110,7 @@ Kernel-Launch Example
float *a, *b, *c; // initialization not shown...
unsigned N = 1000000;
const unsigned blockSize = 256;
hipLaunchKernel(MyKernel,
hipLaunchKernelGGL(MyKernel,
(N/blockSize), dim3(blockSize), 0, 0, a,b,c,N);
}

2 changes: 1 addition & 1 deletion Programming_Guides/HIP-Terms.rst
@@ -40,7 +40,7 @@ Table Comparing Syntax for Different Compute APIs
| Host + Device Function | __host__ | __host__ | [[hc]] [[cpu]] | restrict(amp,cpu) | No equivalent |
| | __device__ | __device__ | | | |
+------------------------+-----------------------+-----------------------+---------------------------------------------------+----------------------------------------------------+---------------------------------+
| Kernel Launch | <<< >>> | hipLaunchKernel | hc::parallel_for_each | concurrency::parallel_for_each | clEnqueueNDRangeKernel |
| Kernel Launch | <<< >>> | hipLaunchKernelGGL | hc::parallel_for_each | concurrency::parallel_for_each | clEnqueueNDRangeKernel |
+------------------------+-----------------------+-----------------------+---------------------------------------------------+----------------------------------------------------+---------------------------------+
| Global Memory | __global__ | __global__ | Unnecessary / Implied | Unnecessary / Implied | __global__ |
+------------------------+-----------------------+-----------------------+---------------------------------------------------+----------------------------------------------------+---------------------------------+
12 changes: 6 additions & 6 deletions Programming_Guides/HIP-porting-guide.rst
@@ -295,10 +295,10 @@ Makefiles can use the following syntax to conditionally provide a default HIP_PA

HIP_PATH ?= $(shell hipconfig --path)

hipLaunchKernel
hipLaunchKernelGGL
~~~~~~~~~~~~~~~~~~

hipLaunchKernel is a variadic macro which accepts as parameters the launch configurations (grid dims, group dims, stream, dynamic shared size) followed by a variable number of kernel arguments. This sequence is then expanded into the appropriate kernel launch syntax depending on the platform.
hipLaunchKernelGGL is a variadic macro which accepts as parameters the launch configurations (grid dims, group dims, stream, dynamic shared size) followed by a variable number of kernel arguments. This sequence is then expanded into the appropriate kernel launch syntax depending on the platform.
While this can be a convenient single-line kernel launch syntax, the macro implementation can cause issues when nested inside other macros. For example, consider the following:

::
@@ -310,7 +310,7 @@ While this can be a convenient single-line kernel launch syntax, the macro imple
(command); /* The nested ( ) will cause compile error */\
}

MY_LAUNCH (hipLaunchKernel(vAdd, dim3(1024), dim3(1), 0, 0, Ad), true, "firstCall");
MY_LAUNCH (hipLaunchKernelGGL(vAdd, dim3(1024), dim3(1), 0, 0, Ad), true, "firstCall");

Avoid nesting macro parameters inside parenthesis - here's an alternative that will work:

@@ -322,7 +322,7 @@ Avoid nesting macro parameters inside parenthesis - here's an alternative that w
command;\
}

MY_LAUNCH (hipLaunchKernel(vAdd, dim3(1024), dim3(1), 0, 0, Ad), true, "firstCall");
MY_LAUNCH (hipLaunchKernelGGL(vAdd, dim3(1024), dim3(1), 0, 0, Ad), true, "firstCall");

Compiler Options
~~~~~~~~~~~~~~~~
@@ -441,7 +441,7 @@ Device Code:

__constant__ int Value[LEN];

__global__ void Get(hipLaunchParm lp, int *Ad)
__global__ void Get(int *Ad)
{
int tid = hipThreadIdx_x + hipBlockIdx_x * hipBlockDim_x;
Ad[tid] = Value[tid];
@@ -461,7 +461,7 @@ Device Code:
HIP_ASSERT(hipMalloc((void**)&Ad, SIZE));

HIP_ASSERT(hipMemcpyToSymbol(HIP_SYMBOL(Value), A, SIZE, 0, hipMemcpyHostToDevice));
hipLaunchKernel(Get, dim3(1,1,1), dim3(LEN,1,1), 0, 0, Ad);
hipLaunchKernelGGL(Get, dim3(1,1,1), dim3(LEN,1,1), 0, 0, Ad);
HIP_ASSERT(hipMemcpy(B, Ad, SIZE, hipMemcpyDeviceToHost));

for(unsigned i=0;i<LEN;i++)
2 changes: 1 addition & 1 deletion Programming_Guides/HIP-terminology.rst
@@ -54,7 +54,7 @@ HIP terminology comparison with OpenCL, Cuda, C++ AMP and HCC
+-----------------------+---------------+-----------------+---------------------+------------------------+---------------------------+
|Kernel Launch | | | | concurrency:: | |
| | <<< >>> | hipLaunchKernel |hc:: | parallel_for_each |clEnqueueND- |
| | | |parallel_for_each | |RangeKernel |
| | | GGL |parallel_for_each | |RangeKernel |
+-----------------------+---------------+-----------------+---------------------+------------------------+---------------------------+
+-----------------------+---------------+-----------------+---------------------+------------------------+---------------------------+
| | | | | | |
19 changes: 9 additions & 10 deletions Programming_Guides/Kernel_language.rst
@@ -93,7 +93,7 @@ Supported ``__global__`` functions are
* Executed on the device
* Called ("launched") from the host

HIP ``__global__`` functions must have a void return type, and the first parameter to a HIP ``__global__`` function must have the type hipLaunchParm. See `Kernel-Launch Example <https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#kernel-launch-example>`_ .
HIP ``__global__`` functions must have a void return type. See `Kernel-Launch Example <https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#kernel-launch-example>`_ .

HIP lacks dynamic-parallelism support, so ``__global__`` functions cannot be called from the device.

@@ -118,8 +118,8 @@ Calling __global__ Functions

``__global__`` functions are often referred to as kernels, and calling one is termed launching the kernel. These functions require the caller to specify an "execution configuration" that includes the grid and block dimensions. The execution configuration can also include other information for the launch, such as the amount of additional shared memory to allocate and the stream where the kernel should execute. HIP introduces a standard C++ calling convention to pass the execution configuration to the kernel (this convention replaces the Cuda <<< >>> syntax). In HIP,

* Kernels launch with the "hipLaunchKernel" function
* The first five parameters to hipLaunchKernel are the following:
* Kernels launch with the "hipLaunchKernelGGL" function
* The first five parameters to hipLaunchKernelGGL are the following:
* **symbol kernelName:** the name of the kernel to launch. To support template kernels which contain ",", use the HIP_KERNEL_NAME macro. The hipify tools insert this automatically.
* **dim3 gridDim:** 3D-grid dimensions specifying the number of blocks to launch.
* **dim3 blockDim:** 3D-block dimensions specifying the number of threads in each block.
@@ -130,18 +130,18 @@ Calling __global__ Functions

::

// Example pseudo code introducing hipLaunchKernel:
__global__ MyKernel(hipLaunchParm lp, float *A, float *B, float *C, size_t N)
// Example pseudo code introducing hipLaunchKernelGGL:
__global__ MyKernel(float *A, float *B, float *C, size_t N)
{
...
}
// Replace MyKernel<<<dim3(gridDim), dim3(gridDim), 0, 0>>> (a,b,c,n);
hipLaunchKernel(MyKernel, dim3(gridDim), dim3(groupDim), 0/*dynamicShared*/, 0/*stream*/, a, b, c, n);
hipLaunchKernelGGL(MyKernel, dim3(gridDim), dim3(groupDim), 0/*dynamicShared*/, 0/*stream*/, a, b, c, n);

The hipLaunchKernel macro always starts with the five parameters specified above, followed by the kernel arguments. The Hipify script automatically converts Cuda launch syntax to hipLaunchKernel, including conversion of optional arguments in <<< >>> to the five required hipLaunchKernel parameters. The dim3 constructor accepts zero to three arguments and will by default initialize unspecified dimensions to 1. See `dim3 <https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#dim3>`_. The kernel uses the coordinate built-ins (hipThread*, hipBlock*, hipGrid*) to determine coordinate index and coordinate bounds of the work item that’s currently executing. See `Coordinate Built-Ins <https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#coordinate-built-ins>`_.
The hipLaunchKernelGGL macro always starts with the five parameters specified above, followed by the kernel arguments. The Hipify script automatically converts Cuda launch syntax to hipLaunchKernelGGL, including conversion of optional arguments in <<< >>> to the five required hipLaunchKernelGGL parameters. The dim3 constructor accepts zero to three arguments and will by default initialize unspecified dimensions to 1. See `dim3 <https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#dim3>`_. The kernel uses the coordinate built-ins (hipThread*, hipBlock*, hipGrid*) to determine coordinate index and coordinate bounds of the work item that’s currently executing. See `Coordinate Built-Ins <https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#coordinate-built-ins>`_.

.. _Kernel-Launch-Example:

@@ -159,8 +159,7 @@ Kernel-Launch Example
__global__
void
MyKernel (hipLaunchParm lp, /*lp parm for execution configuration */
const float *a, const float *b, float *c, unsigned N)
MyKernel (const float *a, const float *b, float *c, unsigned N)
{
unsigned gid = hipThreadIdx_x; // <- coordinate index function
if (gid < N) {
@@ -173,7 +172,7 @@ Kernel-Launch Example
unsigned N = 1000000;
const unsigned blockSize = 256;
hipLaunchKernel(MyKernel, dim3(N/blockSize), dim3(blockSize), 0, 0, a,b,c,N);
hipLaunchKernelGGL(MyKernel, dim3(N/blockSize), dim3(blockSize), 0, 0, a,b,c,N);
}
.. _Variable-Type-Qualifiers: