Commit

Replacing hipLaunchKernel with hipLaunchKernelGGL and removing hipLaunchParm
tcs committed Oct 25, 2019
1 parent 9f6efd6 commit e836b62
Showing 15 changed files with 45 additions and 47 deletions.
2 changes: 1 addition & 1 deletion FAQ/FAQ_HIP.rst
@@ -14,7 +14,7 @@ HIP provides the following:
* Memory management (hipMalloc(), hipMemcpy(), hipFree(), etc.)
* Streams (hipStreamCreate(), hipStreamSynchronize(), hipStreamWaitEvent(), etc.)
* Events (hipEventRecord(), hipEventElapsedTime(), etc.)
* Kernel launching (hipLaunchKernel is a standard C/C++ function that replaces <<< >>>)
* Kernel launching (hipLaunchKernelGGL is a standard C/C++ function that replaces <<< >>>)
* HIP Module API to control when and how code is loaded.
* CUDA-style kernel coordinate functions (threadIdx, blockIdx, blockDim, gridDim)
* Cross-lane instructions including shfl, ballot, any, all
2 changes: 1 addition & 1 deletion Inputs/hip_runtime_api.h
@@ -328,7 +328,7 @@ hipError_t hipDeviceReset(void);
* - Any device memory subsequently allocated from this host thread (using hipMalloc) will be
* allocated on device.
* - Any streams or events created from this host thread will be associated with device.
* - Any kernels launched from this host thread (using hipLaunchKernel) will be executed on device
* - Any kernels launched from this host thread (using hipLaunchKernelGGL) will be executed on device
* (unless a specific stream is specified, in which case the device associated with that stream will
* be used).
*
6 changes: 3 additions & 3 deletions Installation_Guide/HIP.rst
@@ -56,7 +56,7 @@ Simple Example
################

The HIP API includes functions such as hipMalloc, hipMemcpy, and hipFree.
Programmers familiar with CUDA will also be able to quickly learn and start coding with the HIP API. Compute kernels are launched with the "hipLaunchKernel" macro call. Here is a simple example showing a snippet of HIP API code:
Programmers familiar with CUDA will also be able to quickly learn and start coding with the HIP API. Compute kernels are launched with the "hipLaunchKernelGGL" macro call. Here is a simple example showing a snippet of HIP API code:
::
hipMalloc(&A_d, Nbytes);
hipMalloc(&C_d, Nbytes);
@@ -65,7 +65,7 @@ Programmers familiar with CUDA will also be able to quickly learn and start codi
const unsigned blocks = 512;
const unsigned threadsPerBlock = 256;
hipLaunchKernel(vector_square, /* compute kernel*/
hipLaunchKernelGGL(vector_square, /* compute kernel*/
dim3(blocks), dim3(threadsPerBlock), 0/*dynamic shared*/, 0/*stream*/, /* launch config*/
C_d, A_d, N); /* arguments to the compute kernel */
@@ -128,7 +128,7 @@ Tour of the HIP Directories
* **Include:**

* **hip_runtime_api.h** : Defines HIP runtime APIs and can be compiled with many standard Linux compilers (hcc, GCC, ICC, CLANG, etc), in either C or C++ mode.
* **hip_runtime.h** : Includes everything in hip_runtime_api.h PLUS hipLaunchKernel and syntax for writing device kernels and device functions. hip_runtime.h can only be compiled with hcc.
* **hip_runtime.h** : Includes everything in hip_runtime_api.h PLUS hipLaunchKernelGGL and syntax for writing device kernels and device functions. hip_runtime.h can only be compiled with hcc.
* **hcc_detail/**, **nvcc_detail/**: Implementation details for specific platforms. HIP applications should not include these files directly.
* **hcc.h** : Includes interop APIs for HIP and HCC
* **bin**: Tools and scripts to help with hip porting
2 changes: 1 addition & 1 deletion Programming_Guides/CUDAAPIHIPTEXTURE.rst
@@ -160,7 +160,7 @@ CUDA Runtime API functions supported by HIP
+--------------------------------+---------------------------+
| cudaGetParameterBufferV2 | |
+--------------------------------+---------------------------+
| cudaLaunchKernel | hipLaunchKernel |
| cudaLaunchKernel | hipLaunchKernelGGL |
+--------------------------------+---------------------------+
| cudaSetDoubleForDevice | |
+--------------------------------+---------------------------+
2 changes: 1 addition & 1 deletion Programming_Guides/HIP-FAQ.rst
@@ -13,7 +13,7 @@ HIP provides the following:
* Memory management (hipMalloc(), hipMemcpy(), hipFree(), etc.)
* Streams (hipStreamCreate(), hipStreamSynchronize(), hipStreamWaitEvent(), etc.)
* Events (hipEventRecord(), hipEventElapsedTime(), etc.)
* Kernel launching (hipLaunchKernel is a standard C/C++ function that replaces <<< >>>)
* Kernel launching (hipLaunchKernelGGL is a standard C/C++ function that replaces <<< >>>)
* HIP Module API to control when and how code is loaded.
* CUDA-style kernel coordinate functions (threadIdx, blockIdx, blockDim, gridDim)
* Cross-lane instructions including shfl, ballot, any, all
19 changes: 9 additions & 10 deletions Programming_Guides/HIP-GUIDE.rst
@@ -39,7 +39,7 @@ Supported __global__ functions are
* Executed on the device
* Called ("launched") from the host

HIP __global__ functions must have a void return type, and the first parameter to a HIP __global__ function must have the type hipLaunchParm.see :ref:`Kernel`
HIP __global__ functions must have a void return type.

HIP lacks dynamic-parallelism support, so __global__ functions cannot be called from the device.

@@ -61,8 +61,8 @@ Calling __global__ Functions

__global__ functions are often referred to as kernels, and calling one is termed launching the kernel. These functions require the caller to specify an "execution configuration" that includes the grid and block dimensions. The execution configuration can also include other information for the launch, such as the amount of additional shared memory to allocate and the stream where the kernel should execute. HIP introduces a standard C++ calling convention to pass the execution configuration to the kernel (this convention replaces the Cuda <<< >>> syntax). In HIP,

* Kernels launch with the "hipLaunchKernel" function
* The first five parameters to hipLaunchKernel are the following:
* Kernels launch with the "hipLaunchKernelGGL" function
* The first five parameters to hipLaunchKernelGGL are the following:
* symbol kernelName: the name of the kernel to launch. To support template kernels which contain ",", use the HIP_KERNEL_NAME macro. The hipify tools insert this automatically.
* dim3 gridDim: 3D-grid dimensions specifying the number of blocks to launch.
* dim3 blockDim: 3D-block dimensions specifying the number of threads in each block.
@@ -71,16 +71,16 @@ __global__ functions are often referred to as kernels, and calling one is termed
:ref:`Synchronization-Functions`).
* Kernel arguments follow these first five parameters ::
//Example pseudo code introducing hipLaunchKernel
__global__ MyKernel(hipLaunchParm lp, float *A, float *B, float *C, size_t N)
//Example pseudo code introducing hipLaunchKernelGGL
__global__ MyKernel(float *A, float *B, float *C, size_t N)
{
...
}
//Replace MyKernel<<<dim3(gridDim), dim3(gridDim), 0, 0>>> (a,b,c,n);
hipLaunchKernel(MyKernel, dim3(gridDim), dim3(groupDim), 0/*dynamicShared*/, 0/*stream*/, a, b, c, n)
hipLaunchKernelGGL(MyKernel, dim3(gridDim), dim3(groupDim), 0/*dynamicShared*/, 0/*stream*/, a, b, c, n)


The hipLaunchKernel macro always starts with the five parameters specified above, followed by the kernel arguments. The Hipify script automatically converts Cuda launch syntax to hipLaunchKernel, including conversion of optional arguments in <<< >>> to the five required hipLaunchKernel parameters. The :ref:`dim3` constructor accepts zero to three arguments and will by default initialize unspecified dimensions to 1. See dim3. The kernel uses the coordinate built-ins (hipThread*, hipBlock*, hipGrid*) to determine coordinate index and coordinate bounds of the work item that’s currently executing.
The hipLaunchKernelGGL macro always starts with the five parameters specified above, followed by the kernel arguments. The Hipify script automatically converts Cuda launch syntax to hipLaunchKernelGGL, including conversion of optional arguments in <<< >>> to the five required hipLaunchKernelGGL parameters. The :ref:`dim3` constructor accepts zero to three arguments and will by default initialize unspecified dimensions to 1. See dim3. The kernel uses the coordinate built-ins (hipThread*, hipBlock*, hipGrid*) to determine coordinate index and coordinate bounds of the work item that’s currently executing.

.. _Kernel:

@@ -98,8 +98,7 @@ Kernel-Launch Example

__global__
void
MyKernel (hipLaunchParm lp, /*lp parm for execution configuration */
const float *a, const float *b, float *c, unsigned N)
MyKernel (const float *a, const float *b, float *c, unsigned N)
{
unsigned gid = hipThreadIdx_x; // <- coordinate index function
if (gid < N) {
@@ -111,7 +110,7 @@ Kernel-Launch Example
float *a, *b, *c; // initialization not shown...
unsigned N = 1000000;
const unsigned blockSize = 256;
hipLaunchKernel(MyKernel,
hipLaunchKernelGGL(MyKernel,
(N/blockSize), dim3(blockSize), 0, 0, a,b,c,N);
}

2 changes: 1 addition & 1 deletion Programming_Guides/HIP-Terms.rst
@@ -40,7 +40,7 @@ Table Comparing Syntax for Different Compute APIs
| Host + Device Function | __host__ | __host__ | [[hc]] [[cpu]] | restrict(amp,cpu) | No equivalent |
| | __device__ | __device__ | | | |
+------------------------+-----------------------+-----------------------+---------------------------------------------------+----------------------------------------------------+---------------------------------+
| Kernel Launch | <<< >>> | hipLaunchKernel | hc::parallel_for_each | concurrency::parallel_for_each | clEnqueueNDRangeKernel |
| Kernel Launch | <<< >>> | hipLaunchKernelGGL | hc::parallel_for_each | concurrency::parallel_for_each | clEnqueueNDRangeKernel |
+------------------------+-----------------------+-----------------------+---------------------------------------------------+----------------------------------------------------+---------------------------------+
| Global Memory | __global__ | __global__ | Unnecessary / Implied | Unnecessary / Implied | __global__ |
+------------------------+-----------------------+-----------------------+---------------------------------------------------+----------------------------------------------------+---------------------------------+
12 changes: 6 additions & 6 deletions Programming_Guides/HIP-porting-guide.rst
@@ -295,10 +295,10 @@ Makefiles can use the following syntax to conditionally provide a default HIP_PA

HIP_PATH ?= $(shell hipconfig --path)

hipLaunchKernel
hipLaunchKernelGGL
~~~~~~~~~~~~~~~~~~

hipLaunchKernel is a variadic macro which accepts as parameters the launch configurations (grid dims, group dims, stream, dynamic shared size) followed by a variable number of kernel arguments. This sequence is then expanded into the appropriate kernel launch syntax depending on the platform.
hipLaunchKernelGGL is a variadic macro which accepts as parameters the launch configurations (grid dims, group dims, stream, dynamic shared size) followed by a variable number of kernel arguments. This sequence is then expanded into the appropriate kernel launch syntax depending on the platform.
While this can be a convenient single-line kernel launch syntax, the macro implementation can cause issues when nested inside other macros. For example, consider the following:

::
@@ -310,7 +310,7 @@ While this can be a convenient single-line kernel launch syntax, the macro imple
(command); /* The nested ( ) will cause compile error */\
}

MY_LAUNCH (hipLaunchKernel(vAdd, dim3(1024), dim3(1), 0, 0, Ad), true, "firstCall");
MY_LAUNCH (hipLaunchKernelGGL(vAdd, dim3(1024), dim3(1), 0, 0, Ad), true, "firstCall");

Avoid nesting macro parameters inside parenthesis - here's an alternative that will work:

@@ -322,7 +322,7 @@ Avoid nesting macro parameters inside parenthesis - here's an alternative that w
command;\
}

MY_LAUNCH (hipLaunchKernel(vAdd, dim3(1024), dim3(1), 0, 0, Ad), true, "firstCall");
MY_LAUNCH (hipLaunchKernelGGL(vAdd, dim3(1024), dim3(1), 0, 0, Ad), true, "firstCall");

Compiler Options
~~~~~~~~~~~~~~~~
@@ -441,7 +441,7 @@ Device Code:

__constant__ int Value[LEN];

__global__ void Get(hipLaunchParm lp, int *Ad)
__global__ void Get(int *Ad)
{
int tid = hipThreadIdx_x + hipBlockIdx_x * hipBlockDim_x;
Ad[tid] = Value[tid];
@@ -461,7 +461,7 @@ Device Code:
HIP_ASSERT(hipMalloc((void**)&Ad, SIZE));

HIP_ASSERT(hipMemcpyToSymbol(HIP_SYMBOL(Value), A, SIZE, 0, hipMemcpyHostToDevice));
hipLaunchKernel(Get, dim3(1,1,1), dim3(LEN,1,1), 0, 0, Ad);
hipLaunchKernelGGL(Get, dim3(1,1,1), dim3(LEN,1,1), 0, 0, Ad);
HIP_ASSERT(hipMemcpy(B, Ad, SIZE, hipMemcpyDeviceToHost));

for(unsigned i=0;i<LEN;i++)
2 changes: 1 addition & 1 deletion Programming_Guides/HIP-terminology.rst
@@ -54,7 +54,7 @@ HIP terminology comparison with OpenCL, Cuda, C++ AMP and HCC
+-----------------------+---------------+-----------------+---------------------+------------------------+---------------------------+
|Kernel Launch | | | | concurrency:: | |
| | <<< >>> | hipLaunchKernel |hc:: | parallel_for_each |clEnqueueND- |
| | | |parallel_for_each | |RangeKernel |
| | | GGL |parallel_for_each | |RangeKernel |
+-----------------------+---------------+-----------------+---------------------+------------------------+---------------------------+
+-----------------------+---------------+-----------------+---------------------+------------------------+---------------------------+
| | | | | | |
19 changes: 9 additions & 10 deletions Programming_Guides/Kernel_language.rst
@@ -93,7 +93,7 @@ Supported ``__global__`` functions are
* Executed on the device
* Called ("launched") from the host

HIP ``__global__`` functions must have a void return type, and the first parameter to a HIP ``__global__`` function must have the type hipLaunchParm. See `Kernel-Launch Example <https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#kernel-launch-example>`_ .
HIP ``__global__`` functions must have a void return type. See `Kernel-Launch Example <https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#kernel-launch-example>`_ .

HIP lacks dynamic-parallelism support, so ``__global__`` functions cannot be called from the device.

@@ -118,8 +118,8 @@ Calling __global__ Functions

``__global__`` functions are often referred to as kernels, and calling one is termed launching the kernel. These functions require the caller to specify an "execution configuration" that includes the grid and block dimensions. The execution configuration can also include other information for the launch, such as the amount of additional shared memory to allocate and the stream where the kernel should execute. HIP introduces a standard C++ calling convention to pass the execution configuration to the kernel (this convention replaces the Cuda <<< >>> syntax). In HIP,

* Kernels launch with the "hipLaunchKernel" function
* The first five parameters to hipLaunchKernel are the following:
* Kernels launch with the "hipLaunchKernelGGL" function
* The first five parameters to hipLaunchKernelGGL are the following:
* **symbol kernelName:** the name of the kernel to launch. To support template kernels which contain ",", use the HIP_KERNEL_NAME macro. The hipify tools insert this automatically.
* **dim3 gridDim:** 3D-grid dimensions specifying the number of blocks to launch.
* **dim3 blockDim:** 3D-block dimensions specifying the number of threads in each block.
@@ -130,18 +130,18 @@ Calling __global__ Functions

::

// Example pseudo code introducing hipLaunchKernel:
__global__ MyKernel(hipLaunchParm lp, float *A, float *B, float *C, size_t N)
// Example pseudo code introducing hipLaunchKernelGGL:
__global__ MyKernel(float *A, float *B, float *C, size_t N)
{
...
}
// Replace MyKernel<<<dim3(gridDim), dim3(gridDim), 0, 0>>> (a,b,c,n);
hipLaunchKernel(MyKernel, dim3(gridDim), dim3(groupDim), 0/*dynamicShared*/, 0/*stream*/, a, b, c, n);
hipLaunchKernelGGL(MyKernel, dim3(gridDim), dim3(groupDim), 0/*dynamicShared*/, 0/*stream*/, a, b, c, n);

The hipLaunchKernel macro always starts with the five parameters specified above, followed by the kernel arguments. The Hipify script automatically converts Cuda launch syntax to hipLaunchKernel, including conversion of optional arguments in <<< >>> to the five required hipLaunchKernel parameters. The dim3 constructor accepts zero to three arguments and will by default initialize unspecified dimensions to 1. See `dim3 <https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#dim3>`_. The kernel uses the coordinate built-ins (hipThread*, hipBlock*, hipGrid*) to determine coordinate index and coordinate bounds of the work item that’s currently executing. See `Coordinate Built-Ins <https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#coordinate-built-ins>`_.
The hipLaunchKernelGGL macro always starts with the five parameters specified above, followed by the kernel arguments. The Hipify script automatically converts Cuda launch syntax to hipLaunchKernelGGL, including conversion of optional arguments in <<< >>> to the five required hipLaunchKernelGGL parameters. The dim3 constructor accepts zero to three arguments and will by default initialize unspecified dimensions to 1. See `dim3 <https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#dim3>`_. The kernel uses the coordinate built-ins (hipThread*, hipBlock*, hipGrid*) to determine coordinate index and coordinate bounds of the work item that’s currently executing. See `Coordinate Built-Ins <https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#coordinate-built-ins>`_.

.. _Kernel-Launch-Example:

@@ -159,8 +159,7 @@ Kernel-Launch Example
__global__
void
MyKernel (hipLaunchParm lp, /*lp parm for execution configuration */
const float *a, const float *b, float *c, unsigned N)
MyKernel (const float *a, const float *b, float *c, unsigned N)
{
unsigned gid = hipThreadIdx_x; // <- coordinate index function
if (gid < N) {
@@ -173,7 +172,7 @@ Kernel-Launch Example
unsigned N = 1000000;
const unsigned blockSize = 256;
hipLaunchKernel(MyKernel, dim3(N/blockSize), dim3(blockSize), 0, 0, a,b,c,N);
hipLaunchKernelGGL(MyKernel, dim3(N/blockSize), dim3(blockSize), 0, 0, a,b,c,N);
}
.. _Variable-Type-Qualifiers: