This repository has been archived by the owner on Jun 5, 2023. It is now read-only.

Commit

Revert "Add cleanup_text.sh script, and remove non-ASCII characters from RST code blocks"
Rmalavally authored Apr 8, 2020
1 parent d6188df commit 611e249
Showing 17 changed files with 119 additions and 315 deletions.
6 changes: 3 additions & 3 deletions Deep_learning/GCN-asm-tutorial.rst
@@ -71,9 +71,9 @@ The host program should also allocate memory for the in, index and out buffers.
out = AllocateBuffer(size);

// Fill Kernarg memory
Kernarg(in); // Add base pointer to "in" buffer
Kernarg(index); // Append base pointer to "index" buffer
Kernarg(out); // Append base pointer to "out" buffer
Kernarg(in); // Add base pointer to “in” buffer
Kernarg(index); // Append base pointer to index buffer
Kernarg(out); // Append base pointer to out buffer

Initial Wavefront and Register State To launch a kernel in real hardware, the run time needs information about the kernel, such as

6 changes: 3 additions & 3 deletions GCN_ISA_Manuals/GCN-ISA-Manuals.rst
@@ -90,9 +90,9 @@ The host program should also allocate memory for the in, index and out buffers.
out = AllocateBuffer(size);

// Fill Kernarg memory
Kernarg(in); // Add base pointer to "in" buffer
Kernarg(index); // Append base pointer to "index" buffer
Kernarg(out); // Append base pointer to "out" buffer
Kernarg(in); // Add base pointer to “in” buffer
Kernarg(index); // Append base pointer to index buffer
Kernarg(out); // Append base pointer to out buffer

Initial Wavefront and Register State To launch a kernel in real hardware, the run time needs information about the kernel, such as

6 changes: 3 additions & 3 deletions Installation_Guide/FAQ-on-Installation.rst
@@ -89,7 +89,7 @@ This problem can occur on Fedora installation if several previous kernels are cu
This is not an issue with the YUM repository; it is caused by the size of the /boot filesystem and the size of the kernels already installed on it. This issue can be fixed by uninstalling previous versions of the rocm Linux kernel:
::
sudo dnf remove rocm
rpm -qa | grep kfd | xargs sudo rpm -e
rpm -qa | grep kfd | xargs sudo rpm e
sudo dnf install rocm
Installing from an archived repository
@@ -104,7 +104,7 @@ Here is an Example:

cd /temp && wget http://repo.radeon.com/rocm/archive/apt_1.6.3.tar.bz2
tar -xvf apt_1.6.3.tar.bz2
sudo echo "deb [amd64] file://temp/apt_1.6.3 xenial main" > /etc/apt/sources.lists.d/rocm.local.list
sudo echo deb [amd64] file://temp/apt_1.6.3 xenial main > /etc/apt/sources.lists.d/rocm.local.list
sudo apt-get update && sudo apt-get install rocm

Users should make sure that no other list files contain another rocm repo configuration.
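An editorial aside on the apt setup shown in the hunk above (present in both the removed and restored lines): the `sudo echo "deb …" > /etc/apt/…` pattern does not actually write the file as root, because the `>` redirection is performed by the calling shell before `sudo` runs. A common workaround pipes into `sudo tee`. The sketch below illustrates the working pattern using a scratch directory instead of `/etc/apt/sources.list.d`, so it runs unprivileged; the paths and repo line are copied from the guide purely for illustration:

```shell
# Sketch: the `tee` alternative to `sudo cmd > file`.
# Uses a scratch directory instead of /etc/apt so it runs unprivileged.
dir="$(mktemp -d)"
repo_line="deb [amd64] file://temp/apt_1.6.3 xenial main"

# With sudo, `> file` would be opened by the *calling* shell, not by root.
# Piping into tee makes the file write happen inside the tee process
# (which would be the privileged one under `sudo tee`):
echo "$repo_line" | tee "$dir/rocm.local.list" > /dev/null

cat "$dir/rocm.local.list"   # prints the repo line written above
```

Under a real installation this would be `echo "$repo_line" | sudo tee /etc/apt/sources.list.d/rocm.local.list`.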
@@ -119,7 +119,7 @@ Add a /etc/yum.d/rocm.local.repo file with the following contents: ::
enabled=1
gpgcheck=0
cd /temp && wget http://repo.radeon.com/rocm/archive/yum_1.6.3.tar.bz2
tar -xvf yum_1.6.3.tar.bz2
tar xvf yum_1.6.3.tar.bz2

Then execute: ::

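One detail worth noting about the changes above: unlike `rpm e` (which drops a required option dash and fails), the restored `tar xvf` spelling is still valid — GNU tar accepts old-style bundled option letters without a leading dash. A quick self-contained check, using a scratch archive with illustrative file names:

```shell
# Check that old-style `tar xvf` (no dash) behaves like `tar -xvf`.
# Runs entirely in a temporary directory; file names are illustrative.
workdir="$(mktemp -d)"
cd "$workdir"
echo "hello" > sample.txt
tar cf sample.tar sample.txt   # old-style create, also without a dash
rm sample.txt
tar xvf sample.tar             # old-style extract; equivalent to -xvf
cat sample.txt                 # prints: hello
```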
2 changes: 1 addition & 1 deletion Installation_Guide/HCC-Compiler.rst
@@ -173,4 +173,4 @@ For applications compiled using hcc, ThinLTO could significantly improve link-ti
ThinLTO Phase 2 - Under development
**************************************

This ThinLTO implementation which will use llvm-lto LLVM tool to replace clamp-device bash script. It adds an optllc option into ThinLTOGenerator, which will perform in-program opt and codegen in parallel.
This ThinLTO implementation which will use llvm-lto LLVM tool to replace clamp-device bash script. It adds an optllc option into ThinLTOGenerator, which will perform in-program opt and codegen in parallel.
4 changes: 2 additions & 2 deletions Installation_Guide/Installation-Guide.rst
@@ -227,7 +227,7 @@ To install ROCm on your system, follow the instructions below:
enabled=1
gpgcheck=0

Note: The URL of the repository must point to the location of the repositories' repodata database.
Note: The URL of the repository must point to the location of the repositories repodata database.

3. Install ROCm components using the following command:

@@ -352,7 +352,7 @@ The following section tells you how to perform an install and uninstall ROCm on

::

sudo zypper clean -all
sudo zypper clean all
sudo zypper addrepo --no-gpgcheck http://repo.radeon.com/rocm/zyp/zypper/ rocm
sudo zypper ref
zypper install rocm-dkms
4 changes: 2 additions & 2 deletions Installation_Guide/Quick Start Installation Guide.rst
@@ -236,7 +236,7 @@ To install ROCm on your system, follow the instructions below:
enabled=1
gpgcheck=0

Note: The URL of the repository must point to the location of the repositories' repodata database.
Note: The URL of the repository must point to the location of the repositories repodata database.

3. Install ROCm components using the following command:

@@ -363,7 +363,7 @@ The following section tells you how to perform an install and uninstall ROCm on

::

sudo zypper clean -all
sudo zypper clean all
sudo zypper addrepo --no-gpgcheck http://repo.radeon.com/rocm/zyp/zypper/ rocm
sudo zypper ref
zypper install rocm-dkms
2 changes: 1 addition & 1 deletion Programming_Guides/HIP-GUIDE.rst
@@ -80,7 +80,7 @@ __global__ functions are often referred to as kernels, and calling one is termed
hipLaunchKernelGGL(MyKernel, dim3(gridDim), dim3(groupDim), 0/*dynamicShared*/, 0/*stream), a, b, c, n)


The hipLaunchKernelGGL macro always starts with the five parameters specified above, followed by the kernel arguments. The Hipify script automatically converts Cuda launch syntax to hipLaunchKernelGGL, including conversion of optional arguments in <<< >>> to the five required hipLaunchKernelGGL parameters. The :ref:`dim3` constructor accepts zero to three arguments and will by default initialize unspecified dimensions to 1. See dim3. The kernel uses the coordinate built-ins (hipThread*, hipBlock*, hipGrid*) to determine coordinate index and coordinate bounds of the work item that's currently executing.
The hipLaunchKernelGGL macro always starts with the five parameters specified above, followed by the kernel arguments. The Hipify script automatically converts Cuda launch syntax to hipLaunchKernelGGL, including conversion of optional arguments in <<< >>> to the five required hipLaunchKernelGGL parameters. The :ref:`dim3` constructor accepts zero to three arguments and will by default initialize unspecified dimensions to 1. See dim3. The kernel uses the coordinate built-ins (hipThread*, hipBlock*, hipGrid*) to determine coordinate index and coordinate bounds of the work item thats currently executing.

.. _Kernel:

2 changes: 1 addition & 1 deletion Programming_Guides/Kernel_language.rst
@@ -141,7 +141,7 @@ Calling __global__ Functions
hipLaunchKernelGGL(MyKernel, dim3(gridDim), dim3(groupDim), 0/*dynamicShared*/, 0/*stream), a, b, c, n);

The hipLaunchKernelGGL macro always starts with the five parameters specified above, followed by the kernel arguments. The Hipify script automatically converts Cuda launch syntax to hipLaunchKernelGGL, including conversion of optional arguments in <<< >>> to the five required hipLaunchKernelGGL parameters. The dim3 constructor accepts zero to three arguments and will by default initialize unspecified dimensions to 1. See `dim3 <https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#dim3>`_. The kernel uses the coordinate built-ins (hipThread*, hipBlock*, hipGrid*) to determine coordinate index and coordinate bounds of the work item that's currently executing. See `Coordinate Built-Ins <https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#coordinate-built-ins>`_.
The hipLaunchKernelGGL macro always starts with the five parameters specified above, followed by the kernel arguments. The Hipify script automatically converts Cuda launch syntax to hipLaunchKernelGGL, including conversion of optional arguments in <<< >>> to the five required hipLaunchKernelGGL parameters. The dim3 constructor accepts zero to three arguments and will by default initialize unspecified dimensions to 1. See `dim3 <https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#dim3>`_. The kernel uses the coordinate built-ins (hipThread*, hipBlock*, hipGrid*) to determine coordinate index and coordinate bounds of the work item thats currently executing. See `Coordinate Built-Ins <https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#coordinate-built-ins>`_.

.. _Kernel-Launch-Example:

6 changes: 3 additions & 3 deletions Programming_Guides/Opencl-optimization.rst
@@ -1543,7 +1543,7 @@ In the second block of code, the ``?:`` operator executes in the vector units, s
a[idx] = d[idx];
}

This is inefficient because the GPU compiler must know the base pointer that every load comes from and in this situation, the compiler cannot determine what aEUR~d' points to. So, both B and C are assigned to the same GPU resource, removing the ability to do certain optimizations.
This is inefficient because the GPU compiler must know the base pointer that every load comes from and in this situation, the compiler cannot determine what ‘d' points to. So, both B and C are assigned to the same GPU resource, removing the ability to do certain optimizations.

*If the algorithm allows changing the work-group size, it is possible to get better performance by using larger work-groups (more work-items in each work-group) because the workgroup creation overhead is reduced. On the other hand, the OpenCL CPU runtime uses a task-stealing algorithm at the work-group level, so when the kernel execution time differs because it contains conditions and/or loops of varying number of iterations, it might be better to increase the number of work-groups. This gives the runtime more flexibility in scheduling work-groups to idle CPU cores. Experimentation might be needed to reach optimal work-group size.
*Since the AMD OpenCL runtime supports only in-order queuing, using clFinish() on a queue and queuing a blocking command gives the same result. The latter saves the overhead of another API command.
@@ -2230,7 +2230,7 @@ The following are sample kernels with different coalescing patterns.
{
int gid = get_global_id(0);
if((gid & 0x1) == 0) {
gid = (gid & (Eoe63)) +62 - get_local_id(0);
gid = (gid & (˜63)) +62 - get_local_id(0);
}
output[gid] = input[gid];
return;
@@ -3242,7 +3242,7 @@ In the second block of code, the ``?:`` operator executes in an ALU clause, so n
}


This is inefficient because the GPU compiler must know the base pointer that every load comes from and in this situation, the compiler cannot determine what aEUR~d' points to. So, both B and C are assigned to the same GPU resource, removing the ability to do certain optimizations.
This is inefficient because the GPU compiler must know the base pointer that every load comes from and in this situation, the compiler cannot determine what ‘d' points to. So, both B and C are assigned to the same GPU resource, removing the ability to do certain optimizations.
* If the algorithm allows changing the work-group size, it is possible to get better performance by using larger work-groups (more work-items in each work-group) because the workgroup creation overhead is reduced. On the other hand, the OpenCL CPU runtime uses a task-stealing algorithm at the work-group level, so when the kernel execution time differs because it contains conditions and/or loops of varying number of iterations, it might be better to increase the number of work-groups. This gives the runtime more flexibility in scheduling work-groups to idle CPU cores. Experimentation might be needed to reach optimal work-group size.
* Since the AMD OpenCL runtime supports only in-order queuing, using ``clFinish`` () on a queue and queuing a blocking command gives the same result. The latter saves the overhead of another API command.
14 changes: 7 additions & 7 deletions Programming_Guides/Opencl-programming-guide.rst
@@ -1469,7 +1469,7 @@ Now, these headers can be passed as embedded headers along with the program obje
::

cl_program input_headers[2] = { foo_pg, myinc_pg };
char * input_header_names[2] = { "foo.h", "mydir/myinc.h" };
char * input_header_names[2] = { foo.h”, “mydir/myinc.h };

clCompileProgram(program_A, 0, NULL, // num_devices & device_list
NULL, // compile_options
@@ -1660,7 +1660,7 @@ A sample kernel definition is shown below.

kernel void sample_kernel( global const uchar *normalPtr, global uchar *svmPtr)
{
...
}
To create a kernel object for the above kernel, you must pass the program object corresponding to the kernel to the clCreateKernel function. Assuming that the program object containing the above kernel function has been created and built as program, a kernel object for the above kernel would be created as follows:
@@ -2139,7 +2139,7 @@ OpenCL Language types.
MyFunc ()
{
tempClass = new(Test);
... // Some OpenCL startup code - create context, queue, etc.
... // Some OpenCL startup code create context, queue, etc.
cl_mem classObj = clCreateBuffer(context, CL_MEM_USE_HOST_PTR, sizeof(Test), &tempClass, event);
clEnqueueMapBuffer(...,classObj,...);
tempClass.setX(10);
@@ -2393,9 +2393,9 @@ Generic example

In OpenCL 1.2, the developer needed to write three functions for a pointer p that can reference the local, private, or global address space::
void fooL (local int *p) { ... }
void fooP (private int *p) { ... }
void fooG (global int *p) { ... }
void fooL (local int *p) { }
void fooP (private int *p) { }
void fooG (global int *p) { }


@@ -2967,7 +2967,7 @@ There are special directives for the OpenCL compiler to enable or disable availa
#pragma OPENCL EXTENSION all: <behavior>


The <extension_name> is described in Section A.1, "Extension Name
The <extension_name> is described in Section A.1, Extension Name
Convention.”. The second form allows to address all extensions at once. The <behavior> token can be either:

* **enable** - the extension is enabled if it is supported, or the error is reported if the specified extension is not supported or token “all” is used.
4 changes: 2 additions & 2 deletions ROCm_API_References/HCC-API.rst
@@ -46,7 +46,7 @@ For example:

::

`` hcchcc-config -cxxflags -ldflagsfoo.cpp -o foo ``
`` hcchcc-config cxxflags ldflagsfoo.cpp -o foo ``

HCC built-in macros
********************
@@ -143,4 +143,4 @@ HC supports capturing memory pointer by a GPU kernel.

``` // allocate GPU memory through the HSA API int* gpu_pointer; hsa_memory_allocate(..., &gpu_pointer); ... parallel_for_each(ext, [=](index i) [[hc]] { gpu_pointer[i[0]]++; }
``` For HSA APUs that supports system wide shared virtual memory, a GPU kernel can directly access system memory allocated by the host: ``` int* cpu_memory = (int*) malloc(...); ... parallel_for_each(ext, [=](index i) [[hc]] { cpu_memory[i[0]]++; }); ```
``` For HSA APUs that supports system wide shared virtual memory, a GPU kernel can directly access system memory allocated by the host: ``` int* cpu_memory = (int*) malloc(...); ... parallel_for_each(ext, [=](index i) [[hc]] { cpu_memory[i[0]]++; }); ```
18 changes: 9 additions & 9 deletions ROCm_Compiler_SDK/ROCm-Native-ISA.rst
@@ -2265,7 +2265,7 @@ DS
ds_min_rtn_f64 v[8:9], v2, v[4:5]


For full list of supported instructions, refer to "LDS/GDS instructions" in ISA Manual.
For full list of supported instructions, refer to LDS/GDS instructions in ISA Manual.

.. _FLAT:

@@ -2280,7 +2280,7 @@ FLAT
flat_atomic_fmax_x2 v[1:2], v[3:4], v[5:6] glc


For full list of supported instructions, refer to "FLAT instructions" in ISA Manual.
For full list of supported instructions, refer to FLAT instructions in ISA Manual.


.. _MUBUF:
@@ -2295,7 +2295,7 @@ MUBUF
buffer_wbinvl1
buffer_atomic_inc v1, v2, s[8:11], s4 idxen offset:4 slc

For full list of supported instructions, refer to "MUBUF Instructions" in ISA Manual.
For full list of supported instructions, refer to MUBUF Instructions in ISA Manual.

.. _SMRD/SMEM:

@@ -2309,7 +2309,7 @@ SMRD/SMEM
s_dcache_inv_vol
s_memtime s[4:5]

For full list of supported instructions, refer to "Scalar Memory Operations" in ISA Manual.
For full list of supported instructions, refer to Scalar Memory Operations in ISA Manual.

.. _SOP1:

@@ -2325,7 +2325,7 @@ SOP1
s_swappc_b64 s[2:3], s[4:5]
s_cbranch_join s[4:5]

For full list of supported instructions, refer to "SOP1 Instructions" in ISA Manual.
For full list of supported instructions, refer to SOP1 Instructions in ISA Manual.

.. _SOP2:

@@ -2343,7 +2343,7 @@ SOP2
s_bfe_i64 s[2:3], s[4:5], s6
s_cbranch_g_fork s[4:5], s[6:7]

For full list of supported instructions, refer to "SOP2 Instructions" in ISA Manual.
For full list of supported instructions, refer to SOP2 Instructions in ISA Manual.

.. _SOPC:

@@ -2356,7 +2356,7 @@ SOPC
s_bitcmp0_b64 s[2:3], s4
s_setvskip s3, s5

For full list of supported instructions, refer to "SOPC Instructions" in ISA Manual.
For full list of supported instructions, refer to SOPC Instructions in ISA Manual.

.. _SOPP:

@@ -2376,7 +2376,7 @@ SOPP
s_sendmsg sendmsg(MSG_INTERRUPT)
s_trap 1

For full list of supported instructions, refer to "SOPP Instructions" in ISA Manual.
For full list of supported instructions, refer to SOPP Instructions in ISA Manual.

Unless otherwise mentioned, little verification is performed on the operands of SOPP Instructions, so it is up to the programmer to be familiar with the range or acceptable values.

@@ -2434,7 +2434,7 @@ VOP_SDWA examples
v_fract_f32 v0, |v0| dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1
v_cmpx_le_u32 vcc, v1, v2 src0_sel:BYTE_2 src1_sel:WORD_0

For full list of supported instructions, refer to "Vector ALU instructions".
For full list of supported instructions, refer to Vector ALU instructions.


.. _Code Object V2 Predefined Symbols (-mattr=-code-object-v3):
