diff --git a/CMake/UsingCMakewithAMDROCm.rst b/CMake/UsingCMakewithAMDROCm.rst
new file mode 100644
index 00000000..565613c1
--- /dev/null
+++ b/CMake/UsingCMakewithAMDROCm.rst
@@ -0,0 +1,168 @@
+
+===========================
+Using CMake with AMD ROCm
+===========================
+
+Most components in AMD ROCm support CMake 3.5 or higher out-of-the-box and do not require any special Find modules. A Find module is often used by
+downstream projects to find files by guessing their locations with platform-specific hints. Typically, a Find module is required when the
+upstream project is not built with CMake or its package configuration files are not available.
+
+AMD ROCm provides the respective *config-file* packages, which enable ``find_package`` to be used directly. AMD ROCm does not require any Find
+modules because the *config-file* packages are shipped with the upstream projects.
+
+Finding Dependencies
+--------------------
+
+When dependencies are not found in standard locations such as */usr* or */usr/local*, the ``CMAKE_PREFIX_PATH`` variable can be set to the
+installation prefixes. It can hold multiple locations, with a semicolon separating the entries.
+
+There are two ways to set this variable:
+
+- Pass the flag when configuring with ``-DCMAKE_PREFIX_PATH=...``. This approach is preferred when users install the components in custom
+  locations.
+
+- Append the variable in the CMakeLists.txt file. This is useful if the dependencies are found in a common location. For example, when
+  the binaries provided on `repo.radeon.com `_ are installed to */opt/rocm*, you can add the following line to a CMakeLists.txt file:
+
+  ::
+
+      list(APPEND CMAKE_PREFIX_PATH /opt/rocm/hip /opt/rocm)
+
+Using HIP in CMake
+==================
+
+There are two ways to use HIP in CMake:
+
+- Use the HIP API without compiling GPU device code. As there is no GPU code, any C or C++ compiler can be used.
+  The ``find_package(hip)`` command provides the ``hip::host`` target for using HIP in this context:
+
+::
+
+    # Search for ROCm in common locations
+    list(APPEND CMAKE_PREFIX_PATH /opt/rocm/hip /opt/rocm)
+    # Find HIP
+    find_package(hip)
+    # Create the library
+    add_library(myLib ...)
+    # Link with HIP
+    target_link_libraries(myLib hip::host)
+
+.. note::
+   The ``hip::host`` target provides all the usage requirements needed to use HIP without compiling GPU device code.
+
+- Use the HIP API and compile GPU device code. This requires using a
+  device compiler. The compiler for CMake can be set using either the
+  ``CMAKE_C_COMPILER`` and ``CMAKE_CXX_COMPILER`` variables or the ``CC`` and
+  ``CXX`` environment variables. These can be set when configuring CMake or
+  put into a CMake toolchain file. The device compiler must be set to a
+  compiler that supports AMD GPU targets, which is usually Clang.
+
+The ``find_package(hip)`` command provides the ``hip::device`` target, which adds all the
+flags for device compilation:
+
+::
+
+    # Search for ROCm in common locations
+    list(APPEND CMAKE_PREFIX_PATH /opt/rocm/hip /opt/rocm)
+    # Find HIP
+    find_package(hip)
+    # Create the library
+    add_library(myLib ...)
+    # Link with HIP
+    target_link_libraries(myLib hip::device)
+
+This project can then be configured with::
+
+    cmake -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ ..
+
+This uses the device compiler provided by the binary packages from
+`repo.radeon.com `_.
+
+.. note::
+   Compiling for the GPU device requires at least C++11. This can be
+   enabled by setting ``CMAKE_CXX_STANDARD`` or by setting the correct compiler flags
+   in the CMake toolchain.
+
+The GPU device code can be built for different GPU architectures by
+setting the ``GPU_TARGETS`` variable. By default, this is set to all the
+currently supported architectures for AMD ROCm. It can be set by passing
+the flag during configuration with ``-DGPU_TARGETS=gfx900``. It can also be
+set in the CMakeLists.txt as a cached variable before calling
+``find_package(hip)``::
+
+    # Set the GPU to compile for
+    set(GPU_TARGETS "gfx900" CACHE STRING "GPU targets to compile for")
+    # Search for ROCm in common locations
+    list(APPEND CMAKE_PREFIX_PATH /opt/rocm/hip /opt/rocm)
+    # Find HIP
+    find_package(hip)
+
+Using AMD ROCm Libraries
+========================
+
+Libraries such as rocBLAS, MIOpen, and others also support CMake users.
+
+As illustrated in the example below, to use MIOpen from CMake, you can
+call ``find_package(miopen)``, which provides the ``MIOpen`` CMake target. This
+target can be linked with ``target_link_libraries``::
+
+    # Search for ROCm in common locations
+    list(APPEND CMAKE_PREFIX_PATH /opt/rocm/hip /opt/rocm)
+    # Find MIOpen
+    find_package(miopen)
+    # Create the library
+    add_library(myLib ...)
+    # Link with MIOpen
+    target_link_libraries(myLib MIOpen)
+
+.. note::
+   Most libraries are designed as host-only APIs, so a GPU device
+   compiler is not necessary for downstream projects unless they use GPU
+   device code.
+
+ROCm CMake Packages
+===================
+
+============ ============ =======================================================
+Component    Package      Targets
+============ ============ =======================================================
+HIP          hip          hip::host, hip::device
+rocPRIM      rocprim      roc::rocprim
+rocThrust    rocthrust    roc::rocthrust
+hipCUB       hipcub       hip::hipcub
+rocRAND      rocrand      roc::rocrand
+rocBLAS      rocblas      roc::rocblas
+rocSOLVER    rocsolver    roc::rocsolver
+hipBLAS      hipblas      roc::hipblas
+rocFFT       rocfft       roc::rocfft
+hipFFT       hipfft       hip::hipfft
+rocSPARSE    rocsparse    roc::rocsparse
+hipSPARSE    hipsparse    roc::hipsparse
+rocALUTION   rocalution   roc::rocalution
+RCCL         rccl         rccl
+MIOpen       miopen       MIOpen
+MIGraphX     migraphx     migraphx::migraphx, migraphx::migraphx_c, migraphx::migraphx_cpu, migraphx::migraphx_gpu, migraphx::migraphx_onnx, migraphx::migraphx_tf
+============ ============ =======================================================
+
diff --git a/Current_Release_Notes/Current-Release-Notes.rst b/Current_Release_Notes/Current-Release-Notes.rst
index a73eaf52..30a77dc1 100644
--- a/Current_Release_Notes/Current-Release-Notes.rst
+++ b/Current_Release_Notes/Current-Release-Notes.rst
@@ -1,197 +1,1144 @@
-.. image:: /Current_Release_Notes/amdblack.jpg
+.. image:: Current_Release_Notes/amdblack.jpg
 |
-=============================================================
-AMD Radeon Open Compute platforM (ROCm) Release Notes v3.3.0
-=============================================================
-April 1st, 2020
+================================
+AMD ROCm™ Release Notes v5.0
+================================
+February, 2022
+
-What Is ROCm?
-==============
-ROCm is designed to be a universal platform for gpu-accelerated computing. This modular design allows hardware vendors to build drivers that support the ROCm framework. ROCm is also designed to integrate multiple programming languages and makes it easy to add support for other languages.
+AMD ROCm v5.0 Release Notes
+============================
+
-Note: You can also clone the source code for individual ROCm components from the GitHub repositories.
+ROCm Installation Updates
+=========================
+
-ROCm Components
-~~~~~~~~~~~~~~~~
+This document describes the features, fixed issues, and information
+about downloading and installing the AMD ROCm™ software.
+
+It also covers known issues and deprecations in this release.
+
+Notice for Open-source and Closed-source ROCm Repositories in Future Releases
+-----------------------------------------------------------------------------
+
+To distinguish between open-source and closed-source components,
+all ROCm repositories will consist of sub-folders in future releases:
+
+- All open-source components will be placed in the
+  *base-url/<rocm-ver>/main* sub-folder.
+- All closed-source components will reside in the
+  *base-url/<rocm-ver>/proprietary* sub-folder.
+
+List of Supported Operating Systems
+-----------------------------------
+
+The AMD ROCm platform supports the following operating systems:
+
+=============================== ===========================
+**OS-Version (64-bit)**         **Kernel Versions**
+=============================== ===========================
+CentOS 8.3                      4.18.0-193.el8
+CentOS 7.9                      3.10.0-1127
+RHEL 8.5                        4.18.0-348.7.1.el8_5.x86_64
+RHEL 8.4                        4.18.0-305.el8.x86_64
+RHEL 7.9                        3.10.0-1160.6.1.el7
+SLES 15 SP3                     5.3.18-59.16-default
+Ubuntu 20.04.3                  5.8.0 LTS / 5.11 HWE
+Ubuntu 18.04.5 [5.4 HWE kernel] 5.4.0-71-generic
+=============================== ===========================
+
+Support for RHEL v8.5
+~~~~~~~~~~~~~~~~~~~~~
+
+This release extends support for RHEL v8.5.
+
+Supported GPUs
+~~~~~~~~~~~~~~
+
+Radeon Pro V620 and W6800 Workstation GPUs
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This release extends ROCm support for Radeon Pro V620 and W6800
+Workstation GPUs:
+
+- SR-IOV virtualization support for Radeon Pro V620
+
+- KVM Hypervisor (1VF support only) on an Ubuntu host OS with Ubuntu,
+  CentOS, and RHEL guests
+
+- Support for ROCm-SMI in an SR-IOV environment. For more details, refer
+  to the ROCm SMI API documentation.
+
+**Note:** Radeon Pro V620 is not supported on SLES.
+
+ROCm Installation Updates for ROCm v5.0
+---------------------------------------
+
+This release has the following ROCm installation enhancements.
+
+Support for Kernel Mode Driver
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The following components for the ROCm platform are released and available for the v3.3
-release:
+
+In this release, users can install the kernel-mode driver using the
+installer method. The ROCm-specific use cases that the installer
+currently supports are:
+
-• Drivers
+- OpenCL (ROCr/KFD-based) runtime
+- HIP runtimes
+- ROCm libraries and applications
+- ROCm compiler and device libraries
+- ROCr runtime and thunk
+- Kernel-mode driver
-• Tools
+
+Support for Multi-version ROCm Installation and Uninstallation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-• Libraries
+
+Users can now install multiple ROCm releases simultaneously on a system
+using the newly introduced installer script and the package manager
+installation mechanism.
-• Source Code
+
+Users can also uninstall multi-version ROCm releases using the
+*amdgpu-uninstall* script and the package manager.
-You can access the latest supported version of drivers, tools, libraries, and source code for the ROCm platform at the following location:
-https://github.com/RadeonOpenCompute/ROCm
+
+Support for Updating Information on Local Repositories
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In this release, the *amdgpu-install* script automates the process of
+updating local repository information before proceeding to ROCm
+installation.
-Supported Operating Systems
+
+Support for Release Upgrades
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The ROCm v3.3.x platform is designed to support the following operating systems:
+
+Users can now upgrade an existing ROCm installation to a specific ROCm
+release or to the latest one.
+
+For more details, refer to the AMD ROCm Installation Guide v5.0.
-* Ubuntu 16.04.6(Kernel 4.15) and 18.04.4 (Kernel 5.3)
+
+AMD ROCm v5.0 Documentation Updates
+===================================
-* CentOS v7.7 (Using devtoolset-7 runtime support)
+
+New AMD ROCm Information Portal for ROCm v4.5 and Above
+-------------------------------------------------------
-* RHEL v7.7 (Using devtoolset-7 runtime support)
+
+Beginning with ROCm v5.0, AMD ROCm documentation is available from a new portal at
+https://docs.amd.com. This portal contains ROCm documentation for v4.5 and above.
-* SLES 15 SP1
+
+For documentation prior to ROCm v4.5, you can continue to access
+http://rocmdocs.amd.com.
+
+Documentation Updates for ROCm v5.0
+-----------------------------------
-What\'s New in This Release
-===========================
+
+Deployment Tools
+~~~~~~~~~~~~~~~~
+
+ROCm Data Center Tool Documentation Updates
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- ROCm Data Center Tool User Guide
+- ROCm Data Center Tool API Guide
+
+ROCm System Management Interface Updates
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- System Management Interface Guide
+- System Management Interface API Guide
+
+ROCm Command Line Interface Updates
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-**Multi\-Version Installation**
+
+- Command Line Interface Guide
+
+Machine Learning/AI Documentation Updates
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- Deep Learning Guide
+- MIGraphX API Guide
+- MIOpen API Guide
+- MIVisionX API Guide
+
+ROCm Libraries Documentation Updates
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Users can install and access multiple versions of the ROCm toolkit simultaneously.
+
+- hipSOLVER User Guide
+- RCCL User Guide
+- rocALUTION User Guide
+- rocBLAS User Guide
+- rocFFT User Guide
+- rocRAND User Guide
+- rocSOLVER User Guide
+- rocSPARSE User Guide
+- rocThrust User Guide
+
+Compilers and Tools
+~~~~~~~~~~~~~~~~~~~
+
+ROCDebugger Documentation Updates
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- ROCDebugger User Guide
+- ROCDebugger API Guide
+
+ROCTracer
+^^^^^^^^^
+
+- ROCTracer User Guide
+- ROCTracer API Guide
+
+Compilers
+^^^^^^^^^
+
+- AMD Instinct High Performance Computing and Tuning Guide
+- AMD Compiler Reference Guide
+
+HIPify Documentation
+^^^^^^^^^^^^^^^^^^^^
+
+- HIPify User Guide
+- HIP Supported CUDA API Reference Guide
+
+ROCm Debug Agent
+^^^^^^^^^^^^^^^^
+
+- ROCm Debug Agent Guide
+- System Level Debug Guide
+- ROCm Validation Suite
+
+Programming Models Documentation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+HIP Documentation
+^^^^^^^^^^^^^^^^^
-Previously, users could install only a single version of the ROCm toolkit.
+
+- HIP Programming Guide
+- HIP API Guide
+- HIP FAQ Guide
-Now, users have the option to install multiple versions simultaneously and toggle to the desired version of the ROCm toolkit. From the v3.3 release, multiple versions of ROCm packages can be installed in the */opt/rocm-* folder.
-
-**Prerequisites**
-###############################
+
+OpenMP Documentation
+^^^^^^^^^^^^^^^^^^^^
-Ensure the existing installations of ROCm, including */opt/rocm*, are completely removed before the v3.3 ROCm toolkit installation. The ROCm v3.3 package requires a clean installation.
+
+- OpenMP Support Guide
-* To install a single instance of ROCm, use the rocm-dkms or rocm-dev packages to install all the required components. This creates a symbolic link */opt/rocm* pointing to the corresponding version of ROCm installed on the system.
+
+ROCm Glossary
+~~~~~~~~~~~~~
-* To install individual ROCm components, create the */opt/rocm* symbolic link pointing to the version of ROCm installed on the system. For example, *# ln -s /opt/rocm-3.3.0 /opt/rocm*
+
+- ROCm Glossary - Terms and Definitions
-* To install multiple instance ROCm packages, create */opt/rocm* symbolic link pointing to the version of ROCm installed/used on the system. For example, *# ln -s /opt/rocm-3.3.0 /opt/rocm*
+
+AMD ROCm Legacy Documentation Links for ROCm v4.3 and Prior
+-----------------------------------------------------------
-**Note**: The Kernel Fusion Driver (KFD) must be compatible with all versions of the ROCm software installed on the system.
+
+- For AMD ROCm documentation, see
+  https://rocmdocs.amd.com/en/latest/
-Before You Begin
-#################
+
+- For installation instructions on supported platforms, see
+  https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html
-Review the following important notes:
-**Single Version Installation**
+
+- For the AMD ROCm binary structure, see
+  https://rocmdocs.amd.com/en/latest/Installation_Guide/Software-Stack-for-AMD-GPU.html
-To install a single instance of the ROCm package, access the non-versioned packages. You must not install any components from the multi-instance set.
-For example,
+
+- For the AMD ROCm release history, see
+  https://rocmdocs.amd.com/en/latest/Current_Release_Notes/ROCm-Version-History.html
-* rocm-dkms
-* rocm-dev
+
+What's New in This Release
+==========================
-* hip
+
+HIP Enhancements
+----------------
-A fresh installation or an upgrade of the single-version installation will remove the existing version completely and install the new version in the */opt/rocm-* folder.
+
+The ROCm v5.0 release includes the following HIP enhancements.
-.. image:: /Current_Release_Notes/singleinstance.png
+
+HIP Installation Guide Updates
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-**Multi Version Installation**
+
+The HIP Installation Guide is updated to include building HIP from
+source on the NVIDIA platform.
-* To install a multi-instance of the ROCm package, access the versioned packages and components.
+
+Refer to the HIP Installation Guide v5.0 for more details.
-For example,
+
+Managed Memory Allocation
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Managed memory, including the ``__managed__`` keyword, is now supported
+in HIP combined host/device compilation. Through unified memory
+allocation, managed memory allows data to be shared and accessed by
+both the CPU and the GPU using a single pointer. The allocation is managed
+by the AMD GPU driver using the Linux Heterogeneous Memory Management
+(HMM) mechanism. Users can call the managed memory API ``hipMallocManaged``
+to allocate a large chunk of HMM memory, execute kernels on a device,
+and fetch data between the host and device as needed.
+
+**Note:** In a HIP application, it is recommended to perform a capability
+check before calling the managed memory APIs. For example:
+
+::
+
+    // Query whether the device supports managed memory
+    int managed_memory = 0;
+    HIPCHECK(hipDeviceGetAttribute(&managed_memory,
+                                   hipDeviceAttributeManagedMemory, p_gpuDevice));
+
+    if (!managed_memory) {
+        printf("info: managed memory access not supported on the device %d\n Skipped\n", p_gpuDevice);
+    } else {
+        // Managed memory is supported: select the device and allocate
+        HIPCHECK(hipSetDevice(p_gpuDevice));
+        HIPCHECK(hipMallocManaged(&Hmm, N * sizeof(T)));
+        . . .
+    }
+
+**Note:** The managed memory capability check may not be necessary;
+however, if HMM is not supported, managed malloc will fall back to using
+system memory. Other managed memory API calls will, then, have
+
+Refer to the HIP API documentation for more details on managed memory
+APIs.
+
+For a sample application, see
+https://github.com/ROCm-Developer-Tools/HIP/blob/rocm-4.5.x/tests/src/runtimeApi/memory/hipMallocManaged.cpp
+
+New Environment Variable
+------------------------
+
+The following new environment variable is added in this release:
- * rocm-dkms3.3.0
+
+===================== ===================== ========================================
+Environment Variable  Value                 Description
+===================== ===================== ========================================
+``HSA_COOP_CU_COUNT`` 0 or 1 (default is 0) Some processors support more CUs than can reliably be used in a cooperative dispatch. Setting ``HSA_COOP_CU_COUNT`` to 1 causes ROCr to return the correct CU count for cooperative groups through the ``HSA_AMD_AGENT_INFO_COOPERATIVE_COMPUTE_UNIT_COUNT`` attribute of ``hsa_agent_get_info()``. Setting ``HSA_COOP_CU_COUNT`` to any other value, or leaving it unset, causes ROCr to return the same CU count for both ``HSA_AMD_AGENT_INFO_COOPERATIVE_COMPUTE_UNIT_COUNT`` and ``HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT``. Future ROCm releases will make ``HSA_COOP_CU_COUNT=1`` the default.
+===================== ===================== ========================================
- * rocm-dev3.3.0
+
+ROCm Math and Communication Libraries
+-------------------------------------
- * hip3.3.0
+
+.. image:: Lib1.png
+.. image:: lib2.png
+.. image:: lib3.png
+.. image:: lib4.png
+.. image:: lib5.png
+.. image:: lib6.png
-* The new multi-instance package enables you to install two versions of the ROCm toolkit simultaneously and provides the ability to toggle between the two versioned packages.
-* The ROCm-DEV package does not create symlinks
-* Users must create symlinks if required
+
+System Management Interface
+---------------------------
-* Multi-version installation with previous ROCm versions is not supported
+
+Clock Throttling for GPU Events
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-* Kernel Fusion Driver (KFD) must be compatible with all versions of ROCm installations
+
+This feature lists GPU events as they occur in real time and can be used
+with *kfdtest* to produce *vm_fault* events for testing.
-.. image:: /Current_Release_Notes/MultiIns.png
+
+The command can be called with either ``-e`` or ``--showevents``, as follows:
-**IMPORTANT**: A single instance ROCm package cannot co-exist with the multi-instance package.
+
+::
-**NOTE**: The multi-instance installation applies only to ROCm v3.3 and above. This package requires a fresh installation after the complete removal of existing ROCm packages. The multi-version installation is not backward compatible.
+
+    -e [EVENT [EVENT ...]], --showevents [EVENT [EVENT ...]]  Show event list
-**GPU Process Information**
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Here, ``EVENT`` is any combination of ``VM_FAULT``, ``THERMAL_THROTTLE``, and
+``GPU_RESET``. Event names are not case-sensitive.
-A new functionality to display process information for GPUs is available in this release. For example, you can view the process details to determine if the GPU(s) must be reset.
+
+**Note:** If no event arguments are passed, all events are watched
+by default.
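The event-selection rules described above (a fixed set of event names, matched case-insensitively, with no arguments meaning "watch everything") can be illustrated with a small model. This is a minimal, hypothetical C++ sketch for illustration only: `resolve_events` and `kKnownEvents` are invented names, it is not how rocm-smi implements `--showevents`, and rejecting unknown names is an assumption of the sketch rather than documented behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of the documented --showevents argument rules.
// NOT the rocm-smi implementation; all names here are illustrative.
const std::set<std::string> kKnownEvents = {"VM_FAULT", "THERMAL_THROTTLE", "GPU_RESET"};

// Upper-cases a copy of the input so that matching is case-insensitive.
std::string to_upper(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s;
}

// Returns the set of events to watch, or std::nullopt if any argument is not a
// known event name (rejecting unknown names is an assumption of this sketch).
std::optional<std::set<std::string>> resolve_events(const std::vector<std::string>& args) {
    if (args.empty()) {
        return kKnownEvents;  // no event arguments: watch all events by default
    }
    std::set<std::string> selected;
    for (const std::string& arg : args) {
        const std::string name = to_upper(arg);
        if (kKnownEvents.count(name) == 0) {
            return std::nullopt;  // unknown event name
        }
        selected.insert(name);  // duplicates collapse because this is a set
    }
    assert(!selected.empty());  // at least one valid argument was processed
    return selected;
}
```

Under these assumptions, `resolve_events({})` yields all three events, while `resolve_events({"vm_fault", "Gpu_Reset"})` yields `VM_FAULT` and `GPU_RESET`.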
-To display the GPU process details, you can:
+
+CLI Commands
+^^^^^^^^^^^^
-* Invoke the API
+
+::
-or
-* Use the Command Line Interface (CLI)
+
+    ./rocm-smi --showevents vm_fault thermal_throttle gpu_reset
-For more details about the API and the command instructions, see
-https://github.com/RadeonOpenCompute/rocm_smi_lib/blob/master/docs/ROCm_SMI_Manual.pdf
+
+    =========== ROCm System Management Interface ======================
+    ========================== Show Events ============================
-**Support for 3D Pooling Layers**
+
+    press 'q' or 'ctrl + c' to quit
+
+    DEVICE TIME TYPE DESCRIPTION
+
+    ========================= End of ROCm SMI Log =====================
+
+    *run kfdtest in another window to test for vm_fault events
+
+**Note:** Unlike other rocm-smi CLI commands, this command does not quit
+until the user stops it. Users can press either ``q`` or ``Ctrl+C`` to quit.
+
+Display XGMI Bandwidth Between Nodes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-AMD ROCm is enhanced to include support for 3D pooling layers. The implementation of 3D pooling layers now allows users to run 3D convolutional networks, such as ResNext3D, on AMD Radeon Instinct GPUs.
+
+The ``rsmi_minmax_bandwidth_get`` API reads the hardware topology file and
+reports the minimum and maximum bandwidth between any two NUMA nodes, which
+the CLI displays in matrix format.
+
+The Command Line Interface (CLI) command can be called as follows:
-**ONNX Enhancements**
-~~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
-Open Neural Network eXchange (ONNX) is a widely-used neural net exchange format. The AMD model compiler & optimizer support the pre-trained models in ONNX, NNEF, & Caffe formats. Currently, ONNX versions 1.3 and below are supported.
-The AMD Neural Net Intermediate Representation (NNIR) is enhanced to handle the rapidly changing ONNX versions and its layers.
+
+    ./rocm-smi --shownodesbw
-.. image:: /Current_Release_Notes/onnx.png
+
+    CLI --shownodesbw
+    usage- We show maximum theoretical xgmi bandwidth between 2 numa nodes
-Deprecations in the v3.3 Release
-================================
+
+    sample output-
+
+    ================= ROCm System Management Interface ================
+    ================= Bandwidth ===================================
+            GPU0          GPU1          GPU2          GPU3          GPU4          GPU5          GPU6          GPU7
+    GPU0    N/A           50000-200000  50000-50000   0-0           0-0           0-0           50000-100000  0-0
+    GPU1    50000-200000  N/A           0-0           50000-50000   0-0           50000-50000   0-0           0-0
+    GPU2    50000-50000   0-0           N/A           50000-200000  50000-100000  0-0           0-0           0-0
+    GPU3    0-0           50000-50000   50000-200000  N/A           0-0           0-0           0-0           50000-50000
+    GPU4    0-0           0-0           50000-100000  0-0           N/A           50000-200000  50000-50000   0-0
+    GPU5    0-0           50000-50000   0-0           0-0           50000-200000  N/A           0-0           50000-50000
+    GPU6    50000-100000  0-0           0-0           0-0           50000-50000   0-0           N/A           50000-200000
+    GPU7    0-0           0-0           0-0           50000-50000   0-0           50000-50000   50000-200000  N/A
+    Format: min-max; Units: mps
+
+**Note:** A min-max bandwidth of "0-0" indicates that the devices are not
+directly connected.
+
+P2P Connection Status
+~~~~~~~~~~~~~~~~~~~~~
+
+The ``rsmi_is_p2p_accessible`` API returns "True" if P2P can be
+implemented between two nodes, and "False" if it cannot.
+
+The Command Line Interface command can be called as follows::
+
+    ./rocm-smi --showtopoaccess
+
+Sample output::
+
+    ====================== ROCm System Management Interface =======================
+    ==================== Link accessibility between two GPUs ======================
+            GPU0    GPU1
+    GPU0    True    True
+    GPU1    True    True
+    ============================= End of ROCm SMI Log ============================
+
+Breaking Changes
+================
+
+Runtime Breaking Change
+-----------------------
+
+The ``hipDeviceAttribute_t`` enumerated type in ``hip_runtime_api.h`` is reordered to better match NVIDIA CUDA. The differences are listed below.
+
+ROCm software is affected if it uses any of the enumerators listed below. Applications built with the ROCm v5.0 enumerated types work with a ROCm v4.5.2 driver. However, a ROCm v4.5.2 application that uses these enumerated types with the ROCm v5.0 runtime results in undefined behavior.
+
+::
+
+    typedef enum hipDeviceAttribute_t {
+    - hipDeviceAttributeMaxThreadsPerBlock, ///< Maximum number of threads per block.
+    - hipDeviceAttributeMaxBlockDimX, ///< Maximum x-dimension of a block.
+    - hipDeviceAttributeMaxBlockDimY, ///< Maximum y-dimension of a block.
+    - hipDeviceAttributeMaxBlockDimZ, ///< Maximum z-dimension of a block.
+    - hipDeviceAttributeMaxGridDimX, ///< Maximum x-dimension of a grid.
+    - hipDeviceAttributeMaxGridDimY, ///< Maximum y-dimension of a grid.
+    - hipDeviceAttributeMaxGridDimZ, ///< Maximum z-dimension of a grid.
+    - hipDeviceAttributeMaxSharedMemoryPerBlock, ///< Maximum shared memory available per block in
+    - ///< bytes.
+    - hipDeviceAttributeTotalConstantMemory, ///< Constant memory size in bytes.
+    - hipDeviceAttributeWarpSize, ///< Warp size in threads.
+    - hipDeviceAttributeMaxRegistersPerBlock, ///< Maximum number of 32-bit registers available to a
+    - ///< thread block. This number is shared by all thread
+    - ///< blocks simultaneously resident on a
+    - ///< multiprocessor.
+    - hipDeviceAttributeClockRate, ///< Peak clock frequency in kilohertz.
+    - hipDeviceAttributeMemoryClockRate, ///< Peak memory clock frequency in kilohertz.
+    - hipDeviceAttributeMemoryBusWidth, ///< Global memory bus width in bits.
+    - hipDeviceAttributeMultiprocessorCount, ///< Number of multiprocessors on the device.
+    - hipDeviceAttributeComputeMode, ///< Compute mode that device is currently in.
+    - hipDeviceAttributeL2CacheSize, ///< Size of L2 cache in bytes. 0 if the device doesn't have L2
+    - ///< cache.
+    - hipDeviceAttributeMaxThreadsPerMultiProcessor, ///< Maximum resident threads per
+    - ///< multiprocessor.
+    - hipDeviceAttributeComputeCapabilityMajor, ///< Major compute capability version number.
+    - hipDeviceAttributeComputeCapabilityMinor, ///< Minor compute capability version number.
+    - hipDeviceAttributeConcurrentKernels, ///< Device can possibly execute multiple kernels
+    - ///< concurrently.
+    - hipDeviceAttributePciBusId, ///< PCI Bus ID.
+    - hipDeviceAttributePciDeviceId, ///< PCI Device ID.
+    - hipDeviceAttributeMaxSharedMemoryPerMultiprocessor, ///< Maximum Shared Memory Per
+    - ///< Multiprocessor.
+    - hipDeviceAttributeIsMultiGpuBoard, ///< Multiple GPU devices.
+    - hipDeviceAttributeIntegrated, ///< iGPU
+    - hipDeviceAttributeCooperativeLaunch, ///< Support cooperative launch
+    - hipDeviceAttributeCooperativeMultiDeviceLaunch, ///< Support cooperative launch on multiple devices
+    - hipDeviceAttributeMaxTexture1DWidth, ///< Maximum number of elements in 1D images
+    - hipDeviceAttributeMaxTexture2DWidth, ///< Maximum dimension width of 2D images in image elements
+    - hipDeviceAttributeMaxTexture2DHeight, ///< Maximum dimension height of 2D images in image elements
+    - hipDeviceAttributeMaxTexture3DWidth, ///< Maximum dimension width of 3D images in image elements
+    - hipDeviceAttributeMaxTexture3DHeight, ///< Maximum dimensions height of 3D images in image elements
+    - hipDeviceAttributeMaxTexture3DDepth, ///< Maximum dimensions depth of 3D images in image elements
+    + hipDeviceAttributeCudaCompatibleBegin = 0,
+    - hipDeviceAttributeHdpMemFlushCntl, ///< Address of the HDP_MEM_COHERENCY_FLUSH_CNTL register
+    - hipDeviceAttributeHdpRegFlushCntl, ///< Address of the HDP_REG_COHERENCY_FLUSH_CNTL register
+    + hipDeviceAttributeEccEnabled = hipDeviceAttributeCudaCompatibleBegin, ///< Whether ECC support is enabled.
+    + hipDeviceAttributeAccessPolicyMaxWindowSize, ///< Cuda only. The maximum size of the window policy in bytes.
+    + hipDeviceAttributeAsyncEngineCount, ///< Cuda only. Asynchronous engines number.
+    + hipDeviceAttributeCanMapHostMemory, ///< Whether host memory can be mapped into device address space
+    + hipDeviceAttributeCanUseHostPointerForRegisteredMem,///< Cuda only. Device can access host registered memory
+    + ///< at the same virtual address as the CPU
+    + hipDeviceAttributeClockRate, ///< Peak clock frequency in kilohertz.
+    + hipDeviceAttributeComputeMode, ///< Compute mode that device is currently in.
+    + hipDeviceAttributeComputePreemptionSupported, ///< Cuda only. Device supports Compute Preemption.
+    + hipDeviceAttributeConcurrentKernels, ///< Device can possibly execute multiple kernels concurrently.
+    + hipDeviceAttributeConcurrentManagedAccess, ///< Device can coherently access managed memory concurrently with the CPU
+    + hipDeviceAttributeCooperativeLaunch, ///< Support cooperative launch
+    + hipDeviceAttributeCooperativeMultiDeviceLaunch, ///< Support cooperative launch on multiple devices
+    + hipDeviceAttributeDeviceOverlap, ///< Cuda only. Device can concurrently copy memory and execute a kernel.
+    + ///< Deprecated. Use instead asyncEngineCount.
+    + hipDeviceAttributeDirectManagedMemAccessFromHost, ///< Host can directly access managed memory on
+    + ///< the device without migration
+    + hipDeviceAttributeGlobalL1CacheSupported, ///< Cuda only. Device supports caching globals in L1
+    + hipDeviceAttributeHostNativeAtomicSupported, ///< Cuda only. Link between the device and the host supports native atomic operations
+    + hipDeviceAttributeIntegrated, ///< Device is integrated GPU
+    + hipDeviceAttributeIsMultiGpuBoard, ///< Multiple GPU devices.
+    + hipDeviceAttributeKernelExecTimeout, ///< Run time limit for kernels executed on the device
+    + hipDeviceAttributeL2CacheSize, ///< Size of L2 cache in bytes. 0 if the device doesn't have L2 cache.
+ + + hipDeviceAttributeLocalL1CacheSupported, ///\< Device supports caching locals in L1 + + + hipDeviceAttributeLuid, ///\< Cuda only. 8-byte locally unique identifier. Undefined on TCC and non-Windows platforms + + + hipDeviceAttributeLuidDeviceNodeMask, ///\< Cuda only. Luid device node mask. Undefined on TCC and non-Windows platforms + + + hipDeviceAttributeComputeCapabilityMajor, ///\< Major compute capability version number. + + + hipDeviceAttributeManagedMemory, ///\< Device supports allocating managed memory on this system + + + hipDeviceAttributeMaxBlocksPerMultiProcessor, ///\< Cuda only. Max number of blocks per multiprocessor + + + hipDeviceAttributeMaxBlockDimX, ///\< Max block size in width. + + + hipDeviceAttributeMaxBlockDimY, ///\< Max block size in height. + + + hipDeviceAttributeMaxBlockDimZ, ///\< Max block size in depth. + + + hipDeviceAttributeMaxGridDimX, ///\< Max grid size in width. + + + hipDeviceAttributeMaxGridDimY, ///\< Max grid size in height. + + + hipDeviceAttributeMaxGridDimZ, ///\< Max grid size in depth. + + + hipDeviceAttributeMaxSurface1D, ///\< Maximum size of 1D surface. + + + hipDeviceAttributeMaxSurface1DLayered, ///\< Cuda only. Maximum dimensions of 1D layered surface. + + + hipDeviceAttributeMaxSurface2D, ///\< Maximum dimension (width, height) of 2D surface. + + + hipDeviceAttributeMaxSurface2DLayered, ///\< Cuda only. Maximum dimensions of 2D layered surface. + + + hipDeviceAttributeMaxSurface3D, ///\< Maximum dimension (width, height, depth) of 3D surface. + + + hipDeviceAttributeMaxSurfaceCubemap, ///\< Cuda only. Maximum dimensions of Cubemap surface. + + + hipDeviceAttributeMaxSurfaceCubemapLayered, ///\< Cuda only. Maximum dimensions of Cubemap layered surface. + + + hipDeviceAttributeMaxTexture1DWidth, ///\< Maximum size of 1D texture. + + + hipDeviceAttributeMaxTexture1DLayered, ///\< Cuda only. Maximum dimensions of 1D layered texture.
+ + + hipDeviceAttributeMaxTexture1DLinear, ///\< Maximum number of elements allocatable in a 1D linear texture. + + + ///\< Use cudaDeviceGetTexture1DLinearMaxWidth() instead on Cuda. + + + hipDeviceAttributeMaxTexture1DMipmap, ///\< Cuda only. Maximum size of 1D mipmapped texture. + + + hipDeviceAttributeMaxTexture2DWidth, ///\< Maximum dimension width of 2D texture. + + + hipDeviceAttributeMaxTexture2DHeight, ///\< Maximum dimension height of 2D texture. + + + hipDeviceAttributeMaxTexture2DGather, ///\< Cuda only. Maximum dimensions of 2D texture if gather operations are performed. + + + hipDeviceAttributeMaxTexture2DLayered, ///\< Cuda only. Maximum dimensions of 2D layered texture. + + + hipDeviceAttributeMaxTexture2DLinear, ///\< Cuda only. Maximum dimensions (width, height, pitch) of 2D textures bound to pitched memory. + + + hipDeviceAttributeMaxTexture2DMipmap, ///\< Cuda only. Maximum dimensions of 2D mipmapped texture. + + + hipDeviceAttributeMaxTexture3DWidth, ///\< Maximum dimension width of 3D texture. + + + hipDeviceAttributeMaxTexture3DHeight, ///\< Maximum dimension height of 3D texture. + + + hipDeviceAttributeMaxTexture3DDepth, ///\< Maximum dimension depth of 3D texture. + + + hipDeviceAttributeMaxTexture3DAlt, ///\< Cuda only. Maximum dimensions of alternate 3D texture. + + + hipDeviceAttributeMaxTextureCubemap, ///\< Cuda only. Maximum dimensions of Cubemap texture. + + + hipDeviceAttributeMaxTextureCubemapLayered, ///\< Cuda only. Maximum dimensions of Cubemap layered texture. + + + hipDeviceAttributeMaxThreadsDim, ///\< Maximum dimensions of a block + + + hipDeviceAttributeMaxThreadsPerBlock, ///\< Maximum number of threads per block. + + + hipDeviceAttributeMaxThreadsPerMultiProcessor, ///\< Maximum resident threads per multiprocessor. + + + hipDeviceAttributeMaxPitch, ///\< Maximum pitch in bytes allowed by memory copies + + + hipDeviceAttributeMemoryBusWidth, ///\< Global memory bus width in bits.
+ + + hipDeviceAttributeMemoryClockRate, ///\< Peak memory clock frequency in kilohertz. + + + hipDeviceAttributeComputeCapabilityMinor, ///\< Minor compute capability version number. + + + hipDeviceAttributeMultiGpuBoardGroupID, ///\< Cuda only. Unique ID of device group on the same multi-GPU board + + + hipDeviceAttributeMultiprocessorCount, ///\< Number of multiprocessors on the device. + + + hipDeviceAttributeName, ///\< Device name. + + + hipDeviceAttributePageableMemoryAccess, ///\< Device supports coherently accessing pageable memory + + + ///\< without calling hipHostRegister on it + + + hipDeviceAttributePageableMemoryAccessUsesHostPageTables, ///\< Device accesses pageable memory via the host's page tables + + + hipDeviceAttributePciBusId, ///\< PCI Bus ID. + + + hipDeviceAttributePciDeviceId, ///\< PCI Device ID. + + + hipDeviceAttributePciDomainID, ///\< PCI Domain ID. + + + hipDeviceAttributePersistingL2CacheMaxSize, ///\< Cuda11 only. Maximum L2 persisting lines capacity in bytes + + + hipDeviceAttributeMaxRegistersPerBlock, ///\< 32-bit registers available to a thread block. This number is shared + + + ///\< by all thread blocks simultaneously resident on a multiprocessor. + + + hipDeviceAttributeMaxRegistersPerMultiprocessor, ///\< 32-bit registers available per multiprocessor. + + + hipDeviceAttributeReservedSharedMemPerBlock, ///\< Cuda11 only. Shared memory reserved by the CUDA driver per block. + + + hipDeviceAttributeMaxSharedMemoryPerBlock, ///\< Maximum shared memory available per block in bytes. + + + hipDeviceAttributeSharedMemPerBlockOptin, ///\< Cuda only. Maximum shared memory per block usable by special opt-in. + + + hipDeviceAttributeSharedMemPerMultiprocessor, ///\< Cuda only. Shared memory available per multiprocessor. + + + hipDeviceAttributeSingleToDoublePrecisionPerfRatio, ///\< Cuda only. Performance ratio of single precision to double precision. + + + hipDeviceAttributeStreamPrioritiesSupported, ///\< Cuda only.
Whether stream priorities are supported. + + + hipDeviceAttributeSurfaceAlignment, ///\< Cuda only. Alignment requirement for surfaces + + + hipDeviceAttributeTccDriver, ///\< Cuda only. Whether device is a Tesla device using TCC driver + + + hipDeviceAttributeTextureAlignment, ///\< Alignment requirement for textures + + + hipDeviceAttributeTexturePitchAlignment, ///\< Pitch alignment requirement for 2D texture references bound to pitched memory. + + + hipDeviceAttributeTotalConstantMemory, ///\< Constant memory size in bytes. + + + hipDeviceAttributeTotalGlobalMem, ///\< Global memory available on device. + + + hipDeviceAttributeUnifiedAddressing, ///\< Cuda only. A unified address space shared with the host. + + + hipDeviceAttributeUuid, ///\< Cuda only. 16-byte unique ID. + + + hipDeviceAttributeWarpSize, ///\< Warp size in threads. + + - hipDeviceAttributeMaxPitch, ///\< Maximum pitch in bytes allowed by memory copies + + - hipDeviceAttributeTextureAlignment, ///\<Alignment requirement for textures + + - hipDeviceAttributeTexturePitchAlignment, ///\<Pitch alignment requirement for 2D texture references bound to pitched memory; + + - hipDeviceAttributeKernelExecTimeout, ///\<Run time limit for kernels executed on the device + + - hipDeviceAttributeCanMapHostMemory, ///\<Device can map host memory into device address space + + - hipDeviceAttributeEccEnabled, ///\<Device has ECC support enabled + + + hipDeviceAttributeCudaCompatibleEnd = 9999, + + + hipDeviceAttributeAmdSpecificBegin = 10000, + + - hipDeviceAttributeCooperativeMultiDeviceUnmatchedFunc, ///\< Supports cooperative launch on multiple + + - ///devices with unmatched functions + + - hipDeviceAttributeCooperativeMultiDeviceUnmatchedGridDim, ///\< Supports cooperative launch on multiple + + - ///devices with unmatched grid dimensions + + - hipDeviceAttributeCooperativeMultiDeviceUnmatchedBlockDim, ///\< Supports cooperative launch on multiple + + - ///devices with unmatched block dimensions + + - 
hipDeviceAttributeCooperativeMultiDeviceUnmatchedSharedMem, ///\< Supports cooperative launch on multiple + + - ///devices with unmatched shared memories + + - hipDeviceAttributeAsicRevision, ///\< Revision of the GPU in this device + + - hipDeviceAttributeManagedMemory, ///\< Device supports allocating managed memory on this system + + - hipDeviceAttributeDirectManagedMemAccessFromHost, ///\< Host can directly access managed memory on + + - /// the device without migration + + - hipDeviceAttributeConcurrentManagedAccess, ///\< Device can coherently access managed memory + + - /// concurrently with the CPU + + - hipDeviceAttributePageableMemoryAccess, ///\< Device supports coherently accessing pageable memory + + - /// without calling hipHostRegister on it + + - hipDeviceAttributePageableMemoryAccessUsesHostPageTables, ///\< Device accesses pageable memory via + + - /// the host's page tables + + - hipDeviceAttributeCanUseStreamWaitValue ///\< '1' if Device supports hipStreamWaitValue32() and + + - ///\< hipStreamWaitValue64() , '0' otherwise. + + + hipDeviceAttributeClockInstructionRate = hipDeviceAttributeAmdSpecificBegin, ///\< Frequency in kHz of the timer used by the device-side "clock\*" + + + hipDeviceAttributeArch, ///\< Device architecture + + + hipDeviceAttributeMaxSharedMemoryPerMultiprocessor, ///\< Maximum shared memory per multiprocessor.
+ + + hipDeviceAttributeGcnArch, ///\< Device GCN architecture + + + hipDeviceAttributeGcnArchName, ///\< Device gcnArch name in 256 bytes + + + hipDeviceAttributeHdpMemFlushCntl, ///\< Address of the HDP\_MEM\_COHERENCY\_FLUSH\_CNTL register + + + hipDeviceAttributeHdpRegFlushCntl, ///\< Address of the HDP\_REG\_COHERENCY\_FLUSH\_CNTL register + + + hipDeviceAttributeCooperativeMultiDeviceUnmatchedFunc, ///\< Supports cooperative launch on multiple + + + ///\< devices with unmatched functions + + + hipDeviceAttributeCooperativeMultiDeviceUnmatchedGridDim, ///\< Supports cooperative launch on multiple + + + ///\< devices with unmatched grid dimensions + + + hipDeviceAttributeCooperativeMultiDeviceUnmatchedBlockDim, ///\< Supports cooperative launch on multiple + + + ///\< devices with unmatched block dimensions + + + hipDeviceAttributeCooperativeMultiDeviceUnmatchedSharedMem, ///\< Supports cooperative launch on multiple + + + ///\< devices with unmatched shared memories + + + hipDeviceAttributeIsLargeBar, ///\< Whether the device has a large BAR + + + hipDeviceAttributeAsicRevision, ///\< Revision of the GPU in this device + + + hipDeviceAttributeCanUseStreamWaitValue, ///\< '1' if Device supports hipStreamWaitValue32() and + + + ///\< hipStreamWaitValue64(), '0' otherwise. + + + hipDeviceAttributeAmdSpecificEnd = 19999, + + + hipDeviceAttributeVendorSpecificBegin = 20000, + + + // Extended attributes for vendors + + } hipDeviceAttribute\_t; + + enum hipComputeMode { +Known Issues in This Release +============================ + +Incorrect dGPU Behavior When Using AMDVBFlash Tool +-------------------------------------------------- + +The AMDVBFlash tool, used for flashing the VBIOS image to the dGPU, does +not communicate with the ROM controller when the driver is present. This +is because the driver, as part of its runtime power management feature, +puts the dGPU into a sleep state.
+ +As a workaround, users can pass *amdgpu.runpm=0* to the amdgpu driver, which +temporarily disables the runtime power management feature of the driver and +dynamically changes some power control-related sysfs files. + +Issue with START Timestamp in ROCProfiler +----------------------------------------- + +Users may encounter an issue when timestamps are enabled while monitoring +one or more counters. ROCProfiler outputs the following four timestamps +for each kernel: + +- Dispatch +- Start +- End +- Complete + +**Issue** + +This defect is related to the Start timestamp functionality, which +incorrectly shows an earlier time than the Dispatch timestamp. + +To reproduce the issue: + +1. Enable timing using the *--timestamp on* flag. +2. Use the *-i* option with the input filename that contains the name of + the counter(s) to monitor. +3. Run the program. +4. Check the output result file. + +**Current behavior** + +BeginNS is lower than DispatchNS, which is incorrect. + +**Expected behavior** + +The correct order is: + +*Dispatch < Start < End < Complete* + +Because of the incorrect Start timestamp, users cannot use ROCProfiler to +measure the time spent on each kernel when counter collection is enabled. + +**Recommended Workaround** + +Users should collect kernel execution timestamps without monitoring +counters, as follows: + +1. Enable timing using the *--timestamp on* flag, and run the + application. +2. Rerun the application using the *-i* option with the input filename + that contains the name of the counter(s) to monitor, and save this to + a different output file using the *-o* flag. +3. Check the output result file from step 1. +4. The order of timestamps correctly displays as: + +   *DispatchNS < BeginNS < EndNS < CompleteNS* + +5. Users can find the values of the collected counters in the output + file generated in step 2. + +.. 
_radeon-pro-v620-and-w6800-workstation-gpus-1: + +Radeon Pro V620 and W6800 Workstation GPUs +------------------------------------------ + +No Support for SMI and ROCDebugger on SRIOV +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +System Management Interface (SMI) and ROCDebugger are not supported in +the SRIOV environment on any GPU. For more information, refer to the +System Management Interface documentation. + +Deprecations and Warnings in This Release +========================================= + +ROCm Libraries Changes Deprecations and Deprecation Removal +------------------------------------------------------------- + +- The hipFFT.h header is now provided only by the hipFFT package. Before + ROCm 5.0, hipFFT.h was also provided by the rocFFT package. +- The GlobalPairwiseAMG class has been removed entirely; users should use + the PairwiseAMG class instead. +- The rocsparse_spmm signature in 5.0 was changed to match that of + rocsparse_spmm_ex. In 5.0, rocsparse_spmm_ex is still present, but + deprecated.
The rocsparse_spmm signature change is shown in the following listings. + +*rocsparse_spmm in 5.0* +~~~~~~~~~~~~~~~~~~~~~~~ + +:: + +    rocsparse_status rocsparse_spmm(rocsparse_handle handle, +        rocsparse_operation trans_A, +        rocsparse_operation trans_B, +        const void* alpha, +        const rocsparse_spmat_descr mat_A, +        const rocsparse_dnmat_descr mat_B, +        const void* beta, +        const rocsparse_dnmat_descr mat_C, +        rocsparse_datatype compute_type, +        rocsparse_spmm_alg alg, +        rocsparse_spmm_stage stage, +        size_t* buffer_size, +        void* temp_buffer); + +*rocsparse_spmm in 4.0* +~~~~~~~~~~~~~~~~~~~~~~~ + +:: + +    rocsparse_status rocsparse_spmm(rocsparse_handle handle, +        rocsparse_operation trans_A, +        rocsparse_operation trans_B, +        const void* alpha, +        const rocsparse_spmat_descr mat_A, +        const rocsparse_dnmat_descr mat_B, +        const void* beta, +        const rocsparse_dnmat_descr mat_C, +        rocsparse_datatype compute_type, +        rocsparse_spmm_alg alg, +        size_t* buffer_size, +        void* temp_buffer); + +HIP API Deprecations and Warnings +--------------------------------- + +Warning - Arithmetic Operators of HIP Complex and Vector Types +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In this release, arithmetic operators of HIP complex and vector types +are deprecated. + +- As an alternative to the arithmetic operators of HIP complex types, users + can use the arithmetic operators of ``std::complex`` types. +- As an alternative to the arithmetic operators of HIP vector types, users + can use the operators of the native Clang vector type associated with + the data member of HIP vector types. + +During the deprecation period, two macros, ``_HIP_ENABLE_COMPLEX_OPERATORS`` +and ``_HIP_ENABLE_VECTOR_OPERATORS``, are provided to allow users to +conditionally enable arithmetic operators of HIP complex or vector +types. 
+ +Note that the two macros are mutually exclusive and are *Off* by +default.
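The following is a minimal, host-only sketch of the ``std::complex`` alternative described above. It uses only the C++ standard library and does not include any HIP header; the ``axpy`` helper name is illustrative and is not part of any ROCm API.

```cpp
#include <complex>

// Computes a * x + y with the std::complex arithmetic operators, which is the
// suggested replacement for the deprecated HIP complex-type operators.
// The helper name "axpy" is illustrative only.
std::complex<float> axpy(std::complex<float> a, std::complex<float> x,
                         std::complex<float> y) {
    return a * x + y;  // operator* and operator+ are provided by <complex>
}
```

For example, ``axpy({1.0f, 2.0f}, {3.0f, -1.0f}, {0.5f, 0.5f})`` evaluates to (5.5, 5.5), because (1 + 2i)(3 - i) = 5 + 5i.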
+ +The arithmetic operators of HIP complex and vector types will be removed +in a future release. + +Refer to the HIP API Guide for more information. -Code Object Manager (Comgr) Functions -################################## +Refactor of HIPCC/HIPCONFIG +~~~~~~~~~~~~~~~~~~~~~~~~~~~ -The following Code Object Manager (Comgr) functions are deprecated. +In prior ROCm releases, by default, the hipcc/hipconfig Perl scripts +were used to identify and set target compiler options, target platform, +compiler, and runtime appropriately. -* `amd_comgr_action_info_set_options` -* `amd_comgr_action_info_get_options` +In ROCm v5.0, hipcc.bin and hipconfig.bin have been added as the +compiled binary implementations of hipcc and hipconfig. These new +binaries are currently a work in progress and are marked as +experimental. ROCm plans to fully transition to hipcc.bin and +hipconfig.bin in a future ROCm release. The existing hipcc and +hipconfig Perl scripts are renamed to hipcc.pl and hipconfig.pl, +respectively. New top-level hipcc and hipconfig Perl scripts are +created, which can switch between the Perl scripts and the compiled +binaries based on the environment variable HIPCC_USE_PERL_SCRIPT. -These functions were originally deprecated in version 1.3 of the Comgr library as they no longer supported options with embedded spaces. +In ROCm 5.0, by default, this environment variable is set to use hipcc +and hipconfig through the Perl scripts. -The deprecated functions are now replaced with the array-oriented options API, which include +The Perl scripts will no longer be available in a future ROCm +release.
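To make the switching behavior concrete, here is a hypothetical C++ model of the choice the top-level wrapper makes. It is a sketch under stated assumptions, not the actual wrapper source: it assumes that an unset ``HIPCC_USE_PERL_SCRIPT`` selects the Perl script (the documented 5.0 default) and that the value is interpreted with Perl truthiness, so an empty string or ``0`` selects the compiled binary. The function names are illustrative.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical model of the ROCm 5.0 top-level hipcc wrapper's choice between
// the Perl implementation (hipcc.pl) and the compiled one (hipcc.bin).
// Assumptions: an unset variable means "use Perl" (the documented default),
// and the value follows Perl truthiness, where "" and "0" are false.
std::string select_hipcc_backend(const char* use_perl_env) {
    if (use_perl_env == nullptr) {
        return "hipcc.pl";  // variable not set: default to the Perl script
    }
    std::string value(use_perl_env);
    bool use_perl = !(value.empty() || value == "0");
    return use_perl ? "hipcc.pl" : "hipcc.bin";
}

// Reads HIPCC_USE_PERL_SCRIPT from the current process environment.
std::string select_hipcc_backend_from_env() {
    return select_hipcc_backend(std::getenv("HIPCC_USE_PERL_SCRIPT"));
}
```

Under these assumptions, trying the experimental binaries would amount to setting ``HIPCC_USE_PERL_SCRIPT=0`` in the environment before invoking hipcc.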
-* `amd_comgr_action_info_set_option_list` -* `amd_comgr_action_info_get_option_list_count` -* `amd_comgr_action_info_get_option_list_item` +Warning - Compiler-Generated Code Object Version 4 Deprecation +-------------------------------------------------------------- +Support for loading compiler-generated code object version 4 will be +deprecated in a future release with no release announcement and replaced +with code object 5 as the default version. -Hardware and Software Support Information -========================================== +The current default is code object version 4. -AMD ROCm is focused on using AMD GPUs to accelerate computational tasks such as machine learning, engineering workloads, and scientific computing. In order to focus our development efforts on these domains of interest, ROCm supports a targeted set of hardware configurations. +Warning - MIOpenTensile Deprecation +----------------------------------- -For more information, see +MIOpenTensile will be deprecated in a future release. -https://github.com/RadeonOpenCompute/ROCm -DISCLAIMER -=========== -The information contained herein is for informational purposes only and is subject to change without notice. While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. 
Terms and limitations applicable to the purchase or use of AMD’s products are as set forth in a signed agreement between the parties or in AMD’s Standard Terms and Conditions of Sale. S -AMD, the AMD Arrow logo, Radeon, Ryzen, Epyc, and combinations thereof are trademarks of Advanced Micro Devices, Inc. -Google® is a registered trademark of Google LLC. -PCIe® is a registered trademark of PCI-SIG Corporation. -Linux is the registered trademark of Linus Torvalds in the U.S. and other countries. -Ubuntu and the Ubuntu logo are registered trademarks of Canonical Ltd. -Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies. +Disclaimer +========== +The information presented in this document is for informational purposes +only and may contain technical inaccuracies, omissions, and +typographical errors. The information contained herein is subject to +change and may be rendered inaccurate for many reasons, including but +not limited to product and roadmap changes, component and motherboard +version changes, new model and/or product releases, product differences +between differing manufacturers, software changes, BIOS flashes, +firmware upgrades, or the like. Any computer system has risks of +security vulnerabilities that cannot be completely prevented or +mitigated. AMD assumes no obligation to update or otherwise correct or +revise this information. However, AMD reserves the right to revise this +information and to make changes from time to time to the content hereof +without obligation of AMD to notify any person of such revisions or +changes. THIS INFORMATION IS PROVIDED “AS IS.” AMD MAKES NO +REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND +ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS, OR OMISSIONS +THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY +IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR +ANY PARTICULAR PURPOSE.
IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR +ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES +ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS +EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. AMD, the AMD Arrow +logo, and combinations thereof are trademarks of Advanced Micro Devices, +Inc. Other product names used in this publication are for identification +purposes only and may be trademarks of their respective companies. +© 2021 Advanced Micro Devices, Inc. All rights reserved. + +Third-party Disclaimer +---------------------- + +Third-party content is licensed to you directly by the third party that +owns the content and is not licensed to you by AMD. ALL LINKED +THIRD-PARTY CONTENT IS PROVIDED “AS IS” WITHOUT A WARRANTY OF ANY KIND. +USE OF SUCH THIRD-PARTY CONTENT IS DONE AT YOUR SOLE DISCRETION AND +UNDER NO CIRCUMSTANCES WILL AMD BE LIABLE TO YOU FOR ANY THIRD-PARTY +CONTENT. YOU ASSUME ALL RISK AND ARE SOLELY RESPONSIBLE FOR ANY DAMAGES +THAT MAY ARISE FROM YOUR USE OF THIRD-PARTY CONTENT.
diff --git a/Current_Release_Notes/Lib1.png b/Current_Release_Notes/Lib1.png new file mode 100644 index 00000000..c3bc73c2 Binary files /dev/null and b/Current_Release_Notes/Lib1.png differ diff --git a/Current_Release_Notes/MultiIns.png b/Current_Release_Notes/MultiIns.png deleted file mode 100644 index 27ba797d..00000000 Binary files a/Current_Release_Notes/MultiIns.png and /dev/null differ diff --git a/Current_Release_Notes/Versionchange1.png b/Current_Release_Notes/Versionchange1.png deleted file mode 100644 index 00a740e9..00000000 Binary files a/Current_Release_Notes/Versionchange1.png and /dev/null differ diff --git a/Current_Release_Notes/images/CG1.PNG b/Current_Release_Notes/images/CG1.PNG new file mode 100644 index 00000000..7b6f6a7f Binary files /dev/null and b/Current_Release_Notes/images/CG1.PNG differ diff --git a/Current_Release_Notes/images/CG2.PNG b/Current_Release_Notes/images/CG2.PNG new file mode 100644 index 00000000..0307743f Binary files /dev/null and b/Current_Release_Notes/images/CG2.PNG differ diff --git a/Current_Release_Notes/images/CG3.PNG b/Current_Release_Notes/images/CG3.PNG new file mode 100644 index 00000000..f68f1701 Binary files /dev/null and b/Current_Release_Notes/images/CG3.PNG differ diff --git a/Current_Release_Notes/images/CLI1.PNG b/Current_Release_Notes/images/CLI1.PNG new file mode 100644 index 00000000..f770c1c4 Binary files /dev/null and b/Current_Release_Notes/images/CLI1.PNG differ diff --git a/Current_Release_Notes/images/CLI2.PNG b/Current_Release_Notes/images/CLI2.PNG new file mode 100644 index 00000000..7f4571ad Binary files /dev/null and b/Current_Release_Notes/images/CLI2.PNG differ diff --git a/Current_Release_Notes/images/SMI.PNG b/Current_Release_Notes/images/SMI.PNG new file mode 100644 index 00000000..d996fad5 Binary files /dev/null and b/Current_Release_Notes/images/SMI.PNG differ diff --git a/Current_Release_Notes/images/keyfeatures.PNG b/Current_Release_Notes/images/keyfeatures.PNG new file mode 100644 
index 00000000..0e8e423a Binary files /dev/null and b/Current_Release_Notes/images/keyfeatures.PNG differ diff --git a/Current_Release_Notes/images/latestGPU.PNG b/Current_Release_Notes/images/latestGPU.PNG new file mode 100644 index 00000000..db6934e0 Binary files /dev/null and b/Current_Release_Notes/images/latestGPU.PNG differ diff --git a/Current_Release_Notes/images/rocsolverAPI.PNG b/Current_Release_Notes/images/rocsolverAPI.PNG new file mode 100644 index 00000000..43b62058 Binary files /dev/null and b/Current_Release_Notes/images/rocsolverAPI.PNG differ diff --git a/Current_Release_Notes/lib2.png b/Current_Release_Notes/lib2.png new file mode 100644 index 00000000..b50000be Binary files /dev/null and b/Current_Release_Notes/lib2.png differ diff --git a/Current_Release_Notes/lib3.png b/Current_Release_Notes/lib3.png new file mode 100644 index 00000000..f5aff407 Binary files /dev/null and b/Current_Release_Notes/lib3.png differ diff --git a/Current_Release_Notes/lib4.png b/Current_Release_Notes/lib4.png new file mode 100644 index 00000000..adf16251 Binary files /dev/null and b/Current_Release_Notes/lib4.png differ diff --git a/Current_Release_Notes/lib5.png b/Current_Release_Notes/lib5.png new file mode 100644 index 00000000..43f1838c Binary files /dev/null and b/Current_Release_Notes/lib5.png differ diff --git a/Current_Release_Notes/lib6.png b/Current_Release_Notes/lib6.png new file mode 100644 index 00000000..47f0efa2 Binary files /dev/null and b/Current_Release_Notes/lib6.png differ diff --git a/Current_Release_Notes/onnx.png b/Current_Release_Notes/onnx.png deleted file mode 100644 index 032cfa79..00000000 Binary files a/Current_Release_Notes/onnx.png and /dev/null differ diff --git a/Current_Release_Notes/singleinstance.png b/Current_Release_Notes/singleinstance.png deleted file mode 100644 index a37861d4..00000000 Binary files a/Current_Release_Notes/singleinstance.png and /dev/null differ diff --git a/HIP Documentation/HIP Programming Guide.rst b/HIP 
Documentation/HIP Programming Guide.rst new file mode 100644 index 00000000..7563f7c7 --- /dev/null +++ b/HIP Documentation/HIP Programming Guide.rst @@ -0,0 +1,23 @@ +========================= +HIP Programming Guide +========================= + +What is this repository for? + +HIP allows developers to convert CUDA code to portable C++. The same source code can be compiled to run on NVIDIA or AMD GPUs. + +Key features +============== + +HIP is very thin and has little or no performance impact over coding directly in CUDA or hcc “HC” mode. + +HIP allows coding in a single-source C++ programming language including features such as templates, C++11 lambdas, classes, namespaces, +and more. + +HIP allows developers to use the “best” development environment and tools on each target platform. + +The “hipify” tool automatically converts source from CUDA to HIP. + +Developers can specialize for the platform (CUDA or hcc) to tune for performance or handle tricky cases. + +New projects can be developed directly in the portable HIP C++ language and can run on either NVIDIA or AMD platforms. Additionally, HIP provides porting tools which make it easy to port existing CUDA code to the HIP layer, with no loss of performance as compared to the original CUDA application. HIP is not intended to be a drop-in replacement for CUDA, and developers should expect to do some manual coding and performance tuning work to complete the port.
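As a sketch of platform specialization, portable HIP code commonly branches on platform macros at compile time. The example assumes the macros ``__HIP_PLATFORM_AMD__`` and ``__HIP_PLATFORM_NVIDIA__`` (older releases used ``__HIP_PLATFORM_HCC__`` and ``__HIP_PLATFORM_NVCC__``), which are defined when code is built through hipcc and the HIP headers; the function name is illustrative. With an ordinary C++ compiler, none of these macros is defined, so the fallback branch is taken.

```cpp
#include <string>

// Compile-time platform specialization in portable code.
// Assumption: the HIP toolchain defines one of the platform macros below;
// a plain C++ compiler defines none of them.
std::string target_platform() {
#if defined(__HIP_PLATFORM_AMD__) || defined(__HIP_PLATFORM_HCC__)
    return "amd";      // AMD-specific tuning would go in this branch
#elif defined(__HIP_PLATFORM_NVIDIA__) || defined(__HIP_PLATFORM_NVCC__)
    return "nvidia";   // CUDA-path specialization would go in this branch
#else
    return "unknown";  // not compiled through the HIP toolchain
#endif
}
```

Keeping the platform-specific paths behind a single function like this limits the number of ``#if`` blocks scattered through the rest of the port.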
diff --git a/Installation_Guide/Images/PackName.png b/Installation_Guide/Images/PackName.png new file mode 100644 index 00000000..f111a2a9 Binary files /dev/null and b/Installation_Guide/Images/PackName.png differ diff --git a/Installation_Guide/Images/ROCmProgMod.png b/Installation_Guide/Images/ROCmProgMod.png new file mode 100644 index 00000000..0ed1acbc Binary files /dev/null and b/Installation_Guide/Images/ROCmProgMod.png differ diff --git a/Installation_Guide/Images/SuppEnv.png b/Installation_Guide/Images/SuppEnv.png new file mode 100644 index 00000000..65376745 Binary files /dev/null and b/Installation_Guide/Images/SuppEnv.png differ diff --git a/Installation_Guide/MESA-Multimedia_Installation.rst b/Installation_Guide/MESA-Multimedia_Installation.rst new file mode 100644 index 00000000..5ee39735 --- /dev/null +++ b/Installation_Guide/MESA-Multimedia_Installation.rst @@ -0,0 +1,442 @@ +.. image:: amdblack.jpg + + +=============================== +Mesa Multimedia Installation +=============================== + +Prerequisites +-------------- + +- Ensure you have ROCm installed on the system. + +For ROCm installation instructions, see + +https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html + + +Mesa Multimedia Installation +------------------------------- + +System Prerequisites +####################### + +The following operating systems are supported for Mesa Multimedia: + +- Ubuntu 18.04.3 + +- Ubuntu 20.04, including dual kernel + + + + +Installation Prerequisites +############################ + + +1. Select the desired repository package to download the amdgpu graphics stack packages based on your required Ubuntu version and branch of code. 
+ + +---------------------------------------+--------------------------------------+ + | Ubuntu 18.04 | Ubuntu 20.04 | + +=======================================+======================================+ + | amd-nonfree-mainline_18.04-1_all.deb | amd-nonfree-mainline_20.04-1_all.deb | + +---------------------------------------+--------------------------------------+ + | amd-nonfree-VERSION_18.04-1_all.deb | amd-nonfree-VERSION_20.04-1_all.deb | + +---------------------------------------+--------------------------------------+ + | amd-nonfree-staging_18.04-1_all.deb | amd-nonfree-staging_20.04-1_all.deb | + +---------------------------------------+--------------------------------------+ + + + +.. note:: + + For installing release drivers, VERSION must be replaced with a driver version. For example, 19.40, 19.50, 20.10, and others. + + +2. If installed, ensure the *amd-nonfree-mainline* package is uninstalled. Use the following instruction to uninstall: + + :: + + sudo dpkg --purge amd-nonfree-mainline + + +.. note:: + + If the *amd-nonfree-mainline* package is installed and available on the system, the following error displays: + + :: + + taccuser@mlseqa-hyd-virt-srv-07:~/4.0-mesa$ amdgpu-install --no-dkms + Reading package lists... Done + Building dependency tree + Reading state information... Done + E: Unable to locate package amdgpu-pin + Reading package lists... Done + Building dependency tree + Reading state information... Done + E: Unable to locate package amdgpu-pro-pin + ERROR: Unable to install pin package. + + + + + 3. Use the following instructions to download and install the selected package: + + :: + + MIRROR=artifactory-cdn.amd.com/artifactory/list/amdgpu-deb + + REPO_PKG=amd-nonfree-mainline_18.04-1_all.deb + + cd /tmp + + wget http://${MIRROR}/${REPO_PKG} + + sudo dpkg -i ${REPO_PKG} + + + +Installation Instructions +########################### + +1. 
Use the following installation instructions to install Mesa Multimedia: + +:: + + sudo apt update + + sudo amdgpu-install -y --no-dkms + + + + +2. ``gstreamer`` Installation + +:: + + sudo apt-get -y install libgstreamer1.0-0 gstreamer1.0-plugins-base gstreamer1.0-plugins-good gstreamer1.0-plugins-ugly gstreamer1.0-plugins-bad gstreamer1.0-vaapi gstreamer1.0-libav gstreamer1.0-tools + + sudo apt-get -y install gst-omx-listcomponents gstreamer1.0-omx-bellagio-config gstreamer1.0-omx-generic gstreamer1.0-omx-generic-config + + + + +3. Utilities Installation + +:: + + sudo apt-get -y install mediainfo ffmpeg + + sudo reboot + + # Check amdgpu loading status after reboot + + dmesg | grep -i initialized + + Sep 24 13:00:42 jz-tester kernel: [ 277.120055] [drm] VCN decode and encode initialized successfully. + + Sep 24 13:00:42 jz-tester kernel: [ 277.121654] [drm] Initialized amdgpu 3.34.0 20150101 for 0000:03:00.0 on minor 1 + + + + +4. Configure Runtime Environment Variables + +:: + + export BELLAGIO_SEARCH_PATH=/opt/amdgpu/lib/x86_64-linux-gnu/libomxil-bellagio0:/opt/amdgpu/lib/libomxil-bellagio0 + + export GST_PLUGIN_PATH=/opt/amdgpu/lib/x86_64-linux-gnu/gstreamer-1.0/ + + export GST_VAAPI_ALL_DRIVERS=1 + + export OMX_RENDER_NODE=/dev/dri/renderD128 + + + + + + +Check Installation +-------------------- + +1. Ensure you perform an installation check.
+ +The following instructions must be run with **sudo**: + + +:: + + omxregister-bellagio -v + + Scanning directory /opt/amdgpu/lib/libomxil-bellagio0/ + + Scanning library /opt/amdgpu/lib/libomxil-bellagio0/libomx_mesa.so + + Component OMX.mesa.video_decoder registered with 0 quality levels + + Specific role OMX.mesa.video_decoder.mpeg2 registered + + Specific role OMX.mesa.video_decoder.avc registered + + Specific role OMX.mesa.video_decoder.hevc registered + + Component OMX.mesa.video_encoder registered with 0 quality levels + + Specific role OMX.mesa.video_encoder.avc registered + + + 2 OpenMAX IL ST static components in 1 library successfully scanned + + +:: + + gst-inspect-1.0 omx + + +Plugin Details + + +---------------------------------------+--------------------------------------+ + | Name | OMX | + +---------------------------------------+--------------------------------------+ + | Description | GStreamer OpenMAX Plug-ins | + +---------------------------------------+--------------------------------------+ + | Filename | /usr/lib/x86_64-linux-gnu/ | + | | gstreamer-1.0/libgstomx.so | + +---------------------------------------+--------------------------------------+ + | Version | 1.12.4 | + +---------------------------------------+--------------------------------------+ + | License | LGPL | + +---------------------------------------+--------------------------------------+ + | Source module | gst-omx | + +---------------------------------------+--------------------------------------+ + | Source release date | 2017-12-07 | + +---------------------------------------+--------------------------------------+ + | Binary package | GStreamer OpenMAX Plug-ins source | + | | release | + +---------------------------------------+--------------------------------------+ + | Origin URL | Unknown package origin | + +---------------------------------------+--------------------------------------+ + + + omxmpeg2dec: OpenMAX MPEG2 Video Decoder + + omxh264dec: OpenMAX H.264 
Video Decoder + + omxh264enc: OpenMAX H.264 Video Encoder + + :: + + 3 features: + +-- 3 elements + + +:: + + gst-inspect-1.0 vaapi + + + Plugin Details + + +---------------------------------------+--------------------------------------+ + | Name | vaapi | + +---------------------------------------+--------------------------------------+ + | Description | VA-API based elements | + +---------------------------------------+--------------------------------------+ + | Filename | /usr/lib/x86_64-linux-gnu/ | + | | gstreamer-1.0/libgstvaapi.so | + +---------------------------------------+--------------------------------------+ + | Version | 1.14.5 | + +---------------------------------------+--------------------------------------+ + | License | LGPL | + +---------------------------------------+--------------------------------------+ + | Source module | gstreamer-vaapi | + +---------------------------------------+--------------------------------------+ + | Source release date | 2019-05-29 | + +---------------------------------------+--------------------------------------+ + | Binary package | gstreamer-vaapi | + | | | + +---------------------------------------+--------------------------------------+ + | Origin URL | http://bugzilla.gnome.org | + | | /enter_bug.cgi?product=GStreamer | + +---------------------------------------+--------------------------------------+ + + + +:: + + vaapijpegdec: VA-API JPEG decoder + vaapimpeg2dec: VA-API MPEG2 decoder + vaapih264dec: VA-API H264 decoder + vaapivc1dec: VA-API VC1 decoder + vaapivp9dec: VA-API VP9 decoder + vaapih265dec: VA-API H265 decoder + vaapipostproc: VA-API video postprocessing + vaapidecodebin: VA-API Decode Bin + vaapisink: VA-API sink + vaapih265enc: VA-API H265 encoder + vaapih264enc: VA-API H264 encoder + + 11 features: + +-- 11 elements + + + + + +Verification Test +------------------- + +Run the verification test with *sudo*.
For example, + +:: + + sudo gst-launch-1.0 -f filesrc location=./mpeg2/1080p/hdwatermellon_1_5.mpg ! mpegpsdemux ! mpegvideoparse ! vaapimpeg2dec ! filesink location=t.yuv + + +.. note:: + + If the verification test is not run with *sudo*, you may encounter the following error: + +:: + + (gst-plugin-scanner:8781): GLib-GObject-WARNING **: 20:04:40.048: cannot register existing type 'GstOMXVideoDec' + (gst-plugin-scanner:8781): GLib-CRITICAL **: 20:04:40.048: g_once_init_leave: assertion 'result != 0' failed + (gst-plugin-scanner:8781): GLib-GObject-CRITICAL **: 20:04:40.048: g_type_register_static: assertion 'parent_type > 0' failed + (gst-plugin-scanner:8781): GLib-CRITICAL **: 20:04:40.048: g_once_init_leave: assertion 'result != 0' failed + + +MPEG2 Decode +************** +:: + + gst-launch-1.0 -f filesrc location=./mpeg2/1080p/hdwatermellon_1_5.mpg ! mpegpsdemux ! mpegvideoparse ! omxmpeg2dec ! filesink location=t.yuv + + gst-launch-1.0 -f filesrc location=./mpeg2/1080p/hdwatermellon_1_5.mpg ! mpegpsdemux ! mpegvideoparse ! vaapimpeg2dec ! filesink location=t.yuv + + ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD129 -i ./mpeg2/1080p/hdwatermellon_1_5.mpg -bf 0 -c:v rawvideo -pix_fmt yuv420p t.yuv + + +AVC/H264 Decode +**************** +:: + + gst-launch-1.0 filesrc location=./1080p_H264.mp4 ! qtdemux name=demux demux.video_0 ! queue ! h264parse ! omxh264dec ! filesink location=t.yuv + + gst-launch-1.0 filesrc location=./1080p_H264.mp4 ! qtdemux name=demux demux.video_0 ! queue ! h264parse ! vaapih264dec ! filesink location=t.yuv + + ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD129 -i ./1080p_H264.mp4 -bf 0 -c:v rawvideo -pix_fmt yuv420p t.yuv + + gst-launch-1.0 filesrc location=./h264/4k/4K-CHIMEI-INN-60MBPS.MP4 ! qtdemux name=demux demux.video_0 ! queue ! h264parse ! vaapih264dec ! 
filesink location=t.yuv + + ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD129 -i ./h264/4k/4K-CHIMEI-INN-60MBPS.MP4 -bf 0 -c:v rawvideo -pix_fmt yuv420p t.yuv + + +AVC/H264 Encode +**************** +:: + + gst-launch-1.0 -f videotestsrc num-buffers=100 ! omxh264enc ! filesink location=t.h264 + + gst-launch-1.0 -f videotestsrc num-buffers=100 ! vaapih264enc ! filesink location=t.h264 + + ffmpeg -vaapi_device /dev/dri/renderD129 -s 1920x1080 -pix_fmt yuv420p -i t.yuv -vf 'format=nv12|vaapi,hwupload' -c:v h264_vaapi out.mp4 + + +VC1 Decode +********** +:: + + gst-launch-1.0 -v filesrc location=./vc1/1080p/1080P_ElephantsDream.wmv ! asfdemux ! vaapivc1dec ! filesink location=t.yuv + + ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD129 -i ./vc1/1080p/1080P_ElephantsDream.wmv -bf 0 -c:v rawvideo -pix_fmt yuv420p t.yuv + + +HEVC/H265 Decode +***************** +:: + + gst-launch-1.0 filesrc location=./h265/Guardians_of_the_galaxy_trailer_720p.mp4 ! qtdemux name=demux demux.video_0 ! queue ! h265parse ! vaapih265dec ! filesink location=t.yuv + + ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD129 -i ./h265/Guardians_of_the_galaxy_trailer_720p.mp4 -bf 0 -c:v rawvideo -pix_fmt yuv420p t.yuv + + # 10-bit + ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD129 -i ./Perfume_1080p_h265_10bit.mp4 -vcodec rawvideo -pixel_format yuv420p ./t.yuv + + +HEVC/H265 Encode +****************** +:: + + gst-launch-1.0 -f videotestsrc num-buffers=100 ! vaapih265enc ! filesink location=t.h265 + + ffmpeg -vaapi_device /dev/dri/renderD129 -s 1920x1080 -pix_fmt yuv420p -i t.yuv -vf 'format=nv12|vaapi,hwupload' -c:v hevc_vaapi out.mp4 + + +VP9 Decode +****************** +:: + + gst-launch-1.0 filesrc location=./VP9/'Grubby Grubby vs Neytpoh Pt.1 Warcraft 3 ORC vs NE Twisted -1.webm' ! matroskademux ! vaapivp9dec !
filesink location=t.yuv + + ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD129 -i ./VP9/'Grubby Grubby vs Neytpoh Pt.1 Warcraft 3 ORC vs NE Twisted -1.webm' -bf 0 -c:v rawvideo -pix_fmt yuv420p t.yuv + + # 10-bit + ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD129 -i ./crowd_run_4096X2176_fr30_bd10_4buf_l5.webm -vcodec rawvideo -pixel_format yuv420p ./t.yuv + + +MJPEG Decode +****************** +:: + + gst-launch-1.0 filesrc location=./MJPEG/004_motion_720p60-420-lq.avi ! jpegparse ! vaapijpegdec ! filesink location=t.yuv + + ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD129 -i ./MJPEG/004_motion_720p60-420-lq.avi -bf 0 -c:v rawvideo -pix_fmt yuv420p t.yuv + + +Transcode +****************** +:: + + gst-launch-1.0 -f filesrc location=./h264/1080p/Inception.mp4 ! qtdemux ! vaapih264dec ! vaapih265enc ! filesink location=t.h265 + + gst-launch-1.0 -f filesrc location=./h265/shaun_white_480p.mp4 ! qtdemux ! vaapih265dec ! vaapih264enc ! filesink location=t.h264 + + ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -hwaccel_device /dev/dri/renderD129 -i mpeg2/1080p/hdwatermellon_1_5.mpg -bf 0 -c:v h264_vaapi ~/output.mp4 + + ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -hwaccel_device /dev/dri/renderD129 -i mpeg2/1080p/hdwatermellon_1_5.mpg -bf 0 -c:v hevc_vaapi ~/output.mp4 + + +Notes +========= + +1. Systems with AMD Instinct™ accelerators have no X server running, so decoded frames are dumped into a YUV (NV12) format file instead of being displayed. You can check the file offline with a YUV player. + +2. ``mediainfo`` can help you determine the original clip's format and resolution. For example, ``mediainfo ./MJPEG/004_motion_720p60-420-lq.avi``. + +3.
``ffplay`` (part of FFmpeg) can be used to play the raw YUV file. For example, ``ffplay -framerate 30 -f rawvideo -video_size 1920x1080 -pixel_format nv12 t.yuv``. + +4. For VAAPI decode, the decoded frame size is aligned to a multiple of 16. For example, a 1920x1080 clip decodes to 1920x1088 frames, so use 1920x1088 as the video size when playing the output. + +5. A quick test script is provided in the attachment; you must also download ``mm_test_arct.instr``. The test clip is located at: + + http://lnx-jfrog/artifactory/linux-ci-generic-local/mesa/1080p_H264.mp4 + +6. vooya, a raw video sequence player: https://www.offminor.de/ + +7. Use the command below to list the available amdgpu render nodes. Run it as root, because it reads from debugfs (``/sys/kernel/debug``): + +:: + + for i in $(ls /dev/dri/renderD* | xargs -l basename | cut -c8-);do [[ "$(grep "amdgpu" /sys/kernel/debug/dri/$i/name)" == "" ]] && continue;echo "AMD_RENDER_NODE=/dev/dri/renderD$i";done + diff --git a/Installation_Guide/PackName.png b/Installation_Guide/PackName.png new file mode 100644 index 00000000..f111a2a9 Binary files /dev/null and b/Installation_Guide/PackName.png differ diff --git a/Installation_Guide/ROCm Installation v4.5.rst b/Installation_Guide/ROCm Installation v4.5.rst new file mode 100644 index 00000000..b87b0b40 --- /dev/null +++ b/Installation_Guide/ROCm Installation v4.5.rst @@ -0,0 +1,258 @@ + +.. image:: /Installation_Guide/amdblack.jpg +| +============================================== +AMD ROCm Installation Guide v4.5 +============================================== + + +.. contents:: + + + +Overview of ROCm Installation Methods +-------------------------------------- + +In addition to the installation method using the native Package Manager, AMD ROCm v4.5 introduces new methods to install ROCm. With this release, the ROCm installation uses the amdgpu-install and amdgpu-uninstall scripts.
+ +The amdgpu-install script streamlines the installation process by: + +- Abstracting the distribution-specific package installation logic + +- Performing the repository set-up + +- Allowing a user to specify the use case and automating the installation of all the required packages + +- Performing post-install checks to verify whether the installation was completed successfully + +- Installing the uninstallation script + +The amdgpu-uninstall script allows the removal of the entire ROCm stack by using a single command. + +Some of the ROCm-specific use cases that the installer currently supports are: + + +- OpenCL (ROCr/KFD based) runtime + +- HIP runtimes + +- ROCm libraries and applications + +- ROCm Compiler and device libraries + +- ROCr runtime and thunk + + +For more information, refer to the Installation Methods section in this guide. + + + +About This Document +==================== + +This document is intended for users familiar with Linux environments and discusses the installation and uninstallation of ROCm programming models on various Linux distributions. + + +This document also refers to Radeon™ Software for Linux® as the AMDGPU stack, including the kernel-mode driver amdgpu-dkms. + + +The guide provides the installation instructions for the following: + + +- ROCm Installation + +- Heterogeneous-Computing Interface for Portability (HIP) SDK + +- OpenCL™ SDK + +- Kernel Mode Driver + + + +System Requirements +====================== + + +The system requirements for the ROCm v4.5 installation are as follows: + +.. image:: SuppEnv.png + :alt: Screenshot + + + +.. note:: + + Installing ROCm on Linux requires superuser privileges. On systems where sudo is enabled, use the sudo prefix for all required commands. + + +Prerequisite Actions +--------------------- + + +Before installing ROCm programming models, perform the following steps to check whether the system meets all of the requirements for the installation.
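The prerequisite checks in this section can also be scripted. As a minimal sketch in Bash, assuming the distribution provides ``/etc/os-release``, the hypothetical helpers below (``check_arch``, ``distro_id``, and ``has_amdgpu_display`` are not part of ROCm) test for an ``x86_64`` architecture, since 32-bit systems are unsupported, print the distribution ID and version, and look for a display device bound to the ``amdgpu`` driver in ``lshw -class display`` output.

```shell
#!/usr/bin/env bash
# Sketch of the prerequisite checks; all function names are hypothetical.

# Succeeds if the architecture string (default: output of uname -m) is x86_64.
check_arch() {
    [ "${1:-$(uname -m)}" = "x86_64" ]
}

# Prints "<ID> <VERSION_ID>" from an os-release file (default /etc/os-release).
# The file is sourced in a subshell so its variables do not leak into the caller.
distro_id() {
    local file="${1:-/etc/os-release}"
    ( . "$file" && printf '%s %s\n' "$ID" "$VERSION_ID" )
}

# Reads `lshw -class display` output on stdin and succeeds if a display
# device reports driver=amdgpu on its configuration line.
has_amdgpu_display() {
    grep -q 'configuration:.*driver=amdgpu'
}
```

For example: ``check_arch && distro_id && sudo lshw -class display | has_amdgpu_display && echo "prerequisites look OK"``. These helpers do not encode the supported distribution list or GPU models, so you still need to compare the printed version and the ``product`` field against the System Requirements and the supported-GPU list.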
+ +- Confirm the system has a supported Linux distribution version + +- Confirm the system has a ROCm-capable GPU + +- Confirm the system has standard compilers and tools installed + + + +Confirm You Have a Supported Linux Distribution Version +========================================================= + + +The ROCm installation is supported only on specific Linux distributions and their kernel versions. + +.. note:: + + The ROCm installation is not supported on 32-bit operating systems. + + +How to Check Linux Distribution and Kernel Versions on Your System +******************************************************************* + + +Linux Distribution Information +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Obtain the distribution information of your system by running the following command from the Command Line Interface (CLI): + +:: + + $ uname -m && cat /etc/*release + + + +For example, running the command above on an Ubuntu system produces output similar to the following: + +:: + + x86_64 + DISTRIB_ID=Ubuntu + DISTRIB_RELEASE=18.04 + DISTRIB_CODENAME=bionic + DISTRIB_DESCRIPTION="Ubuntu 18.04.5 LTS" + + + + +Kernel Information +^^^^^^^^^^^^^^^^^^^ + +Type the following command to check the kernel version of your Linux system: + +:: + + $ uname -srmv + + + +The output of the command above lists the kernel version in the following format: + +:: + + Linux 5.4.0-77-generic #86~18.04.5-Ubuntu SMP Fri Jun 18 01:23:22 UTC 2021 x86_64 + + + +OS and Kernel Version Match +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Confirm that the obtained Linux distribution and kernel versions match the System Requirements. + + +Confirm You Have a ROCm-Capable GPU +===================================== + +The ROCm platform is designed to support the following list of GPUs: + + + ..
image:: ROCmProgMod.png + :alt: Screenshot + + +How to Verify Your System Has a ROCm-Capable GPU +************************************************** + +To verify that your system has a ROCm-capable GPU, enter the following command from the Command Line Interface (CLI): + +:: + + $ lshw -class display + +The command displays the details of the detected GPUs on the system in the following format: + +:: + + *-display + description: VGA compatible controller + product: Vega 20 + vendor: Advanced Micro Devices, Inc. [AMD/ATI] + physical id: 0 + bus info: pci@0000:43:00. + version: c1 + width: 64 bits + clock: 33MHz + capabilities: vga_controller bus_master cap_list rom + configuration: driver=amdgpu latency=0 + resources: irq:66 memory:80000000-8fffffff memory:90000000-901fffff ioport:2000(size=256) memory:9f600000-9f67ffff memory:c0000-dffff + + + +.. note:: + + Verify from the output that the product field value matches a supported GPU architecture in the table above. + + +Confirm the System Has Compiler and Tools Installed +====================================================== + +On RHEL/CentOS 7.9, you must install and configure Devtoolset-7. + + +How to Install and Configure Devtoolset-7 +******************************************* + +Refer to the RHEL/CentOS Environment section for more information on the steps necessary for installing and setting up Devtoolset-7. + + +Meta-packages in ROCm Programming Models +------------------------------------------ + +This section provides information about the required meta-packages for the following AMD ROCm™ programming models: + +- Heterogeneous-Computing Interface for Portability (HIP) + +- OpenCL™ + + +ROCm Package Naming Conventions +================================ + +A meta-package is a grouping of related packages and dependencies used to support a specific use case, for example, running HIP applications. All meta-packages exist in both versioned and non-versioned forms.
+ +- Non-versioned packages – For a single installation of the latest version of ROCm + +- Versioned packages – For multiple installations of ROCm + + + .. image:: PackName.png + :alt: Screenshot + + +The image above demonstrates the single and multi-version ROCm packages' naming structure, including examples for various Linux distributions. + + +Components of ROCm Programming Models +======================================== + +The following image demonstrates the high-level layered architecture of ROCm programming models and their meta-packages. All meta-packages are a combination of required packages and libraries. For example, + +- rocm-hip-runtime is used to deploy on supported machines to execute HIP applications. + +- rocm-hip-sdk contains runtime components to deploy and execute HIP applications and tools to develop the applications. + + + + diff --git a/Installation_Guide/ROCmProgMod.png b/Installation_Guide/ROCmProgMod.png new file mode 100644 index 00000000..0ed1acbc Binary files /dev/null and b/Installation_Guide/ROCmProgMod.png differ diff --git a/Installation_Guide/SuppEnv.png b/Installation_Guide/SuppEnv.png new file mode 100644 index 00000000..65376745 Binary files /dev/null and b/Installation_Guide/SuppEnv.png differ diff --git a/ROCm_API_References/clSPARSE_api.rst b/ROCm_API_References/clSPARSE_api.rst deleted file mode 100644 index 7d65c9b2..00000000 --- a/ROCm_API_References/clSPARSE_api.rst +++ /dev/null @@ -1,15 +0,0 @@ -.. _clSPARSE_api: - -clSPARSE API Documentation -========================== - -It is an OpenCL library implementing Sparse linear algebra routines. 
- - * `Dense L1 BLAS operations `_ - Dense BLAS level 1 routines for dense vectors - - * `Sparse L2 BLAS operations `_ - Sparse BLAS level 2 routines for sparse matrix dense vector - - * `Sparse L3 BLAS operations `_ - Sparse BLAS level 3 routines for sparse matrix dense matrix diff --git a/index.rst b/index.rst index 2ce50a6b..8b5148e4 100644 --- a/index.rst +++ b/index.rst @@ -59,12 +59,19 @@ AMD ROCm gives developers the flexibility of choice for hardware and aids in the Release Notes Current_Release_Notes/Current-Release-Notes Installation_Guide/Installation-Guide + +.. toctree:: + :maxdepth: 4 + :hidden: + :caption: HIP Documentation + + HIP Programming_Guides/HIP Programming-Guides .. toctree:: :maxdepth: 6 :hidden: :caption: Developer Documentation - + Programming_Guides/Programming-Guides ROCm_Compiler_SDK/ROCm-Compiler-SDK