This repository has been archived by the owner on May 14, 2024. It is now read-only.

Comparing changes

base repository: ROCm/ROCm-Device-Libs
base: roc-1.6.1
...
head repository: ROCm/ROCm-Device-Libs
compare: roc-1.6.x

Commits on May 18, 2017

  1. installing other required headers for device enqueue

    Change-Id: I4cb4f1e4b780bbfc3465a90f846098068157073d
    ashwinma committed May 18, 2017
    570aabe
  2. removed couple of trailing spaces

    Change-Id: Id4b1064d90959b1010c9a440d336437e0ae824db
    ashwinma committed May 18, 2017
    e77df28

Commits on Jul 12, 2017

  1. Merge "installing other required headers for device enqueue"

    ashwinma authored and Gerrit Code Review committed Jul 12, 2017
    41a856c
  2. Merge "removed couple of trailing spaces"

    ashwinma authored and Gerrit Code Review committed Jul 12, 2017
    56f712b
  3. Add cuda wrapper functions from remove-promote-change-addr-space branch

    Change-Id: I6340cb4605ba37e84aeada9d8fe407be118cf126
    Guansong Zhang committed Jul 12, 2017
    07b961e
  4. Update syncscope usage based on

      https://reviews.llvm.org/rL307722
    
    Change-Id: Iaf3d356d753b4665fc2ceb108952976e55705904
    kzhuravl committed Jul 12, 2017
    21000a2

Commits on Jul 13, 2017

  1. Merge "Add cuda wrapper functions from remove-promote-change-addr-space branch"

    Guansong Zhang authored and Gerrit Code Review committed Jul 13, 2017
    fcaf0c9

Commits on Jul 14, 2017

  1. Add amdgcn--cuda as an option of AMDGPU_TARGET_TRIPLE

    Change-Id: I4423aaab86cce06eb750e422fe855e21312406fa
    Guansong Zhang committed Jul 14, 2017
    757a62d

Commits on Jul 18, 2017

  1. Use irif.h header to include functions using inline asm

    Change-Id: I4658fd709d808529d25fa3d524c895ade73a9103
    Guansong Zhang committed Jul 18, 2017
    835f078

Commits on Jul 20, 2017

  1. Merge "Update syncscope usage based on https://reviews.llvm.org/rL307722"

    kzhuravl authored and Gerrit Code Review committed Jul 20, 2017
    1db13d8

Commits on Jul 26, 2017

  1. Eliminate internal use of out arguments

    Change-Id: Ic89087ee823bb6ea2fb571f11ce1bcf6f582b921
    b-sumner committed Jul 26, 2017
    8e108c4

Commits on Jul 28, 2017

  1. Drop always_inline attribute

    Change-Id: I624de6a34980e1dd905a4865e3eb3db80f62770f
    b-sumner committed Jul 28, 2017
    d4b5f4d

Commits on Jul 31, 2017

  1. Add relaxed math attributes to all functions

    Set relaxed math attributes. This does not mean a library module was built
    with those relaxations, but marks it compatible with the relaxations which
    may be used for the kernel module. Setting them prevents removal of them
    for a caller function, thus retaining original caller attributes.
    
    Change-Id: I45dcb6e05e6e92ebc916ba7cebcb0ca7a0de2502
    rampitec committed Jul 31, 2017
    a6f6461

Commits on Aug 8, 2017

  1. Implement explicitly rounded basic operations

    Change-Id: Iae43c85d8e6f071674d235247ff4390dc0a94789
    b-sumner committed Aug 8, 2017
    3e18a45

Commits on Aug 10, 2017

  1. Update fast fma test

    Change-Id: I54793aa1f15da105cd721a27438f19490e0076e4
    b-sumner committed Aug 10, 2017
    063b023

Commits on Aug 11, 2017

  1. Fix CMake for HCC build

    CMAKE_SOURCE_DIR has changed after recent changes in HCC. Use
    CMAKE_CURRENT_SOURCE_DIR to properly locate the ROCDL source code.
    
    Change-Id: If943fa094c9c0b4bd8d5b6dba9e22706759a83bf
    whchung committed Aug 11, 2017
    1bbab71
  2. Switch to 64-bit intrinsics

    Change-Id: I6119d87f9ed44629823a7dd88c78da340893688a
    b-sumner committed Aug 11, 2017
    1fbc67a
  3. Merge "Switch to 64-bit intrinsics"

    b-sumner authored and Gerrit Code Review committed Aug 11, 2017
    6f09e8f
  4. Revise HC workitem indexing function implementation

    - Make sure all functions return 32-bit unsigned integers.
    - Optimize implementation of amp_get_local_size().
    - Optimize implementation of amp_get_num_groups().
    
    Change-Id: Ib27caa0030071bca338ab924fa405ad288dbc0c2
    whchung committed Aug 11, 2017
    a57dee3

Commits on Aug 17, 2017

  1. Add atomic support

    Change-Id: I8d5eb3b052441664933e52114cfa3b8e7762c9b7
    b-sumner committed Aug 17, 2017
    6d7336b

Commits on Aug 18, 2017

  1. db41889

Commits on Aug 23, 2017

  1. For GFX9+ ring the full 64-bit doorbell directly

    Change-Id: Ie8dc8748a40f01220f7d4d98fe5216d625635bbc
    ashwinma committed Aug 23, 2017
    805795d

Commits on Aug 24, 2017

  1. Update attributes

    Change-Id: Ida1b7322defc5b4e6fb8eec354c7d6cfe26afd8f
    b-sumner committed Aug 24, 2017
    c58ed46

Commits on Aug 29, 2017

  1. Revert "Add relaxed math attributes to all functions"

    This reverts commit a6f6461.
    
    This is incorrect. You cannot assume these attributes for the
    library functions. IR optimizations can rely on these assumptions
    and break the library IR. Codegen also can break. For example this
    triggers fcanonicalize elimination optimizations if the function is
    emitted as a call. Fixes various conformance failures when stress
    testing calls.
    
    Change-Id: I51c25cb4e7b178fce2e2656d473aeb7b20e40858
    arsenm committed Aug 29, 2017
    6a77f3f

Commits on Aug 30, 2017

  1. Avoid WQM instructions

    Change-Id: I7454599e5e794be5ec62a03e8b82d12398839426
    b-sumner committed Aug 30, 2017
    d70e343

Commits on Sep 1, 2017

  1. Update bit counting functions

    Change-Id: Iedf2217e0c7e0c1987cb127276aff1ce48272354
    b-sumner committed Sep 1, 2017
    e5f678f

Commits on Sep 6, 2017

  1. Merge "For GFX9+ ring the full 64-bit doorbell directly"

    ashwinma authored and Gerrit Code Review committed Sep 6, 2017
    812e697
  2. Remove workaround for atomics

    Change-Id: Ia82031d162b23a70689506258306846d950e93e8
    b-sumner committed Sep 6, 2017
    79d2d27

Commits on Sep 7, 2017

  1. Change address space to enable enqueue

    Change-Id: I99ab1f49e0369946af2620c272950e4ec306ac79
    b-sumner committed Sep 7, 2017
    abc46c9

Commits on Sep 13, 2017

  1. Merge "Revert "Add relaxed math attributes to all functions""

    arsenm authored and Gerrit Code Review committed Sep 13, 2017
    4b1b066

Commits on Sep 15, 2017

  1. Pipe functions

    Change-Id: I47368f5e3d7b1083d0e7ba8a9b355cd7f5433f19
    b-sumner committed Sep 15, 2017
    94b5167

Commits on Sep 23, 2017

  1. Rename tool_output_file to ToolOutputFile, NFC

    See r314050 for more details
    
    Change-Id: I06a1e62d537abc6a40dd8bb9453059238416904b
    kzhuravl committed Sep 23, 2017
    a985d4a

Commits on Sep 27, 2017

  1. Initial documentation for OCKL

    Change-Id: I3f61348dd6eb87479ed438c52da8c8a512044cc0
    b-sumner committed Sep 27, 2017
    c6b7afe

Commits on Oct 30, 2017

  1. Merge pull request #47 from RadeonOpenCompute/roc-1.6.4

    roc-1.6.4 updates
    kzhuravl authored Oct 30, 2017
    c36d9f7
Showing 325 changed files with 4,133 additions and 1,199 deletions.
6 changes: 6 additions & 0 deletions CMakeLists.txt
@@ -33,6 +33,9 @@ if (GENERIC_IS_ZERO)
  set(AMDGPU_TARGET_TRIPLE "amdgcn--amdhsa-amdgizcl")
  # HCC will execute utils/change-addr-space.sh
  # and apply utils/add_amdgiz.sed on all .ll files in subdirectory hc/, irif/, opencl/
  if (CUDA_TRIPLE)
    set(AMDGPU_TARGET_TRIPLE "amdgcn--cuda")
  endif (CUDA_TRIPLE)

endif (GENERIC_IS_ZERO)

@@ -52,6 +55,9 @@ add_subdirectory(oclc)
add_subdirectory(ocml)
add_subdirectory(ockl)
add_subdirectory(opencl)
if (CUDA_TRIPLE)
  add_subdirectory(cuda2gcn)
endif (CUDA_TRIPLE)

if(BUILD_HC_LIB)
add_subdirectory(hc)
17 changes: 17 additions & 0 deletions cuda2gcn/CMakeLists.txt
@@ -0,0 +1,17 @@
##===--------------------------------------------------------------------------
## ROCm Device Libraries
##
## This file is distributed under the University of Illinois Open Source
## License. See LICENSE.TXT for details.
##===--------------------------------------------------------------------------

file(GLOB cl_sources
${CMAKE_CURRENT_SOURCE_DIR}/src/*.cl
)

file(GLOB sources ${cl_sources})

include_directories(${CMAKE_CURRENT_SOURCE_DIR}/../ocml/inc)
include_directories(${CMAKE_CURRENT_SOURCE_DIR}/../ockl/inc)
include_directories(${CMAKE_CURRENT_SOURCE_DIR}/../irif/inc)
opencl_bc_lib(cuda2gcn ${sources})
46 changes: 46 additions & 0 deletions cuda2gcn/src/bitsbytes.cl
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
/*===--------------------------------------------------------------------------
* ROCm Device Libraries
*
* This file is distributed under the University of Illinois Open Source
* License. See LICENSE.TXT for details.
*===------------------------------------------------------------------------*/

#include "ockl.h"
#include "irif.h"

#define ATTR __attribute__((always_inline, const))

//-------- T __nv_brev
ATTR int __nv_brev(int x) { return __llvm_bitreverse_i32(x); }

//-------- T __nv_brevll
ATTR long __nv_brevll(long x) { return __llvm_bitreverse_i64(x); }

//-------- T __nv_clz
ATTR int __nv_clz(int x)
{
    return (int)__ockl_clz_u32((uint)x);
}

//-------- T __nv_clzll
ATTR int __nv_clzll(long x)
{
    uint xlo = (uint)x;
    uint xhi = (uint)(x >> 32);
    uint zlo = __ockl_clz_u32(xlo) + 32u;
    uint zhi = __ockl_clz_u32(xhi);
    return (int)(xhi == 0 ? zlo : zhi);
}

//-------- T __nv_ffs
ATTR int __nv_ffs(int x) { return (32 - __nv_clz(x&(-x))); }

//-------- T __nv_ffsll
ATTR int __nv_ffsll(long x) { return (int)(64 - __nv_clzll(x&(-x))); }

//-------- T __nv_popc
ATTR int __nv_popc(int x) { return __llvm_ctpop_i32(x); }

//-------- T __nv_popcll
ATTR int __nv_popcll(long x) { return (int)__llvm_ctpop_i64(x); }
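
Review note: the `__nv_clzll` and `__nv_ffs*` wrappers above rest on two bit tricks: a 64-bit leading-zero count assembled from two 32-bit counts, and find-first-set derived from `x & -x`. The sketch below is a host-side C model of that logic, not the device code. `clz32` is a hypothetical stand-in for `__ockl_clz_u32`, and it assumes that helper returns 32 for a zero input, which the `ffs(0) == 0` result depends on. All `model_*` and `check_*` names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for __ockl_clz_u32; assumed to return 32 for x == 0. */
static uint32_t clz32(uint32_t x)
{
    uint32_t n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        ++n;
    }
    return n;
}

/* Mirrors __nv_clzll: use the high-word count unless the high word is zero,
 * in which case the answer is 32 plus the low-word count. */
static int model_clzll(uint64_t x)
{
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (uint32_t)(x >> 32);
    return (int)(hi == 0 ? clz32(lo) + 32u : clz32(hi));
}

/* Mirrors __nv_ffs: x & -x isolates the lowest set bit. The negation is done
 * on an unsigned value here to stay clear of signed overflow in host C. */
static int model_ffs(uint32_t x)
{
    return 32 - (int)clz32(x & (0u - x));
}

/* Mirrors __nv_ffsll in the same way, on 64 bits. */
static int model_ffsll(uint64_t x)
{
    return 64 - model_clzll(x & (0ull - x));
}

/* Spot checks that a caller can run. */
static void check_bit_models(void)
{
    assert(model_clzll(0) == 64);
    assert(model_clzll(1) == 63);
    assert(model_ffs(0) == 0);
    assert(model_ffs(8) == 4);
    assert(model_ffsll(1ull << 40) == 41);
}
```

The result is 1-based, so a zero input maps to 0 only because the assumed zero-input count equals the word width.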

150 changes: 150 additions & 0 deletions cuda2gcn/src/convert.cl
Original file line number Diff line number Diff line change
@@ -0,0 +1,150 @@
/*===--------------------------------------------------------------------------
* ROCm Device Libraries
*
* This file is distributed under the University of Illinois Open Source
* License. See LICENSE.TXT for details.
*===------------------------------------------------------------------------*/

#define ATTR __attribute__((always_inline, const))

#define CONVERTM(A,B,m,n) ATTR B __nv_##A##2##B##_##m(A x) \
{ return convert_##B##_##n(x); }

#define CONVERT(A,B) \
CONVERTM(A, B, rd, rtn) \
CONVERTM(A, B, rn, rte) \
CONVERTM(A, B, ru, rtp) \
CONVERTM(A, B, rz, rtz)

//-------- T __nv_double2float_rd
//-------- T __nv_double2float_rn
//-------- T __nv_double2float_ru
//-------- T __nv_double2float_rz
CONVERT(double, float)

//-------- T __nv_double2int_rd
//-------- T __nv_double2int_rn
//-------- T __nv_double2int_ru
//-------- T __nv_double2int_rz
CONVERT(double, int)

//-------- T __nv_float2int_rd
//-------- T __nv_float2int_rn
//-------- T __nv_float2int_ru
//-------- T __nv_float2int_rz
CONVERT(float, int)

//-------- T __nv_int2float_rd
//-------- T __nv_int2float_rn
//-------- T __nv_int2float_ru
//-------- T __nv_int2float_rz
CONVERT(int, float)

//-------- T __nv_double2uint_rd
//-------- T __nv_double2uint_rn
//-------- T __nv_double2uint_ru
//-------- T __nv_double2uint_rz
CONVERT(double, uint)

//-------- T __nv_float2uint_rd
//-------- T __nv_float2uint_rn
//-------- T __nv_float2uint_ru
//-------- T __nv_float2uint_rz
CONVERT(float, uint)

//-------- T __nv_uint2double_rd
//-------- T __nv_uint2double_rn
//-------- T __nv_uint2double_ru
//-------- T __nv_uint2double_rz
CONVERT(uint, double)

//-------- T __nv_uint2float_rd
//-------- T __nv_uint2float_rn
//-------- T __nv_uint2float_ru
//-------- T __nv_uint2float_rz
CONVERT(uint, float)

#define CONVERT2LLM(A,B,m,n) ATTR long __nv_##A##2ll_##m(A x) \
{ return convert_long_##n(x); }

#define CONVERT2LL(A) \
CONVERT2LLM(A, long, rd, rtn) \
CONVERT2LLM(A, long, rn, rte) \
CONVERT2LLM(A, long, ru, rtp) \
CONVERT2LLM(A, long, rz, rtz)

//-------- T __nv_double2ll_rd
//-------- T __nv_double2ll_rn
//-------- T __nv_double2ll_ru
//-------- T __nv_double2ll_rz
CONVERT2LL(double)

//-------- T __nv_float2ll_rd
//-------- T __nv_float2ll_rn
//-------- T __nv_float2ll_ru
//-------- T __nv_float2ll_rz
CONVERT2LL(float)

#define CONVERT2ULLM(A,B,m,n) ATTR ulong __nv_##A##2ull_##m(A x) \
{ return convert_ulong_##n(x); }

#define CONVERT2ULL(A) \
CONVERT2ULLM(A, ulong, rd, rtn) \
CONVERT2ULLM(A, ulong, rn, rte) \
CONVERT2ULLM(A, ulong, ru, rtp) \
CONVERT2ULLM(A, ulong, rz, rtz)

//-------- T __nv_double2ull_rd
//-------- T __nv_double2ull_rn
//-------- T __nv_double2ull_ru
//-------- T __nv_double2ull_rz
CONVERT2ULL(double)

//-------- T __nv_float2ull_rd
//-------- T __nv_float2ull_rn
//-------- T __nv_float2ull_ru
//-------- T __nv_float2ull_rz
CONVERT2ULL(float)

#define CONVERT4LLM(A,B,m,n) ATTR B __nv_ll2##B##_##m(long x) \
{ return convert_##B##_##n(x); }

#define CONVERT4LL(B) \
CONVERT4LLM(long, B, rd, rtn) \
CONVERT4LLM(long, B, rn, rte) \
CONVERT4LLM(long, B, ru, rtp) \
CONVERT4LLM(long, B, rz, rtz)

//-------- T __nv_ll2double_rd
//-------- T __nv_ll2double_rn
//-------- T __nv_ll2double_ru
//-------- T __nv_ll2double_rz
CONVERT4LL(double)

//-------- T __nv_ll2float_rd
//-------- T __nv_ll2float_rn
//-------- T __nv_ll2float_ru
//-------- T __nv_ll2float_rz
CONVERT4LL(float)

#define CONVERT4ULLM(A,B,m,n) ATTR B __nv_ull2##B##_##m(ulong x) \
{ return convert_##B##_##n(x); }

#define CONVERT4ULL(B) \
CONVERT4ULLM(ulong, B, rd, rtn) \
CONVERT4ULLM(ulong, B, rn, rte) \
CONVERT4ULLM(ulong, B, ru, rtp) \
CONVERT4ULLM(ulong, B, rz, rtz)

//-------- T __nv_ull2double_rd
//-------- T __nv_ull2double_rn
//-------- T __nv_ull2double_ru
//-------- T __nv_ull2double_rz
CONVERT4ULL(double)

//-------- T __nv_ull2float_rd
//-------- T __nv_ull2float_rn
//-------- T __nv_ull2float_ru
//-------- T __nv_ull2float_rz
CONVERT4ULL(float)
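
Review note: the `CONVERT*` macros above map CUDA's rounding suffixes onto OpenCL's `convert_*` suffixes: `rd` to `rtn` (toward negative infinity), `rn` to `rte` (nearest, ties to even), `ru` to `rtp` (toward positive infinity), and `rz` to `rtz` (toward zero). The host-side C sketch below models what the four modes mean for a double-to-integer conversion. `round_mode`, `model_double2ll`, and `check_round_models` are hypothetical names, and the model deliberately ignores NaN and out-of-range inputs, which the real conversions also have to handle.

```c
#include <assert.h>

/* Hypothetical enum naming the four CUDA rounding suffixes. */
typedef enum { ROUND_RD, ROUND_RN, ROUND_RU, ROUND_RZ } round_mode;

/* Model of __nv_double2ll_<mode> for finite inputs that fit in long long. */
static long long model_double2ll(double x, round_mode m)
{
    long long t = (long long)x;    /* a C cast truncates toward zero */
    double frac = x - (double)t;   /* lies in (-1, 1), same sign as x */
    double mag = frac < 0.0 ? -frac : frac;

    switch (m) {
    case ROUND_RZ:                 /* rtz: keep the truncated value */
        return t;
    case ROUND_RD:                 /* rtn: step down for negative fractions */
        return frac < 0.0 ? t - 1 : t;
    case ROUND_RU:                 /* rtp: step up for positive fractions */
        return frac > 0.0 ? t + 1 : t;
    case ROUND_RN:                 /* rte: nearest, ties go to the even value */
        if (mag > 0.5 || (mag == 0.5 && (t & 1)))
            return frac < 0.0 ? t - 1 : t + 1;
        return t;
    }
    return t;
}

/* Spot checks that a caller can run. */
static void check_round_models(void)
{
    assert(model_double2ll(2.5, ROUND_RN) == 2);
    assert(model_double2ll(3.5, ROUND_RN) == 4);
    assert(model_double2ll(-2.5, ROUND_RD) == -3);
    assert(model_double2ll(-2.5, ROUND_RU) == -2);
    assert(model_double2ll(-2.75, ROUND_RZ) == -2);
}
```

The tie cases (2.5 and 3.5) are the ones where `rn` differs from a simple "add 0.5 and truncate" rule.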

33 changes: 33 additions & 0 deletions cuda2gcn/src/float.cl
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
/*===--------------------------------------------------------------------------
* ROCm Device Libraries
*
* This file is distributed under the University of Illinois Open Source
* License. See LICENSE.TXT for details.
*===------------------------------------------------------------------------*/

#define ATTR __attribute__((always_inline, const))

//-------- T __nv_finitef
ATTR int __nv_finitef(float x) { return isfinite(x); }

//-------- T __nv_isfinited
ATTR int __nv_isfinited(double x) { return isfinite(x); }

//-------- T __nv_isinfd
ATTR int __nv_isinfd(double x) { return isinf(x); }

//-------- T __nv_isinff
ATTR int __nv_isinff(float x) { return isinf(x); }

//-------- T __nv_isnand
ATTR int __nv_isnand(double x) { return isnan(x); }

//-------- T __nv_isnanf
ATTR int __nv_isnanf(float x) { return isnan(x); }

//-------- T __nv_nan
ATTR double __nv_nan(char *tagp) { return __builtin_nan(tagp); }

//-------- T __nv_nanf
ATTR float __nv_nanf(char *tagp) { return __builtin_nan(tagp); }

54 changes: 54 additions & 0 deletions cuda2gcn/src/generic.cl
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
/*===--------------------------------------------------------------------------
* ROCm Device Libraries
*
* This file is distributed under the University of Illinois Open Source
* License. See LICENSE.TXT for details.
*===------------------------------------------------------------------------*/

#define ATTR __attribute__((always_inline, const))

#define MAX(x, y) (((x) > (y)) ? (x) : (y))
#define MIN(x, y) (((x) < (y)) ? (x) : (y))

//-------- T __nv_abs
ATTR int __nv_abs(int x) { return abs(x); }

//-------- T __nv_llabs
ATTR long __nv_llabs(long x) { return abs(x); }

//-------- T __nv_max
ATTR int __nv_max(int a, int b) { return MAX(a,b); }

//-------- T __nv_llmax
ATTR long __nv_llmax(long a, long b) { return MAX(a,b); }

//-------- T __nv_ullmax
ATTR ulong __nv_ullmax(ulong a, ulong b) { return MAX(a,b); }

//-------- T __nv_umax
ATTR uint __nv_umax(uint a, uint b) { return MAX(a,b); }

//-------- T __nv_min
ATTR int __nv_min(int a, int b) { return MIN(a,b); }

//-------- T __nv_llmin
ATTR long __nv_llmin(long a, long b) { return MIN(a,b); }

//-------- T __nv_ullmin
ATTR ulong __nv_ullmin(ulong a, ulong b) { return MIN(a,b); }

//-------- T __nv_umin
ATTR uint __nv_umin(uint a, uint b) { return MIN(a,b); }

//-------- T __nv_sad
ATTR uint __nv_sad(int x, int y, uint z)
{
    return (z+abs(x-y));
}

//-------- T __nv_usad
ATTR uint __nv_usad(uint x, uint y, uint z)
{
    return (z+abs(x-y));
}
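
Review note on `__nv_usad`: in OpenCL C, `abs` applied to an unsigned value returns it unchanged, and `x - y` on `uint` operands wraps modulo 2^32. Assuming the wrapper is meant to return `|x - y| + z`, the expression as written would only do so when `x >= y`. The host-side C sketch below models both the expression as written and an order-independent alternative. The function names are hypothetical. On the device, the OpenCL built-in `abs_diff` would be one way to express the second form.

```c
#include <assert.h>
#include <stdint.h>

/* Models z + abs(x - y) on uint operands: the subtraction wraps and
 * abs is a no-op on an unsigned value. */
static uint32_t usad_as_written(uint32_t x, uint32_t y, uint32_t z)
{
    return z + (x - y);
}

/* Order-independent form: subtract the smaller operand from the larger. */
static uint32_t usad_abs_diff(uint32_t x, uint32_t y, uint32_t z)
{
    return z + (x > y ? x - y : y - x);
}

/* Spot checks that a caller can run. */
static void check_usad_models(void)
{
    /* The two forms agree when x >= y ... */
    assert(usad_as_written(3u, 1u, 10u) == 12u);
    assert(usad_abs_diff(3u, 1u, 10u) == 12u);
    /* ... and diverge when x < y. */
    assert(usad_as_written(1u, 3u, 0u) == 0xFFFFFFFEu);
    assert(usad_abs_diff(1u, 3u, 0u) == 2u);
}
```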

23 changes: 23 additions & 0 deletions cuda2gcn/src/half.cl
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
/*===--------------------------------------------------------------------------
* ROCm Device Libraries
*
* This file is distributed under the University of Illinois Open Source
* License. See LICENSE.TXT for details.
*===------------------------------------------------------------------------*/

#pragma OPENCL EXTENSION cl_khr_fp16 : enable

#define ATTR __attribute__((always_inline, const))

//-------- T __nv_float2half_rn
half __nv_float2half_rn(float x)
{
    return (half)x;
}

//-------- T __nv_half2float
float __nv_half2float(half x)
{
    return (float)x;
}

29 changes: 29 additions & 0 deletions cuda2gcn/src/integer.cl
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
/*===--------------------------------------------------------------------------
* ROCm Device Libraries
*
* This file is distributed under the University of Illinois Open Source
* License. See LICENSE.TXT for details.
*===------------------------------------------------------------------------*/

#include "ockl.h"

#define ATTR __attribute__((always_inline, const))

//-------- T __nv_mul24
ATTR int __nv_mul24(int x, int y) { return __ockl_mul24_i32(x, y); }

//-------- T __nv_umul24
ATTR uint __nv_umul24(uint x, uint y) { return __ockl_mul24_u32(x, y); }

//-------- T __nv_mul64hi
ATTR long __nv_mul64hi(long x, long y) { return __ockl_mul_hi_i64(x,y); }

//-------- T __nv_mulhi
ATTR int __nv_mulhi(int x, int y) { return __ockl_mul_hi_i32(x,y); }

//-------- T __nv_umul64hi
ATTR ulong __nv_umul64hi(ulong x, ulong y) { return __ockl_mul_hi_u64(x,y); }

//-------- T __nv_umulhi
ATTR uint __nv_umulhi(uint x, uint y) { return __ockl_mul_hi_u32(x,y); }
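
Review note: the integer wrappers above forward to OCKL helpers whose bodies are not part of this diff. As a rough host-side C model of the unsigned cases, `model_umulhi` returns the upper 32 bits of the 64-bit product, and `model_umul24` multiplies only the low 24 bits of each operand and keeps the low 32 bits of the result. Both names are hypothetical. The masking in `model_umul24` follows CUDA's documented `__umul24` behaviour. Whether `__ockl_mul24_u32` behaves the same way for inputs wider than 24 bits is an assumption here.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 32 bits of the full 64-bit unsigned product. */
static uint32_t model_umulhi(uint32_t x, uint32_t y)
{
    return (uint32_t)(((uint64_t)x * y) >> 32);
}

/* Low 32 bits of the product of the low 24 bits of each operand
 * (assumed masking behaviour, see the note above). */
static uint32_t model_umul24(uint32_t x, uint32_t y)
{
    return (uint32_t)((uint64_t)(x & 0xFFFFFFu) * (y & 0xFFFFFFu));
}

/* Spot checks that a caller can run. */
static void check_mul_models(void)
{
    assert(model_umulhi(0xFFFFFFFFu, 0xFFFFFFFFu) == 0xFFFFFFFEu);
    assert(model_umulhi(0x80000000u, 2u) == 1u);
    assert(model_umul24(0x01000003u, 5u) == 15u);
    assert(model_umul24(0xFFFFFFu, 0xFFFFFFu) == 0xFE000001u);
}
```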
