inputting non-isbits types #128

Closed · bjarthur opened this issue Mar 13, 2023 · 7 comments

bjarthur commented Mar 13, 2023

a simple modification to the example in the README does not work:

julia> using Metal

julia> function vaddT(T, a, b, c)            ### T is input here; nominally it is a Type
           i = thread_position_in_grid_1d()
           c[i] = a[i] + T(b[i])             ### it is used here
           return
       end
vaddT (generic function with 1 method)

julia> a = MtlArray([1,1,1,1]); b = MtlArray([2,2,2,2]); c = similar(a);

julia> @metal threads=2 groups=2 vaddT(Float32, a, b, c)
ERROR: InvalidIRError: compiling kernel #vaddT(Type{Float32}, MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to gpu_malloc)
Stacktrace:
 [1] malloc
   @ ~/.julia/packages/GPUCompiler/S3TWf/src/runtime.jl:89
 [2] macro expansion
   @ ~/.julia/packages/GPUCompiler/S3TWf/src/runtime.jl:184
 [3] macro expansion
   @ ./none:0
 [4] box
   @ ./none:0
 [5] box_float32
   @ ~/.julia/packages/GPUCompiler/S3TWf/src/runtime.jl:213
 [6] Int64
   @ ./float.jl:788
 [7] convert
   @ ./number.jl:7
 [8] setindex!
   @ ~/.julia/dev/Metal/src/device/array.jl:105
 [9] vaddT
   @ ./REPL[2]:3
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams, GPUCompiler.FunctionSpec{typeof(vaddT), Tuple{Type{Float32}, MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}}}}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/S3TWf/src/validation.jl:141
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/S3TWf/src/driver.jl:418 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/LHjFw/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/S3TWf/src/driver.jl:417 [inlined]
  [5] emit_llvm(job::GPUCompiler.CompilerJob, method_instance::Any; libraries::Bool, deferred_codegen::Bool, optimize::Bool, cleanup::Bool, only_entry::Bool, validate::Bool, ctx::LLVM.Context)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/S3TWf/src/utils.jl:83
  [6] mtlfunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
    @ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:166
  [7] #40
    @ ~/.julia/dev/Metal/src/compiler/execution.jl:161 [inlined]
  [8] JuliaContext(f::Metal.var"#40#41"{GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams, GPUCompiler.FunctionSpec{typeof(vaddT), Tuple{Type{Float32}, MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}}}}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/S3TWf/src/driver.jl:76
  [9] mtlfunction_compile(job::GPUCompiler.CompilerJob)
    @ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:160
 [10] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(Metal.mtlfunction_compile), linker::typeof(Metal.mtlfunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/S3TWf/src/cache.jl:90
 [11] mtlfunction(f::typeof(vaddT), tt::Type{Tuple{Type{Float32}, MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:148
 [12] mtlfunction(f::typeof(vaddT), tt::Type{Tuple{Type{Float32}, MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}}})
    @ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:141
 [13] top-level scope
    @ ~/.julia/dev/Metal/src/compiler/execution.jl:64
 [14] top-level scope
    @ ~/.julia/dev/Metal/src/initialization.jl:33

i'm hoping we can get this to work, as it works fine with CUDA.jl:

julia> using CUDA

julia> function vaddT(T, a, b, c)
                  i = threadIdx().x
                  c[i] = a[i] + T(b[i])
                  return
              end
vaddT (generic function with 1 method)

julia> a = CuArray([1,1,1,1]); b = CuArray([2,2,2,2]); c = similar(a);

julia> @cuda threads=4 vaddT(Float32, a,b,c)
CUDA.HostKernel{typeof(vaddT), Tuple{Type{Float32}, CuDeviceVector{Int64, 1}, CuDeviceVector{Int64, 1}, CuDeviceVector{Int64, 1}}}(vaddT, CuFunction(Ptr{CUDA.CUfunc_st} @0x000000000578ad50, CuModule(Ptr{CUDA.CUmod_st} @0x0000000005494bd0, CuContext(0x0000000001ca6140, instance 510ba3156c98e3a9))), CUDA.KernelState(Ptr{Nothing} @0x00007fb21ba00000))

julia> c
4-element CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}:
 3
 3
 3
 3

this is with Julia 1.8.5, the 0-day master of Metal.jl, and an M2 Max

maleadt commented Mar 13, 2023

You're actually passing a type, while you should be specializing on it instead (kernel(::Type{T}) where T).
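
For reference, a minimal sketch of that signature form (illustration only; as the follow-ups below show, the Metal.jl revision used in this thread still hit an argument-handling bug with it, which #135 later addresses):

function vaddT(::Type{T}, a, b, c) where {T}
    i = thread_position_in_grid_1d()
    c[i] = a[i] + T(b[i])
    return
end

# launched the same way, e.g. @metal threads=2 groups=2 vaddT(Float32, a, b, c)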

bjarthur (Author) commented:

thanks for the quick reply. that gives the same error though.

i think the fundamental problem is that type conversion is not allowed in a kernel: the modified function below, which does NOT take a type as input nor specialize on one, but rather calls Float32 directly, also gives the same error.

julia> using Metal

julia> function vaddT(a, b, c)
           i = thread_position_in_grid_1d()
           c[i] = a[i] + Float32(b[i])
           return
       end
vaddT (generic function with 1 method)

julia> a = MtlArray([1,1,1,1]); b = MtlArray([2,2,2,2]); c = similar(a);

julia> @metal threads=2 groups=2 vaddT(a, b, c)
ERROR: InvalidIRError: compiling kernel #vaddT(MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to gpu_malloc)
Stacktrace:
 [1] malloc
   @ ~/.julia/packages/GPUCompiler/S3TWf/src/runtime.jl:89
 [2] macro expansion
   @ ~/.julia/packages/GPUCompiler/S3TWf/src/runtime.jl:184
 [3] macro expansion
   @ ./none:0
 [4] box
   @ ./none:0
 [5] box_float32
   @ ~/.julia/packages/GPUCompiler/S3TWf/src/runtime.jl:213
 [6] Int64
   @ ./float.jl:788
 [7] convert
   @ ./number.jl:7
 [8] setindex!
   @ ~/.julia/dev/Metal/src/device/array.jl:105
 [9] vaddT
   @ ./REPL[2]:3
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams, GPUCompiler.FunctionSpec{typeof(vaddT), Tuple{MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}}}}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/S3TWf/src/validation.jl:141
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/S3TWf/src/driver.jl:418 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/LHjFw/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/S3TWf/src/driver.jl:417 [inlined]
  [5] emit_llvm(job::GPUCompiler.CompilerJob, method_instance::Any; libraries::Bool, deferred_codegen::Bool, optimize::Bool, cleanup::Bool, only_entry::Bool, validate::Bool, ctx::LLVM.Context)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/S3TWf/src/utils.jl:83
  [6] mtlfunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
    @ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:166
  [7] #40
    @ ~/.julia/dev/Metal/src/compiler/execution.jl:161 [inlined]
  [8] JuliaContext(f::Metal.var"#40#41"{GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams, GPUCompiler.FunctionSpec{typeof(vaddT), Tuple{MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}}}}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/S3TWf/src/driver.jl:76
  [9] mtlfunction_compile(job::GPUCompiler.CompilerJob)
    @ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:160
 [10] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(Metal.mtlfunction_compile), linker::typeof(Metal.mtlfunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/S3TWf/src/cache.jl:90
 [11] mtlfunction(f::typeof(vaddT), tt::Type{Tuple{MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:148
 [12] mtlfunction(f::typeof(vaddT), tt::Type{Tuple{MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}}})
    @ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:141
 [13] top-level scope
    @ ~/.julia/dev/Metal/src/compiler/execution.jl:64
 [14] top-level scope
    @ ~/.julia/dev/Metal/src/initialization.jl:33

bjarthur (Author) commented:

more debugging data, which might or might not be related / informative: outside of the kernel, Float32 works, but sin by itself does not:

julia> Float32.(b)
4-element MtlVector{Float32}:
 2.0
 2.0
 2.0
 2.0

julia> sin.(Array(b))
4-element Vector{Float64}:
 0.9092974268256817
 0.9092974268256817
 0.9092974268256817
 0.9092974268256817

julia> sin.(Float32.(b))
4-element MtlVector{Float32}:
 0.9092974
 0.9092974
 0.9092974
 0.9092974

julia> sin.(b)
ERROR: InvalidIRError: compiling kernel #broadcast_kernel#28(Metal.mtlKernelContext, MtlDeviceVector{Float64, 1}, Base.Broadcast.Broadcasted{Metal.MtlArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(sin), Tuple{Base.Broadcast.Extruded{MtlDeviceVector{Int64, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64) resulted in invalid LLVM IR
Reason: unsupported use of double floating-point value
Stacktrace:
 [1] reinterpret
   @ ./essentials.jl:438
 [2] paynehanek
   @ ./special/rem_pio2.jl:139
 [3] rem_pio2_kernel
   @ ./special/rem_pio2.jl:282
 [4] sin
   @ ./special/trig.jl:41
 [5] sin
   @ ./math.jl:1372
 [6] _broadcast_getindex_evalf
   @ ./broadcast.jl:670
 [7] _broadcast_getindex
   @ ./broadcast.jl:643
 [8] getindex
   @ ./broadcast.jl:597
 [9] broadcast_kernel
   @ ~/.julia/packages/GPUArrays/6STCb/src/host/broadcast.jl:59

maleadt commented Mar 15, 2023

You are mixing different bug reports in a single issue; please keep them separate (also, for questions like this, Discourse is better suited).


julia> sin.(b)
ERROR: InvalidIRError: compiling kernel #broadcast_kernel#28(Metal.mtlKernelContext, MtlDeviceVector{Float64, 1}, Base.Broadcast.Broadcasted{Metal.MtlArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(sin), Tuple{Base.Broadcast.Extruded{MtlDeviceVector{Int64, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64) resulted in invalid LLVM IR
Reason: unsupported use of double floating-point value

Works for me:

julia> b = rand(Float32, 1)
1-element Vector{Float32}:
 0.9079935

julia> sin.(MtlArray(b))
1-element MtlVector{Float32}:
 0.78827065

ERROR: InvalidIRError: compiling kernel #vaddT(MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to gpu_malloc)

That's because Metal currently does not support exceptions, #69, and assigning a Float32 to an Int array (like you're doing in your kernel) can throw an InexactError.
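
A minimal sketch of one way to sidestep that exception path, assuming it's acceptable to keep everything in Float32 so that the store into c needs no throwing conversion (untested against this exact Metal.jl revision):

using Metal

function vadd_f32(a, b, c)
    i = thread_position_in_grid_1d()
    c[i] = a[i] + Float32(b[i])   # Float32 stored into a Float32 array: no InexactError branch
    return
end

a = MtlArray(Float32[1, 1, 1, 1]); b = MtlArray(Float32[2, 2, 2, 2]); c = similar(a);

@metal threads=2 groups=2 vadd_f32(a, b, c)

If integer output is really needed, a non-throwing conversion such as unsafe_trunc(Int64, x) also avoids the exception branch, at the cost of undefined results for out-of-range values.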

maleadt closed this as not planned on Mar 15, 2023.

bjarthur commented Mar 15, 2023

thanks for the help, and sorry for the dumb questions.

to clarify your previous comment about specializing, that is not expected to work either i guess, right? at least i can't get it to:

julia> using Metal

julia> a = MtlArray([1,1,1,1f0]); b = MtlArray([2,2,2,2f0]); c = similar(a);

julia> function vadd1(a, b, c)  # specialize on T
           function kernel1(::Type{T}, a, b, c) where T
               i = thread_position_in_grid_1d()
               c[i] = a[i] + T(b[i])
               return
           end
           T = Float32
           @metal threads=2 groups=2 kernel1(T, a, b, c)
       end
vadd1 (generic function with 1 method)

julia> vadd1(a, b, c)
ERROR: AssertionError: isbits(arg)
Stacktrace:
 [1] (::Metal.var"#43#45"{MTLSize, MTLSize, Metal.HostKernel{var"#kernel1#1", Tuple{Type{Float32}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}}}, Tuple{DataType, MtlVector{Float32}, MtlVector{Float32}, MtlVector{Float32}}, Vector{MTLBuffer}})(cce::Metal.MTL.MTLComputeCommandEncoderInstance)
   @ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:216
 [2] MTLComputeCommandEncoder(f::Metal.var"#43#45"{MTLSize, MTLSize, Metal.HostKernel{var"#kernel1#1", Tuple{Type{Float32}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}}}, Tuple{DataType, MtlVector{Float32}, MtlVector{Float32}, MtlVector{Float32}}, Vector{MTLBuffer}}, cmdbuf::Metal.MTL.MTLCommandBufferInstance; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ Metal.MTL ~/.julia/dev/Metal/lib/mtl/command_enc/compute.jl:44
 [3] MTLComputeCommandEncoder(f::Function, cmdbuf::Metal.MTL.MTLCommandBufferInstance)
   @ Metal.MTL ~/.julia/dev/Metal/lib/mtl/command_enc/compute.jl:41
 [4] (::Metal.HostKernel{var"#kernel1#1", Tuple{Type{Float32}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}}})(::Type, ::Vararg{Any}; groups::Int64, threads::Int64, queue::Metal.MTL.MTLCommandQueueInstance)
   @ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:197
 [5] macro expansion
   @ ~/.julia/dev/Metal/src/compiler/execution.jl:79 [inlined]
 [6] vadd1(a::MtlVector{Float32}, b::MtlVector{Float32}, c::MtlVector{Float32})
   @ Main ./REPL[4]:8
 [7] top-level scope
   @ REPL[5]:1
 [8] top-level scope
   @ ~/.julia/dev/Metal/src/initialization.jl:33

so i tried sneaking T in via a closure. no joy:

julia> function vadd2(a, b, c)  # closure with T
           function kernel2(a, b, c)
               i = thread_position_in_grid_1d()
               c[i] = a[i] + T(b[i])
               return
           end 
           T = Float32
           @metal threads=2 groups=2 kernel2(a, b, c)
       end
vadd2 (generic function with 1 method)

julia> vadd2(a, b, c)
ERROR: GPU compilation of kernel2(MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}) in world 32452 failed
KernelError: passing and using non-bitstype argument

Argument 1 to your kernel function is of type var"#kernel2#2", which is not isbits:
  .T is of type Core.Box which is not isbits.
    .contents is of type Any which is not isbits.

Stacktrace:
  [1] check_invocation(job::GPUCompiler.CompilerJob)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/validation.jl:101
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/anMCs/src/driver.jl:154 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/LHjFw/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/anMCs/src/driver.jl:152 [inlined]
  [5] emit_julia(job::GPUCompiler.CompilerJob; validate::Bool)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/utils.jl:83
  [6] emit_julia
    @ ~/.julia/packages/GPUCompiler/anMCs/src/utils.jl:77 [inlined]
  [7] compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
    @ Metal ~/.julia/dev/Metal/src/compiler/compilation.jl:59
  [8] #39
    @ ~/.julia/dev/Metal/src/compiler/compilation.jl:55 [inlined]
  [9] JuliaContext(f::Metal.var"#39#40"{GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/driver.jl:76
 [10] compile
    @ ~/.julia/dev/Metal/src/compiler/compilation.jl:54 [inlined]
 [11] actual_compilation(cache::Dict{UInt64, Any}, key::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, ft::Type, tt::Type, world::UInt64, compiler::typeof(Metal.compile), linker::typeof(Metal.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/cache.jl:184
 [12] cached_compilation(cache::Dict{UInt64, Any}, cfg::GPUCompiler.CompilerConfig{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, ft::Type, tt::Type, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/cache.jl:163
 [13] macro expansion
    @ ~/.julia/dev/Metal/src/compiler/execution.jl:161 [inlined]
 [14] macro expansion
    @ ./lock.jl:223 [inlined]
 [15] mtlfunction(f::var"#kernel2#2", tt::Type{Tuple{MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:157
 [16] mtlfunction
    @ ~/.julia/dev/Metal/src/compiler/execution.jl:155 [inlined]
 [17] macro expansion
    @ ~/.julia/dev/Metal/src/compiler/execution.jl:77 [inlined]
 [18] vadd2(a::MtlVector{Float32}, b::MtlVector{Float32}, c::MtlVector{Float32})
    @ Main ./REPL[6]:8
 [19] top-level scope
    @ REPL[7]:1
 [20] top-level scope
    @ ~/.julia/dev/Metal/src/initialization.jl:33

only when T is defined inside does it work:

julia> function vadd3(a, b, c)
           function kernel3(a, b, c)
               T = Float32
               i = thread_position_in_grid_1d()
               c[i] = a[i] + T(b[i])
               return
           end
           @metal threads=2 groups=2 kernel3(a, b, c)
       end
vadd3 (generic function with 1 method)

julia> vadd3(a, b, c)
Metal.HostKernel{var"#kernel3#3", Tuple{MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}}}(var"#kernel3#3"(), Metal.MTL.MTLComputePipelineStateInstance (object of type AGXG14XFamilyComputePipeline))

i was surprised that setting T to a function of the type parameters in the outer function worked:

julia> function vadd4(a::MtlVector{Ta}, b::MtlVector{Tb}, c::MtlVector{Tc}) where {Ta,Tb,Tc}
           function kernel4(a, b, c)
               T = promote_type(Ta,Tb,Tc)
               i = thread_position_in_grid_1d()
               c[i] = a[i] + T(b[i])
               return
           end
           @metal threads=2 groups=2 kernel4(a, b, c)
       end
vadd4 (generic function with 1 method)

julia> vadd4(a, b, c)
Metal.HostKernel{var"#kernel4#4"{Float32, Float32, Float32}, Tuple{MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}}}(var"#kernel4#4"{Float32, Float32, Float32}(), Metal.MTL.MTLComputePipelineStateInstance (object of type AGXG14XFamilyComputePipeline))

but i worry that doing so inside the kernel is slow. or i guess that is computed during compilation, not at run time?

my original CUDA function, which i'm trying to refactor to Metal (and eventually KernelAbstractions), computes T via the type parameters outside the kernel and works. like this:

julia> function vadd5(a::MtlVector{Ta}, b::MtlVector{Tb}, c::MtlVector{Tc}) where {Ta,Tb,Tc}
           function kernel5(a, b, c)
               i = thread_position_in_grid_1d()
               c[i] = a[i] + T(b[i])
               return
           end
           T = promote_type(Ta,Tb,Tc)
           @metal threads=2 groups=2 kernel5(a, b, c)
       end
vadd5 (generic function with 1 method)

julia> vadd5(a, b, c)  # KernelError: passing and using non-bitstype argument
ERROR: GPU compilation of kernel5(MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}) in world 32458 failed
KernelError: passing and using non-bitstype argument

Argument 1 to your kernel function is of type var"#kernel5#5", which is not isbits:
  .T is of type Core.Box which is not isbits.
    .contents is of type Any which is not isbits.

Stacktrace:
  [1] check_invocation(job::GPUCompiler.CompilerJob)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/validation.jl:101
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/anMCs/src/driver.jl:154 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/LHjFw/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/anMCs/src/driver.jl:152 [inlined]
  [5] emit_julia(job::GPUCompiler.CompilerJob; validate::Bool)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/utils.jl:83
  [6] emit_julia
    @ ~/.julia/packages/GPUCompiler/anMCs/src/utils.jl:77 [inlined]
  [7] compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
    @ Metal ~/.julia/dev/Metal/src/compiler/compilation.jl:59
  [8] #39
    @ ~/.julia/dev/Metal/src/compiler/compilation.jl:55 [inlined]
  [9] JuliaContext(f::Metal.var"#39#40"{GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/driver.jl:76
 [10] compile
    @ ~/.julia/dev/Metal/src/compiler/compilation.jl:54 [inlined]
 [11] actual_compilation(cache::Dict{UInt64, Any}, key::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, ft::Type, tt::Type, world::UInt64, compiler::typeof(Metal.compile), linker::typeof(Metal.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/cache.jl:184
 [12] cached_compilation(cache::Dict{UInt64, Any}, cfg::GPUCompiler.CompilerConfig{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, ft::Type, tt::Type, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/cache.jl:163
 [13] macro expansion
    @ ~/.julia/dev/Metal/src/compiler/execution.jl:161 [inlined]
 [14] macro expansion
    @ ./lock.jl:223 [inlined]
 [15] mtlfunction(f::var"#kernel5#5", tt::Type{Tuple{MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:157
 [16] mtlfunction
    @ ~/.julia/dev/Metal/src/compiler/execution.jl:155 [inlined]
 [17] macro expansion
    @ ~/.julia/dev/Metal/src/compiler/execution.jl:77 [inlined]
 [18] vadd5(a::MtlVector{Float32}, b::MtlVector{Float32}, c::MtlVector{Float32})
    @ Main ./REPL[12]:8
 [19] top-level scope
    @ REPL[13]:1
 [20] top-level scope
    @ ~/.julia/dev/Metal/src/initialization.jl:33

maleadt commented Mar 15, 2023

The issue with vadd1 is a bug indeed, thanks for reiterating. I'll see about fixing it.

vadd2 (and vadd5) doesn't work because T is a variable that's captured by the closure, whereas in vadd4 it's derived from typevars Julia can specialize your kernel on.
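
A quick CPU-side way to see that difference, using hypothetical toy closures just to inspect what Julia generates:

# a captured local assigned after the closure is created gets boxed, so the closure isn't isbits
function captured()
    inner() = T
    T = Float32
    return inner
end
fieldtypes(typeof(captured()))   # (Core.Box,)

# a captured static parameter becomes a type parameter of the closure: an isbits singleton
function specialized(::Type{T}) where {T}
    inner() = T
    return inner
end
isbits(specialized(Float32))     # true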

vadd3 works because you don't pass the non-isbits type T as an argument, but only have it as a constant in your kernel (again, allowing Julia to specialize on it).
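
(Regarding the earlier compile-time question: when the element types are compile-time constants, as they are here, a call like promote_type is folded away during inference, so it costs nothing at run time. A quick CPU-side check, purely illustrative:)

f() = promote_type(Float32, Float32, Float32)
@code_typed f()   # the call folds away; the body just returns the constant Float32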

As an alternative to vadd4 you can also get T from any of the kernel arguments:

function vadd(a, b, c)
    function kernel1(a::AbstractVector{T}, b, c) where T
        i = thread_position_in_grid_1d()
        c[i] = T(a[i]) + T(b[i])
        return
    end
    @metal threads=2 groups=2 kernel1(a, b, c)
end

The problems you're getting are just because you're trying to pass actual types, which aren't isbits objects, instead of typevars Julia can specialize on. There's a very big difference between the two, even though the syntax looks similar.
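
Concretely, a type passed by value is just a DataType object, which is what trips the isbits check at launch time, whereas a typevar in a where clause never exists as a run-time argument at all. Purely illustrative:

typeof(Float32)        # DataType
isbitstype(DataType)   # false -> a DataType argument fails the `isbits(arg)` assertion above
isbitstype(Int64)      # true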

maleadt commented Mar 15, 2023

#135 should make vadd1 possible.
