Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speedup conversion from types <: DataType or <: UnionAll to CEnum.Cenum or Enum type ids #262

Open
mathieu17g opened this issue Nov 5, 2021 · 4 comments

Comments

@mathieu17g
Copy link
Collaborator

@yeesian and @visr, I may have found a way to speedup conversion from OGRFieldType and OGRFieldSubType to DataType x200 times.
This could notably speedup layer to table conversions.

In order to move the Dict lookup from runtime to compile time, we can create a real OGRFieldType instead of using directly a type id as we do currently, OGRFieldType being an Enum
This would allow to use generated convert functions and simplify @convert macro by the way

I have added the following code mockup to types.jl:

struct Real_OGRFieldType{OGRFieldType} end
Real_OGRFieldType2DataType = Dict(
    Real_OGRFieldType{OFTInteger} => Int32,
    # ...
)
@generated function convert(
    ::Type{DataType},
    real_oft::Type{T},
) where {T<:Real_OGRFieldType}
    return :($(get(Real_OGRFieldType2DataType, T, nothing)))
end

nothing could be replaced to throw error when lookup fails

This gives:

julia> using ArchGDAL; const AG=ArchGDAL
ArchGDAL

julia> using BenchmarkTools

julia> convert(DataType, AG.Real_OGRFieldType{AG.OFTInteger})
Int32

julia> @code_lowered convert(DataType, AG.Real_OGRFieldType{AG.OFTInteger})
CodeInfo(
    @ /Users/Mathieu/.../dev/ArchGDAL/src/types.jl:299 within `convert'
   ┌ @ /Users/Mathieu/.../dev/ArchGDAL/src/types.jl within `macro expansion'
1 ─│     return Int32
   └
)

julia> @benchmark convert(DataType, AG.Real_OGRFieldType{AG.OFTInteger})
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min  max):  0.026 ns  0.050 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     0.037 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   0.037 ns ± 0.002 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                           █   ▁             
  ▂▁▁▂▁▁▁▁▁▁▁▁▁▁▁▂▁▁▄▁▁▁▅▁▁▁▂▁▁▁▂▁▁▁▂▁▁▃▁▁▁█▁▁▁█▁▁▁▃▁▁▁▃▁▁▂ ▂
  0.026 ns       Histogram: frequency by time      0.041 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> convert(DataType, AG.OFTInteger)
Int32

julia> @code_lowered convert(DataType, AG.OFTInteger)
CodeInfo(
1%1 = Base.getindex(ArchGDAL.OGRFieldType_to_DataType_map, ft)
└──      return %1
)

julia> @benchmark convert(DataType, AG.OFTInteger)
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min  max):  8.242 ns  58.932 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     8.314 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   8.754 ns ±  1.666 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▆▃▃ ▃▃  ▁     ▄                                           ▁
  ████▅██▄▄█▁▁▃▁▅█▁▁▃▃▁▁▃▁▁▁▄▁▁▃▁▁▁▁▃▁▁▃▄▁▁▁▁▁▁▁▁▁▁▁▃▁▅▅▆▆▇▇ █
  8.24 ns      Histogram: log(frequency) by time     18.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> 
@yeesian
Copy link
Owner

yeesian commented Nov 5, 2021

That sounds like a proposition to me with almost no downsides, I'm curious what led to your choice of types (compared with e.g. #246)? Is it for the machinery of generated functions?

@mathieu17g
Copy link
Collaborator Author

Well here is what I did:

  1. Examined @code_lowered output
  2. => need to get the same as the reverse conversion convert(::OGRFieldType, ::Int32)
  3. => need to have OGRFieldType as a real type to benefit from the Julia shortcut convert(::Type{Type}, x::Type) = x at https://github.com/JuliaLang/julia/blob/ae8452a9e0b973991c30f27beb2201db1b0ea0d3/base/essentials.jl#L206 , as in the reverse conversion
  4. Tried to use Val(oft) and dispatch on a subtype of something like struct Val{OGRFieldType} end => did not work
  5. Mimicking what has been done with IGeometry, I tried with struct NewOGRFieldType{OGRFieldType} end => it worked :-)

The @generated macro, which only works on types, could do the same as the @convert macro while being more concise. And therefore could be placed in types.jl file next to each conversion pairs collections.

Regarding the case of issue #246, maybe we could make getfield(::Feature, ::Int) specialize, if we had types looking like Feature{FD} where FD <: FeatureDefn{NT} where NT <: NamedTuple.
Testing would tell.

@mathieu17g
Copy link
Collaborator Author

mathieu17g commented Nov 6, 2021

While raising this issue, I had not in mind the usage of convert(DataType, oft::OGRFielType), which I guess may only be a the end of asint, ..., asdatetime functions. I'm not even sure it is called there...

In those functions, the convert time must be less than 20% of the total time spent in getfield.
We should then first address the issue #246, shouldn't we ?
If so and if @yeesian, you have not yet started to work on it, shall I auto assign #246 to myself ?

EDIT 1:
Looking at the output of @code_warntype below, it seems that convert(DataType, oft::OGRFielType) is not used asint, ..., asdatetime functions. I going to go on looking for where these convert(DataType, oft::OGRFielType) functions are used, to balance to potential gain with the one of fixing #246

julia> AG.read("test/data/point.geojson") do ds
           AG.getlayer(ds) do layer
               AG.getfeature(layer, 0) do f
                   @code_warntype GDAL.ogr_f_getfieldasstring(f.ptr, 1)
               end
           end
       end
Variables
  #self#::Core.Const(GDAL.ogr_f_getfieldasstring)
  arg1::Ptr{Nothing}
  arg2::Int64

Body::Union{Nothing, String}
1%1 = Base.cconvert(GDAL.OGRFeatureH, arg1)::Ptr{Nothing}%2 = Base.cconvert(GDAL.Cint, arg2)::Int32%3 = Base.unsafe_convert(GDAL.OGRFeatureH, %1)::Ptr{Nothing}%4 = Base.unsafe_convert(GDAL.Cint, %2)::Int32%5 = $(Expr(:foreigncall, :(Core.tuple(:OGR_F_GetFieldAsString, GDAL.libgdal)), Cstring, svec(Ptr{Nothing}, Int32), 0, :(:ccall), :(%3), :(%4), :(%2), :(%1)))::Cstring%6 = GDAL.aftercare(%5, false)::Union{Nothing, String}
└──      return %6

julia> AG.read("test/data/point.geojson") do ds
           AG.getlayer(ds) do layer
               AG.getfeature(layer, 0) do f
                   @code_warntype AG.asstring(f, 1)
               end
           end
       end
Variables
  #self#::Core.Const(ArchGDAL.asstring)
  feature::ArchGDAL.Feature
  i::Int64

Body::String
1%1 = ArchGDAL.String::Core.Const(String)
│   %2 = GDAL.ogr_f_getfieldasstring::Core.Const(GDAL.ogr_f_getfieldasstring)
│   %3 = Base.getproperty(feature, :ptr)::Ptr{Nothing}%4 = (%2)(%3, i)::Union{Nothing, String}%5 = Base.convert(%1, %4)::String%6 = Core.typeassert(%5, %1)::String
└──      return %6

EDIT 2:
I have changed this line in utils.jl in @convert macro with:

(eval(T2) isa Type{DataType}) && (eval(T1) <: OGRFieldType) || push!(

to drop the convert(DataType, oft::OGRFielType) functions.

Running Pkg.test("ArchGDAL"), reveals that only specific unit tests are failing.

@test convert(DataType, AG.OFTInteger) == Int32
@test convert(DataType, AG.OFTReal) == Float64

@test Base.convert(DataType, AG.OFTInteger) == Int32

@test Base.convert(DataType, AG.OFTReal) == Float64

So these conversion functions may be useless. The move of the dictionary lookup from runtime to compile time might be useful for some other conversions, but I propose to tackle the issue #246 first

@mathieu17g mathieu17g changed the title Speedup conversion from OGRFieldType and OGRFieldSubType to DataType Speedup conversion from types <: DataType or <: UnionAll to CEnum.Cenum or Enum type ids Nov 6, 2021
@yeesian
Copy link
Owner

yeesian commented Nov 10, 2021

So sorry for my late response, I was dealing with a lot recently. I have not started working on #246 and have assigned it to you instead, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants