Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NCDatasets.load! does not check the sizes and data types of output arrays and buffers #211

Open
Yixiao-Zhang opened this issue Jun 19, 2023 · 2 comments

Comments

@Yixiao-Zhang
Copy link

Describe the bug

The documentation of NCDatasets.jl mentions that NCDatasets.load! can be used to improve the performance of this package. However, this function does not check the sizes and the data types of the output arrays and buffers. When the size or the data type of the output array or buffer is wrong, the output result will be wrong and Julia itself may even crash due to a segmentation fault.

To Reproduce

using NCDatasets

ncfile = "test.nc"
Nx = 1_000_000
TYPE_x = Float64

# create NetCDF file
NCDataset(ncfile, "c") do ds
    defDim(ds, "x", Nx)
    x = defVar(ds, "x", TYPE_x, ("x", ))
    x[:] = collect(1:Nx)
end

# read the NetCDF file correctly
NCDataset(ncfile, "r") do ds
    x_read = Vector{TYPE_x}(undef, Nx)
    NCDatasets.load!(variable(ds, "x"), x_read, :)

    # This works.
    @assert all(isapprox.(x_read, 1:Nx))
end

# read the NetCDF file with a wrong data type
NCDataset(ncfile, "r") do ds
    x_read = Vector{Int64}(undef, Nx)

    # The line below runs without warning, but it gives
    # a wrong result.
    NCDatasets.load!(variable(ds, "x"), x_read, :)
end

# read the NetCDF file with a wrong array size
NCDataset(ncfile, "r") do ds
    x_read = Vector{TYPE_x}(undef, 1)

    @info "Julia will crash after the line below."
    # The line below runs without warning, when Nx is small. However,
    # when Nx is large enough, Julia crashes.
    NCDatasets.load!(variable(ds, "x"), x_read, :)
end

Expected behavior

NCDatasets.load! should give an error when the size or the data type of the output array or buffer is incorrect. At least, the documentation should not mention using NCDatasets.load! without describing its potential danger.

A safe version of load! that does these checks will be very helpful.

Environment

  • operating system: Ubuntu 20.04.6 LTS
  • Julia version: 1.9.1
  • NCDatasets version: v0.12.17

Full output

[ Info: Julia will crash after the line below.

[45103] signal (11.128): Segmentation fault
in expression starting at /home/yixiaoz/Documents/IdealizedOceanWorlds.jl/TestNCDatasets.jl/run.jl:33
jl_array_ptr_ref at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/julia.h:1028 [inlined]
eval_phi at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/interpreter.c:358
eval_body at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/interpreter.c:626
jl_interpret_toplevel_thunk at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/interpreter.c:762
jl_toplevel_eval_flex at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/toplevel.c:912
jl_toplevel_eval_flex at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/toplevel.c:856
ijl_toplevel_eval_in at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/toplevel.c:971
eval at ./boot.jl:370 [inlined]
include_string at ./loading.jl:1899
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gf.c:2940
_include at ./loading.jl:1959
include at ./Base.jl:457
jfptr_include_31033.clone_1 at /home/yixiaoz/.julia/juliaup/julia-1.9.1+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gf.c:2940
exec_options at ./client.jl:307
_start at ./client.jl:522
jfptr__start_29455.clone_1 at /home/yixiaoz/.julia/juliaup/julia-1.9.1+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/julia.h:1879 [inlined]
true_main at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/jlapi.c:573
jl_repl_entrypoint at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/jlapi.c:717
main at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/cli/loader_exe.c:59
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x401098)
Allocations: 962532 (Pool: 961521; Big: 1011); GC: 2
Segmentation fault (core dumped)
@Alexander-Barth
Copy link
Member

OK, it is mentioned in the doc string (https://github.com/Alexander-Barth/NCDatasets.jl/blob/master/src/variable.jl#L104 ) of the NCDatasets.load! function but not the fatal consequences.

But in any case, I agree, that it is better that the default should be safer. In the recent commit there is now a bounds check and if the types do not match then one gets a method error. The bounds check can be disabled by using e.g. @inbounds NCDatasets.load!(ncv,data,:).

Alexander-Barth added a commit that referenced this issue Jun 21, 2023
@Yixiao-Zhang
Copy link
Author

Now checkbuffer only checks the total number of elements (length), but it does not check dimensions of the array (size). Is there a reason for only checking the total number of elements?

Thank you very much for adding these checks and upgrading the doc string! I can close this issue after a pull that merges this commit into the master branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants