HDF5 error running examples/fokker-planck/fokker-planck-relaxation.toml #290

Open
mrhardman opened this issue Nov 7, 2024 · 7 comments


@mrhardman
Collaborator

I wanted to revisit an old test case, so I ran examples/fokker-planck/fokker-planck-relaxation.toml on a single core, using #289 at its current commit. My install uses the Julia-provided MPI and HDF5. I see the following strange error. @johnomotani, does this look familiar?

julia> run_moment_kinetics("runs/fokker-planck-relaxation.toml")
Starting setup   18:38:59
setting up GL quadrature   18:41:26
beginning (boundary) weights calculation   18:41:26
finished (boundary) weights calculation   18:41:26
begin elliptic operator assignment   18:41:26
finished elliptic operator constructor assignment   18:41:26
finished LU decomposition initialisation   18:41:26
finished YY array calculation   18:41:26
ERROR: MethodError: no method matching lastindex(::Nothing)

Closest candidates are:
  lastindex(::Any, ::Any)
   @ Base abstractarray.jl:427
  lastindex(::Base64.Buffer)
   @ Base64 /*/linux-x86_64/julia/1.10.2/share/julia/stdlib/v1.10/Base64/src/buffer.jl:19
  lastindex(::Markdown.MD)
   @ Markdown /*/linux-x86_64/julia/1.10.2/share/julia/stdlib/v1.10/Markdown/src/parse/parse.jl:26
  ...

Stacktrace:
 [1] run_moment_kinetics(input_dict::Dict{String, Any}; restart::Bool, restart_time_index::Int64)
   @ moment_kinetics ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/moment_kinetics.jl:161
 [2] run_moment_kinetics
   @ ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/moment_kinetics.jl:115 [inlined]
 [3] #run_moment_kinetics#3
   @ ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/moment_kinetics.jl:174 [inlined]
 [4] run_moment_kinetics(input_filename::String)
   @ moment_kinetics ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/moment_kinetics.jl:173
 [5] top-level scope
   @ REPL[2]:1

caused by: HDF5.API.H5Error: Error writing dataset
libhdf5 Stacktrace:
 [1] H5D__ioinfo_adjust: Dataset/Can't perform independent IO
     Can't perform independent write when MPI_File_sync is required by ROMIO driver.
  ⋮
Stacktrace:
  [1] macro expansion
    @ ~/excalibur/moment_kinetics_test_install2/.julia/packages/HDF5/Z859u/src/api/error.jl:18 [inlined]
  [2] h5d_write(dataset_id::HDF5.Dataset, mem_type_id::HDF5.Datatype, mem_space_id::Int64, file_space_id::Int64, xfer_plist_id::HDF5.DatasetTransferProperties, buf::Base.RefValue{…})
    @ HDF5.API ~/excalibur/moment_kinetics_test_install2/.julia/packages/HDF5/Z859u/src/api/functions.jl:912
  [3] write_dataset
    @ ~/excalibur/moment_kinetics_test_install2/.julia/packages/HDF5/Z859u/src/datasets.jl:577 [inlined]
  [4] write_dataset
    @ ~/excalibur/moment_kinetics_test_install2/.julia/packages/HDF5/Z859u/src/datasets.jl:576 [inlined]
  [5] write_single_value!(::HDF5.Group, ::String, ::Int64; parallel_io::Bool, description::String, units::Nothing, overwrite::Bool)
    @ moment_kinetics.file_io ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/file_io_hdf5.jl:111
  [6] write_single_value!
    @ ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/file_io_hdf5.jl:89 [inlined]
  [7] macro expansion
    @ ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/file_io.jl:729 [inlined]
  [8] macro expansion
    @ ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/looping.jl:808 [inlined]
  [9]
    @ moment_kinetics.file_io ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/file_io.jl:727
 [10] macro expansion
    @ ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/file_io.jl:2128 [inlined]
 [11] macro expansion
    @ ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/looping.jl:808 [inlined]
 [12] setup_moments_io(prefix::String, io_input::moment_kinetics.file_io.io_input_struct, vz::moment_kinetics.coordinates.coordinate{…}, vr::moment_kinetics.coordinates.coordinate{…}, vzeta::moment_kinetics.coordinates.coordinate{…}, vpa::moment_kinetics.coordinates.coordinate{…}, vperp::moment_kinetics.coordinates.coordinate{…}, r::moment_kinetics.coordinates.coordinate{…}, z::moment_kinetics.coordinates.coordinate{…}, composition::moment_kinetics.input_structs.species_composition, collisions::moment_kinetics.input_structs.collisions_input, evolve_density::Bool, evolve_upar::Bool, evolve_ppar::Bool, external_source_settings::@NamedTuple{…}, input_dict::Dict{…}, io_comm::MPI.Comm, run_id::String, restart_time_index::Int64, previous_runs_info::Nothing, time_for_setup::Float64, t_params::moment_kinetics.input_structs.time_info{…}, nl_solver_params::@NamedTuple{…})
    @ moment_kinetics.file_io ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/file_io.jl:2116
 [13] macro expansion
    @ ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/file_io.jl:482 [inlined]
 [14] macro expansion
    @ ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/looping.jl:808 [inlined]
 [15] setup_file_io(io_input::moment_kinetics.file_io.io_input_struct, boundary_distributions::moment_kinetics.moment_kinetics_structs.boundary_distributions_struct, vz::moment_kinetics.coordinates.coordinate{…}, vr::moment_kinetics.coordinates.coordinate{…}, vzeta::moment_kinetics.coordinates.coordinate{…}, vpa::moment_kinetics.coordinates.coordinate{…}, vperp::moment_kinetics.coordinates.coordinate{…}, z::moment_kinetics.coordinates.coordinate{…}, r::moment_kinetics.coordinates.coordinate{…}, composition::moment_kinetics.input_structs.species_composition, collisions::moment_kinetics.input_structs.collisions_input, evolve_density::Bool, evolve_upar::Bool, evolve_ppar::Bool, external_source_settings::@NamedTuple{…}, input_dict::Dict{…}, restart_time_index::Int64, previous_runs_info::Nothing, time_for_setup::Float64, t_params::moment_kinetics.input_structs.time_info{…}, nl_solver_params::@NamedTuple{…})
    @ moment_kinetics.file_io ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/file_io.jl:461
 [16] macro expansion
    @ ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/moment_kinetics.jl:355 [inlined]
 [17] setup_moment_kinetics(input_dict::Dict{…}; restart::Bool, restart_time_index::Int64, debug_loop_type::Nothing, debug_loop_parallel_dims::Nothing, skip_electron_solve::Bool)
    @ moment_kinetics ~/excalibur/moment_kinetics_test_install2/.julia/packages/TimerOutputs/NRdsv/src/TimerOutput.jl:237
 [18] setup_moment_kinetics
    @ ~/excalibur/moment_kinetics_test_install2/.julia/packages/TimerOutputs/NRdsv/src/TimerOutput.jl:230 [inlined]
 [19] macro expansion
    @ ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/moment_kinetics.jl:133 [inlined]
 [20] macro expansion
    @ ~/excalibur/moment_kinetics_test_install2/.julia/packages/TimerOutputs/NRdsv/src/TimerOutput.jl:237 [inlined]
 [21] run_moment_kinetics(input_dict::Dict{String, Any}; restart::Bool, restart_time_index::Int64)
    @ moment_kinetics ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/moment_kinetics.jl:131
 [22] run_moment_kinetics
    @ ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/moment_kinetics.jl:115 [inlined]
 [23] #run_moment_kinetics#3
    @ ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/moment_kinetics.jl:174 [inlined]
 [24] run_moment_kinetics(input_filename::String)
    @ moment_kinetics ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/moment_kinetics.jl:173
 [25] top-level scope
    @ REPL[2]:1
Some type information was truncated. Use `show(err)` to see complete types.

@johnomotani
Collaborator

I don't think I've seen that one before. I'm having issues with HDF5 at the moment, but mine concern parallel I/O of the timing-data outputs introduced in #276 and don't look related to this. That input file is run in the CI, so it should have been exercised recently in pretty much the setup you describe!

@mrhardman
Collaborator Author

On my local machine I can reproduce the error from a fresh install, with both Julia 1.10.2 and 1.10.6. Moreover, the tests pass in the same environment where I see this bug (see below). Have any input options changed that could cause this?

julia> using moment_kinetics

julia> include("moment_kinetics/test/fokker_planck_time_evolution_tests.jl")
Fokker Planck dFdt = C[F,F] relaxation test
    - testing gausslegendre_pseudospectral_vbzero-impose-regularity
    - testing gausslegendre_pseudospectral_vbzero
    - testing gausslegendre_pseudospectral_vbnonevbnone
Test Summary:                               | Pass  Total     Time
Fokker Planck dFdt = C[F,F] relaxation test |   72     72  7m00.2s

julia> run_moment_kinetics("examples/fokker-planck/fokker-planck-relaxation.toml")
Starting setup   7:58:44
setting up GL quadrature   7:58:44
beginning (boundary) weights calculation   7:58:44
finished (boundary) weights calculation   7:58:44
begin elliptic operator assignment   7:58:44
finished elliptic operator constructor assignment   7:58:44
finished LU decomposition initialisation   7:58:44
finished YY array calculation   7:58:44
ERROR: MethodError: no method matching lastindex(::Nothing)

Closest candidates are:
  lastindex(::Any, ::Any)
   @ Base abstractarray.jl:427
  lastindex(::Cmd)
   @ Base process.jl:678
  lastindex(::Markdown.MD)
   @ Markdown /*/linux-x86_64/julia/1.10.6/share/julia/stdlib/v1.10/Markdown/src/parse/parse.jl:26
  ...

Stacktrace:
 [1] run_moment_kinetics(input_dict::Dict{String, Any}; restart::Bool, restart_time_index::Int64)
   @ moment_kinetics ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/moment_kinetics.jl:161
 [2] run_moment_kinetics
   @ ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/moment_kinetics.jl:115 [inlined]
 [3] #run_moment_kinetics#3
   @ ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/moment_kinetics.jl:174 [inlined]
 [4] run_moment_kinetics(input_filename::String)
   @ moment_kinetics ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/moment_kinetics.jl:173
 [5] top-level scope
   @ REPL[12]:1

caused by: HDF5.API.H5Error: Error writing dataset
libhdf5 Stacktrace:
 [1] H5D__ioinfo_adjust: Dataset/Can't perform independent IO
     Can't perform independent write when MPI_File_sync is required by ROMIO driver.
  ⋮
Stacktrace:
  [1] macro expansion
    @ ~/excalibur/moment_kinetics_test_install2/.julia/packages/HDF5/Z859u/src/api/error.jl:18 [inlined]
  [2] h5d_write(dataset_id::HDF5.Dataset, mem_type_id::HDF5.Datatype, mem_space_id::Int64, file_space_id::Int64, xfer_plist_id::HDF5.DatasetTransferProperties, buf::Base.RefValue{…})
    @ HDF5.API ~/excalibur/moment_kinetics_test_install2/.julia/packages/HDF5/Z859u/src/api/functions.jl:912
  [3] write_dataset
    @ ~/excalibur/moment_kinetics_test_install2/.julia/packages/HDF5/Z859u/src/datasets.jl:577 [inlined]
  [4] write_dataset
    @ ~/excalibur/moment_kinetics_test_install2/.julia/packages/HDF5/Z859u/src/datasets.jl:576 [inlined]
  [5] write_single_value!(::HDF5.Group, ::String, ::Int64; parallel_io::Bool, description::String, units::Nothing, overwrite::Bool)
    @ moment_kinetics.file_io ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/file_io_hdf5.jl:111
  [6] write_single_value!
    @ ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/file_io_hdf5.jl:89 [inlined]
  [7] macro expansion
    @ ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/file_io.jl:729 [inlined]
  [8] macro expansion
    @ ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/looping.jl:808 [inlined]
  [9]
    @ moment_kinetics.file_io ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/file_io.jl:727
 [10] macro expansion
    @ ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/file_io.jl:2128 [inlined]
 [11] macro expansion
    @ ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/looping.jl:808 [inlined]
 [12] setup_moments_io(prefix::String, io_input::moment_kinetics.file_io.io_input_struct, vz::moment_kinetics.coordinates.coordinate{…}, vr::moment_kinetics.coordinates.coordinate{…}, vzeta::moment_kinetics.coordinates.coordinate{…}, vpa::moment_kinetics.coordinates.coordinate{…}, vperp::moment_kinetics.coordinates.coordinate{…}, r::moment_kinetics.coordinates.coordinate{…}, z::moment_kinetics.coordinates.coordinate{…}, composition::moment_kinetics.input_structs.species_composition, collisions::moment_kinetics.input_structs.collisions_input, evolve_density::Bool, evolve_upar::Bool, evolve_ppar::Bool, external_source_settings::@NamedTuple{…}, input_dict::Dict{…}, io_comm::MPI.Comm, run_id::String, restart_time_index::Int64, previous_runs_info::Nothing, time_for_setup::Float64, t_params::moment_kinetics.input_structs.time_info{…}, nl_solver_params::@NamedTuple{…})
    @ moment_kinetics.file_io ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/file_io.jl:2116
 [13] macro expansion
    @ ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/file_io.jl:482 [inlined]
 [14] macro expansion
    @ ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/looping.jl:808 [inlined]
 [15] setup_file_io(io_input::moment_kinetics.file_io.io_input_struct, boundary_distributions::moment_kinetics.moment_kinetics_structs.boundary_distributions_struct, vz::moment_kinetics.coordinates.coordinate{…}, vr::moment_kinetics.coordinates.coordinate{…}, vzeta::moment_kinetics.coordinates.coordinate{…}, vpa::moment_kinetics.coordinates.coordinate{…}, vperp::moment_kinetics.coordinates.coordinate{…}, z::moment_kinetics.coordinates.coordinate{…}, r::moment_kinetics.coordinates.coordinate{…}, composition::moment_kinetics.input_structs.species_composition, collisions::moment_kinetics.input_structs.collisions_input, evolve_density::Bool, evolve_upar::Bool, evolve_ppar::Bool, external_source_settings::@NamedTuple{…}, input_dict::Dict{…}, restart_time_index::Int64, previous_runs_info::Nothing, time_for_setup::Float64, t_params::moment_kinetics.input_structs.time_info{…}, nl_solver_params::@NamedTuple{…})
    @ moment_kinetics.file_io ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/file_io.jl:461
 [16] macro expansion
    @ ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/moment_kinetics.jl:355 [inlined]
 [17] setup_moment_kinetics(input_dict::Dict{…}; restart::Bool, restart_time_index::Int64, debug_loop_type::Nothing, debug_loop_parallel_dims::Nothing, skip_electron_solve::Bool)
    @ moment_kinetics ~/excalibur/moment_kinetics_test_install2/.julia/packages/TimerOutputs/NRdsv/src/TimerOutput.jl:237
 [18] setup_moment_kinetics
    @ ~/excalibur/moment_kinetics_test_install2/.julia/packages/TimerOutputs/NRdsv/src/TimerOutput.jl:230 [inlined]
 [19] macro expansion
    @ ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/moment_kinetics.jl:133 [inlined]
 [20] macro expansion
    @ ~/excalibur/moment_kinetics_test_install2/.julia/packages/TimerOutputs/NRdsv/src/TimerOutput.jl:237 [inlined]
 [21] run_moment_kinetics(input_dict::Dict{String, Any}; restart::Bool, restart_time_index::Int64)
    @ moment_kinetics ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/moment_kinetics.jl:131
 [22] run_moment_kinetics
    @ ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/moment_kinetics.jl:115 [inlined]
 [23] #run_moment_kinetics#3
    @ ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/moment_kinetics.jl:174 [inlined]
 [24] run_moment_kinetics(input_filename::String)
    @ moment_kinetics ~/excalibur/moment_kinetics_test_install2/moment_kinetics/src/moment_kinetics.jl:173
 [25] top-level scope
    @ REPL[12]:1
Some type information was truncated. Use `show(err)` to see complete types.

@mrhardman
Collaborator Author

P.S. When I try to exit to the bash terminal, I get this message:

julia>
Attempting to use an MPI routine (internal_Barrier) before initializing or after finalizing MPICH

@johnomotani
Collaborator

I cannot reproduce this. I merged #278 into #289 so that the setup uses the Julia-provided MPI and HDF5. With that, examples/fokker-planck/fokker-planck-relaxation.toml runs on a single core without error, using both Julia 1.11.1 and Julia 1.10.6.

I've just merged #278 into master. @mrhardman, could you check whether you can reproduce this with a clean install of master, set up by running machines/machine_setup.sh and choosing not to use the system MPI (so that the Julia-provided MPI is used)?

@johnomotani
Collaborator

That last error message in particular sounds a bit like you're using the wrong MPI, but that shouldn't be possible when running in serial. Did you launch with mpiexecjl on one core rather than just running julia?

@mrhardman
Collaborator Author

I can confirm that these errors still occur, even in an install set up with the machines/machine_setup.sh script.

@mrhardman
Collaborator Author

I tried to reproduce this bug on another Linux computer, and I cannot reproduce it there either.
