Precompile bugs? #275
Update: I have checked thoroughly and I can no longer successfully precompile and run MPI tests in any new install, regardless of the various choices I could make in the manual setup. Help needed! |
I think the first warning about The For the MPI errors - are you 100% sure that |
Oh, one thing that might help - if you're installing |
See #254. |
I think we need to take seriously the lines
suggesting that there is an issue with the order in which the My new evidence is that I have tried running a simple fokker-planck simulation with
and
respectively, and I obtain the same result as the recorded test moment_kinetics/moment_kinetics/test/fokker_planck_time_evolution_tests.jl Lines 83 to 129 in 9284bd9
`density_amplitude = 0.0` instead of `density_amplitude = 0.001`. I have tried this with and without `-O3`. I think this indicates that MPI runs are working with `-Jmoment_kinetics.so`, but the test scripts are not.
@johnomotani Could you check if running tests with |
I have determined that when the moment kinetics package is such that
runs correctly but
errors or hangs, the bug can be reproduced with the following very simple script
showing that the MPI error may have something to do with changes to the code outside |
I find that I can recover the ability to use the
Note that you may have to run
to avoid errors when compiling for the first time after choosing to use different modules.
and
On commit 9284bd9, I have the following output from
@johnomotani Is this finding worth reproducing or reporting? It seems like newer versions of |
Thanks Michael! I think I may be starting to understand what's happening. In #277 I unpinned the HDF5_jll version, because the newer version can work now. However, the newer HDF5_jll does link MPI, so if you use a 'system MPI' (rather than the Julia-provided MPI) you have to also link an external HDF5. Sorry, I assumed everyone was already doing this in order to use parallel I/O, but should have flagged it and documented it better. If this is the problem, I don't know why things seemed to work when not using a system image, but library linking conflicts can just cause random errors which sometimes 'work' and sometimes crash and sometimes mess up results.
You can see the commands to compile HDF5 (with the system MPI linked) at the bottom of the script `machines/generic-pc/compile_dependencies.sh`. It'd be nice if that script could run standalone, but the script needs a bit of restructuring to do that.
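The linking requirement described above can be sketched as a small shell helper. This is a minimal sketch, not the project's setup procedure: it assumes `MPIPreferences` and `HDF5` are already in the active Julia project, that the installed HDF5.jl provides `HDF5.API.set_libraries!`, that the libraries use the Linux `.so` suffix, and that `HDF5_DIR` (a hypothetical variable) points at an HDF5 build compiled against the same system MPI. The function names are illustrative only.

```shell
# Hypothetical install prefix of an HDF5 build linked against the system MPI.
HDF5_DIR="${HDF5_DIR:-$HOME/hdf5-build}"

# Return 0 only if both shared libraries that HDF5.jl needs are present.
have_hdf5_libs() {
    [ -f "$1/lib/libhdf5.so" ] && [ -f "$1/lib/libhdf5_hl.so" ]
}

# Point MPI.jl at the system MPI and HDF5.jl at the external HDF5.
# Assumption: run from the top level of the Julia project being configured.
configure_system_libs() {
    have_hdf5_libs "$1" || { echo "HDF5 libraries not found under $1" >&2; return 1; }
    julia --project -e 'using MPIPreferences; MPIPreferences.use_system_binary()' || return 1
    julia --project -e 'using HDF5; HDF5.API.set_libraries!(ARGS[1], ARGS[2])' \
        "$1/lib/libhdf5.so" "$1/lib/libhdf5_hl.so"
}
```

Calling `configure_system_libs "$HDF5_DIR"` would record both choices as package preferences, so Julia should be restarted (and any system image rebuilt) before they take effect. Checking the library paths first avoids recording a preference that points at files that do not exist.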
|
Unfortunately, I no longer think this problem is down to using the system MPI. The last set of reports were made using the Julia MPI with `mpiexecjl` (i.e., not loading any system modules besides julia/1.10.2), because I found that the precompilation bug affected installs using both the system-provided MPI and the native Julia MPI. I have tried to make a new install script to pin in advance to the MPI and HDF5_jll modules mentioned above, but this was not enough to avoid the errors. When I have time I will try to work out which other libraries might need to be pinned.
|
After pinning
All of these errors seem to be in the form
To me this looks like the "test extension" of Accessors is not found. Despite these errors, the tests ran and passed on this occasion.
|
On 57e0d1d I think I have found an adequate resolution (for now) to the problem discussed here. The missing information in my previous setup was the extra module requirements that have crept in. Explicitly, if one uses a setup script as below
Then on the system on which I am working, I finally obtain passing tests with MPI using the
To make future manual installs less painful, I have suggested changes in #285. The list of packages used in the install on my present system is given below.
|
Compiling a system image is now segfaulting for me using Julia-1.11.1 on (more-or-less) the master branch. Need to look into this when I get a chance... Edit: this segfault does not happen with Julia-1.10.5. |
When I try to precompile the latest master (097ee11) on a machine which is usually reliable using the manual install instructions https://mabarnes.github.io/moment_kinetics/dev/manual_setup/, I now see this output
Is this caused by changes related to #273?
I have checked that the branch here https://github.com/mabarnes/moment_kinetics/commits/electric_field_switch/ without the timer changes does not have the first error message (the `LoweredCodeUtils` error is present). In this older branch, I can still use
to make `moment_kinetics.so` and run
successfully. However, in the latest master, with the manual install instructions I get the following error when I run the above command
We may need to add `StatsBase` to https://mabarnes.github.io/moment_kinetics/dev/manual_setup/, e.g.,
Unfortunately adding `StatsBase` did not fix the MPI error, although it did remove the precompile warning.
@johnomotani Does this make sense, or is something else going wrong here? I am trying to make a script for @MantasAbazorius to use to set up moment kinetics on the same machine, and `precompile.jl` might only be making a valid `moment_kinetics.so` when run from the command line directly (i.e., not in a bash script).