nightwatch and/or desiconda mismatch between NERSC and KPNO #335
Comments
Another possibility is that the underlying CCD calibration files in $DESI_SPECTRO_CALIB and/or $DESI_SPECTRO_DARK weren't in sync. I think we've been pretty good about keeping CALIB in sync (in svn, easier), but I'm not sure about DARK (at NERSC, not in svn, needs a different sync procedure and I'm not sure anyone is doing that). |
Thanks for the quick check @sbailey. It does look like $DESI_SPECTRO_CALIB at KPNO is one commit behind. Here is the output of
vs at NERSC:
Should Jose or I run |
Yes, please go ahead and update KPNO. We never purposefully have the two out of sync. |
I updated DESI_SPECTRO_CALIB on desi-7 and deleted+reprocessed night/expid 20230205/166300. The b4d mask did not go away, but at least the working copy of the calibrations is synced up with the repository. Next guess: the desiconda projects are out of date. The DESICONDA_VERSION on desi-7 is 20200924. While it looks like Can/should we attempt an update of a few projects? Or all of desiconda? |
Check of $DESI_SPECTRO_DARK: this variable is not defined at KPNO. Unsetting the variable at NERSC does not cause b4d to be masked out in expid 166300. A mismatch in desiconda packages seems more likely. Will try to set up against an older version of desiconda at NERSC to see if the masking error is reproduced. |
Downgrading desiconda to 22.2 at NERSC does not reproduce the error but it does create processing problems for multiple exposures. |
Getting closer to solving this problem, with the 2.1.0-dev version of desiconda installed at KPNO with minor changes (see desihub/desiconda#59). I'm able to run |
A copy of
I'm not certain where this is arising; it could be in the Python multiprocessing module. |
@marcelo-alvarez @tskisner @craigwarner-ufastro do you recognize this? Nightwatch at KPNO uses multiprocessing but not MPI, and not GPU, but it does touch some code with numba JIT kernels and uses numpy with OpenMP parallelization under-the-hood. At NERSC we set KMP_AFFINITY=disabled, but I don't think we ever needed to mess with that at KPNO. |
@sbailey, I have not seen this. KMP_AFFINITY=disabled is set (redundantly from the point of view of I am not familiar enough with how the desiconda environment is set up at KPNO to know if desispec and redrock modules are used, or if modules are used at all. @sybenzvi, was the environment variable KMP_AFFINITY set to disabled at runtime when you obtained the error you reported above at KPNO? If not, you could try that and see if it fixes it. |
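For anyone repeating this test, here is a minimal sketch of launching a child process with KMP_AFFINITY=disabled and confirming that the child actually sees it. The variable name and value come from this thread; the helper `run_with_env` and the one-line report command are hypothetical stand-ins, not the Nightwatch invocation.

```python
import os
import subprocess
import sys


def run_with_env(code, overrides):
    """Run a Python snippet in a child process with extra environment variables.

    `overrides` is merged on top of the current environment, so the child
    sees the same setup as the parent plus the variables under test.
    """
    env = dict(os.environ)
    env.update(overrides)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Placeholder child command: it only reports what the child process sees.
REPORT = "import os; print(os.environ.get('KMP_AFFINITY', '<unset>'))"

if __name__ == "__main__":
    print(run_with_env(REPORT, {"KMP_AFFINITY": "disabled"}))
```

Setting the variable in the launching environment, rather than inside the worker code, means it is already in place before numpy or numba initialize their OpenMP runtime.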
@marcelo-alvarez, I had not defined KMP_AFFINITY so I just tried running
and I get the same error as before. In case it helps, I'm attaching the installation log for desiconda, which I installed using the README instructions (but with the hpsspy and mpi4py installations disabled). |
@sybenzvi I don't see anything from the installation log that would explain the |
Potential update on this old ticket: today @jose-bermejo and I are testing the installation of desiconda with Rob Knop and we encountered this same OMP assertion issue. Googling around, we found this workaround based on setting the following environment variable: Will try this at desi-7 and report back. It seems to be related to the version of the Intel compiler and may be fixed in newer versions of the compiler. |
Confirming that |
@sybenzvi great Googling. In retrospect we should have anticipated this, since it's also set at NERSC via desimodules, i.e.
If you now have a desiconda that is working in practice at KPNO, it might make sense to close this issue and return to desihub/desiconda#60. What do you think? |
@marcelo-alvarez, I agree, let's close this issue in Nightwatch. I was clearly skipping a step in the setup at KPNO so all that's really needed is to test the install again using desimodules to configure the environment. |
On 20230205, Becky Canning pointed out that in exposure 166300, amp b4d is masked, resulting in missing B fluxes for fibers 2257-2499. The missing amp is visible in the KPNO Nightwatch QA pages for this exposure.
However, the same exposure at NERSC does not show b4d masked out. Check that the versions of Nightwatch and desiconda on desi-7 and cori/perlmutter are identical.
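One way to do this check is to dump the installed package versions on each machine and diff the two dumps. The sketch below uses only the standard library; the package list, including the distribution names `desispec` and `nightwatch`, is an assumption about what is installed, and both helper functions are hypothetical.

```python
import json
from importlib import metadata


def installed_versions(names):
    """Return {name: version}, with None for packages that are not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def diff_versions(site_a, site_b):
    """Return {name: (version_a, version_b)} for every package that differs."""
    names = sorted(set(site_a) | set(site_b))
    return {
        name: (site_a.get(name), site_b.get(name))
        for name in names
        if site_a.get(name) != site_b.get(name)
    }


if __name__ == "__main__":
    # Assumed package list; adjust to whatever the environment provides.
    packages = ["numpy", "scipy", "astropy", "numba", "desispec", "nightwatch"]
    print(json.dumps(installed_versions(packages), indent=2, sort_keys=True))
```

Running the script on desi-7 and at NERSC, saving each JSON output, and passing the two loaded dictionaries to `diff_versions` would list only the packages whose versions disagree.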