pqcxms not generating res files #86
Comments
Hi, this all looks good to me, except that at some point the program stops by a |
Hello, it's running on an HPC cluster via Slurm, kicked off thusly:
It SIGABRTs after a few minutes, so it's not running into a time limit, as we gave it 24 hours. Looking at the node, it's not running out of RAM or some other resource as far as I can tell. We're running RHEL 9.4. Do we absolutely need to go the q-batch route for this? The only main difference seems to be that it inserts everything as its own job in the queue. Thanks!
You do not need the q-batch script; it's only provided to make things easy for batch users. Please feel free to use a SLURM submission script for this. Someone already created a SLURM script for that, but I can't recall whether it was posted anywhere.
OK thanks! The q-batch approach looked like it was written for torque or PBS initially so I thought it might be easier to approach it from a different angle. Especially when mixing it with Lmod.
I don't think so either after digging in and trying to figure out how QCXMS works. We're only running it on one node right now and were trying a few different approaches to see how it worked. I think we're going to have to come up with our own script for this. It looks like the correct approach is to set up an individual job for each TMPQCXMS/TMP.X and its corresponding qcxms.in, and then cat the out and res files into the main project directory, if I'm following the logic correctly.
Yes, if you can make a script that runs a single instance of QCxMS on each node and collects the res file, that is actually what the q-batch script does. So it would be best to use it for your own infrastructure.
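The collection step described above can be sketched in a few lines of bash. This is only an illustration, not the q-batch or getres logic itself: it assumes each trajectory directory under TMPQCXMS/TMP.X holds its own *.res file, and the function name collect_res and the output file name collected.res are hypothetical.

```shell
#!/bin/bash
# Sketch (assumption): concatenate every non-empty *.res file found in
# TMPQCXMS/TMP.* into one file in the main project directory.
# collect_res and collected.res are hypothetical names, not QCxMS names.
collect_res() {
    local project="${1:-$PWD}"
    local out="${2:-$project/collected.res}"
    local f

    : > "$out"    # start from an empty output file
    for f in "$project"/TMPQCXMS/TMP.*/*.res; do
        # Skip empty files and the literal pattern left by an unmatched glob.
        [ -s "$f" ] && cat "$f" >> "$out"
    done
    return 0
}
```

Calling `collect_res /path/to/project` would write /path/to/project/collected.res. Empty res files contribute nothing, so a zero-byte result means no trajectory produced output.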
@lh59281 you may want to use SLURM Job Arrays. Basically, if you have a QCxMS task with 400 trajectories, you submit a single command that loops through all 400 tasks via Slurm. It's much less load on the Slurm scheduler, and sysadmins love it. You also might want to check whether hyperthreading is on or off, depending on Intel or AMD processors. Plus, because of the large write overhead, you certainly want to cache your data directory in RAM or use enterprise SSDs. If you use a RAID array, make sure you can write fast enough (small chunks). You can also ping @Shunyang2018; we have run thousands of QCEIMS and QCxMS jobs on Slurm, sometimes using several thousand CPU cores.

The only other job before running the Job Array is to have all the trajectories ready in their directories, so you basically want to run the initial MD individually. You can do that on a high-GHz CPU like a Core i9 or Ryzen 9, because I think this is not parallelized. Or you can create another SLURM job by using the --dependency switch with the afterok option.

After everything is run, you can ZIP it and transfer everything for post-processing and spectral output with https://github.com/qcxms/PlotMS. We also have an MSP output parser that allows for NIST import, among other things.

Here is a SLURM array job. The bash file calls the *.slurm file. Worked for us
and the run-array-qcxms.slurm file
Slurm Job Arrays are an excellent choice for software tools like QCxMS that require processing hundreds of individual subdirectories, for several reasons: efficient parallelization, simplified job management, scalability, resource optimization, organized output, environment variables, and flexible execution control.
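A minimal sketch of such an array job follows. It is not the script used by the commenters in this thread. It assumes the trajectories already exist under TMPQCXMS/TMP.1 through TMP.50, that the production command is `qcxms --prod` (as in the pqcxms fix mentioned in this thread), and that one CPU per trajectory is enough. The names QCXMS_PROJECT, QCXMS_CMD, and run_trajectory are hypothetical, and the resource requests are placeholders to adapt to your cluster.

```shell
#!/bin/bash
#SBATCH --job-name=qcxms-array
#SBATCH --array=1-50              # one array task per TMPQCXMS/TMP.<n> (assumed count)
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=24:00:00
#SBATCH --output=slurm-%A_%a.out  # %A = array job ID, %a = array task ID

# Run a single production trajectory inside TMPQCXMS/TMP.<id>.
# QCXMS_PROJECT and QCXMS_CMD are hypothetical overrides for this sketch.
run_trajectory() {
    local id="$1"
    local project="${QCXMS_PROJECT:-${SLURM_SUBMIT_DIR:-$PWD}}"
    local dir="$project/TMPQCXMS/TMP.${id}"

    if [ ! -d "$dir" ]; then
        echo "missing trajectory directory: $dir" >&2
        return 1
    fi
    # Assumption: the production run is started with 'qcxms --prod'.
    ( cd "$dir" && ${QCXMS_CMD:-qcxms --prod} > qcxms.out 2>&1 )
}

# Only run when Slurm has assigned an array task ID.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
    run_trajectory "$SLURM_ARRAY_TASK_ID"
fi
```

Because each array task is its own process, one failing trajectory does not stop the others. A follow-up job can be chained by capturing the job ID with `sbatch --parsable` and submitting the next step with `--dependency=afterok:<jobid>`.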
@tobigithub thanks for the info! We're running enterprise-grade equipment (HPE compute nodes with AMD Epycs), so I'm not too worried if we blow up an SSD. I'm just happy to see them used and our budget be put to good use for our researchers. I set this up about two years ago. I have been building and learning as I go; my last cluster experience was SGE back in the 2000s. I've been in the process of benching SMT to see if we want it off on all our nodes or a mix, so I'll add QCXMS as another tool we need to check on that with.
It looks like 49 of the 50 trajectories finished OK, and just trajectory 9 was interrupted? You might want to check with a very small molecule like methane and only very short trajectories, and then monitor the SLURM log files and the QCxMS errors created. Usually when run as individual processes via the Slurm Array there are no issues. Individual trajectories can of course fail, but that should not interrupt the SLURM queue or other schedulers. They just fail. The final MS or MS/MS spectrum is created with the batch scripts getres (which loops through all the *.res files in the individual directories) and plotms (which creates the MS or MS/MS spectrum file).
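To spot an interrupted trajectory before running getres, a quick check along these lines may help. It is a sketch under the assumption that a finished trajectory leaves a non-empty *.res file in its TMPQCXMS/TMP.X directory. The function name list_failed is hypothetical.

```shell
#!/bin/bash
# Sketch (assumption): print the name of every TMPQCXMS/TMP.* directory
# that has no non-empty *.res file, i.e. a trajectory that likely failed.
# list_failed is a hypothetical helper, not part of QCxMS.
list_failed() {
    local project="${1:-$PWD}"
    local dir f ok

    for dir in "$project"/TMPQCXMS/TMP.*; do
        [ -d "$dir" ] || continue
        ok=0
        for f in "$dir"/*.res; do
            # -s is true only for files that exist and are larger than zero bytes.
            if [ -s "$f" ]; then
                ok=1
            fi
        done
        if [ "$ok" -eq 0 ]; then
            basename "$dir"
        fi
    done
}
```

Running `list_failed /path/to/project` prints one directory name per line. The qcxms.out file in each listed directory is then the first place to look for the error.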
Unfortunately, the Slurm output I have from the user just has the SIGABRT from QCXMS. The way we were running it, I think a failed run would interrupt the process as you described. Again, thanks for the help. You're probably looking at that and going "this guy has a job?" but running this Slurm cluster is about 1/8 of my duties, so I kind of have to chuck things over the fence as fast as possible. Not quite as in depth with it as I should be.
Hello, I'm working with a graduate student trying to use QCxMS on our HPC. We're on version 5.2.1. I'm working from the instructions found here:
https://xtb-docs.readthedocs.io/en/latest/qcxms_doc/qcxms_run.html
We're down to Step 3. I've fixed the swapping of -prod for --prod in the pqcxms script from a bug mentioned elsewhere here, and it appears to be running. We're using Slurm and the job finishes; we get a qcxms.out file in the main project directory along with the individual ones in the TMPQCXMS/TMP.X directories, but all the .res files are empty. I'm the HPC admin and have a vague recollection of mass spectroscopy from my physics grad school days, but I will say I'm not a domain expert on the subject. From my glancing at the attached files, it looks like it produces some energies from the calculations but does not create the spectra. I do see a SIGABRT in the slurm output:
I'm sure I must be missing something simple to help this student along in her work. Thanks for any help! Relevant in and out files below, let me know if there is anything else I can provide.
qcxms.in.txt
qcxms.out.txt
Edit: I've also tried to customize the q-batch script for our Slurm configuration. But from my understanding, it looks like you can just call pqcxms from an sbatch file and run it that way instead of queuing each run like q-batch seems to do?