pqcxms not generating res files #86

Open
lh59281 opened this issue Aug 30, 2024 · 10 comments

@lh59281

lh59281 commented Aug 30, 2024

Hello, I'm working with a graduate student trying to use QCxMS on our HPC. We're on version 5.2.1. I'm working from the instructions found here:

https://xtb-docs.readthedocs.io/en/latest/qcxms_doc/qcxms_run.html

We're down to Step 3. I've fixed the -prod/--prod flag swap in the pqcxms script, per a bug mentioned elsewhere here, and it appears to be running. We're using Slurm and the job finishes: we get a qcxms.out file in the main project directory along with the individual ones in the TMPQCXMS/TMP.X directories, but all the .res files are empty.

I'm the HPC admin and have a vague recollection of mass spectrometry from my physics grad school days, but I will say I'm not a domain expert on the subject. From glancing at the attached files it looks like it produces some energies from the calculations but does not create the spectra. I do see a SIGABRT in the Slurm output:

xargs: wrapped_qcxms: terminated by signal 6

I'm sure I must be missing something simple to help this student along in her work. Thanks for any help! The relevant .in and .out files are below; let me know if there is anything else I can provide.

qcxms.in.txt
qcxms.out.txt

Edit: I've also tried to customize the q-batch script for our Slurm configuration. But from my understanding it looks like you can just call pqcxms from an sbatch file and run it that way instead of queuing each run like q-batch seems to do?

@JayTheDog
Member

Hi, this all looks good to me, except that at some point the program is stopped by a signal 6 termination. This doesn't look like a problem in the code itself, as no error message is produced and everything stops at exactly the same time because the wrapper script is stopped.
Did you log out of your account, close the terminal, or something like that? Maybe start the calculation locally again, but use the nohup command to ensure that the script keeps running while you are not active on your PC.

@lh59281
Author

lh59281 commented Aug 30, 2024

Hello, it's running on an HPC cluster via Slurm, kicked off like this:

#!/bin/bash
#SBATCH -N1 -n50 --time=24:00:00 --mem=200GB

module load qcxms
pqcxms -j 50 -t 1

It SIGABRTs after a few minutes, so it's not hitting a time limit, since we gave it 24 hours. Looking at the node, it's not running out of RAM or any other resource as far as I can tell.

We're running RHEL 9.4. Do we absolutely need to go the q-batch route for this? The only main difference seems to be that it inserts everything as its own job in the queue.

Thanks!

@JayTheDog
Member

You do not need the q-batch script; it's only provided to make things easy to use for batch users. Please feel free to use a SLURM submission script for this. Someone already created a SLURM script for that, but I can't recall whether it was posted anywhere.
Furthermore, I don't think that using 50 cores via Slurm and at the same time trying to get 50 instances running via the batch script on that one node is the right way to do it. The pqcxms script is a batch script that aims to run a calculation locally, using xargs to talk to the local processors. If you have an HPC, the SLURM/batch queuing system should take over the job of submitting the jobs to each node, not the batch script.

@lh59281
Author

lh59281 commented Aug 30, 2024

OK, thanks! The q-batch approach looked like it was written for Torque or PBS originally, so I thought it might be easier to approach it from a different angle, especially when mixing it with Lmod.

> Furthermore, I don't think that using 50 cores via Slurm and at the same time trying to get 50 instances running via the batch script on that one node is the right way to do it. The pqcxms script is a batch script that aims to run a calculation locally, using xargs to talk to the local processors. If you have an HPC, the SLURM/batch queuing system should take over the job of submitting the jobs to each node, not the batch script.

I don't think so either, after digging in and trying to figure out how QCxMS works. We're only running it on one node right now and were trying a few different approaches to see how it worked. I think we're going to have to come up with our own script for this.

Looks like the correct approach is to set up an individual job for each TMPQCXMS/TMP.X directory and its corresponding qcxms.in, and then cat the out and res files into the main project directory, if I'm following the logic correctly.
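
A rough sketch of that collection step, assuming the per-trajectory result files are named qcxms.res and that a combined file in the project root is what the later plotting step reads (both filenames are assumptions, adjust them to whatever your runs actually produce):

#!/bin/bash
# Hypothetical collection script: append every non-empty per-trajectory
# result file to one combined file in the project root.
# Filenames (qcxms.res, qcxms_cat.res) are assumed, not taken from the docs.
rm -f qcxms_cat.res
for d in TMPQCXMS/TMP.*/; do
    if [ -s "${d}qcxms.res" ]; then   # skip empty or missing results
        cat "${d}qcxms.res" >> qcxms_cat.res
    fi
done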

@JayTheDog
Member

Yes, if you can make a script that runs a single instance of QCxMS on each node and collects the res files, that is actually what the q-batch script does. So it would be best to adapt it to your own infrastructure.
Hope this helped and good luck!

@tobigithub

tobigithub commented Aug 31, 2024

@lh59281 you may want to use SLURM Job Arrays. Basically, if you have a QCxMS task with 400 trajectories, you just submit a single command and Slurm loops through all 400 tasks. It's much less load on the Slurm scheduler, and sysadmins love it.

You also might want to check whether hyperthreading is on or off, depending on your Intel or AMD processors. Also, because of the large write overhead, you certainly want to cache your data directory in RAM or use enterprise SSDs. If you use a RAID array, make sure it can write small chunks fast enough. You can also ping @Shunyang2018; we have run thousands of QCEIMS and QCxMS jobs on Slurm, sometimes using several thousand CPU cores.

The only other task before running the Job Array is to have all the trajectories ready in their directories; you basically want to run the initial MD separately. You can do that on a high-GHz CPU like a Core i9 or Ryzen 9, because I think this step is not parallelized. Or you can create another SLURM job using the --dependency switch with the afterok option, as sketched below.
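
The dependency chaining mentioned above could look roughly like this (run-initial-md.slurm is a hypothetical script name for the serial MD step; --parsable just makes sbatch print the bare job ID):

#!/bin/bash
# Sketch: run the initial MD first, then start the production array
# only if the MD job finishes successfully. Script names are placeholders.
md_jobid=$(sbatch --parsable run-initial-md.slurm)
sbatch --dependency=afterok:${md_jobid} --array=1-400 run-array-qcxms.slurm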

After everything has run, you can zip it up and transfer everything for post-processing and spectral output with https://github.com/qcxms/PlotMS. We also have an MSP output parser that allows for NIST import, among other things.

Here is a SLURM array job. The bash file calls the *.slurm file. It worked for us:

#!/bin/bash

# location of slurm output and log files
mkdir slurm-logs

# run arrayjob on cluster, check MS2TMPDIR size and adjust array size
# use: shopt -s nullglob; files=(MS2TMPDIR/*); echo ${#files[@]};

sbatch  --array=1-400 run-array-qcxms.slurm

and here is the run-array-qcxms.slurm file:

#!/bin/bash

### ------------------------------------------------------------
### QCxMS slurm file for cluster use
### Tobias Kind // Fiehnlab 2020 // CC-BY license // v1.1 Dec 2020
###
### This is an array job for Slurm and processes all files
### in the TMPQCXMS directory based on the Slurm scheduler.
### Requires adaptation to other cluster environments if ported.
### Slurm info: https://hpc-wiki.info/hpc/SLURM
### Check file system performance on share vs local SSD
### -------------------------------------------------------------

#SBATCH --job-name=array_job
#SBATCH --output=slurm-logs/out_%j.txt  # write into separate directory
##SBATCH --gres=gpu:1                   # gres only needed for GPU computing
##SBATCH --nodes=1                      # request whatever is available
##SBATCH --exclusive                    # does not need to run exclusive
#SBATCH --cpus-per-task=2               # CPUs per task; pairs with OMP_NUM_THREADS below (set both to 4 on a slow share drive to use 100% of the threads)
##SBATCH --ntasks-per-node=1            #
#SBATCH --partition=production          #
#SBATCH --time=0-08:00:00               # wall-time limit (days-hours:minutes:seconds), default 1 day

echo "Starting at `date`"
echo "Running on hosts: $SLURM_NODELIST"
echo "Running on $SLURM_NNODES nodes."
echo "Running $SLURM_NTASKS tasks."
echo "Current working directory is `pwd`"


# echo time and date
echo "Job started at " `date`
echo ''

# echo which node we are
echo  $HOSTNAME
echo ''

# echo the CPU
lscpu
echo ''

# echo the  job id
echo 'jobid: '${SLURM_JOB_ID}
echo ''

# source the qcxms files, heavily dependent on the system's install
# set OMP threads (slurm needs to request double that)
export XTBHOME=/share/fiehnlab/software/qcxms/.XTBPARAM
export PATH=$PATH:/share/fiehnlab/software/qcxms
export OMP_NUM_THREADS=2

echo 'Task ID:'
echo ${SLURM_ARRAY_TASK_ID}/
echo ''

# change into taskarray directory and execute QCxMS
cd TMPQCXMS/TMP.${SLURM_ARRAY_TASK_ID}/

# print current directory
pwd

# execute QCxMS in production mode
qcxms -prod > qcxms.out 2>&1 &
wait

# echo time and date
echo "Job ended at " `date`
echo ''

@tobigithub

Slurm Job Arrays are an excellent choice for software tools like QCxMS that require processing hundreds of individual subdirectories for several reasons:

Efficient Parallelization
Job Arrays allow you to easily parallelize the processing of multiple subdirectories across compute nodes. Instead of submitting 400-800 individual jobs, you can submit a single array job that spawns multiple tasks, each handling one subdirectory.

Simplified Job Management
With Job Arrays, you only need to submit and manage a single job submission rather than hundreds of individual jobs. This significantly reduces the load on the job scheduler and makes it easier to track and manage your workload.

Scalability
Job Arrays can easily scale from processing a few subdirectories to hundreds or even thousands. You can adjust the array size by simply modifying the --array parameter, making it flexible for different dataset sizes.

Resource Optimization
Each array task inherits the resource specifications (CPUs, memory, etc.) set in the job script. This ensures consistent resource allocation across all subdirectory processing tasks without having to specify them individually.

Organized Output
Job Arrays allow for systematic naming of output and error files using the %A (array job ID) and %a (task ID) placeholders. This makes it easy to track outputs for each subdirectory processed.
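
For instance, the output directive in the script above could be changed to include both IDs (an illustrative one-liner, not part of the original script):

#SBATCH --output=slurm-logs/out_%A_%a.txt   # %A = array job ID, %a = task ID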

Environment Variables
Slurm sets specific environment variables for array jobs, such as SLURM_ARRAY_TASK_ID, which can be used within your script to identify and process the correct subdirectory for each task.

Flexible Execution Control
You can easily limit the number of simultaneously running tasks using the % notation in the --array directive. This allows you to control the load on the system and optimize resource usage.

By leveraging Slurm Job Arrays, tools like QCxMS can efficiently process large numbers of subdirectories in parallel, simplifying job management, optimizing resource usage, and improving overall workflow organization.
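
As a concrete illustration of the % limit mentioned above (reusing the 400-trajectory array from the script earlier in the thread):

# run the 400 array tasks, but keep at most 50 of them running at once
sbatch --array=1-400%50 run-array-qcxms.slurm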

@lh59281
Author

lh59281 commented Sep 4, 2024

@tobigithub thanks for the info! We're running enterprise-grade equipment (HPE compute nodes with AMD EPYCs), so I'm not too worried if we blow up an SSD. I'm just happy to see them used and our budget put to good use for our researchers.

I set this up about two years ago and have been building and learning as I go; my last cluster experience was SGE back in the 2000s. I've been benchmarking SMT to see whether we want it off on all our nodes or just a mix, so I'll add QCxMS as another tool we need to check that with.

@tobigithub

It looks like 49 of the 50 trajectories finished OK and just trajectory 9 was interrupted? You might want to check with a very small molecule like methane and only very short trajectories, and then monitor the SLURM log files and the QCxMS errors created. Usually there are no issues when the trajectories run as individual processes via a Slurm array. Individual trajectories can of course fail, but that should not interrupt the SLURM queue or other schedulers; they just fail. The final MS or MS/MS spectrum is created with the batch scripts getres (which loops through all the *.res files in the individual directories) and plotms (which creates the MS or MS/MS spectrum file).

  • Also, it's not about blowing up SSDs; rather, even RAID arrays will bog down with thousands of small writes. It's nice to benchmark write speeds in real time.
  • It is easily possible to saturate a 10k or 100k CPU cluster to 99% with a SLURM array.
  • Also, QCxMS does not need 200 GB of RAM. Depending on how many EPYC nodes you have, please consider: you requested 10 TB of RAM in total, but many quad-socket EPYC motherboards hold only 4-8 TB of RAM max (2 TB per node). So that might be an error point if the jobs are allocated to only one or two nodes. Check whether QCxMS is happy with 8 GB or 16 GB, or leave the memory requirement out?

@lh59281
Author

lh59281 commented Sep 5, 2024

  • We've got metrics via Prometheus/node_exporter, and so far it hasn't been an issue; I think the researcher is just doing small/short runs. Our other chemists use Gaussian 16, which has its own quirks, but it's something we keep an eye on. Right now, being under-utilized is a bigger concern than being over-committed on resources, due to our organizational posture.
  • Seems like that's entirely possible. Again, we've got decent metrics on our cluster and are still in the R&D stage with QCxMS.
  • The way I have Slurm configured, users have to specify a RAM amount, as we had issues with another piece of software. The 200 GB was just us giving it a ridiculously high ceiling to see if it was running out of RAM or butting up against a limit somehow; initial runs were at 8 GB. Since pqcxms was invoked via Slurm, I don't think it would compound like you're saying? That sounds more like the behavior if one uses something like q-batch, where it submits a Slurm batch with a 200 GB limit for each TMP.X directory. The way it's run here, my understanding is that each qcxms instance falls under that pqcxms batch script and cannot use more than 200 GB total.

Unfortunately, the Slurm output I have from the user just has the SIGABRT from QCxMS. The way we were running it, I think a failed run would interrupt the process as you described.

Again, thanks for the help. You're probably looking at this and going "this guy has a job?", but running this Slurm cluster is about 1/8 of my duties, so I kind of have to chuck things over the fence as fast as possible. I'm not quite as in-depth with it as I should be.
