Can't run VASP via srun #55

Open
yinkaaiwu opened this issue Dec 9, 2023 · 0 comments

Hello, I am trying to run VASP calculations on a Slurm cluster using srun, but I have run into a puzzling issue. squeue reports my job as running (state R), but no VASP processes are actually started. VaspInteractive does not raise any errors either: it creates the initial input files and then gets stuck in the 'while self.process.poll() is not None' loop.
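
For reference, Popen.poll() returns None while the child process is still running and its exit code once it has exited, so a wait loop built on poll() keeps spinning for as long as the launcher stays alive, even if the launcher never spawns VASP. The following is a minimal sketch of such a loop with a deadline added. It is not VaspInteractive's code: wait_with_deadline is a hypothetical helper, and a short-lived Python child stands in for the real launcher.

```python
import subprocess
import sys
import time


def wait_with_deadline(process, timeout_s, interval_s=0.1):
    """Poll `process` until it exits or `timeout_s` elapses.

    Returns the exit code, or None if the process is still running
    when the deadline passes. (Hypothetical helper, for illustration.)
    """
    deadline = time.monotonic() + timeout_s
    while process.poll() is None:  # None means "still running"
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval_s)
    return process.returncode


# Stand-in child process; a real test would launch srun/mpirun here instead.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.3)"])
early = wait_with_deadline(child, timeout_s=0.05)  # deadline hits first
final = wait_with_deadline(child, timeout_s=10.0)  # child exits normally
print(early, final)
```

A deadline like this turns a silent hang into a visible failure, which makes it easier to tell a stuck launcher from a slow one.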

To narrow this down, I used submitit to run a subprocess.Popen() call modeled on your _start_vasp_process() function, with mpirun as the launcher, and I saw the same behavior. Only when I changed the command from 'mpirun -np xx vasp_std' to 'vasp_std' was I able to start VASP successfully, as a single process.

I have tried many things and ruled out environment variables as the possible cause, but I still can't find the reason for this bug. Although I feel that this may not be an issue with your code and could be related to Slurm or mpirun, I believe others might have faced similar problems. Therefore, I have opened an issue in the hope of getting a solution from you. Thank you!

Below is the code I used to run VASP with srun:

from ase.optimize import BFGS
from vasp_interactive import VaspInteractive
from ase.db import connect

def runvasp(params, atoms, path):
    params['directory'] = path
    with VaspInteractive(**params) as vi:
        atoms.calc = vi  # set_calculator() is deprecated in recent ASE versions
        dyn = BFGS(atoms=atoms,
                   maxstep=0.15,
                   trajectory=f'{path}/vasp_relaxation.traj',
                   logfile=f'{path}/vasp_BFGS.log')
        dyn.run(fmax=0.05, steps=2)
    return dyn.get_number_of_steps()


params = dict(
    system='VaspJet',
    # srun options must precede the executable; anything after it is passed to vasp_gam
    command='srun -p CLUSTER -N 1 -n 48 -J test1 vasp_gam',
    xc='PBE',
    lreal='Auto',
    kpts=(1, 1, 1),
    lmaxmix=4,
    encut=300,
    ismear=0,
    sigma=0.05,
    algo='fast',
    prec='Normal',
    nsw=2000,
    ibrion=-1,
    npar=4,
    isif=3,
    nwrite=1,
    lwave=False,
    lcharg=False,
    txt='vasp.out'
)

atoms1 = connect('./AuAgPt.db').get_atoms(id=1)
runvasp(params, atoms1, './test')
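
One detail worth checking in the command string: srun treats everything after the executable name as arguments to that executable, so options such as -J have to come before vasp_gam. A minimal sketch of assembling the string so that the ordering is always right is shown here. build_srun_command is a hypothetical helper, not part of vasp_interactive or ASE, and it only uses the srun flags that already appear in this issue (-p, -N, -n, -J).

```python
import shlex


def build_srun_command(executable, partition, nodes, ntasks, job_name=None):
    """Return an srun command line with all options before the executable.

    Hypothetical helper for illustration only.
    """
    parts = ["srun", "-p", partition, "-N", str(nodes), "-n", str(ntasks)]
    if job_name is not None:
        parts += ["-J", job_name]
    parts.append(executable)  # the executable goes last
    return shlex.join(parts)  # quotes any argument that needs it


command = build_srun_command("vasp_gam", "CLUSTER", 1, 48, job_name="test1")
print(command)
```

The resulting string can be passed as the command parameter in params.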

This is the code I used with submitit and subprocess.Popen() to start VASP with mpirun:

import submitit
import time
from subprocess import Popen, PIPE

def startvasp(cwd):
    process = Popen(
        args='mpirun -np 48 vasp_gam',
        shell=True,
        stdin=PIPE,
        stdout=PIPE,
        stderr=PIPE,
        cwd=cwd,
        universal_newlines=True,
        bufsize=0
    )
    stdout, stderr = process.communicate()
    return process.pid, process.poll(), stdout, stderr


# executor is the submission interface (logs are dumped in the folder)
executor = submitit.AutoExecutor(folder="log_test")
# set timeout in min, and partition for running the job
executor.update_parameters(
    timeout_min=3600,
    slurm_partition="CLUSTER",
    nodes=1,
    tasks_per_node=1,
    cpus_per_task=48,
    slurm_setup=[
    ]
)
jobs = []
for i in range(1):
    executor.update_parameters(slurm_job_name="test1")
    job = executor.submit(startvasp, '/home/fwtop/vaspjet-test/part-1.0/10')
    # job = executor.submit(startvasp, '/home/wyk')
    jobs.append(job)

time.sleep(2)
print(jobs[0].get_info())
print(jobs[0].result()[:2])
print(jobs[0].result()[2])
print(jobs[0].result()[3])
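
Because process.communicate() without a timeout blocks until the launcher exits, a hung mpirun never returns its stdout or stderr to the caller. A minimal sketch of a variant that gives up after a deadline and still collects whatever was written is shown below. run_with_timeout is a hypothetical helper, stdin is set to DEVNULL on the assumption that the diagnostic run needs no interactive input, and the command is passed as a list (no shell) so that kill() targets the launcher itself. A short Python child stands in for mpirun in the demo.

```python
import subprocess
import sys


def run_with_timeout(args, cwd=None, timeout_s=60):
    """Run `args`, killing the process if it exceeds `timeout_s` seconds.

    Returns (returncode, timed_out, stdout, stderr).
    (Hypothetical helper, for illustration.)
    """
    process = subprocess.Popen(
        args,
        stdin=subprocess.DEVNULL,  # assumption: no interactive input needed
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        cwd=cwd,
        text=True,
    )
    try:
        stdout, stderr = process.communicate(timeout=timeout_s)
        timed_out = False
    except subprocess.TimeoutExpired:
        process.kill()
        # A second communicate() returns the output captured so far.
        stdout, stderr = process.communicate()
        timed_out = True
    return process.returncode, timed_out, stdout, stderr


# Demo with stand-in children; a real test would pass
# ["mpirun", "-np", "48", "vasp_gam"] and the calculation directory as cwd.
quick = run_with_timeout([sys.executable, "-c", "print('ok')"], timeout_s=30)
slow = run_with_timeout(
    [sys.executable, "-c", "import time; time.sleep(30)"], timeout_s=2
)
print(quick)
print(slow[:2])
```

Note that killing an MPI launcher may leave orphaned ranks behind on some systems, so this is meant for diagnosis rather than production use.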

This is what the working directory looks like:

(base) [redhat@gpu test]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               131   CLUSTER    test1    fwtop  R       0:43      1 hpc-1-806
(base) [redhat@gpu test]$ ls
ase-sort.dat  INCAR  KPOINTS  POSCAR  POTCAR  STOPCAR  vasp_BFGS.log  vasp.out  vasp_relaxation.traj