Multi step MD flow #489

gpetretto · 2023-08-26T08:23:58Z

This PR introduces a flow that allows splitting an MD simulation into multiple steps and continuing from a previously terminated flow.

The motivation is that long MD calculations usually do not fit inside a single submission, and the number of steps required to achieve good convergence may not be known a priori. The idea is that it should be easy to split and join MD flows and then reconstruct the full trajectory.

A few points to describe the changes:

- The number of MD calculations in the Flow is governed by the `n_runs` parameter.
- At the end of the Flow, one job generates an output containing the final structure and the list of uuids of the outputs of the flow. The reasons are to:
  - be able to easily generate the full trajectory while avoiding storing the total trajectory. Storing it would duplicate the stored data for a potentially very large object.
  - be able to restart from the last step and, at the end of the new flow, have the full list of uuids required to reconstruct the whole trajectory.
- A new Flow can be generated from the uuid of the last job of a previously terminated MultiMDFlow.
- The continuation Flow should be executed with `allow_external_references=True` in `run_locally` or in `flow_to_workflow`.
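A minimal, stdlib-only sketch of the bookkeeping described above, assuming each chunk exposes one output uuid and one final structure: the final job keeps only the ordered uuids plus the last structure, and a continuation flow prepends the uuids of the flow it restarts from. The names `MDFlowSummary`, `summarize_flow` and `continue_summary` are hypothetical stand-ins, not the atomate2 API, and structures are plain strings here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MDFlowSummary:
    """Hypothetical stand-in for the output of the flow's final job."""

    final_structure: str  # placeholder for a pymatgen Structure
    traj_uuids: list[str] = field(default_factory=list)  # chunk output uuids, in order


def summarize_flow(chunk_outputs: list[tuple[str, str]]) -> MDFlowSummary:
    """Collect (uuid, structure) pairs of consecutive chunks into a summary.

    Only the uuids are kept, so the full trajectory is never duplicated in the store.
    """
    uuids = [uuid for uuid, _ in chunk_outputs]
    return MDFlowSummary(final_structure=chunk_outputs[-1][1], traj_uuids=uuids)


def continue_summary(
    previous: MDFlowSummary, new_chunks: list[tuple[str, str]]
) -> MDFlowSummary:
    """Summary of a continuation flow: the earlier uuids first, then the new ones."""
    new = summarize_flow(new_chunks)
    return MDFlowSummary(
        final_structure=new.final_structure,
        traj_uuids=previous.traj_uuids + new.traj_uuids,
    )


# first flow: two chunks; the continuation would restart from first.final_structure
first = summarize_flow([("u1", "s1"), ("u2", "s2")])
second = continue_summary(first, [("u3", "s3")])
print(second.traj_uuids)  # ['u1', 'u2', 'u3']
```

Because only uuids are concatenated, joining any number of continuation flows stays cheap, and the trajectory is rebuilt from the store only when it is actually needed.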

Also some questions/points that could be discussed:

There is no obvious way of setting the number of steps and the time steps in each of the MD jobs. The standard way should probably be
```
md_maker = MDMaker(input_set_generator=MDSetGenerator(nsteps=10000, time_step=1))
md_flow = MultiMDMaker(md_maker=md_maker, n_runs=200).make(structure)
```
but it looks a bit involved. It would be nice to be able to set nsteps, time_step and n_runs in one place. Are there better ways to do this?
pymatgen now supports global properties. However, when a Structure is passed to the Poscar object, the structure's site_properties and properties are ignored. For now, I modified the InputSet to take them into account when writing the POSCAR. Would it be better to change pymatgen so that it considers the structure properties when generating a Poscar?
Should the output produced by md_output contain more data (e.g. be a subclass of emmet's StructureMetadata)?
Is the restart from uuid fine this way? Could it be a more general approach in atomate2?
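The three parameters from the first question combine multiplicatively, which is why it helps to see them together. A small stdlib sketch of that arithmetic, assuming `time_step` is in femtoseconds (as VASP's POTIM is); the helper names are hypothetical. With `nsteps=10000`, `time_step=1` and `n_runs=200`, each run covers 10 ps and the whole flow 2 ns; the inverse helper answers "how many runs do I need for a target time?", which matters when the required length is not known a priori.

```python
import math


def total_time_ps(nsteps: int, time_step_fs: float, n_runs: int) -> float:
    """Total simulated time in ps for n_runs chunks of nsteps steps each."""
    return nsteps * time_step_fs * n_runs / 1000.0


def runs_needed(target_time_ps: float, nsteps: int, time_step_fs: float) -> int:
    """Smallest number of equal chunks whose total time reaches target_time_ps."""
    time_per_run_ps = nsteps * time_step_fs / 1000.0
    return math.ceil(target_time_ps / time_per_run_ps)


print(total_time_ps(10000, 1, 200))  # 2000.0 ps, i.e. 2 ns
print(runs_needed(2000.0, 10000, 1))  # 200
```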

Tests are still missing.

Also pinging @mjwen as the original author of the MDMaker.

mjwen · 2023-09-15T14:01:41Z

@gpetretto thanks for the PR. I will be able to take a look at this and get back to you soon if I have comments.

mjwen · 2023-10-06T21:39:37Z

Hi @gpetretto.

I've taken a look at the new flow. The question in the ~~above~~ below comment is the only major one. Everything else looks great. I will provide other minor comments once we have that figured out.

JaGeo · 2023-10-06T21:41:49Z

(@mjwen I think you need to actually publish the review)

mjwen · 2023-10-06T21:37:21Z

src/atomate2/vasp/flows/core.py

```diff
+            else:
+                md_structure = md_job.output.structure
+                md_prev_vasp_dir = md_job.output.dir_name
+            md_job = self.md_maker.make(md_structure, prev_vasp_dir=md_prev_vasp_dir)
```


From line 561, the next run takes the structure from the previous run as input and additionally copies files from the previous directory.

Will this ensure that the MD continues from the last stopped point? The atomic positions will, of course, but what about the rest? For example, if one uses a thermostat, will the state of the thermostat be preserved? In other words, if I run a single MD with 100 steps, and alternatively run 5 MDs of 20 steps each using the new flow, will I get exactly the same results?

gpetretto · 2023-10-26

Thanks for taking the time to look into this, and sorry for the delay.
My understanding is that there is no way of preserving the exact state of the thermostat. I also looked into the VASP input and output to check whether any information about the state of the thermostat could be set, but that does not seem to be possible. Splitting a calculation into chunks will not produce exactly the same trajectory as an unsplit calculation, since the total energy of the system composed of the structure plus the thermostat is not conserved across a restart; however, this should still lead to meaningful results.
As far as I know, splitting the calculation into several chunks is a standard procedure when all the required steps do not fit inside a single run. The VASP wiki also describes this as the procedure to follow: https://www.vasp.at/wiki/index.php/Molecular_dynamics_calculations

> If a continuation run is performed copy CONTCAR to POSCAR or possibly deliver initial velocities in the POSCAR file.

So it should be fine to proceed in this way.
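The restart procedure quoted from the VASP wiki amounts to seeding the next run's POSCAR with the previous run's CONTCAR, which for MD also carries the final velocities. A minimal stdlib sketch, assuming plain uncompressed files; `prepare_continuation` is a hypothetical helper, whereas the flow in this PR reaches the same effect by passing the final structure (with its velocities as site properties) to the next job.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


def prepare_continuation(prev_dir: str | Path, new_dir: str | Path) -> Path:
    """Copy the CONTCAR of a finished MD chunk to the POSCAR of the next chunk.

    For MD runs VASP writes the final velocities to CONTCAR as well, so the new
    POSCAR carries both the positions and the velocities.
    """
    prev_dir, new_dir = Path(prev_dir), Path(new_dir)
    contcar = prev_dir / "CONTCAR"
    if not contcar.is_file():
        raise FileNotFoundError(f"no CONTCAR in {prev_dir}")
    new_dir.mkdir(parents=True, exist_ok=True)
    poscar = new_dir / "POSCAR"
    shutil.copyfile(contcar, poscar)
    return poscar


# toy example in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    run1 = Path(tmp) / "run1"
    run1.mkdir()
    (run1 / "CONTCAR").write_text("positions + velocities\n")
    new_poscar = prepare_continuation(run1, Path(tmp) / "run2")
    copied = new_poscar.read_text()
```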

mjwen · 2023-11-14

Sounds good

mjwen · 2023-10-06T21:45:16Z

src/atomate2/vasp/sets/base.py

```diff
+        site_properties = structure.site_properties
+        poscar = Poscar(
+            structure,
+            velocities=site_properties.get("velocities"),
```


Will these guarantee the continuation of the MD?

gpetretto · 2023-10-26

Actually, I have discovered that if an NpT simulation is performed, the lattice velocities are also added to the CONTCAR (see https://www.vasp.at/wiki/index.php/POSCAR) and should be preserved. At the moment they are completely ignored by the pymatgen Poscar object. I will add them.
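The diff above forwards per-site velocities from the structure's site_properties to Poscar; for NpT runs the lattice velocities would have to travel alongside them as a structure-level property. A stdlib sketch of that mapping, using plain dicts in place of a pymatgen Structure; the helper name is hypothetical and the `lattice_velocities` key is an assumption chosen to mirror the keyword discussed in materialsproject/pymatgen#3428.

```python
from __future__ import annotations

from typing import Any


def poscar_md_kwargs(
    site_properties: dict[str, list], properties: dict[str, Any]
) -> dict[str, Any]:
    """Collect MD restart data from structure properties into Poscar-style kwargs.

    Per-site data (atomic velocities) lives in site_properties; cell-level data
    (lattice velocities, written by VASP for NpT runs) lives in the
    structure-level properties dict. Missing entries map to None.
    """
    return {
        "velocities": site_properties.get("velocities"),
        "lattice_velocities": properties.get("lattice_velocities"),
    }


nvt = poscar_md_kwargs({"velocities": [[0.1, 0.0, 0.0]]}, {})
npt = poscar_md_kwargs(
    {"velocities": [[0.1, 0.0, 0.0]]},
    {"lattice_velocities": [[0.0] * 3, [0.0] * 3, [0.0] * 3]},
)
```

Keeping the cell-level quantity out of site_properties matters because it has one value per structure, not one per atom.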

gpetretto · 2023-10-26T15:29:17Z

As an additional point, we have further discussed how best to set variables like nsteps, time_step, n_runs, ensemble, start_temp and end_temp without the need to instantiate the MultiMDMaker, the MDMaker and the MDSetGenerator separately. We thought that a classmethod could be a good solution for this. For example:

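```python
from jobflow import Maker
from atomate2.vasp.jobs.core import MDMaker
from atomate2.vasp.sets.core import MDSetGenerator

class MultiMDMaker(Maker):
    # dataclass fields (md_maker, n_runs, ...) omitted for brevity
    @classmethod
    def from_parameters(cls, nsteps, time_step, n_runs, ensemble, start_temp, end_temp):
        generator = MDSetGenerator(nsteps=nsteps, time_step=time_step, ensemble=ensemble, start_temp=start_temp, end_temp=end_temp)
        md_maker = MDMaker(input_set_generator=generator)
        # return the configured maker; the user then builds the Flow with .make(structure)
        return cls(md_maker=md_maker, n_runs=n_runs)
```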

It seems that this approach is not used in other Flow Makers, but it would be particularly useful in this case, since the default values of these parameters will almost never fit the users' needs and would otherwise require defining multiple objects manually. @utf do you think this might be a suitable approach?
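One detail such a classmethod has to settle: if the same start_temp and end_temp were handed to every chunk, a temperature ramp would restart in each run instead of spanning the whole simulation. A stdlib sketch of splitting one global linear ramp into contiguous per-run segments; `split_temperature_ramp` is a hypothetical helper, not necessarily what the final from_parameters does.

```python
def split_temperature_ramp(
    start_temp: float, end_temp: float, n_runs: int
) -> list:
    """Split a linear ramp start_temp -> end_temp into n_runs contiguous segments.

    Run i starts at the end temperature of run i-1, so the chunks together
    reproduce the single global ramp.
    """
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")
    increment = (end_temp - start_temp) / n_runs
    return [
        (start_temp + i * increment, start_temp + (i + 1) * increment)
        for i in range(n_runs)
    ]


print(split_temperature_ramp(300, 600, 3))
# [(300.0, 400.0), (400.0, 500.0), (500.0, 600.0)]
```

For a constant-temperature run (start_temp equal to end_temp) every segment simply repeats that temperature.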

davidwaroquiers · 2023-11-17T14:28:09Z

Hello,

We have discussed with @gpetretto one non-critical point that still needs to be decided. I have the feeling that "MultiMDMaker" is a bit odd, and I would prefer to call this flow maker "MDMaker". The problem is that the job maker already has that name. We see four options:

1/ Keep MultiMDMaker for the flow and MDMaker for the job (i.e. as it is now)
Pros: no backward compatibility issue, no change for users, scripts or packages using the job MDMaker
Cons: the new features of the MultiMDMaker (i.e. splitting, continuing) are of course not covered by the jobs created with MDMaker. Could this be "confusing" for the user? I think the MultiMDMaker alone should be "advertised" as the one to use (consider someone who just wants to run a single MD calculation: they can start it either with the MDMaker or with a single-split MultiMDMaker, but if they then want to continue it, they cannot do so with the MDMaker).

2/ Call the flow maker an "MDMaker" and keep MDMaker for the job
Pros: no backward compatibility issue, no change for users, scripts or packages using the job MDMaker.
Cons: it could be confusing for the user to have a job maker and a flow maker with the same name. As above, the flow maker should probably be "advertised".

3/ Call the flow maker an "MDMaker" and rename MDMaker to SingleMDMaker for the job
Pros: Clear distinction between the two. Not confusing. Pushes users to make the change to the flow MDMaker which gives them more features.
Cons: backward incompatible (people using MDMaker will have to change, either to SingleMDMaker if they really want it, or to the flow MDMaker). This could also be seen as a pro, as it will push them to use the flow MDMaker.

4/ Call the flow maker an "MDMaker" and rename MDMaker to SingleMDMaker for the job, but keeping an "MDMaker" wrapper to SingleMDMaker, with a warning or deprecation (?) warning mentioning that it is better to use the flow
Pros: no backward compatibility issue, no change for users, scripts or packages using the job MDMaker (thanks to the wrapper). Pushes users (thanks to the warning message) to switch to the flow MDMaker, which gives them more features.
Cons: ?
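Option 4/ is the usual Python deprecation-shim pattern. A minimal stdlib sketch, with plain dataclasses standing in for the jobflow Makers (the fields are hypothetical placeholders): the old job-level name keeps working but emits a DeprecationWarning pointing users to the flow maker, which would live under the same name in the flows module.

```python
import warnings
from dataclasses import dataclass


@dataclass
class SingleMDMaker:
    """Stand-in for the job maker under its proposed new name."""

    name: str = "molecular dynamics"  # placeholder field
    nsteps: int = 1000  # placeholder field


@dataclass
class MDMaker(SingleMDMaker):
    """Backward-compatible alias kept in the jobs module (option 4/)."""

    def __post_init__(self) -> None:
        warnings.warn(
            "The job-level MDMaker is deprecated: use SingleMDMaker for a single run, "
            "or the flow MDMaker to split and continue MD simulations.",
            DeprecationWarning,
            stacklevel=3,  # point at the caller, past the generated __init__
        )


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = MDMaker(nsteps=500)
```

Since the alias subclasses the renamed class, existing `isinstance` checks and keyword arguments keep working unchanged.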

My personal preference is 4/ (as you may have guessed), but I am perfectly fine with any of the three other options (or yet another one). Pinging @mjwen @JaGeo @utf to get your comments/preferences/decision on this.

We would like to have this merged as soon as possible once it is ready (@gpetretto is making the latest changes and adding tests right now), but we would like to have the above decided first. We have set ourselves a deadline of next Tuesday: if by then we don't have any replies, or if this leads to too much discussion because it is too tricky, we will keep it as it is now (option 1/).

Thanks a lot!

gpetretto · 2023-11-22T13:11:57Z

I have restructured the MultiMDFlow and made a few changes:

- The number of runs is no longer determined by an int; instead, a list of MDMakers can be passed. This way the chunks do not need to be equivalent.
- The standard entry point for a user would be the from_parameters classmethod. This way one can set the standard parameters directly in the flow, without having to instantiate the input generator and the job Maker.
- By default the Trajectory is stored only with the PARTIAL option, as introduced in emmet#886 (Improve trajectory handling for MD).
- In the original MDMaker, time_step was defined as an int. There are cases where it definitely makes sense to have a float, so I changed its type.
- I added a test that includes the joining of two MultiMDMaker flows.
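With a list of makers, the flow reduces to the chaining shown in the core.py diff earlier in the review: each chunk starts from the previous chunk's structure and directory. A stdlib sketch with hypothetical callables standing in for the MDMakers (each takes a structure and the previous directory, and returns the new structure, its directory and an output uuid). Because the list can hold different callables, the chunks need not be equivalent, e.g. an equilibration run followed by production runs.

```python
from __future__ import annotations

from typing import Callable, Optional, Tuple

# hypothetical chunk signature: (structure, prev_dir) -> (structure, dir_name, uuid)
Chunk = Callable[[str, Optional[str]], Tuple[str, str, str]]


def run_chunks(structure: str, chunks: list, prev_dir: str | None = None):
    """Run the chunks in order, feeding each the previous structure and directory."""
    uuids = []
    for chunk in chunks:
        structure, prev_dir, uuid = chunk(structure, prev_dir)
        uuids.append(uuid)
    return structure, uuids


def make_toy_chunk(label: str) -> Chunk:
    """Build a toy chunk that records which directory it restarted from."""

    def chunk(structure: str, prev_dir: str | None) -> Tuple[str, str, str]:
        # append "<label>@<prev_dir>" to trace the chaining
        return f"{structure}>{label}@{prev_dir}", f"/runs/{label}", f"uuid-{label}"

    return chunk


# heterogeneous chunks: one equilibration run followed by two production runs
final, uuids = run_chunks(
    "s0", [make_toy_chunk("equil"), make_toy_chunk("prod1"), make_toy_chunk("prod2")]
)
```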

Tests will thus require the latest version of emmet-core to pass. I'm not sure whether I should bump the required version in this PR.

As David mentioned above, as far as we are concerned this is ready to be reviewed and it would be convenient for us if it could be merged.

codecov · 2023-11-29T09:53:19Z

Codecov Report

Merging #489 (ff8d58d) into main (0c8eff8) will increase coverage by 0.03%.
Report is 19 commits behind head on main.
The diff coverage is 89.65%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #489      +/-   ##
==========================================
+ Coverage   76.00%   76.03%   +0.03%     
==========================================
  Files          83       86       +3     
  Lines        7027     7111      +84     
  Branches     1045     1055      +10     
==========================================
+ Hits         5341     5407      +66     
- Misses       1364     1381      +17     
- Partials      322      323       +1
```

| File | Coverage Δ | |
|---|---|---|
| src/atomate2/vasp/flows/core.py | `89.51% <100.00%> (ø)` | |
| src/atomate2/vasp/jobs/core.py | `89.41% <ø> (-0.81%)` | ⬇️ |
| src/atomate2/vasp/schemas/md.py | `100.00% <100.00%> (ø)` | |
| src/atomate2/vasp/sets/base.py | `74.31% <100.00%> (-0.07%)` | ⬇️ |
| src/atomate2/vasp/sets/core.py | `81.48% <100.00%> (ø)` | |
| src/atomate2/vasp/jobs/md.py | `85.71% <85.71%> (ø)` | |
| src/atomate2/vasp/flows/md.py | `89.36% <89.36%> (ø)` | |

... and 8 files with indirect coverage changes

davidwaroquiers · 2023-12-05T14:45:21Z

Hi @utf

I just wanted to know whether you've had time to look into this PR and whether it can be merged. I've started implementing an MLFF-based MD flow maker and would like to reuse the MultiMDMaker as one piece of it, so it would be nice to have it merged once you're happy with it.

If there is anything left to be done for this PR, just let us know. (You may also want to comment on, or give your preference about, the MultiMDMaker name discussed in my comment above; otherwise we will keep it as it is now.)

Thanks a lot!

Best,

davidwaroquiers · 2023-12-18T15:30:14Z

Just following up on this PR, as I'd like to work on the MLFF-based workflow next week. @janosh or @utf, do you think anything here should be finalized/modified/improved?
Thanks!

janosh

@davidwaroquiers I left a few minor comments.

src/atomate2/vasp/jobs/md.py

src/atomate2/vasp/flows/md.py

tests/vasp/jobs/test_md.py

gpetretto · 2023-12-19T08:53:57Z

Thanks for your comments @janosh. I updated the code and tests.

utf · 2024-01-08T18:01:14Z

Thanks @gpetretto and @davidwaroquiers!

gpetretto added 3 commits August 25, 2023 17:15

initial implementation of a multi step MD flow

9a3dc7d

Merge remote-tracking branch 'upstream/main' into mdflow

5a44863

update md output document

d11af68

mjwen reviewed Oct 6, 2023


janosh force-pushed the main branch from 6ceb1ec to 3764841 on October 27, 2023 00:12

gpetretto mentioned this pull request Oct 27, 2023

Add lattice velocities to Poscar materialsproject/pymatgen#3428

Merged

gpetretto added 4 commits October 28, 2023 15:15

Merge remote-tracking branch 'upstream/main' into mdflow

c73cd85

Merge remote-tracking branch 'upstream/main' into mdflow

e4d5a34

handle lattice velocities

a806f74

Merge remote-tracking branch 'upstream/main' into mdflow

433e636

Updates to MultiMDMaker and tests

c4fe745

gpetretto added 2 commits November 22, 2023 14:16

Merge remote-tracking branch 'upstream/main' into mdflow

eea84df

fix typing

3e6f827

gpetretto changed the title ~~[WIP] Multi step MD flow~~ Multi step MD flow Nov 22, 2023

Merge remote-tracking branch 'upstream/main' into mdflow

1d706be

move MD flow to separate module

e2ce587

janosh reviewed Dec 19, 2023


src/atomate2/vasp/jobs/md.py (outdated, resolved)

src/atomate2/vasp/flows/md.py (resolved)

src/atomate2/vasp/flows/md.py (outdated, resolved)

janosh reviewed Dec 19, 2023

View reviewed changes

tests/vasp/jobs/test_md.py (outdated, resolved)

fix md flows and tests

ff8d58d

utf added the enhancement Improvements to existing features label Jan 8, 2024

utf merged commit 9db1f58 into materialsproject:main Jan 8, 2024
7 checks passed

chiang-yuan mentioned this pull request Feb 22, 2024

[WIP] Support general ASE Calculator for general MLIP MD simulations #738

Closed

10 tasks
