How to recreate/find MP ids in your hdf5 train/val splits #502

DeNeutoy · 2024-07-03T21:19:21Z

Hello! The information you provided in #297 about the validation energy MAEs was very helpful. Could you help clarify one more thing?

Is it possible to recreate the HDF5 data you provided as your train/val split? I want to retrieve the mp-ids for each batch, but this looks difficult because:

- The HDF5 files contain no reference to the .extxyz file each entry came from.
- The dataset provided as training data in the MACE MP repository has no train/val split, so I assume you used the functionality in your pre-processing script to create a random split.

I would like to recreate your dataset exactly because we have computed some dispersion corrections on the original MPTraj dataset, and I would like to use your exact data split to make the comparison between the models fairer. Do you see a way to make this possible?

If not, do you think this will matter, or have you observed low variance across different randomly held-out splits of MPTraj? I also couldn't tell from your pre-processing code whether you hold out entire MPTraj trajectories for your validation set, or whether you treat each trajectory point as independent when splitting the dataset.

ilyes319 (Maintainer) · 2024-07-04T13:37:35Z

Hey,

To recreate the files, you need to use the pre-processing script (see the docs). All the points of the trajectories are treated as independent. To get the ids, I think you should redo the split by hand and then process each file (train and valid) independently. I don't think the training is sensitive to the split.
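
For readers who want to try the "redo the split by hand" suggestion, here is a minimal sketch of a seeded, frame-level split that keeps track of the ids. It does not reproduce the split used for the published HDF5 files; it only yields a new split whose membership is known. The function name `split_by_hand`, the id format `mp-<n>-<frame>`, the validation fraction, and the seed are all assumptions made for illustration.

```python
import random


def split_by_hand(frame_ids, valid_fraction=0.1, seed=0):
    """Shuffle frame ids with a fixed seed and cut off a validation share.

    Each frame is treated as an independent sample, so frames from the
    same trajectory can end up on both sides of the split.
    """
    if not 0.0 < valid_fraction < 1.0:
        raise ValueError("valid_fraction must be strictly between 0 and 1")
    ids = sorted(frame_ids)           # fixed starting order, whatever the input order
    random.Random(seed).shuffle(ids)  # reproducible for a given seed
    n_valid = max(1, int(len(ids) * valid_fraction))
    return ids[n_valid:], ids[:n_valid]


# Hypothetical ids of the form "<mp-id>-<frame index>"; the real labels depend
# on how the ids are stored in your copy of MPTraj.
frame_ids = [f"mp-{m}-{i}" for m in (1, 2, 3, 4, 5) for i in range(4)]

train_ids, valid_ids = split_by_hand(frame_ids, valid_fraction=0.2, seed=123)
print(len(train_ids), "training frames,", len(valid_ids), "validation frames")
```

With the two id lists in hand, the corresponding frames can be written to separate train and valid files and each file pre-processed on its own, so that every entry in the resulting HDF5 files can be traced back through the saved lists.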
