Merge pull request #409 from openforcefield/nagl2-training-opt
NAGL2 training optimizations part 1
amcisaac authored Nov 21, 2024
2 parents 7169c27 + fedd8d3 commit 4cf9b44
Showing 10 changed files with 56,856 additions and 3 deletions.
3 changes: 0 additions & 3 deletions .gitattributes
@@ -88,6 +88,3 @@
*.zst filter=lfs diff=lfs merge=lfs -text
*.bz filter=lfs diff=lfs merge=lfs -text
*bz2 filter=lfs diff=lfs merge=lfs -text
/mnt/storage/nobackup/nca121/qca-dataset-submission/submissions/2024-10-11-MLPepper-RECAP-Optimized-Fragments-Add-Iodines-v1.0/esp_50k_I_singlepoint_dataset.json.bz2 filter=lfs diff=lfs merge=lfs -text
/mnt/storage/nobackup/nca121/qca-dataset-submission/submissions/2024-10-11-MLPepper-RECAP-Optimized-Fragments-Add-Iodines-v1.0/dataset.pdf filter=lfs diff=lfs merge=lfs -text
/mnt/storage/nobackup/nca121/qca-dataset-submission/submissions/2024-10-11-MLPepper-RECAP-Optimized-Fragments-Add-Iodines-v1.0/iodine_filtered.json.bz2 filter=lfs diff=lfs merge=lfs -text
1 change: 1 addition & 0 deletions README.md
@@ -295,6 +295,7 @@ These are currently used to find a minimum energy conformation of a molecule.
| `OpenFF Sulfur Optimization Training Coverage Supplement v1.0` | [2024-09-11-OpenFF-Sulfur-Optimization-Training-Coverage-Supplement-v1.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-09-11-OpenFF-Sulfur-Optimization-Training-Coverage-Supplement-v1.0) | Additional optimization training data for Sage sulfur and phosphorus parameters | C, S, F, O, H, Cl, Br, P, N | |
| `OpenFF Sulfur Optimization Benchmarking Coverage Supplement v1.0` | [2024-09-18-OpenFF-Sulfur-Optimization-Benchmarking-Coverage-Supplement-v1.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-09-18-OpenFF-Sulfur-Optimization-Benchmarking-Coverage-Supplement-v1.0) | Additional optimization benchmarking data for Sage sulfur and phosphorus parameters | S, P, Cl, C, N, O, H, Br, F | |
| `OpenFF Lipid Optimization Training Supplement v1.0` | [2024-10-08-OpenFF-Lipid-Optimization-Training-Supplement-v1.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-10-08-OpenFF-Lipid-Optimization-Training-Supplement-v1.0) | Additional optimization training data for Sage from representative LIPID MAPS fragments | I, Br, O, H, P, C, N, Cl, F, S | |
| `OpenFF NAGL2 Training Optimization Dataset Part 1 v4.0` | [2024-11-19-OpenFF-NAGL2-Training-Optimization-Dataset-Part-1-v4.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-11-19-OpenFF-NAGL2-Training-Optimization-Dataset-Part-1-v4.0) | Optimization dataset for NAGL2 training, part 1 | Cl, O, C, P, I, Br, B, S, N, F, H, Si | |
# TorsionDrive Datasets
These are currently used to perform a complete rotation of one or more selected bonds, where optimizations are performed over a discrete set of angles.
@@ -0,0 +1,58 @@
# OpenFF NAGL2 Training Optimization Dataset Part 1 v4.0

## Description
A dataset containing molecules from the [`MLPepper RECAP Optimized Fragments v1.0`](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-07-26-MLPepper-RECAP-Optimized-Fragments-v1.0)
and [`MLPepper RECAP Optimized Fragments v1.0 Add Iodines`](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-10-11-MLPepper-RECAP-Optimized-Fragments-Add-Iodines-v1.0) datasets,
with new conformers generated and then optimized at the OpenFF default level of theory (B3LYP-D3BJ/DZVP).
The dataset is intended to be used for calculating single point energies and properties,
which will then be used to train our second-generation graph neural network charge model (NAGL2).
This is part 1, for molecules with molecular weight less than 300 Da.


For each molecule, a set of up to 5 conformers was generated by:

* generating a set of up to 1000 conformers with an RMS cutoff of 0.1 Å
using the OpenEye backend of the OpenFF toolkit

* applying ELF conformer selection (max 5 conformers) using OpenEye
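The two steps above can be sketched with the OpenFF Toolkit as follows. This is a minimal illustration, not the code from `generate-dataset-part1.ipynb`: the function name and constants are hypothetical, the input is assumed to be a plain SMILES string, and the ELF `percentage` is left at the toolkit default because the README does not state it. Running it requires the OpenFF Toolkit and a licensed OpenEye installation.

```python
# Values taken from the description above.
MAX_GENERATED = 1000        # upper bound on initially generated conformers
RMS_CUTOFF_ANGSTROM = 0.1   # RMS pruning cutoff, in angstrom
MAX_SELECTED = 5            # upper bound on conformers kept by ELF selection


def generate_elf_conformers(smiles):
    """Hypothetical helper: generate conformers, then apply ELF selection."""
    # Imported lazily so the module can be loaded without the toolkit installed.
    from openff.toolkit import Molecule
    from openff.toolkit.utils import OpenEyeToolkitWrapper
    from openff.units import unit

    openeye = OpenEyeToolkitWrapper()
    # Assumption: undefined stereochemistry is tolerated for this illustration.
    molecule = Molecule.from_smiles(smiles, allow_undefined_stereo=True)

    # Step 1: up to 1000 conformers, pruned with a 0.1 angstrom RMS cutoff.
    molecule.generate_conformers(
        n_conformers=MAX_GENERATED,
        rms_cutoff=RMS_CUTOFF_ANGSTROM * unit.angstrom,
        toolkit_registry=openeye,
    )

    # Step 2: ELF conformer selection, keeping at most 5 conformers.
    molecule.apply_elf_conformer_selection(
        limit=MAX_SELECTED,
        toolkit_registry=openeye,
    )
    return molecule
```

Calling `generate_elf_conformers("CCO")` would return a molecule whose `conformers` list holds at most 5 entries.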


## General information
* Date: 2024-11-19
* Class: OpenFF Optimization Dataset
* Purpose: Conformer optimization
* Name: OpenFF NAGL2 Training Optimization Dataset Part 1 v4.0
* Number of unique molecules: 55134
* Number of conformers: 131198
* Number of conformers per molecule (min, mean, max): 1.00, 2.38, 5.00
* Molecular weight (min, mean, max): 32.12, 158.53, 299.97
* Charges: -4.0 -3.0 -2.0 -1.0 0.0 1.0 2.0 3.0
* Dataset submitter: Alexandra McIsaac
* Dataset generator: Alexandra McIsaac
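As a quick consistency check on the figures above, the reported mean of 2.38 conformers per molecule follows from the two totals. The variable names below are illustrative only.

```python
# Totals copied from the "General information" list.
n_molecules = 55134
n_conformers = 131198

# Mean number of conformers per unique molecule.
mean_conformers = n_conformers / n_molecules
print(f"{mean_conformers:.2f}")  # 2.38
```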

## QCSubmit generation pipeline
* `generate-dataset-part1.ipynb` was used to generate conformers from CMILES and create the dataset.

## QCSubmit Manifest
* `dataset_part1.json.bz2`: compressed dataset ready for submission
* `dataset_part1.pdf`: Visualization of dataset molecules
* `dataset_part1.smi`: SMILES strings for dataset molecules
* `generate-dataset-part1.ipynb`: Notebook describing dataset generation and submission
* `input-environment.yaml`: Environment file used to create Python environment for the notebook
* `input-environment-full.yaml`: Fully-resolved environment used to execute the notebook.
* `mlpepper.json.bz2`: bzip2-compressed copy of the MLPepper dataset needed to generate conformers
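The `.json.bz2` files in the manifest can be inspected with the Python standard library alone. The helper below is a hypothetical sketch that only decompresses and parses the JSON; it makes no assumption about the keys inside the file, and it is not the loading path used by QCSubmit.

```python
import bz2
import json


def load_bz2_json(path):
    """Read a bzip2-compressed JSON file and return the parsed object."""
    # "rt" opens the compressed stream in text mode so json can read it directly.
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `load_bz2_json("dataset_part1.json.bz2")` would return the parsed top-level JSON object once the Git LFS file has been fetched.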

## Metadata
* Elements: {Cl, O, C, P, I, Br, B, S, N, F, H, Si}
* Spec: default
  * basis: DZVP
  * implicit_solvent: None
  * keywords: {}
  * maxiter: 200
  * method: B3LYP-D3BJ
  * program: psi4
  * SCF properties:
    * dipole
    * quadrupole
    * wiberg_lowdin_indices
    * mayer_indices
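
For scripting against this metadata, the `default` spec can be mirrored as a plain Python dictionary. This is an illustrative representation only: the dictionary layout, key names, and the `level_of_theory` helper are assumptions and do not correspond to a QCSubmit or QCFractal object.

```python
# Plain-dict mirror of the "default" spec listed above (not a QCSubmit object).
DEFAULT_SPEC = {
    "name": "default",
    "program": "psi4",
    "method": "B3LYP-D3BJ",
    "basis": "DZVP",
    "implicit_solvent": None,
    "keywords": {},
    "maxiter": 200,
    "scf_properties": [
        "dipole",
        "quadrupole",
        "wiberg_lowdin_indices",
        "mayer_indices",
    ],
}


def level_of_theory(spec):
    """Format the method and basis as the usual 'method/basis' string."""
    return f"{spec['method']}/{spec['basis']}"


print(level_of_theory(DEFAULT_SPEC))  # B3LYP-D3BJ/DZVP
```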