Merge pull request #409 from openforcefield/nagl2-training-opt
NAGL2 training optimizations part 1
Showing 10 changed files with 56,856 additions and 3 deletions.
58 changes: 58 additions & 0 deletions
...ons/2024-11-19-OpenFF-NAGL2-Training-Optimization-Dataset-Part-1-v4.0/README.md
@@ -0,0 +1,58 @@
# OpenFF NAGL2 Training Optimization Dataset Part 1 v4.0

## Description
A dataset containing molecules from the [`MLPepper RECAP Optimized Fragments v1.0`](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-07-26-MLPepper-RECAP-Optimized-Fragments-v1.0)
and [`MLPepper RECAP Optimized Fragments v1.0 Add Iodines`](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-10-11-MLPepper-RECAP-Optimized-Fragments-Add-Iodines-v1.0) datasets,
with newly generated conformers optimized at the OpenFF default level of theory (B3LYP-D3BJ/DZVP).
The dataset is intended to be used for calculating single point energies and properties,
which will then be used to train our second-generation graph neural network charge model (NAGL2).
This is part 1, for molecules with molecular weight less than 300 Da.

For each molecule, a set of up to 5 conformers was generated by:

* generating a set of up to 1000 conformers with an RMS cutoff of 0.1 Å
  using the OpenEye backend of the OpenFF Toolkit
* applying ELF conformer selection (max 5 conformers) using OpenEye

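The two conformer steps above can be sketched with the OpenFF Toolkit roughly as follows. This is a minimal illustration, not the code from `generate-dataset-part1.ipynb`: the function name `generate_elf_conformers` and the constant names are hypothetical, the ELF selection uses the toolkit's default percentage, and an installed OpenFF Toolkit with a licensed OpenEye backend is assumed.

```python
# Values taken from the description above; the names are hypothetical.
MAX_INITIAL_CONFORMERS = 1000  # size of the initial conformer pool
RMS_CUTOFF_ANGSTROM = 0.1      # RMS cutoff applied to the initial pool
MAX_ELF_CONFORMERS = 5         # conformers kept after ELF selection


def generate_elf_conformers(smiles: str):
    """Return an OpenFF Molecule carrying up to MAX_ELF_CONFORMERS conformers."""
    # Imported lazily so this sketch can be loaded without the toolkit installed.
    from openff.toolkit import Molecule
    from openff.toolkit.utils import OpenEyeToolkitWrapper
    from openff.units import unit

    openeye = OpenEyeToolkitWrapper()  # assumption: a valid OpenEye license is available
    molecule = Molecule.from_smiles(smiles)

    # Step 1: generate the initial pool of conformers with the OpenEye backend.
    molecule.generate_conformers(
        n_conformers=MAX_INITIAL_CONFORMERS,
        rms_cutoff=RMS_CUTOFF_ANGSTROM * unit.angstrom,
        toolkit_registry=openeye,
    )

    # Step 2: keep at most MAX_ELF_CONFORMERS conformers via ELF selection.
    molecule.apply_elf_conformer_selection(
        limit=MAX_ELF_CONFORMERS,
        toolkit_registry=openeye,
    )
    return molecule
```

Calling `generate_elf_conformers("CCO").n_conformers` would then report how many conformers survived selection for that molecule.
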
## General information
* Date: 2024-11-19
* Class: OpenFF Optimization Dataset
* Purpose: Conformer optimization
* Name: OpenFF NAGL2 Training Optimization Dataset Part 1 v4.0
* Number of unique molecules: 55134
* Number of conformers: 131198
* Number of conformers per molecule (min, mean, max): 1.00, 2.38, 5.00
* Molecular weight in Da (min, mean, max): 32.12, 158.53, 299.97
* Charges: -4.0 -3.0 -2.0 -1.0 0.0 1.0 2.0 3.0
* Dataset submitter: Alexandra McIsaac
* Dataset generator: Alexandra McIsaac

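As a quick consistency check on the statistics above, the mean number of conformers per molecule should equal the total conformer count divided by the number of unique molecules. The variable names below are illustrative only.

```python
# Counts copied from the "General information" list above.
n_molecules = 55134
n_conformers = 131198

# Mean conformers per molecule; this should match the reported mean of 2.38.
mean_conformers = n_conformers / n_molecules
print(f"{mean_conformers:.2f}")
```
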
## QCSubmit generation pipeline
* `generate-dataset-part1.ipynb` was used to generate conformers from CMILES and create the dataset.

## QCSubmit Manifest
* `dataset_part1.json.bz2`: Compressed dataset ready for submission
* `dataset_part1.pdf`: Visualization of dataset molecules
* `dataset_part1.smi`: SMILES strings for dataset molecules
* `generate-dataset-part1.ipynb`: Notebook describing dataset generation and submission
* `input-environment.yaml`: Environment file used to create the Python environment for the notebook
* `input-environment-full.yaml`: Fully resolved environment used to execute the notebook
* `mlpepper.json.bz2`: Compressed version of the MLPepper dataset needed to generate conformers

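The `.json.bz2` files listed above can be inspected with the Python standard library alone. The sketch below assumes, based only on the file extension, that each file is bzip2-compressed JSON text; the helper name `load_bz2_json` is hypothetical, and nothing is assumed about the layout of the JSON inside.

```python
import bz2
import json
from pathlib import Path


def load_bz2_json(path):
    """Decompress a bzip2 file and parse its contents as JSON."""
    # Open in text mode so json.load receives decoded characters.
    with bz2.open(Path(path), mode="rt", encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `load_bz2_json("dataset_part1.json.bz2")` would return the parsed dataset as ordinary Python objects, which can be explored before loading it with QCSubmit.
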
## Metadata
* Elements: {Cl, O, C, P, I, Br, B, S, N, F, H, Si}
* Spec: default
  * basis: DZVP
  * implicit_solvent: None
  * keywords: {}
  * maxiter: 200
  * method: B3LYP-D3BJ
  * program: psi4
  * SCF properties:
    * dipole
    * quadrupole
    * wiberg_lowdin_indices
    * mayer_indices
3 changes: 3 additions & 0 deletions
.../2024-11-19-OpenFF-NAGL2-Training-Optimization-Dataset-Part-1-v4.0/dataset_part1.json.bz2
Git LFS file not shown
Binary file added (+15.7 MB)
...sions/2024-11-19-OpenFF-NAGL2-Training-Optimization-Dataset-Part-1-v4.0/dataset_part1.pdf
Binary file not shown.