Merge pull request #410 from openforcefield/nagl2-training-opt-p2
Nagl2 training opt p2
Showing 9 changed files with 2,690 additions and 0 deletions.
57 changes: 57 additions & 0 deletions
...ons/2024-11-19-OpenFF-NAGL2-Training-Optimization-Dataset-Part-2-v4.0/README.md
@@ -0,0 +1,57 @@
# OpenFF NAGL2 Training Optimization Dataset Part 2 v4.0

## Description
A dataset containing molecules from the [`MLPepper RECAP Optimized Fragments v1.0`](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-07-26-MLPepper-RECAP-Optimized-Fragments-v1.0)
and [`MLPepper RECAP Optimized Fragments v1.0 Add Iodines`](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-10-11-MLPepper-RECAP-Optimized-Fragments-Add-Iodines-v1.0) datasets,
with additional conformers, optimized at the OpenFF default level of theory (B3LYP-D3BJ/DZVP).
The dataset is intended to be used for calculating single-point energies and properties,
which will then be used to train our second-generation graph neural network charge model (NAGL2).
This is part 2, for molecules with molecular weight greater than 300 Da.

For each molecule, a set of up to 5 conformers was generated by:

* generating a set of up to 1000 conformers with an RMS cutoff of 0.1 Å
  using the OpenEye backend of the OpenFF toolkit

* applying ELF conformer selection (max 5 conformers) using OpenEye

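
The two-step procedure above could be sketched with the OpenFF toolkit roughly as follows. This is a minimal sketch, not the code from `generate-dataset-part2.ipynb`: the helper name `generate_elf_conformers` is hypothetical, the ELF `percentage` is left at the toolkit default because the README does not state it, and running the function assumes a licensed OpenEye installation.

```python
# Parameters taken from the description above.
MAX_GENERATED = 1000         # size of the initial conformer pool
RMS_CUTOFF_ANGSTROM = 0.1    # RMS pruning cutoff, in angstrom
MAX_SELECTED = 5             # conformers kept after ELF selection


def generate_elf_conformers(smiles: str):
    """Return an OpenFF Molecule holding up to MAX_SELECTED ELF conformers.

    Hypothetical helper; the notebook in this submission may differ.
    """
    # Imported lazily so this module can be loaded without the toolkit.
    from openff.toolkit import Molecule
    from openff.toolkit.utils import OpenEyeToolkitWrapper
    from openff.units import unit

    openeye = OpenEyeToolkitWrapper()  # raises if OpenEye is unavailable

    molecule = Molecule.from_smiles(smiles, toolkit_registry=openeye)

    # Step 1: generate a pool of up to 1000 conformers, pruned by RMS.
    molecule.generate_conformers(
        n_conformers=MAX_GENERATED,
        rms_cutoff=RMS_CUTOFF_ANGSTROM * unit.angstrom,
        toolkit_registry=openeye,
    )

    # Step 2: ELF conformer selection, keeping at most 5 conformers.
    # `percentage` is left at the toolkit default (an assumption).
    molecule.apply_elf_conformer_selection(
        limit=MAX_SELECTED,
        toolkit_registry=openeye,
    )
    return molecule
```

Passing the OpenEye wrapper explicitly to each call keeps both steps on the same backend, which matches the description; with the global registry, another installed toolkit could be picked instead.
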
## General information
* Date: 2024-11-19
* Class: OpenFF Optimization Dataset
* Purpose: Conformer optimization
* Name: OpenFF NAGL2 Training Optimization Dataset Part 2 v4.0
* Number of unique molecules: 1197
* Number of conformers: 2323
* Number of conformers (min, mean, max): 1.00, 1.94, 5.00
* Molecular weight in Da (min, mean, max): 300.08, 377.82, 701.59
* Charges: -4.0 -2.0 -1.0 0.0 1.0 2.0
* Dataset submitter: Alexandra McIsaac
* Dataset generator: Alexandra McIsaac

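
The conformer counts above can be cross-checked against each other with a few lines of arithmetic. The snippet below is only an illustrative consistency check; the function name `check_conformer_stats` is hypothetical and not part of the submission.

```python
def check_conformer_stats(n_molecules, n_conformers, min_per_mol, max_per_mol):
    """Return the mean conformers per molecule after sanity-checking bounds."""
    # The total must lie between the all-minimum and all-maximum cases.
    assert n_molecules * min_per_mol <= n_conformers <= n_molecules * max_per_mol
    return n_conformers / n_molecules


# Values copied from the "General information" list.
mean = check_conformer_stats(
    n_molecules=1197, n_conformers=2323, min_per_mol=1, max_per_mol=5
)
print(f"mean conformers per molecule: {mean:.2f}")  # prints 1.94
```
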
## QCSubmit generation pipeline
* `generate-dataset-part2.ipynb` was used to generate conformers from CMILES and create the dataset.

## QCSubmit Manifest
* `dataset_part2.json.bz2`: Compressed dataset ready for submission
* `dataset_part2.pdf`: Visualization of dataset molecules
* `dataset_part2.smi`: SMILES strings for dataset molecules
* `generate-dataset-part2.ipynb`: Notebook describing dataset generation and submission
* `input-environment.yaml`: Environment file used to create the Python environment for the notebook
* `input-environment-full.yaml`: Fully resolved environment used to execute the notebook
* `mlpepper.json.bz2`: bzip2-compressed version of the MLPepper dataset that can be read in for quicker conformer generation

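
The `.json.bz2` files in the manifest can be inspected without QCSubmit by using only the Python standard library. The sketch below assumes each file is bzip2-compressed UTF-8 JSON, as the file extensions suggest; the helper name `load_bz2_json` is hypothetical, and the README does not describe the internal layout of the JSON.

```python
import bz2
import json


def load_bz2_json(path):
    """Decompress a bzip2 file and parse its contents as JSON."""
    # Text mode lets json.load read the decompressed stream directly.
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `load_bz2_json("dataset_part2.json.bz2")` would return the parsed content as ordinary Python objects, which can then be explored interactively.
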
## Metadata
* Elements: {Si, B, O, I, S, Cl, N, H, C, P, F, Br}
* Spec: default
  * basis: DZVP
  * implicit_solvent: None
  * keywords: {}
  * maxiter: 200
  * method: B3LYP-D3BJ
  * program: psi4
  * SCF properties:
    * dipole
    * quadrupole
    * wiberg_lowdin_indices
    * mayer_indices
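
For scripts that need to compare or log this specification, the `default` spec above can be mirrored as a small immutable record. This is an illustrative sketch only: the `QCSpec` class is hypothetical and is not the QCSubmit data model, and its field values are copied from the metadata list.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class QCSpec:
    """Plain-Python mirror of the `default` spec (illustrative only)."""

    method: str = "B3LYP-D3BJ"
    basis: str = "DZVP"
    program: str = "psi4"
    maxiter: int = 200
    implicit_solvent: Optional[str] = None
    scf_properties: Tuple[str, ...] = (
        "dipole",
        "quadrupole",
        "wiberg_lowdin_indices",
        "mayer_indices",
    )

    def level_of_theory(self) -> str:
        """Return the conventional method/basis label."""
        return f"{self.method}/{self.basis}"


spec = QCSpec()
print(spec.level_of_theory())  # prints B3LYP-D3BJ/DZVP
```
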
3 changes: 3 additions & 0 deletions
.../2024-11-19-OpenFF-NAGL2-Training-Optimization-Dataset-Part-2-v4.0/dataset_part2.json.bz2
Git LFS file not shown
Binary file added (BIN +382 KB)
...sions/2024-11-19-OpenFF-NAGL2-Training-Optimization-Dataset-Part-2-v4.0/dataset_part2.pdf
Binary file not shown.