Merge pull request #410 from openforcefield/nagl2-training-opt-p2
Nagl2 training opt p2
amcisaac authored Nov 21, 2024
2 parents 4cf9b44 + 420966b commit e54e981
Showing 9 changed files with 2,690 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -296,6 +296,7 @@ These are currently used to find a minimum energy conformation of a molecule.
| `OpenFF Sulfur Optimization Benchmarking Coverage Supplement v1.0` | [2024-09-18-OpenFF-Sulfur-Optimization-Benchmarking-Coverage-Supplement-v1.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-09-18-OpenFF-Sulfur-Optimization-Benchmarking-Coverage-Supplement-v1.0) | Additional optimization benchmarking data for Sage sulfur and phosphorus parameters | S, P, Cl, C, N, O, H, Br, F | |
| `OpenFF Lipid Optimization Training Supplement v1.0` | [2024-10-08-OpenFF-Lipid-Optimization-Training-Supplement-v1.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-10-08-OpenFF-Lipid-Optimization-Training-Supplement-v1.0) | Additional optimization training data for Sage from representative LIPID MAPS fragments | I, Br, O, H, P, C, N, Cl, F, S | |
| `OpenFF NAGL2 Training Optimization Dataset Part 1 v4.0` | [2024-11-19-OpenFF-NAGL2-Training-Optimization-Dataset-Part-1-v4.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-11-19-OpenFF-NAGL2-Training-Optimization-Dataset-Part-1-v4.0) | Optimization dataset for NAGL2 training, part 1 | Cl, O, C, P, I, Br, B, S, N, F, H, Si | |
| `OpenFF NAGL2 Training Optimization Dataset Part 2 v4.0` | [2024-11-19-OpenFF-NAGL2-Training-Optimization-Dataset-Part-2-v4.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-11-19-OpenFF-NAGL2-Training-Optimization-Dataset-Part-2-v4.0) | Optimization dataset for NAGL2 training, part 2 | Si, B, O, I, S, Cl, N, H, C, P, F, Br | |
# TorsionDrive Datasets
These are currently used to perform a complete rotation of one or more selected bonds, where optimizations are performed over a discrete set of angles.
@@ -0,0 +1,57 @@
# OpenFF NAGL2 Training Optimization Dataset Part 2 v4.0

## Description
A dataset containing molecules from the [`MLPepper RECAP Optimized Fragments v1.0`](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-07-26-MLPepper-RECAP-Optimized-Fragments-v1.0)
and [`MLPepper RECAP Optimized Fragments v1.0 Add Iodines`](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-10-11-MLPepper-RECAP-Optimized-Fragments-Add-Iodines-v1.0) datasets,
expanded with additional conformers and optimized at the OpenFF default level of theory (B3LYP-D3BJ/DZVP).
The optimized geometries are intended as inputs for single-point energy and property calculations,
whose results will then be used to train our second-generation graph neural network charge model (NAGL2).
This is part 2, for molecules with molecular weight greater than 300 Da.
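The split between the two parts is a simple molecular-weight threshold. As a minimal sketch, assuming molecules are described by hypothetical `{element: count}` mappings and that part 1 holds everything at or below 300 Da, the assignment could look like this. The atomic weights are IUPAC conventional values; the generation notebook may compute molecular weight differently (for example through the OpenFF toolkit).

```python
from typing import Dict

# Conventional standard atomic weights (g/mol) for the elements in this dataset.
ATOMIC_WEIGHTS = {
    "H": 1.008, "B": 10.81, "C": 12.011, "N": 14.007, "O": 15.999,
    "F": 18.998, "Si": 28.085, "P": 30.974, "S": 32.06, "Cl": 35.45,
    "Br": 79.904, "I": 126.904,
}

PART2_MIN_MW = 300.0  # Da; part 2 holds molecules heavier than this


def molecular_weight(element_counts: Dict[str, int]) -> float:
    """Sum atomic weights over a hypothetical {element: count} mapping."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in element_counts.items())


def assign_part(element_counts: Dict[str, int]) -> int:
    """Return 2 for molecules above 300 Da, else 1 (assumed rule for part 1)."""
    return 2 if molecular_weight(element_counts) > PART2_MIN_MW else 1
```

For example, caffeine (C8H10N4O2, about 194.19 Da) would fall in part 1, while a C20H25N3O molecule (about 323.44 Da) would fall in part 2.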

For each molecule, a set of up to 5 conformers was generated by:

* generating a set of up to 1000 conformers with an RMS cutoff of 0.1 Å using the OpenEye backend of the OpenFF toolkit

* applying ELF (Electrostatically Least-interacting Functional groups) conformer selection (max 5 conformers) using OpenEye
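The two conformer steps above can be illustrated with a simplified, dependency-free sketch. It is not the OpenEye implementation used for this dataset: the RMSD assumes the conformers are already superimposed, the ELF-style energy is a plain Coulomb sum over absolute partial charges for all atom pairs, and the 2% `percentage` threshold and the greedy max-min diversity pick are assumptions. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A conformer is a sequence of (x, y, z) coordinates in Å, one per atom.
Coords = Sequence[Tuple[float, float, float]]


def rmsd(a: Coords, b: Coords) -> float:
    """Plain coordinate RMSD; assumes both conformers are already superimposed."""
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(a, b))
    return math.sqrt(sq / len(a))


def prune_by_rms(conformers: List[Coords], cutoff: float = 0.1,
                 max_confs: int = 1000) -> List[Coords]:
    """Greedy RMS pruning: keep a conformer only if it lies more than
    `cutoff` Å from every conformer kept so far, up to `max_confs`."""
    kept: List[Coords] = []
    for conf in conformers:
        if len(kept) >= max_confs:
            break
        if all(rmsd(conf, k) > cutoff for k in kept):
            kept.append(conf)
    return kept


def elf_energy(coords: Coords, charges: Sequence[float]) -> float:
    """Simplified ELF-style energy: Coulomb sum over all atom pairs using
    absolute partial charges (arbitrary units, no pair exclusions)."""
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r = math.dist(coords[i], coords[j])
            energy += abs(charges[i]) * abs(charges[j]) / r
    return energy


def elf_like_select(conformers: List[Coords], energies: Sequence[float],
                    percentage: float = 2.0, limit: int = 5) -> List[Coords]:
    """Keep the lowest-energy `percentage`% of conformers (at least one), then
    greedily add the pool member farthest from those already selected."""
    n_keep = max(1, int(len(conformers) * percentage / 100))
    order = sorted(range(len(conformers)), key=lambda i: energies[i])
    pool = [conformers[i] for i in order[:n_keep]]

    selected = [0]  # start from the lowest-energy conformer
    remaining = list(range(1, len(pool)))
    while remaining and len(selected) < limit:
        # max-min diversity: maximize the RMSD to the nearest selected conformer
        best = max(remaining,
                   key=lambda i: min(rmsd(pool[i], pool[j]) for j in selected))
        selected.append(best)
        remaining.remove(best)
    return [pool[i] for i in selected]
```

The energy filter favours conformers with weak intramolecular electrostatic contacts, and the diversity pick keeps the final (at most 5) conformers from collapsing onto near-duplicates.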

## General information
* Date: 2024-11-19
* Class: OpenFF Optimization Dataset
* Purpose: Conformer optimization
* Name: OpenFF NAGL2 Training Optimization Dataset Part 2 v4.0
* Number of unique molecules: 1197
* Number of conformers: 2323
* Number of conformers per molecule (min, mean, max): 1.00, 1.94, 5.00
* Molecular weight (min, mean, max): 300.08, 377.82, 701.59
* Charges: -4.0, -2.0, -1.0, 0.0, 1.0, 2.0
* Dataset submitter: Alexandra McIsaac
* Dataset generator: Alexandra McIsaac
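The per-molecule conformer statistics follow directly from the conformer counts; the mean is the total number of conformers divided by the number of unique molecules (2323 / 1197 ≈ 1.94). A minimal sketch of that arithmetic, with a hypothetical helper name and made-up example counts:

```python
from typing import Dict, List


def conformer_summary(counts: List[int]) -> Dict[str, float]:
    """Summarize per-molecule conformer counts as (min, mean, max) plus totals."""
    total = sum(counts)
    return {
        "unique_molecules": len(counts),
        "conformers": total,
        "min": min(counts),
        "mean": round(total / len(counts), 2),
        "max": max(counts),
    }


# Hypothetical counts for four molecules (not taken from this dataset).
print(conformer_summary([1, 2, 5, 1]))
# The reported mean follows from the totals in the list above.
print(round(2323 / 1197, 2))  # -> 1.94
```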

## QCSubmit generation pipeline
* `generate-dataset-part2.ipynb` was used to generate conformers from CMILES and create the dataset.

## QCSubmit Manifest
* `dataset_part2.json.bz2`: Compressed dataset ready for submission
* `dataset_part2.pdf`: Visualization of dataset molecules
* `dataset_part2.smi`: SMILES strings for dataset molecules
* `generate-dataset-part2.ipynb`: Notebook describing dataset generation and submission
* `input-environment.yaml`: Environment file used to create Python environment for the notebook
* `input-environment-full.yaml`: Fully-resolved environment used to execute the notebook.
* `mlpepper.json.bz2`: bzip2-compressed copy of the MLPepper dataset, which can be read in for faster conformer generation
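The compressed files in the manifest can be inspected with the Python standard library alone. The sketch below assumes the `.json.bz2` file holds a single JSON object and that the `.smi` file lists one SMILES per line, optionally followed by a name; the helper names are hypothetical, and the sketch does not rebuild QCSubmit dataset objects.

```python
import bz2
import json
from pathlib import Path
from typing import List, Union


def load_dataset_json(path: Union[str, Path]) -> dict:
    """Decompress and parse a bzip2-compressed JSON file such as dataset_part2.json.bz2."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def load_smiles(path: Union[str, Path]) -> List[str]:
    """Read one SMILES per line, skipping blank lines and any trailing name column."""
    smiles: List[str] = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.split()
            if fields:
                smiles.append(fields[0])
    return smiles
```

Reading in text mode (`"rt"`) lets `json.load` consume the decompressed stream directly, so the file is never written out uncompressed.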

## Metadata
* Elements: {Si, B, O, I, S, Cl, N, H, C, P, F, Br}
* Spec: default
  * basis: DZVP
  * implicit_solvent: None
  * keywords: {}
  * maxiter: 200
  * method: B3LYP-D3BJ
  * program: psi4
  * SCF properties:
    * dipole
    * quadrupole
    * wiberg_lowdin_indices
    * mayer_indices