OpenFF Organometallics Exploratory Optimization Dataset #413

Merged 4 commits on Dec 17, 2024
1 change: 1 addition & 0 deletions README.md
@@ -297,6 +297,7 @@ These are currently used to find a minimum energy conformation of a molecule.
| `OpenFF Lipid Optimization Training Supplement v1.0` | [2024-10-08-OpenFF-Lipid-Optimization-Training-Supplement-v1.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-10-08-OpenFF-Lipid-Optimization-Training-Supplement-v1.0) | Additional optimization training data for Sage from representative LIPID MAPS fragments | I, Br, O, H, P, C, N, Cl, F, S | |
| `OpenFF NAGL2 Training Optimization Dataset Part 1 v4.0` | [2024-11-19-OpenFF-NAGL2-Training-Optimization-Dataset-Part-1-v4.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-11-19-OpenFF-NAGL2-Training-Optimization-Dataset-Part-1-v4.0) | Optimization dataset for NAGL2 training, part 1 | Cl, O, C, P, I, Br, B, S, N, F, H, Si | |
| `OpenFF NAGL2 Training Optimization Dataset Part 2 v4.0` | [2024-11-19-OpenFF-NAGL2-Training-Optimization-Dataset-Part-2-v4.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-11-19-OpenFF-NAGL2-Training-Optimization-Dataset-Part-2-v4.0) | Optimization dataset for NAGL2 training, part 2 | Si, B, O, I, S, Cl, N, H, C, P, F, Br | |
| `OpenFF Organometallics Exploratory Optimization Dataset` | [2024-12-03-OpenFF-Organometallics-Exploratory-Optimization-Dataset](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-12-03-OpenFF-Organometallics-Exploratory-Optimization-Dataset) | Optimization training data for organometallic molecules | F, P, O, C, Zn, N, Ni, Pt, S, Pd, Mg, Br, Rh, Fe, H, Cl, B, Li | |
| `OpenFF NAGL2 Training Optimization Dataset v4.0` | [2024-12-09-OpenFF-NAGL2-Training-Optimization-Dataset-v4.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-12-09-OpenFF-NAGL2-Training-Optimization-Dataset-v4.0) | Optimization dataset for NAGL2 training, combined and filtered | Si, B, O, I, S, Cl, N, H, C, P, F, Br | |


@@ -0,0 +1,65 @@
# OpenFF Organometallics Exploratory Optimization Dataset

## Description

An optimization dataset created to test the OpenFF and QCArchive infrastructure
for calculations involving organometallic molecules. The molecules in this
dataset were extracted from the `OpenEye SMILES` entries in the [Chemical
Component Dictionary](https://www.wwpdb.org/data/ccd) mmCIF file. These were
filtered to remove molecules with radical electrons and to include only
molecules with the desired metal atoms: Pd, Fe, Zn, Mg, Cu, Li, Rh, Ir, Pt, Ni,
Cr, and Ag. These were further filtered to retain only molecules that have at
least 10 atoms, have an absolute charge of less than 4, and are not present in
any of our existing training data. The remaining candidates were sorted by
number of atoms, and the smallest 100 were selected. Of these, 56 were removed
because of errors in the dataset preparation process, leaving 44 molecules.
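
The filtering steps described above can be sketched in plain Python. This is only an illustration, not the code in `main.py`: the `Candidate` record, its field names, and `select_candidates` are hypothetical, and "molecules with the desired metal atoms" is assumed here to mean molecules containing at least one of the listed metals.

```python
from dataclasses import dataclass

# Metals named in the description above.
DESIRED_METALS = frozenset(
    {"Pd", "Fe", "Zn", "Mg", "Cu", "Li", "Rh", "Ir", "Pt", "Ni", "Cr", "Ag"}
)


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one CCD entry (not the structure used by main.py)."""

    inchi_key: str
    elements: frozenset
    n_atoms: int
    charge: int
    n_radical_electrons: int


def select_candidates(candidates, known_inchi_keys, limit=100):
    """Apply the filters from the description, then keep the smallest `limit`."""
    kept = [
        c
        for c in candidates
        if c.n_radical_electrons == 0            # no radical electrons
        and c.elements & DESIRED_METALS          # assumed: at least one desired metal
        and c.n_atoms >= 10                      # at least 10 atoms
        and abs(c.charge) < 4                    # absolute charge below 4
        and c.inchi_key not in known_inchi_keys  # not already in training data
    ]
    kept.sort(key=lambda c: c.n_atoms)           # smallest molecules first
    return kept[:limit]
```

Molecules that later fail during dataset preparation (56 of the 100 here) would be dropped after this step; that stage depends on the toolkit and is not modeled in the sketch.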

## General Information

* Date: 2024-12-03
* Class: OpenFF Optimization Dataset
* Purpose: Provide training data for metal-containing molecules
* Name: OpenFF Organometallics Exploratory Optimization Dataset
* Number of unique molecules: 44
* Number of filtered molecules: 55
* Number of conformers: 239
* Number of conformers per molecule (min, mean, max): 1, 5.43, 10
* Mean molecular weight: 424.38
* Max molecular weight: 741.40
* Charges: [0.0, 1.0, 2.0, 3.0]
* Dataset submitter: Brent Westbrook
* Dataset generator: Brent Westbrook

## QCSubmit Generation Pipeline

* `main.py`: This script shows how the dataset was prepared from `components.cif`, retrieved
from the CCD, and `inchis.dat`, which contains the InChI keys of our existing
training data.


## QCSubmit Manifest

* `main.py`: Script describing dataset generation and submission
* `input-environment.yaml`: Environment file used to create the Python environment for the script
* `full-environment.yaml`: Fully-resolved environment used to execute the script
* `opt.toml`: Experimental [qcaide](https://github.com/ntBre/qcaide) input file for defining
variables used throughout the QCA submission process
* `dataset.json.bz2`: Compressed dataset ready for submission
* `dataset.pdf`: Visualization of dataset molecules
* `output.smi`: SMILES strings for dataset molecules
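
To inspect `dataset.json.bz2` without unpacking it to disk first, a small helper using only the Python standard library may be enough. This is a sketch: `load_compressed_json` is a hypothetical name, and nothing is assumed about the JSON schema beyond the file being bz2-compressed JSON, as its extension suggests.

```python
import bz2
import json


def load_compressed_json(path):
    """Read a bz2-compressed JSON file and return the parsed object."""
    # "rt" opens the compressed stream in text mode so json can read it directly.
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `load_compressed_json("dataset.json.bz2")` returns the parsed JSON object. The file is stored with Git LFS, so it has to be fetched before it can be read.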

## Metadata

* Elements: {F, P, O, C, Zn, N, Ni, Pt, S, Pd, Mg, Br, Rh, Fe, H, Cl, B, Li}
* Spec: BP86/def2-TZVP
  * basis: def2-TZVP
  * implicit_solvent: None
  * keywords: {}
  * maxiter: 200
  * method: BP86
  * program: psi4
  * SCF properties:
    * dipole
    * quadrupole
    * wiberg_lowdin_indices
    * mayer_indices
Git LFS file not shown
Binary file not shown.
@@ -0,0 +1,306 @@
name: qcarchive-user-submit
channels:
- openeye
- conda-forge
dependencies:
- _libgcc_mutex=0.1=conda_forge
- _openmp_mutex=4.5=2_gnu
- ambertools=23.3=py311h9fea076_6
- annotated-types=0.6.0=pyhd8ed1ab_0
- anyio=4.2.0=pyhd8ed1ab_0
- apsw=3.46.0.0=py311h3ea06b8_0
- argcomplete=3.2.2=pyhd8ed1ab_0
- argon2-cffi=23.1.0=pyhd8ed1ab_0
- argon2-cffi-bindings=21.2.0=py311h459d7ec_4
- arpack=3.8.0=nompi_h0baa96a_101
- arrow=1.3.0=pyhd8ed1ab_0
- asttokens=2.4.1=pyhd8ed1ab_0
- astunparse=1.6.3=pyhd8ed1ab_0
- async-lru=2.0.4=pyhd8ed1ab_0
- attrs=23.2.0=pyh71513ae_0
- babel=2.14.0=pyhd8ed1ab_0
- basis_set_exchange=0.9.1=pyhd8ed1ab_0
- beautifulsoup4=4.12.3=pyha770c72_0
- bleach=6.1.0=pyhd8ed1ab_0
- blosc=1.21.5=h0f2a231_0
- brotli=1.1.0=hd590300_1
- brotli-bin=1.1.0=hd590300_1
- brotli-python=1.1.0=py311hb755f60_1
- bson=0.5.9=py_0
- bzip2=1.0.8=hd590300_5
- c-ares=1.26.0=hd590300_0
- c-blosc2=2.13.1=hb4ffafa_0
- ca-certificates=2024.8.30=hbcca054_0
- cached-property=1.5.2=hd8ed1ab_1
- cached_property=1.5.2=pyha770c72_1
- cachetools=5.3.2=pyhd8ed1ab_0
- cairo=1.18.0=h3faef2a_0
- certifi=2024.8.30=pyhd8ed1ab_0
- cffi=1.16.0=py311hb3a22ac_0
- chardet=5.2.0=py311h38be061_1
- charset-normalizer=3.3.2=pyhd8ed1ab_0
- click=8.1.7=unix_pyh707e725_0
- colorama=0.4.6=pyhd8ed1ab_0
- comm=0.2.1=pyhd8ed1ab_0
- contourpy=1.2.0=py311h9547e67_0
- cudatoolkit=11.8.0=h4ba93d1_12
- cycler=0.12.1=pyhd8ed1ab_0
- debugpy=1.8.0=py311hb755f60_1
- decorator=5.1.1=pyhd8ed1ab_0
- defusedxml=0.7.1=pyhd8ed1ab_0
- entrypoints=0.4=pyhd8ed1ab_0
- exceptiongroup=1.2.0=pyhd8ed1ab_2
- executing=2.0.1=pyhd8ed1ab_0
- expat=2.5.0=hcb278e6_1
- fftw=3.3.10=nompi_hc118613_108
- font-ttf-dejavu-sans-mono=2.37=hab24e00_0
- font-ttf-inconsolata=3.000=h77eed37_0
- font-ttf-source-code-pro=2.038=h77eed37_0
- font-ttf-ubuntu=0.83=h77eed37_1
- fontconfig=2.14.2=h14ed4e7_0
- fonts-conda-ecosystem=1=0
- fonts-conda-forge=1=0
- fonttools=4.47.2=py311h459d7ec_0
- fqdn=1.5.1=pyhd8ed1ab_0
- freetype=2.12.1=h267a509_2
- freetype-py=2.3.0=pyhd8ed1ab_0
- gettext=0.21.1=h27087fc_0
- greenlet=3.0.3=py311hb755f60_0
- hdf4=4.2.15=h2a13503_7
- hdf5=1.14.3=nompi_h4f84152_100
- icu=73.2=h59595ed_0
- idna=3.6=pyhd8ed1ab_0
- importlib-metadata=7.0.1=pyha770c72_0
- importlib_metadata=7.0.1=hd8ed1ab_0
- importlib_resources=6.1.1=pyhd8ed1ab_0
- iniconfig=2.0.0=pyhd8ed1ab_0
- ipykernel=6.29.0=pyhd33586a_0
- ipython=8.20.0=pyh707e725_0
- ipywidgets=8.1.1=pyhd8ed1ab_0
- isoduration=20.11.0=pyhd8ed1ab_0
- jedi=0.19.1=pyhd8ed1ab_0
- jinja2=3.1.3=pyhd8ed1ab_0
- joblib=1.3.2=pyhd8ed1ab_0
- json5=0.9.14=pyhd8ed1ab_0
- jsonpointer=2.4=py311h38be061_3
- jsonschema=4.21.1=pyhd8ed1ab_0
- jsonschema-specifications=2023.12.1=pyhd8ed1ab_0
- jsonschema-with-format-nongpl=4.21.1=pyhd8ed1ab_0
- jupyter-lsp=2.2.2=pyhd8ed1ab_0
- jupyter_client=8.6.0=pyhd8ed1ab_0
- jupyter_core=5.7.1=py311h38be061_0
- jupyter_events=0.9.0=pyhd8ed1ab_0
- jupyter_server=2.12.5=pyhd8ed1ab_0
- jupyter_server_terminals=0.5.2=pyhd8ed1ab_0
- jupyterlab=4.0.12=pyhd8ed1ab_0
- jupyterlab_pygments=0.3.0=pyhd8ed1ab_0
- jupyterlab_server=2.25.2=pyhd8ed1ab_0
- jupyterlab_widgets=3.0.9=pyhd8ed1ab_0
- keyutils=1.6.1=h166bdaf_0
- kiwisolver=1.4.5=py311h9547e67_1
- krb5=1.21.2=h659d440_0
- lcms2=2.16=hb7c19ff_0
- ld_impl_linux-64=2.40=h41732ed_0
- lerc=4.0.0=h27087fc_0
- libaec=1.1.2=h59595ed_1
- libblas=3.9.0=21_linux64_openblas
- libboost=1.82.0=h6fcfa73_6
- libboost-python=1.82.0=py311h92ebd52_6
- libbrotlicommon=1.1.0=hd590300_1
- libbrotlidec=1.1.0=hd590300_1
- libbrotlienc=1.1.0=hd590300_1
- libcblas=3.9.0=21_linux64_openblas
- libcurl=8.5.0=hca28451_0
- libdeflate=1.19=hd590300_0
- libedit=3.1.20191231=he28a2e2_2
- libev=4.33=hd590300_2
- libexpat=2.5.0=hcb278e6_1
- libffi=3.4.2=h7f98852_5
- libgcc=14.1.0=h77fa898_1
- libgcc-ng=14.1.0=h69a702a_1
- libgfortran-ng=13.2.0=h69a702a_4
- libgfortran5=13.2.0=ha4646dd_4
- libglib=2.78.3=h783c2da_0
- libgomp=14.1.0=h77fa898_1
- libiconv=1.17=hd590300_2
- libjpeg-turbo=3.0.0=hd590300_1
- liblapack=3.9.0=21_linux64_openblas
- libnetcdf=4.9.2=nompi_h9612171_113
- libnghttp2=1.58.0=h47da74e_1
- libnsl=2.0.1=hd590300_0
- libopenblas=0.3.26=pthreads_h413a1c8_0
- libpng=1.6.39=h753d276_0
- libsodium=1.0.18=h36c2ea0_1
- libsqlite=3.46.0=hde9e2c9_0
- libssh2=1.11.0=h0841786_0
- libstdcxx-ng=13.2.0=h7e041cc_4
- libtiff=4.6.0=ha9c0a0a_2
- libuuid=2.38.1=h0b41bf4_0
- libwebp-base=1.3.2=hd590300_0
- libxcb=1.15=h0b41bf4_0
- libxcrypt=4.4.36=hd590300_1
- libxml2=2.12.4=h232c23b_1
- libzip=1.10.1=h2629f0a_3
- libzlib=1.2.13=hd590300_5
- lz4-c=1.9.4=hcb278e6_0
- lzo=2.10=h516909a_1000
- markupsafe=2.1.4=py311h459d7ec_0
- matplotlib-base=3.8.2=py311h54ef318_0
- matplotlib-inline=0.1.6=pyhd8ed1ab_0
- mda-xdrlib=0.2.0=pyhd8ed1ab_0
- mdtraj=1.9.9=py311h90fe790_1
- mistune=3.0.2=pyhd8ed1ab_0
- msgpack-python=1.0.7=py311h9547e67_0
- munkres=1.1.4=pyh9f0ad1d_0
- nbclient=0.8.0=pyhd8ed1ab_0
- nbconvert-core=7.14.2=pyhd8ed1ab_0
- nbformat=5.9.2=pyhd8ed1ab_0
- ncurses=6.5=h59595ed_0
- nest-asyncio=1.6.0=pyhd8ed1ab_0
- netcdf-fortran=4.6.1=nompi_hacb5139_103
- networkx=3.2.1=pyhd8ed1ab_0
- nomkl=1.0=h5ca1d4c_0
- notebook=7.0.7=pyhd8ed1ab_0
- notebook-shim=0.2.3=pyhd8ed1ab_0
- numexpr=2.8.8=py311h039bad6_100
- numpy=1.26.3=py311h64a7726_0
- ocl-icd=2.3.1=h7f98852_0
- ocl-icd-system=1.0.0=1
- openeye-toolkits=2023.1.1=py311_0
- openff-amber-ff-ports=0.0.4=pyhca7485f_0
- openff-forcefields=2024.01.0=pyhca7485f_0
- openff-interchange=0.3.18=pyhd8ed1ab_0
- openff-interchange-base=0.3.18=pyhd8ed1ab_0
- openff-models=0.1.1=pyhca7485f_0
- openff-qcsubmit=0.53.0=pyhd8ed1ab_1
- openff-toolkit=0.15.1=pyhd8ed1ab_0
- openff-toolkit-base=0.15.1=pyhd8ed1ab_0
- openff-units=0.2.1=pyh1a96a4e_0
- openff-utilities=0.1.12=pyhd8ed1ab_0
- openjpeg=2.5.0=h488ebb8_3
- openmm=8.1.1=py311h9766050_0
- openssl=3.3.2=hb9d3cd8_0
- overrides=7.7.0=pyhd8ed1ab_0
- packaging=23.2=pyhd8ed1ab_0
- packmol=20.010=h86c2bf4_0
- pandas=2.2.0=py311h320fe9a_0
- pandocfilters=1.5.0=pyhd8ed1ab_0
- panedr=0.8.0=pyhd8ed1ab_0
- parmed=4.2.2=py311hb755f60_1
- parso=0.8.3=pyhd8ed1ab_0
- pcre2=10.42=hcad00b1_0
- perl=5.32.1=7_hd590300_perl5
- pexpect=4.9.0=pyhd8ed1ab_0
- pickleshare=0.7.5=py_1003
- pillow=10.2.0=py311ha6c5da5_0
- pint=0.21=pyhd8ed1ab_0
- pip=23.3.2=pyhd8ed1ab_0
- pixman=0.43.2=h59595ed_0
- pkgutil-resolve-name=1.3.10=pyhd8ed1ab_1
- platformdirs=4.2.0=pyhd8ed1ab_0
- pluggy=1.4.0=pyhd8ed1ab_0
- prometheus_client=0.19.0=pyhd8ed1ab_0
- prompt-toolkit=3.0.42=pyha770c72_0
- psutil=5.9.8=py311h459d7ec_0
- pthread-stubs=0.4=h36c2ea0_1001
- ptyprocess=0.7.0=pyhd3deb0d_0
- pure_eval=0.2.2=pyhd8ed1ab_0
- py-cpuinfo=9.0.0=pyhd8ed1ab_0
- pycairo=1.25.1=py311h8feb60e_0
- pycalverter=1.6.1=py_0
- pycparser=2.21=pyhd8ed1ab_0
- pydantic=2.6.0=pyhd8ed1ab_0
- pydantic-core=2.16.1=py311h46250e7_0
- pyedr=0.8.0=pyhd8ed1ab_0
- pygments=2.17.2=pyhd8ed1ab_0
- pyjwt=2.8.0=pyhd8ed1ab_0
- pyparsing=3.1.1=pyhd8ed1ab_0
- pysocks=1.7.1=pyha2e5f31_6
- pytables=3.9.2=py311h10c7f7f_1
- pytest=8.0.0=pyhd8ed1ab_0
- python=3.11.7=hab00c5b_1_cpython
- python-constraint=1.4.0=py_0
- python-dateutil=2.8.2=pyhd8ed1ab_0
- python-fastjsonschema=2.19.1=pyhd8ed1ab_0
- python-json-logger=2.0.7=pyhd8ed1ab_0
- python-tzdata=2023.4=pyhd8ed1ab_0
- python_abi=3.11=4_cp311
- pytz=2023.4=pyhd8ed1ab_0
- pyyaml=6.0.1=py311h459d7ec_1
- pyzmq=25.1.2=py311h34ded2d_0
- qcelemental=0.27.1=pyhd8ed1ab_0
- qcportal=0.55=pyhd8ed1ab_0
- rdkit=2023.09.4=py311h4c2f14b_0
- readline=8.2=h8228510_1
- referencing=0.33.0=pyhd8ed1ab_0
- regex=2023.12.25=py311h459d7ec_0
- reportlab=4.0.9=py311h459d7ec_0
- requests=2.31.0=pyhd8ed1ab_0
- rfc3339-validator=0.1.4=pyhd8ed1ab_0
- rfc3986-validator=0.1.1=pyh9f0ad1d_0
- rlpycairo=0.2.0=pyhd8ed1ab_0
- rpds-py=0.17.1=py311h46250e7_0
- scipy=1.12.0=py311h64a7726_2
- send2trash=1.8.2=pyh41d4057_0
- setuptools=69.0.3=pyhd8ed1ab_0
- six=1.16.0=pyh6c4a22f_0
- smirnoff99frosst=1.1.0=pyh44b312d_0
- snappy=1.1.10=h9fff704_0
- sniffio=1.3.0=pyhd8ed1ab_0
- soupsieve=2.5=pyhd8ed1ab_1
- sqlalchemy=2.0.25=py311h459d7ec_0
- sqlite=3.46.0=h6d4b2fc_0
- stack_data=0.6.2=pyhd8ed1ab_0
- tabulate=0.9.0=pyhd8ed1ab_1
- terminado=0.18.0=pyh0d859eb_0
- tinycss2=1.2.1=pyhd8ed1ab_0
- tk=8.6.13=noxft_h4845f30_101
- tomli=2.0.1=pyhd8ed1ab_0
- tornado=6.3.3=py311h459d7ec_1
- tqdm=4.66.1=pyhd8ed1ab_0
- traitlets=5.14.1=pyhd8ed1ab_0
- types-python-dateutil=2.8.19.20240106=pyhd8ed1ab_0
- typing-extensions=4.9.0=hd8ed1ab_0
- typing_extensions=4.9.0=pyha770c72_0
- typing_utils=0.1.0=pyhd8ed1ab_0
- tzdata=2023d=h0c530f3_0
- unidecode=1.3.8=pyhd8ed1ab_0
- uri-template=1.3.0=pyhd8ed1ab_0
- urllib3=2.2.0=pyhd8ed1ab_0
- wcwidth=0.2.13=pyhd8ed1ab_0
- webcolors=1.13=pyhd8ed1ab_0
- webencodings=0.5.1=pyhd8ed1ab_2
- websocket-client=1.7.0=pyhd8ed1ab_0
- wheel=0.42.0=pyhd8ed1ab_0
- widgetsnbextension=4.0.9=pyhd8ed1ab_0
- xmltodict=0.13.0=pyhd8ed1ab_0
- xorg-kbproto=1.0.7=h7f98852_1002
- xorg-libice=1.1.1=hd590300_0
- xorg-libsm=1.2.4=h7391055_0
- xorg-libx11=1.8.7=h8ee46fc_0
- xorg-libxau=1.0.11=hd590300_0
- xorg-libxdmcp=1.1.3=h7f98852_0
- xorg-libxext=1.3.4=h0b41bf4_2
- xorg-libxrender=0.9.11=hd590300_0
- xorg-libxt=1.3.0=hd590300_1
- xorg-renderproto=0.11.1=h7f98852_1002
- xorg-xextproto=7.3.0=h0b41bf4_1003
- xorg-xproto=7.0.31=h7f98852_1007
- xz=5.2.6=h166bdaf_0
- yaml=0.2.5=h7f98852_2
- zeromq=4.3.5=h59595ed_0
- zipp=3.17.0=pyhd8ed1ab_0
- zlib=1.2.13=hd590300_5
- zlib-ng=2.0.7=h0b41bf4_0
- zstandard=0.22.0=py311haa97af0_0
- zstd=1.5.5=hfc55251_0
- pip:
  - amberutils==21.0
  - edgembar==0.2
  - mmpbsa-py==16.0
  - packmol-memgen==2023.2.24
  - pdb4amber==22.0
  - pymsmt==22.0
  - pytraj==2.0.6
  - sander==22.0
prefix: /home/brent/mambaforge/envs/qcarchive-user-submit