Skip to content

Commit

Permalink
Add LIPID MAPS fragment optimization dataset (#394)
Browse files Browse the repository at this point in the history
* OpenFF Lipid Optimization Training Supplement v1.0

* update main readme

* update elements
  • Loading branch information
ntBre authored Oct 14, 2024
1 parent f0e663b commit 95809b4
Show file tree
Hide file tree
Showing 12 changed files with 20,683 additions and 0 deletions.
2 changes: 2 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -276,6 +276,8 @@ These are currently used to find a minimum energy conformation of a molecule.
|`OpenFF Iodine Fragment Opt v1.0` | [2024-09-10-OpenFF-Iodine-Fragment-Opt-v1.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-09-10-OpenFF-Iodine-Fragment-Opt-v1.0) | B3LYP-D3BJ/DZVP optimized conformers for a variety of I-containing fragment molecules | C, O, I, S, F, Br, Cl, N, H ||
| `OpenFF Sulfur Optimization Training Coverage Supplement v1.0` | [2024-09-11-OpenFF-Sulfur-Optimization-Training-Coverage-Supplement-v1.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-09-11-OpenFF-Sulfur-Optimization-Training-Coverage-Supplement-v1.0) | Additional optimization training data for Sage sulfur and phosphorus parameters | C, S, F, O, H, Cl, Br, P, N | |
| `OpenFF Sulfur Optimization Benchmarking Coverage Supplement v1.0` | [2024-09-18-OpenFF-Sulfur-Optimization-Benchmarking-Coverage-Supplement-v1.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-09-18-OpenFF-Sulfur-Optimization-Benchmarking-Coverage-Supplement-v1.0) | Additional optimization benchmarking data for Sage sulfur and phosphorus parameters | S, P, Cl, C, N, O, H, Br, F | |
| `OpenFF Lipid Optimization Training Supplement v1.0` | [2024-10-08-OpenFF-Lipid-Optimization-Training-Supplement-v1.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-10-08-OpenFF-Lipid-Optimization-Training-Supplement-v1.0) | Additional optimization training data for Sage from representative LIPID MAPS fragments | I, Br, O, H, P, C, N, Cl, F, S | |
# TorsionDrive Datasets
These are currently used perform a complete rotation of one or more selected bonds, where optimizations are performed over a discrete set of angles.
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
# OpenFF Lipid Optimization Training Supplement v1.0

## Description

An optimization data set created to improve the training coverage of lipid-like
molecules in Sage. The molecules in this data set were selected from the [LIPID
MAPS](https://www.lipidmaps.org/) database via
[cura](https://github.com/ntBre/curato/tree/6039bea5c64f8cd6b374fd12b5fa3b355898d98b)
after being fragmented by the [XFF
fragmentation](https://github.com/XtalPi-XFF/2023_XFF_paper/tree/b84ef6079d24ebd2e86f78a495e3257b375255fa/fragmentation)
algorithm, clustered on the 2048-bit, radius-3 Morgan fingerprint from RDKit by
the `LeaderPicker.LazyBitVectorPick` algorithm, also from RDKit, with a distance
threshold of 0.6559. Placeholder dummy atoms in the fragments were generally
filled with H atoms, and `S(=O)(=O)*` groups were replaced with `S(=O)([O-])`.
The candidate fragments were further restricted to those with between 3 and 70
heavy atoms and containing only the elements Cl, P, Br, I, H, C, O, N, F, and S.
Candidates with InChIKeys matching existing Sage training or benchmarking data
were also filtered out.

## General Information

* Date: 2024-10-08
* Class: OpenFF Optimization Dataset
* Purpose: Improve training coverage in Sage
* Name: OpenFF Lipid Optimization Training Supplement v1.0
* Number of unique molecules: 3971
* Number of filtered molecules: 32
* Number of conformers: 29770
* Number of conformers per molecule (min, mean, max): 1, 7.50, 10
* Mean molecular weight: 291.73
* Max molecular weight: 1016.29
* Charges: -4.0, -1.0, 0.0, 1.0
* Dataset submitter: Brent Westbrook
* Dataset generator: Brent Westbrook

## QCSubmit Generation Pipeline

* `generate-dataset.py`: This script shows how the dataset was prepared from the input file `input.smi`.
* `main.py`: This script shows how the dataset was prepared from the initial cura database.
* `inchis.dat`: The list of InChIKeys present in existing Sage training and
benchmarking data, used by `main.py`.

## QCSubmit Manifest

* `generate-dataset.py`: Script describing dataset generation and submission
* `input-environment.yaml`: Environment file used to create the Python environment for the script
* `full-environment.yaml`: Fully-resolved environment used to execute the script
* `opt.toml`: Experimental [qcaide](https://github.com/ntBre/qcaide) input file for defining
variables used throughout the QCA submission process
* `dataset.json.bz2`: Compressed dataset ready for submission
* `dataset.pdf`: Visualization of dataset molecules
* `output.smi`: SMILES strings for dataset molecules

## Metadata
* Elements: {I, Br, O, H, P, C, N, Cl, F, S}
* Spec: default
* basis: DZVP
* implicit_solvent: None
* keywords: {}
* maxiter: 200
* method: B3LYP-D3BJ
* program: psi4
* SCF properties:
* dipole
* quadrupole
* wiberg_lowdin_indices
* mayer_indices
Git LFS file not shown
Binary file not shown.
Original file line number Diff line number Diff line change
@@ -0,0 +1,306 @@
name: qcarchive-user-submit
channels:
- openeye
- conda-forge
dependencies:
- _libgcc_mutex=0.1=conda_forge
- _openmp_mutex=4.5=2_gnu
- ambertools=23.3=py311h9fea076_6
- annotated-types=0.6.0=pyhd8ed1ab_0
- anyio=4.2.0=pyhd8ed1ab_0
- apsw=3.46.0.0=py311h3ea06b8_0
- argcomplete=3.2.2=pyhd8ed1ab_0
- argon2-cffi=23.1.0=pyhd8ed1ab_0
- argon2-cffi-bindings=21.2.0=py311h459d7ec_4
- arpack=3.8.0=nompi_h0baa96a_101
- arrow=1.3.0=pyhd8ed1ab_0
- asttokens=2.4.1=pyhd8ed1ab_0
- astunparse=1.6.3=pyhd8ed1ab_0
- async-lru=2.0.4=pyhd8ed1ab_0
- attrs=23.2.0=pyh71513ae_0
- babel=2.14.0=pyhd8ed1ab_0
- basis_set_exchange=0.9.1=pyhd8ed1ab_0
- beautifulsoup4=4.12.3=pyha770c72_0
- bleach=6.1.0=pyhd8ed1ab_0
- blosc=1.21.5=h0f2a231_0
- brotli=1.1.0=hd590300_1
- brotli-bin=1.1.0=hd590300_1
- brotli-python=1.1.0=py311hb755f60_1
- bson=0.5.9=py_0
- bzip2=1.0.8=hd590300_5
- c-ares=1.26.0=hd590300_0
- c-blosc2=2.13.1=hb4ffafa_0
- ca-certificates=2024.8.30=hbcca054_0
- cached-property=1.5.2=hd8ed1ab_1
- cached_property=1.5.2=pyha770c72_1
- cachetools=5.3.2=pyhd8ed1ab_0
- cairo=1.18.0=h3faef2a_0
- certifi=2024.8.30=pyhd8ed1ab_0
- cffi=1.16.0=py311hb3a22ac_0
- chardet=5.2.0=py311h38be061_1
- charset-normalizer=3.3.2=pyhd8ed1ab_0
- click=8.1.7=unix_pyh707e725_0
- colorama=0.4.6=pyhd8ed1ab_0
- comm=0.2.1=pyhd8ed1ab_0
- contourpy=1.2.0=py311h9547e67_0
- cudatoolkit=11.8.0=h4ba93d1_12
- cycler=0.12.1=pyhd8ed1ab_0
- debugpy=1.8.0=py311hb755f60_1
- decorator=5.1.1=pyhd8ed1ab_0
- defusedxml=0.7.1=pyhd8ed1ab_0
- entrypoints=0.4=pyhd8ed1ab_0
- exceptiongroup=1.2.0=pyhd8ed1ab_2
- executing=2.0.1=pyhd8ed1ab_0
- expat=2.5.0=hcb278e6_1
- fftw=3.3.10=nompi_hc118613_108
- font-ttf-dejavu-sans-mono=2.37=hab24e00_0
- font-ttf-inconsolata=3.000=h77eed37_0
- font-ttf-source-code-pro=2.038=h77eed37_0
- font-ttf-ubuntu=0.83=h77eed37_1
- fontconfig=2.14.2=h14ed4e7_0
- fonts-conda-ecosystem=1=0
- fonts-conda-forge=1=0
- fonttools=4.47.2=py311h459d7ec_0
- fqdn=1.5.1=pyhd8ed1ab_0
- freetype=2.12.1=h267a509_2
- freetype-py=2.3.0=pyhd8ed1ab_0
- gettext=0.21.1=h27087fc_0
- greenlet=3.0.3=py311hb755f60_0
- hdf4=4.2.15=h2a13503_7
- hdf5=1.14.3=nompi_h4f84152_100
- icu=73.2=h59595ed_0
- idna=3.6=pyhd8ed1ab_0
- importlib-metadata=7.0.1=pyha770c72_0
- importlib_metadata=7.0.1=hd8ed1ab_0
- importlib_resources=6.1.1=pyhd8ed1ab_0
- iniconfig=2.0.0=pyhd8ed1ab_0
- ipykernel=6.29.0=pyhd33586a_0
- ipython=8.20.0=pyh707e725_0
- ipywidgets=8.1.1=pyhd8ed1ab_0
- isoduration=20.11.0=pyhd8ed1ab_0
- jedi=0.19.1=pyhd8ed1ab_0
- jinja2=3.1.3=pyhd8ed1ab_0
- joblib=1.3.2=pyhd8ed1ab_0
- json5=0.9.14=pyhd8ed1ab_0
- jsonpointer=2.4=py311h38be061_3
- jsonschema=4.21.1=pyhd8ed1ab_0
- jsonschema-specifications=2023.12.1=pyhd8ed1ab_0
- jsonschema-with-format-nongpl=4.21.1=pyhd8ed1ab_0
- jupyter-lsp=2.2.2=pyhd8ed1ab_0
- jupyter_client=8.6.0=pyhd8ed1ab_0
- jupyter_core=5.7.1=py311h38be061_0
- jupyter_events=0.9.0=pyhd8ed1ab_0
- jupyter_server=2.12.5=pyhd8ed1ab_0
- jupyter_server_terminals=0.5.2=pyhd8ed1ab_0
- jupyterlab=4.0.12=pyhd8ed1ab_0
- jupyterlab_pygments=0.3.0=pyhd8ed1ab_0
- jupyterlab_server=2.25.2=pyhd8ed1ab_0
- jupyterlab_widgets=3.0.9=pyhd8ed1ab_0
- keyutils=1.6.1=h166bdaf_0
- kiwisolver=1.4.5=py311h9547e67_1
- krb5=1.21.2=h659d440_0
- lcms2=2.16=hb7c19ff_0
- ld_impl_linux-64=2.40=h41732ed_0
- lerc=4.0.0=h27087fc_0
- libaec=1.1.2=h59595ed_1
- libblas=3.9.0=21_linux64_openblas
- libboost=1.82.0=h6fcfa73_6
- libboost-python=1.82.0=py311h92ebd52_6
- libbrotlicommon=1.1.0=hd590300_1
- libbrotlidec=1.1.0=hd590300_1
- libbrotlienc=1.1.0=hd590300_1
- libcblas=3.9.0=21_linux64_openblas
- libcurl=8.5.0=hca28451_0
- libdeflate=1.19=hd590300_0
- libedit=3.1.20191231=he28a2e2_2
- libev=4.33=hd590300_2
- libexpat=2.5.0=hcb278e6_1
- libffi=3.4.2=h7f98852_5
- libgcc=14.1.0=h77fa898_1
- libgcc-ng=14.1.0=h69a702a_1
- libgfortran-ng=13.2.0=h69a702a_4
- libgfortran5=13.2.0=ha4646dd_4
- libglib=2.78.3=h783c2da_0
- libgomp=14.1.0=h77fa898_1
- libiconv=1.17=hd590300_2
- libjpeg-turbo=3.0.0=hd590300_1
- liblapack=3.9.0=21_linux64_openblas
- libnetcdf=4.9.2=nompi_h9612171_113
- libnghttp2=1.58.0=h47da74e_1
- libnsl=2.0.1=hd590300_0
- libopenblas=0.3.26=pthreads_h413a1c8_0
- libpng=1.6.39=h753d276_0
- libsodium=1.0.18=h36c2ea0_1
- libsqlite=3.46.0=hde9e2c9_0
- libssh2=1.11.0=h0841786_0
- libstdcxx-ng=13.2.0=h7e041cc_4
- libtiff=4.6.0=ha9c0a0a_2
- libuuid=2.38.1=h0b41bf4_0
- libwebp-base=1.3.2=hd590300_0
- libxcb=1.15=h0b41bf4_0
- libxcrypt=4.4.36=hd590300_1
- libxml2=2.12.4=h232c23b_1
- libzip=1.10.1=h2629f0a_3
- libzlib=1.2.13=hd590300_5
- lz4-c=1.9.4=hcb278e6_0
- lzo=2.10=h516909a_1000
- markupsafe=2.1.4=py311h459d7ec_0
- matplotlib-base=3.8.2=py311h54ef318_0
- matplotlib-inline=0.1.6=pyhd8ed1ab_0
- mda-xdrlib=0.2.0=pyhd8ed1ab_0
- mdtraj=1.9.9=py311h90fe790_1
- mistune=3.0.2=pyhd8ed1ab_0
- msgpack-python=1.0.7=py311h9547e67_0
- munkres=1.1.4=pyh9f0ad1d_0
- nbclient=0.8.0=pyhd8ed1ab_0
- nbconvert-core=7.14.2=pyhd8ed1ab_0
- nbformat=5.9.2=pyhd8ed1ab_0
- ncurses=6.5=h59595ed_0
- nest-asyncio=1.6.0=pyhd8ed1ab_0
- netcdf-fortran=4.6.1=nompi_hacb5139_103
- networkx=3.2.1=pyhd8ed1ab_0
- nomkl=1.0=h5ca1d4c_0
- notebook=7.0.7=pyhd8ed1ab_0
- notebook-shim=0.2.3=pyhd8ed1ab_0
- numexpr=2.8.8=py311h039bad6_100
- numpy=1.26.3=py311h64a7726_0
- ocl-icd=2.3.1=h7f98852_0
- ocl-icd-system=1.0.0=1
- openeye-toolkits=2023.1.1=py311_0
- openff-amber-ff-ports=0.0.4=pyhca7485f_0
- openff-forcefields=2024.01.0=pyhca7485f_0
- openff-interchange=0.3.18=pyhd8ed1ab_0
- openff-interchange-base=0.3.18=pyhd8ed1ab_0
- openff-models=0.1.1=pyhca7485f_0
- openff-qcsubmit=0.53.0=pyhd8ed1ab_1
- openff-toolkit=0.15.1=pyhd8ed1ab_0
- openff-toolkit-base=0.15.1=pyhd8ed1ab_0
- openff-units=0.2.1=pyh1a96a4e_0
- openff-utilities=0.1.12=pyhd8ed1ab_0
- openjpeg=2.5.0=h488ebb8_3
- openmm=8.1.1=py311h9766050_0
- openssl=3.3.2=hb9d3cd8_0
- overrides=7.7.0=pyhd8ed1ab_0
- packaging=23.2=pyhd8ed1ab_0
- packmol=20.010=h86c2bf4_0
- pandas=2.2.0=py311h320fe9a_0
- pandocfilters=1.5.0=pyhd8ed1ab_0
- panedr=0.8.0=pyhd8ed1ab_0
- parmed=4.2.2=py311hb755f60_1
- parso=0.8.3=pyhd8ed1ab_0
- pcre2=10.42=hcad00b1_0
- perl=5.32.1=7_hd590300_perl5
- pexpect=4.9.0=pyhd8ed1ab_0
- pickleshare=0.7.5=py_1003
- pillow=10.2.0=py311ha6c5da5_0
- pint=0.21=pyhd8ed1ab_0
- pip=23.3.2=pyhd8ed1ab_0
- pixman=0.43.2=h59595ed_0
- pkgutil-resolve-name=1.3.10=pyhd8ed1ab_1
- platformdirs=4.2.0=pyhd8ed1ab_0
- pluggy=1.4.0=pyhd8ed1ab_0
- prometheus_client=0.19.0=pyhd8ed1ab_0
- prompt-toolkit=3.0.42=pyha770c72_0
- psutil=5.9.8=py311h459d7ec_0
- pthread-stubs=0.4=h36c2ea0_1001
- ptyprocess=0.7.0=pyhd3deb0d_0
- pure_eval=0.2.2=pyhd8ed1ab_0
- py-cpuinfo=9.0.0=pyhd8ed1ab_0
- pycairo=1.25.1=py311h8feb60e_0
- pycalverter=1.6.1=py_0
- pycparser=2.21=pyhd8ed1ab_0
- pydantic=2.6.0=pyhd8ed1ab_0
- pydantic-core=2.16.1=py311h46250e7_0
- pyedr=0.8.0=pyhd8ed1ab_0
- pygments=2.17.2=pyhd8ed1ab_0
- pyjwt=2.8.0=pyhd8ed1ab_0
- pyparsing=3.1.1=pyhd8ed1ab_0
- pysocks=1.7.1=pyha2e5f31_6
- pytables=3.9.2=py311h10c7f7f_1
- pytest=8.0.0=pyhd8ed1ab_0
- python=3.11.7=hab00c5b_1_cpython
- python-constraint=1.4.0=py_0
- python-dateutil=2.8.2=pyhd8ed1ab_0
- python-fastjsonschema=2.19.1=pyhd8ed1ab_0
- python-json-logger=2.0.7=pyhd8ed1ab_0
- python-tzdata=2023.4=pyhd8ed1ab_0
- python_abi=3.11=4_cp311
- pytz=2023.4=pyhd8ed1ab_0
- pyyaml=6.0.1=py311h459d7ec_1
- pyzmq=25.1.2=py311h34ded2d_0
- qcelemental=0.27.1=pyhd8ed1ab_0
- qcportal=0.55=pyhd8ed1ab_0
- rdkit=2023.09.4=py311h4c2f14b_0
- readline=8.2=h8228510_1
- referencing=0.33.0=pyhd8ed1ab_0
- regex=2023.12.25=py311h459d7ec_0
- reportlab=4.0.9=py311h459d7ec_0
- requests=2.31.0=pyhd8ed1ab_0
- rfc3339-validator=0.1.4=pyhd8ed1ab_0
- rfc3986-validator=0.1.1=pyh9f0ad1d_0
- rlpycairo=0.2.0=pyhd8ed1ab_0
- rpds-py=0.17.1=py311h46250e7_0
- scipy=1.12.0=py311h64a7726_2
- send2trash=1.8.2=pyh41d4057_0
- setuptools=69.0.3=pyhd8ed1ab_0
- six=1.16.0=pyh6c4a22f_0
- smirnoff99frosst=1.1.0=pyh44b312d_0
- snappy=1.1.10=h9fff704_0
- sniffio=1.3.0=pyhd8ed1ab_0
- soupsieve=2.5=pyhd8ed1ab_1
- sqlalchemy=2.0.25=py311h459d7ec_0
- sqlite=3.46.0=h6d4b2fc_0
- stack_data=0.6.2=pyhd8ed1ab_0
- tabulate=0.9.0=pyhd8ed1ab_1
- terminado=0.18.0=pyh0d859eb_0
- tinycss2=1.2.1=pyhd8ed1ab_0
- tk=8.6.13=noxft_h4845f30_101
- tomli=2.0.1=pyhd8ed1ab_0
- tornado=6.3.3=py311h459d7ec_1
- tqdm=4.66.1=pyhd8ed1ab_0
- traitlets=5.14.1=pyhd8ed1ab_0
- types-python-dateutil=2.8.19.20240106=pyhd8ed1ab_0
- typing-extensions=4.9.0=hd8ed1ab_0
- typing_extensions=4.9.0=pyha770c72_0
- typing_utils=0.1.0=pyhd8ed1ab_0
- tzdata=2023d=h0c530f3_0
- unidecode=1.3.8=pyhd8ed1ab_0
- uri-template=1.3.0=pyhd8ed1ab_0
- urllib3=2.2.0=pyhd8ed1ab_0
- wcwidth=0.2.13=pyhd8ed1ab_0
- webcolors=1.13=pyhd8ed1ab_0
- webencodings=0.5.1=pyhd8ed1ab_2
- websocket-client=1.7.0=pyhd8ed1ab_0
- wheel=0.42.0=pyhd8ed1ab_0
- widgetsnbextension=4.0.9=pyhd8ed1ab_0
- xmltodict=0.13.0=pyhd8ed1ab_0
- xorg-kbproto=1.0.7=h7f98852_1002
- xorg-libice=1.1.1=hd590300_0
- xorg-libsm=1.2.4=h7391055_0
- xorg-libx11=1.8.7=h8ee46fc_0
- xorg-libxau=1.0.11=hd590300_0
- xorg-libxdmcp=1.1.3=h7f98852_0
- xorg-libxext=1.3.4=h0b41bf4_2
- xorg-libxrender=0.9.11=hd590300_0
- xorg-libxt=1.3.0=hd590300_1
- xorg-renderproto=0.11.1=h7f98852_1002
- xorg-xextproto=7.3.0=h0b41bf4_1003
- xorg-xproto=7.0.31=h7f98852_1007
- xz=5.2.6=h166bdaf_0
- yaml=0.2.5=h7f98852_2
- zeromq=4.3.5=h59595ed_0
- zipp=3.17.0=pyhd8ed1ab_0
- zlib=1.2.13=hd590300_5
- zlib-ng=2.0.7=h0b41bf4_0
- zstandard=0.22.0=py311haa97af0_0
- zstd=1.5.5=hfc55251_0
- pip:
- amberutils==21.0
- edgembar==0.2
- mmpbsa-py==16.0
- packmol-memgen==2023.2.24
- pdb4amber==22.0
- pymsmt==22.0
- pytraj==2.0.6
- sander==22.0
prefix: /home/brent/mambaforge/envs/qcarchive-user-submit
Loading

0 comments on commit 95809b4

Please sign in to comment.