diff --git a/README.md b/README.md index 9962ee54..de415f81 100644 --- a/README.md +++ b/README.md @@ -243,6 +243,7 @@ These are currently used to compute properties of a minimum energy conformation |`OpenFF NAGL2 ESP Timing Benchmark v1.0` | [2024-09-06-OpenFF-NAGL2-ESP-Timing-Benchmark-v1.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-09-06-OpenFF-NAGL2-ESP-Timing-Benchmark-v1.0) | Single point ESP calculations for timing/memory benchmarking | 'P', 'S', 'N', 'C', 'Cl', 'F', 'Br', 'O', 'H', 'I' | | |`OpenFF NAGL2 ESP Timing Benchmark v1.1` | [2024-09-18-OpenFF-NAGL2-ESP-Timing-Benchmark-v1.1](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-09-18-OpenFF-NAGL2-ESP-Timing-Benchmark-v1.1) | Single point ESP calculations for timing/memory benchmarking | 'P', 'S', 'N', 'C', 'Cl', 'F', 'Br', 'O', 'H', 'I' | | |`OpenFF Sulfur Hessian Training Coverage Supplement v1.0` | [2024-09-18-OpenFF-Sulfur-Hessian-Training-Coverage-Supplement-v1.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-09-18-OpenFF-Sulfur-Hessian-Training-Coverage-Supplement-v1.0) | Additional Hessian training data for Sage sulfur and phosphorus parameters (from ['OpenFF Sulfur Optimization Training Coverage Supplement v1.0'](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-09-11-OpenFF-Sulfur-Optimization-Training-Coverage-Supplement-v1.0)) | O, S, C, Cl, P, N, F, Br, H | | +|`OpenFF Sulfur Hessian Training Coverage Supplement v1.1` | [2024-11-08-OpenFF-Sulfur-Hessian-Training-Coverage-Supplement-v1.1](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-11-08-OpenFF-Sulfur-Hessian-Training-Coverage-Supplement-v1.1) | Additional Hessian training data for Sage sulfur and phosphorus parameters (from ['OpenFF Sulfur Optimization Training Coverage Supplement 
v1.0'](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-09-11-OpenFF-Sulfur-Optimization-Training-Coverage-Supplement-v1.0)) | O, S, C, Cl, P, N, F, Br, H | | | `OpenFF Aniline Para Hessian v1.0` | [2024-10-07-OpenFF-Aniline-Para-Hessian-v1.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-10-07-OpenFF-Aniline-Para-Hessian-v1.0) | Hessian single points for the final molecules in the `OpenFF Aniline Para Opt v1.0` [dataset](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2021-04-02-OpenFF-Aniline-Para-Opt-v1.0) | 'O', 'Cl', 'S', 'Br', 'H', 'F', 'N', 'C' || |`OpenFF Gen2 Hessian Dataset Protomers v1.0` | [2024-10-07-OpenFF-Gen2-Hessian-Dataset-Protomers-v1.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-10-07-OpenFF-Gen2-Hessian-Dataset-Protomers-v1.0/) | Hessian single points for the final molecules in the `OpenFF Gen2 Optimization Dataset Protomers v1.0` [dataset](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2021-12-21-OpenFF-Gen2-Optimization-Set-Protomers) | 'H', 'C', 'Cl', 'P', 'F', 'Br', 'O', 'N', 'S'|| | `MLPepper-RECAP-Optimized-Fragments-Add-Iodines-v1.0` | [2024-10-11-MLPepper-RECAP-Optimized-Fragments-Add-Iodines-v1.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-10-11-MLPepper-RECAP-Optimized-Fragments-Add-Iodines-v1.0) | Set of diverse iodine containing molecules with a number of calculated electrostatic properties. 
| Br, Cl, S, B, O, Si, C, N, I, P, H, F| | diff --git a/submissions/2024-11-08-Sulfur-Hessian-Training-Coverage-Supplement-v1.1/README.md b/submissions/2024-11-08-Sulfur-Hessian-Training-Coverage-Supplement-v1.1/README.md new file mode 100644 index 00000000..7a3edb45 --- /dev/null +++ b/submissions/2024-11-08-Sulfur-Hessian-Training-Coverage-Supplement-v1.1/README.md @@ -0,0 +1,68 @@ +# OpenFF Sulfur Hessian Training Coverage Supplement v1.1 + +## Description + +A basic data set created to improve the training coverage of sulfonic and +phosphonic acids, sulfone, sulfonate, sulfinyl, sulfoximine, sulfonamides, +thioether, and 1,3-thiazole groups. The structures in this data set are the +optimized geometries from `OpenFF Sulfur Optimization Training Coverage +Supplement v1.0`. + +## General Information + +* Date: 2024-11-08 +* Class: OpenFF Optimization Dataset +* Purpose: Improve coverage in Sage +* Name: OpenFF Sulfur Hessian Training Coverage Supplement v1.1 +* Number of unique molecules: 129 +* Number of filtered molecules: 0 +* Number of conformers: 899 +* Number of conformers per molecule (min, mean, max): 1, 6.97, 10 +* Mean molecular weight: 218.80 +* Max molecular weight: 493.37 +* Charges: [-2.0, -1.0, 0.0] +* Dataset submitter: Brent Westbrook +* Dataset generator: Brent Westbrook + +## QCSubmit Generation Pipeline + +* `generate.py`: This script shows how the dataset was prepared. 
+ + +## QCSubmit Manifest + +* `generate.py`: Script describing dataset generation and submission +* `input-environment.yaml`: Environment file used to create the Python environment for the script +* `full-environment.yaml`: Fully-resolved environment used to execute the script +* `opt.toml`: Experimental [qcaide](https://github.com/ntBre/qcaide) input file for defining +variables used throughout the QCA submission process +* `dataset.json.bz2`: Compressed dataset ready for submission +* `dataset.pdf`: Visualization of dataset molecules +* `output.smi`: SMILES strings for dataset molecules + +## Metadata + +* Elements: {O, S, C, Cl, P, N, F, Br, H} +* Spec: default + * basis: DZVP + * implicit_solvent: None + * keywords: {} + * maxiter: 200 + * method: B3LYP-D3BJ + * program: psi4 + * SCF properties: + * dipole + * quadrupole + * wiberg_lowdin_indices + * mayer_indices + +## Changelog +v1.0 included a manual implementation of +`OptimizationResultCollection.create_basic_dataset` that failed to preserve +QCArchive molecule IDs between the optimization and single-point datasets. +Unfortunately, this issue would not have been avoided by that version of +`create_basic_dataset` either. The issue has been fixed in openff-qcsubmit +[version +0.54](https://github.com/openforcefield/openff-qcsubmit/releases/tag/0.54.0), so +the environment has been updated to use this release, and the `generate.py` +script has been updated to use `create_basic_dataset`. 
diff --git a/submissions/2024-11-08-Sulfur-Hessian-Training-Coverage-Supplement-v1.1/dataset.json.bz2 b/submissions/2024-11-08-Sulfur-Hessian-Training-Coverage-Supplement-v1.1/dataset.json.bz2 new file mode 100644 index 00000000..4f18b10a --- /dev/null +++ b/submissions/2024-11-08-Sulfur-Hessian-Training-Coverage-Supplement-v1.1/dataset.json.bz2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f81fab3d7782cc1b2e7088de4919fdad9314e2daf5b53d6444190c3f7d50a6b +size 650626 diff --git a/submissions/2024-11-08-Sulfur-Hessian-Training-Coverage-Supplement-v1.1/dataset.pdf b/submissions/2024-11-08-Sulfur-Hessian-Training-Coverage-Supplement-v1.1/dataset.pdf new file mode 100644 index 00000000..47af601c Binary files /dev/null and b/submissions/2024-11-08-Sulfur-Hessian-Training-Coverage-Supplement-v1.1/dataset.pdf differ diff --git a/submissions/2024-11-08-Sulfur-Hessian-Training-Coverage-Supplement-v1.1/full-environment.yaml b/submissions/2024-11-08-Sulfur-Hessian-Training-Coverage-Supplement-v1.1/full-environment.yaml new file mode 100644 index 00000000..821f9714 --- /dev/null +++ b/submissions/2024-11-08-Sulfur-Hessian-Training-Coverage-Supplement-v1.1/full-environment.yaml @@ -0,0 +1,309 @@ +name: qcarchive-user-submit +channels: + - openeye + - conda-forge +dependencies: + - _libgcc_mutex=0.1=conda_forge + - _openmp_mutex=4.5=2_gnu + - ambertools=23.6=cuda_None_nompi_py311h4a53416_105 + - annotated-types=0.7.0=pyhd8ed1ab_0 + - anyio=4.6.2.post1=pyhd8ed1ab_0 + - apsw=3.46.1.0=py311h333efcf_1 + - argcomplete=3.5.1=pyhd8ed1ab_0 + - argon2-cffi=23.1.0=pyhd8ed1ab_0 + - argon2-cffi-bindings=21.2.0=py311h9ecbd09_5 + - arpack=3.9.1=nompi_h77f6705_101 + - arrow=1.3.0=pyhd8ed1ab_0 + - asttokens=2.4.1=pyhd8ed1ab_0 + - async-lru=2.0.4=pyhd8ed1ab_0 + - attrs=24.2.0=pyh71513ae_0 + - babel=2.16.0=pyhd8ed1ab_0 + - basis_set_exchange=0.10=pyhd8ed1ab_1 + - beautifulsoup4=4.12.3=pyha770c72_0 + - bleach=6.2.0=pyhd8ed1ab_0 + - blosc=1.21.6=hef167b5_0 + - 
brotli=1.1.0=hb9d3cd8_2 + - brotli-bin=1.1.0=hb9d3cd8_2 + - brotli-python=1.1.0=py311hfdbb021_2 + - bson=0.5.9=py_0 + - bzip2=1.0.8=h4bc722e_7 + - c-ares=1.34.2=heb4867d_0 + - c-blosc2=2.15.1=hc57e6cf_0 + - ca-certificates=2024.8.30=hbcca054_0 + - cached-property=1.5.2=hd8ed1ab_1 + - cached_property=1.5.2=pyha770c72_1 + - cachetools=5.5.0=pyhd8ed1ab_0 + - cairo=1.18.0=hebfffa5_3 + - certifi=2024.8.30=pyhd8ed1ab_0 + - cffi=1.17.1=py311hf29c0ef_0 + - chardet=5.2.0=py311h38be061_2 + - charset-normalizer=3.4.0=pyhd8ed1ab_0 + - colorama=0.4.6=pyhd8ed1ab_0 + - comm=0.2.2=pyhd8ed1ab_0 + - contourpy=1.3.0=py311hd18a35c_2 + - cudatoolkit=11.8.0=h4ba93d1_13 + - cycler=0.12.1=pyhd8ed1ab_0 + - debugpy=1.8.8=py311hfdbb021_0 + - decorator=5.1.1=pyhd8ed1ab_0 + - defusedxml=0.7.1=pyhd8ed1ab_0 + - entrypoints=0.4=pyhd8ed1ab_0 + - exceptiongroup=1.2.2=pyhd8ed1ab_0 + - executing=2.1.0=pyhd8ed1ab_0 + - fftw=3.3.10=nompi_hf1063bd_110 + - font-ttf-dejavu-sans-mono=2.37=hab24e00_0 + - font-ttf-inconsolata=3.000=h77eed37_0 + - font-ttf-source-code-pro=2.038=h77eed37_0 + - font-ttf-ubuntu=0.83=h77eed37_3 + - fontconfig=2.15.0=h7e30c49_1 + - fonts-conda-ecosystem=1=0 + - fonts-conda-forge=1=0 + - fonttools=4.54.1=py311h2dc5d0c_1 + - fqdn=1.5.1=pyhd8ed1ab_0 + - freetype=2.12.1=h267a509_2 + - freetype-py=2.3.0=pyhd8ed1ab_0 + - greenlet=3.1.1=py311hfdbb021_0 + - h11=0.14.0=pyhd8ed1ab_0 + - h2=4.1.0=pyhd8ed1ab_0 + - hdf4=4.2.15=h2a13503_7 + - hdf5=1.14.4=nompi_h2d575fe_103 + - hpack=4.0.0=pyh9f0ad1d_0 + - httpcore=1.0.6=pyhd8ed1ab_0 + - httpx=0.27.2=pyhd8ed1ab_0 + - hyperframe=6.0.1=pyhd8ed1ab_0 + - icu=75.1=he02047a_0 + - idna=3.10=pyhd8ed1ab_0 + - importlib-metadata=8.5.0=pyha770c72_0 + - importlib_metadata=8.5.0=hd8ed1ab_0 + - importlib_resources=6.4.5=pyhd8ed1ab_0 + - iniconfig=2.0.0=pyhd8ed1ab_0 + - ipykernel=6.29.5=pyh3099207_0 + - ipython=8.29.0=pyh707e725_0 + - ipywidgets=8.1.5=pyhd8ed1ab_0 + - isoduration=20.11.0=pyhd8ed1ab_0 + - jedi=0.19.1=pyhd8ed1ab_0 + - jinja2=3.1.4=pyhd8ed1ab_0 + 
- joblib=1.4.2=pyhd8ed1ab_0 + - json5=0.9.25=pyhd8ed1ab_0 + - jsonpointer=3.0.0=py311h38be061_1 + - jsonschema=4.23.0=pyhd8ed1ab_0 + - jsonschema-specifications=2024.10.1=pyhd8ed1ab_0 + - jsonschema-with-format-nongpl=4.23.0=hd8ed1ab_0 + - jupyter-lsp=2.2.5=pyhd8ed1ab_0 + - jupyter_client=8.6.3=pyhd8ed1ab_0 + - jupyter_core=5.7.2=pyh31011fe_1 + - jupyter_events=0.10.0=pyhd8ed1ab_0 + - jupyter_server=2.14.2=pyhd8ed1ab_0 + - jupyter_server_terminals=0.5.3=pyhd8ed1ab_0 + - jupyterlab=4.2.5=pyhd8ed1ab_0 + - jupyterlab_pygments=0.3.0=pyhd8ed1ab_1 + - jupyterlab_server=2.27.3=pyhd8ed1ab_0 + - jupyterlab_widgets=3.0.13=pyhd8ed1ab_0 + - keyutils=1.6.1=h166bdaf_0 + - kiwisolver=1.4.7=py311hd18a35c_0 + - krb5=1.21.3=h659f571_0 + - lcms2=2.16=hb7c19ff_0 + - ld_impl_linux-64=2.43=h712a8e2_2 + - lerc=4.0.0=h27087fc_0 + - libaec=1.1.3=h59595ed_0 + - libblas=3.9.0=25_linux64_openblas + - libboost=1.84.0=hb8260a3_6 + - libboost-python=1.84.0=py311h5b7b71f_6 + - libbrotlicommon=1.1.0=hb9d3cd8_2 + - libbrotlidec=1.1.0=hb9d3cd8_2 + - libbrotlienc=1.1.0=hb9d3cd8_2 + - libcblas=3.9.0=25_linux64_openblas + - libcurl=8.10.1=hbbe4b11_0 + - libdeflate=1.22=hb9d3cd8_0 + - libedit=3.1.20191231=he28a2e2_2 + - libev=4.33=hd590300_2 + - libexpat=2.6.4=h5888daf_0 + - libffi=3.4.2=h7f98852_5 + - libgcc=14.2.0=h77fa898_1 + - libgcc-ng=14.2.0=h69a702a_1 + - libgfortran=14.2.0=h69a702a_1 + - libgfortran-ng=14.2.0=h69a702a_1 + - libgfortran5=14.2.0=hd5240d6_1 + - libglib=2.82.2=h2ff4ddf_0 + - libgomp=14.2.0=h77fa898_1 + - libiconv=1.17=hd590300_2 + - libjpeg-turbo=3.0.0=hd590300_1 + - liblapack=3.9.0=25_linux64_openblas + - libnetcdf=4.9.2=nompi_h2564987_115 + - libnghttp2=1.64.0=h161d5f1_0 + - libnsl=2.0.1=hd590300_0 + - libopenblas=0.3.28=pthreads_h94d23a6_1 + - libpng=1.6.44=hadc24fc_0 + - libpq=16.4=h2d7952a_3 + - librdkit=2024.03.5=h79cfef2_3 + - libsodium=1.0.20=h4ab18f5_0 + - libsqlite=3.46.1=hadc24fc_0 + - libssh2=1.11.0=h0841786_0 + - libstdcxx=14.2.0=hc0a3c3a_1 + - 
libstdcxx-ng=14.2.0=h4852527_1 + - libtiff=4.7.0=he137b08_1 + - libuuid=2.38.1=h0b41bf4_0 + - libwebp-base=1.4.0=hd590300_0 + - libxcb=1.17.0=h8a09558_0 + - libxcrypt=4.4.36=hd590300_1 + - libxml2=2.13.4=hb346dea_2 + - libzip=1.11.2=h6991a6a_0 + - libzlib=1.3.1=hb9d3cd8_2 + - lz4-c=1.9.4=hcb278e6_0 + - markupsafe=3.0.2=py311h2dc5d0c_0 + - matplotlib-base=3.9.2=py311h2b939e6_2 + - matplotlib-inline=0.1.7=pyhd8ed1ab_0 + - mda-xdrlib=0.2.0=pyhd8ed1ab_0 + - mdtraj=1.10.1=py311h4734c11_0 + - mistune=3.0.2=pyhd8ed1ab_0 + - msgpack-python=1.1.0=py311hd18a35c_0 + - munkres=1.1.4=pyh9f0ad1d_0 + - nbclient=0.10.0=pyhd8ed1ab_0 + - nbconvert-core=7.16.4=pyhd8ed1ab_1 + - nbformat=5.10.4=pyhd8ed1ab_0 + - ncurses=6.5=he02047a_1 + - nest-asyncio=1.6.0=pyhd8ed1ab_0 + - netcdf-fortran=4.6.1=nompi_ha5d1325_107 + - networkx=3.4.2=pyhd8ed1ab_1 + - nomkl=1.0=h5ca1d4c_0 + - notebook=7.2.2=pyhd8ed1ab_0 + - notebook-shim=0.2.4=pyhd8ed1ab_0 + - numexpr=2.10.1=py311h38b10cd_103 + - numpy=1.26.4=py311h64a7726_0 + - ocl-icd=2.3.2=hd590300_1 + - ocl-icd-system=1.0.0=1 + - openeye-toolkits=2024.1.3=py311_0 + - openff-amber-ff-ports=0.0.4=pyhca7485f_0 + - openff-forcefields=2024.09.0=pyhff2d567_0 + - openff-interchange=0.4.0=pyhd8ed1ab_0 + - openff-interchange-base=0.4.0=pyhd8ed1ab_0 + - openff-qcsubmit=0.54.0=pyhd8ed1ab_0 + - openff-toolkit=0.16.5=pyhd8ed1ab_0 + - openff-toolkit-base=0.16.5=pyhd8ed1ab_0 + - openff-units=0.2.2=pyhca7485f_0 + - openff-utilities=0.1.12=pyhd8ed1ab_0 + - openjpeg=2.5.2=h488ebb8_0 + - openmm=8.1.2=py311he040c58_2 + - openssl=3.3.2=hb9d3cd8_0 + - overrides=7.7.0=pyhd8ed1ab_0 + - packaging=24.1=pyhd8ed1ab_0 + - pandas=2.2.3=py311h7db5c69_1 + - pandocfilters=1.5.0=pyhd8ed1ab_0 + - panedr=0.8.0=pyhd8ed1ab_0 + - parmed=4.3.0=py311h8cc7b42_0 + - parso=0.8.4=pyhd8ed1ab_0 + - pcre2=10.44=hba22ea6_2 + - perl=5.32.1=7_hd590300_perl5 + - pexpect=4.9.0=pyhd8ed1ab_0 + - pickleshare=0.7.5=py_1003 + - pillow=11.0.0=py311h49e9ac3_0 + - pint=0.23=pyhd8ed1ab_1 + - 
pip=24.3.1=pyh8b19718_0 + - pixman=0.43.2=h59595ed_0 + - pkgutil-resolve-name=1.3.10=pyhd8ed1ab_1 + - platformdirs=4.3.6=pyhd8ed1ab_0 + - pluggy=1.5.0=pyhd8ed1ab_0 + - prometheus_client=0.21.0=pyhd8ed1ab_0 + - prompt-toolkit=3.0.48=pyha770c72_0 + - psutil=6.1.0=py311h9ecbd09_0 + - pthread-stubs=0.4=hb9d3cd8_1002 + - ptyprocess=0.7.0=pyhd3deb0d_0 + - pure_eval=0.2.3=pyhd8ed1ab_0 + - py-cpuinfo=9.0.0=pyhd8ed1ab_0 + - pycairo=1.27.0=py311h124c5f0_0 + - pycalverter=1.6.1=pyhd8ed1ab_1 + - pycparser=2.22=pyhd8ed1ab_0 + - pydantic=2.9.2=pyhd8ed1ab_0 + - pydantic-core=2.23.4=py311h9e33e62_0 + - pyedr=0.8.0=pyhd8ed1ab_0 + - pygments=2.18.0=pyhd8ed1ab_0 + - pyjwt=2.9.0=pyhd8ed1ab_1 + - pyparsing=3.2.0=pyhd8ed1ab_1 + - pysocks=1.7.1=pyha2e5f31_6 + - pytables=3.10.1=py311h3ebe2b2_3 + - pytest=8.3.3=pyhd8ed1ab_0 + - python=3.11.10=hc5c86c4_3_cpython + - python-constraint=1.4.0=py_0 + - python-dateutil=2.9.0=pyhd8ed1ab_0 + - python-fastjsonschema=2.20.0=pyhd8ed1ab_0 + - python-json-logger=2.0.7=pyhd8ed1ab_0 + - python-tzdata=2024.2=pyhd8ed1ab_0 + - python_abi=3.11=5_cp311 + - pytz=2024.1=pyhd8ed1ab_0 + - pyyaml=6.0.2=py311h9ecbd09_1 + - pyzmq=26.2.0=py311h7deb3e3_3 + - qcelemental=0.28.0=pyhd8ed1ab_1 + - qcportal=0.56=pyhd8ed1ab_1 + - qhull=2020.2=h434a139_5 + - rdkit=2024.03.5=py311h845bd92_3 + - readline=8.2=h8228510_1 + - referencing=0.35.1=pyhd8ed1ab_0 + - regex=2024.11.6=py311h9ecbd09_0 + - reportlab=4.2.5=py311h9ecbd09_0 + - requests=2.32.3=pyhd8ed1ab_0 + - rfc3339-validator=0.1.4=pyhd8ed1ab_0 + - rfc3986-validator=0.1.1=pyh9f0ad1d_0 + - rlpycairo=0.2.0=pyhd8ed1ab_0 + - rpds-py=0.21.0=py311h9e33e62_0 + - scipy=1.14.1=py311he9a78e4_1 + - send2trash=1.8.3=pyh0d859eb_0 + - setuptools=75.3.0=pyhd8ed1ab_0 + - six=1.16.0=pyh6c4a22f_0 + - smirnoff99frosst=1.1.0=pyh44b312d_0 + - snappy=1.2.1=ha2e4443_0 + - sniffio=1.3.1=pyhd8ed1ab_0 + - soupsieve=2.5=pyhd8ed1ab_1 + - sqlalchemy=2.0.36=py311h9ecbd09_0 + - sqlite=3.46.1=h9eae976_0 + - stack_data=0.6.2=pyhd8ed1ab_0 + - 
tabulate=0.9.0=pyhd8ed1ab_1 + - terminado=0.18.1=pyh0d859eb_0 + - tinycss2=1.4.0=pyhd8ed1ab_0 + - tk=8.6.13=noxft_h4845f30_101 + - tomli=2.0.2=pyhd8ed1ab_0 + - tornado=6.4.1=py311h9ecbd09_1 + - tqdm=4.67.0=pyhd8ed1ab_0 + - traitlets=5.14.3=pyhd8ed1ab_0 + - types-python-dateutil=2.9.0.20241003=pyhff2d567_0 + - typing-extensions=4.12.2=hd8ed1ab_0 + - typing_extensions=4.12.2=pyha770c72_0 + - typing_utils=0.1.0=pyhd8ed1ab_0 + - tzdata=2024b=hc8b5060_0 + - unicodedata2=15.1.0=py311h9ecbd09_1 + - unidecode=1.3.8=pyhd8ed1ab_0 + - uri-template=1.3.0=pyhd8ed1ab_0 + - urllib3=2.2.3=pyhd8ed1ab_0 + - wcwidth=0.2.13=pyhd8ed1ab_0 + - webcolors=24.8.0=pyhd8ed1ab_0 + - webencodings=0.5.1=pyhd8ed1ab_2 + - websocket-client=1.8.0=pyhd8ed1ab_0 + - wheel=0.44.0=pyhd8ed1ab_0 + - widgetsnbextension=4.0.13=pyhd8ed1ab_0 + - xmltodict=0.14.2=pyhd8ed1ab_0 + - xorg-libice=1.1.1=hb9d3cd8_1 + - xorg-libsm=1.2.4=he73a12e_1 + - xorg-libx11=1.8.10=h4f16b4b_0 + - xorg-libxau=1.0.11=hb9d3cd8_1 + - xorg-libxdmcp=1.1.5=hb9d3cd8_0 + - xorg-libxext=1.3.6=hb9d3cd8_0 + - xorg-libxrender=0.9.11=hb9d3cd8_1 + - xorg-libxt=1.3.0=hb9d3cd8_2 + - xorg-xorgproto=2024.1=hb9d3cd8_1 + - xz=5.2.6=h166bdaf_0 + - yaml=0.2.5=h7f98852_2 + - zeromq=4.3.5=h3b0a872_6 + - zipp=3.20.2=pyhd8ed1ab_0 + - zlib=1.3.1=hb9d3cd8_2 + - zlib-ng=2.2.2=h5888daf_0 + - zstandard=0.23.0=py311hbc35293_1 + - zstd=1.5.6=ha6fb4c9_0 + - pip: + - amberutils==21.0 + - edgembar==0.2 + - mmpbsa-py==16.0 + - packmol-memgen==2024.2.9 + - pdb4amber==22.0 + - pymsmt==22.0 + - pytraj==2.0.6 + - qcaide==0.0.0 + - sander==22.0 +prefix: /home/brent/mambaforge/envs/qcarchive-user-submit diff --git a/submissions/2024-11-08-Sulfur-Hessian-Training-Coverage-Supplement-v1.1/generate.py b/submissions/2024-11-08-Sulfur-Hessian-Training-Coverage-Supplement-v1.1/generate.py new file mode 100644 index 00000000..57f91cc7 --- /dev/null +++ b/submissions/2024-11-08-Sulfur-Hessian-Training-Coverage-Supplement-v1.1/generate.py @@ -0,0 +1,115 @@ +from pathlib import Path 
+ +import numpy as np +import qcportal # noqa avoid zstd disaster +from openff.qcsubmit.results import OptimizationResultCollection +from openff.qcsubmit.results.filters import ( + ConnectivityFilter, + RecordStatusEnum, + RecordStatusFilter, +) +from openff.qcsubmit.utils import _CachedPortalClient, portal_client_manager +from openff.toolkit import ForceField +from qcaide import Submission +from qcportal.singlepoint import SinglepointDriver + +# Load config file and force field +ff = ForceField("openff-2.1.0.offxml") +config = Submission.from_toml("opt.toml") + +client = _CachedPortalClient("https://api.qcarchive.molssi.org", ".") +opt = OptimizationResultCollection.from_server( + client, + datasets=["OpenFF Sulfur Optimization Training Coverage Supplement v1.0"], +) + +print(f"Retrieved {opt.n_results} results") + +with portal_client_manager(lambda _: client): + opt = opt.filter( + RecordStatusFilter(status=RecordStatusEnum.complete), + ConnectivityFilter(tolerance=1.2), + ) + + opt_hashes = { + rec.final_molecule.get_hash() for rec, _mol in opt.to_records() + } + +print(f"Filtered to {opt.n_results} completed records") + +# populate dataset +dataset = opt.create_basic_dataset( + dataset_name=config.name, + description=config.description, + tagline=config.name, + driver=SinglepointDriver.hessian, +) +dataset.metadata.submitter = config.submitter +dataset.metadata.long_description_url = ( + "https://github.com/openforcefield/qca-dataset-submission/tree/master/" + "submissions/" + str(Path.cwd().name) +) + +# confirm that the qcelemental molecule hashes have not changed +new_hashes = { + qcemol.identifiers.molecule_hash + for moldata in dataset.dataset.values() + for qcemol in moldata.initial_molecules +} + +assert opt_hashes == new_hashes + +print("all hashes match") + + +# summarize dataset for readme +confs = np.array([len(mol.conformers) for mol in dataset.molecules]) + +print("* Number of unique molecules:", dataset.n_molecules) +print("* Number of filtered 
molecules:", dataset.n_filtered) +print("* Number of conformers:", sum(confs)) +print( + "* Number of conformers per molecule (min, mean, max): " + f"{confs.min()}, {confs.mean():.2f}, {confs.max()}" +) + +masses = [ + [ + sum([atom.mass.m for atom in molecule.atoms]) + for molecule in dataset.molecules + ] +] +print(f"* Mean molecular weight: {np.mean(np.array(masses)):.2f}") +print(f"* Max molecular weight: {np.max(np.array(masses)):.2f}") +print("* Charges:", sorted(set(m.total_charge.m for m in dataset.molecules))) + +print("## Metadata") +print(f"* Elements: {{{', '.join(dataset.metadata.dict()['elements'])}}}") + + +def print_field(od, field): + print(f"\t* {field}: {od[field]}") + + +fields = [ + "basis", + "implicit_solvent", + "keywords", + "maxiter", + "method", + "program", +] +for spec, obj in dataset.qc_specifications.items(): + od = obj.dict() + print("* Spec:", spec) + for field in fields: + print_field(od, field) + print("\t* SCF properties:") + for field in od["scf_properties"]: + print(f"\t\t* {field}") + + +# write output files +dataset.export_dataset("dataset.json.bz2") +dataset.molecules_to_file("output.smi", "smi") +dataset.visualize("dataset.pdf", columns=8) diff --git a/submissions/2024-11-08-Sulfur-Hessian-Training-Coverage-Supplement-v1.1/input-environment.yaml b/submissions/2024-11-08-Sulfur-Hessian-Training-Coverage-Supplement-v1.1/input-environment.yaml new file mode 100644 index 00000000..c4f8df3d --- /dev/null +++ b/submissions/2024-11-08-Sulfur-Hessian-Training-Coverage-Supplement-v1.1/input-environment.yaml @@ -0,0 +1,15 @@ +name: qcarchive-user-submit + +channels: + - conda-forge + - openeye + +dependencies: + - python =3.11 + - pip + - qcportal >=0.49 + - openff-qcsubmit >= 0.54 + - openff-toolkit + - openeye-toolkits + - pip: + - git+https://github.com/ntBre/qcaide diff --git a/submissions/2024-11-08-Sulfur-Hessian-Training-Coverage-Supplement-v1.1/opt.toml 
b/submissions/2024-11-08-Sulfur-Hessian-Training-Coverage-Supplement-v1.1/opt.toml new file mode 100644 index 00000000..1ad15489 --- /dev/null +++ b/submissions/2024-11-08-Sulfur-Hessian-Training-Coverage-Supplement-v1.1/opt.toml @@ -0,0 +1,51 @@ +name = "OpenFF Sulfur Hessian Training Coverage Supplement v1.1" +description = """ + +A basic data set created to improve the training coverage of sulfonic and +phosphonic acids, sulfone, sulfonate, sulfinyl, sulfoximine, sulfonamides, +thioether, and 1,3-thiazole groups. The structures in this data set are the +optimized geometries from `OpenFF Sulfur Optimization Training Coverage +Supplement v1.0`. + +""" +short_description = "Additional Hessian training data for Sage sulfur and phosphorus parameters" +class = "optimization" +purpose = "Improve coverage in Sage" +submitter = "Brent Westbrook" + +[[pipeline]] +filename = "generate.py" +description = "This script shows how the dataset was prepared." + +# input files +[[manifest]] +filename = "generate.py" +description = "Script describing dataset generation and submission" + +[[manifest]] +filename = "input-environment.yaml" +description = "Environment file used to create the Python environment for the script" + +[[manifest]] +filename = "full-environment.yaml" +description = "Fully-resolved environment used to execute the script" + +[[manifest]] +filename = "opt.toml" +description = """ +Experimental [qcaide](https://github.com/ntBre/qcaide) input file for defining +variables used throughout the QCA submission process +""" + +# output files +[[manifest]] +filename = "dataset.json.bz2" +description = "Compressed dataset ready for submission" + +[[manifest]] +filename = "dataset.pdf" +description = "Visualization of dataset molecules" + +[[manifest]] +filename = "output.smi" +description = "SMILES strings for dataset molecules" \ No newline at end of file diff --git a/submissions/2024-11-08-Sulfur-Hessian-Training-Coverage-Supplement-v1.1/output.smi 
b/submissions/2024-11-08-Sulfur-Hessian-Training-Coverage-Supplement-v1.1/output.smi new file mode 100644 index 00000000..bee32d71 --- /dev/null +++ b/submissions/2024-11-08-Sulfur-Hessian-Training-Coverage-Supplement-v1.1/output.smi @@ -0,0 +1,129 @@ +C(C(=O)O)S(=O)(=O)CC(=O)O +C(C(C(=O)O)N)P(=O)(O)O +C(CBr)S(=O)(=O)CCBr +C(CCS(=O)(=O)O)CN(Cl)Cl +C(CCS(=O)(=O)[O-])CO +C(CS(=O)(=O)N)N +C1=CSC(=C1)S(=O)(=O)CC#N +C1=CSC(=C1)S(=O)(=O)CC(=S)N +C1=CSC2=C1C(=NC(=O)N2)N +C1CCS(=O)(=O)C1 +C1CS(=O)(=O)NCO1 +C1CS1 +C1[C@H](N=C(S1)N)CC(=O)O +C=CCS(=O)(=O)CC=C +C=CS(=O)(=O)C=C +C=CS(=O)(=O)N +CCN1C=NN=C1SC +CCS(=O)(=O)C1=NON=C1C +CCS(=O)(=O)NO +CS(=O)(=O)CC(=O)/C=N/O +CS(=O)(=O)c1ccccc1 +CSC +CSc1c2c(ncn1)N=CC2 +C[C@@H]1[C@@H](O1)P(=O)(O)O +c1cc2c(cc1F)SC(=N2)NN +c1ccc(cc1)C(F)P(=O)(O)O +c1ccc(cc1)S(=O)(=O)CC#N +c1ccc2c(c1)C(=O)C(=CS2(=O)=O)Br +c1ccc2c(c1)C=CS2(=O)=O +C1(=NN=C(S1)S(=O)(=O)N)N +C1=C(SC2=C1C=C(S2)S(=O)(=O)N)CN +C1=CSC=C1 +C1=NN=C(S1)Br +C1=NSC(=N1)N +C1CC1C(=O)NC2=NN=CS2 +CC(CN)S(=O)(=O)C +CCS(=N)(=O)c1ccc(cc1)Nc2ncc(c(n2)N[C@@H](C)C(C)(C)O)Br +CCSc1ncnc(n1)N +CC[C@@H](CS(=O)(=O)O)[N+](=O)[O-] +CN1C(=O)SSC1=O +COC1=CSC=C1C(=O)O +COS(=O)(=O)CS(=O)(=O)C +CS(=O)(=O)CC#N +CS(=O)(=O)CCC#N +CS(=O)(=O)N +CS(=O)(=O)c1ccncc1N +CS/C(=C(\C#N)/SC)/C#N +CSC1=NN=C(S1)NS(=O)(=O)N +Cc1cccc2c1N=C(S2)S +c1cc(ccc1/N=N/c2c3ccc(cc3ccc2O)S(=O)(=O)[O-])S(=O)(=O)[O-] +C(/C=C\C(C(=O)O)N)P(=O)(O)O +C(C(C(=O)O)N)SCP(=O)(O)O +C1=CSC=N1 +C1=NC(=C(N1)N)S(=O)(=O)N +C1C2=C(NN=C2CS1(=O)=O)C(=O)O +C1CC(=C/C(=C\P(=O)(O)O)/C1)C(=O)O +C1[C@@H]([C@]1(C(=O)O)N)CP(=O)(O)O +CC1(CN(C(=O)N1Cl)CCS(=O)(=O)[O-])C +CS(=N)(=O)CCCOC1=NN2C(=NC=C2C3=Cc4ccccc4O3)C=C1 +C[C@@H]1[C@@H]([C@@H]([C@H]([C@@H](N1)S(=O)(=O)O)O)O)O +C[C@H]([C@@H](C)O)Nc1c(cnc(n1)Nc2ccc(cc2)S(=N)(=O)C)Br +C[C@H]([C@@H](C)O)Nc1c(cnc(n1)Nc2ccc(cc2)S(=N)(=O)CCO)Br +c1cc(ccc1C(CN)CS(=O)(=O)O)Cl +C(/C=C/P(=O)(O)O)C(C(=O)O)N +C(C[C@@H](CS)N)CS(=O)(=O)[O-] +C1=C(SC(=N1)NC(=O)C(=O)O)Br +C1C(C2=C(SC=C2C1=O)Br)O +C1C=CCS1(=O)=O +C1CNS(=O)(=O)C1 
+CC(c1ccc(cc1)S(=O)(=O)O)(C(=O)O)N +CN1C(=NC(=N1)[N+](=O)[O-])S(=O)(=O)C +COc1cc(ccc1c2cc(ncc2F)Nc3cc(ccn3)C[S@](=N)(=O)C)F +CS(=O)(=O)NO +C[N+]1(CCCCC1)CS(=O)(=O)[O-] +Cc1cc(ccc1S(=N)(=O)C)Nc2ncc(c(n2)N[C@H](C)C(C)(C)O)Br +c1cc(ccc1C(CN)(CS(=O)(=O)O)O)Cl +c1cc(ccc1CN)S(=O)(=O)N +c1ccc(cc1)[C@H](CO)S(=O)(=O)O +c1ccc2c(c1)ccc(c2/N=N/c3ccc(cc3)S(=O)(=O)[O-])O +C([C@@H](C(=O)O)N)NCP(=O)(O)O +C1(=NN=C(S1)S)S +C1=C(SC(=N1)NC(=O)C(=O)O)C#N +C1=CN=C(N1)S(=O)(=O)N +C1CS(=O)(=O)CCN1 +C1[C@@H]([C@H]1P(=O)(O)O)[C@@H](C(=O)O)N +CC(NC(=O)C(Cc1ccccc1)CS)S(=O)(=O)[O-] +CC1=[N+](c2ccccc2S1)CCCS(=O)(=O)[O-] +CCCN1C(=O)c2cccc3c2c(cc(c3N)S(=O)(=O)[O-])C1=O +CN1C(CC2=C/C(=N/NC(=O)N)/C(=O)C=C21)S(=O)(=O)[O-] +CN1CCS(=O)(=O)C1=O +CNCC(c1ccc(c(c1)O)O)S(=O)(=O)O +COc1cccc2c1C=C(O2)C3=CN=C4N3N=C(C=C4)OCCCS(=N)(=O)C +C[C@H]([C@H](C)O)Nc1c(cnc(n1)Nc2ccc(c(c2)OC)S(=N)(=O)C)Br +c1cc(c(cc1[N+](=O)[O-])S(=O)(=O)[O-])/C=C/c2ccc(cc2S(=O)(=O)[O-])[N+](=O)[O-] +c1cc2c(cc1N)SC(=C2)S(=O)(=O)N +c1nc(c2c(n1)SC=N2)N +C(CP(=O)(O)O)[C@@H](C(=O)O)N +C1CN(S(=O)(=O)C1)N=O +C1[C@@H]([C@H](C(N1)S(=O)(=O)O)O)O +CC(C(C)(C)O)Sc1c(cnc(n1)Nc2ccc(cc2)S(=N)(=O)C)Br +CC(C)(C)/[N+](=C/c1ccc(cc1S(=O)(=O)[O-])S(=O)(=O)[O-])/[O-] +CC1(COC(=O)N1Cl)CS(=O)(=O)CCS(=O)(=O)[O-] +CC1=Nc2cc(ccc2S1)O +CCCCCCCCS(=O)(=O)[O-] +CS(=N)(=O)CC[C@@H](C(=O)O)N +CS(=O)(=O)C +CS(=O)(=O)C=C +C[C@H](C(C)(C)O)Nc1c(cnc(n1)Nc2ccc(cc2)S(=N)(=O)C)Br +c1cc(ccc1C2=NNC(=O)S2)Cl +c1cc2c(ccc(c2c(c1)S(=O)(=O)O)N)S(=O)(=O)O +c1ccc(cc1)/N=N/c2c(ccc3c2c(cc(c3)S(=O)(=O)[O-])S(=O)(=O)[O-])O +c1ccc(cc1)C(O)P(=O)(O)O +c1ccc(cc1)C=C(c2ccccc2)S(=O)(=O)O +C/C(=C\C(C(=O)O)N)/CP(=O)(O)O +C1=C(N=CN1)/C=C\2/C(=O)NC(=O)S2 +C1=CN=C2N(C1=O)N=C(S2)S(=O)(=O)N +C1[C@H]([C@H]1S(=O)(=O)O)[C@H](C(=O)O)N +CC1C(C(C(N1)S(=O)(=O)O)O)O +CC[S@@](=N)(=O)c1ccc(cc1)Nc2ncc(c(n2)N[C@H](C)[C@@H](C)O)Br +CN1C=CC(=O)C(=C1)S(=O)(=O)N +COc1cc(cc(c1NC(=O)c2ccc(cc2)S(=N)(=O)C)C(=O)Nc3ccc(cn3)Cl)Cl +C[C@@H]([C@H](COC)Nc1c(cnc(n1)Nc2ccc(cc2)S(=N)(=O)C)Br)O +C[C@H](C(=O)CP(=O)(O)O)N 
+c1cc(ccc1C[C@@H](C(=O)O)N)S(=O)(=O)O +c1cc(ccc1[C@@H](C(=O)O)N)P(=O)(O)O +c1ccc2c(c1)N3C(=NN=N3)S2 +c1ccc2c(c1)N=C(S2)NC(=O)CCS(=O)(=O)O +CC(CSc1c(cnc(n1)Nc2ccc(cc2)S(=N)(=O)C)Br)O +COC(=O)C1=CSC2=C1N=NS2