-
Notifications
You must be signed in to change notification settings - Fork 2
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
DOCS: update links to available files
- Loading branch information
1 parent
2f7f034
commit 64c4fbb
Showing
4 changed files
with
15 additions
and
6 deletions.
There are no files selected for viewing
This file was deleted.
Oops, something went wrong.
Binary file not shown.
Binary file not shown.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
# Available files | ||
|
||
The generative model was pretrained on combined dataset formed from a combination of [ChemBL 33](https://chembl.gitbook.io/chembl-interface-documentation/downloads), [GuacaMol v1](https://github.com/BenevolentAI/guacamol/), [MOSES](https://github.com/molecularsets/moses), and [BindingDB (08-2023)](https://www.bindingdb.org/rwd/bind/chemsearch/marvin/Download.jsp). The dataset was processed to exclude all SMILES strings containing more than 133 tokens and which contain tokens that occurr less than 1000 times in the dataset. The combined dataset contains 5 539 765 unique and valid SMILES strings, which are split into: | ||
|
||
- [Training Partition (5 262 776 entries; 95%)](https://files.ischemist.com/ChemSpaceAL/publication_runs/1_Pretraining/datasets/combined_train.csv.gz) | ||
- [Validation Partition (276 989 entries; 5%)](https://files.ischemist.com/ChemSpaceAL/publication_runs/1_Pretraining/datasets/combined_valid.csv.gz) | ||
|
||
If you wish to use our pretrained model, you can download the model weights and dataset descriptors (these are internal parameters required for generation). Note that if you use [our Jupyter Notebook](../ChemSpaceAL.ipynb), it has a special cell which will download these files and put them into appropriate folders for you! | ||
|
||
- [Pretrained model weights](https://files.ischemist.com/ChemSpaceAL/publication_runs/1_Pretraining/datasets_descriptors/combined_train.yaml) | ||
- [Pretrained model descriptors](https://files.ischemist.com/ChemSpaceAL/publication_runs/1_Pretraining/model_weights/model7_al0_ch1.pt) | ||
|
||
You can also download the PCA weights fitted on the combined dataset: | ||
|
||
- [scaler,pca tuple stored as a pickle](https://files.ischemist.com/ChemSpaceAL/publication_runs/3_Sampling/pca_weights/scaler_pca_combined_n120.pkl) |