This package provides a data-driven computer-aided synthesis planning tool that uses retrosynthesis knowledge. In this package, the ReTReK model was trained on a US patent dataset instead of the Reaxys reaction dataset. Please note, therefore, that we cannot guarantee that the model produces the same synthetic routes as those reported in the manuscript.
Note: The pure Python version of ReTReK is available at https://github.com/clinfo/ReTReKpy
- Ubuntu: 18.04 (model training & synthetic route prediction)
- macOS Catalina: 10.15.7 (synthetic route prediction)
- python: 3.7
- Java: 1.8
- ChemAxon: 20.13
- Commons Collections: 4.4
- args4j: 2.33
Please refer to the following link.
Note: The order of the knowledge arguments corresponds to that of the knowledge_weights arguments.
javac CxnUtils.java # for the first time only
# use all knowledge
python run.py --config config/sample.json --target data/sample.mol --knowledge cdscore rdscore asscore stscore --knowledge_weights 1.0 1.0 1.0 1.0
# use CDScore with a weight of 2.0
python run.py --config config/sample.json --target data/sample.mol --knowledge cdscore --knowledge_weights 2.0 0.0 0.0 0.0
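The two commands above can also be assembled programmatically. The following is a minimal sketch, not part of ReTReK: `build_command` and `KNOWLEDGE_ORDER` are hypothetical names, and the sketch assumes that four weights are always passed in the order cdscore, rdscore, asscore, stscore, as both examples above suggest.

```python
# Assumed fixed order, taken from the "use all knowledge" example above.
KNOWLEDGE_ORDER = ("cdscore", "rdscore", "asscore", "stscore")


def build_command(target, weights, config="config/sample.json"):
    """Return an argv list for run.py (hypothetical helper, not part of ReTReK).

    `weights` maps a knowledge name to its weight; names that are missing or
    have a weight of 0.0 are left out of --knowledge.
    """
    unknown = set(weights) - set(KNOWLEDGE_ORDER)
    if unknown:
        raise ValueError(f"unknown knowledge: {sorted(unknown)}")
    used = [name for name in KNOWLEDGE_ORDER if weights.get(name, 0.0) != 0.0]
    if not used:
        raise ValueError("at least one knowledge needs a non-zero weight")
    # Always emit four weights, mirroring the documented examples.
    values = [str(float(weights.get(name, 0.0))) for name in KNOWLEDGE_ORDER]
    return [
        "python", "run.py",
        "--config", config,
        "--target", target,
        "--knowledge", *used,
        "--knowledge_weights", *values,
    ]


# Reproduces the "use CDScore with a weight of 2.0" command.
print(" ".join(build_command("data/sample.mol", {"cdscore": 2.0})))
```

Returning a list rather than a single string lets the result be passed directly to `subprocess.run` without shell quoting.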
If you want to try your own molecule, prepare it in MDL MOLfile format and replace data/sample.mol with the prepared file.
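Before running a search on your own file, it can help to confirm that the file is structurally a MOLfile. The sketch below is a standard-library sanity check, not a ReTReK function: `check_molfile` is a hypothetical name, and the sketch assumes the V2000 flavor of the format (three header lines, a counts line, then atom and bond blocks). Whether ReTReK also accepts V3000 files is not stated here.

```python
from pathlib import Path


def check_molfile(path):
    """Return (n_atoms, n_bonds) read from a V2000 counts line.

    Raises ValueError if the file does not look like a V2000 MOLfile.
    """
    lines = Path(path).read_text().splitlines()
    if len(lines) < 4:
        raise ValueError("expected three header lines followed by a counts line")
    counts = lines[3]
    if "V2000" not in counts:
        raise ValueError("counts line does not declare V2000")
    # Columns 1-3 hold the atom count, columns 4-6 the bond count.
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    if len(lines) < 4 + n_atoms + n_bonds:
        raise ValueError("fewer atom/bond lines than the counts line declares")
    if not any(line.startswith("M  END") for line in lines):
        raise ValueError("missing 'M  END' terminator")
    return n_atoms, n_bonds


# Only runs when the sample file from this repository is present.
if Path("data/sample.mol").exists():
    print(check_molfile("data/sample.mol"))
```

This only checks the layout; it does not validate the chemistry, which is left to the toolkit that reads the file.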
The target molecules used in the manuscript are stored in data/evaluation_compounds.
If you want to try the molecules in that directory, run the commands below.
NOTE: You need to download additional files using git-lfs before running these commands. First, run git lfs install && git lfs pull to download data/starting_materials_zinc.smi.
python run.py --config config/sample2.json --target data/evaluation_compounds/drug-like-compounds/MtbTMPK_inhibitor.mol --knowledge cdscore --knowledge_weights 5.0 0.0 0.0 0.0 --sel_const 10 --expansion_num 500
python run.py --config config/sample2.json --target data/evaluation_compounds/drug-like-compounds/α7_nicotinic_acetylcholine_receptor_silent_agonist.mol --knowledge cdscore --knowledge_weights 5.0 0.0 0.0 0.0 --sel_const 10 --expansion_num 500
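To run the same search over every compound in a directory, the commands above can be generated in a loop. This is a sketch under stated assumptions: `commands_for_directory` is a hypothetical helper, and the options simply copy the two example commands above. Passing the arguments as a list avoids shell-quoting problems with file names such as the one containing "α7".

```python
from pathlib import Path

# Options copied from the example commands above.
SEARCH_OPTIONS = [
    "--knowledge", "cdscore",
    "--knowledge_weights", "5.0", "0.0", "0.0", "0.0",
    "--sel_const", "10",
    "--expansion_num", "500",
]


def commands_for_directory(mol_dir, config="config/sample2.json"):
    """Yield one run.py argv list per .mol file in mol_dir, sorted by name."""
    for mol_path in sorted(Path(mol_dir).glob("*.mol")):
        yield ["python", "run.py", "--config", config,
               "--target", str(mol_path), *SEARCH_OPTIONS]


for cmd in commands_for_directory("data/evaluation_compounds/drug-like-compounds"):
    print(" ".join(cmd))
    # To launch each search, replace the print with:
    #   subprocess.run(cmd, check=True)   # requires `import subprocess`
```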
- --sel_const: constant value for selection (default value is set to 3).
- --expansion_num: number of reaction templates used in the expansion step (default value is set to 50).
- --starting_material: path to SMILES format file containing starting materials.
- --search_count: the maximum number of iterations of MCTS (default value is set to 100).
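The documented options and defaults can be summarized as an `argparse` sketch. This is not the parser used by run.py: the argument types are assumptions (integers for the numeric options), and only the flag names and default values come from the descriptions above.

```python
import argparse

# Illustrative only: mirrors the documented flags, not run.py's actual parser.
parser = argparse.ArgumentParser(description="Sketch of the documented search options")
parser.add_argument("--sel_const", type=int, default=3,
                    help="constant value for selection")
parser.add_argument("--expansion_num", type=int, default=50,
                    help="number of reaction templates used in the expansion step")
parser.add_argument("--starting_material", type=str, default=None,
                    help="path to a SMILES file containing starting materials")
parser.add_argument("--search_count", type=int, default=100,
                    help="maximum number of MCTS iterations")

# Options that are not given on the command line keep their defaults.
args = parser.parse_args(["--sel_const", "10", "--expansion_num", "500"])
print(vars(args))
```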
CDScore aims to favor convergent synthesis, which is known as an efficient strategy in multi-step chemical synthesis.
For a purpose similar to that of CDScore, the number of available substances generated in a reaction step is calculated.
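The counting idea described above can be illustrated with a toy sketch. It is not ReTReK's implementation: `load_starting_materials` and `count_available` are hypothetical names, the file is assumed to hold one SMILES per line (optionally followed by a name), and SMILES are compared as plain strings, whereas a real implementation would need canonical forms. How the count is turned into a score is not specified here.

```python
def load_starting_materials(path):
    """Read a SMILES file into a set, taking the first field of each line."""
    materials = set()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.split()
            if fields:  # skip blank lines
                materials.add(fields[0])
    return materials


def count_available(reactants, starting_materials):
    """Count the reactants of one step that are known starting materials."""
    return sum(1 for smiles in reactants if smiles in starting_materials)


# In-memory example: one of the two reactants is available.
available = {"CCO", "c1ccccc1"}
print(count_available(["CCO", "CC(=O)Cl"], available))
```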
A ring construction strategy is preferred if a target compound has complex ring structures.
A synthetic reaction with few by-products is generally preferred in terms of yield.
- Shoichi Ishida: [email protected]
- Ryosuke Kojima: [email protected]
@article{Ishida2022,
doi = {10.1021/acs.jcim.1c01074},
url = {https://doi.org/10.1021/acs.jcim.1c01074},
year = {2022},
month = mar,
publisher = {American Chemical Society ({ACS})},
volume = {62},
number = {6},
pages = {1357--1367},
author = {Shoichi Ishida and Kei Terayama and Ryosuke Kojima and Kiyosei Takasu and Yasushi Okuno},
title = {{AI}-Driven Synthetic Route Design Incorporated with Retrosynthesis Knowledge},
journal = {Journal of Chemical Information and Modeling}
}
This application was developed as part of the kGCN project.