DyRAMO (Dynamic Reliability Adjustment for Multi-objective Optimization) is a framework to perform multi-objective optimization while maintaining the reliability of multiple prediction models.
DyRAMO supports Linux operating systems and has been tested on AlmaLinux release 9.3.
- python: 3.11
- chemtsv2: 1.0.3
- physbo: 2.0.0
- (optional) lightgbm: 3.2.1 (for property prediction)
```bash
cd YOUR_WORKSPACE
python3.11 -m venv .venv
source .venv/bin/activate
pip install chemtsv2==1.0.3 physbo==2.0.0 lightgbm==3.2.1
```
Installation via pip normally takes about 40 seconds.
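To confirm that the virtual environment matches the tested versions listed above, a short check with the standard library's `importlib.metadata` can be run inside the activated `.venv`. This is a convenience sketch, not part of DyRAMO; `check_versions` is a hypothetical helper name.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the requirements above (lightgbm is optional).
REQUIRED = {"chemtsv2": "1.0.3", "physbo": "2.0.0", "lightgbm": "3.2.1"}


def check_versions(required=REQUIRED, get_version=version):
    """Return {package: (expected, installed)} for every mismatch; installed is None if missing."""
    mismatches = {}
    for package, expected in required.items():
        try:
            installed = get_version(package)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    if sys.version_info[:2] != (3, 11):
        print(f"Python {sys.version_info.major}.{sys.version_info.minor} found; 3.11 was tested")
    for package, (expected, installed) in check_versions().items():
        print(f"{package}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every listed package is installed at the tested version. The `get_version` parameter only exists so the lookup can be swapped out, for example in tests.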
```bash
git clone git@github.com:ycu-iil/DyRAMO.git
cd DyRAMO
```
DyRAMO employs ChemTSv2 as a molecule generator.
Prepare a reward file for ChemTSv2 following the ChemTSv2 instructions on how to define a reward function.
An example of a reward file for DyRAMO can be found in `reward/DyRAMO_reward.py`.
Prepare a yaml file containing the settings for both DyRAMO and ChemTSv2.
An example of a settings file can be found in `config/setting_dyramo.yaml`.
Details of the settings are described in the Setting to run DyRAMO section.
Execute `run.py` with the yaml file as an argument:
```bash
python run.py -c config/setting_dyramo.yaml
```
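Because a full run can take hours, it may be worth sanity-checking the settings before launching `run.py`. The sketch below assumes the yaml file has already been loaded into a Python dict (for example with PyYAML's `yaml.safe_load`) and that its keys follow the nesting described in the Setting to run DyRAMO section. `validate_settings` is a hypothetical helper, not part of DyRAMO, and it checks only options whose allowed values are documented there.

```python
VALID_THRESHOLD_TYPES = {"hours", "generation_num"}
VALID_PRIORITIES = {"high", "middle", "low"}
VALID_SCORES = {"TS", "EI", "PI"}


def validate_settings(cfg):
    """Return a list of problems in a DyRAMO settings dict (empty list if none found)."""
    errors = []

    # threshold_type names the option (hours or generation_num) that limits each run.
    threshold = cfg.get("threshold_type")
    if threshold not in VALID_THRESHOLD_TYPES:
        errors.append("threshold_type must be 'hours' or 'generation_num'")
    elif threshold not in cfg:
        errors.append(f"threshold_type is {threshold!r} but '{threshold}' is not set")

    # Each property in search_range needs min, max, and a positive step.
    for prop, rng in cfg.get("search_range", {}).items():
        if not {"min", "max", "step"} <= set(rng):
            errors.append(f"search_range.{prop}: min, max, and step are required")
        elif rng["step"] <= 0 or rng["min"] > rng["max"]:
            errors.append(f"search_range.{prop}: need step > 0 and min <= max")

    # DSS holds reward.ratio plus one [prop].priority entry per property.
    dss = cfg.get("DSS", {})
    ratio = dss.get("reward", {}).get("ratio")
    if ratio is None or not 0 < ratio <= 1:
        errors.append("DSS.reward.ratio must be in (0, 1]")
    for prop, opts in dss.items():
        if prop != "reward" and opts.get("priority") not in VALID_PRIORITIES:
            errors.append(f"DSS.{prop}.priority must be high, middle, or low")

    if cfg.get("BO", {}).get("score") not in VALID_SCORES:
        errors.append("BO.score must be TS, EI, or PI")
    return errors
```

Treating `reward.ratio` as a fraction in (0, 1] is an assumption based on its description as a proportion.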
Expected outputs
- `run.log`: An execution log file for DyRAMO.
- `search_history.csv`: A csv file containing the explored input variables and the corresponding objective variables.
- `search_result.npz`: A binary file containing the search results. Please refer to the PHYSBO documentation for details.
- `result/`: A directory containing the results of molecule generation with ChemTSv2.
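For a quick look at the tabular outputs after a run, the sketch below reads the search history with Python's `csv` module and lists the arrays stored in the `.npz` file with NumPy. It assumes both files are in the directory passed as `out_dir` and that the CSV has a header row; it deliberately does not assume which array names PHYSBO writes. `summarize_outputs` is a hypothetical helper, not part of DyRAMO.

```python
import csv
from pathlib import Path

import numpy as np


def summarize_outputs(out_dir, history_csv="search_history.csv", result_npz="search_result.npz"):
    """Collect a quick overview of the search history and PHYSBO result files in out_dir."""
    out_dir = Path(out_dir)
    summary = {}

    # One row per explored search point; the header gives the column names.
    with open(out_dir / history_csv, newline="") as f:
        rows = list(csv.DictReader(f))
    summary["history_columns"] = list(rows[0]) if rows else []
    summary["num_history_rows"] = len(rows)

    # List array names and shapes instead of assuming which keys PHYSBO stores.
    with np.load(out_dir / result_npz) as data:
        summary["npz_arrays"] = {name: data[name].shape for name in data.files}
    return summary
```

For example, call `summarize_outputs(".")` from the directory that holds the outputs. If the `.npz` file contains object arrays, `np.load` needs `allow_pickle=True`; enable that only for files you trust.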
Expected run time
- Generating 10,000 molecules with a C-value (`c_val`) of 0.01 generally takes about 10 minutes.
- In this case, generation is repeated 40 times, so the whole run takes approximately 7 hours.
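The 7-hour figure is the per-run time multiplied by the number of runs, assuming every run takes the same ~10 minutes:

$$
40 \times 10\ \text{min} = 400\ \text{min} \approx 6.7\ \text{h} \approx 7\ \text{h}
$$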
The settings for DyRAMO and ChemTSv2 are described in a single yaml file. Only some of the ChemTSv2 settings are listed here; more details can be found in the ChemTSv2 documentation, from which the descriptions of the ChemTSv2 settings below are taken.
| Option | Suboption | Description |
|---|---|---|
| `c_val` | - | An exploration parameter to balance the trade-off between exploration and exploitation. A larger value (e.g., 1.0) prioritizes exploration, and a smaller value (e.g., 0.1) prioritizes exploitation. |
| `threshold_type` | - | Threshold type that selects whether molecule generation in each run is limited by time (`hours`) or by the number of molecules (`generation_num`). |
| `hours` | - | Time for molecule generation in hours per run. |
| `generation_num` | - | Number of molecules to be generated per run. |
| `reward_function` | `property` | Settings for calculating the reward function, Dscore. Details on setting the Dscore parameters can be found in the linked documentation. |
| `search_range` | | Search range of reliability levels for each property, defined by upper and lower limits and the interval between search points. For example, if the upper limit, lower limit, and interval are set to `0.9`, `0.1`, and `0.2`, respectively, the search range is `[0.1, 0.3, 0.5, 0.7, 0.9]`. |
| | `[prop].max` | Upper limit of the search range. |
| | `[prop].min` | Lower limit of the search range. |
| | `[prop].step` | Interval between search points. |
| `DSS` | | Settings for defining the DSS score, the objective function in Bayesian optimization. |
| | `reward.ratio` | Proportion of generated molecules evaluated in DSS: the average reward of the top `ratio` of generated molecules is used. |
| | `[prop].priority` | Priority of each property when adjusting reliability levels. Select one of `high`, `middle`, and `low` for each property. |
| `BO` | | Settings for Bayesian optimization with PHYSBO. |
| | `num_random_search` | Number of random search iterations for initialization. |
| | `num_bayes_search` | Number of search iterations by Bayesian optimization. |
| | `score` | Type of acquisition function. `TS` (Thompson Sampling), `EI` (Expected Improvement), and `PI` (Probability of Improvement) are available. |
| `num_generation` | - | Number of molecule generation runs at each search point. |
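The `reward.ratio` option averages the rewards of the best-scoring fraction of generated molecules. A minimal sketch of that averaging step follows, assuming the number of molecules kept is `round(ratio * n)` with a minimum of one; how DyRAMO actually rounds, and how this average is combined with reliability levels and `priority` into the full DSS score, is not specified here, so the sketch covers only the averaging. `top_ratio_mean` is a hypothetical name.

```python
def top_ratio_mean(rewards, ratio):
    """Average reward of the top `ratio` fraction of generated molecules.

    Assumption: round(ratio * n) molecules are kept, but always at least one.
    """
    if not rewards:
        raise ValueError("rewards must not be empty")
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    k = max(1, round(ratio * len(rewards)))
    top = sorted(rewards, reverse=True)[:k]  # highest rewards first
    return sum(top) / k


rewards = [0.9, 0.1, 0.5, 0.7, 0.3]  # hypothetical rewards of five generated molecules
print(top_ratio_mean(rewards, 0.4))  # averages the two highest rewards, 0.9 and 0.7
```

Averaging only the top fraction, rather than all molecules, rewards search points that yield some high-scoring molecules even if most generated molecules score poorly.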
Note
`[prop]` and `.` represent the name of the property to be optimized and the nesting structure, respectively.
For example, the `search_range` parameter should be written in yaml as follows:
```yaml
search_range:
  EGFR:
    min: 0.1
    max: 0.9
    step: 0.01
```
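The grid of reliability levels defined by `min`, `max`, and `step` can be reproduced in a few lines of Python, which is handy for checking how many search points a setting produces. This is a sketch: the helper names are hypothetical, and treating the joint search space as the Cartesian product of the per-property grids is an assumption about how DyRAMO combines the ranges.

```python
import math
from itertools import product


def search_points(min_val, max_val, step, ndigits=10):
    """Reliability levels min, min + step, ... up to max (inclusive when reachable)."""
    if step <= 0 or min_val > max_val:
        raise ValueError("need step > 0 and min <= max")
    # A small tolerance keeps max from being dropped when the quotient
    # lands just below an integer because of floating-point error.
    n_steps = math.floor((max_val - min_val) / step + 1e-9)
    return [round(min_val + i * step, ndigits) for i in range(n_steps + 1)]


def search_space(search_range):
    """Assumed joint search space: every combination of per-property levels."""
    names = list(search_range)
    grids = [search_points(r["min"], r["max"], r["step"]) for r in search_range.values()]
    return names, list(product(*grids))


# Mirrors the yaml example (search_range -> EGFR -> min/max/step).
names, points = search_space({"EGFR": {"min": 0.1, "max": 0.9, "step": 0.01}})
print(names, len(points))  # ['EGFR'] 81
```

Under this assumption the number of candidate points is the product of the grid sizes, so a fine `step` for several properties grows the space quickly.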
```bibtex
@article{Yoshizawa2024,
  title = {Avoiding Reward Hacking in Multi-Objective Molecular Design: A Data-Driven Generative Strategy with a Reliable Design Framework},
  url = {https://doi.org/10.26434/chemrxiv-2024-dh681},
  DOI = {10.26434/chemrxiv-2024-dh681},
  journal = {ChemRxiv},
  author = {Yoshizawa, Tatsuya and Ishida, Shoichi and Sato, Tomohiro and Ohta, Masateru and Honma, Teruki and Terayama, Kei},
  year = {2024},
  month = jun
}
```
Note
If you would like to reproduce the results of the above article, please refer to doc/reproduction_instruction.md.
This package is distributed under the MIT License.
- Tatsuya Yoshizawa ([email protected])
- Kei Terayama ([email protected])