Tutorial
In this tutorial, we will use Summit for a multiobjective optimization problem.
Make sure you have Summit installed:
```
pip install summit --upgrade --extra-index-url pypi.rxns.io
```
Additionally, you will need numpy, pandas and matplotlib for this tutorial:
```
pip install --upgrade numpy pandas matplotlib
```
Once you have those packages installed, start a new .py file and add the following lines:
```python
from summit.data import DataSet
from summit.domain import ContinuousVariable, Constraint, Domain
from summit.strategies import TSEMO2
from summit.models import GPyModel, AnalyticalModel
from summit.utils import pareto_efficient

from sklearn.model_selection import KFold
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d.art3d as art3d
```
The first five lines import classes from Summit. The remaining lines import packages from the wider scientific Python ecosystem that we will use throughout the tutorial.
Finally, download this CSV file. An easy way to get the file is to copy the link from the "Raw" button on the page that opens. You can then run the following command in your terminal, substituting the copied link for URL:

```
curl URL -o silica_experiments_data.csv
```

Note that the link will expire after a couple of hours.
We'll begin by importing the existing experimental data and converting it into a DataSet.
```python
data_pd = pd.read_csv('silica_experiments_data.csv')
input_columns = ['TEOS', 'NH3', 'H2O', 'EtOH']
output_columns = ['PSD', 'STD']
metadata_columns = ['Batch']
data_pd = data_pd[input_columns + output_columns + metadata_columns]

# Absolute distance of the particle size from the target of 100
data_pd['PSD_distance_100'] = ((data_pd['PSD'] - 100) ** 2) ** 0.5

# Convert into Summit's DataSet
data = DataSet.from_df(data_pd, metadata_columns=metadata_columns)
```
The first line reads the CSV file into a pandas DataFrame. We then pick out the input columns (the variables we can change in each experiment) and the output columns (the objectives we want to optimize). We also transform one of the output columns into the absolute distance between the measured value and 100, so that the optimization targets a value of 100. Finally, we convert the DataFrame into Summit's special form of a DataFrame called a DataSet, which stores metadata about each column, such as units.
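To make the transform concrete: `((x - 100)**2)**0.5` is just the absolute distance from 100, so minimizing it pulls the particle size toward the target from either side. A quick sanity check with made-up values:

```python
import numpy as np
import pandas as pd

# Hypothetical particle size values, for illustration only
psd = pd.Series([80.0, 100.0, 125.0])

# The transform used above: sqrt((x - 100)^2)
distance = ((psd - 100) ** 2) ** 0.5

# It is equivalent to the absolute distance from the target
assert np.allclose(distance, (psd - 100).abs())
print(distance.tolist())  # [20.0, 0.0, 25.0]
```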
Now, we specify the optimization domain, i.e., the variables that Summit should vary to optimize the objectives.
```python
# Set up the optimization problem domain
domain = Domain()

# Decision variables
domain += ContinuousVariable('TEOS',
                             description='',
                             bounds=[1, 35])
domain += ContinuousVariable('NH3',
                             description='',
                             bounds=[0, 100])
domain += ContinuousVariable('H2O',
                             description='',
                             bounds=[0, 100])
# domain += ContinuousVariable('EtOH',
#                              description='',
#                              bounds=[0, 100])

# Objectives
domain += ContinuousVariable('PSD_distance_100',
                             description='Distance of average particle size from target of 100',
                             bounds=[0, 1000],
                             is_objective=True,
                             maximize=False)
domain += ContinuousVariable('STD',
                             description='Standard deviation of particle size distribution',
                             bounds=[0, 1000],
                             is_objective=True,
                             maximize=False)

# Constraints
domain += Constraint(lhs='TEOS+NH3+H2O-100',
                     constraint_type='<')
domain += Constraint(lhs='(H2O+0.75*0.91*NH3)/18-2*TEOS/208.33',
                     constraint_type='>')
```
You might notice the two constraints added at the end, which place limits on the inputs: the first keeps the sum of TEOS, NH3 and H2O below 100, leaving room for EtOH to make up the remainder, and the second requires that enough water is present relative to TEOS.
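A candidate point only enters the optimization if both constraint expressions hold. As a sketch, the two expressions above can be checked directly for any candidate (the `feasible` helper is ours, not part of Summit):

```python
def feasible(teos, nh3, h2o):
    """Check a candidate point against the two domain constraints above."""
    # TEOS + NH3 + H2O must stay below 100, leaving the remainder for EtOH
    c1 = teos + nh3 + h2o - 100 < 0
    # Water (including water contributed via the NH3 term) must exceed
    # the amount required relative to TEOS
    c2 = (h2o + 0.75 * 0.91 * nh3) / 18 - 2 * teos / 208.33 > 0
    return c1 and c2

print(feasible(10, 20, 30))  # True: components sum to 60, ample water
print(feasible(40, 40, 40))  # False: components sum to 120
```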
We now use TSEMO2 to request new experiments, with a Gaussian process (GP) model for each objective. The code below writes a CSV file with the conditions for the next experiments.
```python
num_experiments = 4
input_dim = domain.num_continuous_dimensions() + domain.num_discrete_variables()
models = {'PSD_distance_100': GPyModel(input_dim=input_dim),
          'STD': GPyModel(input_dim=input_dim)}
tsemo = TSEMO2(domain, models)
experiments = tsemo.generate_experiments(data, num_experiments)

# EtOH makes up the remainder of the mixture
experiments['EtOH'] = 100 - (experiments['TEOS'] + experiments['NH3'] + experiments['H2O'])
experiments = experiments.round()
experiments.to_csv('next_experiments.csv')
```
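TSEMO2 proposes points expected to improve the Pareto front of the two objectives, and the `pareto_efficient` helper imported earlier identifies non-dominated points. Since both objectives here are minimized, the underlying idea can be sketched in plain NumPy (this is an illustrative sketch, not Summit's implementation):

```python
import numpy as np

def pareto_mask(objectives):
    """Return a boolean mask of non-dominated rows, assuming every
    objective is minimized (as PSD_distance_100 and STD are here)."""
    n = objectives.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            # j dominates i if it is no worse everywhere and better somewhere
            if i != j and np.all(objectives[j] <= objectives[i]) \
                      and np.any(objectives[j] < objectives[i]):
                mask[i] = False
                break
    return mask

# Four hypothetical (PSD_distance_100, STD) results
pts = np.array([[1.0, 5.0], [2.0, 2.0], [3.0, 3.0], [5.0, 1.0]])
print(pareto_mask(pts))  # [ True  True False  True]
```

Here `[3.0, 3.0]` is dominated by `[2.0, 2.0]`, which is better on both objectives, so it is excluded from the front.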