Tutorial
In this tutorial, we will use summit for a multiobjective optimization problem.
Make sure you have summit installed:
pip install summit --upgrade --extra-index-url pypi.rxns.io
Additionally, you will need numpy, pandas and matplotlib for this tutorial:
pip install --upgrade numpy pandas matplotlib
Once you have those packages installed, start a new .py file and add the following lines:
from summit.data import DataSet
from summit.domain import ContinuousVariable, Constraint, Domain
from summit.strategies import TSEMO2
from summit.models import GPyModel, AnalyticalModel
from summit.utils import pareto_efficient
from sklearn.model_selection import KFold
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d.art3d as art3d
In the first block we are importing classes from summit. In the second block, we are importing classes from other packages that we will use.
Finally, download this CSV file and save it as silica_experiments_data.csv next to your .py file.
We'll begin by importing the existing experimental data and converting it into a DataSet.
data_pd = pd.read_csv('silica_experiments_data.csv')
input_columns = [ 'TEOS', 'NH3', 'H2O', 'EtOH']
output_columns = ['PSD', 'STD']
metadata_columns=['Batch']
data_pd = data_pd[input_columns + output_columns + metadata_columns]
#Distance of the particle size from the target of 100
data_pd['PSD_distance_100'] = ((data_pd['PSD']-100)**2)**0.5
#Convert the dataframe into a summit DataSet
data = DataSet.from_df(data_pd, metadata_columns=metadata_columns)
The first line reads the CSV file into a pandas dataframe. We then select the columns corresponding to our inputs; these are the variables we can change in the experiments. The next line selects the outputs, the objectives that we want to optimize, and the line after that marks the Batch column as metadata.
We also transform one of the output columns into the absolute difference between its value and 100 (the square root of the squared difference). This lets us target a value of 100 by minimizing the new column.
After that, we convert the dataframe into Summit's special form of a DataFrame, called a DataSet. A DataSet stores metadata about each column, such as units.
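The expression used for the transform is the absolute distance from the target. A minimal check in plain Python, using made-up particle sizes rather than values from the CSV file:

```python
# Hypothetical particle sizes (not taken from silica_experiments_data.csv)
psd_values = [80.0, 100.0, 130.0]
target = 100

# Same expression as in the tutorial, applied element by element
distances = [((psd - target) ** 2) ** 0.5 for psd in psd_values]

# The result matches the absolute difference from the target
assert distances == [abs(psd - target) for psd in psd_values]
print(distances)
```

A value of 0 in this column means the particle size is exactly on target, so minimizing the column drives the particle size towards 100 from either side.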
Now, we specify the optimization domain, i.e., the variables that summit should vary to optimize the objectives.
#Set up the optimization problem domain
domain = Domain()
#Decision variables
domain += ContinuousVariable('TEOS',
                             description = '',
                             bounds=[1, 35])
domain += ContinuousVariable('NH3',
                             description = '',
                             bounds=[0, 100])
domain += ContinuousVariable('H2O',
                             description = '',
                             bounds=[0, 100])
# domain += ContinuousVariable('EtOH',
#                              description = '',
#                              bounds=[0, 100])
#Objectives
domain += ContinuousVariable('PSD_distance_100',
                             description = 'Distance of average particle size from target of 100',
                             bounds=[0, 1000],
                             is_objective=True,
                             maximize=False)
domain += ContinuousVariable('STD',
                             description = 'Standard deviation of particle size distribution',
                             bounds=[0, 1000],
                             is_objective=True,
                             maximize=False)
#Constraints
domain += Constraint(lhs='TEOS+NH3+H2O-100',
                     constraint_type='<')
domain += Constraint(lhs='(H2O+0.75*0.91*NH3)/18-2*TEOS/208.33',
                     constraint_type='>')
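The domain also includes two constraints that limit which combinations of inputs are allowed. Each one is written as an expression that is compared with zero. The first keeps the sum of TEOS, NH3 and H2O below 100, which leaves room for EtOH to make up the remainder. The second requires the expression (H2O+0.75*0.91*NH3)/18-2*TEOS/208.33 to be positive.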
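To get a feel for what the constraints allow, they can be evaluated by hand for a candidate composition. The sketch below is plain Python and does not use summit; `satisfies_constraints` is a hypothetical helper, and it assumes each constraint expression is compared with zero on the right-hand side.

```python
def satisfies_constraints(teos, nh3, h2o):
    """Return True if a composition meets both constraints from the domain.

    Hypothetical helper for illustration only; assumes each constraint
    expression is compared with zero.
    """
    # Constraint 1: TEOS + NH3 + H2O - 100 < 0
    first = teos + nh3 + h2o - 100 < 0
    # Constraint 2: (H2O + 0.75*0.91*NH3)/18 - 2*TEOS/208.33 > 0
    second = (h2o + 0.75 * 0.91 * nh3) / 18 - 2 * teos / 208.33 > 0
    return first and second


# Made-up compositions, not taken from the experimental data
print(satisfies_constraints(10, 20, 30))  # both constraints hold
print(satisfies_constraints(35, 50, 40))  # sum exceeds 100
print(satisfies_constraints(35, 0, 0))    # second expression is negative
```

Only the first composition passes; the other two each violate one of the constraints.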
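We now use TSEMO2 to request new experiments. In this case, we supply one Gaussian process (GP) model, a GPyModel, for each objective. The code also computes EtOH as the remainder up to 100, rounds the suggested conditions and writes them to a CSV file for the next experiments.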
num_experiments=4
input_dim = domain.num_continuous_dimensions() + domain.num_discrete_variables()
models = {'PSD_distance_100': GPyModel(input_dim=input_dim),
          'STD': GPyModel(input_dim=input_dim)
         }
tsemo = TSEMO2(domain, models)
experiments = tsemo.generate_experiments(data, num_experiments)
experiments['EtOH'] = 100-(experiments['TEOS'] + experiments['NH3'] + experiments['H2O'])
experiments = experiments.round()
experiments.to_csv('next_experiments.csv')
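Because EtOH is computed before rounding, the four rounded amounts may not add up to exactly 100. For example, 10.4, 20.4 and 30.4 give an EtOH of 38.8, and rounding each value yields 10 + 20 + 30 + 39 = 99. If that matters for your experiments, one option is to round the decision variables first and compute EtOH afterwards. The sketch below shows the idea in plain Python; `round_composition` is a hypothetical helper, not part of summit.

```python
def round_composition(teos, nh3, h2o):
    """Round the three decision variables, then assign the remainder to EtOH.

    Hypothetical helper for illustration; not part of summit.
    """
    teos, nh3, h2o = round(teos), round(nh3), round(h2o)
    # Computing EtOH after rounding guarantees the total is exactly 100
    etoh = 100 - (teos + nh3 + h2o)
    return {'TEOS': teos, 'NH3': nh3, 'H2O': h2o, 'EtOH': etoh}


# Made-up suggestion, not an actual TSEMO2 output
suggestion = round_composition(10.4, 20.4, 30.4)
assert sum(suggestion.values()) == 100
print(suggestion)
```

With the DataFrame from the tutorial, the same effect would come from calling round() before adding the EtOH column.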