diff --git a/master/_images/parameter_types_11_0.png b/master/_images/parameter_types_11_0.png
new file mode 100644
index 00000000..4bea406c
Binary files /dev/null and b/master/_images/parameter_types_11_0.png differ
diff --git a/master/_images/parameter_types_17_0.png b/master/_images/parameter_types_17_0.png
new file mode 100644
index 00000000..111c7dd9
Binary files /dev/null and b/master/_images/parameter_types_17_0.png differ
diff --git a/master/_images/parameter_types_3_0.png b/master/_images/parameter_types_3_0.png
new file mode 100644
index 00000000..e7a4b21e
Binary files /dev/null and b/master/_images/parameter_types_3_0.png differ
diff --git a/master/_sources/advanced-tour.ipynb.txt b/master/_sources/advanced-tour.ipynb.txt
index dc72e40e..9e93d09d 100644
--- a/master/_sources/advanced-tour.ipynb.txt
+++ b/master/_sources/advanced-tour.ipynb.txt
@@ -96,7 +96,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
- "Next point to probe is: {'x': -0.331911981189704, 'y': 1.3219469606529486}\n"
+ "Next point to probe is: {'x': np.float64(-0.331911981189704), 'y': np.float64(1.3219469606529486)}\n"
]
}
],
@@ -167,12 +167,12 @@
"name": "stdout",
"output_type": "stream",
"text": [
- "-18.503835804889988 {'x': 1.953072105336, 'y': -2.9609778030491904}\n",
- "-1.0819533157901717 {'x': 0.22703572807626315, 'y': 2.4249238905875123}\n",
- "-6.50219704520679 {'x': -1.9991881984624875, 'y': 2.872282989383577}\n",
- "-5.747604713731052 {'x': -1.994467585936897, 'y': -0.664242699361514}\n",
- "-2.9682431497650823 {'x': 1.9737252084307952, 'y': 1.269540259274744}\n",
- "{'target': 0.7861845912690544, 'params': {'x': -0.331911981189704, 'y': 1.3219469606529486}}\n"
+ "-18.707136686093495 {'x': np.float64(1.9261486197444082), 'y': np.float64(-2.9996360060323246)}\n",
+ "0.750594563473972 {'x': np.float64(-0.3763326769822668), 'y': np.float64(1.328297354179696)}\n",
+ "-6.559031075654336 {'x': np.float64(1.979183535803597), 'y': np.float64(2.9083667381450318)}\n",
+ "-6.915481333972961 {'x': np.float64(-1.9686133847781613), 'y': np.float64(-1.009985740060171)}\n",
+ "-6.8600832617014085 {'x': np.float64(-1.9763198875239296), 'y': np.float64(2.9885278383464513)}\n",
+ "{'target': np.float64(0.7861845912690544), 'params': {'x': np.float64(-0.331911981189704), 'y': np.float64(1.3219469606529486)}}\n"
]
}
],
@@ -190,112 +190,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "## 2. Dealing with discrete parameters\n",
- "\n",
- "**There is no principled way of dealing with discrete parameters using this package.**\n",
- "\n",
- "Ok, now that we got that out of the way, how do you do it? You're bound to be in a situation where some of your function's parameters may only take on discrete values. Unfortunately, the nature of bayesian optimization with gaussian processes doesn't allow for an easy/intuitive way of dealing with discrete parameters - but that doesn't mean it is impossible. The example below showcases a simple, yet reasonably adequate, way to dealing with discrete parameters."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 9,
- "metadata": {},
- "outputs": [],
- "source": [
- "def func_with_discrete_params(x, y, d):\n",
- " # Simulate necessity of having d being discrete.\n",
- " assert type(d) == int\n",
- " \n",
- " return ((x + y + d) // (1 + d)) / (1 + (x + y) ** 2)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 10,
- "metadata": {},
- "outputs": [],
- "source": [
- "def function_to_be_optimized(x, y, w):\n",
- " d = int(w)\n",
- " return func_with_discrete_params(x, y, d)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 11,
- "metadata": {},
- "outputs": [],
- "source": [
- "optimizer = BayesianOptimization(\n",
- " f=function_to_be_optimized,\n",
- " pbounds={'x': (-10, 10), 'y': (-10, 10), 'w': (0, 5)},\n",
- " verbose=2,\n",
- " random_state=1,\n",
- ")"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 12,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "| iter | target | w | x | y |\n",
- "-------------------------------------------------------------\n",
- "| \u001b[30m1 | \u001b[30m-0.06199 | \u001b[30m2.085 | \u001b[30m4.406 | \u001b[30m-9.998 |\n",
- "| \u001b[35m2 | \u001b[35m-0.0344 | \u001b[35m1.512 | \u001b[35m-7.065 | \u001b[35m-8.153 |\n",
- "| \u001b[30m3 | \u001b[30m-0.2177 | \u001b[30m0.9313 | \u001b[30m-3.089 | \u001b[30m-2.065 |\n",
- "| \u001b[35m4 | \u001b[35m0.1865 | \u001b[35m2.694 | \u001b[35m-1.616 | \u001b[35m3.704 |\n",
- "| \u001b[30m5 | \u001b[30m-0.2187 | \u001b[30m1.022 | \u001b[30m7.562 | \u001b[30m-9.452 |\n",
- "| \u001b[35m6 | \u001b[35m0.2488 | \u001b[35m2.684 | \u001b[35m-2.188 | \u001b[35m3.925 |\n"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "| \u001b[35m7 | \u001b[35m0.2948 | \u001b[35m2.683 | \u001b[35m-2.534 | \u001b[35m4.08 |\n",
- "| \u001b[35m8 | \u001b[35m0.3202 | \u001b[35m2.514 | \u001b[35m-3.83 | \u001b[35m5.287 |\n",
- "| \u001b[30m9 | \u001b[30m0.0 | \u001b[30m4.057 | \u001b[30m-4.458 | \u001b[30m3.928 |\n",
- "| \u001b[35m10 | \u001b[35m0.4802 | \u001b[35m2.296 | \u001b[35m-3.518 | \u001b[35m4.558 |\n",
- "| \u001b[30m11 | \u001b[30m0.0 | \u001b[30m1.084 | \u001b[30m-3.737 | \u001b[30m4.472 |\n",
- "| \u001b[30m12 | \u001b[30m0.0 | \u001b[30m2.649 | \u001b[30m-3.861 | \u001b[30m4.353 |\n",
- "| \u001b[30m13 | \u001b[30m0.0 | \u001b[30m2.442 | \u001b[30m-3.658 | \u001b[30m4.599 |\n",
- "| \u001b[30m14 | \u001b[30m-0.05801 | \u001b[30m1.935 | \u001b[30m-0.4758 | \u001b[30m-8.755 |\n",
- "| \u001b[30m15 | \u001b[30m0.0 | \u001b[30m2.337 | \u001b[30m7.973 | \u001b[30m-8.96 |\n",
- "| \u001b[30m16 | \u001b[30m0.07699 | \u001b[30m0.6926 | \u001b[30m5.59 | \u001b[30m6.854 |\n",
- "| \u001b[30m17 | \u001b[30m-0.02025 | \u001b[30m3.534 | \u001b[30m-8.943 | \u001b[30m1.987 |\n",
- "| \u001b[30m18 | \u001b[30m0.0 | \u001b[30m2.59 | \u001b[30m-7.339 | \u001b[30m5.941 |\n",
- "| \u001b[30m19 | \u001b[30m0.0929 | \u001b[30m2.237 | \u001b[30m-4.535 | \u001b[30m9.065 |\n",
- "| \u001b[30m20 | \u001b[30m0.1538 | \u001b[30m0.477 | \u001b[30m2.931 | \u001b[30m2.683 |\n",
- "| \u001b[30m21 | \u001b[30m0.0 | \u001b[30m0.9999 | \u001b[30m4.397 | \u001b[30m-3.971 |\n",
- "| \u001b[30m22 | \u001b[30m-0.01894 | \u001b[30m3.764 | \u001b[30m-7.043 | \u001b[30m-3.184 |\n",
- "| \u001b[30m23 | \u001b[30m0.03683 | \u001b[30m1.851 | \u001b[30m5.783 | \u001b[30m7.966 |\n",
- "| \u001b[30m24 | \u001b[30m-0.04359 | \u001b[30m1.615 | \u001b[30m-5.133 | \u001b[30m-6.556 |\n",
- "| \u001b[30m25 | \u001b[30m0.02617 | \u001b[30m3.863 | \u001b[30m0.1052 | \u001b[30m8.579 |\n",
- "| \u001b[30m26 | \u001b[30m-0.1071 | \u001b[30m0.8131 | \u001b[30m-0.7949 | \u001b[30m-9.292 |\n",
- "| \u001b[30m27 | \u001b[30m0.0 | \u001b[30m4.969 | \u001b[30m8.778 | \u001b[30m-8.467 |\n",
- "| \u001b[30m28 | \u001b[30m-0.1372 | \u001b[30m0.9475 | \u001b[30m-1.019 | \u001b[30m-7.018 |\n",
- "| \u001b[30m29 | \u001b[30m0.08078 | \u001b[30m1.917 | \u001b[30m-0.2606 | \u001b[30m6.272 |\n",
- "| \u001b[30m30 | \u001b[30m0.02003 | \u001b[30m4.278 | \u001b[30m3.8 | \u001b[30m8.398 |\n",
- "=============================================================\n"
- ]
- }
- ],
- "source": [
- "optimizer.set_gp_params(alpha=1e-3)\n",
- "optimizer.maximize()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## 3. Tuning the underlying Gaussian Process\n",
+ "## 2. Tuning the underlying Gaussian Process\n",
"\n",
"The bayesian optimization algorithm works by performing a gaussian process regression of the observed combination of parameters and their associated target values. The predicted parameter $\\rightarrow$ target hyper-surface (and its uncertainty) is then used to guide the next best point to probe."
]
@@ -304,14 +199,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "### 3.1 Passing parameter to the GP\n",
+ "### 2.1 Passing parameter to the GP\n",
"\n",
"Depending on the problem it could be beneficial to change the default parameters of the underlying GP. You can use the `optimizer.set_gp_params` method to do this:"
]
},
{
"cell_type": "code",
- "execution_count": 13,
+ "execution_count": 9,
"metadata": {},
"outputs": [
{
@@ -320,12 +215,12 @@
"text": [
"| iter | target | x | y |\n",
"-------------------------------------------------\n",
- "| \u001b[30m1 | \u001b[30m0.7862 | \u001b[30m-0.3319 | \u001b[30m1.322 |\n",
- "| \u001b[30m2 | \u001b[30m-18.19 | \u001b[30m1.957 | \u001b[30m-2.919 |\n",
- "| \u001b[30m3 | \u001b[30m-12.05 | \u001b[30m-1.969 | \u001b[30m-2.029 |\n",
- "| \u001b[30m4 | \u001b[30m-7.463 | \u001b[30m0.6032 | \u001b[30m-1.846 |\n",
- "| \u001b[30m5 | \u001b[30m-1.093 | \u001b[30m1.444 | \u001b[30m1.096 |\n",
- "| \u001b[35m6 | \u001b[35m0.8586 | \u001b[35m-0.2165 | \u001b[35m1.307 |\n",
+ "| \u001b[39m1 \u001b[39m | \u001b[39m0.7862 \u001b[39m | \u001b[39m-0.331911\u001b[39m | \u001b[39m1.3219469\u001b[39m |\n",
+ "| \u001b[39m2 \u001b[39m | \u001b[39m-18.34 \u001b[39m | \u001b[39m1.9021640\u001b[39m | \u001b[39m-2.965222\u001b[39m |\n",
+ "| \u001b[35m3 \u001b[39m | \u001b[35m0.8731 \u001b[39m | \u001b[35m-0.298167\u001b[39m | \u001b[35m1.1948749\u001b[39m |\n",
+ "| \u001b[39m4 \u001b[39m | \u001b[39m-6.497 \u001b[39m | \u001b[39m1.9876938\u001b[39m | \u001b[39m2.8830942\u001b[39m |\n",
+ "| \u001b[39m5 \u001b[39m | \u001b[39m-4.286 \u001b[39m | \u001b[39m-1.995643\u001b[39m | \u001b[39m-0.141769\u001b[39m |\n",
+ "| \u001b[39m6 \u001b[39m | \u001b[39m-6.781 \u001b[39m | \u001b[39m-1.953302\u001b[39m | \u001b[39m2.9913127\u001b[39m |\n",
"=================================================\n"
]
}
@@ -348,7 +243,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "### 3.2 Tuning the `alpha` parameter\n",
+ "### 2.2 Tuning the `alpha` parameter\n",
"\n",
"When dealing with functions with discrete parameters,or particularly erratic target space it might be beneficial to increase the value of the `alpha` parameter. This parameters controls how much noise the GP can handle, so increase it whenever you think that extra flexibility is needed."
]
@@ -358,7 +253,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "### 3.3 Changing kernels\n",
+ "### 2.3 Changing kernels\n",
"\n",
"By default this package uses the Matern 2.5 kernel. Depending on your use case you may find that tuning the GP kernel could be beneficial. You're on your own here since these are very specific solutions to very specific problems. You should start with the [scikit learn docs](https://scikit-learn.org/stable/modules/gaussian_process.html#kernels-for-gaussian-processes)."
]
@@ -376,7 +271,7 @@
},
{
"cell_type": "code",
- "execution_count": 14,
+ "execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
@@ -385,7 +280,7 @@
},
{
"cell_type": "code",
- "execution_count": 15,
+ "execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
@@ -399,7 +294,7 @@
},
{
"cell_type": "code",
- "execution_count": 16,
+ "execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
@@ -411,7 +306,7 @@
},
{
"cell_type": "code",
- "execution_count": 17,
+ "execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
@@ -433,7 +328,7 @@
},
{
"cell_type": "code",
- "execution_count": 18,
+ "execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
@@ -449,7 +344,7 @@
},
{
"cell_type": "code",
- "execution_count": 19,
+ "execution_count": 15,
"metadata": {},
"outputs": [
{
@@ -476,7 +371,7 @@
},
{
"cell_type": "code",
- "execution_count": 20,
+ "execution_count": 16,
"metadata": {},
"outputs": [
{
@@ -485,7 +380,7 @@
"['optimization:start', 'optimization:step', 'optimization:end']"
]
},
- "execution_count": 20,
+ "execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
@@ -497,7 +392,7 @@
],
"metadata": {
"kernelspec": {
- "display_name": "Python 3 (ipykernel)",
+ "display_name": "bayesian-optimization-t6LLJ9me-py3.10",
"language": "python",
"name": "python3"
},
@@ -511,7 +406,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.1.undefined"
+ "version": "3.10.13"
},
"nbdime-conflicts": {
"local_diff": [
diff --git a/master/_sources/basic-tour.ipynb.txt b/master/_sources/basic-tour.ipynb.txt
index 3cbcbd40..4ecd8329 100644
--- a/master/_sources/basic-tour.ipynb.txt
+++ b/master/_sources/basic-tour.ipynb.txt
@@ -252,7 +252,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Or as an iterable. Beware that the order has to be alphabetical. You can usee `optimizer.space.keys` for guidance"
+ "Or as an iterable. Beware that the order has to match the order of the initial `pbounds` dictionary. You can use `optimizer.space.keys` for guidance."
]
},
{
diff --git a/master/_sources/index.rst.txt b/master/_sources/index.rst.txt
index ac664a58..5c198c6f 100644
--- a/master/_sources/index.rst.txt
+++ b/master/_sources/index.rst.txt
@@ -11,6 +11,7 @@
Basic Tour
Advanced Tour
Constrained Bayesian Optimization
+ Parameter Types
Sequential Domain Reduction
Acquisition Functions
Exploration vs. Exploitation
@@ -26,6 +27,7 @@
reference/constraint
reference/domain_reduction
reference/target_space
+ reference/parameter
reference/exception
reference/other
@@ -121,11 +123,13 @@ section. We suggest that you:
to learn how to use the package's most important features.
- Take a look at the `advanced tour
notebook `__
- to learn how to make the package more flexible, how to deal with
- categorical parameters, how to use observers, and more.
+ to learn how to make the package more flexible or how to use observers.
- To learn more about acquisition functions, a central building block
of bayesian optimization, see the `acquisition functions
notebook `__
+- If you want to optimize over integer-valued or categorical
+ parameters, see the `parameter types
+ notebook `__.
- Check out this
`notebook `__
with a step by step visualization of how this method works.
@@ -195,6 +199,20 @@ For constrained optimization:
year={2014}
}
+For optimization over non-float parameters:
+
+::
+
+ @article{garrido2020dealing,
+ title={Dealing with categorical and integer-valued variables in bayesian optimization with gaussian processes},
+ author={Garrido-Merch{\'a}n, Eduardo C and Hern{\'a}ndez-Lobato, Daniel},
+ journal={Neurocomputing},
+ volume={380},
+ pages={20--35},
+ year={2020},
+ publisher={Elsevier}
+ }
+
.. |tests| image:: https://github.com/bayesian-optimization/BayesianOptimization/actions/workflows/run_tests.yml/badge.svg
.. |Codecov| image:: https://codecov.io/github/bayesian-optimization/BayesianOptimization/badge.svg?branch=master&service=github
:target: https://codecov.io/github/bayesian-optimization/BayesianOptimization?branch=master
diff --git a/master/_sources/parameter_types.ipynb.txt b/master/_sources/parameter_types.ipynb.txt
new file mode 100644
index 00000000..3d668300
--- /dev/null
+++ b/master/_sources/parameter_types.ipynb.txt
@@ -0,0 +1,756 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Optimizing over non-float Parameters\n",
+ "\n",
+ "Sometimes, you need to optimize a target that is not just a function of floating-point values, but relies on integer or categorical parameters. This notebook shows how such problems are handled by following an approach from [\"Dealing with categorical and integer-valued variables in Bayesian Optimization with Gaussian processes\" by Garrido-Merchán and Hernández-Lobato](https://arxiv.org/abs/1805.03463). One simple way of handling an integer-valued parameter is to run the optimization as normal, but then round to the nearest integer after a point has been suggested. This method is similar, except that the rounding is performed in the _kernel_. Why does this matter? It means that the kernel is aware that two parameter values which are distinct before this transformation, but map to the same point under it, are the same."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import warnings\n",
+ "import numpy as np\n",
+ "import matplotlib.pyplot as plt\n",
+ "from bayes_opt import BayesianOptimization\n",
+ "from bayes_opt import acquisition\n",
+ "\n",
+ "from sklearn.gaussian_process.kernels import Matern\n",
+ "\n",
+ "# suppress warnings about this being an experimental feature\n",
+ "warnings.filterwarnings(action=\"ignore\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 1. Simple integer-valued function\n",
+ "Let's look at a simple, one-dimensional, integer-valued target function and compare a typed optimizer and a continuous optimizer."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ "
+
What if our acquisition function does not depend solely on mean and std? In that case, we can intervene at a deeper level, and overwrite the ._get_acq function. The built-in version of ._get_acq is a higher-order function which returns a function accepting an array-like x and performing the following:
+
+
Evaluate the GP mean \(\mu\) and std \(\sigma\) at points x.
+
If applicable, evaluate the constraint fulfilment probability \(p_c\) at x.
+
Return -1*base_acq(mean,std) or -1*base_acq(mean,std)*p_c
+
An example of such an acquisition function is Thompson Sampling. If you consider a Gaussian Process as a prior over functions, Thompson Sampling works by sampling a function from this prior and then selecting the argmax as next point of interest. This is a somewhat noisy version of the greedy acquisition, which will encourage some exploration.
While we usually find the argmax of the acquisition function through a combination of random sampling and gradient-based optimization, we will skip the gradient-based optimization here, as it is quite expensive and would require us to fix the multivariate normal. We can do this by additionally overwriting .suggest, specifically by changing the default argument of n_l_bfgs_b to 0.
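As a rough illustration of the idea, and not the package's own implementation, the sketch below draws a single function sample from a Gaussian process conditioned on a few observations, evaluates it on randomly sampled candidate points only (no gradient-based refinement), and returns the candidate with the highest sampled value. The helper names `rbf_kernel` and `thompson_suggest`, the RBF kernel, and the toy data are assumptions made for this example.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two sets of 1D points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def thompson_suggest(x_obs, y_obs, candidates, rng, noise=1e-6):
    """Draw one GP sample at the candidates and return the argmax candidate."""
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_co = rbf_kernel(candidates, x_obs)
    k_cc = rbf_kernel(candidates, candidates)
    # Standard GP conditioning: mean and covariance at the candidate points.
    mean = k_co @ np.linalg.solve(k_oo, y_obs)
    cov = k_cc - k_co @ np.linalg.solve(k_oo, k_co.T)
    # A small jitter keeps the covariance numerically well behaved.
    cov = cov + 1e-8 * np.eye(len(candidates))
    sample = rng.multivariate_normal(mean, cov)
    return candidates[np.argmax(sample)]


rng = np.random.default_rng(0)
x_obs = np.array([-1.0, 0.0, 2.0])
y_obs = np.sin(x_obs)
candidates = rng.uniform(-3.0, 3.0, size=50)  # random sampling only
next_x = thompson_suggest(x_obs, y_obs, candidates, rng)
```

Because each call draws a fresh sample, repeated calls with different random states can suggest different points, which is where the exploratory behaviour comes from.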
-Next point to probe is: {'x': -0.331911981189704, 'y': 1.3219469606529486}
+Next point to probe is: {'x': np.float64(-0.331911981189704), 'y': np.float64(1.3219469606529486)}
You are now free to evaluate your function at the suggested point however/whenever you like.
There is no principled way of dealing with discrete parameters using this package.
-
Ok, now that we got that out of the way, how do you do it? You’re bound to be in a situation where some of your function’s parameters may only take on discrete values. Unfortunately, the nature of bayesian optimization with gaussian processes doesn’t allow for an easy/intuitive way of dealing with discrete parameters - but that doesn’t mean it is impossible. The example below showcases a simple, yet reasonably adequate, way to dealing with discrete parameters.
-
-
[9]:
-
-
-
def func_with_discrete_params(x, y, d):
-    # Simulate necessity of having d being discrete.
-    assert type(d) == int
-
-    return ((x + y + d) // (1 + d)) / (1 + (x + y) ** 2)
-
The bayesian optimization algorithm works by performing a gaussian process regression of the observed combination of parameters and their associated target values. The predicted parameter \(\rightarrow\) target hyper-surface (and its uncertainty) is then used to guide the next best point to probe.
Depending on the problem it could be beneficial to change the default parameters of the underlying GP. You can use the optimizer.set_gp_params method to do this:
When dealing with functions with discrete parameters, or a particularly erratic target space, it might be beneficial to increase the value of the alpha parameter. This parameter controls how much noise the GP can handle, so increase it whenever you think that extra flexibility is needed.
By default this package uses the Matern 2.5 kernel. Depending on your use case you may find that tuning the GP kernel could be beneficial. You’re on your own here since these are very specific solutions to very specific problems. You should start with the scikit learn docs.
Observers are objects that subscribe and listen to particular events fired by the BayesianOptimization object.
When an event gets fired a callback function is called with the event and the BayesianOptimization instance passed as parameters. The callback can be specified at the time of subscription. If none is given it will look for an update method from the observer.
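The subscription mechanism described above can be pictured with a small, self-contained toy. This is only a sketch of the pattern, not the package's code: `ToyObservable` and `CountingObserver` are hypothetical names, and the event strings are reused from the notebook output purely as labels.

```python
class ToyObservable:
    """Hypothetical stand-in for an object that fires named events."""

    def __init__(self, events):
        # One registry of subscriber -> callback per known event.
        self._subscribers = {event: {} for event in events}

    def subscribe(self, event, subscriber, callback=None):
        # If no callback is given, fall back to the subscriber's `update` method.
        if callback is None:
            callback = subscriber.update
        self._subscribers[event][subscriber] = callback

    def dispatch(self, event):
        # Each callback receives the event and the instance that fired it.
        for callback in self._subscribers[event].values():
            callback(event, self)


class CountingObserver:
    """Records every event it is notified about."""

    def __init__(self):
        self.seen = []

    def update(self, event, instance):
        self.seen.append(event)


source = ToyObservable(["optimization:start", "optimization:step", "optimization:end"])
observer = CountingObserver()
source.subscribe("optimization:step", observer)

source.dispatch("optimization:step")
source.dispatch("optimization:step")
source.dispatch("optimization:end")  # the observer is not subscribed to this event
```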
Take a look at the advanced tour
notebook
-to learn how to make the package more flexible, how to deal with
-categorical parameters, how to use observers, and more.
+to learn how to make the package more flexible or how to use observers.
To learn more about acquisition functions, a central building block
of bayesian optimization, see the acquisition functions
notebook
+
If you want to optimize over integer-valued or categorical
+parameters, see the parameter types
+notebook.
Check out this
notebook
with a step by step visualization of how this method works.
Sometimes, you need to optimize a target that is not just a function of floating-point values, but relies on integer or categorical parameters. This notebook shows how such problems are handled by following an approach from “Dealing with categorical and integer-valued variables in Bayesian Optimization with Gaussian processes” by Garrido-Merchán and Hernández-Lobato. One simple way of handling an integer-valued parameter is to run the optimization as
+normal, but then round to the nearest integer after a point has been suggested. This method is similar, except that the rounding is performed in the kernel. Why does this matter? It means that the kernel is aware that two parameter values which are distinct before this transformation, but map to the same point under it, are the same.
+
+
[1]:
+
+
+
import warnings
+import numpy as np
+import matplotlib.pyplot as plt
+from bayes_opt import BayesianOptimization
+from bayes_opt import acquisition
+
+from sklearn.gaussian_process.kernels import Matern
+
+# suppress warnings about this being an experimental feature
+warnings.filterwarnings(action="ignore")
+
Let’s look at a simple, one-dimensional, integer-valued target function and compare a typed optimizer and a continuous optimizer.
+
+
[2]:
+
+
+
def target_function_1d(x):
+    return np.sin(np.round(x)) - np.abs(np.round(x) / 5)
+
+c_pbounds = {'x': (-10, 10)}
+bo_cont = BayesianOptimization(target_function_1d, c_pbounds, verbose=0, random_state=1)
+
+# one way of constructing an integer-valued parameter is to add a third element to the tuple
+d_pbounds = {'x': (-10, 10, int)}
+bo_disc = BayesianOptimization(target_function_1d, d_pbounds, verbose=0, random_state=1)
+
+fig, axs = plt.subplots(2, 1, figsize=(10, 6), sharex=True, sharey=True)
+
+bo_cont.maximize(init_points=2, n_iter=10)
+bo_cont.acquisition_function._fit_gp(bo_cont._gp, bo_cont.space)
+
+y_mean, y_std = bo_cont._gp.predict(np.linspace(-10, 10, 1000).reshape(-1, 1), return_std=True)
+axs[0].set_title('Continuous')
+axs[0].plot(np.linspace(-10, 10, 1000), target_function_1d(np.linspace(-10, 10, 1000)), 'k--', label='True function')
+axs[0].plot(np.linspace(-10, 10, 1000), y_mean, label='Predicted mean')
+axs[0].fill_between(np.linspace(-10, 10, 1000), y_mean - y_std, y_mean + y_std, alpha=0.3, label='Predicted std')
+axs[0].plot(bo_cont.space.params, bo_cont.space.target, 'ro')
+
+bo_disc.maximize(init_points=2, n_iter=10)
+bo_disc.acquisition_function._fit_gp(bo_disc._gp, bo_disc.space)
+
+y_mean, y_std = bo_disc._gp.predict(np.linspace(-10, 10, 1000).reshape(-1, 1), return_std=True)
+axs[1].set_title('Discrete')
+axs[1].plot(np.linspace(-10, 10, 1000), target_function_1d(np.linspace(-10, 10, 1000)), 'k--', label='True function')
+axs[1].plot(np.linspace(-10, 10, 1000), y_mean, label='Predicted mean')
+axs[1].fill_between(np.linspace(-10, 10, 1000), y_mean - y_std, y_mean + y_std, alpha=0.3, label='Predicted std')
+axs[1].plot(bo_disc.space.params, bo_disc.space.target, 'ro')
+
+for ax in axs:
+    ax.grid(True)
+fig.tight_layout()
+
We can see that the discrete optimizer is aware that the function is discrete and does not try to predict values between the integers. The continuous optimizer does try to predict values between the integers, even though these are already determined by the values at the integers. We can also see that the discrete optimizer predicts a blocky mean and standard deviation, which is a result of the discrete nature of the function.
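To make the "rounding in the kernel" idea concrete, here is a minimal standalone sketch. It is not the package's kernel wrapper: `rbf` and `rounded_rbf` are hypothetical helpers, and a plain RBF kernel is assumed in place of the Matern kernel the package uses by default.

```python
import math


def rbf(x1, x2, length_scale=1.0):
    """Plain squared-exponential similarity between two scalars."""
    return math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def rounded_rbf(x1, x2, length_scale=1.0):
    """Same kernel, but both inputs are rounded to the nearest integer first."""
    return rbf(round(x1), round(x2), length_scale)


plain = rbf(0.6, 1.4)            # the two inputs are treated as distinct points
wrapped = rounded_rbf(0.6, 1.4)  # both inputs round to 1, so they are identical
```

With the rounding inside the kernel, every input in the same rounding bin is perfectly correlated with every other one, which is what produces the blocky mean and standard deviation.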
We can also handle categorical variables! This is done under the hood by constructing parameters in a one-hot-encoding representation, with a transformation in the kernel rounding to the nearest one-hot representation. If you want to use this, you can specify a collection of strings as options.
+
NB: Internally, the categorical variables lie within a range of [0,1], and the GP used for BO is isotropic by default, so you might want to ensure your other features are similarly scaled to a range of [0,1] or use an anisotropic GP.
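The one-hot representation and the "round to the nearest one-hot vector" step can be sketched in a few lines. The function names and the option list below are assumptions for illustration and are not taken from the package.

```python
def to_one_hot(option, options):
    """Encode a category as a list with a single 1.0 entry."""
    return [1.0 if option == candidate else 0.0 for candidate in options]


def nearest_one_hot(values):
    """Snap a float vector to the closest one-hot vector.

    In Euclidean distance, the closest one-hot vector is the one whose
    hot entry sits at the position of the largest value.
    """
    winner = max(range(len(values)), key=lambda i: values[i])
    return [1.0 if i == winner else 0.0 for i in range(len(values))]


def from_one_hot(values, options):
    """Decode a (possibly non-exact) one-hot vector back to its category."""
    return options[max(range(len(values)), key=lambda i: values[i])]


options = ["linear", "poly", "rbf"]
encoded = to_one_hot("poly", options)
snapped = nearest_one_hot([0.2, 0.7, 0.4])
decoded = from_one_hot(snapped, options)
```

Every entry of such a vector lies in [0,1], which is why the scaling of the remaining features matters for an isotropic GP.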
A typical use case for integer and categorical parameters is optimizing the hyperparameters of a machine learning model. Below you can find an example where the hyperparameters of an SVM are optimized.
Maybe you want to optimize over another form of parameters, which does not align with float, int or categorical. For this purpose, you can create your own, custom parameter. A simple example is a parameter that is discrete, but still admits a distance representation (like an integer) while not being uniformly spaced.
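A minimal standalone sketch of such a non-uniformly spaced discrete parameter follows. It deliberately does not subclass the package's `BayesParameter`: the class name, the method signatures, and the allowed values are assumptions for illustration, with method names chosen to mirror the interface described in this notebook.

```python
import random


class GridParameterSketch:
    """Discrete parameter restricted to a fixed, non-uniform set of values."""

    def __init__(self, name, values):
        self.name = name
        self.values = sorted(values)

    @property
    def is_continuous(self):
        return False

    @property
    def dim(self):
        return 1

    def random_sample(self, n_samples, rng):
        """Draw values uniformly from the allowed set."""
        return [rng.choice(self.values) for _ in range(n_samples)]

    def to_float(self, value):
        return float(value)

    def to_param(self, value):
        # Binning: snap any float to the closest allowed value.
        return min(self.values, key=lambda allowed: abs(allowed - value))

    def kernel_transform(self, value):
        # Float in, float out: the kernel only ever sees allowed values.
        return float(self.to_param(value))

    def to_string(self, value, str_len):
        return f"{value:<{str_len}}"[:str_len]


param = GridParameterSketch("x", [0.1, 1.0, 10.0, 100.0])
snapped = param.to_param(7.3)
samples = param.random_sample(5, random.Random(0))
```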
+
However, you can go even further and encode constraints and symmetries in your parameter. Let’s consider the problem of finding the triangle with maximal area given its sides \(a, b, c\), under the constraint that the perimeter is fixed, i.e. \(a + b + c=s\).
+
We will create a parameter that encodes such a triangle and, via its kernel transform, ensures that the sides sum to the required length \(s\). As you might expect, the solution to this problem is an equilateral triangle, i.e. \(a=b=c=s/3\).
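One simple way to enforce the perimeter constraint is to rescale the three sides so that they sum to \(s\); whether the notebook's parameter uses exactly this projection is an assumption here. The sketch below pairs that rescaling with Heron's formula for the area, using the hypothetical helper names `project_to_perimeter` and `triangle_area`.

```python
import math


def project_to_perimeter(sides, s):
    """Rescale three positive side lengths so that they sum to s."""
    total = sum(sides)
    return [side * s / total for side in sides]


def triangle_area(a, b, c):
    """Heron's formula; returns 0.0 if the sides cannot form a triangle."""
    p = (a + b + c) / 2
    squared = p * (p - a) * (p - b) * (p - c)
    return math.sqrt(squared) if squared > 0 else 0.0


s = 3.0
sides = project_to_perimeter([2.0, 3.0, 4.0], s)
scalene_area = triangle_area(*sides)
equilateral_area = triangle_area(s / 3, s / 3, s / 3)
```

For the same perimeter, the equilateral triangle has the larger area, in line with the expected solution.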
+
To define the parameter, we need to subclass BayesParameter and define a few important functions/properties.
+
+
is_continuous is a property which denotes whether a parameter is continuous. When optimizing the acquisition function, non-continuous parameters will not be optimized using gradient-based methods, but only via random sampling.
+
random_sample is a function that samples randomly from the space of the parameter.
+
to_float transforms the canonical representation of a parameter into float values for the target space to store. There is a one-to-one correspondence between valid float representations produced by this function and canonical representations of the parameter. This function is most important when working with parameters that use a non-numeric canonical representation, such as categorical parameters.
+
to_param performs the inverse of to_float: Given a float-based representation, it creates a canonical representation. This function should perform binning whenever appropriate, e.g. in the case of the IntParameter, this function would round any float values supplied to it.
+
kernel_transform is the most important function of the Parameter and defines how to represent a value in the kernel space. In contrast to to_float, this function expects both the input and the output to be float representations of the value.
+
to_string produces a stringified version of the parameter, which allows users to define custom pretty-print rules for the ScreenLogger to use.
+
dim is a property which defines the dimensionality of the parameter. In most cases, this will be 1, but e.g. for categorical parameters it is equivalent to the cardinality of the category space.
This seems to work decently well, but we can improve it significantly if we consider the symmetries inherent in the problem: This problem is permutation invariant, i.e. we do not care which side specifically is denoted as \(a\), \(b\) or \(c\). Instead, we can, without loss of generality, decide that the shortest side will always be denoted as \(a\), and the longest always as \(c\). If we enhance our kernel transform with this symmetry, the performance improves significantly.
+This can be easily done by sub-classing the previously created triangle parameter.
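The effect of adding the permutation symmetry can be sketched independently of the package: after the sides are rescaled to the required perimeter, they are sorted so that the shortest side always comes first. The function name `symmetric_transform` and the rescaling step are assumptions for illustration.

```python
def symmetric_transform(sides, s):
    """Rescale the sides to perimeter s, then sort so that a <= b <= c."""
    total = sum(sides)
    return sorted(side * s / total for side in sides)


# Two orderings of the same triangle map to one and the same representation.
first = symmetric_transform([4.0, 2.0, 3.0], 3.0)
second = symmetric_transform([3.0, 4.0, 2.0], 3.0)
```

Since all permutations of a triangle collapse onto a single point in kernel space, an observation of one ordering informs the model about all the others.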
This class takes the function to optimize as well as the parameters bounds
in order to find which values for the parameters yield the maximum value
@@ -1250,6 +1294,8 @@
random_state : np.random.RandomState or int or None, default=None¶
A sequential domain reduction transformer based on the work by Stander, N. and Craig, K:
“On the robustness of a simple domain reduction scheme for simulation-based optimization”
Number of samples to draw. If 0, a single sample is drawn,
+and a 1D array is returned. If n_samples > 0, an array of
+shape (n_samples, dim) is returned.
+
+
random_state : np.random.RandomState | int | None¶