Implement Metadata for SMAC enabling Multi-Fidelity #771

Open
wants to merge 28 commits into base: main
28 commits
548af15 Implement metadata for multifidelity in SMAC (jsfreischuetz, Jul 2, 2024)
36ac67a Merge branch 'main' into multi-fidelity (jsfreischuetz, Jul 2, 2024)
4cc133b Merge branch 'main' into multi-fidelity (bpkroth, Jul 2, 2024)
5ab03c8 Merge branch 'main' into multi-fidelity (bpkroth, Jul 3, 2024)
bfd2a42 Update mlos_core/mlos_core/optimizers/README (jsfreischuetz, Jul 8, 2024)
16208f4 Update mlos_core/mlos_core/optimizers/README (jsfreischuetz, Jul 8, 2024)
938f8f0 Update mlos_core/mlos_core/optimizers/README (jsfreischuetz, Jul 8, 2024)
81d6d56 Update mlos_core/mlos_core/optimizers/README (jsfreischuetz, Jul 8, 2024)
bf2f3cc Update mlos_core/mlos_core/optimizers/README (jsfreischuetz, Jul 8, 2024)
1686c7c some comments (jsfreischuetz, Jul 8, 2024)
d263613 more comments for README (jsfreischuetz, Jul 8, 2024)
6766b8d Update mlos_core/mlos_core/optimizers/bayesian_optimizers/smac_optimi… (jsfreischuetz, Jul 8, 2024)
bae1763 Update mlos_core/mlos_core/optimizers/bayesian_optimizers/smac_optimi… (jsfreischuetz, Jul 8, 2024)
53af62b mergE (jsfreischuetz, Jul 8, 2024)
81e8bb0 Merge branch 'multi-fidelity' of github.com:jsfreischuetz/MLOS into m… (jsfreischuetz, Jul 8, 2024)
dcff9cc Update mlos_core/mlos_core/optimizers/bayesian_optimizers/smac_optimi… (jsfreischuetz, Jul 8, 2024)
50ef16c Update mlos_core/mlos_core/optimizers/bayesian_optimizers/smac_optimi… (jsfreischuetz, Jul 8, 2024)
c32bd67 Update mlos_core/mlos_core/optimizers/utils.py (jsfreischuetz, Jul 8, 2024)
2b15694 Update mlos_core/mlos_core/tests/optimizers/optimizer_metadata_test.py (jsfreischuetz, Jul 8, 2024)
574b8cc Update mlos_core/mlos_core/optimizers/utils.py (jsfreischuetz, Jul 8, 2024)
3d4c055 Update mlos_core/mlos_core/optimizers/utils.py (jsfreischuetz, Jul 8, 2024)
41ee533 Update mlos_core/mlos_core/optimizers/bayesian_optimizers/smac_optimi… (jsfreischuetz, Jul 8, 2024)
cfa936a Update mlos_core/mlos_core/optimizers/bayesian_optimizers/smac_optimi… (jsfreischuetz, Jul 8, 2024)
abd3eb6 Update mlos_core/mlos_core/optimizers/bayesian_optimizers/smac_optimi… (jsfreischuetz, Jul 8, 2024)
e0ac571 comment (jsfreischuetz, Jul 8, 2024)
c1e0845 Merge branch 'multi-fidelity' of github.com:jsfreischuetz/MLOS into m… (jsfreischuetz, Jul 8, 2024)
9234599 comments (jsfreischuetz, Jul 8, 2024)
054fce3 Merge branch 'main' into multi-fidelity (jsfreischuetz, Jul 10, 2024)
2 changes: 2 additions & 0 deletions .cspell.json
@@ -43,6 +43,7 @@
"linalg",
"llamatune",
"matplotlib",
"metadatum",
"mlos",
"mloscore",
"mwait",
@@ -72,6 +73,7 @@
"sklearn",
"skopt",
"smac",
"Sobol",
"sqlalchemy",
"srcpaths",
"subcmd",
2 changes: 1 addition & 1 deletion mlos_bench/mlos_bench/optimizers/mlos_core_optimizer.py
@@ -199,7 +199,7 @@ def register(self, tunables: TunableGroups, status: Status,
return registered_score

def get_best_observation(self) -> Union[Tuple[Dict[str, float], TunableGroups], Tuple[None, None]]:
(df_config, df_score, _df_context) = self._opt.get_best_observations()
(df_config, df_score, _df_context, _df_metadata) = self._opt.get_best_observations()
if len(df_config) == 0:
return (None, None)
params = configspace_data_to_tunable_values(df_config.iloc[0].to_dict())
29 changes: 29 additions & 0 deletions mlos_core/mlos_core/optimizers/README.md
@@ -0,0 +1,29 @@
# Optimizers

This directory contains wrappers for different optimizers to integrate into MLOS.
They are implemented through child classes of the `BaseOptimizer` class defined in `optimizer.py`.

The main goal of these optimizers is to `suggest` configurations, possibly based on prior trial data, to find an optimum based on some objective(s).
This process is driven through the `register` and `suggest` interfaces.

The following definitions are useful for understanding the implementation:

- `configuration`: a vector representation of a configuration of a system to be evaluated.
- `score`: the objective(s) associated with a configuration.
- `context`: additional (static) information about the evaluation used to extend the internal model used for suggesting samples.
For instance, a descriptor of the VM size (vCore count and GB of RAM) and some descriptor of the workload.
The intent is to allow either sharing or indexing of trial info between "similar" experiments in order to help make the optimization process more efficient for new scenarios.
> Note: This is not yet implemented.
- `metadata`: additional information about the evaluation, such as the runtime budget used during evaluation.
This data is typically specific to the given optimizer backend and may be returned during a `suggest` call and expected to be provided again during the subsequent `register` call.

- `register`: this is a function that takes a `configuration`, `score`, and, optionally, `metadata` and `context` about the evaluation to update the model for future evaluations.
- `suggest`: this function returns a new configuration for evaluation.

Some optimizers will return additional metadata for evaluation, which should be passed back during the `register` phase.
- `get_observations`: returns all observations reported to the optimizer as a 4-tuple of DataFrames `(config, score, context, metadata)`.
- `get_best_observations`: returns the best observations as a 4-tuple of `(config, score, context, metadata)` DataFrames.
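
The `suggest`/`register` round trip described above can be sketched with a toy stand-in. This is only an illustration under stated assumptions: `ToyOptimizer`, its `budgets` argument, and the plain-dict configs and metadata are hypothetical and are not the `mlos_core` API, which works with DataFrames.

```python
from typing import Dict, List, Optional, Tuple

Config = Dict[str, float]
Metadata = Dict[str, float]


class ToyOptimizer:
    """Hypothetical stand-in for an optimizer wrapper (not the mlos_core API)."""

    def __init__(self, budgets: List[float]) -> None:
        self._budgets = list(budgets)
        self._observations: List[Tuple[Config, float, Optional[Metadata]]] = []

    def suggest(self) -> Tuple[Config, Metadata]:
        # Return a configuration plus backend-specific metadata (here: a budget).
        step = len(self._observations)
        config = {"x": float(step % 5)}
        metadata = {"budget": self._budgets[step % len(self._budgets)]}
        return config, metadata

    def register(self, config: Config, score: float,
                 metadata: Optional[Metadata] = None) -> None:
        # The metadata handed out by suggest() is passed back unchanged.
        self._observations.append((config, score, metadata))

    def get_best_observation(self) -> Tuple[Config, float, Optional[Metadata]]:
        # Lower scores are better in this sketch.
        return min(self._observations, key=lambda obs: obs[1])


optimizer = ToyOptimizer(budgets=[1.0, 3.0, 9.0])
for _ in range(6):
    config, metadata = optimizer.suggest()
    score = (config["x"] - 2.0) ** 2  # toy objective, minimized at x == 2
    optimizer.register(config, score, metadata)

best_config, best_score, best_metadata = optimizer.get_best_observation()
```

The point of the sketch is only the data flow: whatever `suggest` returns as metadata travels with the configuration and comes back through `register`, so it can later be reported alongside the best observation.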
261 changes: 231 additions & 30 deletions mlos_core/mlos_core/optimizers/bayesian_optimizers/smac_optimizer.py

Large diffs are not rendered by default.

59 changes: 41 additions & 18 deletions mlos_core/mlos_core/optimizers/optimizer.py
@@ -56,9 +56,9 @@ def __init__(self, *,
raise ValueError("Number of weights must match the number of optimization targets")

self._space_adapter: Optional[BaseSpaceAdapter] = space_adapter
self._observations: List[Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]] = []
self._observations: List[Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]] = []
self._has_context: Optional[bool] = None
self._pending_observations: List[Tuple[pd.DataFrame, Optional[pd.DataFrame]]] = []
self._pending_observations: List[Tuple[pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]] = []

def __repr__(self) -> str:
return f"{self.__class__.__name__}(space_adapter={self.space_adapter})"
@@ -98,7 +98,7 @@ def register(self, *, configs: pd.DataFrame, scores: pd.DataFrame,
"Mismatched number of configs and context."
assert configs.shape[1] == len(self.parameter_space.values()), \
"Mismatched configuration shape."
self._observations.append((configs, scores, context))
self._observations.append((configs, scores, context, metadata))
self._has_context = context is not None

if self._space_adapter:
@@ -197,26 +197,48 @@ def register_pending(self, *, configs: pd.DataFrame,
"""
pass # pylint: disable=unnecessary-pass # pragma: no cover

def get_observations(self) -> Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]:
def _get_observations(self, observations:
List[Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]]
) -> Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]:
"""
Returns the observations as a triplet of DataFrames (config, score, context).
Returns the observations as a quad of DataFrames (config, score, context, metadata)
for a specific set of observations.

Parameters
----------
observations: List[Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]]
Observations to run the transformation on

Returns
-------
observations : Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]
A triplet of (config, score, context) DataFrames of observations.
observations: Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]
A quad of (config, score, context, metadata) DataFrames of observations.
"""
if len(self._observations) == 0:
if len(observations) == 0:
raise ValueError("No observations registered yet.")
configs = pd.concat([config for config, _, _ in self._observations]).reset_index(drop=True)
scores = pd.concat([score for _, score, _ in self._observations]).reset_index(drop=True)
configs = pd.concat([config for config, _, _, _ in observations]).reset_index(drop=True)
scores = pd.concat([score for _, score, _, _ in observations]).reset_index(drop=True)
contexts = pd.concat([pd.DataFrame() if context is None else context
for _, _, context in self._observations]).reset_index(drop=True)
return (configs, scores, contexts if len(contexts.columns) > 0 else None)
for _, _, context, _ in observations]).reset_index(drop=True)
metadatum = pd.concat([pd.DataFrame() if metadata is None else metadata
for _, _, _, metadata in observations]).reset_index(drop=True)
return (configs, scores, contexts if len(contexts.columns) > 0 else None, metadatum if len(metadatum.columns) > 0 else None)

def get_observations(self) -> Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]:
Contributor:

These Tuples are getting a little large and hard to read (recall a previous version of this PR where the order of them was mistakenly swapped at one point).

Think we discussed creating a NamedTuple or small DataClass for them instead so that they can be accessed by name in order to make it more readable.

Contributor Author:

If you want I can do this in this PR, or another follow up PR

Contributor (@bpkroth, Jul 10, 2024):

I think a predecessor PR would be better. Much like we did with adding the metadata args and named args first.

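
For illustration only, the NamedTuple idea raised in this thread might look roughly like the sketch below. The class name `Observations` and its field names are hypothetical, and plain lists stand in for the `pd.DataFrame` values so the snippet stays self-contained; it is not what the PR implements.

```python
from typing import Any, List, NamedTuple, Optional


class Observations(NamedTuple):
    """Hypothetical named container replacing the positional 4-tuple."""

    configs: List[Any]
    scores: List[Any]
    contexts: Optional[List[Any]] = None
    metadata: Optional[List[Any]] = None


obs = Observations(
    configs=[{"x": 1, "y": 2.5}],
    scores=[3.5],
    metadata=[{"budget": 9.0}],
)

# Fields are read by name, so their order can no longer be swapped by accident.
budget = obs.metadata[0]["budget"] if obs.metadata is not None else None

# Existing call sites that unpack positionally keep working.
configs, scores, contexts, metadata = obs
```

A NamedTuple keeps tuple unpacking backward compatible, which is one reason it may be an easier predecessor change than a DataClass.
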

"""
Returns the observations as a quad of DataFrames (config, score, context, metadata).

Returns
-------
observations: Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]
A quad of (config, score, context, metadata) DataFrames of observations.
"""
return self._get_observations(self._observations)

def get_best_observations(self, *, n_max: int = 1) -> Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]:
def get_best_observations(self, *, n_max: int = 1) -> Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame],
Optional[pd.DataFrame]]:
"""
Get the N best observations so far as a triplet of DataFrames (config, score, context).
Get the N best observations so far as a quad of DataFrames (config, score, context, metadata).
Default is N=1. The columns are ordered in ASCENDING order of the optimization targets.
The function uses `pandas.DataFrame.nsmallest(..., keep="first")` method under the hood.

@@ -227,15 +249,16 @@ def get_best_observations(self, *, n_max: int = 1) -> Tuple[pd.DataFrame, pd.Dat

Returns
-------
observations : Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]
A triplet of best (config, score, context) DataFrames of best observations.
observations : Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]
A quad of best (config, score, context, metadata) DataFrames of best observations.
"""
if len(self._observations) == 0:
raise ValueError("No observations registered yet.")
(configs, scores, contexts) = self.get_observations()
(configs, scores, contexts, metadatum) = self.get_observations()
idx = scores.nsmallest(n_max, columns=self._optimization_targets, keep="first").index
return (configs.loc[idx], scores.loc[idx],
None if contexts is None else contexts.loc[idx])
None if contexts is None else contexts.loc[idx],
None if metadatum is None else metadatum.loc[idx])

def cleanup(self) -> None:
"""
58 changes: 58 additions & 0 deletions mlos_core/mlos_core/optimizers/utils.py
@@ -0,0 +1,58 @@
#
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
#
"""
Contains utils used for implementing the mlos_core optimizers
"""
import inspect
from typing import Any, Callable, Dict, List, Optional
import pandas as pd


def to_metadata(metadata: Optional[pd.DataFrame]) -> Optional[List[pd.Series]]:
"""
Converts a metadata dataframe to a list of metadata series objects, one per row.

Parameters
----------
metadata : Optional[pd.DataFrame]
The dataframe to convert to metadata

Returns
-------
Optional[List[pd.Series]]
The list of metadata series objects
"""
if metadata is None:
return None
return [idx_series[1] for idx_series in metadata.iterrows()]


def filter_kwargs(function: Callable, **kwargs: Any) -> Dict[str, Any]:
"""
Filters arguments provided in the kwargs dictionary to be restricted to the arguments legal for
the called function.

Parameters
----------
function : Callable
function over which we filter kwargs for.
kwargs:
kwargs that we are filtering for the target function

Returns
-------
dict
kwargs with the non-legal argument filtered out
"""
sig = inspect.signature(function)
filter_keys = [
param.name
for param in sig.parameters.values()
if param.kind == param.POSITIONAL_OR_KEYWORD
]
filtered_dict = {
filter_key: kwargs[filter_key] for filter_key in filter_keys & kwargs.keys()
}
return filtered_dict
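
A small usage sketch may help show what this filtering does in practice. The helper below is a self-contained re-statement of the same idea (written with a set comprehension rather than copied verbatim), and `run_trial`, `run_trial_kwonly`, and the `options` dict are hypothetical names used only for the demonstration.

```python
import inspect
from typing import Any, Callable, Dict


def filter_kwargs(function: Callable[..., Any], **kwargs: Any) -> Dict[str, Any]:
    """Keep only the kwargs that name positional-or-keyword parameters of function."""
    sig = inspect.signature(function)
    allowed = {
        param.name
        for param in sig.parameters.values()
        if param.kind == param.POSITIONAL_OR_KEYWORD
    }
    return {key: value for key, value in kwargs.items() if key in allowed}


def run_trial(budget: float, seed: int = 0) -> str:
    # Hypothetical target function.
    return f"budget={budget}, seed={seed}"


def run_trial_kwonly(budget: float, *, seed: int = 0) -> str:
    # Same as above, but seed is keyword-only.
    return f"budget={budget}, seed={seed}"


options = {"budget": 3.0, "seed": 42, "verbose": True}

accepted = filter_kwargs(run_trial, **options)
result = run_trial(**accepted)

# Keyword-only parameters do not pass the POSITIONAL_OR_KEYWORD check,
# so "seed" is dropped here even though the function would accept it.
accepted_kwonly = filter_kwargs(run_trial_kwonly, **options)
```

Under this filter, unknown keys such as `verbose` are silently discarded, and so are keyword-only parameters, which may or may not be the intended behavior for a given target function.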
119 changes: 119 additions & 0 deletions mlos_core/mlos_core/tests/optimizers/optimizer_metadata_test.py
@@ -0,0 +1,119 @@
#
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
#
"""
Tests for Optimizers using Metadata.
"""

from typing import Callable

import logging
import pytest

import pandas as pd
import ConfigSpace as CS

from smac import MultiFidelityFacade as MFFacade
from smac.intensifier.successive_halving import SuccessiveHalving

from mlos_core.optimizers import (
OptimizerType, OptimizerFactory, BaseOptimizer)
from mlos_core.tests import SEED

_LOG = logging.getLogger(__name__)
_LOG.setLevel(logging.DEBUG)


def smac_verify_best(metadata: pd.DataFrame, best: bool = False) -> bool:
"""
Function to verify if the metadata used by SMAC is in a legal state

Parameters
----------
metadata : pd.DataFrame
metadata returned by SMAC

best: bool
If we are testing just the best contexts or not

Returns
-------
bool
if the metadata that is returned is valid
"""

max_budget = metadata["budget"].max()
assert isinstance(max_budget, float)
assert max_budget == 9

if not best:
min_budget = metadata["budget"].min()
assert isinstance(min_budget, float)
assert min_budget == 1

return True


@pytest.mark.parametrize(('optimizer_type', 'verify', 'kwargs'), [
# Enumerate all supported Optimizers
*[(member, verify, kwargs)
for member, verify, kwargs in [(
OptimizerType.SMAC,
smac_verify_best,
{
"seed": SEED,
"facade": MFFacade,
"intensifier": SuccessiveHalving,
"min_budget": 1,
"max_budget": 9
}
)]],
])
def test_optimizer_metadata(optimizer_type: OptimizerType, verify: Callable[[pd.DataFrame, bool], bool], kwargs: dict) -> None:
"""
Toy problem to test if metadata is properly being handled for each supporting optimizer
"""
max_iterations = 100

def objective(point: pd.DataFrame) -> pd.DataFrame:
# mix of hyperparameters, optimal is to select the highest possible
return pd.DataFrame({"score": point["x"] + point["y"]})

input_space = CS.ConfigurationSpace(seed=SEED)
# add a mix of numeric datatypes
input_space.add_hyperparameter(CS.UniformIntegerHyperparameter(name='x', lower=0, upper=5))
input_space.add_hyperparameter(CS.UniformFloatHyperparameter(name='y', lower=0.0, upper=5.0))

optimizer: BaseOptimizer = OptimizerFactory.create(
parameter_space=input_space,
optimization_targets=['score'],
optimizer_type=optimizer_type,
optimizer_kwargs=kwargs,
)

with pytest.raises(ValueError, match="No observations"):
optimizer.get_best_observations()

with pytest.raises(ValueError, match="No observations"):
optimizer.get_observations()

for _ in range(max_iterations):
config, metadata = optimizer.suggest()
assert isinstance(metadata, pd.DataFrame)

optimizer.register(configs=config, scores=objective(config), metadata=metadata)

(all_configs, all_scores, all_contexts, all_metadata) = optimizer.get_observations()
assert isinstance(all_configs, pd.DataFrame)
assert isinstance(all_scores, pd.DataFrame)
assert all_contexts is None
assert isinstance(all_metadata, pd.DataFrame)
assert verify(all_metadata, False)

(best_configs, best_scores, best_contexts, best_metadata) = optimizer.get_best_observations()
assert isinstance(best_configs, pd.DataFrame)
assert isinstance(best_scores, pd.DataFrame)
assert best_contexts is None
assert isinstance(best_metadata, pd.DataFrame)
assert verify(best_metadata, True)
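
As a rough aid for reading the budget assertions in this test, the sketch below shows how a successive-halving style schedule could span `min_budget=1` to `max_budget=9`. It assumes a reduction factor `eta` of 3, which is an assumption here rather than something stated in this PR, and `budget_ladder` is a hypothetical helper, not a SMAC function.

```python
from typing import List


def budget_ladder(min_budget: float, max_budget: float, eta: float = 3.0) -> List[float]:
    """Hypothetical helper: geometric budgets from max_budget down to min_budget."""
    budgets: List[float] = []
    budget = float(max_budget)
    while budget >= min_budget:
        budgets.append(budget)
        budget /= eta  # each lower rung uses 1/eta of the budget above it
    return sorted(budgets)


ladder = budget_ladder(min_budget=1, max_budget=9)
```

Under that assumption the rungs are 1, 3, and 9, which is consistent with the test expecting a minimum budget of 1 across all observations and a maximum budget of 9.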
@@ -102,19 +102,21 @@ def objective(point: pd.DataFrame) -> pd.DataFrame:
assert set(observation.columns) == {'main_score', 'other_score'}
optimizer.register(configs=suggestion, scores=observation)

(best_config, best_score, best_context) = optimizer.get_best_observations()
(best_config, best_score, best_context, best_metadata) = optimizer.get_best_observations()
assert isinstance(best_config, pd.DataFrame)
assert isinstance(best_score, pd.DataFrame)
assert best_context is None
assert best_metadata is None
assert set(best_config.columns) == {'x', 'y'}
assert set(best_score.columns) == {'main_score', 'other_score'}
assert best_config.shape == (1, 2)
assert best_score.shape == (1, 2)

(all_configs, all_scores, all_contexts) = optimizer.get_observations()
(all_configs, all_scores, all_contexts, all_metadata) = optimizer.get_observations()
assert isinstance(all_configs, pd.DataFrame)
assert isinstance(all_scores, pd.DataFrame)
assert all_contexts is None
assert all_metadata is None
assert set(all_configs.columns) == {'x', 'y'}
assert set(all_scores.columns) == {'main_score', 'other_score'}
assert all_configs.shape == (max_iterations, 2)