WIP: Introduce TrialRunner Abstraction #720

Draft
wants to merge 217 commits into
base: main
from

Conversation

bpkroth
Contributor

@bpkroth bpkroth commented Mar 19, 2024

This is another step in adding support for parallel trial execution #380.

Here we separate the running of an individual trial out into a single class: TrialRunner.

Multiple TrialRunners are instantiated at CLI invocation with the --num-trial-runners argument.
Each TrialRunner is associated with a single copy of the root Environment and is made unique by a trial_runner_id value that is included in that Environment's global_config.
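
As a rough illustration of the description above (not the code in this PR), here is a minimal sketch of creating several runners, each with its own copy of the root Environment and a unique trial_runner_id in that copy's global_config. The `Environment` and `TrialRunner` classes and the `make_trial_runners` helper are simplified, hypothetical stand-ins, and 0-based IDs are an assumption.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Environment:
    """Hypothetical stand-in for the root Environment (not the real class)."""

    name: str
    global_config: Dict[str, Any] = field(default_factory=dict)


@dataclass
class TrialRunner:
    """Hypothetical stand-in: owns one Environment copy and one ID."""

    trial_runner_id: int
    environment: Environment


def make_trial_runners(root_env: Environment, num_trial_runners: int) -> List[TrialRunner]:
    """Create one runner per ID, each with an independent Environment copy."""
    runners = []
    for trial_runner_id in range(num_trial_runners):  # assumption: 0-based IDs
        env = copy.deepcopy(root_env)  # each runner gets its own copy
        env.global_config["trial_runner_id"] = trial_runner_id
        runners.append(TrialRunner(trial_runner_id, env))
    return runners


# Example: what --num-trial-runners 3 might conceptually produce.
root = Environment("root", {"experiment_id": "my_experiment_id"})
runners = make_trial_runners(root, num_trial_runners=3)
```

The deep copy keeps each runner's configuration isolated, so writing the ID into one copy does not leak into the root Environment or into other runners.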

TODO:

  • tests

In future PRs we will add:

  • New Scheduler implementations to run TrialRunners in parallel.
  • Async polling of status results in each TrialRunner independently.

motus added 30 commits February 21, 2024 15:34
assert {
    trial_runner.environment.const_args["trial_runner_id"]
    for trial_runner in launcher.trial_runners
} == set(range(0, len(launcher.trial_runners)))
Contributor Author

FIXME: I think we use 0-based indexing in some places and 1-based indexing in others. We should be consistent about that.
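
One possible way to make the indexing convention explicit is to define the first ID in a single place and derive every check from it. The sketch below is only an illustration: `FIRST_TRIAL_RUNNER_ID`, `expected_trial_runner_ids`, and `has_consistent_trial_runner_ids` are hypothetical names, and 0-based IDs are assumed because the assertion in this test uses range(0, ...).

```python
from typing import Iterable, Set

# Assumption: IDs start at 0, matching range(0, ...) in the test assertion.
FIRST_TRIAL_RUNNER_ID = 0


def expected_trial_runner_ids(count: int, first_id: int = FIRST_TRIAL_RUNNER_ID) -> Set[int]:
    """Return the set of IDs that `count` runners are expected to hold."""
    return set(range(first_id, first_id + count))


def has_consistent_trial_runner_ids(
    ids: Iterable[int],
    first_id: int = FIRST_TRIAL_RUNNER_ID,
) -> bool:
    """Check that the IDs are unique and contiguous, starting at first_id."""
    id_list = list(ids)
    if len(id_list) != len(set(id_list)):
        return False  # duplicate IDs
    return set(id_list) == expected_trial_runner_ids(len(id_list), first_id)
```

With a helper like this, switching between 0-based and 1-based IDs would mean changing one constant rather than hunting for hard-coded ranges.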

return config

# NOTE: This may no longer be necessary with the new schema.
def add_new_config_data(
Contributor Author

These were refactored originally to avoid changing the schema. We don't need that anymore, but it still might be handy to keep around.

@motus, thoughts on reverting vs. just leaving this part of it? It also affects the _save_params code movement.

@@ -446,6 +455,25 @@ def tunables(self) -> TunableGroups:
"""
return self._tunables

@abstractmethod
def assign_trial_runner(self, trial_runner_id: int) -> int:
Contributor Author

Move this up to be near the property?

@@ -129,7 +129,7 @@
>>> # Access ExperimentData by experiment id.
>>> experiment_data = storage.experiments["my_experiment_id"]
>>> experiment_data.trials
-{1: Trial :: my_experiment_id:1 cid:1 SUCCEEDED}
+{1: Trial :: my_experiment_id:1 cid:1 rid:None SUCCEEDED}
Contributor Author

No runner assigned here. May want to note that in the docstring.

Labels
WIP Work in progress - do not merge yet

2 participants