Enable simple creation and execution of experiments
Deliverables
An API that gives control over the full lifecycle of an experiment and supports a variety of hardware choices.
A client library for interacting with the API.
Why Are We Doing It?
We want to allow anyone to define a Thunderdome experiment and run it themselves. They shouldn't need to be experts in AWS and Terraform. We want them to be able to define the parameters of an experiment, execute it and then get access to the results in Grafana automatically.
Notes
The work to integrate Kubo release candidate experiments requires a limited API for creating one-shot experiments, but this is a fuller API that allows management of the whole lifecycle of an experiment.
A user should be able to say “test these N Docker images, in an experiment called ‘foo’” and start seeing results in Grafana in under 5 minutes, with a single CLI call or an edit to a single file. Limit to AWS first.
We accept a constrained selection of resources at first (probably Fargate, so 4 cores and 30 GB RAM), but should eventually be able to offer a wide range of machine sizes.
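To make the "single file" idea in the notes concrete, here is a minimal Go sketch of what an experiment definition could look like. Everything in it is an assumption for illustration: the `Experiment` type, its field names, the `ApplyDefaults` and `Validate` helpers, and the image names are hypothetical and not taken from the Thunderdome codebase. The only values drawn from the notes are the experiment name "foo" and the initial 4 core / 30 GB size.

```go
package main

import (
	"errors"
	"fmt"
)

// Experiment is a hypothetical experiment definition.
// Field names are illustrative only, not the real Thunderdome schema.
type Experiment struct {
	Name     string   // experiment name, e.g. "foo"
	Images   []string // Docker images to test against each other
	CPU      int      // cores per target; 0 means "use the default"
	MemoryGB int      // RAM per target in GB; 0 means "use the default"
}

// ApplyDefaults fills in the constrained initial machine size
// mentioned in the notes (4 cores, 30 GB RAM) when none is given.
func (e *Experiment) ApplyDefaults() {
	if e.CPU == 0 {
		e.CPU = 4
	}
	if e.MemoryGB == 0 {
		e.MemoryGB = 30
	}
}

// Validate checks that the definition names an experiment and
// lists at least one non-empty image.
func (e Experiment) Validate() error {
	if e.Name == "" {
		return errors.New("experiment name is required")
	}
	if len(e.Images) == 0 {
		return errors.New("at least one image is required")
	}
	for i, img := range e.Images {
		if img == "" {
			return fmt.Errorf("image %d is empty", i)
		}
	}
	return nil
}

func main() {
	// Placeholder image names; substitute real image references.
	exp := Experiment{
		Name:   "foo",
		Images: []string{"image-a:latest", "image-b:latest"},
	}
	exp.ApplyDefaults()
	if err := exp.Validate(); err != nil {
		fmt.Println("invalid experiment:", err)
		return
	}
	fmt.Printf("experiment %q: %d images, %d cores, %d GB RAM\n",
		exp.Name, len(exp.Images), exp.CPU, exp.MemoryGB)
}
```

Keeping defaults and validation next to the definition means a user only has to supply a name and a list of images, which matches the goal of not requiring AWS or Terraform expertise.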
Project overview is on Notion
Tasks
Now being tracked as part of probe lab: https://www.notion.so/pl-strflt/Thunderdome-Self-Service-Experiments-85dd1389e7bb4bf6a36d638b45d29d20