Added documentation for input and output (#203)
* Started with a documentation page

* Added documentation of the input required to run OpenDC, and the output returned by OpenDC to the website.
DanteNiewenhuis authored Feb 16, 2024
1 parent 10c4710 commit 29f3fd2
Showing 9 changed files with 175 additions and 8,924 deletions.
@@ -109,7 +109,7 @@ public interface HostTableReader {
public val powerDraw: Double

/**
- * The total energy consumption of the host since last time in J.
+ * The total energy consumption of the host since last sample in J.
*/
public val energyUsage: Double

@@ -69,7 +69,7 @@ public interface ServerTableReader {
public val bootTime: Instant?

/**
- * The capacity of the CPUs of the servers (in MHz).
+ * The capacity of the CPUs of Host on which the server is running (in MHz).
*/
public val cpuLimit: Double

@@ -49,7 +49,7 @@ public interface ServiceTableReader {
public val hostsDown: Int

/**
- * The number of servers that are registered with the compute service..
+ * The number of servers that are registered with the compute service.
*/
public val serversTotal: Int

42 changes: 42 additions & 0 deletions site/docs/documentation/Input.md
@@ -0,0 +1,42 @@

OpenDC requires three files to run an experiment. The first is the topology of the data center that will be simulated.
The second is a meta trace providing an overview of the servers that need to be executed. The third is the trace
describing the computational demand of each server over time.

### Topology
The topology of a data center is described by a CSV file. Each row in the CSV describes a cluster
in the data center. Below is an example of a topology file consisting of three clusters:

| ClusterID | ClusterName | Cores | Speed | Memory | numberOfHosts | memoryCapacityPerHost | coreCountPerHost |
|-----------|-------------|-------|-------|--------|---------------|-----------------------|------------------|
| A01 | A01 | 32 | 3.2 | 2048 | 1 | 256 | 32 |
| B01 | B01 | 48 | 2.93 | 1256 | 6 | 64 | 8 |
| C01 | C01 | 32 | 3.2 | 2048 | 2 | 128 | 16 |
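
As a rough illustration of how such a file could be read, the sketch below parses the three example rows with Python's standard `csv` module. The comma delimiter, the inline `TOPOLOGY_CSV` string, and the helper names `load_clusters`, `total_hosts`, and `total_cores` are assumptions made for this example and are not part of OpenDC. In the example rows, `Cores` happens to equal `numberOfHosts * coreCountPerHost`; the sketch uses that product only to count cores and does not claim it is a rule OpenDC enforces.

```python
import csv
import io

# Hypothetical file contents: the column names mirror the example table,
# and the comma delimiter is an assumption.
TOPOLOGY_CSV = """\
ClusterID,ClusterName,Cores,Speed,Memory,numberOfHosts,memoryCapacityPerHost,coreCountPerHost
A01,A01,32,3.2,2048,1,256,32
B01,B01,48,2.93,1256,6,64,8
C01,C01,32,3.2,2048,2,128,16
"""

INT_FIELDS = (
    "Cores",
    "Memory",
    "numberOfHosts",
    "memoryCapacityPerHost",
    "coreCountPerHost",
)


def load_clusters(text, delimiter=","):
    """Parse topology rows into dictionaries with numeric fields converted."""
    clusters = []
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        cluster = dict(row)
        for name in INT_FIELDS:
            cluster[name] = int(row[name])
        cluster["Speed"] = float(row["Speed"])
        clusters.append(cluster)
    return clusters


def total_hosts(clusters):
    """Number of hosts across all clusters."""
    return sum(c["numberOfHosts"] for c in clusters)


def total_cores(clusters):
    """Number of cores across all hosts of all clusters."""
    return sum(c["numberOfHosts"] * c["coreCountPerHost"] for c in clusters)


clusters = load_clusters(TOPOLOGY_CSV)
print(total_hosts(clusters), total_cores(clusters))
```

Converting the numeric columns up front means a malformed value fails at load time instead of partway through later processing.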


### Traces
OpenDC works with two types of traces that describe the servers that need to be run. Both traces have to be provided as
parquet files.

#### Meta
The meta trace provides an overview of the servers:

| Metric | Datatype | Unit | Summary |
|--------------|------------|----------|--------------------------------------------------|
| id | string | | The id of the server |
| start_time | datetime64 | datetime | The submission time of the server |
| stop_time | datetime64 | datetime | The finish time of the server |
| cpu_count | int32 | count | The number of CPUs required to run this server |
| cpu_capacity | float64 | MHz | The CPU capacity required to run this server |
| mem_capacity | int64 | MB | The amount of memory required to run this server |

#### Trace
The trace file provides information about the computational demand of each server over time:

| Metric | Datatype | Unit | Summary |
|-----------|------------|---------------|---------------------------------------------|
| id | string | | The id of the server |
| timestamp | datetime64 | datetime | The timestamp of the sample |
| duration | int64 | milliseconds | The duration since the last sample |
| cpu_count | int32 | count | The number of CPUs required |
| cpu_usage | float64 | MHz | The amount of computational power required |
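
To make the relation between the two traces concrete, the following sketch models trace rows in plain Python and derives two things from them: the duration-weighted mean `cpu_usage` per server, and the trace ids that do not appear in the meta trace. The `Fragment` class and the helpers `mean_cpu_usage` and `unknown_ids` are hypothetical names for this example. Reading the Parquet files themselves is left out, and the sample values are invented purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Fragment:
    """One row of the trace file (field names follow the table above)."""
    id: str
    timestamp: datetime
    duration: int      # milliseconds since the last sample
    cpu_count: int
    cpu_usage: float   # MHz


def mean_cpu_usage(fragments):
    """Duration-weighted mean cpu_usage (MHz) per server id."""
    weighted = defaultdict(float)
    total = defaultdict(int)
    for f in fragments:
        weighted[f.id] += f.cpu_usage * f.duration
        total[f.id] += f.duration
    return {sid: weighted[sid] / total[sid] for sid in total if total[sid] > 0}


def unknown_ids(fragments, meta_ids):
    """Ids that occur in the trace but are missing from the meta trace."""
    return sorted({f.id for f in fragments} - set(meta_ids))


# Invented sample data, for illustration only.
fragments = [
    Fragment("1", datetime(2024, 1, 1, 0, 5), 300_000, 2, 1000.0),
    Fragment("1", datetime(2024, 1, 1, 0, 10), 100_000, 2, 2000.0),
    Fragment("2", datetime(2024, 1, 1, 0, 5), 300_000, 1, 500.0),
]
print(mean_cpu_usage(fragments))
print(unknown_ids(fragments, meta_ids=["1"]))
```

Weighting by `duration` matters because samples need not be evenly spaced; a plain average would overstate short bursts.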
66 changes: 66 additions & 0 deletions site/docs/documentation/Output.md
@@ -0,0 +1,66 @@

Running OpenDC results in three output files. The first file ([Server](#server)) contains metrics related to the jobs being executed.
The second file ([Host](#host)) contains all metrics related to the hosts on which jobs can be executed. Finally, the third file ([Service](#service))
contains metrics describing the overall performance. An experiment in OpenDC has

### Server
The server output file contains all metrics related to the servers that were run.

| Metric | Datatype | Unit | Summary |
|-----------------|----------|---------------|-------------------------------------------------------------------------------|
| timestamp | int | datetime | Timestamp of the sample |
| server_id | string | | The id of the server determined during runtime |
| server_name | string | | The name of the server provided by the Trace |
| host_id | string | | The id of the host on which the server is hosted or `null` if it has no host. |
| mem_capacity | int | MB | |
| cpu_count | int | count | |
| cpu_limit | float | MHz | The capacity of the CPUs of the host on which the server is running. |
| cpu_time_active | int | seconds | The duration that a CPU was active in the server. |
| cpu_time_idle | int | seconds | The duration that a CPU was idle in the server. |
| cpu_time_steal | int | seconds | The duration that a vCPU wanted to run, but no capacity was available. |
| cpu_time_lost | int | seconds | The duration of CPU time that was lost due to interference. |
| uptime | int | milliseconds | The uptime of the host since last sample. |
| downtime | int | milliseconds | The downtime of the host since last sample. |
| provision_time | int | datetime | The time at which the server was enqueued for the scheduler. |
| boot_time | int | datetime | The time at which the server booted. |

### Host
The host output file contains all metrics related to the hosts.

| Metric | Datatype | Unit | Summary |
|-------------------|----------|---------------|-------------------------------------------------------------------------------------------------|
| timestamp | int | datetime | Timestamp of the sample |
| host_id | string | | The id of the host given by OpenDC |
| cpu_count | int | count | The number of available CPU cores |
| mem_capacity | int | MB | The amount of available memory |
| guests_terminated | int | count | The number of guests that are in a terminated state. |
| guests_running | int | count | The number of guests that are in a running state. |
| guests_error | int | count | The number of guests that are in an error state. |
| guests_invalid | int | count | The number of guests that are in an unknown state. |
| cpu_limit | float | MHz | The capacity of the CPUs in the host. |
| cpu_usage | float | MHz | The usage of all CPUs in the host. |
| cpu_demand | float | MHz | The demand of all vCPUs of the guests |
| cpu_utilization | float | ratio | The CPU utilization of the host. This is calculated by dividing cpu_usage by cpu_limit. |
| cpu_time_active | int | seconds | The duration that a CPU was active in the host. |
| cpu_time_idle | int | seconds | The duration that a CPU was idle in the host. |
| cpu_time_steal | int | seconds | The duration that a vCPU wanted to run, but no capacity was available. |
| cpu_time_lost | int | seconds | The duration of CPU time that was lost due to interference. |
| power_draw | float | Watt | The current power draw of the host. |
| energy_usage | float | Joule (Ws) | The total energy consumption of the host since last sample. |
| uptime | int | milliseconds | The uptime of the host since last sample. |
| downtime | int | milliseconds | The downtime of the host since last sample. |
| boot_time | int | datetime | The timestamp at which the host booted. |
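
Two of the host metrics lend themselves to a short worked example. `cpu_utilization` is described as `cpu_usage` divided by `cpu_limit`, and `energy_usage` is reported per sample, so the total for a run is the sum over all samples. The sketch below assumes host rows are available as plain dictionaries keyed by the metric names. The helper names, the sample values, and the choice to return `0.0` when `cpu_limit` is zero are assumptions for this example, not OpenDC behavior.

```python
JOULES_PER_KWH = 3_600_000.0  # 1 kWh = 3.6e6 J


def cpu_utilization(cpu_usage, cpu_limit):
    """Ratio of used to available CPU capacity (both in MHz).

    Returning 0.0 for a host without capacity is an assumption of this sketch.
    """
    return cpu_usage / cpu_limit if cpu_limit > 0 else 0.0


def total_energy_joules(samples):
    """Sum the per-sample energy_usage values (J) of one host."""
    return sum(s["energy_usage"] for s in samples)


def total_energy_kwh(samples):
    """Total energy of one host converted from joules to kWh."""
    return total_energy_joules(samples) / JOULES_PER_KWH


# Invented sample rows for a single host, for illustration only.
host_samples = [
    {"host_id": "h1", "cpu_usage": 1600.0, "cpu_limit": 3200.0, "energy_usage": 60_000.0},
    {"host_id": "h1", "cpu_usage": 800.0, "cpu_limit": 3200.0, "energy_usage": 120_000.0},
]

for sample in host_samples:
    print(cpu_utilization(sample["cpu_usage"], sample["cpu_limit"]))
print(total_energy_kwh(host_samples))
```

Because `energy_usage` covers only the interval since the last sample, it is summed rather than averaged, whereas `power_draw` is an instantaneous value.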

### Service
The service output file contains metrics providing an overview of the performance.

| Metric | Datatype | Unit | Summary |
|------------------|----------|----------|------------------------------------------------------------------------|
| timestamp | int | datetime | Timestamp of the sample |
| hosts_up | int | count | The number of hosts that are up at this instant. |
| hosts_down | int | count | The number of hosts that are down at this instant. |
| servers_pending | int | count | The number of servers that are pending to be scheduled. |
| servers_active | int | count | The number of servers that are currently active. |
| attempts_success | int | count | The scheduling attempts that were successful. |
| attempts_failure | int | count | The scheduling attempts that were unsuccessful due to client error. |
| attempts_error | int | count | The scheduling attempts that were unsuccessful due to scheduler error. |
7 changes: 7 additions & 0 deletions site/docs/documentation/_category_.json
@@ -0,0 +1,7 @@
{
  "label": "Documentation",
  "position": 5,
  "link": {
    "type": "generated-index"
  }
}
2 changes: 1 addition & 1 deletion site/docusaurus.config.js
@@ -42,7 +42,7 @@ const config = {
plugins: [
[
"content-docs",
- /** @type {import("@docusaurus/plugin-content-docs").Options} */
+ // /** @type {import("@docusaurus/plugin-content-docs").Options} */
({
id: "community",
path: "community",