diff --git a/design/BulkAPI.md b/design/BulkAPI.md
new file mode 100644
index 000000000..bd7327c89
--- /dev/null
+++ b/design/BulkAPI.md
@@ -0,0 +1,254 @@
+# Bulk API Documentation
+
+Bulk is an API designed to provide resource optimization recommendations in bulk for all available
+containers, namespaces, etc., for a cluster connected via the datasource integration framework. Bulk can
+be configured with filters to include or exclude namespaces, workloads, containers, or labels when generating
+recommendations, and it can generate recommendations at the container level, the namespace level, or both.
+
+Bulk returns a `job_id` in its response; the user can then use this `job_id` to track the job's status and
+monitor its progress.
+
+## Task Flow When Bulk Is Invoked
+
+1. Returns a unique `job_id`.
+2. In the background, Bulk:
+    - First performs a handshake with the datasource.
+    - Fetches the list of namespaces, workloads, and containers from the connected datasource using queries.
+    - Creates experiments, one for each container (alpha release).
+    - Triggers `generateRecommendations` for each container.
+    - Once all experiments are created and all recommendations are generated, marks the `job_id` as "COMPLETED".
+
+## API Specification
+
+### POST /bulk
+
+**Request Payload (JSON):**
+
+```json
+{
+ "filter": {
+ "exclude": {
+ "namespace": [],
+ "workload": [],
+ "containers": [],
+ "labels": {}
+ },
+ "include": {
+ "namespace": [],
+ "workload": [],
+ "containers": [],
+ "labels": {
+ "key1": "value1",
+ "key2": "value2"
+ }
+ }
+ },
+ "time_range": {},
+ "datasource": "Cbank1Xyz",
+ "experiment_types": [
+ "container",
+ "namespace"
+ ]
+}
+```
+
+**filter:** This object contains both exclusion and inclusion filters to specify the scope of data being queried.
+
+- **exclude:** Defines the criteria to exclude certain data.
+ - **namespace:** A list of Kubernetes namespaces to exclude. If empty, no namespaces are excluded.
+ - **workload:** A list of workloads to exclude.
+ - **containers:** A list of container names to exclude.
+ - **labels:** Key-value pairs of labels to exclude.
+
+- **include:** Defines the criteria to include specific data.
+ - **namespace:** A list of Kubernetes namespaces to include.
+ - **workload:** A list of workloads to include.
+ - **containers:** A list of container names to include.
+ - **labels:** Key-value pairs of labels to include.
+
+- **time_range:** Specifies the time range for querying the data. If empty, no specific time range is applied.
+
+- **datasource:** The data source, e.g., `"Cbank1Xyz"`.
+
+- **experiment_types:** Specifies the type(s) of experiments to run, e.g., `"container"` or `"namespace"`.
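+
+As a usage sketch, the payload above can be posted with `curl`. The host and port below are assumptions for illustration only; substitute the Kruize service URL of your deployment:
+
+```bash
+# Hypothetical endpoint; adjust host/port to your Kruize deployment.
+curl -s -X POST http://localhost:8080/bulk \
+  -H 'Content-Type: application/json' \
+  -d '{
+        "filter": {
+          "exclude": {"namespace": [], "workload": [], "containers": [], "labels": {}},
+          "include": {"namespace": [], "workload": [], "containers": [], "labels": {}}
+        },
+        "time_range": {},
+        "datasource": "Cbank1Xyz",
+        "experiment_types": ["container", "namespace"]
+      }'
+```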
+
+### Success Response
+
+- **Status:** 200 OK
+- **Body:**
+
+```json
+{
+ "job_id": "123e4567-e89b-12d3-a456-426614174000"
+}
+```
+
+### GET /bulk
+
+```bash
+GET /bulk?job_id=123e4567-e89b-12d3-a456-426614174000
+```
+
+**Response Body (JSON):**
+
+```json
+{
+ "status": "COMPLETED",
+ "total_experiments": 23,
+ "processed_experiments": 23,
+ "job_id": "54905959-77d4-42ba-8e06-90bb97b823b9",
+ "job_start_time": "2024-10-10T06:07:09.066Z",
+ "job_end_time": "2024-10-10T06:07:17.471Z"
+}
+```
+
+```bash
+GET /bulk?job_id=123e4567-e89b-12d3-a456-426614174000&verbose=true
+```
+
+**Response Body (JSON):**
+
+When `verbose=true`, additional detailed information about the job is provided.
+
+```json
+{
+ "status": "IN_PROGRESS",
+ "total_experiments": 23,
+ "processed_experiments": 22,
+ "data": {
+ "experiments": {
+ "new": [
+ "prometheus-1|default|monitoring|node-exporter(daemonset)|node-exporter",
+ "prometheus-1|default|cadvisor|cadvisor(daemonset)|cadvisor",
+ "prometheus-1|default|monitoring|alertmanager-main(statefulset)|config-reloader",
+ "prometheus-1|default|monitoring|alertmanager-main(statefulset)|alertmanager",
+ "prometheus-1|default|monitoring|prometheus-operator(deployment)|kube-rbac-proxy",
+ "prometheus-1|default|kube-system|coredns(deployment)|coredns",
+ "prometheus-1|default|monitoring|prometheus-k8s(statefulset)|config-reloader",
+ "prometheus-1|default|monitoring|blackbox-exporter(deployment)|kube-rbac-proxy",
+ "prometheus-1|default|monitoring|prometheus-operator(deployment)|prometheus-operator",
+ "prometheus-1|default|monitoring|node-exporter(daemonset)|kube-rbac-proxy",
+ "prometheus-1|default|monitoring|kube-state-metrics(deployment)|kube-rbac-proxy-self",
+ "prometheus-1|default|monitoring|kube-state-metrics(deployment)|kube-state-metrics",
+ "prometheus-1|default|monitoring|kruize(deployment)|kruize",
+ "prometheus-1|default|monitoring|blackbox-exporter(deployment)|module-configmap-reloader",
+ "prometheus-1|default|monitoring|prometheus-k8s(statefulset)|prometheus",
+ "prometheus-1|default|monitoring|kube-state-metrics(deployment)|kube-rbac-proxy-main",
+ "prometheus-1|default|kube-system|kube-proxy(daemonset)|kube-proxy",
+ "prometheus-1|default|monitoring|prometheus-adapter(deployment)|prometheus-adapter",
+ "prometheus-1|default|monitoring|grafana(deployment)|grafana",
+ "prometheus-1|default|kube-system|kindnet(daemonset)|kindnet-cni",
+ "prometheus-1|default|monitoring|kruize-db-deployment(deployment)|kruize-db",
+ "prometheus-1|default|monitoring|blackbox-exporter(deployment)|blackbox-exporter"
+ ],
+ "updated": [],
+ "failed": null
+ },
+ "recommendations": {
+ "data": {
+ "processed": [
+ "prometheus-1|default|monitoring|alertmanager-main(statefulset)|config-reloader",
+ "prometheus-1|default|monitoring|node-exporter(daemonset)|node-exporter",
+ "prometheus-1|default|local-path-storage|local-path-provisioner(deployment)|local-path-provisioner",
+ "prometheus-1|default|monitoring|alertmanager-main(statefulset)|alertmanager",
+ "prometheus-1|default|monitoring|prometheus-operator(deployment)|kube-rbac-proxy",
+ "prometheus-1|default|kube-system|coredns(deployment)|coredns",
+ "prometheus-1|default|monitoring|blackbox-exporter(deployment)|kube-rbac-proxy",
+ "prometheus-1|default|monitoring|prometheus-k8s(statefulset)|config-reloader",
+ "prometheus-1|default|monitoring|prometheus-operator(deployment)|prometheus-operator",
+ "prometheus-1|default|monitoring|node-exporter(daemonset)|kube-rbac-proxy",
+ "prometheus-1|default|monitoring|kube-state-metrics(deployment)|kube-rbac-proxy-self",
+ "prometheus-1|default|monitoring|kube-state-metrics(deployment)|kube-state-metrics",
+ "prometheus-1|default|monitoring|kruize(deployment)|kruize",
+ "prometheus-1|default|monitoring|blackbox-exporter(deployment)|module-configmap-reloader",
+ "prometheus-1|default|monitoring|prometheus-k8s(statefulset)|prometheus",
+ "prometheus-1|default|monitoring|kube-state-metrics(deployment)|kube-rbac-proxy-main",
+ "prometheus-1|default|kube-system|kube-proxy(daemonset)|kube-proxy",
+ "prometheus-1|default|monitoring|prometheus-adapter(deployment)|prometheus-adapter",
+ "prometheus-1|default|monitoring|grafana(deployment)|grafana",
+ "prometheus-1|default|kube-system|kindnet(daemonset)|kindnet-cni",
+ "prometheus-1|default|monitoring|kruize-db-deployment(deployment)|kruize-db",
+ "prometheus-1|default|monitoring|blackbox-exporter(deployment)|blackbox-exporter"
+ ],
+ "processing": [
+ "prometheus-1|default|cadvisor|cadvisor(daemonset)|cadvisor"
+ ],
+ "unprocessed": [
+ ],
+ "failed": []
+ }
+ }
+ },
+ "job_id": "5798a2df-6c67-467b-a3c2-befe634a0e3a",
+ "job_start_time": "2024-10-09T18:09:31.549Z",
+ "job_end_time": null
+}
+```
+
+### Response Parameters
+
+This API response describes the status of a job that processes multiple experiments and generates recommendations for
+resource optimization in Kubernetes environments. Below is a breakdown of the JSON response fields:
+
+- **status**:
+ - **Type**: `String`
+ - **Description**: Current status of the job. Can be "IN_PROGRESS", "COMPLETED", "FAILED", etc.
+
+- **total_experiments**:
+ - **Type**: `Integer`
+ - **Description**: Total number of experiments to be processed in the job.
+
+- **processed_experiments**:
+ - **Type**: `Integer`
+ - **Description**: Number of experiments that have been processed so far.
+
+- **data**:
+ - **Type**: `Object`
+ - **Description**: Contains detailed information about the experiments and recommendations being processed.
+
+ - **experiments**:
+ - **new**:
+ - **Type**: `Array of Strings`
+      - **Description**: List of new experiments created during this job run.
+
+ - **updated**:
+ - **Type**: `Array of Strings`
+ - **Description**: List of experiments that were previously processed but have now been updated.
+
+ - **failed**:
+ - **Type**: `null or Array`
+ - **Description**: List of experiments that failed during processing. If no failures, the value is `null`.
+
+ - **recommendations**:
+ - **data**:
+ - **processed**:
+ - **Type**: `Array of Strings`
+ - **Description**: List of experiments for which recommendations have already been processed.
+
+ - **processing**:
+ - **Type**: `Array of Strings`
+ - **Description**: List of experiments that are currently being processed for recommendations.
+
+ - **unprocessed**:
+ - **Type**: `Array of Strings`
+ - **Description**: List of experiments that have not yet been processed for recommendations.
+
+ - **failed**:
+ - **Type**: `Array of Strings`
+ - **Description**: List of experiments for which the recommendation process failed.
+
+- **job_id**:
+ - **Type**: `String`
+ - **Description**: Unique identifier for the job.
+
+- **job_start_time**:
+ - **Type**: `String (ISO 8601 format)`
+ - **Description**: Start timestamp of the job.
+
+- **job_end_time**:
+ - **Type**: `String (ISO 8601 format) or null`
+ - **Description**: End timestamp of the job. If the job is still in progress, this will be `null`.
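+
+As a usage sketch, the two endpoints can be combined into a simple polling loop. The host/port and the use of `jq` are assumptions for illustration, not part of the API contract:
+
+```bash
+# Submit a bulk job, capture the job_id, then poll until the job leaves IN_PROGRESS.
+JOB_ID=$(curl -s -X POST http://localhost:8080/bulk \
+  -H 'Content-Type: application/json' \
+  -d '{"filter": {"exclude": {}, "include": {}}, "time_range": {}, "datasource": "Cbank1Xyz", "experiment_types": ["container"]}' \
+  | jq -r '.job_id')
+
+while true; do
+  STATUS=$(curl -s "http://localhost:8080/bulk?job_id=${JOB_ID}" | jq -r '.status')
+  echo "job ${JOB_ID}: ${STATUS}"
+  [ "${STATUS}" != "IN_PROGRESS" ] && break
+  sleep 10
+done
+```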
+
diff --git a/design/MonitoringModeAPI.md b/design/MonitoringModeAPI.md
index 75899125d..91a3d1364 100644
--- a/design/MonitoringModeAPI.md
+++ b/design/MonitoringModeAPI.md
@@ -2960,6 +2960,506 @@ Returns the recommendation at a particular timestamp if it exists
+
+**Response for GPU workloads**
+
+`GET /listRecommendations`
+
+`curl -H 'Accept: application/json' http://:/listRecommendations?experiment_name=human_eval_exp`
+
+**Example Response with GPU Recommendations:**
+
+```json
+[
+ {
+ "cluster_name": "default",
+ "experiment_type": "container",
+ "kubernetes_objects": [
+ {
+ "type": "statefulset",
+ "name": "human-eval-benchmark",
+ "namespace": "unpartitioned",
+ "containers": [
+ {
+ "container_name": "human-eval-benchmark",
+ "recommendations": {
+ "version": "1.0",
+ "notifications": {
+ "111000": {
+ "type": "info",
+ "message": "Recommendations Are Available",
+ "code": 111000
+ }
+ },
+ "data": {
+ "2024-10-04T09:16:40.000Z": {
+ "notifications": {
+ "111101": {
+ "type": "info",
+ "message": "Short Term Recommendations Available",
+ "code": 111101
+ },
+ "111102": {
+ "type": "info",
+ "message": "Medium Term Recommendations Available",
+ "code": 111102
+ }
+ },
+ "monitoring_end_time": "2024-10-04T09:16:40.000Z",
+ "current": {
+ "limits": {
+ "cpu": {
+ "amount": 2.0,
+ "format": "cores"
+ },
+ "memory": {
+ "amount": 8.589934592E9,
+ "format": "bytes"
+ }
+ },
+ "requests": {
+ "cpu": {
+ "amount": 1.0,
+ "format": "cores"
+ },
+ "memory": {
+ "amount": 8.589934592E9,
+ "format": "bytes"
+ }
+ }
+ },
+ "recommendation_terms": {
+ "short_term": {
+ "duration_in_hours": 24.0,
+ "notifications": {
+ "112101": {
+ "type": "info",
+ "message": "Cost Recommendations Available",
+ "code": 112101
+ },
+ "112102": {
+ "type": "info",
+ "message": "Performance Recommendations Available",
+ "code": 112102
+ }
+ },
+ "monitoring_start_time": "2024-10-03T09:16:40.000Z",
+ "recommendation_engines": {
+ "cost": {
+ "pods_count": 1,
+ "confidence_level": 0.0,
+ "config": {
+ "limits": {
+ "cpu": {
+ "amount": 1.004649523106615,
+ "format": "cores"
+ },
+ "nvidia.com/mig-3g.20gb": {
+ "amount": 1.0,
+ "format": "cores"
+ },
+ "memory": {
+ "amount": 4.9960943616E9,
+ "format": "bytes"
+ }
+ },
+ "requests": {
+ "cpu": {
+ "amount": 1.004649523106615,
+ "format": "cores"
+ },
+ "memory": {
+ "amount": 4.9960943616E9,
+ "format": "bytes"
+ }
+ }
+ },
+ "variation": {
+ "limits": {
+ "cpu": {
+ "amount": -0.995350476893385,
+ "format": "cores"
+ },
+ "memory": {
+ "amount": -3.5938402303999996E9,
+ "format": "bytes"
+ }
+ },
+ "requests": {
+ "cpu": {
+ "amount": 0.004649523106615039,
+ "format": "cores"
+ },
+ "memory": {
+ "amount": -3.5938402303999996E9,
+ "format": "bytes"
+ }
+ }
+ },
+ "notifications": {}
+ },
+ "performance": {
+ "pods_count": 1,
+ "confidence_level": 0.0,
+ "config": {
+ "limits": {
+ "cpu": {
+ "amount": 1.36656145696268,
+ "format": "cores"
+ },
+ "memory": {
+ "amount": 4.9960943616E9,
+ "format": "bytes"
+ },
+ "nvidia.com/mig-4g.20gb": {
+ "amount": 1.0,
+ "format": "cores"
+ }
+ },
+ "requests": {
+ "cpu": {
+ "amount": 1.36656145696268,
+ "format": "cores"
+ },
+ "memory": {
+ "amount": 4.9960943616E9,
+ "format": "bytes"
+ }
+ }
+ },
+ "variation": {
+ "limits": {
+ "cpu": {
+ "amount": -0.63343854303732,
+ "format": "cores"
+ },
+ "memory": {
+ "amount": -3.5938402303999996E9,
+ "format": "bytes"
+ }
+ },
+ "requests": {
+ "cpu": {
+ "amount": 0.36656145696268005,
+ "format": "cores"
+ },
+ "memory": {
+ "amount": -3.5938402303999996E9,
+ "format": "bytes"
+ }
+ }
+ },
+ "notifications": {}
+ }
+ },
+ "plots": {
+ "datapoints": 4,
+ "plots_data": {
+ "2024-10-04T09:16:40.000Z": {
+ "cpuUsage": {
+ "min": 0.005422723351267242,
+ "q1": 1.003281151419465,
+ "median": 1.0118160468783521,
+ "q3": 1.012961901380266,
+ "max": 1.36656145696268,
+ "format": "cores"
+ },
+ "memoryUsage": {
+ "min": 3.68019456E9,
+ "q1": 3.681001472E9,
+ "median": 4.058411008E9,
+ "q3": 4.093308928E9,
+ "max": 4.094062592E9,
+ "format": "bytes"
+ }
+ },
+ "2024-10-04T03:16:40.000Z": {
+ "cpuUsage": {
+ "min": 0.998888009348188,
+ "q1": 1.0029943714818779,
+ "median": 1.0033621837551019,
+ "q3": 1.0040859908301978,
+ "max": 1.0828338199135354,
+ "format": "cores"
+ },
+ "memoryUsage": {
+ "min": 3.679281152E9,
+ "q1": 3.680755712E9,
+ "median": 3.680989184E9,
+ "q3": 3.687673856E9,
+ "max": 4.163411968E9,
+ "format": "bytes"
+ }
+ },
+ "2024-10-03T15:16:40.000Z": {
+ "cpuUsage": {
+ "min": 0.005425605536480822,
+ "q1": 0.006038658069363403,
+ "median": 0.006183237135144752,
+ "q3": 0.006269460195927269,
+ "max": 0.006916437328481231,
+ "format": "cores"
+ },
+ "memoryUsage": {
+ "min": 2.192125952E9,
+ "q1": 2.192388096E9,
+ "median": 2.192388096E9,
+ "q3": 2.192388096E9,
+ "max": 2.19265024E9,
+ "format": "bytes"
+ }
+ },
+ "2024-10-03T21:16:40.000Z": {
+ "cpuUsage": {
+ "min": 0.0052184839046300075,
+ "q1": 0.006229799261227028,
+ "median": 1.0110868114913476,
+ "q3": 1.0124661560983785,
+ "max": 2.3978065580305032,
+ "format": "cores"
+ },
+ "memoryUsage": {
+ "min": 2.118012928E9,
+ "q1": 2.192392192E9,
+ "median": 4.161662976E9,
+ "q3": 4.162850816E9,
+ "max": 4.163170304E9,
+ "format": "bytes"
+ }
+ }
+ }
+ }
+ },
+ "medium_term": {
+ "duration_in_hours": 168.0,
+ "notifications": {
+ "112101": {
+ "type": "info",
+ "message": "Cost Recommendations Available",
+ "code": 112101
+ },
+ "112102": {
+ "type": "info",
+ "message": "Performance Recommendations Available",
+ "code": 112102
+ }
+ },
+ "monitoring_start_time": "2024-09-27T09:16:40.000Z",
+ "recommendation_engines": {
+ "cost": {
+ "pods_count": 1,
+ "confidence_level": 0.0,
+ "config": {
+ "limits": {
+ "cpu": {
+ "amount": 0.015580688959425347,
+ "format": "cores"
+ },
+ "nvidia.com/mig-3g.20gb": {
+ "amount": 1.0,
+ "format": "cores"
+ },
+ "memory": {
+ "amount": 4.9960943616E9,
+ "format": "bytes"
+ }
+ },
+ "requests": {
+ "cpu": {
+ "amount": 0.015580688959425347,
+ "format": "cores"
+ },
+ "memory": {
+ "amount": 4.9960943616E9,
+ "format": "bytes"
+ }
+ }
+ },
+ "variation": {
+ "limits": {
+ "cpu": {
+ "amount": -1.9844193110405746,
+ "format": "cores"
+ },
+ "memory": {
+ "amount": -3.5938402303999996E9,
+ "format": "bytes"
+ }
+ },
+ "requests": {
+ "cpu": {
+ "amount": -0.9844193110405747,
+ "format": "cores"
+ },
+ "memory": {
+ "amount": -3.5938402303999996E9,
+ "format": "bytes"
+ }
+ }
+ },
+ "notifications": {}
+ },
+ "performance": {
+ "pods_count": 1,
+ "confidence_level": 0.0,
+ "config": {
+ "limits": {
+ "cpu": {
+ "amount": 1.025365696933566,
+ "format": "cores"
+ },
+ "memory": {
+ "amount": 4.9960943616E9,
+ "format": "bytes"
+ },
+ "nvidia.com/mig-4g.20gb": {
+ "amount": 1.0,
+ "format": "cores"
+ }
+ },
+ "requests": {
+ "cpu": {
+ "amount": 1.025365696933566,
+ "format": "cores"
+ },
+ "memory": {
+ "amount": 4.9960943616E9,
+ "format": "bytes"
+ }
+ }
+ },
+ "variation": {
+ "limits": {
+ "cpu": {
+ "amount": -0.974634303066434,
+ "format": "cores"
+ },
+ "memory": {
+ "amount": -3.5938402303999996E9,
+ "format": "bytes"
+ }
+ },
+ "requests": {
+ "cpu": {
+ "amount": 0.02536569693356605,
+ "format": "cores"
+ },
+ "memory": {
+ "amount": -3.5938402303999996E9,
+ "format": "bytes"
+ }
+ }
+ },
+ "notifications": {}
+ }
+ },
+ "plots": {
+ "datapoints": 7,
+ "plots_data": {
+ "2024-09-29T09:16:40.000Z": {},
+ "2024-10-04T09:16:40.000Z": {
+ "cpuUsage": {
+ "min": 0.0052184839046300075,
+ "q1": 0.006207971650471658,
+ "median": 1.0032201196711934,
+ "q3": 1.0115567178617741,
+ "max": 2.3978065580305032,
+ "format": "cores"
+ },
+ "memoryUsage": {
+ "min": 2.118012928E9,
+ "q1": 2.192392192E9,
+ "median": 3.6808704E9,
+ "q3": 4.093349888E9,
+ "max": 4.163411968E9,
+ "format": "bytes"
+ }
+ },
+ "2024-09-30T09:16:40.000Z": {},
+ "2024-10-02T09:16:40.000Z": {
+ "cpuUsage": {
+ "min": 0.00554280490421283,
+ "q1": 0.015358846193868379,
+ "median": 0.015705212168337323,
+ "q3": 1.010702281083678,
+ "max": 1.0139464901392594,
+ "format": "cores"
+ },
+ "memoryUsage": {
+ "min": 2.192125952E9,
+ "q1": 2.717663232E9,
+ "median": 2.719612928E9,
+ "q3": 2.719617024E9,
+ "max": 2.720600064E9,
+ "format": "bytes"
+ }
+ },
+ "2024-09-28T09:16:40.000Z": {},
+ "2024-10-03T09:16:40.000Z": {
+ "cpuUsage": {
+ "min": 0.005373319820852367,
+ "q1": 0.006054991034195089,
+ "median": 0.006142447129874265,
+ "q3": 0.006268777122325054,
+ "max": 0.007366566784856696,
+ "format": "cores"
+ },
+ "memoryUsage": {
+ "min": 2.192125952E9,
+ "q1": 2.192388096E9,
+ "median": 2.192388096E9,
+ "q3": 2.192388096E9,
+ "max": 2.192654336E9,
+ "format": "bytes"
+ }
+ },
+ "2024-10-01T09:16:40.000Z": {
+ "cpuUsage": {
+ "min": 0.003319077875529473,
+ "q1": 1.0101034685479167,
+ "median": 1.0118171810142638,
+ "q3": 1.0208974318073034,
+ "max": 3.5577616386258963,
+ "format": "cores"
+ },
+ "memoryUsage": {
+ "min": 1.77057792E8,
+ "q1": 2.64523776E9,
+ "median": 2.651078656E9,
+ "q3": 2.693431296E9,
+ "max": 2.705133568E9,
+ "format": "bytes"
+ }
+ }
+ }
+ }
+ },
+ "long_term": {
+ "duration_in_hours": 360.0,
+ "notifications": {
+ "120001": {
+ "type": "info",
+ "message": "There is not enough data available to generate a recommendation.",
+ "code": 120001
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ ]
+ }
+ ],
+ "version": "v2.0",
+ "experiment_name": "human_eval_exp"
+ }
+]
+```
+
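+The MIG profile that appears in the recommended config (e.g. `nvidia.com/mig-4g.20gb`) is a Kubernetes extended resource, so consuming the recommendation means setting that resource on the container. Below is a hypothetical pod-spec fragment showing how the short-term "performance" recommendation above could map onto container resources; the CPU/memory rounding is illustrative, not something this API prescribes:
+
+```yaml
+# Hypothetical container resources after applying the short-term "performance" recommendation.
+resources:
+  requests:
+    cpu: "1367m"            # rounded from 1.36656145696268 cores
+    memory: "4996094362"    # rounded from 4.9960943616E9 bytes
+  limits:
+    cpu: "1367m"
+    memory: "4996094362"
+    nvidia.com/mig-4g.20gb: "1"   # extended resource; requests default to match limits
+```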
+
### Invalid Scenarios:
@@ -5049,6 +5549,11 @@ structured and easily interpretable way for users or external systems to access
+
+
+
+
+
---
diff --git a/manifests/autotune/performance-profiles/resource_optimization_local_monitoring.json b/manifests/autotune/performance-profiles/resource_optimization_local_monitoring.json
index add7fd4ca..d2e243127 100644
--- a/manifests/autotune/performance-profiles/resource_optimization_local_monitoring.json
+++ b/manifests/autotune/performance-profiles/resource_optimization_local_monitoring.json
@@ -412,6 +412,46 @@
"query": "max(last_over_time(timestamp((sum by (namespace) (container_cpu_usage_seconds_total{namespace=\"$NAMESPACE$\"})) > 0 )[15d:]))"
}
]
+ },
+ {
+ "name": "gpuCoreUsage",
+ "datasource": "prometheus",
+ "value_type": "double",
+ "kubernetes_object": "container",
+ "aggregation_functions": [
+ {
+ "function": "avg",
+ "query": "avg by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (avg_over_time(DCGM_FI_DEV_GPU_UTIL{exported_namespace=\"$NAMESPACE$\",exported_container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
+ },
+ {
+ "function": "max",
+ "query": "max by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (max_over_time(DCGM_FI_DEV_GPU_UTIL{exported_namespace=\"$NAMESPACE$\",exported_container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
+ },
+ {
+ "function": "min",
+ "query": "min by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (min_over_time(DCGM_FI_DEV_GPU_UTIL{exported_namespace=\"$NAMESPACE$\",exported_container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
+ }
+ ]
+ },
+ {
+ "name": "gpuMemoryUsage",
+ "datasource": "prometheus",
+ "value_type": "double",
+ "kubernetes_object": "container",
+ "aggregation_functions": [
+ {
+ "function": "avg",
+ "query": "avg by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (avg_over_time(DCGM_FI_DEV_MEM_COPY_UTIL{exported_namespace=\"$NAMESPACE$\",exported_container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
+ },
+ {
+ "function": "max",
+ "query": "max by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (max_over_time(DCGM_FI_DEV_MEM_COPY_UTIL{exported_namespace=\"$NAMESPACE$\",exported_container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
+ },
+ {
+ "function": "min",
+ "query": "min by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (min_over_time(DCGM_FI_DEV_MEM_COPY_UTIL{exported_namespace=\"$NAMESPACE$\",exported_container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
+ }
+ ]
}
]
}
diff --git a/manifests/autotune/performance-profiles/resource_optimization_local_monitoring.yaml b/manifests/autotune/performance-profiles/resource_optimization_local_monitoring.yaml
index e638c07e9..92a68a6b2 100644
--- a/manifests/autotune/performance-profiles/resource_optimization_local_monitoring.yaml
+++ b/manifests/autotune/performance-profiles/resource_optimization_local_monitoring.yaml
@@ -247,167 +247,207 @@ slo:
- function: max
query: 'max by(namespace,container) (last_over_time((timestamp(container_cpu_usage_seconds_total{namespace="$NAMESPACE$", container="$CONTAINER_NAME$"} > 0))[15d:]))'
- ## namespace related queries
-
- # Namespace quota for CPU requests
- # Show namespace quota for CPU requests in cores for a namespace
- - name: namespaceCpuRequest
- datasource: prometheus
- value_type: "double"
- kubernetes_object: "namespace"
- aggregation_functions:
- # sum of all cpu request quotas for a namespace in cores
- - function: sum
- query: 'sum by (namespace) (kube_resourcequota{namespace="$NAMESPACE$", resource="requests.cpu", type="hard"})'
-
- # Namespace quota for CPU limits
- # Show namespace quota for CPU limits in cores for a namespace
- - name: namespaceCpuLimit
- datasource: prometheus
- value_type: "double"
- kubernetes_object: "namespace"
- aggregation_functions:
- # sum of all cpu limits quotas for a namespace in cores
- - function: sum
- query: 'sum by (namespace) (kube_resourcequota{namespace="$NAMESPACE$", resource="limits.cpu", type="hard"})'
-
-
- # Namespace quota for memory requests
- # Show namespace quota for memory requests in bytes for a namespace
- - name: namespaceMemoryRequest
- datasource: prometheus
- value_type: "double"
- kubernetes_object: "namespace"
- aggregation_functions:
- # sum of all memory requests quotas for a namespace in bytes
- - function: sum
- query: 'sum by (namespace) (kube_resourcequota{namespace="$NAMESPACE$", resource="requests.memory", type="hard"})'
-
-
- # Namespace quota for memory limits
- # Show namespace quota for memory limits in bytes for a namespace
- - name: namespaceMemoryLimit
- datasource: prometheus
- value_type: "double"
- kubernetes_object: "namespace"
- aggregation_functions:
- # sum of all memory limits quotas for a namespace in bytes
- - function: sum
- query: 'sum by (namespace) (kube_resourcequota{namespace="$NAMESPACE$", resource="limits.memory", type="hard"})'
-
-
- # Namespace CPU usage
- # Show cpu usages in cores for a namespace
- - name: namespaceCpuUsage
- datasource: prometheus
- value_type: "double"
- kubernetes_object: "namespace"
- aggregation_functions:
- # average cpu usages in cores for a namespace
- - function: avg
- query: 'avg_over_time(sum by(namespace) (node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""})[$MEASUREMENT_DURATION_IN_MIN$m:])'
+ ## namespace related queries
- # maximum cpu usages in cores for a namespace
- - function: max
- query: 'max_over_time(sum by(namespace) (node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""})[$MEASUREMENT_DURATION_IN_MIN$m:])'
+ # Namespace quota for CPU requests
+ # Show namespace quota for CPU requests in cores for a namespace
+ - name: namespaceCpuRequest
+ datasource: prometheus
+ value_type: "double"
+ kubernetes_object: "namespace"
+ aggregation_functions:
+ # sum of all cpu request quotas for a namespace in cores
+ - function: sum
+ query: 'sum by (namespace) (kube_resourcequota{namespace="$NAMESPACE$", resource="requests.cpu", type="hard"})'
- # minimum cpu usages in cores for a namespace
- - function: min
- query: 'min_over_time(sum by(namespace) (node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""})[$MEASUREMENT_DURATION_IN_MIN$m:])'
+ # Namespace quota for CPU limits
+ # Show namespace quota for CPU limits in cores for a namespace
+ - name: namespaceCpuLimit
+ datasource: prometheus
+ value_type: "double"
+ kubernetes_object: "namespace"
+ aggregation_functions:
+ # sum of all cpu limits quotas for a namespace in cores
+ - function: sum
+ query: 'sum by (namespace) (kube_resourcequota{namespace="$NAMESPACE$", resource="limits.cpu", type="hard"})'
- # Namespace CPU Throttle
- # Show cpu throttle in cores for a namespace
- - name: namespaceCpuThrottle
- datasource: prometheus
- value_type: "double"
- kubernetes_object: "namespace"
- aggregation_functions:
- # average cpu throttle in cores for a namespace
- - function: avg
- query: 'avg_over_time(sum by(namespace) (rate(container_cpu_cfs_throttled_seconds_total{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""}[5m]))[$MEASUREMENT_DURATION_IN_MIN$m:])'
+ # Namespace quota for memory requests
+ # Show namespace quota for memory requests in bytes for a namespace
+ - name: namespaceMemoryRequest
+ datasource: prometheus
+ value_type: "double"
+ kubernetes_object: "namespace"
+ aggregation_functions:
+ # sum of all memory requests quotas for a namespace in bytes
+ - function: sum
+ query: 'sum by (namespace) (kube_resourcequota{namespace="$NAMESPACE$", resource="requests.memory", type="hard"})'
- # maximum cpu throttle in cores for a namespace
- - function: max
- query: 'max_over_time(sum by(namespace) (rate(container_cpu_cfs_throttled_seconds_total{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""}[5m]))[$MEASUREMENT_DURATION_IN_MIN$m:])'
- # minimum cpu throttle in cores for a namespace
- - function: min
- query: 'min_over_time(sum by(namespace) (rate(container_cpu_cfs_throttled_seconds_total{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""}[5m]))[$MEASUREMENT_DURATION_IN_MIN$m:])'
+ # Namespace quota for memory limits
+ # Show namespace quota for memory limits in bytes for a namespace
+ - name: namespaceMemoryLimit
+ datasource: prometheus
+ value_type: "double"
+ kubernetes_object: "namespace"
+ aggregation_functions:
+ # sum of all memory limits quotas for a namespace in bytes
+ - function: sum
+ query: 'sum by (namespace) (kube_resourcequota{namespace="$NAMESPACE$", resource="limits.memory", type="hard"})'
- # Namespace memory usage
- # Show memory usages in bytes for a namespace
- - name: namespaceMemoryUsage
- datasource: prometheus
- value_type: "double"
- kubernetes_object: "namespace"
- aggregation_functions:
- # average memory usage in bytes for a namespace
- - function: avg
- query: 'avg_over_time(sum by(namespace) (container_memory_working_set_bytes{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""})[$MEASUREMENT_DURATION_IN_MIN$m:])'
+ # Namespace CPU usage
+ # Show cpu usages in cores for a namespace
+ - name: namespaceCpuUsage
+ datasource: prometheus
+ value_type: "double"
+ kubernetes_object: "namespace"
+ aggregation_functions:
+ # average cpu usages in cores for a namespace
+ - function: avg
+ query: 'avg_over_time(sum by(namespace) (node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""})[$MEASUREMENT_DURATION_IN_MIN$m:])'
- # maximum memory usage in bytes for a namespace
- - function: max
- query: 'max_over_time(sum by(namespace) (container_memory_working_set_bytes{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""})[$MEASUREMENT_DURATION_IN_MIN$m:])'
+ # maximum cpu usages in cores for a namespace
+ - function: max
+ query: 'max_over_time(sum by(namespace) (node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""})[$MEASUREMENT_DURATION_IN_MIN$m:])'
+
+ # minimum cpu usages in cores for a namespace
+ - function: min
+ query: 'min_over_time(sum by(namespace) (node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""})[$MEASUREMENT_DURATION_IN_MIN$m:])'
+
+
+ # Namespace CPU Throttle
+ # Show cpu throttle in cores for a namespace
+ - name: namespaceCpuThrottle
+ datasource: prometheus
+ value_type: "double"
+ kubernetes_object: "namespace"
+ aggregation_functions:
+ # average cpu throttle in cores for a namespace
+ - function: avg
+ query: 'avg_over_time(sum by(namespace) (rate(container_cpu_cfs_throttled_seconds_total{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""}[5m]))[$MEASUREMENT_DURATION_IN_MIN$m:])'
+
+ # maximum cpu throttle in cores for a namespace
+ - function: max
+ query: 'max_over_time(sum by(namespace) (rate(container_cpu_cfs_throttled_seconds_total{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""}[5m]))[$MEASUREMENT_DURATION_IN_MIN$m:])'
+
+ # minimum cpu throttle in cores for a namespace
+ - function: min
+ query: 'min_over_time(sum by(namespace) (rate(container_cpu_cfs_throttled_seconds_total{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""}[5m]))[$MEASUREMENT_DURATION_IN_MIN$m:])'
- # minimum memory usage in bytes for a namespace
- - function: min
- query: 'min_over_time(sum by(namespace) (container_memory_working_set_bytes{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""})[$MEASUREMENT_DURATION_IN_MIN$m:])'
+ # Namespace memory usage
+ # Show memory usages in bytes for a namespace
+ - name: namespaceMemoryUsage
+ datasource: prometheus
+ value_type: "double"
+ kubernetes_object: "namespace"
+ aggregation_functions:
+ # average memory usage in bytes for a namespace
+ - function: avg
+ query: 'avg_over_time(sum by(namespace) (container_memory_working_set_bytes{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""})[$MEASUREMENT_DURATION_IN_MIN$m:])'
+
+ # maximum memory usage in bytes for a namespace
+ - function: max
+ query: 'max_over_time(sum by(namespace) (container_memory_working_set_bytes{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""})[$MEASUREMENT_DURATION_IN_MIN$m:])'
- # Namespace memory rss value
- # Show memory rss in bytes for a namespace
- - name: namespaceMemoryRSS
- datasource: prometheus
- value_type: "double"
- kubernetes_object: "namespace"
- aggregation_functions:
- # average memory rss in bytes for a namespace
+ # minimum memory usage in bytes for a namespace
+ - function: min
+ query: 'min_over_time(sum by(namespace) (container_memory_working_set_bytes{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""})[$MEASUREMENT_DURATION_IN_MIN$m:])'
+
+
+ # Namespace memory rss value
+ # Show memory rss in bytes for a namespace
+ - name: namespaceMemoryRSS
+ datasource: prometheus
+ value_type: "double"
+ kubernetes_object: "namespace"
+ aggregation_functions:
+ # average memory rss in bytes for a namespace
+ - function: avg
+ query: 'avg_over_time(sum by(namespace) (container_memory_rss{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""})[$MEASUREMENT_DURATION_IN_MIN$m:])'
+
+ # maximum memory rss in bytes for a namespace
+ - function: max
+ query: 'max_over_time(sum by(namespace) (container_memory_rss{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""})[$MEASUREMENT_DURATION_IN_MIN$m:])'
+
+ # minimum memory rss in bytes for a namespace
+ - function: min
+ query: 'min_over_time(sum by(namespace) (container_memory_rss{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""})[$MEASUREMENT_DURATION_IN_MIN$m:])'
+
+
+ # Show total pods in a namespace
+ - name: namespaceTotalPods
+ datasource: prometheus
+ value_type: "double"
+ kubernetes_object: "namespace"
+ aggregation_functions:
+ # maximum total pods in a namespace
+ - function: max
+ query: 'max_over_time(sum by(namespace) ((kube_pod_info{namespace="$NAMESPACE$"}))[$MEASUREMENT_DURATION_IN_MIN$m:])'
+ # average total pods in a namespace
+ - function: avg
+ query: 'avg_over_time(sum by(namespace) ((kube_pod_info{namespace="$NAMESPACE$"}))[$MEASUREMENT_DURATION_IN_MIN$m:])'
+
+
+ # Show total running pods in a namespace
+ - name: namespaceRunningPods
+ datasource: prometheus
+ value_type: "double"
+ kubernetes_object: "namespace"
+ aggregation_functions:
+ # maximum total pods in a namespace
+ - function: max
+ query: 'max_over_time(sum by(namespace) ((kube_pod_status_phase{phase="Running"}))[$MEASUREMENT_DURATION_IN_MIN$m:])'
+ # average total pods in a namespace
+ - function: avg
+ query: 'avg_over_time(sum by(namespace) ((kube_pod_status_phase{phase="Running"}))[$MEASUREMENT_DURATION_IN_MIN$m:])'
+
+ # Show last activity for a namespace
+ - name: namespaceMaxDate
+ datasource: prometheus
+ value_type: "double"
+ kubernetes_object: "namespace"
+ aggregation_functions:
+ - function: max
+ query: 'max(last_over_time(timestamp((sum by (namespace) (container_cpu_usage_seconds_total{namespace="$NAMESPACE$"})) > 0 )[15d:]))'
+
+ # GPU Related metrics
+
+ # GPU Core Usage
+ - name: gpuCoreUsage
+ datasource: prometheus
+ value_type: "double"
+ kubernetes_object: "container"
+
+ aggregation_functions:
+ # Average GPU Core Usage Percentage per container in a deployment
- function: avg
- query: 'avg_over_time(sum by(namespace) (container_memory_rss{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""})[$MEASUREMENT_DURATION_IN_MIN$m:])'
+            query: 'avg by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (avg_over_time(DCGM_FI_DEV_GPU_UTIL{exported_namespace="$NAMESPACE$",exported_container="$CONTAINER_NAME$"}[$MEASUREMENT_DURATION_IN_MIN$m]))'
- # maximum memory rss in bytes for a namespace
+ # Maximum GPU Core Usage Percentage per container in a deployment
- function: max
- query: 'max_over_time(sum by(namespace) (container_memory_rss{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""})[$MEASUREMENT_DURATION_IN_MIN$m:])'
+            query: 'max by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (max_over_time(DCGM_FI_DEV_GPU_UTIL{exported_namespace="$NAMESPACE$",exported_container="$CONTAINER_NAME$"}[$MEASUREMENT_DURATION_IN_MIN$m]))'
- # minimum memory rss in bytes for a namespace
+ # Minimum of GPU Core Usage Percentage for a container in a deployment
- function: min
- query: 'min_over_time(sum by(namespace) (container_memory_rss{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""})[$MEASUREMENT_DURATION_IN_MIN$m:])'
+            query: 'min by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (min_over_time(DCGM_FI_DEV_GPU_UTIL{exported_namespace="$NAMESPACE$",exported_container="$CONTAINER_NAME$"}[$MEASUREMENT_DURATION_IN_MIN$m]))'
+ # GPU Memory usage
+ - name: gpuMemoryUsage
+ datasource: prometheus
+ value_type: "double"
+ kubernetes_object: "container"
- # Show total pods in a namespace
- - name: namespaceTotalPods
- datasource: prometheus
- value_type: "double"
- kubernetes_object: "namespace"
- aggregation_functions:
- # maximum total pods in a namespace
- - function: max
- query: 'max_over_time(sum by(namespace) ((kube_pod_info{namespace="$NAMESPACE$"}))[$MEASUREMENT_DURATION_IN_MIN$m:])'
- # average total pods in a namespace
+ aggregation_functions:
+ # Average GPU Memory Usage Percentage per container in a deployment
- function: avg
- query: 'avg_over_time(sum by(namespace) ((kube_pod_info{namespace="$NAMESPACE$"}))[$MEASUREMENT_DURATION_IN_MIN$m:])'
+            query: 'avg by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (avg_over_time(DCGM_FI_DEV_MEM_COPY_UTIL{exported_namespace="$NAMESPACE$",exported_container="$CONTAINER_NAME$"}[$MEASUREMENT_DURATION_IN_MIN$m]))'
-
- # Show total running pods in a namespace
- - name: namespaceRunningPods
- datasource: prometheus
- value_type: "double"
- kubernetes_object: "namespace"
- aggregation_functions:
- # maximum total pods in a namespace
- - function: max
- query: 'max_over_time(sum by(namespace) ((kube_pod_status_phase{phase="Running"}))[$MEASUREMENT_DURATION_IN_MIN$m:])'
- # average total pods in a namespace
- - function: avg
- query: 'avg_over_time(sum by(namespace) ((kube_pod_status_phase{phase="Running"}))[$MEASUREMENT_DURATION_IN_MIN$m:])'
-
- # Show last activity for a namespace
- - name: namespaceMaxDate
- datasource: prometheus
- value_type: "double"
- kubernetes_object: "namespace"
- aggregation_functions:
+ # Maximum GPU Memory Usage Percentage per container in a deployment
- function: max
- query: 'max(last_over_time(timestamp((sum by (namespace) (container_cpu_usage_seconds_total{namespace="$NAMESPACE$"})) > 0 )[15d:]))'
+            query: 'max by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (max_over_time(DCGM_FI_DEV_MEM_COPY_UTIL{exported_namespace="$NAMESPACE$",exported_container="$CONTAINER_NAME$"}[$MEASUREMENT_DURATION_IN_MIN$m]))'
+
+ # Minimum of GPU Memory Usage Percentage for a container in a deployment
+ - function: min
+            query: 'min by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (min_over_time(DCGM_FI_DEV_MEM_COPY_UTIL{exported_namespace="$NAMESPACE$",exported_container="$CONTAINER_NAME$"}[$MEASUREMENT_DURATION_IN_MIN$m]))'
\ No newline at end of file
diff --git a/manifests/autotune/performance-profiles/resource_optimization_local_monitoring_norecordingrules.json b/manifests/autotune/performance-profiles/resource_optimization_local_monitoring_norecordingrules.json
index eeef1a07e..4f4d261ae 100644
--- a/manifests/autotune/performance-profiles/resource_optimization_local_monitoring_norecordingrules.json
+++ b/manifests/autotune/performance-profiles/resource_optimization_local_monitoring_norecordingrules.json
@@ -389,6 +389,46 @@
"query": "max(last_over_time(timestamp((sum by (namespace) (container_cpu_usage_seconds_total{namespace=\"$NAMESPACE$\"})) > 0 )[15d:]))"
}
]
+ },
+ {
+ "name": "gpuCoreUsage",
+ "datasource": "prometheus",
+ "value_type": "double",
+ "kubernetes_object": "container",
+ "aggregation_functions": [
+ {
+ "function": "avg",
+ "query": "avg by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (avg_over_time(DCGM_FI_DEV_GPU_UTIL{exported_namespace=\"$NAMESPACE$\",exported_container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
+ },
+ {
+ "function": "max",
+ "query": "max by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (max_over_time(DCGM_FI_DEV_GPU_UTIL{exported_namespace=\"$NAMESPACE$\",exported_container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
+ },
+ {
+ "function": "min",
+ "query": "min by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (min_over_time(DCGM_FI_DEV_GPU_UTIL{exported_namespace=\"$NAMESPACE$\",exported_container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
+ }
+ ]
+ },
+ {
+ "name": "gpuMemoryUsage",
+ "datasource": "prometheus",
+ "value_type": "double",
+ "kubernetes_object": "container",
+ "aggregation_functions": [
+ {
+ "function": "avg",
+ "query": "avg by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (avg_over_time(DCGM_FI_DEV_MEM_COPY_UTIL{exported_namespace=\"$NAMESPACE$\",exported_container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
+ },
+ {
+ "function": "max",
+ "query": "max by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (max_over_time(DCGM_FI_DEV_MEM_COPY_UTIL{exported_namespace=\"$NAMESPACE$\",exported_container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
+ },
+ {
+ "function": "min",
+ "query": "min by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (min_over_time(DCGM_FI_DEV_MEM_COPY_UTIL{exported_namespace=\"$NAMESPACE$\",exported_container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
+ }
+ ]
}
]
}
diff --git a/manifests/autotune/performance-profiles/resource_optimization_local_monitoring_norecordingrules.yaml b/manifests/autotune/performance-profiles/resource_optimization_local_monitoring_norecordingrules.yaml
index 8a85c70e7..d50d42df1 100644
--- a/manifests/autotune/performance-profiles/resource_optimization_local_monitoring_norecordingrules.yaml
+++ b/manifests/autotune/performance-profiles/resource_optimization_local_monitoring_norecordingrules.yaml
@@ -210,168 +210,209 @@ slo:
- function: max
query: 'max by(namespace,container) (last_over_time((timestamp(container_cpu_usage_seconds_total{namespace="$NAMESPACE$", container="$CONTAINER_NAME$"} > 0))[15d:]))'
- ## namespace related queries
-
- # Namespace quota for CPU requests
- # Show namespace quota for CPU requests in cores for a namespace
- - name: namespaceCpuRequest
- datasource: prometheus
- value_type: "double"
- kubernetes_object: "namespace"
- aggregation_functions:
- # sum of all cpu request quotas for a namespace in cores
- - function: sum
- query: 'sum by (namespace) (kube_resourcequota{namespace="$NAMESPACE$", resource="requests.cpu", type="hard"})'
-
- # Namespace quota for CPU limits
- # Show namespace quota for CPU limits in cores for a namespace
- - name: namespaceCpuLimit
- datasource: prometheus
- value_type: "double"
- kubernetes_object: "namespace"
- aggregation_functions:
- # sum of all cpu limits quotas for a namespace in cores
- - function: sum
- query: 'sum by (namespace) (kube_resourcequota{namespace="$NAMESPACE$", resource="limits.cpu", type="hard"})'
-
-
- # Namespace quota for memory requests
- # Show namespace quota for memory requests in bytes for a namespace
- - name: namespaceMemoryRequest
- datasource: prometheus
- value_type: "double"
- kubernetes_object: "namespace"
- aggregation_functions:
- # sum of all memory requests quotas for a namespace in bytes
- - function: sum
- query: 'sum by (namespace) (kube_resourcequota{namespace="$NAMESPACE$", resource="requests.memory", type="hard"})'
-
-
- # Namespace quota for memory limits
- # Show namespace quota for memory limits in bytes for a namespace
- - name: namespaceMemoryLimit
- datasource: prometheus
- value_type: "double"
- kubernetes_object: "namespace"
- aggregation_functions:
- # sum of all memory limits quotas for a namespace in bytes
- - function: sum
- query: 'sum by (namespace) (kube_resourcequota{namespace="$NAMESPACE$", resource="limits.memory", type="hard"})'
-
-
- # Namespace CPU usage
- # Show cpu usages in cores for a namespace
- - name: namespaceCpuUsage
- datasource: prometheus
- value_type: "double"
- kubernetes_object: "namespace"
- aggregation_functions:
- # average cpu usages in cores for a namespace
- - function: avg
- query: 'avg_over_time(sum by(namespace) (rate(container_cpu_usage_seconds_total{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""}[5m]) )[$MEASUREMENT_DURATION_IN_MIN$m:])'
+ ## namespace related queries
- # maximum cpu usages in cores for a namespace
- - function: max
- query: 'max_over_time(sum by(namespace) (rate(container_cpu_usage_seconds_total{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""}[5m]) )[$MEASUREMENT_DURATION_IN_MIN$m:])'
+ # Namespace quota for CPU requests
+ # Show namespace quota for CPU requests in cores for a namespace
+ - name: namespaceCpuRequest
+ datasource: prometheus
+ value_type: "double"
+ kubernetes_object: "namespace"
+ aggregation_functions:
+ # sum of all cpu request quotas for a namespace in cores
+ - function: sum
+ query: 'sum by (namespace) (kube_resourcequota{namespace="$NAMESPACE$", resource="requests.cpu", type="hard"})'
- # minimum cpu usages in cores for a namespace
- - function: min
- query: 'min_over_time(sum by(namespace) (rate(container_cpu_usage_seconds_total{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""}[5m]) )[$MEASUREMENT_DURATION_IN_MIN$m:])'
+ # Namespace quota for CPU limits
+ # Show namespace quota for CPU limits in cores for a namespace
+ - name: namespaceCpuLimit
+ datasource: prometheus
+ value_type: "double"
+ kubernetes_object: "namespace"
+ aggregation_functions:
+ # sum of all cpu limits quotas for a namespace in cores
+ - function: sum
+ query: 'sum by (namespace) (kube_resourcequota{namespace="$NAMESPACE$", resource="limits.cpu", type="hard"})'
- # Namespace CPU Throttle
- # Show cpu throttle in cores for a namespace
- - name: namespaceCpuThrottle
- datasource: prometheus
- value_type: "double"
- kubernetes_object: "namespace"
- aggregation_functions:
- # average cpu throttle in cores for a namespace
- - function: avg
- query: 'avg_over_time(sum by(namespace) (rate(container_cpu_cfs_throttled_seconds_total{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""}[5m]))[$MEASUREMENT_DURATION_IN_MIN$m:])'
+ # Namespace quota for memory requests
+ # Show namespace quota for memory requests in bytes for a namespace
+ - name: namespaceMemoryRequest
+ datasource: prometheus
+ value_type: "double"
+ kubernetes_object: "namespace"
+ aggregation_functions:
+ # sum of all memory requests quotas for a namespace in bytes
+ - function: sum
+ query: 'sum by (namespace) (kube_resourcequota{namespace="$NAMESPACE$", resource="requests.memory", type="hard"})'
- # maximum cpu throttle in cores for a namespace
- - function: max
- query: 'max_over_time(sum by(namespace) (rate(container_cpu_cfs_throttled_seconds_total{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""}[5m]))[$MEASUREMENT_DURATION_IN_MIN$m:])'
- # minimum cpu throttle in cores for a namespace
- - function: min
- query: 'min_over_time(sum by(namespace) (rate(container_cpu_cfs_throttled_seconds_total{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""}[5m]))[$MEASUREMENT_DURATION_IN_MIN$m:])'
+ # Namespace quota for memory limits
+ # Show namespace quota for memory limits in bytes for a namespace
+ - name: namespaceMemoryLimit
+ datasource: prometheus
+ value_type: "double"
+ kubernetes_object: "namespace"
+ aggregation_functions:
+ # sum of all memory limits quotas for a namespace in bytes
+ - function: sum
+ query: 'sum by (namespace) (kube_resourcequota{namespace="$NAMESPACE$", resource="limits.memory", type="hard"})'
- # Namespace memory usage
- # Show memory usages in bytes for a namespace
- - name: namespaceMemoryUsage
- datasource: prometheus
- value_type: "double"
- kubernetes_object: "namespace"
- aggregation_functions:
- # average memory usage in bytes for a namespace
- - function: avg
- query: 'avg_over_time(sum by(namespace) (container_memory_working_set_bytes{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""})[$MEASUREMENT_DURATION_IN_MIN$m:])'
+ # Namespace CPU usage
+ # Show cpu usages in cores for a namespace
+ - name: namespaceCpuUsage
+ datasource: prometheus
+ value_type: "double"
+ kubernetes_object: "namespace"
+ aggregation_functions:
+ # average cpu usages in cores for a namespace
+ - function: avg
+ query: 'avg_over_time(sum by(namespace) (rate(container_cpu_usage_seconds_total{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""}[5m]) )[$MEASUREMENT_DURATION_IN_MIN$m:])'
- # maximum memory usage in bytes for a namespace
- - function: max
- query: 'max_over_time(sum by(namespace) (container_memory_working_set_bytes{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""})[$MEASUREMENT_DURATION_IN_MIN$m:])'
+ # maximum cpu usages in cores for a namespace
+ - function: max
+ query: 'max_over_time(sum by(namespace) (rate(container_cpu_usage_seconds_total{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""}[5m]) )[$MEASUREMENT_DURATION_IN_MIN$m:])'
+
+ # minimum cpu usages in cores for a namespace
+ - function: min
+ query: 'min_over_time(sum by(namespace) (rate(container_cpu_usage_seconds_total{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""}[5m]) )[$MEASUREMENT_DURATION_IN_MIN$m:])'
+
+
+ # Namespace CPU Throttle
+ # Show cpu throttle in cores for a namespace
+ - name: namespaceCpuThrottle
+ datasource: prometheus
+ value_type: "double"
+ kubernetes_object: "namespace"
+ aggregation_functions:
+ # average cpu throttle in cores for a namespace
+ - function: avg
+ query: 'avg_over_time(sum by(namespace) (rate(container_cpu_cfs_throttled_seconds_total{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""}[5m]))[$MEASUREMENT_DURATION_IN_MIN$m:])'
+
+ # maximum cpu throttle in cores for a namespace
+ - function: max
+ query: 'max_over_time(sum by(namespace) (rate(container_cpu_cfs_throttled_seconds_total{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""}[5m]))[$MEASUREMENT_DURATION_IN_MIN$m:])'
+
+ # minimum cpu throttle in cores for a namespace
+ - function: min
+ query: 'min_over_time(sum by(namespace) (rate(container_cpu_cfs_throttled_seconds_total{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""}[5m]))[$MEASUREMENT_DURATION_IN_MIN$m:])'
+
+
+ # Namespace memory usage
+ # Show memory usages in bytes for a namespace
+ - name: namespaceMemoryUsage
+ datasource: prometheus
+ value_type: "double"
+ kubernetes_object: "namespace"
+ aggregation_functions:
+ # average memory usage in bytes for a namespace
+ - function: avg
+ query: 'avg_over_time(sum by(namespace) (container_memory_working_set_bytes{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""})[$MEASUREMENT_DURATION_IN_MIN$m:])'
+
+ # maximum memory usage in bytes for a namespace
+ - function: max
+ query: 'max_over_time(sum by(namespace) (container_memory_working_set_bytes{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""})[$MEASUREMENT_DURATION_IN_MIN$m:])'
+
+ # minimum memory usage in bytes for a namespace
+ - function: min
+ query: 'min_over_time(sum by(namespace) (container_memory_working_set_bytes{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""})[$MEASUREMENT_DURATION_IN_MIN$m:])'
+
+
+ # Namespace memory rss value
+ # Show memory rss in bytes for a namespace
+ - name: namespaceMemoryRSS
+ datasource: prometheus
+ value_type: "double"
+ kubernetes_object: "namespace"
+ aggregation_functions:
+ # average memory rss in bytes for a namespace
+ - function: avg
+ query: 'avg_over_time(sum by(namespace) (container_memory_rss{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""})[$MEASUREMENT_DURATION_IN_MIN$m:])'
+
+ # maximum memory rss in bytes for a namespace
+ - function: max
+ query: 'max_over_time(sum by(namespace) (container_memory_rss{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""})[$MEASUREMENT_DURATION_IN_MIN$m:])'
+
+ # minimum memory rss in bytes for a namespace
+ - function: min
+ query: 'min_over_time(sum by(namespace) (container_memory_rss{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""})[$MEASUREMENT_DURATION_IN_MIN$m:])'
+
+
+ # Show total pods in a namespace
+ - name: namespaceTotalPods
+ datasource: prometheus
+ value_type: "double"
+ kubernetes_object: "namespace"
+ aggregation_functions:
+ # maximum total pods in a namespace
+ - function: max
+ query: 'max_over_time(sum by(namespace) ((kube_pod_info{namespace="$NAMESPACE$"}))[$MEASUREMENT_DURATION_IN_MIN$m:])'
+ # average total pods in a namespace
+ - function: avg
+ query: 'avg_over_time(sum by(namespace) ((kube_pod_info{namespace="$NAMESPACE$"}))[$MEASUREMENT_DURATION_IN_MIN$m:])'
+
+
+ # Show total running pods in a namespace
+ - name: namespaceRunningPods
+ datasource: prometheus
+ value_type: "double"
+ kubernetes_object: "namespace"
+ aggregation_functions:
+ # maximum total pods in a namespace
+ - function: max
+ query: 'max_over_time(sum by(namespace) ((kube_pod_status_phase{phase="Running"}))[$MEASUREMENT_DURATION_IN_MIN$m:])'
+ # average total pods in a namespace
+ - function: avg
+ query: 'avg_over_time(sum by(namespace) ((kube_pod_status_phase{phase="Running"}))[$MEASUREMENT_DURATION_IN_MIN$m:])'
+
+ # Show last activity for a namespace
+ - name: namespaceMaxDate
+ datasource: prometheus
+ value_type: "double"
+ kubernetes_object: "namespace"
+ aggregation_functions:
+ - function: max
+ query: 'max(last_over_time(timestamp((sum by (namespace) (container_cpu_usage_seconds_total{namespace="$NAMESPACE$"})) > 0 )[15d:]))'
- # minimum memory usage in bytes for a namespace
- - function: min
- query: 'min_over_time(sum by(namespace) (container_memory_working_set_bytes{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""})[$MEASUREMENT_DURATION_IN_MIN$m:])'
+ # GPU Related metrics
+
+ # GPU Core Usage
+ - name: gpuCoreUsage
+ datasource: prometheus
+ value_type: "double"
+ kubernetes_object: "container"
- # Namespace memory rss value
- # Show memory rss in bytes for a namespace
- - name: namespaceMemoryRSS
- datasource: prometheus
- value_type: "double"
- kubernetes_object: "namespace"
- aggregation_functions:
- # average memory rss in bytes for a namespace
+ aggregation_functions:
+ # Average GPU Core Usage Percentage per container in a deployment
- function: avg
- query: 'avg_over_time(sum by(namespace) (container_memory_rss{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""})[$MEASUREMENT_DURATION_IN_MIN$m:])'
+            query: 'avg by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (avg_over_time(DCGM_FI_DEV_GPU_UTIL{exported_namespace="$NAMESPACE$",exported_container="$CONTAINER_NAME$"}[$MEASUREMENT_DURATION_IN_MIN$m]))'
- # maximum memory rss in bytes for a namespace
+ # Maximum GPU Core Usage Percentage per container in a deployment
- function: max
- query: 'max_over_time(sum by(namespace) (container_memory_rss{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""})[$MEASUREMENT_DURATION_IN_MIN$m:])'
+            query: 'max by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (max_over_time(DCGM_FI_DEV_GPU_UTIL{exported_namespace="$NAMESPACE$",exported_container="$CONTAINER_NAME$"}[$MEASUREMENT_DURATION_IN_MIN$m]))'
- # minimum memory rss in bytes for a namespace
+ # Minimum of GPU Core Usage Percentage for a container in a deployment
- function: min
- query: 'min_over_time(sum by(namespace) (container_memory_rss{namespace="$NAMESPACE$", container!="", container!="POD", pod!=""})[$MEASUREMENT_DURATION_IN_MIN$m:])'
+            query: 'min by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (min_over_time(DCGM_FI_DEV_GPU_UTIL{exported_namespace="$NAMESPACE$",exported_container="$CONTAINER_NAME$"}[$MEASUREMENT_DURATION_IN_MIN$m]))'
+ # GPU Memory usage
+ - name: gpuMemoryUsage
+ datasource: prometheus
+ value_type: "double"
+ kubernetes_object: "container"
- # Show total pods in a namespace
- - name: namespaceTotalPods
- datasource: prometheus
- value_type: "double"
- kubernetes_object: "namespace"
- aggregation_functions:
- # maximum total pods in a namespace
- - function: max
- query: 'max_over_time(sum by(namespace) ((kube_pod_info{namespace="$NAMESPACE$"}))[$MEASUREMENT_DURATION_IN_MIN$m:])'
- # average total pods in a namespace
+ aggregation_functions:
+ # Average GPU Memory Usage Percentage per container in a deployment
- function: avg
- query: 'avg_over_time(sum by(namespace) ((kube_pod_info{namespace="$NAMESPACE$"}))[$MEASUREMENT_DURATION_IN_MIN$m:])'
-
+            query: 'avg by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (avg_over_time(DCGM_FI_DEV_MEM_COPY_UTIL{exported_namespace="$NAMESPACE$",exported_container="$CONTAINER_NAME$"}[$MEASUREMENT_DURATION_IN_MIN$m]))'
- # Show total running pods in a namespace
- - name: namespaceRunningPods
- datasource: prometheus
- value_type: "double"
- kubernetes_object: "namespace"
- aggregation_functions:
- # maximum total pods in a namespace
- - function: max
- query: 'max_over_time(sum by(namespace) ((kube_pod_status_phase{phase="Running"}))[$MEASUREMENT_DURATION_IN_MIN$m:])'
- # average total pods in a namespace
- - function: avg
- query: 'avg_over_time(sum by(namespace) ((kube_pod_status_phase{phase="Running"}))[$MEASUREMENT_DURATION_IN_MIN$m:])'
-
- # Show last activity for a namespace
- - name: namespaceMaxDate
- datasource: prometheus
- value_type: "double"
- kubernetes_object: "namespace"
- aggregation_functions:
+ # Maximum GPU Memory Usage Percentage per container in a deployment
- function: max
- query: 'max(last_over_time(timestamp((sum by (namespace) (container_cpu_usage_seconds_total{namespace="$NAMESPACE$"})) > 0 )[15d:]))'
+            query: 'max by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (max_over_time(DCGM_FI_DEV_MEM_COPY_UTIL{exported_namespace="$NAMESPACE$",exported_container="$CONTAINER_NAME$"}[$MEASUREMENT_DURATION_IN_MIN$m]))'
+
+          # Minimum of GPU Memory Usage Percentage for a container in a deployment
+          - function: min
+            query: 'min by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (min_over_time(DCGM_FI_DEV_MEM_COPY_UTIL{exported_namespace="$NAMESPACE$",exported_container="$CONTAINER_NAME$"}[$MEASUREMENT_DURATION_IN_MIN$m]))'
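Note: the DCGM-based queries above are templates; `$NAMESPACE$`, `$CONTAINER_NAME$`, and `$MEASUREMENT_DURATION_IN_MIN$` are substituted before the query is sent to Prometheus, as the `promQL.replace(...)` calls in `RecommendationEngine` further down show. A minimal standalone sketch of that expansion (the helper class and the sample values are illustrative, not part of the patch):

```java
public final class PromQLTemplateSketch {

    // Mirrors the placeholder substitution RecommendationEngine performs on profile queries.
    static String expand(String promQL, String namespace, String containerName, int durationInMin) {
        return promQL
                .replace("$NAMESPACE$", namespace)
                .replace("$CONTAINER_NAME$", containerName)
                .replace("$MEASUREMENT_DURATION_IN_MIN$", String.valueOf(durationInMin));
    }

    public static void main(String[] args) {
        String template = "avg by (Hostname,device,modelName,UUID,exported_container,exported_namespace) "
                + "(avg_over_time(DCGM_FI_DEV_GPU_UTIL{exported_namespace=\"$NAMESPACE$\","
                + "exported_container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))";
        // "default" and "train-worker" are hypothetical namespace and container names.
        System.out.println(expand(template, "default", "train-worker", 15));
    }
}
```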
diff --git a/manifests/crc/BYODB-installation/minikube/kruize-crc-minikube.yaml b/manifests/crc/BYODB-installation/minikube/kruize-crc-minikube.yaml
index 34e122da5..b556f04f2 100644
--- a/manifests/crc/BYODB-installation/minikube/kruize-crc-minikube.yaml
+++ b/manifests/crc/BYODB-installation/minikube/kruize-crc-minikube.yaml
@@ -35,6 +35,7 @@ data:
"plots": "true",
"local": "false",
"logAllHttpReqAndResp": "true",
+ "recommendationsURL" : "http://kruize.monitoring.svc.cluster.local:8080/generateRecommendations?experiment_name=%s",
"hibernate": {
"dialect": "org.hibernate.dialect.PostgreSQLDialect",
"driver": "org.postgresql.Driver",
diff --git a/manifests/crc/BYODB-installation/openshift/kruize-crc-openshift.yaml b/manifests/crc/BYODB-installation/openshift/kruize-crc-openshift.yaml
index 2528656be..889528b1b 100644
--- a/manifests/crc/BYODB-installation/openshift/kruize-crc-openshift.yaml
+++ b/manifests/crc/BYODB-installation/openshift/kruize-crc-openshift.yaml
@@ -48,6 +48,7 @@ data:
"plots": "true",
"local": "false",
"logAllHttpReqAndResp": "true",
+ "recommendationsURL" : "http://kruize.openshift-tuning.svc.cluster.local:8080/generateRecommendations?experiment_name=%s",
"hibernate": {
"dialect": "org.hibernate.dialect.PostgreSQLDialect",
"driver": "org.postgresql.Driver",
diff --git a/manifests/crc/default-db-included-installation/aks/kruize-crc-aks.yaml b/manifests/crc/default-db-included-installation/aks/kruize-crc-aks.yaml
index 5d7231b80..7d2fd6766 100644
--- a/manifests/crc/default-db-included-installation/aks/kruize-crc-aks.yaml
+++ b/manifests/crc/default-db-included-installation/aks/kruize-crc-aks.yaml
@@ -99,6 +99,7 @@ data:
"plots": "true",
"local": "false",
"logAllHttpReqAndResp": "true",
+ "recommendationsURL" : "http://kruize.monitoring.svc.cluster.local:8080/generateRecommendations?experiment_name=%s",
"hibernate": {
"dialect": "org.hibernate.dialect.PostgreSQLDialect",
"driver": "org.postgresql.Driver",
diff --git a/manifests/crc/default-db-included-installation/minikube/kruize-crc-minikube.yaml b/manifests/crc/default-db-included-installation/minikube/kruize-crc-minikube.yaml
index e8fe59ea8..2366cc669 100644
--- a/manifests/crc/default-db-included-installation/minikube/kruize-crc-minikube.yaml
+++ b/manifests/crc/default-db-included-installation/minikube/kruize-crc-minikube.yaml
@@ -113,6 +113,7 @@ data:
"plots": "true",
"local": "false",
"logAllHttpReqAndResp": "true",
+ "recommendationsURL" : "http://kruize.monitoring.svc.cluster.local:8080/generateRecommendations?experiment_name=%s",
"hibernate": {
"dialect": "org.hibernate.dialect.PostgreSQLDialect",
"driver": "org.postgresql.Driver",
diff --git a/manifests/crc/default-db-included-installation/openshift/kruize-crc-openshift.yaml b/manifests/crc/default-db-included-installation/openshift/kruize-crc-openshift.yaml
index 1f73ae042..bb76b0e0b 100644
--- a/manifests/crc/default-db-included-installation/openshift/kruize-crc-openshift.yaml
+++ b/manifests/crc/default-db-included-installation/openshift/kruize-crc-openshift.yaml
@@ -107,6 +107,7 @@ data:
"plots": "true",
"local": "false",
"logAllHttpReqAndResp": "true",
+ "recommendationsURL" : "http://kruize.openshift-tuning.svc.cluster.local:8080/generateRecommendations?experiment_name=%s",
"hibernate": {
"dialect": "org.hibernate.dialect.PostgreSQLDialect",
"driver": "org.postgresql.Driver",
diff --git a/src/main/java/com/autotune/analyzer/Analyzer.java b/src/main/java/com/autotune/analyzer/Analyzer.java
index 9ebf49199..0c2cea55b 100644
--- a/src/main/java/com/autotune/analyzer/Analyzer.java
+++ b/src/main/java/com/autotune/analyzer/Analyzer.java
@@ -58,6 +58,7 @@ public static void addServlets(ServletContextHandler context) {
context.addServlet(MetricProfileService.class, ServerContext.DELETE_METRIC_PROFILE);
context.addServlet(ListDatasources.class, ServerContext.LIST_DATASOURCES);
context.addServlet(DSMetadataService.class, ServerContext.DATASOURCE_METADATA);
+ context.addServlet(BulkService.class, ServerContext.BULK_SERVICE);
// Adding UI support API's
context.addServlet(ListNamespaces.class, ServerContext.LIST_NAMESPACES);
diff --git a/src/main/java/com/autotune/analyzer/adapters/DeviceDetailsAdapter.java b/src/main/java/com/autotune/analyzer/adapters/DeviceDetailsAdapter.java
new file mode 100644
index 000000000..57ceaf735
--- /dev/null
+++ b/src/main/java/com/autotune/analyzer/adapters/DeviceDetailsAdapter.java
@@ -0,0 +1,84 @@
+package com.autotune.analyzer.adapters;
+
+import com.autotune.analyzer.utils.AnalyzerConstants;
+import com.autotune.common.data.system.info.device.DeviceDetails;
+import com.autotune.common.data.system.info.device.accelerator.AcceleratorDeviceData;
+import com.google.gson.TypeAdapter;
+import com.google.gson.stream.JsonReader;
+import com.google.gson.stream.JsonWriter;
+import java.io.IOException;
+
+
+/**
+ * This adapter tells GSON which concrete implementation of DeviceDetails to use
+ * when serializing or deserializing
+ */
+public class DeviceDetailsAdapter extends TypeAdapter<DeviceDetails> {
+
+ @Override
+ public void write(JsonWriter out, DeviceDetails value) throws IOException {
+ out.beginObject();
+ out.name("type").value(value.getType().name());
+
+ if (value instanceof AcceleratorDeviceData accelerator) {
+ out.name("manufacturer").value(accelerator.getManufacturer());
+ out.name("modelName").value(accelerator.getModelName());
+ out.name("hostName").value(accelerator.getHostName());
+ out.name("UUID").value(accelerator.getUUID());
+ out.name("deviceName").value(accelerator.getDeviceName());
+ out.name("isMIG").value(accelerator.isMIG());
+ }
+ // Add for other devices when added
+
+ out.endObject();
+ }
+
+ @Override
+ public DeviceDetails read(JsonReader in) throws IOException {
+ String type = null;
+ String manufacturer = null;
+ String modelName = null;
+ String hostName = null;
+ String UUID = null;
+ String deviceName = null;
+ boolean isMIG = false;
+
+ in.beginObject();
+ while (in.hasNext()) {
+ switch (in.nextName()) {
+ case "type":
+ type = in.nextString();
+ break;
+ case "manufacturer":
+ manufacturer = in.nextString();
+ break;
+ case "modelName":
+ modelName = in.nextString();
+ break;
+ case "hostName":
+ hostName = in.nextString();
+ break;
+ case "UUID":
+ UUID = in.nextString();
+ break;
+ case "deviceName":
+ deviceName = in.nextString();
+ break;
+ case "isMIG":
+ isMIG = in.nextBoolean();
+ break;
+ default:
+ in.skipValue();
+ }
+ }
+ in.endObject();
+
+ if (type != null && type.equals(AnalyzerConstants.DeviceType.ACCELERATOR.name())) {
+ return (DeviceDetails) new AcceleratorDeviceData(modelName, hostName, UUID, deviceName, isMIG);
+ }
+ // Add for other device types if implemented in future
+
+ return null;
+ }
+}
+
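Note: a minimal usage sketch for the adapter above; the registration mirrors the `registerTypeAdapter` calls made elsewhere in this patch, and the device values are made up:

```java
import com.autotune.analyzer.adapters.DeviceDetailsAdapter;
import com.autotune.common.data.system.info.device.DeviceDetails;
import com.autotune.common.data.system.info.device.accelerator.AcceleratorDeviceData;
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;

public class DeviceDetailsAdapterSketch {
    public static void main(String[] args) {
        Gson gson = new GsonBuilder()
                .registerTypeAdapter(DeviceDetails.class, new DeviceDetailsAdapter())
                .create();

        // Constructor argument order follows read() above: modelName, hostName, UUID, deviceName, isMIG.
        DeviceDetails device = new AcceleratorDeviceData(
                "NVIDIA A100-SXM4-80GB", "worker-0", "GPU-1234abcd", "nvidia0", false);

        String json = gson.toJson(device, DeviceDetails.class);        // write() emits the "type" field
        DeviceDetails back = gson.fromJson(json, DeviceDetails.class); // read() dispatches on "type"
        System.out.println(json + " -> " + back);
    }
}
```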
diff --git a/src/main/java/com/autotune/analyzer/adapters/RecommendationItemAdapter.java b/src/main/java/com/autotune/analyzer/adapters/RecommendationItemAdapter.java
new file mode 100644
index 000000000..79139fbc4
--- /dev/null
+++ b/src/main/java/com/autotune/analyzer/adapters/RecommendationItemAdapter.java
@@ -0,0 +1,43 @@
+
+package com.autotune.analyzer.adapters;
+
+import com.autotune.analyzer.utils.AnalyzerConstants;
+import com.google.gson.*;
+
+import java.lang.reflect.Type;
+
+/**
+ * Earlier the RecommendationItem enum had only two entries: cpu and memory.
+ * At the time of serialization (storing in the DB or returning as JSON via an API),
+ * Java handled the toString conversion and converted them to the strings "cpu" and "memory".
+ * They are also the keys in the recommendation (requests & limits).
+ *
+ * But in the case of NVIDIA, the resources have / and . in the string representation of the MIG name,
+ * so we cannot add them as enum entries as-is. Instead we created an entry which accepts a string,
+ * whose toString returns that string value.
+ *
+ * At the time of deserialization the string entries are converted back to enum entries, and vice versa during serialization.
+ * For example, if the entry is NVIDIA_GPU_PARTITION_1_CORE_5GB("nvidia.com/mig-1g.5gb") then its toString
+ * will be nvidia.com/mig-1g.5gb, which will not match the enum entry name NVIDIA_GPU_PARTITION_1_CORE_5GB.
+ *
+ * Also, to maintain consistency we changed cpu to CPU, so without this adapter
+ * the JSON would be generated with CPU as the key.
+ */
+public class RecommendationItemAdapter implements JsonSerializer<AnalyzerConstants.RecommendationItem>, JsonDeserializer<AnalyzerConstants.RecommendationItem> {
+ @Override
+ public JsonElement serialize(AnalyzerConstants.RecommendationItem recommendationItem, Type type, JsonSerializationContext jsonSerializationContext) {
+ return jsonSerializationContext.serialize(recommendationItem.toString());
+ }
+
+
+ @Override
+ public AnalyzerConstants.RecommendationItem deserialize(JsonElement jsonElement, Type type, JsonDeserializationContext jsonDeserializationContext) throws JsonParseException {
+ String value = jsonElement.getAsString();
+ for (AnalyzerConstants.RecommendationItem item : AnalyzerConstants.RecommendationItem.values()) {
+ if (item.toString().equals(value)) {
+ return item;
+ }
+ }
+ throw new JsonParseException("Unknown element " + value);
+ }
+}
\ No newline at end of file
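Note: a minimal sketch of this adapter in use, matching the `GsonBuilder` registration added to `KruizeErrorHandler` below; it assumes `RecommendationItem.CPU.toString()` returns "cpu", as the Javadoc above describes:

```java
import com.autotune.analyzer.adapters.RecommendationItemAdapter;
import com.autotune.analyzer.utils.AnalyzerConstants;
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;

public class RecommendationItemAdapterSketch {
    public static void main(String[] args) {
        Gson gson = new GsonBuilder()
                .enableComplexMapKeySerialization()
                .registerTypeAdapter(AnalyzerConstants.RecommendationItem.class, new RecommendationItemAdapter())
                .create();

        // Serializes via toString(), producing "cpu" rather than the enum name "CPU";
        // MIG-style strings such as "nvidia.com/mig-1g.5gb" round-trip back to their enum entries.
        String json = gson.toJson(AnalyzerConstants.RecommendationItem.CPU);
        AnalyzerConstants.RecommendationItem item = gson.fromJson(json, AnalyzerConstants.RecommendationItem.class);
        System.out.println(json + " -> " + item.name());
    }
}
```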
diff --git a/src/main/java/com/autotune/analyzer/exceptions/KruizeErrorHandler.java b/src/main/java/com/autotune/analyzer/exceptions/KruizeErrorHandler.java
index 0f7de32a8..1de629485 100644
--- a/src/main/java/com/autotune/analyzer/exceptions/KruizeErrorHandler.java
+++ b/src/main/java/com/autotune/analyzer/exceptions/KruizeErrorHandler.java
@@ -15,7 +15,9 @@
*******************************************************************************/
package com.autotune.analyzer.exceptions;
+import com.autotune.analyzer.adapters.RecommendationItemAdapter;
import com.autotune.analyzer.serviceObjects.FailedUpdateResultsAPIObject;
+import com.autotune.analyzer.utils.AnalyzerConstants;
import com.autotune.analyzer.utils.GsonUTCDateAdapter;
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
@@ -56,6 +58,7 @@ public void handle(String target, Request baseRequest, HttpServletRequest reques
.disableHtmlEscaping()
.enableComplexMapKeySerialization()
.registerTypeAdapter(Date.class, new GsonUTCDateAdapter())
+ .registerTypeAdapter(AnalyzerConstants.RecommendationItem.class, new RecommendationItemAdapter())
.create();
String gsonStr = gsonObj.toJson(new KruizeResponse(origMessage, errorCode, "", "ERROR", myList));
diff --git a/src/main/java/com/autotune/analyzer/kruizeObject/CreateExperimentConfigBean.java b/src/main/java/com/autotune/analyzer/kruizeObject/CreateExperimentConfigBean.java
new file mode 100644
index 000000000..5303441f6
--- /dev/null
+++ b/src/main/java/com/autotune/analyzer/kruizeObject/CreateExperimentConfigBean.java
@@ -0,0 +1,111 @@
+/*******************************************************************************
+ * Copyright (c) 2022, 2022 Red Hat, IBM Corporation and others.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *******************************************************************************/
+package com.autotune.analyzer.kruizeObject;
+
+/**
+ * This is a placeholder class for the bulk API createExperiment template to store defaults
+ */
+public class CreateExperimentConfigBean {
+
+ // Private fields
+ private String mode;
+ private String target;
+ private String version;
+ private String datasourceName;
+ private String performanceProfile;
+ private double threshold;
+ private String measurementDurationStr;
+ private int measurementDuration;
+
+ // Getters and Setters
+ public String getMode() {
+ return mode;
+ }
+
+ public void setMode(String mode) {
+ this.mode = mode;
+ }
+
+ public String getTarget() {
+ return target;
+ }
+
+ public void setTarget(String target) {
+ this.target = target;
+ }
+
+ public String getVersion() {
+ return version;
+ }
+
+ public void setVersion(String version) {
+ this.version = version;
+ }
+
+ public String getDatasourceName() {
+ return datasourceName;
+ }
+
+ public void setDatasourceName(String datasourceName) {
+ this.datasourceName = datasourceName;
+ }
+
+ public String getPerformanceProfile() {
+ return performanceProfile;
+ }
+
+ public void setPerformanceProfile(String performanceProfile) {
+ this.performanceProfile = performanceProfile;
+ }
+
+ public double getThreshold() {
+ return threshold;
+ }
+
+ public void setThreshold(double threshold) {
+ this.threshold = threshold;
+ }
+
+ public String getMeasurementDurationStr() {
+ return measurementDurationStr;
+ }
+
+ public void setMeasurementDurationStr(String measurementDurationStr) {
+ this.measurementDurationStr = measurementDurationStr;
+ }
+
+ public int getMeasurementDuration() {
+ return measurementDuration;
+ }
+
+ public void setMeasurementDuration(int measurementDuration) {
+ this.measurementDuration = measurementDuration;
+ }
+
+ @Override
+ public String toString() {
+        return "CreateExperimentConfigBean{" +
+ "mode='" + mode + '\'' +
+ ", target='" + target + '\'' +
+ ", version='" + version + '\'' +
+ ", datasourceName='" + datasourceName + '\'' +
+ ", performanceProfile='" + performanceProfile + '\'' +
+ ", threshold=" + threshold +
+ ", measurementDurationStr='" + measurementDurationStr + '\'' +
+ ", measurementDuration=" + measurementDuration +
+ '}';
+ }
+}
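Note: a sketch of how the bulk service might fill this bean with its createExperiment defaults; every value below is hypothetical, as the actual defaults are defined by the bulk service itself:

```java
import com.autotune.analyzer.kruizeObject.CreateExperimentConfigBean;

public class BulkDefaultsSketch {
    static CreateExperimentConfigBean defaultConfig() {
        CreateExperimentConfigBean config = new CreateExperimentConfigBean();
        config.setMode("monitor");                    // hypothetical default
        config.setTarget("local");                    // hypothetical default
        config.setVersion("v2.0");                    // hypothetical API version
        config.setDatasourceName("prometheus-1");     // hypothetical datasource
        config.setPerformanceProfile("resource-optimization-local-monitoring"); // assumed profile name
        config.setThreshold(0.1);                     // hypothetical threshold
        config.setMeasurementDurationStr("15min");
        config.setMeasurementDuration(15);
        return config;
    }
}
```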
diff --git a/src/main/java/com/autotune/analyzer/recommendations/RecommendationConstants.java b/src/main/java/com/autotune/analyzer/recommendations/RecommendationConstants.java
index 4cc4be488..d708331e9 100644
--- a/src/main/java/com/autotune/analyzer/recommendations/RecommendationConstants.java
+++ b/src/main/java/com/autotune/analyzer/recommendations/RecommendationConstants.java
@@ -738,6 +738,8 @@ public static class PercentileConstants {
public static final Integer TWENTYFIVE_PERCENTILE = 25;
public static final Integer SEVENTYFIVE_PERCENTILE = 75;
public static final Integer FIFTY_PERCENTILE = 50;
+ public static final Integer COST_ACCELERATOR_PERCENTILE = 60;
+ public static final Integer PERFORMANCE_ACCELERATOR_PERCENTILE = 98;
}
}
}
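Note: the two new constants suggest the cost model sizes accelerator requests at the 60th percentile of observed usage while the performance model uses the 98th. For intuition, a nearest-rank percentile sketch over raw samples (the interpolation the recommendation models actually apply may differ):

```java
import java.util.Arrays;

public class PercentileSketch {

    // Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
    static double percentile(double[] samples, int p) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        double[] gpuCoreUsage = {12, 35, 40, 42, 55, 61, 70, 72, 80, 95}; // illustrative samples
        System.out.println(percentile(gpuCoreUsage, 60)); // cost accelerator sizing -> 61.0
        System.out.println(percentile(gpuCoreUsage, 98)); // performance accelerator sizing -> 95.0
    }
}
```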
diff --git a/src/main/java/com/autotune/analyzer/recommendations/engine/RecommendationEngine.java b/src/main/java/com/autotune/analyzer/recommendations/engine/RecommendationEngine.java
index bb9a202be..86ea2ebe1 100644
--- a/src/main/java/com/autotune/analyzer/recommendations/engine/RecommendationEngine.java
+++ b/src/main/java/com/autotune/analyzer/recommendations/engine/RecommendationEngine.java
@@ -17,18 +17,14 @@
import com.autotune.analyzer.recommendations.utils.RecommendationUtils;
import com.autotune.analyzer.utils.AnalyzerConstants;
import com.autotune.analyzer.utils.AnalyzerErrorConstants;
-import com.autotune.analyzer.utils.ExperimentTypeUtil;
import com.autotune.common.data.ValidationOutputData;
-import com.autotune.common.data.metrics.AggregationFunctions;
-import com.autotune.common.data.metrics.Metric;
-import com.autotune.common.data.metrics.MetricAggregationInfoResults;
-import com.autotune.common.data.metrics.MetricResults;
+import com.autotune.common.data.metrics.*;
import com.autotune.common.data.result.ContainerData;
import com.autotune.common.data.result.IntervalResults;
import com.autotune.common.data.result.NamespaceData;
+import com.autotune.common.data.system.info.device.DeviceDetails;
+import com.autotune.common.data.system.info.device.accelerator.AcceleratorDeviceData;
import com.autotune.common.datasource.DataSourceInfo;
-import com.autotune.common.auth.AuthenticationStrategy;
-import com.autotune.common.auth.AuthenticationStrategyFactory;
import com.autotune.common.exceptions.DataSourceNotExist;
import com.autotune.common.k8sObjects.K8sObject;
import com.autotune.common.utils.CommonUtils;
@@ -435,12 +431,12 @@ RecommendationConfigItem>> getCurrentConfigData(ContainerData containerData, Tim
if (null == configItem)
continue;
if (null == configItem.getAmount()) {
- if (recommendationItem.equals(AnalyzerConstants.RecommendationItem.cpu)) {
+ if (recommendationItem.equals(AnalyzerConstants.RecommendationItem.CPU)) {
notifications.add(RecommendationConstants.RecommendationNotification.ERROR_AMOUNT_MISSING_IN_CPU_SECTION);
LOGGER.error(RecommendationConstants.RecommendationNotificationMsgConstant.AMOUNT_MISSING_IN_CPU_SECTION
.concat(String.format(AnalyzerErrorConstants.AutotuneObjectErrors.EXPERIMENT_AND_INTERVAL_END_TIME,
experimentName, interval_end_time)));
- } else if (recommendationItem.equals((AnalyzerConstants.RecommendationItem.memory))) {
+ } else if (recommendationItem.equals((AnalyzerConstants.RecommendationItem.MEMORY))) {
notifications.add(RecommendationConstants.RecommendationNotification.ERROR_AMOUNT_MISSING_IN_MEMORY_SECTION);
LOGGER.error(RecommendationConstants.RecommendationNotificationMsgConstant.AMOUNT_MISSING_IN_MEMORY_SECTION
.concat(String.format(AnalyzerErrorConstants.AutotuneObjectErrors.EXPERIMENT_AND_INTERVAL_END_TIME,
@@ -449,12 +445,12 @@ RecommendationConfigItem>> getCurrentConfigData(ContainerData containerData, Tim
continue;
}
if (null == configItem.getFormat()) {
- if (recommendationItem.equals(AnalyzerConstants.RecommendationItem.cpu)) {
+ if (recommendationItem.equals(AnalyzerConstants.RecommendationItem.CPU)) {
notifications.add(RecommendationConstants.RecommendationNotification.ERROR_FORMAT_MISSING_IN_CPU_SECTION);
LOGGER.error(RecommendationConstants.RecommendationNotificationMsgConstant.FORMAT_MISSING_IN_CPU_SECTION
.concat(String.format(AnalyzerErrorConstants.AutotuneObjectErrors.EXPERIMENT_AND_INTERVAL_END_TIME,
experimentName, interval_end_time)));
- } else if (recommendationItem.equals((AnalyzerConstants.RecommendationItem.memory))) {
+ } else if (recommendationItem.equals((AnalyzerConstants.RecommendationItem.MEMORY))) {
notifications.add(RecommendationConstants.RecommendationNotification.ERROR_FORMAT_MISSING_IN_MEMORY_SECTION);
LOGGER.error(RecommendationConstants.RecommendationNotificationMsgConstant.FORMAT_MISSING_IN_MEMORY_SECTION
.concat(String.format(AnalyzerErrorConstants.AutotuneObjectErrors.EXPERIMENT_AND_INTERVAL_END_TIME,
@@ -463,12 +459,12 @@ RecommendationConfigItem>> getCurrentConfigData(ContainerData containerData, Tim
continue;
}
if (configItem.getAmount() <= 0.0) {
- if (recommendationItem.equals(AnalyzerConstants.RecommendationItem.cpu)) {
+ if (recommendationItem.equals(AnalyzerConstants.RecommendationItem.CPU)) {
notifications.add(RecommendationConstants.RecommendationNotification.ERROR_INVALID_AMOUNT_IN_CPU_SECTION);
LOGGER.error(RecommendationConstants.RecommendationNotificationMsgConstant.INVALID_AMOUNT_IN_CPU_SECTION
.concat(String.format(AnalyzerErrorConstants.AutotuneObjectErrors.EXPERIMENT_AND_INTERVAL_END_TIME,
experimentName, interval_end_time)));
- } else if (recommendationItem.equals((AnalyzerConstants.RecommendationItem.memory))) {
+ } else if (recommendationItem.equals((AnalyzerConstants.RecommendationItem.MEMORY))) {
notifications.add(RecommendationConstants.RecommendationNotification.ERROR_INVALID_AMOUNT_IN_MEMORY_SECTION);
LOGGER.error(RecommendationConstants.RecommendationNotificationMsgConstant.INVALID_AMOUNT_IN_MEMORY_SECTION
.concat(String.format(AnalyzerErrorConstants.AutotuneObjectErrors.EXPERIMENT_AND_INTERVAL_END_TIME,
@@ -477,12 +473,12 @@ RecommendationConfigItem>> getCurrentConfigData(ContainerData containerData, Tim
continue;
}
if (configItem.getFormat().isEmpty() || configItem.getFormat().isBlank()) {
- if (recommendationItem.equals(AnalyzerConstants.RecommendationItem.cpu)) {
+ if (recommendationItem.equals(AnalyzerConstants.RecommendationItem.CPU)) {
notifications.add(RecommendationConstants.RecommendationNotification.ERROR_INVALID_FORMAT_IN_CPU_SECTION);
LOGGER.error(RecommendationConstants.RecommendationNotificationMsgConstant.INVALID_FORMAT_IN_CPU_SECTION
.concat(String.format(AnalyzerErrorConstants.AutotuneObjectErrors.EXPERIMENT_AND_INTERVAL_END_TIME,
experimentName, interval_end_time)));
- } else if (recommendationItem.equals((AnalyzerConstants.RecommendationItem.memory))) {
+ } else if (recommendationItem.equals((AnalyzerConstants.RecommendationItem.MEMORY))) {
notifications.add(RecommendationConstants.RecommendationNotification.ERROR_INVALID_FORMAT_IN_MEMORY_SECTION);
LOGGER.error(RecommendationConstants.RecommendationNotificationMsgConstant.INVALID_FORMAT_IN_MEMORY_SECTION
.concat(String.format(AnalyzerErrorConstants.AutotuneObjectErrors.EXPERIMENT_AND_INTERVAL_END_TIME,
@@ -668,20 +664,20 @@ private MappedRecommendationForModel generateRecommendationBasedOnModel(Timestam
if (currentConfigMap.containsKey(AnalyzerConstants.ResourceSetting.requests) && null != currentConfigMap.get(AnalyzerConstants.ResourceSetting.requests)) {
                HashMap<AnalyzerConstants.RecommendationItem, RecommendationConfigItem> requestsMap = currentConfigMap.get(AnalyzerConstants.ResourceSetting.requests);
- if (requestsMap.containsKey(AnalyzerConstants.RecommendationItem.cpu) && null != requestsMap.get(AnalyzerConstants.RecommendationItem.cpu)) {
- currentCPURequest = requestsMap.get(AnalyzerConstants.RecommendationItem.cpu);
+ if (requestsMap.containsKey(AnalyzerConstants.RecommendationItem.CPU) && null != requestsMap.get(AnalyzerConstants.RecommendationItem.CPU)) {
+ currentCPURequest = requestsMap.get(AnalyzerConstants.RecommendationItem.CPU);
}
- if (requestsMap.containsKey(AnalyzerConstants.RecommendationItem.memory) && null != requestsMap.get(AnalyzerConstants.RecommendationItem.memory)) {
- currentMemRequest = requestsMap.get(AnalyzerConstants.RecommendationItem.memory);
+ if (requestsMap.containsKey(AnalyzerConstants.RecommendationItem.MEMORY) && null != requestsMap.get(AnalyzerConstants.RecommendationItem.MEMORY)) {
+ currentMemRequest = requestsMap.get(AnalyzerConstants.RecommendationItem.MEMORY);
}
}
if (currentConfigMap.containsKey(AnalyzerConstants.ResourceSetting.limits) && null != currentConfigMap.get(AnalyzerConstants.ResourceSetting.limits)) {
                HashMap<AnalyzerConstants.RecommendationItem, RecommendationConfigItem> limitsMap = currentConfigMap.get(AnalyzerConstants.ResourceSetting.limits);
- if (limitsMap.containsKey(AnalyzerConstants.RecommendationItem.cpu) && null != limitsMap.get(AnalyzerConstants.RecommendationItem.cpu)) {
- currentCPULimit = limitsMap.get(AnalyzerConstants.RecommendationItem.cpu);
+ if (limitsMap.containsKey(AnalyzerConstants.RecommendationItem.CPU) && null != limitsMap.get(AnalyzerConstants.RecommendationItem.CPU)) {
+ currentCPULimit = limitsMap.get(AnalyzerConstants.RecommendationItem.CPU);
}
- if (limitsMap.containsKey(AnalyzerConstants.RecommendationItem.memory) && null != limitsMap.get(AnalyzerConstants.RecommendationItem.memory)) {
- currentMemLimit = limitsMap.get(AnalyzerConstants.RecommendationItem.memory);
+ if (limitsMap.containsKey(AnalyzerConstants.RecommendationItem.MEMORY) && null != limitsMap.get(AnalyzerConstants.RecommendationItem.MEMORY)) {
+ currentMemLimit = limitsMap.get(AnalyzerConstants.RecommendationItem.MEMORY);
}
}
if (null != monitoringStartTime) {
@@ -702,6 +698,7 @@ private MappedRecommendationForModel generateRecommendationBasedOnModel(Timestam
// Get the Recommendation Items
RecommendationConfigItem recommendationCpuRequest = model.getCPURequestRecommendation(filteredResultsMap, notifications);
RecommendationConfigItem recommendationMemRequest = model.getMemoryRequestRecommendation(filteredResultsMap, notifications);
+        Map<AnalyzerConstants.RecommendationItem, RecommendationConfigItem> recommendationAcceleratorRequestMap = model.getAcceleratorRequestRecommendation(filteredResultsMap, notifications);
// Get the Recommendation Items
// Calling requests on limits as we are maintaining limits and requests as same
@@ -732,7 +729,8 @@ private MappedRecommendationForModel generateRecommendationBasedOnModel(Timestam
internalMapToPopulate,
numPods,
cpuThreshold,
- memoryThreshold
+ memoryThreshold,
+ recommendationAcceleratorRequestMap
);
} else {
RecommendationNotification notification = new RecommendationNotification(
@@ -826,40 +824,40 @@ private HashMap requestsMap = currentNamespaceConfigMap.get(AnalyzerConstants.ResourceSetting.requests);
- if (requestsMap.containsKey(AnalyzerConstants.RecommendationItem.cpu) && null != requestsMap.get(AnalyzerConstants.RecommendationItem.cpu)) {
- currentNamespaceCPURequest = requestsMap.get(AnalyzerConstants.RecommendationItem.cpu);
+ if (requestsMap.containsKey(AnalyzerConstants.RecommendationItem.CPU) && null != requestsMap.get(AnalyzerConstants.RecommendationItem.CPU)) {
+ currentNamespaceCPURequest = requestsMap.get(AnalyzerConstants.RecommendationItem.CPU);
}
- if (requestsMap.containsKey(AnalyzerConstants.RecommendationItem.memory) && null != requestsMap.get(AnalyzerConstants.RecommendationItem.memory)) {
- currentNamespaceMemRequest = requestsMap.get(AnalyzerConstants.RecommendationItem.memory);
+ if (requestsMap.containsKey(AnalyzerConstants.RecommendationItem.MEMORY) && null != requestsMap.get(AnalyzerConstants.RecommendationItem.MEMORY)) {
+ currentNamespaceMemRequest = requestsMap.get(AnalyzerConstants.RecommendationItem.MEMORY);
}
}
if (currentNamespaceConfigMap.containsKey(AnalyzerConstants.ResourceSetting.limits) && null != currentNamespaceConfigMap.get(AnalyzerConstants.ResourceSetting.limits)) {
                HashMap<AnalyzerConstants.RecommendationItem, RecommendationConfigItem> limitsMap = currentNamespaceConfigMap.get(AnalyzerConstants.ResourceSetting.limits);
- if (limitsMap.containsKey(AnalyzerConstants.RecommendationItem.cpu) && null != limitsMap.get(AnalyzerConstants.RecommendationItem.cpu)) {
- currentNamespaceCPULimit = limitsMap.get(AnalyzerConstants.RecommendationItem.cpu);
+ if (limitsMap.containsKey(AnalyzerConstants.RecommendationItem.CPU) && null != limitsMap.get(AnalyzerConstants.RecommendationItem.CPU)) {
+ currentNamespaceCPULimit = limitsMap.get(AnalyzerConstants.RecommendationItem.CPU);
}
- if (limitsMap.containsKey(AnalyzerConstants.RecommendationItem.memory) && null != limitsMap.get(AnalyzerConstants.RecommendationItem.memory)) {
- currentNamespaceMemLimit = limitsMap.get(AnalyzerConstants.RecommendationItem.memory);
+ if (limitsMap.containsKey(AnalyzerConstants.RecommendationItem.MEMORY) && null != limitsMap.get(AnalyzerConstants.RecommendationItem.MEMORY)) {
+ currentNamespaceMemLimit = limitsMap.get(AnalyzerConstants.RecommendationItem.MEMORY);
}
}
if (null != monitoringStartTime) {
@@ -1081,7 +1079,8 @@ private MappedRecommendationForModel generateNamespaceRecommendationBasedOnModel
internalMapToPopulate,
numPodsInNamespace,
namespaceCpuThreshold,
- namespaceMemoryThreshold
+ namespaceMemoryThreshold,
+ null
);
} else {
RecommendationNotification notification = new RecommendationNotification(
@@ -1104,13 +1103,17 @@ private MappedRecommendationForModel generateNamespaceRecommendationBasedOnModel
* @param numPods The number of pods to consider for the recommendation.
* @param cpuThreshold The CPU usage threshold for the recommendation.
* @param memoryThreshold The memory usage threshold for the recommendation.
+     * @param recommendationAcceleratorRequestMap The map containing the Accelerator recommendations
* @return {@code true} if the internal map was successfully populated; {@code false} otherwise.
*/
    private boolean populateRecommendation(Map.Entry<String, Terms> termEntry,
                                           MappedRecommendationForModel recommendationModel,
                                           ArrayList<RecommendationNotification> notifications,
                                           HashMap<String, RecommendationConfigItem> internalMapToPopulate,
- int numPods, double cpuThreshold, double memoryThreshold) {
+ int numPods,
+ double cpuThreshold,
+ double memoryThreshold,
+                                           Map<AnalyzerConstants.RecommendationItem, RecommendationConfigItem> recommendationAcceleratorRequestMap) {
// Check for cpu & memory Thresholds (Duplicate check if the caller is generateRecommendations)
String recommendationTerm = termEntry.getKey();
double hours = termEntry.getValue().getDays() * KruizeConstants.TimeConv.NO_OF_HOURS_PER_DAY * KruizeConstants.TimeConv.
@@ -1273,7 +1276,7 @@ private boolean populateRecommendation(Map.Entry termEntry,
generatedCpuRequestFormat = recommendationCpuRequest.getFormat();
if (null != generatedCpuRequestFormat && !generatedCpuRequestFormat.isEmpty()) {
isRecommendedCPURequestAvailable = true;
- requestsMap.put(AnalyzerConstants.RecommendationItem.cpu, recommendationCpuRequest);
+ requestsMap.put(AnalyzerConstants.RecommendationItem.CPU, recommendationCpuRequest);
} else {
RecommendationNotification recommendationNotification = new RecommendationNotification(RecommendationConstants.RecommendationNotification.ERROR_FORMAT_MISSING_IN_CPU_SECTION);
notifications.add(recommendationNotification);
@@ -1289,7 +1292,7 @@ private boolean populateRecommendation(Map.Entry termEntry,
generatedMemRequestFormat = recommendationMemRequest.getFormat();
if (null != generatedMemRequestFormat && !generatedMemRequestFormat.isEmpty()) {
isRecommendedMemoryRequestAvailable = true;
- requestsMap.put(AnalyzerConstants.RecommendationItem.memory, recommendationMemRequest);
+ requestsMap.put(AnalyzerConstants.RecommendationItem.MEMORY, recommendationMemRequest);
} else {
RecommendationNotification recommendationNotification = new RecommendationNotification(RecommendationConstants.RecommendationNotification.ERROR_FORMAT_MISSING_IN_MEMORY_SECTION);
notifications.add(recommendationNotification);
@@ -1325,7 +1328,7 @@ private boolean populateRecommendation(Map.Entry termEntry,
generatedCpuLimitFormat = recommendationCpuLimits.getFormat();
if (null != generatedCpuLimitFormat && !generatedCpuLimitFormat.isEmpty()) {
isRecommendedCPULimitAvailable = true;
- limitsMap.put(AnalyzerConstants.RecommendationItem.cpu, recommendationCpuLimits);
+ limitsMap.put(AnalyzerConstants.RecommendationItem.CPU, recommendationCpuLimits);
} else {
RecommendationNotification recommendationNotification = new RecommendationNotification(RecommendationConstants.RecommendationNotification.ERROR_FORMAT_MISSING_IN_CPU_SECTION);
notifications.add(recommendationNotification);
@@ -1341,7 +1344,7 @@ private boolean populateRecommendation(Map.Entry termEntry,
generatedMemLimitFormat = recommendationMemLimits.getFormat();
if (null != generatedMemLimitFormat && !generatedMemLimitFormat.isEmpty()) {
isRecommendedMemoryLimitAvailable = true;
- limitsMap.put(AnalyzerConstants.RecommendationItem.memory, recommendationMemLimits);
+ limitsMap.put(AnalyzerConstants.RecommendationItem.MEMORY, recommendationMemLimits);
} else {
RecommendationNotification recommendationNotification = new RecommendationNotification(RecommendationConstants.RecommendationNotification.ERROR_FORMAT_MISSING_IN_MEMORY_SECTION);
notifications.add(recommendationNotification);
@@ -1373,7 +1376,7 @@ private boolean populateRecommendation(Map.Entry termEntry,
experimentName, interval_end_time)));
} else {
isCurrentCPURequestAvailable = true;
- currentRequestsMap.put(AnalyzerConstants.RecommendationItem.cpu, currentCpuRequest);
+ currentRequestsMap.put(AnalyzerConstants.RecommendationItem.CPU, currentCpuRequest);
}
}
@@ -1393,7 +1396,7 @@ private boolean populateRecommendation(Map.Entry termEntry,
experimentName, interval_end_time)));
} else {
isCurrentMemoryRequestAvailable = true;
- currentRequestsMap.put(AnalyzerConstants.RecommendationItem.memory, currentMemRequest);
+ currentRequestsMap.put(AnalyzerConstants.RecommendationItem.MEMORY, currentMemRequest);
}
}
@@ -1416,7 +1419,7 @@ private boolean populateRecommendation(Map.Entry termEntry,
experimentName, interval_end_time)));
} else {
isCurrentCPULimitAvailable = true;
- currentLimitsMap.put(AnalyzerConstants.RecommendationItem.cpu, currentCpuLimit);
+ currentLimitsMap.put(AnalyzerConstants.RecommendationItem.CPU, currentCpuLimit);
}
}
@@ -1436,7 +1439,7 @@ private boolean populateRecommendation(Map.Entry termEntry,
experimentName, interval_end_time)));
} else {
isCurrentMemoryLimitAvailable = true;
- currentLimitsMap.put(AnalyzerConstants.RecommendationItem.memory, currentMemLimit);
+ currentLimitsMap.put(AnalyzerConstants.RecommendationItem.MEMORY, currentMemLimit);
}
}
@@ -1454,7 +1457,7 @@ private boolean populateRecommendation(Map.Entry termEntry,
// TODO: If difference is positive it can be considered as under-provisioning, Need to handle it better
isVariationCPURequestAvailable = true;
variationCpuRequest = new RecommendationConfigItem(diff, generatedCpuRequestFormat);
- requestsVariationMap.put(AnalyzerConstants.RecommendationItem.cpu, variationCpuRequest);
+ requestsVariationMap.put(AnalyzerConstants.RecommendationItem.CPU, variationCpuRequest);
}
double currentMemRequestValue = 0.0;
@@ -1466,7 +1469,7 @@ private boolean populateRecommendation(Map.Entry termEntry,
// TODO: If difference is positive it can be considered as under-provisioning, Need to handle it better
isVariationMemoryRequestAvailable = true;
variationMemRequest = new RecommendationConfigItem(diff, generatedMemRequestFormat);
- requestsVariationMap.put(AnalyzerConstants.RecommendationItem.memory, variationMemRequest);
+ requestsVariationMap.put(AnalyzerConstants.RecommendationItem.MEMORY, variationMemRequest);
}
// Create a new map for storing variation in limits
@@ -1483,7 +1486,7 @@ private boolean populateRecommendation(Map.Entry termEntry,
double diff = generatedCpuLimit - currentCpuLimitValue;
isVariationCPULimitAvailable = true;
variationCpuLimit = new RecommendationConfigItem(diff, generatedCpuLimitFormat);
- limitsVariationMap.put(AnalyzerConstants.RecommendationItem.cpu, variationCpuLimit);
+ limitsVariationMap.put(AnalyzerConstants.RecommendationItem.CPU, variationCpuLimit);
}
double currentMemLimitValue = 0.0;
@@ -1494,7 +1497,7 @@ private boolean populateRecommendation(Map.Entry termEntry,
double diff = generatedMemLimit - currentMemLimitValue;
isVariationMemoryLimitAvailable = true;
variationMemLimit = new RecommendationConfigItem(diff, generatedMemLimitFormat);
- limitsVariationMap.put(AnalyzerConstants.RecommendationItem.memory, variationMemLimit);
+ limitsVariationMap.put(AnalyzerConstants.RecommendationItem.MEMORY, variationMemLimit);
}
// build the engine level notifications here
@@ -1535,23 +1538,23 @@ private boolean populateRecommendation(Map.Entry termEntry,
// Alternative - CPU REQUEST VALUE
// Accessing existing recommendation item
- RecommendationConfigItem tempAccessedRecCPURequest = requestsMap.get(AnalyzerConstants.RecommendationItem.cpu);
+ RecommendationConfigItem tempAccessedRecCPURequest = requestsMap.get(AnalyzerConstants.RecommendationItem.CPU);
if (null != tempAccessedRecCPURequest) {
// Updating it with desired value
tempAccessedRecCPURequest.setAmount(currentCpuRequestValue);
}
// Replace the updated object (Step not needed as we are updating existing object, but just to make sure it's updated)
- requestsMap.put(AnalyzerConstants.RecommendationItem.cpu, tempAccessedRecCPURequest);
+ requestsMap.put(AnalyzerConstants.RecommendationItem.CPU, tempAccessedRecCPURequest);
// Alternative - CPU REQUEST VARIATION VALUE
// Accessing existing recommendation item
- RecommendationConfigItem tempAccessedRecCPURequestVariation = requestsVariationMap.get(AnalyzerConstants.RecommendationItem.cpu);
+ RecommendationConfigItem tempAccessedRecCPURequestVariation = requestsVariationMap.get(AnalyzerConstants.RecommendationItem.CPU);
if (null != tempAccessedRecCPURequestVariation) {
// Updating it with desired value (as we are setting to current variation would be 0)
tempAccessedRecCPURequestVariation.setAmount(CPU_ZERO);
}
// Replace the updated object (Step not needed as we are updating existing object, but just to make sure it's updated)
- requestsVariationMap.put(AnalyzerConstants.RecommendationItem.cpu, tempAccessedRecCPURequestVariation);
+ requestsVariationMap.put(AnalyzerConstants.RecommendationItem.CPU, tempAccessedRecCPURequestVariation);
RecommendationNotification recommendationNotification = new RecommendationNotification(RecommendationConstants.RecommendationNotification.NOTICE_CPU_REQUESTS_OPTIMISED);
engineNotifications.add(recommendationNotification);
@@ -1575,23 +1578,23 @@ private boolean populateRecommendation(Map.Entry termEntry,
// Alternative - CPU LIMIT VALUE
// Accessing existing recommendation item
- RecommendationConfigItem tempAccessedRecCPULimit = limitsMap.get(AnalyzerConstants.RecommendationItem.cpu);
+ RecommendationConfigItem tempAccessedRecCPULimit = limitsMap.get(AnalyzerConstants.RecommendationItem.CPU);
if (null != tempAccessedRecCPULimit) {
// Updating it with desired value
tempAccessedRecCPULimit.setAmount(currentCpuLimitValue);
}
// Replace the updated object (Step not needed as we are updating existing object, but just to make sure it's updated)
- limitsMap.put(AnalyzerConstants.RecommendationItem.cpu, tempAccessedRecCPULimit);
+ limitsMap.put(AnalyzerConstants.RecommendationItem.CPU, tempAccessedRecCPULimit);
// Alternative - CPU LIMIT VARIATION VALUE
// Accessing existing recommendation item
- RecommendationConfigItem tempAccessedRecCPULimitVariation = limitsVariationMap.get(AnalyzerConstants.RecommendationItem.cpu);
+ RecommendationConfigItem tempAccessedRecCPULimitVariation = limitsVariationMap.get(AnalyzerConstants.RecommendationItem.CPU);
if (null != tempAccessedRecCPULimitVariation) {
// Updating it with desired value (as we are setting to current variation would be 0)
tempAccessedRecCPULimitVariation.setAmount(CPU_ZERO);
}
// Replace the updated object (Step not needed as we are updating existing object, but just to make sure it's updated)
- limitsVariationMap.put(AnalyzerConstants.RecommendationItem.cpu, tempAccessedRecCPULimitVariation);
+ limitsVariationMap.put(AnalyzerConstants.RecommendationItem.CPU, tempAccessedRecCPULimitVariation);
RecommendationNotification recommendationNotification = new RecommendationNotification(RecommendationConstants.RecommendationNotification.NOTICE_CPU_LIMITS_OPTIMISED);
engineNotifications.add(recommendationNotification);
@@ -1615,23 +1618,23 @@ private boolean populateRecommendation(Map.Entry termEntry,
// Alternative - MEMORY REQUEST VALUE
// Accessing existing recommendation item
- RecommendationConfigItem tempAccessedRecMemoryRequest = requestsMap.get(AnalyzerConstants.RecommendationItem.memory);
+ RecommendationConfigItem tempAccessedRecMemoryRequest = requestsMap.get(AnalyzerConstants.RecommendationItem.MEMORY);
if (null != tempAccessedRecMemoryRequest) {
// Updating it with desired value
tempAccessedRecMemoryRequest.setAmount(currentMemRequestValue);
}
// Replace the updated object (Step not needed as we are updating existing object, but just to make sure it's updated)
- requestsMap.put(AnalyzerConstants.RecommendationItem.memory, tempAccessedRecMemoryRequest);
+ requestsMap.put(AnalyzerConstants.RecommendationItem.MEMORY, tempAccessedRecMemoryRequest);
// Alternative - MEMORY REQUEST VARIATION VALUE
// Accessing existing recommendation item
- RecommendationConfigItem tempAccessedRecMemoryRequestVariation = requestsVariationMap.get(AnalyzerConstants.RecommendationItem.memory);
+ RecommendationConfigItem tempAccessedRecMemoryRequestVariation = requestsVariationMap.get(AnalyzerConstants.RecommendationItem.MEMORY);
if (null != tempAccessedRecMemoryRequestVariation) {
// Updating it with desired value (as we are setting to current variation would be 0)
tempAccessedRecMemoryRequestVariation.setAmount(MEM_ZERO);
}
// Replace the updated object (Step not needed as we are updating existing object, but just to make sure it's updated)
- requestsVariationMap.put(AnalyzerConstants.RecommendationItem.memory, tempAccessedRecMemoryRequestVariation);
+ requestsVariationMap.put(AnalyzerConstants.RecommendationItem.MEMORY, tempAccessedRecMemoryRequestVariation);
RecommendationNotification recommendationNotification = new RecommendationNotification(RecommendationConstants.RecommendationNotification.NOTICE_MEMORY_REQUESTS_OPTIMISED);
engineNotifications.add(recommendationNotification);
@@ -1655,23 +1658,23 @@ private boolean populateRecommendation(Map.Entry termEntry,
// Alternative - MEMORY LIMIT VALUE
// Accessing existing recommendation item
- RecommendationConfigItem tempAccessedRecMemoryLimit = limitsMap.get(AnalyzerConstants.RecommendationItem.memory);
+ RecommendationConfigItem tempAccessedRecMemoryLimit = limitsMap.get(AnalyzerConstants.RecommendationItem.MEMORY);
if (null != tempAccessedRecMemoryLimit) {
// Updating it with desired value
tempAccessedRecMemoryLimit.setAmount(currentMemLimitValue);
}
// Replace the updated object (Step not needed as we are updating existing object, but just to make sure it's updated)
- limitsMap.put(AnalyzerConstants.RecommendationItem.memory, tempAccessedRecMemoryLimit);
+ limitsMap.put(AnalyzerConstants.RecommendationItem.MEMORY, tempAccessedRecMemoryLimit);
// Alternative - MEMORY LIMIT VARIATION VALUE
// Accessing existing recommendation item
- RecommendationConfigItem tempAccessedRecMemoryLimitVariation = limitsVariationMap.get(AnalyzerConstants.RecommendationItem.memory);
+ RecommendationConfigItem tempAccessedRecMemoryLimitVariation = limitsVariationMap.get(AnalyzerConstants.RecommendationItem.MEMORY);
if (null != tempAccessedRecMemoryLimitVariation) {
// Updating it with desired value (as we are setting to current variation would be 0)
tempAccessedRecMemoryLimitVariation.setAmount(MEM_ZERO);
}
// Replace the updated object (Step not needed as we are updating existing object, but just to make sure it's updated)
- limitsVariationMap.put(AnalyzerConstants.RecommendationItem.memory, tempAccessedRecMemoryLimitVariation);
+ limitsVariationMap.put(AnalyzerConstants.RecommendationItem.MEMORY, tempAccessedRecMemoryLimitVariation);
RecommendationNotification recommendationNotification = new RecommendationNotification(RecommendationConstants.RecommendationNotification.NOTICE_MEMORY_LIMITS_OPTIMISED);
engineNotifications.add(recommendationNotification);
@@ -1694,6 +1697,11 @@ private boolean populateRecommendation(Map.Entry termEntry,
config.put(AnalyzerConstants.ResourceSetting.requests, requestsMap);
}
+ // Check if accelerator map is not empty and add to limits map
+ if (null != recommendationAcceleratorRequestMap && !recommendationAcceleratorRequestMap.isEmpty()) {
+ limitsMap.putAll(recommendationAcceleratorRequestMap);
+ }
+
// Set Limits Map
if (!limitsMap.isEmpty()) {
config.put(AnalyzerConstants.ResourceSetting.limits, limitsMap);
@@ -1808,9 +1816,17 @@ public void fetchMetricsBasedOnProfileAndDatasource(KruizeObject kruizeObject, T
}
String maxDateQuery = null;
+ String acceleratorDetectionQuery = null;
if (kruizeObject.isContainerExperiment()) {
maxDateQuery = getMaxDateQuery(metricProfile, AnalyzerConstants.MetricName.maxDate.name());
- fetchContainerMetricsBasedOnDataSourceAndProfile(kruizeObject, interval_end_time, interval_start_time, dataSourceInfo, metricProfile, maxDateQuery);
+ acceleratorDetectionQuery = getMaxDateQuery(metricProfile, AnalyzerConstants.MetricName.gpuMemoryUsage.name());
+ fetchContainerMetricsBasedOnDataSourceAndProfile(kruizeObject,
+ interval_end_time,
+ interval_start_time,
+ dataSourceInfo,
+ metricProfile,
+ maxDateQuery,
+ acceleratorDetectionQuery);
} else if (kruizeObject.isNamespaceExperiment()) {
maxDateQuery = getMaxDateQuery(metricProfile, AnalyzerConstants.MetricName.namespaceMaxDate.name());
fetchNamespaceMetricsBasedOnDataSourceAndProfile(kruizeObject, interval_end_time, interval_start_time, dataSourceInfo, metricProfile, maxDateQuery);
@@ -1897,9 +1913,8 @@ private void fetchNamespaceMetricsBasedOnDataSourceAndProfile(KruizeObject kruiz
k8sObject.setNamespaceData(namespaceData);
}
- List namespaceMetricList = metricProfile.getSloInfo().getFunctionVariables().stream()
- .filter(metricEntry -> metricEntry.getName().startsWith(AnalyzerConstants.NAMESPACE) && !metricEntry.getName().equals("namespaceMaxDate"))
- .toList();
+        List<Metric> namespaceMetricList = filterMetricsBasedOnExpTypeAndK8sObject(metricProfile,
+ AnalyzerConstants.MetricName.namespaceMaxDate.name(), kruizeObject.getExperimentType());
// Iterate over metrics and aggregation functions
for (Metric metricEntry : namespaceMetricList) {
@@ -1978,7 +1993,7 @@ private void fetchNamespaceMetricsBasedOnDataSourceAndProfile(KruizeObject kruiz
/**
- * Fetches namespace metrics based on the specified datasource using queries from the metricProfile for the given time interval.
+ * Fetches Container metrics based on the specified datasource using queries from the metricProfile for the given time interval.
*
* @param kruizeObject KruizeObject
* @param interval_end_time The end time of the interval in the format yyyy-MM-ddTHH:mm:sssZ
@@ -1988,7 +2003,13 @@ private void fetchNamespaceMetricsBasedOnDataSourceAndProfile(KruizeObject kruiz
* @param maxDateQuery max date query for containers
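+     * @param acceleratorDetectionQuery The query used to detect Accelerator usage for the container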
* @throws Exception
*/
- private void fetchContainerMetricsBasedOnDataSourceAndProfile(KruizeObject kruizeObject, Timestamp interval_end_time, Timestamp interval_start_time, DataSourceInfo dataSourceInfo, PerformanceProfile metricProfile, String maxDateQuery) throws Exception, FetchMetricsError {
+ private void fetchContainerMetricsBasedOnDataSourceAndProfile(KruizeObject kruizeObject,
+ Timestamp interval_end_time,
+ Timestamp interval_start_time,
+ DataSourceInfo dataSourceInfo,
+ PerformanceProfile metricProfile,
+ String maxDateQuery,
+ String acceleratorDetectionQuery) throws Exception, FetchMetricsError {
try {
long interval_end_time_epoc = 0;
long interval_start_time_epoc = 0;
@@ -2007,6 +2028,20 @@ private void fetchContainerMetricsBasedOnDataSourceAndProfile(KruizeObject kruiz
            for (Map.Entry<String, ContainerData> entry : containerDataMap.entrySet()) {
ContainerData containerData = entry.getValue();
+
+            // Check whether the container already has Accelerator data; otherwise probe the datasource for Accelerator metrics
+ if (null == containerData.getContainerDeviceList() || !containerData.getContainerDeviceList().isAcceleratorDeviceDetected()) {
+ RecommendationUtils.markAcceleratorDeviceStatusToContainer(containerData,
+ maxDateQuery,
+ namespace,
+ workload,
+ workload_type,
+ dataSourceInfo,
+ kruizeObject.getTerms(),
+ measurementDurationMinutesInDouble,
+ acceleratorDetectionQuery);
+ }
+
String containerName = containerData.getContainer_name();
if (null == interval_end_time) {
LOGGER.info(KruizeConstants.APIMessages.CONTAINER_USAGE_INFO);
@@ -2058,20 +2093,47 @@ private void fetchContainerMetricsBasedOnDataSourceAndProfile(KruizeObject kruiz
            HashMap<Timestamp, IntervalResults> containerDataResults = new HashMap<>();
IntervalResults intervalResults = null;
HashMap resMap = null;
- HashMap resultMap = null;
+            HashMap<AnalyzerConstants.MetricName, AcceleratorMetricResult> acceleratorMetricResultHashMap;
MetricResults metricResults = null;
MetricAggregationInfoResults metricAggregationInfoResults = null;
- List metricList = metricProfile.getSloInfo().getFunctionVariables();
+            List<Metric> metricList = filterMetricsBasedOnExpTypeAndK8sObject(metricProfile,
+ AnalyzerConstants.MetricName.maxDate.name(), kruizeObject.getExperimentType());
+            List<String> acceleratorFunctions = Arrays.asList(
+ AnalyzerConstants.MetricName.gpuCoreUsage.toString(),
+ AnalyzerConstants.MetricName.gpuMemoryUsage.toString()
+ );
// Iterate over metrics and aggregation functions
for (Metric metricEntry : metricList) {
+
+ boolean isAcceleratorMetric = false;
+ boolean fetchAcceleratorMetrics = false;
+
+ if (acceleratorFunctions.contains(metricEntry.getName())) {
+ isAcceleratorMetric = true;
+ }
+
+ if (isAcceleratorMetric
+ && null != containerData.getContainerDeviceList()
+ && containerData.getContainerDeviceList().isAcceleratorDeviceDetected()) {
+ fetchAcceleratorMetrics = true;
+ }
+
+ // Skip fetching Accelerator metrics if the workload doesn't use Accelerator
+ if (isAcceleratorMetric && !fetchAcceleratorMetrics)
+ continue;
+
                HashMap<String, AggregationFunctions> aggregationFunctions = metricEntry.getAggregationFunctionsMap();
                for (Map.Entry<String, AggregationFunctions> aggregationFunctionsEntry: aggregationFunctions.entrySet()) {
// Determine promQL query on metric type
String promQL = aggregationFunctionsEntry.getValue().getQuery();
- String format = null;
+ // Skipping if the promQL is empty
+ if (null == promQL || promQL.isEmpty())
+ continue;
+
+ String format = null;
// Determine format based on metric type - Todo move this metric profile
List cpuFunction = Arrays.asList(AnalyzerConstants.MetricName.cpuUsage.toString(), AnalyzerConstants.MetricName.cpuThrottle.toString(), AnalyzerConstants.MetricName.cpuLimit.toString(), AnalyzerConstants.MetricName.cpuRequest.toString());
@@ -2080,8 +2142,11 @@ private void fetchContainerMetricsBasedOnDataSourceAndProfile(KruizeObject kruiz
format = KruizeConstants.JSONKeys.CORES;
} else if (memFunction.contains(metricEntry.getName())) {
format = KruizeConstants.JSONKeys.BYTES;
+ } else if (isAcceleratorMetric) {
+ format = KruizeConstants.JSONKeys.CORES;
}
+ // If promQL is determined, fetch metrics from the datasource
promQL = promQL
.replace(AnalyzerConstants.NAMESPACE_VARIABLE, namespace)
.replace(AnalyzerConstants.CONTAINER_VARIABLE, containerName)
@@ -2089,48 +2154,150 @@ private void fetchContainerMetricsBasedOnDataSourceAndProfile(KruizeObject kruiz
.replace(AnalyzerConstants.WORKLOAD_VARIABLE, workload)
.replace(AnalyzerConstants.WORKLOAD_TYPE_VARIABLE, workload_type);
- // If promQL is determined, fetch metrics from the datasource
- if (promQL != null) {
- LOGGER.info(promQL);
- String podMetricsUrl;
- try {
- podMetricsUrl = String.format(KruizeConstants.DataSourceConstants.DATASOURCE_ENDPOINT_WITH_QUERY,
- dataSourceInfo.getUrl(),
- URLEncoder.encode(promQL, CHARACTER_ENCODING),
- interval_start_time_epoc,
- interval_end_time_epoc,
- measurementDurationMinutesInDouble.intValue() * KruizeConstants.TimeConv.NO_OF_SECONDS_PER_MINUTE);
- LOGGER.info(podMetricsUrl);
- client.setBaseURL(podMetricsUrl);
- JSONObject genericJsonObject = client.fetchMetricsJson(KruizeConstants.APIMessages.GET, "");
- JsonObject jsonObject = new Gson().fromJson(genericJsonObject.toString(), JsonObject.class);
- JsonArray resultArray = jsonObject.getAsJsonObject(KruizeConstants.JSONKeys.DATA).getAsJsonArray(KruizeConstants.DataSourceConstants.DataSourceQueryJSONKeys.RESULT);
- // Process fetched metrics
- if (null != resultArray && !resultArray.isEmpty()) {
- resultArray = jsonObject.getAsJsonObject(KruizeConstants.JSONKeys.DATA).getAsJsonArray(
- KruizeConstants.DataSourceConstants.DataSourceQueryJSONKeys.RESULT).get(0)
- .getAsJsonObject().getAsJsonArray(KruizeConstants.DataSourceConstants
- .DataSourceQueryJSONKeys.VALUES);
- sdf.setTimeZone(TimeZone.getTimeZone(KruizeConstants.TimeUnitsExt.TimeZones.UTC));
+ LOGGER.info(promQL);
+ String podMetricsUrl;
+ try {
+ podMetricsUrl = String.format(KruizeConstants.DataSourceConstants.DATASOURCE_ENDPOINT_WITH_QUERY,
+ dataSourceInfo.getUrl(),
+ URLEncoder.encode(promQL, CHARACTER_ENCODING),
+ interval_start_time_epoc,
+ interval_end_time_epoc,
+ measurementDurationMinutesInDouble.intValue() * KruizeConstants.TimeConv.NO_OF_SECONDS_PER_MINUTE);
+ LOGGER.info(podMetricsUrl);
+ client.setBaseURL(podMetricsUrl);
+ JSONObject genericJsonObject = client.fetchMetricsJson(KruizeConstants.APIMessages.GET, "");
+ JsonObject jsonObject = new Gson().fromJson(genericJsonObject.toString(), JsonObject.class);
+ JsonArray resultArray = jsonObject.getAsJsonObject(KruizeConstants.JSONKeys.DATA).getAsJsonArray(KruizeConstants.DataSourceConstants.DataSourceQueryJSONKeys.RESULT);
+ // Skipping if Result array is null or empty
+ if (null == resultArray || resultArray.isEmpty())
+ continue;
+
+ // Process fetched metrics
+                        if (isAcceleratorMetric) {
+ for (JsonElement result : resultArray) {
+ JsonObject resultObject = result.getAsJsonObject();
+ JsonObject metricObject = resultObject.getAsJsonObject(KruizeConstants.JSONKeys.METRIC);
+
+ // Set the data only for the container Accelerator device
+                            if (null == metricObject.get(KruizeConstants.JSONKeys.MODEL_NAME))
+                                continue;
+ if (metricObject.get(KruizeConstants.JSONKeys.MODEL_NAME).getAsString().isEmpty())
+ continue;
+
+                            ArrayList<DeviceDetails> deviceDetails = containerData.getContainerDeviceList().getDevices(AnalyzerConstants.DeviceType.ACCELERATOR);
+                            // Continue to the next element if the device list is missing or empty;
+                            // all other elements will also fail, as there is no Accelerator attached.
+                            // Theoretically this cannot fail here, but future implementations may change,
+                            // so checking a function's return value right after the call is advisable.
+                            // TODO: Needs a check to figure out why the device list is empty when isAcceleratorDeviceDetected is true
+ if (null == deviceDetails)
+ continue;
+ if (deviceDetails.isEmpty())
+ continue;
+
+                            // Assuming only one MIG-supported Accelerator is attached.
+                            // This needs to change when multiple Accelerators are supported;
+                            // the same change needs to be applied where the device is added
+                            // in DeviceHandler
+ DeviceDetails deviceDetail = deviceDetails.get(0);
+ AcceleratorDeviceData containerAcceleratorDeviceData = (AcceleratorDeviceData) deviceDetail;
+
+ // Skip non-matching Accelerator entries
+ if (!metricObject.get(KruizeConstants.JSONKeys.MODEL_NAME).getAsString().equalsIgnoreCase(containerAcceleratorDeviceData.getModelName()))
+ continue;
+
+ AcceleratorDeviceData acceleratorDeviceData = new AcceleratorDeviceData(metricObject.get(KruizeConstants.JSONKeys.MODEL_NAME).getAsString(),
+ metricObject.get(KruizeConstants.JSONKeys.HOSTNAME).getAsString(),
+ metricObject.get(KruizeConstants.JSONKeys.UUID).getAsString(),
+ metricObject.get(KruizeConstants.JSONKeys.DEVICE).getAsString(),
+ true);
+
+ JsonArray valuesArray = resultObject.getAsJsonArray(KruizeConstants.DataSourceConstants
+ .DataSourceQueryJSONKeys.VALUES);
+ sdf.setTimeZone(TimeZone.getTimeZone(KruizeConstants.TimeUnitsExt.TimeZones.UTC));
// Iterate over fetched metrics
Timestamp sTime = new Timestamp(interval_start_time_epoc);
- for (JsonElement element : resultArray) {
+ for (JsonElement element : valuesArray) {
JsonArray valueArray = element.getAsJsonArray();
long epochTime = valueArray.get(0).getAsLong();
double value = valueArray.get(1).getAsDouble();
String timestamp = sdf.format(new Date(epochTime * KruizeConstants.TimeConv.NO_OF_MSECS_IN_SEC));
Date date = sdf.parse(timestamp);
- Timestamp eTime = new Timestamp(date.getTime());
+ Timestamp tempTime = new Timestamp(date.getTime());
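+                                // Align the accelerator sample with the nearest existing CPU/memory
+                                // interval timestamp within the configured tolerance window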
+ Timestamp eTime = RecommendationUtils.getNearestTimestamp(containerDataResults,
+ tempTime,
+ AnalyzerConstants.AcceleratorConstants.AcceleratorMetricConstants.TIMESTAMP_RANGE_CHECK_IN_MINUTES);
+
+                            // No nearby CPU/memory timestamp was found; containerDataResults may be empty
+                            if (null == eTime) {
+                                // eTime = tempTime;
+                                // Skipping the entry, as an inconsistency with the CPU & memory records may cause a null pointer while accessing the metric results
+                                // TODO: Need to separate the data records of CPU and memory based on exporter
+ // TODO: Perform recommendation generation by stitching the outcome
+ continue;
+ }
// Prepare interval results
- prepareIntervalResults(containerDataResults, intervalResults, resMap, metricResults,
- metricAggregationInfoResults, sTime, eTime, metricEntry, aggregationFunctionsEntry, value, format);
+ if (containerDataResults.containsKey(eTime)) {
+ intervalResults = containerDataResults.get(eTime);
+ acceleratorMetricResultHashMap = intervalResults.getAcceleratorMetricResultHashMap();
+ if (null == acceleratorMetricResultHashMap)
+ acceleratorMetricResultHashMap = new HashMap<>();
+ } else {
+ intervalResults = new IntervalResults();
+ acceleratorMetricResultHashMap = new HashMap<>();
+ }
+ AnalyzerConstants.MetricName metricName = AnalyzerConstants.MetricName.valueOf(metricEntry.getName());
+ if (acceleratorMetricResultHashMap.containsKey(metricName)) {
+ metricResults = acceleratorMetricResultHashMap.get(metricName).getMetricResults();
+ metricAggregationInfoResults = metricResults.getAggregationInfoResult();
+ } else {
+ metricResults = new MetricResults();
+ metricAggregationInfoResults = new MetricAggregationInfoResults();
+ }
+ // Resolve the aggregation setter (e.g. "avg" -> setAvg) via reflection and apply the value
+ Method method = MetricAggregationInfoResults.class.getDeclaredMethod(
+ KruizeConstants.APIMessages.SET
+ + aggregationFunctionsEntry.getKey().substring(0, 1).toUpperCase()
+ + aggregationFunctionsEntry.getKey().substring(1),
+ Double.class);
+ method.invoke(metricAggregationInfoResults, value);
+ metricAggregationInfoResults.setFormat(format);
+ metricResults.setAggregationInfoResult(metricAggregationInfoResults);
+ metricResults.setName(String.valueOf(metricName));
+ metricResults.setFormat(format);
+ AcceleratorMetricResult acceleratorMetricResult = new AcceleratorMetricResult(acceleratorDeviceData, metricResults);
+ acceleratorMetricResultHashMap.put(metricName, acceleratorMetricResult);
+ intervalResults.setAcceleratorMetricResultHashMap(acceleratorMetricResultHashMap);
+ intervalResults.setIntervalStartTime(sTime); // TODO: this will change
+ intervalResults.setIntervalEndTime(eTime);
+ intervalResults.setDurationInMinutes((double) ((eTime.getTime() - sTime.getTime())
+ / ((long) KruizeConstants.TimeConv.NO_OF_SECONDS_PER_MINUTE
+ * KruizeConstants.TimeConv.NO_OF_MSECS_IN_SEC)));
+ containerDataResults.put(eTime, intervalResults);
+ sTime = eTime;
}
}
- } catch (Exception e) {
- throw new RuntimeException(e);
+ } else {
+ resultArray = jsonObject.getAsJsonObject(KruizeConstants.JSONKeys.DATA).getAsJsonArray(
+ KruizeConstants.DataSourceConstants.DataSourceQueryJSONKeys.RESULT).get(0)
+ .getAsJsonObject().getAsJsonArray(KruizeConstants.DataSourceConstants
+ .DataSourceQueryJSONKeys.VALUES);
+ sdf.setTimeZone(TimeZone.getTimeZone(KruizeConstants.TimeUnitsExt.TimeZones.UTC));
+
+ // Iterate over fetched metrics
+ Timestamp sTime = new Timestamp(interval_start_time_epoc);
+ for (JsonElement element : resultArray) {
+ JsonArray valueArray = element.getAsJsonArray();
+ long epochTime = valueArray.get(0).getAsLong();
+ double value = valueArray.get(1).getAsDouble();
+ String timestamp = sdf.format(new Date(epochTime * KruizeConstants.TimeConv.NO_OF_MSECS_IN_SEC));
+ Date date = sdf.parse(timestamp);
+ Timestamp eTime = new Timestamp(date.getTime());
+
+ // Prepare interval results
+ prepareIntervalResults(containerDataResults, intervalResults, resMap, metricResults,
+ metricAggregationInfoResults, sTime, eTime, metricEntry, aggregationFunctionsEntry, value, format);
+ }
}
+ } catch (Exception e) {
+ throw new RuntimeException(e);
}
}
}
@@ -2206,5 +2373,28 @@ private void prepareIntervalResults(Map dataResultsM
throw new Exception(AnalyzerErrorConstants.APIErrors.UpdateRecommendationsAPI.METRIC_EXCEPTION + e.getMessage());
}
}
+
+ /**
+ * Filters out the maxDateQuery metric and includes metrics based on the experiment type and kubernetes_object
+ *
+ * @param metricProfile metric profile to be used
+ * @param maxDateQuery maxDateQuery metric to be filtered out
+ * @param experimentType experiment type
+ * @return list of metrics matching the experiment type's kubernetes_object, excluding the maxDateQuery metric
+ */
+ public List<Metric> filterMetricsBasedOnExpTypeAndK8sObject(PerformanceProfile metricProfile, String maxDateQuery, String experimentType) {
+ String namespace = KruizeConstants.JSONKeys.NAMESPACE;
+ String container = KruizeConstants.JSONKeys.CONTAINER;
+ return metricProfile.getSloInfo().getFunctionVariables().stream()
+ .filter(metric -> {
+ String name = metric.getName();
+ String kubernetes_object = metric.getKubernetesObject();
+
+ // Include metrics based on experiment_type, kubernetes_object and exclude maxDate metric
+ return !name.equals(maxDateQuery) && (
+ (experimentType.equals(AnalyzerConstants.ExperimentTypes.NAMESPACE_EXPERIMENT) && kubernetes_object.equals(namespace)) ||
+ (experimentType.equals(AnalyzerConstants.ExperimentTypes.CONTAINER_EXPERIMENT) && kubernetes_object.equals(container))
+ );
+ })
+ .toList();
+ }
}
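The hunk above fills `MetricAggregationInfoResults` through reflection: the aggregation-function key (for example `avg` or `max`) is prefixed with the `SET` constant and capitalized to resolve the matching setter. A minimal, self-contained sketch of that pattern, using illustrative class and field names rather than Kruize's own:

```java
import java.lang.reflect.Method;

class AggregationInfo {
    private Double avg;
    private Double max;

    public void setAvg(Double avg) { this.avg = avg; }
    public void setMax(Double max) { this.max = max; }

    @Override
    public String toString() { return "avg=" + avg + ", max=" + max; }
}

public class ReflectiveSetterDemo {
    public static void main(String[] args) throws Exception {
        AggregationInfo info = new AggregationInfo();
        String key = "avg"; // aggregation function name coming from the profile
        // "avg" -> "setAvg"
        String setter = "set" + key.substring(0, 1).toUpperCase() + key.substring(1);
        Method method = AggregationInfo.class.getDeclaredMethod(setter, Double.class);
        method.invoke(info, 42.5); // equivalent to info.setAvg(42.5)
        System.out.println(info);  // avg=42.5, max=null
    }
}
```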
diff --git a/src/main/java/com/autotune/analyzer/recommendations/model/CostBasedRecommendationModel.java b/src/main/java/com/autotune/analyzer/recommendations/model/CostBasedRecommendationModel.java
index db8c783ae..891168f4f 100644
--- a/src/main/java/com/autotune/analyzer/recommendations/model/CostBasedRecommendationModel.java
+++ b/src/main/java/com/autotune/analyzer/recommendations/model/CostBasedRecommendationModel.java
@@ -3,16 +3,21 @@
import com.autotune.analyzer.recommendations.RecommendationConfigItem;
import com.autotune.analyzer.recommendations.RecommendationConstants;
import com.autotune.analyzer.recommendations.RecommendationNotification;
+import com.autotune.analyzer.recommendations.utils.RecommendationUtils;
import com.autotune.analyzer.utils.AnalyzerConstants;
+import com.autotune.common.data.metrics.AcceleratorMetricResult;
import com.autotune.common.data.metrics.MetricAggregationInfoResults;
import com.autotune.common.data.metrics.MetricResults;
import com.autotune.common.data.result.IntervalResults;
+import com.autotune.common.data.system.info.device.accelerator.metadata.AcceleratorMetaDataService;
+import com.autotune.common.data.system.info.device.accelerator.metadata.AcceleratorProfile;
import com.autotune.common.utils.CommonUtils;
import com.autotune.utils.KruizeConstants;
import org.json.JSONArray;
import org.json.JSONObject;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.sql.Timestamp;
import java.util.*;
@@ -22,6 +27,8 @@
import static com.autotune.analyzer.recommendations.RecommendationConstants.RecommendationEngine.PercentileConstants.COST_CPU_PERCENTILE;
import static com.autotune.analyzer.recommendations.RecommendationConstants.RecommendationEngine.PercentileConstants.COST_MEMORY_PERCENTILE;
+import static com.autotune.analyzer.recommendations.RecommendationConstants.RecommendationEngine.PercentileConstants.COST_ACCELERATOR_PERCENTILE;
+
import static com.autotune.analyzer.recommendations.RecommendationConstants.RecommendationValueConstants.*;
public class CostBasedRecommendationModel implements RecommendationModel {
@@ -505,6 +512,80 @@ public RecommendationConfigItem getMemoryRequestRecommendationForNamespace(Map<Timestamp, IntervalResults> filteredResultsMap,
+ public Map<AnalyzerConstants.RecommendationItem, RecommendationConfigItem> getAcceleratorRequestRecommendation(
+ Map<Timestamp, IntervalResults> filteredResultsMap,
+ ArrayList<RecommendationNotification> notifications
+ ) {
+ List<Double> acceleratorCoreMaxValues = new ArrayList<>();
+ List<Double> acceleratorMemoryMaxValues = new ArrayList<>();
+
+ boolean isGpuWorkload = false;
+ String acceleratorModel = null;
+
+ for (Map.Entry<Timestamp, IntervalResults> entry : filteredResultsMap.entrySet()) {
+ IntervalResults intervalResults = entry.getValue();
+
+ // Skip if accelerator map is null
+ if (null == intervalResults.getAcceleratorMetricResultHashMap())
+ continue;
+
+ isGpuWorkload = true;
+ for (Map.Entry<AnalyzerConstants.MetricName, AcceleratorMetricResult> gpuEntry : intervalResults.getAcceleratorMetricResultHashMap().entrySet()) {
+ AcceleratorMetricResult gpuMetricResult = gpuEntry.getValue();
+
+ // Set Accelerator name
+ // TODO: Need to handle separate processing in case of container supporting multiple accelerators
+ if (null == acceleratorModel
+ && null != gpuMetricResult.getAcceleratorDeviceData().getModelName()
+ && !gpuMetricResult.getAcceleratorDeviceData().getModelName().isEmpty()
+ && RecommendationUtils.checkIfModelIsKruizeSupportedMIG(gpuMetricResult.getAcceleratorDeviceData().getModelName())
+ ) {
+ String obtainedAcceleratorName = RecommendationUtils.getSupportedModelBasedOnModelName(gpuMetricResult.getAcceleratorDeviceData().getModelName());
+ if (null != obtainedAcceleratorName)
+ acceleratorModel = obtainedAcceleratorName;
+ }
+
+ MetricResults metricResults = gpuMetricResult.getMetricResults();
+
+ // Skip if metric results is null
+ if (null == metricResults || null == metricResults.getAggregationInfoResult())
+ continue;
+
+ MetricAggregationInfoResults aggregationInfo = metricResults.getAggregationInfoResult();
+
+ // Skip if max is null or zero or negative
+ if (null == aggregationInfo.getMax() || aggregationInfo.getMax() <= 0.0)
+ continue;
+
+ boolean isCoreUsage = gpuEntry.getKey() == AnalyzerConstants.MetricName.gpuCoreUsage;
+ boolean isMemoryUsage = gpuEntry.getKey() == AnalyzerConstants.MetricName.gpuMemoryUsage;
+
+ // Skip if it's none of the Accelerator metrics
+ if (!isCoreUsage && !isMemoryUsage)
+ continue;
+
+ if (isCoreUsage) {
+ acceleratorCoreMaxValues.add(aggregationInfo.getMax());
+ } else {
+ acceleratorMemoryMaxValues.add(aggregationInfo.getMax());
+ }
+ }
+ }
+
+ if (!isGpuWorkload) {
+ return null;
+ }
+
+ double coreAverage = CommonUtils.percentile(COST_ACCELERATOR_PERCENTILE, acceleratorCoreMaxValues);
+ double memoryAverage = CommonUtils.percentile(COST_ACCELERATOR_PERCENTILE, acceleratorMemoryMaxValues);
+
+ double coreFraction = coreAverage / 100;
+ double memoryFraction = memoryAverage / 100;
+
+ return RecommendationUtils.getMapWithOptimalProfile(acceleratorModel, coreFraction, memoryFraction);
+ }
+
public static JSONObject calculateNamespaceMemoryUsage(IntervalResults intervalResults) {
// create a JSON object which should be returned here having two values, Math.max and Collections.Min
JSONObject jsonObject = new JSONObject();
diff --git a/src/main/java/com/autotune/analyzer/recommendations/model/PerformanceBasedRecommendationModel.java b/src/main/java/com/autotune/analyzer/recommendations/model/PerformanceBasedRecommendationModel.java
index fcaccd344..0cd9eee41 100644
--- a/src/main/java/com/autotune/analyzer/recommendations/model/PerformanceBasedRecommendationModel.java
+++ b/src/main/java/com/autotune/analyzer/recommendations/model/PerformanceBasedRecommendationModel.java
@@ -3,8 +3,10 @@
import com.autotune.analyzer.recommendations.RecommendationConfigItem;
import com.autotune.analyzer.recommendations.RecommendationConstants;
import com.autotune.analyzer.recommendations.RecommendationNotification;
+import com.autotune.analyzer.recommendations.utils.RecommendationUtils;
import com.autotune.analyzer.services.UpdateRecommendations;
import com.autotune.analyzer.utils.AnalyzerConstants;
+import com.autotune.common.data.metrics.AcceleratorMetricResult;
import com.autotune.common.data.metrics.MetricAggregationInfoResults;
import com.autotune.common.data.metrics.MetricResults;
import com.autotune.common.data.result.IntervalResults;
@@ -19,8 +21,8 @@
import java.util.*;
import java.util.stream.Collectors;
-import static com.autotune.analyzer.recommendations.RecommendationConstants.RecommendationEngine.PercentileConstants.PERFORMANCE_CPU_PERCENTILE;
-import static com.autotune.analyzer.recommendations.RecommendationConstants.RecommendationEngine.PercentileConstants.PERFORMANCE_MEMORY_PERCENTILE;
+import static com.autotune.analyzer.recommendations.RecommendationConstants.RecommendationEngine.PercentileConstants.*;
+import static com.autotune.analyzer.recommendations.RecommendationConstants.RecommendationEngine.PercentileConstants.PERFORMANCE_ACCELERATOR_PERCENTILE;
import static com.autotune.analyzer.recommendations.RecommendationConstants.RecommendationValueConstants.*;
public class PerformanceBasedRecommendationModel implements RecommendationModel {
@@ -372,6 +374,77 @@ public RecommendationConfigItem getMemoryRequestRecommendationForNamespace(Map<Timestamp, IntervalResults> filteredResultsMap,
+ public Map<AnalyzerConstants.RecommendationItem, RecommendationConfigItem> getAcceleratorRequestRecommendation(Map<Timestamp, IntervalResults> filteredResultsMap, ArrayList<RecommendationNotification> notifications) {
+ List<Double> acceleratorCoreMaxValues = new ArrayList<>();
+ List<Double> acceleratorMemoryMaxValues = new ArrayList<>();
+
+ boolean isGpuWorkload = false;
+ String acceleratorModel = null;
+
+ for (Map.Entry<Timestamp, IntervalResults> entry : filteredResultsMap.entrySet()) {
+ IntervalResults intervalResults = entry.getValue();
+
+ // Skip if accelerator map is null
+ if (null == intervalResults.getAcceleratorMetricResultHashMap())
+ continue;
+
+ isGpuWorkload = true;
+ for (Map.Entry<AnalyzerConstants.MetricName, AcceleratorMetricResult> gpuEntry : intervalResults.getAcceleratorMetricResultHashMap().entrySet()) {
+ AcceleratorMetricResult gpuMetricResult = gpuEntry.getValue();
+
+ // Set Accelerator name
+ if (null == acceleratorModel
+ && null != gpuMetricResult.getAcceleratorDeviceData().getModelName()
+ && !gpuMetricResult.getAcceleratorDeviceData().getModelName().isEmpty()
+ && RecommendationUtils.checkIfModelIsKruizeSupportedMIG(gpuMetricResult.getAcceleratorDeviceData().getModelName())
+ ) {
+ String obtainedAcceleratorName = RecommendationUtils.getSupportedModelBasedOnModelName(gpuMetricResult.getAcceleratorDeviceData().getModelName());
+
+ if (null != obtainedAcceleratorName)
+ acceleratorModel = obtainedAcceleratorName;
+ }
+
+ MetricResults metricResults = gpuMetricResult.getMetricResults();
+
+ // Skip if metric results is null
+ if (null == metricResults || null == metricResults.getAggregationInfoResult())
+ continue;
+
+ MetricAggregationInfoResults aggregationInfo = metricResults.getAggregationInfoResult();
+
+ // Skip if max is null or zero or negative
+ if (null == aggregationInfo.getMax() || aggregationInfo.getMax() <= 0.0)
+ continue;
+
+ boolean isCoreUsage = gpuEntry.getKey() == AnalyzerConstants.MetricName.gpuCoreUsage;
+ boolean isMemoryUsage = gpuEntry.getKey() == AnalyzerConstants.MetricName.gpuMemoryUsage;
+
+ // Skip if it's none of the Accelerator metrics
+ if (!isCoreUsage && !isMemoryUsage)
+ continue;
+
+ if (isCoreUsage) {
+ acceleratorCoreMaxValues.add(aggregationInfo.getMax());
+ } else {
+ acceleratorMemoryMaxValues.add(aggregationInfo.getMax());
+ }
+ }
+ }
+
+ if (!isGpuWorkload) {
+ return null;
+ }
+
+ double coreAverage = CommonUtils.percentile(PERFORMANCE_ACCELERATOR_PERCENTILE, acceleratorCoreMaxValues);
+ double memoryAverage = CommonUtils.percentile(PERFORMANCE_ACCELERATOR_PERCENTILE, acceleratorMemoryMaxValues);
+
+ double coreFraction = coreAverage / 100;
+ double memoryFraction = memoryAverage / 100;
+
+ return RecommendationUtils.getMapWithOptimalProfile(acceleratorModel, coreFraction, memoryFraction);
+ }
+
@Override
public String getModelName() {
return this.name;
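Both the cost and the performance models follow the same accelerator flow: gather the per-interval max usage values, take the model's percentile over them, divide by 100 to get a core fraction and a memory fraction, and hand those fractions to `RecommendationUtils.getMapWithOptimalProfile`. A worked sketch of just the numeric part, assuming a simple nearest-rank percentile (Kruize's `CommonUtils.percentile` may interpolate differently):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class AcceleratorFractionDemo {
    // Nearest-rank percentile over a defensive copy of the data
    static double percentile(double p, List<Double> values) {
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        return sorted.get(Math.max(0, rank - 1));
    }

    public static void main(String[] args) {
        // Per-interval max core usage, in percent of the whole GPU
        List<Double> coreMaxValues = List.of(18.0, 22.5, 30.0, 27.5, 35.0);
        double coreP95 = percentile(95.0, coreMaxValues); // 35.0
        double coreFraction = coreP95 / 100;              // 0.35
        // A 0.35 core fraction would need a MIG profile covering at least
        // 3 of an A100's 7 compute slices (2/7 is about 0.29, too small),
        // subject to the memory fraction as well.
        System.out.printf("p95=%.1f%% -> fraction=%.2f%n", coreP95, coreFraction);
    }
}
```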
diff --git a/src/main/java/com/autotune/analyzer/recommendations/model/RecommendationModel.java b/src/main/java/com/autotune/analyzer/recommendations/model/RecommendationModel.java
index 5a905805b..923ac0d20 100644
--- a/src/main/java/com/autotune/analyzer/recommendations/model/RecommendationModel.java
+++ b/src/main/java/com/autotune/analyzer/recommendations/model/RecommendationModel.java
@@ -2,6 +2,7 @@
import com.autotune.analyzer.recommendations.RecommendationConfigItem;
import com.autotune.analyzer.recommendations.RecommendationNotification;
+import com.autotune.analyzer.utils.AnalyzerConstants;
import com.autotune.common.data.result.IntervalResults;
import java.sql.Timestamp;
@@ -17,6 +18,8 @@ public interface RecommendationModel {
// get namespace recommendations for Memory Request
RecommendationConfigItem getMemoryRequestRecommendationForNamespace(Map<Timestamp, IntervalResults> filteredResultsMap, ArrayList<RecommendationNotification> notifications);
+ Map<AnalyzerConstants.RecommendationItem, RecommendationConfigItem> getAcceleratorRequestRecommendation(Map<Timestamp, IntervalResults> filteredResultsMap, ArrayList<RecommendationNotification> notifications);
+
public String getModelName();
void validate();
diff --git a/src/main/java/com/autotune/analyzer/recommendations/utils/RecommendationUtils.java b/src/main/java/com/autotune/analyzer/recommendations/utils/RecommendationUtils.java
index 2deac4110..45085f33c 100644
--- a/src/main/java/com/autotune/analyzer/recommendations/utils/RecommendationUtils.java
+++ b/src/main/java/com/autotune/analyzer/recommendations/utils/RecommendationUtils.java
@@ -1,19 +1,39 @@
package com.autotune.analyzer.recommendations.utils;
+import com.autotune.analyzer.exceptions.FetchMetricsError;
import com.autotune.analyzer.recommendations.RecommendationConfigItem;
import com.autotune.analyzer.recommendations.RecommendationConstants;
-import com.autotune.analyzer.recommendations.RecommendationNotification;
+import com.autotune.analyzer.recommendations.term.Terms;
import com.autotune.analyzer.utils.AnalyzerConstants;
import com.autotune.common.data.metrics.MetricResults;
import com.autotune.common.data.result.ContainerData;
import com.autotune.common.data.result.IntervalResults;
+import com.autotune.common.data.system.info.device.ContainerDeviceList;
+import com.autotune.common.data.system.info.device.accelerator.AcceleratorDeviceData;
+import com.autotune.common.datasource.DataSourceInfo;
+import com.autotune.utils.GenericRestApiClient;
import com.autotune.utils.KruizeConstants;
+import com.google.gson.*;
+import org.json.JSONObject;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import com.autotune.common.data.system.info.device.accelerator.metadata.AcceleratorMetaDataService;
+import com.autotune.common.data.system.info.device.accelerator.metadata.AcceleratorProfile;
+import java.io.IOException;
+import java.net.URLEncoder;
+import java.security.KeyManagementException;
+import java.security.KeyStoreException;
+import java.security.NoSuchAlgorithmException;
import java.sql.Timestamp;
-import java.time.LocalDateTime;
+import java.text.ParseException;
+import java.text.SimpleDateFormat;
import java.util.*;
+import static com.autotune.analyzer.utils.AnalyzerConstants.ServiceConstants.CHARACTER_ENCODING;
+
public class RecommendationUtils {
+ private static final Logger LOGGER = LoggerFactory.getLogger(RecommendationUtils.class);
public static RecommendationConfigItem getCurrentValue(Map<Timestamp, IntervalResults> filteredResultsMap,
Timestamp timestampToExtract,
AnalyzerConstants.ResourceSetting resourceSetting,
@@ -28,15 +48,15 @@ public static RecommendationConfigItem getCurrentValue(Map<Timestamp, IntervalResults> filteredResultsMap,
+ public static void markAcceleratorDeviceStatusToContainer(ContainerData containerData,
+ String maxDateQuery,
+ String namespace,
+ String workload,
+ String workload_type,
+ DataSourceInfo dataSourceInfo,
+ Map<String, Terms> termsMap,
+ Double measurementDurationMinutesInDouble,
+ String gpuDetectionQuery)
+ throws IOException, NoSuchAlgorithmException, KeyStoreException,
+ KeyManagementException, ParseException, FetchMetricsError {
+
+ SimpleDateFormat sdf = new SimpleDateFormat(KruizeConstants.DateFormats.STANDARD_JSON_DATE_FORMAT, Locale.ROOT);
+ String containerName = containerData.getContainer_name();
+ String queryToEncode = null;
+ long interval_end_time_epoc = 0;
+ long interval_start_time_epoc = 0;
+
+ LOGGER.debug("maxDateQuery: {}", maxDateQuery);
+ queryToEncode = maxDateQuery
+ .replace(AnalyzerConstants.NAMESPACE_VARIABLE, namespace)
+ .replace(AnalyzerConstants.CONTAINER_VARIABLE, containerName)
+ .replace(AnalyzerConstants.WORKLOAD_VARIABLE, workload)
+ .replace(AnalyzerConstants.WORKLOAD_TYPE_VARIABLE, workload_type);
+
+ String dateMetricsUrl = String.format(KruizeConstants.DataSourceConstants.DATE_ENDPOINT_WITH_QUERY,
+ dataSourceInfo.getUrl(),
+ URLEncoder.encode(queryToEncode, CHARACTER_ENCODING)
+ );
+
+ LOGGER.debug(dateMetricsUrl);
+ GenericRestApiClient client = new GenericRestApiClient(dataSourceInfo);
+ client.setBaseURL(dateMetricsUrl);
+ JSONObject genericJsonObject = client.fetchMetricsJson(KruizeConstants.APIMessages.GET, "");
+ JsonObject jsonObject = new Gson().fromJson(genericJsonObject.toString(), JsonObject.class);
+ JsonArray resultArray = jsonObject.getAsJsonObject(KruizeConstants.JSONKeys.DATA).getAsJsonArray(KruizeConstants.DataSourceConstants.DataSourceQueryJSONKeys.RESULT);
+
+ if (null == resultArray || resultArray.isEmpty()) {
+ // Need to alert that the container's max duration was not detected
+ // Ignoring it here, as it is handled at the time of generating recommendations
+ return;
+ }
+
+ resultArray = resultArray.get(0)
+ .getAsJsonObject().getAsJsonArray(KruizeConstants.DataSourceConstants.DataSourceQueryJSONKeys.VALUE);
+ long epochTime = resultArray.get(0).getAsLong();
+ String timestamp = sdf.format(new Date(epochTime * KruizeConstants.TimeConv.NO_OF_MSECS_IN_SEC));
+ Date date = sdf.parse(timestamp);
+ Timestamp dateTS = new Timestamp(date.getTime());
+ interval_end_time_epoc = dateTS.getTime() / KruizeConstants.TimeConv.NO_OF_MSECS_IN_SEC
+ - ((long) dateTS.getTimezoneOffset() * KruizeConstants.TimeConv.NO_OF_SECONDS_PER_MINUTE);
+ int maxDay = Terms.getMaxDays(termsMap);
+ LOGGER.debug(KruizeConstants.APIMessages.MAX_DAY, maxDay);
+ Timestamp startDateTS = Timestamp.valueOf(Objects.requireNonNull(dateTS).toLocalDateTime().minusDays(maxDay));
+ interval_start_time_epoc = startDateTS.getTime() / KruizeConstants.TimeConv.NO_OF_MSECS_IN_SEC
+ - ((long) startDateTS.getTimezoneOffset() * KruizeConstants.TimeConv.NO_OF_SECONDS_PER_MINUTE);
+
+ gpuDetectionQuery = gpuDetectionQuery.replace(AnalyzerConstants.NAMESPACE_VARIABLE, namespace)
+ .replace(AnalyzerConstants.CONTAINER_VARIABLE, containerName)
+ .replace(AnalyzerConstants.MEASUREMENT_DURATION_IN_MIN_VARAIBLE, Integer.toString(measurementDurationMinutesInDouble.intValue()))
+ .replace(AnalyzerConstants.WORKLOAD_VARIABLE, workload)
+ .replace(AnalyzerConstants.WORKLOAD_TYPE_VARIABLE, workload_type);
+
+ String podMetricsUrl;
+ try {
+ podMetricsUrl = String.format(KruizeConstants.DataSourceConstants.DATASOURCE_ENDPOINT_WITH_QUERY,
+ dataSourceInfo.getUrl(),
+ URLEncoder.encode(gpuDetectionQuery, CHARACTER_ENCODING),
+ interval_start_time_epoc,
+ interval_end_time_epoc,
+ measurementDurationMinutesInDouble.intValue() * KruizeConstants.TimeConv.NO_OF_SECONDS_PER_MINUTE);
+ LOGGER.debug(podMetricsUrl);
+ client.setBaseURL(podMetricsUrl);
+ genericJsonObject = client.fetchMetricsJson(KruizeConstants.APIMessages.GET, "");
+
+ jsonObject = new Gson().fromJson(genericJsonObject.toString(), JsonObject.class);
+ resultArray = jsonObject.getAsJsonObject(KruizeConstants.JSONKeys.DATA).getAsJsonArray(KruizeConstants.DataSourceConstants.DataSourceQueryJSONKeys.RESULT);
+
+ if (null != resultArray && !resultArray.isEmpty()) {
+ for (JsonElement result : resultArray) {
+ JsonObject resultObject = result.getAsJsonObject();
+ JsonArray valuesArray = resultObject.getAsJsonArray(KruizeConstants.DataSourceConstants
+ .DataSourceQueryJSONKeys.VALUES);
+
+ for (JsonElement element : valuesArray) {
+ JsonArray valueArray = element.getAsJsonArray();
+ double value = valueArray.get(1).getAsDouble();
+ // TODO: Check for non-zero values to mark as GPU workload
+ break;
+ }
+
+ JsonObject metricObject = resultObject.getAsJsonObject(KruizeConstants.JSONKeys.METRIC);
+ String modelName = metricObject.get(KruizeConstants.JSONKeys.MODEL_NAME).getAsString();
+ if (null == modelName)
+ continue;
+
+ boolean isSupportedMig = checkIfModelIsKruizeSupportedMIG(modelName);
+ if (isSupportedMig) {
+ AcceleratorDeviceData acceleratorDeviceData = new AcceleratorDeviceData(metricObject.get(KruizeConstants.JSONKeys.MODEL_NAME).getAsString(),
+ metricObject.get(KruizeConstants.JSONKeys.HOSTNAME).getAsString(),
+ metricObject.get(KruizeConstants.JSONKeys.UUID).getAsString(),
+ metricObject.get(KruizeConstants.JSONKeys.DEVICE).getAsString(),
+ isSupportedMig);
+
+
+ if (null == containerData.getContainerDeviceList()) {
+ ContainerDeviceList containerDeviceList = new ContainerDeviceList();
+ containerData.setContainerDeviceList(containerDeviceList);
+ }
+ containerData.getContainerDeviceList().addDevice(AnalyzerConstants.DeviceType.ACCELERATOR, acceleratorDeviceData);
+ // TODO: Currently we consider only the first MIG-supported GPU
+ return;
+ }
+ }
+ }
+ } catch (IOException | NoSuchAlgorithmException | KeyStoreException | KeyManagementException |
+ JsonSyntaxException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ public static boolean checkIfModelIsKruizeSupportedMIG(String modelName) {
+ if (null == modelName || modelName.isEmpty())
+ return false;
+
+ modelName = modelName.toUpperCase();
+
+ boolean A100_CHECK = (modelName.contains("A100") &&
+ (modelName.contains("40GB") || modelName.contains("80GB")));
+
+ boolean H100_CHECK = false;
+ if (!A100_CHECK) {
+ H100_CHECK = (modelName.contains("H100") && modelName.contains("80GB"));
+ }
+
+ return A100_CHECK || H100_CHECK;
+ }
+
+ public static Timestamp getNearestTimestamp(HashMap<Timestamp, IntervalResults> containerDataResults, Timestamp targetTime, int minutesRange) {
+ long rangeInMillis = (long) minutesRange * 60 * 1000;
+ long targetTimeMillis = targetTime.getTime();
+
+ Timestamp nearestTimestamp = null;
+ long nearestDistance = Long.MAX_VALUE;
+
+ for (Map.Entry<Timestamp, IntervalResults> entry : containerDataResults.entrySet()) {
+ Timestamp currentTimestamp = entry.getKey();
+ long currentTimeMillis = currentTimestamp.getTime();
+ long distance = Math.abs(targetTimeMillis - currentTimeMillis);
+
+ if (distance <= rangeInMillis && distance < nearestDistance) {
+ nearestDistance = distance;
+ nearestTimestamp = currentTimestamp;
+ }
+ }
+
+ return nearestTimestamp;
+ }
+
+ public static HashMap<AnalyzerConstants.RecommendationItem, RecommendationConfigItem> getMapWithOptimalProfile(
+ String acceleratorModel,
+ Double coreFraction,
+ Double memoryFraction
+ ) {
+ if (null == acceleratorModel || null == coreFraction || null == memoryFraction)
+ return null;
+
+ HashMap<AnalyzerConstants.RecommendationItem, RecommendationConfigItem> returnMap = new HashMap<>();
+
+ AcceleratorMetaDataService gpuMetaDataService = AcceleratorMetaDataService.getInstance();
+ AcceleratorProfile acceleratorProfile = gpuMetaDataService.getAcceleratorProfile(acceleratorModel, coreFraction, memoryFraction);
+ RecommendationConfigItem recommendationConfigItem = new RecommendationConfigItem(1.0, "cores");
+
+ if (acceleratorProfile.getProfileName().equalsIgnoreCase(AnalyzerConstants.AcceleratorConstants.AcceleratorProfiles.PROFILE_1G_5GB)) {
+ returnMap.put(AnalyzerConstants.RecommendationItem.NVIDIA_GPU_PARTITION_1_CORE_5GB, recommendationConfigItem);
+ } else if (acceleratorProfile.getProfileName().equalsIgnoreCase(AnalyzerConstants.AcceleratorConstants.AcceleratorProfiles.PROFILE_1G_10GB)) {
+ returnMap.put(AnalyzerConstants.RecommendationItem.NVIDIA_GPU_PARTITION_1_CORE_10GB, recommendationConfigItem);
+ } else if (acceleratorProfile.getProfileName().equalsIgnoreCase(AnalyzerConstants.AcceleratorConstants.AcceleratorProfiles.PROFILE_1G_20GB)) {
+ returnMap.put(AnalyzerConstants.RecommendationItem.NVIDIA_GPU_PARTITION_1_CORE_20GB, recommendationConfigItem);
+ } else if (acceleratorProfile.getProfileName().equalsIgnoreCase(AnalyzerConstants.AcceleratorConstants.AcceleratorProfiles.PROFILE_2G_10GB)) {
+ returnMap.put(AnalyzerConstants.RecommendationItem.NVIDIA_GPU_PARTITION_2_CORES_10GB, recommendationConfigItem);
+ } else if (acceleratorProfile.getProfileName().equalsIgnoreCase(AnalyzerConstants.AcceleratorConstants.AcceleratorProfiles.PROFILE_2G_20GB)) {
+ returnMap.put(AnalyzerConstants.RecommendationItem.NVIDIA_GPU_PARTITION_2_CORES_20GB, recommendationConfigItem);
+ } else if (acceleratorProfile.getProfileName().equalsIgnoreCase(AnalyzerConstants.AcceleratorConstants.AcceleratorProfiles.PROFILE_3G_20GB)) {
+ returnMap.put(AnalyzerConstants.RecommendationItem.NVIDIA_GPU_PARTITION_3_CORES_20GB, recommendationConfigItem);
+ } else if (acceleratorProfile.getProfileName().equalsIgnoreCase(AnalyzerConstants.AcceleratorConstants.AcceleratorProfiles.PROFILE_3G_40GB)) {
+ returnMap.put(AnalyzerConstants.RecommendationItem.NVIDIA_GPU_PARTITION_3_CORES_40GB, recommendationConfigItem);
+ } else if (acceleratorProfile.getProfileName().equalsIgnoreCase(AnalyzerConstants.AcceleratorConstants.AcceleratorProfiles.PROFILE_4G_20GB)) {
+ returnMap.put(AnalyzerConstants.RecommendationItem.NVIDIA_GPU_PARTITION_4_CORES_20GB, recommendationConfigItem);
+ } else if (acceleratorProfile.getProfileName().equalsIgnoreCase(AnalyzerConstants.AcceleratorConstants.AcceleratorProfiles.PROFILE_4G_40GB)) {
+ returnMap.put(AnalyzerConstants.RecommendationItem.NVIDIA_GPU_PARTITION_4_CORES_40GB, recommendationConfigItem);
+ } else if (acceleratorProfile.getProfileName().equalsIgnoreCase(AnalyzerConstants.AcceleratorConstants.AcceleratorProfiles.PROFILE_7G_40GB)) {
+ returnMap.put(AnalyzerConstants.RecommendationItem.NVIDIA_GPU_PARTITION_7_CORES_40GB, recommendationConfigItem);
+ } else if (acceleratorProfile.getProfileName().equalsIgnoreCase(AnalyzerConstants.AcceleratorConstants.AcceleratorProfiles.PROFILE_7G_80GB)) {
+ returnMap.put(AnalyzerConstants.RecommendationItem.NVIDIA_GPU_PARTITION_7_CORES_80GB, recommendationConfigItem);
+ }
+ return returnMap;
+ }
+
+ public static String getSupportedModelBasedOnModelName(String modelName) {
+ if (null == modelName || modelName.isEmpty())
+ return null;
+
+ modelName = modelName.toUpperCase();
+
+ if (modelName.contains("A100") && modelName.contains("40GB"))
+ return AnalyzerConstants.AcceleratorConstants.SupportedAccelerators.A100_40_GB;
+
+ if (modelName.contains("A100") && modelName.contains("80GB"))
+ return AnalyzerConstants.AcceleratorConstants.SupportedAccelerators.A100_80_GB;
+
+ if (modelName.contains("H100") && modelName.contains("80GB"))
+ return AnalyzerConstants.AcceleratorConstants.SupportedAccelerators.H100_80_GB;
+
+ return null;
+ }
}
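`getNearestTimestamp` exists because accelerator samples rarely line up exactly with the CPU/memory interval boundaries already present in `containerDataResults`: each GPU sample is snapped to the closest known end time within a tolerance window, and dropped when nothing is close enough. A standalone sketch of that alignment logic, with hypothetical timestamps:

```java
import java.sql.Timestamp;
import java.util.Set;

public class NearestTimestampDemo {
    static Timestamp nearest(Set<Timestamp> knownEndTimes, Timestamp target, int minutesRange) {
        long tolerance = (long) minutesRange * 60 * 1000;
        Timestamp best = null;
        long bestDistance = Long.MAX_VALUE;
        for (Timestamp candidate : knownEndTimes) {
            long distance = Math.abs(target.getTime() - candidate.getTime());
            // Keep the closest candidate inside the tolerance window
            if (distance <= tolerance && distance < bestDistance) {
                bestDistance = distance;
                best = candidate;
            }
        }
        return best; // null when no interval end time is close enough
    }

    public static void main(String[] args) {
        Set<Timestamp> cpuIntervals = Set.of(
                Timestamp.valueOf("2024-10-10 06:15:00"),
                Timestamp.valueOf("2024-10-10 06:30:00"));
        Timestamp gpuSample = Timestamp.valueOf("2024-10-10 06:29:10");
        // Snaps to 06:30:00 with a 5-minute window; a sample at 06:05:00 would return null
        System.out.println(nearest(cpuIntervals, gpuSample, 5));
    }
}
```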
diff --git a/src/main/java/com/autotune/analyzer/serviceObjects/BulkInput.java b/src/main/java/com/autotune/analyzer/serviceObjects/BulkInput.java
new file mode 100644
index 000000000..e5e31d40d
--- /dev/null
+++ b/src/main/java/com/autotune/analyzer/serviceObjects/BulkInput.java
@@ -0,0 +1,139 @@
+/*******************************************************************************
+ * Copyright (c) 2022 Red Hat, IBM Corporation and others.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *******************************************************************************/
+package com.autotune.analyzer.serviceObjects;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Request payload object for Bulk Api service
+ */
+public class BulkInput {
+ private FilterWrapper filter;
+ private TimeRange time_range;
+ private String datasource;
+
+ // Getters and Setters
+
+ public TimeRange getTime_range() {
+ return time_range;
+ }
+
+ public void setTime_range(TimeRange time_range) {
+ this.time_range = time_range;
+ }
+
+ public String getDatasource() {
+ return datasource;
+ }
+
+ public void setDatasource(String datasource) {
+ this.datasource = datasource;
+ }
+
+ public FilterWrapper getFilter() {
+ return filter;
+ }
+
+ public void setFilter(FilterWrapper filter) {
+ this.filter = filter;
+ }
+
+ // Nested class for FilterWrapper that contains 'exclude' and 'include'
+ public static class FilterWrapper {
+ private Filter exclude;
+ private Filter include;
+
+ // Getters and Setters
+ public Filter getExclude() {
+ return exclude;
+ }
+
+ public void setExclude(Filter exclude) {
+ this.exclude = exclude;
+ }
+
+ public Filter getInclude() {
+ return include;
+ }
+
+ public void setInclude(Filter include) {
+ this.include = include;
+ }
+ }
+
+ public static class Filter {
+ private List<String> namespace;
+ private List<String> workload;
+ private List<String> containers;
+ private Map<String, String> labels;
+
+ // Getters and Setters
+ public List<String> getNamespace() {
+ return namespace;
+ }
+
+ public void setNamespace(List<String> namespace) {
+ this.namespace = namespace;
+ }
+
+ public List<String> getWorkload() {
+ return workload;
+ }
+
+ public void setWorkload(List<String> workload) {
+ this.workload = workload;
+ }
+
+ public List<String> getContainers() {
+ return containers;
+ }
+
+ public void setContainers(List<String> containers) {
+ this.containers = containers;
+ }
+
+ public Map<String, String> getLabels() {
+ return labels;
+ }
+
+ public void setLabels(Map<String, String> labels) {
+ this.labels = labels;
+ }
+ }
+
+ public static class TimeRange {
+ private String start;
+ private String end;
+
+ // Getters and Setters
+ public String getStart() {
+ return start;
+ }
+
+ public void setStart(String start) {
+ this.start = start;
+ }
+
+ public String getEnd() {
+ return end;
+ }
+
+ public void setEnd(String end) {
+ this.end = end;
+ }
+ }
+}
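`BulkInput` is a plain Jackson binding target: its field names mirror the JSON keys of the request payload. A minimal round-trip, assuming `jackson-databind` on the classpath and the class above in the same package; the payload string here is a trimmed-down illustration, not a canonical request:

```java
import com.fasterxml.jackson.databind.ObjectMapper;

public class BulkInputBindingDemo {
    public static void main(String[] args) throws Exception {
        // Trimmed-down payload: only a datasource and one include filter
        String json = "{ \"datasource\": \"Cbank1Xyz\","
                + " \"filter\": { \"include\": { \"namespace\": [\"default\"] } } }";

        ObjectMapper mapper = new ObjectMapper();
        BulkInput payload = mapper.readValue(json, BulkInput.class);

        System.out.println(payload.getDatasource());                         // Cbank1Xyz
        System.out.println(payload.getFilter().getInclude().getNamespace()); // [default]
    }
}
```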
diff --git a/src/main/java/com/autotune/analyzer/serviceObjects/BulkJobStatus.java b/src/main/java/com/autotune/analyzer/serviceObjects/BulkJobStatus.java
new file mode 100644
index 000000000..d45f37774
--- /dev/null
+++ b/src/main/java/com/autotune/analyzer/serviceObjects/BulkJobStatus.java
@@ -0,0 +1,293 @@
+/*******************************************************************************
+ * Copyright (c) 2022 Red Hat, IBM Corporation and others.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *******************************************************************************/
+package com.autotune.analyzer.serviceObjects;
+
+import com.fasterxml.jackson.annotation.JsonFilter;
+import com.fasterxml.jackson.annotation.JsonProperty;
+
+import java.time.Instant;
+import java.time.ZoneOffset;
+import java.time.format.DateTimeFormatter;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+
+import static com.autotune.utils.KruizeConstants.KRUIZE_BULK_API.JOB_ID;
+
+/**
+ * Bulk API Response payload Object.
+ */
+@JsonFilter("jobFilter")
+public class BulkJobStatus {
+ @JsonProperty(JOB_ID)
+ private String jobID;
+ private String status;
+ private int total_experiments;
+ private int processed_experiments;
+ private Data data;
+ @JsonProperty("job_start_time")
+ private String startTime; // Change to String to store formatted time
+ @JsonProperty("job_end_time")
+ private String endTime; // Change to String to store formatted time
+ private String message;
+
+ public BulkJobStatus(String jobID, String status, Data data, Instant startTime) {
+ this.jobID = jobID;
+ this.status = status;
+ this.data = data;
+ setStartTime(startTime);
+ }
+
+ public String getJobID() {
+ return jobID;
+ }
+
+ public String getStatus() {
+ return status;
+ }
+
+ public void setStatus(String status) {
+ this.status = status;
+ }
+
+ public String getStartTime() {
+ return startTime;
+ }
+
+ public void setStartTime(Instant startTime) {
+ this.startTime = formatInstantAsUTCString(startTime);
+ }
+
+ public void setStartTime(String startTime) {
+ this.startTime = startTime;
+ }
+
+ public String getEndTime() {
+ return endTime;
+ }
+
+ public void setEndTime(Instant endTime) {
+ this.endTime = formatInstantAsUTCString(endTime);
+ }
+
+ public void setEndTime(String endTime) {
+ this.endTime = endTime;
+ }
+
+ public int getTotal_experiments() {
+ return total_experiments;
+ }
+
+ public void setTotal_experiments(int total_experiments) {
+ this.total_experiments = total_experiments;
+ }
+
+ public int getProcessed_experiments() {
+ return processed_experiments;
+ }
+
+ public void setProcessed_experiments(int processed_experiments) {
+ this.processed_experiments = processed_experiments;
+ }
+
+ public Data getData() {
+ return data;
+ }
+
+ public void setData(Data data) {
+ this.data = data;
+ }
+
+ // Utility function to format Instant into the required UTC format
+ private String formatInstantAsUTCString(Instant instant) {
+ DateTimeFormatter formatter = DateTimeFormatter
+ .ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")
+ .withZone(ZoneOffset.UTC); // Ensure it's in UTC
+
+ return formatter.format(instant);
+ }
+
+ public String getMessage() {
+ return message;
+ }
+
+ public void setMessage(String message) {
+ this.message = message;
+ }
+
+ // Inner class for the data field
+ public static class Data {
+ private Experiments experiments;
+ private Recommendations recommendations;
+
+ public Data(Experiments experiments, Recommendations recommendations) {
+ this.experiments = experiments;
+ this.recommendations = recommendations;
+ }
+
+ public Experiments getExperiments() {
+ return experiments;
+ }
+
+ public void setExperiments(Experiments experiments) {
+ this.experiments = experiments;
+ }
+
+ public Recommendations getRecommendations() {
+ return recommendations;
+ }
+
+ public void setRecommendations(Recommendations recommendations) {
+ this.recommendations = recommendations;
+ }
+ }
+
+ // Inner class for experiments
+ public static class Experiments {
+ @JsonProperty("new")
+ private List<String> newExperiments;
+ @JsonProperty("updated")
+ private List<String> updatedExperiments;
+ @JsonProperty("failed")
+ private List<String> failedExperiments;
+
+ public Experiments(List<String> newExperiments, List<String> updatedExperiments) {
+ this.newExperiments = newExperiments;
+ this.updatedExperiments = updatedExperiments;
+ }
+
+ public List<String> getNewExperiments() {
+ return newExperiments;
+ }
+
+ public void setNewExperiments(List<String> newExperiments) {
+ this.newExperiments = newExperiments;
+ }
+
+ public List<String> getUpdatedExperiments() {
+ return updatedExperiments;
+ }
+
+ public void setUpdatedExperiments(List<String> updatedExperiments) {
+ this.updatedExperiments = updatedExperiments;
+ }
+ }
+
+ // Inner class for recommendations
+ public static class Recommendations {
+ private RecommendationData data;
+
+ public Recommendations(RecommendationData data) {
+ this.data = data;
+ }
+
+ public RecommendationData getData() {
+ return data;
+ }
+
+ public void setData(RecommendationData data) {
+ this.data = data;
+ }
+ }
+
+ // Inner class for recommendation data
+ public static class RecommendationData {
+ private List<String> processed = Collections.synchronizedList(new ArrayList<>());
+ private List<String> processing = Collections.synchronizedList(new ArrayList<>());
+ private List<String> unprocessed = Collections.synchronizedList(new ArrayList<>());
+ private List<String> failed = Collections.synchronizedList(new ArrayList<>());
+
+ public RecommendationData(List<String> processed, List<String> processing, List<String> unprocessed, List<String> failed) {
+ this.processed = processed;
+ this.processing = processing;
+ this.unprocessed = unprocessed;
+ this.failed = failed;
+ }
+
+ public List<String> getProcessed() {
+ return processed;
+ }
+
+ public synchronized void setProcessed(List<String> processed) {
+ this.processed = processed;
+ }
+
+ public List<String> getProcessing() {
+ return processing;
+ }
+
+ public synchronized void setProcessing(List<String> processing) {
+ this.processing = processing;
+ }
+
+ public List<String> getUnprocessed() {
+ return unprocessed;
+ }
+
+ public synchronized void setUnprocessed(List<String> unprocessed) {
+ this.unprocessed = unprocessed;
+ }
+
+ public List<String> getFailed() {
+ return failed;
+ }
+
+ public synchronized void setFailed(List<String> failed) {
+ this.failed = failed;
+ }
+
+ // Move an element from unprocessed to processing
+ public synchronized void moveToProgress(String element) {
+ if (unprocessed.contains(element)) {
+ unprocessed.remove(element);
+ if (!processing.contains(element)) {
+ processing.add(element);
+ }
+ }
+ }
+
+ // Move an element from processing to processed
+ public synchronized void moveToCompleted(String element) {
+ if (processing.contains(element)) {
+ processing.remove(element);
+ if (!processed.contains(element)) {
+ processed.add(element);
+ }
+ }
+ }
+
+ // Move an element from processing to failed
+ public synchronized void moveToFailed(String element) {
+ if (processing.contains(element)) {
+ processing.remove(element);
+ if (!failed.contains(element)) {
+ failed.add(element);
+ }
+ }
+ }
+
+ // Calculate the percentage of completion
+ public int completionPercentage() {
+ int totalTasks = processed.size() + processing.size() + unprocessed.size() + failed.size();
+ if (totalTasks == 0) {
+ return 0;
+ }
+ return (int) ((processed.size() * 100.0) / totalTasks);
+ }
+
+
+ }
+}
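The four `RecommendationData` lists act as a small state machine: an experiment name moves from `unprocessed` to `processing` and then to either `processed` or `failed`, and `completionPercentage()` is the processed share of all four lists combined. A short walkthrough against the API defined above, assuming it runs alongside the class in the same package:

```java
import java.util.ArrayList;
import java.util.Collections;

public class JobLifecycleDemo {
    public static void main(String[] args) {
        BulkJobStatus.RecommendationData data = new BulkJobStatus.RecommendationData(
                Collections.synchronizedList(new ArrayList<>()),   // processed
                Collections.synchronizedList(new ArrayList<>()),   // processing
                Collections.synchronizedList(new ArrayList<>()),   // unprocessed
                Collections.synchronizedList(new ArrayList<>()));  // failed

        data.getUnprocessed().add("exp-a");
        data.getUnprocessed().add("exp-b");

        data.moveToProgress("exp-a");   // unprocessed -> processing
        data.moveToCompleted("exp-a");  // processing  -> processed

        // 1 processed out of 2 total tasks -> prints 50
        System.out.println(data.completionPercentage());
    }
}
```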
diff --git a/src/main/java/com/autotune/analyzer/serviceObjects/KubernetesAPIObject.java b/src/main/java/com/autotune/analyzer/serviceObjects/KubernetesAPIObject.java
index 0a6d52ecf..d24cc3638 100644
--- a/src/main/java/com/autotune/analyzer/serviceObjects/KubernetesAPIObject.java
+++ b/src/main/java/com/autotune/analyzer/serviceObjects/KubernetesAPIObject.java
@@ -49,14 +49,26 @@ public String getType() {
return type;
}
+ public void setType(String type) {
+ this.type = type;
+ }
+
public String getName() {
return name;
}
+ public void setName(String name) {
+ this.name = name;
+ }
+
public String getNamespace() {
return namespace;
}
+ public void setNamespace(String namespace) {
+ this.namespace = namespace;
+ }
+
@JsonProperty(KruizeConstants.JSONKeys.CONTAINERS)
public List getContainerAPIObjects() {
return containerAPIObjects;
diff --git a/src/main/java/com/autotune/analyzer/services/BulkService.java b/src/main/java/com/autotune/analyzer/services/BulkService.java
new file mode 100644
index 000000000..1f7e3debf
--- /dev/null
+++ b/src/main/java/com/autotune/analyzer/services/BulkService.java
@@ -0,0 +1,159 @@
+/*******************************************************************************
+ * Copyright (c) 2022 Red Hat, IBM Corporation and others.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *******************************************************************************/
+package com.autotune.analyzer.services;
+
+import com.autotune.analyzer.serviceObjects.BulkInput;
+import com.autotune.analyzer.serviceObjects.BulkJobStatus;
+import com.autotune.analyzer.workerimpl.BulkJobManager;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.ser.impl.SimpleBeanPropertyFilter;
+import com.fasterxml.jackson.databind.ser.impl.SimpleFilterProvider;
+import org.json.JSONObject;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.servlet.ServletConfig;
+import javax.servlet.ServletException;
+import javax.servlet.annotation.WebServlet;
+import javax.servlet.http.HttpServlet;
+import javax.servlet.http.HttpServletRequest;
+import javax.servlet.http.HttpServletResponse;
+import java.io.IOException;
+import java.time.Instant;
+import java.util.ArrayList;
+import java.util.Map;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+
+import static com.autotune.analyzer.utils.AnalyzerConstants.ServiceConstants.*;
+import static com.autotune.utils.KruizeConstants.KRUIZE_BULK_API.*;
+
+/**
+ * Servlet for the Bulk API: accepts a bulk request payload (POST) and returns a job_id,
+ * and reports the status of a submitted job (GET).
+ */
+@WebServlet(asyncSupported = true)
+public class BulkService extends HttpServlet {
+ private static final long serialVersionUID = 1L;
+ private static final Logger LOGGER = LoggerFactory.getLogger(BulkService.class);
+ private ExecutorService executorService = Executors.newFixedThreadPool(10);
+ private Map<String, BulkJobStatus> jobStatusMap = new ConcurrentHashMap<>();
+
+ @Override
+ public void init(ServletConfig config) throws ServletException {
+ super.init(config);
+ }
+
+ /**
+ * Returns the status of a bulk job; the "data" section is included only when verbose=true.
+ *
+ * @param req HTTP request carrying job_id and the optional verbose flag
+ * @param resp HTTP response carrying the serialized job status
+ * @throws ServletException
+ * @throws IOException
+ */
+ @Override
+ protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
+ String jobID = req.getParameter(JOB_ID);
+ String verboseParam = req.getParameter(VERBOSE);
+ // If the parameter is not provided (null), default it to false
+ boolean verbose = verboseParam != null && Boolean.parseBoolean(verboseParam);
+ BulkJobStatus jobDetails = jobStatusMap.get(jobID);
+ resp.setContentType(JSON_CONTENT_TYPE);
+ resp.setCharacterEncoding(CHARACTER_ENCODING);
+ SimpleFilterProvider filters = new SimpleFilterProvider();
+
+ if (jobDetails == null) {
+ sendErrorResponse(
+ resp,
+ null,
+ HttpServletResponse.SC_NOT_FOUND,
+ JOB_NOT_FOUND_MSG
+ );
+ } else {
+ try {
+ resp.setStatus(HttpServletResponse.SC_OK);
+ // Return the JSON representation of the JobStatus object
+ ObjectMapper objectMapper = new ObjectMapper();
+ if (!verbose) {
+ filters.addFilter("jobFilter", SimpleBeanPropertyFilter.serializeAllExcept("data"));
+ } else {
+ filters.addFilter("jobFilter", SimpleBeanPropertyFilter.serializeAll());
+ }
+ objectMapper.setFilterProvider(filters);
+ String jsonResponse = objectMapper.writeValueAsString(jobDetails);
+ resp.getWriter().write(jsonResponse);
+ } catch (Exception e) {
+ e.printStackTrace();
+ }
+ }
+ }
+
+ /**
+ * Accepts a bulk request payload, registers a new job, and returns the generated job_id.
+ *
+ * @param request HTTP request carrying the BulkInput payload
+ * @param response HTTP response carrying the job_id
+ * @throws ServletException
+ * @throws IOException
+ */
+ @Override
+ protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
+ // Set response type
+ response.setContentType(JSON_CONTENT_TYPE);
+ response.setCharacterEncoding(CHARACTER_ENCODING);
+
+ // Create ObjectMapper instance
+ ObjectMapper objectMapper = new ObjectMapper();
+
+ // Read the request payload and map to RequestPayload class
+ BulkInput payload = objectMapper.readValue(request.getInputStream(), BulkInput.class);
+
+ // Generate a unique jobID
+ String jobID = UUID.randomUUID().toString();
+ BulkJobStatus.Data data = new BulkJobStatus.Data(
+ new BulkJobStatus.Experiments(new ArrayList<>(), new ArrayList<>()),
+ new BulkJobStatus.Recommendations(new BulkJobStatus.RecommendationData(
+ new ArrayList<>(),
+ new ArrayList<>(),
+ new ArrayList<>(),
+ new ArrayList<>()
+ ))
+ );
+ jobStatusMap.put(jobID, new BulkJobStatus(jobID, IN_PROGRESS, data, Instant.now()));
+ // Submit the job to be processed asynchronously
+ executorService.submit(new BulkJobManager(jobID, jobStatusMap, payload));
+
+ // Just sending a simple success response back
+ // Return the jobID to the user
+ JSONObject jsonObject = new JSONObject();
+ jsonObject.put(JOB_ID, jobID);
+ response.getWriter().write(jsonObject.toString());
+ }
+
+
+ @Override
+ public void destroy() {
+ executorService.shutdown();
+ }
+
+ public void sendErrorResponse(HttpServletResponse response, Exception e, int httpStatusCode, String errorMsg) throws
+ IOException {
+ if (null != e) {
+ LOGGER.error(e.toString());
+ e.printStackTrace();
+ if (null == errorMsg) errorMsg = e.getMessage();
+ }
+ response.sendError(httpStatusCode, errorMsg);
+ }
+}
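The `verbose` handling in `doGet` relies on the `@JsonFilter("jobFilter")` annotation on `BulkJobStatus`: a `SimpleFilterProvider` either strips the `data` property or serializes everything. A focused sketch of just that serialization step, assuming the classes above are on the classpath; the field values are placeholders:

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ser.impl.SimpleBeanPropertyFilter;
import com.fasterxml.jackson.databind.ser.impl.SimpleFilterProvider;

import java.time.Instant;

public class VerboseFilterDemo {
    public static void main(String[] args) throws Exception {
        BulkJobStatus job = new BulkJobStatus("job-1", "IN_PROGRESS", null, Instant.now());

        ObjectMapper mapper = new ObjectMapper();
        SimpleFilterProvider filters = new SimpleFilterProvider();
        // verbose=false path: everything except the "data" section
        filters.addFilter("jobFilter", SimpleBeanPropertyFilter.serializeAllExcept("data"));
        mapper.setFilterProvider(filters);

        // "data" is omitted, matching the non-verbose GET /bulk response shape
        System.out.println(mapper.writeValueAsString(job));
    }
}
```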
diff --git a/src/main/java/com/autotune/analyzer/services/DSMetadataService.java b/src/main/java/com/autotune/analyzer/services/DSMetadataService.java
index 904ada6ad..4f786b419 100644
--- a/src/main/java/com/autotune/analyzer/services/DSMetadataService.java
+++ b/src/main/java/com/autotune/analyzer/services/DSMetadataService.java
@@ -16,6 +16,8 @@
package com.autotune.analyzer.services;
+import com.autotune.analyzer.adapters.DeviceDetailsAdapter;
+import com.autotune.analyzer.adapters.RecommendationItemAdapter;
import com.autotune.analyzer.exceptions.KruizeResponse;
import com.autotune.analyzer.serviceObjects.DSMetadataAPIObject;
import com.autotune.analyzer.utils.AnalyzerConstants;
@@ -23,6 +25,7 @@
import com.autotune.analyzer.utils.GsonUTCDateAdapter;
import com.autotune.common.data.ValidationOutputData;
import com.autotune.common.data.dataSourceMetadata.DataSourceMetadataInfo;
+import com.autotune.common.data.system.info.device.DeviceDetails;
import com.autotune.common.datasource.DataSourceInfo;
import com.autotune.common.datasource.DataSourceManager;
import com.autotune.common.datasource.DataSourceMetadataValidation;
@@ -130,7 +133,7 @@ protected void doPost(HttpServletRequest request, HttpServletResponse response)
return;
}
- DataSourceMetadataInfo metadataInfo = dataSourceManager.importMetadataFromDataSource(datasource);
+ DataSourceMetadataInfo metadataInfo = dataSourceManager.importMetadataFromDataSource(datasource, "", 0, 0, 0);
// Validate imported metadataInfo object
DataSourceMetadataValidation validationObject = new DataSourceMetadataValidation();
@@ -240,6 +243,7 @@ private void sendSuccessResponse(HttpServletResponse response, DataSourceMetadat
.setPrettyPrinting()
.enableComplexMapKeySerialization()
.registerTypeAdapter(Date.class, new GsonUTCDateAdapter())
+ .registerTypeAdapter(AnalyzerConstants.RecommendationItem.class, new RecommendationItemAdapter())
.create();
gsonStr = gsonObj.toJson(dataSourceMetadata);
}
@@ -416,6 +420,8 @@ private Gson createGsonObject() {
.setPrettyPrinting()
.enableComplexMapKeySerialization()
.registerTypeAdapter(Date.class, new GsonUTCDateAdapter())
+ .registerTypeAdapter(AnalyzerConstants.RecommendationItem.class, new RecommendationItemAdapter())
+ .registerTypeAdapter(DeviceDetails.class, new DeviceDetailsAdapter())
.create();
}
private boolean isValidBooleanValue(String value) {
diff --git a/src/main/java/com/autotune/analyzer/services/GenerateRecommendations.java b/src/main/java/com/autotune/analyzer/services/GenerateRecommendations.java
index 64d05fe9c..8a2d5f22c 100644
--- a/src/main/java/com/autotune/analyzer/services/GenerateRecommendations.java
+++ b/src/main/java/com/autotune/analyzer/services/GenerateRecommendations.java
@@ -15,6 +15,8 @@
*******************************************************************************/
package com.autotune.analyzer.services;
+import com.autotune.analyzer.adapters.DeviceDetailsAdapter;
+import com.autotune.analyzer.adapters.RecommendationItemAdapter;
import com.autotune.analyzer.exceptions.FetchMetricsError;
import com.autotune.analyzer.kruizeObject.KruizeObject;
import com.autotune.analyzer.recommendations.engine.RecommendationEngine;
@@ -29,6 +31,7 @@
import com.autotune.common.data.metrics.MetricResults;
import com.autotune.common.data.result.ContainerData;
import com.autotune.common.data.result.IntervalResults;
+import com.autotune.common.data.system.info.device.DeviceDetails;
import com.autotune.common.datasource.DataSourceInfo;
import com.autotune.common.k8sObjects.K8sObject;
import com.autotune.utils.GenericRestApiClient;
@@ -171,6 +174,8 @@ public boolean shouldSkipClass(Class<?> clazz) {
.setPrettyPrinting()
.enableComplexMapKeySerialization()
.registerTypeAdapter(Date.class, new GsonUTCDateAdapter())
+ .registerTypeAdapter(AnalyzerConstants.RecommendationItem.class, new RecommendationItemAdapter())
+ .registerTypeAdapter(DeviceDetails.class, new DeviceDetailsAdapter())
.setExclusionStrategies(strategy)
.create();
gsonStr = gsonObj.toJson(recommendationList);
diff --git a/src/main/java/com/autotune/analyzer/services/ListDatasources.java b/src/main/java/com/autotune/analyzer/services/ListDatasources.java
index 1af77454d..9493f3ad5 100644
--- a/src/main/java/com/autotune/analyzer/services/ListDatasources.java
+++ b/src/main/java/com/autotune/analyzer/services/ListDatasources.java
@@ -16,10 +16,13 @@
package com.autotune.analyzer.services;
+import com.autotune.analyzer.adapters.DeviceDetailsAdapter;
+import com.autotune.analyzer.adapters.RecommendationItemAdapter;
import com.autotune.analyzer.serviceObjects.ListDatasourcesAPIObject;
import com.autotune.analyzer.utils.AnalyzerConstants;
import com.autotune.analyzer.utils.AnalyzerErrorConstants;
import com.autotune.analyzer.utils.GsonUTCDateAdapter;
+import com.autotune.common.data.system.info.device.DeviceDetails;
import com.autotune.common.datasource.DataSourceInfo;
import com.autotune.database.service.ExperimentDBService;
import com.autotune.utils.MetricsConfig;
@@ -148,6 +151,8 @@ private Gson createGsonObject() {
.setPrettyPrinting()
.enableComplexMapKeySerialization()
.registerTypeAdapter(Date.class, new GsonUTCDateAdapter())
+ .registerTypeAdapter(AnalyzerConstants.RecommendationItem.class, new RecommendationItemAdapter())
+ .registerTypeAdapter(DeviceDetails.class, new DeviceDetailsAdapter())
.create();
}
diff --git a/src/main/java/com/autotune/analyzer/services/ListExperiments.java b/src/main/java/com/autotune/analyzer/services/ListExperiments.java
index e71d5e96f..b8ca71447 100644
--- a/src/main/java/com/autotune/analyzer/services/ListExperiments.java
+++ b/src/main/java/com/autotune/analyzer/services/ListExperiments.java
@@ -16,6 +16,8 @@
package com.autotune.analyzer.services;
+import com.autotune.analyzer.adapters.DeviceDetailsAdapter;
+import com.autotune.analyzer.adapters.RecommendationItemAdapter;
import com.autotune.analyzer.experiment.KruizeExperiment;
import com.autotune.analyzer.kruizeObject.KruizeObject;
import com.autotune.analyzer.serviceObjects.ContainerAPIObject;
@@ -29,6 +31,7 @@
import com.autotune.common.data.metrics.MetricResults;
import com.autotune.common.data.result.ContainerData;
import com.autotune.common.data.result.IntervalResults;
+import com.autotune.common.data.system.info.device.DeviceDetails;
import com.autotune.common.k8sObjects.K8sObject;
import com.autotune.common.target.kubernetes.service.KubernetesServices;
import com.autotune.common.trials.ExperimentTrial;
@@ -281,6 +284,8 @@ private Gson createGsonObject() {
.setPrettyPrinting()
.enableComplexMapKeySerialization()
.registerTypeAdapter(Date.class, new GsonUTCDateAdapter())
+ .registerTypeAdapter(AnalyzerConstants.RecommendationItem.class, new RecommendationItemAdapter())
+ .registerTypeAdapter(DeviceDetails.class, new DeviceDetailsAdapter())
.setExclusionStrategies(new ExclusionStrategy() {
@Override
public boolean shouldSkipField(FieldAttributes f) {
diff --git a/src/main/java/com/autotune/analyzer/services/ListRecommendations.java b/src/main/java/com/autotune/analyzer/services/ListRecommendations.java
index 69bcca37c..ee533905f 100644
--- a/src/main/java/com/autotune/analyzer/services/ListRecommendations.java
+++ b/src/main/java/com/autotune/analyzer/services/ListRecommendations.java
@@ -16,6 +16,8 @@
package com.autotune.analyzer.services;
+import com.autotune.analyzer.adapters.DeviceDetailsAdapter;
+import com.autotune.analyzer.adapters.RecommendationItemAdapter;
import com.autotune.analyzer.exceptions.KruizeResponse;
import com.autotune.analyzer.kruizeObject.KruizeObject;
import com.autotune.analyzer.serviceObjects.ContainerAPIObject;
@@ -26,6 +28,7 @@
import com.autotune.analyzer.utils.GsonUTCDateAdapter;
import com.autotune.analyzer.utils.ServiceHelpers;
import com.autotune.common.data.result.ContainerData;
+import com.autotune.common.data.system.info.device.DeviceDetails;
import com.autotune.database.service.ExperimentDBService;
import com.autotune.utils.KruizeConstants;
import com.autotune.utils.MetricsConfig;
@@ -224,6 +227,8 @@ public boolean shouldSkipClass(Class<?> clazz) {
.setPrettyPrinting()
.enableComplexMapKeySerialization()
.registerTypeAdapter(Date.class, new GsonUTCDateAdapter())
+ .registerTypeAdapter(AnalyzerConstants.RecommendationItem.class, new RecommendationItemAdapter())
+ .registerTypeAdapter(DeviceDetails.class, new DeviceDetailsAdapter())
.setExclusionStrategies(strategy)
.create();
gsonStr = gsonObj.toJson(recommendationList);
diff --git a/src/main/java/com/autotune/analyzer/services/ListSupportedK8sObjects.java b/src/main/java/com/autotune/analyzer/services/ListSupportedK8sObjects.java
index 1ac7dc39d..f0b2db569 100644
--- a/src/main/java/com/autotune/analyzer/services/ListSupportedK8sObjects.java
+++ b/src/main/java/com/autotune/analyzer/services/ListSupportedK8sObjects.java
@@ -15,9 +15,12 @@
*******************************************************************************/
package com.autotune.analyzer.services;
+import com.autotune.analyzer.adapters.DeviceDetailsAdapter;
+import com.autotune.analyzer.adapters.RecommendationItemAdapter;
import com.autotune.analyzer.serviceObjects.ListSupportedK8sObjectsSO;
import com.autotune.analyzer.utils.GsonUTCDateAdapter;
import com.autotune.analyzer.utils.AnalyzerConstants;
+import com.autotune.common.data.system.info.device.DeviceDetails;
import com.autotune.utils.Utils;
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
@@ -57,6 +60,8 @@ protected void doPost(HttpServletRequest request, HttpServletResponse response)
.setPrettyPrinting()
.enableComplexMapKeySerialization()
.registerTypeAdapter(Date.class, new GsonUTCDateAdapter())
+ .registerTypeAdapter(AnalyzerConstants.RecommendationItem.class, new RecommendationItemAdapter())
+ .registerTypeAdapter(DeviceDetails.class, new DeviceDetailsAdapter())
.create();
// Convert the Service object to JSON
responseGSONString = gsonObj.toJson(listSupportedK8sObjectsSO);
diff --git a/src/main/java/com/autotune/analyzer/services/MetricProfileService.java b/src/main/java/com/autotune/analyzer/services/MetricProfileService.java
index d4311d07d..ca5372c0e 100644
--- a/src/main/java/com/autotune/analyzer/services/MetricProfileService.java
+++ b/src/main/java/com/autotune/analyzer/services/MetricProfileService.java
@@ -16,6 +16,8 @@
package com.autotune.analyzer.services;
+import com.autotune.analyzer.adapters.DeviceDetailsAdapter;
+import com.autotune.analyzer.adapters.RecommendationItemAdapter;
import com.autotune.analyzer.exceptions.InvalidValueException;
import com.autotune.analyzer.exceptions.PerformanceProfileResponse;
import com.autotune.analyzer.performanceProfiles.MetricProfileCollection;
@@ -28,6 +30,7 @@
import com.autotune.common.data.ValidationOutputData;
import com.autotune.common.data.metrics.Metric;
import com.autotune.common.data.result.ContainerData;
+import com.autotune.common.data.system.info.device.DeviceDetails;
import com.autotune.database.dao.ExperimentDAOImpl;
import com.autotune.database.service.ExperimentDBService;
import com.autotune.utils.KruizeConstants;
@@ -378,6 +381,8 @@ private Gson createGsonObject() {
.setPrettyPrinting()
.enableComplexMapKeySerialization()
.registerTypeAdapter(Date.class, new GsonUTCDateAdapter())
+ .registerTypeAdapter(AnalyzerConstants.RecommendationItem.class, new RecommendationItemAdapter())
+ .registerTypeAdapter(DeviceDetails.class, new DeviceDetailsAdapter())
// a custom serializer for serializing metadata of JsonNode type.
.registerTypeAdapter(JsonNode.class, new JsonSerializer<JsonNode>() {
@Override
diff --git a/src/main/java/com/autotune/analyzer/services/PerformanceProfileService.java b/src/main/java/com/autotune/analyzer/services/PerformanceProfileService.java
index 71be6267e..43cc8588f 100644
--- a/src/main/java/com/autotune/analyzer/services/PerformanceProfileService.java
+++ b/src/main/java/com/autotune/analyzer/services/PerformanceProfileService.java
@@ -16,6 +16,8 @@
package com.autotune.analyzer.services;
+import com.autotune.analyzer.adapters.DeviceDetailsAdapter;
+import com.autotune.analyzer.adapters.RecommendationItemAdapter;
import com.autotune.analyzer.exceptions.InvalidValueException;
import com.autotune.analyzer.exceptions.PerformanceProfileResponse;
import com.autotune.analyzer.performanceProfiles.PerformanceProfile;
@@ -26,6 +28,7 @@
import com.autotune.analyzer.utils.GsonUTCDateAdapter;
import com.autotune.common.data.ValidationOutputData;
import com.autotune.common.data.metrics.Metric;
+import com.autotune.common.data.system.info.device.DeviceDetails;
import com.autotune.database.service.ExperimentDBService;
import com.google.gson.ExclusionStrategy;
import com.google.gson.FieldAttributes;
@@ -130,6 +133,8 @@ protected void doGet(HttpServletRequest req, HttpServletResponse response) throw
.setPrettyPrinting()
.enableComplexMapKeySerialization()
.registerTypeAdapter(Date.class, new GsonUTCDateAdapter())
+ .registerTypeAdapter(AnalyzerConstants.RecommendationItem.class, new RecommendationItemAdapter())
+ .registerTypeAdapter(DeviceDetails.class, new DeviceDetailsAdapter())
.setExclusionStrategies(new ExclusionStrategy() {
@Override
public boolean shouldSkipField(FieldAttributes f) {
diff --git a/src/main/java/com/autotune/analyzer/services/UpdateRecommendations.java b/src/main/java/com/autotune/analyzer/services/UpdateRecommendations.java
index 903378655..e558d1d37 100644
--- a/src/main/java/com/autotune/analyzer/services/UpdateRecommendations.java
+++ b/src/main/java/com/autotune/analyzer/services/UpdateRecommendations.java
@@ -15,15 +15,19 @@
*******************************************************************************/
package com.autotune.analyzer.services;
+import com.autotune.analyzer.adapters.DeviceDetailsAdapter;
+import com.autotune.analyzer.adapters.RecommendationItemAdapter;
import com.autotune.analyzer.exceptions.FetchMetricsError;
import com.autotune.analyzer.kruizeObject.KruizeObject;
import com.autotune.analyzer.recommendations.engine.RecommendationEngine;
import com.autotune.analyzer.serviceObjects.ContainerAPIObject;
import com.autotune.analyzer.serviceObjects.Converters;
import com.autotune.analyzer.serviceObjects.ListRecommendationsAPIObject;
+import com.autotune.analyzer.utils.AnalyzerConstants;
import com.autotune.analyzer.utils.AnalyzerErrorConstants;
import com.autotune.analyzer.utils.GsonUTCDateAdapter;
import com.autotune.common.data.result.ContainerData;
+import com.autotune.common.data.system.info.device.DeviceDetails;
import com.autotune.operator.KruizeDeploymentInfo;
import com.autotune.utils.KruizeConstants;
import com.autotune.utils.MetricsConfig;
@@ -168,6 +172,8 @@ public boolean shouldSkipClass(Class<?> clazz) {
.setPrettyPrinting()
.enableComplexMapKeySerialization()
.registerTypeAdapter(Date.class, new GsonUTCDateAdapter())
+ .registerTypeAdapter(AnalyzerConstants.RecommendationItem.class, new RecommendationItemAdapter())
+ .registerTypeAdapter(DeviceDetails.class, new DeviceDetailsAdapter())
.setExclusionStrategies(strategy)
.create();
gsonStr = gsonObj.toJson(recommendationList);
diff --git a/src/main/java/com/autotune/analyzer/services/UpdateResults.java b/src/main/java/com/autotune/analyzer/services/UpdateResults.java
index 7ae38192e..a5d8bbd79 100644
--- a/src/main/java/com/autotune/analyzer/services/UpdateResults.java
+++ b/src/main/java/com/autotune/analyzer/services/UpdateResults.java
@@ -16,6 +16,8 @@
package com.autotune.analyzer.services;
+import com.autotune.analyzer.adapters.DeviceDetailsAdapter;
+import com.autotune.analyzer.adapters.RecommendationItemAdapter;
import com.autotune.analyzer.exceptions.KruizeResponse;
import com.autotune.analyzer.experiment.ExperimentInitiator;
import com.autotune.analyzer.performanceProfiles.PerformanceProfile;
@@ -23,6 +25,7 @@
import com.autotune.analyzer.serviceObjects.UpdateResultsAPIObject;
import com.autotune.analyzer.utils.AnalyzerConstants;
import com.autotune.analyzer.utils.AnalyzerErrorConstants;
+import com.autotune.common.data.system.info.device.DeviceDetails;
import com.autotune.operator.KruizeDeploymentInfo;
import com.autotune.utils.MetricsConfig;
import com.google.gson.*;
@@ -78,6 +81,8 @@ protected void doPost(HttpServletRequest request, HttpServletResponse response)
Gson gson = new GsonBuilder()
.registerTypeAdapter(Double.class, new CustomNumberDeserializer())
.registerTypeAdapter(Integer.class, new CustomNumberDeserializer())
+ .registerTypeAdapter(AnalyzerConstants.RecommendationItem.class, new RecommendationItemAdapter())
+ .registerTypeAdapter(DeviceDetails.class, new DeviceDetailsAdapter())
.create();
LOGGER.debug("updateResults API request payload for requestID {} is {}", calCount, inputData);
try {
diff --git a/src/main/java/com/autotune/analyzer/utils/AnalyzerConstants.java b/src/main/java/com/autotune/analyzer/utils/AnalyzerConstants.java
index 4d6b1460a..740bb859a 100644
--- a/src/main/java/com/autotune/analyzer/utils/AnalyzerConstants.java
+++ b/src/main/java/com/autotune/analyzer/utils/AnalyzerConstants.java
@@ -119,8 +119,31 @@ public enum ExperimentStatus {
}
public enum RecommendationItem {
- cpu,
- memory
+ CPU("cpu"),
+ MEMORY("memory"),
+ NVIDIA_GPU("nvidia.com/gpu"),
+ NVIDIA_GPU_PARTITION_1_CORE_5GB("nvidia.com/mig-1g.5gb"),
+ NVIDIA_GPU_PARTITION_1_CORE_10GB("nvidia.com/mig-1g.10gb"),
+ NVIDIA_GPU_PARTITION_1_CORE_20GB("nvidia.com/mig-1g.20gb"),
+ NVIDIA_GPU_PARTITION_2_CORES_20GB("nvidia.com/mig-2g.20gb"),
+ NVIDIA_GPU_PARTITION_3_CORES_40GB("nvidia.com/mig-3g.40gb"),
+ NVIDIA_GPU_PARTITION_4_CORES_40GB("nvidia.com/mig-4g.40gb"),
+ NVIDIA_GPU_PARTITION_7_CORES_80GB("nvidia.com/mig-7g.80gb"),
+ NVIDIA_GPU_PARTITION_2_CORES_10GB("nvidia.com/mig-2g.10gb"),
+ NVIDIA_GPU_PARTITION_3_CORES_20GB("nvidia.com/mig-3g.20gb"),
+ NVIDIA_GPU_PARTITION_4_CORES_20GB("nvidia.com/mig-4g.20gb"),
+ NVIDIA_GPU_PARTITION_7_CORES_40GB("nvidia.com/mig-7g.40gb");
+
+ private final String value;
+
+ RecommendationItem(String value) {
+ this.value = value;
+ }
+
+ @Override
+ public String toString() {
+ return value;
+ }
}
public enum CapacityMax {
@@ -196,6 +219,66 @@ public enum RegisterRecommendationModelStatus {
INVALID
}
+ public enum DeviceType {
+ CPU,
+ MEMORY,
+ NETWORK,
+ ACCELERATOR
+ }
+
+ public enum DeviceParameters {
+ MODEL_NAME,
+ UUID,
+ HOSTNAME,
+ NAME,
+ MANUFACTURER,
+ DEVICE_NAME
+ }
+
+ public static final class AcceleratorConstants {
+ private AcceleratorConstants() {
+
+ }
+
+ public static final class AcceleratorMetricConstants {
+ private AcceleratorMetricConstants() {
+
+ }
+
+ public static final int TIMESTAMP_RANGE_CHECK_IN_MINUTES = 5;
+ }
+
+ public static final class SupportedAccelerators {
+ private SupportedAccelerators() {
+
+ }
+ public static final String A100_80_GB = "A100-80GB";
+ public static final String A100_40_GB = "A100-40GB";
+ public static final String H100_80_GB = "H100-80GB";
+ }
+
+ public static final class AcceleratorProfiles {
+ private AcceleratorProfiles() {
+
+ }
+
+ // A100 40GB Profiles
+ public static final String PROFILE_1G_5GB = "1g.5gb";
+ public static final String PROFILE_1G_10GB = "1g.10gb";
+ public static final String PROFILE_2G_10GB = "2g.10gb";
+ public static final String PROFILE_3G_20GB = "3g.20gb";
+ public static final String PROFILE_4G_20GB = "4g.20gb";
+ public static final String PROFILE_7G_40GB = "7g.40gb";
+
+ // A100 80GB & H100 80GB Profiles
+ public static final String PROFILE_1G_20GB = "1g.20gb";
+ public static final String PROFILE_2G_20GB = "2g.20gb";
+ public static final String PROFILE_3G_40GB = "3g.40gb";
+ public static final String PROFILE_4G_40GB = "4g.40gb";
+ public static final String PROFILE_7G_80GB = "7g.80gb";
+ }
+ }
+
public static final class ExperimentTypes {
public static final String NAMESPACE_EXPERIMENT = "namespace";
public static final String CONTAINER_EXPERIMENT = "container";
diff --git a/src/main/java/com/autotune/analyzer/workerimpl/BulkJobManager.java b/src/main/java/com/autotune/analyzer/workerimpl/BulkJobManager.java
new file mode 100644
index 000000000..c827fd289
--- /dev/null
+++ b/src/main/java/com/autotune/analyzer/workerimpl/BulkJobManager.java
@@ -0,0 +1,303 @@
+/*******************************************************************************
+ * Copyright (c) 2020, 2021 Red Hat, IBM Corporation and others.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *******************************************************************************/
+package com.autotune.analyzer.workerimpl;
+
+
+import com.autotune.analyzer.kruizeObject.KruizeObject;
+import com.autotune.analyzer.kruizeObject.RecommendationSettings;
+import com.autotune.analyzer.serviceObjects.*;
+import com.autotune.analyzer.utils.AnalyzerConstants;
+import com.autotune.common.data.ValidationOutputData;
+import com.autotune.common.data.dataSourceMetadata.*;
+import com.autotune.common.datasource.DataSourceInfo;
+import com.autotune.common.datasource.DataSourceManager;
+import com.autotune.common.k8sObjects.TrialSettings;
+import com.autotune.common.utils.CommonUtils;
+import com.autotune.database.service.ExperimentDBService;
+import com.autotune.operator.KruizeDeploymentInfo;
+import com.autotune.utils.KruizeConstants;
+import com.autotune.utils.Utils;
+import org.json.JSONObject;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.net.HttpURLConnection;
+import java.net.URL;
+import java.sql.Timestamp;
+import java.time.Instant;
+import java.time.LocalDateTime;
+import java.time.ZoneOffset;
+import java.time.format.DateTimeFormatter;
+import java.util.*;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+
+import static com.autotune.operator.KruizeDeploymentInfo.bulk_thread_pool_size;
+import static com.autotune.utils.KruizeConstants.KRUIZE_BULK_API.*;
+
+
+/**
+ * The `run` method processes bulk input to create experiments and generate resource optimization recommendations.
+ * It builds experiment names from the various data source components, makes HTTP POST requests
+ * to generate recommendations, and updates the job status based on their progress.
+ *
+ * Key operations include:
+ * - Processing 'include' filter labels to generate a unique key.
+ * - Validating and setting the data source if not provided in the input.
+ * - Extracting the time range from the input and converting it to epoch time.
+ * - Fetching metadata from the data source for the specified time range and labels.
+ * - Creating experiments for each data source component such as clusters, namespaces, workloads, and containers.
+ * - Submitting HTTP POST requests to retrieve recommendations for each created experiment.
+ * - Updating the job status and progress as recommendations complete.
+ *
+ * If an exception occurs during processing, error messages are logged and the stack trace is printed for debugging.
+ *
+ * @throws RuntimeException if URL or HTTP connection setup fails.
+ * @throws IOException if an error occurs while sending HTTP requests.
+ */
+public class BulkJobManager implements Runnable {
+ private static final Logger LOGGER = LoggerFactory.getLogger(BulkJobManager.class);
+
+ private String jobID;
+ private Map<String, BulkJobStatus> jobStatusMap;
+ private BulkInput bulkInput;
+
+
+ public BulkJobManager(String jobID, Map<String, BulkJobStatus> jobStatusMap, BulkInput payload) {
+ this.jobID = jobID;
+ this.jobStatusMap = jobStatusMap;
+ this.bulkInput = payload;
+ }
+
+ public static List<String> appendExperiments(List<String> allExperiments, String experimentName) {
+ allExperiments.add(experimentName);
+ return allExperiments;
+ }
+
+ @Override
+ public void run() {
+ try {
+ BulkJobStatus jobData = jobStatusMap.get(jobID);
+ String uniqueKey = getLabels(this.bulkInput.getFilter());
+ if (null == this.bulkInput.getDatasource()) {
+ this.bulkInput.setDatasource(CREATE_EXPERIMENT_CONFIG_BEAN.getDatasourceName());
+ }
+ DataSourceMetadataInfo metadataInfo = null;
+ DataSourceManager dataSourceManager = new DataSourceManager();
+ DataSourceInfo datasource = CommonUtils.getDataSourceInfo(this.bulkInput.getDatasource());
+ JSONObject daterange = processDateRange(this.bulkInput.getTime_range());
+ if (null != daterange)
+ metadataInfo = dataSourceManager.importMetadataFromDataSource(datasource, uniqueKey, (Long) daterange.get("start_time"), (Long) daterange.get("end_time"), (Integer) daterange.get("steps"));
+ else {
+ metadataInfo = dataSourceManager.importMetadataFromDataSource(datasource, uniqueKey, 0, 0, 0);
+ }
+ if (null == metadataInfo) {
+ jobData.setStatus(COMPLETED);
+ jobData.setMessage(NOTHING);
+ } else {
+ Map<String, CreateExperimentAPIObject> createExperimentAPIObjectMap = getExperimentMap(metadataInfo); // TODO: store this map in a buffer and reuse it if the Bulk API pod restarts; also support experiment_type
+ jobData.setTotal_experiments(createExperimentAPIObjectMap.size());
+ jobData.setProcessed_experiments(0);
+ if (jobData.getTotal_experiments() > KruizeDeploymentInfo.BULK_API_LIMIT) {
+ jobStatusMap.get(jobID).setStatus(FAILED);
+ jobStatusMap.get(jobID).setMessage(String.format(LIMIT_MESSAGE, KruizeDeploymentInfo.BULK_API_LIMIT));
+ } else {
+ ExecutorService createExecutor = Executors.newFixedThreadPool(bulk_thread_pool_size);
+ ExecutorService generateExecutor = Executors.newFixedThreadPool(bulk_thread_pool_size);
+ for (CreateExperimentAPIObject apiObject : createExperimentAPIObjectMap.values()) {
+ createExecutor.submit(() -> {
+ String experiment_name = apiObject.getExperimentName();
+ BulkJobStatus.Experiments newExperiments = jobData.getData().getExperiments();
+ BulkJobStatus.RecommendationData recommendationData = jobData.getData().getRecommendations().getData();
+ try {
+ ValidationOutputData output = new ExperimentDBService().addExperimentToDB(apiObject);
+ if (output.isSuccess()) {
+ jobData.getData().getExperiments().setNewExperiments(
+ appendExperiments(newExperiments.getNewExperiments(), experiment_name)
+ );
+ }
+ generateExecutor.submit(() -> {
+
+ jobData.getData().getRecommendations().getData().setUnprocessed(
+ appendExperiments(recommendationData.getUnprocessed(), experiment_name)
+ );
+
+ URL url = null;
+ HttpURLConnection connection = null;
+ int statusCode = 0;
+ try {
+ url = new URL(String.format(KruizeDeploymentInfo.recommendations_url, experiment_name));
+ connection = (HttpURLConnection) url.openConnection();
+ connection.setRequestMethod("POST");
+
+ recommendationData.moveToProgress(experiment_name);
+
+ statusCode = connection.getResponseCode();
+ } catch (IOException e) {
+ LOGGER.error(e.getMessage());
+
+ recommendationData.moveToFailed(experiment_name);
+
+ throw new RuntimeException(e);
+ } finally {
+ if (null != connection) connection.disconnect();
+ }
+ if (statusCode == HttpURLConnection.HTTP_CREATED) {
+
+ recommendationData.moveToCompleted(experiment_name);
+ jobData.setProcessed_experiments(jobData.getProcessed_experiments() + 1);
+
+ if (jobData.getTotal_experiments() == jobData.getProcessed_experiments()) {
+ jobData.setStatus(COMPLETED);
+ jobStatusMap.get(jobID).setEndTime(Instant.now());
+ }
+
+ } else {
+
+ recommendationData.moveToFailed(experiment_name);
+
+ }
+ });
+ } catch (Exception e) {
+ e.printStackTrace();
+ recommendationData.moveToFailed(experiment_name);
+ }
+ });
+ }
+ }
+ }
+ } catch (Exception e) {
+ LOGGER.error(e.getMessage());
+ e.printStackTrace();
+ jobStatusMap.get(jobID).setStatus("FAILED");
+ }
+ }
+
+
+ Map<String, CreateExperimentAPIObject> getExperimentMap(DataSourceMetadataInfo metadataInfo) {
+ Map<String, CreateExperimentAPIObject> createExperimentAPIObjectMap = new HashMap<>();
+ Collection<DataSource> dataSourceCollection = metadataInfo.getDataSourceHashMap().values();
+ for (DataSource ds : dataSourceCollection) {
+ HashMap<String, DataSourceCluster> clusterHashMap = ds.getDataSourceClusterHashMap();
+ for (DataSourceCluster dsc : clusterHashMap.values()) {
+ HashMap<String, DataSourceNamespace> namespaceHashMap = dsc.getDataSourceNamespaceHashMap();
+ for (DataSourceNamespace namespace : namespaceHashMap.values()) {
+ HashMap<String, DataSourceWorkload> dataSourceWorkloadHashMap = namespace.getDataSourceWorkloadHashMap();
+ if (dataSourceWorkloadHashMap != null) {
+ for (DataSourceWorkload dsw : dataSourceWorkloadHashMap.values()) {
+ HashMap<String, DataSourceContainer> dataSourceContainerHashMap = dsw.getDataSourceContainerHashMap();
+ if (dataSourceContainerHashMap != null) {
+ for (DataSourceContainer dc : dataSourceContainerHashMap.values()) {
+ CreateExperimentAPIObject createExperimentAPIObject = new CreateExperimentAPIObject();
+ createExperimentAPIObject.setMode(CREATE_EXPERIMENT_CONFIG_BEAN.getMode());
+ createExperimentAPIObject.setTargetCluster(CREATE_EXPERIMENT_CONFIG_BEAN.getTarget());
+ createExperimentAPIObject.setApiVersion(CREATE_EXPERIMENT_CONFIG_BEAN.getVersion());
+ String experiment_name = this.bulkInput.getDatasource() + "|" + dsc.getDataSourceClusterName() + "|" + namespace.getDataSourceNamespaceName()
+ + "|" + dsw.getDataSourceWorkloadName() + "(" + dsw.getDataSourceWorkloadType() + ")" + "|" + dc.getDataSourceContainerName();
+ createExperimentAPIObject.setExperimentName(experiment_name);
+ createExperimentAPIObject.setDatasource(this.bulkInput.getDatasource());
+ createExperimentAPIObject.setClusterName(dsc.getDataSourceClusterName());
+ createExperimentAPIObject.setPerformanceProfile(CREATE_EXPERIMENT_CONFIG_BEAN.getPerformanceProfile());
+ List<KubernetesAPIObject> kubernetesAPIObjectList = new ArrayList<>();
+ KubernetesAPIObject kubernetesAPIObject = new KubernetesAPIObject();
+ ContainerAPIObject cao = new ContainerAPIObject(dc.getDataSourceContainerName(),
+ dc.getDataSourceContainerImageName(), null, null);
+ kubernetesAPIObject.setContainerAPIObjects(Arrays.asList(cao));
+ kubernetesAPIObject.setName(dsw.getDataSourceWorkloadName());
+ kubernetesAPIObject.setType(dsw.getDataSourceWorkloadType());
+ kubernetesAPIObject.setNamespace(namespace.getDataSourceNamespaceName());
+ kubernetesAPIObjectList.add(kubernetesAPIObject);
+ createExperimentAPIObject.setKubernetesObjects(kubernetesAPIObjectList);
+ RecommendationSettings rs = new RecommendationSettings();
+ rs.setThreshold(CREATE_EXPERIMENT_CONFIG_BEAN.getThreshold());
+ createExperimentAPIObject.setRecommendationSettings(rs);
+ TrialSettings trialSettings = new TrialSettings();
+ trialSettings.setMeasurement_durationMinutes(CREATE_EXPERIMENT_CONFIG_BEAN.getMeasurementDurationStr());
+ createExperimentAPIObject.setTrialSettings(trialSettings);
+ List<KruizeObject> kruizeExpList = new ArrayList<>();
+
+ createExperimentAPIObject.setExperiment_id(Utils.generateID(createExperimentAPIObject.toString()));
+ createExperimentAPIObject.setStatus(AnalyzerConstants.ExperimentStatus.IN_PROGRESS);
+ createExperimentAPIObject.setExperimentType(AnalyzerConstants.ExperimentTypes.CONTAINER_EXPERIMENT);
+ createExperimentAPIObjectMap.put(experiment_name, createExperimentAPIObject);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ return createExperimentAPIObjectMap;
+ }
+
+ private String getLabels(BulkInput.FilterWrapper filter) {
+ String uniqueKey = null;
+ try {
+ // Process labels in the 'include' section
+ if (filter != null && filter.getInclude() != null) {
+ // Initialize StringBuilder for uniqueKey
+ StringBuilder includeLabelsBuilder = new StringBuilder();
+ Map<String, String> includeLabels = filter.getInclude().getLabels();
+ if (includeLabels != null && !includeLabels.isEmpty()) {
+ includeLabels.forEach((key, value) ->
+ includeLabelsBuilder.append(key).append("=").append("\"" + value + "\"").append(",")
+ );
+ // Remove trailing comma
+ if (includeLabelsBuilder.length() > 0) {
+ includeLabelsBuilder.setLength(includeLabelsBuilder.length() - 1);
+ }
+ LOGGER.debug("Include Labels: " + includeLabelsBuilder.toString());
+ uniqueKey = includeLabelsBuilder.toString();
+ }
+ }
+ } catch (Exception e) {
+ e.printStackTrace();
+ LOGGER.error(e.getMessage());
+ }
+ return uniqueKey;
+ }
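
For reference, the snippet below shows what `getLabels()` produces for the `include` labels of the request payload; the label keys and values are illustrative:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LabelKeySketch {
    public static void main(String[] args) {
        // In BulkJobManager this map comes from filter.getInclude().getLabels().
        Map<String, String> includeLabels = new LinkedHashMap<>();
        includeLabels.put("key1", "value1");
        includeLabels.put("key2", "value2");

        StringBuilder builder = new StringBuilder();
        includeLabels.forEach((key, value) ->
                builder.append(key).append("=").append("\"").append(value).append("\"").append(","));
        builder.setLength(builder.length() - 1); // drop the trailing comma

        // Prints: key1="value1",key2="value2"
        // This uniqueKey is later spliced into the PromQL metadata queries via ADDITIONAL_LABEL.
        System.out.println(builder);
    }
}
```
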
+
+ private JSONObject processDateRange(BulkInput.TimeRange timeRange) {
+ JSONObject dateRange = null;
+ if (null != timeRange && timeRange.getStart() != null && timeRange.getEnd() != null) {
+ String intervalStartTimeStr = timeRange.getStart();
+ String intervalEndTimeStr = timeRange.getEnd();
+ long interval_end_time_epoc = 0;
+ long interval_start_time_epoc = 0;
+ LocalDateTime localDateTime = LocalDateTime.parse(intervalEndTimeStr, DateTimeFormatter.ofPattern(KruizeConstants.DateFormats.STANDARD_JSON_DATE_FORMAT));
+ interval_end_time_epoc = localDateTime.toEpochSecond(ZoneOffset.UTC);
+ Timestamp interval_end_time = Timestamp.from(localDateTime.toInstant(ZoneOffset.UTC));
+ localDateTime = LocalDateTime.parse(intervalStartTimeStr, DateTimeFormatter.ofPattern(KruizeConstants.DateFormats.STANDARD_JSON_DATE_FORMAT));
+ interval_start_time_epoc = localDateTime.toEpochSecond(ZoneOffset.UTC);
+ Timestamp interval_start_time = Timestamp.from(localDateTime.toInstant(ZoneOffset.UTC));
+ int steps = CREATE_EXPERIMENT_CONFIG_BEAN.getMeasurementDuration() * KruizeConstants.TimeConv.NO_OF_SECONDS_PER_MINUTE; // TODO: fetch the measurement duration from the experiment's recommendation settings
+ dateRange = new JSONObject();
+ dateRange.put("start_time", interval_start_time_epoc);
+ dateRange.put("end_time", interval_end_time_epoc);
+ dateRange.put("steps", steps);
+ }
+ return dateRange;
+ }
+
+
+}
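
Taken together, the worker creates one experiment per discovered container and then POSTs to the recommendations endpoint for each. A condensed sketch of that flow; the URL template here is an assumption for illustration, since the real value is resolved from `KruizeDeploymentInfo.recommendations_url`:

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class BulkFlowSketch {
    public static void main(String[] args) throws Exception {
        // Experiment-name format used by getExperimentMap():
        // datasource|cluster|namespace|workload(workloadType)|container
        String experimentName = String.join("|",
                "prometheus-1", "cluster-one", "default",
                "my-deployment(deployment)", "my-container");

        // Assumed URL template; illustrative only.
        String urlTemplate = "http://kruize:8080/generateRecommendations?experiment_name=%s";
        URL url = new URL(String.format(urlTemplate, experimentName));
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("POST");

        // HTTP 201 moves the experiment to the completed bucket of the job status;
        // any other response moves it to failed.
        System.out.println(experimentName + " -> " + connection.getResponseCode());
        connection.disconnect();
    }
}
```
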
diff --git a/src/main/java/com/autotune/common/data/dataSourceQueries/DataSourceQueries.java b/src/main/java/com/autotune/common/data/dataSourceQueries/DataSourceQueries.java
index ccf20f8c6..dbddbe7d4 100644
--- a/src/main/java/com/autotune/common/data/dataSourceQueries/DataSourceQueries.java
+++ b/src/main/java/com/autotune/common/data/dataSourceQueries/DataSourceQueries.java
@@ -7,9 +7,10 @@
*/
public class DataSourceQueries {
public enum PromQLQuery {
- NAMESPACE_QUERY("sum by (namespace) ( avg_over_time(kube_namespace_status_phase{namespace!=\"\"}[15d]))"),
- WORKLOAD_INFO_QUERY("sum by (namespace, workload, workload_type) ( avg_over_time(namespace_workload_pod:kube_pod_owner:relabel{workload!=\"\"}[15d]))"),
- CONTAINER_INFO_QUERY("sum by (container, image, workload, workload_type, namespace) ( avg_over_time(kube_pod_container_info{container!=\"\"}[15d]) * on (pod, namespace) group_left(workload, workload_type) avg_over_time(namespace_workload_pod:kube_pod_owner:relabel{workload!=\"\"}[15d]))");
+ NAMESPACE_QUERY("sum by (namespace) ( avg_over_time(kube_namespace_status_phase{namespace!=\"\" ADDITIONAL_LABEL}[15d]))"),
+ WORKLOAD_INFO_QUERY("sum by (namespace, workload, workload_type) ( avg_over_time(namespace_workload_pod:kube_pod_owner:relabel{workload!=\"\" ADDITIONAL_LABEL}[15d]))"),
+ CONTAINER_INFO_QUERY("sum by (container, image, workload, workload_type, namespace) ( avg_over_time(kube_pod_container_info{container!=\"\" ADDITIONAL_LABEL }[15d]) * on (pod, namespace) group_left(workload, workload_type) avg_over_time(namespace_workload_pod:kube_pod_owner:relabel{workload!=\"\" ADDITIONAL_LABEL}[15d]))");
+
private final String query;
PromQLQuery(String query) {
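
The new `ADDITIONAL_LABEL` placeholder is what lets the metadata import scope these queries to the request's include labels; the substitution itself happens in `DataSourceMetadataOperator` later in this diff. A small illustration of the before/after query text:

```java
// Mirrors the replacement done in processQueriesAndPopulateDataSourceMetadataInfo().
public class AdditionalLabelSketch {
    public static void main(String[] args) {
        String query = "sum by (namespace) ( avg_over_time(kube_namespace_status_phase{namespace!=\"\" ADDITIONAL_LABEL}[15d]))";
        String uniqueKey = "app=\"inference\""; // built from include-filter labels; illustrative value

        String rendered = (null != uniqueKey)
                ? query.replace("ADDITIONAL_LABEL", "," + uniqueKey)
                : query.replace("ADDITIONAL_LABEL", "");

        // sum by (namespace) ( avg_over_time(kube_namespace_status_phase{namespace!="" ,app="inference"}[15d]))
        System.out.println(rendered);
    }
}
```
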
diff --git a/src/main/java/com/autotune/common/data/metrics/AcceleratorMetricResult.java b/src/main/java/com/autotune/common/data/metrics/AcceleratorMetricResult.java
new file mode 100644
index 000000000..01f570ecb
--- /dev/null
+++ b/src/main/java/com/autotune/common/data/metrics/AcceleratorMetricResult.java
@@ -0,0 +1,29 @@
+package com.autotune.common.data.metrics;
+
+import com.autotune.common.data.system.info.device.accelerator.AcceleratorDeviceData;
+
+public class AcceleratorMetricResult {
+ private AcceleratorDeviceData acceleratorDeviceData;
+ private MetricResults metricResults;
+
+ public AcceleratorMetricResult(AcceleratorDeviceData acceleratorDeviceData, MetricResults metricResults) {
+ this.acceleratorDeviceData = acceleratorDeviceData;
+ this.metricResults = metricResults;
+ }
+
+ public AcceleratorDeviceData getAcceleratorDeviceData() {
+ return acceleratorDeviceData;
+ }
+
+ public void setAcceleratorDeviceData(AcceleratorDeviceData acceleratorDeviceData) {
+ this.acceleratorDeviceData = acceleratorDeviceData;
+ }
+
+ public MetricResults getMetricResults() {
+ return metricResults;
+ }
+
+ public void setMetricResults(MetricResults metricResults) {
+ this.metricResults = metricResults;
+ }
+}
diff --git a/src/main/java/com/autotune/common/data/result/ContainerData.java b/src/main/java/com/autotune/common/data/result/ContainerData.java
index 4f7afcc7f..66aa1dfc5 100644
--- a/src/main/java/com/autotune/common/data/result/ContainerData.java
+++ b/src/main/java/com/autotune/common/data/result/ContainerData.java
@@ -18,6 +18,7 @@
import com.autotune.analyzer.recommendations.ContainerRecommendations;
import com.autotune.analyzer.utils.AnalyzerConstants;
import com.autotune.common.data.metrics.Metric;
+import com.autotune.common.data.system.info.device.ContainerDeviceList;
import com.autotune.utils.KruizeConstants;
import com.google.gson.annotations.SerializedName;
@@ -29,6 +30,7 @@ public class ContainerData {
private String container_name;
//key is intervalEndTime
private HashMap<Timestamp, IntervalResults> results;
+ private ContainerDeviceList containerDeviceList;
@SerializedName(KruizeConstants.JSONKeys.RECOMMENDATIONS)
private ContainerRecommendations containerRecommendations;
private HashMap<AnalyzerConstants.MetricName, Metric> metrics;
@@ -85,6 +87,14 @@ public HashMap<AnalyzerConstants.MetricName, Metric> getMetrics() {
public void setMetrics(HashMap<AnalyzerConstants.MetricName, Metric> metrics) {
this.metrics = metrics;
}
+
+ public ContainerDeviceList getContainerDeviceList() {
+ return containerDeviceList;
+ }
+
+ public void setContainerDeviceList(ContainerDeviceList containerDeviceList) {
+ this.containerDeviceList = containerDeviceList;
+ }
@Override
public String toString() {
return "ContainerData{" +
diff --git a/src/main/java/com/autotune/common/data/result/IntervalResults.java b/src/main/java/com/autotune/common/data/result/IntervalResults.java
index e9bd880f3..327681690 100644
--- a/src/main/java/com/autotune/common/data/result/IntervalResults.java
+++ b/src/main/java/com/autotune/common/data/result/IntervalResults.java
@@ -16,6 +16,7 @@
package com.autotune.common.data.result;
import com.autotune.analyzer.utils.AnalyzerConstants;
+import com.autotune.common.data.metrics.AcceleratorMetricResult;
import com.autotune.common.data.metrics.MetricResults;
import com.google.gson.annotations.SerializedName;
@@ -32,6 +33,7 @@
public class IntervalResults {
@SerializedName(METRICS)
HashMap<AnalyzerConstants.MetricName, MetricResults> metricResultsMap;
+ HashMap<AnalyzerConstants.MetricName, AcceleratorMetricResult> acceleratorMetricResultHashMap;
@SerializedName(INTERVAL_START_TIME)
private Timestamp intervalStartTime;
@SerializedName(INTERVAL_END_TIME)
@@ -85,6 +87,14 @@ public void setDurationInMinutes(Double durationInMinutes) {
this.durationInMinutes = durationInMinutes;
}
+ public HashMap<AnalyzerConstants.MetricName, AcceleratorMetricResult> getAcceleratorMetricResultHashMap() {
+ return acceleratorMetricResultHashMap;
+ }
+
+ public void setAcceleratorMetricResultHashMap(HashMap<AnalyzerConstants.MetricName, AcceleratorMetricResult> acceleratorMetricResultHashMap) {
+ this.acceleratorMetricResultHashMap = acceleratorMetricResultHashMap;
+ }
+
@Override
public String toString() {
return "IntervalResults{" +
diff --git a/src/main/java/com/autotune/common/data/system/info/device/ContainerDeviceList.java b/src/main/java/com/autotune/common/data/system/info/device/ContainerDeviceList.java
new file mode 100644
index 000000000..00de9e322
--- /dev/null
+++ b/src/main/java/com/autotune/common/data/system/info/device/ContainerDeviceList.java
@@ -0,0 +1,144 @@
+package com.autotune.common.data.system.info.device;
+
+import com.autotune.analyzer.utils.AnalyzerConstants;
+import com.autotune.common.data.system.info.device.accelerator.AcceleratorDeviceData;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+
+/**
+ * This class stores the device entries linked to the container
+ */
+public class ContainerDeviceList implements DeviceHandler, DeviceComponentDetector {
+ private final HashMap<AnalyzerConstants.DeviceType, ArrayList<DeviceDetails>> deviceMap;
+ private boolean isAcceleratorDeviceDetected;
+ private boolean isCPUDeviceDetected;
+ private boolean isMemoryDeviceDetected;
+ private boolean isNetworkDeviceDetected;
+
+ public ContainerDeviceList(){
+ this.deviceMap = new HashMap<AnalyzerConstants.DeviceType, ArrayList<DeviceDetails>>();
+ this.isAcceleratorDeviceDetected = false;
+ // Currently setting up CPU, Memory and Network as true by default
+ this.isCPUDeviceDetected = true;
+ this.isMemoryDeviceDetected = true;
+ this.isNetworkDeviceDetected = true;
+ }
+
+ @Override
+ public void addDevice(AnalyzerConstants.DeviceType deviceType, DeviceDetails deviceInfo) {
+ if (null == deviceType || null == deviceInfo) {
+ // TODO: Handle appropriate returns in future
+ return;
+ }
+
+ if (deviceType == AnalyzerConstants.DeviceType.ACCELERATOR)
+ this.isAcceleratorDeviceDetected = true;
+
+ // TODO: Handle multiple identical entries
+ // Currently only the first MIG entry is added, so no duplicate check is performed
+ if (null == deviceMap.get(deviceType)) {
+ ArrayList<DeviceDetails> deviceDetailsList = new ArrayList<>();
+ deviceDetailsList.add(deviceInfo);
+ this.deviceMap.put(deviceType, deviceDetailsList);
+ } else {
+ this.deviceMap.get(deviceType).add(deviceInfo);
+ }
+ }
+
+ @Override
+ public void removeDevice(AnalyzerConstants.DeviceType deviceType, DeviceDetails deviceInfo) {
+ if (null == deviceType || null == deviceInfo) {
+ // TODO: Handle appropriate returns in future
+ return;
+ }
+ // TODO: Implement if dynamic experiment device updates are needed
+ if (deviceType == AnalyzerConstants.DeviceType.ACCELERATOR) {
+ if (null == deviceMap.get(deviceType) || this.deviceMap.get(deviceType).isEmpty()) {
+ this.isAcceleratorDeviceDetected = false;
+ }
+ }
+ }
+
+ @Override
+ public void updateDevice(AnalyzerConstants.DeviceType deviceType, DeviceDetails deviceInfo) {
+ // TODO: Implement if dynamic experiment device updates are needed
+ }
+
+ /**
+ * Returns the device that matches the identifier for the given device parameter
+ * @param deviceType - type of the device, e.g. CPU, Memory, Network or Accelerator
+ * @param matchIdentifier - string which needs to be matched
+ * @param deviceParameters - parameter to search for in the device details list
+ * @return the matching DeviceDetails object
+ *
+ * USE CASE: To search for a device based on a particular parameter. Say a container has multiple
+ * accelerators attached; you can pass MODEL_NAME as the parameter and the model name as the
+ * identifier to get the matching DeviceDetails object.
+ */
+ @Override
+ public DeviceDetails getDeviceByParameter(AnalyzerConstants.DeviceType deviceType, String matchIdentifier, AnalyzerConstants.DeviceParameters deviceParameters) {
+ if (null == deviceType)
+ return null;
+ if (null == matchIdentifier)
+ return null;
+ if (null == deviceParameters)
+ return null;
+ if (matchIdentifier.isEmpty())
+ return null;
+ if (!deviceMap.containsKey(deviceType))
+ return null;
+ if (null == deviceMap.get(deviceType))
+ return null;
+ if (deviceMap.get(deviceType).isEmpty())
+ return null;
+
+ // Todo: Need to add extractors for each device type currently implementing for GPU
+ if (deviceType == AnalyzerConstants.DeviceType.ACCELERATOR) {
+ for (DeviceDetails deviceDetails: deviceMap.get(deviceType)) {
+ AcceleratorDeviceData deviceData = (AcceleratorDeviceData) deviceDetails;
+ if (deviceParameters == AnalyzerConstants.DeviceParameters.MODEL_NAME) {
+ if (deviceData.getModelName().equalsIgnoreCase(matchIdentifier)) {
+ return deviceData;
+ }
+ }
+ }
+ }
+
+ return null;
+ }
+
+ @Override
+ public ArrayList<DeviceDetails> getDevices(AnalyzerConstants.DeviceType deviceType) {
+ if (null == deviceType)
+ return null;
+ if (!deviceMap.containsKey(deviceType))
+ return null;
+ if (null == deviceMap.get(deviceType))
+ return null;
+ if (deviceMap.get(deviceType).isEmpty())
+ return null;
+
+ return deviceMap.get(deviceType);
+ }
+
+ @Override
+ public boolean isAcceleratorDeviceDetected() {
+ return this.isAcceleratorDeviceDetected;
+ }
+
+ @Override
+ public boolean isCPUDeviceDetected() {
+ return this.isCPUDeviceDetected;
+ }
+
+ @Override
+ public boolean isMemoryDeviceDetected() {
+ return this.isMemoryDeviceDetected;
+ }
+
+ @Override
+ public boolean isNetworkDeviceDetected() {
+ return this.isNetworkDeviceDetected;
+ }
+}
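
A usage sketch for the device wiring above; `AcceleratorDeviceData` is defined later in this diff, and the identifiers are illustrative:

```java
import com.autotune.analyzer.utils.AnalyzerConstants;
import com.autotune.common.data.system.info.device.ContainerDeviceList;
import com.autotune.common.data.system.info.device.DeviceDetails;
import com.autotune.common.data.system.info.device.accelerator.AcceleratorDeviceData;

public class DeviceListSketch {
    public static void main(String[] args) {
        ContainerDeviceList devices = new ContainerDeviceList();
        // Constructor order: modelName, hostName, UUID, deviceName, isMIG.
        devices.addDevice(AnalyzerConstants.DeviceType.ACCELERATOR,
                new AcceleratorDeviceData("A100-80GB", "node-1", "GPU-illustrative-uuid", "nvidia0", false));

        System.out.println(devices.isAcceleratorDeviceDetected()); // true

        // Only MODEL_NAME matching is implemented so far (see getDeviceByParameter above).
        DeviceDetails match = devices.getDeviceByParameter(
                AnalyzerConstants.DeviceType.ACCELERATOR,
                "A100-80GB",
                AnalyzerConstants.DeviceParameters.MODEL_NAME);
        System.out.println(null != match); // true
    }
}
```
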
diff --git a/src/main/java/com/autotune/common/data/system/info/device/DeviceComponentDetector.java b/src/main/java/com/autotune/common/data/system/info/device/DeviceComponentDetector.java
new file mode 100644
index 000000000..249ba9c55
--- /dev/null
+++ b/src/main/java/com/autotune/common/data/system/info/device/DeviceComponentDetector.java
@@ -0,0 +1,8 @@
+package com.autotune.common.data.system.info.device;
+
+public interface DeviceComponentDetector {
+ public boolean isAcceleratorDeviceDetected();
+ public boolean isCPUDeviceDetected();
+ public boolean isMemoryDeviceDetected();
+ public boolean isNetworkDeviceDetected();
+}
diff --git a/src/main/java/com/autotune/common/data/system/info/device/DeviceDetails.java b/src/main/java/com/autotune/common/data/system/info/device/DeviceDetails.java
new file mode 100644
index 000000000..584891b60
--- /dev/null
+++ b/src/main/java/com/autotune/common/data/system/info/device/DeviceDetails.java
@@ -0,0 +1,7 @@
+package com.autotune.common.data.system.info.device;
+
+import com.autotune.analyzer.utils.AnalyzerConstants;
+
+public interface DeviceDetails {
+ public AnalyzerConstants.DeviceType getType();
+}
diff --git a/src/main/java/com/autotune/common/data/system/info/device/DeviceHandler.java b/src/main/java/com/autotune/common/data/system/info/device/DeviceHandler.java
new file mode 100644
index 000000000..447716440
--- /dev/null
+++ b/src/main/java/com/autotune/common/data/system/info/device/DeviceHandler.java
@@ -0,0 +1,15 @@
+package com.autotune.common.data.system.info.device;
+
+import com.autotune.analyzer.utils.AnalyzerConstants;
+
+import java.util.ArrayList;
+
+public interface DeviceHandler {
+ public void addDevice(AnalyzerConstants.DeviceType deviceType, DeviceDetails deviceInfo);
+ public void removeDevice(AnalyzerConstants.DeviceType deviceType, DeviceDetails deviceInfo);
+ public void updateDevice(AnalyzerConstants.DeviceType deviceType, DeviceDetails deviceInfo);
+ public DeviceDetails getDeviceByParameter(AnalyzerConstants.DeviceType deviceType,
+ String matchIdentifier,
+ AnalyzerConstants.DeviceParameters deviceParameters);
+ public ArrayList<DeviceDetails> getDevices(AnalyzerConstants.DeviceType deviceType);
+}
diff --git a/src/main/java/com/autotune/common/data/system/info/device/accelerator/AcceleratorDeviceData.java b/src/main/java/com/autotune/common/data/system/info/device/accelerator/AcceleratorDeviceData.java
new file mode 100644
index 000000000..a3a09fead
--- /dev/null
+++ b/src/main/java/com/autotune/common/data/system/info/device/accelerator/AcceleratorDeviceData.java
@@ -0,0 +1,59 @@
+package com.autotune.common.data.system.info.device.accelerator;
+
+import com.autotune.analyzer.utils.AnalyzerConstants;
+
+public class AcceleratorDeviceData implements AcceleratorDeviceDetails {
+ private final String manufacturer;
+ private final String modelName;
+ private final String hostName;
+ private final String UUID;
+ private final String deviceName;
+ private boolean isMIG;
+
+ public AcceleratorDeviceData(String modelName, String hostName, String UUID, String deviceName, boolean isMIG) {
+ this.manufacturer = "NVIDIA";
+ this.modelName = modelName;
+ this.hostName = hostName;
+ this.UUID = UUID;
+ this.deviceName = deviceName;
+ this.isMIG = isMIG;
+ }
+
+ @Override
+ public String getManufacturer() {
+ return this.manufacturer;
+ }
+
+ @Override
+ public String getModelName() {
+ return modelName;
+ }
+
+ @Override
+ public String getHostName() {
+ return hostName;
+ }
+
+ @Override
+ public String getUUID() {
+ return UUID;
+ }
+
+ @Override
+ public String getDeviceName() {
+ return deviceName;
+ }
+
+ public boolean isMIG() {
+ return isMIG;
+ }
+
+ public void setMIG(boolean isMIG) {
+ this.isMIG = isMIG;
+ }
+
+ @Override
+ public AnalyzerConstants.DeviceType getType() {
+ return AnalyzerConstants.DeviceType.ACCELERATOR;
+ }
+}
diff --git a/src/main/java/com/autotune/common/data/system/info/device/accelerator/AcceleratorDeviceDetails.java b/src/main/java/com/autotune/common/data/system/info/device/accelerator/AcceleratorDeviceDetails.java
new file mode 100644
index 000000000..31b90ff66
--- /dev/null
+++ b/src/main/java/com/autotune/common/data/system/info/device/accelerator/AcceleratorDeviceDetails.java
@@ -0,0 +1,11 @@
+package com.autotune.common.data.system.info.device.accelerator;
+
+import com.autotune.common.data.system.info.device.DeviceDetails;
+
+public interface AcceleratorDeviceDetails extends DeviceDetails {
+ public String getManufacturer();
+ public String getModelName();
+ public String getHostName();
+ public String getUUID();
+ public String getDeviceName();
+}
diff --git a/src/main/java/com/autotune/common/data/system/info/device/accelerator/metadata/AcceleratorMetaDataService.java b/src/main/java/com/autotune/common/data/system/info/device/accelerator/metadata/AcceleratorMetaDataService.java
new file mode 100644
index 000000000..6a5fd8187
--- /dev/null
+++ b/src/main/java/com/autotune/common/data/system/info/device/accelerator/metadata/AcceleratorMetaDataService.java
@@ -0,0 +1,103 @@
+package com.autotune.common.data.system.info.device.accelerator.metadata;
+
+
+
+import com.autotune.analyzer.utils.AnalyzerConstants;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * A service that provides the appropriate Accelerator Profile
+ * based on SM (streaming multiprocessor) and memory requirements.
+ *
+ * The service loads the profiles of the supported accelerators at startup.
+ * Currently it supports:
+ * NVIDIA A100 40GB
+ * NVIDIA A100 80GB
+ * NVIDIA H100 80GB
+ */
+public class AcceleratorMetaDataService {
+ private static Map<String, List<AcceleratorProfile>> acceleratorProfilesMap;
+ private static AcceleratorMetaDataService acceleratorMetaDataService = null;
+
+ /**
+ * Private constructor which loads the supported accelerator profiles once.
+ */
+ private AcceleratorMetaDataService() {
+ acceleratorProfilesMap = new HashMap<>();
+ initializeAcceleratorProfiles();
+ }
+
+ private static void initializeAcceleratorProfiles() {
+ List<AcceleratorProfile> commonProfiles = new ArrayList<>();
+ // IMPORTANT: Keep these entries in ascending order of GPU core and memory fractions; the lookup returns the first profile that fits
+ commonProfiles.add(new AcceleratorProfile(AnalyzerConstants.AcceleratorConstants.AcceleratorProfiles.PROFILE_1G_10GB,
+ 1.0 / 8, 1.0 / 7, 7));
+ commonProfiles.add(new AcceleratorProfile(AnalyzerConstants.AcceleratorConstants.AcceleratorProfiles.PROFILE_1G_20GB,
+ 1.0 / 4, 1.0 / 7, 4));
+ commonProfiles.add(new AcceleratorProfile(AnalyzerConstants.AcceleratorConstants.AcceleratorProfiles.PROFILE_2G_20GB,
+ 2.0 / 8, 2.0 / 7, 3));
+ commonProfiles.add(new AcceleratorProfile(AnalyzerConstants.AcceleratorConstants.AcceleratorProfiles.PROFILE_3G_40GB,
+ 4.0 / 8, 3.0 / 7, 2));
+ commonProfiles.add(new AcceleratorProfile(AnalyzerConstants.AcceleratorConstants.AcceleratorProfiles.PROFILE_4G_40GB,
+ 4.0 / 8, 4.0 / 7, 1));
+ commonProfiles.add(new AcceleratorProfile(AnalyzerConstants.AcceleratorConstants.AcceleratorProfiles.PROFILE_7G_80GB,
+ 1.0, 1.0, 1));
+
+ List<AcceleratorProfile> a100_40_gb_profiles = new ArrayList<>();
+ // IMPORTANT: Keep these entries in ascending order of GPU core and memory fractions; the lookup returns the first profile that fits
+ a100_40_gb_profiles.add(new AcceleratorProfile(AnalyzerConstants.AcceleratorConstants.AcceleratorProfiles.PROFILE_1G_5GB,
+ 1.0 / 8, 1.0 / 7, 7));
+ a100_40_gb_profiles.add(new AcceleratorProfile(AnalyzerConstants.AcceleratorConstants.AcceleratorProfiles.PROFILE_1G_10GB,
+ 1.0 / 4, 1.0 / 7, 4));
+ a100_40_gb_profiles.add(new AcceleratorProfile(AnalyzerConstants.AcceleratorConstants.AcceleratorProfiles.PROFILE_2G_10GB,
+ 2.0 / 8, 2.0 / 7, 3));
+ a100_40_gb_profiles.add(new AcceleratorProfile(AnalyzerConstants.AcceleratorConstants.AcceleratorProfiles.PROFILE_3G_20GB,
+ 4.0 / 8, 3.0 / 7, 2));
+ a100_40_gb_profiles.add(new AcceleratorProfile(AnalyzerConstants.AcceleratorConstants.AcceleratorProfiles.PROFILE_4G_20GB,
+ 4.0 / 8, 4.0 / 7, 1));
+ a100_40_gb_profiles.add(new AcceleratorProfile(AnalyzerConstants.AcceleratorConstants.AcceleratorProfiles.PROFILE_7G_40GB,
+ 1.0, 1.0, 1));
+
+ acceleratorProfilesMap.put(AnalyzerConstants.AcceleratorConstants.SupportedAccelerators.A100_80_GB, new ArrayList<>(commonProfiles));
+ acceleratorProfilesMap.put(AnalyzerConstants.AcceleratorConstants.SupportedAccelerators.H100_80_GB, new ArrayList<>(commonProfiles));
+ acceleratorProfilesMap.put(AnalyzerConstants.AcceleratorConstants.SupportedAccelerators.A100_40_GB, new ArrayList<>(a100_40_gb_profiles));
+ }
+
+ public static AcceleratorMetaDataService getInstance() {
+ if(null == acceleratorMetaDataService) {
+ synchronized (AcceleratorMetaDataService.class) {
+ if (null == acceleratorMetaDataService) {
+ acceleratorMetaDataService = new AcceleratorMetaDataService();
+ }
+ }
+ }
+ return acceleratorMetaDataService;
+ }
+
+ public AcceleratorProfile getAcceleratorProfile(String modelName, Double requiredSmFraction, Double requiredMemoryFraction) {
+ if (null == modelName || null == requiredSmFraction || null == requiredMemoryFraction) {
+ return null;
+ }
+ modelName = modelName.strip();
+ if (!modelName.equalsIgnoreCase(AnalyzerConstants.AcceleratorConstants.SupportedAccelerators.A100_80_GB)
+ && !modelName.equalsIgnoreCase(AnalyzerConstants.AcceleratorConstants.SupportedAccelerators.H100_80_GB)
+ && !modelName.equalsIgnoreCase(AnalyzerConstants.AcceleratorConstants.SupportedAccelerators.A100_40_GB)) {
+ return null;
+ }
+ if (requiredMemoryFraction < 0.0 || requiredSmFraction < 0.0) {
+ return null;
+ }
+ List<AcceleratorProfile> gpuProfiles = acceleratorProfilesMap.get(modelName);
+ for (AcceleratorProfile profile : gpuProfiles) {
+ if (profile.getMemoryFraction() >= requiredMemoryFraction && profile.getSmFraction() >= requiredSmFraction) {
+ // Returning the profile as the list is in ascending order
+ return profile;
+ }
+ }
+ return null;
+ }
+}
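
Because each profile list is kept in ascending order, `getAcceleratorProfile` behaves as a first-fit lookup. A quick check against the tables above:

```java
import com.autotune.common.data.system.info.device.accelerator.metadata.AcceleratorMetaDataService;
import com.autotune.common.data.system.info.device.accelerator.metadata.AcceleratorProfile;

public class ProfileLookupSketch {
    public static void main(String[] args) {
        AcceleratorMetaDataService svc = AcceleratorMetaDataService.getInstance();

        // Require roughly 30% of SMs and 25% of memory on an A100 80GB.
        AcceleratorProfile profile = svc.getAcceleratorProfile("A100-80GB", 0.30, 0.25);

        // 1g.20gb and 2g.20gb satisfy the memory fraction but not the SM fraction
        // (1/7 and 2/7 are both below 0.30), so the first fit is 3g.40gb
        // (smFraction 3/7 ~ 0.43, memoryFraction 4/8 = 0.5).
        System.out.println(profile.getProfileName()); // 3g.40gb
    }
}
```
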
diff --git a/src/main/java/com/autotune/common/data/system/info/device/accelerator/metadata/AcceleratorProfile.java b/src/main/java/com/autotune/common/data/system/info/device/accelerator/metadata/AcceleratorProfile.java
new file mode 100644
index 000000000..c0db82b50
--- /dev/null
+++ b/src/main/java/com/autotune/common/data/system/info/device/accelerator/metadata/AcceleratorProfile.java
@@ -0,0 +1,51 @@
+package com.autotune.common.data.system.info.device.accelerator.metadata;
+
+/**
+ * Class which is used to store the details of an accelerator profile
+ */
+public class AcceleratorProfile {
+ private final String profileName;
+ private final double memoryFraction;
+ private final double smFraction;
+ private final int instancesAvailable;
+
+ /**
+ * Constructor to create the Accelerator Profile
+ * @param profileName - Name of the profile
+ * @param memoryFraction - Fraction of memory out of the whole accelerator memory
+ * @param smFraction - Fraction of cores or streaming multiprocessors out of the whole accelerator's cores
+ * @param instancesAvailable - Number of instances of a profile available on an Accelerator
+ */
+ public AcceleratorProfile(String profileName, double memoryFraction, double smFraction, int instancesAvailable) {
+ this.profileName = profileName;
+ this.memoryFraction = memoryFraction;
+ this.smFraction = smFraction;
+ this.instancesAvailable = instancesAvailable;
+ }
+
+ public String getProfileName() {
+ return this.profileName;
+ }
+
+ public double getMemoryFraction() {
+ return memoryFraction;
+ }
+
+ public double getSmFraction() {
+ return smFraction;
+ }
+
+ public int getInstancesAvailable() {
+ return instancesAvailable;
+ }
+
+ @Override
+ public String toString() {
+ return "AcceleratorProfile{" +
+ "profileName='" + profileName + '\'' +
+ ", memoryFraction=" + memoryFraction +
+ ", smFraction=" + smFraction +
+ ", instancesAvailable=" + instancesAvailable +
+ '}';
+ }
+}
diff --git a/src/main/java/com/autotune/common/datasource/DataSourceManager.java b/src/main/java/com/autotune/common/datasource/DataSourceManager.java
index 441a70516..94bfc4ce5 100644
--- a/src/main/java/com/autotune/common/datasource/DataSourceManager.java
+++ b/src/main/java/com/autotune/common/datasource/DataSourceManager.java
@@ -1,3 +1,18 @@
+/*******************************************************************************
+ * Copyright (c) 2020, 2021 Red Hat, IBM Corporation and others.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *******************************************************************************/
package com.autotune.common.datasource;
import com.autotune.analyzer.utils.AnalyzerErrorConstants;
@@ -32,13 +47,19 @@ public DataSourceManager() {
/**
* Imports Metadata for a specific data source using associated DataSourceInfo.
+ * @param dataSourceInfo the data source to import metadata from
+ * @param uniqueKey label selectors appended to the metadata queries, e.g. container="xyz",namespace="abc"
+ * @param startTime start of the time window for which metadata is fetched
+ * @param endTime end of the time window for which metadata is fetched
+ * @param steps the interval between data points in a range query
+ * @return DataSourceMetadataInfo for the data source, or null if the metadata is not available
*/
- public DataSourceMetadataInfo importMetadataFromDataSource(DataSourceInfo dataSourceInfo) {
+ public DataSourceMetadataInfo importMetadataFromDataSource(DataSourceInfo dataSourceInfo, String uniqueKey, long startTime, long endTime, int steps) {
try {
if (null == dataSourceInfo) {
throw new DataSourceDoesNotExist(KruizeConstants.DataSourceConstants.DataSourceErrorMsgs.MISSING_DATASOURCE_INFO);
}
- DataSourceMetadataInfo dataSourceMetadataInfo = dataSourceMetadataOperator.createDataSourceMetadata(dataSourceInfo);
+ DataSourceMetadataInfo dataSourceMetadataInfo = dataSourceMetadataOperator.createDataSourceMetadata(dataSourceInfo,uniqueKey, startTime, endTime, steps);
if (null == dataSourceMetadataInfo) {
LOGGER.error(KruizeConstants.DataSourceConstants.DataSourceMetadataErrorMsgs.DATASOURCE_METADATA_INFO_NOT_AVAILABLE, "for datasource {}" + dataSourceInfo.getName());
return null;
@@ -91,7 +112,7 @@ public void updateMetadataFromDataSource(DataSourceInfo dataSource, DataSourceMe
if (null == dataSourceMetadataInfo) {
throw new DataSourceDoesNotExist(KruizeConstants.DataSourceConstants.DataSourceMetadataErrorMsgs.DATASOURCE_METADATA_INFO_NOT_AVAILABLE);
}
- dataSourceMetadataOperator.updateDataSourceMetadata(dataSource);
+ dataSourceMetadataOperator.updateDataSourceMetadata(dataSource,"",0,0,0);
} catch (Exception e) {
LOGGER.error(e.getMessage());
}
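
The bulk worker drives this new signature with real values; a sketch of the call shape, with an illustrative label selector and time window (per the code above, passing `"", 0, 0, 0` preserves the old unscoped behaviour):

```java
import com.autotune.common.data.dataSourceMetadata.DataSourceMetadataInfo;
import com.autotune.common.datasource.DataSourceInfo;
import com.autotune.common.datasource.DataSourceManager;
import com.autotune.common.utils.CommonUtils;

public class MetadataImportSketch {
    public static void main(String[] args) throws Exception {
        DataSourceManager dataSourceManager = new DataSourceManager();
        DataSourceInfo datasource = CommonUtils.getDataSourceInfo("prometheus-1"); // name illustrative

        long endTime = java.time.Instant.now().getEpochSecond();
        long startTime = endTime - 15L * 24 * 60 * 60;  // a 15-day window, in epoch seconds
        int steps = 15 * 60;                            // measurement duration in seconds

        DataSourceMetadataInfo metadataInfo = dataSourceManager.importMetadataFromDataSource(
                datasource, "app=\"inference\"", startTime, endTime, steps);
        System.out.println(null != metadataInfo);
    }
}
```
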
diff --git a/src/main/java/com/autotune/common/datasource/DataSourceMetadataOperator.java b/src/main/java/com/autotune/common/datasource/DataSourceMetadataOperator.java
index d1079564b..bd51e797b 100644
--- a/src/main/java/com/autotune/common/datasource/DataSourceMetadataOperator.java
+++ b/src/main/java/com/autotune/common/datasource/DataSourceMetadataOperator.java
@@ -1,3 +1,18 @@
+/*******************************************************************************
+ * Copyright (c) 2020, 2021 Red Hat, IBM Corporation and others.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *******************************************************************************/
package com.autotune.common.datasource;
import com.autotune.common.data.dataSourceQueries.PromQLDataSourceQueries;
@@ -31,10 +46,14 @@ public class DataSourceMetadataOperator {
* Currently supported DataSourceProvider - Prometheus
*
* @param dataSourceInfo The DataSourceInfo object containing information about the data source.
+ * @param uniqueKey label selectors appended to the metadata queries, e.g. container="xyz",namespace="abc"
+ * @param startTime start of the time window for which metadata is fetched
+ * @param endTime end of the time window for which metadata is fetched
+ * @param steps the interval between data points in a range query
* TODO - support multiple data sources
*/
- public DataSourceMetadataInfo createDataSourceMetadata(DataSourceInfo dataSourceInfo) {
- return processQueriesAndPopulateDataSourceMetadataInfo(dataSourceInfo);
+ public DataSourceMetadataInfo createDataSourceMetadata(DataSourceInfo dataSourceInfo, String uniqueKey, long startTime, long endTime, int steps) {
+ return processQueriesAndPopulateDataSourceMetadataInfo(dataSourceInfo, uniqueKey, startTime, endTime, steps);
}
/**
@@ -75,8 +94,8 @@ public DataSourceMetadataInfo getDataSourceMetadataInfo(DataSourceInfo dataSourc
* TODO - Create and Update currently have identical functionality; based on UI workflow and requirements,
* updateDataSourceMetadata() needs further enhancement to support namespace- and workload-level granular updates
*/
- public DataSourceMetadataInfo updateDataSourceMetadata(DataSourceInfo dataSourceInfo) {
- return processQueriesAndPopulateDataSourceMetadataInfo(dataSourceInfo);
+ public DataSourceMetadataInfo updateDataSourceMetadata(DataSourceInfo dataSourceInfo, String uniqueKey, long startTime, long endTime, int steps) {
+ return processQueriesAndPopulateDataSourceMetadataInfo(dataSourceInfo, uniqueKey, startTime, endTime, steps);
}
/**
@@ -108,9 +127,14 @@ public void deleteDataSourceMetadata(DataSourceInfo dataSourceInfo) {
* DataSourceMetadataInfo object
*
* @param dataSourceInfo The DataSourceInfo object containing information about the data source
+ * @param uniqueKey label selectors appended to the metadata queries, e.g. container="xyz",namespace="abc"
+ * @param startTime start of the time window for which metadata is fetched
+ * @param endTime end of the time window for which metadata is fetched
+ * @param steps the interval between data points in a range query
* @return DataSourceMetadataInfo object with populated metadata fields
+ * TODO: rename to processQueriesAndFetchClusterMetadataInfo
*/
- public DataSourceMetadataInfo processQueriesAndPopulateDataSourceMetadataInfo(DataSourceInfo dataSourceInfo) {
+ public DataSourceMetadataInfo processQueriesAndPopulateDataSourceMetadataInfo(DataSourceInfo dataSourceInfo, String uniqueKey, long startTime, long endTime, int steps) {
DataSourceMetadataHelper dataSourceDetailsHelper = new DataSourceMetadataHelper();
/**
* Get DataSourceOperatorImpl instance on runtime based on dataSource provider
@@ -129,8 +153,25 @@ public DataSourceMetadataInfo processQueriesAndPopulateDataSourceMetadataInfo(Da
*/
try {
String dataSourceName = dataSourceInfo.getName();
- JsonArray namespacesDataResultArray = op.getResultArrayForQuery(dataSourceInfo, PromQLDataSourceQueries.NAMESPACE_QUERY);
- if (false == op.validateResultArray(namespacesDataResultArray)){
+ String namespaceQuery = PromQLDataSourceQueries.NAMESPACE_QUERY;
+ String workloadQuery = PromQLDataSourceQueries.WORKLOAD_QUERY;
+ String containerQuery = PromQLDataSourceQueries.CONTAINER_QUERY;
+ if (null != uniqueKey) {
+ LOGGER.info("uniquekey: {}", uniqueKey);
+ namespaceQuery = namespaceQuery.replace("ADDITIONAL_LABEL", "," + uniqueKey);
+ workloadQuery = workloadQuery.replace("ADDITIONAL_LABEL", "," + uniqueKey);
+ containerQuery = containerQuery.replace("ADDITIONAL_LABEL", "," + uniqueKey);
+ } else {
+ namespaceQuery = namespaceQuery.replace("ADDITIONAL_LABEL", "");
+ workloadQuery = workloadQuery.replace("ADDITIONAL_LABEL", "");
+ containerQuery = containerQuery.replace("ADDITIONAL_LABEL", "");
+ }
+ LOGGER.info("namespaceQuery: {}", namespaceQuery);
+ LOGGER.info("workloadQuery: {}", workloadQuery);
+ LOGGER.info("containerQuery: {}", containerQuery);
+
+ JsonArray namespacesDataResultArray = op.getResultArrayForQuery(dataSourceInfo, namespaceQuery);
+ if (false == op.validateResultArray(namespacesDataResultArray)) {
dataSourceMetadataInfo = dataSourceDetailsHelper.createDataSourceMetadataInfoObject(dataSourceName, null);
throw new Exception(KruizeConstants.DataSourceConstants.DataSourceMetadataErrorMsgs.NAMESPACE_QUERY_VALIDATION_FAILED);
}
@@ -153,7 +194,7 @@ public DataSourceMetadataInfo processQueriesAndPopulateDataSourceMetadataInfo(Da
*/
HashMap<String, HashMap<String, DataSourceWorkload>> datasourceWorkloads = new HashMap<>();
JsonArray workloadDataResultArray = op.getResultArrayForQuery(dataSourceInfo,
- PromQLDataSourceQueries.WORKLOAD_QUERY);
+ workloadQuery);
if (op.validateResultArray(workloadDataResultArray)) {
datasourceWorkloads = dataSourceDetailsHelper.getWorkloadInfo(workloadDataResultArray);
@@ -172,7 +213,7 @@ public DataSourceMetadataInfo processQueriesAndPopulateDataSourceMetadataInfo(Da
*/
HashMap<String, HashMap<String, DataSourceContainer>> datasourceContainers = new HashMap<>();
JsonArray containerDataResultArray = op.getResultArrayForQuery(dataSourceInfo,
- PromQLDataSourceQueries.CONTAINER_QUERY);
+ containerQuery);
if (op.validateResultArray(containerDataResultArray)) {
datasourceContainers = dataSourceDetailsHelper.getContainerInfo(containerDataResultArray);
diff --git a/src/main/java/com/autotune/common/utils/CommonUtils.java b/src/main/java/com/autotune/common/utils/CommonUtils.java
index 384bc5dc3..ddd965d6e 100644
--- a/src/main/java/com/autotune/common/utils/CommonUtils.java
+++ b/src/main/java/com/autotune/common/utils/CommonUtils.java
@@ -19,12 +19,13 @@
import com.autotune.common.datasource.DataSourceCollection;
import com.autotune.common.datasource.DataSourceInfo;
import com.autotune.common.datasource.DataSourceManager;
+
import com.autotune.utils.KruizeConstants;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
import java.sql.Timestamp;
-import java.util.Calendar;
-import java.util.Collections;
-import java.util.List;
+import java.util.*;
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
@@ -34,6 +35,8 @@
*/
public class CommonUtils {
+ private static final Logger LOGGER = LoggerFactory.getLogger(CommonUtils.class);
+
/**
* AutotuneDatasourceTypes is an ENUM which holds different types of
* datasources supported by Autotune
diff --git a/src/main/java/com/autotune/database/dao/ExperimentDAOImpl.java b/src/main/java/com/autotune/database/dao/ExperimentDAOImpl.java
index 21930c327..7b72baf77 100644
--- a/src/main/java/com/autotune/database/dao/ExperimentDAOImpl.java
+++ b/src/main/java/com/autotune/database/dao/ExperimentDAOImpl.java
@@ -1,3 +1,18 @@
+/*******************************************************************************
+ * Copyright (c) 2020, 2021 Red Hat, IBM Corporation and others.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *******************************************************************************/
package com.autotune.database.dao;
import com.autotune.analyzer.kruizeObject.KruizeObject;
@@ -27,7 +42,10 @@
import java.time.LocalDateTime;
import java.time.YearMonth;
import java.time.temporal.ChronoUnit;
-import java.util.*;
+import java.util.ArrayList;
+import java.util.Calendar;
+import java.util.Date;
+import java.util.List;
import java.util.stream.IntStream;
import static com.autotune.database.helper.DBConstants.DB_MESSAGES.DUPLICATE_KEY;
@@ -150,9 +168,9 @@ public void addPartitions(String tableName, String month, String year, int dayOf
year, month, String.format("%02d", i), year, month, String.format("%02d", i));
session.createNativeQuery(daterange).executeUpdate();
});
- } else if (partitionType.equalsIgnoreCase(DBConstants.PARTITION_TYPES.BY_DAY)) {
- String daterange = String.format(DB_PARTITION_DATERANGE, tableName, year, month, String.format("%02d", 1), tableName,
- year, month, String.format("%02d", 1), year, month, String.format("%02d", 1));
+ } else if (partitionType.equalsIgnoreCase(DBConstants.PARTITION_TYPES.BY_DAY)) { // ROS does not invoke this branch
+ String daterange = String.format(DB_PARTITION_DATERANGE, tableName, year, month, dayOfTheMonth, tableName,
+ year, month, dayOfTheMonth, year, month, dayOfTheMonth);
session.createNativeQuery(daterange).executeUpdate();
} else {
LOGGER.error(DBConstants.DB_MESSAGES.INVALID_PARTITION_TYPE);
@@ -239,7 +257,9 @@ public List addToDBAndFetchFailedResults(List loadAllPerformanceProfiles() throws E
/**
* Fetches all the Metric Profile records from KruizeMetricProfileEntry database table
+ *
* @return List of all KruizeMetricProfileEntry database objects
* @throws Exception
*/
@@ -779,7 +804,6 @@ public List loadExperimentFromDBByInputJSON(StringBuilder
}
-
@Override
public List loadResultsByExperimentName(String experimentName, String cluster_name, Timestamp calculated_start_time, Timestamp interval_end_time) throws Exception {
// TODO: load only experimentStatus=inProgress , playback may not require completed experiments
@@ -898,6 +922,7 @@ public List loadPerformanceProfileByName(String p
/**
* Fetches Metric Profile by name from KruizeMetricProfileEntry database table
+ *
* @param metricProfileName Metric profile name
* @return List of KruizeMetricProfileEntry objects
* @throws Exception
@@ -985,7 +1010,7 @@ public List loadMetadataByName(String dataSourceName) thr
* Retrieves a list of KruizeDSMetadataEntry objects based on the specified datasource name and cluster name.
*
* @param dataSourceName The name of the datasource.
- * @param clusterName The name of the cluster.
+ * @param clusterName The name of the cluster.
* @return A list of KruizeDSMetadataEntry objects associated with the provided datasource and cluster name.
* @throws Exception If there is an error while loading metadata from the database.
*/
@@ -1010,8 +1035,8 @@ public List loadMetadataByClusterName(String dataSourceNa
* datasource name, cluster name and namespace.
*
* @param dataSourceName The name of the datasource.
- * @param clusterName The name of the cluster.
- * @param namespace namespace
+ * @param clusterName The name of the cluster.
+ * @param namespace namespace
* @return A list of KruizeDSMetadataEntry objects associated with the provided datasource, cluster name and namespaces.
* @throws Exception If there is an error while loading metadata from the database.
*/
@@ -1021,7 +1046,7 @@ public List loadMetadataByNamespace(String dataSourceName
Query kruizeMetadataQuery = session.createQuery(SELECT_FROM_METADATA_BY_DATASOURCE_NAME_CLUSTER_NAME_AND_NAMESPACE, KruizeDSMetadataEntry.class)
.setParameter("datasource_name", dataSourceName)
.setParameter("cluster_name", clusterName)
- .setParameter("namespace",namespace);
+ .setParameter("namespace", namespace);
kruizeMetadataList = kruizeMetadataQuery.list();
} catch (Exception e) {
@@ -1066,14 +1091,16 @@ public List loadAllDataSources() throws Exception {
private void getExperimentTypeInKruizeExperimentEntry(List<KruizeExperimentEntry> entries) throws Exception {
try (Session session = KruizeHibernateUtil.getSessionFactory().openSession()) {
- for (KruizeExperimentEntry entry: entries) {
+ for (KruizeExperimentEntry entry : entries) {
if (isTargetCluserLocal(entry.getTarget_cluster())) {
- String sql = DBConstants.SQLQUERY.SELECT_EXPERIMENT_EXP_TYPE;
- Query query = session.createNativeQuery(sql);
- query.setParameter("experiment_id", entry.getExperiment_id());
- List<String> experimentType = query.getResultList();
- if (null != experimentType && !experimentType.isEmpty()) {
- entry.setExperimentType(experimentType.get(0));
+ if (null == entry.getExperimentType() || entry.getExperimentType().isEmpty()) {
+ String sql = DBConstants.SQLQUERY.SELECT_EXPERIMENT_EXP_TYPE;
+ Query query = session.createNativeQuery(sql);
+ query.setParameter("experiment_id", entry.getExperiment_id());
+ List<String> experimentType = query.getResultList();
+ if (null != experimentType && !experimentType.isEmpty()) {
+ entry.setExperimentType(experimentType.get(0));
+ }
}
}
}
@@ -1101,7 +1128,7 @@ private void updateExperimentTypeInKruizeExperimentEntry(KruizeExperimentEntry k
}
private void getExperimentTypeInKruizeRecommendationsEntry(List<KruizeRecommendationEntry> entries) throws Exception {
- for (KruizeRecommendationEntry recomEntry: entries) {
+ for (KruizeRecommendationEntry recomEntry : entries) {
getExperimentTypeInSingleKruizeRecommendationsEntry(recomEntry);
}
}
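
The `BY_DAY` fix earlier in this file formats the requested `dayOfTheMonth` into the partition date range instead of the hard-coded first of the month. The exact `DB_PARTITION_DATERANGE` template is defined in `DBConstants` and not shown in this diff, so the format string below is only an assumption, used to illustrate the kind of single-day Postgres partition statement the branch produces:

```java
public class PartitionSketch {
    // Assumed shape of the template; the real constant lives in DBConstants.
    static final String DB_PARTITION_DATERANGE =
            "CREATE TABLE IF NOT EXISTS %s_%s%s%s PARTITION OF %s "
          + "FOR VALUES FROM ('%s-%s-%s 00:00:00') TO ('%s-%s-%s 23:59:59')";

    public static void main(String[] args) {
        String table = "kruize_recommendations", year = "2024", month = "10";
        int dayOfTheMonth = 10;
        String day = String.format("%02d", dayOfTheMonth);
        // Mirrors the fixed BY_DAY branch: the requested day is formatted in,
        // rather than the hard-coded "01" the old code produced.
        String sql = String.format(DB_PARTITION_DATERANGE,
                table, year, month, day, table, year, month, day, year, month, day);
        System.out.println(sql);
    }
}
```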
diff --git a/src/main/java/com/autotune/database/helper/DBHelpers.java b/src/main/java/com/autotune/database/helper/DBHelpers.java
index fd09f54ec..8b3d018fd 100644
--- a/src/main/java/com/autotune/database/helper/DBHelpers.java
+++ b/src/main/java/com/autotune/database/helper/DBHelpers.java
@@ -16,6 +16,8 @@
package com.autotune.database.helper;
+import com.autotune.analyzer.adapters.DeviceDetailsAdapter;
+import com.autotune.analyzer.adapters.RecommendationItemAdapter;
import com.autotune.analyzer.exceptions.InvalidConversionOfRecommendationEntryException;
import com.autotune.analyzer.kruizeObject.KruizeObject;
import com.autotune.analyzer.kruizeObject.SloInfo;
@@ -32,6 +34,7 @@
import com.autotune.common.data.result.ContainerData;
import com.autotune.common.data.result.ExperimentResultData;
import com.autotune.common.data.result.NamespaceData;
+import com.autotune.common.data.system.info.device.DeviceDetails;
import com.autotune.common.datasource.DataSourceCollection;
import com.autotune.common.datasource.DataSourceInfo;
import com.autotune.common.datasource.DataSourceMetadataOperator;
@@ -334,6 +337,8 @@ public static KruizeResultsEntry convertExperimentResultToExperimentResultsTable
.enableComplexMapKeySerialization()
.setDateFormat(KruizeConstants.DateFormats.STANDARD_JSON_DATE_FORMAT)
.registerTypeAdapter(Date.class, new GsonUTCDateAdapter())
+ .registerTypeAdapter(AnalyzerConstants.RecommendationItem.class, new RecommendationItemAdapter())
+ .registerTypeAdapter(DeviceDetails.class, new DeviceDetailsAdapter())
.create();
try {
kruizeResultsEntry = new KruizeResultsEntry();
@@ -473,6 +478,8 @@ public static KruizeRecommendationEntry convertKruizeObjectTORecommendation(Krui
.enableComplexMapKeySerialization()
.setDateFormat(KruizeConstants.DateFormats.STANDARD_JSON_DATE_FORMAT)
.registerTypeAdapter(Date.class, new GsonUTCDateAdapter())
+ .registerTypeAdapter(AnalyzerConstants.RecommendationItem.class, new RecommendationItemAdapter())
+ .registerTypeAdapter(DeviceDetails.class, new DeviceDetailsAdapter())
.create();
try {
ListRecommendationsAPIObject listRecommendationsAPIObject = getListRecommendationAPIObjectForDB(
@@ -480,7 +487,12 @@ public static KruizeRecommendationEntry convertKruizeObjectTORecommendation(Krui
if (null == listRecommendationsAPIObject) {
return null;
}
- LOGGER.debug(new GsonBuilder().setPrettyPrinting().create().toJson(listRecommendationsAPIObject));
+ LOGGER.debug(new GsonBuilder()
+ .setPrettyPrinting()
+ .registerTypeAdapter(AnalyzerConstants.RecommendationItem.class, new RecommendationItemAdapter())
+ .registerTypeAdapter(DeviceDetails.class, new DeviceDetailsAdapter())
+ .create()
+ .toJson(listRecommendationsAPIObject));
kruizeRecommendationEntry = new KruizeRecommendationEntry();
kruizeRecommendationEntry.setVersion(KruizeConstants.KRUIZE_RECOMMENDATION_API_VERSION.LATEST.getVersionNumber());
kruizeRecommendationEntry.setExperiment_name(listRecommendationsAPIObject.getExperimentName());
@@ -557,6 +569,8 @@ public static List convertResultEntryToUpdateResultsAPIO
.enableComplexMapKeySerialization()
.setDateFormat(KruizeConstants.DateFormats.STANDARD_JSON_DATE_FORMAT)
.registerTypeAdapter(Date.class, new GsonUTCDateAdapter())
+ .registerTypeAdapter(AnalyzerConstants.RecommendationItem.class, new RecommendationItemAdapter())
+ .registerTypeAdapter(DeviceDetails.class, new DeviceDetailsAdapter())
.create();
List<UpdateResultsAPIObject> updateResultsAPIObjects = new ArrayList<>();
for (KruizeResultsEntry kruizeResultsEntry : kruizeResultsEntries) {
@@ -626,6 +640,8 @@ public static List convertRecommendationEntryToRec
.enableComplexMapKeySerialization()
.setDateFormat(KruizeConstants.DateFormats.STANDARD_JSON_DATE_FORMAT)
.registerTypeAdapter(Date.class, new GsonUTCDateAdapter())
+ .registerTypeAdapter(AnalyzerConstants.RecommendationItem.class, new RecommendationItemAdapter())
+ .registerTypeAdapter(DeviceDetails.class, new DeviceDetailsAdapter())
.create();
List<ListRecommendationsAPIObject> listRecommendationsAPIObjectList = new ArrayList<>();
for (KruizeRecommendationEntry kruizeRecommendationEntry : kruizeRecommendationEntryList) {
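
The repeated `registerTypeAdapter` calls above wire `RecommendationItemAdapter` and `DeviceDetailsAdapter` into every `Gson` instance that serializes recommendation data. As a sketch of why such adapters are needed, here is a self-contained example in which an illustrative `SampleItem` enum stands in for `AnalyzerConstants.RecommendationItem`; the stand-in adapter is not Kruize code:

```java
import com.google.gson.*;
import java.lang.reflect.Type;

enum SampleItem { CPU, MEMORY }

// Custom adapter controlling how the enum is written to and read from JSON.
class SampleItemAdapter implements JsonSerializer<SampleItem>, JsonDeserializer<SampleItem> {
    @Override
    public JsonElement serialize(SampleItem src, Type typeOfSrc, JsonSerializationContext ctx) {
        return new JsonPrimitive(src.name().toLowerCase()); // write "cpu" / "memory"
    }

    @Override
    public SampleItem deserialize(JsonElement json, Type typeOfT, JsonDeserializationContext ctx)
            throws JsonParseException {
        return SampleItem.valueOf(json.getAsString().toUpperCase()); // map back to the enum
    }
}

class GsonAdapterSketch {
    public static void main(String[] args) {
        Gson gson = new GsonBuilder()
                .enableComplexMapKeySerialization() // as in the builders above
                .registerTypeAdapter(SampleItem.class, new SampleItemAdapter())
                .create();
        System.out.println(gson.toJson(SampleItem.CPU));                 // "cpu"
        System.out.println(gson.fromJson("\"memory\"", SampleItem.class)); // MEMORY
    }
}
```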
diff --git a/src/main/java/com/autotune/database/service/ExperimentDBService.java b/src/main/java/com/autotune/database/service/ExperimentDBService.java
index b04614068..270bff3c1 100644
--- a/src/main/java/com/autotune/database/service/ExperimentDBService.java
+++ b/src/main/java/com/autotune/database/service/ExperimentDBService.java
@@ -27,7 +27,6 @@
import com.autotune.common.data.dataSourceMetadata.DataSourceMetadataInfo;
import com.autotune.common.data.result.ExperimentResultData;
import com.autotune.common.datasource.DataSourceInfo;
-import com.autotune.common.k8sObjects.K8sObject;
import com.autotune.database.dao.ExperimentDAO;
import com.autotune.database.dao.ExperimentDAOImpl;
import com.autotune.database.helper.DBConstants;
@@ -39,10 +38,7 @@
import org.slf4j.LoggerFactory;
import java.sql.Timestamp;
-import java.time.LocalDateTime;
-import java.util.ArrayList;
-import java.util.List;
-import java.util.Map;
+import java.util.*;
public class ExperimentDBService {
private static final long serialVersionUID = 1L;
@@ -251,11 +247,15 @@ public ValidationOutputData addRecommendationToDB(Map expe
convertKruizeObjectTORecommendation(kruizeObject, interval_end_time);
if (null != kr) {
if (KruizeDeploymentInfo.local == true) { //todo this code will be removed
- LocalDateTime localDateTime = kr.getInterval_end_time().toLocalDateTime();
+ // Create a Calendar object and set the time with the timestamp
+ Calendar localDateTime = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
+ localDateTime.setTime(kr.getInterval_end_time());
ExperimentDAO dao = new ExperimentDAOImpl();
- int dayOfTheMonth = localDateTime.getDayOfMonth();
+ int dayOfTheMonth = localDateTime.get(Calendar.DAY_OF_MONTH);
try {
- dao.addPartitions(DBConstants.TABLE_NAMES.KRUIZE_RECOMMENDATIONS, String.format("%02d", localDateTime.getMonthValue()), String.valueOf(localDateTime.getYear()), dayOfTheMonth, DBConstants.PARTITION_TYPES.BY_MONTH);
+ synchronized (ExperimentDBService.class) { // shared monitor so concurrent partition creation is serialized; a fresh Object() would provide no mutual exclusion
+ dao.addPartitions(DBConstants.TABLE_NAMES.KRUIZE_RECOMMENDATIONS, String.format("%02d", localDateTime.get(Calendar.MONTH) + 1), String.valueOf(localDateTime.get(Calendar.YEAR)), dayOfTheMonth, DBConstants.PARTITION_TYPES.BY_DAY);
+ }
} catch (Exception e) {
LOGGER.warn(e.getMessage());
}
@@ -285,6 +285,7 @@ public ValidationOutputData addPerformanceProfileToDB(PerformanceProfile perform
/**
* Adds Metric Profile to kruizeMetricProfileEntry
+ *
* @param metricProfile Metric profile object to be added
* @return ValidationOutputData object
*/
@@ -391,7 +392,8 @@ public void loadPerformanceProfileFromDBByName(Map p
/**
* Fetches Metric Profile by name from kruizeMetricProfileEntry
- * @param metricProfileMap Map to store metric profile loaded from the database
+ *
+ * @param metricProfileMap Map to store metric profile loaded from the database
* @param metricProfileName Metric profile name to be fetched
* @return ValidationOutputData object
*/
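
The `Calendar`-based change above replaces `toLocalDateTime()`, which interprets the timestamp in the JVM's default time zone, with a UTC `Calendar`, so the partition date no longer depends on where Kruize happens to run. A small sketch of that extraction, using only standard JDK classes (note the zero-based `Calendar.MONTH`):

```java
import java.sql.Timestamp;
import java.time.Instant;
import java.util.Calendar;
import java.util.TimeZone;

public class UtcCalendarSketch {
    public static void main(String[] args) {
        // A fixed instant, so the output is identical in every JVM time zone.
        Timestamp intervalEndTime = Timestamp.from(Instant.parse("2024-10-10T06:07:17Z"));

        Calendar utc = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        utc.setTime(intervalEndTime); // interpret the instant's fields in UTC

        int dayOfTheMonth = utc.get(Calendar.DAY_OF_MONTH);                 // 10
        String month = String.format("%02d", utc.get(Calendar.MONTH) + 1); // "10": MONTH is 0-based
        String year = String.valueOf(utc.get(Calendar.YEAR));              // "2024"
        System.out.printf("day=%d month=%s year=%s%n", dayOfTheMonth, month, year);
    }
}
```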
diff --git a/src/main/java/com/autotune/operator/KruizeDeploymentInfo.java b/src/main/java/com/autotune/operator/KruizeDeploymentInfo.java
index 4be00ff62..214fab595 100644
--- a/src/main/java/com/autotune/operator/KruizeDeploymentInfo.java
+++ b/src/main/java/com/autotune/operator/KruizeDeploymentInfo.java
@@ -79,7 +79,10 @@ public class KruizeDeploymentInfo {
public static Integer bulk_update_results_limit = 100;
public static Boolean local = false;
public static Boolean log_http_req_resp = false;
-
+ public static String recommendations_url;
+ public static int BULK_API_LIMIT = 1000;
+ public static int BULK_API_MAX_BATCH_SIZE = 100;
+ public static Integer bulk_thread_pool_size = 3;
public static int generate_recommendations_date_range_limit_in_days = 15;
public static Integer delete_partition_threshold_in_days = DELETE_PARTITION_THRESHOLD_IN_DAYS;
private static Hashtable tunableLayerPair;
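
The new fields above add bulk-API knobs with compiled-in defaults (`BULK_API_LIMIT = 1000`, `BULK_API_MAX_BATCH_SIZE = 100`, `bulk_thread_pool_size = 3`). Below is a hedged sketch of how they might be populated from the environment using the `KRUIZE_CONFIG_ENV_NAME` keys added in the next file; the `readInt` helper and this wiring are illustrative, not the actual Kruize config loader:

```java
public class BulkConfigSketch {
    // Parse an integer env var, falling back to the compiled-in default.
    static int readInt(String envName, int defaultValue) {
        String raw = System.getenv(envName);
        try {
            return raw == null ? defaultValue : Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }

    public static void main(String[] args) {
        int bulkApiLimit = readInt("bulkapilimit", 1000);      // BULK_API_LIMIT
        int maxBatchSize = readInt("bulkapichunksize", 100);   // BULK_API_MAX_BATCH_SIZE
        int threadPoolSize = readInt("bulkThreadPoolSize", 3); // bulk_thread_pool_size
        String recommendationsUrl = System.getenv("recommendationsURL"); // may be null
        System.out.printf("limit=%d batch=%d threads=%d url=%s%n",
                bulkApiLimit, maxBatchSize, threadPoolSize, recommendationsUrl);
    }
}
```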
diff --git a/src/main/java/com/autotune/utils/KruizeConstants.java b/src/main/java/com/autotune/utils/KruizeConstants.java
index 15779cdae..ab3732843 100644
--- a/src/main/java/com/autotune/utils/KruizeConstants.java
+++ b/src/main/java/com/autotune/utils/KruizeConstants.java
@@ -17,6 +17,8 @@
package com.autotune.utils;
+import com.autotune.analyzer.kruizeObject.CreateExperimentConfigBean;
+
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;
@@ -168,6 +170,7 @@ public static final class JSONKeys {
public static final String CONTAINER_METRICS = "container_metrics";
public static final String METRICS = "metrics";
public static final String CONFIG = "config";
+ public static final String METRIC = "metric";
public static final String CURRENT = "current";
public static final String NAME = "name";
public static final String QUERY = "query";
@@ -262,6 +265,10 @@ public static final class JSONKeys {
public static final String PLOTS_DATAPOINTS = "datapoints";
public static final String PLOTS_DATA = "plots_data";
public static final String CONFIDENCE_LEVEL = "confidence_level";
+ public static final String HOSTNAME = "Hostname";
+ public static final String UUID = "UUID";
+ public static final String DEVICE = "device";
+ public static final String MODEL_NAME = "modelName";
private JSONKeys() {
}
@@ -407,6 +414,7 @@ private DataSourceConstants() {
public static class DataSourceDetailsInfoConstants {
public static final String version = "v1.0";
public static final String CLUSTER_NAME = "default";
+
private DataSourceDetailsInfoConstants() {
}
}
@@ -448,6 +456,7 @@ public static class DataSourceErrorMsgs {
public static final String ENDPOINT_NOT_FOUND = "Service endpoint not found.";
public static final String MISSING_DATASOURCE_INFO = "Datasource is missing, add a valid Datasource";
public static final String INVALID_DATASOURCE_INFO = "Datasource is either missing or is invalid";
+
private DataSourceErrorMsgs() {
}
}
@@ -459,6 +468,7 @@ public static class DataSourceQueryJSONKeys {
public static final String METRIC = "metric";
public static final String VALUE = "value";
public static final String VALUES = "values";
+
private DataSourceQueryJSONKeys() {
}
@@ -467,6 +477,7 @@ private DataSourceQueryJSONKeys() {
public static class DataSourceQueryStatus {
public static final String SUCCESS = "success";
public static final String ERROR = "error";
+
private DataSourceQueryStatus() {
}
}
@@ -477,6 +488,7 @@ public static class DataSourceQueryMetricKeys {
public static final String WORKLOAD_TYPE = "workload_type";
public static final String CONTAINER_NAME = "container";
public static final String CONTAINER_IMAGE_NAME = "image";
+
private DataSourceQueryMetricKeys() {
}
}
@@ -484,6 +496,7 @@ private DataSourceQueryMetricKeys() {
public static class DataSourceMetadataInfoConstants {
public static final String version = "v1.0";
public static final String CLUSTER_NAME = "default";
+
private DataSourceMetadataInfoConstants() {
}
}
@@ -520,6 +533,7 @@ public static class DataSourceMetadataErrorMsgs {
public static final String DATASOURCE_METADATA_VALIDATION_FAILURE_MSG = "Validation of imported metadata failed, mandatory fields missing: %s";
public static final String NAMESPACE_QUERY_VALIDATION_FAILED = "Validation failed for namespace data query.";
public static final String DATASOURCE_OPERATOR_RETRIEVAL_FAILURE = "Failed to retrieve data source operator for provider: %s";
+
private DataSourceMetadataErrorMsgs() {
}
}
@@ -537,6 +551,7 @@ public static class DataSourceMetadataInfoJSONKeys {
public static final String CONTAINERS = "containers";
public static final String CONTAINER_NAME = "container_name";
public static final String CONTAINER_IMAGE_NAME = "container_image_name";
+
private DataSourceMetadataInfoJSONKeys() {
}
}
@@ -661,6 +676,10 @@ public static final class KRUIZE_CONFIG_ENV_NAME {
public static final String CLOUDWATCH_LOGS_LOG_LEVEL = "logging_cloudwatch_logLevel";
public static final String LOCAL = "local";
public static final String LOG_HTTP_REQ_RESP = "logAllHttpReqAndResp";
+ public static final String RECOMMENDATIONS_URL = "recommendationsURL";
+ public static final String BULK_API_LIMIT = "bulkapilimit";
+ public static final String BULK_API_CHUNK_SIZE = "bulkapichunksize";
+ public static final String BULK_THREAD_POOL_SIZE = "bulkThreadPoolSize";
}
public static final class RecommendationEngineConstants {
@@ -748,4 +767,30 @@ public static final class AuthenticationConstants {
public static final String AUTHORIZATION = "Authorization";
}
+
+ public static final class KRUIZE_BULK_API {
+ public static final String JOB_ID = "job_id";
+ public static final String ERROR = "error";
+ public static final String JOB_NOT_FOUND_MSG = "Job not found";
+ public static final String IN_PROGRESS = "IN_PROGRESS";
+ public static final String COMPLETED = "COMPLETED";
+ public static final String FAILED = "FAILED";
+ public static final String LIMIT_MESSAGE = "The number of experiments exceeds %s.";
+ public static final String NOTHING = "Nothing to do.";
+ // TODO : Bulk API Create Experiments defaults
+ public static final CreateExperimentConfigBean CREATE_EXPERIMENT_CONFIG_BEAN;
+
+ // Static block to initialize the Bean
+ static {
+ CREATE_EXPERIMENT_CONFIG_BEAN = new CreateExperimentConfigBean();
+ CREATE_EXPERIMENT_CONFIG_BEAN.setMode("monitor");
+ CREATE_EXPERIMENT_CONFIG_BEAN.setTarget("local");
+ CREATE_EXPERIMENT_CONFIG_BEAN.setVersion("v2.0");
+ CREATE_EXPERIMENT_CONFIG_BEAN.setDatasourceName("prometheus-1");
+ CREATE_EXPERIMENT_CONFIG_BEAN.setPerformanceProfile("resource-optimization-local-monitoring");
+ CREATE_EXPERIMENT_CONFIG_BEAN.setThreshold(0.1);
+ CREATE_EXPERIMENT_CONFIG_BEAN.setMeasurementDurationStr("15min");
+ CREATE_EXPERIMENT_CONFIG_BEAN.setMeasurementDuration(15);
+ }
+ }
}
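
The static block above bakes in the defaults the bulk flow uses when creating experiments (mode `monitor`, target `local`, a 15-minute measurement duration, and so on). As a sketch of the kind of create-experiment payload those defaults could produce; the field names and the `bulk-demo-experiment` name are assumptions for illustration, not the exact API contract:

```java
import com.google.gson.JsonObject;

public class BulkDefaultsSketch {
    public static void main(String[] args) {
        // Values mirror CREATE_EXPERIMENT_CONFIG_BEAN's static-block defaults.
        JsonObject experiment = new JsonObject();
        experiment.addProperty("version", "v2.0");
        experiment.addProperty("experiment_name", "bulk-demo-experiment"); // hypothetical name
        experiment.addProperty("mode", "monitor");
        experiment.addProperty("target_cluster", "local");
        experiment.addProperty("datasource", "prometheus-1");
        experiment.addProperty("performance_profile", "resource-optimization-local-monitoring");

        JsonObject trialSettings = new JsonObject();
        trialSettings.addProperty("measurement_duration", "15min");
        experiment.add("trial_settings", trialSettings);

        System.out.println(experiment); // one such payload per container discovered by the bulk job
    }
}
```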
diff --git a/src/main/java/com/autotune/utils/ServerContext.java b/src/main/java/com/autotune/utils/ServerContext.java
index 2c95d3efe..eac7f6079 100644
--- a/src/main/java/com/autotune/utils/ServerContext.java
+++ b/src/main/java/com/autotune/utils/ServerContext.java
@@ -75,4 +75,7 @@ public class ServerContext {
public static final String LIST_NAMESPACES = QUERY_CONTEXT + "listNamespaces";
public static final String LIST_DEPLOYMENTS = QUERY_CONTEXT + "listDeployments";
public static final String LIST_K8S_OBJECTS = QUERY_CONTEXT + "listK8sObjects";
+
+ //Bulk Service
+ public static final String BULK_SERVICE = ROOT_CONTEXT + "bulk";
}
diff --git a/src/main/java/com/autotune/utils/Utils.java b/src/main/java/com/autotune/utils/Utils.java
index 3d65dea4c..1b3b281de 100644
--- a/src/main/java/com/autotune/utils/Utils.java
+++ b/src/main/java/com/autotune/utils/Utils.java
@@ -16,9 +16,12 @@
package com.autotune.utils;
+import com.autotune.analyzer.adapters.DeviceDetailsAdapter;
+import com.autotune.analyzer.adapters.RecommendationItemAdapter;
import com.autotune.analyzer.utils.AnalyzerConstants;
import com.autotune.analyzer.utils.GsonUTCDateAdapter;
import com.autotune.common.data.result.ContainerData;
+import com.autotune.common.data.system.info.device.DeviceDetails;
import com.google.gson.ExclusionStrategy;
import com.google.gson.FieldAttributes;
import com.google.gson.Gson;
@@ -169,6 +172,8 @@ public static T getClone(T object, Class classMetadata) {
.setPrettyPrinting()
.enableComplexMapKeySerialization()
.registerTypeAdapter(Date.class, new GsonUTCDateAdapter())
+ .registerTypeAdapter(AnalyzerConstants.RecommendationItem.class, new RecommendationItemAdapter())
+ .registerTypeAdapter(DeviceDetails.class, new DeviceDetailsAdapter())
.create();
String serialisedString = gson.toJson(object);
diff --git a/tests/test_plans/test_plan_rel_0.0.25.md b/tests/test_plans/test_plan_rel_0.0.25.md
new file mode 100644
index 000000000..312ca0a8c
--- /dev/null
+++ b/tests/test_plans/test_plan_rel_0.0.25.md
@@ -0,0 +1,134 @@
+# KRUIZE TEST PLAN RELEASE 0.0.25
+
+- [INTRODUCTION](#introduction)
+- [FEATURES TO BE TESTED](#features-to-be-tested)
+- [BUG FIXES TO BE TESTED](#bug-fixes-to-be-tested)
+- [TEST ENVIRONMENT](#test-environment)
+- [TEST DELIVERABLES](#test-deliverables)
+ - [New Test Cases Developed](#new-test-cases-developed)
+ - [Regression Testing](#regression-testing)
+- [SCALABILITY TESTING](#scalability-testing)
+- [RELEASE TESTING](#release-testing)
+- [TEST METRICS](#test-metrics)
+- [RISKS AND CONTINGENCIES](#risks-and-contingencies)
+- [APPROVALS](#approvals)
+
+-----
+
+## INTRODUCTION
+
+This document describes the test plan for Kruize remote monitoring release 0.0.25.
+
+----
+
+## FEATURES TO BE TESTED
+
+* Addition of Metric profile json into Kruize manifests
+* Support for Datasource authentication using bearer token
+* Support for Kruize Local Namespace level recommendations
+
+------
+
+## BUG FIXES TO BE TESTED
+
+* Configure openshift port for prometheus service
+
+---
+
+## TEST ENVIRONMENT
+
+* Minikube Cluster
+* Openshift Cluster
+
+---
+
+## TEST DELIVERABLES
+
+### New Test Cases Developed
+
+| # | ISSUE (NEW FEATURE) | TEST DESCRIPTION | TEST DELIVERABLES | RESULTS | COMMENTS |
+|---|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------|--------------------------------------------------------|---------| --- |
+| 1 | Addition of Metric profile json into Kruize manifests | Metric profile json location update in existing tests and demos | | PASSED | |
+| 2 | [Support for Datasource authentication using bearer token](https://github.com/kruize/autotune/pull/1289) | Tested manually | | PASSED | |
+| 3 | Support for Kruize Local Namespace level recommendations [1248](https://github.com/kruize/autotune/pull/1248), [1249](https://github.com/kruize/autotune/pull/1249), [1275](https://github.com/kruize/autotune/pull/1275) | [New tests added](https://github.com/kruize/autotune/pull/1293) | | | |
+| 4 | [Configure openshift port for prometheus service](https://github.com/kruize/autotune/pull/1278) | Updated existing tests to test with the specified datasource service name and namespace | [1291](https://github.com/kruize/autotune/pull/1291) | PASSED | |
+
+
+
+### Regression Testing
+
+| # | ISSUE (BUG/NEW FEATURE) | TEST CASE | RESULTS | COMMENTS |
+|---|-------------------------------------------------------|---------------------------------------------------------|---------| --- |
+| 1 | Addition of Metric profile json into Kruize manifests | Kruize local monitoring tests and local monitoring demo | PASSED | |
+| 2 | Configure openshift port for prometheus service | Kruize local monitoring functional tests | PASSED | |
+
+---
+
+## SCALABILITY TESTING
+
+Evaluate Kruize scalability on OCP with 5K experiments by uploading 15 days of resource usage data and updating recommendations.
+The changes in this release have no scalability implications; a short scalability test will be run as part of release testing.
+
+Short scalability run:
+- 5K exps / 15 days of results / 2 containers per exp
+- Kruize replicas - 10
+- OCP - Scalelab cluster
+
+| Kruize Release | Exps / Results / Recos | Execution time | UpdateRecommendations latency (Max / Avg) in secs | UpdateResults latency (Max / Avg) in secs | LoadResultsByExpName latency (Max / Avg) in secs | Postgres DB size (MB) | Kruize Max CPU | Kruize Max Memory (GB) |
+|----------------|------------------------|----------------|---------------------------------------------------|-------------------------------------------|--------------------------------------------------|-----------------------|----------------|------------------------|
+| 0.0.24_mvp | 5K / 72L / 3L | 4h 04 mins | 0.8 / 0.47 | 0.13 / 0.12 | 0.53 / 0.36 | 21752 | 4.63 | 34.72 |
+| 0.0.25_mvp | 5K / 72L / 3L | 4h 06 mins | 0.8 / 0.47 | 0.14 / 0.12 | 0.52 / 0.36 | 21756 | 4.91 | 30.13 |
+
+----
+## RELEASE TESTING
+
+As part of release testing, the following tests will be executed:
+- [Kruize Remote monitoring Functional tests](/tests/scripts/remote_monitoring_tests/Remote_monitoring_tests.md)
+- [Fault tolerant test](/tests/scripts/remote_monitoring_tests/fault_tolerant_tests.md)
+- [Stress test](/tests/scripts/remote_monitoring_tests/README.md)
+- [DB Migration test](/tests/scripts/remote_monitoring_tests/db_migration_test.md)
+- [Recommendation and box plot values validation test](https://github.com/kruize/kruize-demos/blob/main/monitoring/remote_monitoring_demo/recommendations_infra_demo/README.md)
+- [Scalability test (On openshift)](/tests/scripts/remote_monitoring_tests/scalability_test.md) - scalability test with 5000 exps / 15 days usage data
+- [Kruize remote monitoring demo (On minikube)](https://github.com/kruize/kruize-demos/blob/main/monitoring/remote_monitoring_demo/README.md)
+- [Kruize local monitoring demo (On openshift)](https://github.com/kruize/kruize-demos/blob/main/monitoring/local_monitoring_demo)
+- [Kruize local monitoring Functional tests](/tests/scripts/local_monitoring_tests/Local_monitoring_tests.md)
+
+
+| # | TEST SUITE | EXPECTED RESULTS | ACTUAL RESULTS | COMMENTS |
+| --- | ---------- |-----------------------------------------|-----------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| 1 | Kruize Remote monitoring Functional testsuite | TOTAL - 359, PASSED - 316 / FAILED - 43 | TOTAL - 359, PASSED - 316 / FAILED - 43 | Intermittent issue seen [1281](https://github.com/kruize/autotune/issues/1281), existing issues - [559](https://github.com/kruize/autotune/issues/559), [610](https://github.com/kruize/autotune/issues/610) |
+| 2 | Fault tolerant test | PASSED | PASSED | |
+| 3 | Stress test | PASSED | PASSED | |
+| 4 | Scalability test (short run)| | | Exps - 5000, Results - 72000, execution time - 4 hours 6 mins |
+| 5 | DB Migration test | PASSED | PASSED | Tested on Openshift |
+| 6 | Recommendation and box plot values validations | PASSED | PASSED | |
+| 7 | Kruize remote monitoring demo | PASSED | PASSED | Tested manually |
+| 8 | Kruize Local monitoring demo | PASSED | PASSED | |
+| 9 | Kruize Local Functional tests | TOTAL - 78, PASSED - 75 / FAILED - 3 | TOTAL - 78, PASSED - 75 / FAILED - 3 | [Issue 1217](https://github.com/kruize/autotune/issues/1217), [Issue 1273](https://github.com/kruize/autotune/issues/1273) |
+
+---
+
+## TEST METRICS
+
+### Test Completion Criteria
+
+* All must_fix defects identified for the release are fixed
+* New features work as expected and tests have been added to validate these
+* No new regressions in the functional tests
+* All non-functional tests work as expected without major issues
+* Documentation updates have been completed
+
+----
+
+## RISKS AND CONTINGENCIES
+
+* None
+
+----
+## APPROVALS
+
+Sign-off
+
+----
+