Remove workload_detail and workload_detail_volume metrics in Harvest #3423

Closed
rahulguptajss opened this issue Jan 9, 2025 · 0 comments · Fixed by #3433
Labels: 25.02, feature, status/testme

rahulguptajss commented Jan 9, 2025

Harvest collects the workload_detail and workload_detail_volume objects through the ZapiPerf and RestPerf collectors. These objects are slow to collect because the time ONTAP takes to return them grows exponentially. They are disabled by default, but customers can enable them to populate the workload dashboard.

It was recently identified that latency for these objects is calculated using ops from the workload and workload_volume objects as the denominator. This calculation happens after the metrics from workload_detail and workload_detail_volume have been collected, so the ops come from a different timestamp than the other metrics, which skews the result. Neither REST nor ZAPI currently offers a foolproof way to fetch these metrics at the same timestamp so that Harvest can cook the data correctly.
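To illustrate the skew, here is a minimal Python sketch. It assumes average latency is computed as a latency delta divided by an ops delta; the counter values, timestamps, and the `avg_latency_us` helper are hypothetical and are not taken from Harvest's code.

```python
def avg_latency_us(latency_delta_us, ops_delta):
    """Average latency per op for one poll interval; 0.0 when no ops occurred."""
    return latency_delta_us / ops_delta if ops_delta > 0 else 0.0


# Hypothetical cumulative counters, keyed by sample time in seconds.
# The workload takes a steady 200 us per op, but its op rate rises after t=60.
ops_counter = {0: 0, 5: 500, 60: 6_000, 65: 7_500}
latency_counter_us = {0: 0, 60: 1_200_000}

# The latency delta comes from the detail object, polled at t=0 and t=60.
latency_delta = latency_counter_us[60] - latency_counter_us[0]

# Aligned case: ops are sampled at the same timestamps as the latency.
aligned = avg_latency_us(latency_delta, ops_counter[60] - ops_counter[0])

# Skewed case: ops come from another object polled 5 seconds later.
skewed = avg_latency_us(latency_delta, ops_counter[65] - ops_counter[5])

print(f"aligned: {aligned:.2f} us, skewed: {skewed:.2f} us")
```

In this made-up data the true latency is 200 us per op, but the mismatched ops window yields roughly 171 us. The size and direction of the error depend on how the op rate changes between the two polls.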

The alternatives to workload_detail and workload_detail_volume are the workload_queue_nblade and workload_queue_dblade objects. The same ops problem affects workload_queue_dblade, because it needs ops from workload_queue_nblade for its latency calculation, which again leads to skew. In addition, workload_queue_nblade and workload_queue_dblade are not available via the REST API.

Given this, Harvest cannot reliably calculate subsystem latencies for a workload. Hence, it has been decided to remove these metrics from Harvest.

More details here.

Dashboard Panels to Be Removed

The following dashboard panels will be removed as part of this issue:

| Dashboard | Row |
| --- | --- |
| ONTAP: LUN | Top Volume Latency from QoS |
| ONTAP: SVM | QoS Policy Group Latency from Resource |
| ONTAP: Workload | Latency Breakdown |
| ONTAP: Volume | QoS Resource Latency |

Metrics to Be Removed

  • qos_detail_resource_latency
  • qos_detail_ops
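
Anyone maintaining custom dashboards or alert rules may want to check whether they reference the metrics listed above. The following is a minimal sketch, not part of Harvest: the `find_removed_metrics` helper is hypothetical and simply searches a string, such as the contents of an exported dashboard JSON file, for the two metric names.

```python
import re
from typing import List

# Metric names listed in this issue as slated for removal.
REMOVED_METRICS = ("qos_detail_resource_latency", "qos_detail_ops")

# Word boundaries prevent matches inside longer metric names.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in REMOVED_METRICS) + r")\b"
)


def find_removed_metrics(text: str) -> List[str]:
    """Return the sorted, de-duplicated removed metric names found in text."""
    return sorted(set(_PATTERN.findall(text)))


# Example: a query string as it might appear in a dashboard panel.
sample = 'topk(5, qos_detail_resource_latency{cluster="c1"})'
print(find_removed_metrics(sample))
```

The same function can be applied to each dashboard or rule file read from disk; an empty list means the text references neither metric.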