Commit

[elasticsearch] Extension of the Elasticsearch integration with datastream-centric stats (#11656)

3kt authored Jan 7, 2025
1 parent d93bf87 commit ad70926
Showing 13 changed files with 7,203 additions and 2 deletions.
36 changes: 36 additions & 0 deletions packages/elasticsearch/_dev/build/docs/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -150,3 +150,39 @@ information about all shards.
{{event "shard"}}

{{fields "shard"}}

### Indices and data streams usage analysis

_Technical preview: please report any issues [here](https://github.com/elastic/integrations/issues), specifying the "elasticsearch" integration._

From version 8.17.1 of the integration onward (and for data collected with that version or later), the integration also installs a transform named `logs-elasticsearch.index_pivot-default-{VERSION}`. This transform **is not started by default** (see Stack Management > Transforms); once started, it will:

* Read data from the `index` dataset produced by this same integration.
* Aggregate the index-level stats into data-stream-centric insights, such as query count, query time, and overall data volume.
* Process the aggregated data through an additional ingest pipeline installed by the integration (`{VERSION}-monitoring_indices`) before shipping it to the `monitoring-indices` index.

You can then visualize the resulting data in the `[Elasticsearch] Indices & data streams usage` dashboard.

![Indices & data streams usage](../img/indices_datastream_view.png)

Apart from some high-level statistics, such as total query count, total query time and total addressable data, the dashboard surfaces usage information centered on two dimensions:

* The [data tier](https://www.elastic.co/guide/en/elasticsearch/reference/current/data-tiers.html).
* The data stream (see note below for details about how this is computed).

#### Tier usage

As data ages, it typically loses relative importance and is moved to less performant, more cost-effective hardware; query count and query time should diminish proportionally. Several visualizations in the dashboard let you verify this assumption against your own data and ensure your ILM policy (and therefore data tier transitions) is aligned with how the data is actually used.

#### Indices and data streams usage

Other visualizations in the dashboard let you compare the relative footprint of each data stream from a storage, querying, and indexing perspective. This can help you identify anomalies stemming from faulty configuration or inefficient usage.

Both approaches can be used in conjunction, allowing you to fine-tune ILM on a per-data-stream basis (if required) to closely match actual usage patterns.

⚠️ Important notes:

* The transform processes all compatible historical data, which has two implications: first, data collected before 8.17.1 will not be picked up by the transform; second, it may take some time for recent ("live") data to become available, as the transform works its way through all existing documents. You can modify the transform as needed.
* The target index `monitoring-indices` is not managed by ILM. If your setup has a high index count or long retention, you may need to tune the transform or [activate ILM on the target index](https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started-index-lifecycle-management.html#manage-time-series-data-without-data-streams). In our testing on a cluster with 5000 indices, about 1GB of primary data was generated per week (your mileage may vary).
* Data stream identification is based on the following grok pattern: `^(?:partial-)?(?:restored-)?(?:shrink-.{4}-)?(?:\\.ds-)?(?<elasticsearch.index.datastream>[a-z_0-9\\-\\.]+?)(-(?:\\d{4}\\.\\d{2}(\\.\\d{2})?))?(?:-\\d+)?$`. This should cover all out-of-the-box names; if you use non-standard names or want to aggregate data differently, you can adjust the pattern in the `{VERSION}-monitoring_indices` ingest pipeline (modifying a copy is advised).
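As an illustration, the pattern can be exercised outside Elasticsearch. Below is a minimal Python sketch: grok's `(?<name>)` syntax becomes `(?P<name>)` in Python, and the dotted field name is shortened to `datastream`, since Python group names cannot contain dots.

```python
import re

# Python adaptation of the integration's grok pattern for extracting the
# data stream name from a (backing) index name.
DATASTREAM_PATTERN = re.compile(
    r"^(?:partial-)?(?:restored-)?(?:shrink-.{4}-)?(?:\.ds-)?"
    r"(?P<datastream>[a-z_0-9\-\.]+?)"
    r"(-(?:\d{4}\.\d{2}(\.\d{2})?))?(?:-\d+)?$"
)

def extract_datastream(index_name):
    """Return the data stream name embedded in an index name, or None."""
    m = DATASTREAM_PATTERN.match(index_name)
    return m.group("datastream") if m else None

# Backing indices of data streams, restored snapshots, and plain indices:
print(extract_datastream(".ds-logs-elasticsearch.index-default-2025.01.07-000001"))
print(extract_datastream("restored-.ds-metrics-system.cpu-default-2024.12.01-000042"))
print(extract_datastream("my-plain-index-000003"))
```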

5 changes: 5 additions & 0 deletions packages/elasticsearch/changelog.yml
Original file line number Diff line number Diff line change
@@ -1,4 +1,9 @@
# newer versions go on top
- version: "1.16.0"
changes:
- description: Add transform job & dashboard for datastream metrics
type: enhancement
link: https://github.com/elastic/integrations/pull/11656
- version: "1.15.3"
changes:
- description: Make elasticsearch.node.name a TSDS dimension to prevent document collisions.
Expand Down
6 changes: 6 additions & 0 deletions packages/elasticsearch/data_stream/index/fields/fields.yml
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,12 @@
type: keyword
- name: status
type: keyword
- name: tier_preference
type: keyword
- name: creation_date
type: date
- name: version
type: keyword
- name: name
type: keyword
dimension: true
Expand Down
5 changes: 4 additions & 1 deletion packages/elasticsearch/data_stream/index/sample_event.json
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,9 @@
"index": {
"hidden": true,
"name": ".ml-state-000001",
"tier_preference": "data_content",
"creation_date": 1731657995821,
"version": "8503000",
"primaries": {
"docs": {
"count": 0
Expand Down Expand Up @@ -141,4 +144,4 @@
"address": "http://elastic-package-service-elasticsearch-1:9200",
"type": "elasticsearch"
}
}
}
42 changes: 42 additions & 0 deletions packages/elasticsearch/docs/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -844,6 +844,9 @@ An example event for `index` looks as following:
"index": {
"hidden": true,
"name": ".ml-state-000001",
"tier_preference": "data_content",
"creation_date": 1731657995821,
"version": "8503000",
"primaries": {
"docs": {
"count": 0
Expand Down Expand Up @@ -974,6 +977,7 @@ An example event for `index` looks as following:
| elasticsearch.cluster.id | Elasticsearch cluster id. | keyword | |
| elasticsearch.cluster.name | Elasticsearch cluster name. | keyword | |
| elasticsearch.cluster.state.id | Elasticsearch state id. | keyword | |
| elasticsearch.index.creation_date | | date | |
| elasticsearch.index.hidden | | boolean | |
| elasticsearch.index.name | Index name. | keyword | |
| elasticsearch.index.primaries.docs.count | | long | gauge |
Expand Down Expand Up @@ -1009,6 +1013,7 @@ An example event for `index` looks as following:
| elasticsearch.index.shards.primaries | | long | |
| elasticsearch.index.shards.total | | long | |
| elasticsearch.index.status | | keyword | |
| elasticsearch.index.tier_preference | | keyword | |
| elasticsearch.index.total.bulk.avg_size_in_bytes | | long | gauge |
| elasticsearch.index.total.bulk.avg_time_in_millis | | long | gauge |
| elasticsearch.index.total.bulk.total_operations | | long | counter |
Expand Down Expand Up @@ -1049,6 +1054,7 @@ An example event for `index` looks as following:
| elasticsearch.index.total.store.size.bytes | | long | gauge |
| elasticsearch.index.total.store.size_in_bytes | Total size of the index in bytes. | long | gauge |
| elasticsearch.index.uuid | | keyword | |
| elasticsearch.index.version | | keyword | |
| elasticsearch.node.id | Node ID | keyword | |
| elasticsearch.node.master | Is the node the master node? | boolean | |
| elasticsearch.node.mlockall | Is mlockall enabled on the node? | boolean | |
Expand Down Expand Up @@ -2648,3 +2654,39 @@ An example event for `shard` looks as following:
| source_node.uuid | | alias |
| timestamp | | alias |


### Indices and data streams usage analysis

_Technical preview: please report any issues [here](https://github.com/elastic/integrations/issues), specifying the "elasticsearch" integration._

From version 8.17.1 of the integration onward (and for data collected with that version or later), the integration also installs a transform named `logs-elasticsearch.index_pivot-default-{VERSION}`. This transform **is not started by default** (see Stack Management > Transforms); once started, it will:

* Read data from the `index` dataset produced by this same integration.
* Aggregate the index-level stats into data-stream-centric insights, such as query count, query time, and overall data volume.
* Process the aggregated data through an additional ingest pipeline installed by the integration (`{VERSION}-monitoring_indices`) before shipping it to the `monitoring-indices` index.

You can then visualize the resulting data in the `[Elasticsearch] Indices & data streams usage` dashboard.

![Indices & data streams usage](../img/indices_datastream_view.png)

Apart from some high-level statistics, such as total query count, total query time and total addressable data, the dashboard surfaces usage information centered on two dimensions:

* The [data tier](https://www.elastic.co/guide/en/elasticsearch/reference/current/data-tiers.html).
* The data stream (see note below for details about how this is computed).

#### Tier usage

As data ages, it typically loses relative importance and is moved to less performant, more cost-effective hardware; query count and query time should diminish proportionally. Several visualizations in the dashboard let you verify this assumption against your own data and ensure your ILM policy (and therefore data tier transitions) is aligned with how the data is actually used.

#### Indices and data streams usage

Other visualizations in the dashboard let you compare the relative footprint of each data stream from a storage, querying, and indexing perspective. This can help you identify anomalies stemming from faulty configuration or inefficient usage.

Both approaches can be used in conjunction, allowing you to fine-tune ILM on a per-data-stream basis (if required) to closely match actual usage patterns.

⚠️ Important notes:

* The transform processes all compatible historical data, which has two implications: first, data collected before 8.17.1 will not be picked up by the transform; second, it may take some time for recent ("live") data to become available, as the transform works its way through all existing documents. You can modify the transform as needed.
* The target index `monitoring-indices` is not managed by ILM. If your setup has a high index count or long retention, you may need to tune the transform or [activate ILM on the target index](https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started-index-lifecycle-management.html#manage-time-series-data-without-data-streams). In our testing on a cluster with 5000 indices, about 1GB of primary data was generated per week (your mileage may vary).
* Data stream identification is based on the following grok pattern: `^(?:partial-)?(?:restored-)?(?:shrink-.{4}-)?(?:\\.ds-)?(?<elasticsearch.index.datastream>[a-z_0-9\\-\\.]+?)(-(?:\\d{4}\\.\\d{2}(\\.\\d{2})?))?(?:-\\d+)?$`. This should cover all out-of-the-box names; if you use non-standard names or want to aggregate data differently, you can adjust the pattern in the `{VERSION}-monitoring_indices` ingest pipeline (modifying a copy is advised).
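The data tier dimension used throughout the dashboard is derived by the `{VERSION}-monitoring_indices` pipeline from each index's `tier_preference` setting. A minimal Python sketch of that mapping (a hypothetical helper mirroring the precedence of the pipeline's Painless script: frozen, then cold, then warm, then hot/content):

```python
def index_tier(tier_preference):
    """Map an index's tier_preference string (e.g. "data_cold,data_warm,data_hot")
    onto the tier buckets used by the dashboard. A missing or unrecognized
    preference falls back to "unknown", as in the pipeline."""
    if not tier_preference:
        return "unknown"
    if "data_frozen" in tier_preference:
        return "frozen"
    if "data_cold" in tier_preference:
        return "cold"
    if "data_warm" in tier_preference:
        return "warm"
    if "data_hot" in tier_preference or "data_content" in tier_preference:
        return "hot/content"
    return "unknown"

print(index_tier("data_content"))                  # hot/content
print(index_tier("data_cold,data_warm,data_hot"))  # cold
```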

Original file line number Diff line number Diff line change
@@ -0,0 +1,142 @@
---
processors:
- set:
field: event.ingested
tag: set_event_ingested
value: "{{_ingest.timestamp}}"
- grok:
field: elasticsearch.index.name
tag: grok_parse_index_name
patterns:
- '^(?:partial-)?(?:restored-)?(?:shrink-.{4}-)?(?:\.ds-)?(?<elasticsearch.index.datastream>[a-z_0-9\-\.]+?)(-(?:\d{4}\.\d{2}(\.\d{2})?))?(?:-\d+)?$'
ignore_failure: true
- script:
source: |
def preference = ctx.end['elasticsearch.index.tier_preference'];
if (preference.contains("data_frozen")) {
ctx.elasticsearch.index.tier = "frozen";
} else if (preference.contains("data_cold")) {
ctx.elasticsearch.index.tier = "cold";
} else if (preference.contains("data_warm")) {
ctx.elasticsearch.index.tier = "warm";
} else if (preference.contains("data_hot") || preference.contains("data_content")) {
ctx.elasticsearch.index.tier = "hot/content";
}
ctx.end.remove('elasticsearch.index.tier_preference');
ignore_failure: true
tag: script_parse_index_tier
# Failure to identify the tier preference will result in the index tier being set to unknown
# This is also the "default" case when tier preference is not available.
- set:
field: elasticsearch.index.tier
value: "unknown"
tag: set_index_tier_unknown
if: "ctx.elasticsearch.index.tier == null"
- foreach:
field: end
processor:
set:
field: "{{ _ingest._key }}"
value: "{{ _ingest._value }}"
tag: set_end_fields
- dot_expander:
field: "*"
tag: dot_expander
- date:
field: elasticsearch.index.creation_date
target_field: elasticsearch.index.creation_date
ignore_failure: true
formats:
- UNIX_MS
tag: date_parse_index_creation_date
- script:
source: |
ZonedDateTime currentDate = ZonedDateTime.parse(ctx['@timestamp']);
ZonedDateTime creationDate = ZonedDateTime.parse(ctx.elasticsearch.index.creation_date);
long ageInMillis = ChronoUnit.MILLIS.between(creationDate, currentDate);
ctx.elasticsearch.index.age = (ageInMillis / (1000 * 60 * 60 * 24)).intValue();
ignore_failure: true
tag: script_compute_index_age
- convert:
field: elasticsearch.index.primaries.docs.count
type: long
ignore_failure: true
tag: convert_primaries_docs_count
- convert:
field: elasticsearch.index.primaries.docs.count_delta
type: long
ignore_failure: true
tag: convert_primaries_docs_count_delta
- convert:
field: elasticsearch.index.primaries.store.total_data_set_size_in_bytes
type: long
ignore_failure: true
tag: convert_primaries_store_total_data_set_size_in_bytes
- convert:
field: elasticsearch.index.primaries.store.total_data_set_size_in_bytes_delta
type: long
ignore_failure: true
tag: convert_primaries_store_total_data_set_size_in_bytes_delta
- convert:
field: elasticsearch.index.total.store.size_in_bytes
type: long
ignore_failure: true
tag: convert_total_store_size_in_bytes
- convert:
field: elasticsearch.index.total.store.size_in_bytes_delta
type: long
ignore_failure: true
tag: convert_total_store_size_in_bytes_delta
- convert:
field: elasticsearch.index.total.search.query_total
type: long
ignore_failure: true
tag: convert_total_search_query_total
- convert:
field: elasticsearch.index.total.search.query_total_delta
type: long
ignore_failure: true
tag: convert_total_search_query_total_delta
- convert:
field: elasticsearch.index.total.search.query_time_in_millis
type: long
ignore_failure: true
tag: convert_total_search_query_time_in_millis
- convert:
field: elasticsearch.index.total.search.query_time_in_millis_delta
type: long
ignore_failure: true
tag: convert_total_search_query_time_in_millis_delta
- convert:
field: elasticsearch.index.total.indexing.index_total
type: long
ignore_failure: true
tag: convert_total_indexing_index_total
- convert:
field: elasticsearch.index.total.indexing.index_total_delta
type: long
ignore_failure: true
tag: convert_total_indexing_index_total_delta
- convert:
field: elasticsearch.index.total.indexing.index_time_in_millis
type: long
ignore_failure: true
tag: convert_total_indexing_index_time_in_millis
- convert:
field: elasticsearch.index.total.indexing.index_time_in_millis_delta
type: long
ignore_failure: true
tag: convert_total_indexing_index_time_in_millis_delta
- remove:
field:
- start
- end
tag: remove_start_end_fields

on_failure:
- set:
field: event.kind
value: "pipeline_error"
- append:
field: error.message
value: "Processor {{ _ingest.on_failure_processor_type }} with tag {{ _ingest.on_failure_processor_tag }} failed with message {{ _ingest.on_failure_message }}"
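The `script_compute_index_age` processor in the pipeline above computes the index age in whole days from the parsed creation date. A minimal Python sketch of the same arithmetic (assuming ISO-8601 inputs, as produced by the preceding `date` processor):

```python
from datetime import datetime

def index_age_days(event_timestamp, creation_date):
    """Whole days elapsed between index creation and the event timestamp,
    mirroring ChronoUnit.MILLIS.between(...) / (1000 * 60 * 60 * 24)."""
    current = datetime.fromisoformat(event_timestamp.replace("Z", "+00:00"))
    created = datetime.fromisoformat(creation_date.replace("Z", "+00:00"))
    age_ms = int((current - created).total_seconds() * 1000)
    return age_ms // (1000 * 60 * 60 * 24)

# An index created on Dec 1 at noon is 36 whole days old on Jan 7 at midnight.
print(index_age_days("2025-01-07T00:00:00Z", "2024-12-01T12:00:00Z"))
```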
