
Commit: Rest of changes
Signed-off-by: natebower <[email protected]>
natebower committed Jan 17, 2025
1 parent 4a92812 commit 8027b42
Showing 65 changed files with 93 additions and 93 deletions.
@@ -7,7 +7,7 @@ nav_order: 10

# Codec processor combinations

At ingestion time, data received by the [`s3` source]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3/) can be parsed by [codecs]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3#codec). Codecs compresses and decompresses large data sets in a certain format before ingestion them through a Data Prepper pipeline [processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/processors/).
At ingestion time, data received by the [`s3` source]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3/) can be parsed by [codecs]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3#codec). Codecs compress and decompress large data sets in a certain format before ingesting them through an OpenSearch Data Prepper pipeline [processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/processors/).

While most codecs can be used with most processors, the following codec processor combinations can make your pipeline more efficient for certain input types.
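As an illustration, a minimal pipeline sketch pairing the `s3` source's `newline` codec with a `grok` processor might look like the following. This is a hedged sketch only: the bucket queue URL, region, and host values are placeholders, and option names should be confirmed against your Data Prepper version.

```yaml
# Hypothetical sketch -- queue URL, region, and hosts are placeholders.
s3-log-pipeline:
  source:
    s3:
      codec:
        newline:                  # parse each line of the object as one event
      sqs:
        queue_url: "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"
      aws:
        region: "us-east-1"
  processor:
    - grok:
        match:
          message: ['%{COMMONAPACHELOG}']
  sink:
    - opensearch:
        hosts: ["https://localhost:9200"]
        index: "s3-logs"
```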

2 changes: 1 addition & 1 deletion _data-prepper/common-use-cases/common-use-cases.md
@@ -9,4 +9,4 @@ redirect_from:

# Common use cases

You can use Data Prepper for several different purposes, including trace analytics, log analytics, Amazon S3 log analytics, and metrics ingestion.
You can use OpenSearch Data Prepper for several different purposes, including trace analytics, log analytics, Amazon S3 log analytics, and metrics ingestion.
2 changes: 1 addition & 1 deletion _data-prepper/common-use-cases/event-aggregation.md
@@ -7,7 +7,7 @@ nav_order: 25

# Event aggregation

You can use Data Prepper to aggregate data from different events over a period of time. Aggregating events can help to reduce unnecessary log volume and manage use cases like multiline logs that are received as separate events. The [`aggregate` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/aggregate/) is a stateful processor that groups events based on the values for a set of specified identification keys and performs a configurable action on each group.
You can use OpenSearch Data Prepper to aggregate data from different events over a period of time. Aggregating events can help to reduce unnecessary log volume and manage use cases like multiline logs that are received as separate events. The [`aggregate` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/aggregate/) is a stateful processor that groups events based on the values for a set of specified identification keys and performs a configurable action on each group.

The `aggregate` processor state is stored in memory. For example, in order to combine four events into one, the processor needs to retain pieces of the first three events. The state of an aggregate group of events is kept for a configurable amount of time. Depending on your logs, the aggregate action being used, and the number of memory options in the processor configuration, the aggregation could take place over a long period of time.
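For example, a sketch of an `aggregate` processor grouping events by client and server IP addresses over a 30-second window might look like the following. The option names follow the aggregate processor documentation but should be verified for your version; the source and sink are placeholders.

```yaml
# Hypothetical sketch -- hosts and index names are placeholders.
aggregate-pipeline:
  source:
    http:
  processor:
    - aggregate:
        identification_keys: ["sourceIp", "destinationIp"]
        group_duration: "30s"     # how long group state is kept in memory
        action:
          put_all:                # merge all events in a group into one combined event
  sink:
    - opensearch:
        hosts: ["https://localhost:9200"]
        index: "aggregated-logs"
```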

2 changes: 1 addition & 1 deletion _data-prepper/common-use-cases/log-analytics.md
@@ -7,7 +7,7 @@ nav_order: 30

# Log analytics

Data Prepper is an extendable, configurable, and scalable solution for log ingestion into OpenSearch and Amazon OpenSearch Service. Data Prepper supports receiving logs from [Fluent Bit](https://fluentbit.io/) through the [HTTP Source](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/http-source/README.md) and processing those logs with a [Grok Processor](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/grok-processor/README.md) before ingesting them into OpenSearch through the [OpenSearch sink](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/opensearch/README.md).
OpenSearch Data Prepper is an extendable, configurable, and scalable solution for log ingestion into OpenSearch and Amazon OpenSearch Service. Data Prepper supports receiving logs from [Fluent Bit](https://fluentbit.io/) through the [HTTP Source](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/http-source/README.md) and processing those logs with a [Grok Processor](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/grok-processor/README.md) before ingesting them into OpenSearch through the [OpenSearch sink](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/opensearch/README.md).

The following image shows all of the components used for log analytics with Fluent Bit, Data Prepper, and OpenSearch.

2 changes: 1 addition & 1 deletion _data-prepper/common-use-cases/log-enrichment.md
@@ -7,7 +7,7 @@ nav_order: 35

# Log enrichment

You can perform different types of log enrichment with Data Prepper, including:
You can perform different types of log enrichment with OpenSearch Data Prepper, including:

- Filtering.
- Extracting key-value pairs from strings.
2 changes: 1 addition & 1 deletion _data-prepper/common-use-cases/metrics-logs.md
@@ -7,7 +7,7 @@ nav_order: 15

# Deriving metrics from logs

You can use Data Prepper to derive metrics from logs.
You can use OpenSearch Data Prepper to derive metrics from logs.

The following example pipeline receives incoming logs using the [`http` source plugin]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/http-source) and the [`grok` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/grok/). It then uses the [`aggregate` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/aggregate/) to extract the metric bytes aggregated during a 30-second window and derives histograms from the results.
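A condensed sketch of such a pipeline might look like the following. The `histogram` action options shown are assumptions and should be checked against the aggregate processor documentation; endpoints are placeholders.

```yaml
# Hypothetical sketch -- hosts, index, and bucket boundaries are placeholders.
metrics-from-logs-pipeline:
  source:
    http:
  processor:
    - grok:
        match:
          log: ["%{COMMONAPACHELOG_DATATYPED}"]
    - aggregate:
        identification_keys: ["clientip"]
        group_duration: "30s"       # 30-second aggregation window
        action:
          histogram:
            key: "bytes"            # derive a histogram from the aggregated bytes metric
            units: "bytes"
            buckets: [0, 25000000]
  sink:
    - opensearch:
        hosts: ["https://localhost:9200"]
        index: "log-metrics"
```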

2 changes: 1 addition & 1 deletion _data-prepper/common-use-cases/metrics-traces.md
@@ -7,7 +7,7 @@ nav_order: 20

# Deriving metrics from traces

You can use Data Prepper to derive metrics from OpenTelemetry traces. The following example pipeline receives incoming traces and extracts a metric called `durationInNanos`, aggregated over a tumbling window of 30 seconds. It then derives a histogram from the incoming traces.
You can use OpenSearch Data Prepper to derive metrics from OpenTelemetry traces. The following example pipeline receives incoming traces and extracts a metric called `durationInNanos`, aggregated over a tumbling window of 30 seconds. It then derives a histogram from the incoming traces.

The pipeline configuration contains the following pipelines:

2 changes: 1 addition & 1 deletion _data-prepper/common-use-cases/s3-logs.md
@@ -7,7 +7,7 @@ nav_order: 40

# S3 logs

Data Prepper allows you to load logs from [Amazon Simple Storage Service](https://aws.amazon.com/s3/) (Amazon S3), including traditional logs, JSON documents, and CSV logs.
OpenSearch Data Prepper allows you to load logs from [Amazon Simple Storage Service](https://aws.amazon.com/s3/) (Amazon S3), including traditional logs, JSON documents, and CSV logs.

## Architecture

2 changes: 1 addition & 1 deletion _data-prepper/common-use-cases/sampling.md
@@ -7,7 +7,7 @@ nav_order: 45

# Sampling

Data Prepper provides the following sampling capabilities:
OpenSearch Data Prepper provides the following sampling capabilities:

- Time sampling
- Percentage sampling
2 changes: 1 addition & 1 deletion _data-prepper/common-use-cases/text-processing.md
@@ -7,7 +7,7 @@ nav_order: 55

# Text processing

Data Prepper provides text processing capabilities with the [`grok processor`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/grok/). The `grok` processor is based on the [`java-grok`](https://mvnrepository.com/artifact/io.krakens/java-grok) library and supports all compatible patterns. The `java-grok` library is built using the [`java.util.regex`](https://docs.oracle.com/javase/8/docs/api/java/util/regex/package-summary.html) regular expression library.
OpenSearch Data Prepper provides text processing capabilities with the [`grok processor`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/grok/). The `grok` processor is based on the [`java-grok`](https://mvnrepository.com/artifact/io.krakens/java-grok) library and supports all compatible patterns. The `java-grok` library is built using the [`java.util.regex`](https://docs.oracle.com/javase/8/docs/api/java/util/regex/package-summary.html) regular expression library.

You can add custom patterns to your pipelines by using the `patterns_definitions` option. When debugging custom patterns, the [Grok Debugger](https://grokdebugger.com/) can be helpful.
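For instance, a `grok` processor with a custom pattern might be sketched as follows. The option is shown here under the key `pattern_definitions`; confirm the exact option name and pattern syntax for your version, and note that the pattern and field names are hypothetical.

```yaml
# Hypothetical fragment -- CUSTOM_REQUEST_ID is an invented custom pattern.
  processor:
    - grok:
        pattern_definitions:
          CUSTOM_REQUEST_ID: "REQ-%{NUMBER}"
        match:
          message: ["%{CUSTOM_REQUEST_ID:request_id}"]
```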

2 changes: 1 addition & 1 deletion _data-prepper/common-use-cases/trace-analytics.md
@@ -7,7 +7,7 @@ nav_order: 60

# Trace analytics

Trace analytics allows you to collect trace data and customize a pipeline that ingests and transforms the data for use in OpenSearch. The following provides an overview of the trace analytics workflow in Data Prepper, how to configure it, and how to visualize trace data.
Trace analytics allows you to collect trace data and customize a pipeline that ingests and transforms the data for use in OpenSearch. The following provides an overview of the trace analytics workflow in OpenSearch Data Prepper, how to configure it, and how to visualize trace data.

## Introduction

6 changes: 3 additions & 3 deletions _data-prepper/getting-started.md
@@ -1,14 +1,14 @@
---
layout: default
title: Getting started
title: Getting started with OpenSearch Data Prepper
nav_order: 5
redirect_from:
- /clients/data-prepper/get-started/
---

# Getting started with Data Prepper
# Getting started with OpenSearch Data Prepper

Data Prepper is an independent component, not an OpenSearch plugin, that converts data for use with OpenSearch. It's not bundled with the all-in-one OpenSearch installation packages.
OpenSearch Data Prepper is an independent component, not an OpenSearch plugin, that converts data for use with OpenSearch. It's not bundled with the all-in-one OpenSearch installation packages.

If you are migrating from Open Distro Data Prepper, see [Migrating from Open Distro]({{site.url}}{{site.baseurl}}/data-prepper/migrate-open-distro/).
{: .note}
8 changes: 4 additions & 4 deletions _data-prepper/index.md
@@ -1,6 +1,6 @@
---
layout: default
title: Data Prepper
title: OpenSearch Data Prepper
nav_order: 1
has_children: false
has_toc: false
@@ -12,9 +12,9 @@ redirect_from:
- /data-prepper/index/
---

# Data Prepper
# OpenSearch Data Prepper

Data Prepper is a server-side data collector capable of filtering, enriching, transforming, normalizing, and aggregating data for downstream analysis and visualization. Data Prepper is the preferred data ingestion tool for OpenSearch. It is recommended for most data ingestion use cases in OpenSearch and for processing large, complex datasets.
OpenSearch Data Prepper is a server-side data collector capable of filtering, enriching, transforming, normalizing, and aggregating data for downstream analysis and visualization. Data Prepper is the preferred data ingestion tool for OpenSearch. It is recommended for most data ingestion use cases in OpenSearch and for processing large, complex datasets.

With Data Prepper you can build custom pipelines to improve the operational view of applications. Two common use cases for Data Prepper are trace analytics and log analytics. [Trace analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/trace-analytics/) can help you visualize event flows and identify performance problems. [Log analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/log-analytics/) equips you with tools to enhance your search capabilities, conduct comprehensive analysis, and gain insights into your applications' performance and behavior.

@@ -74,6 +74,6 @@ In the given pipeline configuration, the `source` component reads string events

## Next steps

- [Get started with Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/).
- [Getting started with OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/).
- [Get familiar with Data Prepper pipelines]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines/).
- [Explore common use cases]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/common-use-cases/).
@@ -1,16 +1,16 @@
---
layout: default
title: Configuring Data Prepper
parent: Managing Data Prepper
title: Configuring OpenSearch Data Prepper
parent: Managing OpenSearch Data Prepper
nav_order: 5
redirect_from:
- /clients/data-prepper/data-prepper-reference/
- /monitoring-plugins/trace/data-prepper-reference/
---

# Configuring Data Prepper
# Configuring OpenSearch Data Prepper

You can customize your Data Prepper configuration by editing the `data-prepper-config.yaml` file in your Data Prepper installation. The following configuration options are independent from pipeline configuration options.
You can customize your OpenSearch Data Prepper configuration by editing the `data-prepper-config.yaml` file in your Data Prepper installation. The following configuration options are independent of pipeline configuration options.


## Data Prepper configuration
4 changes: 2 additions & 2 deletions _data-prepper/managing-data-prepper/configuring-log4j.md
@@ -1,13 +1,13 @@
---
layout: default
title: Configuring Log4j
parent: Managing Data Prepper
parent: Managing OpenSearch Data Prepper
nav_order: 20
---

# Configuring Log4j

You can configure logging using Log4j in Data Prepper.
You can configure logging using Log4j in OpenSearch Data Prepper.

## Logging

4 changes: 2 additions & 2 deletions _data-prepper/managing-data-prepper/core-apis.md
@@ -1,13 +1,13 @@
---
layout: default
title: Core APIs
parent: Managing Data Prepper
parent: Managing OpenSearch Data Prepper
nav_order: 15
---

# Core APIs

All Data Prepper instances expose a server with some control APIs. By default, this server runs on port 4900. Some plugins, especially source plugins, may expose other servers that run on different ports. Configurations for these plugins are independent of the core API. For example, to shut down Data Prepper, you can run the following curl request:
All OpenSearch Data Prepper instances expose a server with some control APIs. By default, this server runs on port 4900. Some plugins, especially source plugins, may expose other servers that run on different ports. Configurations for these plugins are independent of the core API. For example, to shut down Data Prepper, you can run the following curl request:

```
curl -X POST http://localhost:4900/shutdown
4 changes: 2 additions & 2 deletions _data-prepper/managing-data-prepper/extensions/extensions.md
@@ -1,14 +1,14 @@
---
layout: default
title: Extensions
parent: Managing Data Prepper
parent: Managing OpenSearch Data Prepper
has_children: true
nav_order: 18
---

# Extensions

Data Prepper extensions provide Data Prepper functionality outside of core Data Prepper pipeline components.
OpenSearch Data Prepper extensions provide Data Prepper functionality outside of core Data Prepper pipeline components.
Many extensions provide configuration options that give Data Prepper administrators greater flexibility over Data Prepper's functionality.

Extension configurations can be configured in the `data-prepper-config.yaml` file under the `extensions:` YAML block.
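A hedged example of such a block, configuring the `geoip_service` extension, is shown below. The `maxmind` option names are assumptions drawn from memory and should be verified against the extension documentation.

```yaml
# Hypothetical sketch -- option names and values are assumptions.
extensions:
  geoip_service:
    maxmind:
      database_refresh_interval: "PT6H"   # how often to refresh the GeoIP databases
      cache_count: 4096                   # number of lookup results to cache in memory
```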
@@ -3,12 +3,12 @@ layout: default
title: geoip_service
nav_order: 5
parent: Extensions
grand_parent: Managing Data Prepper
grand_parent: Managing OpenSearch Data Prepper
---

# geoip_service

The `geoip_service` extension configures all [`geoip`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/geoip) processors in Data Prepper.
The `geoip_service` extension configures all [`geoip`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/geoip) processors in OpenSearch Data Prepper.

## Usage

6 changes: 3 additions & 3 deletions _data-prepper/managing-data-prepper/managing-data-prepper.md
@@ -1,10 +1,10 @@
---
layout: default
title: Managing Data Prepper
title: Managing OpenSearch Data Prepper
has_children: true
nav_order: 20
---

# Managing Data Prepper
# Managing OpenSearch Data Prepper

You can perform administrator functions for Data Prepper, including system configuration, interacting with core APIs, Log4j configuration, and monitoring. You can set up peer forwarding to coordinate multiple Data Prepper nodes when using stateful aggregation.
You can perform administrator functions for OpenSearch Data Prepper, including system configuration, interacting with core APIs, Log4j configuration, and monitoring. You can set up peer forwarding to coordinate multiple Data Prepper nodes when using stateful aggregation.
6 changes: 3 additions & 3 deletions _data-prepper/managing-data-prepper/monitoring.md
@@ -1,13 +1,13 @@
---
layout: default
title: Monitoring
parent: Managing Data Prepper
parent: Managing OpenSearch Data Prepper
nav_order: 25
---

# Monitoring Data Prepper with metrics
# Monitoring OpenSearch Data Prepper with metrics

You can monitor Data Prepper with metrics using [Micrometer](https://micrometer.io/). There are two types of metrics: JVM/system metrics and plugin metrics. [Prometheus](https://prometheus.io/) is used as the default metrics backend.
You can monitor OpenSearch Data Prepper with metrics using [Micrometer](https://micrometer.io/). There are two types of metrics: JVM/system metrics and plugin metrics. [Prometheus](https://prometheus.io/) is used as the default metrics backend.

## JVM and system metrics

4 changes: 2 additions & 2 deletions _data-prepper/managing-data-prepper/peer-forwarder.md
@@ -2,12 +2,12 @@
layout: default
title: Peer forwarder
nav_order: 12
parent: Managing Data Prepper
parent: Managing OpenSearch Data Prepper
---

# Peer forwarder

Peer forwarder is an HTTP service that performs peer forwarding of an `event` between Data Prepper nodes for aggregation. This HTTP service uses a hash-ring approach to aggregate events and determine which Data Prepper node it should handle on a given trace before rerouting it to that node. Currently, peer forwarder is supported by the `aggregate`, `service_map_stateful`, and `otel_traces_raw` [processors]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/processors/).
Peer forwarder is an HTTP service that performs peer forwarding of an `event` between OpenSearch Data Prepper nodes for aggregation. This HTTP service uses a hash-ring approach to aggregate events and determine which Data Prepper node should handle a given trace before rerouting the event to that node. Currently, peer forwarder is supported by the `aggregate`, `service_map_stateful`, and `otel_traces_raw` [processors]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/processors/).

GitHub Actions / vale — _data-prepper/managing-data-prepper/peer-forwarder.md, line 10:
[Vale.Terms] Use 'Peer Forwarder' instead of 'Peer forwarder'.
[Vale.Terms] Use 'Peer Forwarder' instead of 'peer forwarder'.

Peer Forwarder groups events based on the identification keys provided by the supported processors. For `service_map_stateful` and `otel_traces_raw`, the identification key is `traceId` by default and cannot be configured. The `aggregate` processor is configured using the `identification_keys` option, which specifies the keys that Peer Forwarder uses. See the [Aggregate Processor page](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/aggregate-processor#identification_keys) for more information about identification keys.
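A minimal `data-prepper-config.yaml` sketch enabling static peer discovery might look like the following. The node names are placeholders, and the option names should be verified against the Peer Forwarder documentation for your version.

```yaml
# Hypothetical sketch -- endpoint hostnames are placeholders.
peer_forwarder:
  port: 4994                      # default Peer Forwarder port
  discovery_mode: static
  static_endpoints: ["data-prepper-node-1", "data-prepper-node-2"]
```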

4 changes: 2 additions & 2 deletions _data-prepper/managing-data-prepper/source-coordination.md
@@ -2,12 +2,12 @@
layout: default
title: Source coordination
nav_order: 35
parent: Managing Data Prepper
parent: Managing OpenSearch Data Prepper
---

# Source coordination

_Source coordination_ is the concept of coordinating and distributing work between Data Prepper data sources in a multi-node environment. Some data sources, such as Amazon Kinesis or Amazon Simple Queue Service (Amazon SQS), handle coordination natively. Other data sources, such as OpenSearch, Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB, and JDBC/ODBC, do not support source coordination.
_Source coordination_ is the concept of coordinating and distributing work between OpenSearch Data Prepper data sources in a multi-node environment. Some data sources, such as Amazon Kinesis or Amazon Simple Queue Service (Amazon SQS), handle coordination natively. Other data sources, such as OpenSearch, Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB, and JDBC/ODBC, do not support source coordination.

GitHub Actions / vale — _data-prepper/managing-data-prepper/source-coordination.md, line 10:
[OpenSearch.Spelling] Error: Kinesis. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.

Data Prepper source coordination decides which partition of work is performed by each node in the Data Prepper cluster and prevents duplicate partitions of work.

2 changes: 1 addition & 1 deletion _data-prepper/migrate-open-distro.md
@@ -23,4 +23,4 @@ In your Data Prepper Docker configuration, adjust `amazon/opendistro-for-elastic

## Next steps

For more information about Data Prepper configurations, see [Getting Started with Data Prepper]({{site.url}}{{site.baseurl}}/clients/data-prepper/get-started/).
For more information about Data Prepper configurations, see [Getting Started with OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/clients/data-prepper/get-started/).