diff --git a/_data-prepper/common-use-cases/codec-processor-combinations.md b/_data-prepper/common-use-cases/codec-processor-combinations.md
index 525bc704be..279c7d530b 100644
--- a/_data-prepper/common-use-cases/codec-processor-combinations.md
+++ b/_data-prepper/common-use-cases/codec-processor-combinations.md
@@ -7,7 +7,7 @@ nav_order: 10

# Codec processor combinations

-At ingestion time, data received by the [`s3` source]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3/) can be parsed by [codecs]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3#codec). Codecs compresses and decompresses large data sets in a certain format before ingestion them through a Data Prepper pipeline [processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/processors/).
+At ingestion time, data received by the [`s3` source]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3/) can be parsed by [codecs]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3#codec). Codecs compress and decompress large data sets in a certain format before ingesting them through an OpenSearch Data Prepper pipeline [processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/processors/).

While most codecs can be used with most processors, the following codec processor combinations can make your pipeline more efficient when used with the following input types.

diff --git a/_data-prepper/common-use-cases/common-use-cases.md b/_data-prepper/common-use-cases/common-use-cases.md
index 342a8fc819..adca11418b 100644
--- a/_data-prepper/common-use-cases/common-use-cases.md
+++ b/_data-prepper/common-use-cases/common-use-cases.md
@@ -9,4 +9,4 @@ redirect_from:

# Common use cases

-You can use Data Prepper for several different purposes, including trace analytics, log analytics, Amazon S3 log analytics, and metrics ingestion.
\ No newline at end of file
+You can use OpenSearch Data Prepper for several different purposes, including trace analytics, log analytics, Amazon S3 log analytics, and metrics ingestion.
\ No newline at end of file
diff --git a/_data-prepper/common-use-cases/event-aggregation.md b/_data-prepper/common-use-cases/event-aggregation.md
index f6e2757d9a..4e1464b505 100644
--- a/_data-prepper/common-use-cases/event-aggregation.md
+++ b/_data-prepper/common-use-cases/event-aggregation.md
@@ -7,7 +7,7 @@ nav_order: 25

# Event aggregation

-You can use Data Prepper to aggregate data from different events over a period of time. Aggregating events can help to reduce unnecessary log volume and manage use cases like multiline logs that are received as separate events. The [`aggregate` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/aggregate/) is a stateful processor that groups events based on the values for a set of specified identification keys and performs a configurable action on each group.
+You can use OpenSearch Data Prepper to aggregate data from different events over a period of time. Aggregating events can help to reduce unnecessary log volume and manage use cases like multiline logs that are received as separate events. The [`aggregate` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/aggregate/) is a stateful processor that groups events based on the values for a set of specified identification keys and performs a configurable action on each group.
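To make the grouping model concrete, a minimal sketch of an `aggregate` processor entry follows; the identification keys and the `put_all` action here are illustrative choices, not a prescribed configuration:

```yaml
processor:
  - aggregate:
      # Events that share the same values for these keys land in the same group.
      identification_keys: ["sourceIp", "destinationIp", "port"]
      # put_all combines the fields of all events in a group into a single event.
      action:
        put_all:
      # How long a group is held open before its action is concluded.
      group_duration: "180s"
```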
The `aggregate` processor state is stored in memory. For example, in order to combine four events into one, the processor needs to retain pieces of the first three events. The state of an aggregate group of events is kept for a configurable amount of time. Depending on your logs, the aggregate action being used, and the number of memory options in the processor configuration, the aggregation could take place over a long period of time. diff --git a/_data-prepper/common-use-cases/log-analytics.md b/_data-prepper/common-use-cases/log-analytics.md index ceb26ff5b7..242e16dfe9 100644 --- a/_data-prepper/common-use-cases/log-analytics.md +++ b/_data-prepper/common-use-cases/log-analytics.md @@ -7,7 +7,7 @@ nav_order: 30 # Log analytics -Data Prepper is an extendable, configurable, and scalable solution for log ingestion into OpenSearch and Amazon OpenSearch Service. Data Prepper supports receiving logs from [Fluent Bit](https://fluentbit.io/) through the [HTTP Source](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/http-source/README.md) and processing those logs with a [Grok Processor](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/grok-processor/README.md) before ingesting them into OpenSearch through the [OpenSearch sink](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/opensearch/README.md). +OpenSearch Data Prepper is an extendable, configurable, and scalable solution for log ingestion into OpenSearch and Amazon OpenSearch Service. Data Prepper supports receiving logs from [Fluent Bit](https://fluentbit.io/) through the [HTTP Source](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/http-source/README.md) and processing those logs with a [Grok Processor](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/grok-processor/README.md) before ingesting them into OpenSearch through the [OpenSearch sink](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/opensearch/README.md). The following image shows all of the components used for log analytics with Fluent Bit, Data Prepper, and OpenSearch. diff --git a/_data-prepper/common-use-cases/log-enrichment.md b/_data-prepper/common-use-cases/log-enrichment.md index 0d8ce4ab7d..c09fdec603 100644 --- a/_data-prepper/common-use-cases/log-enrichment.md +++ b/_data-prepper/common-use-cases/log-enrichment.md @@ -7,7 +7,7 @@ nav_order: 35 # Log enrichment -You can perform different types of log enrichment with Data Prepper, including: +You can perform different types of log enrichment with OpenSearch Data Prepper, including: - Filtering. - Extracting key-value pairs from strings. diff --git a/_data-prepper/common-use-cases/metrics-logs.md b/_data-prepper/common-use-cases/metrics-logs.md index 3fda8597c7..fc0518ce26 100644 --- a/_data-prepper/common-use-cases/metrics-logs.md +++ b/_data-prepper/common-use-cases/metrics-logs.md @@ -7,7 +7,7 @@ nav_order: 15 # Deriving metrics from logs -You can use Data Prepper to derive metrics from logs. +You can use OpenSearch Data Prepper to derive metrics from logs. The following example pipeline receives incoming logs using the [`http` source plugin]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/http-source) and the [`grok` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/grok/). 
It then uses the [`aggregate` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/aggregate/) to extract the metric bytes aggregated during a 30-second window and derives histograms from the results. diff --git a/_data-prepper/common-use-cases/metrics-traces.md b/_data-prepper/common-use-cases/metrics-traces.md index c15eaa099b..2cd0dafbb7 100644 --- a/_data-prepper/common-use-cases/metrics-traces.md +++ b/_data-prepper/common-use-cases/metrics-traces.md @@ -7,7 +7,7 @@ nav_order: 20 # Deriving metrics from traces -You can use Data Prepper to derive metrics from OpenTelemetry traces. The following example pipeline receives incoming traces and extracts a metric called `durationInNanos`, aggregated over a tumbling window of 30 seconds. It then derives a histogram from the incoming traces. +You can use OpenSearch Data Prepper to derive metrics from OpenTelemetry traces. The following example pipeline receives incoming traces and extracts a metric called `durationInNanos`, aggregated over a tumbling window of 30 seconds. It then derives a histogram from the incoming traces. The pipeline contains the following pipelines: diff --git a/_data-prepper/common-use-cases/s3-logs.md b/_data-prepper/common-use-cases/s3-logs.md index 8d5a9ce967..2f93c1281d 100644 --- a/_data-prepper/common-use-cases/s3-logs.md +++ b/_data-prepper/common-use-cases/s3-logs.md @@ -7,7 +7,7 @@ nav_order: 40 # S3 logs -Data Prepper allows you to load logs from [Amazon Simple Storage Service](https://aws.amazon.com/s3/) (Amazon S3), including traditional logs, JSON documents, and CSV logs. +OpenSearch Data Prepper allows you to load logs from [Amazon Simple Storage Service](https://aws.amazon.com/s3/) (Amazon S3), including traditional logs, JSON documents, and CSV logs. ## Architecture diff --git a/_data-prepper/common-use-cases/sampling.md b/_data-prepper/common-use-cases/sampling.md index 7c77e8c3f2..47bead4649 100644 --- a/_data-prepper/common-use-cases/sampling.md +++ b/_data-prepper/common-use-cases/sampling.md @@ -7,7 +7,7 @@ nav_order: 45 # Sampling -Data Prepper provides the following sampling capabilities: +OpenSearch Data Prepper provides the following sampling capabilities: - Time sampling - Percentage sampling diff --git a/_data-prepper/common-use-cases/text-processing.md b/_data-prepper/common-use-cases/text-processing.md index 041ca63ab2..1fc81c5d98 100644 --- a/_data-prepper/common-use-cases/text-processing.md +++ b/_data-prepper/common-use-cases/text-processing.md @@ -7,7 +7,7 @@ nav_order: 55 # Text processing -Data Prepper provides text processing capabilities with the [`grok processor`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/grok/). The `grok` processor is based on the [`java-grok`](https://mvnrepository.com/artifact/io.krakens/java-grok) library and supports all compatible patterns. The `java-grok` library is built using the [`java.util.regex`](https://docs.oracle.com/javase/8/docs/api/java/util/regex/package-summary.html) regular expression library. +OpenSearch Data Prepper provides text processing capabilities with the [`grok processor`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/grok/). The `grok` processor is based on the [`java-grok`](https://mvnrepository.com/artifact/io.krakens/java-grok) library and supports all compatible patterns. 
The `java-grok` library is built using the [`java.util.regex`](https://docs.oracle.com/javase/8/docs/api/java/util/regex/package-summary.html) regular expression library.

You can add custom patterns to your pipelines by using the `pattern_definitions` option. When debugging custom patterns, the [Grok Debugger](https://grokdebugger.com/) can be helpful.

diff --git a/_data-prepper/common-use-cases/trace-analytics.md b/_data-prepper/common-use-cases/trace-analytics.md
index 1a961077fe..a91f37823c 100644
--- a/_data-prepper/common-use-cases/trace-analytics.md
+++ b/_data-prepper/common-use-cases/trace-analytics.md
@@ -7,7 +7,7 @@ nav_order: 60

# Trace analytics

-Trace analytics allows you to collect trace data and customize a pipeline that ingests and transforms the data for use in OpenSearch. The following provides an overview of the trace analytics workflow in Data Prepper, how to configure it, and how to visualize trace data.
+Trace analytics allows you to collect trace data and customize a pipeline that ingests and transforms the data for use in OpenSearch. The following provides an overview of the trace analytics workflow in OpenSearch Data Prepper, how to configure it, and how to visualize trace data.

## Introduction

diff --git a/_data-prepper/getting-started.md b/_data-prepper/getting-started.md
index 624cd5fcbc..5dc90316d0 100644
--- a/_data-prepper/getting-started.md
+++ b/_data-prepper/getting-started.md
@@ -1,14 +1,14 @@
---
layout: default
-title: Getting started
+title: Getting started with OpenSearch Data Prepper
nav_order: 5
redirect_from:
- /clients/data-prepper/get-started/
---

-# Getting started with Data Prepper
+# Getting started with OpenSearch Data Prepper

-Data Prepper is an independent component, not an OpenSearch plugin, that converts data for use with OpenSearch. It's not bundled with the all-in-one OpenSearch installation packages.
+OpenSearch Data Prepper is an independent component, not an OpenSearch plugin, that converts data for use with OpenSearch. It's not bundled with the all-in-one OpenSearch installation packages.

If you are migrating from Open Distro Data Prepper, see [Migrating from Open Distro]({{site.url}}{{site.baseurl}}/data-prepper/migrate-open-distro/).
{: .note}

diff --git a/_data-prepper/index.md b/_data-prepper/index.md
index e418aa1966..63ff2fd07c 100644
--- a/_data-prepper/index.md
+++ b/_data-prepper/index.md
@@ -1,6 +1,6 @@
---
layout: default
-title: Data Prepper
+title: OpenSearch Data Prepper
nav_order: 1
has_children: false
has_toc: false
@@ -12,9 +12,9 @@ redirect_from:
- /data-prepper/index/
---

-# Data Prepper
+# OpenSearch Data Prepper

-Data Prepper is a server-side data collector capable of filtering, enriching, transforming, normalizing, and aggregating data for downstream analysis and visualization. Data Prepper is the preferred data ingestion tool for OpenSearch. It is recommended for most data ingestion use cases in OpenSearch and for processing large, complex datasets.
+OpenSearch Data Prepper is a server-side data collector capable of filtering, enriching, transforming, normalizing, and aggregating data for downstream analysis and visualization. Data Prepper is the preferred data ingestion tool for OpenSearch. It is recommended for most data ingestion use cases in OpenSearch and for processing large, complex datasets.

With Data Prepper you can build custom pipelines to improve the operational view of applications. Two common use cases for Data Prepper are trace analytics and log analytics.
[Trace analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/trace-analytics/) can help you visualize event flows and identify performance problems. [Log analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/log-analytics/) equips you with tools to enhance your search capabilities, conduct comprehensive analysis, and gain insights into your applications' performance and behavior. @@ -74,6 +74,6 @@ In the given pipeline configuration, the `source` component reads string events ## Next steps -- [Get started with Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/). +- [Getting started with OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/). - [Get familiar with Data Prepper pipelines]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines/). - [Explore common use cases]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/common-use-cases/). diff --git a/_data-prepper/managing-data-prepper/configuring-data-prepper.md b/_data-prepper/managing-data-prepper/configuring-data-prepper.md index e42a9e9449..ab5f3aa066 100644 --- a/_data-prepper/managing-data-prepper/configuring-data-prepper.md +++ b/_data-prepper/managing-data-prepper/configuring-data-prepper.md @@ -1,16 +1,16 @@ --- layout: default -title: Configuring Data Prepper -parent: Managing Data Prepper +title: Configuring OpenSearch Data Prepper +parent: Managing OpenSearch Data Prepper nav_order: 5 redirect_from: - /clients/data-prepper/data-prepper-reference/ - /monitoring-plugins/trace/data-prepper-reference/ --- -# Configuring Data Prepper +# Configuring OpenSearch Data Prepper -You can customize your Data Prepper configuration by editing the `data-prepper-config.yaml` file in your Data Prepper installation. The following configuration options are independent from pipeline configuration options. +You can customize your OpenSearch Data Prepper configuration by editing the `data-prepper-config.yaml` file in your Data Prepper installation. The following configuration options are independent from pipeline configuration options. ## Data Prepper configuration diff --git a/_data-prepper/managing-data-prepper/configuring-log4j.md b/_data-prepper/managing-data-prepper/configuring-log4j.md index 175c754abf..fe256e0da5 100644 --- a/_data-prepper/managing-data-prepper/configuring-log4j.md +++ b/_data-prepper/managing-data-prepper/configuring-log4j.md @@ -1,13 +1,13 @@ --- layout: default title: Configuring Log4j -parent: Managing Data Prepper +parent: Managing OpenSearch Data Prepper nav_order: 20 --- # Configuring Log4j -You can configure logging using Log4j in Data Prepper. +You can configure logging using Log4j in OpenSearch Data Prepper. ## Logging diff --git a/_data-prepper/managing-data-prepper/core-apis.md b/_data-prepper/managing-data-prepper/core-apis.md index b810c7b15e..eecc4ee73b 100644 --- a/_data-prepper/managing-data-prepper/core-apis.md +++ b/_data-prepper/managing-data-prepper/core-apis.md @@ -1,13 +1,13 @@ --- layout: default title: Core APIs -parent: Managing Data Prepper +parent: Managing OpenSearch Data Prepper nav_order: 15 --- # Core APIs -All Data Prepper instances expose a server with some control APIs. By default, this server runs on port 4900. Some plugins, especially source plugins, may expose other servers that run on different ports. Configurations for these plugins are independent of the core API. 
For example, to shut down Data Prepper, you can run the following curl request: +All OpenSearch Data Prepper instances expose a server with some control APIs. By default, this server runs on port 4900. Some plugins, especially source plugins, may expose other servers that run on different ports. Configurations for these plugins are independent of the core API. For example, to shut down Data Prepper, you can run the following curl request: ``` curl -X POST http://localhost:4900/shutdown diff --git a/_data-prepper/managing-data-prepper/extensions/extensions.md b/_data-prepper/managing-data-prepper/extensions/extensions.md index 8cbfc602c7..80da40767e 100644 --- a/_data-prepper/managing-data-prepper/extensions/extensions.md +++ b/_data-prepper/managing-data-prepper/extensions/extensions.md @@ -1,14 +1,14 @@ --- layout: default title: Extensions -parent: Managing Data Prepper +parent: Managing OpenSearch Data Prepper has_children: true nav_order: 18 --- # Extensions -Data Prepper extensions provide Data Prepper functionality outside of core Data Prepper pipeline components. +OpenSearch Data Prepper extensions provide Data Prepper functionality outside of core Data Prepper pipeline components. Many extensions provide configuration options that give Data Prepper administrators greater flexibility over Data Prepper's functionality. Extension configurations can be configured in the `data-prepper-config.yaml` file under the `extensions:` YAML block. diff --git a/_data-prepper/managing-data-prepper/extensions/geoip-service.md b/_data-prepper/managing-data-prepper/extensions/geoip-service.md index 53c21a08ff..157367dce1 100644 --- a/_data-prepper/managing-data-prepper/extensions/geoip-service.md +++ b/_data-prepper/managing-data-prepper/extensions/geoip-service.md @@ -3,12 +3,12 @@ layout: default title: geoip_service nav_order: 5 parent: Extensions -grand_parent: Managing Data Prepper +grand_parent: Managing OpenSearch Data Prepper --- # geoip_service -The `geoip_service` extension configures all [`geoip`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/geoip) processors in Data Prepper. +The `geoip_service` extension configures all [`geoip`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/geoip) processors in OpenSearch Data Prepper. ## Usage diff --git a/_data-prepper/managing-data-prepper/managing-data-prepper.md b/_data-prepper/managing-data-prepper/managing-data-prepper.md index ea2d1f111c..204510be24 100644 --- a/_data-prepper/managing-data-prepper/managing-data-prepper.md +++ b/_data-prepper/managing-data-prepper/managing-data-prepper.md @@ -1,10 +1,10 @@ --- layout: default -title: Managing Data Prepper +title: Managing OpenSearch Data Prepper has_children: true nav_order: 20 --- -# Managing Data Prepper +# Managing OpenSearch Data Prepper -You can perform administrator functions for Data Prepper, including system configuration, interacting with core APIs, Log4j configuration, and monitoring. You can set up peer forwarding to coordinate multiple Data Prepper nodes when using stateful aggregation. \ No newline at end of file +You can perform administrator functions for OpenSearch Data Prepper, including system configuration, interacting with core APIs, Log4j configuration, and monitoring. You can set up peer forwarding to coordinate multiple Data Prepper nodes when using stateful aggregation. 
\ No newline at end of file
diff --git a/_data-prepper/managing-data-prepper/monitoring.md b/_data-prepper/managing-data-prepper/monitoring.md
index 691f376b33..cb29e49a51 100644
--- a/_data-prepper/managing-data-prepper/monitoring.md
+++ b/_data-prepper/managing-data-prepper/monitoring.md
@@ -1,13 +1,13 @@
---
layout: default
title: Monitoring
-parent: Managing Data Prepper
+parent: Managing OpenSearch Data Prepper
nav_order: 25
---

-# Monitoring Data Prepper with metrics
+# Monitoring OpenSearch Data Prepper with metrics

-You can monitor Data Prepper with metrics using [Micrometer](https://micrometer.io/). There are two types of metrics: JVM/system metrics and plugin metrics. [Prometheus](https://prometheus.io/) is used as the default metrics backend.
+You can monitor OpenSearch Data Prepper with metrics using [Micrometer](https://micrometer.io/). There are two types of metrics: JVM/system metrics and plugin metrics. [Prometheus](https://prometheus.io/) is used as the default metrics backend.

## JVM and system metrics

diff --git a/_data-prepper/managing-data-prepper/peer-forwarder.md b/_data-prepper/managing-data-prepper/peer-forwarder.md
index f6a0f9890a..9d54aef87c 100644
--- a/_data-prepper/managing-data-prepper/peer-forwarder.md
+++ b/_data-prepper/managing-data-prepper/peer-forwarder.md
@@ -2,12 +2,12 @@
layout: default
title: Peer forwarder
nav_order: 12
-parent: Managing Data Prepper
+parent: Managing OpenSearch Data Prepper
---

# Peer forwarder

-Peer forwarder is an HTTP service that performs peer forwarding of an `event` between Data Prepper nodes for aggregation. This HTTP service uses a hash-ring approach to aggregate events and determine which Data Prepper node it should handle on a given trace before rerouting it to that node. Currently, peer forwarder is supported by the `aggregate`, `service_map_stateful`, and `otel_traces_raw` [processors]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/processors/).
+Peer forwarder is an HTTP service that performs peer forwarding of an `event` between OpenSearch Data Prepper nodes for aggregation. This HTTP service uses a hash-ring approach to aggregate events and determine which Data Prepper node should handle a given trace before rerouting the event to that node. Currently, peer forwarder is supported by the `aggregate`, `service_map_stateful`, and `otel_traces_raw` [processors]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/processors/).

Peer Forwarder groups events based on the identification keys provided by the supported processors. For `service_map_stateful` and `otel_traces_raw`, the identification key is `traceId` by default and cannot be configured. The `aggregate` processor is configured using the `identification_keys` configuration option. From here, you can specify which keys to use for Peer Forwarder. See [Aggregate Processor page](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/aggregate-processor#identification_keys) for more information about identification keys.
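Peer forwarder itself is configured in `data-prepper-config.yaml` rather than in a pipeline. A minimal sketch, assuming DNS-based peer discovery (the domain name below is a placeholder):

```yaml
# data-prepper-config.yaml (sketch)
peer_forwarder:
  # Each node resolves this DNS record to discover the other nodes in the cluster.
  discovery_mode: dns
  domain_name: "data-prepper-cluster.example.com"
```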
diff --git a/_data-prepper/managing-data-prepper/source-coordination.md b/_data-prepper/managing-data-prepper/source-coordination.md
index 3c60b45280..5dc85e50a7 100644
--- a/_data-prepper/managing-data-prepper/source-coordination.md
+++ b/_data-prepper/managing-data-prepper/source-coordination.md
@@ -2,12 +2,12 @@
layout: default
title: Source coordination
nav_order: 35
-parent: Managing Data Prepper
+parent: Managing OpenSearch Data Prepper
---

# Source coordination

-_Source coordination_ is the concept of coordinating and distributing work between Data Prepper data sources in a multi-node environment. Some data sources, such as Amazon Kinesis or Amazon Simple Queue Service (Amazon SQS), handle coordination natively. Other data sources, such as OpenSearch, Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB, and JDBC/ODBC, do not support source coordination.
+_Source coordination_ is the concept of coordinating and distributing work between OpenSearch Data Prepper data sources in a multi-node environment. Some data sources, such as Amazon Kinesis or Amazon Simple Queue Service (Amazon SQS), handle coordination natively. Other data sources, such as OpenSearch, Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB, and JDBC/ODBC, do not support source coordination.

Data Prepper source coordination decides which partition of work is performed by each node in the Data Prepper cluster and prevents duplicate partitions of work.

diff --git a/_data-prepper/migrate-open-distro.md b/_data-prepper/migrate-open-distro.md
index 8b3e7a7198..31a47c5682 100644
--- a/_data-prepper/migrate-open-distro.md
+++ b/_data-prepper/migrate-open-distro.md
@@ -23,4 +23,4 @@ In your Data Prepper Docker configuration, adjust `amazon/opendistro-for-elastic

## Next steps

-For more information about Data Prepper configurations, see [Getting Started with Data Prepper]({{site.url}}{{site.baseurl}}/clients/data-prepper/get-started/).
+For more information about Data Prepper configurations, see [Getting Started with OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/clients/data-prepper/get-started/).

diff --git a/_data-prepper/migrating-from-logstash-data-prepper.md b/_data-prepper/migrating-from-logstash-data-prepper.md
index 3d87f29517..13548092dc 100644
--- a/_data-prepper/migrating-from-logstash-data-prepper.md
+++ b/_data-prepper/migrating-from-logstash-data-prepper.md
@@ -9,9 +9,9 @@ redirect_from:

# Migrating from Logstash

-You can run Data Prepper with a Logstash configuration.
+You can run OpenSearch Data Prepper with a Logstash configuration.

-As mentioned in [Getting started with Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/), you'll need to configure Data Prepper with a pipeline using a `pipelines.yaml` file.
+As mentioned in [Getting started with OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/), you'll need to configure Data Prepper with a pipeline using a `pipelines.yaml` file.

Alternatively, you can use a Logstash configuration file, `logstash.conf`, to configure Data Prepper instead of `pipelines.yaml`.

@@ -29,7 +29,7 @@ As of the Data Prepper 1.2 release, the following plugins from the Logstash conf

## Running Data Prepper with a Logstash configuration

-1. To install Data Prepper's Docker image, see Installing Data Prepper in [Getting Started with Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started#1-installing-data-prepper).
+1. 
To install Data Prepper's Docker image, see Installing Data Prepper in [Getting Started with OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started#1-installing-data-prepper). 2. Run the Docker image installed in Step 1 by supplying your `logstash.conf` configuration. diff --git a/_data-prepper/pipelines/configuration/buffers/buffers.md b/_data-prepper/pipelines/configuration/buffers/buffers.md index 287825b549..0965b0acd0 100644 --- a/_data-prepper/pipelines/configuration/buffers/buffers.md +++ b/_data-prepper/pipelines/configuration/buffers/buffers.md @@ -8,7 +8,7 @@ nav_order: 30 # Buffers -The `buffer` component acts as an intermediary layer between the `source` and `sink` components in a Data Prepper pipeline. It serves as temporary storage for events, decoupling the `source` from the downstream processors and sinks. Buffers can be either in-memory or disk based. +The `buffer` component acts as an intermediary layer between the `source` and `sink` components in an OpenSearch Data Prepper pipeline. It serves as temporary storage for events, decoupling the `source` from the downstream processors and sinks. Buffers can be either in-memory or disk based. If not explicitly specified in the pipeline configuration, Data Prepper uses the default `bounded_blocking` buffer, which is an in-memory queue bounded by the number of events it can store. The `bounded_blocking` buffer is a convenient option when the event volume and processing rates are manageable within the available memory constraints. diff --git a/_data-prepper/pipelines/configuration/buffers/kafka.md b/_data-prepper/pipelines/configuration/buffers/kafka.md index 87600601b4..0152d967d7 100644 --- a/_data-prepper/pipelines/configuration/buffers/kafka.md +++ b/_data-prepper/pipelines/configuration/buffers/kafka.md @@ -59,7 +59,7 @@ Option | Required | Type | Description `name` | Yes | String | The name of the Kafka topic. `group_id` | Yes | String | Sets Kafka's `group.id` option. `workers` | No | Integer | The number of multithreaded consumers associated with each topic. Default is `2`. The maximum value is `200`. -`encryption_key` | No | String | An Advanced Encryption Standard (AES) encryption key used to encrypt and decrypt data within Data Prepper before sending it to Kafka. This value must be plain text or encrypted using AWS Key Management Service (AWS KMS). +`encryption_key` | No | String | An Advanced Encryption Standard (AES) encryption key used to encrypt and decrypt data within OpenSearch Data Prepper before sending it to Kafka. This value must be plain text or encrypted using AWS Key Management Service (AWS KMS). `kms` | No | AWS KMS key | When configured, uses an AWS KMS key to encrypt data. See [`kms`](#kms) for more information. `auto_commit` | No | Boolean | When `false`, the consumer offset will not be periodically committed to Kafka in the background. Default is `false`. `commit_interval` | No | Integer | When `auto_commit` is set to `true`, sets how often, in seconds, the consumer offsets are auto-committed to Kafka through Kafka's `auto.commit.interval.ms` option. Default is `5s`. 
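For orientation, the topic options in the preceding table sit under a `kafka` buffer entry along the lines of the following sketch; the bootstrap server address and topic name are placeholders:

```yaml
buffer:
  kafka:
    # Kafka cluster backing the buffer (placeholder address).
    bootstrap_servers: ["localhost:9092"]
    topics:
      - name: data-prepper-buffer   # Kafka topic name (placeholder)
        group_id: data-prepper      # sets Kafka's group.id
        workers: 2                  # consumers per topic; default is 2
```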
diff --git a/_data-prepper/pipelines/configuration/processors/aggregate.md b/_data-prepper/pipelines/configuration/processors/aggregate.md
index 38b138a996..1d0052ada6 100644
--- a/_data-prepper/pipelines/configuration/processors/aggregate.md
+++ b/_data-prepper/pipelines/configuration/processors/aggregate.md
@@ -20,7 +20,7 @@ Option | Required | Type | Description
identification_keys | Yes | List | An unordered list by which to group events. Events with the same values as these keys are put into the same group. If an event does not contain one of the `identification_keys`, then the value of that key is considered to be equal to `null`. At least one identification_key is required (for example, `["sourceIp", "destinationIp", "port"]`).
action | Yes | AggregateAction | The action to be performed on each group. One of the [available aggregate actions](#available-aggregate-actions) must be provided, or you can create custom aggregate actions. `remove_duplicates` and `put_all` are the available actions. For more information, see [Creating New Aggregate Actions](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/aggregate-processor#creating-new-aggregate-actions).
group_duration | No | String | The amount of time that a group should exist before it is concluded automatically. Supports ISO_8601 notation strings ("PT20.345S", "PT15M", etc.) as well as simple notation for seconds (`"60s"`) and milliseconds (`"1500ms"`). Default value is `180s`.
-local_mode | No | Boolean | When `local_mode` is set to `true`, the aggregation is performed locally on each Data Prepper node instead of forwarding events to a specific node based on the `identification_keys` using a hash function. Default is `false`.
+local_mode | No | Boolean | When `local_mode` is set to `true`, the aggregation is performed locally on each OpenSearch Data Prepper node instead of forwarding events to a specific node based on the `identification_keys` using a hash function. Default is `false`.

## Available aggregate actions

@@ -31,7 +31,7 @@ Use the following aggregate actions to determine how the `aggregate` processor p

The `remove_duplicates` action processes the first event for a group immediately and drops any events that duplicate the first event from the source. For example, when using `identification_keys: ["sourceIp", "destinationIp"]`:

1. The `remove_duplicates` action processes `{ "sourceIp": "127.0.0.1", "destinationIp": "192.168.0.1", "status": 200 }`, the first event in the source.
-2. Data Prepper drops the `{ "sourceIp": "127.0.0.1", "destinationIp": "192.168.0.1", "bytes": 1000 }` event because the `sourceIp` and `destinationIp` match the first event in the source.
+2. OpenSearch Data Prepper drops the `{ "sourceIp": "127.0.0.1", "destinationIp": "192.168.0.1", "bytes": 1000 }` event because the `sourceIp` and `destinationIp` match the first event in the source.
3. The `remove_duplicates` action processes the next event, `{ "sourceIp": "127.0.0.2", "destinationIp": "192.168.0.1", "bytes": 1000 }`. Because the `sourceIp` is different from the first event of the group, Data Prepper creates a new group based on the event.
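A minimal processor entry for this behavior might look like the following sketch, reusing the identification keys from the example above:

```yaml
processor:
  - aggregate:
      identification_keys: ["sourceIp", "destinationIp"]
      # Keep the first event per group; drop later events with matching keys.
      action:
        remove_duplicates:
```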
### put_all diff --git a/_data-prepper/pipelines/configuration/processors/anomaly-detector.md b/_data-prepper/pipelines/configuration/processors/anomaly-detector.md index ba574bdf7d..3fae80cb3f 100644 --- a/_data-prepper/pipelines/configuration/processors/anomaly-detector.md +++ b/_data-prepper/pipelines/configuration/processors/anomaly-detector.md @@ -35,7 +35,7 @@ The random cut forest (RCF) ML algorithm is an unsupervised algorithm for detect | :--- | :--- | | `random_cut_forest` | Processes events using the RCF ML algorithm to detect anomalies. | -RCF is an unsupervised ML algorithm for detecting anomalous data points within a dataset. Data Prepper uses RCF to detect anomalies in data by passing the values of the configured key to RCF. For example, when an event with a latency value of 11.5 is sent, the following anomaly event is generated: +RCF is an unsupervised ML algorithm for detecting anomalous data points within a dataset. OpenSearch Data Prepper uses RCF to detect anomalies in data by passing the values of the configured key to RCF. For example, when an event with a latency value of 11.5 is sent, the following anomaly event is generated: ```json diff --git a/_data-prepper/pipelines/configuration/processors/aws-lambda.md b/_data-prepper/pipelines/configuration/processors/aws-lambda.md index bd167996a1..0ef9dfd7d7 100644 --- a/_data-prepper/pipelines/configuration/processors/aws-lambda.md +++ b/_data-prepper/pipelines/configuration/processors/aws-lambda.md @@ -6,9 +6,9 @@ grand_parent: Pipelines nav_order: 10 --- -# aws_lambda integration for Data Prepper +# aws_lambda integration for OpenSearch Data Prepper -The [AWS Lambda](https://aws.amazon.com/lambda/) integration allows developers to use serverless computing capabilities within their Data Prepper pipelines for flexible event processing and data routing. +The [AWS Lambda](https://aws.amazon.com/lambda/) integration allows developers to use serverless computing capabilities within their OpenSearch Data Prepper pipelines for flexible event processing and data routing. ## AWS Lambda processor configuration diff --git a/_data-prepper/pipelines/configuration/processors/convert-entry-type.md b/_data-prepper/pipelines/configuration/processors/convert-entry-type.md index c2c46260ed..cc707832ad 100644 --- a/_data-prepper/pipelines/configuration/processors/convert-entry-type.md +++ b/_data-prepper/pipelines/configuration/processors/convert-entry-type.md @@ -47,7 +47,7 @@ type-conv-pipeline: ``` {% include copy.html %} -Next, create a log file named `logs_json.log` and replace the `path` in the file source of your `pipeline.yaml` file with that filepath. For more information, see [Configuring Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/#2-configuring-data-prepper). +Next, create a log file named `logs_json.log` and replace the `path` in the file source of your `pipeline.yaml` file with that filepath. For more information, see [Configuring OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/#2-configuring-data-prepper). 
For example, before you run the `convert_entry_type` processor, if the `logs_json.log` file contains the following event record: diff --git a/_data-prepper/pipelines/configuration/processors/csv.md b/_data-prepper/pipelines/configuration/processors/csv.md index e386db4bf4..d640b19eb3 100644 --- a/_data-prepper/pipelines/configuration/processors/csv.md +++ b/_data-prepper/pipelines/configuration/processors/csv.md @@ -113,4 +113,4 @@ The `csv` processor includes the following custom metrics. The `csv` processor includes the following counter metrics: -* `csvInvalidEvents`: The number of invalid events, usually caused by an unclosed quotation mark in the event itself. Data Prepper throws an exception when an invalid event is parsed. +* `csvInvalidEvents`: The number of invalid events, usually caused by an unclosed quotation mark in the event itself. OpenSearch Data Prepper throws an exception when an invalid event is parsed. diff --git a/_data-prepper/pipelines/configuration/processors/delete-entries.md b/_data-prepper/pipelines/configuration/processors/delete-entries.md index e7c022c6a7..f30bccae23 100644 --- a/_data-prepper/pipelines/configuration/processors/delete-entries.md +++ b/_data-prepper/pipelines/configuration/processors/delete-entries.md @@ -41,7 +41,7 @@ pipeline: ``` {% include copy.html %} -Next, create a log file named `logs_json.log` and replace the `path` in the file source of your `pipeline.yaml` file with that filepath. For more information, see [Configuring Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/#2-configuring-data-prepper). +Next, create a log file named `logs_json.log` and replace the `path` in the file source of your `pipeline.yaml` file with that filepath. For more information, see [Configuring OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/#2-configuring-data-prepper). For example, before you run the `delete_entries` processor, if the `logs_json.log` file contains the following event record: diff --git a/_data-prepper/pipelines/configuration/processors/drop-events.md b/_data-prepper/pipelines/configuration/processors/drop-events.md index 1f601c9743..eba3d0a8fb 100644 --- a/_data-prepper/pipelines/configuration/processors/drop-events.md +++ b/_data-prepper/pipelines/configuration/processors/drop-events.md @@ -13,7 +13,7 @@ The `drop_events` processor drops all the events that are passed into it. The fo Option | Required | Type | Description :--- | :--- | :--- | :--- -drop_when | Yes | String | Accepts a Data Prepper expression string following the [Data Prepper Expression Syntax]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/). Configuring `drop_events` with `drop_when: true` drops all the events received. +drop_when | Yes | String | Accepts an OpenSearch Data Prepper expression string following the [expression syntax]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/). Configuring `drop_events` with `drop_when: true` drops all the events received. handle_failed_events | No | Enum | Specifies how exceptions are handled when an exception occurs while evaluating an event. Default value is `drop`, which drops the event so that it is not sent to OpenSearch. Available options are `drop`, `drop_silently`, `skip`, and `skip_silently`. For more information, see [handle_failed_events](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/drop-events-processor#handle_failed_events).
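Taken together, a `drop_events` entry using the expression syntax might look like the following sketch; the `/status` field and the `404` value are assumptions made for illustration:

```yaml
processor:
  - drop_events:
      # Drop any event whose status field equals 404.
      drop_when: '/status == 404'
      # If evaluating the expression throws an exception, drop the event silently.
      handle_failed_events: drop_silently
```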