diff --git a/_config.yml b/_config.yml
index 8b41cc7f96..7557fc3989 100644
--- a/_config.yml
+++ b/_config.yml
@@ -223,7 +223,7 @@ benchmark_collection:
 data_prepper_collection:
   collections:
     data-prepper:
-      name: Data Prepper
+      name: OpenSearch Data Prepper
       nav_fold: true

 # Defaults
@@ -240,7 +240,7 @@ defaults:
      path: "_data-prepper"
    values:
      section: "data-prepper"
-     section-name: "Data Prepper"
+     section-name: "OpenSearch Data Prepper"
  -
    scope:
      path: "_clients"
diff --git a/_data-prepper/common-use-cases/anomaly-detection.md b/_data-prepper/common-use-cases/anomaly-detection.md
index e7003558f1..7d3bbcb390 100644
--- a/_data-prepper/common-use-cases/anomaly-detection.md
+++ b/_data-prepper/common-use-cases/anomaly-detection.md
@@ -7,7 +7,7 @@ nav_order: 5

 # Anomaly detection

-You can use Data Prepper to train models and generate anomalies in near real time on time-series aggregated events. You can generate anomalies either on events generated within the pipeline or on events coming directly into the pipeline, like OpenTelemetry metrics. You can feed these tumbling window aggregated time-series events to the [`anomaly_detector` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/anomaly-detector/), which trains a model and generates anomalies with a grade score. Then you can configure your pipeline to write the anomalies to a separate index to create document monitors and trigger fast alerting.
+You can use OpenSearch Data Prepper to train models and generate anomalies in near real time on time-series aggregated events. You can generate anomalies either on events generated within the pipeline or on events coming directly into the pipeline, like OpenTelemetry metrics. You can feed these tumbling window aggregated time-series events to the [`anomaly_detector` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/anomaly-detector/), which trains a model and generates anomalies with a grade score. Then you can configure your pipeline to write the anomalies to a separate index to create document monitors and trigger fast alerting.

 ## Metrics from logs
diff --git a/_data-prepper/common-use-cases/codec-processor-combinations.md b/_data-prepper/common-use-cases/codec-processor-combinations.md
index 525bc704be..9abf99a414 100644
--- a/_data-prepper/common-use-cases/codec-processor-combinations.md
+++ b/_data-prepper/common-use-cases/codec-processor-combinations.md
@@ -7,7 +7,7 @@ nav_order: 10

 # Codec processor combinations

-At ingestion time, data received by the [`s3` source]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3/) can be parsed by [codecs]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3#codec). Codecs compresses and decompresses large data sets in a certain format before ingestion them through a Data Prepper pipeline [processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/processors/).
+At ingestion time, data received by the [`s3` source]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3/) can be parsed by [codecs]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3#codec). Codecs compress and decompress large data sets in a certain format before ingesting them through an OpenSearch Data Prepper pipeline [processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/processors/).
 While most codecs can be used with most processors, the following codec processor combinations can make your pipeline more efficient when used with the following input types.
@@ -47,4 +47,4 @@ The [`newline` codec]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/config

 ## `event_json`

-The `event_json` output codec converts event data and metadata into JSON format to send to a sink, such as an S3 sink. The `event_json` input codec reads the event and its metadata to create an event in Data Prepper.
+The `event_json` output codec converts event data and metadata into JSON format to send to a sink, such as an S3 sink. The `event_json` input codec reads the event and its metadata to create an event in OpenSearch Data Prepper.
diff --git a/_data-prepper/common-use-cases/common-use-cases.md b/_data-prepper/common-use-cases/common-use-cases.md
index 342a8fc819..adca11418b 100644
--- a/_data-prepper/common-use-cases/common-use-cases.md
+++ b/_data-prepper/common-use-cases/common-use-cases.md
@@ -9,4 +9,4 @@ redirect_from:

 # Common use cases

-You can use Data Prepper for several different purposes, including trace analytics, log analytics, Amazon S3 log analytics, and metrics ingestion.
\ No newline at end of file
+You can use OpenSearch Data Prepper for several different purposes, including trace analytics, log analytics, Amazon S3 log analytics, and metrics ingestion.
\ No newline at end of file
diff --git a/_data-prepper/common-use-cases/event-aggregation.md b/_data-prepper/common-use-cases/event-aggregation.md
index f6e2757d9a..4e1464b505 100644
--- a/_data-prepper/common-use-cases/event-aggregation.md
+++ b/_data-prepper/common-use-cases/event-aggregation.md
@@ -7,7 +7,7 @@ nav_order: 25

 # Event aggregation

-You can use Data Prepper to aggregate data from different events over a period of time. Aggregating events can help to reduce unnecessary log volume and manage use cases like multiline logs that are received as separate events. The [`aggregate` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/aggregate/) is a stateful processor that groups events based on the values for a set of specified identification keys and performs a configurable action on each group.
+You can use OpenSearch Data Prepper to aggregate data from different events over a period of time. Aggregating events can help to reduce unnecessary log volume and manage use cases like multiline logs that are received as separate events. The [`aggregate` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/aggregate/) is a stateful processor that groups events based on the values for a set of specified identification keys and performs a configurable action on each group.

 The `aggregate` processor state is stored in memory. For example, in order to combine four events into one, the processor needs to retain pieces of the first three events. The state of an aggregate group of events is kept for a configurable amount of time. Depending on your logs, the aggregate action being used, and the number of memory options in the processor configuration, the aggregation could take place over a long period of time.
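For reference, a minimal pipeline that exercises the `aggregate` processor described in the event-aggregation change above might look like the following sketch. The source, identification keys, and group duration are illustrative assumptions and are not part of this change:

```yaml
aggregate-pipeline:
  source:
    http:                                # assumed source; any source that emits events works
  processor:
    - aggregate:
        identification_keys: ["sourceIp", "destinationIp"]  # hypothetical keys used to group events
        group_duration: "180s"                              # how long group state is held in memory
        action:
          put_all:                                          # merge the fields of all events in a group into one event
  sink:
    - stdout:                            # placeholder sink for local testing
```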
diff --git a/_data-prepper/common-use-cases/log-analytics.md b/_data-prepper/common-use-cases/log-analytics.md
index ceb26ff5b7..8de27cc9d3 100644
--- a/_data-prepper/common-use-cases/log-analytics.md
+++ b/_data-prepper/common-use-cases/log-analytics.md
@@ -7,17 +7,17 @@ nav_order: 30

 # Log analytics

-Data Prepper is an extendable, configurable, and scalable solution for log ingestion into OpenSearch and Amazon OpenSearch Service. Data Prepper supports receiving logs from [Fluent Bit](https://fluentbit.io/) through the [HTTP Source](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/http-source/README.md) and processing those logs with a [Grok Processor](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/grok-processor/README.md) before ingesting them into OpenSearch through the [OpenSearch sink](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/opensearch/README.md).
+OpenSearch Data Prepper is an extendable, configurable, and scalable solution for log ingestion into OpenSearch and Amazon OpenSearch Service. OpenSearch Data Prepper supports receiving logs from [Fluent Bit](https://fluentbit.io/) through the [HTTP Source](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/http-source/README.md) and processing those logs with a [Grok Processor](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/grok-processor/README.md) before ingesting them into OpenSearch through the [OpenSearch sink](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/opensearch/README.md).

-The following image shows all of the components used for log analytics with Fluent Bit, Data Prepper, and OpenSearch.
+The following image shows all of the components used for log analytics with Fluent Bit, OpenSearch Data Prepper, and OpenSearch.

 ![Log analytics component]({{site.url}}{{site.baseurl}}/images/data-prepper/log-analytics/log-analytics-components.jpg)

-In the application environment, run Fluent Bit. Fluent Bit can be containerized through Kubernetes, Docker, or Amazon Elastic Container Service (Amazon ECS). You can also run Fluent Bit as an agent on Amazon Elastic Compute Cloud (Amazon EC2). Configure the [Fluent Bit http output plugin](https://docs.fluentbit.io/manual/pipeline/outputs/http) to export log data to Data Prepper. Then deploy Data Prepper as an intermediate component and configure it to send the enriched log data to your OpenSearch cluster. From there, use OpenSearch Dashboards to perform more intensive visualization and analysis.
+In the application environment, run Fluent Bit. Fluent Bit can be containerized through Kubernetes, Docker, or Amazon Elastic Container Service (Amazon ECS). You can also run Fluent Bit as an agent on Amazon Elastic Compute Cloud (Amazon EC2). Configure the [Fluent Bit http output plugin](https://docs.fluentbit.io/manual/pipeline/outputs/http) to export log data to OpenSearch Data Prepper. Then deploy OpenSearch Data Prepper as an intermediate component and configure it to send the enriched log data to your OpenSearch cluster. From there, use OpenSearch Dashboards to perform more intensive visualization and analysis.

 ## Log analytics pipeline

-Log analytics pipelines in Data Prepper are extremely customizable. The following image shows a simple pipeline.
+Log analytics pipelines in OpenSearch Data Prepper are extremely customizable. The following image shows a simple pipeline.
 ![Log analytics component]({{site.url}}{{site.baseurl}}/images/data-prepper/log-analytics/log-ingestion-pipeline.jpg)
@@ -27,7 +27,7 @@ The [HTTP Source](https://github.com/opensearch-project/data-prepper/blob/main/d

 ### Processor

-Data Prepper 1.2 and above come with a [Grok Processor](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/grok-processor/README.md). The Grok Processor is an invaluable tool for structuring and extracting important fields from your logs, making them more queryable.
+OpenSearch Data Prepper 1.2 and above come with a [Grok Processor](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/grok-processor/README.md). The Grok Processor is an invaluable tool for structuring and extracting important fields from your logs, making them more queryable.

 The Grok Processor comes with a wide variety of [default patterns](https://github.com/thekrakken/java-grok/blob/master/src/main/resources/patterns/patterns) that match common log formats like Apache logs or syslogs, but it can easily accept any custom patterns that cater to your specific log format.

@@ -92,9 +92,9 @@ The following are the main changes you need to make:

 ## Fluent Bit

-You will need to run Fluent Bit in your service environment. See [Getting Started with Fluent Bit](https://docs.fluentbit.io/manual/installation/getting-started-with-fluent-bit) for installation instructions. Ensure that you can configure the [Fluent Bit http output plugin](https://docs.fluentbit.io/manual/pipeline/outputs/http) to your Data Prepper HTTP source. The following is an example `fluent-bit.conf` that tails a log file named `test.log` and forwards it to a locally running Data Prepper HTTP source, which runs by default on port 2021.
+You will need to run Fluent Bit in your service environment. See [Getting Started with Fluent Bit](https://docs.fluentbit.io/manual/installation/getting-started-with-fluent-bit) for installation instructions. Ensure that you can configure the [Fluent Bit http output plugin](https://docs.fluentbit.io/manual/pipeline/outputs/http) to your OpenSearch Data Prepper HTTP source. The following is an example `fluent-bit.conf` that tails a log file named `test.log` and forwards it to a locally running OpenSearch Data Prepper HTTP source, which runs by default on port 2021.

-Note that you should adjust the file `path`, output `Host`, and `Port` according to how and where you have Fluent Bit and Data Prepper running.
+Note that you should adjust the file `path`, output `Host`, and `Port` according to how and where you have Fluent Bit and OpenSearch Data Prepper running.

 ### Example: Fluent Bit file without SSL and basic authentication enabled

@@ -145,8 +145,8 @@ The following is an example `fluent-bit.conf` file with SSL and basic authentica

 # Next steps

-See the [Data Prepper Log Ingestion Demo Guide](https://github.com/opensearch-project/data-prepper/blob/main/examples/log-ingestion/README.md) for a specific example of Apache log ingestion from `FluentBit -> Data Prepper -> OpenSearch` running through Docker.
+See the [OpenSearch Data Prepper Log Ingestion Demo Guide](https://github.com/opensearch-project/data-prepper/blob/main/examples/log-ingestion/README.md) for a specific example of Apache log ingestion from `FluentBit -> OpenSearch Data Prepper -> OpenSearch` running through Docker.

-In the future, Data Prepper will offer additional sources and processors that will make more complex log analytics pipelines available. Check out the [Data Prepper Project Roadmap](https://github.com/orgs/opensearch-project/projects/221) to see what is coming.
+In the future, OpenSearch Data Prepper will offer additional sources and processors that will make more complex log analytics pipelines available. Check out the [OpenSearch Data Prepper Project Roadmap](https://github.com/orgs/opensearch-project/projects/221) to see what is coming.

-If there is a specific source, processor, or sink that you would like to include in your log analytics workflow and is not currently on the roadmap, please bring it to our attention by creating a GitHub issue. Additionally, if you are interested in contributing to Data Prepper, see our [Contributing Guidelines](https://github.com/opensearch-project/data-prepper/blob/main/CONTRIBUTING.md) as well as our [developer guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/developer_guide.md) and [plugin development guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/plugin_development.md).
+If there is a specific source, processor, or sink that you would like to include in your log analytics workflow and is not currently on the roadmap, please bring it to our attention by creating a GitHub issue. Additionally, if you are interested in contributing to OpenSearch Data Prepper, see our [Contributing Guidelines](https://github.com/opensearch-project/data-prepper/blob/main/CONTRIBUTING.md) as well as our [developer guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/developer_guide.md) and [plugin development guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/plugin_development.md).
diff --git a/_data-prepper/common-use-cases/log-enrichment.md b/_data-prepper/common-use-cases/log-enrichment.md
index 0d8ce4ab7d..c09fdec603 100644
--- a/_data-prepper/common-use-cases/log-enrichment.md
+++ b/_data-prepper/common-use-cases/log-enrichment.md
@@ -7,7 +7,7 @@ nav_order: 35

 # Log enrichment

-You can perform different types of log enrichment with Data Prepper, including:
+You can perform different types of log enrichment with OpenSearch Data Prepper, including:

 - Filtering.
 - Extracting key-value pairs from strings.
diff --git a/_data-prepper/common-use-cases/metrics-logs.md b/_data-prepper/common-use-cases/metrics-logs.md
index 3fda8597c7..fc0518ce26 100644
--- a/_data-prepper/common-use-cases/metrics-logs.md
+++ b/_data-prepper/common-use-cases/metrics-logs.md
@@ -7,7 +7,7 @@ nav_order: 15

 # Deriving metrics from logs

-You can use Data Prepper to derive metrics from logs.
+You can use OpenSearch Data Prepper to derive metrics from logs.

 The following example pipeline receives incoming logs using the [`http` source plugin]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/http-source) and the [`grok` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/grok/). It then uses the [`aggregate` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/aggregate/) to extract the metric bytes aggregated during a 30-second window and derives histograms from the results.
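To make the metrics-from-logs flow described in the change above concrete, a pipeline along these lines could be used. The grok pattern, identification key, bucket boundaries, and sink are assumptions for illustration only and are not part of this change:

```yaml
metrics-from-logs-pipeline:
  source:
    http:                                  # receives logs, for example from Fluent Bit
  processor:
    - grok:
        match:
          log: ["%{COMMONAPACHELOG}"]      # assumed Apache-style access logs; a typed pattern may be needed so that bytes is numeric
    - aggregate:
        identification_keys: ["clientip"]  # hypothetical key to group events on
        group_duration: "30s"              # 30-second tumbling window
        action:
          histogram:
            key: "bytes"                   # numeric field extracted by grok
            record_minmax: true
            units: "bytes"
            buckets: [1000, 10000, 100000] # assumed bucket boundaries
  sink:
    - stdout:                              # placeholder; an opensearch sink would be used in practice
```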
diff --git a/_data-prepper/common-use-cases/metrics-traces.md b/_data-prepper/common-use-cases/metrics-traces.md
index c15eaa099b..2cd0dafbb7 100644
--- a/_data-prepper/common-use-cases/metrics-traces.md
+++ b/_data-prepper/common-use-cases/metrics-traces.md
@@ -7,7 +7,7 @@ nav_order: 20

 # Deriving metrics from traces

-You can use Data Prepper to derive metrics from OpenTelemetry traces. The following example pipeline receives incoming traces and extracts a metric called `durationInNanos`, aggregated over a tumbling window of 30 seconds. It then derives a histogram from the incoming traces.
+You can use OpenSearch Data Prepper to derive metrics from OpenTelemetry traces. The following example pipeline receives incoming traces and extracts a metric called `durationInNanos`, aggregated over a tumbling window of 30 seconds. It then derives a histogram from the incoming traces.

 The pipeline contains the following pipelines:

diff --git a/_data-prepper/common-use-cases/s3-logs.md b/_data-prepper/common-use-cases/s3-logs.md
index 8d5a9ce967..576c7dee8e 100644
--- a/_data-prepper/common-use-cases/s3-logs.md
+++ b/_data-prepper/common-use-cases/s3-logs.md
@@ -7,13 +7,13 @@ nav_order: 40

 # S3 logs

-Data Prepper allows you to load logs from [Amazon Simple Storage Service](https://aws.amazon.com/s3/) (Amazon S3), including traditional logs, JSON documents, and CSV logs.
+OpenSearch Data Prepper allows you to load logs from [Amazon Simple Storage Service](https://aws.amazon.com/s3/) (Amazon S3), including traditional logs, JSON documents, and CSV logs.

 ## Architecture

-Data Prepper can read objects from S3 buckets using an [Amazon Simple Queue Service (SQS)](https://aws.amazon.com/sqs/) (Amazon SQS) queue and [Amazon S3 Event Notifications](https://docs.aws.amazon.com/AmazonS3/latest/userguide/NotificationHowTo.html).
+OpenSearch Data Prepper can read objects from S3 buckets using an [Amazon Simple Queue Service](https://aws.amazon.com/sqs/) (Amazon SQS) queue and [Amazon S3 Event Notifications](https://docs.aws.amazon.com/AmazonS3/latest/userguide/NotificationHowTo.html).

-Data Prepper polls the Amazon SQS queue for S3 event notifications. When Data Prepper receives a notification that an S3 object was created, Data Prepper reads and parses that S3 object.
+OpenSearch Data Prepper polls the Amazon SQS queue for S3 event notifications. When OpenSearch Data Prepper receives a notification that an S3 object was created, OpenSearch Data Prepper reads and parses that S3 object.

 The following diagram shows the overall architecture of the components involved.

@@ -23,38 +23,38 @@ The component data flow is as follows:
 1. A system produces logs into the S3 bucket.
 2. S3 creates an S3 event notification in the SQS queue.
-3. Data Prepper polls Amazon SQS for messages and then receives a message.
-4. Data Prepper downloads the content from the S3 object.
-5. Data Prepper sends a document to OpenSearch for the content in the S3 object.
+3. OpenSearch Data Prepper polls Amazon SQS for messages and then receives a message.
+4. OpenSearch Data Prepper downloads the content from the S3 object.
+5. OpenSearch Data Prepper sends a document to OpenSearch for the content in the S3 object.

 ## Pipeline overview

-Data Prepper supports reading data from S3 using the [`s3` source]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3/).
+OpenSearch Data Prepper supports reading data from S3 using the [`s3` source]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3/).
-The following diagram shows a conceptual outline of a Data Prepper pipeline reading from S3.
+The following diagram shows a conceptual outline of an OpenSearch Data Prepper pipeline reading from S3.

 S3 source architecture{: .img-fluid}

 ## Prerequisites

-Before Data Prepper can read log data from S3, you need the following prerequisites:
+Before OpenSearch Data Prepper can read log data from S3, you need the following prerequisites:

 - An S3 bucket.
 - A log producer that writes logs to S3. The exact log producer will vary depending on your specific use case, but could include writing logs to S3 or a service such as Amazon CloudWatch.

 ## Getting started

-Use the following steps to begin loading logs from S3 with Data Prepper.
+Use the following steps to begin loading logs from S3 with OpenSearch Data Prepper.

 1. Create an [SQS standard queue](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/step-create-queue.html) for your S3 event notifications.
 2. Configure [bucket notifications](https://docs.aws.amazon.com/AmazonS3/latest/userguide/ways-to-add-notification-config-to-bucket.html) for SQS. Use the `s3:ObjectCreated:*` event type.
-3. Grant [AWS IAM](https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html) permissions to Data Prepper for accessing SQS and S3.
+3. Grant [AWS IAM](https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html) permissions to OpenSearch Data Prepper for accessing SQS and S3.
 4. (Recommended) Create an [SQS dead-letter queue](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-dead-letter-queues.html) (DLQ).
 5. (Recommended) Configure an SQS re-drive policy to move failed messages into the DLQ.

-### Setting permissions for Data Prepper
+### Setting permissions for OpenSearch Data Prepper

-To view S3 logs, Data Prepper needs access to Amazon SQS and S3. Use the following example to set up permissions:
+To view S3 logs, OpenSearch Data Prepper needs access to Amazon SQS and S3. Use the following example to set up permissions:

 ```json
 {
@@ -103,7 +103,7 @@ To use an SQS dead-letter queue, perform the following steps:
 1. Create a new SQS standard queue to act as the DLQ.
 2. Configure your SQS re-drive policy [to use DLQ](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-configure-dead-letter-queue.html). Consider using a low value such as 2 or 3 for the **Maximum Receives** setting.
-3. Configure the Data Prepper `s3` source to use `retain_messages` for `on_error`. This is the default behavior.
+3. Configure the OpenSearch Data Prepper `s3` source to use `retain_messages` for `on_error`. This is the default behavior.

 ## Pipeline design

@@ -128,7 +128,7 @@ Configure the following options according to your use case:
 * `queue_url`: This the SQS queue URL and is always unique to your pipeline.
 * `codec`: The codec determines how to parse the incoming data.
-* `visibility_timeout`: Configure this value to be large enough for Data Prepper to process 10 S3 objects. However, if you make this value too large, messages that fail to process will take at least as long as the specified value before Data Prepper retries.
+* `visibility_timeout`: Configure this value to be large enough for OpenSearch Data Prepper to process 10 S3 objects. However, if you make this value too large, messages that fail to process will take at least as long as the specified value before OpenSearch Data Prepper retries.

 The default values for each option work for the majority of use cases.
 For all available options for the S3 source, see [`s3`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3/).

@@ -164,9 +164,9 @@ s3-log-pipeline:
 ```
 {% include copy-curl.html %}

-## Multiple Data Prepper pipelines
+## Multiple OpenSearch Data Prepper pipelines

-It is recommended that you have one SQS queue per Data Prepper pipeline. In addition, you can have multiple nodes in the same cluster reading from the same SQS queue, which doesn't require additional Data Prepper configuration.
+It is recommended that you have one SQS queue per OpenSearch Data Prepper pipeline. In addition, you can have multiple nodes in the same cluster reading from the same SQS queue, which doesn't require additional OpenSearch Data Prepper configuration.

 If you have multiple pipelines, you must create multiple SQS queues for each pipeline, even if both pipelines use the same S3 bucket.

@@ -174,7 +174,7 @@ If you have multiple pipelines, you must create multiple SQS queues for each pip

 To meet the scale of logs produced by S3, some users require multiple SQS queues for their logs. You can use [Amazon Simple Notification Service](https://docs.aws.amazon.com/sns/latest/dg/welcome.html) (Amazon SNS) to route event notifications from S3 to an SQS [fanout pattern](https://docs.aws.amazon.com/sns/latest/dg/sns-common-scenarios.html). Using SNS, all S3 event notifications are sent directly to a single SNS topic, where you can subscribe to multiple SQS queues.

-To make sure that Data Prepper can directly parse the event from the SNS topic, configure [raw message delivery](https://docs.aws.amazon.com/sns/latest/dg/sns-large-payload-raw-message-delivery.html) on the SNS-to-SQS subscription. Applying this option does not affect other SQS queues subscribed to the SNS topic.
+To make sure that OpenSearch Data Prepper can directly parse the event from the SNS topic, configure [raw message delivery](https://docs.aws.amazon.com/sns/latest/dg/sns-large-payload-raw-message-delivery.html) on the SNS-to-SQS subscription. Applying this option does not affect other SQS queues subscribed to the SNS topic.

 ## Filtering and retrieving data using Amazon S3 Select

diff --git a/_data-prepper/common-use-cases/sampling.md b/_data-prepper/common-use-cases/sampling.md
index 7c77e8c3f2..47bead4649 100644
--- a/_data-prepper/common-use-cases/sampling.md
+++ b/_data-prepper/common-use-cases/sampling.md
@@ -7,7 +7,7 @@ nav_order: 45

 # Sampling

-Data Prepper provides the following sampling capabilities:
+OpenSearch Data Prepper provides the following sampling capabilities:

 - Time sampling
 - Percentage sampling
diff --git a/_data-prepper/common-use-cases/text-processing.md b/_data-prepper/common-use-cases/text-processing.md
index 041ca63ab2..1fc81c5d98 100644
--- a/_data-prepper/common-use-cases/text-processing.md
+++ b/_data-prepper/common-use-cases/text-processing.md
@@ -7,7 +7,7 @@ nav_order: 55

 # Text processing

-Data Prepper provides text processing capabilities with the [`grok processor`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/grok/). The `grok` processor is based on the [`java-grok`](https://mvnrepository.com/artifact/io.krakens/java-grok) library and supports all compatible patterns. The `java-grok` library is built using the [`java.util.regex`](https://docs.oracle.com/javase/8/docs/api/java/util/regex/package-summary.html) regular expression library.
+OpenSearch Data Prepper provides text processing capabilities with the [`grok processor`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/grok/). The `grok` processor is based on the [`java-grok`](https://mvnrepository.com/artifact/io.krakens/java-grok) library and supports all compatible patterns. The `java-grok` library is built using the [`java.util.regex`](https://docs.oracle.com/javase/8/docs/api/java/util/regex/package-summary.html) regular expression library.

 You can add custom patterns to your pipelines by using the `patterns_definitions` option. When debugging custom patterns, the [Grok Debugger](https://grokdebugger.com/) can be helpful.

diff --git a/_data-prepper/common-use-cases/trace-analytics.md b/_data-prepper/common-use-cases/trace-analytics.md
index 1a961077fe..204305f200 100644
--- a/_data-prepper/common-use-cases/trace-analytics.md
+++ b/_data-prepper/common-use-cases/trace-analytics.md
@@ -7,11 +7,11 @@ nav_order: 60

 # Trace analytics

-Trace analytics allows you to collect trace data and customize a pipeline that ingests and transforms the data for use in OpenSearch. The following provides an overview of the trace analytics workflow in Data Prepper, how to configure it, and how to visualize trace data.
+Trace analytics allows you to collect trace data and customize a pipeline that ingests and transforms the data for use in OpenSearch. The following provides an overview of the trace analytics workflow in OpenSearch Data Prepper, how to configure it, and how to visualize trace data.

 ## Introduction

-When using Data Prepper as a server-side component to collect trace data, you can customize a Data Prepper pipeline to ingest and transform the data for use in OpenSearch. Upon transformation, you can visualize the transformed trace data for use with the Observability plugin inside of OpenSearch Dashboards. Trace data provides visibility into your application's performance, and helps you gain more information about individual traces.
+When using OpenSearch Data Prepper as a server-side component to collect trace data, you can customize an OpenSearch Data Prepper pipeline to ingest and transform the data for use in OpenSearch. Upon transformation, you can visualize the transformed trace data for use with the Observability plugin inside of OpenSearch Dashboards. Trace data provides visibility into your application's performance, and helps you gain more information about individual traces.

 The following flowchart illustrates the trace analytics workflow, from running OpenTelemetry Collector to using OpenSearch Dashboards for visualization.

@@ -19,13 +19,13 @@ The following flowchart illustrates the trace analytics workflow, from running O

 To monitor trace analytics, you need to set up the following components in your service environment:

 - Add **instrumentation** to your application so it can generate telemetry data and send it to an OpenTelemetry collector.
-- Run an **OpenTelemetry collector** as a sidecar or daemonset for Amazon Elastic Kubernetes Service (Amazon EKS), a sidecar for Amazon Elastic Container Service (Amazon ECS), or an agent on Amazon Elastic Compute Cloud (Amazon EC2). You should configure the collector to export trace data to Data Prepper.
-- Deploy **Data Prepper** as the ingestion collector for OpenSearch. Configure it to send the enriched trace data to your OpenSearch cluster or to the Amazon OpenSearch Service domain.
+- Run an **OpenTelemetry collector** as a sidecar or daemonset for Amazon Elastic Kubernetes Service (Amazon EKS), a sidecar for Amazon Elastic Container Service (Amazon ECS), or an agent on Amazon Elastic Compute Cloud (Amazon EC2). You should configure the collector to export trace data to OpenSearch Data Prepper.
+- Deploy **OpenSearch Data Prepper** as the ingestion collector for OpenSearch. Configure it to send the enriched trace data to your OpenSearch cluster or to the Amazon OpenSearch Service domain.
 - Use **OpenSearch Dashboards** to visualize and detect problems in your distributed applications.

 ## Trace analytics pipeline

-To monitor trace analytics in Data Prepper, we provide three pipelines: `entry-pipeline`, `raw-trace-pipeline`, and `service-map-pipeline`. The following image provides an overview of how the pipelines work together to monitor trace analytics.
+To monitor trace analytics in OpenSearch Data Prepper, we provide three pipelines: `entry-pipeline`, `raw-trace-pipeline`, and `service-map-pipeline`. The following image provides an overview of how the pipelines work together to monitor trace analytics.

 Trace analytics pipeline overview{: .img-fluid}

@@ -54,17 +54,17 @@ The sink provides specific configurations for the trace analytics feature. These

 ## Trace tuning

-Starting with version 0.8.x, Data Prepper supports both vertical and horizontal scaling for trace analytics. You can adjust the size of a single Data Prepper instance to meet your workload's demands and scale vertically.
+Starting with version 0.8.x, OpenSearch Data Prepper supports both vertical and horizontal scaling for trace analytics. You can adjust the size of a single OpenSearch Data Prepper instance to meet your workload's demands and scale vertically.

-You can scale horizontally by using the core [peer forwarder]({{site.url}}{{site.baseurl}}/data-prepper/managing-data-prepper/peer-forwarder/) to deploy multiple Data Prepper instances to form a cluster. This enables Data Prepper instances to communicate with instances in the cluster and is required for horizontally scaling deployments.
+You can scale horizontally by using the core [peer forwarder]({{site.url}}{{site.baseurl}}/data-prepper/managing-data-prepper/peer-forwarder/) to deploy multiple OpenSearch Data Prepper instances to form a cluster. This enables OpenSearch Data Prepper instances to communicate with instances in the cluster and is required for horizontally scaling deployments.

 ### Scaling recommendations

-Use the following recommended configurations to scale Data Prepper. We recommend that you modify parameters based on the requirements. We also recommend that you monitor the Data Prepper host metrics and OpenSearch metrics to ensure that the configuration works as expected.
+Use the following recommended configurations to scale OpenSearch Data Prepper. We recommend that you modify parameters based on the requirements. We also recommend that you monitor the OpenSearch Data Prepper host metrics and OpenSearch metrics to ensure that the configuration works as expected.

 #### Buffer

-The total number of trace requests processed by Data Prepper is equal to the sum of the `buffer_size` values in `otel-trace-pipeline` and `raw-pipeline`. The total number of trace requests sent to OpenSearch is equal to the product of `batch_size` and `workers` in `raw-trace-pipeline`. For more information about `raw-pipeline`, see [Trace analytics pipeline]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines).
+The total number of trace requests processed by OpenSearch Data Prepper is equal to the sum of the `buffer_size` values in `otel-trace-pipeline` and `raw-pipeline`. The total number of trace requests sent to OpenSearch is equal to the product of `batch_size` and `workers` in `raw-trace-pipeline`. For more information about `raw-pipeline`, see [Trace analytics pipeline]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines).

 We recommend the following when making changes to buffer settings:

@@ -74,19 +74,19 @@ We recommend the following when making changes to buffer settings:

 #### Workers

-The `workers` setting determines the number of threads that are used by Data Prepper to process requests from the buffer. We recommend that you set `workers` based on the CPU utilization. This value can be higher than the number of available processors because Data Prepper uses significant input/output time when sending data to OpenSearch.
+The `workers` setting determines the number of threads that are used by OpenSearch Data Prepper to process requests from the buffer. We recommend that you set `workers` based on the CPU utilization. This value can be higher than the number of available processors because OpenSearch Data Prepper uses significant input/output time when sending data to OpenSearch.

 #### Heap

-Configure the Data Prepper heap by setting the `JVM_OPTS` environment variable. We recommend that you set the heap value to a minimum value of `4` * `batch_size` * `otel_send_batch_size` * `maximum size of indvidual span`.
+Configure the OpenSearch Data Prepper heap by setting the `JVM_OPTS` environment variable. We recommend that you set the heap value to a minimum value of `4` * `batch_size` * `otel_send_batch_size` * `maximum size of individual span`.

 As mentioned in the [OpenTelemetry Collector](#opentelemetry-collector) section, set `otel_send_batch_size` to a value of `50` in your OpenTelemetry Collector configuration.

 #### Local disk

-Data Prepper uses the local disk to store metadata required for service map processing, so we recommend storing only the following key fields: `traceId`, `spanId`, `parentSpanId`, `spanKind`, `spanName`, and `serviceName`. The `service-map` plugin stores only two files, each of which stores `window_duration` seconds of data. As an example, testing with a throughput of `3000 spans/second` resulted in the total disk usage of `4 MB`.
+OpenSearch Data Prepper uses the local disk to store metadata required for service map processing, so we recommend storing only the following key fields: `traceId`, `spanId`, `parentSpanId`, `spanKind`, `spanName`, and `serviceName`. The `service-map` plugin stores only two files, each of which stores `window_duration` seconds of data. As an example, testing with a throughput of `3000 spans/second` resulted in the total disk usage of `4 MB`.

-Data Prepper also uses the local disk to write logs. In the most recent version of Data Prepper, you can redirect the logs to your preferred path.
+OpenSearch Data Prepper also uses the local disk to write logs. In the most recent version of OpenSearch Data Prepper, you can redirect the logs to your preferred path.

 ### AWS CloudFormation template and Kubernetes/Amazon EKS configuration files

@@ -114,7 +114,7 @@ The following sections provide examples of different types of pipelines and how

 The following example demonstrates how to build a pipeline that supports the [OpenSearch Dashboards Observability plugin]({{site.url}}{{site.baseurl}}/observability-plugin/trace/ta-dashboards/). This pipeline takes data from the OpenTelemetry Collector and uses two other pipelines as sinks. These two separate pipelines serve two different purposes and write to different OpenSearch indexes. The first pipeline prepares trace data for OpenSearch and enriches and ingests the span documents into a span index within OpenSearch. The second pipeline aggregates traces into a service map and writes service map documents into a service map index within OpenSearch.

-Starting with Data Prepper version 2.0, Data Prepper no longer supports the `otel_traces_raw_prepper` processor. The `otel_traces_raw` processor replaces the `otel_traces_raw_prepper` processor and supports some of Data Prepper's recent data model changes. Instead, you should use the `otel_traces_raw` processor. See the following YAML file example:
+Starting with OpenSearch Data Prepper version 2.0, OpenSearch Data Prepper no longer supports the `otel_traces_raw_prepper` processor. The `otel_traces_raw` processor replaces the `otel_traces_raw_prepper` processor and supports some of OpenSearch Data Prepper's recent data model changes. Instead, you should use the `otel_traces_raw` processor. See the following YAML file example:

 ```yml
 entry-pipeline:
@@ -177,7 +177,7 @@ The following is an example `otel-trace-source` .yaml file with SSL and basic au
 ```yaml
 source:
   otel_traces_source:
-    #record_type: event # Add this when using Data Prepper 1.x. This option is removed in 2.0
+    #record_type: event # Add this when using OpenSearch Data Prepper 1.x. This option is removed in 2.0
     ssl: true
     sslKeyCertChainFile: "/full/path/to/certfile.crt"
     sslKeyFile: "/full/path/to/keyfile.key"
@@ -195,7 +195,7 @@ The following is an example `pipeline.yaml` file without SSL and basic authentic
 otel-trace-pipeline:
   # workers is the number of threads processing data in each pipeline.
   # We recommend same value for all pipelines.
-  # default value is 1, set a value based on the machine you are running Data Prepper
+  # default value is 1, set a value based on the machine you are running OpenSearch Data Prepper
   workers: 8
   # delay in milliseconds is how often the worker threads should process data.
   # Recommend not to change this config as we want the entry-pipeline to process as quick as possible
@@ -203,7 +203,7 @@ otel-trace-pipeline:
   delay: "100"
   source:
     otel_traces_source:
-      #record_type: event # Add this when using Data Prepper 1.x. This option is removed in 2.0
+      #record_type: event # Add this when using OpenSearch Data Prepper 1.x. This option is removed in 2.0
       ssl: false # Change this to enable encryption in transit
       authentication:
         unauthenticated:
@@ -318,11 +318,11 @@ You must make the following changes:
 * `aws_sigv4` – If you are using Amazon OpenSearch Service with AWS signing, set this value to `true`. It will sign requests with the default AWS credentials provider.
 * `aws_region` – If you are using Amazon OpenSearch Service with AWS signing, set this value to your AWS Region.

-For other configurations available for OpenSearch sinks, see [Data Prepper OpenSearch sink]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sinks/opensearch/).
+For other configurations available for OpenSearch sinks, see [OpenSearch Data Prepper OpenSearch sink]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sinks/opensearch/).

 ## OpenTelemetry Collector

-You need to run OpenTelemetry Collector in your service environment. Follow [Getting Started](https://opentelemetry.io/docs/collector/getting-started/#getting-started) to install an OpenTelemetry collector. Ensure that you configure the collector with an exporter configured for your Data Prepper instance. The following example `otel-collector-config.yaml` file receives data from various instrumentations and exports it to Data Prepper.
+You need to run OpenTelemetry Collector in your service environment. Follow [Getting Started](https://opentelemetry.io/docs/collector/getting-started/#getting-started) to install an OpenTelemetry collector. Ensure that you configure the collector with an exporter configured for your OpenSearch Data Prepper instance. The following example `otel-collector-config.yaml` file receives data from various instrumentations and exports it to OpenSearch Data Prepper.

 ### Example otel-collector-config.yaml file

@@ -363,15 +363,15 @@ After you run OpenTelemetry in your service environment, you must configure your

 The [OpenSearch Dashboards Observability plugin]({{site.url}}{{site.baseurl}}/observability-plugin/trace/ta-dashboards/) documentation provides additional information about configuring OpenSearch to view trace analytics in OpenSearch Dashboards.

-For more information about how to tune and scale Data Prepper for trace analytics, see [Trace tuning](#trace-tuning).
+For more information about how to tune and scale OpenSearch Data Prepper for trace analytics, see [Trace tuning](#trace-tuning).

-## Migrating to Data Prepper 2.0
+## Migrating to OpenSearch Data Prepper 2.0

-Starting with Data Prepper version 1.4, trace processing uses Data Prepper's event model. This allows pipeline authors to configure other processors to modify spans or traces. To provide a migration path, Data Prepper version 1.4 introduced the following changes:
+Starting with OpenSearch Data Prepper version 1.4, trace processing uses OpenSearch Data Prepper's event model. This allows pipeline authors to configure other processors to modify spans or traces. To provide a migration path, OpenSearch Data Prepper version 1.4 introduced the following changes:

 * `otel_traces_source` has an optional `record_type` parameter that can be set to `event`. When configured, it will output event objects.
 * `otel_traces_raw` replaces `otel_traces_raw_prepper` for event-based spans.
 * `otel_traces_group` replaces `otel_traces_group_prepper` for event-based spans.

-In Data Prepper version 2.0, `otel_traces_source` will only output events. Data Prepper version 2.0 also removes `otel_traces_raw_prepper` and `otel_traces_group_prepper` entirely. To migrate to Data Prepper version 2.0, you can configure your trace pipeline using the event model.
+In OpenSearch Data Prepper version 2.0, `otel_traces_source` will only output events. OpenSearch Data Prepper version 2.0 also removes `otel_traces_raw_prepper` and `otel_traces_group_prepper` entirely. To migrate to OpenSearch Data Prepper version 2.0, you can configure your trace pipeline using the event model.
diff --git a/_data-prepper/getting-started.md b/_data-prepper/getting-started.md
index 624cd5fcbc..eac7ba4504 100644
--- a/_data-prepper/getting-started.md
+++ b/_data-prepper/getting-started.md
@@ -6,18 +6,18 @@ redirect_from:
   - /clients/data-prepper/get-started/
 ---

-# Getting started with Data Prepper
+# Getting started with OpenSearch Data Prepper

-Data Prepper is an independent component, not an OpenSearch plugin, that converts data for use with OpenSearch. It's not bundled with the all-in-one OpenSearch installation packages.
+OpenSearch Data Prepper is an independent component, not an OpenSearch plugin, that converts data for use with OpenSearch. It's not bundled with the all-in-one OpenSearch installation packages.

 If you are migrating from Open Distro Data Prepper, see [Migrating from Open Distro]({{site.url}}{{site.baseurl}}/data-prepper/migrate-open-distro/).
 {: .note}

-## 1. Installing Data Prepper
+## 1. Installing OpenSearch Data Prepper

-There are two ways to install Data Prepper: you can run the Docker image or build from source.
+There are two ways to install OpenSearch Data Prepper: you can run the Docker image or build from source.

-The easiest way to use Data Prepper is by running the Docker image. We suggest that you use this approach if you have [Docker](https://www.docker.com) available. Run the following command:
+The easiest way to use OpenSearch Data Prepper is by running the Docker image. We suggest that you use this approach if you have [Docker](https://www.docker.com) available. Run the following command:

 ```
 docker pull opensearchproject/data-prepper:latest
@@ -26,24 +26,24 @@ docker pull opensearchproject/data-prepper:latest

 If you have special requirements that require you to build from source, or if you want to contribute, see the [Developer Guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/developer_guide.md).

-## 2. Configuring Data Prepper
+## 2. Configuring OpenSearch Data Prepper

-Two configuration files are required to run a Data Prepper instance. Optionally, you can configure a Log4j 2 configuration file. See [Configuring Log4j]({{site.url}}{{site.baseurl}}/data-prepper/managing-data-prepper/configuring-log4j/) for more information. The following list describes the purpose of each configuration file:
+Two configuration files are required to run an OpenSearch Data Prepper instance. Optionally, you can configure a Log4j 2 configuration file. See [Configuring Log4j]({{site.url}}{{site.baseurl}}/data-prepper/managing-data-prepper/configuring-log4j/) for more information. The following list describes the purpose of each configuration file:

 * `pipelines.yaml`: This file describes which data pipelines to run, including sources, processors, and sinks.
-* `data-prepper-config.yaml`: This file contains Data Prepper server settings that allow you to interact with exposed Data Prepper server APIs.
+* `data-prepper-config.yaml`: This file contains OpenSearch Data Prepper server settings that allow you to interact with exposed OpenSearch Data Prepper server APIs.
 * `log4j2-rolling.properties` (optional): This file contains Log4j 2 configuration options and can be a JSON, YAML, XML, or .properties file type.

-For Data Prepper versions earlier than 2.0, the `.jar` file expects the pipeline configuration file path to be followed by the server configuration file path. See the following configuration path example:
+For OpenSearch Data Prepper versions earlier than 2.0, the `.jar` file expects the pipeline configuration file path to be followed by the server configuration file path. See the following configuration path example:

 ```
 java -jar data-prepper-core-$VERSION.jar pipelines.yaml data-prepper-config.yaml
 ```

-Optionally, you can add `"-Dlog4j.configurationFile=config/log4j2.properties"` to the command to pass a custom Log4j 2 configuration file. If you don't provide a properties file, Data Prepper defaults to the `log4j2.properties` file in the `shared-config` directory.
+Optionally, you can add `"-Dlog4j.configurationFile=config/log4j2.properties"` to the command to pass a custom Log4j 2 configuration file. If you don't provide a properties file, OpenSearch Data Prepper defaults to the `log4j2.properties` file in the `shared-config` directory.

-Starting with Data Prepper 2.0, you can launch Data Prepper by using the following `data-prepper` script that does not require any additional command line arguments:
+Starting with OpenSearch Data Prepper 2.0, you can launch OpenSearch Data Prepper by using the following `data-prepper` script that does not require any additional command line arguments:

 ```
 bin/data-prepper
@@ -51,7 +51,7 @@ bin/data-prepper

 Configuration files are read from specific subdirectories in the application's home directory:

 1. `pipelines/`: Used for pipeline configurations. Pipeline configurations can be written in one or more YAML files.
-2. `config/data-prepper-config.yaml`: Used for the Data Prepper server configuration.
+2. `config/data-prepper-config.yaml`: Used for the OpenSearch Data Prepper server configuration.

 You can supply your own pipeline configuration file path followed by the server configuration file path. However, this method will not be supported in a future release. See the following example:
 ```
@@ -60,14 +60,14 @@ bin/data-prepper pipelines.yaml data-prepper-config.yaml
 ```

 The Log4j 2 configuration file is read from the `config/log4j2.properties` file located in the application's home directory.

-To configure Data Prepper, see the following information for each use case:
+To configure OpenSearch Data Prepper, see the following information for each use case:

 * [Trace analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/trace-analytics/): Learn how to collect trace data and customize a pipeline that ingests and transforms that data.
-* [Log analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/log-analytics/): Learn how to set up Data Prepper for log observability.
+* [Log analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/log-analytics/): Learn how to set up OpenSearch Data Prepper for log observability.

 ## 3. Defining a pipeline

-Create a Data Prepper pipeline file named `pipelines.yaml` using the following configuration:
+Create an OpenSearch Data Prepper pipeline file named `pipelines.yaml` using the following configuration:

 ```yml
 simple-sample-pipeline:
@@ -80,7 +80,7 @@ simple-sample-pipeline:
 ```
 {% include copy.html %}

-## 4. Running Data Prepper
+## 4. Running OpenSearch Data Prepper

 Run the following command with your pipeline configuration YAML.

@@ -94,10 +94,10 @@ docker run --name data-prepper \

 The example pipeline configuration above demonstrates a simple pipeline with a source (`random`) sending data to a sink (`stdout`). For examples of more advanced pipeline configurations, see [Pipelines]({{site.url}}{{site.baseurl}}/clients/data-prepper/pipelines/).
-After starting Data Prepper, you should see log output and some UUIDs after a few seconds: +After starting OpenSearch Data Prepper, you should see log output and some UUIDs after a few seconds: ```yml -2021-09-30T20:19:44,147 [main] INFO com.amazon.dataprepper.pipeline.server.DataPrepperServer - Data Prepper server running at :4900 +2021-09-30T20:19:44,147 [main] INFO com.amazon.dataprepper.pipeline.server.DataPrepperServer - OpenSearch Data Prepper server running at :4900 2021-09-30T20:19:44,681 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer 2021-09-30T20:19:45,183 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer 2021-09-30T20:19:45,687 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer @@ -112,27 +112,27 @@ After starting Data Prepper, you should see log output and some UUIDs after a fe e51e700e-5cab-4f6d-879a-1c3235a77d18 b4ed2d7e-cf9c-4e9d-967c-b18e8af35c90 ``` -The remainder of this page provides examples for running Data Prepper from the Docker image. If you +The remainder of this page provides examples for running OpenSearch Data Prepper from the Docker image. If you built it from source, refer to the [Developer Guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/developer_guide.md) for more information. -However you configure your pipeline, you'll run Data Prepper the same way. You run the Docker +However you configure your pipeline, you'll run OpenSearch Data Prepper the same way. You run the Docker image and modify both the `pipelines.yaml` and `data-prepper-config.yaml` files. -For Data Prepper 2.0 or later, use this command: +For OpenSearch Data Prepper 2.0 or later, use this command: ``` docker run --name data-prepper -p 4900:4900 -v ${PWD}/pipelines.yaml:/usr/share/data-prepper/pipelines/pipelines.yaml -v ${PWD}/data-prepper-config.yaml:/usr/share/data-prepper/config/data-prepper-config.yaml opensearchproject/data-prepper:latest ``` {% include copy.html %} -For Data Prepper versions earlier than 2.0, use this command: +For OpenSearch Data Prepper versions earlier than 2.0, use this command: ``` docker run --name data-prepper -p 4900:4900 -v ${PWD}/pipelines.yaml:/usr/share/data-prepper/pipelines.yaml -v ${PWD}/data-prepper-config.yaml:/usr/share/data-prepper/data-prepper-config.yaml opensearchproject/data-prepper:1.x ``` {% include copy.html %} -Once Data Prepper is running, it processes data until it is shut down. Once you are done, shut it down with the following command: +Once OpenSearch Data Prepper is running, it processes data until it is shut down. Once you are done, shut it down with the following command: ``` POST /shutdown @@ -141,20 +141,20 @@ POST /shutdown ### Additional configurations -For Data Prepper 2.0 or later, the Log4j 2 configuration file is read from `config/log4j2.properties` in the application's home directory. By default, it uses `log4j2-rolling.properties` in the *shared-config* directory. +For OpenSearch Data Prepper 2.0 or later, the Log4j 2 configuration file is read from `config/log4j2.properties` in the application's home directory. By default, it uses `log4j2-rolling.properties` in the *shared-config* directory. -For Data Prepper 1.5 or earlier, optionally add `"-Dlog4j.configurationFile=config/log4j2.properties"` to the command if you want to pass a custom log4j2 properties file. 
If no properties file is provided, Data Prepper defaults to the log4j2.properties file in the *shared-config* directory. +For OpenSearch Data Prepper 1.5 or earlier, optionally add `"-Dlog4j.configurationFile=config/log4j2.properties"` to the command if you want to pass a custom log4j2 properties file. If no properties file is provided, OpenSearch Data Prepper defaults to the log4j2.properties file in the *shared-config* directory. ## Next steps -Trace analytics is an important Data Prepper use case. If you haven't yet configured it, see [Trace analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/trace-analytics/). +Trace analytics is an important OpenSearch Data Prepper use case. If you haven't yet configured it, see [Trace analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/trace-analytics/). -Log ingestion is also an important Data Prepper use case. To learn more, see [Log analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/log-analytics/). +Log ingestion is also an important OpenSearch Data Prepper use case. To learn more, see [Log analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/log-analytics/). -To learn how to run Data Prepper with a Logstash configuration, see [Migrating from Logstash]({{site.url}}{{site.baseurl}}/data-prepper/migrating-from-logstash-data-prepper/). +To learn how to run OpenSearch Data Prepper with a Logstash configuration, see [Migrating from Logstash]({{site.url}}{{site.baseurl}}/data-prepper/migrating-from-logstash-data-prepper/). -For information on how to monitor Data Prepper, see [Monitoring]({{site.url}}{{site.baseurl}}/data-prepper/managing-data-prepper/monitoring/). +For information on how to monitor OpenSearch Data Prepper, see [Monitoring]({{site.url}}{{site.baseurl}}/data-prepper/managing-data-prepper/monitoring/). ## More examples -For more examples of Data Prepper, see [examples](https://github.com/opensearch-project/data-prepper/tree/main/examples/) in the Data Prepper repo. +For more examples of OpenSearch Data Prepper, see [examples](https://github.com/opensearch-project/data-prepper/tree/main/examples/) in the OpenSearch Data Prepper repo. diff --git a/_data-prepper/index.md b/_data-prepper/index.md index e418aa1966..f29f34d81a 100644 --- a/_data-prepper/index.md +++ b/_data-prepper/index.md @@ -1,6 +1,6 @@ --- layout: default -title: Data Prepper +title: OpenSearch Data Prepper nav_order: 1 has_children: false has_toc: false @@ -12,26 +12,26 @@ redirect_from: - /data-prepper/index/ --- -# Data Prepper +# OpenSearch Data Prepper -Data Prepper is a server-side data collector capable of filtering, enriching, transforming, normalizing, and aggregating data for downstream analysis and visualization. Data Prepper is the preferred data ingestion tool for OpenSearch. It is recommended for most data ingestion use cases in OpenSearch and for processing large, complex datasets. +OpenSearch Data Prepper is a server-side data collector capable of filtering, enriching, transforming, normalizing, and aggregating data for downstream analysis and visualization. OpenSearch Data Prepper is the preferred data ingestion tool for OpenSearch. It is recommended for most data ingestion use cases in OpenSearch and for processing large, complex datasets. -With Data Prepper you can build custom pipelines to improve the operational view of applications. Two common use cases for Data Prepper are trace analytics and log analytics. 
[Trace analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/trace-analytics/) can help you visualize event flows and identify performance problems. [Log analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/log-analytics/) equips you with tools to enhance your search capabilities, conduct comprehensive analysis, and gain insights into your applications' performance and behavior. +With OpenSearch Data Prepper you can build custom pipelines to improve the operational view of applications. Two common use cases for OpenSearch Data Prepper are trace analytics and log analytics. [Trace analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/trace-analytics/) can help you visualize event flows and identify performance problems. [Log analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/log-analytics/) equips you with tools to enhance your search capabilities, conduct comprehensive analysis, and gain insights into your applications' performance and behavior. ## Key concepts and fundamentals -Data Prepper ingests data through customizable [pipelines]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines/). These pipelines consist of pluggable components that you can customize to fit your needs, even allowing you to plug in your own implementations. A Data Prepper pipeline consists of the following components: +OpenSearch Data Prepper ingests data through customizable [pipelines]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines/). These pipelines consist of pluggable components that you can customize to fit your needs, even allowing you to plug in your own implementations. An OpenSearch Data Prepper pipeline consists of the following components: - One [source]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/sources/) - One or more [sinks]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sinks/sinks/) - (Optional) One [buffer]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/buffers/buffers/) - (Optional) One or more [processors]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/processors/) -Each pipeline contains two required components: `source` and `sink`. If a `buffer`, a `processor`, or both are missing from the pipeline, then Data Prepper uses the default `bounded_blocking` buffer and a no-op processor. Note that a single instance of Data Prepper can have one or more pipelines. +Each pipeline contains two required components: `source` and `sink`. If a `buffer`, a `processor`, or both are missing from the pipeline, then OpenSearch Data Prepper uses the default `bounded_blocking` buffer and a no-op processor. Note that a single instance of OpenSearch Data Prepper can have one or more pipelines. ## Basic pipeline configurations -To understand how the pipeline components function within a Data Prepper configuration, see the following examples. Each pipeline configuration uses a `yaml` file format. For more information, see [Pipelines]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines/) for more information and examples. +To understand how the pipeline components function within an OpenSearch Data Prepper configuration, see the following examples. Each pipeline configuration uses a `yaml` file format. For more information and examples, see [Pipelines]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines/).
### Minimal configuration @@ -74,6 +74,6 @@ In the given pipeline configuration, the `source` component reads string events ## Next steps -- [Get started with Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/). -- [Get familiar with Data Prepper pipelines]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines/). +- [Get started with OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/). +- [Get familiar with OpenSearch Data Prepper pipelines]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines/). - [Explore common use cases]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/common-use-cases/). diff --git a/_data-prepper/managing-data-prepper/configuring-data-prepper.md b/_data-prepper/managing-data-prepper/configuring-data-prepper.md index e42a9e9449..657f1991f9 100644 --- a/_data-prepper/managing-data-prepper/configuring-data-prepper.md +++ b/_data-prepper/managing-data-prepper/configuring-data-prepper.md @@ -1,21 +1,21 @@ --- layout: default -title: Configuring Data Prepper -parent: Managing Data Prepper +title: Configuring OpenSearch Data Prepper +parent: Managing OpenSearch Data Prepper nav_order: 5 redirect_from: - /clients/data-prepper/data-prepper-reference/ - /monitoring-plugins/trace/data-prepper-reference/ --- -# Configuring Data Prepper +# Configuring OpenSearch Data Prepper -You can customize your Data Prepper configuration by editing the `data-prepper-config.yaml` file in your Data Prepper installation. The following configuration options are independent from pipeline configuration options. +You can customize your OpenSearch Data Prepper configuration by editing the `data-prepper-config.yaml` file in your OpenSearch Data Prepper installation. The following configuration options are independent from pipeline configuration options. -## Data Prepper configuration +## OpenSearch Data Prepper configuration -Use the following options to customize your Data Prepper configuration. +Use the following options to customize your OpenSearch Data Prepper configuration. Option | Required | Type | Description :--- | :--- |:--- | :--- @@ -48,7 +48,7 @@ client_thread_count | No | Integer | The number of threads used by the peer forw max_connection_count | No | Integer | The maximum number of open connections for the peer forwarder server. Default is 500. max_pending_requests | No | Integer | The maximum number of allowed tasks in ScheduledThreadPool work queue. Default is 1024. discovery_mode | No | String | The peer discovery mode to use. Valid options are `local_node`, `static`, `dns`, or `aws_cloud_map`. Defaults to `local_node`, which processes events locally. -static_endpoints | Conditionally | List | A list containing endpoints of all Data Prepper instances. Required if `discovery_mode` is set to static. +static_endpoints | Conditionally | List | A list containing endpoints of all OpenSearch Data Prepper instances. Required if `discovery_mode` is set to static. domain_name | Conditionally | String | A single domain name to query DNS against. Typically, used by creating multiple DNS A Records for the same domain. Required if `discovery_mode` is set to dns. aws_cloud_map_namespace_name | Conditionally | String | Cloud Map namespace when using AWS Cloud Map service discovery. Required if `discovery_mode` is set to `aws_cloud_map`. aws_cloud_map_service_name | Conditionally | String | The Cloud Map service name when using AWS Cloud Map service discovery. Required if `discovery_mode` is set to `aws_cloud_map`. 
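+For example, a minimal `data-prepper-config.yaml` sketch that combines the preceding discovery options to use static peer discovery might look like the following. The endpoint values are placeholders; replace them with the addresses of your own OpenSearch Data Prepper instances:
+
+```yaml
+peer_forwarder:
+  discovery_mode: static
+  static_endpoints: ["data-prepper-node-1", "data-prepper-node-2"]
+```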
@@ -69,7 +69,7 @@ ssl_insecure_disable_verification | No | Boolean | Disables the verification of ssl_fingerprint_verification_only | No | Boolean | Disables the verification of server's TLS certificate chain and instead verifies only the certificate fingerprint. Default is `false`. use_acm_certificate_for_ssl | No | Boolean | Enables TLS/SSL using certificate and private key from AWS Certificate Manager (ACM). Default is `false`. acm_certificate_arn | Conditionally | String | The ACM certificate ARN. The ACM certificate takes preference over S3 or a local file system certificate. Required if `use_acm_certificate_for_ssl` is set to true. -acm_private_key_password | No | String | The ACM private key password that decrypts the private key. If not provided, Data Prepper generates a random password. +acm_private_key_password | No | String | The ACM private key password that decrypts the private key. If not provided, OpenSearch Data Prepper generates a random password. acm_certificate_timeout_millis | No | Integer | The timeout in milliseconds for ACM to get certificates. Default is 120000. aws_region | Conditionally | String | The AWS region to use ACM, S3 or AWS Cloud Map. Required if `use_acm_certificate_for_ssl` is set to true or `ssl_certificate_file` and `ssl_key_file` is AWS S3 path or `discovery_mode` is set to `aws_cloud_map`. @@ -81,9 +81,9 @@ authentication | No | Map | The authentication method to use. Valid options are ### Circuit breakers -Data Prepper provides a circuit breaker to help prevent exhausting Java memory. And is useful when pipelines have stateful processors as these can retain memory usage outside of the buffers. +OpenSearch Data Prepper provides a circuit breaker to help prevent exhausting Java memory. This is useful when pipelines have stateful processors, because these processors can retain memory usage outside of the buffers. -When a circuit breaker is tripped, Data Prepper rejects incoming data routing into buffers. +When a circuit breaker is tripped, OpenSearch Data Prepper rejects incoming data rather than routing it into buffers. Option | Required | Type | Description @@ -93,7 +93,7 @@ heap | No | [heap](#heap-circuit-breaker) | Enables a heap circuit breaker. By d #### Heap circuit breaker -Configures Data Prepper to trip a circuit breaker when JVM heap reaches a specified usage threshold. +Configures OpenSearch Data Prepper to trip a circuit breaker when JVM heap reaches a specified usage threshold. Option | Required | Type | Description :--- |:---|:---| :--- @@ -103,7 +103,7 @@ check_interval | No | Duration | Specifies the time between checks of the heap s ### Extension plugins -Data Prepper provides support for user-configurable extension plugins. Extension plugins are common configurations shared across pipeline plugins, such as [sources, buffers, processors, and sinks]({{site.url}}{{site.baseurl}}/data-prepper/index/#key-concepts-and-fundamentals). +OpenSearch Data Prepper provides support for user-configurable extension plugins. Extension plugins are common configurations shared across pipeline plugins, such as [sources, buffers, processors, and sinks]({{site.url}}{{site.baseurl}}/data-prepper/index/#key-concepts-and-fundamentals).
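+As a sketch, an extension is configured under the `extensions:` block of the `data-prepper-config.yaml` file. The `geoip_service` entry shown here is only an illustration; see the individual extension pages for the options that each extension supports:
+
+```yaml
+extensions:
+  geoip_service:
+    maxmind:
+      # MaxMind database options for the geoip processor go here.
+```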
### AWS extension plugins diff --git a/_data-prepper/managing-data-prepper/configuring-log4j.md b/_data-prepper/managing-data-prepper/configuring-log4j.md index 175c754abf..8e051c1c5e 100644 --- a/_data-prepper/managing-data-prepper/configuring-log4j.md +++ b/_data-prepper/managing-data-prepper/configuring-log4j.md @@ -1,25 +1,25 @@ --- layout: default title: Configuring Log4j -parent: Managing Data Prepper +parent: Managing OpenSearch Data Prepper nav_order: 20 --- # Configuring Log4j -You can configure logging using Log4j in Data Prepper. +You can configure logging using Log4j in OpenSearch Data Prepper. ## Logging -Data Prepper uses [SLF4J](https://www.slf4j.org/) with a [Log4j 2 binding](https://logging.apache.org/log4j/2.x/log4j-slf4j-impl.html). +OpenSearch Data Prepper uses [SLF4J](https://www.slf4j.org/) with a [Log4j 2 binding](https://logging.apache.org/log4j/2.x/log4j-slf4j-impl.html). -For Data Prepper versions 2.0 and later, the Log4j 2 configuration file can be found and edited in `config/log4j2.properties` in the application's home directory. The default properties for Log4j 2 can be found in `log4j2-rolling.properties` in the *shared-config* directory. +For OpenSearch Data Prepper versions 2.0 and later, the Log4j 2 configuration file can be found and edited in `config/log4j2.properties` in the application's home directory. The default properties for Log4j 2 can be found in `log4j2-rolling.properties` in the *shared-config* directory. -For Data Prepper versions before 2.0, the Log4j 2 configuration file can be overridden by setting the `log4j.configurationFile` system property when running Data Prepper. The default properties for Log4j 2 can be found in `log4j2.properties` in the *shared-config* directory. +For OpenSearch Data Prepper versions before 2.0, the Log4j 2 configuration file can be overridden by setting the `log4j.configurationFile` system property when running OpenSearch Data Prepper. The default properties for Log4j 2 can be found in `log4j2.properties` in the *shared-config* directory. ### Example -When running Data Prepper, the following command can be overridden by setting the system property `-Dlog4j.configurationFile={property_value}`, where `{property_value}` is a path to the Log4j 2 configuration file: +When running OpenSearch Data Prepper, the following command can be overridden by setting the system property `-Dlog4j.configurationFile={property_value}`, where `{property_value}` is a path to the Log4j 2 configuration file: ``` java "-Dlog4j.configurationFile=config/custom-log4j2.properties" -jar data-prepper-core-$VERSION.jar pipelines.yaml data-prepper-config.yaml diff --git a/_data-prepper/managing-data-prepper/core-apis.md b/_data-prepper/managing-data-prepper/core-apis.md index b810c7b15e..675594e9a7 100644 --- a/_data-prepper/managing-data-prepper/core-apis.md +++ b/_data-prepper/managing-data-prepper/core-apis.md @@ -1,13 +1,13 @@ --- layout: default title: Core APIs -parent: Managing Data Prepper +parent: Managing OpenSearch Data Prepper nav_order: 15 --- # Core APIs -All Data Prepper instances expose a server with some control APIs. By default, this server runs on port 4900. Some plugins, especially source plugins, may expose other servers that run on different ports. Configurations for these plugins are independent of the core API. For example, to shut down Data Prepper, you can run the following curl request: +All OpenSearch Data Prepper instances expose a server with some control APIs. By default, this server runs on port 4900. 
Some plugins, especially source plugins, may expose other servers that run on different ports. Configurations for these plugins are independent of the core API. For example, to shut down OpenSearch Data Prepper, you can run the following curl request: ``` curl -X POST http://localhost:4900/shutdown @@ -20,13 +20,13 @@ The following table lists the available APIs. | Name | Description | | --- | --- | | ```GET /list```
```POST /list``` | Returns a list of running pipelines. | -| ```POST /shutdown``` | Starts a graceful shutdown of Data Prepper. | -| ```GET /metrics/prometheus```
```POST /metrics/prometheus``` | Returns a scrape of Data Prepper metrics in Prometheus text format. This API is available as a `metricsRegistries` parameter in the Data Prepper configuration file `data-prepper-config.yaml` and contains `Prometheus` as part of the registry. -| ```GET /metrics/sys```
```POST /metrics/sys``` | Returns JVM metrics in Prometheus text format. This API is available as a `metricsRegistries` parameter in the Data Prepper configuration file `data-prepper-config.yaml` and contains `Prometheus` as part of the registry. +| ```POST /shutdown``` | Starts a graceful shutdown of OpenSearch Data Prepper. | +| ```GET /metrics/prometheus```
```POST /metrics/prometheus``` | Returns a scrape of OpenSearch Data Prepper metrics in Prometheus text format. This API is available as a `metricsRegistries` parameter in the OpenSearch Data Prepper configuration file `data-prepper-config.yaml` and contains `Prometheus` as part of the registry. +| ```GET /metrics/sys```
```POST /metrics/sys``` | Returns JVM metrics in Prometheus text format. This API is available as a `metricsRegistries` parameter in the OpenSearch Data Prepper configuration file `data-prepper-config.yaml` and contains `Prometheus` as part of the registry. ## Configuring the server -You can configure your Data Prepper core APIs through the `data-prepper-config.yaml` file. +You can configure your OpenSearch Data Prepper core APIs through the `data-prepper-config.yaml` file. ### SSL/TLS connection @@ -36,7 +36,7 @@ Many of the getting started guides for this project disable SSL on the endpoint: ssl: false ``` -To enable SSL on your Data Prepper endpoint, configure your `data-prepper-config.yaml` file with the following options: +To enable SSL on your OpenSearch Data Prepper endpoint, configure your `data-prepper-config.yaml` file with the following options: ```yaml ssl: true @@ -45,7 +45,7 @@ keyStorePassword: "secret" privateKeyPassword: "secret" ``` -For more information about configuring your Data Prepper server with SSL, see [Server Configuration](https://github.com/opensearch-project/data-prepper/blob/main/docs/configuration.md#server-configuration). If you are using a self-signed certificate, you can add the `-k` flag to the request to quickly test core APIs with SSL. Use the following `shutdown` request to test core APIs with SSL: +For more information about configuring your OpenSearch Data Prepper server with SSL, see [Server Configuration](https://github.com/opensearch-project/data-prepper/blob/main/docs/configuration.md#server-configuration). If you are using a self-signed certificate, you can add the `-k` flag to the request to quickly test core APIs with SSL. Use the following `shutdown` request to test core APIs with SSL: ``` @@ -54,7 +54,7 @@ curl -k -X POST https://localhost:4900/shutdown ### Authentication -The Data Prepper core APIs support HTTP basic authentication. You can set the username and password with the following configuration in the `data-prepper-config.yaml` file: +The OpenSearch Data Prepper core APIs support HTTP basic authentication. You can set the username and password with the following configuration in the `data-prepper-config.yaml` file: ```yaml authentication: @@ -63,7 +63,7 @@ authentication: password: "mys3cr3t" ``` -You can disable authentication of core endpoints using the following configuration. Use this with caution because the shutdown API and others will be accessible to anybody with network access to your Data Prepper instance. +You can disable authentication of core endpoints using the following configuration. Use this with caution because the shutdown API and others will be accessible to anybody with network access to your OpenSearch Data Prepper instance. ```yaml authentication: @@ -72,15 +72,15 @@ authentication: ### Peer Forwarder -Peer Forwarder can be configured to enable stateful aggregation across multiple Data Prepper nodes. For more information about configuring Peer Forwarder, see [Peer forwarder]({{site.url}}{{site.baseurl}}/data-prepper/managing-data-prepper/peer-forwarder/). It is supported by the `service_map_stateful`, `otel_traces_raw`, and `aggregate` processors. +Peer Forwarder can be configured to enable stateful aggregation across multiple OpenSearch Data Prepper nodes. For more information about configuring Peer Forwarder, see [Peer forwarder]({{site.url}}{{site.baseurl}}/data-prepper/managing-data-prepper/peer-forwarder/). It is supported by the `service_map_stateful`, `otel_traces_raw`, and `aggregate` processors. 
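+As an example of the basic authentication described above, a core API such as `GET /list` can be called by passing the configured credentials with the request. The following sketch assumes the default port and SSL disabled; substitute the username and password from your `data-prepper-config.yaml` file:
+
+```
+curl -u <username>:<password> -X GET http://localhost:4900/list
+```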
### Shutdown timeouts -When you run the Data Prepper `shutdown` API, the process gracefully shuts down and clears any remaining data for both the `ExecutorService` sink and `ExecutorService` processor. The default timeout for shutdown of both processes is 10 seconds. You can configure the timeout with the following optional `data-prepper-config.yaml` file parameters: +When you run the OpenSearch Data Prepper `shutdown` API, the process gracefully shuts down and clears any remaining data for both the `ExecutorService` sink and `ExecutorService` processor. The default timeout for shutdown of both processes is 10 seconds. You can configure the timeout with the following optional `data-prepper-config.yaml` file parameters: ```yaml processorShutdownTimeout: "PT15M" sinkShutdownTimeout: 30s ``` -The values for these parameters are parsed into a `Duration` object through the [Data Prepper Duration Deserializer](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-pipeline-parser/src/main/java/org/opensearch/dataprepper/pipeline/parser/DataPrepperDurationDeserializer.java). +The values for these parameters are parsed into a `Duration` object through the [OpenSearch Data Prepper Duration Deserializer](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-pipeline-parser/src/main/java/org/opensearch/dataprepper/pipeline/parser/DataPrepperDurationDeserializer.java). diff --git a/_data-prepper/managing-data-prepper/extensions/extensions.md b/_data-prepper/managing-data-prepper/extensions/extensions.md index 8cbfc602c7..f5e96185b8 100644 --- a/_data-prepper/managing-data-prepper/extensions/extensions.md +++ b/_data-prepper/managing-data-prepper/extensions/extensions.md @@ -1,15 +1,15 @@ --- layout: default title: Extensions -parent: Managing Data Prepper +parent: Managing OpenSearch Data Prepper has_children: true nav_order: 18 --- # Extensions -Data Prepper extensions provide Data Prepper functionality outside of core Data Prepper pipeline components. -Many extensions provide configuration options that give Data Prepper administrators greater flexibility over Data Prepper's functionality. +OpenSearch Data Prepper extensions provide OpenSearch Data Prepper functionality outside of core OpenSearch Data Prepper pipeline components. +Many extensions provide configuration options that give OpenSearch Data Prepper administrators greater flexibility over OpenSearch Data Prepper's functionality. Extension configurations can be configured in the `data-prepper-config.yaml` file under the `extensions:` YAML block. diff --git a/_data-prepper/managing-data-prepper/extensions/geoip-service.md b/_data-prepper/managing-data-prepper/extensions/geoip-service.md index 53c21a08ff..2784bb5e69 100644 --- a/_data-prepper/managing-data-prepper/extensions/geoip-service.md +++ b/_data-prepper/managing-data-prepper/extensions/geoip-service.md @@ -3,16 +3,16 @@ layout: default title: geoip_service nav_order: 5 parent: Extensions -grand_parent: Managing Data Prepper +grand_parent: Managing OpenSearch Data Prepper --- # geoip_service -The `geoip_service` extension configures all [`geoip`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/geoip) processors in Data Prepper. +The `geoip_service` extension configures all [`geoip`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/geoip) processors in OpenSearch Data Prepper. ## Usage -You can configure the GeoIP service that Data Prepper uses for the `geoip` processor. 
+You can configure the GeoIP service that OpenSearch Data Prepper uses for the `geoip` processor. By default, the GeoIP service comes with the [`maxmind`](#maxmind) option configured. The following example shows how to configure the `geoip_service` in the `data-prepper-config.yaml` file: @@ -28,13 +28,13 @@ extensions: ## maxmind The GeoIP service supports the MaxMind [GeoIP and GeoLite](https://dev.maxmind.com/geoip) databases. -By default, Data Prepper will use all three of the following [MaxMind GeoLite2](https://dev.maxmind.com/geoip/geolite2-free-geolocation-data) databases: +By default, OpenSearch Data Prepper will use all three of the following [MaxMind GeoLite2](https://dev.maxmind.com/geoip/geolite2-free-geolocation-data) databases: * City * Country * ASN -The service also downloads databases automatically to keep Data Prepper up to date with changes from MaxMind. +The service also downloads databases automatically to keep OpenSearch Data Prepper up to date with changes from MaxMind. You can use the following options to configure the `maxmind` extension. @@ -64,4 +64,4 @@ Option | Required | Type | Description `region` | No | String | The AWS Region to use for the credentials. Default is the [standard SDK behavior for determining the Region](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/region-selection.html). `sts_role_arn` | No | String | The AWS Security Token Service (AWS STS) role to assume for requests to Amazon S3. Default is `null`, which will use the [standard SDK behavior for credentials](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html). `aws_sts_header_overrides` | No | Map | A map of header overrides that the AWS Identity and Access Management (IAM) role assumes when downloading from Amazon S3. -`sts_external_id` | No | String | An STS external ID used when Data Prepper assumes the STS role. For more information, see the `ExternalID` documentation in the [STS AssumeRole](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html) API reference. +`sts_external_id` | No | String | An STS external ID used when OpenSearch Data Prepper assumes the STS role. For more information, see the `ExternalID` documentation in the [STS AssumeRole](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html) API reference. diff --git a/_data-prepper/managing-data-prepper/managing-data-prepper.md b/_data-prepper/managing-data-prepper/managing-data-prepper.md index ea2d1f111c..867c1f97d3 100644 --- a/_data-prepper/managing-data-prepper/managing-data-prepper.md +++ b/_data-prepper/managing-data-prepper/managing-data-prepper.md @@ -1,10 +1,10 @@ --- layout: default -title: Managing Data Prepper +title: Managing OpenSearch Data Prepper has_children: true nav_order: 20 --- -# Managing Data Prepper +# Managing OpenSearch Data Prepper -You can perform administrator functions for Data Prepper, including system configuration, interacting with core APIs, Log4j configuration, and monitoring. You can set up peer forwarding to coordinate multiple Data Prepper nodes when using stateful aggregation. \ No newline at end of file +You can perform administrator functions for OpenSearch Data Prepper, including system configuration, interacting with core APIs, Log4j configuration, and monitoring. You can set up peer forwarding to coordinate multiple OpenSearch Data Prepper nodes when using stateful aggregation. 
\ No newline at end of file diff --git a/_data-prepper/managing-data-prepper/monitoring.md b/_data-prepper/managing-data-prepper/monitoring.md index 691f376b33..abe576b87f 100644 --- a/_data-prepper/managing-data-prepper/monitoring.md +++ b/_data-prepper/managing-data-prepper/monitoring.md @@ -1,17 +1,17 @@ --- layout: default title: Monitoring -parent: Managing Data Prepper +parent: Managing OpenSearch Data Prepper nav_order: 25 --- -# Monitoring Data Prepper with metrics +# Monitoring OpenSearch Data Prepper with metrics -You can monitor Data Prepper with metrics using [Micrometer](https://micrometer.io/). There are two types of metrics: JVM/system metrics and plugin metrics. [Prometheus](https://prometheus.io/) is used as the default metrics backend. +You can monitor OpenSearch Data Prepper with metrics using [Micrometer](https://micrometer.io/). There are two types of metrics: JVM/system metrics and plugin metrics. [Prometheus](https://prometheus.io/) is used as the default metrics backend. ## JVM and system metrics -JVM and system metrics are runtime metrics that are used to monitor Data Prepper instances. They include metrics for classloaders, memory, garbage collection, threads, and others. For more information, see [JVM and system metrics](https://micrometer.io/?/docs/ref/jvm). +JVM and system metrics are runtime metrics that are used to monitor OpenSearch Data Prepper instances. They include metrics for classloaders, memory, garbage collection, threads, and others. For more information, see [JVM and system metrics](https://micrometer.io/?/docs/ref/jvm). ### Naming @@ -19,11 +19,11 @@ JVM and system metrics follow predefined names in [Micrometer](https://micromete ### Serving -By default, metrics are served from the **/metrics/sys** endpoint on the Data Prepper server in Prometheus scrape format. You can configure Prometheus to scrape from the Data Prepper URL. Prometheus then polls Data Prepper for metrics and stores them in its database. To visualize the data, you can set up any frontend that accepts Prometheus metrics, such as [Grafana](https://prometheus.io/docs/visualization/grafana/). You can update the configuration to serve metrics to other registries like Amazon CloudWatch, which does not require or host the endpoint but publishes the metrics directly to CloudWatch. +By default, metrics are served from the **/metrics/sys** endpoint on the OpenSearch Data Prepper server in Prometheus scrape format. You can configure Prometheus to scrape from the OpenSearch Data Prepper URL. Prometheus then polls OpenSearch Data Prepper for metrics and stores them in its database. To visualize the data, you can set up any frontend that accepts Prometheus metrics, such as [Grafana](https://prometheus.io/docs/visualization/grafana/). You can update the configuration to serve metrics to other registries like Amazon CloudWatch, which does not require or host the endpoint but publishes the metrics directly to CloudWatch. ## Plugin metrics -Plugins report their own metrics. Data Prepper uses a naming convention to help with consistency in the metrics. Plugin metrics do not use dimensions. +Plugins report their own metrics. OpenSearch Data Prepper uses a naming convention to help with consistency in the metrics. Plugin metrics do not use dimensions. 1. AbstractBuffer @@ -56,4 +56,4 @@ Metrics follow a naming convention of **PIPELINE_NAME_PLUGIN_NAME_METRIC_NAME**. ### Serving -By default, metrics are served from the **/metrics/sys** endpoint on the Data Prepper server in a Prometheus scrape format. 
You can configure Prometheus to scrape from the Data Prepper URL. The Data Prepper server port has a default value of `4900` that you can modify, and this port can be used for any frontend that accepts Prometheus metrics, such as [Grafana](https://prometheus.io/docs/visualization/grafana/). You can update the configuration to serve metrics to other registries like CloudWatch, that does not require or host the endpoint, but publishes the metrics directly to CloudWatch. \ No newline at end of file +By default, metrics are served from the **/metrics/sys** endpoint on the OpenSearch Data Prepper server in a Prometheus scrape format. You can configure Prometheus to scrape from the OpenSearch Data Prepper URL. The OpenSearch Data Prepper server port has a default value of `4900` that you can modify, and this port can be used for any frontend that accepts Prometheus metrics, such as [Grafana](https://prometheus.io/docs/visualization/grafana/). You can update the configuration to serve metrics to other registries like CloudWatch, that does not require or host the endpoint, but publishes the metrics directly to CloudWatch. \ No newline at end of file diff --git a/_data-prepper/managing-data-prepper/peer-forwarder.md b/_data-prepper/managing-data-prepper/peer-forwarder.md index f6a0f9890a..dd6384e2b6 100644 --- a/_data-prepper/managing-data-prepper/peer-forwarder.md +++ b/_data-prepper/managing-data-prepper/peer-forwarder.md @@ -2,16 +2,16 @@ layout: default title: Peer forwarder nav_order: 12 -parent: Managing Data Prepper +parent: Managing OpenSearch Data Prepper --- # Peer forwarder -Peer forwarder is an HTTP service that performs peer forwarding of an `event` between Data Prepper nodes for aggregation. This HTTP service uses a hash-ring approach to aggregate events and determine which Data Prepper node it should handle on a given trace before rerouting it to that node. Currently, peer forwarder is supported by the `aggregate`, `service_map_stateful`, and `otel_traces_raw` [processors]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/processors/). +Peer forwarder is an HTTP service that performs peer forwarding of an `event` between OpenSearch Data Prepper nodes for aggregation. This HTTP service uses a hash-ring approach to aggregate events and determine which OpenSearch Data Prepper node it should handle on a given trace before rerouting it to that node. Currently, peer forwarder is supported by the `aggregate`, `service_map_stateful`, and `otel_traces_raw` [processors]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/processors/). Peer Forwarder groups events based on the identification keys provided by the supported processors. For `service_map_stateful` and `otel_traces_raw`, the identification key is `traceId` by default and cannot be configured. The `aggregate` processor is configured using the `identification_keys` configuration option. From here, you can specify which keys to use for Peer Forwarder. See [Aggregate Processor page](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/aggregate-processor#identification_keys) for more information about identification keys. -Peer discovery allows Data Prepper to find other nodes that it will communicate with. Currently, peer discovery is provided by a static list, a DNS record lookup, or AWS Cloud Map. +Peer discovery allows OpenSearch Data Prepper to find other nodes that it will communicate with. 
Currently, peer discovery is provided by a static list, a DNS record lookup, or AWS Cloud Map. ## Discovery modes @@ -19,7 +19,7 @@ The following sections provide information about discovery modes. ### Static -Static discovery mode allows a Data Prepper node to discover nodes using a list of IP addresses or domain names. See the following YAML file for an example of static discovery mode: +Static discovery mode allows an OpenSearch Data Prepper node to discover nodes using a list of IP addresses or domain names. See the following YAML file for an example of static discovery mode: ```yaml peer_forwarder:4 @@ -29,7 +29,7 @@ peer_forwarder:4 ### DNS lookup -DNS discovery is preferred over static discovery when scaling out a Data Prepper cluster. DNS discovery configures a DNS provider to return a list of Data Prepper hosts when given a single domain name. This list consists of a [DNS A record](https://www.cloudflare.com/learning/dns/dns-records/dns-a-record/), and a list of IP addresses of a given domain. See the following YAML file for an example of DNS lookup: +DNS discovery is preferred over static discovery when scaling out an OpenSearch Data Prepper cluster. DNS discovery configures a DNS provider to return a list of OpenSearch Data Prepper hosts when given a single domain name. This list consists of a [DNS A record](https://www.cloudflare.com/learning/dns/dns-records/dns-a-record/), and a list of IP addresses of a given domain. See the following YAML file for an example of DNS lookup: ```yaml peer_forwarder: @@ -43,13 +43,13 @@ peer_forwarder: Peer forwarder can use the API-based service discovery in AWS Cloud Map. To support this, you must have an existing namespace configured for API instance discovery. You can create a new one by following the instructions provided by the [AWS Cloud Map documentation](https://docs.aws.amazon.com/cloud-map/latest/dg/working-with-namespaces.html). -Your Data Prepper configuration needs to include the following: +Your OpenSearch Data Prepper configuration needs to include the following: * `aws_cloud_map_namespace_name` – Set to your AWS Cloud Map namespace name. * `aws_cloud_map_service_name` – Set to the service name within your specified namespace. * `aws_region` – Set to the AWS Region in which your namespace exists. * `discovery_mode` – Set to `aws_cloud_map`. -Your Data Prepper configuration can optionally include the following: +Your OpenSearch Data Prepper configuration can optionally include the following: * `aws_cloud_map_query_parameters` – Key-value pairs are used to filter the results based on the custom attributes attached to an instance. Results include only those instances that match all of the specified key-value pairs. #### Example configuration @@ -68,7 +68,7 @@ peer_forwarder: ### IAM policy with necessary permissions -Data Prepper must also be running with the necessary permissions. The following AWS Identity and Access Management (IAM) policy shows the necessary permissions: +OpenSearch Data Prepper must also be running with the necessary permissions. The following AWS Identity and Access Management (IAM) policy shows the necessary permissions: ```json { @@ -98,7 +98,7 @@ The following table provides optional configuration values. | `client_thread_count` | Integer | Represents the number of threads used by the peer forwarder client. Default value is `200`.| | `maxConnectionCount` | Integer | Represents the maximum number of open connections for the peer forwarder server. Default value is `500`. 
| | `discovery_mode` | String | Represents the peer discovery mode to be used. Allowable values are `local_node`, `static`, `dns`, and `aws_cloud_map`. Defaults to `local_node`, which processes events locally. | -| `static_endpoints` | List | Contains the endpoints of all Data Prepper instances. Required if `discovery_mode` is set to `static`. | +| `static_endpoints` | List | Contains the endpoints of all OpenSearch Data Prepper instances. Required if `discovery_mode` is set to `static`. | | `domain_name` | String | Represents the single domain name to query DNS against. Typically used by creating multiple [DNS A records](https://www.cloudflare.com/learning/dns/dns-records/dns-a-record/) for the same domain. Required if `discovery_mode` is set to `dns`. | | `aws_cloud_map_namespace_name` | String | Represents the AWS Cloud Map namespace when using AWS Cloud Map service discovery. Required if `discovery_mode` is set to `aws_cloud_map`. | | `aws_cloud_map_service_name` | String | Represents the AWS Cloud Map service when using AWS Cloud Map service discovery. Required if `discovery_mode` is set to `aws_cloud_map`. | @@ -110,7 +110,7 @@ The following table provides optional configuration values. ## SSL configuration -The following table provides optional SSL configuration values that allow you to set up a trust manager for the peer forwarder client in order to connect to other Data Prepper instances. +The following table provides optional SSL configuration values that allow you to set up a trust manager for the peer forwarder client in order to connect to other OpenSearch Data Prepper instances. | Value | Type | Description | | ----- | ---- | ----------- | @@ -179,4 +179,4 @@ The following table provides counter metric options. ### Gauge -`peerEndpoints` Measures the number of dynamically discovered peer Data Prepper endpoints. For `static` mode, the size is fixed. +`peerEndpoints` Measures the number of dynamically discovered peer OpenSearch Data Prepper endpoints. For `static` mode, the size is fixed. diff --git a/_data-prepper/managing-data-prepper/source-coordination.md b/_data-prepper/managing-data-prepper/source-coordination.md index 3c60b45280..a27934c31e 100644 --- a/_data-prepper/managing-data-prepper/source-coordination.md +++ b/_data-prepper/managing-data-prepper/source-coordination.md @@ -2,30 +2,30 @@ layout: default title: Source coordination nav_order: 35 -parent: Managing Data Prepper +parent: Managing OpenSearch Data Prepper --- # Source coordination -_Source coordination_ is the concept of coordinating and distributing work between Data Prepper data sources in a multi-node environment. Some data sources, such as Amazon Kinesis or Amazon Simple Queue Service (Amazon SQS), handle coordination natively. Other data sources, such as OpenSearch, Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB, and JDBC/ODBC, do not support source coordination. +_Source coordination_ is the concept of coordinating and distributing work between OpenSearch Data Prepper data sources in a multi-node environment. Some data sources, such as Amazon Kinesis or Amazon Simple Queue Service (Amazon SQS), handle coordination natively. Other data sources, such as OpenSearch, Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB, and JDBC/ODBC, do not support source coordination. -Data Prepper source coordination decides which partition of work is performed by each node in the Data Prepper cluster and prevents duplicate partitions of work. 
+OpenSearch Data Prepper source coordination decides which partition of work is performed by each node in the OpenSearch Data Prepper cluster and prevents duplicate partitions of work. -Inspired by the [Kinesis Client Library](https://docs.aws.amazon.com/streams/latest/dev/shared-throughput-kcl-consumers.html), Data Prepper utilizes a distributed store in the form of a lease to handle the distribution and deduplication of work. +Inspired by the [Kinesis Client Library](https://docs.aws.amazon.com/streams/latest/dev/shared-throughput-kcl-consumers.html), OpenSearch Data Prepper utilizes a distributed store in the form of a lease to handle the distribution and deduplication of work. ## Formatting partitions Source coordination separates sources into "partitions of work." For example, an S3 object would be a partition of work for Amazon S3, or an OpenSearch index would be a partition of work for OpenSearch. -Data Prepper takes each partition of work that is chosen by the source and creates corresponding items in the distributed store that Data Prepper uses for source coordination. Each of these items has the following standard format, which can be extended by the distributed store implementation. +OpenSearch Data Prepper takes each partition of work that is chosen by the source and creates corresponding items in the distributed store that OpenSearch Data Prepper uses for source coordination. Each of these items has the following standard format, which can be extended by the distributed store implementation. | Value | Type | Description | | :--- | :--- | :--- | -| `sourceIdentifier` | String | The identifier for which the Data Prepper pipeline works on this partition. By default, the `sourceIdentifier` is prefixed by the sub-pipeline name, but an additional prefix can be configured with `partition_prefix` in your data-prepper-config.yaml file. | +| `sourceIdentifier` | String | The identifier for which the OpenSearch Data Prepper pipeline works on this partition. By default, the `sourceIdentifier` is prefixed by the sub-pipeline name, but an additional prefix can be configured with `partition_prefix` in your data-prepper-config.yaml file. | | `sourcePartitionKey` | String | The identifier for the partition of work associated with this item. For example, for an `s3` source with scan capabilities, this identifier is the S3 bucket's `objectKey` combination. | `partitionOwner` | String | An identifier for the node that actively owns and is working on this partition. This ID contains the hostname of the node but is `null` when this partition is not owned. | | `partitionProgressState` | String | A JSON string object representing the progress made on a partition of work or any additional metadata that may be needed by the source in the case of another node resuming where the last node stopped during a crash. | -| `partitionOwnershipTimeout` | Timestamp | Whenever a Data Prepper node acquires a partition, a 10-minute timeout is given to the owner of the partition to handle the event of a node crashing. The ownership is renewed with another 10 minutes when the owner saves the state of the partition. | +| `partitionOwnershipTimeout` | Timestamp | Whenever an OpenSearch Data Prepper node acquires a partition, a 10-minute timeout is given to the owner of the partition to handle the event of a node crashing. The ownership is renewed with another 10 minutes when the owner saves the state of the partition. 
| | `sourcePartitionStatus` | Enum | Represents the current state of the partition: `ASSIGNED` means the partition is currently being processed, `UNASSIGNED` means the partition is waiting to be processed, `CLOSED` means the partition is waiting to be processed at a later date, and `COMPLETED` means the partition has already been processed. | | `reOpenAt` | Timestamp | Represents the time at which CLOSED partitions reopen and are considered to be available for processing. Only applies to CLOSED partitions. | | `closedCount` | Long | Tracks how many times the partition has been marked as `CLOSED`.| @@ -33,13 +33,13 @@ Data Prepper takes each partition of work that is chosen by the source and creat ## Acquiring partitions -Partitions are acquired in the order that they are returned in the `List` provided by the source. When a node attempts to acquire a partition, Data Prepper performs the following steps: +Partitions are acquired in the order that they are returned in the `List` provided by the source. When a node attempts to acquire a partition, OpenSearch Data Prepper performs the following steps: -1. Data Prepper queries the `ASSIGNED` partitions to check whether any `ASSIGNED` partitions have expired partition owners. This is intended to assign priority to partitions that have had nodes crash in the middle of processing, which can allow for using a partition state that may be time sensitive. -2. After querying `ASSIGNED` partitions, Data Prepper queries the `CLOSED` partitions to determine whether any of the partition's `reOpenAt` timestamps have been reached. -3. If there are no `ASSIGNED` or `CLOSED` partitions available, then Data Prepper queries the `UNASSIGNED` partitions until on of these partitions is `ASSIGNED`. +1. OpenSearch Data Prepper queries the `ASSIGNED` partitions to check whether any `ASSIGNED` partitions have expired partition owners. This is intended to assign priority to partitions that have had nodes crash in the middle of processing, which can allow for using a partition state that may be time sensitive. +2. After querying `ASSIGNED` partitions, OpenSearch Data Prepper queries the `CLOSED` partitions to determine whether any of the partition's `reOpenAt` timestamps have been reached. +3. If there are no `ASSIGNED` or `CLOSED` partitions available, then OpenSearch Data Prepper queries the `UNASSIGNED` partitions until one of these partitions is `ASSIGNED`. -If this flow occurs and no partition is acquired by the node, then the partition supplier function provided in the `getNextPartition` method of `SourceCoordinator` will create new partitions. After the supplier function completes, Data Prepper again queries the partitions for `ASSIGNED`, `CLOSED`, and `UNASSIGNED`. +If this flow occurs and no partition is acquired by the node, then the partition supplier function provided in the `getNextPartition` method of `SourceCoordinator` will create new partitions. After the supplier function completes, OpenSearch Data Prepper again queries the partitions for `ASSIGNED`, `CLOSED`, and `UNASSIGNED`. ## Global state @@ -51,23 +51,23 @@ The following table provide optional configuration values for `source_coordinati | Value | Type | Description | | :--- | :--- | :--- | -| `partition_prefix` | String | A prefix to the `sourceIdentifier` used to differentiate between Data Prepper clusters that share the same distributed store. 
| +| `partition_prefix` | String | A prefix to the `sourceIdentifier` used to differentiate between OpenSearch Data Prepper clusters that share the same distributed store. | | `store` | Object | The object that comprises the configuration for the store to be used, where the key is the name of the store, such as `in_memory` or `dynamodb`, and the value is any configuration available on that store type. | ### Supported stores -As of Data Prepper 2.4, only `in_memory` and `dynamodb` stores are supported: +As of OpenSearch Data Prepper 2.4, only `in_memory` and `dynamodb` stores are supported: - The `in_memory` store is the default when no `source_coordination` settings are configured in the `data-prepper-config.yaml` file and should only be used for single-node configurations. -- The `dynamodb` store is used for multi-node Data Prepper environments. The `dynamodb` store can be shared between one or more Data Prepper clusters that need to utilize source coordination. +- The `dynamodb` store is used for multi-node OpenSearch Data Prepper environments. The `dynamodb` store can be shared between one or more OpenSearch Data Prepper clusters that need to utilize source coordination. #### DynamoDB store -Data Prepper will attempt to create the `dynamodb` table on startup unless the `skip_table_creation` flag is configured to `true`. Optionally, you can configure the [time-to-live](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html) (`ttl`) on the table, which results in the store cleaning up items over time. Some sources rely on source coordination for the deduplication of data, so be sure to configure a large enough `ttl` for the pipeline duration. +OpenSearch Data Prepper will attempt to create the `dynamodb` table on startup unless the `skip_table_creation` flag is configured to `true`. Optionally, you can configure the [time-to-live](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html) (`ttl`) on the table, which results in the store cleaning up items over time. Some sources rely on source coordination for the deduplication of data, so be sure to configure a large enough `ttl` for the pipeline duration. If `ttl` is not configured on the table, any items no longer needed in the table must be cleaned manually. -The following shows the full set of permissions needed for Data Prepper to create the table, enable `ttl`, and interact with the table: +The following shows the full set of permissions needed for OpenSearch Data Prepper to create the table, enable `ttl`, and interact with the table: ```json { diff --git a/_data-prepper/migrate-open-distro.md b/_data-prepper/migrate-open-distro.md index 8b3e7a7198..ad3ac88bf8 100644 --- a/_data-prepper/migrate-open-distro.md +++ b/_data-prepper/migrate-open-distro.md @@ -8,19 +8,19 @@ redirect_from: # Migrating from Open Distro -Existing users can migrate from the Open Distro Data Prepper to OpenSearch Data Prepper. Beginning with Data Prepper version 1.1, there is only one distribution of OpenSearch Data Prepper. +Existing users can migrate from the Open Distro OpenSearch Data Prepper to OpenSearch Data Prepper. Beginning with OpenSearch Data Prepper version 1.1, there is only one distribution of OpenSearch Data Prepper. ## Change your pipeline configuration The `elasticsearch` sink has changed to `opensearch`. Therefore, change your existing pipeline to use the `opensearch` plugin instead of `elasticsearch`. 
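+For example, a sink that previously referenced the `elasticsearch` plugin would now reference `opensearch`. The following sketch is illustrative; keep the connection settings from your existing pipeline:
+
+```yaml
+sink:
+  - opensearch:   # formerly: elasticsearch
+      hosts: ["https://localhost:9200"]   # your existing connection settings remain unchanged
+```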
-While the Data Prepper plugin is titled `opensearch`, it remains compatible with Open Distro and ElasticSearch 7.x. +While the OpenSearch Data Prepper plugin is titled `opensearch`, it remains compatible with Open Distro and Elasticsearch 7.x. {: .note} ## Update Docker image -In your Data Prepper Docker configuration, adjust `amazon/opendistro-for-elasticsearch-data-prepper` to `opensearchproject/data-prepper`. This change will download the latest Data Prepper Docker image. +In your OpenSearch Data Prepper Docker configuration, adjust `amazon/opendistro-for-elasticsearch-data-prepper` to `opensearchproject/data-prepper`. This change will download the latest OpenSearch Data Prepper Docker image. ## Next steps -For more information about Data Prepper configurations, see [Getting Started with Data Prepper]({{site.url}}{{site.baseurl}}/clients/data-prepper/get-started/). +For more information about OpenSearch Data Prepper configurations, see [Getting Started with OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/clients/data-prepper/get-started/). diff --git a/_data-prepper/migrating-from-logstash-data-prepper.md b/_data-prepper/migrating-from-logstash-data-prepper.md index 3d87f29517..fb1d31b46f 100644 --- a/_data-prepper/migrating-from-logstash-data-prepper.md +++ b/_data-prepper/migrating-from-logstash-data-prepper.md @@ -9,15 +9,15 @@ redirect_from: # Migrating from Logstash -You can run Data Prepper with a Logstash configuration. +You can run OpenSearch Data Prepper with a Logstash configuration. -As mentioned in [Getting started with Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/), you'll need to configure Data Prepper with a pipeline using a `pipelines.yaml` file. +As mentioned in [Getting started with OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/), you'll need to configure OpenSearch Data Prepper with a pipeline using a `pipelines.yaml` file. -Alternatively, if you have a Logstash configuration `logstash.conf` to configure Data Prepper instead of `pipelines.yaml`. +Alternatively, you can use a Logstash configuration file, `logstash.conf`, to configure OpenSearch Data Prepper instead of `pipelines.yaml`. ## Supported plugins -As of the Data Prepper 1.2 release, the following plugins from the Logstash configuration are supported: +As of the OpenSearch Data Prepper 1.2 release, the following plugins from the Logstash configuration are supported: * HTTP Input plugin * Grok Filter plugin * Elasticsearch Output plugin @@ -25,11 +25,11 @@ As of the Data Prepper 1.2 release, the following plugins from the Logstash conf ## Limitations * Apart from the supported plugins, all other plugins from the Logstash configuration will throw an `Exception` and fail to run. -* Conditionals in the Logstash configuration are not supported as of the Data Prepper 1.2 release. +* Conditionals in the Logstash configuration are not supported as of the OpenSearch Data Prepper 1.2 release. -## Running Data Prepper with a Logstash configuration +## Running OpenSearch Data Prepper with a Logstash configuration -1. To install Data Prepper's Docker image, see Installing Data Prepper in [Getting Started with Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started#1-installing-data-prepper). +1. To install OpenSearch Data Prepper's Docker image, see Installing OpenSearch Data Prepper in [Getting Started with OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started#1-installing-opensearch-data-prepper). 2. 
Run the Docker image installed in Step 1 by supplying your `logstash.conf` configuration. @@ -37,11 +37,11 @@ As of the Data Prepper 1.2 release, the following plugins from the Logstash conf docker run --name data-prepper -p 4900:4900 -v ${PWD}/logstash.conf:/usr/share/data-prepper/pipelines.conf opensearchproject/data-prepper:latest pipelines.conf ``` -The `logstash.conf` file is converted to `logstash.yaml` by mapping the plugins and attributes in the Logstash configuration to the corresponding plugins and attributes in Data Prepper. +The `logstash.conf` file is converted to `logstash.yaml` by mapping the plugins and attributes in the Logstash configuration to the corresponding plugins and attributes in OpenSearch Data Prepper. You can find the converted `logstash.yaml` file in the same directory where you stored `logstash.conf`. -The following output in your terminal indicates that Data Prepper is running correctly: +The following output in your terminal indicates that OpenSearch Data Prepper is running correctly: ``` INFO org.opensearch.dataprepper.pipeline.ProcessWorker - log-pipeline Worker: No records received from buffer diff --git a/_data-prepper/pipelines/configuration/buffers/buffers.md b/_data-prepper/pipelines/configuration/buffers/buffers.md index 287825b549..d21b8b7746 100644 --- a/_data-prepper/pipelines/configuration/buffers/buffers.md +++ b/_data-prepper/pipelines/configuration/buffers/buffers.md @@ -8,8 +8,8 @@ nav_order: 30 # Buffers -The `buffer` component acts as an intermediary layer between the `source` and `sink` components in a Data Prepper pipeline. It serves as temporary storage for events, decoupling the `source` from the downstream processors and sinks. Buffers can be either in-memory or disk based. +The `buffer` component acts as an intermediary layer between the `source` and `sink` components in an OpenSearch Data Prepper pipeline. It serves as temporary storage for events, decoupling the `source` from the downstream processors and sinks. Buffers can be either in-memory or disk based. -If not explicitly specified in the pipeline configuration, Data Prepper uses the default `bounded_blocking` buffer, which is an in-memory queue bounded by the number of events it can store. The `bounded_blocking` buffer is a convenient option when the event volume and processing rates are manageable within the available memory constraints. +If not explicitly specified in the pipeline configuration, OpenSearch Data Prepper uses the default `bounded_blocking` buffer, which is an in-memory queue bounded by the number of events it can store. The `bounded_blocking` buffer is a convenient option when the event volume and processing rates are manageable within the available memory constraints. diff --git a/_data-prepper/pipelines/configuration/buffers/kafka.md b/_data-prepper/pipelines/configuration/buffers/kafka.md index 87600601b4..0152d967d7 100644 --- a/_data-prepper/pipelines/configuration/buffers/kafka.md +++ b/_data-prepper/pipelines/configuration/buffers/kafka.md @@ -59,7 +59,7 @@ Option | Required | Type | Description `name` | Yes | String | The name of the Kafka topic. `group_id` | Yes | String | Sets Kafka's `group.id` option. `workers` | No | Integer | The number of multithreaded consumers associated with each topic. Default is `2`. The maximum value is `200`. -`encryption_key` | No | String | An Advanced Encryption Standard (AES) encryption key used to encrypt and decrypt data within Data Prepper before sending it to Kafka. 
This value must be plain text or encrypted using AWS Key Management Service (AWS KMS). +`encryption_key` | No | String | An Advanced Encryption Standard (AES) encryption key used to encrypt and decrypt data within OpenSearch Data Prepper before sending it to Kafka. This value must be plain text or encrypted using AWS Key Management Service (AWS KMS). `kms` | No | AWS KMS key | When configured, uses an AWS KMS key to encrypt data. See [`kms`](#kms) for more information. `auto_commit` | No | Boolean | When `false`, the consumer offset will not be periodically committed to Kafka in the background. Default is `false`. `commit_interval` | No | Integer | When `auto_commit` is set to `true`, sets how often, in seconds, the consumer offsets are auto-committed to Kafka through Kafka's `auto.commit.interval.ms` option. Default is `5s`. diff --git a/_data-prepper/pipelines/configuration/processors/aggregate.md b/_data-prepper/pipelines/configuration/processors/aggregate.md index 38b138a996..cf0ef64909 100644 --- a/_data-prepper/pipelines/configuration/processors/aggregate.md +++ b/_data-prepper/pipelines/configuration/processors/aggregate.md @@ -20,7 +20,7 @@ Option | Required | Type | Description identification_keys | Yes | List | An unordered list by which to group events. Events with the same values as these keys are put into the same group. If an event does not contain one of the `identification_keys`, then the value of that key is considered to be equal to `null`. At least one identification_key is required (for example, `["sourceIp", "destinationIp", "port"]`). action | Yes | AggregateAction | The action to be performed on each group. One of the [available aggregate actions](#available-aggregate-actions) must be provided, or you can create custom aggregate actions. `remove_duplicates` and `put_all` are the available actions. For more information, see [Creating New Aggregate Actions](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/aggregate-processor#creating-new-aggregate-actions). group_duration | No | String | The amount of time that a group should exist before it is concluded automatically. Supports ISO_8601 notation strings ("PT20.345S", "PT15M", etc.) as well as simple notation for seconds (`"60s"`) and milliseconds (`"1500ms"`). Default value is `180s`. -local_mode | No | Boolean | When `local_mode` is set to `true`, the aggregation is performed locally on each Data Prepper node instead of forwarding events to a specific node based on the `identification_keys` using a hash function. Default is `false`. +local_mode | No | Boolean | When `local_mode` is set to `true`, the aggregation is performed locally on each OpenSearch Data Prepper node instead of forwarding events to a specific node based on the `identification_keys` using a hash function. Default is `false`. ## Available aggregate actions @@ -31,8 +31,8 @@ Use the following aggregate actions to determine how the `aggregate` processor p The `remove_duplicates` action processes the first event for a group immediately and drops any events that duplicate the first event from the source. For example, when using `identification_keys: ["sourceIp", "destination_ip"]`: 1. The `remove_duplicates` action processes `{ "sourceIp": "127.0.0.1", "destinationIp": "192.168.0.1", "status": 200 }`, the first event in the source. -2. Data Prepper drops the `{ "sourceIp": "127.0.0.1", "destinationIp": "192.168.0.1", "bytes": 1000 }` event because the `sourceIp` and `destinationIp` match the first event in the source. -3. 
The `remove_duplicates` action processes the next event, `{ "sourceIp": "127.0.0.2", "destinationIp": "192.168.0.1", "bytes": 1000 }`. Because the `sourceIp` is different from the first event of the group, Data Prepper creates a new group based on the event. +2. OpenSearch Data Prepper drops the `{ "sourceIp": "127.0.0.1", "destinationIp": "192.168.0.1", "bytes": 1000 }` event because the `sourceIp` and `destinationIp` match the first event in the source. +3. The `remove_duplicates` action processes the next event, `{ "sourceIp": "127.0.0.2", "destinationIp": "192.168.0.1", "bytes": 1000 }`. Because the `sourceIp` is different from the first event of the group, OpenSearch Data Prepper creates a new group based on the event. ### put_all diff --git a/_data-prepper/pipelines/configuration/processors/anomaly-detector.md b/_data-prepper/pipelines/configuration/processors/anomaly-detector.md index ba574bdf7d..3fae80cb3f 100644 --- a/_data-prepper/pipelines/configuration/processors/anomaly-detector.md +++ b/_data-prepper/pipelines/configuration/processors/anomaly-detector.md @@ -35,7 +35,7 @@ The random cut forest (RCF) ML algorithm is an unsupervised algorithm for detect | :--- | :--- | | `random_cut_forest` | Processes events using the RCF ML algorithm to detect anomalies. | -RCF is an unsupervised ML algorithm for detecting anomalous data points within a dataset. Data Prepper uses RCF to detect anomalies in data by passing the values of the configured key to RCF. For example, when an event with a latency value of 11.5 is sent, the following anomaly event is generated: +RCF is an unsupervised ML algorithm for detecting anomalous data points within a dataset. OpenSearch Data Prepper uses RCF to detect anomalies in data by passing the values of the configured key to RCF. For example, when an event with a latency value of 11.5 is sent, the following anomaly event is generated: ```json diff --git a/_data-prepper/pipelines/configuration/processors/aws-lambda.md b/_data-prepper/pipelines/configuration/processors/aws-lambda.md index bd167996a1..77aac39c80 100644 --- a/_data-prepper/pipelines/configuration/processors/aws-lambda.md +++ b/_data-prepper/pipelines/configuration/processors/aws-lambda.md @@ -6,13 +6,13 @@ grand_parent: Pipelines nav_order: 10 --- -# aws_lambda integration for Data Prepper +# aws_lambda integration for OpenSearch Data Prepper -The [AWS Lambda](https://aws.amazon.com/lambda/) integration allows developers to use serverless computing capabilities within their Data Prepper pipelines for flexible event processing and data routing. +The [AWS Lambda](https://aws.amazon.com/lambda/) integration allows developers to use serverless computing capabilities within their OpenSearch Data Prepper pipelines for flexible event processing and data routing. ## AWS Lambda processor configuration -The `aws_lambda` processor enables invocation of an AWS Lambda function within your Data Prepper pipeline in order to process events. It supports both synchronous and asynchronous invocations based on your use case. +The `aws_lambda` processor enables invocation of an AWS Lambda function within your OpenSearch Data Prepper pipeline in order to process events. It supports both synchronous and asynchronous invocations based on your use case. ## Configuration fields @@ -30,7 +30,7 @@ Field | Type | Required | Description `response_codec` | Object | Optional | A codec configuration for parsing Lambda responses. Default is `json`. 
`tags_on_match_failure` | List | Optional | A list of tags to add to events when Lambda matching fails or encounters an unexpected error. `sdk_timeout` | Duration| Optional | Configures the SDK's client connection timeout period. Default is `60s`. -`response_events_match` | Boolean | Optional | Specifies how Data Prepper interprets and processes Lambda function responses. Default is `false`. +`response_events_match` | Boolean | Optional | Specifies how OpenSearch Data Prepper interprets and processes Lambda function responses. Default is `false`. #### Example configuration @@ -71,9 +71,9 @@ When configured for batching, the AWS Lambda processor groups multiple events in ## Lambda response handling -The `response_events_match` setting defines how Data Prepper handles the relationship between batch events sent to Lambda and the response received: +The `response_events_match` setting defines how OpenSearch Data Prepper handles the relationship between batch events sent to Lambda and the response received: -- `true`: Lambda returns a JSON array with results for each batched event. Data Prepper maps this array back to its corresponding original event, ensuring that each event in the batch gets the corresponding part of the response from the array. +- `true`: Lambda returns a JSON array with results for each batched event. OpenSearch Data Prepper maps this array back to its corresponding original event, ensuring that each event in the batch gets the corresponding part of the response from the array. - `false`: Lambda returns one or more events for the entire batch. Response events are not correlated with the original events. Original event metadata is not preserved in the response events. For example, when `response_events_match` is set to `true`, the Lambda function is expected to return the same number of response events as the number of original requests, maintaining the original order. ## Limitations @@ -85,7 +85,7 @@ Note the following limitations: ## Integration testing -Integration tests for this plugin are executed separately from the main Data Prepper build process. Use the following Gradle command to run these tests: +Integration tests for this plugin are executed separately from the main OpenSearch Data Prepper build process. Use the following Gradle command to run these tests: ``` ./gradlew :data-prepper-plugins:aws-lambda:integrationTest -Dtests.processor.lambda.region="us-east-1" -Dtests.processor.lambda.functionName="lambda_test_function" -Dtests.processor.lambda.sts_role_arn="arn:aws:iam::123456789012:role/dataprepper-role diff --git a/_data-prepper/pipelines/configuration/processors/convert-entry-type.md b/_data-prepper/pipelines/configuration/processors/convert-entry-type.md index c2c46260ed..c01b10c147 100644 --- a/_data-prepper/pipelines/configuration/processors/convert-entry-type.md +++ b/_data-prepper/pipelines/configuration/processors/convert-entry-type.md @@ -29,7 +29,7 @@ This table is autogenerated. Do not edit it. | `null_values` | No | String representation of what constitutes a `null` value. If the field value equals one of these strings, then the value is considered `null` and is converted to `null`. | | `scale` | No | Modifies the scale of the `big_decimal` when converting to a `big_decimal`. The default value is `0`. | | `tags_on_failure` | No | A list of tags to be added to the event metadata when the event fails to convert. 
| -| `convert_when` | No | Specifies a condition using a [Data Prepper expression]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/) for performing the `convert_entry_type` operation. If specified, the `convert_entry_type` operation runs only when the expression evaluates to `true`. | +| `convert_when` | No | Specifies a condition using an [OpenSearch Data Prepper expression]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/) for performing the `convert_entry_type` operation. If specified, the `convert_entry_type` operation runs only when the expression evaluates to `true`. | ## Usage @@ -47,7 +47,7 @@ type-conv-pipeline: ``` {% include copy.html %} -Next, create a log file named `logs_json.log` and replace the `path` in the file source of your `pipeline.yaml` file with that filepath. For more information, see [Configuring Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/#2-configuring-data-prepper). +Next, create a log file named `logs_json.log` and replace the `path` in the file source of your `pipeline.yaml` file with that filepath. For more information, see [Configuring OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/#2-configuring-opensearch-data-prepper). For example, before you run the `convert_entry_type` processor, if the `logs_json.log` file contains the following event record: diff --git a/_data-prepper/pipelines/configuration/processors/csv.md b/_data-prepper/pipelines/configuration/processors/csv.md index e386db4bf4..d640b19eb3 100644 --- a/_data-prepper/pipelines/configuration/processors/csv.md +++ b/_data-prepper/pipelines/configuration/processors/csv.md @@ -113,4 +113,4 @@ The `csv` processor includes the following custom metrics. The `csv` processor includes the following counter metrics: -* `csvInvalidEvents`: The number of invalid events, usually caused by an unclosed quotation mark in the event itself. Data Prepper throws an exception when an invalid event is parsed. +* `csvInvalidEvents`: The number of invalid events, usually caused by an unclosed quotation mark in the event itself. OpenSearch Data Prepper throws an exception when an invalid event is parsed. diff --git a/_data-prepper/pipelines/configuration/processors/decompress.md b/_data-prepper/pipelines/configuration/processors/decompress.md index d03c236ac5..5183350ed7 100644 --- a/_data-prepper/pipelines/configuration/processors/decompress.md +++ b/_data-prepper/pipelines/configuration/processors/decompress.md @@ -16,7 +16,7 @@ Option | Required | Type | Description :--- | :--- | :--- | :--- `keys` | Yes | List | The fields in the event that will be decompressed. `type` | Yes | Enum | The type of decompression to use for the `keys` in the event. Only `gzip` is supported. -`decompress_when` | No | String| A [Data Prepper conditional expression](https://opensearch.org/docs/latest/data-prepper/pipelines/expression-syntax/) that determines when the `decompress` processor will run on certain events. +`decompress_when` | No | String| An [OpenSearch Data Prepper conditional expression](https://opensearch.org/docs/latest/data-prepper/pipelines/expression-syntax/) that determines when the `decompress` processor will run on certain events. `tags_on_failure` | No | List | A list of strings with which to tag events when the processor fails to decompress the `keys` inside an event. Defaults to `_decompression_failure`. 
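To illustrate how the `decompress` options above fit together, the following is a minimal pipeline sketch; the pipeline name, the `http` source, the `message` key, and the `/compressed` condition are illustrative assumptions rather than values taken from this change:

```yaml
decompress-pipeline:
  source:
    http:                                 # hypothetical source; any source that produces the keys below works
  processor:
    - decompress:
        keys: ["message"]                 # assumed field containing gzip-compressed data
        type: gzip                        # only gzip is supported
        decompress_when: '/compressed == true'   # assumed conditional expression; runs the processor only on matching events
        tags_on_failure: ["_decompression_failure"]
  sink:
    - stdout:
```
{% include copy.html %}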
## Usage diff --git a/_data-prepper/pipelines/configuration/processors/delete-entries.md b/_data-prepper/pipelines/configuration/processors/delete-entries.md index e7c022c6a7..9894eb8b74 100644 --- a/_data-prepper/pipelines/configuration/processors/delete-entries.md +++ b/_data-prepper/pipelines/configuration/processors/delete-entries.md @@ -41,7 +41,7 @@ pipeline: ``` {% include copy.html %} -Next, create a log file named `logs_json.log` and replace the `path` in the file source of your `pipeline.yaml` file with that filepath. For more information, see [Configuring Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/#2-configuring-data-prepper). +Next, create a log file named `logs_json.log` and replace the `path` in the file source of your `pipeline.yaml` file with that filepath. For more information, see [Configuring OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/#2-configuring-opensearch-data-prepper). For example, before you run the `delete_entries` processor, if the `logs_json.log` file contains the following event record: diff --git a/_data-prepper/pipelines/configuration/processors/dissect.md b/_data-prepper/pipelines/configuration/processors/dissect.md index a8258bee4e..227fa50a9b 100644 --- a/_data-prepper/pipelines/configuration/processors/dissect.md +++ b/_data-prepper/pipelines/configuration/processors/dissect.md @@ -57,7 +57,7 @@ You can configure the `dissect` processor with the following options. | :--- | :--- | :--- | :--- | | `map` | Yes | Map | Defines the `dissect` patterns for specific keys. For details on how to define fields in the `dissect` pattern, see [Field notations](#field-notations). | | `target_types` | No | Map | Specifies the data types for extract fields. Valid options are `integer`, `double`, `string`, and `boolean`. By default, all fields are of the `string` type. | -| `dissect_when` | No | String | Specifies a condition for performing the `dissect` operation using a [Data Prepper expression]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/). If specified, the `dissect` operation will only run when the expression evaluates to true. | +| `dissect_when` | No | String | Specifies a condition for performing the `dissect` operation using an [OpenSearch Data Prepper expression]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/). If specified, the `dissect` operation will only run when the expression evaluates to true. | ### Field notations diff --git a/_data-prepper/pipelines/configuration/processors/drop-events.md b/_data-prepper/pipelines/configuration/processors/drop-events.md index 1f601c9743..f3e861f2f3 100644 --- a/_data-prepper/pipelines/configuration/processors/drop-events.md +++ b/_data-prepper/pipelines/configuration/processors/drop-events.md @@ -13,7 +13,7 @@ The `drop_events` processor drops all the events that are passed into it. The fo Option | Required | Type | Description :--- | :--- | :--- | :--- -drop_when | Yes | String | Accepts a Data Prepper expression string following the [Data Prepper Expression Syntax]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/). Configuring `drop_events` with `drop_when: true` drops all the events received. +drop_when | Yes | String | Accepts an OpenSearch Data Prepper expression string following the [OpenSearch Data Prepper Expression Syntax]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/). Configuring `drop_events` with `drop_when: true` drops all the events received. 
handle_failed_events | No | Enum | Specifies how exceptions are handled when an exception occurs while evaluating an event. Default value is `drop`, which drops the event so that it is not sent to OpenSearch. Available options are `drop`, `drop_silently`, `skip`, and `skip_silently`. For more information, see [handle_failed_events](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/drop-events-processor#handle_failed_events). @@ -207,7 +207,7 @@ To analyze the Jaeger trace data in Dashboards, first set up the trace analytics ### Data sources -You can specify either Data Prepper or Jaeger as the data source when you perform trace analytics. +You can specify either OpenSearch Data Prepper or Jaeger as the data source when you perform trace analytics. From Dashboards, go to **Observability > Trace analytics** and select Jaeger. ![Select data source]({{site.url}}{{site.baseurl}}/images/trace-analytics/select-data.png) diff --git a/_query-dsl/compound/function-score.md b/_query-dsl/compound/function-score.md index b28a6abed6..35acd4e295 100644 --- a/_query-dsl/compound/function-score.md +++ b/_query-dsl/compound/function-score.md @@ -429,7 +429,7 @@ The first two blog posts in the results have a score of 1 because one is at the "_id": "3", "_score": 0.5, "_source": { - "name": "Distributed tracing with Data Prepper", + "name": "Distributed tracing with OpenSearch Data Prepper", "views": 800, "likes": 50, "comments": 5, @@ -511,7 +511,7 @@ In the results, the first blog post was published within one day of 04/24/2022, "_id": "3", "_score": 1, "_source": { - "name": "Distributed tracing with Data Prepper", + "name": "Distributed tracing with OpenSearch Data Prepper", "views": 800, "likes": 50, "comments": 5, @@ -790,7 +790,7 @@ The results contain the three matching blog posts: "_id": "3", "_score": 31.191923, "_source": { - "name": "Distributed tracing with Data Prepper", + "name": "Distributed tracing with OpenSearch Data Prepper", "views": 800, "likes": 50, "comments": 5, diff --git a/_tools/index.md b/_tools/index.md index c9d446a81a..821f611c0c 100644 --- a/_tools/index.md +++ b/_tools/index.md @@ -20,7 +20,7 @@ This section provides documentation for OpenSearch-supported tools, including: - [OpenSearch upgrade, migration, and comparison tools](#opensearch-upgrade-migration-and-comparison-tools) - [Sycamore](#sycamore) for AI-powered extract, transform, load (ETL) on complex documents for vector and hybrid search -For information about Data Prepper, the server-side data collector for filtering, enriching, transforming, normalizing, and aggregating data for downstream analytics and visualization, see [Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/index/). +For information about OpenSearch Data Prepper, the server-side data collector for filtering, enriching, transforming, normalizing, and aggregating data for downstream analytics and visualization, see [OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/index/). ## Agents and ingestion tools