From ea43da8dfa20bcd06161ddf1cc127fb24bf5e254 Mon Sep 17 00:00:00 2001
From: David Venable
Date: Thu, 5 Sep 2024 11:33:47 -0500
Subject: [PATCH 01/17] Corrects the Data Prepper roadmap URL. (#8178)

The original Data Prepper roadmap was a project linked to the repository.
GitHub has been removing these and has migrated it to a project linked to
the organization instead. With this change on GitHub, the old URL became
invalid and now shows a 404 page. This commit updates all occurrences of
the old URL to use the new URL.

Signed-off-by: David Venable
---
 _data-prepper/common-use-cases/log-analytics.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_data-prepper/common-use-cases/log-analytics.md b/_data-prepper/common-use-cases/log-analytics.md
index 30a021b101..ceb26ff5b7 100644
--- a/_data-prepper/common-use-cases/log-analytics.md
+++ b/_data-prepper/common-use-cases/log-analytics.md
@@ -147,6 +147,6 @@ The following is an example `fluent-bit.conf` file with SSL and basic authentica
 
 See the [Data Prepper Log Ingestion Demo Guide](https://github.com/opensearch-project/data-prepper/blob/main/examples/log-ingestion/README.md) for a specific example of Apache log ingestion from `FluentBit -> Data Prepper -> OpenSearch` running through Docker.
 
-In the future, Data Prepper will offer additional sources and processors that will make more complex log analytics pipelines available. Check out the [Data Prepper Project Roadmap](https://github.com/opensearch-project/data-prepper/projects/1) to see what is coming.
+In the future, Data Prepper will offer additional sources and processors that will make more complex log analytics pipelines available. Check out the [Data Prepper Project Roadmap](https://github.com/orgs/opensearch-project/projects/221) to see what is coming.
 
 If there is a specific source, processor, or sink that you would like to include in your log analytics workflow and is not currently on the roadmap, please bring it to our attention by creating a GitHub issue. Additionally, if you are interested in contributing to Data Prepper, see our [Contributing Guidelines](https://github.com/opensearch-project/data-prepper/blob/main/CONTRIBUTING.md) as well as our [developer guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/developer_guide.md) and [plugin development guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/plugin_development.md).

From 12d82fafd0bd04b1b257901a8654cc286e0d5521 Mon Sep 17 00:00:00 2001
From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Date: Thu, 5 Sep 2024 16:26:51 -0400
Subject: [PATCH 02/17] Fix typo in query insights documentation (#8182)

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
---
 _observing-your-data/query-insights/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_observing-your-data/query-insights/index.md b/_observing-your-data/query-insights/index.md
index 549371240f..b929e51491 100644
--- a/_observing-your-data/query-insights/index.md
+++ b/_observing-your-data/query-insights/index.md
@@ -8,7 +8,7 @@ has_toc: false
 
 # Query insights
 
-To monitor and analyze the search queries within your OpenSearch clusterQuery information, you can obtain query insights. With minimal performance impact, query insights features aim to provide comprehensive insights into search query execution, enabling you to better understand search query characteristics, patterns, and system behavior during query execution stages.
Query insights facilitate enhanced detection, diagnosis, and prevention of query performance issues, ultimately improving query processing performance, user experience, and overall system resilience. +To monitor and analyze the search queries within your OpenSearch cluster, you can obtain query insights. With minimal performance impact, query insights features aim to provide comprehensive insights into search query execution, enabling you to better understand search query characteristics, patterns, and system behavior during query execution stages. Query insights facilitate enhanced detection, diagnosis, and prevention of query performance issues, ultimately improving query processing performance, user experience, and overall system resilience. Typical use cases for query insights features include the following: From ad0d76ef42ec7404592420aec48cdb53103a30c5 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 5 Sep 2024 16:43:47 -0600 Subject: [PATCH 03/17] Delete graphs and copy edit (#8188) * Delete graphs and copy edit Signed-off-by: Melissa Vagi * Delete graphs and copy edit Signed-off-by: Melissa Vagi * Delete graphs and copy edit Signed-off-by: Melissa Vagi --------- Signed-off-by: Melissa Vagi --- _dashboards/query-workbench.md | 46 +++++++++------------------------- 1 file changed, 12 insertions(+), 34 deletions(-) diff --git a/_dashboards/query-workbench.md b/_dashboards/query-workbench.md index 8fe41afcdf..700d6a7340 100644 --- a/_dashboards/query-workbench.md +++ b/_dashboards/query-workbench.md @@ -8,19 +8,14 @@ redirect_from: # Query Workbench -Query Workbench is a tool within OpenSearch Dashboards. You can use Query Workbench to run on-demand [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/sql/index/) and [PPL]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/index/) queries, translate queries into their equivalent REST API calls, and view and save results in different [response formats]({{site.url}}{{site.baseurl}}/search-plugins/sql/response-formats/). +You can use Query Workbench in OpenSearch Dashboards to run on-demand [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/sql/index/) and [PPL]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/index/) queries, translate queries into their equivalent REST API calls, and view and save results in different [response formats]({{site.url}}{{site.baseurl}}/search-plugins/sql/response-formats/). -A view of the Query Workbench interface within OpenSearch Dashboards is shown in the following image. - -Query Workbench interface within OpenSearch Dashboards - -## Prerequisites - -Before getting started, make sure you have [indexed your data]({{site.url}}{{site.baseurl}}/im-plugin/index/). +Query Workbench does not support delete or update operations through SQL or PPL. Access to data is read-only. +{: .important} -For this tutorial, you can index the following sample documents. Alternatively, you can use the [OpenSearch Playground](https://playground.opensearch.org/app/opensearch-query-workbench#/), which has preloaded indexes that you can use to try out Query Workbench. 
+## Prerequisites -To index sample documents, send the following [Bulk API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/) request: +Before getting started with this tutorial, index the sample documents by sending the following [Bulk API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/) request: ```json PUT accounts/_bulk?refresh @@ -35,9 +30,11 @@ PUT accounts/_bulk?refresh ``` {% include copy-curl.html %} -## Running SQL queries within Query Workbench +See [Managing indexes]({{site.url}}{{site.baseurl}}/im-plugin/index/) to learn about indexing your own data. -Follow these steps to learn how to run SQL queries against your OpenSearch data using Query Workbench: +## Running SQL queries within Query Workbench + + The following steps guide you through running SQL queries against OpenSearch data: 1. Access Query Workbench. - To access Query Workbench, go to OpenSearch Dashboards and choose **OpenSearch Plugins** > **Query Workbench** from the main menu. @@ -64,23 +61,15 @@ Follow these steps to learn how to run SQL queries against your OpenSearch data 3. View the results. - View the results in the **Results** pane, which presents the query output in tabular format. You can filter and download the results as needed. - The following image shows the query editor pane and results pane for the preceding SQL query: - - Query Workbench SQL query input and results output panes - 4. Clear the query editor. - Select the **Clear** button to clear the query editor and run a new query. 5. Examine how the query is processed. - - Select the **Explain** button to examine how OpenSearch processes the query, including the steps involved and order of operations. - - The following image shows the explanation of the SQL query that was run in step 2. - - Query Workbench SQL query explanation pane + - Select the **Explain** button to examine how OpenSearch processes the query, including the steps involved and order of operations. ## Running PPL queries within Query Workbench -Follow these steps to learn how to run PPL queries against your OpenSearch data using Query Workbench: +Follow these steps to learn how to run PPL queries against OpenSearch data: 1. Access Query Workbench. - To access Query Workbench, go to OpenSearch Dashboards and choose **OpenSearch Plugins** > **Query Workbench** from the main menu. @@ -100,19 +89,8 @@ Follow these steps to learn how to run PPL queries against your OpenSearch data 3. View the results. - View the results in the **Results** pane, which presents the query output in tabular format. - The following image shows the query editor pane and results pane for the PPL query that was run in step 2: - - Query Workbench PPL query input and results output panes - 4. Clear the query editor. - Select the **Clear** button to clear the query editor and run a new query. 5. Examine how the query is processed. - - Select the **Explain** button to examine how OpenSearch processes the query, including the steps involved and order of operations. - - The following image shows the explanation of the PPL query that was run in step 2. - - Query Workbench PPL query explanation pane - -Query Workbench does not support delete or update operations through SQL or PPL. Access to data is read-only. -{: .important} \ No newline at end of file + - Select the **Explain** button to examine how OpenSearch processes the query, including the steps involved and order of operations. 
From 62a4c18a3ea64f8d0811c2cedcee8fbbe69e5b05 Mon Sep 17 00:00:00 2001 From: jazzl0ver Date: Fri, 6 Sep 2024 22:20:03 +0300 Subject: [PATCH 04/17] user accounts manipulation audit example (#8158) * user accounts manipulation audit example Signed-off-by: jazzl0ver * user accounts manipulation audit example Signed-off-by: jazzl0ver * user accounts manipulation audit example Signed-off-by: jazzl0ver * Update _security/audit-logs/index.md Co-authored-by: Craig Perkins Signed-off-by: jazzl0ver * Update _security/audit-logs/index.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: jazzl0ver Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Craig Perkins Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- _security/audit-logs/index.md | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/_security/audit-logs/index.md b/_security/audit-logs/index.md index becb001ec0..8eeea33447 100644 --- a/_security/audit-logs/index.md +++ b/_security/audit-logs/index.md @@ -224,3 +224,36 @@ plugins.security.audit.config.threadpool.max_queue_len: 100000 To disable audit logs after they've been enabled, remove the `plugins.security.audit.type: internal_opensearch` setting from `opensearch.yml`, or switch off the **Enable audit logging** check box in OpenSearch Dashboards. +## Audit user account manipulation + +To enable audit logging on changes to a security index, such as changes to roles mappings and role creation or deletion, use the following settings in the `compliance:` portion of the audit log configuration, as shown in the following example: + +``` +_meta: + type: "audit" + config_version: 2 + +config: + # enable/disable audit logging + enabled: true + + ... + + + compliance: + # enable/disable compliance + enabled: true + + # Log updates to internal security changes + internal_config: true + + # Log only metadata of the document for write events + write_metadata_only: false + + # Log only diffs for document updates + write_log_diffs: true + + # List of indices to watch for write events. Wildcard patterns are supported + # write_watched_indices: ["twitter", "logs-*"] + write_watched_indices: [".opendistro_security"] +``` From b79eed39e9c8ea933d64b80a23caad14ca12941c Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Mon, 9 Sep 2024 15:18:33 -0400 Subject: [PATCH 05/17] Fix heading levels in geoshape query documentation (#8198) * Fix heading levels in geoshape query documentation Signed-off-by: Fanit Kolchina * One more Signed-off-by: Fanit Kolchina --------- Signed-off-by: Fanit Kolchina --- _query-dsl/geo-and-xy/geoshape.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/_query-dsl/geo-and-xy/geoshape.md b/_query-dsl/geo-and-xy/geoshape.md index 42948666f4..8acc691c3a 100644 --- a/_query-dsl/geo-and-xy/geoshape.md +++ b/_query-dsl/geo-and-xy/geoshape.md @@ -25,15 +25,15 @@ Relation | Description | Supporting geographic field type ## Defining the shape in a geoshape query -You can define the shape to filter documents in a geoshape query either by providing a new shape definition at query time or by referencing the name of a shape pre-indexed in another index. 
+You can define the shape to filter documents in a geoshape query either by [providing a new shape definition at query time](#using-a-new-shape-definition) or by [referencing the name of a shape pre-indexed in another index](#using-a-pre-indexed-shape-definition). -### Using a new shape definition +## Using a new shape definition To provide a new shape to a geoshape query, define it in the `geo_shape` field. You must define the geoshape in [GeoJSON format](https://geojson.org/). The following example illustrates searching for documents containing geoshapes that match a geoshape defined at query time. -#### Step 1: Create an index +### Step 1: Create an index First, create an index and map the `location` field as a `geo_shape`: @@ -422,7 +422,7 @@ GET /testindex/_search Geoshape queries whose geometry collection contains a linestring or a multilinestring do not support the `WITHIN` relation. {: .note} -### Using a pre-indexed shape definition +## Using a pre-indexed shape definition When constructing a geoshape query, you can also reference the name of a shape pre-indexed in another index. Using this method, you can define a geoshape at index time and refer to it by name at search time. From 9435b466de5f1512b25dd5b75fd171e161514048 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Tue, 10 Sep 2024 15:04:18 -0400 Subject: [PATCH 06/17] Remove ODFE color scheme (#8208) Signed-off-by: Fanit Kolchina --- _sass/color_schemes/odfe.scss | 75 ----------------------------------- 1 file changed, 75 deletions(-) delete mode 100644 _sass/color_schemes/odfe.scss diff --git a/_sass/color_schemes/odfe.scss b/_sass/color_schemes/odfe.scss deleted file mode 100644 index f9b2ca02ba..0000000000 --- a/_sass/color_schemes/odfe.scss +++ /dev/null @@ -1,75 +0,0 @@ -// -// Brand colors -// - -$white: #FFFFFF; - -$grey-dk-300: #241F21; // Error -$grey-dk-250: mix(white, $grey-dk-300, 12.5%); -$grey-dk-200: mix(white, $grey-dk-300, 25%); -$grey-dk-100: mix(white, $grey-dk-300, 50%); -$grey-dk-000: mix(white, $grey-dk-300, 75%); - -$grey-lt-300: #DBDBDB; // Cloud -$grey-lt-200: mix(white, $grey-lt-300, 25%); -$grey-lt-100: mix(white, $grey-lt-300, 50%); -$grey-lt-000: mix(white, $grey-lt-300, 75%); - -$blue-300: #00007C; // Meta -$blue-200: mix(white, $blue-300, 25%); -$blue-100: mix(white, $blue-300, 50%); -$blue-000: mix(white, $blue-300, 75%); - -$purple-300: #9600FF; // Prpl -$purple-200: mix(white, $purple-300, 25%); -$purple-100: mix(white, $purple-300, 50%); -$purple-000: mix(white, $purple-300, 75%); - -$green-300: #00671A; // Element -$green-200: mix(white, $green-300, 25%); -$green-100: mix(white, $green-300, 50%); -$green-000: mix(white, $green-300, 75%); - -$yellow-300: #FFDF00; // Kan-Banana -$yellow-200: mix(white, $yellow-300, 25%); -$yellow-100: mix(white, $yellow-300, 50%); -$yellow-000: mix(white, $yellow-300, 75%); - -$red-300: #BD145A; // Ruby -$red-200: mix(white, $red-300, 25%); -$red-100: mix(white, $red-300, 50%); -$red-000: mix(white, $red-300, 75%); - -$blue-lt-300: #0000FF; // Cascade -$blue-lt-200: mix(white, $blue-lt-300, 25%); -$blue-lt-100: mix(white, $blue-lt-300, 50%); -$blue-lt-000: mix(white, $blue-lt-300, 75%); - -/* -Other, unused brand colors - -Float #2797F4 -Firewall #0FF006B -Hyper Pink #F261A1 -Cluster #ED20EB -Back End #808080 -Python #25EE5C -Warm Node #FEA501 -*/ - -$body-background-color: $white; -$sidebar-color: $grey-lt-000; -$code-background-color: $grey-lt-000; - -$body-text-color: $grey-dk-200; -$body-heading-color: 
$grey-dk-300; -$nav-child-link-color: $grey-dk-200; -$link-color: mix(black, $blue-lt-300, 37.5%); -$btn-primary-color: $purple-300; -$base-button-color: $grey-lt-000; - -// $border-color: $grey-dk-200; -// $search-result-preview-color: $grey-dk-000; -// $search-background-color: $grey-dk-250; -// $table-background-color: $grey-dk-250; -// $feedback-color: darken($sidebar-color, 3%); From 41f62e09e8c0576af0f40df06714fc505b3747b7 Mon Sep 17 00:00:00 2001 From: anand kumar rai Date: Wed, 11 Sep 2024 20:19:54 +0530 Subject: [PATCH 07/17] Add documentation for max_number_processors (#8157) * Add documentation for max_number_processors Signed-off-by: Rai * Refined the documentation Signed-off-by: Rai * Doc review Signed-off-by: Melissa Vagi * Update _ingest-pipelines/processors/index-processors.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _ingest-pipelines/processors/index-processors.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --------- Signed-off-by: Rai Signed-off-by: Melissa Vagi Co-authored-by: Rai Co-authored-by: Melissa Vagi Co-authored-by: Nathan Bower --- _ingest-pipelines/processors/index-processors.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/_ingest-pipelines/processors/index-processors.md b/_ingest-pipelines/processors/index-processors.md index 0e1ee1e114..9628a16728 100644 --- a/_ingest-pipelines/processors/index-processors.md +++ b/_ingest-pipelines/processors/index-processors.md @@ -69,6 +69,12 @@ Processor type | Description `urldecode` | Decodes a string from URL-encoded format. `user_agent` | Extracts details from the user agent sent by a browser to its web requests. +## Processor limit settings + +You can limit the number of ingest processors using the cluster setting `cluster.ingest.max_number_processors`. The total number of processors includes both the number of processors and the number of [`on_failure`]({{site.url}}{{site.baseurl}}/ingest-pipelines/pipeline-failures/) processors. + +The default value for `cluster.ingest.max_number_processors` is `Integer.MAX_VALUE`. Adding a higher number of processors than the value configured in `cluster.ingest.max_number_processors` will throw an `IllegalStateException`. + ## Batch-enabled processors Some processors support batch ingestion---they can process multiple documents at the same time as a batch. These batch-enabled processors usually provide better performance when using batch processing. For batch processing, use the [Bulk API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/) and provide a `batch_size` parameter. All batch-enabled processors have a batch mode and a single-document mode. When you ingest documents using the `PUT` method, the processor functions in single-document mode and processes documents in series. Currently, only the `text_embedding` and `sparse_encoding` processors are batch enabled. All other processors process documents one at a time. 
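
For example, the following bulk request is a sketch of batch-enabled ingestion; the pipeline name `text-embedding-pipeline` and the documents are placeholders:

```json
POST _bulk?pipeline=text-embedding-pipeline&batch_size=5
{ "create": { "_index": "testindex1", "_id": "1" } }
{ "passage_text": "hello world" }
{ "create": { "_index": "testindex1", "_id": "2" } }
{ "passage_text": "big apple" }
```
{% include copy-curl.html %}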
From ce3b0fecb049d427f2057e0f9c44291109efaf8c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C4=90=E1=BB=97=20Tr=E1=BB=8Dng=20H=E1=BA=A3i?= <41283691+hainenber@users.noreply.github.com> Date: Wed, 11 Sep 2024 22:54:33 +0700 Subject: [PATCH 08/17] Allow copy as curl for Query DSL example in "Updating documents" section (#8213) Signed-off-by: hainenber --- _getting-started/communicate.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_getting-started/communicate.md b/_getting-started/communicate.md index 9960f63b2c..3472270c30 100644 --- a/_getting-started/communicate.md +++ b/_getting-started/communicate.md @@ -200,7 +200,7 @@ PUT /students/_doc/1 "address": "123 Main St." } ``` -{% include copy.html %} +{% include copy-curl.html %} Alternatively, you can update parts of a document by calling the Update Document API: From d91e281d15090e1d6e79917454bc8620f9508a08 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Wed, 11 Sep 2024 12:32:30 -0400 Subject: [PATCH 09/17] Explicitly insert text that links PR with issue in PR template (#8218) Signed-off-by: Fanit Kolchina --- .github/PULL_REQUEST_TEMPLATE.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 21b6fbfea6..fd4213b7e5 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -2,7 +2,7 @@ _Describe what this change achieves._ ### Issues Resolved -_List any issues this PR will resolve, e.g. Closes [...]._ +Closes #[_insert issue number_] ### Version _List the OpenSearch version to which this PR applies, e.g. 2.14, 2.12--2.14, or all._ From 1c3e4361dd9a9d9436aa2668c8b8abcd8fe619c0 Mon Sep 17 00:00:00 2001 From: Ganesh Krishna Ramadurai Date: Wed, 11 Sep 2024 09:57:08 -0700 Subject: [PATCH 10/17] Doc update for concurrent search (#8181) * Doc update for concurrent search Signed-off-by: Ganesh Ramadurai * Apply suggestions from code review Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Ganesh Krishna Ramadurai * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Ganesh Ramadurai Signed-off-by: Ganesh Krishna Ramadurai Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower --- _search-plugins/concurrent-segment-search.md | 97 ++++++++++++++++++-- 1 file changed, 91 insertions(+), 6 deletions(-) diff --git a/_search-plugins/concurrent-segment-search.md b/_search-plugins/concurrent-segment-search.md index cbbb993ac9..80614e2fff 100644 --- a/_search-plugins/concurrent-segment-search.md +++ b/_search-plugins/concurrent-segment-search.md @@ -22,6 +22,8 @@ Without concurrent segment search, Lucene executes a request sequentially across ## Enabling concurrent segment search at the index or cluster level +Starting with OpenSearch version 2.17, you can use the `search.concurrent_segment_search.mode` setting to configure concurrent segment search on your cluster. The existing `search.concurrent_segment_search.enabled` setting will be deprecated in future version releases in favor of the new setting. + By default, concurrent segment search is disabled on the cluster. 
You can enable concurrent segment search at two levels: - Cluster level @@ -30,8 +32,37 @@ By default, concurrent segment search is disabled on the cluster. You can enable The index-level setting takes priority over the cluster-level setting. Thus, if the cluster setting is enabled but the index setting is disabled, then concurrent segment search will be disabled for that index. Because of this, the index-level setting is not evaluated unless it is explicitly set, regardless of the default value configured for the setting. You can retrieve the current value of the index-level setting by calling the [Index Settings API]({{site.url}}{{site.baseurl}}/api-reference/index-apis/get-settings/) and omitting the `?include_defaults` query parameter. {: .note} -To enable concurrent segment search for all indexes in the cluster, set the following dynamic cluster setting: +Both the cluster- and index-level `search.concurrent_segment_search.mode` settings accept the following values: + +- `all`: Enables concurrent segment search across all search requests. This is equivalent to setting `search.concurrent_segment_search.enabled` to `true`. + +- `none`: Disables concurrent segment search for all search requests, effectively turning off the feature. This is equivalent to setting `search.concurrent_segment_search.enabled` to `false`. This is the **default** behavior. + +- `auto`: In this mode, OpenSearch will use the pluggable _concurrent search decider_ to decide whether to use a concurrent or sequential path for the search request based on the query evaluation and the presence of aggregations in the request. By default, if there are no deciders configured by any plugin, then the decision to use concurrent search will be made based on the presence of aggregations in the request. For more information about the pluggable decider semantics, see [Pluggable concurrent search deciders](#pluggable-concurrent-search-deciders-concurrentsearchrequestdecider). + +To enable concurrent segment search for all search requests across every index in the cluster, send the following request: +```json +PUT _cluster/settings +{ + "persistent":{ + "search.concurrent_segment_search.mode": "all" + } +} +``` +{% include copy-curl.html %} + +To enable concurrent segment search for all search requests on a particular index, specify the index name in the endpoint: + +```json +PUT /_settings +{ + "index.search.concurrent_segment_search.mode": "all" +} +``` +{% include copy-curl.html %} + +You can continue to use the existing `search.concurrent_segment_search.enabled` setting to enable concurrent segment search for all indexes in the cluster as follows: ```json PUT _cluster/settings { @@ -52,6 +83,35 @@ PUT /_settings ``` {% include copy-curl.html %} + +When evaluating whether concurrent segment search is enabled on a cluster, the `search.concurrent_segment_search.mode` setting takes precedence over the `search.concurrent_segment_search.enabled` setting. +If the `search.concurrent_segment_search.mode` setting is not explicitly set, then the `search.concurrent_segment_search.enabled` setting will be evaluated to determine whether to enable concurrent segment search. + +When upgrading a cluster from an earlier version that specifies the older `search.concurrent_segment_search.enabled` setting, this setting will continue to be honored. However, once the `search.concurrent_segment_search.mode` is set, it will override the previous setting, enabling or disabling concurrent search based on the specified mode. 
+We recommend setting `search.concurrent_segment_search.enabled` to `null` on your cluster once you configure `search.concurrent_segment_search.mode`: + +```json +PUT _cluster/settings +{ + "persistent":{ + "search.concurrent_segment_search.enabled": null + } +} +``` +{% include copy-curl.html %} + +To disable the old setting for a particular index, specify the index name in the endpoint: +```json +PUT /_settings +{ + "index.search.concurrent_segment_search.enabled": null +} +``` +{% include copy-curl.html %} + + + + ## Slicing mechanisms You can choose one of two available mechanisms for assigning segments to slices: the default [Lucene mechanism](#the-lucene-mechanism) or the [max slice count mechanism](#the-max-slice-count-mechanism). @@ -66,7 +126,10 @@ The _max slice count_ mechanism is an alternative slicing mechanism that uses a ### Setting the slicing mechanism -By default, concurrent segment search uses the Lucene mechanism to calculate the number of slices for each shard-level request. To use the max slice count mechanism instead, configure the `search.concurrent.max_slice_count` cluster setting: +By default, concurrent segment search uses the Lucene mechanism to calculate the number of slices for each shard-level request. +To use the max slice count mechanism instead, you can set the slice count for concurrent segment search at either the cluster level or index level. + +To configure the slice count for all indexes in a cluster, use the following dynamic cluster setting: ```json PUT _cluster/settings @@ -78,7 +141,17 @@ PUT _cluster/settings ``` {% include copy-curl.html %} -The `search.concurrent.max_slice_count` setting can take the following valid values: +To configure the slice count for a particular index, specify the index name in the endpoint: + +```json +PUT /_settings +{ + "index.search.concurrent.max_slice_count": 2 +} +``` +{% include copy-curl.html %} + +Both the cluster- and index-level `search.concurrent.max_slice_count` settings can take the following valid values: - `0`: Use the default Lucene mechanism. - Positive integer: Use the max target slice count mechanism. Usually, a value between 2 and 8 should be sufficient. @@ -117,8 +190,20 @@ Non-concurrent search calculates the document count error and returns it in the For more information about how `shard_size` can affect both `doc_count_error_upper_bound` and collected buckets, see [this GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/11680#issuecomment-1885882985). -## Developer information: AggregatorFactory changes +## Developer information + +The following sections provide additional information for developers. + +### AggregatorFactory changes + +Because of implementation details, not all aggregator types can support concurrent segment search. To accommodate this, we have introduced a [`supportsConcurrentSegmentSearch()`](https://github.com/opensearch-project/OpenSearch/blob/2.x/server/src/main/java/org/opensearch/search/aggregations/AggregatorFactory.java#L123) method in the `AggregatorFactory` class to indicate whether a given aggregation type supports concurrent segment search. By default, this method returns `false`. Any aggregator that needs to support concurrent segment search must override this method in its own factory implementation. 
+ +To ensure that a custom plugin-based `Aggregator` implementation functions with the concurrent search path, plugin developers can verify their implementation with concurrent search enabled and then update the plugin to override the [`supportsConcurrentSegmentSearch()`](https://github.com/opensearch-project/OpenSearch/blob/2.x/server/src/main/java/org/opensearch/search/aggregations/AggregatorFactory.java#L123) method to return `true`. + +### Pluggable concurrent search deciders: ConcurrentSearchRequestDecider -Because of implementation details, not all aggregator types can support concurrent segment search. To accommodate this, we have introduced a [`supportsConcurrentSegmentSearch()`](https://github.com/opensearch-project/OpenSearch/blob/bb38ed4836496ac70258c2472668325a012ea3ed/server/src/main/java/org/opensearch/search/aggregations/AggregatorFactory.java#L121) method in the `AggregatorFactory` class to indicate whether a given aggregation type supports concurrent segment search. By default, this method returns `false`. Any aggregator that needs to support concurrent segment search must override this method in its own factory implementation. +Introduced 2.17 +{: .label .label-purple } -To ensure that a custom plugin-based `Aggregator` implementation works with the concurrent search path, plugin developers can verify their implementation with concurrent search enabled and then update the plugin to override the [`supportsConcurrentSegmentSearch()`](https://github.com/opensearch-project/OpenSearch/blob/bb38ed4836496ac70258c2472668325a012ea3ed/server/src/main/java/org/opensearch/search/aggregations/AggregatorFactory.java#L121) method to return `true`. +Plugin developers can customize the concurrent search decision-making for `auto` mode by extending [`ConcurrentSearchRequestDecider`](https://github.com/opensearch-project/OpenSearch/blob/2.x/server/src/main/java/org/opensearch/search/deciders/ConcurrentSearchRequestDecider.java) and registering its factory through [`SearchPlugin#getConcurrentSearchRequestFactories()`](https://github.com/opensearch-project/OpenSearch/blob/2.x/server/src/main/java/org/opensearch/plugins/SearchPlugin.java#L148). The deciders are evaluated only if a request does not belong to any category listed in the [Limitations](#limitations) and [Other considerations](#other-considerations) sections. For more information about the decider implementation, see [the corresponding GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/15259). +The search request is parsed using a `QueryBuilderVisitor`, which calls the [`ConcurrentSearchRequestDecider#evaluateForQuery()`](https://github.com/opensearch-project/OpenSearch/blob/2.x/server/src/main/java/org/opensearch/search/deciders/ConcurrentSearchRequestDecider.java#L36) method of all the configured deciders for every node of the `QueryBuilder` tree in the search request. The final concurrent search decision is obtained by combining the decision from each decider returned by the [`ConcurrentSearchRequestDecider#getConcurrentSearchDecision()`](https://github.com/opensearch-project/OpenSearch/blob/2.x/server/src/main/java/org/opensearch/search/deciders/ConcurrentSearchRequestDecider.java#L44) method. 
\ No newline at end of file From 632e8f2de0fbb7d357bb4e96bcd6caa5c9a395ac Mon Sep 17 00:00:00 2001 From: Sooraj Sinha <81695996+soosinha@users.noreply.github.com> Date: Wed, 11 Sep 2024 22:32:25 +0530 Subject: [PATCH 11/17] Add new settings for remote publication (#8176) * Add new settings for remote publication Signed-off-by: Sooraj Sinha * Update remote-cluster-state.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * remove redundant lines Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: Sooraj Sinha Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- .../remote-store/remote-cluster-state.md | 21 ++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/_tuning-your-cluster/availability-and-recovery/remote-store/remote-cluster-state.md b/_tuning-your-cluster/availability-and-recovery/remote-store/remote-cluster-state.md index d967aca914..03cd1716f0 100644 --- a/_tuning-your-cluster/availability-and-recovery/remote-store/remote-cluster-state.md +++ b/_tuning-your-cluster/availability-and-recovery/remote-store/remote-cluster-state.md @@ -67,10 +67,14 @@ The remote cluster state functionality has the following limitations: ## Remote cluster state publication - The cluster manager node processes updates to the cluster state. It then publishes the updated cluster state through the local transport layer to all of the follower nodes. With the `remote_store.publication` feature enabled, the cluster state is backed up to the remote store during every state update. The follower nodes can then fetch the state from the remote store directly, which reduces the overhead on the cluster manager node for publication. -To enable the feature flag for the `remote_store.publication` feature, follow the steps in the [experimental feature flag documentation]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/experimental/). +To enable this feature, configure the following setting in `opensearch.yml`: + +```yml +# Enable Remote cluster state publication +cluster.remote_store.publication.enabled: true +``` Enabling the setting does not change the publication flow, and follower nodes will not send acknowledgements back to the cluster manager node until they download the updated cluster state from the remote store. @@ -89,8 +93,11 @@ You do not have to use different remote store repositories for state and routing To configure remote publication, use the following cluster settings. -Setting | Default | Description -:--- | :--- | :--- -`cluster.remote_store.state.read_timeout` | 20s | The amount of time to wait for remote state download to complete on the follower node. -`cluster.remote_store.routing_table.path_type` | HASHED_PREFIX | The path type to be used for creating an index routing path in the blob store. Valid values are `FIXED`, `HASHED_PREFIX`, and `HASHED_INFIX`. -`cluster.remote_store.routing_table.path_hash_algo` | FNV_1A_BASE64 | The algorithm to be used for constructing the prefix or infix of the blob store path. This setting is applied if `cluster.remote_store.routing_table.path_type` is `hashed_prefix` or `hashed_infix`. Valid algorithm values are `FNV_1A_BASE64` and `FNV_1A_COMPOSITE_1`. 
+Setting | Default | Description +:--- |:---| :--- +`cluster.remote_store.state.read_timeout` | 20s | The amount of time to wait for the remote state download to complete on the follower node. +`cluster.remote_store.state.path.prefix` | "" (Empty string) | The fixed prefix to add to the index metadata files in the blob store. +`cluster.remote_store.index_metadata.path_type` | `HASHED_PREFIX` | The path type used for creating an index metadata path in the blob store. Valid values are `FIXED`, `HASHED_PREFIX`, and `HASHED_INFIX`. +`cluster.remote_store.index_metadata.path_hash_algo` | `FNV_1A_BASE64 ` | The algorithm that constructs the prefix or infix for the index metadata path in the blob store. This setting is applied if the ``cluster.remote_store.index_metadata.path_type` setting is `HASHED_PREFIX` or `HASHED_INFIX`. Valid algorithm values are `FNV_1A_BASE64` and `FNV_1A_COMPOSITE_1`. +`cluster.remote_store.routing_table.path.prefix` | "" (Empty string) | The fixed prefix to add for the index routing files in the blob store. + From 9b609c6146eb03811ecb209b02715fd513726be1 Mon Sep 17 00:00:00 2001 From: Anshu Agarwal Date: Wed, 11 Sep 2024 22:32:35 +0530 Subject: [PATCH 12/17] Add documentation changes for shallow snapshot v2 (#8207) * Add documentation changes for shallow snapshot Signed-off-by: Anshu Agarwal * Update create-repository.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update snapshot-interoperability.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: Anshu Agarwal Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Anshu Agarwal Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Nathan Bower --- _api-reference/snapshots/create-repository.md | 4 +- _api-reference/snapshots/create-snapshot.md | 3 +- .../remote-store/snapshot-interoperability.md | 42 ++++++++++++++++++- 3 files changed, 46 insertions(+), 3 deletions(-) diff --git a/_api-reference/snapshots/create-repository.md b/_api-reference/snapshots/create-repository.md index ca4c04114c..367aa3606a 100644 --- a/_api-reference/snapshots/create-repository.md +++ b/_api-reference/snapshots/create-repository.md @@ -38,7 +38,7 @@ Request parameters depend on the type of repository: `fs` or `s3`. ### Common parameters -The following table lists parameters that can be used with both the `fs` and `s3` repositories. +The following table lists parameters that can be used with both the `fs` and `s3` repositories. Request field | Description :--- | :--- @@ -54,6 +54,7 @@ Request field | Description `max_restore_bytes_per_sec` | The maximum rate at which snapshots restore. Default is 40 MB per second (`40m`). Optional. `max_snapshot_bytes_per_sec` | The maximum rate at which snapshots take. Default is 40 MB per second (`40m`). Optional. 
 `remote_store_index_shallow_copy` | Boolean | Determines whether the snapshot of the remote store indexes are captured as a shallow copy. Default is `false`.
+`shallow_snapshot_v2` | Boolean | Determines whether the snapshots of the remote store indexes are captured as a [shallow copy v2]({{site.url}}{{site.baseurl}}/tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability/#shallow-snapshot-v2). Default is `false`.
 `readonly` | Whether the repository is read-only. Useful when migrating from one cluster (`"readonly": false` when registering) to another cluster (`"readonly": true` when registering). Optional.
 
@@ -73,6 +74,7 @@ Request field | Description
 `max_snapshot_bytes_per_sec` | The maximum rate at which snapshots take. Default is 40 MB per second (`40m`). Optional.
 `readonly` | Whether the repository is read-only. Useful when migrating from one cluster (`"readonly": false` when registering) to another cluster (`"readonly": true` when registering). Optional.
 `remote_store_index_shallow_copy` | Boolean | Whether the snapshot of the remote store indexes is captured as a shallow copy. Default is `false`.
+`shallow_snapshot_v2` | Boolean | Determines whether the snapshots of the remote store indexes are captured as a [shallow copy v2]({{site.url}}{{site.baseurl}}/tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability/#shallow-snapshot-v2). Default is `false`.
 `server_side_encryption` | Whether to encrypt snapshot files in the S3 bucket. This setting uses AES-256 with S3-managed keys. See [Protecting data using server-side encryption](https://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html). Default is `false`. Optional.
 `storage_class` | Specifies the [S3 storage class](https://docs.aws.amazon.com/AmazonS3/latest/dev/storage-class-intro.html) for the snapshots files. Default is `standard`. Do not use the `glacier` and `deep_archive` storage classes. Optional.
 
diff --git a/_api-reference/snapshots/create-snapshot.md b/_api-reference/snapshots/create-snapshot.md
index d4c9ef8219..b35d1a1d0c 100644
--- a/_api-reference/snapshots/create-snapshot.md
+++ b/_api-reference/snapshots/create-snapshot.md
@@ -144,4 +144,5 @@ The snapshot definition is returned.
 | failures | array | Failures, if any, that occurred during snapshot creation. |
 | shards | object | Total number of shards created along with number of successful and failed shards. |
 | state | string | Snapshot status. Possible values: `IN_PROGRESS`, `SUCCESS`, `FAILED`, `PARTIAL`. |
-| remote_store_index_shallow_copy | Boolean | Whether the snapshot of the remote store indexes is captured as a shallow copy. Default is `false`. |
\ No newline at end of file
+| remote_store_index_shallow_copy | Boolean | Whether the snapshots of the remote store indexes are captured as a shallow copy. Default is `false`. |
+| pinned_timestamp | long | A timestamp (in milliseconds) pinned by the snapshot for the implicit locking of remote store files referenced by the snapshot. |
\ No newline at end of file

diff --git a/_tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability.md b/_tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability.md
index 0415af65f1..e93f504be3 100644
--- a/_tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability.md
+++ b/_tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability.md
@@ -27,7 +27,7 @@ PUT /_snapshot/snap_repo
 ```
 {% include copy-curl.html %}
 
-Once enabled, all requests using the [Snapshot API]({{site.url}}{{site.baseurl}}/api-reference/snapshots/index/) will remain the same for all snapshots. After the setting is enabled, we recommend not disabling the setting. Doing so could affect data durability.
+Once enabled, all requests using the [Snapshot API]({{site.url}}{{site.baseurl}}/api-reference/snapshots/index/) will remain the same for all snapshots. Therefore, do not disable the shallow snapshot setting after it has been enabled because disabling the setting could affect data durability.
 
 ## Considerations
 
@@ -37,3 +37,43 @@ Consider the following before using shallow copy snapshots:
 - All nodes in the cluster must use OpenSearch 2.10 or later to take advantage of shallow copy snapshots.
 - The `incremental` file count and size between the current snapshot and the last snapshot is `0` when using shallow copy snapshots.
 - Searchable snapshots are not supported inside shallow copy snapshots.
+
+## Shallow snapshot v2
+
+Starting with OpenSearch 2.17, the shallow snapshot feature offers an improved version called `shallow snapshot v2`, which aims to make snapshot operations more efficient and scalable by introducing the following enhancements:
+
+* Deterministic snapshot operations: Shallow snapshot v2 makes snapshot operations more deterministic, ensuring consistent and predictable behavior.
+* Minimized cluster state updates: Shallow snapshot v2 minimizes the number of cluster state updates required during snapshot operations, reducing overhead and improving performance.
+* Scalability: Shallow snapshot v2 allows snapshot operations to scale independently of the number of shards in the cluster, enabling better performance and efficiency for large datasets.
+
+Shallow snapshot v2 must be enabled separately from shallow copies.
+
+### Enabling shallow snapshot v2
+
+To enable shallow snapshot v2, set the following repository settings:
+
+- `remote_store_index_shallow_copy: true`
+- `shallow_snapshot_v2: true`
+
+The following example request creates a shallow snapshot v2 repository:
+
+```json
+PUT /_snapshot/snap_repo
+{
+  "type": "s3",
+  "settings": {
+    "bucket": "test-bucket",
+    "base_path": "daily-snaps",
+    "remote_store_index_shallow_copy": true,
+    "shallow_snapshot_v2": true
+  }
+}
+```
+{% include copy-curl.html %}
+
+### Limitations
+
+Shallow snapshot v2 has the following limitations:
+
+* Shallow snapshot v2 is only supported for remote-backed indexes.
+* All nodes in the cluster must use OpenSearch 2.17 or later to take advantage of shallow snapshot v2.
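
After a shallow snapshot v2 repository is registered, you create snapshots in the usual way. The following request is a sketch that takes a snapshot named `daily-snapshot` in the `snap_repo` repository from the preceding example:

```json
PUT /_snapshot/snap_repo/daily-snapshot?wait_for_completion=true
```
{% include copy-curl.html %}

For a shallow snapshot v2 repository, the response includes the `pinned_timestamp` field documented in the Create Snapshot API response table above.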
From b41858a146abd7fa8f248f9457d3e8dee07ffba6 Mon Sep 17 00:00:00 2001 From: AntonEliatra Date: Wed, 11 Sep 2024 18:08:11 +0100 Subject: [PATCH 13/17] Add Ascii folding token filter (#7912) * adding asciifolding token filter page #7873 Signed-off-by: AntonEliatra * updating the naming Signed-off-by: AntonEliatra * updating as per PR comments Signed-off-by: AntonEliatra * updating the heading Signed-off-by: AntonEliatra * Updating details as per comments Signed-off-by: AntonEliatra * Updating details as per comments Signed-off-by: AntonEliatra * Updating details as per comments Signed-off-by: AntonEliatra * Apply suggestions from code review Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: AntonEliatra * updating as per comments Signed-off-by: Anton Rubin * Apply suggestions from code review Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: AntonEliatra * Update asciifolding.md Signed-off-by: AntonEliatra --------- Signed-off-by: AntonEliatra Signed-off-by: Anton Rubin Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower --- _analyzers/token-filters/asciifolding.md | 135 +++++++++++++++++++++++ _analyzers/token-filters/index.md | 2 +- 2 files changed, 136 insertions(+), 1 deletion(-) create mode 100644 _analyzers/token-filters/asciifolding.md diff --git a/_analyzers/token-filters/asciifolding.md b/_analyzers/token-filters/asciifolding.md new file mode 100644 index 0000000000..d572251988 --- /dev/null +++ b/_analyzers/token-filters/asciifolding.md @@ -0,0 +1,135 @@ +--- +layout: default +title: ASCII folding +parent: Token filters +nav_order: 20 +--- + +# ASCII folding token filter + +The `asciifolding` token filter converts non-ASCII characters to their closest ASCII equivalents. For example, *é* becomes *e*, *ü* becomes *u*, and *ñ* becomes *n*. This process is known as *transliteration*. + + +The `asciifolding` token filter offers a number of benefits: + + - **Enhanced search flexibility**: Users often omit accents or special characters when entering queries. The `asciifolding` token filter ensures that such queries still return relevant results. + - **Normalization**: Standardizes the indexing process by ensuring that accented characters are consistently converted to their ASCII equivalents. + - **Internationalization**: Particularly useful for applications including multiple languages and character sets. + +While the `asciifolding` token filter can simplify searches, it may also lead to the loss of specific information, particularly if the distinction between accented and non-accented characters in the dataset is significant. +{: .warning} + +## Parameters + +You can configure the `asciifolding` token filter using the `preserve_original` parameter. Setting this parameter to `true` keeps both the original token and its ASCII-folded version in the token stream. This can be particularly useful when you want to match both the original (with accents) and the normalized (without accents) versions of a term in a search query. Default is `false`. 
+ +## Example + +The following example request creates a new index named `example_index` and defines an analyzer with the `asciifolding` filter and `preserve_original` parameter set to `true`: + +```json +PUT /example_index +{ + "settings": { + "analysis": { + "filter": { + "custom_ascii_folding": { + "type": "asciifolding", + "preserve_original": true + } + }, + "analyzer": { + "custom_ascii_analyzer": { + "type": "custom", + "tokenizer": "standard", + "filter": [ + "lowercase", + "custom_ascii_folding" + ] + } + } + } + } +} +``` +{% include copy-curl.html %} + +## Generated tokens + +Use the following request to examine the tokens generated using the analyzer: + +```json +POST /example_index/_analyze +{ + "analyzer": "custom_ascii_analyzer", + "text": "Résumé café naïve coördinate" +} +``` +{% include copy-curl.html %} + +The response contains the generated tokens: + +```json +{ + "tokens": [ + { + "token": "resume", + "start_offset": 0, + "end_offset": 6, + "type": "", + "position": 0 + }, + { + "token": "résumé", + "start_offset": 0, + "end_offset": 6, + "type": "", + "position": 0 + }, + { + "token": "cafe", + "start_offset": 7, + "end_offset": 11, + "type": "", + "position": 1 + }, + { + "token": "café", + "start_offset": 7, + "end_offset": 11, + "type": "", + "position": 1 + }, + { + "token": "naive", + "start_offset": 12, + "end_offset": 17, + "type": "", + "position": 2 + }, + { + "token": "naïve", + "start_offset": 12, + "end_offset": 17, + "type": "", + "position": 2 + }, + { + "token": "coordinate", + "start_offset": 18, + "end_offset": 28, + "type": "", + "position": 3 + }, + { + "token": "coördinate", + "start_offset": 18, + "end_offset": 28, + "type": "", + "position": 3 + } + ] +} +``` + + diff --git a/_analyzers/token-filters/index.md b/_analyzers/token-filters/index.md index f4e9c434e7..a9b621d5ab 100644 --- a/_analyzers/token-filters/index.md +++ b/_analyzers/token-filters/index.md @@ -14,7 +14,7 @@ The following table lists all token filters that OpenSearch supports. Token filter | Underlying Lucene token filter| Description [`apostrophe`]({{site.url}}{{site.baseurl}}/analyzers/token-filters/apostrophe/) | [ApostropheFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/tr/ApostropheFilter.html) | In each token containing an apostrophe, the `apostrophe` token filter removes the apostrophe itself and all characters following it. -`asciifolding` | [ASCIIFoldingFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html) | Converts alphabetic, numeric, and symbolic characters. +[`asciifolding`]({{site.url}}{{site.baseurl}}/analyzers/token-filters/asciifolding/) | [ASCIIFoldingFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html) | Converts alphabetic, numeric, and symbolic characters. `cjk_bigram` | [CJKBigramFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/cjk/CJKBigramFilter.html) | Forms bigrams of Chinese, Japanese, and Korean (CJK) tokens. `cjk_width` | [CJKWidthFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/cjk/CJKWidthFilter.html) | Normalizes Chinese, Japanese, and Korean (CJK) tokens according to the following rules:
- Folds full-width ASCII character variants into the equivalent basic Latin characters.
- Folds half-width Katakana character variants into the equivalent Kana characters. `classic` | [ClassicFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/classic/ClassicFilter.html) | Performs optional post-processing on the tokens generated by the classic tokenizer. Removes possessives (`'s`) and removes `.` from acronyms. From 9bc06e46376ea29cc7015e07a091333e583adb97 Mon Sep 17 00:00:00 2001 From: Ashish Singh Date: Wed, 11 Sep 2024 22:38:27 +0530 Subject: [PATCH 14/17] Create documentation for snapshots with hashed prefix path type (#8196) * Create documentation for snapshots with hashed prefix path type Signed-off-by: Ashish Singh * Add documentation on new cluster settings for fixed prefix Signed-off-by: Ashish Singh * Update create-repository.md * Update create-repository.md * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: Ashish Singh Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Nathan Bower --- _api-reference/snapshots/create-repository.md | 9 +++++++++ .../configuring-opensearch/index-settings.md | 6 ++++++ 2 files changed, 15 insertions(+) diff --git a/_api-reference/snapshots/create-repository.md b/_api-reference/snapshots/create-repository.md index 367aa3606a..34e2ea8376 100644 --- a/_api-reference/snapshots/create-repository.md +++ b/_api-reference/snapshots/create-repository.md @@ -43,6 +43,15 @@ The following table lists parameters that can be used with both the `fs` and `s3 Request field | Description :--- | :--- `prefix_mode_verification` | When enabled, adds a hashed value of a random seed to the prefix for repository verification. For remote-store-enabled clusters, you can add the `setting.prefix_mode_verification` setting to the node attributes for the supplied repository. This field works with both new and existing repositories. Optional. +`shard_path_type` | Controls the path structure of shard-level blobs. Supported values are `FIXED`, `HASHED_PREFIX`, and `HASHED_INFIX`. For more information about each value, see [shard_path_type values](#shard_path_type-values)/. Default is `FIXED`. Optional. + +#### shard_path_type values + +The following values are supported in the `shard_path_type` setting: + +- `FIXED`: Keeps the path structure in the existing hierarchical manner, such as `//indices//0/`. +- `HASHED_PREFIX`: Prepends a hashed prefix at the start of the path for each unique shard ID, for example, `///indices//0/`. +- `HASHED_INFIX`: Appends a hashed prefix after the base path for each unique shard ID, for example, `///indices//0/`. The hash method used is `FNV_1A_COMPOSITE_1`, which uses the `FNV1a` hash function and generates a custom-encoded 64-bit hash value that scales well with most remote store options. `FNV1a` takes the most significant 6 bits to create a URL-safe Base64 character and the next 14 bits to create a binary string. 
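+
+The following request is a sketch of registering an `fs` repository that uses a hashed shard path; the repository name and location are placeholders:
+
+```json
+PUT /_snapshot/my-fs-repository
+{
+  "type": "fs",
+  "settings": {
+    "location": "/mnt/snapshots",
+    "shard_path_type": "HASHED_PREFIX"
+  }
+}
+```
+{% include copy-curl.html %}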
### fs repository diff --git a/_install-and-configure/configuring-opensearch/index-settings.md b/_install-and-configure/configuring-opensearch/index-settings.md index a1894a0d2c..bd9b9651aa 100644 --- a/_install-and-configure/configuring-opensearch/index-settings.md +++ b/_install-and-configure/configuring-opensearch/index-settings.md @@ -73,6 +73,12 @@ OpenSearch supports the following dynamic cluster-level index settings: - `cluster.remote_store.segment.transfer_timeout` (Time unit): Controls the maximum amount of time to wait for all new segments to update after refresh to the remote store. If the upload does not complete within a specified amount of time, it throws a `SegmentUploadFailedException` error. Default is `30m`. It has a minimum constraint of `10m`. +- `cluster.remote_store.translog.path.prefix` (String): Controls the fixed path prefix for translog data on a remote-store-enabled cluster. This setting only applies when the `cluster.remote_store.index.path.type` setting is either `HASHED_PREFIX` or `HASHED_INFIX`. Default is an empty string, `""`. + +- `cluster.remote_store.segments.path.prefix` (String): Controls the fixed path prefix for segment data on a remote-store-enabled cluster. This setting only applies when the `cluster.remote_store.index.path.type` setting is either `HASHED_PREFIX` or `HASHED_INFIX`. Default is an empty string, `""`. + +- `cluster.snapshot.shard.path.prefix` (String): Controls the fixed path prefix for snapshot shard-level blobs. This setting only applies when the repository `shard_path_type` setting is either `HASHED_PREFIX` or `HASHED_INFIX`. Default is an empty string, `""`. + ## Index-level index settings You can specify index settings at index creation. There are two types of index settings: From 1fe62b09e0ea727ed16995cfcc00285a74251320 Mon Sep 17 00:00:00 2001 From: Harsha Vamsi Kalluri Date: Wed, 11 Sep 2024 11:02:39 -0700 Subject: [PATCH 15/17] Adding new cluster search setting docs (#8180) * Adding new cluster search setting docs Signed-off-by: Harsha Vamsi Kalluri * Update _install-and-configure/configuring-opensearch/search-settings.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Harsha Vamsi Kalluri --------- Signed-off-by: Harsha Vamsi Kalluri Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- .../configuring-opensearch/search-settings.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/_install-and-configure/configuring-opensearch/search-settings.md b/_install-and-configure/configuring-opensearch/search-settings.md index c3c4337d01..e53f05aa64 100644 --- a/_install-and-configure/configuring-opensearch/search-settings.md +++ b/_install-and-configure/configuring-opensearch/search-settings.md @@ -39,6 +39,8 @@ OpenSearch supports the following search settings: - `search.dynamic_pruning.cardinality_aggregation.max_allowed_cardinality` (Dynamic, integer): Determines the threshold for applying dynamic pruning in cardinality aggregation. If a field’s cardinality exceeds this threshold, the aggregation reverts to the default method. This is an experimental feature and may change or be removed in future versions. +- `search.keyword_index_or_doc_values_enabled` (Dynamic, Boolean): Determines whether to use the index or doc values when running `multi_term` queries on `keyword` fields. Default value is `false`. + ## Point in Time settings For information about PIT settings, see [PIT settings]({{site.url}}{{site.baseurl}}/search-plugins/point-in-time-api/#pit-settings). 
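
For example, the following request is a sketch that enables the dynamic `search.keyword_index_or_doc_values_enabled` setting described above:

```json
PUT _cluster/settings
{
  "persistent": {
    "search.keyword_index_or_doc_values_enabled": true
  }
}
```
{% include copy-curl.html %}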
From cdf4985aeeb7efba4f4a60982b5ce9eea66905c8 Mon Sep 17 00:00:00 2001
From: Siddhant Deshmukh
Date: Wed, 11 Sep 2024 11:35:00 -0700
Subject: [PATCH 16/17] Grouping Top N queries documentation (#8173)

* Grouping Top N queries documentation

Signed-off-by: Siddhant Deshmukh

* Fix dead links

Signed-off-by: Siddhant Deshmukh

* Fix dead link

Signed-off-by: Siddhant Deshmukh

* Fix dead links

Signed-off-by: Siddhant Deshmukh

* Address reviewdog comments

Signed-off-by: Siddhant Deshmukh

* reviewdog fix

Signed-off-by: Siddhant Deshmukh

* Doc review

Signed-off-by: Fanit Kolchina

* Add table

Signed-off-by: Siddhant Deshmukh

* Table review and added ability to collapse the response

Signed-off-by: Fanit Kolchina

* More explanation to a couple of parameters

Signed-off-by: Fanit Kolchina

* Typo fix

Signed-off-by: Fanit Kolchina

* Apply suggestions from code review

Co-authored-by: Nathan Bower
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Editorial comment

Signed-off-by: Fanit Kolchina

* Update _observing-your-data/query-insights/grouping-top-n-queries.md

Co-authored-by: Nathan Bower
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

---------

Signed-off-by: Siddhant Deshmukh
Signed-off-by: Fanit Kolchina
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Fanit Kolchina
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower
---
 .../query-insights/grouping-top-n-queries.md  | 331 ++++++++++++++++++
 _observing-your-data/query-insights/index.md  |   3 +
 .../query-insights/query-metrics.md           |   4 +-
 3 files changed, 337 insertions(+), 1 deletion(-)
 create mode 100644 _observing-your-data/query-insights/grouping-top-n-queries.md

diff --git a/_observing-your-data/query-insights/grouping-top-n-queries.md b/_observing-your-data/query-insights/grouping-top-n-queries.md
new file mode 100644
index 0000000000..28cbcbb8e5
--- /dev/null
+++ b/_observing-your-data/query-insights/grouping-top-n-queries.md
@@ -0,0 +1,331 @@
+---
+layout: default
+title: Grouping top N queries
+parent: Query insights
+nav_order: 20
+---
+
+# Grouping top N queries
+**Introduced 2.17**
+{: .label .label-purple }
+
+Monitoring the [top N queries]({{site.url}}{{site.baseurl}}/observing-your-data/query-insights/top-n-queries/) can help you identify the most resource-intensive queries based on latency, CPU, and memory usage in a specified time window. However, if a single computationally expensive query is executed multiple times, it can occupy all top N query slots, potentially preventing other expensive queries from appearing in the list. To address this issue, you can group similar queries, gaining insight into a wider range of high-impact query groups.
+
+Starting with OpenSearch version 2.17, the top N queries can be grouped by `similarity`, with additional grouping options planned for future releases.
+
+## Grouping queries by similarity
+
+Grouping queries by `similarity` organizes queries based on their query structure, which retains only the core query operations.
+
+For example, the following query:
+
+```json
+{
+  "query": {
+    "bool": {
+      "must": [
+        { "exists": { "field": "field1" } },
+        {
+          "query_string": { "query": "search query" }
+        }
+      ]
+    }
+  }
+}
+```
+
+Has the following corresponding query structure:
+
+```c
+bool
+  must
+    exists
+    query_string
+```
+
+When queries share the same query structure, they are grouped together, ensuring that all similar queries belong to the same group.
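To make the grouping behavior concrete, consider two hypothetical searches that differ only in the value being searched (the index and field names below are illustrative and not taken from the patch):

```json
GET /my_index/_search
{
  "query": { "term": { "user_id": { "value": "user-1" } } }
}

GET /my_index/_search
{
  "query": { "term": { "user_id": { "value": "user-2" } } }
}
```

Both searches reduce to the same structure (`term`), so they fall into a single query group: they occupy one top N entry, and their measurements are aggregated rather than competing for separate slots.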
+ + +## Aggregate metrics per group + +In addition to retrieving latency, CPU, and memory metrics for individual top N queries, you can obtain aggregate statistics for the +top N query groups. For each query group, the response includes the following statistics: +- The total latency, CPU usage, or memory usage (depending on the configured metric type) +- The total query count + +Using these statistics, you can calculate the average latency, CPU usage, or memory usage for each query group. +The response also includes one example query from the query group. + +## Configuring query grouping + +Before you enable query grouping, you must enable top N query monitoring for a metric type of your choice. For more information, see [Configuring top N query monitoring]({{site.url}}{{site.baseurl}}/observing-your-data/query-insights/top-n-queries/#configuring-top-n-query-monitoring). + +To configure grouping for top N queries, use the following steps. + +### Step 1: Enable top N query monitoring + +Ensure that top N query monitoring is enabled for at least one of the metrics: latency, CPU, or memory. For more information, see [Configuring top N query monitoring]({{site.url}}{{site.baseurl}}/observing-your-data/query-insights/top-n-queries/#configuring-top-n-query-monitoring). + +For example, to enable top N query monitoring by latency with the default settings, send the following request: + +```json +PUT _cluster/settings +{ + "persistent" : { + "search.insights.top_queries.latency.enabled" : true + } +} +``` +{% include copy-curl.html %} + +### Step 2: Configure query grouping + +Set the desired grouping method by updating the following cluster setting: + +```json +PUT _cluster/settings +{ + "persistent" : { + "search.insights.top_queries.group_by" : "similarity" + } +} +``` +{% include copy-curl.html %} + +The default value for the `group_by` setting is `none`, which disables grouping. As of OpenSearch 2.17, the supported values for `group_by` are `similarity` and `none`. + +### Step 3 (Optional): Limit the number of monitored query groups + +Optionally, you can limit the number of monitored query groups. Queries already included in the top N query list (the most resource-intensive queries) will not be considered in determining the limit. Essentially, the maximum applies only to other query groups, and the top N queries are tracked separately. This helps manage the tracking of query groups based on workload and query window size. + +To limit tracking to 100 query groups, send the following request: + +```json +PUT _cluster/settings +{ + "persistent" : { + "search.insights.top_queries.max_groups_excluding_topn" : 100 + } +} +``` +{% include copy-curl.html %} + +The default value for `max_groups_excluding_topn` is `100`, and you can set it to any value between `0` and `10,000`, inclusive. + +## Monitoring query groups + +To view the top N query groups, send the following request: + +```json +GET /_insights/top_queries +``` +{% include copy-curl.html %} + +The response contains the top N query groups: + +
+<details open markdown="block">
+  <summary>
+    Response
+  </summary>
+  {: .text-delta}
+
+```json
+{
+  "top_queries": [
+    {
+      "timestamp": 1725495127359,
+      "source": {
+        "query": {
+          "match_all": {
+            "boost": 1.0
+          }
+        }
+      },
+      "phase_latency_map": {
+        "expand": 0,
+        "query": 55,
+        "fetch": 3
+      },
+      "total_shards": 1,
+      "node_id": "ZbINz1KFS1OPeFmN-n5rdg",
+      "query_hashcode": "b4c4f69290df756021ca6276be5cbb75",
+      "task_resource_usages": [
+        {
+          "action": "indices:data/read/search[phase/query]",
+          "taskId": 30,
+          "parentTaskId": 29,
+          "nodeId": "ZbINz1KFS1OPeFmN-n5rdg",
+          "taskResourceUsage": {
+            "cpu_time_in_nanos": 33249000,
+            "memory_in_bytes": 2896848
+          }
+        },
+        {
+          "action": "indices:data/read/search",
+          "taskId": 29,
+          "parentTaskId": -1,
+          "nodeId": "ZbINz1KFS1OPeFmN-n5rdg",
+          "taskResourceUsage": {
+            "cpu_time_in_nanos": 3151000,
+            "memory_in_bytes": 133936
+          }
+        }
+      ],
+      "indices": [
+        "my_index"
+      ],
+      "labels": {},
+      "search_type": "query_then_fetch",
+      "measurements": {
+        "latency": {
+          "number": 160,
+          "count": 10,
+          "aggregationType": "AVERAGE"
+        }
+      }
+    },
+    {
+      "timestamp": 1725495135160,
+      "source": {
+        "query": {
+          "term": {
+            "content": {
+              "value": "first",
+              "boost": 1.0
+            }
+          }
+        }
+      },
+      "phase_latency_map": {
+        "expand": 0,
+        "query": 18,
+        "fetch": 0
+      },
+      "total_shards": 1,
+      "node_id": "ZbINz1KFS1OPeFmN-n5rdg",
+      "query_hashcode": "c3620cc3d4df30fb3f95aeb2167289a4",
+      "task_resource_usages": [
+        {
+          "action": "indices:data/read/search[phase/query]",
+          "taskId": 50,
+          "parentTaskId": 49,
+          "nodeId": "ZbINz1KFS1OPeFmN-n5rdg",
+          "taskResourceUsage": {
+            "cpu_time_in_nanos": 10188000,
+            "memory_in_bytes": 288136
+          }
+        },
+        {
+          "action": "indices:data/read/search",
+          "taskId": 49,
+          "parentTaskId": -1,
+          "nodeId": "ZbINz1KFS1OPeFmN-n5rdg",
+          "taskResourceUsage": {
+            "cpu_time_in_nanos": 262000,
+            "memory_in_bytes": 3216
+          }
+        }
+      ],
+      "indices": [
+        "my_index"
+      ],
+      "labels": {},
+      "search_type": "query_then_fetch",
+      "measurements": {
+        "latency": {
+          "number": 109,
+          "count": 7,
+          "aggregationType": "AVERAGE"
+        }
+      }
+    },
+    {
+      "timestamp": 1725495139766,
+      "source": {
+        "query": {
+          "match": {
+            "content": {
+              "query": "first",
+              "operator": "OR",
+              "prefix_length": 0,
+              "max_expansions": 50,
+              "fuzzy_transpositions": true,
+              "lenient": false,
+              "zero_terms_query": "NONE",
+              "auto_generate_synonyms_phrase_query": true,
+              "boost": 1.0
+            }
+          }
+        }
+      },
+      "phase_latency_map": {
+        "expand": 0,
+        "query": 15,
+        "fetch": 0
+      },
+      "total_shards": 1,
+      "node_id": "ZbINz1KFS1OPeFmN-n5rdg",
+      "query_hashcode": "484eaabecd13db65216b9e2ff5eee999",
+      "task_resource_usages": [
+        {
+          "action": "indices:data/read/search[phase/query]",
+          "taskId": 64,
+          "parentTaskId": 63,
+          "nodeId": "ZbINz1KFS1OPeFmN-n5rdg",
+          "taskResourceUsage": {
+            "cpu_time_in_nanos": 12161000,
+            "memory_in_bytes": 473456
+          }
+        },
+        {
+          "action": "indices:data/read/search",
+          "taskId": 63,
+          "parentTaskId": -1,
+          "nodeId": "ZbINz1KFS1OPeFmN-n5rdg",
+          "taskResourceUsage": {
+            "cpu_time_in_nanos": 293000,
+            "memory_in_bytes": 3216
+          }
+        }
+      ],
+      "indices": [
+        "my_index"
+      ],
+      "labels": {},
+      "search_type": "query_then_fetch",
+      "measurements": {
+        "latency": {
+          "number": 43,
+          "count": 3,
+          "aggregationType": "AVERAGE"
+        }
+      }
+    }
+  ]
+}
+```
+
+</details>
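As a quick worked reading of the aggregate measurements above (assuming, per the field descriptions in the following table, that `number` is the group total and `count` is the number of queries in the group): the first group reports `"number": 160` over `"count": 10`, an average latency of 160 / 10 = 16 ms per query; the second group averages 109 / 7 ≈ 15.6 ms; and the third averages 43 / 3 ≈ 14.3 ms.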
+
+## Response fields
+
+The response includes the following fields.
+
+Field | Data type | Description
+:--- | :--- | :---
+`top_queries` | Array | The list of top query groups.
+`top_queries.timestamp` | Integer | The execution timestamp for the first query in the query group.
+`top_queries.source` | Object | The first query in the query group.
+`top_queries.phase_latency_map` | Object | The phase latency map for the first query in the query group. The map includes the amount of time, in milliseconds, that the query spent in the `expand`, `query`, and `fetch` phases.
+`top_queries.total_shards` | Integer | The number of shards on which the first query was executed.
+`top_queries.node_id` | String | The node ID of the node that coordinated the execution of the first query in the query group.
+`top_queries.query_hashcode` | String | The hash code that uniquely identifies the query group. This is essentially the hash of the [query structure](#grouping-queries-by-similarity).
+`top_queries.task_resource_usages` | Array of objects | The resource usage breakdown for the various tasks belonging to the first query in the query group.
+`top_queries.indices` | Array | The indexes searched by the first query in the query group.
+`top_queries.labels` | Object | Any labels attached to the top query.
+`top_queries.search_type` | String | The search request execution type (`query_then_fetch` or `dfs_query_then_fetch`). For more information, see the `search_type` parameter in the [Search API documentation]({{site.url}}{{site.baseurl}}/api-reference/search/#url-parameters).
+`top_queries.measurements` | Object | The aggregate measurements for the query group.
+`top_queries.measurements.latency` | Object | The aggregate latency measurements for the query group.
+`top_queries.measurements.latency.number` | Integer | The total latency for the query group.
+`top_queries.measurements.latency.count` | Integer | The number of queries in the query group.
+`top_queries.measurements.latency.aggregationType` | String | The aggregation type for the current entry. If grouping by similarity is enabled, then `aggregationType` is `AVERAGE`. If it is not enabled, then `aggregationType` is `NONE`.
\ No newline at end of file
diff --git a/_observing-your-data/query-insights/index.md b/_observing-your-data/query-insights/index.md
index b929e51491..ef3a65bfcd 100644
--- a/_observing-your-data/query-insights/index.md
+++ b/_observing-your-data/query-insights/index.md
@@ -7,6 +7,8 @@ has_toc: false
 ---
 
 # Query insights
+**Introduced 2.12**
+{: .label .label-purple }
 
 To monitor and analyze the search queries within your OpenSearch cluster, you can obtain query insights. With minimal performance impact, query insights features aim to provide comprehensive insights into search query execution, enabling you to better understand search query characteristics, patterns, and system behavior during query execution stages. Query insights facilitate enhanced detection, diagnosis, and prevention of query performance issues, ultimately improving query processing performance, user experience, and overall system resilience.
@@ -36,4 +38,5 @@ For information about installing plugins, see [Installing plugins]({{site.url}}{
 You can obtain the following information using Query Insights:
 
 - [Top n queries]({{site.url}}{{site.baseurl}}/observing-your-data/query-insights/top-n-queries/)
+- [Grouping top N queries]({{site.url}}{{site.baseurl}}/observing-your-data/query-insights/grouping-top-n-queries/)
 - [Query metrics]({{site.url}}{{site.baseurl}}/observing-your-data/query-insights/query-metrics/)
diff --git a/_observing-your-data/query-insights/query-metrics.md b/_observing-your-data/query-insights/query-metrics.md
index c8caf21d65..beac8d4e18 100644
--- a/_observing-your-data/query-insights/query-metrics.md
+++ b/_observing-your-data/query-insights/query-metrics.md
@@ -2,10 +2,12 @@
 layout: default
 title: Query metrics
 parent: Query insights
-nav_order: 20
+nav_order: 30
 ---
 
 # Query metrics
+**Introduced 2.16**
+{: .label .label-purple }
 
 Key query [metrics](#metrics), such as aggregation types, query types, latency, and resource usage per query type, are captured along the search path by using the OpenTelemetry (OTel) instrumentation framework. The telemetry data can be consumed using OTel metrics [exporters]({{site.url}}{{site.baseurl}}/observing-your-data/trace/distributed-tracing/#exporters).

From f44deb242466533f8557e46422dc6885eb3c75ab Mon Sep 17 00:00:00 2001
From: Daniel Widdis
Date: Wed, 11 Sep 2024 11:40:31 -0700
Subject: [PATCH 17/17] Document reprovision param for Update Workflow API (#8172)

* Document reprovision param for Update Workflow API

Signed-off-by: Daniel Widdis

* Update _automating-configurations/api/create-workflow.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Daniel Widdis

* Update _automating-configurations/api/create-workflow.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Daniel Widdis

* Update _automating-configurations/api/create-workflow.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Daniel Widdis

* Update _automating-configurations/api/create-workflow.md

Co-authored-by: Nathan Bower
Signed-off-by: Daniel Widdis

---------

Signed-off-by: Daniel Widdis
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower
---
 _automating-configurations/api/create-workflow.md | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/_automating-configurations/api/create-workflow.md b/_automating-configurations/api/create-workflow.md
index 83c0110ac3..3fc16c754d 100644
--- a/_automating-configurations/api/create-workflow.md
+++ b/_automating-configurations/api/create-workflow.md
@@ -58,7 +58,7 @@ POST /_plugins/_flow_framework/workflow?validation=none
 ```
 {% include copy-curl.html %}
 
-You cannot update a full workflow once it has been provisioned, but you can update fields other than the `workflows` field, such as `name` and `description`:
+In a workflow that has not been provisioned, you can update fields other than the `workflows` field. For example, you can update the `name` and `description` fields as follows:
 
 ```json
 PUT /_plugins/_flow_framework/workflow/<workflow_id>?update_fields=true
 {
   <updated fields>
 }
 ```
 {% include copy-curl.html %}
 
 You cannot specify both the `provision` and `update_fields` parameters at the same time.
 {: .note}
 
+If a workflow has been provisioned, you can update and reprovision the full template:
+
+```json
+PUT /_plugins/_flow_framework/workflow/<workflow_id>?reprovision=true
+{
+  <updated complete template>
+}
+```
+
+You can add new steps to the workflow but cannot delete them. Only index setting, search pipeline, and ingest pipeline steps can currently be updated.
+{: .note}
+
 The following table lists the available query parameters. All query parameters are optional. User-provided parameters are only allowed if the `provision` parameter is set to `true`.
 
 | Parameter | Data type | Description |
 | :--- | :--- | :--- |
 | `provision` | Boolean | Whether to provision the workflow as part of the request. Default is `false`. |
 | `update_fields` | Boolean | Whether to update only the fields included in the request body. Default is `false`. |
+| `reprovision` | Boolean | Whether to reprovision the entire template if it has already been provisioned. A complete template must be provided in the request body. Default is `false`. |
 | `validation` | String | Whether to validate the workflow. Valid values are `all` (validate the template) and `none` (do not validate the template). Default is `all`. |
 | User-provided substitution expressions | String | Parameters matching substitution expressions in the template. Only allowed if `provision` is set to `true`. Optional. If `provision` is set to `false`, you can pass these parameters in the [Provision Workflow API query parameters]({{site.url}}{{site.baseurl}}/automating-configurations/api/provision-workflow/#query-parameters). |
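To illustrate the last row of the table above, the following is a hedged sketch of passing a user-provided substitution expression when creating and provisioning a workflow in one call. The parameter name `index_name` and the `${{ index_name }}` expression it would replace are hypothetical placeholders, not values from the patch:

```json
POST /_plugins/_flow_framework/workflow?provision=true&index_name=my-deployment-index
{
  <template whose steps reference ${{ index_name }}>
}
```

Because `provision` is set to `true`, the substitution parameter is accepted on this call; with `provision` set to `false`, the same parameter would instead be passed to the Provision Workflow API, as noted in the table.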