Merge branch 'main' into top-n
kolchfa-aws authored Sep 11, 2024
2 parents bea3622 + f44deb2 commit 156c34e
Showing 21 changed files with 714 additions and 136 deletions.
2 changes: 1 addition & 1 deletion .github/PULL_REQUEST_TEMPLATE.md
@@ -2,7 +2,7 @@
_Describe what this change achieves._

### Issues Resolved
_List any issues this PR will resolve, e.g. Closes [...]._
Closes #[_insert issue number_]

### Version
_List the OpenSearch version to which this PR applies, e.g. 2.14, 2.12--2.14, or all._
135 changes: 135 additions & 0 deletions _analyzers/token-filters/asciifolding.md
@@ -0,0 +1,135 @@
---
layout: default
title: ASCII folding
parent: Token filters
nav_order: 20
---

# ASCII folding token filter

The `asciifolding` token filter converts non-ASCII characters to their closest ASCII equivalents. For example, *é* becomes *e*, *ü* becomes *u*, and *ñ* becomes *n*. This process is known as *transliteration*.


The `asciifolding` token filter offers a number of benefits:

- **Enhanced search flexibility**: Users often omit accents or special characters when entering queries. The `asciifolding` token filter ensures that such queries still return relevant results.
- **Normalization**: Standardizes the indexing process by ensuring that accented characters are consistently converted to their ASCII equivalents.
- **Internationalization**: Particularly useful for applications that support multiple languages and character sets.

While the `asciifolding` token filter can simplify searches, it may also lead to the loss of specific information, particularly if the distinction between accented and non-accented characters in the dataset is significant.
{: .warning}

## Parameters

You can configure the `asciifolding` token filter using the `preserve_original` parameter. Setting this parameter to `true` keeps both the original token and its ASCII-folded version in the token stream. This can be particularly useful when you want to match both the original (with accents) and the normalized (without accents) versions of a term in a search query. Default is `false`.

## Example

The following example request creates a new index named `example_index` and defines an analyzer with the `asciifolding` filter and `preserve_original` parameter set to `true`:

```json
PUT /example_index
{
  "settings": {
    "analysis": {
      "filter": {
        "custom_ascii_folding": {
          "type": "asciifolding",
          "preserve_original": true
        }
      },
      "analyzer": {
        "custom_ascii_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "custom_ascii_folding"
          ]
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

## Generated tokens

Use the following request to examine the tokens generated using the analyzer:

```json
POST /example_index/_analyze
{
  "analyzer": "custom_ascii_analyzer",
  "text": "Résumé café naïve coördinate"
}
```
{% include copy-curl.html %}

The response contains the generated tokens:

```json
{
  "tokens": [
    {
      "token": "resume",
      "start_offset": 0,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "résumé",
      "start_offset": 0,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "cafe",
      "start_offset": 7,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "café",
      "start_offset": 7,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "naive",
      "start_offset": 12,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "naïve",
      "start_offset": 12,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "coordinate",
      "start_offset": 18,
      "end_offset": 28,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "coördinate",
      "start_offset": 18,
      "end_offset": 28,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}
```
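
To verify the behavior end to end, you can search for the indexed text without accents. The following request is a minimal sketch that assumes a `text` field analyzed with `custom_ascii_analyzer` and a document containing `Résumé café` already indexed into `example_index`:

```json
POST /example_index/_search
{
  "query": {
    "match": {
      "text": "resume cafe"
    }
  }
}
```
{% include copy-curl.html %}

Because `preserve_original` is set to `true`, both the folded and the original tokens are indexed, so queries with or without accents match the same document.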


2 changes: 1 addition & 1 deletion _analyzers/token-filters/index.md
@@ -14,7 +14,7 @@
The following table lists all token filters that OpenSearch supports.

Token filter | Underlying Lucene token filter | Description
:--- | :--- | :---
[`apostrophe`]({{site.url}}{{site.baseurl}}/analyzers/token-filters/apostrophe/) | [ApostropheFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/tr/ApostropheFilter.html) | In each token containing an apostrophe, the `apostrophe` token filter removes the apostrophe itself and all characters following it.
`asciifolding` | [ASCIIFoldingFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html) | Converts alphabetic, numeric, and symbolic characters.
[`asciifolding`]({{site.url}}{{site.baseurl}}/analyzers/token-filters/asciifolding/) | [ASCIIFoldingFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html) | Converts alphabetic, numeric, and symbolic characters.
`cjk_bigram` | [CJKBigramFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/cjk/CJKBigramFilter.html) | Forms bigrams of Chinese, Japanese, and Korean (CJK) tokens.
`cjk_width` | [CJKWidthFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/cjk/CJKWidthFilter.html) | Normalizes Chinese, Japanese, and Korean (CJK) tokens according to the following rules: <br> - Folds full-width ASCII character variants into the equivalent basic Latin characters. <br> - Folds half-width Katakana character variants into the equivalent Kana characters.
`classic` | [ClassicFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/classic/ClassicFilter.html) | Performs optional post-processing on the tokens generated by the classic tokenizer. Removes possessives (`'s`) and removes `.` from acronyms.
13 changes: 12 additions & 1 deletion _api-reference/snapshots/create-repository.md
@@ -38,11 +38,20 @@
Request parameters depend on the type of repository: `fs` or `s3`.

### Common parameters

The following table lists parameters that can be used with both the `fs` and `s3` repositories.

Request field | Description
:--- | :---
`prefix_mode_verification` | When enabled, adds a hashed value of a random seed to the prefix for repository verification. For remote-store-enabled clusters, you can add the `setting.prefix_mode_verification` setting to the node attributes for the supplied repository. This field works with both new and existing repositories. Optional.
`shard_path_type` | Controls the path structure of shard-level blobs. Supported values are `FIXED`, `HASHED_PREFIX`, and `HASHED_INFIX`. For more information about each value, see [shard_path_type values](#shard_path_type-values). Default is `FIXED`. Optional.

#### shard_path_type values

The following values are supported in the `shard_path_type` setting:

- `FIXED`: Keeps the existing hierarchical path structure, such as `<ROOT>/<BASE_PATH>/indices/<index-id>/0/<SHARD_BLOBS>`.
- `HASHED_PREFIX`: Prepends a hashed prefix to the path for each unique shard ID, for example, `<ROOT>/<HASH-OF-INDEX-ID-AND-SHARD-ID>/<BASE_PATH>/indices/<index-id>/0/<SHARD_BLOBS>`.
- `HASHED_INFIX`: Inserts a hashed value after the base path for each unique shard ID, for example, `<ROOT>/<BASE-PATH>/<HASH-OF-INDEX-ID-AND-SHARD-ID>/indices/<index-id>/0/<SHARD_BLOBS>`. The hash method used is `FNV_1A_COMPOSITE_1`, which uses the `FNV1a` hash function to generate a custom-encoded 64-bit hash value that scales well with most remote store options. `FNV1a` takes the most significant 6 bits to create a URL-safe Base64 character and the next 14 bits to create a binary string.
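
For illustration, the following request registers an `fs` repository that stores shard blobs using hashed prefixes. This is a minimal sketch; the repository name `my-fs-repository` and the location `/mnt/snapshots` are placeholder values:

```json
PUT /_snapshot/my-fs-repository
{
  "type": "fs",
  "settings": {
    "location": "/mnt/snapshots",
    "shard_path_type": "HASHED_PREFIX"
  }
}
```
{% include copy-curl.html %}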

### fs repository

@@ -54,6 +63,7 @@
Request field | Description
:--- | :---
`max_restore_bytes_per_sec` | The maximum rate at which snapshots are restored. Default is 40 MB per second (`40m`). Optional.
`max_snapshot_bytes_per_sec` | The maximum rate at which snapshots are taken. Default is 40 MB per second (`40m`). Optional.
`remote_store_index_shallow_copy` | Boolean | Determines whether the snapshot of the remote store indexes is captured as a shallow copy. Default is `false`.
`shallow_snapshot_v2` | Boolean | Determines whether the snapshots of the remote store indexes are captured as a [shallow copy v2]({{site.url}}{{site.baseurl}}/tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability/#shallow-snapshot-v2). Default is `false`.
`readonly` | Whether the repository is read-only. Useful when migrating from one cluster (`"readonly": false` when registering) to another cluster (`"readonly": true` when registering). Optional.


@@ -73,6 +83,7 @@
Request field | Description
:--- | :---
`max_snapshot_bytes_per_sec` | The maximum rate at which snapshots are taken. Default is 40 MB per second (`40m`). Optional.
`readonly` | Whether the repository is read-only. Useful when migrating from one cluster (`"readonly": false` when registering) to another cluster (`"readonly": true` when registering). Optional.
`remote_store_index_shallow_copy` | Boolean | Whether the snapshot of the remote store indexes is captured as a shallow copy. Default is `false`.
`shallow_snapshot_v2` | Boolean | Determines whether the snapshots of the remote store indexes are captured as a [shallow copy v2]({{site.url}}{{site.baseurl}}/tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability/#shallow-snapshot-v2). Default is `false`.
`server_side_encryption` | Whether to encrypt snapshot files in the S3 bucket. This setting uses AES-256 with S3-managed keys. See [Protecting data using server-side encryption](https://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html). Default is `false`. Optional.
`storage_class` | Specifies the [S3 storage class](https://docs.aws.amazon.com/AmazonS3/latest/dev/storage-class-intro.html) for the snapshot files. Default is `standard`. Do not use the `glacier` and `deep_archive` storage classes. Optional.
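
As an illustration, the following sketch registers an `s3` repository with server-side encryption enabled; the repository name `my-s3-repository` and the bucket name `my-snapshot-bucket` are placeholder values:

```json
PUT /_snapshot/my-s3-repository
{
  "type": "s3",
  "settings": {
    "bucket": "my-snapshot-bucket",
    "base_path": "snapshots",
    "server_side_encryption": true
  }
}
```
{% include copy-curl.html %}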

3 changes: 2 additions & 1 deletion _api-reference/snapshots/create-snapshot.md
@@ -144,4 +144,5 @@
The snapshot definition is returned.
| failures | array | Failures, if any, that occurred during snapshot creation. |
| shards | object | The total number of shards created, along with the number of successful and failed shards. |
| state | string | Snapshot status. Possible values: `IN_PROGRESS`, `SUCCESS`, `FAILED`, `PARTIAL`. |
| remote_store_index_shallow_copy | Boolean | Whether the snapshot of the remote store indexes is captured as a shallow copy. Default is `false`. |
| remote_store_index_shallow_copy | Boolean | Whether the snapshots of the remote store indexes are captured as a shallow copy. Default is `false`. |
| pinned_timestamp | long | A timestamp (in milliseconds) pinned by the snapshot for the implicit locking of remote store files referenced by the snapshot. |
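
For reference, a response containing these fields might look similar to the following. This is an illustrative sketch rather than output from a real cluster; the snapshot name, shard counts, and timestamp are placeholder values:

```json
{
  "snapshot": {
    "snapshot": "my-snapshot",
    "state": "SUCCESS",
    "shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
    },
    "remote_store_index_shallow_copy": true,
    "pinned_timestamp": 1726041600000
  }
}
```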
15 changes: 14 additions & 1 deletion _automating-configurations/api/create-workflow.md
@@ -58,7 +58,7 @@
```json
POST /_plugins/_flow_framework/workflow?validation=none
```
{% include copy-curl.html %}

You cannot update a full workflow once it has been provisioned, but you can update fields other than the `workflows` field, such as `name` and `description`:
In a workflow that has not been provisioned, you can update fields other than the `workflows` field. For example, you can update the `name` and `description` fields as follows:

```json
PUT /_plugins/_flow_framework/workflow/<workflow_id>?update_fields=true
{
  "name": "new-template-name",
  "description": "new description"
}
```
{% include copy-curl.html %}

You cannot specify both the `provision` and `update_fields` parameters at the same time.
{: .note}

If a workflow has been provisioned, you can update and reprovision the full template:

```json
PUT /_plugins/_flow_framework/workflow/<workflow_id>?reprovision=true
{
<updated complete template>
}
```

You can add new steps to the workflow but cannot delete them. Only index setting, search pipeline, and ingest pipeline steps can currently be updated.
{: .note}

The following table lists the available query parameters. All query parameters are optional. User-provided parameters are only allowed if the `provision` parameter is set to `true`.

| Parameter | Data type | Description |
| :--- | :--- | :--- |
| `provision` | Boolean | Whether to provision the workflow as part of the request. Default is `false`. |
| `update_fields` | Boolean | Whether to update only the fields included in the request body. Default is `false`. |
| `reprovision` | Boolean | Whether to reprovision the entire template if it has already been provisioned. A complete template must be provided in the request body. Default is `false`. |
| `validation` | String | Whether to validate the workflow. Valid values are `all` (validate the template) and `none` (do not validate the template). Default is `all`. |
| User-provided substitution expressions | String | Parameters matching substitution expressions in the template. Only allowed if `provision` is set to `true`. Optional. If `provision` is set to `false`, you can pass these parameters in the [Provision Workflow API query parameters]({{site.url}}{{site.baseurl}}/automating-configurations/api/provision-workflow/#query-parameters). |
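
For example, if the template body contains a substitution expression such as `${{ openai_key }}` (a hypothetical placeholder used here for illustration), you can supply its value as a query parameter when creating and provisioning the workflow:

```json
POST /_plugins/_flow_framework/workflow?provision=true&openai_key=<your_api_key>
```
{% include copy-curl.html %}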

46 changes: 12 additions & 34 deletions _dashboards/query-workbench.md
@@ -8,19 +8,14 @@
redirect_from:

# Query Workbench

Query Workbench is a tool within OpenSearch Dashboards. You can use Query Workbench to run on-demand [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/sql/index/) and [PPL]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/index/) queries, translate queries into their equivalent REST API calls, and view and save results in different [response formats]({{site.url}}{{site.baseurl}}/search-plugins/sql/response-formats/).
You can use Query Workbench in OpenSearch Dashboards to run on-demand [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/sql/index/) and [PPL]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/index/) queries, translate queries into their equivalent REST API calls, and view and save results in different [response formats]({{site.url}}{{site.baseurl}}/search-plugins/sql/response-formats/).

A view of the Query Workbench interface within OpenSearch Dashboards is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/query-workbench-ui.png" alt="Query Workbench interface within OpenSearch Dashboards" width="700">

## Prerequisites

Before getting started, make sure you have [indexed your data]({{site.url}}{{site.baseurl}}/im-plugin/index/).
Query Workbench does not support delete or update operations through SQL or PPL. Access to data is read-only.
{: .important}

For this tutorial, you can index the following sample documents. Alternatively, you can use the [OpenSearch Playground](https://playground.opensearch.org/app/opensearch-query-workbench#/), which has preloaded indexes that you can use to try out Query Workbench.
## Prerequisites

To index sample documents, send the following [Bulk API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/) request:
Before getting started with this tutorial, index the sample documents by sending the following [Bulk API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/) request:

```json
PUT accounts/_bulk?refresh
...
```
{% include copy-curl.html %}

## Running SQL queries within Query Workbench
See [Managing indexes]({{site.url}}{{site.baseurl}}/im-plugin/index/) to learn about indexing your own data.

Follow these steps to learn how to run SQL queries against your OpenSearch data using Query Workbench:
## Running SQL queries within Query Workbench

The following steps guide you through running SQL queries against OpenSearch data:

1. Access Query Workbench.
- To access Query Workbench, go to OpenSearch Dashboards and choose **OpenSearch Plugins** > **Query Workbench** from the main menu.
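
2. Run a query.
    - Enter a query in the query editor and select **Run**. For example, the following SQL query returns the names of account holders older than 30. This query is only an illustration; any valid SQL statement against your indexed data works here:

      ```sql
      SELECT firstname, lastname
      FROM accounts
      WHERE age > 30;
      ```
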
3. View the results.
- View the results in the **Results** pane, which presents the query output in tabular format. You can filter and download the results as needed.

The following image shows the query editor pane and results pane for the preceding SQL query:

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/query-workbench-query-step2.png" alt="Query Workbench SQL query input and results output panes" width="800">

4. Clear the query editor.
- Select the **Clear** button to clear the query editor and run a new query.

5. Examine how the query is processed.
- Select the **Explain** button to examine how OpenSearch processes the query, including the steps involved and order of operations.

The following image shows the explanation of the SQL query that was run in step 2.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/query-explain.png" alt="Query Workbench SQL query explanation pane" width="500">
- Select the **Explain** button to examine how OpenSearch processes the query, including the steps involved and order of operations.

## Running PPL queries within Query Workbench

Follow these steps to learn how to run PPL queries against your OpenSearch data using Query Workbench:
Follow these steps to learn how to run PPL queries against OpenSearch data:

1. Access Query Workbench.
- To access Query Workbench, go to OpenSearch Dashboards and choose **OpenSearch Plugins** > **Query Workbench** from the main menu.
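
2. Run a query.
    - Enter a PPL query in the query editor and select **Run**. For example, the following PPL query returns the first names of account holders older than 30. This query is only an illustration; any valid PPL statement against your indexed data works here:

      ```sql
      search source=accounts | where age > 30 | fields firstname
      ```
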
3. View the results.
- View the results in the **Results** pane, which presents the query output in tabular format.

The following image shows the query editor pane and results pane for the PPL query that was run in step 2:

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/query-workbench-ppl.png" alt="Query Workbench PPL query input and results output panes">

4. Clear the query editor.
- Select the **Clear** button to clear the query editor and run a new query.

5. Examine how the query is processed.
- Select the **Explain** button to examine how OpenSearch processes the query, including the steps involved and order of operations.

The following image shows the explanation of the PPL query that was run in step 2.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/query-PPL-explain.png" alt="Query Workbench PPL query explanation pane" width="500">

Query Workbench does not support delete or update operations through SQL or PPL. Access to data is read-only.
{: .important}
- Select the **Explain** button to examine how OpenSearch processes the query, including the steps involved and order of operations.
2 changes: 1 addition & 1 deletion _data-prepper/common-use-cases/log-analytics.md
@@ -147,6 +147,6 @@
The following is an example `fluent-bit.conf` file with SSL and basic authentication.

See the [Data Prepper Log Ingestion Demo Guide](https://github.com/opensearch-project/data-prepper/blob/main/examples/log-ingestion/README.md) for a specific example of Apache log ingestion from `FluentBit -> Data Prepper -> OpenSearch` running through Docker.

In the future, Data Prepper will offer additional sources and processors that will make more complex log analytics pipelines available. Check out the [Data Prepper Project Roadmap](https://github.com/opensearch-project/data-prepper/projects/1) to see what is coming.
In the future, Data Prepper will offer additional sources and processors that will make more complex log analytics pipelines available. Check out the [Data Prepper Project Roadmap](https://github.com/orgs/opensearch-project/projects/221) to see what is coming.

If there is a specific source, processor, or sink that you would like to include in your log analytics workflow and that is not currently on the roadmap, please bring it to our attention by creating a GitHub issue. Additionally, if you are interested in contributing to Data Prepper, see our [Contributing Guidelines](https://github.com/opensearch-project/data-prepper/blob/main/CONTRIBUTING.md) as well as our [developer guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/developer_guide.md) and [plugin development guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/plugin_development.md).