Merge branch 'main' into introduce_ltr_plugin

Signed-off-by: Eric Pugh <[email protected]>
opensearch-project · Sep 12, 2024 · 3fc4cf0
2 parents f2b6bba + 76486a4
commit 3fc4cf0
Showing 71 changed files with 3,076 additions and 647 deletions.
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
@@ -2,7 +2,7 @@
 _Describe what this change achieves._
 
 ### Issues Resolved
-_List any issues this PR will resolve, e.g. Closes [...]._
+Closes #[_insert issue number_]
 
 ### Version
 _List the OpenSearch version to which this PR applies, e.g. 2.14, 2.12--2.14, or all._

diff --git a/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt b/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt
@@ -80,6 +80,7 @@ Levenshtein
 [Oo]versamples?
 [Oo]nboarding
 pebibyte
+p\d{2}
 [Pp]erformant
 [Pp]laintext
 [Pp]luggable
@@ -101,6 +102,7 @@ pebibyte
 [Rr]eenable
 [Rr]eindex
 [Rr]eingest
+[Rr]eprovision(ed|ing)?
 [Rr]erank(er|ed|ing)?
 [Rr]epo
 [Rr]ewriter
@@ -126,6 +128,7 @@ stdout
 [Ss]ubvector
 [Ss]ubwords?
 [Ss]uperset
+[Ss]uperadmins?
 [Ss]yslog
 tebibyte
 [Tt]emplated
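
Reviewer note: the three entries added to `accept.txt` are regular expressions. The sketch below checks which words they would accept. It assumes each entry must match a whole word and uses Python's `re` module as a stand-in for Vale's own regex engine, so treat it as an approximation and not as Vale's actual matching logic.

```python
import re

# The three patterns added to accept.txt in this commit.
NEW_PATTERNS = [
    r"p\d{2}",                   # percentile shorthand such as p50, p99
    r"[Rr]eprovision(ed|ing)?",  # reprovision, Reprovisioned, reprovisioning
    r"[Ss]uperadmins?",          # superadmin, Superadmins
]

def accepted(word: str) -> bool:
    """Return True if any new pattern matches the entire word (assumed whole-word matching)."""
    return any(re.fullmatch(pattern, word) for pattern in NEW_PATTERNS)

for candidate in ["p99", "p9", "p100", "Reprovisioning", "superadmins", "superadministrator"]:
    print(candidate, accepted(candidate))
```

Note that `p\d{2}` covers only two-digit percentiles, so a term such as `p100` or `p99.9` would still be flagged under the whole-word assumption.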

diff --git a/_about/index.md b/_about/index.md
@@ -22,16 +22,21 @@ This section contains documentation for OpenSearch and OpenSearch Dashboards.
 
 ## Getting started
 
-- [Intro to OpenSearch]({{site.url}}{{site.baseurl}}/intro/)
-- [Quickstart]({{site.url}}{{site.baseurl}}/quickstart/)
+To get started, explore the following documentation:
+
+- [Getting started guide]({{site.url}}{{site.baseurl}}/getting-started/): 
+  - [Intro to OpenSearch]({{site.url}}{{site.baseurl}}/getting-started/intro/)
+  - [Installation quickstart]({{site.url}}{{site.baseurl}}/getting-started/quickstart/)
+  - [Communicate with OpenSearch]({{site.url}}{{site.baseurl}}/getting-started/communicate/)
+  - [Ingest data]({{site.url}}{{site.baseurl}}/getting-started/ingest-data/)
+  - [Search data]({{site.url}}{{site.baseurl}}/getting-started/search-data/)
+  - [Getting started with OpenSearch security]({{site.url}}{{site.baseurl}}/getting-started/security/)
 - [Install OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/install-opensearch/index/)
 - [Install OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/install-and-configure/install-dashboards/index/)
-- [See the FAQ](https://opensearch.org/faq)
+- [FAQ](https://opensearch.org/faq)
 
 ## Why use OpenSearch?
 
-With OpenSearch, you can perform the following use cases:
-
 <table style="table-layout: auto ; width: 100%;">
 <tbody>
 <tr style="text-align: center; vertical-align:center;">
@@ -41,35 +46,38 @@ With OpenSearch, you can perform the following use cases:
 <td><img src="{{site.url}}{{site.baseurl}}/images/4_tracking.png" class="no-border" alt="Operational health tracking" height="100"/></td>
 </tr>
 <tr style="text-align: left; vertical-align:top; font-weight: bold; color: rgb(0,59,92)">
-<td>Fast, Scalable Full-text Search</td>
-<td>Application and Infrastructure Monitoring</td>
-<td>Security and Event Information Management</td>
-<td>Operational Health Tracking</td>
+<td>Fast, scalable full-text search</td>
+<td>Application and infrastructure monitoring</td>
+<td>Security and event information management</td>
+<td>Operational health tracking</td>
 </tr>
 <tr style="text-align: left; vertical-align:top;">
 <td>Help users find the right information within your application, website, or data lake catalog. </td>
-<td>Easily store and analyze log data, and set automated alerts for underperformance.</td>
+<td>Easily store and analyze log data, and set automated alerts for performance issues.</td>
 <td>Centralize logs to enable real-time security monitoring and forensic analysis.</td>
-<td>Use observability logs, metrics, and traces to monitor your applications and business in real time.</td>
+<td>Use observability logs, metrics, and traces to monitor your applications in real time.</td>
 </tr>
 </tbody>
 </table>
 
-**Additional features and plugins:**
+## Key features
+
+OpenSearch provides several features to help index, secure, monitor, and analyze your data:
 
-OpenSearch has several features and plugins to help index, secure, monitor, and analyze your data. Most OpenSearch plugins have corresponding OpenSearch Dashboards plugins that provide a convenient, unified user interface.
-- [Anomaly detection]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/) - Identify atypical data and receive automatic notifications
-- [KNN]({{site.url}}{{site.baseurl}}/search-plugins/knn/) - Find “nearest neighbors” in your vector data
-- [Performance Analyzer]({{site.url}}{{site.baseurl}}/monitoring-plugins/pa/) - Monitor and optimize your cluster
-- [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/) - Use SQL or a piped processing language to query your data
-- [Index State Management]({{site.url}}{{site.baseurl}}/im-plugin/) - Automate index operations
-- [ML Commons plugin]({{site.url}}{{site.baseurl}}/ml-commons-plugin/index/) - Train and execute machine-learning models
-- [Asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/) - Run search requests in the background
-- [Cross-cluster replication]({{site.url}}{{site.baseurl}}/replication-plugin/index/) - Replicate your data across multiple OpenSearch clusters
+- [Anomaly detection]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/) -- Identify atypical data and receive automatic notifications.
+- [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/) -- Use SQL or Piped Processing Language (PPL) to query your data.
+- [Index State Management]({{site.url}}{{site.baseurl}}/im-plugin/) -- Automate index operations.
+- [Search methods]({{site.url}}{{site.baseurl}}/search-plugins/knn/) -- From traditional lexical search to advanced vector and hybrid search, discover the optimal search method for your use case.
+- [Machine learning]({{site.url}}{{site.baseurl}}/ml-commons-plugin/index/) -- Integrate machine learning models into your workloads.
+- [Workflow automation]({{site.url}}{{site.baseurl}}/automating-configurations/index/) -- Automate complex OpenSearch setup and preprocessing tasks.
+- [Performance evaluation]({{site.url}}{{site.baseurl}}/monitoring-plugins/pa/) -- Monitor and optimize your cluster.
+- [Asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/) -- Run search requests in the background.
+- [Cross-cluster replication]({{site.url}}{{site.baseurl}}/replication-plugin/index/) -- Replicate your data across multiple OpenSearch clusters.
 
 
 ## The secure path forward
-OpenSearch includes a demo configuration so that you can get up and running quickly, but before using OpenSearch in a production environment, you must [configure the Security plugin manually]({{site.url}}{{site.baseurl}}/security/configuration/index/) with your own certificates, authentication method, users, and passwords.
+
+OpenSearch includes a demo configuration so that you can get up and running quickly, but before using OpenSearch in a production environment, you must [configure the Security plugin manually]({{site.url}}{{site.baseurl}}/security/configuration/index/) with your own certificates, authentication method, users, and passwords. To get started, see [Getting started with OpenSearch security]({{site.url}}{{site.baseurl}}/getting-started/security/).
 
 ## Looking for the Javadoc?
 

diff --git a/_about/version-history.md b/_about/version-history.md
@@ -31,6 +31,7 @@ OpenSearch version | Release highlights | Release date
 [2.0.1](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.0.1.md) | Includes bug fixes and maintenance updates for Alerting and Anomaly Detection. | 16 June 2022
 [2.0.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.0.0.md) | Includes document-level monitors for alerting, OpenSearch Notifications plugins, and Geo Map Tiles in OpenSearch Dashboards. Also adds support for Lucene 9 and bug fixes for all OpenSearch plugins. For a full list of release highlights, see the Release Notes. | 26 May 2022
 [2.0.0-rc1](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.0.0-rc1.md) | The Release Candidate for 2.0.0. This version allows you to preview the upcoming 2.0.0 release before the GA release. The preview release adds document-level alerting, support for Lucene 9, and the ability to use term lookup queries in document level security. | 03 May 2022
+[1.3.19](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.19.md) | Includes bug fixes and maintenance updates for OpenSearch security, OpenSearch Dashboards security, and anomaly detection. | 27 August 2024
 [1.3.18](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.18.md) | Includes maintenance updates for OpenSearch security. | 16 July 2024
 [1.3.17](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.17.md) | Includes maintenance updates for OpenSearch security and OpenSearch Dashboards security. | 06 June 2024
 [1.3.16](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.16.md) | Includes bug fixes and maintenance updates for OpenSearch security, index management, performance analyzer, and reporting. | 23 April 2024

diff --git a/_analyzers/token-filters/asciifolding.md b/_analyzers/token-filters/asciifolding.md
@@ -0,0 +1,135 @@
+---
+layout: default
+title: ASCII folding
+parent: Token filters
+nav_order: 20
+---
+
+# ASCII folding token filter
+
+The `asciifolding` token filter converts non-ASCII characters to their closest ASCII equivalents. For example, *é* becomes *e*, *ü* becomes *u*, and *ñ* becomes *n*. This process is known as *transliteration*.
+
+
+The `asciifolding` token filter offers a number of benefits:
+
+  - **Enhanced search flexibility**: Users often omit accents or special characters when entering queries. The `asciifolding` token filter ensures that such queries still return relevant results.
+  - **Normalization**: Standardizes the indexing process by ensuring that accented characters are consistently converted to their ASCII equivalents.
+  - **Internationalization**: Particularly useful for applications that support multiple languages and character sets.
+
+While the `asciifolding` token filter can simplify searches, it may also lead to the loss of specific information, particularly if the distinction between accented and non-accented characters in the dataset is significant.
+{: .warning}
+
+## Parameters
+
+You can configure the `asciifolding` token filter using the `preserve_original` parameter. Setting this parameter to `true` keeps both the original token and its ASCII-folded version in the token stream. This can be particularly useful when you want to match both the original (with accents) and the normalized (without accents) versions of a term in a search query. Default is `false`.
+
+## Example
+
+The following example request creates a new index named `example_index` and defines an analyzer with the `asciifolding` filter and `preserve_original` parameter set to `true`:
+
+```json
+PUT /example_index
+{
+  "settings": {
+    "analysis": {
+      "filter": {
+        "custom_ascii_folding": {
+          "type": "asciifolding",
+          "preserve_original": true
+        }
+      },
+      "analyzer": {
+        "custom_ascii_analyzer": {
+          "type": "custom",
+          "tokenizer": "standard",
+          "filter": [
+            "lowercase",
+            "custom_ascii_folding"
+          ]
+        }
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+## Generated tokens
+
+Use the following request to examine the tokens generated using the analyzer:
+
+```json
+POST /example_index/_analyze
+{
+  "analyzer": "custom_ascii_analyzer",
+  "text": "Résumé café naïve coördinate"
+}
+```
+{% include copy-curl.html %}
+
+The response contains the generated tokens:
+
+```json
+{
+  "tokens": [
+    {
+      "token": "resume",
+      "start_offset": 0,
+      "end_offset": 6,
+      "type": "<ALPHANUM>",
+      "position": 0
+    },
+    {
+      "token": "résumé",
+      "start_offset": 0,
+      "end_offset": 6,
+      "type": "<ALPHANUM>",
+      "position": 0
+    },
+    {
+      "token": "cafe",
+      "start_offset": 7,
+      "end_offset": 11,
+      "type": "<ALPHANUM>",
+      "position": 1
+    },
+    {
+      "token": "café",
+      "start_offset": 7,
+      "end_offset": 11,
+      "type": "<ALPHANUM>",
+      "position": 1
+    },
+    {
+      "token": "naive",
+      "start_offset": 12,
+      "end_offset": 17,
+      "type": "<ALPHANUM>",
+      "position": 2
+    },
+    {
+      "token": "naïve",
+      "start_offset": 12,
+      "end_offset": 17,
+      "type": "<ALPHANUM>",
+      "position": 2
+    },
+    {
+      "token": "coordinate",
+      "start_offset": 18,
+      "end_offset": 28,
+      "type": "<ALPHANUM>",
+      "position": 3
+    },
+    {
+      "token": "coördinate",
+      "start_offset": 18,
+      "end_offset": 28,
+      "type": "<ALPHANUM>",
+      "position": 3
+    }
+  ]
+}
+```
+
+
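
Reviewer note: the effect of `preserve_original` in the new page can be approximated outside OpenSearch. The sketch below is not Lucene's `ASCIIFoldingFilter`. It assumes whitespace tokenization, and it folds characters by Unicode decomposition (NFKD) followed by removal of combining marks, which covers the accented Latin letters in the example but not every mapping the real filter handles. The function names are hypothetical.

```python
import unicodedata

def ascii_fold(token: str) -> str:
    """Approximate folding: decompose characters, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def fold_tokens(text: str, preserve_original: bool = False) -> list[tuple[str, int]]:
    """Return (token, position) pairs, imitating a lowercase + asciifolding chain."""
    result = []
    for position, raw in enumerate(text.split()):
        token = raw.lower()
        folded = ascii_fold(token)
        result.append((folded, position))
        # With preserve_original, the unfolded token is kept at the same position.
        if preserve_original and folded != token:
            result.append((token, position))
    return result

tokens = fold_tokens("Résumé café naïve coördinate", preserve_original=True)
for token, position in tokens:
    print(position, token)
```

As in the documented response, each folded token and its original share a position, so a query for either form can match the same place in the text.
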
diff --git a/_analyzers/token-filters/index.md b/_analyzers/token-filters/index.md
@@ -14,7 +14,7 @@ The following table lists all token filters that OpenSearch supports.
 
 Token filter | Underlying Lucene token filter|  Description
 [`apostrophe`]({{site.url}}{{site.baseurl}}/analyzers/token-filters/apostrophe/) | [ApostropheFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/tr/ApostropheFilter.html) | In each token containing an apostrophe, the `apostrophe` token filter removes the apostrophe itself and all characters following it. 
-`asciifolding` | [ASCIIFoldingFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html) | Converts alphabetic, numeric, and symbolic characters.
+[`asciifolding`]({{site.url}}{{site.baseurl}}/analyzers/token-filters/asciifolding/) | [ASCIIFoldingFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html) | Converts alphabetic, numeric, and symbolic characters.
 `cjk_bigram` | [CJKBigramFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/cjk/CJKBigramFilter.html) | Forms bigrams of Chinese, Japanese, and Korean (CJK) tokens. 
 `cjk_width` | [CJKWidthFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/cjk/CJKWidthFilter.html) | Normalizes Chinese, Japanese, and Korean (CJK) tokens according to the following rules: <br> - Folds full-width ASCII character variants into the equivalent basic Latin characters. <br> - Folds half-width Katakana character variants into the equivalent Kana characters.
 `classic` | [ClassicFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/classic/ClassicFilter.html) | Performs optional post-processing on the tokens generated by the classic tokenizer. Removes possessives (`'s`) and removes `.` from acronyms.

diff --git a/_api-reference/cat/cat-shards.md b/_api-reference/cat/cat-shards.md
@@ -33,6 +33,7 @@ Parameter | Type | Description
 bytes | Byte size | Specify the units for byte size. For example, `7kb` or `6gb`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/).
 local | Boolean | Whether to return information from the local node only instead of from the cluster manager node. Default is `false`.
 cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster manager node. Default is 30 seconds.
+cancel_after_time_interval | Time | The amount of time after which the shard request will be canceled. Default is `-1`.
 time | Time | Specify the units for time. For example, `5d` or `7h`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/).
 
 ## Example requests

diff --git a/_api-reference/snapshots/create-repository.md b/_api-reference/snapshots/create-repository.md
@@ -38,11 +38,20 @@ Request parameters depend on the type of repository: `fs` or `s3`.
 
 ### Common parameters
 
-The following table lists parameters that can be used with both the `fs` and `s3` repositories. 
+The following table lists parameters that can be used with both the `fs` and `s3` repositories.
 
 Request field | Description
 :--- | :---
 `prefix_mode_verification` | When enabled, adds a hashed value of a random seed to the prefix for repository verification. For remote-store-enabled clusters, you can add the `setting.prefix_mode_verification` setting to the node attributes for the supplied repository. This field works with both new and existing repositories. Optional.
+`shard_path_type` | Controls the path structure of shard-level blobs. Supported values are `FIXED`, `HASHED_PREFIX`, and `HASHED_INFIX`. For more information about each value, see [shard_path_type values](#shard_path_type-values). Default is `FIXED`. Optional.
+
+#### shard_path_type values
+
+The following values are supported in the `shard_path_type` setting:
+
+- `FIXED`: Keeps the path structure in the existing hierarchical manner, such as `<ROOT>/<BASE_PATH>/indices/<index-id>/0/<SHARD_BLOBS>`.
+- `HASHED_PREFIX`: Prepends a hashed prefix at the start of the path for each unique shard ID, for example, `<ROOT>/<HASH-OF-INDEX-ID-AND-SHARD-ID>/<BASE_PATH>/indices/<index-id>/0/<SHARD_BLOBS>`.
+- `HASHED_INFIX`: Inserts a hashed infix after the base path for each unique shard ID, for example, `<ROOT>/<BASE-PATH>/<HASH-OF-INDEX-ID-AND-SHARD-ID>/indices/<index-id>/0/<SHARD_BLOBS>`. The hash method used is `FNV_1A_COMPOSITE_1`, which uses the `FNV1a` hash function and generates a custom-encoded 64-bit hash value that scales well with most remote store options. `FNV1a` takes the most significant 6 bits to create a URL-safe Base64 character and the next 14 bits to create a binary string.
 
 ### fs repository
 
@@ -54,6 +63,7 @@ Request field | Description
 `max_restore_bytes_per_sec` | The maximum rate at which snapshots restore. Default is 40 MB per second (`40m`). Optional.
 `max_snapshot_bytes_per_sec` | The maximum rate at which snapshots take. Default is 40 MB per second (`40m`). Optional.
 `remote_store_index_shallow_copy` | Boolean | Determines whether the snapshot of the remote store indexes are captured as a shallow copy. Default is `false`.
+`shallow_snapshot_v2` | Boolean | Determines whether the snapshots of the remote store indexes are captured as a [shallow copy v2]({{site.url}}{{site.baseurl}}/tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability/#shallow-snapshot-v2). Default is `false`.
 `readonly` | Whether the repository is read-only. Useful when migrating from one cluster (`"readonly": false` when registering) to another cluster (`"readonly": true` when registering). Optional.
 
 
@@ -73,6 +83,7 @@ Request field | Description
 `max_snapshot_bytes_per_sec` | The maximum rate at which snapshots take. Default is 40 MB per second (`40m`). Optional.
 `readonly` | Whether the repository is read-only. Useful when migrating from one cluster (`"readonly": false` when registering) to another cluster (`"readonly": true` when registering). Optional.
 `remote_store_index_shallow_copy` | Boolean | Whether the snapshot of the remote store indexes is captured as a shallow copy. Default is `false`.
+`shallow_snapshot_v2` | Boolean | Determines whether the snapshots of the remote store indexes are captured as a [shallow copy v2]({{site.url}}{{site.baseurl}}/tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability/#shallow-snapshot-v2). Default is `false`.
 `server_side_encryption` | Whether to encrypt snapshot files in the S3 bucket. This setting uses AES-256 with S3-managed keys. See [Protecting data using server-side encryption](https://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html). Default is `false`. Optional.
 `storage_class` | Specifies the [S3 storage class](https://docs.aws.amazon.com/AmazonS3/latest/dev/storage-class-intro.html) for the snapshots files. Default is `standard`. Do not use the `glacier` and `deep_archive` storage classes. Optional.
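
Reviewer note: the three `shard_path_type` layouts described in this file can be sketched as follows. This is an illustration built only from the description in the diff, not OpenSearch's implementation. The string fed to the hash (`index_id/shard_id`), the function names, and the exact encoding of the 20 leading bits are assumptions. Only the 64-bit FNV-1a constants are standard.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
URL_SAFE_BASE64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"

def fnv1a_64(data: bytes) -> int:
    """Standard 64-bit FNV-1a: XOR each byte in, then multiply by the prime."""
    value = FNV64_OFFSET_BASIS
    for byte in data:
        value ^= byte
        value = (value * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return value

def composite_hash(index_id: str, shard_id: int) -> str:
    """Assumed encoding: top 6 bits as one Base64 character, next 14 bits as a binary string."""
    value = fnv1a_64(f"{index_id}/{shard_id}".encode("utf-8"))  # hash input is an assumption
    first_char = URL_SAFE_BASE64[value >> 58]
    next_14_bits = (value >> 44) & 0x3FFF
    return first_char + format(next_14_bits, "014b")

def shard_path(path_type: str, root: str, base_path: str, index_id: str, shard_id: int) -> str:
    """Build the blob path prefix for a shard under each documented layout."""
    tail = f"indices/{index_id}/{shard_id}"
    if path_type == "FIXED":
        return f"{root}/{base_path}/{tail}"
    hashed = composite_hash(index_id, shard_id)
    if path_type == "HASHED_PREFIX":
        return f"{root}/{hashed}/{base_path}/{tail}"
    if path_type == "HASHED_INFIX":
        return f"{root}/{base_path}/{hashed}/{tail}"
    raise ValueError(f"unsupported shard_path_type: {path_type}")

for layout in ("FIXED", "HASHED_PREFIX", "HASHED_INFIX"):
    print(layout, shard_path(layout, "root", "base", "my-index-id", 0))
```

The only difference between the two hashed layouts is where the hash segment sits relative to the base path. Placing it first spreads shards across the top of the key space, while the infix keeps everything under one base path.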
 

diff --git a/_api-reference/snapshots/create-snapshot.md b/_api-reference/snapshots/create-snapshot.md
@@ -144,4 +144,5 @@ The snapshot definition is returned.
 | failures | array | Failures, if any, that occurred during snapshot creation. |
 | shards | object | Total number of shards created along with number of successful and failed shards. |
 | state | string | Snapshot status. Possible values: `IN_PROGRESS`, `SUCCESS`, `FAILED`, `PARTIAL`. |
-| remote_store_index_shallow_copy | Boolean | Whether the snapshot of the remote store indexes is captured as a shallow copy. Default is `false`. |
+| remote_store_index_shallow_copy | Boolean | Whether the snapshots of the remote store indexes are captured as a shallow copy. Default is `false`. |
+| pinned_timestamp | long      | A timestamp (in milliseconds) pinned by the snapshot for the implicit locking of remote store files referenced by the snapshot. |