From cf5f3a6edc046b1b9cd8641dbcbecb7e9d7eaefb Mon Sep 17 00:00:00 2001
From: Landon Lengyel
Date: Fri, 5 Jul 2024 07:36:30 -0600
Subject: [PATCH 001/154] Update cat-nodes.md (#7626)

The 'local' option is deprecated and no longer has any purpose. See Issue #7625

Signed-off-by: Landon Lengyel
---
 _api-reference/cat/cat-nodes.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/_api-reference/cat/cat-nodes.md b/_api-reference/cat/cat-nodes.md
index 149e590536..6f68204710 100644
--- a/_api-reference/cat/cat-nodes.md
+++ b/_api-reference/cat/cat-nodes.md
@@ -39,7 +39,6 @@ Parameter | Type | Description
:--- | :--- | :---
bytes | Byte size | Specify the units for byte size. For example, `7kb` or `6gb`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/).
full_id | Boolean | If true, return the full node ID. If false, return the shortened node ID. Defaults to false.
-local | Boolean | Whether to return information from the local node only instead of from the cluster_manager node. Default is false.
cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster manager node. Default is 30 seconds.
time | Time | Specify the units for time. For example, `5d` or `7h`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/).
include_unloaded_segments | Boolean | Whether to include information from segments not loaded into memory. Default is false.

From 674d50bcc16a4e3fbd31e2d65c03e0aea949237e Mon Sep 17 00:00:00 2001
From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Date: Mon, 8 Jul 2024 09:32:21 -0500
Subject: [PATCH 002/154] Fix Key value table (#7636)

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
---
 .../configuration/processors/key-value.md | 62 ++++++++-----------
 1 file changed, 27 insertions(+), 35 deletions(-)

diff --git a/_data-prepper/pipelines/configuration/processors/key-value.md b/_data-prepper/pipelines/configuration/processors/key-value.md
index aedc1f8822..52ecc7719c 100644
--- a/_data-prepper/pipelines/configuration/processors/key-value.md
+++ b/_data-prepper/pipelines/configuration/processors/key-value.md
@@ -11,40 +11,32 @@ nav_order: 56

You can use the `key_value` processor to parse the specified field into key-value pairs. You can customize the `key_value` processor to parse field information with the following options. The type for each of the following options is `string`.

| Option | Description | Example |
| :--- | :--- | :--- |
| source | The message field to be parsed. Optional. Default value is `message`. | If `source` is `"message1"`, `{"message1": {"key1=value1"}, "message2": {"key2=value2"}}` parses into `{"message1": {"key1=value1"}, "message2": {"key2=value2"}, "parsed_message": {"key1": "value1"}}`. |
| destination | The destination field for the parsed source. The parsed source overwrites the preexisting data for that key. Optional. If `destination` is set to `null`, the parsed fields will be written to the root of the event. Default value is `parsed_message`. | If `destination` is `"parsed_data"`, `{"message": {"key1=value1"}}` parses into `{"message": {"key1=value1"}, "parsed_data": {"key1": "value1"}}`. |
| field_delimiter_regex | A regular expression specifying the delimiter that separates key-value pairs. Special regular expression characters such as `[` and `]` must be escaped with `\\`. Cannot be defined at the same time as `field_split_characters`. Optional. 
If this option is not defined, `field_split_characters` is used. | If `field_delimiter_regex` is `"&\\{2\\}"`, `{"key1=value1&&key2=value2"}` parses into `{"key1": "value1", "key2": "value2"}`. | -| field_split_characters | A string of characters specifying the delimeter that separates key-value pairs. Special regular expression characters such as `[` and `]` must be escaped with `\\`. Cannot be defined at the same time as `field_delimiter_regex`. Optional. Default value is `&`. | If `field_split_characters` is `"&&"`, `{"key1=value1&&key2=value2"}` parses into `{"key1": "value1", "key2": "value2"}`. | -| key_value_delimiter_regex | A regular expression specifying the delimiter that separates the key and value within a key-value pair. Special regular expression characters such as `[` and `]` must be escaped with `\\`. This option cannot be defined at the same time as `value_split_characters`. Optional. If this option is not defined, `value_split_characters` is used. | If `key_value_delimiter_regex` is `"=\\{2\\}"`, `{"key1==value1"}` parses into `{"key1": "value1"}`. | -| value_split_characters | A string of characters specifying the delimiter that separates the key and value within a key-value pair. Special regular expression characters such as `[` and `]` must be escaped with `\\`. Cannot be defined at the same time as `key_value_delimiter_regex`. Optional. Default value is `=`. | If `value_split_characters` is `"=="`, `{"key1==value1"}` parses into `{"key1": "value1"}`. | -| non_match_value | When a key-value pair cannot be successfully split, the key-value pair is placed in the `key` field, and the specified value is placed in the `value` field. Optional. Default value is `null`. | `key1value1&key2=value2` parses into `{"key1value1": null, "key2": "value2"}`. | -| prefix | A prefix to append before all keys. Optional. Default value is an empty string. | If `prefix` is `"custom"`, `{"key1=value1"}` parses into `{"customkey1": "value1"}`.| -| delete_key_regex | A regular expression specifying the characters to delete from the key. Special regular expression characters such as `[` and `]` must be escaped with `\\`. Cannot be an empty string. Optional. No default value. | If `delete_key_regex` is `"\s"`, `{"key1 =value1"}` parses into `{"key1": "value1"}`. | -| delete_value_regex | A regular expression specifying the characters to delete from the value. Special regular expression characters such as `[` and `]` must be escaped with `\\`. Cannot be an empty string. Optional. No default value. | If `delete_value_regex` is `"\s"`, `{"key1=value1 "}` parses into `{"key1": "value1"}`. | -| include_keys | An array specifying the keys that should be added for parsing. By default, all keys will be added. | If `include_keys` is `["key2"]`,`key1=value1&key2=value2` will parse into `{"key2": "value2"}`. | -| exclude_keys | An array specifying the parsed keys that should not be added to the event. By default, no keys will be excluded. | If `exclude_keys` is `["key2"]`, `key1=value1&key2=value2` will parse into `{"key1": "value1"}`. | -| default_values | A map specifying the default keys and their values that should be added to the event in case these keys do not exist in the source field being parsed. If the default key already exists in the message, the value is not changed. The `include_keys` filter will be applied to the message before `default_values`. | If `default_values` is `{"defaultkey": "defaultvalue"}`, `key1=value1` will parse into `{"key1": "value1", "defaultkey": "defaultvalue"}`.
If `default_values` is `{"key1": "abc"}`, `key1=value1` will parse into `{"key1": "value1"}`.
If `include_keys` is `["key1"]` and `default_values` is `{"key2": "value2"}`, `key1=value1&key2=abc` will parse into `{"key1": "value1", "key2": "value2"}`. | -| transform_key | When to lowercase, uppercase, or capitalize keys. | If `transform_key` is `lowercase`, `{"Key1=value1"}` will parse into `{"key1": "value1"}`.
If `transform_key` is `uppercase`, `{"key1=value1"}` will parse into `{"KEY1": "value1"}`.
If `transform_key` is `capitalize`, `{"key1=value1"}` will parse into `{"Key1": "value1"}`. | -| whitespace | Specifies whether to be lenient or strict with the acceptance of unnecessary white space surrounding the configured value-split sequence. Default is `lenient`. | If `whitespace` is `"lenient"`, `{"key1 = value1"}` will parse into `{"key1 ": " value1"}`. If `whitespace` is `"strict"`, `{"key1 = value1"}` will parse into `{"key1": "value1"}`. | -| skip_duplicate_values | A Boolean option for removing duplicate key-value pairs. When set to `true`, only one unique key-value pair will be preserved. Default is `false`. | If `skip_duplicate_values` is `false`, `{"key1=value1&key1=value1"}` will parse into `{"key1": ["value1", "value1"]}`. If `skip_duplicate_values` is `true`, `{"key1=value1&key1=value1"}` will parse into `{"key1": "value1"}`. | -| remove_brackets | Specifies whether to treat square brackets, angle brackets, and parentheses as value "wrappers" that should be removed from the value. Default is `false`. | If `remove_brackets` is `true`, `{"key1=(value1)"}` will parse into `{"key1": value1}`. If `remove_brackets` is `false`, `{"key1=(value1)"}` will parse into `{"key1": "(value1)"}`. | -| recursive | Specifies whether to recursively obtain additional key-value pairs from values. The extra key-value pairs will be stored as sub-keys of the root key. Default is `false`. The levels of recursive parsing must be defined by different brackets for each level: `[]`, `()`, and `<>`, in this order. Any other configurations specified will only be applied to the outmost keys.
When `recursive` is `true`:
`remove_brackets` cannot also be `true`;
`skip_duplicate_values` will always be `true`;
`whitespace` will always be `"strict"`. | If `recursive` is true, `{"item1=[item1-subitem1=item1-subitem1-value&item1-subitem2=(item1-subitem2-subitem2A=item1-subitem2-subitem2A-value&item1-subitem2-subitem2B=item1-subitem2-subitem2B-value)]&item2=item2-value"}` will parse into `{"item1": {"item1-subitem1": "item1-subitem1-value", "item1-subitem2" {"item1-subitem2-subitem2A": "item1-subitem2-subitem2A-value", "item1-subitem2-subitem2B": "item1-subitem2-subitem2B-value"}}}`. | -| overwrite_if_destination_exists | Specifies whether to overwrite existing fields if there are key conflicts when writing parsed fields to the event. Default is `true`. | If `overwrite_if_destination_exists` is `true` and destination is `null`, `{"key1": "old_value", "message": "key1=new_value"}` will parse into `{"key1": "new_value", "message": "key1=new_value"}`. | -| tags_on_failure | When a `kv` operation causes a runtime exception within the processor, the operation is safely stopped without crashing the processor, and the event is tagged with the provided tags. | If `tags_on_failure` is set to `["keyvalueprocessor_failure"]`, `{"tags": ["keyvalueprocessor_failure"]}` will be added to the event's metadata in the event of a runtime exception. | -| value_grouping | Specifies whether to group values using predefined value grouping delimiters: `{...}`, `[...]', `<...>`, `(...)`, `"..."`, `'...'`, `http://... (space)`, and `https:// (space)`. If this flag is enabled, then the content between the delimiters is considered to be one entity and is not parsed for key-value pairs. Default is `false`. If `value_grouping` is `true`, then `{"key1=[a=b,c=d]&key2=value2"}` parses to `{"key1": "[a=b,c=d]", "key2": "value2"}`. | -| drop_keys_with_no_value | Specifies whether keys should be dropped if they have a null value. Default is `false`. If `drop_keys_with_no_value` is set to `true`, then `{"key1=value1&key2"}` parses to `{"key1": "value1"}`. | -| strict_grouping | Specifies whether strict grouping should be enabled when the `value_grouping` or `string_literal_character` options are used. Default is `false`. | When enabled, groups with unmatched end characters yield errors. The event is ignored after the errors are logged. | -| string_literal_character | Can be set to either a single quotation mark (`'`) or a double quotation mark (`"`). Default is `null`. | When this option is used, any text contained within the specified quotation mark character will be ignored and excluded from key-value parsing. For example, `text1 "key1=value1" text2 key2=value2` would parse to `{"key2": "value2"}`. | -| key_value_when | Allows you to specify a [conditional expression](https://opensearch.org/docs/latest/data-prepper/pipelines/expression-syntax/), such as `/some-key == "test"`, that will be evaluated to determine whether the processor should be applied to the event. | +Option | Description | Example +:--- | :--- | :--- +`source` | The message field to be parsed. Optional. Default value is `message`. | If `source` is `"message1"`, `{"message1": {"key1=value1"}, "message2": {"key2=value2"}}` parses into `{"message1": {"key1=value1"}, "message2": {"key2=value2"}, "parsed_message": {"key1": "value1"}}`. +destination | The destination field for the parsed source. The parsed source overwrites the preexisting data for that key. Optional. If `destination` is set to `null`, the parsed fields will be written to the root of the event. Default value is `parsed_message`. 
| If `destination` is `"parsed_data"`, `{"message": {"key1=value1"}}` parses into `{"message": {"key1=value1"}, "parsed_data": {"key1": "value1"}}`.
`field_delimiter_regex` | A regular expression specifying the delimiter that separates key-value pairs. Special regular expression characters such as `[` and `]` must be escaped with `\\`. Cannot be defined at the same time as `field_split_characters`. Optional. If this option is not defined, `field_split_characters` is used. | If `field_delimiter_regex` is `"&\\{2\\}"`, `{"key1=value1&&key2=value2"}` parses into `{"key1": "value1", "key2": "value2"}`.
`field_split_characters` | A string of characters specifying the delimiter that separates key-value pairs. Special regular expression characters such as `[` and `]` must be escaped with `\\`. Cannot be defined at the same time as `field_delimiter_regex`. Optional. Default value is `&`. | If `field_split_characters` is `"&&"`, `{"key1=value1&&key2=value2"}` parses into `{"key1": "value1", "key2": "value2"}`.
`key_value_delimiter_regex` | A regular expression specifying the delimiter that separates the key and value within a key-value pair. Special regular expression characters such as `[` and `]` must be escaped with `\\`. This option cannot be defined at the same time as `value_split_characters`. Optional. If this option is not defined, `value_split_characters` is used. | If `key_value_delimiter_regex` is `"=\\{2\\}"`, `{"key1==value1"}` parses into `{"key1": "value1"}`.
`value_split_characters` | A string of characters specifying the delimiter that separates the key and value within a key-value pair. Special regular expression characters such as `[` and `]` must be escaped with `\\`. Cannot be defined at the same time as `key_value_delimiter_regex`. Optional. Default value is `=`. | If `value_split_characters` is `"=="`, `{"key1==value1"}` parses into `{"key1": "value1"}`.
`non_match_value` | When a key-value pair cannot be successfully split, the key-value pair is placed in the `key` field, and the specified value is placed in the `value` field. Optional. Default value is `null`. | `key1value1&key2=value2` parses into `{"key1value1": null, "key2": "value2"}`.
`prefix` | A prefix to append before all keys. Optional. Default value is an empty string. | If `prefix` is `"custom"`, `{"key1=value1"}` parses into `{"customkey1": "value1"}`.
`delete_key_regex` | A regular expression specifying the characters to delete from the key. Special regular expression characters such as `[` and `]` must be escaped with `\\`. Cannot be an empty string. Optional. No default value. | If `delete_key_regex` is `"\s"`, `{"key1 =value1"}` parses into `{"key1": "value1"}`.
`delete_value_regex` | A regular expression specifying the characters to delete from the value. Special regular expression characters such as `[` and `]` must be escaped with `\\`. Cannot be an empty string. Optional. No default value. | If `delete_value_regex` is `"\s"`, `{"key1=value1 "}` parses into `{"key1": "value1"}`.
`include_keys` | An array specifying the keys that should be added for parsing. By default, all keys will be added. | If `include_keys` is `["key2"]`, `key1=value1&key2=value2` will parse into `{"key2": "value2"}`.
`exclude_keys` | An array specifying the parsed keys that should not be added to the event. By default, no keys will be excluded. | If `exclude_keys` is `["key2"]`, `key1=value1&key2=value2` will parse into `{"key1": "value1"}`.
+`default_values` | A map specifying the default keys and their values that should be added to the event in case these keys do not exist in the source field being parsed. If the default key already exists in the message, the value is not changed. The `include_keys` filter will be applied to the message before `default_values`. | If `default_values` is `{"defaultkey": "defaultvalue"}`, `key1=value1` will parse into `{"key1": "value1", "defaultkey": "defaultvalue"}`.
If `default_values` is `{"key1": "abc"}`, `key1=value1` will parse into `{"key1": "value1"}`.
If `include_keys` is `["key1"]` and `default_values` is `{"key2": "value2"}`, `key1=value1&key2=abc` will parse into `{"key1": "value1", "key2": "value2"}`. +`transform_key` | When to lowercase, uppercase, or capitalize keys. | If `transform_key` is `lowercase`, `{"Key1=value1"}` will parse into `{"key1": "value1"}`.
If `transform_key` is `uppercase`, `{"key1=value1"}` will parse into `{"KEY1": "value1"}`.
If `transform_key` is `capitalize`, `{"key1=value1"}` will parse into `{"Key1": "value1"}`.
`whitespace` | Specifies whether to be lenient or strict with the acceptance of unnecessary white space surrounding the configured value-split sequence. Default is `lenient`. | If `whitespace` is `"lenient"`, `{"key1 = value1"}` will parse into `{"key1 ": " value1"}`. If `whitespace` is `"strict"`, `{"key1 = value1"}` will parse into `{"key1": "value1"}`.
`skip_duplicate_values` | A Boolean option for removing duplicate key-value pairs. When set to `true`, only one unique key-value pair will be preserved. Default is `false`. | If `skip_duplicate_values` is `false`, `{"key1=value1&key1=value1"}` will parse into `{"key1": ["value1", "value1"]}`. If `skip_duplicate_values` is `true`, `{"key1=value1&key1=value1"}` will parse into `{"key1": "value1"}`.
`remove_brackets` | Specifies whether to treat square brackets, angle brackets, and parentheses as value "wrappers" that should be removed from the value. Default is `false`. | If `remove_brackets` is `true`, `{"key1=(value1)"}` will parse into `{"key1": "value1"}`. If `remove_brackets` is `false`, `{"key1=(value1)"}` will parse into `{"key1": "(value1)"}`.
`recursive` | Specifies whether to recursively obtain additional key-value pairs from values. The extra key-value pairs will be stored as sub-keys of the root key. Default is `false`. The levels of recursive parsing must be defined by different brackets for each level: `[]`, `()`, and `<>`, in this order. Any other configurations specified will only be applied to the outermost keys.
When `recursive` is `true`:
`remove_brackets` cannot also be `true`;
`skip_duplicate_values` will always be `true`;
`whitespace` will always be `"strict"`. | If `recursive` is true, `{"item1=[item1-subitem1=item1-subitem1-value&item1-subitem2=(item1-subitem2-subitem2A=item1-subitem2-subitem2A-value&item1-subitem2-subitem2B=item1-subitem2-subitem2B-value)]&item2=item2-value"}` will parse into `{"item1": {"item1-subitem1": "item1-subitem1-value", "item1-subitem2": {"item1-subitem2-subitem2A": "item1-subitem2-subitem2A-value", "item1-subitem2-subitem2B": "item1-subitem2-subitem2B-value"}}, "item2": "item2-value"}`.
`overwrite_if_destination_exists` | Specifies whether to overwrite existing fields if there are key conflicts when writing parsed fields to the event. Default is `true`. | If `overwrite_if_destination_exists` is `true` and destination is `null`, `{"key1": "old_value", "message": "key1=new_value"}` will parse into `{"key1": "new_value", "message": "key1=new_value"}`.
`tags_on_failure` | When a `kv` operation causes a runtime exception within the processor, the operation is safely stopped without crashing the processor, and the event is tagged with the provided tags. | If `tags_on_failure` is set to `["keyvalueprocessor_failure"]`, `{"tags": ["keyvalueprocessor_failure"]}` will be added to the event's metadata in the event of a runtime exception.
`value_grouping` | Specifies whether to group values using predefined value grouping delimiters: `{...}`, `[...]`, `<...>`, `(...)`, `"..."`, `'...'`, `http://... (space)`, and `https://... (space)`. If this flag is enabled, then the content between the delimiters is considered to be one entity and is not parsed for key-value pairs. Default is `false`. | If `value_grouping` is `true`, then `{"key1=[a=b,c=d]&key2=value2"}` parses to `{"key1": "[a=b,c=d]", "key2": "value2"}`.
`drop_keys_with_no_value` | Specifies whether keys should be dropped if they have a null value. Default is `false`. | If `drop_keys_with_no_value` is set to `true`, then `{"key1=value1&key2"}` parses to `{"key1": "value1"}`.
`strict_grouping` | Specifies whether strict grouping should be enabled when the `value_grouping` or `string_literal_character` options are used. Default is `false`. | When enabled, groups with unmatched end characters yield errors. The event is ignored after the errors are logged.
`string_literal_character` | Can be set to either a single quotation mark (`'`) or a double quotation mark (`"`). Default is `null`. | When this option is used, any text contained within the specified quotation mark character will be ignored and excluded from key-value parsing. For example, `text1 "key1=value1" text2 key2=value2` would parse to `{"key2": "value2"}`.
`key_value_when` | Allows you to specify a [conditional expression](https://opensearch.org/docs/latest/data-prepper/pipelines/expression-syntax/), such as `/some-key == "test"`, that will be evaluated to determine whether the processor should be applied to the event.
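In a pipeline definition, these options sit under the `key_value` processor entry. The following minimal sketch is illustrative rather than canonical: the pipeline name and the `http` source and `stdout` sink are assumptions, and the two split options shown simply make the defaults explicit:

```yaml
kv-pipeline:
  source:
    http:
  processor:
    - key_value:
        # Parse the default "message" field into key-value pairs.
        source: "message"
        # "&" separates pairs; "=" separates each key from its value.
        field_split_characters: "&"
        value_split_characters: "="
  sink:
    - stdout:
```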
- - From 1b11a237ad81119f7b50ee0c8170fd687baa555d Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Tue, 9 Jul 2024 09:26:32 -0400 Subject: [PATCH 003/154] Add reranking search results with MS Marco cross-encoder tutorial (#7634) * Add reranking search results with MS Marco cross-encoder tutorial Signed-off-by: Fanit Kolchina * Update _ml-commons-plugin/tutorials/reranking-cross-encoder.md Co-authored-by: Melissa Vagi Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Update _ml-commons-plugin/tutorials/reranking-cross-encoder.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Melissa Vagi Co-authored-by: Nathan Bower --- _ml-commons-plugin/tutorials/index.md | 1 + .../tutorials/reranking-cross-encoder.md | 391 ++++++++++++++++++ 2 files changed, 392 insertions(+) create mode 100644 _ml-commons-plugin/tutorials/reranking-cross-encoder.md diff --git a/_ml-commons-plugin/tutorials/index.md b/_ml-commons-plugin/tutorials/index.md index 4479d0878f..070da3cae1 100644 --- a/_ml-commons-plugin/tutorials/index.md +++ b/_ml-commons-plugin/tutorials/index.md @@ -19,6 +19,7 @@ Using the OpenSearch machine learning (ML) framework, you can build various appl - **Reranking search results**: - [Reranking search results using the Cohere Rerank model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/reranking-cohere/) + - [Reranking search results using the MS MARCO cross-encoder model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/reranking-cross-encoder/) - **Agents and tools**: - [Retrieval-augmented generation (RAG) chatbot]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/rag-chatbot/) diff --git a/_ml-commons-plugin/tutorials/reranking-cross-encoder.md b/_ml-commons-plugin/tutorials/reranking-cross-encoder.md new file mode 100644 index 0000000000..e46c7eb511 --- /dev/null +++ b/_ml-commons-plugin/tutorials/reranking-cross-encoder.md @@ -0,0 +1,391 @@ +--- +layout: default +title: Reranking with the MS MARCO cross-encoder +parent: Tutorials +nav_order: 35 +--- + +# Reranking search results using the MS MARCO cross-encoder model + +A [reranking pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/reranking-search-results/) can rerank search results, providing a relevance score for each document in the search results with respect to the search query. The relevance score is calculated by a cross-encoder model. + +This tutorial illustrates how to use the [Hugging Face `ms-marco-MiniLM-L-6-v2` model](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-6-v2) in a reranking pipeline. + +Replace the placeholders beginning with the prefix `your_` with your own values. +{: .note} + +## Prerequisite + +Before you start, deploy the model on Amazon SageMaker. For better performance, use a GPU. 
+ +Run the following code to deploy the model on [Amazon SageMaker](https://aws.amazon.com/pm/sagemaker): + +```python +import sagemaker +import boto3 +from sagemaker.huggingface import HuggingFaceModel +sess = sagemaker.Session() +role = sagemaker.get_execution_role() + +hub = { + 'HF_MODEL_ID':'cross-encoder/ms-marco-MiniLM-L-6-v2', + 'HF_TASK':'text-classification' +} +huggingface_model = HuggingFaceModel( + transformers_version='4.37.0', + pytorch_version='2.1.0', + py_version='py310', + env=hub, + role=role, +) +predictor = huggingface_model.deploy( + initial_instance_count=1, # number of instances + instance_type='ml.m5.xlarge' # ec2 instance type +) +``` +{% include copy.html %} + +Note the model inference endpoint; you'll use it to create a connector in the next step. + +## Step 1: Create a connector and register the model + +First, create a connector for the model, providing the inference endpoint and your AWS credentials: + +```json +POST /_plugins/_ml/connectors/_create +{ + "name": "Sagemaker cross-encoder model", + "description": "Test connector for Sagemaker cross-encoder model", + "version": 1, + "protocol": "aws_sigv4", + "credential": { + "access_key": "your_access_key", + "secret_key": "your_secret_key", + "session_token": "your_session_token" + }, + "parameters": { + "region": "your_sagemkaer_model_region_like_us-west-2", + "service_name": "sagemaker" + }, + "actions": [ + { + "action_type": "predict", + "method": "POST", + "url": "your_sagemaker_model_inference_endpoint_created_in_last_step", + "headers": { + "content-type": "application/json" + }, + "request_body": "{ \"inputs\": ${parameters.inputs} }", + "pre_process_function": "\n String escape(def input) { \n if (input.contains(\"\\\\\")) {\n input = input.replace(\"\\\\\", \"\\\\\\\\\");\n }\n if (input.contains(\"\\\"\")) {\n input = input.replace(\"\\\"\", \"\\\\\\\"\");\n }\n if (input.contains('\r')) {\n input = input = input.replace('\r', '\\\\r');\n }\n if (input.contains(\"\\\\t\")) {\n input = input.replace(\"\\\\t\", \"\\\\\\\\\\\\t\");\n }\n if (input.contains('\n')) {\n input = input.replace('\n', '\\\\n');\n }\n if (input.contains('\b')) {\n input = input.replace('\b', '\\\\b');\n }\n if (input.contains('\f')) {\n input = input.replace('\f', '\\\\f');\n }\n return input;\n }\n\n String query = params.query_text;\n StringBuilder builder = new StringBuilder('[');\n \n for (int i=0; i Date: Tue, 9 Jul 2024 23:20:31 +0800 Subject: [PATCH 004/154] Add fingerprint processor (#7631) * Add fingerprint processor Signed-off-by: gaobinlong * Completed doc review Signed-off-by: Melissa Vagi * Update _ingest-pipelines/processors/fingerprint.md Signed-off-by: Melissa Vagi * Update _ingest-pipelines/processors/fingerprint.md Signed-off-by: Melissa Vagi * Update nav order Signed-off-by: Melissa Vagi * Update _ingest-pipelines/processors/fingerprint.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _ingest-pipelines/processors/fingerprint.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _ingest-pipelines/processors/fingerprint.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _ingest-pipelines/processors/fingerprint.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _ingest-pipelines/processors/fingerprint.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _ingest-pipelines/processors/index-processors.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --------- Signed-off-by: gaobinlong Signed-off-by: Melissa Vagi 
Co-authored-by: Melissa Vagi
Co-authored-by: Nathan Bower
---
 _ingest-pipelines/processors/fingerprint.md | 158 ++++++++++++++++++
 .../processors/index-processors.md | 1 +
 2 files changed, 159 insertions(+)
 create mode 100644 _ingest-pipelines/processors/fingerprint.md

diff --git a/_ingest-pipelines/processors/fingerprint.md b/_ingest-pipelines/processors/fingerprint.md
new file mode 100644
index 0000000000..4775da98b6
--- /dev/null
+++ b/_ingest-pipelines/processors/fingerprint.md
@@ -0,0 +1,158 @@
---
layout: default
title: Fingerprint
parent: Ingest processors
nav_order: 105
---

# Fingerprint processor
Introduced 2.16
{: .label .label-purple }

The `fingerprint` processor is used to generate a hash value for either certain specified fields or all fields in a document. The hash value can be used to deduplicate documents within an index and collapse search results.

For each field, the field name, the length of the field value, and the field value itself are concatenated and separated by the pipe character `|`. For example, if the field `field1` has the value `value1` and the field `field2` has the value `value2`, then the concatenated string would be `|field1|6:value1|field2|6:value2|`. For object fields, the field name is flattened by joining the nested field names with a period `.`. For instance, if the object field is `root_field` with a sub-field `sub_field1` having the value `value1` and another sub-field `sub_field2` with the value `value2`, then the concatenated string would be `|root_field.sub_field1|6:value1|root_field.sub_field2|6:value2|`.

The following is the syntax for the `fingerprint` processor:

```json
{
  "fingerprint": {
    "fields": ["foo", "bar"],
    "target_field": "fingerprint",
    "hash_method": "SHA-1@2.16.0"
  }
}
```
{% include copy-curl.html %}

## Configuration parameters

The following table lists the required and optional parameters for the `fingerprint` processor.

Parameter | Required/Optional | Description |
|-----------|-----------|-----------|
`fields` | Optional | A list of fields used to generate a hash value. |
`exclude_fields` | Optional | Specifies the fields to be excluded from hash value generation. It is mutually exclusive with the `fields` parameter; if both `exclude_fields` and `fields` are empty or null, then all fields are included in the hash value calculation. |
`hash_method` | Optional | Specifies the hashing algorithm to be used, with options being `MD5@2.16.0`, `SHA-1@2.16.0`, `SHA-256@2.16.0`, or `SHA3-256@2.16.0`. Default is `SHA-1@2.16.0`. The version number is appended to ensure consistent hashing across OpenSearch versions, and new versions will support new hash methods. |
`target_field` | Optional | Specifies the name of the field in which the generated hash value will be stored. If not provided, then the hash value is stored in the `fingerprint` field by default. |
`ignore_missing` | Optional | Specifies whether the processor should exit quietly if one of the required fields is missing. Default is `false`. |
`description` | Optional | A brief description of the processor. |
`if` | Optional | A condition for running the processor. |
`ignore_failure` | Optional | If set to `true`, then failures are ignored. Default is `false`. |
`on_failure` | Optional | A list of processors to run if the processor fails. |
`tag` | Optional | An identifier tag for the processor. Useful for debugging in order to distinguish between processors of the same type. 
| + +## Using the processor + +Follow these steps to use the processor in a pipeline. + +**Step 1: Create a pipeline** + +The following query creates a pipeline named `fingerprint_pipeline` that uses the `fingerprint` processor to generate a hash value for specified fields in the document: + +```json +PUT /_ingest/pipeline/fingerprint_pipeline +{ + "description": "generate hash value for some specified fields the document", + "processors": [ + { + "fingerprint": { + "fields": ["foo", "bar"] + } + } + ] +} +``` +{% include copy-curl.html %} + +**Step 2 (Optional): Test the pipeline** + +It is recommended that you test your pipeline before ingesting documents. +{: .tip} + +To test the pipeline, run the following query: + +```json +POST _ingest/pipeline/fingerprint_pipeline/_simulate +{ + "docs": [ + { + "_index": "testindex1", + "_id": "1", + "_source": { + "foo": "foo", + "bar": "bar" + } + } + ] +} +``` +{% include copy-curl.html %} + +#### Response + +The following example response confirms that the pipeline is working as expected: + +```json +{ + "docs": [ + { + "doc": { + "_index": "testindex1", + "_id": "1", + "_source": { + "foo": "foo", + "bar": "bar", + "fingerprint": "SHA-1@2.16.0:fYeen7hTJ2zs9lpmUnk6nvH54sM=" + }, + "_ingest": { + "timestamp": "2024-03-11T02:17:22.329823Z" + } + } + } + ] +} +``` + +**Step 3: Ingest a document** + +The following query ingests a document into an index named `testindex1`: + +```json +PUT testindex1/_doc/1?pipeline=fingerprint_pipeline +{ + "foo": "foo", + "bar": "bar" +} +``` +{% include copy-curl.html %} + +#### Response + +The request indexes the document into the `testindex1` index: + +```json +{ + "_index": "testindex1", + "_id": "1", + "_version": 1, + "result": "created", + "_shards": { + "total": 2, + "successful": 1, + "failed": 0 + }, + "_seq_no": 0, + "_primary_term": 1 +} +``` + +**Step 4 (Optional): Retrieve the document** + +To retrieve the document, run the following query: + +```json +GET testindex1/_doc/1 +``` +{% include copy-curl.html %} diff --git a/_ingest-pipelines/processors/index-processors.md b/_ingest-pipelines/processors/index-processors.md index 4b229f0a61..0e1ee1e114 100644 --- a/_ingest-pipelines/processors/index-processors.md +++ b/_ingest-pipelines/processors/index-processors.md @@ -40,6 +40,7 @@ Processor type | Description `dot_expander` | Expands a field with dots into an object field. `drop` |Drops a document without indexing it or raising any errors. `fail` | Raises an exception and stops the execution of a pipeline. +`fingerprint` | Generates a hash value for either certain specified fields or all fields in a document. `foreach` | Allows for another processor to be applied to each element of an array or an object field in a document. `geoip` | Adds information about the geographical location of an IP address. `geojson-feature` | Indexes GeoJSON data into a geospatial field. From 46fecd2ac1a247b8fdc88535b2983b4c86dd337f Mon Sep 17 00:00:00 2001 From: "Daniel (dB.) Doubrovkine" Date: Tue, 9 Jul 2024 10:21:51 -0500 Subject: [PATCH 005/154] Removed incorrect ignore_malformed query parameter. 
(#7652) Signed-off-by: dblock --- _api-reference/index-apis/put-mapping.md | 1 - 1 file changed, 1 deletion(-) diff --git a/_api-reference/index-apis/put-mapping.md b/_api-reference/index-apis/put-mapping.md index 47c47fa125..f7d9321d33 100644 --- a/_api-reference/index-apis/put-mapping.md +++ b/_api-reference/index-apis/put-mapping.md @@ -76,7 +76,6 @@ Parameter | Data type | Description allow_no_indices | Boolean | Whether to ignore wildcards that don’t match any indexes. Default is `true`. expand_wildcards | String | Expands wildcard expressions to different indexes. Combine multiple values with commas. Available values are `all` (match all indexes), `open` (match open indexes), `closed` (match closed indexes), `hidden` (match hidden indexes), and `none` (do not accept wildcard expressions), which must be used with `open`, `closed`, or both. Default is `open`. ignore_unavailable | Boolean | If true, OpenSearch does not include missing or closed indexes in the response. -ignore_malformed | Boolean | Use this parameter with the `ip_range` data type to specify that OpenSearch should ignore malformed fields. If `true`, OpenSearch does not include entries that do not match the IP range specified in the index in the response. The default is `false`. cluster_manager_timeout | Time | How long to wait for a connection to the cluster manager node. Default is `30s`. timeout | Time | How long to wait for the response to return. Default is `30s`. write_index_only | Boolean | Whether OpenSearch should apply mapping updates only to the write index. From bdc4c8c1dcd0d8dd3be34f824dbe05ae0b1d4a43 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 9 Jul 2024 09:36:58 -0600 Subject: [PATCH 006/154] Add geo-centroid and weighted average aggregations documentation (#7613) * Add geo-centroid and weighted avaerage aggregations documentation Signed-off-by: Melissa Vagi * Add geocentroid content and examples Signed-off-by: Melissa Vagi * Add weighted average content and examples Signed-off-by: Melissa Vagi * Update _aggregations/metric/geocentroid.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/geocentroid.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/geocentroid.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/geocentroid.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/geocentroid.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/geocentroid.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/weighted-avg.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/geocentroid.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/weighted-avg.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/weighted-avg.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/weighted-avg.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/weighted-avg.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/geocentroid.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/weighted-avg.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/weighted-avg.md Co-authored-by: Nathan Bower Signed-off-by: Melissa 
Vagi * Update _aggregations/metric/weighted-avg.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/weighted-avg.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/weighted-avg.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/weighted-avg.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/geocentroid.md Signed-off-by: Melissa Vagi * Update _aggregations/metric/geocentroid.md Signed-off-by: Melissa Vagi --------- Signed-off-by: Melissa Vagi Co-authored-by: Nathan Bower
---
 _aggregations/metric/geocentroid.md | 256 +++++++++++++++++++++++++++
 _aggregations/metric/weighted-avg.md | 149 ++++++++++++++++
 2 files changed, 405 insertions(+)
 create mode 100644 _aggregations/metric/geocentroid.md
 create mode 100644 _aggregations/metric/weighted-avg.md

diff --git a/_aggregations/metric/geocentroid.md b/_aggregations/metric/geocentroid.md
new file mode 100644
index 0000000000..711f49862a
--- /dev/null
+++ b/_aggregations/metric/geocentroid.md
@@ -0,0 +1,256 @@
---
layout: default
title: Geocentroid
parent: Metric aggregations
grand_parent: Aggregations
nav_order: 45
---

# Geocentroid

The OpenSearch `geo_centroid` aggregation calculates the weighted geographic center or focal point of a set of spatial data points. This metric aggregation operates on `geo_point` fields and returns the centroid location as a latitude-longitude pair.

## Using the aggregation

Follow these steps to use the `geo_centroid` aggregation:

**1. Create an index with a `geo_point` field**

First, you need to create an index with a `geo_point` field type. This field stores the geographic coordinates you want to analyze. For example, to create an index called `restaurants` with a `location` field of type `geo_point`, use the following request:

```json
PUT /restaurants
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "location": {
        "type": "geo_point"
      }
    }
  }
}
```
{% include copy-curl.html %}

**2. Index documents with spatial data**

Next, index your documents containing the spatial data points you want to analyze. Make sure to include the `geo_point` field with the appropriate latitude-longitude coordinates. For example, index your documents using the following request:

```json
POST /restaurants/_bulk?refresh
{"index": {"_id": 1}}
{"name": "Cafe Delish", "location": "40.7128, -74.0059"}
{"index": {"_id": 2}}
{"name": "Tasty Bites", "location": "51.5074, -0.1278"}
{"index": {"_id": 3}}
{"name": "Sushi Palace", "location": "48.8566, 2.3522"}
{"index": {"_id": 4}}
{"name": "Burger Joint", "location": "34.0522, -118.2437"}
```
{% include copy-curl.html %}

**3. Run the `geo_centroid` aggregation**

To calculate the centroid location across all documents, run a search with the `geo_centroid` aggregation on the `geo_point` field. For example, use the following request:

```json
GET /restaurants/_search
{
  "size": 0,
  "aggs": {
    "centroid": {
      "geo_centroid": {
        "field": "location"
      }
    }
  }
}
```
{% include copy-curl.html %}

The response includes a `centroid` object with `lat` and `lon` properties representing the weighted centroid location of all indexed data points, as shown in the following example:

```json
"aggregations": {
  "centroid": {
    "location": {
      "lat": 43.78224998130463,
      "lon": -47.506300045643
    },
    "count": 4
  }
}
```

**4. 
Nest under other aggregations (optional)** + +You can also nest the `geo_centroid` aggregation under other bucket aggregations, such as `terms`, to calculate the centroid for subsets of your data. For example, to find the centroid location for each city, use the following request: + +```json +GET /restaurants/_search +{ + "size": 0, + "aggs": { + "cities": { + "terms": { + "field": "city.keyword" + }, + "aggs": { + "centroid": { + "geo_centroid": { + "field": "location" + } + } + } + } + } +} +``` +{% include copy-curl.html %} + +This returns a centroid location for each city bucket, allowing you to analyze the geographic center of data points in different cities. + +## Using `geo_centroid` with the `geohash_grid` aggregation + +The `geohash_grid` aggregation partitions geospatial data into buckets based on geohash prefixes. + +When a document contains multiple geopoint values in a field, the `geohash_grid` aggregation assigns the document to multiple buckets, even if one or more of its geopoints are outside the bucket boundaries. This behavior is different from how individual geopoints are treated, where only those within the bucket boundaries are considered. + +When you nest the `geo_centroid` aggregation under the `geohash_grid` aggregation, each centroid is calculated using all geopoints in a bucket, including those that may be outside the bucket boundaries. This can result in centroid locations that fall outside the geographic area represented by the bucket. + +#### Example + +In this example, the `geohash_grid` aggregation with a `precision` of `3` creates buckets based on geohash prefixes of length `3`. Because each document has multiple geopoints, they may be assigned to multiple buckets, even if some of the geopoints fall outside the bucket boundaries. + +The `geo_centroid` subaggregation calculates the centroid for each bucket using all geopoints assigned to that bucket, including those outside the bucket boundaries. This means that the resulting centroid locations may not necessarily lie within the geographic area represented by the corresponding geohash bucket. + +First, create an index and index documents containing multiple geopoints: + +```json +PUT /locations +{ + "mappings": { + "properties": { + "name": { + "type": "text" + }, + "coordinates": { + "type": "geo_point" + } + } + } +} + +POST /locations/_bulk?refresh +{"index": {"_id": 1}} +{"name": "Point A", "coordinates": ["40.7128, -74.0059", "51.5074, -0.1278"]} +{"index": {"_id": 2}} +{"name": "Point B", "coordinates": ["48.8566, 2.3522", "34.0522, -118.2437"]} +``` + +Then, run `geohash_grid` with the `geo_centroid` subaggregation: + +```json +GET /locations/_search +{ + "size": 0, + "aggs": { + "grid": { + "geohash_grid": { + "field": "coordinates", + "precision": 3 + }, + "aggs": { + "centroid": { + "geo_centroid": { + "field": "coordinates" + } + } + } + } + } +} +``` +{% include copy-curl.html %} + +
+   +    Response +   +  {: .text-delta} + +```json +{ + "took": 26, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 2, + "relation": "eq" + }, + "max_score": null, + "hits": [] + }, + "aggregations": { + "grid": { + "buckets": [ + { + "key": "u09", + "doc_count": 1, + "centroid": { + "location": { + "lat": 41.45439997315407, + "lon": -57.945750039070845 + }, + "count": 2 + } + }, + { + "key": "gcp", + "doc_count": 1, + "centroid": { + "location": { + "lat": 46.11009998945519, + "lon": -37.06685005221516 + }, + "count": 2 + } + }, + { + "key": "dr5", + "doc_count": 1, + "centroid": { + "location": { + "lat": 46.11009998945519, + "lon": -37.06685005221516 + }, + "count": 2 + } + }, + { + "key": "9q5", + "doc_count": 1, + "centroid": { + "location": { + "lat": 41.45439997315407, + "lon": -57.945750039070845 + }, + "count": 2 + } + } + ] + } + } +} +``` +{% include copy-curl.html %} + +
diff --git a/_aggregations/metric/weighted-avg.md b/_aggregations/metric/weighted-avg.md new file mode 100644 index 0000000000..268f78bfdc --- /dev/null +++ b/_aggregations/metric/weighted-avg.md @@ -0,0 +1,149 @@ +--- +layout: default +title: Weighted average +parent: Metric aggregations +grand_parent: Aggregations +nav_order: 150 +--- + +# Weighted average + +The `weighted_avg` aggregation calculates the weighted average of numeric values across documents. This is useful when you want to calculate an average but weight some data points more heavily than others. + +## Weighted average calculation + +The weighted average is calculated as `(sum of value * weight) / (sum of weights)`. + +## Parameters + +When using the `weighted_avg` aggregation, you must define the following parameters: + +- `value`: The field or script used to obtain the average numeric values +- `weight`: The field or script used to obtain the weight for each value + +Optionally, you can specify the following parameters: + +- `format`: A numeric format to apply to the output value +- `value_type`: A type hint for the values when using scripts or unmapped fields + +For the value or weight, you can specify the following parameters: + +- `field`: The document field to use +- `missing`: A value or weight to use if the field is missing + + +## Using the aggregation + +Follow these steps to use the `weighted_avg` aggregation: + +**1. Create an index and index some documents** + +```json +PUT /products + +POST /products/_doc/1 +{ + "name": "Product A", + "rating": 4, + "num_reviews": 100 +} + +POST /products/_doc/2 +{ + "name": "Product B", + "rating": 5, + "num_reviews": 20 +} + +POST /products/_doc/3 +{ + "name": "Product C", + "rating": 3, + "num_reviews": 50 +} +``` +{% include copy-curl.html %} + +**2. Run the `weighted_avg` aggregation** + +```json +GET /products/_search +{ + "size": 0, + "aggs": { + "weighted_rating": { + "weighted_avg": { + "value": { + "field": "rating" + }, + "weight": { + "field": "num_reviews" + } + } + } + } +} +``` +{% include copy-curl.html %} + +## Handling missing values + +The `missing` parameter allows you to specify default values for documents missing the `value` field or the `weight` field instead of excluding them from the calculation. + +The following is an example of this behavior. First, create an index and add sample documents. 
This example includes five documents with different combinations of missing values for the `rating` and `num_reviews` fields:

```json
PUT /products
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "rating": {
        "type": "double"
      },
      "num_reviews": {
        "type": "integer"
      }
    }
  }
}

POST /_bulk
{ "index": { "_index": "products" } }
{ "name": "Product A", "rating": 4.5, "num_reviews": 100 }
{ "index": { "_index": "products" } }
{ "name": "Product B", "rating": 3.8, "num_reviews": 50 }
{ "index": { "_index": "products" } }
{ "name": "Product C", "rating": null, "num_reviews": 20 }
{ "index": { "_index": "products" } }
{ "name": "Product D", "rating": 4.2, "num_reviews": null }
{ "index": { "_index": "products" } }
{ "name": "Product E", "rating": null, "num_reviews": null }
```
{% include copy-curl.html %}

Next, run the following `weighted_avg` aggregation:

```json
GET /products/_search
{
  "size": 0,
  "aggs": {
    "weighted_rating": {
      "weighted_avg": {
        "value": {
          "field": "rating"
        },
        "weight": {
          "field": "num_reviews"
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

In the response, you can see that documents missing a `rating` value, a `num_reviews` value, or both (`Product C`, `Product D`, and `Product E`) are excluded from the calculation because no `missing` defaults are configured. The weighted average is therefore computed from `Product A` and `Product B` only: (4.5 × 100 + 3.8 × 50) / (100 + 50) ≈ 4.27.

From 8f6ea3f1a9d0b9c5b0b8f8edb152fb9dffe8d77a Mon Sep 17 00:00:00 2001
From: AntonEliatra
Date: Tue, 9 Jul 2024 17:47:02 +0100
Subject: [PATCH 007/154] Setting-envars-docs #3582 (#7400)

* setting-envars-docs #3582

Signed-off-by: AntonEliatra

* Update index.md

Signed-off-by: AntonEliatra

* Apply suggestions from code review

Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Signed-off-by: AntonEliatra

* Update index.md

Signed-off-by: AntonEliatra

---------

Signed-off-by: AntonEliatra
Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
---
 .../configuring-opensearch/index.md | 45 ++++++++++++++++++-
 1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/_install-and-configure/configuring-opensearch/index.md b/_install-and-configure/configuring-opensearch/index.md
index ecbce1310d..c2ffbf571b 100755
--- a/_install-and-configure/configuring-opensearch/index.md
+++ b/_install-and-configure/configuring-opensearch/index.md
@@ -25,6 +25,10 @@ Certain operations are static and require you to modify the `opensearch.yml` [co

## Specifying settings as environment variables

+You can specify environment variables in the following ways. 
+ +### Arguments at startup + You can specify environment variables as arguments using `-E` when launching OpenSearch: ```bash @@ -32,6 +36,45 @@ You can specify environment variables as arguments using `-E` when launching Ope ``` {% include copy.html %} +### Directly in the shell environment + +You can configure the environment variables directly in a shell environment before starting OpenSearch, as shown in the following example: + +```bash +export OPENSEARCH_JAVA_OPTS="-Xms2g -Xmx2g" +export OPENSEARCH_PATH_CONF="/etc/opensearch" +./opensearch +``` +{% include copy.html %} + +### Systemd service file + +When running OpenSearch as a service managed by `systemd`, you can specify environment variables in the service file, as shown in the following example: + +```bash +# /etc/systemd/system/opensearch.service.d/override.conf +[Service] +Environment="OPENSEARCH_JAVA_OPTS=-Xms2g -Xmx2g" +Environment="OPENSEARCH_PATH_CONF=/etc/opensearch" +``` +After creating or modifying the file, reload the systemd configuration and restart the service using the following command: + +```bash +sudo systemctl daemon-reload +sudo systemctl restart opensearch +``` +{% include copy.html %} + +### Docker environment variables + +When running OpenSearch in Docker, you can specify environment variables using the `-e` option with `docker run` command, as shown in the following command: + +```bash +docker run -e "OPENSEARCH_JAVA_OPTS=-Xms2g -Xmx2g" -e "OPENSEARCH_PATH_CONF=/usr/share/opensearch/config" opensearchproject/opensearch:latest +``` +{% include copy.html %} + + ## Updating cluster settings using the API The first step in changing a setting is to view the current settings by sending the following request: @@ -113,4 +156,4 @@ If you are working on a client application running against an OpenSearch cluster - http.cors.enabled:true - http.cors.allow-headers:X-Requested-With,X-Auth-Token,Content-Type,Content-Length,Authorization - http.cors.allow-credentials:true -``` \ No newline at end of file +``` From 0d9fc0ed7f3a1904b429a545a90a6033ff6dd77b Mon Sep 17 00:00:00 2001 From: Stavros Macrakis <134456002+smacrakis@users.noreply.github.com> Date: Tue, 9 Jul 2024 12:48:41 -0400 Subject: [PATCH 008/154] mention both Dashboards and endpoint (#7638) * mention both Dashboards and endpoint Old text said that Dashboards are a prerequisite for using PPL, and mentioned only the Query Workbench, not the _ppl endpoint. Is it really true that Dashboards are a prerequisite? Or is it just the SQL plugin that is a prerequisite? Signed-off-by: Stavros Macrakis <134456002+smacrakis@users.noreply.github.com> * Update index.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: Stavros Macrakis <134456002+smacrakis@users.noreply.github.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- _search-plugins/sql/ppl/index.md | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/_search-plugins/sql/ppl/index.md b/_search-plugins/sql/ppl/index.md index 850a540bc4..602255d126 100644 --- a/_search-plugins/sql/ppl/index.md +++ b/_search-plugins/sql/ppl/index.md @@ -37,7 +37,15 @@ PPL filters, transforms, and aggregates data using a series of commands. See [Co ## Using PPL within OpenSearch -To use PPL, you must have installed OpenSearch Dashboards. PPL is available within the [Query Workbench tool](https://playground.opensearch.org/app/opensearch-query-workbench#/). 
See the [Query Workbench]({{site.url}}{{site.baseurl}}/dashboards/query-workbench/) documentation for a tutorial on using PPL within OpenSearch. +The SQL plugin is required to run PPL queries in OpenSearch. If you're running a minimal distribution of OpenSearch, you might have to [install the SQL plugin]({{site.url}}{{site.baseurl}}/install-and-configure/plugins/) before using PPL. +{: .note} + +You can run PPL queries interactively in OpenSearch Dashboards or programmatically using the ``_ppl`` endpoint. + +In OpenSearch Dashboards, the [Query Workbench tool](https://playground.opensearch.org/app/opensearch-query-workbench#/) provides an interactive testing environment, documented in [Query Workbench documentation]({{site.url}}{{site.baseurl}}/dashboards/query-workbench/). + +To run a PPL query using the API, see [SQL and PPL API]({{site.url}}{{site.baseurl}}/search-plugins/sql/sql-ppl-api/). + ## Developer documentation From b0ebb50900acc57a08d3061bab08fd89be0e80dc Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=81ukasz=20Rynek?= <36886649+lrynek@users.noreply.github.com> Date: Tue, 9 Jul 2024 18:59:52 +0200 Subject: [PATCH 009/154] Document '_name' field in 'function_score' query's function definition (#7340) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Document '_name' field in 'function_score' query function definition Signed-off-by: Łukasz Rynek * Ensure real request JSON payload Signed-off-by: Łukasz Rynek * Ensure real response JSON payload + finish the paragraph Signed-off-by: Łukasz Rynek * Add missing copy-curl tag Signed-off-by: Łukasz Rynek * Add missing article Signed-off-by: Łukasz Rynek * Apply suggestions from code review Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Łukasz Rynek <36886649+lrynek@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: Łukasz Rynek <36886649+lrynek@users.noreply.github.com> --------- Signed-off-by: Łukasz Rynek Signed-off-by: Łukasz Rynek <36886649+lrynek@users.noreply.github.com> Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower --- _query-dsl/compound/function-score.md | 195 +++++++++++++++++++++++++- 1 file changed, 194 insertions(+), 1 deletion(-) diff --git a/_query-dsl/compound/function-score.md b/_query-dsl/compound/function-score.md index 8180058ae6..98568e0965 100644 --- a/_query-dsl/compound/function-score.md +++ b/_query-dsl/compound/function-score.md @@ -826,4 +826,197 @@ The results contain the three matching blog posts: } } ``` - \ No newline at end of file + + +## Named functions + +When defining a function, you can specify its name using the `_name` parameter at the top level. This name is useful for debugging and understanding the scoring process. Once specified, the function name is included in the score calculation explanation whenever possible (this applies to functions, filters, and queries). You can identify the function by its `_name` in the response. + +### Example + +The following request sets `explain` to `true` for debugging purposes in order to obtain a scoring explanation in the response. 
Each function contains a `_name` parameter so that you can identify the function unambiguously: + +```json +GET blogs/_search +{ + "explain": true, + "size": 1, + "query": { + "function_score": { + "functions": [ + { + "_name": "likes_function", + "script_score": { + "script": { + "lang": "painless", + "source": "return doc['likes'].value * 2;" + } + }, + "weight": 0.6 + }, + { + "_name": "views_function", + "field_value_factor": { + "field": "views", + "factor": 1.5, + "modifier": "log1p", + "missing": 1 + }, + "weight": 0.3 + }, + { + "_name": "comments_function", + "gauss": { + "comments": { + "origin": 1000, + "scale": 800 + } + }, + "weight": 0.1 + } + ] + } + } +} +``` +{% include copy-curl.html %} + +The response explains the scoring process. For each function, the explanation contains the function `_name` in its `description`: + +
+<details open markdown="block">
+  <summary>
+    Response
+  </summary>
+  {: .text-delta}
+
+```json
+{
+  "took": 14,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 3,
+      "relation": "eq"
+    },
+    "max_score": 6.1600614,
+    "hits": [
+      {
+        "_shard": "[blogs][0]",
+        "_node": "_yndTaZHQWimcDgAfOfRtQ",
+        "_index": "blogs",
+        "_id": "1",
+        "_score": 6.1600614,
+        "_source": {
+          "name": "Semantic search in OpenSearch",
+          "views": 1200,
+          "likes": 150,
+          "comments": 16,
+          "date_posted": "2022-04-17"
+        },
+        "_explanation": {
+          "value": 6.1600614,
+          "description": "function score, product of:",
+          "details": [
+            {
+              "value": 1,
+              "description": "*:*",
+              "details": []
+            },
+            {
+              "value": 6.1600614,
+              "description": "min of:",
+              "details": [
+                {
+                  "value": 6.1600614,
+                  "description": "function score, score mode [multiply]",
+                  "details": [
+                    {
+                      "value": 180,
+                      "description": "product of:",
+                      "details": [
+                        {
+                          "value": 300,
+                          "description": "script score function(_name: likes_function), computed with script:\"Script{type=inline, lang='painless', idOrCode='return doc['likes'].value * 2;', options={}, params={}}\"",
+                          "details": [
+                            {
+                              "value": 1,
+                              "description": "_score: ",
+                              "details": [
+                                {
+                                  "value": 1,
+                                  "description": "*:*",
+                                  "details": []
+                                }
+                              ]
+                            }
+                          ]
+                        },
+                        {
+                          "value": 0.6,
+                          "description": "weight",
+                          "details": []
+                        }
+                      ]
+                    },
+                    {
+                      "value": 0.9766541,
+                      "description": "product of:",
+                      "details": [
+                        {
+                          "value": 3.2555137,
+                          "description": "field value function(_name: views_function): log1p(doc['views'].value?:1.0 * factor=1.5)",
+                          "details": []
+                        },
+                        {
+                          "value": 0.3,
+                          "description": "weight",
+                          "details": []
+                        }
+                      ]
+                    },
+                    {
+                      "value": 0.035040613,
+                      "description": "product of:",
+                      "details": [
+                        {
+                          "value": 0.35040614,
+                          "description": "Function for field comments:",
+                          "details": [
+                            {
+                              "value": 0.35040614,
+                              "description": "exp(-0.5*pow(MIN[Math.max(Math.abs(16.0(=doc value) - 1000.0(=origin))) - 0.0(=offset), 0)],2.0)/461662.4130844683, _name: comments_function)",
+                              "details": []
+                            }
+                          ]
+                        },
+                        {
+                          "value": 0.1,
+                          "description": "weight",
+                          "details": []
+                        }
+                      ]
+                    }
+                  ]
+                },
+                {
+                  "value": 3.4028235e+38,
+                  "description": "maxBoost",
+                  "details": []
+                }
+              ]
+            }
+          ]
+        }
+      }
+    ]
+  }
+}
+```
+</details>
+
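
As a complement to the example in the patch above, `_name` can also be attached to the queries used as function filters. The following request is an illustrative sketch (it reuses the `blogs` index from that example; the filter field and value are assumptions) that names both the function and its filter so that both labels can surface in the explanation:

```json
GET blogs/_search
{
  "explain": true,
  "query": {
    "function_score": {
      "functions": [
        {
          "_name": "semantic_posts_boost",
          "filter": {
            "match": {
              "name": {
                "query": "semantic",
                "_name": "name_filter"
              }
            }
          },
          "weight": 2
        }
      ]
    }
  }
}
```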

From c1542b7cfcb2c7d7d02a86db5c99edba39eef1f9 Mon Sep 17 00:00:00 2001
From: "Daniel (dB.) Doubrovkine"
Date: Tue, 9 Jul 2024 12:10:19 -0500
Subject: [PATCH 010/154] Fix: the value of include_defaults is a boolean.
 (#7657)

* Fix: the value of include_defaults is a boolean.

Signed-off-by: dblock

* Apply suggestions from code review

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

---------

Signed-off-by: dblock
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
---
 _api-reference/index-apis/get-settings.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_api-reference/index-apis/get-settings.md b/_api-reference/index-apis/get-settings.md
index 41eb4ea113..9ad0078757 100644
--- a/_api-reference/index-apis/get-settings.md
+++ b/_api-reference/index-apis/get-settings.md
@@ -40,7 +40,7 @@ Parameter | Data type | Description
 allow_no_indices | Boolean | Whether to ignore wildcards that don’t match any indexes. Default is `true`.
 expand_wildcards | String | Expands wildcard expressions to different indexes. Combine multiple values with commas. Available values are `all` (match all indexes), `open` (match open indexes), `closed` (match closed indexes), `hidden` (match hidden indexes), and `none` (do not accept wildcard expressions), which must be used with `open`, `closed`, or both. Default is `open`.
 flat_settings | Boolean | Whether to return settings in the flat form, which can improve readability, especially for heavily nested settings. For example, the flat form of “index”: { “creation_date”: “123456789” } is “index.creation_date”: “123456789”.
-include_defaults | String | Whether to include default settings, including settings used within OpenSearch plugins, in the response. Default is false.
+include_defaults | Boolean | Whether to include default settings, including settings used within OpenSearch plugins, in the response. Default is `false`.
 ignore_unavailable | Boolean | If true, OpenSearch does not include missing or closed indexes in the response.
 local | Boolean | Whether to return information from the local node only instead of the cluster manager node. Default is false.
 cluster_manager_timeout | Time | How long to wait for a connection to the cluster manager node. Default is `30s`.
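
Because `include_defaults` is a Boolean flag, a request that surfaces the default settings looks like the following sketch (the index name `sample-index` is an assumption):

```json
GET /sample-index/_settings?include_defaults=true&flat_settings=true
```

Pairing it with `flat_settings=true`, as shown, keeps the long list of returned defaults easier to scan.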
From e88c84a22fd2efbdb66d883120f4795051ff2f19 Mon Sep 17 00:00:00 2001 From: Tyler Ohlsen Date: Wed, 10 Jul 2024 15:31:46 -0700 Subject: [PATCH 011/154] Update detector-visualization integration documentation to specify real-time AD results only (#7663) * Update doc to specify real-time AD results only Signed-off-by: Tyler Ohlsen * Update _observing-your-data/ad/dashboards-anomaly-detection.md Signed-off-by: Melissa Vagi * Update _observing-your-data/ad/dashboards-anomaly-detection.md Signed-off-by: Melissa Vagi --------- Signed-off-by: Tyler Ohlsen Signed-off-by: Melissa Vagi Co-authored-by: Melissa Vagi --- _observing-your-data/ad/dashboards-anomaly-detection.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_observing-your-data/ad/dashboards-anomaly-detection.md b/_observing-your-data/ad/dashboards-anomaly-detection.md index 6905b8b06e..679237094a 100644 --- a/_observing-your-data/ad/dashboards-anomaly-detection.md +++ b/_observing-your-data/ad/dashboards-anomaly-detection.md @@ -11,7 +11,7 @@ Introduced 2.9 OpenSearch provides an automated means of detecting harmful outliers and protecting your data when you enable anomaly detection. When applied to metrics, OpenSearch uses algorithms to continuously analyze systems and applications, determine normal baselines, and surface anomalies. -You can connect data visualizations to OpenSearch datasets and then create, run, and view anomaly alarms and results from visualizations in the **Dashboard** interface. With only a couple of steps, you can bring together traces, metrics, and logs to make your applications and infrastructure fully observable. +You can connect data visualizations to OpenSearch datasets and then create, run, and view real-time anomaly results from visualizations in the **Dashboard** interface. With only a couple of steps, you can bring together traces, metrics, and logs to make your applications and infrastructure fully observable. ## Getting started @@ -23,7 +23,7 @@ Before getting started, you must have: ## General requirements for anomaly detection visualizations -Anomaly detection visualizations are displayed as time-series charts that give you a snapshot of when anomalies have occurred from different anomaly detectors you have configured for the visualization. You can display up to 10 metrics on your chart, and each series can be shown as a line on the chart. +Anomaly detection visualizations are displayed as time-series charts that give you a snapshot of when anomalies have occurred from different anomaly detectors you have configured for the visualization. You can display up to 10 metrics on your chart, and each series can be shown as a line on the chart. Note that only real-time anomalies will be visible on the chart. For more information on real-time and historical anomaly detection, see [Anomaly detection, Step 3: Set up detector jobs]({{site.url}}{{site.baseurl}}/observing-your-data/ad/index/#step-3-set-up-detector-jobs). Keep in mind the following requirements when setting up or creating anomaly detection visualizations. 
The visualization: From f2d1cd5e914ecc428c70b62b4e74163a9fa013d2 Mon Sep 17 00:00:00 2001 From: Heather Halter Date: Wed, 10 Jul 2024 15:51:33 -0700 Subject: [PATCH 012/154] Fixes table in Data Prepper write_json processor (#7518) * fixtable Signed-off-by: Heather Halter * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update write_json.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: Heather Halter Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- .../pipelines/configuration/processors/write_json.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/_data-prepper/pipelines/configuration/processors/write_json.md b/_data-prepper/pipelines/configuration/processors/write_json.md index 9e94176010..8f1e6851da 100644 --- a/_data-prepper/pipelines/configuration/processors/write_json.md +++ b/_data-prepper/pipelines/configuration/processors/write_json.md @@ -11,8 +11,8 @@ nav_order: 56 The `write_json` processor converts an object in an event into a JSON string. You can customize the processor to choose the source and target field names. -| Option | Description | Example | -| :--- | :--- | :--- | -| source | Mandatory field that specifies the name of the field in the event containing the message or object to be parsed. | If `source` is set to `"message"` and the input is `{"message": {"key1":"value1", "key2":{"key3":"value3"}}`, then the `write_json` processor generates `{"message": "{\"key1\":\"value`\", \"key2\":"{\"key3\":\"value3\"}"}"`. -| target | An optional field that specifies the name of the field in which the resulting JSON string should be stored. If `target` is not specified, then the `source` field is used. +Option | Description | Example +:--- | :--- | :--- +source | Mandatory field that specifies the name of the field in the event containing the message or object to be parsed. | If `source` is set to `"message"` and the input is `{"message": {"key1":"value1", "key2":{"key3":"value3"}}}`, then the `write_json` processor outputs the event as `"{\"key1\":\"value1\",\"key2\":{\"key3\":\"value3\"}}"`. +target | An optional field that specifies the name of the field in which the resulting JSON string should be stored. If `target` is not specified, then the `source` field is used. | `key1` From a94e5b601fb234ac895a86a9262559f4617f5d50 Mon Sep 17 00:00:00 2001 From: "Daniel (dB.) Doubrovkine" Date: Wed, 10 Jul 2024 18:01:40 -0500 Subject: [PATCH 013/154] Quote all alphabetic defaults. (#7660) * Quote all default is true/false. Signed-off-by: dblock * Fixed non-boolean defaults. Signed-off-by: dblock * Replaced cluster_manager node by cluster manager node. Signed-off-by: dblock * Replaced master node by cluster manager node. 
Signed-off-by: dblock * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _tuning-your-cluster/availability-and-recovery/snapshots/sm-api.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: dblock Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- _api-reference/cat/cat-aliases.md | 2 +- _api-reference/cat/cat-allocation.md | 4 +-- _api-reference/cat/cat-health.md | 2 +- _api-reference/cat/cat-indices.md | 4 +-- _api-reference/cat/cat-nodeattrs.md | 2 +- _api-reference/cat/cat-nodes.md | 2 +- _api-reference/cat/cat-pending-tasks.md | 2 +- _api-reference/cat/cat-plugins.md | 4 +-- _api-reference/cat/cat-recovery.md | 4 +-- _api-reference/cat/cat-repositories.md | 4 +-- _api-reference/cat/cat-shards.md | 4 +-- _api-reference/cat/cat-templates.md | 2 +- _api-reference/cat/cat-thread-pool.md | 4 +-- .../cluster-api/cluster-allocation.md | 4 +-- _api-reference/cluster-api/cluster-health.md | 6 ++-- _api-reference/count.md | 10 +++---- .../document-apis/delete-by-query.md | 6 ++-- _api-reference/document-apis/get-documents.md | 6 ++-- .../document-apis/index-document.md | 4 +-- _api-reference/document-apis/multi-get.md | 2 +- _api-reference/document-apis/reindex.md | 4 +-- .../document-apis/update-by-query.md | 4 +-- .../document-apis/update-document.md | 2 +- _api-reference/explain.md | 8 +++--- _api-reference/index-apis/close-index.md | 6 ++-- _api-reference/index-apis/delete-index.md | 4 +-- _api-reference/index-apis/exists.md | 8 +++--- _api-reference/index-apis/get-index.md | 6 ++-- _api-reference/index-apis/get-settings.md | 2 +- _api-reference/index-apis/open-index.md | 6 ++-- _api-reference/index-apis/update-settings.md | 2 +- _api-reference/search.md | 28 +++++++++---------- _api-reference/snapshots/create-repository.md | 2 +- _api-reference/snapshots/create-snapshot.md | 6 ++-- .../snapshots/get-snapshot-repository.md | 2 +- .../snapshots/verify-snapshot-repository.md | 2 +- _clients/javascript/helpers.md | 2 +- _dashboards/visualize/visbuilder.md | 2 +- .../configuring-data-prepper.md | 6 ++-- _im-plugin/index-rollups/rollup-api.md | 4 +-- .../index-transforms/transforms-apis.md | 4 +-- _observing-your-data/notifications/api.md | 2 +- _security/audit-logs/storage-types.md | 4 +-- .../authentication-backends/openid-connect.md | 6 ++-- _security/authentication-backends/proxy.md | 2 +- _security/authentication-backends/saml.md | 4 +-- _security/configuration/security-admin.md | 4 +-- _security/configuration/tls.md | 14 +++++----- .../snapshots/sm-api.md | 4 +-- .../snapshots/snapshot-restore.md | 8 +++--- _tuning-your-cluster/index.md | 2 +- 51 files changed, 119 insertions(+), 119 deletions(-) diff --git a/_api-reference/cat/cat-aliases.md b/_api-reference/cat/cat-aliases.md index 9e4407dced..b0c2d7184e 100644 --- a/_api-reference/cat/cat-aliases.md +++ b/_api-reference/cat/cat-aliases.md @@ -52,7 +52,7 @@ In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-refe Parameter | Type | Description :--- | :--- | :--- -local | Boolean | Whether to return information from the local node only instead of from the master node. Default is false. +local | Boolean | Whether to return information from the local node only instead of from the cluster manager node. Default is `false`. expand_wildcards | Enum | Expands wildcard expressions to concrete indexes. 
Combine multiple values with commas. Supported values are `all`, `open`, `closed`, `hidden`, and `none`. Default is `open`. ## Response diff --git a/_api-reference/cat/cat-allocation.md b/_api-reference/cat/cat-allocation.md index 9598c8f3b5..23ebed79ff 100644 --- a/_api-reference/cat/cat-allocation.md +++ b/_api-reference/cat/cat-allocation.md @@ -51,8 +51,8 @@ In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-refe Parameter | Type | Description :--- | :--- | :--- bytes | Byte size | Specify the units for byte size. For example, `7kb` or `6gb`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/). -local | Boolean | Whether to return information from the local node only instead of from the cluster_manager node. Default is false. -cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster_manager node. Default is 30 seconds. +local | Boolean | Whether to return information from the local node only instead of from the cluster manager node. Default is `false`. +cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster manager node. Default is 30 seconds. ## Response diff --git a/_api-reference/cat/cat-health.md b/_api-reference/cat/cat-health.md index 6077c77e43..7767cfbc46 100644 --- a/_api-reference/cat/cat-health.md +++ b/_api-reference/cat/cat-health.md @@ -36,7 +36,7 @@ All CAT health URL parameters are optional. Parameter | Type | Description :--- | :--- | :--- time | Time | Specify the units for time. For example, `5d` or `7h`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/). -ts | Boolean | If true, returns HH:MM:SS and Unix epoch timestamps. Default is true. +ts | Boolean | If true, returns HH:MM:SS and Unix epoch timestamps. Default is `true`. ## Response diff --git a/_api-reference/cat/cat-indices.md b/_api-reference/cat/cat-indices.md index 3a21e900ff..fe9556899e 100644 --- a/_api-reference/cat/cat-indices.md +++ b/_api-reference/cat/cat-indices.md @@ -52,9 +52,9 @@ Parameter | Type | Description :--- | :--- | :--- bytes | Byte size | Specify the units for byte size. For example, `7kb` or `6gb`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/). health | String | Limit indexes based on their health status. Supported values are `green`, `yellow`, and `red`. -include_unloaded_segments | Boolean | Whether to include information from segments not loaded into memory. Default is false. +include_unloaded_segments | Boolean | Whether to include information from segments not loaded into memory. Default is `false`. cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster manager node. Default is 30 seconds. -pri | Boolean | Whether to return information only from the primary shards. Default is false. +pri | Boolean | Whether to return information only from the primary shards. Default is `false`. time | Time | Specify the units for time. For example, `5d` or `7h`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/). expand_wildcards | Enum | Expands wildcard expressions to concrete indexes. Combine multiple values with commas. Supported values are `all`, `open`, `closed`, `hidden`, and `none`. Default is `open`. 
diff --git a/_api-reference/cat/cat-nodeattrs.md b/_api-reference/cat/cat-nodeattrs.md index 95c1e50afc..6b4cc6d92e 100644 --- a/_api-reference/cat/cat-nodeattrs.md +++ b/_api-reference/cat/cat-nodeattrs.md @@ -35,7 +35,7 @@ In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-refe Parameter | Type | Description :--- | :--- | :--- -local | Boolean | Whether to return information from the local node only instead of from the cluster_manager node. Default is false. +local | Boolean | Whether to return information from the local node only instead of from the cluster manager node. Default is `false`. cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster manager node. Default is 30 seconds. diff --git a/_api-reference/cat/cat-nodes.md b/_api-reference/cat/cat-nodes.md index 6f68204710..864e5dfdd5 100644 --- a/_api-reference/cat/cat-nodes.md +++ b/_api-reference/cat/cat-nodes.md @@ -41,7 +41,7 @@ bytes | Byte size | Specify the units for byte size. For example, `7kb` or `6gb` full_id | Boolean | If true, return the full node ID. If false, return the shortened node ID. Defaults to false. cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster manager node. Default is 30 seconds. time | Time | Specify the units for time. For example, `5d` or `7h`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/). -include_unloaded_segments | Boolean | Whether to include information from segments not loaded into memory. Default is false. +include_unloaded_segments | Boolean | Whether to include information from segments not loaded into memory. Default is `false`. ## Response diff --git a/_api-reference/cat/cat-pending-tasks.md b/_api-reference/cat/cat-pending-tasks.md index c8e1b744e8..748defd06e 100644 --- a/_api-reference/cat/cat-pending-tasks.md +++ b/_api-reference/cat/cat-pending-tasks.md @@ -36,7 +36,7 @@ In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-refe Parameter | Type | Description :--- | :--- | :--- -local | Boolean | Whether to return information from the local node only instead of from the cluster_manager node. Default is false. +local | Boolean | Whether to return information from the local node only instead of from the cluster manager node. Default is `false`. cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster manager node. Default is 30 seconds. time | Time | Specify the units for time. For example, `5d` or `7h`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/). diff --git a/_api-reference/cat/cat-plugins.md b/_api-reference/cat/cat-plugins.md index 3498462236..519c77f27f 100644 --- a/_api-reference/cat/cat-plugins.md +++ b/_api-reference/cat/cat-plugins.md @@ -36,8 +36,8 @@ In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-refe Parameter | Type | Description :--- | :--- | :--- -local | Boolean | Whether to return information from the local node only instead of from the cluster_manager node. Default is false. -cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster_manager node. Default is 30 seconds. +local | Boolean | Whether to return information from the local node only instead of from the cluster manager node. Default is `false`. +cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster manager node. Default is 30 seconds. 
## Response diff --git a/_api-reference/cat/cat-recovery.md b/_api-reference/cat/cat-recovery.md index 54abac6d99..da66aa7272 100644 --- a/_api-reference/cat/cat-recovery.md +++ b/_api-reference/cat/cat-recovery.md @@ -50,9 +50,9 @@ In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-refe Parameter | Type | Description :--- | :--- | :--- -active_only | Boolean | Whether to only include ongoing shard recoveries. Default is false. +active_only | Boolean | Whether to only include ongoing shard recoveries. Default is `false`. bytes | Byte size | Specify the units for byte size. For example, `7kb` or `6gb`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/). -detailed | Boolean | Whether to include detailed information about shard recoveries. Default is false. +detailed | Boolean | Whether to include detailed information about shard recoveries. Default is `false`. time | Time | Specify the units for time. For example, `5d` or `7h`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/). ## Response diff --git a/_api-reference/cat/cat-repositories.md b/_api-reference/cat/cat-repositories.md index 94f39b9d15..c6d62c9c62 100644 --- a/_api-reference/cat/cat-repositories.md +++ b/_api-reference/cat/cat-repositories.md @@ -36,8 +36,8 @@ In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-refe Parameter | Type | Description :--- | :--- | :--- -local | Boolean | Whether to return information from the local node only instead of from the cluster_manager node. Default is false. -cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster_manager node. Default is 30 seconds. +local | Boolean | Whether to return information from the local node only instead of from the cluster manager node. Default is `false`. +cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster manager node. Default is 30 seconds. ## Response diff --git a/_api-reference/cat/cat-shards.md b/_api-reference/cat/cat-shards.md index e74667b5ac..9a727b5b11 100644 --- a/_api-reference/cat/cat-shards.md +++ b/_api-reference/cat/cat-shards.md @@ -51,8 +51,8 @@ In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-refe Parameter | Type | Description :--- | :--- | :--- bytes | Byte size | Specify the units for byte size. For example, `7kb` or `6gb`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/). -local | Boolean | Whether to return information from the local node only instead of from the cluster_manager node. Default is false. -cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster_manager node. Default is 30 seconds. +local | Boolean | Whether to return information from the local node only instead of from the cluster manager node. Default is `false`. +cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster manager node. Default is 30 seconds. time | Time | Specify the units for time. For example, `5d` or `7h`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/). 
diff --git a/_api-reference/cat/cat-templates.md b/_api-reference/cat/cat-templates.md index d2aed7b0b8..d7c7aac90f 100644 --- a/_api-reference/cat/cat-templates.md +++ b/_api-reference/cat/cat-templates.md @@ -44,7 +44,7 @@ In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-refe Parameter | Type | Description :--- | :--- | :--- -local | Boolean | Whether to return information from the local node only instead of from the cluster manager node. Default is false. +local | Boolean | Whether to return information from the local node only instead of from the cluster manager node. Default is `false`. cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster manager node. Default is 30 seconds. diff --git a/_api-reference/cat/cat-thread-pool.md b/_api-reference/cat/cat-thread-pool.md index 5d3e341b74..491b523092 100644 --- a/_api-reference/cat/cat-thread-pool.md +++ b/_api-reference/cat/cat-thread-pool.md @@ -49,8 +49,8 @@ In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-refe Parameter | Type | Description :--- | :--- | :--- -local | Boolean | Whether to return information from the local node only instead of from the cluster_manager node. Default is false. -cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster_manager node. Default is 30 seconds. +local | Boolean | Whether to return information from the local node only instead of from the cluster manager node. Default is `false`. +cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster manager node. Default is 30 seconds. ## Response diff --git a/_api-reference/cluster-api/cluster-allocation.md b/_api-reference/cluster-api/cluster-allocation.md index da6e3aab05..b1b1c266d6 100644 --- a/_api-reference/cluster-api/cluster-allocation.md +++ b/_api-reference/cluster-api/cluster-allocation.md @@ -43,8 +43,8 @@ All cluster allocation explain parameters are optional. Parameter | Type | Description :--- | :--- | :--- -include_yes_decisions | Boolean | OpenSearch makes a series of yes or no decisions when trying to allocate a shard to a node. If this parameter is true, OpenSearch includes the (generally more numerous) "yes" decisions in its response. Default is false. -include_disk_info | Boolean | Whether to include information about disk usage in the response. Default is false. +include_yes_decisions | Boolean | OpenSearch makes a series of yes or no decisions when trying to allocate a shard to a node. If this parameter is true, OpenSearch includes the (generally more numerous) "yes" decisions in its response. Default is `false`. +include_disk_info | Boolean | Whether to include information about disk usage in the response. Default is `false`. ## Request body diff --git a/_api-reference/cluster-api/cluster-health.md b/_api-reference/cluster-api/cluster-health.md index e9e2bb0e47..73c83d5ee6 100644 --- a/_api-reference/cluster-api/cluster-health.md +++ b/_api-reference/cluster-api/cluster-health.md @@ -44,14 +44,14 @@ Parameter | Type | Description expand_wildcards | Enum | Expands wildcard expressions to concrete indexes. Combine multiple values with commas. Supported values are `all`, `open`, `closed`, `hidden`, and `none`. Default is `open`. level | Enum | The level of detail for returned health information. Supported values are `cluster`, `indices`, `shards`, and `awareness_attributes`. Default is `cluster`. 
 awareness_attribute | String | The name of the awareness attribute, for which to return cluster health (for example, `zone`). Applicable only if `level` is set to `awareness_attributes`.
-local | Boolean | Whether to return information from the local node only instead of from the cluster manager node. Default is false.
+local | Boolean | Whether to return information from the local node only instead of from the cluster manager node. Default is `false`.
 cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster manager node. Default is 30 seconds.
 timeout | Time | The amount of time to wait for a response. If the timeout expires, the request fails. Default is 30 seconds.
 wait_for_active_shards | String | Wait until the specified number of shards is active before returning a response. `all` for all shards. Default is `0`.
 wait_for_nodes | String | Wait for N number of nodes. Use `12` for exact match, `>12` and `<12` for range.
 wait_for_events | Enum | Wait until all currently queued events with the given priority are processed. Supported values are `immediate`, `urgent`, `high`, `normal`, `low`, and `languid`.
-wait_for_no_relocating_shards | Boolean | Whether to wait until there are no relocating shards in the cluster. Default is false.
-wait_for_no_initializing_shards | Boolean | Whether to wait until there are no initializing shards in the cluster. Default is false.
+wait_for_no_relocating_shards | Boolean | Whether to wait until there are no relocating shards in the cluster. Default is `false`.
+wait_for_no_initializing_shards | Boolean | Whether to wait until there are no initializing shards in the cluster. Default is `false`.
 wait_for_status | Enum | Wait until the cluster health reaches the specified status or better. Supported values are `green`, `yellow`, and `red`.
 weights | JSON object | Assigns weights to attributes within the request body of the PUT request. Weights can be set in any ratio, for example, 2:3:5. In a 2:3:5 ratio with three zones, for every 100 requests sent to the cluster, each zone would receive either 20, 30, or 50 search requests in a random order. When assigned a weight of `0`, the zone does not receive any search traffic.
 
diff --git a/_api-reference/count.md b/_api-reference/count.md
index 3e777a413e..2ac336eeb0 100644
--- a/_api-reference/count.md
+++ b/_api-reference/count.md
@@ -79,14 +79,14 @@ All count parameters are optional.
 
 Parameter | Type | Description
 :--- | :--- | :---
-`allow_no_indices` | Boolean | If false, the request returns an error if any wildcard expression or index alias targets any closed or missing indexes. Default is false.
+`allow_no_indices` | Boolean | If false, the request returns an error if any wildcard expression or index alias targets any closed or missing indexes. Default is `false`.
 `analyzer` | String | The analyzer to use in the query string.
-`analyze_wildcard` | Boolean | Specifies whether to analyze wildcard and prefix queries. Default is false.
-`default_operator` | String | Indicates whether the default operator for a string query should be AND or OR. Default is OR.
+`analyze_wildcard` | Boolean | Specifies whether to analyze wildcard and prefix queries. Default is `false`.
+`default_operator` | String | Indicates whether the default operator for a string query should be `AND` or `OR`. Default is `OR`.
 `df` | String | The default field in case a field prefix is not provided in the query string.
`expand_wildcards` | String | Specifies the type of index that wildcard expressions can match. Supports comma-separated values. Valid values are `all` (match any index), `open` (match open, non-hidden indexes), `closed` (match closed, non-hidden indexes), `hidden` (match hidden indexes), and `none` (deny wildcard expressions). Default is `open`. -`ignore_unavailable` | Boolean | Specifies whether to include missing or closed indexes in the response. Default is false. -`lenient` | Boolean | Specifies whether OpenSearch should accept requests if queries have format errors (for example, querying a text field for an integer). Default is false. +`ignore_unavailable` | Boolean | Specifies whether to include missing or closed indexes in the response. Default is `false`. +`lenient` | Boolean | Specifies whether OpenSearch should accept requests if queries have format errors (for example, querying a text field for an integer). Default is `false`. `min_score` | Float | Include only documents with a minimum `_score` value in the result. `routing` | String | Value used to route the operation to a specific shard. `preference` | String | Specifies which shard or node OpenSearch should perform the count operation on. diff --git a/_api-reference/document-apis/delete-by-query.md b/_api-reference/document-apis/delete-by-query.md index ca90ea3484..6f4104c254 100644 --- a/_api-reference/document-apis/delete-by-query.md +++ b/_api-reference/document-apis/delete-by-query.md @@ -42,14 +42,14 @@ Parameter | Type | Description <index> | String | Name or list of the data streams, indexes, or aliases to delete from. Supports wildcards. If left blank, OpenSearch searches all indexes. allow_no_indices | Boolean | Whether to ignore wildcards that don’t match any indexes. Default is `true`. analyzer | String | The analyzer to use in the query string. -analyze_wildcard | Boolean | Specifies whether to analyze wildcard and prefix queries. Default is false. +analyze_wildcard | Boolean | Specifies whether to analyze wildcard and prefix queries. Default is `false`. conflicts | String | Indicates to OpenSearch what should happen if the delete by query operation runs into a version conflict. Valid options are `abort` and `proceed`. Default is `abort`. -default_operator | String | Indicates whether the default operator for a string query should be AND or OR. Default is OR. +default_operator | String | Indicates whether the default operator for a string query should be `AND` or `OR`. Default is `OR`. df | String | The default field in case a field prefix is not provided in the query string. expand_wildcards | String | Specifies the type of index that wildcard expressions can match. Supports comma-separated values. Valid values are `all` (match any index), `open` (match open, non-hidden indexes), `closed` (match closed, non-hidden indexes), `hidden` (match hidden indexes), and `none` (deny wildcard expressions). Default is `open`. from | Integer | The starting index to search from. Default is 0. ignore_unavailable | Boolean | Specifies whether to include missing or closed indexes in the response and ignores unavailable shards during the search request. Default is `false`. -lenient | Boolean | Specifies whether OpenSearch should accept requests if queries have format errors (for example, querying a text field for an integer). Default is false. +lenient | Boolean | Specifies whether OpenSearch should accept requests if queries have format errors (for example, querying a text field for an integer). Default is `false`. 
max_docs | Integer | How many documents the delete by query operation should process at most. Default is all documents. preference | String | Specifies which shard or node OpenSearch should perform the delete by query operation on. q | String | Lucene query string's query. diff --git a/_api-reference/document-apis/get-documents.md b/_api-reference/document-apis/get-documents.md index d5c2e52d93..3eaeb507d4 100644 --- a/_api-reference/document-apis/get-documents.md +++ b/_api-reference/document-apis/get-documents.md @@ -38,11 +38,11 @@ All get document URL parameters are optional. Parameter | Type | Description :--- | :--- | :--- preference | String | Specifies a preference of which shard to retrieve results from. Available options are `_local`, which tells the operation to retrieve results from a locally allocated shard replica, and a custom string value assigned to a specific shard replica. By default, OpenSearch executes get document operations on random shards. -realtime | Boolean | Specifies whether the operation should run in realtime. If false, the operation waits for the index to refresh to analyze the source to retrieve data, which makes the operation near-realtime. Default is true. +realtime | Boolean | Specifies whether the operation should run in realtime. If false, the operation waits for the index to refresh to analyze the source to retrieve data, which makes the operation near-realtime. Default is `true`. refresh | Boolean | If true, OpenSearch refreshes shards to make the get operation available to search results. Valid options are `true`, `false`, and `wait_for`, which tells OpenSearch to wait for a refresh before executing the operation. Default is `false`. routing | String | A value used to route the operation to a specific shard. -stored_fields | Boolean | Whether the get operation should retrieve fields stored in the index. Default is false. -_source | String | Whether to include the `_source` field in the response body. Default is true. +stored_fields | Boolean | Whether the get operation should retrieve fields stored in the index. Default is `false`. +_source | String | Whether to include the `_source` field in the response body. Default is `true`. _source_excludes | String | A comma-separated list of source fields to exclude in the query response. _source_includes | String | A comma-separated list of source fields to include in the query response. version | Integer | The version of the document to return, which must match the current version of the document. diff --git a/_api-reference/document-apis/index-document.md b/_api-reference/document-apis/index-document.md index 3460fc1d50..d131a2f50e 100644 --- a/_api-reference/document-apis/index-document.md +++ b/_api-reference/document-apis/index-document.md @@ -93,12 +93,12 @@ if_primary_term | Integer | Only perform the index operation if the document has op_type | Enum | Specifies the type of operation to complete with the document. Valid values are `create` (index a document only if it doesn't exist) and `index`. If a document ID is included in the request, then the default is `index`. Otherwise, the default is `create`. | No pipeline | String | Route the index operation to a certain pipeline. | No routing | String | value used to assign the index operation to a specific shard. | No -refresh | Enum | If true, OpenSearch refreshes shards to make the operation visible to searching. Valid options are `true`, `false`, and `wait_for`, which tells OpenSearch to wait for a refresh before executing the operation. 
Default is false. | No +refresh | Enum | If true, OpenSearch refreshes shards to make the operation visible to searching. Valid options are `true`, `false`, and `wait_for`, which tells OpenSearch to wait for a refresh before executing the operation. Default is `false`. | No timeout | Time | How long to wait for a response from the cluster. Default is `1m`. | No version | Integer | The document's version number. | No version_type | Enum | Assigns a specific type to the document. Valid options are `external` (retrieve the document if the specified version number is greater than the document's current version) and `external_gte` (retrieve the document if the specified version number is greater than or equal to the document's current version). For example, to index version 3 of a document, use `/_doc/1?version=3&version_type=external`. | No wait_for_active_shards | String | The number of active shards that must be available before OpenSearch processes the request. Default is 1 (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the operation to succeed. | No -require_alias | Boolean | Specifies whether the target index must be an index alias. Default is false. | No +require_alias | Boolean | Specifies whether the target index must be an index alias. Default is `false`. | No ## Request body diff --git a/_api-reference/document-apis/multi-get.md b/_api-reference/document-apis/multi-get.md index 16e9ceeb95..2d3246fa58 100644 --- a/_api-reference/document-apis/multi-get.md +++ b/_api-reference/document-apis/multi-get.md @@ -29,7 +29,7 @@ All multi-get URL parameters are optional. Parameter | Type | Description :--- | :--- | :--- | :--- <index> | String | Name of the index to retrieve documents from. -preference | String | Specifies the nodes or shards OpenSearch should execute the multi-get operation on. Default is random. +preference | String | Specifies the nodes or shards OpenSearch should execute the multi-get operation on. Default is `random`. realtime | Boolean | Specifies whether the operation should run in realtime. If false, the operation waits for the index to refresh to analyze the source to retrieve data, which makes the operation near-realtime. Default is `true`. refresh | Boolean | If true, OpenSearch refreshes shards to make the multi-get operation available to search results. Valid options are `true`, `false`, and `wait_for`, which tells OpenSearch to wait for a refresh before executing the operation. Default is `false`. routing | String | Value used to route the multi-get operation to a specific shard. diff --git a/_api-reference/document-apis/reindex.md b/_api-reference/document-apis/reindex.md index 2bc3646e68..48f14923f5 100644 --- a/_api-reference/document-apis/reindex.md +++ b/_api-reference/document-apis/reindex.md @@ -46,7 +46,7 @@ timeout | Time | How long to wait for a response from the cluster. Default is `3 wait_for_active_shards | String | The number of active shards that must be available before OpenSearch processes the reindex request. Default is 1 (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the operation to succeed. wait_for_completion | Boolean | Waits for the matching tasks to complete. Default is `false`. 
requests_per_second | Integer | Specifies the request’s throttling in sub-requests per second. Default is -1, which means no throttling. -require_alias | Boolean | Whether the destination index must be an index alias. Default is false. +require_alias | Boolean | Whether the destination index must be an index alias. Default is `false`. scroll | Time | How long to keep the search context open. Default is `5m`. slices | Integer | Number of sub-tasks OpenSearch should divide this task into. Default is 1, which means OpenSearch should not divide this task. Setting this parameter to `auto` indicates to OpenSearch that it should automatically decide how many slices to split the task into. max_docs | Integer | How many documents the update by query operation should process at most. Default is all documents. @@ -70,7 +70,7 @@ socket_timeout | The wait time for socket reads. Default is 30s. connect_timeout | The wait time for remote connection timeouts. Default is 30s. size | The number of documents to reindex. slice | Whether to manually or automatically slice the reindex operation so it executes in parallel. Setting this field to `auto` allows OpenSearch to control the number of slices to use, which is one slice per shard, up to a maximum of 20. If there are multiple sources, the number of slices used are based on the index or backing index with the smallest number of shards. -_source | Whether to reindex source fields. Specify a list of fields to reindex or true to reindex all fields. Default is true. +_source | Whether to reindex source fields. Specify a list of fields to reindex or true to reindex all fields. Default is `true`. id | The ID to associate with manual slicing. max | Maximum number of slices. dest | Information about the destination index. Valid values are `index`, `version_type`, `op_type`, and `pipeline`. diff --git a/_api-reference/document-apis/update-by-query.md b/_api-reference/document-apis/update-by-query.md index 4cd686dcb4..217ae69550 100644 --- a/_api-reference/document-apis/update-by-query.md +++ b/_api-reference/document-apis/update-by-query.md @@ -49,14 +49,14 @@ Parameter | Type | Description <index> | String | Comma-separated list of indexes to update. To update all indexes, use * or omit this parameter. allow_no_indices | Boolean | Whether to ignore wildcards that don’t match any indexes. Default is `true`. analyzer | String | Analyzer to use in the query string. -analyze_wildcard | Boolean | Whether the update operation should include wildcard and prefix queries in the analysis. Default is false. +analyze_wildcard | Boolean | Whether the update operation should include wildcard and prefix queries in the analysis. Default is `false`. conflicts | String | Indicates to OpenSearch what should happen if the update by query operation runs into a version conflict. Valid options are `abort` and `proceed`. Default is `abort`. default_operator | String | Indicates whether the default operator for a string query should be `AND` or `OR`. Default is `OR`. df | String | The default field if a field prefix is not provided in the query string. expand_wildcards | String | Specifies the type of index that wildcard expressions can match. Supports comma-separated values. Valid values are `all` (match any index), `open` (match open, non-hidden indexes), `closed` (match closed, non-hidden indexes), `hidden` (match hidden indexes), and `none` (deny wildcard expressions). Default is `open`. from | Integer | The starting index to search from. Default is 0. 
ignore_unavailable | Boolean | Whether to exclude missing or closed indexes in the response and ignores unavailable shards during the search request. Default is `false`. -lenient | Boolean | Specifies whether OpenSearch should accept requests if queries have format errors (for example, querying a text field for an integer). Default is false. +lenient | Boolean | Specifies whether OpenSearch should accept requests if queries have format errors (for example, querying a text field for an integer). Default is `false`. max_docs | Integer | How many documents the update by query operation should process at most. Default is all documents. pipeline | String | ID of the pipeline to use to process documents. preference | String | Specifies which shard or node OpenSearch should perform the update by query operation on. diff --git a/_api-reference/document-apis/update-document.md b/_api-reference/document-apis/update-document.md index 365cb3aa73..3da7030fa5 100644 --- a/_api-reference/document-apis/update-document.md +++ b/_api-reference/document-apis/update-document.md @@ -53,7 +53,7 @@ Parameter | Type | Description | Required if_seq_no | Integer | Only perform the update operation if the document has the specified sequence number. | No if_primary_term | Integer | Perform the update operation if the document has the specified primary term. | No lang | String | Language of the script. Default is `painless`. | No -require_alias | Boolean | Specifies whether the destination must be an index alias. Default is false. | No +require_alias | Boolean | Specifies whether the destination must be an index alias. Default is `false`. | No refresh | Enum | If true, OpenSearch refreshes shards to make the operation visible to searching. Valid options are `true`, `false`, and `wait_for`, which tells OpenSearch to wait for a refresh before executing the operation. Default is `false`. | No retry_on_conflict | Integer | The amount of times OpenSearch should retry the operation if there's a document conflict. Default is 0. | No routing | String | Value to route the update operation to a specific shard. | No diff --git a/_api-reference/explain.md b/_api-reference/explain.md index 57b7d9fada..8c2b757945 100644 --- a/_api-reference/explain.md +++ b/_api-reference/explain.md @@ -64,15 +64,15 @@ Parameter | Type | Description | Required `` | String | Name of the index. You can only specify a single index. | Yes `<_id>` | String | A unique identifier to attach to the document. | Yes `analyzer` | String | The analyzer to use in the query string. | No -`analyze_wildcard` | Boolean | Specifies whether to analyze wildcard and prefix queries. Default is false. | No +`analyze_wildcard` | Boolean | Specifies whether to analyze wildcard and prefix queries. Default is `false`. | No `default_operator` | String | Indicates whether the default operator for a string query should be AND or OR. Default is OR. | No `df` | String | The default field in case a field prefix is not provided in the query string. | No -`lenient` | Boolean | Specifies whether OpenSearch should ignore format-based query failures (for example, querying a text field for an integer). Default is false. | No +`lenient` | Boolean | Specifies whether OpenSearch should ignore format-based query failures (for example, querying a text field for an integer). Default is `false`. | No `preference` | String | Specifies a preference of which shard to retrieve results from. 
Available options are `_local`, which tells the operation to retrieve results from a locally allocated shard replica, and a custom string value assigned to a specific shard replica. By default, OpenSearch executes the explain operation on random shards. | No `q` | String | Query in the Lucene query string syntax. | No -`stored_fields` | Boolean | If true, the operation retrieves document fields stored in the index rather than the document’s `_source`. Default is false. | No +`stored_fields` | Boolean | If true, the operation retrieves document fields stored in the index rather than the document’s `_source`. Default is `false`. | No `routing` | String | Value used to route the operation to a specific shard. | No -`_source` | String | Whether to include the `_source` field in the response body. Default is true. | No +`_source` | String | Whether to include the `_source` field in the response body. Default is `true`. | No `_source_excludes` | String | A comma-separated list of source fields to exclude in the query response. | No `_source_includes` | String | A comma-separated list of source fields to include in the query response. | No diff --git a/_api-reference/index-apis/close-index.md b/_api-reference/index-apis/close-index.md index e8d2e3e1e2..7e43198d37 100644 --- a/_api-reference/index-apis/close-index.md +++ b/_api-reference/index-apis/close-index.md @@ -33,9 +33,9 @@ All parameters are optional. Parameter | Type | Description :--- | :--- | :--- <index-name> | String | The index to close. Can be a comma-separated list of multiple index names. Use `_all` or * to close all indexes. -allow_no_indices | Boolean | Whether to ignore wildcards that don't match any indexes. Default is true. -expand_wildcards | String | Expands wildcard expressions to different indexes. Combine multiple values with commas. Available values are all (match all indexes), open (match open indexes), closed (match closed indexes), hidden (match hidden indexes), and none (do not accept wildcard expressions). Default is open. -ignore_unavailable | Boolean | If true, OpenSearch does not search for missing or closed indexes. Default is false. +allow_no_indices | Boolean | Whether to ignore wildcards that don't match any indexes. Default is `true`. +expand_wildcards | String | Expands wildcard expressions to different indexes. Combine multiple values with commas. Available values are all (match all indexes), open (match open indexes), closed (match closed indexes), hidden (match hidden indexes), and none (do not accept wildcard expressions). Default is `open`. +ignore_unavailable | Boolean | If true, OpenSearch does not search for missing or closed indexes. Default is `false`. wait_for_active_shards | String | Specifies the number of active shards that must be available before OpenSearch processes the request. Default is 1 (only the primary shard). Set to all or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the request to succeed. cluster_manager_timeout | Time | How long to wait for a connection to the cluster manager node. Default is `30s`. timeout | Time | How long to wait for a response from the cluster. Default is `30s`. 
diff --git a/_api-reference/index-apis/delete-index.md b/_api-reference/index-apis/delete-index.md index 7b2be5e83b..20e5c51c93 100644 --- a/_api-reference/index-apis/delete-index.md +++ b/_api-reference/index-apis/delete-index.md @@ -31,8 +31,8 @@ All parameters are optional. Parameter | Type | Description :--- | :--- | :--- -allow_no_indices | Boolean | Whether to ignore wildcards that don't match any indexes. Default is true. -expand_wildcards | String | Expands wildcard expressions to different indexes. Combine multiple values with commas. Available values are all (match all indexes), open (match open indexes), closed (match closed indexes), hidden (match hidden indexes), and none (do not accept wildcard expressions), which must be used with open, closed, or both. Default is open. +allow_no_indices | Boolean | Whether to ignore wildcards that don't match any indexes. Default is `true`. +expand_wildcards | String | Expands wildcard expressions to different indexes. Combine multiple values with commas. Available values are all (match all indexes), open (match open indexes), closed (match closed indexes), hidden (match hidden indexes), and none (do not accept wildcard expressions), which must be used with open, closed, or both. Default is `open`. ignore_unavailable | Boolean | If true, OpenSearch does not include missing or closed indexes in the response. cluster_manager_timeout | Time | How long to wait for a connection to the cluster manager node. Default is `30s`. timeout | Time | How long to wait for the response to return. Default is `30s`. diff --git a/_api-reference/index-apis/exists.md b/_api-reference/index-apis/exists.md index 6d439a96cf..429ac40745 100644 --- a/_api-reference/index-apis/exists.md +++ b/_api-reference/index-apis/exists.md @@ -32,12 +32,12 @@ All parameters are optional. Parameter | Type | Description :--- | :--- | :--- -allow_no_indices | Boolean | Whether to ignore wildcards that don't match any indexes. Default is true. -expand_wildcards | String | Expands wildcard expressions to different indexes. Combine multiple values with commas. Available values are all (match all indexes), open (match open indexes), closed (match closed indexes), hidden (match hidden indexes), and none (do not accept wildcard expressions). Default is open. +allow_no_indices | Boolean | Whether to ignore wildcards that don't match any indexes. Default is `true`. +expand_wildcards | String | Expands wildcard expressions to different indexes. Combine multiple values with commas. Available values are all (match all indexes), open (match open indexes), closed (match closed indexes), hidden (match hidden indexes), and none (do not accept wildcard expressions). Default is `open`. flat_settings | Boolean | Whether to return settings in the flat form, which can improve readability, especially for heavily nested settings. For example, the flat form of "index": { "creation_date": "123456789" } is "index.creation_date": "123456789". include_defaults | Boolean | Whether to include default settings as part of the response. This parameter is useful for identifying the names and current values of settings you want to update. -ignore_unavailable | Boolean | If true, OpenSearch does not search for missing or closed indexes. Default is false. -local | Boolean | Whether to return information from only the local node instead of from the cluster manager node. Default is false. +ignore_unavailable | Boolean | If true, OpenSearch does not search for missing or closed indexes. Default is `false`. 
+local | Boolean | Whether to return information from only the local node instead of from the cluster manager node. Default is `false`. ## Response diff --git a/_api-reference/index-apis/get-index.md b/_api-reference/index-apis/get-index.md index 899e82e901..733110d63a 100644 --- a/_api-reference/index-apis/get-index.md +++ b/_api-reference/index-apis/get-index.md @@ -32,12 +32,12 @@ All parameters are optional. Parameter | Type | Description :--- | :--- | :--- -allow_no_indices | Boolean | Whether to ignore wildcards that don't match any indexes. Default is true. -expand_wildcards | String | Expands wildcard expressions to different indexes. Combine multiple values with commas. Available values are all (match all indexes), open (match open indexes), closed (match closed indexes), hidden (match hidden indexes), and none (do not accept wildcard expressions), which must be used with open, closed, or both. Default is open. +allow_no_indices | Boolean | Whether to ignore wildcards that don't match any indexes. Default is `true`. +expand_wildcards | String | Expands wildcard expressions to different indexes. Combine multiple values with commas. Available values are all (match all indexes), open (match open indexes), closed (match closed indexes), hidden (match hidden indexes), and none (do not accept wildcard expressions), which must be used with open, closed, or both. Default is `open`. flat_settings | Boolean | Whether to return settings in the flat form, which can improve readability, especially for heavily nested settings. For example, the flat form of "index": { "creation_date": "123456789" } is "index.creation_date": "123456789". include_defaults | Boolean | Whether to include default settings as part of the response. This parameter is useful for identifying the names and current values of settings you want to update. ignore_unavailable | Boolean | If true, OpenSearch does not include missing or closed indexes in the response. -local | Boolean | Whether to return information from only the local node instead of from the cluster manager node. Default is false. +local | Boolean | Whether to return information from only the local node instead of from the cluster manager node. Default is `false`. cluster_manager_timeout | Time | How long to wait for a connection to the cluster manager node. Default is `30s`. diff --git a/_api-reference/index-apis/get-settings.md b/_api-reference/index-apis/get-settings.md index 9ad0078757..c41b25b4f5 100644 --- a/_api-reference/index-apis/get-settings.md +++ b/_api-reference/index-apis/get-settings.md @@ -42,7 +42,7 @@ expand_wildcards | String | Expands wildcard expressions to different indexes. C flat_settings | Boolean | Whether to return settings in the flat form, which can improve readability, especially for heavily nested settings. For example, the flat form of “index”: { “creation_date”: “123456789” } is “index.creation_date”: “123456789”. include_defaults | Boolean | Whether to include default settings, including settings used within OpenSearch plugins, in the response. Default is `false`. ignore_unavailable | Boolean | If true, OpenSearch does not include missing or closed indexes in the response. -local | Boolean | Whether to return information from the local node only instead of the cluster manager node. Default is false. +local | Boolean | Whether to return information from the local node only instead of the cluster manager node. Default is `false`. cluster_manager_timeout | Time | How long to wait for a connection to the cluster manager node. 
Default is `30s`. ## Response diff --git a/_api-reference/index-apis/open-index.md b/_api-reference/index-apis/open-index.md index 6ca0348695..12381aa8c6 100644 --- a/_api-reference/index-apis/open-index.md +++ b/_api-reference/index-apis/open-index.md @@ -33,9 +33,9 @@ All parameters are optional. Parameter | Type | Description :--- | :--- | :--- <index-name> | String | The index to open. Can be a comma-separated list of multiple index names. Use `_all` or * to open all indexes. -allow_no_indices | Boolean | Whether to ignore wildcards that don't match any indexes. Default is true. -expand_wildcards | String | Expands wildcard expressions to different indexes. Combine multiple values with commas. Available values are all (match all indexes), open (match open indexes), closed (match closed indexes), hidden (match hidden indexes), and none (do not accept wildcard expressions). Default is open. -ignore_unavailable | Boolean | If true, OpenSearch does not search for missing or closed indexes. Default is false. +allow_no_indices | Boolean | Whether to ignore wildcards that don't match any indexes. Default is `true`. +expand_wildcards | String | Expands wildcard expressions to different indexes. Combine multiple values with commas. Available values are all (match all indexes), open (match open indexes), closed (match closed indexes), hidden (match hidden indexes), and none (do not accept wildcard expressions). Default is `open`. +ignore_unavailable | Boolean | If true, OpenSearch does not search for missing or closed indexes. Default is `false`. wait_for_active_shards | String | Specifies the number of active shards that must be available before OpenSearch processes the request. Default is 1 (only the primary shard). Set to all or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the request to succeed. cluster_manager_timeout | Time | How long to wait for a connection to the cluster manager node. Default is `30s`. timeout | Time | How long to wait for a response from the cluster. Default is `30s`. diff --git a/_api-reference/index-apis/update-settings.md b/_api-reference/index-apis/update-settings.md index 3f38418ef4..9fc9f01f85 100644 --- a/_api-reference/index-apis/update-settings.md +++ b/_api-reference/index-apis/update-settings.md @@ -43,7 +43,7 @@ Parameter | Data type | Description allow_no_indices | Boolean | Whether to ignore wildcards that don’t match any indexes. Default is `true`. expand_wildcards | String | Expands wildcard expressions to different indexes. Combine multiple values with commas. Available values are `all` (match all indexes), `open` (match open indexes), `closed` (match closed indexes), `hidden` (match hidden indexes), and `none` (do not accept wildcard expressions), which must be used with `open`, `closed`, or both. Default is `open`. cluster_manager_timeout | Time | How long to wait for a connection to the cluster manager node. Default is `30s`. -preserve_existing | Boolean | Whether to preserve existing index settings. Default is false. +preserve_existing | Boolean | Whether to preserve existing index settings. Default is `false`. timeout | Time | How long to wait for a connection to return. Default is `30s`. ## Request body diff --git a/_api-reference/search.md b/_api-reference/search.md index 46212e0634..777f48354e 100644 --- a/_api-reference/search.md +++ b/_api-reference/search.md @@ -42,29 +42,29 @@ All URL parameters are optional. 
Parameter | Type | Description :--- | :--- | :--- -allow_no_indices | Boolean | Whether to ignore wildcards that don’t match any indexes. Default is true. -allow_partial_search_results | Boolean | Whether to return partial results if the request runs into an error or times out. Default is true. +allow_no_indices | Boolean | Whether to ignore wildcards that don’t match any indexes. Default is `true`. +allow_partial_search_results | Boolean | Whether to return partial results if the request runs into an error or times out. Default is `true`. analyzer | String | Analyzer to use in the query string. -analyze_wildcard | Boolean | Whether the update operation should include wildcard and prefix queries in the analysis. Default is false. +analyze_wildcard | Boolean | Whether the update operation should include wildcard and prefix queries in the analysis. Default is `false`. batched_reduce_size | Integer | How many shard results to reduce on a node. Default is 512. cancel_after_time_interval | Time | The time after which the search request will be canceled. Request-level parameter takes precedence over cancel_after_time_interval [cluster setting]({{site.url}}{{site.baseurl}}/api-reference/cluster-settings). Default is -1. -ccs_minimize_roundtrips | Boolean | Whether to minimize roundtrips between a node and remote clusters. Default is true. +ccs_minimize_roundtrips | Boolean | Whether to minimize roundtrips between a node and remote clusters. Default is `true`. default_operator | String | Indicates whether the default operator for a string query should be AND or OR. Default is OR. df | String | The default field in case a field prefix is not provided in the query string. docvalue_fields | String | The fields that OpenSearch should return using their docvalue forms. expand_wildcards | String | Specifies the type of index that wildcard expressions can match. Supports comma-separated values. Valid values are all (match any index), open (match open, non-hidden indexes), closed (match closed, non-hidden indexes), hidden (match hidden indexes), and none (deny wildcard expressions). Default is open. -explain | Boolean | Whether to return details about how OpenSearch computed the document's score. Default is false. +explain | Boolean | Whether to return details about how OpenSearch computed the document's score. Default is `false`. from | Integer | The starting index to search from. Default is 0. -ignore_throttled | Boolean | Whether to ignore concrete, expanded, or indexes with aliases if indexes are frozen. Default is true. +ignore_throttled | Boolean | Whether to ignore concrete, expanded, or indexes with aliases if indexes are frozen. Default is `true`. ignore_unavailable | Boolean | Specifies whether to include missing or closed indexes in the response and ignores unavailable shards during the search request. Default is `false`. -lenient | Boolean | Specifies whether OpenSearch should accept requests if queries have format errors (for example, querying a text field for an integer). Default is false. +lenient | Boolean | Specifies whether OpenSearch should accept requests if queries have format errors (for example, querying a text field for an integer). Default is `false`. max_concurrent_shard_requests | Integer | How many concurrent shard requests this request should execute on each node. Default is 5. -phase_took | Boolean | Whether to return phase-level `took` time values in the response. Default is false. +phase_took | Boolean | Whether to return phase-level `took` time values in the response. 
Default is `false`. pre_filter_shard_size | Integer | A prefilter size threshold that triggers a prefilter operation if the request exceeds the threshold. Default is 128 shards. preference | String | Specifies the shards or nodes on which OpenSearch should perform the search. For valid values, see [The `preference` query parameter](#the-preference-query-parameter). q | String | Lucene query string’s query. request_cache | Boolean | Specifies whether OpenSearch should use the request cache. Default is whether it’s enabled in the index’s settings. -rest_total_hits_as_int | Boolean | Whether to return `hits.total` as an integer. Returns an object otherwise. Default is false. +rest_total_hits_as_int | Boolean | Whether to return `hits.total` as an integer. Returns an object otherwise. Default is `false`. routing | String | Value used to route the update by query operation to a specific shard. scroll | Time | How long to keep the search context open. search_type | String | Whether OpenSearch should use global term and document frequencies when calculating relevance scores. Valid choices are `query_then_fetch` and `dfs_query_then_fetch`. `query_then_fetch` scores documents using local term and document frequencies for the shard. It’s usually faster but less accurate. `dfs_query_then_fetch` scores documents using global term and document frequencies across all shards. It’s usually slower but more accurate. Default is `query_then_fetch`. @@ -75,18 +75,18 @@ _source | String | Whether to include the `_source` field in the response. _source_excludes | List | A comma-separated list of source fields to exclude from the response. _source_includes | List | A comma-separated list of source fields to include in the response. stats | String | Value to associate with the request for additional logging. -stored_fields | Boolean | Whether the get operation should retrieve fields stored in the index. Default is false. +stored_fields | Boolean | Whether the get operation should retrieve fields stored in the index. Default is `false`. suggest_field | String | Fields OpenSearch can use to look for similar terms. suggest_mode | String | The mode to use when searching. Available options are `always` (use suggestions based on the provided terms), `popular` (use suggestions that have more occurrences), and `missing` (use suggestions for terms not in the index). suggest_size | Integer | How many suggestions to return. suggest_text | String | The source that suggestions should be based off of. terminate_after | Integer | The maximum number of documents OpenSearch should process before terminating the request. Default is 0. timeout | Time | How long the operation should wait for a response from active shards. Default is `1m`. -track_scores | Boolean | Whether to return document scores. Default is false. +track_scores | Boolean | Whether to return document scores. Default is `false`. track_total_hits | Boolean or Integer | Whether to return how many documents matched the query. -typed_keys | Boolean | Whether returned aggregations and suggested terms should include their types in the response. Default is true. +typed_keys | Boolean | Whether returned aggregations and suggested terms should include their types in the response. Default is `true`. version | Boolean | Whether to include the document version as a match. -include_named_queries_score | Boolean | Whether to return scores with named queries. Default is false. +include_named_queries_score | Boolean | Whether to return scores with named queries. Default is `false`. 
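Because this hunk touches a long list of search flags, a concrete call may help reviewers. The following request is only an illustrative sketch — the index name `my-index` is hypothetical and not part of this patch — combining several of the parameters documented in the table above:

```json
GET /my-index/_search?from=0&explain=true&typed_keys=true&rest_total_hits_as_int=true
{
  "query": {
    "match_all": {}
  }
}
```

Each query string flag shown here appears in the parameter table above; everything else about the request is standard Query DSL.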
### The `preference` query parameter @@ -111,7 +111,7 @@ Field | Type | Description aggs | Object | In the optional `aggs` parameter, you can define any number of aggregations. Each aggregation is defined by its name and one of the types of aggregations that OpenSearch supports. For more information, see [Aggregations]({{site.url}}{{site.baseurl}}/aggregations/). docvalue_fields | Array of objects | The fields that OpenSearch should return using their docvalue forms. Specify a format to return results in a certain format, such as date and time. fields | Array | The fields to search for in the request. Specify a format to return results in a certain format, such as date and time. -explain | String | Whether to return details about how OpenSearch computed the document's score. Default is false. +explain | String | Whether to return details about how OpenSearch computed the document's score. Default is `false`. from | Integer | The starting index to search from. Default is 0. indices_boost | Array of objects | Values used to boost the score of specified indexes. Specify in the format of <index> : <boost-multiplier> min_score | Integer | Specify a score threshold to return only documents above the threshold. diff --git a/_api-reference/snapshots/create-repository.md b/_api-reference/snapshots/create-repository.md index 856332b793..54807b85d1 100644 --- a/_api-reference/snapshots/create-repository.md +++ b/_api-reference/snapshots/create-repository.md @@ -79,7 +79,7 @@ Request field | Description `max_snapshot_bytes_per_sec` | The maximum rate at which snapshots take. Default is 40 MB per second (`40m`). Optional. `readonly` | Whether the repository is read-only. Useful when migrating from one cluster (`"readonly": false` when registering) to another cluster (`"readonly": true` when registering). Optional. `remote_store_index_shallow_copy` | Boolean | Whether the snapshot of the remote store indexes is captured as a shallow copy. Default is `false`. -`server_side_encryption` | Whether to encrypt snapshot files in the S3 bucket. This setting uses AES-256 with S3-managed keys. See [Protecting data using server-side encryption](https://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html). Default is false. Optional. +`server_side_encryption` | Whether to encrypt snapshot files in the S3 bucket. This setting uses AES-256 with S3-managed keys. See [Protecting data using server-side encryption](https://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html). Default is `false`. Optional. `storage_class` | Specifies the [S3 storage class](https://docs.aws.amazon.com/AmazonS3/latest/dev/storage-class-intro.html) for the snapshots files. Default is `standard`. Do not use the `glacier` and `deep_archive` storage classes. Optional. For the `base_path` parameter, do not enter the `s3://` prefix when entering your S3 bucket details. Only the name of the bucket is required. diff --git a/_api-reference/snapshots/create-snapshot.md b/_api-reference/snapshots/create-snapshot.md index 4f0a6d05cf..6334878d8c 100644 --- a/_api-reference/snapshots/create-snapshot.md +++ b/_api-reference/snapshots/create-snapshot.md @@ -42,9 +42,9 @@ The request body is optional. Field | Data type | Description :--- | :--- | :--- `indices` | String | The indices you want to include in the snapshot. You can use `,` to create a list of indices, `*` to specify an index pattern, and `-` to exclude certain indices. Don't put spaces between items. Default is all indices. 
-`ignore_unavailable` | Boolean | If an index from the `indices` list doesn't exist, whether to ignore it rather than fail the snapshot. Default is false. -`include_global_state` | Boolean | Whether to include cluster state in the snapshot. Default is true. -`partial` | Boolean | Whether to allow partial snapshots. Default is false, which fails the entire snapshot if one or more shards fails to stor +`ignore_unavailable` | Boolean | If an index from the `indices` list doesn't exist, whether to ignore it rather than fail the snapshot. Default is `false`. +`include_global_state` | Boolean | Whether to include cluster state in the snapshot. Default is `true`. +`partial` | Boolean | Whether to allow partial snapshots. Default is `false`, which fails the entire snapshot if one or more shards fails to store. #### Example requests diff --git a/_api-reference/snapshots/get-snapshot-repository.md b/_api-reference/snapshots/get-snapshot-repository.md index e3664e11a8..501d0785dd 100644 --- a/_api-reference/snapshots/get-snapshot-repository.md +++ b/_api-reference/snapshots/get-snapshot-repository.md @@ -27,7 +27,7 @@ You can also get details about a snapshot during and after snapshot creation. Se | Parameter | Data type | Description | :--- | :--- | :--- | local | Boolean | Whether to get information from the local node. Optional, defaults to `false`.| -| cluster_manager_timeout | Time | Amount of time to wait for a connection to the master node. Optional, defaults to 30 seconds. | +| cluster_manager_timeout | Time | Amount of time to wait for a connection to the cluster manager node. Optional, defaults to 30 seconds. | #### Example request diff --git a/_api-reference/snapshots/verify-snapshot-repository.md b/_api-reference/snapshots/verify-snapshot-repository.md index 2929952472..12fada3303 100644 --- a/_api-reference/snapshots/verify-snapshot-repository.md +++ b/_api-reference/snapshots/verify-snapshot-repository.md @@ -29,7 +29,7 @@ Path parameters are optional. | Parameter | Data type | Description | :--- | :--- -| cluster_manager_timeout | Time | Amount of time to wait for a connection to the master node. Optional, defaults to `30s`. | +| cluster_manager_timeout | Time | Amount of time to wait for a connection to the cluster manager node. Optional, defaults to `30s`. | | timeout | Time | The period of time to wait for a response. If a response is not received before the timeout value, the request fails and returns an error. Defaults to `30s`. | #### Example request diff --git a/_clients/javascript/helpers.md b/_clients/javascript/helpers.md index f88efd8e00..c6cff46be0 100644 --- a/_clients/javascript/helpers.md +++ b/_clients/javascript/helpers.md @@ -62,7 +62,7 @@ When creating a new bulk helper instance, you can use the following configuratio | `flushBytes` | Integer | Optional. Default is 5,000,000. | Maximum bulk body size to send in bytes. | `flushInterval` | Integer | Optional. Default is 30,000. | Time in milliseconds to wait before flushing the body after the last document has been read. | `onDrop` | Function | Optional. Default is `noop`. | A function to be invoked for every document that can’t be indexed after reaching the maximum number of retries. -| `refreshOnCompletion` | Boolean | Optional. Default is false. | Whether or not a refresh should be run on all affected indexes at the end of the bulk operation. +| `refreshOnCompletion` | Boolean | Optional. Default is `false`. | Whether or not a refresh should be run on all affected indexes at the end of the bulk operation. 
| `retries` | Integer | Optional. Defaults to the client's `maxRetries` value. | The number of times an operation is retried before `onDrop` is called for that document. | `wait` | Integer | Optional. Default is 5,000. | Time in milliseconds to wait before retrying an operation. diff --git a/_dashboards/visualize/visbuilder.md b/_dashboards/visualize/visbuilder.md index 51ce5b1e46..2b6818a00e 100644 --- a/_dashboards/visualize/visbuilder.md +++ b/_dashboards/visualize/visbuilder.md @@ -27,7 +27,7 @@ Follow these steps to create a new visualization using VisBuilder in your enviro 1. Open Dashboards: - If you're not running the Security plugin, go to http://localhost:5601. - - If you're running the Security plugin, go to https://localhost:5601 and log in with your username and password (default is admin/admin). + - If you're running the Security plugin, go to https://localhost:5601 and log in with your username and password (default is `admin/admin`). 1. From the top menu, select **Visualize > Create visualization > VisBuilder**. diff --git a/_data-prepper/managing-data-prepper/configuring-data-prepper.md b/_data-prepper/managing-data-prepper/configuring-data-prepper.md index d6750daba4..d890b741cc 100644 --- a/_data-prepper/managing-data-prepper/configuring-data-prepper.md +++ b/_data-prepper/managing-data-prepper/configuring-data-prepper.md @@ -65,9 +65,9 @@ Option | Required | Type | Description ssl | No | Boolean | Enables TLS/SSL. Default is `true`. ssl_certificate_file | Conditionally | String | The SSL certificate chain file path or AWS S3 path. S3 path example `s3:///`. Required if `ssl` is true and `use_acm_certificate_for_ssl` is false. Defaults to `config/default_certificate.pem` which is the default certificate file. Read more about how the certificate file is generated [here](https://github.com/opensearch-project/data-prepper/tree/main/examples/certificates). ssl_key_file | Conditionally | String | The SSL key file path or AWS S3 path. S3 path example `s3:///`. Required if `ssl` is true and `use_acm_certificate_for_ssl` is false. Defaults to `config/default_private_key.pem` which is the default private key file. Read more about how the default private key file is generated [here](https://github.com/opensearch-project/data-prepper/tree/main/examples/certificates). -ssl_insecure_disable_verification | No | Boolean | Disables the verification of server's TLS certificate chain. Default is false. -ssl_fingerprint_verification_only | No | Boolean | Disables the verification of server's TLS certificate chain and instead verifies only the certificate fingerprint. Default is false. -use_acm_certificate_for_ssl | No | Boolean | Enables TLS/SSL using certificate and private key from AWS Certificate Manager (ACM). Default is false. +ssl_insecure_disable_verification | No | Boolean | Disables the verification of server's TLS certificate chain. Default is `false`. +ssl_fingerprint_verification_only | No | Boolean | Disables the verification of server's TLS certificate chain and instead verifies only the certificate fingerprint. Default is `false`. +use_acm_certificate_for_ssl | No | Boolean | Enables TLS/SSL using certificate and private key from AWS Certificate Manager (ACM). Default is `false`. acm_certificate_arn | Conditionally | String | The ACM certificate ARN. The ACM certificate takes preference over S3 or a local file system certificate. Required if `use_acm_certificate_for_ssl` is set to true. 
acm_private_key_password | No | String | The ACM private key password that decrypts the private key. If not provided, Data Prepper generates a random password. acm_certificate_timeout_millis | No | Integer | The timeout in milliseconds for ACM to get certificates. Default is 120000. diff --git a/_im-plugin/index-rollups/rollup-api.md b/_im-plugin/index-rollups/rollup-api.md index 61bfdf76d4..5064d2ac49 100644 --- a/_im-plugin/index-rollups/rollup-api.md +++ b/_im-plugin/index-rollups/rollup-api.md @@ -105,8 +105,8 @@ Options | Description | Type | Required `schedule.interval.cron.expression` | Specify a Unix cron expression. | String | Yes `schedule.interval.cron.timezone` | Specify timezones as defined by the IANA Time Zone Database. Defaults to UTC. | String | No `description` | Optionally, describe the rollup job. | String | No -`enabled` | When true, the index rollup job is scheduled. Default is true. | Boolean | Yes -`continuous` | Specify whether or not the index rollup job continuously rolls up data forever or just executes over the current data set once and stops. Default is false. | Boolean | Yes +`enabled` | When true, the index rollup job is scheduled. Default is `true`. | Boolean | Yes +`continuous` | Specify whether or not the index rollup job continuously rolls up data forever or executes over the current dataset once and stops. Default is `false`. | Boolean | Yes `error_notification` | Set up a Mustache message template for error notifications. For example, if an index rollup job fails, the system sends a message to a Slack channel. | Object | No `page_size` | Specify the number of buckets to paginate at a time during rollup. | Number | Yes `delay` | The number of milliseconds to delay execution of the index rollup job. | Long | No diff --git a/_im-plugin/index-transforms/transforms-apis.md b/_im-plugin/index-transforms/transforms-apis.md index df9ff19f8f..37d2c035b5 100644 --- a/_im-plugin/index-transforms/transforms-apis.md +++ b/_im-plugin/index-transforms/transforms-apis.md @@ -39,7 +39,7 @@ You can specify the following options in the HTTP request body: Option | Data Type | Description | Required :--- | :--- | :--- | :--- enabled | Boolean | If true, the transform job is enabled at creation. | No -continuous | Boolean | Specifies whether the transform job should be continuous. Continuous jobs execute every time they are scheduled according to the `schedule` field and run based off of newly transformed buckets as well as any new data added to source indexes. Non-continuous jobs execute only once. Default is false. | No +continuous | Boolean | Specifies whether the transform job should be continuous. Continuous jobs execute every time they are scheduled according to the `schedule` field and run based off of newly transformed buckets as well as any new data added to source indexes. Non-continuous jobs execute only once. Default is `false`. | No schedule | Object | The schedule for the transform job. | Yes start_time | Integer | The Unix epoch time of the transform job's start time. | Yes description | String | Describes the transform job. | No @@ -447,7 +447,7 @@ from | The starting transform to return. Default is 0. | No size | Specifies the number of transforms to return. Default is 10. | No search |The search term to use to filter results. | No sortField | The field to sort results with. | No -sortDirection | Specifies the direction to sort results in. Can be `ASC` or `DESC`. Default is ASC. | No +sortDirection | Specifies the direction to sort results in. 
Can be `ASC` or `DESC`. Default is `ASC`. | No #### Sample Request diff --git a/_observing-your-data/notifications/api.md b/_observing-your-data/notifications/api.md index 2930d15ecb..882977b153 100644 --- a/_observing-your-data/notifications/api.md +++ b/_observing-your-data/notifications/api.md @@ -200,7 +200,7 @@ config | Object | Contains all relevant information, such as channel name, confi name | String | Name of the channel. | Yes description | String | The channel's description. | No config_type | String | The destination of your notification. Valid options are `sns`, `slack`, `chime`, `webhook`, `smtp_account`, `ses_account`, `email_group`, and `email`. | Yes -is_enabled | Boolean | Indicates whether the channel is enabled for sending and receiving notifications. Default is true. | No +is_enabled | Boolean | Indicates whether the channel is enabled for sending and receiving notifications. Default is `true`. | No The create channel operation accepts multiple `config_types` as possible notification destinations, so follow the format for your preferred `config_type`. diff --git a/_security/audit-logs/storage-types.md b/_security/audit-logs/storage-types.md index c0707ff424..719287ad7f 100644 --- a/_security/audit-logs/storage-types.md +++ b/_security/audit-logs/storage-types.md @@ -53,8 +53,8 @@ If you use `external_opensearch` and the remote cluster also uses the Security p Name | Data type | Description :--- | :--- | :--- -`plugins.security.audit.config.enable_ssl` | Boolean | If you enabled SSL/TLS on the receiving cluster, set to true. The default is false. -`plugins.security.audit.config.verify_hostnames` | Boolean | Whether to verify the hostname of the SSL/TLS certificate of the receiving cluster. Default is true. +`plugins.security.audit.config.enable_ssl` | Boolean | If you enabled SSL/TLS on the receiving cluster, set to true. Default is `false`. +`plugins.security.audit.config.verify_hostnames` | Boolean | Whether to verify the hostname of the SSL/TLS certificate of the receiving cluster. Default is `true`. `plugins.security.audit.config.pemtrustedcas_filepath` | String | The trusted root certificate of the external OpenSearch cluster, relative to the `config` directory. `plugins.security.audit.config.pemtrustedcas_content` | String | Instead of specifying the path (`plugins.security.audit.config.pemtrustedcas_filepath`), you can configure the Base64-encoded certificate content directly. `plugins.security.audit.config.enable_ssl_client_auth` | Boolean | Whether to enable SSL/TLS client authentication. If you set this to true, the audit log module sends the node's certificate along with the request. The receiving cluster can use this certificate to verify the identity of the caller. diff --git a/_security/authentication-backends/openid-connect.md b/_security/authentication-backends/openid-connect.md index 8efb66fbb6..8e785a9e65 100755 --- a/_security/authentication-backends/openid-connect.md +++ b/_security/authentication-backends/openid-connect.md @@ -181,8 +181,8 @@ config: Name | Description :--- | :--- -`enable_ssl` | Whether to use TLS. Default is false. -`verify_hostnames` | Whether to verify the hostnames of the IdP's TLS certificate. Default is true. +`enable_ssl` | Whether to use TLS. Default is `false`. +`verify_hostnames` | Whether to verify the hostnames of the IdP's TLS certificate. Default is `true`. 
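As a reviewer aid for the OpenID Connect hunk above, here is a minimal sketch of how the two TLS settings might appear in `config/opensearch-security/config.yml`. The IdP URL is a placeholder and the surrounding authenticator block is abbreviated, so treat the exact nesting as an assumption rather than as part of this patch:

```yml
http_authenticator:
  type: openid
  config:
    # Placeholder IdP metadata URL -- not part of this patch
    openid_connect_url: https://idp.example.com/.well-known/openid-configuration
    # The two settings documented in the table above
    enable_ssl: true
    verify_hostnames: true
```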
### Certificate validation @@ -252,7 +252,7 @@ config: Name | Description :--- | :--- -`enable_ssl_client_auth` | Whether to send the client certificate to the IdP server. Default is false. +`enable_ssl_client_auth` | Whether to send the client certificate to the IdP server. Default is `false`. `pemcert_filepath` | Absolute path to the client certificate. `pemcert_content` | The content of the client certificate. Cannot be used when `pemcert_filepath` is set. `pemkey_filepath` | Absolute path to the file containing the private key of the client certificate. diff --git a/_security/authentication-backends/proxy.md b/_security/authentication-backends/proxy.md index bb7d1f0151..7716b1d6d2 100644 --- a/_security/authentication-backends/proxy.md +++ b/_security/authentication-backends/proxy.md @@ -40,7 +40,7 @@ You can configure the following settings: Name | Description :--- | :--- -`enabled` | Enables or disables proxy support. Default is false. +`enabled` | Enables or disables proxy support. Default is `false`. `internalProxies` | A regular expression containing the IP addresses of all trusted proxies. The pattern `.*` trusts all internal proxies. `remoteIpHeader` | Name of the HTTP header field that has the hostname chain. Default is `x-forwarded-for`. diff --git a/_security/authentication-backends/saml.md b/_security/authentication-backends/saml.md index a4511a5325..652345ccdc 100755 --- a/_security/authentication-backends/saml.md +++ b/_security/authentication-backends/saml.md @@ -244,7 +244,7 @@ If you are loading the IdP metadata from a URL, we recommend that you use SSL/TL Name | Description :--- | :--- -`idp.enable_ssl` | Whether to enable the custom TLS configuration. Default is false (JDK settings are used). +`idp.enable_ssl` | Whether to enable the custom TLS configuration. Default is `false` (JDK settings are used). `idp.verify_hostnames` | Whether to verify the hostnames of the server's TLS certificate. Example: @@ -302,7 +302,7 @@ The Security plugin can use TLS client authentication when fetching the IdP meta Name | Description :--- | :--- -`idp.enable_ssl_client_auth` | Whether to send a client certificate to the IdP server. Default is false. +`idp.enable_ssl_client_auth` | Whether to send a client certificate to the IdP server. Default is `false`. `idp.pemcert_filepath` | Path to the PEM file containing the client certificate. The file must be placed under the OpenSearch `config` directory, and the path must be specified relative to the `config` directory. `idp.pemcert_content` | The content of the client certificate. Cannot be used when `pemcert_filepath` is set. `idp.pemkey_filepath` | Path to the private key of the client certificate. The file must be placed under the OpenSearch `config` directory, and the path must be specified relative to the `config` directory. diff --git a/_security/configuration/security-admin.md b/_security/configuration/security-admin.md index ed293b7e91..77d3711385 100755 --- a/_security/configuration/security-admin.md +++ b/_security/configuration/security-admin.md @@ -201,7 +201,7 @@ Name | Description `-cn` | Cluster name. Default is `opensearch`. `-icl` | Ignore cluster name. `-sniff` | Sniff cluster nodes. Sniffing detects available nodes using the OpenSearch `_cluster/state` API. -`-arc,--accept-red-cluster` | Execute `securityadmin.sh` even if the cluster state is red. Default is false, which means the script will not execute on a red cluster. +`-arc,--accept-red-cluster` | Execute `securityadmin.sh` even if the cluster state is red. 
Default is `false`, which means the script will not execute on a red cluster. ### Certificate validation settings @@ -210,7 +210,7 @@ Use the following options to control certificate validation. Name | Description :--- | :--- -`-nhnv` | Do not validate hostname. Default is false. +`-nhnv` | Do not validate hostname. Default is `false`. `-nrhn` | Do not resolve hostname. Only relevant if `-nhnv` is not set. diff --git a/_security/configuration/tls.md b/_security/configuration/tls.md index d06b16a47e..bca932bc0c 100755 --- a/_security/configuration/tls.md +++ b/_security/configuration/tls.md @@ -52,11 +52,11 @@ The following settings configure the location and password of your keystore and Name | Description :--- | :--- -`plugins.security.ssl.transport.keystore_type` | The type of the keystore file, JKS or PKCS12/PFX. Optional. Default is JKS. +`plugins.security.ssl.transport.keystore_type` | The type of the keystore file, `JKS` or `PKCS12/PFX`. Optional. Default is `JKS`. `plugins.security.ssl.transport.keystore_filepath` | Path to the keystore file, which must be under the `config` directory, specified using a relative path. Required. `plugins.security.ssl.transport.keystore_alias` | The alias name of the keystore. Optional. Default is the first alias. `plugins.security.ssl.transport.keystore_password` | Keystore password. Default is `changeit`. -`plugins.security.ssl.transport.truststore_type` | The type of the truststore file, JKS or PKCS12/PFX. Default is JKS. +`plugins.security.ssl.transport.truststore_type` | The type of the truststore file, `JKS` or `PKCS12/PFX`. Default is `JKS`. `plugins.security.ssl.transport.truststore_filepath` | Path to the truststore file, which must be under the `config` directory, specified using a relative path. Required. `plugins.security.ssl.transport.truststore_alias` | The alias name of the truststore. Optional. Default is all certificates. `plugins.security.ssl.transport.truststore_password` | Truststore password. Default is `changeit`. @@ -65,7 +65,7 @@ Name | Description Name | Description :--- | :--- -`plugins.security.ssl.http.enabled` | Whether to enable TLS on the REST layer. If enabled, only HTTPS is allowed. Optional. Default is false. +`plugins.security.ssl.http.enabled` | Whether to enable TLS on the REST layer. If enabled, only HTTPS is allowed. Optional. Default is `false`. `plugins.security.ssl.http.keystore_type` | The type of the keystore file, JKS or PKCS12/PFX. Optional. Default is JKS. `plugins.security.ssl.http.keystore_filepath` | Path to the keystore file, which must be under the `config` directory, specified using a relative path. Required. `plugins.security.ssl.http.keystore_alias` | The alias name of the keystore. Optional. Default is the first alias. @@ -150,8 +150,8 @@ If OpenSSL is enabled, but for one reason or another the installation does not w Name | Description :--- | :--- -`plugins.security.ssl.transport.enable_openssl_if_available` | Enable OpenSSL on the transport layer if available. Optional. Default is true. -`plugins.security.ssl.http.enable_openssl_if_available` | Enable OpenSSL on the REST layer if available. Optional. Default is true. +`plugins.security.ssl.transport.enable_openssl_if_available` | Enable OpenSSL on the transport layer if available. Optional. Default is `true`. +`plugins.security.ssl.http.enable_openssl_if_available` | Enable OpenSSL on the REST layer if available. Optional. Default is `true`. {% comment %} 1. Install [OpenSSL 1.1.0](https://www.openssl.org/community/binaries.html) on every node. 
@@ -179,8 +179,8 @@ In addition, when `resolve_hostname` is enabled, the Security plugin resolves th Name | Description :--- | :--- -`plugins.security.ssl.transport.enforce_hostname_verification` | Whether to verify hostnames on the transport layer. Optional. Default is true. -`plugins.security.ssl.transport.resolve_hostname` | Whether to resolve hostnames against DNS on the transport layer. Optional. Default is true. Only works if hostname verification is also enabled. +`plugins.security.ssl.transport.enforce_hostname_verification` | Whether to verify hostnames on the transport layer. Optional. Default is `true`. +`plugins.security.ssl.transport.resolve_hostname` | Whether to resolve hostnames against DNS on the transport layer. Optional. Default is `true`. Only works if hostname verification is also enabled. ## (Advanced) Client authentication diff --git a/_tuning-your-cluster/availability-and-recovery/snapshots/sm-api.md b/_tuning-your-cluster/availability-and-recovery/snapshots/sm-api.md index cd3a238f9c..5d89a3747b 100644 --- a/_tuning-your-cluster/availability-and-recovery/snapshots/sm-api.md +++ b/_tuning-your-cluster/availability-and-recovery/snapshots/sm-api.md @@ -181,7 +181,7 @@ Parameter | Type | Description `enabled` | Boolean | Should this SM policy be enabled at creation? Optional. `snapshot_config` | Object | The configuration options for snapshot creation. Required. `snapshot_config.date_format` | String | Snapshot names have the format `<policy_name>-<date>-<random number>`. `date_format` specifies the format for the date in the snapshot name. Supports all date formats supported by OpenSearch. Optional. Default is "yyyy-MM-dd'T'HH:mm:ss". -`snapshot_config.date_format_timezone` | String | Snapshot names have the format `<policy_name>-<date>-<random number>`. `date_format_timezone` specifies the time zone for the date in the snapshot name. Optional. Default is UTC. +`snapshot_config.date_format_timezone` | String | Snapshot names have the format `<policy_name>-<date>-<random number>`. `date_format_timezone` specifies the time zone for the date in the snapshot name. Optional. Default is `UTC`. `snapshot_config.indices` | String | The names of the indexes in the snapshot. Multiple index names are separated by `,`. Supports wildcards (`*`). Optional. Default is `*` (all indexes). `snapshot_config.repository` | String | The repository in which to store snapshots. Required. `snapshot_config.ignore_unavailable` | Boolean | Do you want to ignore unavailable indexes? Optional. Default is `false`. @@ -197,7 +197,7 @@ Parameter | Type | Description `deletion.delete_condition` | Object | Conditions for snapshot deletion. Optional. `deletion.delete_condition.max_count` | Integer | The maximum number of snapshots to be retained. Optional. `deletion.delete_condition.max_age` | String | The maximum time a snapshot is retained. Optional. -`deletion.delete_condition.min_count` | Integer | The minimum number of snapshots to be retained. Optional. Default is one. +`deletion.delete_condition.min_count` | Integer | The minimum number of snapshots to be retained. Optional. Default is `1`. `notification` | Object | Defines notifications for SM events. Optional. `notification.channel` | Object | Defines a channel for notifications. You must [create and configure a notification channel]({{site.url}}{{site.baseurl}}/notifications-plugin/api) before setting up SM notifications. Required. `notification.channel.id` | String | The channel ID of the channel used for notifications. To get the channel IDs of all created channels, use `GET _plugins/_notifications/configs`. Required. 
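To make the snapshot management (SM) parameter table above concrete, the following is a hedged sketch of a policy body that exercises several of the documented fields. The policy name `daily-snapshots`, the repository name, and the cron schedule are hypothetical, and the `creation` block's shape is assumed from the surrounding API documentation; only the field names under `snapshot_config` and `deletion` come from the table:

```json
POST _plugins/_sm/policies/daily-snapshots
{
  "description": "Illustrative policy -- names and schedule are made up",
  "enabled": true,
  "creation": {
    "schedule": {
      "cron": {
        "expression": "0 8 * * *",
        "timezone": "UTC"
      }
    }
  },
  "snapshot_config": {
    "date_format": "yyyy-MM-dd-HH:mm",
    "date_format_timezone": "UTC",
    "indices": "*",
    "repository": "my-repository",
    "ignore_unavailable": false
  },
  "deletion": {
    "delete_condition": {
      "max_count": 14,
      "min_count": 1
    }
  }
}
```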
diff --git a/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore.md b/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore.md index f35115c95f..812d5104c7 100644 --- a/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore.md +++ b/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore.md @@ -475,10 +475,10 @@ POST /_snapshot/my-repository/2/_restore Request parameters | Description :--- | :--- `indices` | The indexes you want to restore. You can use `,` to create a list of indexes, `*` to specify an index pattern, and `-` to exclude certain indexes. Don't put spaces between items. Default is all indexes. -`ignore_unavailable` | If an index from the `indices` list doesn't exist, whether to ignore it rather than fail the restore operation. Default is false. -`include_global_state` | Whether to restore the cluster state. Default is false. -`include_aliases` | Whether to restore aliases alongside their associated indexes. Default is true. -`partial` | Whether to allow the restoration of partial snapshots. Default is false. +`ignore_unavailable` | If an index from the `indices` list doesn't exist, whether to ignore it rather than fail the restore operation. Default is `false`. +`include_global_state` | Whether to restore the cluster state. Default is `false`. +`include_aliases` | Whether to restore aliases alongside their associated indexes. Default is `true`. +`partial` | Whether to allow the restoration of partial snapshots. Default is `false`. `rename_pattern` | If you want to rename indexes as you restore them, use this option to specify a regular expression that matches all indexes you want to restore. Use capture groups (`()`) to reuse portions of the index name. `rename_replacement` | If you want to rename indexes as you restore them, use this option to specify the replacement pattern. Use `$0` to include the entire matching index name, `$1` to include the content of the first capture group, and so on. `index_settings` | If you want to change [index settings]({{site.url}}{{site.baseurl}}/im-plugin/index-settings/) applied during the restore operation, specify them here. You cannot change `index.number_of_shards`. diff --git a/_tuning-your-cluster/index.md b/_tuning-your-cluster/index.md index dbba404af8..99db78565f 100644 --- a/_tuning-your-cluster/index.md +++ b/_tuning-your-cluster/index.md @@ -20,7 +20,7 @@ To create and deploy an OpenSearch cluster according to your requirements, it’ There are many ways to design a cluster. The following illustration shows a basic architecture that includes a four-node cluster that has one dedicated cluster manager node, one dedicated coordinating node, and two data nodes that are cluster manager eligible and also used for ingesting data. - The nomenclature for the master node is now referred to as the cluster manager node. + The master node is now referred to as the cluster manager node. 
{: .note } ![multi-node cluster architecture diagram]({{site.url}}{{site.baseurl}}/images/cluster.png) From a7f316f2a5e335508e4eff0d5082c87b309176de Mon Sep 17 00:00:00 2001 From: AntonEliatra Date: Thu, 11 Jul 2024 00:03:03 +0100 Subject: [PATCH 014/154] adding basic_auth config to ldap #907 (#7671) * adding basic_auth config to ldap #907 Signed-off-by: AntonEliatra * Update ldap.md Signed-off-by: AntonEliatra * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: AntonEliatra Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- _security/authentication-backends/ldap.md | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/_security/authentication-backends/ldap.md b/_security/authentication-backends/ldap.md index 49b01e332b..9f98f7f5b0 100755 --- a/_security/authentication-backends/ldap.md +++ b/_security/authentication-backends/ldap.md @@ -61,8 +61,21 @@ We provide a fully functional example that can help you understand how to use an To enable LDAP authentication and authorization, add the following lines to `config/opensearch-security/config.yml`: +The internal user database authentication should also be enabled because OpenSearch Dashboards connects to OpenSearch using the `kibanaserver` internal user. +{: .note} + ```yml authc: + internal_auth: + order: 0 + description: "HTTP basic authentication using the internal user database" + http_enabled: true + transport_enabled: true + http_authenticator: + type: basic + challenge: false + authentication_backend: + type: internal ldap: http_enabled: true transport_enabled: true From c2bfab30a1025afe9ac136dfbcf7b7b5fe03f7af Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Thu, 11 Jul 2024 08:49:59 -0400 Subject: [PATCH 015/154] Remove Point in Time from Vale terms (#7679) Signed-off-by: Fanit Kolchina --- .github/vale/styles/Vocab/OpenSearch/Products/accept.txt | 1 - 1 file changed, 1 deletion(-) diff --git a/.github/vale/styles/Vocab/OpenSearch/Products/accept.txt b/.github/vale/styles/Vocab/OpenSearch/Products/accept.txt index 83e9aee603..9be8da79a9 100644 --- a/.github/vale/styles/Vocab/OpenSearch/Products/accept.txt +++ b/.github/vale/styles/Vocab/OpenSearch/Products/accept.txt @@ -76,7 +76,6 @@ Painless Peer Forwarder Performance Analyzer Piped Processing Language -Point in Time Powershell Python PyTorch From 8fffcbc45066ac13003e4e5ab630089e34fe9d5b Mon Sep 17 00:00:00 2001 From: AntonEliatra Date: Thu, 11 Jul 2024 16:43:07 +0100 Subject: [PATCH 016/154] Adding DLS with write permission recommendation #1273 (#7668) * Adding DLS with write permission recommendation #1273 Signed-off-by: AntonEliatra * Update _security/access-control/document-level-security.md Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: AntonEliatra --------- Signed-off-by: AntonEliatra Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- _security/access-control/document-level-security.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/_security/access-control/document-level-security.md b/_security/access-control/document-level-security.md index 08de85bbf7..352fe06a61 100644 --- a/_security/access-control/document-level-security.md +++ b/_security/access-control/document-level-security.md @@ -191,6 +191,10 @@ Adaptive | `adaptive-level` | The default 
setting that allows OpenSearch to auto OpenSearch combines all DLS queries with the logical `OR` operator. However, when a role that uses DLS is combined with another security role that doesn't use DLS, the query results are filtered to display only documents matching the DLS from the first role. This filter rule also applies to roles that do not grant read documents. +### DLS and write permissions + +Make sure that a user that has DLS-configured roles does not have write permissions. If write permissions are added, the user will be able to index documents which they will not be able to retrieve due to DLS filtering. + ### When to enable `plugins.security.dfm_empty_overrides_all` When to enable the `plugins.security.dfm_empty_overrides_all` setting depends on whether you want to restrict user access to documents without DLS. From 33ba41c8fa3ffe9f2c85dacd6103ce1a15a8fc31 Mon Sep 17 00:00:00 2001 From: Stavros Macrakis <134456002+smacrakis@users.noreply.github.com> Date: Thu, 11 Jul 2024 13:41:30 -0400 Subject: [PATCH 017/154] Add some example results to make functionality clearer (#7686) There doesn't seem to be a detailed spec for these functions. For example, what are the arguments of substring? First and last positions? (no) First position and length? (yes) Is position 0-origin or 1-origin? (1-origin) Does it accept counting position from the end with negative arguments? (yes) I've added an example result which at least clarifies the first 3 questions. Signed-off-by: Stavros Macrakis <134456002+smacrakis@users.noreply.github.com> --- _search-plugins/sql/functions.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/_search-plugins/sql/functions.md b/_search-plugins/sql/functions.md index de3b578e1a..9706148d76 100644 --- a/_search-plugins/sql/functions.md +++ b/_search-plugins/sql/functions.md @@ -32,10 +32,10 @@ The SQL plugin supports the following common functions shared across the SQL and | `expm1` | `expm1(number T) -> double` | `SELECT expm1(0.5)` | | `floor` | `floor(number T) -> long` | `SELECT floor(0.5)` | | `ln` | `ln(number T) -> double` | `SELECT ln(10)` | -| `log` | `log(number T) -> double` or `log(number T, number T) -> double` | `SELECT log(10)`, `SELECT log(2, 16)` | +| `log` | `log(number T) -> double` or `log(number T, number T) -> double` | `SELECT log(10) -> 2.3`, `SELECT log(2, 16) -> 4`| | `log2` | `log2(number T) -> double` | `SELECT log2(10)` | -| `log10` | `log10(number T) -> double` | `SELECT log10(10)` | -| `mod` | `mod(number T, number T) -> T` | `SELECT mod(2, 3)` | +| `log10` | `log10(number T) -> double` | `SELECT log10(100)` | +| `mod` | `mod(number T, number T) -> T` | `SELECT mod(10,4) -> 2 ` | | `modulus` | `modulus(number T, number T) -> T` | `SELECT modulus(2, 3)` | | `multiply` | `multiply(number T, number T) -> T` | `SELECT multiply(2, 3)` | | `pi` | `pi() -> double` | `SELECT pi()` | @@ -162,7 +162,7 @@ Functions marked with * are only available in SQL. 
| `replace` | `replace(string, string, string) -> string` | `SELECT replace('hello', 'l', 'x')` | | `right` | `right(string, integer) -> string` | `SELECT right('hello', 2)` | | `rtrim` | `rtrim(string) -> string` | `SELECT rtrim('hello ')` | -| `substring` | `substring(string, integer, integer) -> string` | `SELECT substring('hello', 2, 4)` | +| `substring` | `substring(string, integer, integer) -> string` | `SELECT substring('hello', 2, 2) -> 'el'` | | `trim` | `trim(string) -> string` | `SELECT trim(' hello')` | | `upper` | `upper(string) -> string` | `SELECT upper('hello world')` | From cb4e4ca89a365a1130fe415e83746d695caede47 Mon Sep 17 00:00:00 2001 From: Stavros Macrakis <134456002+smacrakis@users.noreply.github.com> Date: Thu, 11 Jul 2024 15:09:00 -0400 Subject: [PATCH 018/154] Update functions.md (#7688) Several of the functions mentioned in the SQL/PPL Functions page (https://opensearch.org/docs/latest/search-plugins/sql/functions/) are not in fact implemented in PPL. Signed-off-by: Stavros Macrakis <134456002+smacrakis@users.noreply.github.com> --- _search-plugins/sql/ppl/functions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/sql/ppl/functions.md b/_search-plugins/sql/ppl/functions.md index 275030f723..d192799f2e 100644 --- a/_search-plugins/sql/ppl/functions.md +++ b/_search-plugins/sql/ppl/functions.md @@ -11,7 +11,7 @@ redirect_from: # Commands -`PPL` supports all [`SQL` common]({{site.url}}{{site.baseurl}}/search-plugins/sql/functions/) functions, including [relevance search]({{site.url}}{{site.baseurl}}/search-plugins/sql/full-text/), but also introduces few more functions (called `commands`) which are available in `PPL` only. +`PPL` supports most [`SQL` common]({{site.url}}{{site.baseurl}}/search-plugins/sql/functions/) functions, including [relevance search]({{site.url}}{{site.baseurl}}/search-plugins/sql/full-text/), but also introduces a few more functions (called `commands`) that are available in `PPL` only. ## dedup From 2df2b5d263464a2adf1de61b37a5b5add603c58f Mon Sep 17 00:00:00 2001 From: Heather Halter Date: Thu, 11 Jul 2024 13:43:36 -0700 Subject: [PATCH 019/154] Update _security/configuration/tls.md (#7691) Removed a link to a section that referenced itself. Signed-off-by: Heather Halter --- _security/configuration/tls.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_security/configuration/tls.md b/_security/configuration/tls.md index d06b16a47e..a4115b8c25 100755 --- a/_security/configuration/tls.md +++ b/_security/configuration/tls.md @@ -137,7 +137,7 @@ plugins.security.authcz.admin_dn: For security reasons, you cannot use wildcards or regular expressions as values for the `admin_dn` setting. -For more information about admin and super admin user roles, see [Admin and super admin roles](https://opensearch.org/docs/latest/security/access-control/users-roles/#admin-and-super-admin-roles) and [Configuring super admin certificates](https://opensearch.org/docs/latest/security/configuration/tls/#configuring-admin-certificates). +For more information about admin and super admin user roles, see [Admin and super admin roles](https://opensearch.org/docs/latest/security/access-control/users-roles/#admin-and-super-admin-roles). 
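Since the hunk above trims the admin-certificate cross-reference, a compact sketch of the setting it documents may help reviewers. The DN below is a made-up example, not a value from this patch; per the surrounding context, each entry must match an admin certificate's DN exactly, because wildcards and regular expressions are rejected:

```yml
# Hypothetical opensearch.yml fragment -- the DN is illustrative only
plugins.security.authcz.admin_dn:
  - "CN=admin,OU=SSL,O=Example Org,L=Portland,C=US"
```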
## (Advanced) OpenSSL From 015481acb5d15e86bdad77852698b1b04d79c654 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Fri, 12 Jul 2024 13:44:05 -0400 Subject: [PATCH 020/154] Add geopolygon query (#7665) * Add geopolygon query Signed-off-by: Fanit Kolchina * Update _query-dsl/geo-and-xy/geopolygon.md Co-authored-by: Melissa Vagi Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Add link to index file Signed-off-by: Fanit Kolchina * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Melissa Vagi Co-authored-by: Nathan Bower --- _query-dsl/geo-and-xy/geopolygon.md | 177 ++++++++++++++++++++++++++++ _query-dsl/geo-and-xy/index.md | 2 +- images/geopolygon-query.png | Bin 0 -> 51100 bytes 3 files changed, 178 insertions(+), 1 deletion(-) create mode 100644 _query-dsl/geo-and-xy/geopolygon.md create mode 100644 images/geopolygon-query.png diff --git a/_query-dsl/geo-and-xy/geopolygon.md b/_query-dsl/geo-and-xy/geopolygon.md new file mode 100644 index 0000000000..c53b1379cf --- /dev/null +++ b/_query-dsl/geo-and-xy/geopolygon.md @@ -0,0 +1,177 @@ +--- +layout: default +title: Geopolygon +parent: Geographic and xy queries +grand_parent: Query DSL +nav_order: 30 +--- + +# Geopolygon query + +A geopolygon query returns documents containing geopoints that are within the specified polygon. A document containing multiple geopoints matches the query if at least one geopoint matches the query. + +A polygon is specified by a list of vertices in coordinate form. Unlike specifying a polygon for a geoshape field, the polygon does not have to be closed (specifying the first and last points at the same location is unnecessary). Though points do not have to follow either clockwise or counterclockwise order, it is recommended that you list them in either of these orders. This will ensure that the correct polygon is captured. + +The searched document field must be mapped as `geo_point`. +{: .note} + +## Example + +Create a mapping with the `point` field mapped as `geo_point`: + +```json +PUT /testindex1 +{ + "mappings": { + "properties": { + "point": { + "type": "geo_point" + } + } + } +} +``` +{% include copy-curl.html %} + +Index a geopoint, specifying its latitude and longitude: + +```json +PUT testindex1/_doc/1 +{ + "point": { + "lat": 73.71, + "lon": 41.32 + } +} +``` +{% include copy-curl.html %} + +Search for documents whose `point` objects are within the specified `geo_polygon`: + +```json +GET /testindex1/_search +{ + "query": { + "bool": { + "must": { + "match_all": {} + }, + "filter": { + "geo_polygon": { + "point": { + "points": [ + { "lat": 74.5627, "lon": 41.8645 }, + { "lat": 73.7562, "lon": 42.6526 }, + { "lat": 73.3245, "lon": 41.6189 }, + { "lat": 74.0060, "lon": 40.7128 } + ] + } + } + } + } + } +} +``` +{% include copy-curl.html %} + +The polygon specified in the preceding request is the quadrilateral depicted in the following image. The matching document is within this quadrilateral. The coordinates of the quadrilateral vertices are specified in `(latitude, longitude)` format. 
![Search for points within the specified quadrilateral]({{site.url}}{{site.baseurl}}/images/geopolygon-query.png) The response contains the matching document: ```json { "took": 6, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 1, "relation": "eq" }, "max_score": 1, "hits": [ { "_index": "testindex1", "_id": "1", "_score": 1, "_source": { "point": { "lat": 73.71, "lon": 41.32 } } } ] } } ``` In the preceding search request, you specified the polygon vertices in clockwise order: ```json "geo_polygon": { "point": { "points": [ { "lat": 74.5627, "lon": 41.8645 }, { "lat": 73.7562, "lon": 42.6526 }, { "lat": 73.3245, "lon": 41.6189 }, { "lat": 74.0060, "lon": 40.7128 } ] } } ``` Alternatively, you can specify the vertices in counterclockwise order: ```json "geo_polygon": { "point": { "points": [ { "lat": 74.5627, "lon": 41.8645 }, { "lat": 74.0060, "lon": 40.7128 }, { "lat": 73.3245, "lon": 41.6189 }, { "lat": 73.7562, "lon": 42.6526 } ] } } ``` The resulting query response contains the same matching document. However, if you specify the vertices in the following order: ```json "geo_polygon": { "point": { "points": [ { "lat": 74.5627, "lon": 41.8645 }, { "lat": 74.0060, "lon": 40.7128 }, { "lat": 73.7562, "lon": 42.6526 }, { "lat": 73.3245, "lon": 41.6189 } ] } } ``` The response returns no results. ## Request fields Geopolygon queries accept the following fields. Field | Data type | Description :--- | :--- | :--- `_name` | String | The name of the filter. Optional. `validation_method` | String | The validation method. Valid values are `IGNORE_MALFORMED` (accept geopoints with invalid coordinates), `COERCE` (try to coerce coordinates to valid values), and `STRICT` (return an error when coordinates are invalid). Optional. Default is `STRICT`. `ignore_unmapped` | Boolean | Specifies whether to ignore an unmapped field. If set to `true`, then the query does not return any documents that contain an unmapped field. If set to `false`, then an exception is thrown when the field is unmapped. Optional. Default is `false`. ## Accepted formats You can specify the geopoint coordinates when indexing a document and searching for documents in any [format]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-point#formats) accepted by the geopoint field type. \ No newline at end of file diff --git a/_query-dsl/geo-and-xy/index.md b/_query-dsl/geo-and-xy/index.md index cb0559927d..83cdbf08d7 100644 --- a/_query-dsl/geo-and-xy/index.md +++ b/_query-dsl/geo-and-xy/index.md @@ -30,7 +30,7 @@ OpenSearch provides the following geographic query types: - [**Geo-bounding box queries**]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/geo-and-xy/geo-bounding-box/): Return documents with geopoint field values that are within a bounding box. - [**Geodistance queries**]({{site.url}}{{site.baseurl}}/query-dsl/geo-and-xy/geodistance/): Return documents with geopoints that are within a specified distance from the provided geopoint. -- **Geopolygon queries**: Return documents with geopoints that are within a polygon. +- [**Geopolygon queries**]({{site.url}}{{site.baseurl}}/query-dsl/geo-and-xy/geopolygon/): Return documents containing geopoints that are within a polygon. 
- **Geoshape queries**: Return documents that contain: - Geoshapes and geopoints that have one of four spatial relations to the provided shape: `INTERSECTS`, `DISJOINT`, `WITHIN`, or `CONTAINS`. - Geopoints that intersect the provided shape. \ No newline at end of file diff --git a/images/geopolygon-query.png b/images/geopolygon-query.png new file mode 100644 index 0000000000000000000000000000000000000000..16d73628de93e9739aa702ef2b27f0d0ca43bc01 GIT binary patch literal 51100 zcmbrlWmH^E*EWa+3GVJeg1c*QcTMo%?(P;mcyI{r?i$?PH3WBu;B%V$dEe*$X6DbV zVKs}U&pErQc5S)#wX4GvV(9i$_d;im1 zL_|?aM1(}q!Pdmw$`}kxGAuC(Mo~ozd*ICTfRt2dpZ%IrlFBb(Qshid^%ELN0t$q1 zUo>4smo^G2xbr)#cNkw1$t?Q<-jij2^A^T>=d44wsJH}!VBp%xv60fu-O0^;=R47S zbvg>}Cu;}?Ra}JuX8uHH@GY5^o`xqR1Rs(M3=(1z%!SUSzz-E;8wMrq&bRT-6^w1m zHf7Z8`RMg^E{}UvmIF)?rTZdX)&;MdD)uIU9>oOQuSE3EGHopss~3@h7O65xccUqn zee{3{mrr!(=x$JO6S$vvOLx2`odFrx2el9KBPLX2Jb4tbX&2ml0^O`PJo)+gjwk{F zRp4>tLT=(&M6cKxy`AR$5$sxJqRbOFES7Grx&=gcQ-L@=^JGY-8rRQPs97oIRVF~P@qW_fy*(m zDjPHDj#Uu)Z9*EQg?TrH;tQ8X0vba!m3@lgb9!FYp!@i)7}ThviFDD~r|2{;28`fu zM|+p;U$Dn-Fzzyw?vs7oWiQst@|bMx&&q-yn!Wl1OiU7*IFJ_Bj7X`zq%#gWU}jNB zBBJd%srTJU1@Mz zZyp^Uq`S@+y!Z(Y9no5*+A0YB-STno_qHM?8zdimT}YF31HHP>VmYfFNwM@Gge(QX z!tbyRc^<*Dx4|c;e__GZ?iuc<5F?O4y#M}AT8PjLGH_|w$`8^GoTv*1Du4+EF$t1G z*%1#VAW@%w5$aNybP?Po(7_H&HL&M5H5ZchFLOKaFW@@a*a&_w`mm7!*!mD{Fe)fe zn)`uuLpvZ1(Dj0Dh(f#{nETrlH-jlYcTFukB`{29A+&cXPg zI{m1Wz$pCVB+P%xDmo?L)1Y7mN(;d$9VxvlQdY$*L7j^}6kYp-n4|ksAg9`lCN+dZ zm^Zt9YSWB^2eO?=Cx}6USiCWtF~>G1bP9dn7KcR~(_Jq*5QjOXDNIw`>Vqz&z=w{+@82ims^VdjeoI*;P{(so+N&9A{y~=t z`6}u_-WMmDXPqaKca`Uv*Z7r1E9Ur9G^uH9W9-o&_E5|aF-37)L7c2q^fvU+`ys3$ zpCJtl2(jeuY`Vg2Md^=J)SMp~^I?@elx>vhif@!Ll?(Hr^C|O-r+26Qr_HCurf;Sf z@&>-zv3Rf$u@I#2rC6jyu#l@_sHGi>8>?`tSQaH0=hkRyd21C|)mY^mD=iT%!Exhq z(>B94t2tmgm~Mw`D_weA%HFHqpIi>?oDKS_E2`*I+hI1)u+d!8dMTPQ8=FGMkf;LTgq)Qk_ zPI1XW`NwI#qNQB!{QOGHip0{nyc@~1BDn?mNrtJ@$;`>A$%_T8(wS^Wz2t0}1-a?9 z+?C=-VFf)#W!nhd()Ax5KLu0-B|3#)bHJCnmPmTChkrMa&VRT0Vc0ZLwE)Xe!y(O5 z!qj4pfWLxs6?unyN8e3X^SM_~wcczp=xaCA5QjMvE=MBg4eJ=YHM@7crm6ktpNt4z zY+u$|4jJ}RW>0+vqf?_y%XOQV{_ep5Yb68wl{l*d^X6YQW0&=b)tCzwopU;tCH0** zSVu@ll^hKmpVGwB>^~E(aIHwM2)EGjf72E1vwto~PkR?zi zI3(;(+GpHkDAr>6Q?xVaz_-t}E$SHV_%LNYs5O(a^JV9$S8OGCOZn(B(qK^1gutN9 zf!z_ag`?HSuGaPQp36Gn&gqWs@AWOxE!0z`(~Cuw5!R{NUyj`)%a&0K?zWMQm3Q)I z!UcHoV`WD2e-|Ihu9~-HE)j3&@4Ww*Zc|-b-i1DUKe^ruB1|IwhN*)N5Hk5)(Ip}L zAdD;A31t)*7(g4474jpr4DkRVKcoV-9NC+A7q5Z;Shz{JiMowE01Xvi3HyR{n!kq2 zf!e};wxy`TY{yL3=`#WYC0PN7i9_+g^qP!>QMu8T(HY&0UW3I_{m$@+N^+K#u+}ea zdom<+Y+McEyI{=05<|Iu)-BpCszsY){$p$cY%+8S!#J487JA*M6S&3 zbk)@BxV?Vby>|Iqd4cawq!dg$io;G8UoBMd1n3z~F!VyQx;qmpGCOG674+!_u@Q_H zO^?Tapgg>{WX%~hPO;Xi8C0C5XF>6%!W(xPt{8l2VlmtaQ=2aRTiT+5U%uY*pcQR8 zu3$N7xoF8VZ&9&U=i)|awD#*}^5O3dA$}d_wl&V$e*MKc<)4BCIZlT$hX}_F>*&?h z%)+EbX4`Gv$}_x6imTxJz|RB?1b>O1`Oh9R%acCe**vRveQeS-Q`gk~oBvJ6=^5sV zc-ZI1dQw%nY3tbSGjA2|U*6MWr*-!x%@(dgx39^s6-j8?Qo=*6YqTzUwDmOnRiRB1 zb*4q5m1WkI%oo2IX~fa`Ssn@}bqy;i1}5BWD-l&yWl9Y6*{+ z7Gi5lHgmqk4;!j(2n6Vao-dB86Ry#ZL}y&Wf>UnFcVCTJ>zlLRS!k`XOr_59%wnz0 zyAJQEO0J(dJMcKjnT}vNboea9o+uBHy{bY&I z=(_rFt3(gc*BY+|mxuLOVD#>RrN9-E~TTZ3}Zm^~_f0R{KP< z0I@fz8}kjztLwoAtsZV0&wc0!@fx}kdTwQKW#I-!yPAjRe#TsD^icNDz}d{Yf;aWw zqU(!us)xeyiP)?xx$>ugN9EhObIV804O`^GnS#Q1bflIp7(%4*@{(Y`$4J2jr@$0F z!ATf0((+9aQ`25e;hv~6(;7|3p#2Vo!Mf+ce!D{YweJj4N9?N@g2VnKv4;WstR24a zw93nyJMlw{oMqrCS_u0uCkJQeYyPRP2UD)ayO)Dks9-y&Q>kIXF92v&H`bIgk&^?X z0p7!cfrpxdK>_c;fnPk}7YqzCE(8n)_=^Voie^Ln&sB))Y{>t44;}!zQCLMpN(%U^ zV&q_KY~yHV>m=-nTLOrhH&@kk(v*|sF|xI0F!*X~Xw2YdZ3lV;jL(e+cx!F!WI*C( zZDr%g|NrC4e@Fb+lbZj1l9T=ac=BIY{_jt! 
zIT|~N*jfWaItl!DXZ|znzc2phK|V&%%KxQ_f7JZ;DWGS8_k4{1yJrIL_gRx7!N7#T zq(p^P-N27CV7yfOZU+=Rz?G$nNWmd;T)(eJkcfrx;uVf4 z_VH5D1G)V!4nX^? z4BAdX+^|H-luo;a0tZI{``4>V z8-i)ST-|w}3)TAuhN-w6Vf z?aCf-{nz{e+5bPM=x5~Q#0L12(x*=lODik!85tQ5Dq4Ki)zyjCe;OX+S{g zhGYH6jcHgdn13*6UuID#A3%Cmy=DwPc&wX87JSFUGS>1lU;7C5p}2@rcB8P~=kLeK zSo3+zkHJ1PpUVGwj{Z_HpJ65;@k`Ih7@C@jg@A-qGs;Btqz_t|Rrd1sHWsY6QaKWE z*s!XXbKD;GfN^#NsG@PoF@|2^988;J&*)6>%i9;Zlr(00>B0~-~g0>=Xd zp_beg)doRBO-JYKr#)bG2{|@pvex1(LnxzG{(En)@P0DVxpxB__H9#y4xk8RDg*F9 zGf;GgfaeUF;ymive@t$fda3%bY)4nIC~oQ5JpFU8HMQ+!biwmL)d{cMqaAtf!i7~! z;-lBmf{_ysj~cg{pyj+>&YA9UL1v-JS~jPTfb^c>%*RJ@qPa_l+!UkFN(%+-b39&E zyNU?jt)%H_GmeA5)}s#})pPIq8*{M0Gof!9G9wG~c^RjMMvAw|Gug{T_&wY;zX@lA zrxUa%d}8$S6uPxspUIHrh)s!?AQxB`N3>ASB`$R0)Ea0yZym$lztf3L$n=gcY35SS zN~+(_fYFW^H4Pa;eOV+F>kPN=%QuNHjfcFkI8`_czu@Ify+ur%-GC8<&&PMa@7Kw( zA@#X_*iUjIWeDtlvV`GZs$`ZUUV3Wqp#H~Gn#91MpXsr~t?<_*$JgvfbWR*>DF3m4xJjG- zafd3-?s;`1CP;hs=$rn%zm1JZ@08(@`zqO}?pBg+3~ON#Ju_KPQqX1w_lh{2IdWe$ zYtb#UW1Fd(^QAn0z-+`5(wMi_hnF(yp5kZuEu!}~DumqLAcL6;u0LU--W%b4nF`Nm z!sdKOF(R*SS#C$n&!Hj|^HY@*1AHOANJo+GK1Nd%iyCd-gZaPCYzw~qMc{V-!;<;E zTmXlDp#FGHceX6EkYJ=Jrg7$W=3=_(uB0ICFZBd*%q=&8YcaseY|LfVJytT~! zHU4EjMZqZ9oPE>BPlGSuUZ{c;SRk)K?6hm zd@}q7UC5ZEoOF6GDz|W9Z8`f3Jj~6D-VCu8O)0uTBkR{v@V8P%nOPd@o1CP43?{=N zyb)QU@a)A~q%kq@?bTjwmkDKePOtjrqmH-=vXsc?pH2e2_6-9zSCXb5JpcZz(3!9A zuqkO>jBSOzR|*>Fo&W?r*s3KQ{e^l<4wI;U!ZEfGuzr48a9*vVOiw2W6liZehSHH6$$4yr`d;}>iZOx!C4@%?-{8k<+tyDuH}~nT6KV%p zoNlX07nFWZiHJ%;kKhhn*Ua(vke3VQ{0GmW%fls*E3*ZNX9(FFh4>y}+~VnC7c|7M z3&D^#Hk*tiiF)Kb2M7;w-H{0`{aFOdES3lM4o+3MjbdL~KT4saqjTJHpZab(-u);s ztSPIA;pOID90^P4KGNPrY(dFlIJ&dd`RtP!B^95A4g)Q^%j<9y?e?9Y-l_i*~th@>$4ws~iKG+Q|t0lIP=@Eq$ z*V>*nUUPK+RY=BX|GO`XSa3P>6cZhRLtg7rv`JxsbAD(NPvBL(_phVo5aWzTk^M~p z)^DGV2Re#?6}^!eyK0-0pM9;xTAar1mrtqGSm2Ia(*&qKe^f9gY`p{( zNL+l`M6n#bJlwzb7h(0HXP9;@QBl3aH8h};ajoW3bI~rx`Pi%Nv&p4{r=*pG0af;{y+Hj~L;<^yFd^0SB%&3(;`<)@gFI|uSRmJ&a+HQa{ z;3!eB)H!H~mTH_!WTr`*RRrUBSEF5&HSz{Z7Bad;57U1FSzJ6X;C1BI^}ZO$){IwK zlYO$`O5a_BSs6T!@|@(r`1UvZ3m%AFjBULPN+j5VWioV>f-z>$=bUT7@wn_j#LkEW(IJch#vOcyry z1RL!H^^}7}>YD`Z`{eAgUhUI>Ixr%>P=zq!f+5lvKBUTrgyTa(XO zYO#NaE!FMk_rdJ93?8zW+o8{WwkknXG4>tZr;prpI$V&(=7`J6QpjzERvf^!%pXYW zvAj0U*W?;0Cer%5Y$-^XLr-O^P(HNunR0YRY_BqJ#QvErKNRooL>=RGETp32>$f3D zcm<4Mnl|d2qnkZruSbmumybu*J0%0vcj+N<&R3V-q}Og6;=LDVS!I0X@cyW4-sbz| zPHAlwej>4&1xU8ykn3w7sEpG4_WV!%jSp>|IVEsgdgGxeR6|8Oxhr}9{K{#^l;l%x-iu|PG_y#gH@n=0y85;)Jq)yG(J zfk{2nE=R5y!%;^aq=8L=T7z-s&G0IXED@i@AA5aiBYH>lpB$awC7b2xH|0z=`EDNK z6ca+?5oc=Nn~rZ{V_J0y{Blt+|FVu0omT$f=_DR(T(c-fNQHaja_Y@z%`iYSDPBbFj^XbP~^61AbqJ4%??BXocK1vt{J0`NK0;di2RB;qQrufX!@C62nonJ-*c-_O}7Y z4gPSJkt%};aW-EEoWaZ2Bl-&57dS-f%k!*A7Bzuc5*GGg{)@ zkIlZLL7aNZmm6soxpgcQyojc0Q_0|D$KS9^Y0y!}PkWx3k|M2iwB%xOsV++x(u?X| zE}_m!>-WN_)NP|I^Erk#)rRkSf2gR(W{$LwXj@~zfFhedITXUH?N{=^ekNfss8a%P z(@h;?qo^nadD3D==_H)340T6G$KYRk5>&g03zZBpOCsa;_r9HT6Z&Y`da*Ndncrj+ zZDG_;VhW}Vg)5Siv2~=HD@>X3LW}`8utqky8TuLoz1%a zv&7F&9dRM~2lYN@R&nr}d*OtZFyVG72jpop8=*#dN)xukiz*ZMr-__ol~FbyvtBQ& zP#(ncUbdxhZD0D(iFHl8=$?9iqxc!7KEH2YypS86D!P$$4S)Bf2KlY5uTQtai% z4aU5##x>cMQU?!Dic|DqV3Ulh&8On_;~jghDGYloDTh^F4NQtrgFEy7j|#-Ej+RJ$ zG-JbJ*#r%_uRSS-7+P%2rLyirDSAm(QQs}W7fWBq@>Z}L95zn#?A+opqAAe2@vWZ! 
zIv^iVn388=4_;C!D00z$k5kCnP%iVStsJ%LK~EFB69rHcr;qkc>OmvsC3xS{pHU9J z=B;-VzF|XXj(g|SKD#~b0SKq>`EwuF6Y668e<5HFhFg z^)$mxDpusK=K+n2b>Y~HsVuuUt-p@Yxfp$<;~0E&U$xz15*rdsbvwM4vcoFpGvZT1 z3cg{a40}KYzW+zTDBqz7a&vQshK9n>(9nASFpf(a6uzh?DOnwtFU%%d%QOpebEil@ zWECk_mSyek=h8P8*ZXM3G>oboRc|0Z1JusoWTIFa!=#>}|KTWeIW;|f;`Gc+#ItG1 zO;h5Tl6zgTdip436~tPY#Cak;jb%ESwVK&#%I5@QzMbM+b{U@r z?&YB9;oP~nqfdeqS0B&R9M4skoe*GQNyhJw_xC0J{ryXrCqmk>x~(lfc64;qWN9}P zI}^}`sAFbqU~ z5Mz;-j89sG}!&%Ybx!Gd#7ud-%2bS!@8fF4UC{RYg6Y?li^m##H$2LS^Ee zW^&;3K4c ze$*B&S~k5=U}`|veJCl@i9`g=qyz{2wjZ0;aH}lg=h$#~c_9272!99AT$KURec#_E zqYihxL}*BBH6(H5XCiUq8dK7YnGE`G_huG&SIGfuyBD)$;Srqsi8po z>%PytX3L@82&V&ZZ*w;W&ArBjnd5DRXb_-p)4%@!Ro;EiHb zC4_iv5FMUOjo6Eu6XxdOp8U-$moKqCvLHv#c=9{g)niiQ3~O>zhuP_?D}pzxC+FIo z=<~EqkMrB4JRkt6ySw6?w(B2wd3kRna5Df`r&AffXQic$I6gTEJEN8V^5qLPBjd#H zn!vywYinH9uD9j+2_;fhSw4&5YNLBIs}Yu#rsZDuJDSPWVuOsOEZWavGBh-FW;}ye%_t4gQ>;IlQ0C?3W%#49iOKX?=c_!u zc3t-AT5IV}KaAk(6RWYQDLw`=Ax}a;Kmfw9-9LCwA0MjWnK7Lsyo~)~s_~O0U)vY+ z@8^S@M}Oph$>FKv_0Dp)o1fyZ81p?3JxkU7jrYDj#@C@;W59c(qGTwt@vd)<7790e zLQB`(Z|7|ntNbA#AO;5pBu;#Vg~6@Y+q68c=02gr9RD%}IE5G@=({xXx`Ey;CBRcb zNbDoj`!J3&E-DerICLB1!i$?qN<{bD!IhO2B~{e~6aHPX>MZ!&%-WsfC8w}s!eg^P zuWPwlk7lrM&Km}30nj-*{*CnDSbEZ5Jf&8Jo#H)Dt*Q)*sGg|JwRg^g2jskuncs)ur%Q z#p%sv%`cXzK!tQ+k7BGKfA;excajKXPpi%lCHX|y*b<2Nhq})F{r$?KqEPm3kpR=X zJQNfJ#bfsyjQ{Dpr_Oi;tA2G>SXnuaPPf(sQK#KQjmY(AK5Dt%(zJwU802?=`@{N| zMdD9?-$Tr)FkM%aQ0vS|*4$rC@IbaMOG>%w8d~xvN?A^SdDVjP4<+$J{z{8{LCN&+Og&c~;GKKN2LoEgsQhn^z)VC}U6Pf28SFrml*viO z%k%SW^>XdRVwLBI8+7gfPo;W`mepU*hqF0n9gp%2h=Coluu5=mW*s0Dw^*MjV!24J zr%4U58<8hPD~PK-Q>@{q?4%*UZ8ScM-1})MWku|-+>+v~~gTi1L+H)_zH zn8|h755-Zihzqc@C+d3NsI@p9#CeHJO2XO`c*)Bn%@rw$;Qb-+skK^SaeoFjpjVI} zgw^BPjK}AR>tv%-FJ$)otDlPcC30h{=>!oThZW9T`DcwWkK9?tGe0i$n51qaI%|F* z_bk&9Ym0JX6I)PXAk%)DirtD=5Rhsk;h>-p^XVTLK(~eM6K^9suHUP|_1;9mtH{mEwpwYR zzlb5D)u{NOprC-qr0>VJ%T4^8NUJf+mfU}n_&9vYZ9bh>qe`nnI9`%{gAe)HCO*d! zNU0M^?x{w{Z#fm!aX-}449wE<>8nGHI*c`$aeuNwtdv zJfS{yh`bt3c_FJ!EiKaQwFb6)rKSp=h4L8)9nh;_p;PC?B3QS*S-h^%+6`785xvK4 zYK;bvnkSQHbcuLfWam{V!7iaX|M8o$kgmSkA(lFm4fUSuI0EG!ElaN6{^`DfiHw3l zzMBq;y9kfeUZp@Zj{rRROlN)Z~pC<+fN#xCk zpm{w{(>Sc9&CSg(Zd8U_{UiP3;86ct+;@C1H0iooBA~-fG0R$i=g>Kd#hwK`Lx#P z8$Li8^+z%{jp8fvzdl`A;O%@X*)JsY0FY1ziqUOd^>hc+ zBUf(g^hvd=)G0Q@j*N%@WWQBH2#LF1X_1%XDX2iCb8$ri?Vf(7P#x{oS@YjHGUoTY zE-gt3awD&vUpx*&qith)l=Z(R)8FsTrzzwV8P%%;yC?<3 zG9L&NQZi*a8M$btxp^o39@A8-V-2|gX=;fej8&o)2@i@d(xg>eS+M4V1L6h6?C$84 zw}Z%7f;Ok}7Vc_-X0xM>gDRaF0MRzh_{#l(|K?d7O5-NS$O|L}=s-TuWQ!tzgTxGr z__<2&Wax1BM3Cur*zkzx2F-Kva8vhj+~BU(6dV>$j1A>xb#PIuXx|d4d+c(iK?MH- z)U8PXj&qGwi01T`z3QmGXK9c`x@C}N#kuKt3(3WH$8Pn(MB9`kEnNyr;`?S4zC@iQ zQ2ls5zX#teZY<0c!CPj*iX;S#s2kM$w0oE13DUYEE(G_2?esW1vy=Xj!LaOzT3QJ$ zOPigU+{PyCN2MGLLN;5yhytvgVn#Z;yw?}Ytm|1!X6eY7pKe9LO;y zlfdxL@Pj3Mw!$3~(#ls%gR)R?j_DEai2^t68N%F3umu3B{t22XJgNrvBAnEb+u_Pfd z=ynP-aBq++`a7WgmdjrJzEOJ+_bcP1bs;{)>&m4bVc1@o8$#!SJ^LvDLi>73cWLse z(Pi>F=24lJ*&qM}Il9(>-u8=)CD;Van;euTJ02+pC;le#<=+e(Trg2^Ilr;k5r^%6 z68hUf2ACZsSvwhY(uQqwUiZ;)96l;VApPJom%yb5+FYtZfI)^U_V9yt&@L1%oj;X? z?{e>8KT|W1>JH_83I<-J!+%g0l35VEGdpdzRD<|ov-UY5OGEN%V_j+nNTA;|OTj?p zzke&Jp}0mCvj>U@Lb$N$i@submq4_^pFQPmtG*swK)hz79tgXKyCz!x_Wlxnf508) z07zjjFJ&3m!vo$$K$gA|Bv5#ZROtUMbqCkiGmnptXNl2oM$|CyE%&8p7RLHi<>lq) zo&1xzAI#pe=nLNBKaFsVtI+&_J|K~{=IrkuZ+>0wr4+?#=3L{M8cRs#4IMpS? 
z>%c^0=(KyS&0by-EFNxrFv;6lym>`@Cx9uLNu+zd!YM(`OB<8qjY-LWBP{1VoQtQW zF(N>N+&IT^xk_ubXLFUuS=GMH%fQQr}$q4IV%2n%PotumBe zr36dx_YHFR#T$XxSAqxrl=ZX39AgWxohmno^vmz8^XCf6UP;S(#DcCRN@sZOS7)KU zsg%~;M05T2`sxsO{Hveb4&lvrkg2He51k(Zu4a9wlPwl14d(ZL+Jl@^AL6zPuw#%Z zo5HTP#VWi$xrFGt+a*S}sz=^y-&joMs_;zhb;kT`A?4H`-N1D=dr3fzX~^DZU73j-C0f}Pb1Ssq=Ih+%qGrY*>f0V)*#3>*fGXG_wt@yHNV^St$-VRSxa*o9A z6}>#a&}Xc&@S^}_E_J@K-sO#JB_3oiHLyy$IM6``Rd}9vKzPN3&CCBPBa$g%X|ugs zD)RpQMJ)GrD>8!eh@GM{Y@~ZjRAjN<<0Ec(l&lK&usWNF(i~}EEa34!A^<6lv0|;j z$~a_5xJhN)ww#$S2^cFSpJfNCj*b^`pfNZ|h?0=Moo znZM50GU3-EcU}nMeF0f7Gz3J9+==MBcfobfXYd$4G)j0kl|HM<>0b!I-YNt(Qm{Z> z^hh%(Ad4lzrVHAuh0GHmMRDM`EP(s7)D`4tUcUzi_l8hto7SRBpwFIk#y)tuCQK+E8cBZ+*-m@ZnV3^aO63$TD%FmpmY0To$=cU8AnvsjKT<-$R4 zovwCrM7jS&cY$Qj4v3F~<-79GNSSunNRs4nW{U2>5PyD-hmdJ`w*WjTl&uP+J{hr~ z_rSawq<{a89+@Veu%8HDTv`en%T5Kzh0$YCoU8XZUP&okj;yUi2}v*wdId(0h99Xo z({x`E#Jy9%f06+?!Wz)=MhMh9kRW*g`N#kzp=w|_S1Z>cf;+1A)@Jz#MW%TZmU?lx z0t$6v*Ol$v_e>AnQ2SG}_ukJh3?3J2nfdFXpox&^hXeMfqN)h+G5YaPE)`MVL%aZ- z?)-E*mtj8Ggn~fWIjmMYf_}q=Cu}tS_xln@AQch4HE&0je^Np-uA(6nWx2$CGpla zg3J?!H}%?tQ!dNDAJ9{Y0HAF(y#X9jV9($n-rWZq<{WW1lg`%n9$6rE^33IvPH0(@ zkl0bd^0Cx&muuRE$rDExwx^K~3FF+L>m7Me5k;8-=BJ7ZB8I+^F(UxctOj*(Y#d$u z$^9U9LUK&Wwq&MDgw6>ur=uruj)lOlD}zMe)>h)UIO}c~iN}-9-^IG=1K-W<^=zrm zY46qrNCb^g7c&n}YI=G)H7Th-h_qqlq83af)U&d(GEQqcdD-6n8Xi7vLy`KwISb+~ zXW5KAgu%uXl$SEhbjaB|kyGhCuPC<)vmf#lA=MuN4gTS)J2A^=I~Xf1iQV7D&S|@| z6Z?C3%JZv7PTYKfZu}B1E?8@Kn&daF+L@5$17i~nNgQLvxtIIdbuIHW<<4w|}m6PXhXnO(Y$xh?vtu5Awu2q^}_kKLA zS4X2b7=Wt#t~uHiva(%1pgK2UO8VaL0{FB@%mefez=X??bZI4WJ(*O%b~ZZ+ zX!-aOivArFnP_YbI?X`_>7(b8frQ1vT@soUX=ED}Ne{;&=TD>q9%gw$C>wzRj7~Se36%o;F*nsRgPM z&g$~MTUCA>9V{*{iOTv>a0W}oWwMn2JIHL%ho&`R?ypr3)2Qpa`y0cZNV zV9NkgntIGH#2v#0ps1l(kBK1BM})GNuMk%eL}9)GK+W8u?8C-HVP2untWkMc&f`8g z5HScrho6xA`;7qo*OA&?3Z;*d`yT423kRkPX1cnRB-2)1rF*6dVP|yB*uF{21++nz%}vwMOlH_?NZpeX+&NRW(oSt{T7Cl;7kl@^IgLO z?V*2(XtoWc!d#I8DldNJ-m>j~R60gU$Vag@c*lNw+y z7gjY7LK0M-2ECz@BI~UMt=iF$C`H3ihn&|xKZ@}V7vz7P$%=>xCU{vo6S(?zdOK+Tom9PjgdST*QlB=IcHQ7uJ;!v=sbgF;O&I6%pCfn5N94%83W z&maG6omJpnT2%oE63PjRi;#A&IEyy&VKtra!Tg03S7l?bqhAQ%I`spgejnBd=Nk=z zI8U4PL2Tg5QvjS}zfY1_FHE9E^sb^nY$G;+O8QBA-E#@cW_?n);w61do#tBWmo&;x z(qrqfwla9h6F~$z5asS zKS3LeZj6p;@D4HmO0O?i=_1hz(a}+3;{x+nhO0YNl`x&1SS3ATc=<{kQ51R z^TEa=^@(f(Dq!uHRFfT^{2mAGnb&iktzXGEM%U#SCmITo_zyMOlvaVE^f7@+eGX2Y zltQUufSAHo0l41%mI!D<+m-qXw(ZX@W!50)00Xy%bvnirn&?~$#PX{8%<(tN_0vwb z)=MJ{0CS64Kr7Q(GyFGV{%mHsk9kp@!rby8_oJH3w}os; z{Vqw16+G16>1GKd_nOC8fBHq@F*HXE7$yQZp}bBu>nHE$=Zp@^n?fyOi4@w8gOGRR zM`Aw6X|Zeh&vdTP%+~5D-8~;CU9(Q(ky8aNspMXpkn8vT8jW@-V&;W^pQ6}BMn_Xp zQBg@tN|Jbbde+p|s`%2Z@K@dLXe<;_15Mk_ zlRm1p3U9U{bl~!LuBtiTWbFgVEDGWk8sOBljWuw7+SA>ACguuhdJO;F^`r0j%yrCz zB6{^lHL+g24vG1K-0=OJJkPrM*!dG-!R52+Zk|tra#F7}c0hRP^6LVC4&Z@Ls?+5T zygx~e(ty)x8`{UQp@73^qWJ^p;iF)upU?dz)6SPpsVgyy5se_FjV3WUnM2A40Glhq z0T_f99Jw*TuN-Bum0Q_kn=wN{DfIaG4+-l$+u0v^`V}^%5ql62bBBVCw2%hyPTlnp zd2cK0QRG8MZ(55mU^8V|fG-~{kcxvrn2=XFS_iG7(A+AgK15our@z6W)ly!|mI5(v zSR~+XHma^x<~{gapQY~HgxFLWy~|HpvCov*)ycaPp8^4wa7_Y;0p1aO1xWBxk}6~8 zCWP{4JWosJ(!v;8M27d}_6{yp;r7|_z^x%rr{38q=d;+8K=f6ykZ$-wJtFSiPNL8g z&IKgBDjHyY<3e|Qn%@l?&aEQ+ zL9yQbGn}+k8$;BCMzB(4pP|c9en4g*VCjCswrdX}XTsUjP!5ilL7XpbY}@zjZDh`P zZjvG$8{hQvVY7f4sRea?gQGW=eby7q%**>rmnuk`po{8$_ZNk-nwB-GUa>QB;p z8`Y6yqYcVVBof_6(T<0WV!K;#56n22$xVA)D#wQl;)^_(&6w3ha3tV5@I2);^>oQkcNPQ6-TaQ?rz64`9^)`Ew@9W>2Kh8 zOjiw1;+51dI?-g!zxX6oh~`V>X7b;xv#-&Y=I$c`j7b;}9g@GnU<+p-qDV%LQ~}3x~#q2y;B;KDqk*9u?E^8Ei#1l2rz)28l?1}wr^RUOlF3!Ffe`O zZe}&^?5gERaM)oraY3=OfXN9PYFL_th7VwJS~+!Dn)1ExJOC)>{;2&1#ZaoOA!6DV 
zg$wGo>*}#5D317U$R4B89_c0gtkJvog}WR;iSx@^wke4sDFmu%oLS(v7{R9JnqSX4 z4; z026egJR!M@wbC^o2yV-*+;NKrc3a{|?cAJ2ZDZL2AixEZf+W;WE+ixa@z^LPC6Z~h zgK2M9hb``|q!FCkzBOE7Jyj9Z*(Ta1*_PGyH=6%6K_llESxU*|3LqpkB+xU z0qi)y1=X-P%vCn}^1HU2<*3hkY8us-vB11hgQK7cfxc*?K+=BU?w$I$i6@qW4Ska8I_X}fT_FHd2mbw<`uOA+U31BK2s zrZPpq7p<}}fmom&E6Pivhy*dDQ7I$hvvujHs_AyrPg(wyN4n9h0I5}E?6y>1`1`FT zK_k)CHwQ2_$Hkp?Jh2VsaM3Q%1=D^a)1hgyrEK zgc>MO(UfWd7XCc!6-)nuVi+&Oxm0oK)lD+;1`vr~-96RPXF^-U_r}r3ufk zR$uIOJq`y17uLHyqfh;X=70*Q!{Puvx0(s!xc(20`#O(~9?$K2fbIa=D9?s(-xBeg zLp?utNfhVzJTzj+pMDS=2S}K$Omj*_5{`!+`+Z__iySdOe zdljKX$x-irgs7pPR~)FSLNR8%+@C~V&&IfyM5)EVXTX44g33;9wqPIacmAvr zXuY(1j!s@1n1L_1Ese3~gQ1E-MvS2P6~wcvktzaNG*vm_nUa(vaQ?|byMH#uqs>_j zm#iNg0!s!c5c^7?32P}Kth!zmo+gg|EU^XJ+z~H{;$7B5THyRg(7#}CZCt|6R~hZ)%;s5TL0Kah3~m_AIh)( z=E_wR5I`oLoJIDSSxs5_D3-UcbFr~Cts>~;<@mYVuR5TMkOGK#(SI7sLYrG!wADk) zia0$RMN#*@#~{ckd7f-rF)Ax?#D}yMA}^M_1uT6>s6+Pa1o3CS&0+<)@d80C!g%Hz zk$(1CZY&%oSOA-2Sx#;Y(vnETPRwVnWq7cdPk(4<9pNj}ZErDl87d|BJ1+42q)- z+iep<2m}Zg+=4p?k>UI-GjT!;O_1Yg9aUBu+w?JZ=X7+YX7EcnCa>6 z=jr=e*P0{|{HT6Rp4%0@%KF2HO3EsMYlt20_`uw0M1joP9x3m8g|y~AQY26CQ?f5w}=i9K*D zY%_IA0nYc_WYW_hb)jDwx&-iTt+XcpQySui0b+_UB`xcGV3clgv>0M~FZ-$}>kHg3 z1cm><=+icux_1JYlbd^lWj{lpuC{hhCjER?30=v-d^_0or`~2fH`{HYD1I-^)z2R_ z(ZA)#+1;#$7vQ;VNi_A^Xd?P|L{f)%FP7yjzP7e;#szc`)_mtZm?gX!jL@5FZnXYx zY%j*h$IFp$h4^u}EbW`A$z}mV&lh5LGde3zGvBRTk91Tzq`60dWw{o2J!7I?2gJMS z+vR}aw${4S2{C5H)XQDN?xdlnR>Xc_;9$gDvelX`pm}*#x2*lVllLxxf|63b68DPc z1Si6cFuGlhk`MO%`;sPWel+>bkIyx>`ZxOuA}q3>F)Dk6;@X$nCW4<=L4?WiIg8>= z3G<7@9YR&ljv_>DTs#UtoU`UmI_jFH8d2bsoob$4Z8P!aI%83uoB8|)oNNS9^8O!bLGZ}-DMbw%?*rX#eeHGuv8QJ4Uo!K)jp_H-s8VqCx@8a z1EFsBzlj1cu$j+6K|v`7aGug&9(+hx1R+n-)3c40Z+vR1Bb5Fl0UEIl8T(eU zWt)UNiaGtcY1KFU>zN_XmNtUPG5x6N6|$PV4van`hfe;JCgYhDebVE4n?tfyTnb8< zrzGFVRcPnmey6s+tCCkfiBzs&xO#hG&B2^Gb^g)P5P(#YW~@TwyW$MP@b>5)co=S6M?p6u>HeXFL znF`FKfeje2v!t3TNYqyZSOZ7pCxK@m1KmGRM%Lm?=s=qT0ycms!E@9Bd3)jUdp1Ae zG8@4n~eQ)A>KB~HvB1_|IHH|fG%QtCoA)dnx@x0o&Z)EX))i4QTRhX^X z{G7GRiufqKgT*eS*}U;`LB8Wnmn+!*$rpKY!B1~*yAtv8e>!-?Y_~6U z%#%v;I2XdAL&%8oEVQI2eFcV1@QNkdMIKS!_yfj6LqlJ=;GR7ac59EjmzPrqd^(l! zuAhqX(_D8A>3I#MWfpNq&UvS_yz07h*~dbVy$&YgD20+WvWWA-TTJ9%F61DlxFIXl zJ*TcPecU_43X)fg0^2vY4v&W?N7_eArHD6ILzp*v&6o+!3?XU^T;HpBFOr8wY=zhk z-r;%7&c3>!c(t2wMm)rI?fl0UBnalXwQ0@#aY21OH+guedv^4cH$rd$2wz`8O$30b z!8fv;qZ7bp)cK;fHXO_svPNxxbEr5nIY~vKB1*HkbG;gAIFo|E(rH~YeulveIejqg zwcbz;uG#jS^%#|tuz7s$;Z!jy54V!h*{c5dDRbNP9?PA1!?Ju4Vi*4$Dm7M+X;wM% z18<&1=VC=yhIYvSoZ^;bOv)OBJoBkX0xY0Oz|v#4XBL$Ea8jRj^CH|@QH>IK(&P6r ziSqLQxB%Rnnjr@P=fu;JW>d+ZfAm3Q5bg{YW_G#Bba6%ojCh0R=2SHMU6BUvQn{Jr zPXL`zTBc*|v>Qp9K?=qfoJx8X? 
z0fqmz0dM?9(9D->8rM9fEKKKJw(EV$CQ6DD{*^>>o?um^`4}0a03?oD`*&ny;qTw@ zc-)TEG~r?oLO|u6kki(e$ot-KrPfsXAzN*Puw5c<|HT6)tm0vzN_@Y;3=U43so7gJ zYE-Y~oMrxVIECAHy1~X|%3&jBdPI+$NYP7s2l#08vUAnXua)|Xk~ps%VLLZWnEj8} zq?464~)Y%n*wji?5|Fi|U^Oat?3JD!#xI@Us(YC4h zDWMj;aQ*iO&X7Tc6^DCwJNK>g>5qngk}2Ea{CsZqF)j~+I_`8PY%DbHWrqviKbk*m zjNaUBYqZh;{k>D5LnIGb4C17*ia1;F#rpkz{qUll^@%Jfv3?UMkr@A8%O`JI;vrzBa1+o`3f`?xxy$ zoqPA#lv}@+K}c?(`O34SoVSK%ktSQ;4c-uiAOD&@gYEaXjYC9v7+8B;{VM%RpA15M zd(J4AZ#M468*luXtK^Xf|J^=w82p^qMZ}u!JW3+Hzj}u+W@_A{}%$RypW8Ha^Ak9S+ zTQMk& zDiGuAjiK4)uIPU?`QKIKfNCn&%*%btZ5>MOz2dWhTxjg!M3Q*m!<_}pzruZY<0zy} zB6tUtLD1te$Dy_L|=28!B8x%uFH;tZ8%j|uGJ8Rj^Pw;%IP3ATSU@lw@Uv_tRZF+z>wJ> zhtmOT&oHK)?wtcAUmik&zU5Z)K92K$ECG}7fx@wwl1!iII5|_GcjrtUtr>J#roF+) z2?+@z7?d(GH4vSFZ8|A#=IX`)xtu%c8 z{8{9#(dPR%(PmWq3xX59GU|dr>=64_1?zLa4@5bENSP#)!8D_JqTzdHcJx$o84@<3 z5xkb5E0M$QNnz9Ha#9o@R~_!iC)c(yzs(%#;!c*}{n2FQChK);{CA+x(1?hHv%^rf{{r8f}Ps*d1b)Xj3(fUb6p?!92;x)SOU9z7fobEb$xNZ+9ZG2 zMned)7nVhX`oZUhMMV`AHF)kB?UImM?YmtL1S8))=e+>`%Hrw%(q3pN%X340>niA* zhD#?mbSTvdqXQ2j(n!XTrFu?s41=DG<0&}Bakr9pX)`P-n28s%6I1L z;@-gwO$dY*ZB$_kn-su!{F(JgE4=6Ks7Ia)saW&e@~Sfb(WkZq#8sI1U3}$8;>1Uw zoaa!S=lQF=sCbk-|3d+5xssC7o|&g7@594`Bv}?+ZS`U&AcJX*&)Wp>**K>q4O5HN z#)_v{Dp8PYerE0IRmmFJsp1IDN6~Fb|IUjYrQ0ZAJ!8`Psu`J}Yb=l5zv1Ae94@(B z4Q7qX2vcqm2U&U3bxrQa;8L`q=tnb2X)b+tjhGF?k`kjj5@OGCKc-oRZ5kKj1`C^6 zs7NQ*>1F-a2TqFk5S@_4@GBfX5jZ^4n#18gDm8K>8fS~8WN5YC3bB_JH_6{-F8W3i zE_(E^Ze`4H_fNrK7K{n_UK@6e1ZLj5H3$3cx=S25fV4eHg{sC8B|xRJ6CL*Jtk+wQ zIa?x(Ur@rP$PZrE!*%B^1$5hvg^J|<&YMClXs-azzyL+kPos&xUqnl`U|Z^}gbS2# zbhSPkJx>5fzTFwrpp?^)ecvCrVM4j2+(+c&ew8>e2T+IuO-ZhL8|NOOk(j!y;_3ZI z#q`h2s)Gbx_519pK;7dwU9E=2!8180JM5@uPB+Z^)-tT1Kl?OMwZ}z0U!DK2aJz3~ z9t@#sLdLdNm^YF%oe$izN>=WSe;%2WVtH8^qchOWFs|YT|2&@?{+f@(p-?PG7!MbJ z6?5gyjo2GyyuEzCH|~EE#Nlol}116M^AH%QAD9vuCPTIYC^)Aa!V-xN8<1(da zY0ytX(5`%1zdgR6I8~=!=r^|O_88yIK;^XA9hsz zE5=q{5@kx+a(tjS7a8bX>AjHg;NxAIOYg4Q{f{CjhuQCMfIqeV6DUSm^B}xwQzd+13k?&qBk5_w1q!YnK(*Lz842xI`0{cSOJ)|dUZWo%NiY> zmEME4`$+U8A!DyDr*%399n@PoFyoDDD^D3BEBN?qX+%@?ARM_2sHQWxkei-gM9m`1 z_`cpSXHuo~S`(eDO5o+igVyB%mqRXj4L~3Fn=GGUhube*!%X~deRby|7R`v{L~a`E z{B%XdRzc}&Ey?QhIHYy`mjI}$RyaA;PzwK@K*1(Y`rcm9;AFZ0O!zCmahpVUay0HLjO643$6z(^%Yh9!BlRw%q%_kX* z;5KinCD$KEN_z;4AhlTIC@!;YT^qO9eQ}Hx=}E@H&DRzxDMs6ULPH3)xf}ti%&y4K zK3+7QYi)DjY+2rYN&U(XnbnhPslvV7(fL%fjfyE(JZwnOPBIVlVI5JhK-jW>Y5AT{ zA^Np!{+#<1#l}xRX3h|`!?pKH*wqFCYNNjEK8-R9xhpC19adfLOUCqMySzPcOdAl$ zSP$NBQg>G>gVVF`trTz;mrM#$q2{7(>!V#qWLTlgDeyu{Xzlm9#h(t`c=!ST9?oNw zR@`}T>J-c|GaS(BgxkL!Vh=oO1Ng!S#vu=H3Q>AqIQ$UiV_dZ&v7H(iwaoLx{4H!L z*6yk=l;X41#}EMd!ufdaH*cq_e4;o#N7&BPV%)!I1gKXIa(k^wiTFoF*8;ja+#$} zC;uSDm$MsaSBicde%O%3z19(5_DhUz;Zq*q)mILTbl#|#2W$u@85O46aIs>nIaj%2 zBiHxwgQyy0RZGl=iyffS#7o$fZ+rbS2W@5^K!oByNH|Lx3y8bV=6)?!fY)us;nF_V6?+7og!3=F$m zS00l4!uybd3z%Wht$27$A^<;^|AgLP(R|A$no8-U8~mYZYT893Sr$f*Nut2p?OrZV zCL1zmWAe4*dEVfod3m}5;it%=yR1#YVpRWp8Z*}aDv$%P!4bd2P~>KU-V_(^0X(>i z(rF>I1#k~g$%nIL$ji2IUqLwS>;@Dl>klPaaUYyWj7z9_KxE`?f|7ABM}7Ag|E zXeN{A@A24S!diy=iUJi7czeK7G@lUNScpEGzu)Zn+XChcABsA7>wuHc(x-PS^K=uz zcw;_o#Na(~^-5G+#RV&Fa`wMnxs;Hu3vsO}fM6OOz}RKk7g`QnLb~Qfs_^$sM6Z?z z*pSl9TV)qC@YN`XEcn1FJ_ok~d37i%!Gym9)V3Wtf=KwEdB?9!*kI3?w8L?_^NZz} z4^8bRdB53u=EPRX7X!b4phk|;6vZed>RKLUN85$oF;_wCsd?T|XURrAFajnTiH z?f?i2luKoNIUd{2x3TPD)kT(&i|b|3@ZbJh8R%JqiNXNVjB1N|CS1<*j0mWwi8qv` zCyP&W)VU}pEqP)6ht>Y8s@P4|7kUs|QsPIoB+y1eZ+@v#bqCz`P32Xf`i5+%u}0dZ z1$A{z%N|uQ0mmZwJs7|`Ss#Ot9SCqfUd~K_~W4MxNw)GrKi(tpFsd>h5$Bk z{(Fa8)E6paq671P)uKiOUZ}nJF$bxJ&~MIo@?IMtYq!MzC}vX!9NX3MV`f``5=YHQ zz5bxIE~tx@@oqJ*PfX?H=u|Oq>whuvf3dLqiA+bx(Jefe&rT`pF(TsWf3Fe$?==cm 
zn9Uu~&1XjBEk)P@Tal6CqFeCNC12BO5*J?WCNiQ$SO5Dk@vvF$4dH1{5t&qAq4nz7 zc3E!>OOGkwh-km!bfm&_ws1}X$tc>O8*T0BYCgc5S_f`X3;xf@84qo%+;|-hc6dQ) zm$0B%FTZFS30~vmM%hKE|V3fA%+x_EQ(8fYL-Ch~)YSpXCforPJb{mtiVHP(mBc zpNu`PwHWl5*lBIw5xSQ+4&ObU!HNg`z1kE;S>RB%f_pI*{?;lq@BF1vk_MY-xOgH{v3sK zwrLg-@N~{Giri*It+oY#76zT~hkEmrw&(B04;v;Ql`GuY_5|;&qYCaQtBR}Rhy0Zr z*rKNa#tulRqM~kR{(w31>3J4Kr=m&s&|qLxBZ2o)TBH~;cnr^KqaMDY`Bg)2 zSbkL~T`jArwI-+cze~UZHlx4u%V`OUHBBu#{4zjtK=((5T{_YMN2f~%%Aw#T?ArQz zILO>j{eN$Kt7KbxXG8*6fk+lzbJ3Lif31T@IFHB0R zLF(+U|5P^NulvZq0gQ%rHFw~B?3omwGwNGz{S@|gT!>ADw%Ns&S*|15`K^=Ay3o@h zWSU{#S*gx?3SWS@(4poi!HX>Vr(4VPSWU&BU&`y+sFfND8WP!6{B%%{pqV)*Vudq+Cv0ZFk6yDuQWpg(a?=!LP%%^mh#<2)WQyhWpM*HBu5DvgCq;^<0ZU*` z%HByrvc6y^o6ccNx2(BJ@|!}M2Ti;72A1@O_WE^BxJ9C#pA2e+y1&t{f`gjGCUAl8~AhqQI_U-!c-S9Dm-_OlzpWf2mW{ zReJP{L>5b;Htfp^myJKOUxf;UkSQftE0lj#_|Vam3b_0V#xl|qm?-hqCjLcb4P*E} zJ1&jvFRr7)9oZN{G6+s-&Y;LEUW|$x>E6#pT2<{IPAj|t!{d(Rc;c_+D`#x%Of){1 zaZP|;Gj;^RqmTWGr}z){GAQH7go1&J#>vXUYXruIGa>WnMSemT> ziP1ZDXhvl?e+@yHL{SJhXl6Nubvmjt0{OAYher-bLMj^7v68x-ekweQ@!#wMixDH_ zM}C0RL~MjsFAYEyQNjXT+%PJEya3P2P4NWA#ff?d4GVoxR>npuK~V9#|5@b~ykP)P zt58wkZk*7Of;RFW!9@_*&2@0ZzHd{W;nIG4jds-XVlKo(`+49t8G}bUUm28|-jmL= zlW$e)35CxTPeR^r2&szcSRymuZF@JgIh)9cK;}L-xLPO<&HS($Q}^{MGHBG<^XpP` ziZ`0xL4SJBaB(XlAKw=Zy-g>zNo}K@y8RBn5r!r&@A{pD zDFjfBIb#ZI(xmHFU8X%uuQC2YZdN-Lr6#yz^Xlb`{b!puH~n>K@x1#@*lP>_61KWQ zp5Kw$XXX$;bBie^S0qEO*JGdl?t|m1$i@C+hs?o2F2aB`y z&|+o4R72X0KaYAJD#k2Eh||MF=bXLuYRf{&>$MTj$`qDEQSh z-_yr6?z9R0V_!VdCyRg9N+P2Qm{DlH`kq-bCELEnLts@+?pROON!Sp#p+;RYC^-Eu z7u3p9AoEM4x%?#o`+??C%t2=jIpRw{0{$s)BalhV@vFHw72Jgn6t=rc#Z*SnJ>hNo<)?uik|*q!@ct_spF7tr_C%v(B2+O@H3+Z5<} z)Ny2#7OBgv9EgCB7emZzS%dcOn%vY+!>2~D9JIxOIsg-}kpe&Qy$c56!l)8^^Gxzz z%`nNwMd$V+`IR6EvCXx+JRnrP0*e#L@jo90eVE|c8x)`dIp7TB^Kx_`0g_&uU_Oted-%D7mCTslH1tWfD2bo zbapjlloqe~k;@qhD!TsTF!n@%_Nf4Va8F=~Eape#FNW}x52GccB`gWz!0^^5)Hdys z&;PK(uEGQj{|Y+5i|tLi(O!L*HTu8+$f%oe3TqOO!}N{&1EY}L^}q&F|JptuM+Ccs z5Z3N&5#1rP-v4;u<=!a>Ht<{u7r3SIr}IDa(I}My(~xpe{mqSCV&=YRZM%lrFGV2U zT9^q^z-yYsHRhxb`BumKtvdA56yiDFjUy#7i@SN62@Wc;q)rmW3T<}7q97M-3F(zh znEQR_R4y*9e?$XCm6;d$BO5%VaJ|;^T>H~e@izG#X#_DgOu`Ip5b3+ET657% znqV3kQPxZ1MKJIh#AJYFcZcm3wvmAae=*pG)p9D8J@@=3QqagN7$R&;4b5YO1=rAI z{(1eg^Gao~-79qWZ)dkvi3V?e`p=7CV_h6Er=!()@#kW0bvlQcBn=W8sWoxi6I@M` z`r0%Rzg89m!rL(_-B^ro9b#7twT+fn%gWpLU{ZEcFGg3!v*#g@LMKXQjOqT@(eUMjnkE#9C zW%5om$nyR?_kI83DeSsDHMPgqS)O!w2Ln z%WiMYYOc3YvI(0RycH^Z-=R*$_?DGTq?Z`Rn}3A;OX~;VlB|$`jXshn`u5AuHiCsg z)qcZbIA&hV6zxk&l4YF6Du~Leb1eNa_~TrMk0)5E*4%b&Bo!M9ju-PQ*DNm`PVa2y zaa50O(=&>7FWKq#Ej_P{bdoSg$H@QFd9F1Zja{>`JG)tCSOimj3bRc?GutsV&Jwl- zJbCgQ_KuoH3EYKV_Re_TZ@axbFRm5vXjVL~v31SSw-fgz5%-# zi%xjzyJ=qW@p2hrjgaaxD#OBc2*G53F$jVU7MSq|#9Yaq&raGZ=Q`LV!nTZAbo{%1 zn4hjIom&*>8{>Sw%ii=X(V65)wS-g3rm6h(_&b0M5O*_AHKta|$0FCo`QK5$`~(Iu zvS?x*WkX^;SK+n^mamUOh10e=Ts#NM@2sWRKQes;kr&e8~-$1U;;k%IH@{y^_&j zPrs$Hguz3_1ixU18u@#pWTd)nm6WnT@$74Gl}3$FONC4}@3NucWPO}+ws5@0eG9bZ z#*L+WfEV0pM&|Xr`fSSnMYAPPe`G*WG&+v$?Cfkm&zQf2Qpi%{o}o{pMlsoL2{Ghd z%^^ON;2QL-;?LUS#PVNNF53#JkL_bUVSKzYYn zghJTXxqoUOaq%z!n`35omRlKw)~&`{+qMY_WGg$GN!C|5vB6RtIEqiGH+87Vw10lR zop{GvN*lf#!U|6j2rGn)KykCdL;BHQcQMM}^bAvXT~&QfbWs^^*8bG6`dB#T-g&s&Jsu{LyFh3yUN2!0)G@|&B%#rpqgl~|bnL`ZCcUugk^~ck zTI~`_uaBV_V`DMD)7T2dCO6Q1!0SripJ!QN)m}3SrOBOkeD?QVo|Q=GRcJY^NkdTt zXk#zgvsW=5Z->*inPBj@Y^}yF*ikBAwQl;<&oqJ!wMAOy6Y=iruX37k+SY{U{%|H!9o9& zq7kl;=i27yhXnT%)#3FVyMT+PJVf zA&{!<@1QP@)kMMJ;F$E<%81ZlWLV-+LtSaMvW8hXe#e%@Iq8ePZnl^ zO8eU`a`k2(PDHhepq@{oxBFFNN#AL8$mp+U7fxp-vlYS_zUP#0R#HZIKLJ9i*WN-} z6%$>;EW3TV^!9@vIvjzd$#?o=x%ji1RpKYQ`&sSpnI~7hpF^JPY;6Zvm!Grm3XIOU(K$^1 
zJVU1$v`tPnUny~1H7JVbggAHxJCE3TE(CR1ykDlR@YJ!LJ0Dv4*RL8DA#aXOOO$J& zx7vG4QD^oV9RO;N;8yH;lPkAqbR$5~>VeYjYr4o|OmV%aPyVo3CG{vTFuRD)2Td_y z%3vC{d6L$8IYCwzy&^zxm3{NemzTXYJ3^Ii#7K29`lUgnzXTss20@NpwZqu4hO#$l<)@% z{W}3VAHbV1I43wb;7F_PSK0wgYe`>Nm^i^z>yh{3qTQ)EdKguD&FVBa%PUnxm&Y_& zXujz(w|VbZ#`BFbTcJfUzu~(F>DxBEwCS!J7zxR1u$Sot6ch{@s@#SPwNk~QVI!La zA0LJz9JyMZ%2a->uVh4e3CKKvKE=7N&XS+$!E&LIHqnW6EU?SVUfI5v}Qp)s{swB>~h@w!+5%T6-jj7vk zrCIb7y7?O;7gn3Su&ZisV6&F&l7g|d2vmAqaBy&HB<^gLVB()YBTe7Y%BZzIil;t*Y8BNkH=Gt-a}+L$ z#xp-{T{rTlC9N(f`NN!Yf$;jUGX0*axC-B+o5jY2Jx7+%&S|4Lih=%SQ(#@c%Ai$r zUtfIRs`)Bq){t1y{^I+e@vo*0F5bJ<7xUc{P0hT)JrSP#N&x{o@H$>&XWczYI2W$8 z2NLW9eARtEPYUv9$sqpy7(GWsnT+=sv6st5ID~38Fe?;uQ#E&yj+Iz%zg4!ZVD+Ql z8C1U6Na_+byMMDPVh(QaQJSY)yST6&R=^^cdE+}7b7?21v$k7bkCI@@wZ7C*)=3#{mT-}->Jp5tGD)cE&-Ci;1{+E7^X z_2@%|&H^5U^yv(XIVM)k=HI9Vu5u+^# zaI>@LUOeLRn!>)hj=_UNZ+I^~6I2xBJ<&^p zN}fGwTKpyu! zL1o4^t5y>jXQjF{BDZHNWW@=7Z$a`CFdbY{c44KYajBh?rg&fv+tbRdzZ!W2PoFyP zRKN}BZE$ok7|6gf{I{s160Gfa!<&5 z?8hq{@$TRKd4$UEmhbo|=nwZz{qyc}2ZOerw{6Gc<8-1qR`(l(Yl?K9c%z?L z5vMV}AIJ@u{TQEj_B+9r@1#whCpN*ut%cY+-MkE+=#tdbF{sfNVoCi&>r*VH?PT9v z6X(&14++?u?{RlO)TtaI_V}%2?M=gIPJPuaZGh(MNTToRaq$#a@@GSvV)O6?T=zgHiTk zEiGSIkVdLJdf&U?kwqr16286^f2u&ZBQ);xO?*hd4m(2Jmr5f?r#W&R*? zQZU3Kh-us#;h@$$7v*5XY&9Us*SXuRFk8pV1AqLDTG4A~jS9cVV~@#DPEbBx)DT-W zUb-*^9xqaVlh#QiZRnM*ecWt_gOkRv?m1}J)^v04^~hMGI2^sa=wr!Bo~fcFU+Z6M z7)CYMGiGCFzi7olx%kdT`A*26FW8|%bmGU7)fPYZb*N8+2~uv`)DF5=ToK>udsAP- zpDh2)mp=nXdW>lY-ptu3%>|l#L`Zn=_ma#J%R_XM7anYnVoId+-AX7LJc`)slX!ED zNO*+!Bn-n(E*P{CVw=i_aL?oSORPR|@MT*5l#x^5JjREYS63-MJc{Y>xz%-%_q^6B z9QUBJC0$G;niyJH5@~y0-CPS)>+IZ1sY5QLfeUyxHy!`Ix<3`yzjC#C7&~~@MBn1* zOONYV^Nhoz;C>1B22rWydfTNrHzr_jQ6D2A7J4)oR8V_6lPikol=|x&H}_j5Ff;g~ z#t$d)D;Q;#uH72%A?4I_w9!{}M>zV>@uFCqT4CrYLudSx8%{O^Wuv$X*C)zqwbZ4I z)y6PmCka+6v&*+Eu3mIQwOipdyXBliJ=rx1mYxdul=cN%2-B-{Q% z9Q$3YC*rV)?{yDe8^2ZK85vsr`qZuhPHM~BDljA~m6}hg&>4sheiT1x+Eo6_X~+G- zOW8w8D%bo`dD;E6P3g#UNz}tBv;1j*RPp`Hvdx7jIO5|w1V4&ZGo@|_DAwTtd&Kf# zK!ebX2vKocHVF9bsQf{(k4DeJTfH{$uKOhru`yziHfF>gccGt|=XBR>f8zx$&@>tt zm9_WQlUzPWaehX>;Cl<5ddy;${_++?rI4>?>b5iga`o)9IGn)5{&RCmJ`wSgS}Nh~ zpR>f;O^%+Pj{~2YHJm|rQ3NS3ZC(DBQPndqy9l>&p9na=h?!tLN;bR;cPy5S-*~C%CPP3C&)E3F5 zkE`TxNc%D(CzIXDn12bY zr9B2&e+*(sFddhMOONNTH2mvfs~E2*!-+DDIm0?F1EL%;L0*}uyJV4`6R~s zM@^_GMvw*-zWr69@J*0tj9@j!@5nMzGW|A*%Jzx|vDj=MHU5V?rH3oN^iBKf3rXne zCPm&4S*}iF4?%X^c#ksHUyQi4_A+jOfo-7@hYU-DbM#;z| z+-7_KaB3{f+6=vpH7*QclKj$z;P@Ax>7EL%&^y4s zyIY_+nOFr;k#Abn9p&d|+X`XJhZ`D!5d=(8ic&A}5|`QWw~KIUDXg~LyqE+fZ59)v8Bk!9r!m(@eW{N94-=pWS&2(1s^A|$W?AFdE7qd!*g z3&#EI`dLGR_YTg!e?Gm2ZTE5`v+m!=U-fg+7dF@#$~^HDC&k1cw{fo`#VPo(J!Q7R1TkOwKZta zFZ5w_6A+`&-ka@?I4;+nRcrnJnHHhI$l1XN(Z|lnr?qb3;mV9BYoSw@fJ}5_J8TkS zWKsEDNqlje{Z3c{7uN13+w?okaK=T#_*v3iE1#D=(~1@h`)4Zo`qPhVi7sr|pCt(T z^3NW=QBYq6>tgw=*l0cNJghu!@+RJ;VQf+NOz=@hNzg%ThdBu*R0!dc6akje@BOT^ zg(#8YuQ8bYb%nj$Ib{jJjdZb!|A%kTJ>HiXwRwAzus|yE8w(_S-TrilCiP`qtzyo& z$8tSb(OaPxQT6xflw{O+mfLpVte(vIhMbwJ1PnRq5G*Q_UfVqNVp=a36+s$|3!X@+ zp9gXf?8rOd*zf0JFTc8~jGD}rA}zNvD(fjRa%%i81$Xtal&2a` zW3JXal})KflrhX}H958WiIWS$yp`OVdim<+GUut;U{EFe{A;e%5``-iq3Sa_wLEN( zlCm-dFRu<0FM6JuP`N^`|MRYu@a)V-ee@Rc2QyRC(Bjbmk2ui(`!wEbaA4uI>NPI_ zISF68x5Y@kl>I~hWR;!48_Tq-uSamz}GcIpdYaUD$l^v^k{mrBS zO0a}o^06~#jKf-0m;1Ja0iF+mE6qhP3d$r{ob~S}`Q#}i z+9o_y#-OHgE~oYMS&j0wv)7QoW$w#{X0d;)uUtnvehV-CqDVx=!VzXj^rp^Jfhb12X=VY6e}1MQ)nM@dwD;CgReoFlC?ZIAqjaYNf^;`XcOxJO(k0y?-6b8; z-CfcMigY*9-Su0W?>Xl^@9&Or$GHDr#~%2wdG^}#S!?$EthrdgXb*A*Q)=u3sy*R- zGy;z$txZ{mWUjspxIUl~xE?#bPhb}WGe`gtzY>v|SC~hp(C-AG@KW)Ksvku4+U_`9 zsy)bx8eH8?4n{rtnxeg5BQT~nYTx!*Yp#Kr6Us4bCMVIPnom?USfoV%1vC$w!)4_O 
zCHfZ_S7h0W#^r)v3EERc3ZaU5bGX?`#X}#?#Ssyr1Z~hclRRZVJ|@xc;+95>z(w-B zZIWsKQVLoD>W!}OqJhXp2h7`H3eso{6v9=MAkZu)Q^>+%6O>ax$5(Mh#OmajtBl3!+@gFUcQ;a>Z2fWaJRF!5!CjWgM3_O0 zXAR2+U3>m4L+j%MK#3K<=dRhvdpDtsgdCp}SVc#Pz7A92`WDyAmpmh74?LzzGiPc4 zCGj|IKV9WDrGgz$;4&i-&VVuvL8xf@Wpx3=VY`_+e&y!ny;)a<1J|OehwCc_w_u)|&+l0U<&h=S6wdAOJq zOg?*{Pd>g^8{kQJeb`yQ{E%t2e1p zq!Jnx7l(anasZi-DDb?PzH63%2PUx^&a>pUbZru(AGYmJb{{$D+HXD+Jczz4)XWb# zhh>$(Bz-mnZ%o2z0JU+oXr*E%$4Bo2nagICCE@9=d4m!5)9o;4V% z^8H~0pQ=2%pUqT2E82J_Vo)^8Y_*}>Wd@c&BSd*8LzNsIMe1|U*~VDZngIsXT*cra zwop_6E9)-djo5lP&1#BC>Vt_HT&%B|?s1EGebO3#(uX#;{&z+BWHIz~BSc$!=-j!U z!LY0)97rKObkdIqgfa~%MC;$fG2`FAlPbL@$s1M%14;*hhaFoPR)Sm$F4qO|%+U;Q zX7E$A%g&nFcM{CJ|FA}%NGh=?3{AO=;={)?PC0cN!b?7t1e@@M_T;_x)Z2_re!!M= z5UEFu(u4um0|*_keYmrBjg;2hw7aUv77fcCt&Aa+hGA zi3g9MZ~vNVI!sWEjxfI+!!x&SYmFstO8#i8>!iaa)2a0P~= zeB@Wi-D+|(5vI5rLM(_=C#=}v1*FKHVIGk{+>!j(8d!h{(BPLR_;Is+mC6CL(BY-R zW{M|oz^sY9l^qXOjoOK zoj;Q==--!TkJWF7kKL#q+c?U?IvGJa&3#lwG45bTLAtYo8M`c)KZ_pHqrJ1tCLj!2Wd z9=`=Tfi znB3{P(#wqeYqz$I)#U(@H7O{=6U^Ap%$z*+#sjldx9cVwzHE12%eqwQ?yH703R%gc zV)LjQ)6uwG3?3DOv}z6&{GMQsP#4Db{Cb)cHWk1N-vSxygp@fe=X!2 z(;uMtuDHN({{%`=g2xDyv>7*wB*>tax^?%#zBBpk0(W2tC9&AbTKNvy1rFTq6Ms5E zslGuH$wm3GP_@1YmFsIG^@I_koE>}q21F?=1JNaE@aRhcqhkv}VHsh?n49$wOL-+# z=)D#VI0g5LchZsb=mESWxee!*zX#RS@Y_DAGAT=o{>)u1nMBWHPgPU3K%m6l0m;1$ zWUvRbcZ3Cvr~6ywdZKUeYHD-n0UO8OO)=%7I7n%!%#BZ=kg8Y+Ivs1e?!jAa9Zz;{ z+eSHW(C{KE)>POoK9nHd1x^t719r0Bh#p)AIvfxUDNb6ciPCr(aKXBDi=2FGQ3W(Q zvr$GG!&HQs4BjhGDQLeAuHVt-Ao-%~^w}VS<4%HDJL9IVg*baQvC~&CBi=lZd*y4a=m6BQ`j&eG}x|4~x{3<#$-q?OPSP z(E?8EaUyyLF{pFFY!qqS(Wj&=t_s4K;NU5%7463vn;x^{ev$ zSz8Eim6#DzuQxT%4#d~jQqw@3K?ytg9Z7bZV$jHPPa%jy zcgO7Ustk|a;fb!ZS?PI3Ypagw z^?m*>@YxCxs#S~CYqIe-g`mYSB^PW(KYCWH$UCsplxfh(YOg{}a0MYNu=+sOFe(_% zqNAhjZq7axX1;nK9%-wrp^?NNvg9T4Z60p*u$~XsTNgFr*<>Ptk{>=~+x9adKxC84rgJOr zbl@Dq*9yNXjrwX>mztfeJ=&u?@XGka7w3VZh!T`S@b_1XYK}ufEy>YuF$(oP-KVjz z1j|GVa(;h(m6hsR-bp}4d0aLjyU zWOY{!%+||Nkg<-QLq9K*!U5l- z*pEIq#8&JTgZa6+3d!%fS0Fm35@8zW2F!s>j(E7ki;CWEXtc{DP(Jo3la8nNMPt0Q z8J(#x^b-{m`-Ie1tb|5fa}6Vx%qqsp#^x9UjsuhqLM76n+TwbmNv?rO*fzM+|0@M62L?+-JtU}OO`j}^l?Fo7D# z1LR8&3vlpeDu?wVV@1Kc&l+!daj_j6W(8wnV&=6LP^j8KXP2>B3w1syjg~K!d9gtLPcU%x@nqORa6x=H_WA= z-J~ho2ZGwK)|i&+>llrJz`YTKR>Pt_8nQcCm$LLZoGkq&W6}zGMx1m!N9%9%xLwcB z@WgD`IRnS33AVmK#wYc-y_it*(D~s(;&oR!@(t1BnGUZIAR-@zxv?%8(UO1yYrYRm7Mlx^G>Z6TN*pi9-0r^VlgPh&3f*Kli9dcsKzFX)#X#zEvR{% z55^Js$d?&)n^c;|4c>T1#b1#!QU5_D;#OHQ676Rh!>cC?{$Z4t8LlQjj6p{X&F1M@i43afgLLyUzpXHk@n%m#SJJ2W94DkuDQu#4)Wzp^~C z>BYXhoX_VVxtGe|H^V6Yx`*m|?Fq)}9{52?W`MT6oL^M1f z?=O{FT^$KHZ8Ft1F(Qf8Jp7ydIrFDz`0k9x2fEh-sJo3RiCs zrAS5Uk{jDi9d|c@NqZ2ifH9C`=v_Y6>1OA0&~Qg=2~Oq?{3+LQchdEZ_g{{q+CEWj z|JEO^=)IX!_*WnV1MsU3SQ??ySuVppyP3oohM|x7Aad1|EQUe`0l zqB20vLUb6TkcjvzmY-SC-Gt`+s)ROFox!GB=)vy=%9xtlR*Q=)g!g?!NeeBRYSAYQ zTnTx_WuM8#)*} z1sqXHQgj65rnW{)GWrc)zaGDY5pnEP5$L=&vD*g0&q@lDF83$*pal^&MYT#~VJ^i) zN9lL9g(6nZJMn*F+bC~+bfBHu3Q#^wx)t5sX1i|!r$rDtPYAS><0Qz_v8h0x0KpEG z$05e`RDZJF>GL$>ql}b9aU2bC1o+h9q57ay*@bNo**6k~k*yma3e~~l$wcv4gLsJ4 zM~PxtIiPJ^tJC*s62lM-Iy2oe_6tkT^~0-s=R5|cou^ZMYQF>VoHH7Z<90G6k z)T_R6Vm*W3LV$$O{M$OBSevbkg~-qh(h4fXl3sa3J$7F~(N_Z>16~i>&)O`DBApab z&UMF7la+1bZl`J?7xcsHMCwJ;if9Ab<4>whB9=Q;qT7<}3hNB&8;BjKHeY~l1$_p9 z7bNU6jn#BL|sFMf~=!sh4LbsA!)j_nPlmc&+>U$~a`lW^P2 zXZ@A)ApC2_PUJf7{4W;>{w;|9z$rBv`ryI6Nfo35*jgSN^=7pMW)6I-wX}C73J|I> z=%rD6oMAZ1WT&Fd6MovJt`p*vQa2Xa#9ob;S!)aacCIxT0n|_`a6@SWZ~S|Mhs0tJ#GT23`jCb_iXdM2p6F1hb}oa<5S8X{j0*^**) zh&qlStg8cw#F?9-!y(4i6q<==JV9~oGq7hx@g^9HCmE5WaoQw1l!fyOI3)Hvy46Am^5t;BxDehPN}REO`r@08%V7Pzccq-eFDVbV@cD`#5``<8odp 
zunrO$aCX)1T9@yOS!>AKMVaPJ5{eWtwnXGOqPdS#*Udk6dq7-mGT>@|xk+JyL>Cm$ z@x_5d&%F*yOV5|wYt*LphA!2lOMghmcx@3NAemynV!U|^%3x~iD=Eea-%Blaq{{9C z-?BUnP85Djg%kA{_*GqHLWcLQ#t8O>?nX5*GyT#pA~6 zAo~d1M{hT1!@|NsOk!eY0s;a|OiXedm1VuPwFi#TvfR8pmxK!Z!m6q&N-C;6d}r+R zaXd%lf9$m0zwC6E7g~th#Lk1zb^PkZ(7wDLvoiHTRH`3M?c>!&IDuUcQFxwEY|Q(c zGcmr_pY~G;I+}a}%9Bj|%ierd(n%3SOO@8C<3;M4x;$$W)@}9Y4h5;)%WUhiKbk={ z0It@M#AS+Rvw23~BjMuVt#5v`sfN3Ii>RomB#0wICcYV{Z)S~allA|R{u&-WabCkJ z4~%^itVMd;-;Afy#f-q@12MqI^`m(>9RG3}E0ssu@U2!+8(Cu5F6>)hQ=c5#k93L- zEteCn`VcB`N1ZkxV1dTfXi*G8ieWsniP|O|wvu92X3f*bK5t`_xoc$DUN&QsjN4T= zY)KcbpE09R*6CzT^Z4>O^^m1mmViNzI9|cK9ef?jG3cC8@`wHQE)A#3&8;rw&D>AB zys5|)*1zs^o1sqZdF3SC5ZlG2ONjwY7Wv@TvxIgND zLmDx)X2N4^cq^>A%b>C6tT)eLshe1ca~<%8`lp zrQW-t4xJ4&o9t)fKq`1a)MqV}iVP<}&;Ap)Ln7fNpy>{I>V}gcQ|T@a6)Q#J(U@^} zcP)Nx=nEuyb$a6hp}LLs%S2b}EP4CK$~8iY+MOb)vPAaD5O;^qxvxV$rb8Z05IO*& z&f#C_0i8+u%!|fmVn5hdS%jM20s>fM-AW_^AZJAzOaR!{y{F(ZkJ-JBirsdL!iF6zOyJ5n8+CaK4v zZ*c_;mb++XVxB%qn?IH4SY@@>zBvV}gzDF_>aT*R3o=mz5N$5LtjP%ZRYMA@j5#v= zfCiGu@=R^e8%e{e9ou8w_x>s5f z4+y)Y)BE@jd&n&B@XVUwr6L$5$XGe052sh@)>%%!o=C|Z=;d;weEXQ1T1;xEra3DS z`H~T3S3(N-0&X=a6pb`Q^z%lUz+&93X*;e`s>8i`5{zd8AosQy$?c~KaZ^B^!r`x` zK7fAZFrE#W?8m&7n#;reqLlILG+A9?^7hx-^-0%}ejYd&x9XEvlSpNyZTvnN4JeE5 z*@VOWvnmgeMEv79f`uq#jr6d*kw(_a&y+E2CW|FGU+Qea^(>YQi!>hc?@%2=(iU30vBrr!PuaToqW*h>e2OM>Nr4hEJNYKKuR z{z)EXmira^0y0t$1(fMj0{O;fuHQ!f`$j=YsEK%0V7nl(IknG2E8$3jzgU<+rq2+sFCxn}tii8QU$IUw_d`%P+s}E0$wf`0Usm?U=bfz|ZP|(K3H7 zmQn51vWJ~N7tW6u8X1`gn)D(dA;tRp!!)y^^(-O^rK?GQ{#;U)AMc{2D>2Tbae5~9 z2*V~{JU8-Cd2!hfER^RS#p40RMWojyS_awfIa={K0l!F_OWJ`ZB;-=?vyGXse?F;9=uZ+D`Di_by`NO9tj%>Di;knk;JFeq$ zTwEKI;;2L3C7f&Gw{vpxb3dT{7lc`nhcs1qF-T~eFN$^${6Qb2F&EpoNNrLo62;js2Z>4w7wc^;L*s4p(uB+*SmPd`8}3$A~a?y z&@rxwg@xJL<*~l>WP8~7darVEOpAq8CgsEZ)N8e&k+2*)@pIkQB+2Vr6w5p#3T-{m zs&8_@OTSfqQTh5+KOdRx1j9gwljHh>(9c{Y(So)%qeqO!gC%=MfykDkyheCPoiECJ z3F~OyVde6@AuL3YKoXc@%RfvRkqvu?`TYlz{R5FIp3{Q4Q9{RssIA9(acZzrT~k6% zS}ZS4MDf+>=|>mOF+z&3L;EXh`6koaNXENu-FBP9nq1$p7kaeO`f2rU+}LFe6VvYeWHCHzuj_B&hfL3<2!s9}3uP`=pu*88CLwH>eN!;wOj|C@!neZ+>bbU3IUjkbGE7sOvLGMNwj z^uTm{V2f8GDjY-IR*_$dj8(ujDpDJl{(C}nfM?{p8tEoMm$wwP>-KiM`Q_#M-~H5Z z0`82R`0iR(aIf%+)$;)*N9P37I3AR^G4tq^k;F5Wl9Q+*?(ypSpJp`kp3>y)bUm&s z3D2#De!N6OX3adBQZF8HrB|5FlC3Ww$%}WHS64;h)>GQI{)WzkuRvh;=i6xwD_UYM zJ&rD#tP<+@l` zt*5@$R~fDxaXsa_)Q>6;^MM_ayxNV82Bodc?~TI$z}5*ooRe$bl#^*`zGe30{_~N* zYT=DmMw95ZrM#PAraSxC@0(CoQk-rgub3^%L6x?<+nQ{Ip>~D=HR(5TsL9J=#VMjKXHy2Yfk1|cC7LGm^s{0b>x^r14 z$|l8^+$}N;B}QIGYu{3H&rbnR(T=}GI&d$@}}hRauy!c4W4Wta`i zf?AW#XRU|1^A^AIXH_LyVOe0T-;*X)3;x3;WXrRA7__yfkP@V1$5-!_Z{c4{4%QoA zTY~OXjk$#$O1bu08nH_$RbpfIpC8t`Uz=xp?4k?kU0s2Mme3$b*6( zPQ!)!X`*K@U1WU$&wR8v!(}Y~xp;T{nFS8bsf2z}$toG@>2sYt_6!*T>A&;lG)y21!+FH(O z6KX8{gvO&mz9aooY$U7GriZe_uYgsQ_GeD%T#C*0|45Jq?R!*5e$|Xm}26eZq$4w+he!Zl8I_KE59FBT1*cTKGC(HsKx$2DeC zz1)t^yRdx@-E7xw7C+t2!w>5NDe#d|PBWY9D3l-4NrdE9{$1q`TV) zIW5h5SH%!nWklAJ#jLElUtybWX_v8oBKvRKttU>Q7H{p}Z4D(58y&G)pOLzy%hSIj zYY-$GH?!^fOpb`QnE6)O@U_I!tRCG7#!0XIHwV3`HV%n|c77GKSIg>=Uk8#Xe9&P0 zNY8L?$m&w|n+;XqqixU?;j9IsBA%SB2RUzCSFb~|$w(j%M$if)nf{GR>R|`TBUV z?dh6b5Z$Le2n!oid^b~mv-RqprDQ5LuF5N1N$t1VdU!=@AglE}nmTrWbc_3%O0>4< zv=0*Ej_2C9Q^#a4<>Ljixl5ACzD*A7t7CyQ+{lC(^3LxtsW z)}SjHXHEJadqW`Fn(6()7`NCFai1IABdSkI+ z>O;VJg0gp``hCEIQWtTBd9yCONpO*_WW7UYBh&LwT%EMFa5?1;cuo#@` zJE8DXsWgSji#7nB%){`nzwzA- zY~ZF|A)h4_(28m>B7Ycu-SC6XNnjDP!EGqVdYuFkC>)7EU_}+m8dM-Yh{2LndKj-5 zDN~A8K@PxeB(ob<%exbvO;dk3^VuG~_lvCo4FTjK4FR?%TD}jfR<5yp#YJ2hs-Ir% z+2u{Y63fu$*+fix{sKyXSth={IQ}DsIRLscqTt0xgnw}0z0DYRG@8(m{oy|su6Y#r zER@WJ{u0If`;fmK@87&+SH)sSx*!LCBHHHi#g9ns2zm8WOP*6$4l7IgTCB?f1g@|% 
zEQ2VUm^e;YRxDSfzQdr;z~sY;Ogs|y3x1ne)(`z|TJ6Qn?C8U`_zwrdEZCie89EF${6$S61xX=j)h>Iy68)OmAw@vpb0`ITZ7}Ci@Q$zhvhQ;f zoIZWN8Ht_`&h9zhP4VBC!vk=`0!NU@k4ZyNU%Xvnl>R-)d*yj&ATQ-mRBjttsoLrG z$~aCdYA40&_~s}4CjRnsB)*dKqv)XZjg9Ru*TJsO-@TOOTue&IS&DK$u^ELv;5b#q zmkmDUesM$j2yunHGcYp)7Ug*YCxES^CGgR{q1*|kKIQnhBj)!x+DNpbVjOtVp~E#f zCynATxDC|?WAt}h;rJVy-=@NAH8+25IrtOcw0ndr6=k_QNru_l6m}Y9<@!z-hvS(1 zbTK&2yDmqbEaR(qFv3ymj_;9#3m`CL61Xu6vK`D}I&qqiJN=3Own4@y!q{m3xy)T3 z+Bx>sM!C8*Rt0x4izWvfoAT=FE1Kcew(<R%$<9Gk@L>E_c{f8g#X?Pc zd!GyS+uIJ@kjE~Hgmh`ndw)#)_%x1St#0#p^sMgdFLBK$D1x^D7I@eUw;)Ff@UR>{ z=y_5C_Ldg=PCl@q$Wk&4i#vr1Z(mz!_vNu#uP4XEeAt_~BK{DdvfqLg? zp5J@><`8i%NeXB-OaSV=*eg4IewRM{Ke+emr{u9;mRELm_BuCC1`$K?L&05fMr&F4 z`sOCgFvuwIhlGWVs~rLZ5e@g^T`U%j+}|ZP_4c41F4J@8YQ~K7orz)@2ZtKjly+@% zG1a)a+jnY9$vWWlYC7*9I}db2PIzWI7&Op%qcaj>4#q6Zx7njJ0zCjT7en~ppmu5m z1Oz7&V(C^KU*6KV>1lvJX8=X=hyv|@?|fFP=Su>;>&p1QtG(zcDzzcFy1 zLHivoEi%mSC{4|_f}X^E6osys7vmY77LzRdO&w|zy~4?{Ew26<+8}MQ6Hk44!J1$J z&27I{>(CCr>7&M+K<^=($KTMK^HoTC!q6)TY0Dh?)2eG zn~dk>Prm;QMkboaOr`0-e{w01i)F};89fSP2vGBXUJ6c-G8vIu!c)QbKkw^70MG2g zAOHJSu!0~*_x}(4G$;Qbr{dpxK3G{Nk_S8RS>iZ?of0b31xAP&4^`DoZHw1?G?;PcS-i87Q&#lX+R4Z_=|9L?tL2|AKylQFx zdgGtB3x1vQ{BPcR20;%(ITIH@C9MC$h$?@tiT*p+M+%x5fy$ycPoyf%f29CP0dL=- z{ttzKCP663W`~Znz9L9!>VIB1uAXrJArEpv>J0%rfrIU#tX}+I-`*)jI&9`-Y@D0Q z@1gayX!uiIU0r8fuQGU?R+SnBtbTwI!)BWVom7L#!|=evn^wVJQ~`-#Cd zgcy*n^-;2Jm>*7J5h+qDrT2qF!)=)VZ*W;*4g#@&XDa1CS@8cn&tR~(hteMSvVB!= zc4FKs0DOX=s`>Tvq;X1zyY1RP{h_{Iixi-wr(4TS%d)de*lOxn!w3|<$dH$rvqzi}|f zpEo(~4n9ZYB~{n=sFedKWXz?>s75$!AAs6}QStCAt+FZ3xq9aVI(FPT!)o$i9@#_R z39N4s{n)wo7o7G-z^aLKmy?U&yu#4=$m764P4qb}8wgbB_H=TooLO)#@aiFlKYQ@>8N&(LtGsC5Y(XytfQ=G-S+m10 zq!G!J2+to9Z{pgvoJ-3fJ%WqxaEPI2mxcKSg&SjVKbr#jqQ_rO;+UmOg2F8e3gYK;0&KLLjIQTSVS z1}9H%F23D7pF7A33;Wt?WHT*7k*aK_IWN-1)2ogNBq=bO{!F1tPEO{Qr;Sh4vHMD# zgk6u$u=7{vqsuUsUw(PpjWm!=DtL8s_!kaHc+A*3K@Pi7)Bt&(&GFG7%v+9eE+XK(fEG^ux#? z^Yb^=W@wWb8I8)K)h0k(2yx}c0<*Z!f2NQ?rm&}bPufZNthl>KwHRj7?JYKLF0*bE zCtJlH{nJM34`JMAo zP>5)Y{(5({waNG;m`wbO%32f4@hs=&BsX+fI@vm^%xZ7=OOxx9jY*u|)Ok#sOXQUo zw809YSzp}kFg*O0IMyD?>%PeFdfYhdr*ND7BK0}<97*N!ea8S! 
z^0F_Kyvel_iYsh3H3PqDkr1#N2cza#`j7X1b5k#S=N>5iD6@8lyuySilBKEA1N-h<{em?|IyVH_Hbttpjjuh`F_^FxaTH(fE={u#abNx(;Tl^F8ms zHhrLs2qIKt4mpQ;Q_$5ha2511&btx!>i3UKR9q+7?A|%dhS7$Y=WwWhVF!y#;08Ff-nA{*kxKkD?F*URBxT( zRL=5}c9}fb=I7Pp6hNQ~!ElyL_`arSBakxb-kQ2Mg; z6>?+tYO_6%&c9Ib0TD}lXD>yv+?L>+kXl#f+yT6?xt1AsUyP(}ye(#?+SJ0C z=uR>PNA&7e_(?aCqu9rh)a@u1&o?ZYtPJa4B(rFn zFp+V8d*P$6l91dWmMJOh&`n~2h0CIpbq#XEPA3SlM!>#67UF7Ii{PnIg0J_<_8(PH z^e<^fE#?9HrTTj3Bt?Y$ZJ)Q>D{P|pupqoZy(1Rx%+-*|tT63&Aw^cF4Q{U2bq_4r z|JG$^e~D};80PbaA0KLM#zzs?l?azDFD}e`K{%3xQ64sJS;mBzD0J$>M}5m2tHAT_ z5>Y~d%&QZH5()#}Fi;2k{n(Le-%I*y=8~vSVNw^tQ}UP^{*e?8^;H2}KvIy;Gx8oS z=m0(uT7=OGhDRj}i_ZBF*T7$J;Bd%*(Wv~~7I`>4LJbs*yH3WV2`AiPWuFba_IPXy z1nhIX)0O7yn|y!| zPk0|vz!_Ww9U4+?5g$!b-$SD?&1T#($>>?ul0yjthz%4Sl9a!} zxVc2HvB~0v6nndT(ODlpQqv-EaYo>)i1{cVhI6p6&EX!{a)>2wcXUR;DxB5L4tEdt zoGHvRZWXJdc^>;wZE4;oN*|r~2fB#i-AWKiO(;~?F2NSc#$v_r%P9xwhvN_^Mk0JM zzvK0RSVz69&L42)0_Ck2TBQvyF^I#umb(-|hxfH#hP(3uy$p&?9-IazH<<;d{d;Ht z95&p+^h`dTto~hdV~1%!3Vx`(5hlsQCBC5!%F-9no30itS|Wu_JVuR7UD(WFo@P%C zoxIH9kw2#J;>dsD-w1Z%oBe%JmU&EIC)RUgpUx}t28wWzcqHprX}RL{=pSO3Nz z@N?)_Ov05vjAdv;di@ad#wr2@3xD|SS&3S$_mpnI7>K>Q4PKRNu+2mZQ%^pk2rwDc zz=QgPt_q9xy}HBHYAjc34B+vb*@?^f$;1V?m%7(tFlcuO=rzk)iW^{fT>?X5A=Ccr zgxRv$X@9SmavjEkFHZc&PYxYk!{k0|@_D$l1-ll8H0PI)yED&3lKbvpMHRaFt}={i zL&Wu_PY7859PN85jz$$L{QXO!e!n@Fl8`Z9JA{I5C_=zdXEGs5dvJyCOQJ-37Hn+f zKly`!P*{w$V;{BC!ar~O@Vg{2}ZB5>O928GP0N`%KLK~$qpf}gzjr2ws=RZX+Z@z z6!yUE3(i801X}AAUPe}Su%K>v;VWQ43bhyt8@kKmc zhv>V9+ryLVt&Cj~Wetg8&+wk%lmR|UgEOD#R|0ik?!%6t0 zW=E2!i>hi?eh7eQv>R?ldW~PfO58BNyn^p7^fU$=60A4+R%}7)M58N?Brkxz$)3=-v2wxeP%B3r4YZ7#fzOZQ4C>SY^;^ zY2NzHk1kRqk7rGOmBb;hMY0lghh!>^H(q{`|A4#|hqIch)gt5-2RmUg)9O|~ z<-%#2S<-o-OwT>Yh4ZP$QzwQ9Yyq4;`Qa>1cSsDtXmR#C6DfV$e&P89VH1|>*3H8+ z+Evz$M2dv~MG-5v!nTe;rA=7eE@NO_s{xA2uai#Ca=osQU4#eTEm53*T*|Wfn>D`i zDhhkI8e1`qU%|6 zK>A=yk8v{5|B4jipnaeb`EYKPGDpb&mk-}T4*duh`RV=-bUoZC6!<44CMQ}ZY~cHU E05%C=Z2$lO literal 0 HcmV?d00001 From 67e28b35aebed1e367210744a720ef08810acc18 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Mon, 15 Jul 2024 09:24:40 -0400 Subject: [PATCH 021/154] Add PR checklist workflow (#7699) * Add PR checklist workflow Signed-off-by: Fanit Kolchina * Assign to user instead of owner Signed-off-by: Fanit Kolchina * Testing Signed-off-by: Fanit Kolchina * Remove test Signed-off-by: Fanit Kolchina * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * If a PR is submitted by a doc team member, assign that member Signed-off-by: Fanit Kolchina * Remove test Signed-off-by: Fanit Kolchina --------- Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower --- .github/workflows/pr_checklist.yml | 40 ++++++++++++++++++++++++++++++ 1 file changed, 40 insertions(+) create mode 100644 .github/workflows/pr_checklist.yml diff --git a/.github/workflows/pr_checklist.yml b/.github/workflows/pr_checklist.yml new file mode 100644 index 0000000000..c2c5c6db53 --- /dev/null +++ b/.github/workflows/pr_checklist.yml @@ -0,0 +1,40 @@ +name: PR Checklist + +on: + pull_request: + types: [opened] + +jobs: + add-checklist: + runs-on: ubuntu-latest + + steps: + - name: Comment PR with checklist + uses: peter-evans/create-or-update-comment@v3 + with: + token: ${{ secrets.GITHUB_TOKEN }} + issue-number: ${{ github.event.pull_request.number }} + body: | + Thank you for submitting your PR. 
The PR states are In progress (or Draft) -> Tech review -> Doc review -> Editorial review -> Merged. + + Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a [maintainer](https://github.com/opensearch-project/documentation-website/blob/main/MAINTAINERS.md). + + **When you're ready for doc review, tag the assignee of this PR**. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference). The doc reviewer will arrange for an editorial review. + + - name: Auto assign PR to repo owner + uses: actions/github-script@v6 + with: + script: | + let assignee = context.payload.pull_request.user.login; + const prOwners = ['Naarcha-AWS', 'kolchfa-aws', 'vagimeli', 'natebower']; + + if (!prOwners.includes(assignee)) { + assignee = 'hdhalter' + } + + github.rest.issues.addAssignees({ + issue_number: context.issue.number, + owner: context.repo.owner, + repo: context.repo.repo, + assignees: [assignee] + }); \ No newline at end of file From a594df393b4d863b4a11a1341710a2ec3519e910 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Mon, 15 Jul 2024 09:26:20 -0400 Subject: [PATCH 022/154] Add vector database page (#6238) * Add vector database page Signed-off-by: Fanit Kolchina * Revise wording Signed-off-by: Fanit Kolchina * Add k-NN example and address feedback Signed-off-by: Fanit Kolchina * Update _search-plugins/vector-search.md Co-authored-by: Melissa Vagi Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Update _search-plugins/vector-search.md Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Link fix Signed-off-by: Fanit Kolchina --------- Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Heather Halter Co-authored-by: Melissa Vagi Co-authored-by: Nathan Bower --- _search-plugins/vector-search.md | 283 +++++++++++++++++++++++++++++++ images/k-nn-search-hotels.png | Bin 0 -> 17116 bytes 2 files changed, 283 insertions(+) create mode 100644 _search-plugins/vector-search.md create mode 100644 images/k-nn-search-hotels.png diff --git a/_search-plugins/vector-search.md b/_search-plugins/vector-search.md new file mode 100644 index 0000000000..862b26b375 --- /dev/null +++ b/_search-plugins/vector-search.md @@ -0,0 +1,283 @@ +--- +layout: default +title: Vector search +nav_order: 22 +has_children: false +has_toc: false +--- + +# Vector search + +OpenSearch is a comprehensive search platform that supports a variety of data types, including vectors. OpenSearch vector database functionality is seamlessly integrated with its generic database function. + +In OpenSearch, you can generate vector embeddings, store those embeddings in an index, and use them for vector search. Choose one of the following options: + +- Generate embeddings using a library of your choice before ingesting them into OpenSearch. Once you ingest vectors into an index, you can perform a vector similarity search on the vector space. For more information, see [Working with embeddings generated outside of OpenSearch](#working-with-embeddings-generated-outside-of-opensearch). 
+- Automatically generate embeddings within OpenSearch. To use embeddings for semantic search, the ingested text (the corpus) and the query need to be embedded using the same model. [Neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/) packages this functionality, eliminating the need to manage the internal details. For more information, see [Generating vector embeddings within OpenSearch](#generating-vector-embeddings-in-opensearch). + +## Working with embeddings generated outside of OpenSearch + +After you generate vector embeddings, upload them to an OpenSearch index and search the index using vector search. For a complete example, see [Example](#example). + +### k-NN index + +To build a vector database and use vector search, you must specify your index as a [k-NN index]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/) when creating it by setting `index.knn` to `true`: + +```json +PUT test-index +{ + "settings": { + "index": { + "knn": true, + "knn.algo_param.ef_search": 100 + } + }, + "mappings": { + "properties": { + "my_vector1": { + "type": "knn_vector", + "dimension": 1024, + "method": { + "name": "hnsw", + "space_type": "l2", + "engine": "nmslib", + "parameters": { + "ef_construction": 128, + "m": 24 + } + } + } + } + } +} +``` +{% include copy-curl.html %} + +### k-NN vector + +You must designate the field that will store vectors as a [`knn_vector`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/) field type. OpenSearch supports vectors of up to 16,000 dimensions, each of which is represented as a 32-bit or 16-bit float. + +To save storage space, you can use `byte` vectors. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector). + +### k-NN vector search + +Vector search finds the vectors in your database that are most similar to the query vector. OpenSearch supports the following search methods: + +- [Approximate search](#approximate-search) (approximate k-NN, or ANN): Returns approximate nearest neighbors to the query vector. Usually, approximate search algorithms sacrifice indexing speed and search accuracy in exchange for performance benefits such as lower latency, smaller memory footprints, and more scalable search. For most use cases, approximate search is the best option. + +- Exact search (exact k-NN): A brute-force, exact k-NN search of vector fields. OpenSearch supports the following types of exact search: + - [Exact k-NN with scoring script]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-score-script/): Using the k-NN scoring script, you can apply a filter to an index before executing the nearest neighbor search. + - [Painless extensions]({{site.url}}{{site.baseurl}}/search-plugins/knn/painless-functions/): Adds the distance functions as Painless extensions that you can use in more complex combinations. You can use this method to perform a brute-force, exact k-NN search of an index, which also supports pre-filtering. + +### Approximate search + +OpenSearch supports several algorithms for approximate vector search, each with its own advantages. For complete documentation, see [Approximate search]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/). For more information about the search methods and engines, see [Method definitions]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#method-definitions). 
For method recommendations, see [Choosing the right method]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#choosing-the-right-method).

To use approximate vector search, specify one of the following search methods (algorithms) in the `method` parameter:

- Hierarchical Navigable Small World (HNSW)
- Inverted File System (IVF)

Additionally, specify the engine (library) that implements this method in the `engine` parameter:

- [Non-Metric Space Library (NMSLIB)](https://github.com/nmslib/nmslib)
- [Facebook AI Similarity Search (Faiss)](https://github.com/facebookresearch/faiss)
- Lucene

The following table lists the combinations of search methods and libraries supported by the k-NN engine for approximate vector search.

Method | Engine
:--- | :---
HNSW | NMSLIB, Faiss, Lucene
IVF | Faiss

### Engine recommendations

In general, select NMSLIB or Faiss for large-scale use cases. Lucene is a good option for smaller deployments and offers benefits like smart filtering, where the optimal filtering strategy (pre-filtering, post-filtering, or exact k-NN) is automatically applied depending on the situation. The following table summarizes the differences between each option.

| | NMSLIB/HNSW | Faiss/HNSW | Faiss/IVF | Lucene/HNSW |
|:---|:---|:---|:---|:---|
| Max dimensions | 16,000 | 16,000 | 16,000 | 1,024 |
| Filter | Post-filter | Post-filter | Post-filter | Filter during search |
| Training required | No | No | Yes | No |
| Similarity metrics | `l2`, `innerproduct`, `cosinesimil`, `l1`, `linf` | `l2`, `innerproduct` | `l2`, `innerproduct` | `l2`, `cosinesimil` |
| Number of vectors | Tens of billions | Tens of billions | Tens of billions | Less than 10 million |
| Indexing latency | Low | Low | Lowest | Low |
| Query latency and quality | Low latency and high quality | Low latency and high quality | Low latency and low quality | High latency and high quality |
| Vector compression | Flat | Flat<br>Product quantization | Flat<br>Product quantization | Flat |
| Memory consumption | High | High<br>Low with PQ | Medium<br>Low with PQ | High |
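
As the preceding table notes, Faiss/IVF is the only combination that requires a training step before vectors can be indexed. The following request is a minimal sketch of that step using the k-NN Train API, not part of the original page: the model ID `my-ivf-model`, the training index and field names, and the parameter values are hypothetical placeholders, so consult the k-NN model documentation for the full set of options.

```json
POST /_plugins/_knn/models/my-ivf-model/_train
{
  "training_index": "train-index",
  "training_field": "train-field",
  "dimension": 2,
  "description": "Example IVF model trained on sample vectors (illustrative placeholder)",
  "method": {
    "name": "ivf",
    "engine": "faiss",
    "space_type": "l2",
    "parameters": {
      "nlist": 4
    }
  }
}
```
{% include copy-curl.html %}

Once training completes, a `knn_vector` field can reference the trained model by setting `"model_id": "my-ivf-model"` in the mapping instead of specifying an inline `method` definition.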

### Example

In this example, you'll create a k-NN index, add data to the index, and search the data.

#### Step 1: Create a k-NN index

First, create an index that will store sample hotel data. Set `index.knn` to `true` and specify the `location` field as a `knn_vector`:

```json
PUT /hotels-index
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 100,
      "number_of_shards": 1,
      "number_of_replicas": 0
    }
  },
  "mappings": {
    "properties": {
      "location": {
        "type": "knn_vector",
        "dimension": 2,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "lucene",
          "parameters": {
            "ef_construction": 100,
            "m": 16
          }
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

#### Step 2: Add data to your index

Next, add data to your index. Each document represents a hotel. The `location` field in each document contains a vector specifying the hotel's location:

```json
POST /_bulk
{ "index": { "_index": "hotels-index", "_id": "1" } }
{ "location": [5.2, 4.4] }
{ "index": { "_index": "hotels-index", "_id": "2" } }
{ "location": [5.2, 3.9] }
{ "index": { "_index": "hotels-index", "_id": "3" } }
{ "location": [4.9, 3.4] }
{ "index": { "_index": "hotels-index", "_id": "4" } }
{ "location": [4.2, 4.6] }
{ "index": { "_index": "hotels-index", "_id": "5" } }
{ "location": [3.3, 4.5] }
```
{% include copy-curl.html %}

#### Step 3: Search your data

Now search for hotels closest to the pin location `[5, 4]`. This location is labeled `Pin` in the following image. Each hotel is labeled with its document number.

![Hotels on a coordinate plane]({{site.url}}{{site.baseurl}}/images/k-nn-search-hotels.png)

To search for the top three closest hotels, set `k` to `3`:

```json
POST /hotels-index/_search
{
  "size": 3,
  "query": {
    "knn": {
      "location": {
        "vector": [
          5,
          4
        ],
        "k": 3
      }
    }
  }
}
```
{% include copy-curl.html %}

The response contains the hotels closest to the specified pin location:

```json
{
  "took": 1093,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 0.952381,
    "hits": [
      {
        "_index": "hotels-index",
        "_id": "2",
        "_score": 0.952381,
        "_source": {
          "location": [
            5.2,
            3.9
          ]
        }
      },
      {
        "_index": "hotels-index",
        "_id": "1",
        "_score": 0.8333333,
        "_source": {
          "location": [
            5.2,
            4.4
          ]
        }
      },
      {
        "_index": "hotels-index",
        "_id": "3",
        "_score": 0.72992706,
        "_source": {
          "location": [
            4.9,
            3.4
          ]
        }
      }
    ]
  }
}
```

### Vector search with filtering

For information about vector search with filtering, see [k-NN search with filters]({{site.url}}{{site.baseurl}}/search-plugins/knn/filter-search-knn/).

## Generating vector embeddings in OpenSearch

[Neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/) encapsulates the infrastructure needed to perform semantic vector searches. After you integrate an inference (embedding) service, neural search functions like lexical search, accepting a textual query and returning relevant documents.

When you index your data, neural search transforms text into vector embeddings and indexes both the text and its vector embeddings in a vector index. When you use a neural query during search, neural search converts the query text into vector embeddings and uses vector search to return the results.
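
To make this flow concrete, the following is a minimal sketch of a neural query; it is not part of the original page. It assumes a hypothetical index `my-nlp-index` with a vector field named `passage_embedding` and an already-deployed embedding model whose ID is `my-model-id`; all three names are placeholders:

```json
GET /my-nlp-index/_search
{
  "query": {
    "neural": {
      "passage_embedding": {
        "query_text": "wild west",
        "model_id": "my-model-id",
        "k": 5
      }
    }
  }
}
```
{% include copy-curl.html %}

At query time, the plugin embeds `query_text` using the specified model and runs a vector search against the `passage_embedding` field, so you never construct the query vector yourself.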
+ +### Choosing a model + +The first step in setting up neural search is choosing a model. You can upload a model to your OpenSearch cluster, use one of the pretrained models provided by OpenSearch, or connect to an externally hosted model. For more information, see [Integrating ML models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/). + +### Neural search tutorial + +For a step-by-step tutorial, see [Neural search tutorial]({{site.url}}{{site.baseurl}}/search-plugins/neural-search-tutorial/). + +### Search methods + +Choose one of the following search methods to use your model for neural search: + +- [Semantic search]({{site.url}}{{site.baseurl}}/search-plugins/semantic-search/): Uses dense retrieval based on text embedding models to search text data. + +- [Hybrid search]({{site.url}}{{site.baseurl}}/search-plugins/hybrid-search/): Combines lexical and neural search to improve search relevance. + +- [Multimodal search]({{site.url}}{{site.baseurl}}/search-plugins/multimodal-search/): Uses neural search with multimodal embedding models to search text and image data. + +- [Neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/): Uses neural search with sparse retrieval based on sparse embedding models to search text data. + +- [Conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/): With conversational search, you can ask questions in natural language, receive a text response, and ask additional clarifying questions. diff --git a/images/k-nn-search-hotels.png b/images/k-nn-search-hotels.png new file mode 100644 index 0000000000000000000000000000000000000000..f17fd171cfabafbb057ba805fb1ca83faa07415d GIT binary patch literal 17116 zcmeIZ1ymhdw=M_-CpZL`5E9%Hf@^RH?(XjHo&N=_z zw{O25?~Q)lqq{F->_OG8wN_QFHP@Wq_szLOtWQg@+pyoU zJn|yZGO(gBJ|MAR$dwJOAeX>lV#=vHDGRq(Lsh&G#-W3Gu}5SiF9+&^d7*RxEi|)O zI^=3%Gi)m}d=GX!XySH+`thC9z@DoJQ6Pv~6YJnmB;woG+*Noev^GB|Cx4XSn%YDA%^*@8SJV+8rVE1{OVpxJp$d>gh}KNL zQq{rf<02`L|8jk!#@zJ`JB77N_l+Y(M>Dj*EA8KX#NonFQ0gcl6OMH-!ofV~`j(io z)F0e3bS}ffK9c?t!(bC}=tPv=dKdMO@#-qsBuXo~1GCq?AYP*57O}E(JH_~M)A`PJ zKs;V1T|bQ8A(WDw_!zl2by0TUc~m7e4Xahs=cc|p37JTEVd;151Kd7I?!@Wj*Gp%k z%jPmKZpzoVW{Y~wH%zZZKUPw{)ACWN=uGAuBJLZkAFrZz=lMJmwY!-u-2KrpoSDlh zsdAhmz{H_x-9Cg>ONU=~?eM*0NqW;|SW)X4ol)PVNv<$_ujl6AgK)@1`R$;kOM`9vl>Fl|yz^NtI z{FxA8^~te1{YCt~PNqOplH<`MSPHzPhSzL!k%PYQ^Ipc_Kn@QSwS(7!if?xwMLGx& z6M(;Ir(=FD+Kvw^$ngbcj2u-Q?`;4*x#kz>dZ89MW*oFG!I1AzSb_>>oK!H|BMy|B2}5pDXfx)JPVu3lNoq3Ph)VCdu#T z%~KOtAf6+A6PXxws(L+#OX3&)!EsD#lx~b@40lUt3*L$7hOGgQH|!zkr(PBF3w*ri zDA->y^&D6}C%hem9;6w3J*fSW-vsAtSAo8Cnam876)_uba>U}7#m~h(HF}L@-ZPlQ zXu1BCZS}u@*%528m0;9jyTZDXPJi3&@$QU1YH-tRBI6@v`c>K${~LXE!u;1yQMviL5aDf(N#Lg{AmWyYD*Yj6JC?<5&|j>P-cQoycTr zXeaGl(7UD~mrJ)!kXm#`jF-XhSNA`&`M~ zjb<0ev*^`&435Qi#a_kU8the~RnArS)m?Mk)l3%X+j-k=J6St1J1G+cliAx1yU5N` z&Lg`mS%u??XHFiR&ENQnaF)o-(xRWrt8?-AbhxQCm`oTrdVRlJ^f+fxCmn3f!( zov|oZ&Yaq<4{}`Hv`L-kyG*VQbZ$8J*%f2!XY*&Xv`(^KnoF8@u%@#hjFNsAlQ7oafxu}^^C%1{utd(r7T$QdyLrMg7}$6I{xT+B+B*~(k?}Dz zOTKt)a_iGp(iS=kAIoc&ikbRyI)l<4^KPWoq!s()v15c4u@_g2Gc7;(+h6`<{=M`e z`m+?Cj&@c(wfER;*JM{p7Znyu2unyXcB|blj;WMLN)-j=apvjd#=zUq%|5zy)RV=l z*LQe*Hhr)Ap24uUv$gAmib<;Xe;?IUd+8NT_`QMZTu4K0uD$B`EaB|txtFLxw|Qg* zwe)0Mesn!+rIgi6Mx24TI{MibBQ-2a&hK!>41*T^DuO{7L<)-AuH#F@mnr=z5%pvB zRB^NnxD3tZgI=>i#!M%7CqDgRn~tZN@w*JVl})a2j$KQ!qy1U2#f8sRaEo?y_gim9 z30IP;-m_0@lrpH_`e+X}Y{2XXOu>1BuYFZD^W>;$_S6p>d`2#}4nKokt;*)h96M?$ zbjGR;c2_D^n273$O7Hbfg|5+8lcU(3ty~tGJW*V}nhi7;lWL;JQpc{vo>`;Jg5)Hb 
diff --git a/images/k-nn-search-hotels.png b/images/k-nn-search-hotels.png
new file mode 100644
index 0000000000000000000000000000000000000000..f17fd171cfabafbb057ba805fb1ca83faa07415d
GIT binary patch
literal 17116
[base85-encoded PNG data omitted]

literal 0
HcmV?d00001

From 17c0de982ff2677e650754baf155b5b9f804aa13 Mon Sep 17 00:00:00 2001
From: Heather Halter
Date: Mon, 15 Jul 2024 07:47:11 -0700
Subject: [PATCH 023/154] Update CONTRIBUTING.md (#7702)

* Update CONTRIBUTING.md

Tweaked the wording in the troubleshooting section.
Signed-off-by: Heather Halter * Update CONTRIBUTING.md Co-authored-by: Nathan Bower Signed-off-by: Heather Halter * Update CONTRIBUTING.md Co-authored-by: Nathan Bower Signed-off-by: Heather Halter --------- Signed-off-by: Heather Halter Co-authored-by: Nathan Bower --- CONTRIBUTING.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index de44bbe4ee..7afa9d7596 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -100,10 +100,10 @@ Follow these steps to set up your local copy of the repository: #### Troubleshooting -If you encounter an error while trying to build the documentation website, find the error in the following troubleshooting list: +Try the following troubleshooting steps if you encounter an error when trying to build the documentation website: -- When running `rvm install 3.2` if you receive a `Error running '__rvm_make -j10'`, resolve this by running `rvm install 3.2.0 -C --with-openssl-dir=/opt/homebrew/opt/openssl@3.2` instead of `rvm install 3.2`. -- If receive a `bundle install`: `An error occurred while installing posix-spawn (0.3.15), and Bundler cannot continue.` error when trying to run `bundle install`, resolve this by running `gem install posix-spawn -v 0.3.15 -- --with-cflags=\"-Wno-incompatible-function-pointer-types\"`. Then, run `bundle install`. +- If you see the `Error running '__rvm_make -j10'` error when running `rvm install 3.2`, you can resolve it by running `rvm install 3.2.0 -C --with-openssl-dir=/opt/homebrew/opt/openssl@3.2` instead of `rvm install 3.2`. +- If you see the `bundle install`: `An error occurred while installing posix-spawn (0.3.15), and Bundler cannot continue.` error when trying to run `bundle install`, you can resolve it by running `gem install posix-spawn -v 0.3.15 -- --with-cflags=\"-Wno-incompatible-function-pointer-types\"` and then `bundle install`. From 9c3925429ded1ee73f0b6e5148d41d54d275fc11 Mon Sep 17 00:00:00 2001 From: Daniel Widdis Date: Mon, 15 Jul 2024 10:53:25 -0700 Subject: [PATCH 024/154] Add new update_fields parameter to update workflow API (#7632) * Add new update_fields parameter to update workflow API Signed-off-by: Daniel Widdis * Update _automating-configurations/api/create-workflow.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Daniel Widdis * Update _automating-configurations/api/create-workflow.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Daniel Widdis * Update _automating-configurations/api/create-workflow.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Daniel Widdis * Fixes from doc review Signed-off-by: Daniel Widdis * Update _automating-configurations/api/create-workflow.md Co-authored-by: Nathan Bower Signed-off-by: Daniel Widdis --------- Signed-off-by: Daniel Widdis Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower --- .../api/create-workflow.md | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/_automating-configurations/api/create-workflow.md b/_automating-configurations/api/create-workflow.md index 5c501ce4e8..83c0110ac3 100644 --- a/_automating-configurations/api/create-workflow.md +++ b/_automating-configurations/api/create-workflow.md @@ -20,9 +20,9 @@ You can include placeholder expressions in the value of workflow step fields. Fo Once a workflow is created, provide its `workflow_id` to other APIs. 
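For context, the `workflow_id` returned by a create call is what the other workflow APIs take in their path. A minimal sketch, illustrative only and not part of this patch, in which `<workflow_id>` is a placeholder for a real ID:

```json
POST /_plugins/_flow_framework/workflow/<workflow_id>/_provision
```

The same identifier is used when checking workflow status or deprovisioning the resources that the workflow created.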
-The `POST` method creates a new workflow. The `PUT` method updates an existing workflow.
+The `POST` method creates a new workflow. The `PUT` method updates an existing workflow. You can specify the `update_fields` parameter to update specific fields.

-You can only update a workflow if it has not yet been provisioned.
+You can only update a complete workflow if it has not yet been provisioned.
{: .note}

## Path and HTTP methods

@@ -58,11 +58,26 @@ POST /_plugins/_flow_framework/workflow?validation=none
```
{% include copy-curl.html %}

+You cannot update a full workflow once it has been provisioned, but you can update fields other than the `workflows` field, such as `name` and `description`:
+
+```json
+PUT /_plugins/_flow_framework/workflow/<workflow_id>?update_fields=true
+{
+  "name": "new-template-name",
+  "description": "A new description for the existing template"
+}
+```
+{% include copy-curl.html %}
+
+You cannot specify both the `provision` and `update_fields` parameters at the same time.
+{: .note}
+
 The following table lists the available query parameters. All query parameters are optional. User-provided parameters are only allowed if the `provision` parameter is set to `true`.

| Parameter | Data type | Description |
| :--- | :--- | :--- |
| `provision` | Boolean | Whether to provision the workflow as part of the request. Default is `false`. |
+| `update_fields` | Boolean | Whether to update only the fields included in the request body. Default is `false`. |
| `validation` | String | Whether to validate the workflow. Valid values are `all` (validate the template) and `none` (do not validate the template). Default is `all`. |
| User-provided substitution expressions | String | Parameters matching substitution expressions in the template. Only allowed if `provision` is set to `true`. Optional. If `provision` is set to `false`, you can pass these parameters in the [Provision Workflow API query parameters]({{site.url}}{{site.baseurl}}/automating-configurations/api/provision-workflow/#query-parameters). |

From 3fb528cc387c1fc9ffeb0db19cee5a332344775e Mon Sep 17 00:00:00 2001
From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Date: Mon, 15 Jul 2024 13:53:51 -0400
Subject: [PATCH 025/154] Correct k-NN settings and add more (#7693)

* Correct k-NN settings and add more

Signed-off-by: Fanit Kolchina

* Add heading

Signed-off-by: Fanit Kolchina

* Apply suggestions from code review

Co-authored-by: Nathan Bower
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

---------

Signed-off-by: Fanit Kolchina
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower
---
 _search-plugins/knn/settings.md | 37 +++++++++++++++++++++------------
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/_search-plugins/knn/settings.md b/_search-plugins/knn/settings.md
index f4ef057cfb..4d84cc80bb 100644
--- a/_search-plugins/knn/settings.md
+++ b/_search-plugins/knn/settings.md
@@ -12,17 +12,28 @@ The k-NN plugin adds several new cluster settings. To learn more about static an

 ## Cluster settings

+The following table lists all available cluster-level k-NN settings. For more information about cluster settings, see [Configuring OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/) and [Updating cluster settings using the API]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#updating-cluster-settings-using-the-api).
+ +Setting | Static/Dynamic | Default | Description +:--- | :--- | :--- | :--- +`knn.plugin.enabled`| Dynamic | `true` | Enables or disables the k-NN plugin. +`knn.algo_param.index_thread_qty` | Dynamic | `1` | The number of threads used for native library index creation. Keeping this value low reduces the CPU impact of the k-NN plugin but also reduces indexing performance. +`knn.cache.item.expiry.enabled` | Dynamic | `false` | Whether to remove native library indexes that have not been accessed for a certain duration from memory. +`knn.cache.item.expiry.minutes` | Dynamic | `3h` | If enabled, the amount of idle time before a native library index is removed from memory. +`knn.circuit_breaker.unset.percentage` | Dynamic | `75` | The native memory usage threshold for the circuit breaker. Memory usage must be lower than this percentage of `knn.memory.circuit_breaker.limit` in order for `knn.circuit_breaker.triggered` to remain `false`. +`knn.circuit_breaker.triggered` | Dynamic | `false` | True when memory usage exceeds the `knn.circuit_breaker.unset.percentage` value. +`knn.memory.circuit_breaker.limit` | Dynamic | `50%` | The native memory limit for native library indexes. At the default value, if a machine has 100 GB of memory and the JVM uses 32 GB, then the k-NN plugin uses 50% of the remaining 68 GB (34 GB). If memory usage exceeds this value, then the plugin removes the native library indexes used least recently. +`knn.memory.circuit_breaker.enabled` | Dynamic | `true` | Whether to enable the k-NN memory circuit breaker. +`knn.model.index.number_of_shards`| Dynamic | `1` | The number of shards to use for the model system index, which is the OpenSearch index that stores the models used for approximate nearest neighbor (ANN) search. +`knn.model.index.number_of_replicas`| Dynamic | `1` | The number of replica shards to use for the model system index. Generally, in a multi-node cluster, this value should be at least 1 in order to increase stability. +`knn.model.cache.size.limit` | Dynamic | `10%` | The model cache limit cannot exceed 25% of the JVM heap. +`knn.faiss.avx2.disabled` | Static | `false` | A static setting that specifies whether to disable the SIMD-based `libopensearchknn_faiss_avx2.so` library and load the non-optimized `libopensearchknn_faiss.so` library for the Faiss engine on machines with x64 architecture. For more information, see [SIMD optimization for the Faiss engine]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#simd-optimization-for-the-faiss-engine). + +## Index settings + +The following table lists all available index-level k-NN settings. All settings are static. For information about updating static index-level settings, see [Updating a static index setting]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index-settings/#updating-a-static-index-setting). + Setting | Default | Description -:--- | :--- | :--- -`knn.algo_param.index_thread_qty` | 1 | The number of threads used for native library index creation. Keeping this value low reduces the CPU impact of the k-NN plugin, but also reduces indexing performance. -`knn.cache.item.expiry.enabled` | false | Whether to remove native library indexes that have not been accessed for a certain duration from memory. -`knn.cache.item.expiry.minutes` | 3h | If enabled, the idle time before removing a native library index from memory. -`knn.circuit_breaker.unset.percentage` | 75% | The native memory usage threshold for the circuit breaker. 
Memory usage must be below this percentage of `knn.memory.circuit_breaker.limit` for `knn.circuit_breaker.triggered` to remain false. -`knn.circuit_breaker.triggered` | false | True when memory usage exceeds the `knn.circuit_breaker.unset.percentage` value. -`knn.memory.circuit_breaker.limit` | 50% | The native memory limit for native library indexes. At the default value, if a machine has 100 GB of memory and the JVM uses 32 GB, the k-NN plugin uses 50% of the remaining 68 GB (34 GB). If memory usage exceeds this value, k-NN removes the least recently used native library indexes. -`knn.memory.circuit_breaker.enabled` | true | Whether to enable the k-NN memory circuit breaker. -`knn.plugin.enabled`| true | Enables or disables the k-NN plugin. -`knn.model.index.number_of_shards`| 1 | The number of shards to use for the model system index, the OpenSearch index that stores the models used for Approximate Nearest Neighbor (ANN) search. -`knn.model.index.number_of_replicas`| 1 | The number of replica shards to use for the model system index. Generally, in a multi-node cluster, this should be at least 1 to increase stability. -`knn.advanced.filtered_exact_search_threshold`| null | The threshold value for the filtered IDs that is used to switch to exact search during filtered ANN search. If the number of filtered IDs in a segment is less than this setting's value, exact search will be performed on the filtered IDs. -`knn.faiss.avx2.disabled` | False | A static setting that specifies whether to disable the SIMD-based `libopensearchknn_faiss_avx2.so` library and load the non-optimized `libopensearchknn_faiss.so` library for the Faiss engine on machines with x64 architecture. For more information, see [SIMD optimization for the Faiss engine]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#simd-optimization-for-the-faiss-engine). +:--- | :--- | :--- +`index.knn.advanced.filtered_exact_search_threshold`| `null` | The filtered ID threshold value used to switch to exact search during filtered ANN search. If the number of filtered IDs in a segment is lower than this setting's value, then exact search will be performed on the filtered IDs. +`index.knn.algo_param.ef_search` | `100` | `ef` (or `efSearch`) represents the size of the dynamic list for the nearest neighbors used during a search. Higher `ef` values lead to a more accurate but slower search. `ef` cannot be set to a value lower than the number of queried nearest neighbors, `k`. `ef` can take any value between `k` and the size of the dataset. 
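For context, the index-level settings listed in the new table are applied to a specific index (typically at index creation) rather than through the cluster settings API. A minimal sketch, not part of this patch, in which `my-knn-index` is a placeholder index name:

```json
PUT /my-knn-index
{
  "settings": {
    "index.knn": true,
    "index.knn.algo_param.ef_search": 100
  }
}
```

Here `index.knn: true` enables k-NN support for the index, and `ef_search` sizes the candidate list evaluated at query time, trading latency for recall.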
\ No newline at end of file From b312b53b18bce18c4dad5225a61c0c184dbb1381 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Mon, 15 Jul 2024 14:08:34 -0400 Subject: [PATCH 026/154] Add permission to write on PRs to the PR checklist workflow (#7711) Signed-off-by: Fanit Kolchina --- .github/workflows/pr_checklist.yml | 3 +++ 1 file changed, 3 insertions(+) diff --git a/.github/workflows/pr_checklist.yml b/.github/workflows/pr_checklist.yml index c2c5c6db53..accce0f882 100644 --- a/.github/workflows/pr_checklist.yml +++ b/.github/workflows/pr_checklist.yml @@ -4,6 +4,9 @@ on: pull_request: types: [opened] +permissions: + pull-requests: write + jobs: add-checklist: runs-on: ubuntu-latest From 72713abeffa854cf361dcaa2407c66c2253e7039 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Mon, 15 Jul 2024 14:36:28 -0400 Subject: [PATCH 027/154] Remove model requirement from hybrid search documentation (#7511) * Remove model requirement from hybrid search documentation Signed-off-by: Fanit Kolchina * Review comment Signed-off-by: Fanit Kolchina * Revised sentence Signed-off-by: Fanit Kolchina * Update _search-plugins/hybrid-search.md Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower --- _query-dsl/compound/hybrid.md | 6 +----- _search-plugins/hybrid-search.md | 2 +- 2 files changed, 2 insertions(+), 6 deletions(-) diff --git a/_query-dsl/compound/hybrid.md b/_query-dsl/compound/hybrid.md index e573d17676..22b3a17fc1 100644 --- a/_query-dsl/compound/hybrid.md +++ b/_query-dsl/compound/hybrid.md @@ -12,11 +12,7 @@ You can use a hybrid query to combine relevance scores from multiple queries int ## Example -Before using a `hybrid` query, you must set up a machine learning (ML) model, ingest documents, and configure a search pipeline with a [`normalization-processor`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor/). - -To learn how to set up an ML model, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model). - -Once you set up an ML model, learn how to use the `hybrid` query by following the steps in [Using hybrid search]({{site.url}}{{site.baseurl}}/search-plugins/hybrid-search/#using-hybrid-search). +Learn how to use the `hybrid` query by following the steps in [Using hybrid search]({{site.url}}{{site.baseurl}}/search-plugins/hybrid-search/#using-hybrid-search). For a comprehensive example, follow the [Neural search tutorial]({{site.url}}{{site.baseurl}}/ml-commons-plugin/semantic-search#tutorial). diff --git a/_search-plugins/hybrid-search.md b/_search-plugins/hybrid-search.md index b0fb4d5bef..7f08d63d0f 100644 --- a/_search-plugins/hybrid-search.md +++ b/_search-plugins/hybrid-search.md @@ -12,7 +12,7 @@ Introduced 2.11 Hybrid search combines keyword and neural search to improve search relevance. To implement hybrid search, you need to set up a [search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) that runs at search time. The search pipeline you'll configure intercepts search results at an intermediate stage and applies the [`normalization_processor`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor/) to them. 
The `normalization_processor` normalizes and combines the document scores from multiple query clauses, rescoring the documents according to the chosen normalization and combination techniques. **PREREQUISITE**
-Before using hybrid search, you must set up a text embedding model. For more information, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model). +To follow this example, you must set up a text embedding model. For more information, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model). If you have already generated text embeddings, ingest the embeddings into an index and skip to [Step 4](#step-4-configure-a-search-pipeline). {: .note} ## Using hybrid search From 0245610ceabac73c00ebeb36b258743250bc80ad Mon Sep 17 00:00:00 2001 From: David Venable Date: Mon, 15 Jul 2024 14:27:02 -0500 Subject: [PATCH 028/154] Adds IntelliJ's *.iml to the .gitignore. (#7705) Signed-off-by: David Venable --- .gitignore | 1 + 1 file changed, 1 insertion(+) diff --git a/.gitignore b/.gitignore index ae2249e73f..446d1deda6 100644 --- a/.gitignore +++ b/.gitignore @@ -4,4 +4,5 @@ _site .DS_Store Gemfile.lock .idea +*.iml .jekyll-cache From a7a71551b6ae8f9502f52a06a5c8edae99adb3a2 Mon Sep 17 00:00:00 2001 From: gaobinlong Date: Tue, 16 Jul 2024 08:55:09 +0800 Subject: [PATCH 029/154] Fix format issue for the split ingest processor documentation (#7695) * Fix format issue for the split ingest processor documentation Signed-off-by: gaobinlong * Update _ingest-pipelines/processors/split.md Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Heather Halter * Update _ingest-pipelines/processors/split.md Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Heather Halter * Apply suggestions from code review Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Heather Halter * Update _ingest-pipelines/processors/split.md Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Heather Halter --------- Signed-off-by: gaobinlong Signed-off-by: Heather Halter Co-authored-by: Heather Halter Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- _ingest-pipelines/processors/split.md | 25 ++++++++++++------------- 1 file changed, 12 insertions(+), 13 deletions(-) diff --git a/_ingest-pipelines/processors/split.md b/_ingest-pipelines/processors/split.md index 2052c3def1..c424ef671c 100644 --- a/_ingest-pipelines/processors/split.md +++ b/_ingest-pipelines/processors/split.md @@ -26,19 +26,18 @@ The following is the syntax for the `split` processor: The following table lists the required and optional parameters for the `split` processor. -Parameter | Required/Optional | Description | -|-----------|-----------|-----------| -`field` | Required | The field containing the string to be split. -`separator` | Required | The delimiter used to split the string. This can be a regular expression pattern. -`preserve_field` | Optional | If set to `true`, preserves empty trailing fields (for example, `''`) in the resulting array. If set to `false`, empty trailing fields are removed from the resulting array. Default is `false`. -`target_field` | Optional | The field where the array of substrings is stored. If not specified, then the field is updated in-place. -`ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not contain the specified -field. If set to `true`, then the processor ignores missing values in the field and leaves the `target_field` unchanged. Default is `false`. 
-`description` | Optional | A brief description of the processor. -`if` | Optional | A condition for running the processor. -`ignore_failure` | Optional | Specifies whether the processor continues execution even if it encounters an error. If set to `true`, then failures are ignored. Default is `false`. -`on_failure` | Optional | A list of processors to run if the processor fails. -`tag` | Optional | An identifier tag for the processor. Useful for debugging in order to distinguish between processors of the same type. +Parameter | Required/Optional | Description +:--- | :--- | :--- +`field` | Required | The field containing the string to be split. +`separator` | Required | The delimiter used to split the string. This can be a regular expression pattern. +`preserve_field` | Optional | If set to `true`, preserves empty trailing fields (for example, `''`) in the resulting array. If set to `false`, empty trailing fields are removed from the resulting array. Default is `false`. +`target_field` | Optional | The field where the array of substrings is stored. If not specified, then the field is updated in-place. +`ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not contain the specified field. If set to `true`, then the processor ignores missing values in the field and leaves the `target_field` unchanged. Default is `false`. +`description` | Optional | A brief description of the processor. +`if` | Optional | A condition for running the processor. +`ignore_failure` | Optional | Specifies whether the processor continues execution even if it encounters an error. If set to `true`, then failures are ignored. Default is `false`. +`on_failure` | Optional | A list of processors to run if the processor fails. +`tag` | Optional | An identifier tag for the processor. Useful for debugging in order to distinguish between processors of the same type. ## Using the processor From ded24153401d9447bb12d628a256c7ad62c962e3 Mon Sep 17 00:00:00 2001 From: zhichao-aws Date: Tue, 16 Jul 2024 22:38:04 +0800 Subject: [PATCH 030/154] Improve wording for the 2 search mode of neural sparse documentation (#7718) * improve wording for ns Signed-off-by: zhichao-aws * Apply suggestions from code review Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: zhichao-aws Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- _search-plugins/neural-sparse-search.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_search-plugins/neural-sparse-search.md b/_search-plugins/neural-sparse-search.md index b2b4fc33d6..8aa2ff7dbf 100644 --- a/_search-plugins/neural-sparse-search.md +++ b/_search-plugins/neural-sparse-search.md @@ -16,8 +16,8 @@ Introduced 2.11 When selecting a model, choose one of the following options: -- Use a sparse encoding model at both ingestion time and search time (high performance, relatively high latency). -- Use a sparse encoding model at ingestion time and a tokenizer at search time for relatively low performance and low latency. The tokenism doesn't conduct model inference, so you can deploy and invoke a tokenizer using the ML Commons Model API for a more consistent experience. +- Use a sparse encoding model at both ingestion time and search time for better search relevance at the expense of relatively high latency. 
+- Use a sparse encoding model at ingestion time and a tokenizer at search time for lower search latency at the expense of relatively lower search relevance. Tokenization doesn't involve model inference, so you can deploy and invoke a tokenizer using the ML Commons Model API for a more streamlined experience. **PREREQUISITE**
Before using neural sparse search, make sure to set up a [pretrained sparse embedding model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/#sparse-encoding-models) or your own sparse embedding model. For more information, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model). From da41ad8322b5e0979430325ddeffa7e754510308 Mon Sep 17 00:00:00 2001 From: Landon Lengyel Date: Tue, 16 Jul 2024 12:55:08 -0600 Subject: [PATCH 031/154] Update nodes-stats.md (#7721) Fixing typo Signed-off-by: Landon Lengyel --- _api-reference/nodes-apis/nodes-stats.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/nodes-apis/nodes-stats.md b/_api-reference/nodes-apis/nodes-stats.md index 145b3d5b24..ca6810b961 100644 --- a/_api-reference/nodes-apis/nodes-stats.md +++ b/_api-reference/nodes-apis/nodes-stats.md @@ -44,7 +44,7 @@ thread_pool | Statistics about each thread pool for the node. fs | File system statistics, such as read/write statistics, data path, and free disk space. transport | Transport layer statistics about send/receive in cluster communication. http | Statistics about the HTTP layer. -breaker | Statistics about the field data circuit breakers. +breakers | Statistics about the field data circuit breakers. script | Statistics about scripts, such as compilations and cache evictions. discovery | Statistics about cluster states. ingest | Statistics about ingest pipelines. From b6478f593377089e44efebf320d258b4b4f3e823 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Tue, 16 Jul 2024 16:37:57 -0400 Subject: [PATCH 032/154] Update PR comment workflow to use pull request target (#7723) Signed-off-by: Fanit Kolchina --- .github/workflows/pr_checklist.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/pr_checklist.yml b/.github/workflows/pr_checklist.yml index accce0f882..4130f5e2bd 100644 --- a/.github/workflows/pr_checklist.yml +++ b/.github/workflows/pr_checklist.yml @@ -1,7 +1,7 @@ name: PR Checklist on: - pull_request: + pull_request_target: types: [opened] permissions: From e3ee238aadac34a42b6880dede93c01c0a50dcda Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Tue, 16 Jul 2024 17:09:39 -0400 Subject: [PATCH 033/154] Add 1.3.18 to version history (#7726) Signed-off-by: Fanit Kolchina --- _about/version-history.md | 1 + 1 file changed, 1 insertion(+) diff --git a/_about/version-history.md b/_about/version-history.md index 0d6d844951..09f331b235 100644 --- a/_about/version-history.md +++ b/_about/version-history.md @@ -30,6 +30,7 @@ OpenSearch version | Release highlights | Release date [2.0.1](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.0.1.md) | Includes bug fixes and maintenance updates for Alerting and Anomaly Detection. | 16 June 2022 [2.0.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.0.0.md) | Includes document-level monitors for alerting, OpenSearch Notifications plugins, and Geo Map Tiles in OpenSearch Dashboards. Also adds support for Lucene 9 and bug fixes for all OpenSearch plugins. For a full list of release highlights, see the Release Notes. | 26 May 2022 [2.0.0-rc1](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.0.0-rc1.md) | The Release Candidate for 2.0.0. 
This version allows you to preview the upcoming 2.0.0 release before the GA release. The preview release adds document-level alerting, support for Lucene 9, and the ability to use term lookup queries in document level security. | 03 May 2022 +[1.3.18](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.18.md) | Includes maintenance updates for OpenSearch security. | 16 July 2024 [1.3.17](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.17.md) | Includes maintenance updates for OpenSearch security and OpenSearch Dashboards security. | 06 June 2024 [1.3.16](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.16.md) | Includes bug fixes and maintenance updates for OpenSearch security, index management, performance analyzer, and reporting. | 23 April 2024 [1.3.15](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.15.md) | Includes bug fixes and maintenance updates for cross-cluster replication, SQL, OpenSearch Dashboards reporting, and alerting. | 05 March 2024 From e05b4004a77c0b5208a339e6abef98121e43b690 Mon Sep 17 00:00:00 2001 From: Heather Halter Date: Wed, 17 Jul 2024 11:41:44 -0700 Subject: [PATCH 034/154] Updates PPL description (#7637) * Update index.md Updated description of PPL. Signed-off-by: Heather Halter * Update index.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update search-plugins/sql/ppl/index/ Signed-off-by: Heather Halter * Update _search-plugins/sql/ppl/index.md Co-authored-by: Nathan Bower Signed-off-by: Heather Halter --------- Signed-off-by: Heather Halter Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Nathan Bower --- _search-plugins/sql/ppl/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/sql/ppl/index.md b/_search-plugins/sql/ppl/index.md index 602255d126..bda67aba36 100644 --- a/_search-plugins/sql/ppl/index.md +++ b/_search-plugins/sql/ppl/index.md @@ -18,7 +18,7 @@ redirect_from: # PPL -Piped Processing Language (PPL) is a query language that focuses on processing data in a sequential, step-by-step manner. PPL uses the pipe (`|`) operator to combine commands to find and retrieve data. It is the primary language used with observability in OpenSearch and supports multi-data queries. +Piped Processing Language (PPL) is a query language that focuses on processing data in a sequential, step-by-step manner. PPL uses the pipe (`|`) operator to combine commands to find and retrieve data. It is particularly well suited for analyzing observability data, such as logs, metrics, and traces, due to its ability to handle semi-structured data efficiently. 
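To ground the updated description, PPL queries are submitted to the plugin's `_plugins/_ppl` endpoint, with each piped command processing the output of the previous one. A sketch only: the `accounts` index and its fields are placeholders, not part of this patch.

```json
POST /_plugins/_ppl
{
  "query": "search source=accounts | where age > 30 | fields firstname, lastname"
}
```

Each pipe stage filters or reshapes the preceding result, which is the sequential, step-by-step processing that the revised description refers to.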
## PPL syntax From d2da95d9cbec2d325df083bca817880af32d3180 Mon Sep 17 00:00:00 2001 From: Miki Date: Wed, 17 Jul 2024 12:41:35 -0700 Subject: [PATCH 035/154] Redirect to `latest` when the latest version is picked from the version selector (#7759) Signed-off-by: Miki --- assets/js/_version-selector.js | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/assets/js/_version-selector.js b/assets/js/_version-selector.js index e5efcaaf2c..23730e67f0 100644 --- a/assets/js/_version-selector.js +++ b/assets/js/_version-selector.js @@ -192,7 +192,10 @@ class VersionSelector extends HTMLElement { frag.querySelector('#selected').textContent = `${PREFIX}${this.getAttribute('selected')}`; const pathName = location.pathname.replace(/\/docs(\/((latest|\d+\.\d+)\/?)?)?/, ''); - const versionsDOMNodes = DOC_VERSIONS.map((v, idx) => `${PREFIX}${v}`); + const versionsDOMNodes = DOC_VERSIONS.map((v, idx) => v === DOC_VERSION_LATEST + ? `${PREFIX}${v}` + : `${PREFIX}${v}`, + ); if (Array.isArray(DOC_VERSIONS_ARCHIVED) && DOC_VERSIONS_ARCHIVED.length) { versionsDOMNodes.push( `Show archived`, From d8f1b7b95d3d90f130b6b590ee7526ca0c8a186a Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Wed, 17 Jul 2024 16:56:02 -0400 Subject: [PATCH 036/154] Fix breadcrumbs by excluding collection index pages from parent relationship (#7758) Signed-off-by: Fanit Kolchina --- _aggregations/bucket/adjacency-matrix.md | 1 - _aggregations/bucket/date-histogram.md | 1 - _aggregations/bucket/date-range.md | 1 - _aggregations/bucket/diversified-sampler.md | 1 - _aggregations/bucket/filter.md | 1 - _aggregations/bucket/filters.md | 1 - _aggregations/bucket/geo-distance.md | 1 - _aggregations/bucket/geohash-grid.md | 1 - _aggregations/bucket/geohex-grid.md | 1 - _aggregations/bucket/geotile-grid.md | 1 - _aggregations/bucket/global.md | 1 - _aggregations/bucket/histogram.md | 1 - _aggregations/bucket/ip-range.md | 1 - _aggregations/bucket/missing.md | 1 - _aggregations/bucket/multi-terms.md | 1 - _aggregations/bucket/nested.md | 1 - _aggregations/bucket/range.md | 1 - _aggregations/bucket/reverse-nested.md | 1 - _aggregations/bucket/sampler.md | 1 - _aggregations/bucket/significant-terms.md | 1 - _aggregations/bucket/significant-text.md | 1 - _aggregations/bucket/terms.md | 1 - _aggregations/index.md | 2 +- _aggregations/metric/average.md | 1 - _aggregations/metric/cardinality.md | 1 - _aggregations/metric/extended-stats.md | 1 - _aggregations/metric/geobounds.md | 1 - _aggregations/metric/geocentroid.md | 1 - _aggregations/metric/matrix-stats.md | 1 - _aggregations/metric/maximum.md | 1 - _aggregations/metric/median-absolute-deviation.md | 1 - _aggregations/metric/minimum.md | 1 - _aggregations/metric/percentile-ranks.md | 1 - _aggregations/metric/percentile.md | 1 - _aggregations/metric/scripted-metric.md | 1 - _aggregations/metric/stats.md | 1 - _aggregations/metric/sum.md | 1 - _aggregations/metric/top-hits.md | 1 - _aggregations/metric/value-count.md | 1 - _aggregations/metric/weighted-avg.md | 1 - _query-dsl/compound/bool.md | 1 - _query-dsl/compound/boosting.md | 1 - _query-dsl/compound/constant-score.md | 1 - _query-dsl/compound/disjunction-max.md | 1 - _query-dsl/compound/function-score.md | 1 - _query-dsl/compound/hybrid.md | 1 - _query-dsl/full-text/intervals.md | 1 - _query-dsl/full-text/match-bool-prefix.md | 1 - _query-dsl/full-text/match-phrase-prefix.md | 1 - _query-dsl/full-text/match-phrase.md | 1 - _query-dsl/full-text/match.md | 1 - 
_query-dsl/full-text/multi-match.md | 1 - _query-dsl/full-text/query-string.md | 2 -- _query-dsl/full-text/simple-query-string.md | 1 - _query-dsl/geo-and-xy/geo-bounding-box.md | 1 - _query-dsl/geo-and-xy/geodistance.md | 1 - _query-dsl/geo-and-xy/geopolygon.md | 1 - _query-dsl/geo-and-xy/xy.md | 2 -- _query-dsl/specialized/neural-sparse.md | 1 - _query-dsl/specialized/neural.md | 1 - _query-dsl/specialized/script-score.md | 1 - _query-dsl/term/exists.md | 1 - _query-dsl/term/fuzzy.md | 1 - _query-dsl/term/ids.md | 1 - _query-dsl/term/prefix.md | 1 - _query-dsl/term/range.md | 1 - _query-dsl/term/regexp.md | 1 - _query-dsl/term/term.md | 1 - _query-dsl/term/terms-set.md | 1 - _query-dsl/term/terms.md | 1 - _query-dsl/term/wildcard.md | 1 - _search-plugins/knn/api.md | 1 - _search-plugins/knn/approximate-knn.md | 1 - _search-plugins/knn/filter-search-knn.md | 1 - _search-plugins/knn/jni-libraries.md | 1 - _search-plugins/knn/knn-index.md | 1 - _search-plugins/knn/knn-score-script.md | 1 - _search-plugins/knn/knn-vector-quantization.md | 1 - _search-plugins/knn/nested-search-knn.md | 1 - _search-plugins/knn/painless-functions.md | 1 - _search-plugins/knn/performance-tuning.md | 1 - _search-plugins/knn/radial-search-knn.md | 1 - _search-plugins/knn/settings.md | 1 - 83 files changed, 1 insertion(+), 85 deletions(-) diff --git a/_aggregations/bucket/adjacency-matrix.md b/_aggregations/bucket/adjacency-matrix.md index fd521f8510..cf62295763 100644 --- a/_aggregations/bucket/adjacency-matrix.md +++ b/_aggregations/bucket/adjacency-matrix.md @@ -2,7 +2,6 @@ layout: default title: Adjacency matrix parent: Bucket aggregations -grand_parent: Aggregations nav_order: 10 redirect_from: - /query-dsl/aggregations/bucket/adjacency-matrix/ diff --git a/_aggregations/bucket/date-histogram.md b/_aggregations/bucket/date-histogram.md index e308104e16..db3fe6884e 100644 --- a/_aggregations/bucket/date-histogram.md +++ b/_aggregations/bucket/date-histogram.md @@ -2,7 +2,6 @@ layout: default title: Date histogram parent: Bucket aggregations -grand_parent: Aggregations nav_order: 20 redirect_from: - /query-dsl/aggregations/bucket/date-histogram/ diff --git a/_aggregations/bucket/date-range.md b/_aggregations/bucket/date-range.md index c7d66d729d..d2498a53da 100644 --- a/_aggregations/bucket/date-range.md +++ b/_aggregations/bucket/date-range.md @@ -2,7 +2,6 @@ layout: default title: Date range parent: Bucket aggregations -grand_parent: Aggregations nav_order: 30 redirect_from: - /query-dsl/aggregations/bucket/date-range/ diff --git a/_aggregations/bucket/diversified-sampler.md b/_aggregations/bucket/diversified-sampler.md index 7249ac3555..a62410cc8c 100644 --- a/_aggregations/bucket/diversified-sampler.md +++ b/_aggregations/bucket/diversified-sampler.md @@ -2,7 +2,6 @@ layout: default title: Diversified sampler parent: Bucket aggregations -grand_parent: Aggregations nav_order: 40 redirect_from: - /query-dsl/aggregations/bucket/diversified-sampler/ diff --git a/_aggregations/bucket/filter.md b/_aggregations/bucket/filter.md index 0768ea1148..58624c222c 100644 --- a/_aggregations/bucket/filter.md +++ b/_aggregations/bucket/filter.md @@ -2,7 +2,6 @@ layout: default title: Filter parent: Bucket aggregations -grand_parent: Aggregations nav_order: 50 redirect_from: - /query-dsl/aggregations/bucket/filter/ diff --git a/_aggregations/bucket/filters.md b/_aggregations/bucket/filters.md index b3977da7c1..2e9270b30d 100644 --- a/_aggregations/bucket/filters.md +++ b/_aggregations/bucket/filters.md @@ -2,7 +2,6 @@ layout: 
default title: Filters parent: Bucket aggregations -grand_parent: Aggregations nav_order: 60 redirect_from: - /query-dsl/aggregations/bucket/filters/ diff --git a/_aggregations/bucket/geo-distance.md b/_aggregations/bucket/geo-distance.md index a111015ac1..7b8e660630 100644 --- a/_aggregations/bucket/geo-distance.md +++ b/_aggregations/bucket/geo-distance.md @@ -2,7 +2,6 @@ layout: default title: Geodistance parent: Bucket aggregations -grand_parent: Aggregations nav_order: 70 redirect_from: - /query-dsl/aggregations/bucket/geo-distance/ diff --git a/_aggregations/bucket/geohash-grid.md b/_aggregations/bucket/geohash-grid.md index 13f89799ba..3969ea9a13 100644 --- a/_aggregations/bucket/geohash-grid.md +++ b/_aggregations/bucket/geohash-grid.md @@ -2,7 +2,6 @@ layout: default title: Geohash grid parent: Bucket aggregations -grand_parent: Aggregations nav_order: 80 redirect_from: - /query-dsl/aggregations/bucket/geohash-grid/ diff --git a/_aggregations/bucket/geohex-grid.md b/_aggregations/bucket/geohex-grid.md index 03fd45e369..eef2ed0b21 100644 --- a/_aggregations/bucket/geohex-grid.md +++ b/_aggregations/bucket/geohex-grid.md @@ -2,7 +2,6 @@ layout: default title: Geohex grid parent: Bucket aggregations -grand_parent: Aggregations nav_order: 85 redirect_from: - /opensearch/geohexgrid-agg/ diff --git a/_aggregations/bucket/geotile-grid.md b/_aggregations/bucket/geotile-grid.md index dd0c4f8a1f..e8e80451f5 100644 --- a/_aggregations/bucket/geotile-grid.md +++ b/_aggregations/bucket/geotile-grid.md @@ -2,7 +2,6 @@ layout: default title: Geotile grid parent: Bucket aggregations -grand_parent: Aggregations nav_order: 87 redirect_from: - /query-dsl/aggregations/bucket/geotile-grid/ diff --git a/_aggregations/bucket/global.md b/_aggregations/bucket/global.md index bfd516b8a3..483c28a69e 100644 --- a/_aggregations/bucket/global.md +++ b/_aggregations/bucket/global.md @@ -2,7 +2,6 @@ layout: default title: Global parent: Bucket aggregations -grand_parent: Aggregations nav_order: 90 redirect_from: - /query-dsl/aggregations/bucket/global/ diff --git a/_aggregations/bucket/histogram.md b/_aggregations/bucket/histogram.md index 0d9f2bb964..b97755bcae 100644 --- a/_aggregations/bucket/histogram.md +++ b/_aggregations/bucket/histogram.md @@ -2,7 +2,6 @@ layout: default title: Histogram parent: Bucket aggregations -grand_parent: Aggregations nav_order: 100 redirect_from: - /query-dsl/aggregations/bucket/histogram/ diff --git a/_aggregations/bucket/ip-range.md b/_aggregations/bucket/ip-range.md index 897827d412..bef4eb8f1f 100644 --- a/_aggregations/bucket/ip-range.md +++ b/_aggregations/bucket/ip-range.md @@ -2,7 +2,6 @@ layout: default title: IP range parent: Bucket aggregations -grand_parent: Aggregations nav_order: 110 redirect_from: - /query-dsl/aggregations/bucket/ip-range/ diff --git a/_aggregations/bucket/missing.md b/_aggregations/bucket/missing.md index 547076859d..e9bd5981a1 100644 --- a/_aggregations/bucket/missing.md +++ b/_aggregations/bucket/missing.md @@ -2,7 +2,6 @@ layout: default title: Missing parent: Bucket aggregations -grand_parent: Aggregations nav_order: 120 redirect_from: - /query-dsl/aggregations/bucket/missing/ diff --git a/_aggregations/bucket/multi-terms.md b/_aggregations/bucket/multi-terms.md index 62a4d264e0..8c99a450fe 100644 --- a/_aggregations/bucket/multi-terms.md +++ b/_aggregations/bucket/multi-terms.md @@ -2,7 +2,6 @@ layout: default title: Multi-terms parent: Bucket aggregations -grand_parent: Aggregations nav_order: 130 redirect_from: - 
/query-dsl/aggregations/bucket/multi-terms/ diff --git a/_aggregations/bucket/nested.md b/_aggregations/bucket/nested.md index 94a0f4416a..89c44c6457 100644 --- a/_aggregations/bucket/nested.md +++ b/_aggregations/bucket/nested.md @@ -2,7 +2,6 @@ layout: default title: Nested parent: Bucket aggregations -grand_parent: Aggregations nav_order: 140 redirect_from: - /query-dsl/aggregations/bucket/nested/ diff --git a/_aggregations/bucket/range.md b/_aggregations/bucket/range.md index f4e19f188d..7b17b64ed4 100644 --- a/_aggregations/bucket/range.md +++ b/_aggregations/bucket/range.md @@ -2,7 +2,6 @@ layout: default title: Range parent: Bucket aggregations -grand_parent: Aggregations nav_order: 150 redirect_from: - /query-dsl/aggregations/bucket/date-range/ diff --git a/_aggregations/bucket/reverse-nested.md b/_aggregations/bucket/reverse-nested.md index bfd04986fa..3757ec7ab7 100644 --- a/_aggregations/bucket/reverse-nested.md +++ b/_aggregations/bucket/reverse-nested.md @@ -2,7 +2,6 @@ layout: default title: Reverse nested parent: Bucket aggregations -grand_parent: Aggregations nav_order: 160 redirect_from: - /query-dsl/aggregations/bucket/reverse-nested/ diff --git a/_aggregations/bucket/sampler.md b/_aggregations/bucket/sampler.md index 5411052d45..7d8d71e04a 100644 --- a/_aggregations/bucket/sampler.md +++ b/_aggregations/bucket/sampler.md @@ -2,7 +2,6 @@ layout: default title: Sampler parent: Bucket aggregations -grand_parent: Aggregations nav_order: 170 redirect_from: - /query-dsl/aggregations/bucket/diversified-sampler/ diff --git a/_aggregations/bucket/significant-terms.md b/_aggregations/bucket/significant-terms.md index 34a4354a73..c255379dd8 100644 --- a/_aggregations/bucket/significant-terms.md +++ b/_aggregations/bucket/significant-terms.md @@ -2,7 +2,6 @@ layout: default title: Significant terms parent: Bucket aggregations -grand_parent: Aggregations nav_order: 180 redirect_from: - /query-dsl/aggregations/bucket/significant-terms/ diff --git a/_aggregations/bucket/significant-text.md b/_aggregations/bucket/significant-text.md index 6f1c7ebeca..b30b3aba95 100644 --- a/_aggregations/bucket/significant-text.md +++ b/_aggregations/bucket/significant-text.md @@ -2,7 +2,6 @@ layout: default title: Significant text parent: Bucket aggregations -grand_parent: Aggregations nav_order: 190 redirect_from: - /query-dsl/aggregations/bucket/significant-text/ diff --git a/_aggregations/bucket/terms.md b/_aggregations/bucket/terms.md index 5d05c328d4..b36214e3f6 100644 --- a/_aggregations/bucket/terms.md +++ b/_aggregations/bucket/terms.md @@ -2,7 +2,6 @@ layout: default title: Terms parent: Bucket aggregations -grand_parent: Aggregations nav_order: 200 redirect_from: - /query-dsl/aggregations/bucket/terms/ diff --git a/_aggregations/index.md b/_aggregations/index.md index 385c7a09d8..a1c457be02 100644 --- a/_aggregations/index.md +++ b/_aggregations/index.md @@ -1,7 +1,7 @@ --- layout: default title: Aggregations -has_children: true +has_children: false nav_order: 5 nav_exclude: true permalink: /aggregations/ diff --git a/_aggregations/metric/average.md b/_aggregations/metric/average.md index 428f1e76b6..9ad0c582fe 100644 --- a/_aggregations/metric/average.md +++ b/_aggregations/metric/average.md @@ -2,7 +2,6 @@ layout: default title: Average parent: Metric aggregations -grand_parent: Aggregations nav_order: 10 redirect_from: - /query-dsl/aggregations/metric/average/ diff --git a/_aggregations/metric/cardinality.md b/_aggregations/metric/cardinality.md index c40dbb4497..e03a561adb 100644 --- 
a/_aggregations/metric/cardinality.md +++ b/_aggregations/metric/cardinality.md @@ -2,7 +2,6 @@ layout: default title: Cardinality parent: Metric aggregations -grand_parent: Aggregations nav_order: 20 redirect_from: - /query-dsl/aggregations/metric/cardinality/ diff --git a/_aggregations/metric/extended-stats.md b/_aggregations/metric/extended-stats.md index 633407dab0..467fa348b7 100644 --- a/_aggregations/metric/extended-stats.md +++ b/_aggregations/metric/extended-stats.md @@ -2,7 +2,6 @@ layout: default title: Extended stats parent: Metric aggregations -grand_parent: Aggregations nav_order: 30 redirect_from: - /query-dsl/aggregations/metric/extended-stats/ diff --git a/_aggregations/metric/geobounds.md b/_aggregations/metric/geobounds.md index 27b7646ca5..9489c6b18e 100644 --- a/_aggregations/metric/geobounds.md +++ b/_aggregations/metric/geobounds.md @@ -2,7 +2,6 @@ layout: default title: Geobounds parent: Metric aggregations -grand_parent: Aggregations nav_order: 40 redirect_from: - /query-dsl/aggregations/metric/geobounds/ diff --git a/_aggregations/metric/geocentroid.md b/_aggregations/metric/geocentroid.md index 711f49862a..14a2d179bb 100644 --- a/_aggregations/metric/geocentroid.md +++ b/_aggregations/metric/geocentroid.md @@ -2,7 +2,6 @@ layout: default title: Geocentroid parent: Metric aggregations -grand_parent: Aggregations nav_order: 45 --- diff --git a/_aggregations/metric/matrix-stats.md b/_aggregations/metric/matrix-stats.md index 475e0caa24..188f8745fb 100644 --- a/_aggregations/metric/matrix-stats.md +++ b/_aggregations/metric/matrix-stats.md @@ -2,7 +2,6 @@ layout: default title: Matrix stats parent: Metric aggregations -grand_parent: Aggregations nav_order: 50 redirect_from: - /query-dsl/aggregations/metric/matrix-stats/ diff --git a/_aggregations/metric/maximum.md b/_aggregations/metric/maximum.md index 63b4d62a7b..1a1aaff607 100644 --- a/_aggregations/metric/maximum.md +++ b/_aggregations/metric/maximum.md @@ -2,7 +2,6 @@ layout: default title: Maximum parent: Metric aggregations -grand_parent: Aggregations nav_order: 60 redirect_from: - /query-dsl/aggregations/metric/maximum/ diff --git a/_aggregations/metric/median-absolute-deviation.md b/_aggregations/metric/median-absolute-deviation.md index 7332d7eb2f..a882475158 100644 --- a/_aggregations/metric/median-absolute-deviation.md +++ b/_aggregations/metric/median-absolute-deviation.md @@ -2,7 +2,6 @@ layout: default title: Median absolute deviation parent: Metric aggregations -grand_parent: Aggregations nav_order: 65 redirect_from: - /query-dsl/aggregations/metric/median-absolute-deviation/ diff --git a/_aggregations/metric/minimum.md b/_aggregations/metric/minimum.md index dd17c854a9..9455c71fea 100644 --- a/_aggregations/metric/minimum.md +++ b/_aggregations/metric/minimum.md @@ -2,7 +2,6 @@ layout: default title: Minimum parent: Metric aggregations -grand_parent: Aggregations nav_order: 70 redirect_from: - /query-dsl/aggregations/metric/minimum/ diff --git a/_aggregations/metric/percentile-ranks.md b/_aggregations/metric/percentile-ranks.md index 33ccb3d291..660cb01bd1 100644 --- a/_aggregations/metric/percentile-ranks.md +++ b/_aggregations/metric/percentile-ranks.md @@ -2,7 +2,6 @@ layout: default title: Percentile ranks parent: Metric aggregations -grand_parent: Aggregations nav_order: 80 redirect_from: - /query-dsl/aggregations/metric/percentile-ranks/ diff --git a/_aggregations/metric/percentile.md b/_aggregations/metric/percentile.md index c68b0e0ec7..0f241306d1 100644 --- 
a/_aggregations/metric/percentile.md +++ b/_aggregations/metric/percentile.md @@ -2,7 +2,6 @@ layout: default title: Percentile parent: Metric aggregations -grand_parent: Aggregations nav_order: 90 redirect_from: - /query-dsl/aggregations/metric/percentile/ diff --git a/_aggregations/metric/scripted-metric.md b/_aggregations/metric/scripted-metric.md index d1807efbc0..4247f9aa0e 100644 --- a/_aggregations/metric/scripted-metric.md +++ b/_aggregations/metric/scripted-metric.md @@ -2,7 +2,6 @@ layout: default title: Scripted metric parent: Metric aggregations -grand_parent: Aggregations nav_order: 100 redirect_from: - /query-dsl/aggregations/metric/scripted-metric/ diff --git a/_aggregations/metric/stats.md b/_aggregations/metric/stats.md index 0a54831522..d8ba5963e0 100644 --- a/_aggregations/metric/stats.md +++ b/_aggregations/metric/stats.md @@ -2,7 +2,6 @@ layout: default title: Stats parent: Metric aggregations -grand_parent: Aggregations nav_order: 110 redirect_from: - /query-dsl/aggregations/metric/stats/ diff --git a/_aggregations/metric/sum.md b/_aggregations/metric/sum.md index 0320de63fc..2e0b32cb3d 100644 --- a/_aggregations/metric/sum.md +++ b/_aggregations/metric/sum.md @@ -2,7 +2,6 @@ layout: default title: Sum parent: Metric aggregations -grand_parent: Aggregations nav_order: 120 redirect_from: - /query-dsl/aggregations/metric/sum/ diff --git a/_aggregations/metric/top-hits.md b/_aggregations/metric/top-hits.md index b6752300b2..cead3f77f2 100644 --- a/_aggregations/metric/top-hits.md +++ b/_aggregations/metric/top-hits.md @@ -2,7 +2,6 @@ layout: default title: Top hits parent: Metric aggregations -grand_parent: Aggregations nav_order: 130 redirect_from: - /query-dsl/aggregations/metric/top-hits/ diff --git a/_aggregations/metric/value-count.md b/_aggregations/metric/value-count.md index dfddaf9417..596fb5d806 100644 --- a/_aggregations/metric/value-count.md +++ b/_aggregations/metric/value-count.md @@ -2,7 +2,6 @@ layout: default title: Value count parent: Metric aggregations -grand_parent: Aggregations nav_order: 140 redirect_from: - /query-dsl/aggregations/metric/value-count/ diff --git a/_aggregations/metric/weighted-avg.md b/_aggregations/metric/weighted-avg.md index 268f78bfdc..6f67939d6e 100644 --- a/_aggregations/metric/weighted-avg.md +++ b/_aggregations/metric/weighted-avg.md @@ -2,7 +2,6 @@ layout: default title: Weighted average parent: Metric aggregations -grand_parent: Aggregations nav_order: 150 --- diff --git a/_query-dsl/compound/bool.md b/_query-dsl/compound/bool.md index 12caea2f8d..4479094214 100644 --- a/_query-dsl/compound/bool.md +++ b/_query-dsl/compound/bool.md @@ -2,7 +2,6 @@ layout: default title: Boolean parent: Compound queries -grand_parent: Query DSL nav_order: 10 redirect_from: - /opensearch/query-dsl/compound/bool/ diff --git a/_query-dsl/compound/boosting.md b/_query-dsl/compound/boosting.md index 7aa9c6c035..ede7eb3d51 100644 --- a/_query-dsl/compound/boosting.md +++ b/_query-dsl/compound/boosting.md @@ -2,7 +2,6 @@ layout: default title: Boosting parent: Compound queries -grand_parent: Query DSL nav_order: 30 redirect_from: - /query-dsl/query-dsl/compound/boosting/ diff --git a/_query-dsl/compound/constant-score.md b/_query-dsl/compound/constant-score.md index ed12e33ec0..bb5af3800a 100644 --- a/_query-dsl/compound/constant-score.md +++ b/_query-dsl/compound/constant-score.md @@ -2,7 +2,6 @@ layout: default title: Constant score parent: Compound queries -grand_parent: Query DSL nav_order: 40 redirect_from: - 
/query-dsl/query-dsl/compound/constant-score/ diff --git a/_query-dsl/compound/disjunction-max.md b/_query-dsl/compound/disjunction-max.md index 8dd9e41d2c..09a7e0e729 100644 --- a/_query-dsl/compound/disjunction-max.md +++ b/_query-dsl/compound/disjunction-max.md @@ -2,7 +2,6 @@ layout: default title: Disjunction max parent: Compound queries -grand_parent: Query DSL nav_order: 50 redirect_from: - /query-dsl/query-dsl/compound/disjunction-max/ diff --git a/_query-dsl/compound/function-score.md b/_query-dsl/compound/function-score.md index 98568e0965..b28a6abed6 100644 --- a/_query-dsl/compound/function-score.md +++ b/_query-dsl/compound/function-score.md @@ -2,7 +2,6 @@ layout: default title: Function score parent: Compound queries -grand_parent: Query DSL nav_order: 60 has_math: true redirect_from: diff --git a/_query-dsl/compound/hybrid.md b/_query-dsl/compound/hybrid.md index 22b3a17fc1..69ce89ce17 100644 --- a/_query-dsl/compound/hybrid.md +++ b/_query-dsl/compound/hybrid.md @@ -2,7 +2,6 @@ layout: default title: Hybrid parent: Compound queries -grand_parent: Query DSL nav_order: 70 --- diff --git a/_query-dsl/full-text/intervals.md b/_query-dsl/full-text/intervals.md index 082f8fbe46..a31f7434b3 100644 --- a/_query-dsl/full-text/intervals.md +++ b/_query-dsl/full-text/intervals.md @@ -3,7 +3,6 @@ layout: default title: Intervals nav_order: 80 parent: Full-text queries -grand_parent: Query DSL --- # Intervals query diff --git a/_query-dsl/full-text/match-bool-prefix.md b/_query-dsl/full-text/match-bool-prefix.md index 3a0d304ce4..3964dc5ee8 100644 --- a/_query-dsl/full-text/match-bool-prefix.md +++ b/_query-dsl/full-text/match-bool-prefix.md @@ -2,7 +2,6 @@ layout: default title: Match Boolean prefix parent: Full-text queries -grand_parent: Query DSL nav_order: 20 --- diff --git a/_query-dsl/full-text/match-phrase-prefix.md b/_query-dsl/full-text/match-phrase-prefix.md index 354dd35c61..9e05f034d7 100644 --- a/_query-dsl/full-text/match-phrase-prefix.md +++ b/_query-dsl/full-text/match-phrase-prefix.md @@ -2,7 +2,6 @@ layout: default title: Match phrase prefix parent: Full-text queries -grand_parent: Query DSL nav_order: 40 --- diff --git a/_query-dsl/full-text/match-phrase.md b/_query-dsl/full-text/match-phrase.md index 18dd6a858c..747c4814d9 100644 --- a/_query-dsl/full-text/match-phrase.md +++ b/_query-dsl/full-text/match-phrase.md @@ -2,7 +2,6 @@ layout: default title: Match phrase parent: Full-text queries -grand_parent: Query DSL nav_order: 30 --- diff --git a/_query-dsl/full-text/match.md b/_query-dsl/full-text/match.md index 746a4cf5b6..b4db30ec1f 100644 --- a/_query-dsl/full-text/match.md +++ b/_query-dsl/full-text/match.md @@ -2,7 +2,6 @@ layout: default title: Match parent: Full-text queries -grand_parent: Query DSL nav_order: 10 --- diff --git a/_query-dsl/full-text/multi-match.md b/_query-dsl/full-text/multi-match.md index 7450b74721..ab1496fdd3 100644 --- a/_query-dsl/full-text/multi-match.md +++ b/_query-dsl/full-text/multi-match.md @@ -2,7 +2,6 @@ layout: default title: Multi-match parent: Full-text queries -grand_parent: Query DSL nav_order: 50 --- diff --git a/_query-dsl/full-text/query-string.md b/_query-dsl/full-text/query-string.md index 12609e29c0..47180e3f6d 100644 --- a/_query-dsl/full-text/query-string.md +++ b/_query-dsl/full-text/query-string.md @@ -2,9 +2,7 @@ layout: default title: Query string parent: Full-text queries -grand_parent: Query DSL nav_order: 60 - redirect_from: - /opensearch/query-dsl/full-text/query-string/ - 
/query-dsl/query-dsl/full-text/query-string/ diff --git a/_query-dsl/full-text/simple-query-string.md b/_query-dsl/full-text/simple-query-string.md index fbf37f588d..58780cfdb4 100644 --- a/_query-dsl/full-text/simple-query-string.md +++ b/_query-dsl/full-text/simple-query-string.md @@ -2,7 +2,6 @@ layout: default title: Simple query string parent: Full-text queries -grand_parent: Query DSL nav_order: 70 --- diff --git a/_query-dsl/geo-and-xy/geo-bounding-box.md b/_query-dsl/geo-and-xy/geo-bounding-box.md index df697e2ce5..1112a4278e 100644 --- a/_query-dsl/geo-and-xy/geo-bounding-box.md +++ b/_query-dsl/geo-and-xy/geo-bounding-box.md @@ -2,7 +2,6 @@ layout: default title: Geo-bounding box parent: Geographic and xy queries -grand_parent: Query DSL nav_order: 10 redirect_from: - /opensearch/query-dsl/geo-and-xy/geo-bounding-box/ diff --git a/_query-dsl/geo-and-xy/geodistance.md b/_query-dsl/geo-and-xy/geodistance.md index 7a36b0c933..b272cad81e 100644 --- a/_query-dsl/geo-and-xy/geodistance.md +++ b/_query-dsl/geo-and-xy/geodistance.md @@ -2,7 +2,6 @@ layout: default title: Geodistance parent: Geographic and xy queries -grand_parent: Query DSL nav_order: 20 --- diff --git a/_query-dsl/geo-and-xy/geopolygon.md b/_query-dsl/geo-and-xy/geopolygon.md index c53b1379cf..980a0c5a63 100644 --- a/_query-dsl/geo-and-xy/geopolygon.md +++ b/_query-dsl/geo-and-xy/geopolygon.md @@ -2,7 +2,6 @@ layout: default title: Geopolygon parent: Geographic and xy queries -grand_parent: Query DSL nav_order: 30 --- diff --git a/_query-dsl/geo-and-xy/xy.md b/_query-dsl/geo-and-xy/xy.md index 88a22448c3..d0ed61c050 100644 --- a/_query-dsl/geo-and-xy/xy.md +++ b/_query-dsl/geo-and-xy/xy.md @@ -2,9 +2,7 @@ layout: default title: xy parent: Geographic and xy queries -grand_parent: Query DSL nav_order: 50 - redirect_from: - /opensearch/query-dsl/geo-and-xy/xy/ - /query-dsl/query-dsl/geo-and-xy/xy/ diff --git a/_query-dsl/specialized/neural-sparse.md b/_query-dsl/specialized/neural-sparse.md index 47f77fa95d..8de3eaf693 100644 --- a/_query-dsl/specialized/neural-sparse.md +++ b/_query-dsl/specialized/neural-sparse.md @@ -2,7 +2,6 @@ layout: default title: Neural sparse parent: Specialized queries -grand_parent: Query DSL nav_order: 55 --- diff --git a/_query-dsl/specialized/neural.md b/_query-dsl/specialized/neural.md index fea949cf52..14b930cdb6 100644 --- a/_query-dsl/specialized/neural.md +++ b/_query-dsl/specialized/neural.md @@ -2,7 +2,6 @@ layout: default title: Neural parent: Specialized queries -grand_parent: Query DSL nav_order: 50 --- diff --git a/_query-dsl/specialized/script-score.md b/_query-dsl/specialized/script-score.md index d09158f20a..f65c266cc5 100644 --- a/_query-dsl/specialized/script-score.md +++ b/_query-dsl/specialized/script-score.md @@ -2,7 +2,6 @@ layout: default title: Script score parent: Specialized queries -grand_parent: Query DSL nav_order: 60 --- diff --git a/_query-dsl/term/exists.md b/_query-dsl/term/exists.md index 1d52744c91..95573a36f8 100644 --- a/_query-dsl/term/exists.md +++ b/_query-dsl/term/exists.md @@ -2,7 +2,6 @@ layout: default title: Exists parent: Term-level queries -grand_parent: Query DSL nav_order: 10 --- diff --git a/_query-dsl/term/fuzzy.md b/_query-dsl/term/fuzzy.md index 9afa85ea93..bf2bd43bba 100644 --- a/_query-dsl/term/fuzzy.md +++ b/_query-dsl/term/fuzzy.md @@ -2,7 +2,6 @@ layout: default title: Fuzzy parent: Term-level queries -grand_parent: Query DSL nav_order: 20 --- diff --git a/_query-dsl/term/ids.md b/_query-dsl/term/ids.md index 0c3b5393fb..f895745c97 
100644 --- a/_query-dsl/term/ids.md +++ b/_query-dsl/term/ids.md @@ -2,7 +2,6 @@ layout: default title: IDs parent: Term-level queries -grand_parent: Query DSL nav_order: 30 --- diff --git a/_query-dsl/term/prefix.md b/_query-dsl/term/prefix.md index eda5307d14..2a429c9f0e 100644 --- a/_query-dsl/term/prefix.md +++ b/_query-dsl/term/prefix.md @@ -2,7 +2,6 @@ layout: default title: Prefix parent: Term-level queries -grand_parent: Query DSL nav_order: 40 --- diff --git a/_query-dsl/term/range.md b/_query-dsl/term/range.md index 8a8f53c480..ceb264db76 100644 --- a/_query-dsl/term/range.md +++ b/_query-dsl/term/range.md @@ -2,7 +2,6 @@ layout: default title: Range parent: Term-level queries -grand_parent: Query DSL nav_order: 50 --- diff --git a/_query-dsl/term/regexp.md b/_query-dsl/term/regexp.md index 65d6953516..4a038729c0 100644 --- a/_query-dsl/term/regexp.md +++ b/_query-dsl/term/regexp.md @@ -2,7 +2,6 @@ layout: default title: Regexp parent: Term-level queries -grand_parent: Query DSL nav_order: 60 --- diff --git a/_query-dsl/term/term.md b/_query-dsl/term/term.md index c1c296b9a0..a33146f6aa 100644 --- a/_query-dsl/term/term.md +++ b/_query-dsl/term/term.md @@ -2,7 +2,6 @@ layout: default title: Term parent: Term-level queries -grand_parent: Query DSL nav_order: 70 --- diff --git a/_query-dsl/term/terms-set.md b/_query-dsl/term/terms-set.md index ea0251ddff..6652d15979 100644 --- a/_query-dsl/term/terms-set.md +++ b/_query-dsl/term/terms-set.md @@ -2,7 +2,6 @@ layout: default title: Terms set parent: Term-level queries -grand_parent: Query DSL nav_order: 90 --- diff --git a/_query-dsl/term/terms.md b/_query-dsl/term/terms.md index fd15126255..42c74c0436 100644 --- a/_query-dsl/term/terms.md +++ b/_query-dsl/term/terms.md @@ -2,7 +2,6 @@ layout: default title: Terms parent: Term-level queries -grand_parent: Query DSL nav_order: 80 --- diff --git a/_query-dsl/term/wildcard.md b/_query-dsl/term/wildcard.md index 0652581941..b2d7238758 100644 --- a/_query-dsl/term/wildcard.md +++ b/_query-dsl/term/wildcard.md @@ -2,7 +2,6 @@ layout: default title: Wildcard parent: Term-level queries -grand_parent: Query DSL nav_order: 100 --- diff --git a/_search-plugins/knn/api.md b/_search-plugins/knn/api.md index 23678063a8..c7314f7ae2 100644 --- a/_search-plugins/knn/api.md +++ b/_search-plugins/knn/api.md @@ -3,7 +3,6 @@ layout: default title: k-NN plugin API nav_order: 30 parent: k-NN search -grand_parent: Search methods has_children: false --- diff --git a/_search-plugins/knn/approximate-knn.md b/_search-plugins/knn/approximate-knn.md index c0a9557728..144365166f 100644 --- a/_search-plugins/knn/approximate-knn.md +++ b/_search-plugins/knn/approximate-knn.md @@ -3,7 +3,6 @@ layout: default title: Approximate k-NN search nav_order: 15 parent: k-NN search -grand_parent: Search methods has_children: false has_math: true --- diff --git a/_search-plugins/knn/filter-search-knn.md b/_search-plugins/knn/filter-search-knn.md index 309cf6850e..2f0c4aa072 100644 --- a/_search-plugins/knn/filter-search-knn.md +++ b/_search-plugins/knn/filter-search-knn.md @@ -3,7 +3,6 @@ layout: default title: k-NN search with filters nav_order: 20 parent: k-NN search -grand_parent: Search methods has_children: false has_math: true --- diff --git a/_search-plugins/knn/jni-libraries.md b/_search-plugins/knn/jni-libraries.md index 59a5b7a1e2..4dbdb2da56 100644 --- a/_search-plugins/knn/jni-libraries.md +++ b/_search-plugins/knn/jni-libraries.md @@ -3,7 +3,6 @@ layout: default title: JNI libraries nav_order: 35 parent: k-NN 
search -grand_parent: Search methods has_children: false redirect_from: - /search-plugins/knn/jni-library/ diff --git a/_search-plugins/knn/knn-index.md b/_search-plugins/knn/knn-index.md index ab24a0c097..ed8b9217f5 100644 --- a/_search-plugins/knn/knn-index.md +++ b/_search-plugins/knn/knn-index.md @@ -3,7 +3,6 @@ layout: default title: k-NN index nav_order: 5 parent: k-NN search -grand_parent: Search methods has_children: false --- diff --git a/_search-plugins/knn/knn-score-script.md b/_search-plugins/knn/knn-score-script.md index cc79e90850..1696bd4cad 100644 --- a/_search-plugins/knn/knn-score-script.md +++ b/_search-plugins/knn/knn-score-script.md @@ -3,7 +3,6 @@ layout: default title: Exact k-NN with scoring script nav_order: 10 parent: k-NN search -grand_parent: Search methods has_children: false has_math: true --- diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 549437f346..fe4833ee47 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -3,7 +3,6 @@ layout: default title: k-NN vector quantization nav_order: 27 parent: k-NN search -grand_parent: Search methods has_children: false has_math: true --- diff --git a/_search-plugins/knn/nested-search-knn.md b/_search-plugins/knn/nested-search-knn.md index bdc1045387..d947ebc6e6 100644 --- a/_search-plugins/knn/nested-search-knn.md +++ b/_search-plugins/knn/nested-search-knn.md @@ -3,7 +3,6 @@ layout: default title: k-NN search with nested fields nav_order: 21 parent: k-NN search -grand_parent: Search methods has_children: false has_math: true --- diff --git a/_search-plugins/knn/painless-functions.md b/_search-plugins/knn/painless-functions.md index 1f27cc29a6..09eb989702 100644 --- a/_search-plugins/knn/painless-functions.md +++ b/_search-plugins/knn/painless-functions.md @@ -3,7 +3,6 @@ layout: default title: k-NN Painless extensions nav_order: 25 parent: k-NN search -grand_parent: Search methods has_children: false has_math: true --- diff --git a/_search-plugins/knn/performance-tuning.md b/_search-plugins/knn/performance-tuning.md index 24d92bd67d..123b1daef1 100644 --- a/_search-plugins/knn/performance-tuning.md +++ b/_search-plugins/knn/performance-tuning.md @@ -2,7 +2,6 @@ layout: default title: Performance tuning parent: k-NN search -grand_parent: Search methods nav_order: 45 --- diff --git a/_search-plugins/knn/radial-search-knn.md b/_search-plugins/knn/radial-search-knn.md index 48aaac034d..1a4a223294 100644 --- a/_search-plugins/knn/radial-search-knn.md +++ b/_search-plugins/knn/radial-search-knn.md @@ -3,7 +3,6 @@ layout: default title: Radial search nav_order: 28 parent: k-NN search -grand_parent: Search methods has_children: false has_math: true --- diff --git a/_search-plugins/knn/settings.md b/_search-plugins/knn/settings.md index 4d84cc80bb..1b9aa3608c 100644 --- a/_search-plugins/knn/settings.md +++ b/_search-plugins/knn/settings.md @@ -2,7 +2,6 @@ layout: default title: Settings parent: k-NN search -grand_parent: Search methods nav_order: 40 --- From f1cb58fd0c5ad8565e929e162157e857b5d42c06 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Wed, 17 Jul 2024 16:01:56 -0500 Subject: [PATCH 037/154] Remove redundant source (#7755) Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- _api-reference/document-apis/reindex.md | 1 - 1 file changed, 1 deletion(-) diff --git a/_api-reference/document-apis/reindex.md 
b/_api-reference/document-apis/reindex.md index 48f14923f5..c2afa347e1 100644 --- a/_api-reference/document-apis/reindex.md +++ b/_api-reference/document-apis/reindex.md @@ -79,7 +79,6 @@ version_type | The indexing operation's version type. Valid values are `internal op_type | Whether to copy over documents that are missing in the destination index. Valid values are `create` (ignore documents with the same ID from the source index) and `index` (copy everything from the source index). pipeline | Which ingest pipeline to utilize during the reindex. script | A script that OpenSearch uses to apply transformations to the data during the reindex operation. -source | The actual script that OpenSearch runs. lang | The scripting language. Valid options are `painless`, `expression`, `mustache`, and `java`. ## Response From 7bfee4011d0a82644e3e5868717f44eac82ea6b4 Mon Sep 17 00:00:00 2001 From: Michael Oviedo Date: Wed, 17 Jul 2024 15:15:25 -0700 Subject: [PATCH 038/154] Update compare.md (#7765) The `--results-number-align` option for the compare API is actually spelled `--results-numbers-align`. This change updates the spelling for that option. Signed-off-by: Michael Oviedo --- _benchmark/reference/commands/compare.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_benchmark/reference/commands/compare.md b/_benchmark/reference/commands/compare.md index a9eb8bcc10..35bafe0704 100644 --- a/_benchmark/reference/commands/compare.md +++ b/_benchmark/reference/commands/compare.md @@ -130,7 +130,7 @@ You can use the following options to customize the results of your test comparis - `--baseline`: The baseline TestExecution ID used to compare the contender TestExecution. - `--contender`: The TestExecution ID for the contender being compared to the baseline. - `--results-format`: Defines the output format for the command line results, either `markdown` or `csv`. Default is `markdown`. -- `--results-number-align`: Defines the column number alignment for when the `compare` command outputs results. Default is `right`. +- `--results-numbers-align`: Defines the column number alignment for when the `compare` command outputs results. Default is `right`. - `--results-file`: When provided a file path, writes the compare results to the file indicated in the path. - `--show-in-results`: Determines whether or not to include the comparison in the results file. 
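As a hedged usage sketch of the options listed above (the TestExecution IDs below are invented placeholders, not values taken from this patch), a comparison invocation might look like the following:

```bash
# Hedged sketch: both TestExecution IDs are hypothetical placeholders.
opensearch-benchmark compare \
  --baseline=62d87d14-0000-4000-8000-000000000001 \
  --contender=62d87d14-0000-4000-8000-000000000002 \
  --results-format=markdown \
  --results-numbers-align=right \
  --results-file=/tmp/compare-results.md
```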
From 3f37837c631d6f99c14f823965e3fdcedfa67dc8 Mon Sep 17 00:00:00 2001 From: gaobinlong Date: Thu, 18 Jul 2024 23:37:15 +0800 Subject: [PATCH 039/154] Document CreateAnomalyDetectorTool (#7742) * Document CreateAnomalyDetectorTool Signed-off-by: gaobinlong * Fix format issue Signed-off-by: gaobinlong * Fix link Signed-off-by: gaobinlong * Update create-anomaly-detector.md Signed-off-by: Melissa Vagi * Update _ml-commons-plugin/agents-tools/tools/index.md Signed-off-by: Melissa Vagi * Update _ml-commons-plugin/agents-tools/tools/index.md Signed-off-by: Melissa Vagi * Update _ml-commons-plugin/agents-tools/tools/create-anomaly-detector.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _ml-commons-plugin/agents-tools/tools/create-anomaly-detector.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _ml-commons-plugin/agents-tools/tools/create-anomaly-detector.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _ml-commons-plugin/agents-tools/tools/create-anomaly-detector.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _ml-commons-plugin/agents-tools/tools/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --------- Signed-off-by: gaobinlong Signed-off-by: Melissa Vagi Co-authored-by: Melissa Vagi Co-authored-by: Nathan Bower --- .../tools/create-anomaly-detector.md | 169 ++++++++++++++++++ .../agents-tools/tools/index.md | 3 +- 2 files changed, 171 insertions(+), 1 deletion(-) create mode 100644 _ml-commons-plugin/agents-tools/tools/create-anomaly-detector.md diff --git a/_ml-commons-plugin/agents-tools/tools/create-anomaly-detector.md b/_ml-commons-plugin/agents-tools/tools/create-anomaly-detector.md new file mode 100644 index 0000000000..b6fae8131c --- /dev/null +++ b/_ml-commons-plugin/agents-tools/tools/create-anomaly-detector.md @@ -0,0 +1,169 @@ +--- +layout: default +title: CreateAnomalyDetectorTool +has_children: false +has_toc: false +nav_order: 70 +parent: Tools +grand_parent: Agents and tools +--- + + +# CreateAnomalyDetectorTool +**Introduced 2.16** +{: .label .label-purple } + + +This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/skills/issues/337). +{: .warning} + +The `CreateAnomalyDetectorTool` helps create anomaly detectors based on your provided index. This tool retrieves index mappings and enables a large language model (LLM) to recommend category fields, aggregation fields, and their corresponding aggregation methods, which are required by the Create Anomaly Detector API. + +For comprehensive information about anomaly detectors, see [Anomaly detection]({{site.url}}{{site.baseurl}}/observing-your-data/ad/index/). +{: .tip} + +## Step 1: Register a flow agent that runs the CreateAnomalyDetectorTool + +A flow agent runs a sequence of tools in order, returning the output of the last tool. 
To create a flow agent, send the following register agent request: + +```json +POST /_plugins/_ml/agents/_register +{ + "name": "Test_Agent_For_Create_Anomaly_Detector_Tool", + "type": "flow", + "description": "this is a test agent for the CreateAnomalyDetectorTool", + "memory": { + "type": "demo" + }, + "tools": [ + { + "type": "CreateAnomalyDetectorTool", + "name": "DemoCreateAnomalyDetectorTool", + "parameters": { + "model_id": "your_llm_model_id" + } + } + ] +} +``` +{% include copy-curl.html %} + +OpenSearch responds with an agent ID, as shown in the following example: + +```json +{ + "agent_id": "EuJYYo0B9RaBCvhuy1q8" +} +``` + +## Step 2: Run the agent + +Run the agent by sending the following request: + +```json +POST /_plugins/_ml/agents/EuJYYo0B9RaBCvhuy1q8/_execute +{ + "parameters": { + "index": "sample_weblogs_test" + } +} +``` +{% include copy-curl.html %} + +OpenSearch responds with a JSON string containing all of the recommended parameters for creating an anomaly detector, as shown in the following example response: + +```json +{ + "inference_results": [ + { + "output": [ + { + "name": "response", + "result":"""{"index":"sample_weblogs_test","categoryField":"ip.keyword","aggregationField":"bytes,response,responseLatency","aggregationMethod":"sum,avg,avg","dateFields":"utc_time,timestamp"}""" + } + ] + } + ] +} +``` + +You can then create an anomaly detector containing the recommended parameters by sending a request similar to the following: + +```json +POST _plugins/_anomaly_detection/detectors +{ + "name": "test-detector", + "description": "Test detector", + "time_field": "timestamp", + "indices": [ + "sample_weblogs_test" + ], + "feature_attributes": [ + { + "feature_name": "feature_bytes", + "feature_enabled": true, + "aggregation_query": { + "agg1": { + "sum": { + "field": "bytes" + } + } + } + }, + { + "feature_name": "feature_response", + "feature_enabled": true, + "aggregation_query": { + "agg2": { + "avg": { + "field": "response" + } + } + } + }, + { + "feature_name": "feature_responseLatency", + "feature_enabled": true, + "aggregation_query": { + "agg3": { + "avg": { + "field": "responseLatency" + } + } + } + } + ], + "detection_interval": { + "period": { + "interval": 1, + "unit": "Minutes" + } + }, + "window_delay": { + "period": { + "interval": 1, + "unit": "Minutes" + } + } +} +``` +{% include copy-curl.html %} + +## Register parameters + +The following table lists the available tool parameters for agent registration. + +Parameter | Type | Required/Optional | Description +:--- | :--- | :--- | :--- +`model_id` | String | Required | The LLM model ID used for suggesting required Create Anomaly Detector API parameters. +`model_type` | String | Optional | The model type. Valid values are `CLAUDE` (Anthropic Claude models) and `OPENAI` (OpenAI models). + +## Execute parameters + +The following table lists the available tool parameters for running the agent. + +Parameter | Type | Required/Optional | Description +:--- | :--- | :--- | :--- +`index` | String | Required | The index name. Supports wildcards (for example, `weblogs-*`). If wildcards are used, then the tool fetches mappings from the first resolved index and sends them to the LLM.
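To illustrate the wildcard behavior described in the preceding table, the following hedged sketch reuses the agent ID from the earlier example and passes a hypothetical index pattern; the tool would fetch mappings from the first index that `weblogs-*` resolves to and send them to the LLM:

```json
POST /_plugins/_ml/agents/EuJYYo0B9RaBCvhuy1q8/_execute
{
  "parameters": {
    "index": "weblogs-*"
  }
}
```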
diff --git a/_ml-commons-plugin/agents-tools/tools/index.md b/_ml-commons-plugin/agents-tools/tools/index.md index fba47b63da..bc71122949 100644 --- a/_ml-commons-plugin/agents-tools/tools/index.md +++ b/_ml-commons-plugin/agents-tools/tools/index.md @@ -35,6 +35,7 @@ Each tool takes a list of parameters specific to that tool. In the preceding exa |[`CatIndexTool`]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/tools/cat-index-tool/) |Retrieves index information for the OpenSearch cluster. | |[`ConnectorTool`]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/tools/connector-tool/) | Uses a [connector]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/connectors/) to call any REST API function. | |[`IndexMappingTool`]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/tools/index-mapping-tool/) |Retrieves index mapping and setting information for an index. | +|[`CreateAnomalyDetectorTool`]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/tools/create-anomaly-detector/) | Enables an LLM to suggest required parameters for creating an anomaly detector. | |[`MLModelTool`]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/tools/ml-model-tool/) |Runs machine learning models. | |[`NeuralSparseSearchTool`]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/tools/neural-sparse-tool/) | Performs sparse vector retrieval. | |[`PPLTool`]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/tools/ppl-tool/) |Translates natural language into a Piped Processing Language (PPL) query. | @@ -49,4 +50,4 @@ Each tool takes a list of parameters specific to that tool. In the preceding exa ## Developer information -The agents and tools framework is flexible and extensible. You can find the list of tools provided by OpenSearch in the [Tools library](https://github.com/opensearch-project/skills/tree/main/src/main/java/org/opensearch/agent/tools). For a different use case, you can build your own tool by implementing the [_Tool_ interface](https://github.com/opensearch-project/ml-commons/blob/2.x/spi/src/main/java/org/opensearch/ml/common/spi/tools/Tool.java). \ No newline at end of file +The agents and tools framework offers flexibility and extensibility. See the [tools library](https://github.com/opensearch-project/skills/tree/main/src/main/java/org/opensearch/agent/tools) for OpenSearch-provided tools. Implement the [**Tool** interface](https://github.com/opensearch-project/ml-commons/blob/2.x/spi/src/main/java/org/opensearch/ml/common/spi/tools/Tool.java) to build custom tools for different use cases. 
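For orientation only, a rough Java skeleton of a custom tool is sketched below. The method names and signatures here are assumptions paraphrased from the ml-commons Tool SPI and may not match your ml-commons version exactly; consult the linked interface for the authoritative contract.

```java
import java.util.Map;

import org.opensearch.core.action.ActionListener;
import org.opensearch.ml.common.spi.tools.Tool;

// Hypothetical sketch only: method names and signatures are assumptions
// paraphrased from the ml-commons Tool SPI and may differ in your version.
public class EchoTool implements Tool {
    public static final String TYPE = "EchoTool";
    private String name = TYPE;
    private String description = "Echoes the input parameter back to the agent.";

    @Override
    @SuppressWarnings("unchecked")
    public <T> void run(Map<String, String> parameters, ActionListener<T> listener) {
        // Return the "input" parameter unchanged.
        listener.onResponse((T) parameters.getOrDefault("input", ""));
    }

    @Override
    public boolean validate(Map<String, String> parameters) {
        // Require an "input" parameter before running.
        return parameters != null && parameters.containsKey("input");
    }

    @Override
    public String getType() { return TYPE; }

    @Override
    public String getVersion() { return "1.0"; }

    @Override
    public String getName() { return name; }

    @Override
    public void setName(String name) { this.name = name; }

    @Override
    public String getDescription() { return description; }

    @Override
    public void setDescription(String description) { this.description = description; }
}
```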
From 0c3dacee582816b9ab0c7c98f927597b28677a79 Mon Sep 17 00:00:00 2001 From: gaobinlong Date: Thu, 18 Jul 2024 23:55:36 +0800 Subject: [PATCH 040/154] Add strict_allow_templates option for the dynamic mapping parameter (#7745) * Add strict_allow_templates option for the dynamic mapping parameter Signed-off-by: gaobinlong * Fix typo Signed-off-by: gaobinlong * Fix header Signed-off-by: gaobinlong * Update dynamic.md Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/supported-field-types/object.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/supported-field-types/object.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/supported-field-types/nested.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update dynamic.md Make changes to address editorial review comments Signed-off-by: Melissa Vagi --------- Signed-off-by: gaobinlong Signed-off-by: Melissa Vagi Co-authored-by: Melissa Vagi Co-authored-by: Nathan Bower --- _field-types/dynamic.md | 342 +++++++++++++++++++ _field-types/supported-field-types/nested.md | 2 +- _field-types/supported-field-types/object.md | 3 +- 3 files changed, 345 insertions(+), 2 deletions(-) create mode 100644 _field-types/dynamic.md diff --git a/_field-types/dynamic.md b/_field-types/dynamic.md new file mode 100644 index 0000000000..59f59bfe3d --- /dev/null +++ b/_field-types/dynamic.md @@ -0,0 +1,342 @@ +--- +layout: default +title: Dynamic parameter +nav_order: 10 +redirect_from: + - /opensearch/dynamic/ +--- + +# Dynamic parameter + +The `dynamic` parameter specifies whether newly detected fields can be added dynamically to a mapping. It accepts the parameters listed in the following table. + +Parameter | Description +:--- | :--- +`true` | Specifies that new fields can be added dynamically to the mapping. Default is `true`. +`false` | Specifies that new fields cannot be added dynamically to the mapping.
If a new field is detected, then it is not indexed or searchable but can be retrieved from the `_source` field. +`strict` | Throws an exception. The indexing operation fails when new fields are detected. +`strict_allow_templates` | Adds new fields if they match predefined dynamic templates in the mapping. + +--- + +## Example: Create an index with `dynamic` set to `true` + +1. Create an index with `dynamic` set to `true` by sending the following request: + +```json +PUT testindex1 +{ + "mappings": { + "dynamic": true + } +} +``` +{% include copy-curl.html %} + +2. Index a document with an object field `patient` containing two string fields by sending the following request: + +```json +PUT testindex1/_doc/1 +{ + "patient": { + "name" : "John Doe", + "id" : "123456" + } +} +``` +{% include copy-curl.html %} + +3. Confirm the mapping works as expected by sending the following request: + +```json +GET testindex1/_mapping +``` +{% include copy-curl.html %} + +The object field `patient` and two subfields `name` and `id` are added to the mapping, as shown in the following response: + +```json +{ + "testindex1": { + "mappings": { + "dynamic": "true", + "properties": { + "patient": { + "properties": { + "id": { + "type": "text", + "fields": { + "keyword": { + "type": "keyword", + "ignore_above": 256 + } + } + }, + "name": { + "type": "text", + "fields": { + "keyword": { + "type": "keyword", + "ignore_above": 256 + } + } + } + } + } + } + } + } +} +``` + +--- + +## Example: Create an index with `dynamic` set to `false` + +1. Create an index with explicit mappings and `dynamic` set to `false` by sending the following request: + +```json +PUT testindex1 +{ + "mappings": { + "dynamic": false, + "properties": { + "patient": { + "properties": { + "id": { + "type": "keyword" + }, + "name": { + "type": "keyword" + } + } + } + } + } +} +``` +{% include copy-curl.html %} + +2. Index a document with an object field `patient` containing two string fields and additional unmapped fields by sending the following request: + +```json +PUT testindex1/_doc/1 +{ + "patient": { + "name" : "John Doe", + "id" : "123456" + }, + "room": "room1", + "floor": "1" +} +``` +{% include copy-curl.html %} + +3. Confirm the mapping works as expected by sending the following request: + +```json +GET testindex1/_mapping +``` +{% include copy-curl.html %} + +The following response shows that the new fields `room` and `floor` were not added to the mapping, which remained unchanged: + +```json +{ + "testindex1": { + "mappings": { + "dynamic": "false", + "properties": { + "patient": { + "properties": { + "id": { + "type": "keyword" + }, + "name": { + "type": "keyword" + } + } + } + } + } + } +} +``` + +4. Get the unmapped fields `room` and `floor` from the document by sending the following request: + +```json +GET testindex1/_doc/1 +``` + +The unmapped fields can be retrieved from the `_source` field but are not searchable. For example, the following request searches for the unmapped field `room`: + +```json +POST testindex1/_search +{ + "query": { + "term": { + "room": "room1" + } + } +} +``` + +The response returns no results: + +```json +{ + "took": 3, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 0, + "relation": "eq" + }, + "max_score": null, + "hits": [] + } +} +``` + +--- + +## Example: Create an index with `dynamic` set to `strict` + +1. 
Create an index with explicit mappings and `dynamic` set to `strict` by sending the following request: + +```json +PUT testindex1 +{ + "mappings": { + "dynamic": "strict", + "properties": { + "patient": { + "properties": { + "id": { + "type": "keyword" + }, + "name": { + "type": "keyword" + } + } + } + } + } +} +``` +{% include copy-curl.html %} + +2. Index a document with an object field `patient` containing two string fields and additional unmapped fields by sending the following request: + +```json +PUT testindex1/_doc/1 +{ + "patient": { + "name" : "John Doe", + "id" : "123456" + }, + "room": "room1", + "floor": "1" +} +``` +{% include copy-curl.html %} + +Note that an exception is thrown, as shown in the following response: + +```json +{ + "error": { + "root_cause": [ + { + "type": "strict_dynamic_mapping_exception", + "reason": "mapping set to strict, dynamic introduction of [room] within [_doc] is not allowed" + } + ], + "type": "strict_dynamic_mapping_exception", + "reason": "mapping set to strict, dynamic introduction of [room] within [_doc] is not allowed" + }, + "status": 400 +} +``` + +--- + +## Example: Create an index with `dynamic` set to `strict_allow_templates` + +1. Create an index with predefined dynamic templates and `dynamic` set to `strict_allow_templates` by sending the following request: + +```json +PUT testindex1 +{ + "mappings": { + "dynamic": "strict_allow_templates", + "dynamic_templates": [ + { + "strings": { + "match": "room*", + "match_mapping_type": "string", + "mapping": { + "type": "keyword" + } + } + } + ], + "properties": { + "patient": { + "properties": { + "id": { + "type": "keyword" + }, + "name": { + "type": "keyword" + } + } + } + } + } +} +``` +{% include copy-curl.html %} + +2. Index a document with an object field `patient` containing two string fields and a new field `room` that matches one of the dynamic templates by sending the following request: + +```json +PUT testindex1/_doc/1 +{ + "patient": { + "name" : "John Doe", + "id" : "123456" + }, + "room": "room1" +} +``` +{% include copy-curl.html %} + +Indexing succeeds because the new field `room` matches one of the dynamic templates. However, indexing fails for the new field `floor` because it does not match any of the dynamic templates and is not explicitly mapped. For example, the following indexing request fails: + +```json +PUT testindex1/_doc/1 +{ + "patient": { + "name" : "John Doe", + "id" : "123456" + }, + "room": "room1", + "floor": "1" +} +``` diff --git a/_field-types/supported-field-types/nested.md b/_field-types/supported-field-types/nested.md index d61ccd53df..90d09177d1 100644 --- a/_field-types/supported-field-types/nested.md +++ b/_field-types/supported-field-types/nested.md @@ -308,7 +308,7 @@ The following table lists the parameters accepted by object field types. All par Parameter | Description :--- | :--- -[`dynamic`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/object#the-dynamic-parameter) | Specifies whether new fields can be dynamically added to this object. Valid values are `true`, `false`, and `strict`. Default is `true`. +[`dynamic`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/object#the-dynamic-parameter) | Specifies whether new fields can be dynamically added to the object. Valid values are `true`, `false`, `strict`, and `strict_allow_templates`. Default is `true`. `include_in_parent` | A Boolean value that specifies whether all fields in the child nested object should also be added to the parent document in flattened form. Default is `false`.
`include_in_root` | A Boolean value that specifies whether all fields in the child nested object should also be added to the root document in flattened form. Default is `false`. `properties` | Fields of this object, which can be of any supported type. New properties can be dynamically added to this object if `dynamic` is set to `true`. diff --git a/_field-types/supported-field-types/object.md b/_field-types/supported-field-types/object.md index 372a5c46d9..db539a9608 100644 --- a/_field-types/supported-field-types/object.md +++ b/_field-types/supported-field-types/object.md @@ -73,7 +73,7 @@ The following table lists the parameters accepted by object field types. All par Parameter | Description :--- | :--- -[`dynamic`](#the-dynamic-parameter) | Specifies whether new fields can be dynamically added to this object. Valid values are `true`, `false`, and `strict`. Default is `true`. +[`dynamic`](#the-dynamic-parameter) | Specifies whether new fields can be dynamically added to the object. Valid values are `true`, `false`, `strict`, and `strict_allow_templates`. Default is `true`. `enabled` | A Boolean value that specifies whether the JSON contents of the object should be parsed. If `enabled` is set to `false`, the object's contents are not indexed or searchable, but they are still retrievable from the _source field. Default is `true`. `properties` | Fields of this object, which can be of any supported type. New properties can be dynamically added to this object if `dynamic` is set to `true`. @@ -149,6 +149,7 @@ Value | Description `true` | New fields can be added to the mapping dynamically. This is the default. `false` | New fields cannot be added to the mapping dynamically. If a new field is detected, it is not indexed or searchable. However, it is still retrievable from the _source field. `strict` | When new fields are added to the mapping dynamically, an exception is thrown. To add a new field to an object, you have to add it to the mapping first. +`strict_allow_templates` | If the newly detected fields match any of the predefined dynamic templates in the mapping, then they are added to the mapping; if they do not match any of them, then an exception is thrown. Inner objects inherit the `dynamic` parameter value from their parent unless they declare their own `dynamic` parameter value. {: .note } From 4477614f4fdc980fab0808b210107e6972e057b4 Mon Sep 17 00:00:00 2001 From: AWSHurneyt Date: Thu, 18 Jul 2024 15:30:21 -0700 Subject: [PATCH 041/154] Update per-cluster-metrics-monitors.md (#7769) Fixed typo in example. Signed-off-by: AWSHurneyt --- _observing-your-data/alerting/per-cluster-metrics-monitors.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_observing-your-data/alerting/per-cluster-metrics-monitors.md b/_observing-your-data/alerting/per-cluster-metrics-monitors.md index baea9c626b..bcaa03cc0c 100644 --- a/_observing-your-data/alerting/per-cluster-metrics-monitors.md +++ b/_observing-your-data/alerting/per-cluster-metrics-monitors.md @@ -91,7 +91,7 @@ The `script` parameter points the `source` to the Painless script `for (cluster "path": "_cluster/health/", "path_params": "", "url": "http://localhost:9200/_cluster/health/", - "cluster": ["cluster-1", "cluster-2"] + "clusters": ["cluster-1", "cluster-2"] } } ], From 2d55f1cf63203e32e12b3015605b4f0bb7ad7b98 Mon Sep 17 00:00:00 2001 From: Heather Halter Date: Thu, 18 Jul 2024 15:50:25 -0700 Subject: [PATCH 042/154] Update kafka.md (#7774) Fixed capitalization issue. 
Signed-off-by: Heather Halter --- _data-prepper/pipelines/configuration/sources/kafka.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_data-prepper/pipelines/configuration/sources/kafka.md b/_data-prepper/pipelines/configuration/sources/kafka.md index 4df72cfdd6..e8452a93c3 100644 --- a/_data-prepper/pipelines/configuration/sources/kafka.md +++ b/_data-prepper/pipelines/configuration/sources/kafka.md @@ -120,7 +120,7 @@ Use the following options when setting SSL encryption. Option | Required | Type | Description :--- | :--- | :--- | :--- `type` | No | String | The encryption type. Use `none` to disable encryption. Default is `ssl`. -`Insecure` | No | Boolean | A Boolean flag used to turn off SSL certificate verification. If set to `true`, certificate authority (CA) certificate verification is turned off and insecure HTTP requests are sent. Default is `false`. +`insecure` | No | Boolean | A Boolean flag used to turn off SSL certificate verification. If set to `true`, certificate authority (CA) certificate verification is turned off and insecure HTTP requests are sent. Default is `false`. #### AWS From f63f8b95f6f3dfd5043e8a5152123665fc7e98c0 Mon Sep 17 00:00:00 2001 From: gaobinlong Date: Fri, 19 Jul 2024 21:05:02 +0800 Subject: [PATCH 043/154] Fix ISM error prevention setting key is not correct (#7777) Signed-off-by: gaobinlong --- _im-plugin/ism/api.md | 6 +++--- _im-plugin/ism/error-prevention/api.md | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/_im-plugin/ism/api.md b/_im-plugin/ism/api.md index 441f737e6f..e0fbb904cd 100644 --- a/_im-plugin/ism/api.md +++ b/_im-plugin/ism/api.md @@ -553,13 +553,13 @@ Introduced 2.4 ISM allows you to run an action automatically. However, running an action can fail for a variety of reasons. You can use error prevention validation to test an action in order to rule out failures. -To enable error prevention validation, set the `plugins.index_state_management.validation_service.enabled` setting to `true`: +To enable error prevention validation, set the `plugins.index_state_management.action_validation.enabled` setting to `true`: ```bash PUT _cluster/settings { "persistent":{ - "plugins.index_state_management.validation_action.enabled": true + "plugins.index_state_management.action_validation.enabled": true } } ``` @@ -692,4 +692,4 @@ GET _plugins/_ism/explain/test-000001 }, "total_managed_indices" : 1 } -``` \ No newline at end of file +``` diff --git a/_im-plugin/ism/error-prevention/api.md b/_im-plugin/ism/error-prevention/api.md index a273d25cfb..c03a62d868 100644 --- a/_im-plugin/ism/error-prevention/api.md +++ b/_im-plugin/ism/error-prevention/api.md @@ -12,7 +12,7 @@ The ISM Error Prevention API allows you to enable Index State Management (ISM) e ## Enable error prevention validation -You can configure error prevention validation by setting the `plugins.index_state_management.validation_service.enabled` parameter. +You can configure error prevention validation by setting the `plugins.index_state_management.action_validation.enabled` parameter. 
#### Example request @@ -20,7 +20,7 @@ You can configure error prevention validation by setting the `plugins.index_stat PUT _cluster/settings { "persistent":{ - "plugins.index_state_management.validation_action.enabled": true + "plugins.index_state_management.action_validation.enabled": true } } ``` From 27c41222955655669469b1319cff9daaa7212956 Mon Sep 17 00:00:00 2001 From: David Venable Date: Fri, 19 Jul 2024 12:16:25 -0500 Subject: [PATCH 044/154] Data Prepper documentation updates: autogeneration campaign (#7707) Updates Data Prepper documentation with some missing fields. Adds support for autogeneration of processors by naming to match the processor and including the autogenerated comment. Signed-off-by: David Venable Signed-off-by: David Venable Signed-off-by: Melissa Vagi Co-authored-by: Melissa Vagi Co-authored-by: Heather Halter --- .../processors/convert_entry_type.md | 16 ++++++++++++++-- .../processors/{parse-ion.md => parse_ion.md} | 12 +++++++++++- .../processors/{parse-json.md => parse_json.md} | 11 +++++++++++ .../processors/{parse-xml.md => parse_xml.md} | 11 ++++++++++- .../configuration/processors/write_json.md | 11 +++++++++-- .../pipelines/configuration/sources/s3.md | 2 +- 6 files changed, 56 insertions(+), 7 deletions(-) rename _data-prepper/pipelines/configuration/processors/{parse-ion.md => parse_ion.md} (61%) rename _data-prepper/pipelines/configuration/processors/{parse-json.md => parse_json.md} (70%) rename _data-prepper/pipelines/configuration/processors/{parse-xml.md => parse_xml.md} (70%) diff --git a/_data-prepper/pipelines/configuration/processors/convert_entry_type.md b/_data-prepper/pipelines/configuration/processors/convert_entry_type.md index 2fc9fdb9bd..c2c46260ed 100644 --- a/_data-prepper/pipelines/configuration/processors/convert_entry_type.md +++ b/_data-prepper/pipelines/configuration/processors/convert_entry_type.md @@ -14,10 +14,22 @@ The `convert_entry_type` processor converts a value type associated with the spe You can configure the `convert_entry_type` processor with the following options. + + | Option | Required | Description | | :--- | :--- | :--- | -| `key`| Yes | Keys whose value needs to be converted to a different type. | -| `type` | No | Target type for the key-value pair. Possible values are `integer`, `double`, `string`, and `Boolean`. Default value is `integer`. | +| `key`| Yes | Key whose value needs to be converted to a different type. | +| `keys`| Yes | Keys whose value needs to be converted to a different type. | +| `type` | No | Target type for the key-value pair. Possible values are `integer`, `long`, `double`, `big_decimal`, `string`, and `boolean`. Default value is `integer`. | +| `null_values` | No | String representation of what constitutes a `null` value. If the field value equals one of these strings, then the value is considered `null` and is converted to `null`. | +| `scale` | No | Modifies the scale of the `big_decimal` when converting to a `big_decimal`. The default value is `0`. | +| `tags_on_failure` | No | A list of tags to be added to the event metadata when the event fails to convert. | +| `convert_when` | No | Specifies a condition using a [Data Prepper expression]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/) for performing the `convert_entry_type` operation. If specified, the `convert_entry_type` operation runs only when the expression evaluates to `true`. 
| ## Usage diff --git a/_data-prepper/pipelines/configuration/processors/parse-ion.md b/_data-prepper/pipelines/configuration/processors/parse_ion.md similarity index 61% rename from _data-prepper/pipelines/configuration/processors/parse-ion.md rename to _data-prepper/pipelines/configuration/processors/parse_ion.md index 0edd446c42..8360eaa296 100644 --- a/_data-prepper/pipelines/configuration/processors/parse-ion.md +++ b/_data-prepper/pipelines/configuration/processors/parse_ion.md @@ -14,12 +14,22 @@ The `parse_ion` processor parses [Amazon Ion](https://amazon-ion.github.io/ion-d You can configure the `parse_ion` processor with the following options. + + | Option | Required | Type | Description | | :--- | :--- | :--- | :--- | | `source` | No | String | The field in the `event` that is parsed. Default value is `message`. | | `destination` | No | String | The destination field of the parsed JSON. Defaults to the root of the `event`. Cannot be `""`, `/`, or any white-space-only `string` because these are not valid `event` fields. | | `pointer` | No | String | A JSON pointer to the field to be parsed. There is no `pointer` by default, meaning that the entire `source` is parsed. The `pointer` can access JSON array indexes as well. If the JSON pointer is invalid, then the entire `source` data is parsed into the outgoing `event`. If the key that is pointed to already exists in the `event` and the `destination` is the root, then the pointer uses the entire path of the key. | -| `tags_on_failure` | No | String | A list of strings that specify the tags to be set in the event that the processors fails or an unknown exception occurs while parsing. +| `parse_when` | No | String | Specifies under which conditions the processor should perform parsing. Default is no condition. Accepts a Data Prepper expression string following the [Expression syntax]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/). | +| `overwrite_if_destination_exists` | No | Boolean | Overwrites the destination if set to `true`. Set to `false` to prevent changing a destination value that exists. Default is `true`. | +| `delete_source` | No | Boolean | If set to `true`, then the source field is deleted. Default is `false`. | +| `tags_on_failure` | No | String | A list of strings specifying the tags to be set in the event that the processor fails or an unknown exception occurs during parsing. ## Usage diff --git a/_data-prepper/pipelines/configuration/processors/parse-json.md b/_data-prepper/pipelines/configuration/processors/parse_json.md similarity index 70% rename from _data-prepper/pipelines/configuration/processors/parse-json.md rename to _data-prepper/pipelines/configuration/processors/parse_json.md index 2cbce4782e..894d5dba42 100644 --- a/_data-prepper/pipelines/configuration/processors/parse-json.md +++ b/_data-prepper/pipelines/configuration/processors/parse_json.md @@ -15,11 +15,22 @@ The `parse_json` processor parses JSON data for an event, including any nested f You can configure the `parse_json` processor with the following options. + + | Option | Required | Type | Description | | :--- | :--- | :--- | :--- | | `source` | No | String | The field in the `event` that will be parsed. Default value is `message`. | | `destination` | No | String | The destination field of the parsed JSON. Defaults to the root of the `event`. Cannot be `""`, `/`, or any white-space-only `string` because these are not valid `event` fields. | | `pointer` | No | String | A JSON pointer to the field to be parsed.
There is no `pointer` by default, meaning the entire `source` is parsed. The `pointer` can access JSON array indexes as well. If the JSON pointer is invalid, then the entire `source` data is parsed into the outgoing `event`. If the key that is pointed to already exists in the `event` and the `destination` is the root, then the pointer uses the entire path of the key. | +| `parse_when` | No | String | Specifies under which conditions the processor should perform parsing. Default is no condition. Accepts a Data Prepper expression string following the [Expression syntax]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/). | +| `overwrite_if_destination_exists` | No | Boolean | Overwrites the destination if set to `true`. Set to `false` to prevent changing a destination value that exists. Defaults to `true`. | +| `delete_source` | No | Boolean | If set to `true`, then the source field is deleted. Default is `false`. | +| `tags_on_failure` | No | String | A list of strings specifying the tags to be set in the event that the processor fails or an unknown exception occurs during parsing. ## Usage diff --git a/_data-prepper/pipelines/configuration/processors/parse-xml.md b/_data-prepper/pipelines/configuration/processors/parse_xml.md similarity index 70% rename from _data-prepper/pipelines/configuration/processors/parse-xml.md rename to _data-prepper/pipelines/configuration/processors/parse_xml.md index 861705da2b..c8c9f3eebf 100644 --- a/_data-prepper/pipelines/configuration/processors/parse-xml.md +++ b/_data-prepper/pipelines/configuration/processors/parse_xml.md @@ -14,13 +14,22 @@ The `parse_xml` processor parses XML data for an event. You can configure the `parse_xml` processor with the following options. + + | Option | Required | Type | Description | | :--- | :--- | :--- | :--- | | `source` | No | String | Specifies which `event` field to parse. | | `destination` | No | String | The destination field of the parsed XML. Defaults to the root of the `event`. Cannot be `""`, `/`, or any white-space-only string because these are not valid `event` fields. | | `pointer` | No | String | A JSON pointer to the field to be parsed. The value is null by default, meaning that the entire `source` is parsed. The `pointer` can access JSON array indexes as well. If the JSON pointer is invalid, then the entire `source` data is parsed into the outgoing `event` object. If the key that is pointed to already exists in the `event` object and the `destination` is the root, then the pointer uses the entire path of the key. | | `parse_when` | No | String | Specifies under what conditions the processor should perform parsing. Default is no condition. Accepts a Data Prepper expression string following the [Data Prepper Expression Syntax]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/). | -| `tags_on_failure` | No | String | A list of strings that specify the tags to be set if the processor fails or an unknown exception occurs while parsing. +| `overwrite_if_destination_exists` | No | Boolean | Overwrites the destination if set to `true`. Set to `false` to prevent changing a destination value that exists. Defaults to `true`. | +| `delete_source` | No | Boolean | If set to `true`, then the source field is deleted. Default is `false`. | +| `tags_on_failure` | No | String | A list of strings specifying the tags to be set in the event that the processor fails or an unknown exception occurs during parsing.
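As a hedged sketch of how these options combine in a pipeline (the field names and failure tag below are invented for illustration), a `parse_xml` entry might look like the following:

```yaml
processor:
  - parse_xml:
      source: "raw_xml"          # invented field name for illustration
      destination: "parsed"      # invented field name for illustration
      overwrite_if_destination_exists: false
      delete_source: true
      tags_on_failure: ["_xml_parse_failure"]
```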
## Usage diff --git a/_data-prepper/pipelines/configuration/processors/write_json.md b/_data-prepper/pipelines/configuration/processors/write_json.md index 8f1e6851da..20414b4672 100644 --- a/_data-prepper/pipelines/configuration/processors/write_json.md +++ b/_data-prepper/pipelines/configuration/processors/write_json.md @@ -11,8 +11,15 @@ nav_order: 56 The `write_json` processor converts an object in an event into a JSON string. You can customize the processor to choose the source and target field names. -Option | Description | Example -:--- | :--- | :--- + + +Option | Description | Example +:--- | :--- | :--- source | Mandatory field that specifies the name of the field in the event containing the message or object to be parsed. | If `source` is set to `"message"` and the input is `{"message": {"key1":"value1", "key2":{"key3":"value3"}}}`, then the `write_json` processor outputs the event as `"{\"key1\":\"value1\",\"key2\":{\"key3\":\"value3\"}}"`. target | An optional field that specifies the name of the field in which the resulting JSON string should be stored. If `target` is not specified, then the `source` field is used. | `key1` diff --git a/_data-prepper/pipelines/configuration/sources/s3.md b/_data-prepper/pipelines/configuration/sources/s3.md index 5a7d9986e5..7b1599f838 100644 --- a/_data-prepper/pipelines/configuration/sources/s3.md +++ b/_data-prepper/pipelines/configuration/sources/s3.md @@ -138,7 +138,7 @@ The `codec` determines how the `s3` source parses each Amazon S3 object. For inc ### `newline` codec -The `newline` codec parses each single line as a single log event. This is ideal for most application logs because each event parses per single line. It can also be suitable for S3 objects that have individual JSON objects on each line, which matches well when used with the [parse_json]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/parse-json/) processor to parse each line. +The `newline` codec parses each single line as a single log event. This is ideal for most application logs because each event parses per single line. It can also be suitable for S3 objects that have individual JSON objects on each line, which matches well when used with the [parse_json]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/parse_json/) processor to parse each line. Use the following options to configure the `newline` codec. From 78414ee92fb963dfc7603b78de156e626db35839 Mon Sep 17 00:00:00 2001 From: David Venable Date: Fri, 19 Jul 2024 12:57:24 -0500 Subject: [PATCH 045/154] Adds documentation for the Data Prepper delay processor. (#7708) Adds documentation for the delay processor. 
Signed-off-by: David Venable Signed-off-by: Melissa Vagi Co-authored-by: Melissa Vagi --- .../configuration/processors/delay.md | 27 +++++++++++++++++++ .../{delete-entries.md => delete_entries.md} | 2 +- .../configuration/processors/mutate-event.md | 2 +- 3 files changed, 29 insertions(+), 2 deletions(-) create mode 100644 _data-prepper/pipelines/configuration/processors/delay.md rename _data-prepper/pipelines/configuration/processors/{delete-entries.md => delete_entries.md} (99%) diff --git a/_data-prepper/pipelines/configuration/processors/delay.md b/_data-prepper/pipelines/configuration/processors/delay.md new file mode 100644 index 0000000000..c4e9d8e973 --- /dev/null +++ b/_data-prepper/pipelines/configuration/processors/delay.md @@ -0,0 +1,27 @@ +--- +layout: default +title: delay +parent: Processors +grand_parent: Pipelines +nav_order: 41 +--- + +# delay + +The `delay` processor adds a delay into the processor chain. Typically, you should use it only for testing, experimenting, and debugging. + +## Configuration + +Option | Required | Type | Description +:--- | :--- | :--- | :--- +`for` | No | Duration | The duration of time to delay. Defaults to `1s`. + +## Usage + +The following example shows how to use the `delay` processor to add a 2-second delay. + +```yaml +processor: + - delay: + for: 2s +``` diff --git a/_data-prepper/pipelines/configuration/processors/delete-entries.md b/_data-prepper/pipelines/configuration/processors/delete_entries.md similarity index 99% rename from _data-prepper/pipelines/configuration/processors/delete-entries.md rename to _data-prepper/pipelines/configuration/processors/delete_entries.md index 33c54a0b29..c9a93a1f3e 100644 --- a/_data-prepper/pipelines/configuration/processors/delete-entries.md +++ b/_data-prepper/pipelines/configuration/processors/delete_entries.md @@ -3,7 +3,7 @@ layout: default title: delete_entries parent: Processors grand_parent: Pipelines -nav_order: 41 +nav_order: 43 --- # delete_entries diff --git a/_data-prepper/pipelines/configuration/processors/mutate-event.md b/_data-prepper/pipelines/configuration/processors/mutate-event.md index 9b3b2afb33..1afb34a970 100644 --- a/_data-prepper/pipelines/configuration/processors/mutate-event.md +++ b/_data-prepper/pipelines/configuration/processors/mutate-event.md @@ -13,7 +13,7 @@ Mutate event processors allow you to modify events in Data Prepper. The followin * [add_entries]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/add-entries/) allows you to add entries to an event. * [convert_entry_type]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/convert_entry_type/) allows you to convert value types in an event. * [copy_values]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/copy-values/) allows you to copy values within an event. -* [delete_entries]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/delete-entries/) allows you to delete entries from an event. +* [delete_entries]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/delete_entries/) allows you to delete entries from an event. * [list_to_map]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/list-to-map) allows you to convert a list of objects from an event, where each object contains a `key` field, into a map of target keys. * `map_to_list` allows you to convert a map of objects from an event, where each object contains a `key` field, into a list of target keys.
* [rename_keys]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/rename-keys/) allows you to rename keys in an event. From 9dc4d29294550675c1b3a93e9074da07d9455d14 Mon Sep 17 00:00:00 2001 From: Heather Halter Date: Fri, 19 Jul 2024 10:58:35 -0700 Subject: [PATCH 046/154] Update index.md (#7779) Community feedback Signed-off-by: Heather Halter --- _install-and-configure/install-opensearch/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_install-and-configure/install-opensearch/index.md b/_install-and-configure/install-opensearch/index.md index 1afe12f6a5..541321bcdd 100644 --- a/_install-and-configure/install-opensearch/index.md +++ b/_install-and-configure/install-opensearch/index.md @@ -121,5 +121,5 @@ Property | Description `opensearch.xcontent.string.length.max=` | By default, OpenSearch does not impose any limits on the maximum length of the JSON/YAML/CBOR/Smile string fields. To protect your cluster against potential distributed denial-of-service (DDoS) or memory issues, you can set the `opensearch.xcontent.string.length.max` system property to a reasonable limit (the maximum is 2,147,483,647), for example, `-Dopensearch.xcontent.string.length.max=5000000`. | `opensearch.xcontent.fast_double_writer=[true|false]` | By default, OpenSearch serializes floating-point numbers using the default implementation provided by the Java Runtime Environment. Set this value to `true` to use the Schubfach algorithm, which is faster but may lead to small differences in precision. Default is `false`. | `opensearch.xcontent.name.length.max=` | By default, OpenSearch does not impose any limits on the maximum length of the JSON/YAML/CBOR/Smile field names. To protect your cluster against potential DDoS or memory issues, you can set the `opensearch.xcontent.name.length.max` system property to a reasonable limit (the maximum is 2,147,483,647), for example, `-Dopensearch.xcontent.name.length.max=50000`. | -`opensearch.xcontent.depth.max=` | By default, OpenSearch does not impose any limits on the maximum nesting depth for JSON/YAML/CBOR/Smile documents. To protect your cluster against potential DDoS or memory issues, you can set the `opensearch.xcontent.depth.max` system property to a reasonable limit (the maximum is 2,147,483,647), for example, `-Dopensearch.xcontent.name.length.max=1000`. | +`opensearch.xcontent.depth.max=` | By default, OpenSearch does not impose any limits on the maximum nesting depth for JSON/YAML/CBOR/Smile documents. To protect your cluster against potential DDoS or memory issues, you can set the `opensearch.xcontent.depth.max` system property to a reasonable limit (the maximum is 2,147,483,647), for example, `-Dopensearch.xcontent.depth.max=1000`. | `opensearch.xcontent.codepoint.max=` | By default, OpenSearch imposes a limit of `52428800` on the maximum size of the YAML documents (in code points). To protect your cluster against potential DDoS or memory issues, you can change the `opensearch.xcontent.codepoint.max` system property to a reasonable limit (the maximum is 2,147,483,647). For example, `-Dopensearch.xcontent.codepoint.max=5000000`. 
| From 55ce5f81983370e2f1f1dc6a2e5dc7da8b9129f5 Mon Sep 17 00:00:00 2001 From: Stavros Macrakis <134456002+smacrakis@users.noreply.github.com> Date: Mon, 22 Jul 2024 12:01:41 -0400 Subject: [PATCH 047/154] add acronym for reference (#7786) Signed-off-by: Stavros Macrakis <134456002+smacrakis@users.noreply.github.com> --- _search-plugins/cross-cluster-search.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/cross-cluster-search.md b/_search-plugins/cross-cluster-search.md index 947097e8b3..7d3ff72efb 100644 --- a/_search-plugins/cross-cluster-search.md +++ b/_search-plugins/cross-cluster-search.md @@ -9,7 +9,7 @@ redirect_from: # Cross-cluster search -You can use the cross-cluster search feature in OpenSearch to search and analyze data across multiple clusters, enabling you to gain insights from distributed data sources. Cross-cluster search is available by default with the Security plugin, but you need to configure each cluster to allow remote connections from other clusters. This involves setting up remote cluster connections and configuring access permissions. +You can use cross-cluster search (CCS) in OpenSearch to search and analyze data across multiple clusters, enabling you to gain insights from distributed data sources. Cross-cluster search is available by default with the Security plugin, but you need to configure each cluster to allow remote connections from other clusters. This involves setting up remote cluster connections and configuring access permissions. --- From 84e533cc9386f063603f4facf3c95fc3d66e51b5 Mon Sep 17 00:00:00 2001 From: zhichao-aws Date: Tue, 23 Jul 2024 00:14:03 +0800 Subject: [PATCH 048/154] add doc for nested_path (#7741) Signed-off-by: zhichao-aws --- _ml-commons-plugin/agents-tools/tools/neural-sparse-tool.md | 1 + _ml-commons-plugin/agents-tools/tools/rag-tool.md | 1 + _ml-commons-plugin/agents-tools/tools/vector-db-tool.md | 1 + 3 files changed, 3 insertions(+) diff --git a/_ml-commons-plugin/agents-tools/tools/neural-sparse-tool.md b/_ml-commons-plugin/agents-tools/tools/neural-sparse-tool.md index 9fee4dcbd2..9014c585c8 100644 --- a/_ml-commons-plugin/agents-tools/tools/neural-sparse-tool.md +++ b/_ml-commons-plugin/agents-tools/tools/neural-sparse-tool.md @@ -212,6 +212,7 @@ Parameter | Type | Required/Optional | Description `name` | String | Optional | The tool name. Useful when an LLM needs to select an appropriate tool for a task. `description` | String | Optional | A description of the tool. Useful when an LLM needs to select an appropriate tool for a task. `doc_size` | Integer | Optional | The number of documents to fetch. Default is `2`. +`nested_path` | String | Optional | The path to the nested object for the nested query. Only used for nested fields. Default is `null`. ## Execute parameters diff --git a/_ml-commons-plugin/agents-tools/tools/rag-tool.md b/_ml-commons-plugin/agents-tools/tools/rag-tool.md index 1f6fafe49a..c88c2d047b 100644 --- a/_ml-commons-plugin/agents-tools/tools/rag-tool.md +++ b/_ml-commons-plugin/agents-tools/tools/rag-tool.md @@ -136,6 +136,7 @@ Parameter | Type | Required/Optional | Description `prompt` | String | Optional | The prompt to provide to the LLM. `k` | Integer | Optional | The number of nearest neighbors to search for when performing neural search. Default is 10. `enable_Content_Generation` | Boolean | Optional | If `true`, returns results generated by an LLM. If `false`, returns results directly without LLM-assisted content generation. Default is `true`. 
+`nested_path` | String | Optional | The path to the nested object for the nested query. Only used for nested fields. Default is `null`. ## Execute parameters diff --git a/_ml-commons-plugin/agents-tools/tools/vector-db-tool.md b/_ml-commons-plugin/agents-tools/tools/vector-db-tool.md index 9093541cbb..70d7e19321 100644 --- a/_ml-commons-plugin/agents-tools/tools/vector-db-tool.md +++ b/_ml-commons-plugin/agents-tools/tools/vector-db-tool.md @@ -225,6 +225,7 @@ Parameter | Type | Required/Optional | Description `input` | String | Required for flow agent | Runtime input sourced from flow agent parameters. If using a large language model (LLM), this field is populated with the LLM response. `doc_size` | Integer | Optional | The number of documents to fetch. Default is `2`. `k` | Integer | Optional | The number of nearest neighbors to search for when performing neural search. Default is `10`. +`nested_path` | String | Optional | The path to the nested object for the nested query. Only used for nested fields. Default is `null`. ## Execute parameters From fd629cac17a1ef18ad0729bb51b971515c5e5422 Mon Sep 17 00:00:00 2001 From: Daniel Widdis Date: Mon, 22 Jul 2024 09:17:38 -0700 Subject: [PATCH 049/154] Document new Split and Sort SearchResponseProcessors (#7767) * Add documentation for Sort SearchRequestProcessor Signed-off-by: Daniel Widdis * Add documentation for Split SearchRequestProcessor Signed-off-by: Daniel Widdis * Doc review Signed-off-by: Fanit Kolchina * Update _ingest-pipelines/processors/split.md Co-authored-by: Nathan Bower Signed-off-by: Daniel Widdis * Update _search-plugins/search-pipelines/sort-processor.md Co-authored-by: Nathan Bower Signed-off-by: Daniel Widdis * Update _search-plugins/search-pipelines/split-processor.md Co-authored-by: Nathan Bower Signed-off-by: Daniel Widdis * Update _search-plugins/search-pipelines/split-processor.md Co-authored-by: Nathan Bower Signed-off-by: Daniel Widdis * Update _search-plugins/search-pipelines/split-processor.md Co-authored-by: Nathan Bower Signed-off-by: Daniel Widdis * Update _search-plugins/search-pipelines/split-processor.md Co-authored-by: Nathan Bower Signed-off-by: Daniel Widdis --------- Signed-off-by: Daniel Widdis Signed-off-by: Fanit Kolchina Co-authored-by: Fanit Kolchina Co-authored-by: Nathan Bower --- _ingest-pipelines/processors/split.md | 2 +- .../search-pipelines/search-processors.md | 7 +- .../search-pipelines/sort-processor.md | 251 ++++++++++++++++++ .../search-pipelines/split-processor.md | 234 ++++++++++++++++ 4 files changed, 491 insertions(+), 3 deletions(-) create mode 100644 _search-plugins/search-pipelines/sort-processor.md create mode 100644 _search-plugins/search-pipelines/split-processor.md diff --git a/_ingest-pipelines/processors/split.md b/_ingest-pipelines/processors/split.md index c424ef671c..cdb0cfe3de 100644 --- a/_ingest-pipelines/processors/split.md +++ b/_ingest-pipelines/processors/split.md @@ -30,7 +30,7 @@ Parameter | Required/Optional | Description :--- | :--- | :--- `field` | Required | The field containing the string to be split. `separator` | Required | The delimiter used to split the string. This can be a regular expression pattern. -`preserve_field` | Optional | If set to `true`, preserves empty trailing fields (for example, `''`) in the resulting array. If set to `false`, empty trailing fields are removed from the resulting array. Default is `false`. +`preserve_trailing` | Optional | If set to `true`, preserves empty trailing fields (for example, `''`) in the resulting array. 
If set to `false`, then empty trailing fields are removed from the resulting array. Default is `false`. `target_field` | Optional | The field where the array of substrings is stored. If not specified, then the field is updated in-place. `ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not contain the specified field. If set to `true`, then the processor ignores missing values in the field and leaves the `target_field` unchanged. Default is `false`. `description` | Optional | A brief description of the processor. diff --git a/_search-plugins/search-pipelines/search-processors.md b/_search-plugins/search-pipelines/search-processors.md index 4630ab950c..ad515cc541 100644 --- a/_search-plugins/search-pipelines/search-processors.md +++ b/_search-plugins/search-pipelines/search-processors.md @@ -37,13 +37,16 @@ The following table lists all supported search response processors. Processor | Description | Earliest available version :--- | :--- | :--- +[`collapse`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/collapse-processor/)| Deduplicates search hits based on a field value, similarly to `collapse` in a search request. | 2.12 [`personalize_search_ranking`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/personalize-search-ranking/) | Uses [Amazon Personalize](https://aws.amazon.com/personalize/) to rerank search results (requires setting up the Amazon Personalize service). | 2.9 -[`retrieval_augmented_generation`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rag-processor/) | Used for retrieval-augmented generation (RAG) in [conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/). | 2.10 (generally available in 2.12) [`rename_field`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rename-field-processor/)| Renames an existing field. | 2.8 [`rerank`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/)| Reranks search results using a cross-encoder model. | 2.12 -[`collapse`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/collapse-processor/)| Deduplicates search hits based on a field value, similarly to `collapse` in a search request. | 2.12 +[`retrieval_augmented_generation`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rag-processor/) | Used for retrieval-augmented generation (RAG) in [conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/). | 2.10 (generally available in 2.12) +[`sort`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/sort-processor/)| Sorts an array of items in either ascending or descending order. | 2.16 +[`split`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/split-processor/)| Splits a string field into an array of substrings based on a specified delimiter. | 2.16 [`truncate_hits`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/truncate-hits-processor/)| Discards search hits after a specified target count is reached. Can undo the effect of the `oversample` request processor. | 2.12 + ## Search phase results processors A search phase results processor runs between search phases at the coordinating node level. It intercepts the results retrieved from one search phase and transforms them before passing them to the next search phase. 
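Because response processors run in the order listed, the new `split` and `sort` processors can be chained, for example, by splitting a delimited string field and then sorting the resulting array. The following is a minimal sketch; the pipeline and field names are illustrative:

```json
PUT /_search/pipeline/split_then_sort
{
  "response_processors": [
    {
      "split": {
        "field": "message",
        "separator": ", ",
        "target_field": "words"
      }
    },
    {
      "sort": {
        "field": "words",
        "order": "asc"
      }
    }
  ]
}
```
{% include copy-curl.html %}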
diff --git a/_search-plugins/search-pipelines/sort-processor.md b/_search-plugins/search-pipelines/sort-processor.md new file mode 100644 index 0000000000..dde05c1b3a --- /dev/null +++ b/_search-plugins/search-pipelines/sort-processor.md @@ -0,0 +1,251 @@ +--- +layout: default +title: Sort +nav_order: 32 +has_children: false +parent: Search processors +grand_parent: Search pipelines +--- + +# Sort processor + +The `sort` processor sorts an array of items in either ascending or descending order. Numeric arrays are sorted numerically, while string or mixed arrays (strings and numbers) are sorted lexicographically. The processor throws an error if the input is not an array. + +## Request fields + +The following table lists all available request fields. + +Field | Data type | Description +:--- | :--- | :--- +`field` | String | The field to be sorted. Must be an array. Required. +`order` | String | The sort order to apply. Accepts `asc` for ascending or `desc` for descending. Default is `asc`. +`target_field` | String | The name of the field in which the sorted array is stored. If not specified, then the sorted array is stored in the same field as the original array (the `field` variable). +`tag` | String | The processor's identifier. +`description` | String | A description of the processor. +`ignore_failure` | Boolean | If `true`, then OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`. + +## Example + +The following example demonstrates using a search pipeline with a `sort` processor. + +### Setup + +Create an index named `my_index` and index a document with the field `message` that contains an array of strings: + +```json +POST /my_index/_doc/1 +{ + "message": ["one", "two", "three", "four"], + "visibility": "public" +} +``` +{% include copy-curl.html %} + +### Creating a search pipeline + +Create a search pipeline with a `sort` response processor that sorts the `message` field and stores the sorted results in the `sorted_message` field: + +```json +PUT /_search/pipeline/my_pipeline +{ + "response_processors": [ + { + "sort": { + "field": "message", + "target_field": "sorted_message" + } + } + ] +} +``` +{% include copy-curl.html %} + +### Using a search pipeline + +Search for documents in `my_index` without a search pipeline: + +```json +GET /my_index/_search +``` +{% include copy-curl.html %} + +The response contains the field `message`: + +
+<details open markdown="block">
+  <summary>
+    Response
+  </summary>
+  {: .text-delta}
+```json
+{
+  "took": 1,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": "my_index",
+        "_id": "1",
+        "_score": 1,
+        "_source": {
+          "message": [
+            "one",
+            "two",
+            "three",
+            "four"
+          ],
+          "visibility": "public"
+        }
+      }
+    ]
+  }
+}
+```
+</details>
+ +To search with a pipeline, specify the pipeline name in the `search_pipeline` query parameter: + +```json +GET /my_index/_search?search_pipeline=my_pipeline +``` +{% include copy-curl.html %} + +The `sorted_message` field contains the strings from the `message` field sorted alphabetically: + +
+<details open markdown="block">
+  <summary>
+    Response
+  </summary>
+  {: .text-delta}
+
+```json
+{
+  "took": 3,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": "my_index",
+        "_id": "1",
+        "_score": 1,
+        "_source": {
+          "visibility": "public",
+          "sorted_message": [
+            "four",
+            "one",
+            "three",
+            "two"
+          ],
+          "message": [
+            "one",
+            "two",
+            "three",
+            "four"
+          ]
+        }
+      }
+    ]
+  }
+}
+```
+</details>
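+
+To sort in descending order instead, set the `order` field to `desc` when defining the pipeline. The following is a sketch that reuses the same field names; the pipeline name is illustrative:
+
+```json
+PUT /_search/pipeline/my_desc_pipeline
+{
+  "response_processors": [
+    {
+      "sort": {
+        "field": "message",
+        "order": "desc",
+        "target_field": "sorted_message"
+      }
+    }
+  ]
+}
+```
+{% include copy-curl.html %}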
+ +You can also use the `fields` option to search for specific fields in a document: + +```json +POST /my_index/_search?pretty&search_pipeline=my_pipeline +{ + "fields": ["visibility", "message"] +} +``` +{% include copy-curl.html %} + +In the response, the `message` field is sorted and the results are stored in the `sorted_message` field: + +
+<details open markdown="block">
+  <summary>
+    Response
+  </summary>
+  {: .text-delta}
+
+```json
+{
+  "took": 2,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": "my_index",
+        "_id": "1",
+        "_score": 1,
+        "_source": {
+          "visibility": "public",
+          "sorted_message": [
+            "four",
+            "one",
+            "three",
+            "two"
+          ],
+          "message": [
+            "one",
+            "two",
+            "three",
+            "four"
+          ]
+        },
+        "fields": {
+          "visibility": [
+            "public"
+          ],
+          "sorted_message": [
+            "four",
+            "one",
+            "three",
+            "two"
+          ],
+          "message": [
+            "one",
+            "two",
+            "three",
+            "four"
+          ]
+        }
+      }
+    ]
+  }
+}
+```
+</details>
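+
+Numeric arrays are sorted numerically rather than lexicographically, so the same pipeline would return a field value of `[10, 2, 33]` as `[2, 10, 33]`. As a sketch, you could index a numeric test document into a separate index (the index name is illustrative) and search it using the same pipeline:
+
+```json
+POST /my_numbers_index/_doc/1
+{
+  "message": [10, 2, 33]
+}
+```
+{% include copy-curl.html %}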
\ No newline at end of file diff --git a/_search-plugins/search-pipelines/split-processor.md b/_search-plugins/search-pipelines/split-processor.md new file mode 100644 index 0000000000..6830f81ec3 --- /dev/null +++ b/_search-plugins/search-pipelines/split-processor.md @@ -0,0 +1,234 @@ +--- +layout: default +title: Split +nav_order: 33 +has_children: false +parent: Search processors +grand_parent: Search pipelines +--- + +# Split processor + +The `split` processor splits a string field into an array of substrings based on a specified delimiter. + +## Request fields + +The following table lists all available request fields. + +Field | Data type | Description +:--- | :--- | :--- +`field` | String | The field containing the string to be split. Required. +`separator` | String | The delimiter used to split the string. Specify either a single separator character or a regular expression pattern. Required. +`preserve_trailing` | Boolean | If set to `true`, preserves empty trailing fields (for example, `''`) in the resulting array. If set to `false`, then empty trailing fields are removed from the resulting array. Default is `false`. +`target_field` | String | The field in which the array of substrings is stored. If not specified, then the field is updated in place. +`tag` | String | The processor's identifier. +`description` | String | A description of the processor. +`ignore_failure` | Boolean | If `true`, then OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`. + +## Example + +The following example demonstrates using a search pipeline with a `split` processor. + +### Setup + +Create an index named `my_index` and index a document containing the field `message`: + +```json +POST /my_index/_doc/1 +{ + "message": "ingest, search, visualize, and analyze data", + "visibility": "public" +} +``` +{% include copy-curl.html %} + +### Creating a search pipeline + +The following request creates a search pipeline with a `split` response processor that splits the `message` field and stores the results in the `split_message` field: + +```json +PUT /_search/pipeline/my_pipeline +{ + "response_processors": [ + { + "split": { + "field": "message", + "separator": ", ", + "target_field": "split_message" + } + } + ] +} +``` +{% include copy-curl.html %} + +### Using a search pipeline + +Search for documents in `my_index` without a search pipeline: + +```json +GET /my_index/_search +``` +{% include copy-curl.html %} + +The response contains the field `message`: + +
+<details open markdown="block">
+  <summary>
+    Response
+  </summary>
+  {: .text-delta}
+```json
+{
+  "took": 3,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": "my_index",
+        "_id": "1",
+        "_score": 1,
+        "_source": {
+          "message": "ingest, search, visualize, and analyze data",
+          "visibility": "public"
+        }
+      }
+    ]
+  }
+}
+```
+</details>
+ +To search with a pipeline, specify the pipeline name in the `search_pipeline` query parameter: + +```json +GET /my_index/_search?search_pipeline=my_pipeline +``` +{% include copy-curl.html %} + +The `message` field is split and the results are stored in the `split_message` field: + +
+<details open markdown="block">
+  <summary>
+    Response
+  </summary>
+  {: .text-delta}
+
+```json
+{
+  "took": 6,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": "my_index",
+        "_id": "1",
+        "_score": 1,
+        "_source": {
+          "visibility": "public",
+          "message": "ingest, search, visualize, and analyze data",
+          "split_message": [
+            "ingest",
+            "search",
+            "visualize",
+            "and analyze data"
+          ]
+        }
+      }
+    ]
+  }
+}
+```
+</details>
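+
+The `separator` also accepts a regular expression pattern, so a single pipeline can tolerate inconsistent spacing around delimiters. The following sketch splits on a comma followed by optional whitespace; the pipeline name is illustrative:
+
+```json
+PUT /_search/pipeline/my_regex_pipeline
+{
+  "response_processors": [
+    {
+      "split": {
+        "field": "message",
+        "separator": ",\\s*",
+        "target_field": "split_message"
+      }
+    }
+  ]
+}
+```
+{% include copy-curl.html %}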
+ +You can also use the `fields` option to search for specific fields in a document: + +```json +POST /my_index/_search?pretty&search_pipeline=my_pipeline +{ + "fields": ["visibility", "message"] +} +``` +{% include copy-curl.html %} + +In the response, the `message` field is split and the results are stored in the `split_message` field: + +
+<details open markdown="block">
+  <summary>
+    Response
+  </summary>
+  {: .text-delta}
+
+```json
+{
+  "took": 7,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": "my_index",
+        "_id": "1",
+        "_score": 1,
+        "_source": {
+          "visibility": "public",
+          "message": "ingest, search, visualize, and analyze data",
+          "split_message": [
+            "ingest",
+            "search",
+            "visualize",
+            "and analyze data"
+          ]
+        },
+        "fields": {
+          "visibility": [
+            "public"
+          ],
+          "message": [
+            "ingest, search, visualize, and analyze data"
+          ],
+          "split_message": [
+            "ingest",
+            "search",
+            "visualize",
+            "and analyze data"
+          ]
+        }
+      }
+    ]
+  }
+}
+```
+</details>
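+
+If trailing delimiters are meaningful, for example in strings such as `"a,b,"`, set `preserve_trailing` to `true` to keep the resulting empty trailing fields. The following is a sketch; the pipeline name is illustrative:
+
+```json
+PUT /_search/pipeline/my_preserve_pipeline
+{
+  "response_processors": [
+    {
+      "split": {
+        "field": "message",
+        "separator": ",",
+        "preserve_trailing": true,
+        "target_field": "split_message"
+      }
+    }
+  ]
+}
+```
+{% include copy-curl.html %}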
\ No newline at end of file From 17341992c553eddb7b6f560a5dbac6559d2a9237 Mon Sep 17 00:00:00 2001 From: Daniel Widdis Date: Mon, 22 Jul 2024 09:19:24 -0700 Subject: [PATCH 050/154] Add documentation for Deprovision Workflow API allow_delete parameter (#7639) * Add documentation for Deprovision Workflow API allow_delete parameter Signed-off-by: Daniel Widdis * Add new steps and missing delete search pipeline doc Signed-off-by: Daniel Widdis * Revert changes to workflow steps. Users can't use these new step types Signed-off-by: Daniel Widdis * Update _automating-configurations/api/deprovision-workflow.md Co-authored-by: Nathan Bower Signed-off-by: Daniel Widdis * Update _automating-configurations/api/deprovision-workflow.md Co-authored-by: Nathan Bower Signed-off-by: Daniel Widdis * Update _automating-configurations/api/deprovision-workflow.md Co-authored-by: Nathan Bower Signed-off-by: Daniel Widdis * Remove redundant use of workflow, accept other edits Signed-off-by: Daniel Widdis --------- Signed-off-by: Daniel Widdis Co-authored-by: Nathan Bower --- .gitignore | 1 + .../api/deprovision-workflow.md | 13 +++++++++- .../deleting-search-pipeline.md | 26 +++++++++++++++++++ 3 files changed, 39 insertions(+), 1 deletion(-) create mode 100644 _search-plugins/search-pipelines/deleting-search-pipeline.md diff --git a/.gitignore b/.gitignore index 446d1deda6..da3cf9d144 100644 --- a/.gitignore +++ b/.gitignore @@ -6,3 +6,4 @@ Gemfile.lock .idea *.iml .jekyll-cache +.project diff --git a/_automating-configurations/api/deprovision-workflow.md b/_automating-configurations/api/deprovision-workflow.md index e9219536ce..98c944a9d4 100644 --- a/_automating-configurations/api/deprovision-workflow.md +++ b/_automating-configurations/api/deprovision-workflow.md @@ -9,7 +9,9 @@ nav_order: 70 When you no longer need a workflow, you can deprovision its resources. Most workflow steps that create a resource have corresponding workflow steps to reverse that action. To retrieve all resources currently created for a workflow, call the [Get Workflow Status API]({{site.url}}{{site.baseurl}}/automating-configurations/api/get-workflow-status/). When you call the Deprovision Workflow API, resources included in the `resources_created` field of the Get Workflow Status API response will be removed using a workflow step corresponding to the one that provisioned them. -The workflow executes the provisioning workflow steps in reverse order. If failures occur because of resource dependencies, such as preventing deletion of a registered model if it is still deployed, the workflow attempts retries. +The workflow executes the provisioning steps in reverse order. If a failure occurs because of a resource dependency, such as trying to delete a registered model that is still deployed, then the workflow retries the failing step as long as at least one resource was deleted. + +To prevent data loss, resources created using the `create_index`, `create_search_pipeline`, and `create_ingest_pipeline` steps require the resource ID to be included in the `allow_delete` parameter. ## Path and HTTP methods @@ -24,6 +26,7 @@ The following table lists the available path parameters. | Parameter | Data type | Description | | :--- | :--- | :--- | | `workflow_id` | String | The ID of the workflow to be deprovisioned. Required. | +| `allow-delete` | String | A comma-separated list of resource IDs to be deprovisioned. Required if deleting resources of type `index_name` or `pipeline_id`. 
| ### Example request @@ -53,6 +56,14 @@ If deprovisioning did not completely remove all resources, OpenSearch responds w In some cases, the failure happens because of another dependent resource that took some time to be removed. In this case, you can attempt to send the same request again. {: .tip} +If deprovisioning required the `allow_delete` parameter, then OpenSearch responds with a `403 (FORBIDDEN)` status and identifies the resources that were not deprovisioned: + +```json +{ + "error": "These resources require the allow_delete parameter to deprovision: [index_name my-index]." +} +``` + To obtain a more detailed deprovisioning status than is provided by the summary in the error response, query the [Get Workflow Status API]({{site.url}}{{site.baseurl}}/automating-configurations/api/get-workflow-status/). On success, the workflow returns to a `NOT_STARTED` state. If some resources have not yet been removed, they are provided in the response. \ No newline at end of file diff --git a/_search-plugins/search-pipelines/deleting-search-pipeline.md b/_search-plugins/search-pipelines/deleting-search-pipeline.md new file mode 100644 index 0000000000..3f113f7688 --- /dev/null +++ b/_search-plugins/search-pipelines/deleting-search-pipeline.md @@ -0,0 +1,26 @@ +--- +layout: default +title: Deleting search pipelines +nav_order: 30 +has_children: false +parent: Search pipelines +grand_parent: Search +--- + +# Deleting search pipelines + +Use the following request to delete a pipeline. + +To delete a specific search pipeline, pass the pipeline ID as a parameter: + +```json +DELETE /_search/pipeline/ +``` +{% include copy-curl.html %} + +To delete all search pipelines in a cluster, use the wildcard character (`*`): + +```json +DELETE /_search/pipeline/* +``` +{% include copy-curl.html %} From f3fe8f910fa50f914dfafe90ed3e6fc8c0a87b79 Mon Sep 17 00:00:00 2001 From: Tejas Shah Date: Mon, 22 Jul 2024 10:51:53 -0700 Subject: [PATCH 051/154] Adds Documentation for dynamic query parameters for kNN search request (#7761) * Adds documentation for dynamic query parameters Signed-off-by: Tejas Shah * Update _search-plugins/knn/approximate-knn.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Update _search-plugins/knn/approximate-knn.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Tejas Shah * Update _search-plugins/knn/approximate-knn.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Tejas Shah * Update _search-plugins/knn/approximate-knn.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Tejas Shah * Update _search-plugins/knn/approximate-knn.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Tejas Shah * Update _search-plugins/knn/approximate-knn.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Tejas Shah * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Tejas Shah Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> 
Co-authored-by: Nathan Bower --- _search-plugins/knn/approximate-knn.md | 51 +++++++++++++++++++++++++- 1 file changed, 49 insertions(+), 2 deletions(-) diff --git a/_search-plugins/knn/approximate-knn.md b/_search-plugins/knn/approximate-knn.md index 144365166f..fa1b4096c7 100644 --- a/_search-plugins/knn/approximate-knn.md +++ b/_search-plugins/knn/approximate-knn.md @@ -141,7 +141,7 @@ The following table provides examples of the number of results returned by vario 10 | 1 | 1 | 4 | 4 | 1 10 | 10 | 1 | 4 | 10 | 10 10 | 1 | 2 | 4 | 8 | 2 - + The number of results returned by Faiss/NMSLIB differs from the number of results returned by Lucene only when `k` is smaller than `size`. If `k` and `size` are equal, all engines return the same number of results. Starting in OpenSearch 2.14, you can use `k`, `min_score`, or `max_distance` for [radial search]({{site.url}}{{site.baseurl}}/search-plugins/knn/radial-search-knn/). @@ -253,7 +253,54 @@ POST _bulk ... ``` -After data is ingested, it can be search just like any other `knn_vector` field! +After data is ingested, it can be searched in the same way as any other `knn_vector` field. + +### Additional query parameters + +Starting with version 2.16, you can provide `method_parameters` in a search request: + +```json +GET my-knn-index-1/_search +{ + "size": 2, + "query": { + "knn": { + "my_vector2": { + "vector": [2, 3, 5, 6], + "k": 2, + "method_parameters" : { + "ef_search": 100 + } + } + } + } +} +``` +{% include copy-curl.html %} + +These parameters are dependent on the combination of engine and method used to create the index. The following sections provide information about the supported `method_parameters`. + +#### `ef_search` + +You can provide the `ef_search` parameter when searching an index created using the `hnsw` method. The `ef_search` parameter specifies the number of vectors to examine in order to find the top k nearest neighbors. Higher `ef_search` values improve recall at the cost of increased search latency. The value must be positive. + +The following table provides information about the `ef_search` parameter for the supported engines. + +Engine | Radial query support | Notes +:--- | :--- | :--- +`nmslib` | No | If `ef_search` is present in a query, it overrides the `index.knn.algo_param.ef_search` index setting. +`faiss` | Yes | If `ef_search` is present in a query, it overrides the `index.knn.algo_param.ef_search` index setting. +`lucene` | No | When creating a search query, you must specify `k`. If you provide both `k` and `ef_search`, then the larger value is passed to the engine. If `ef_search` is larger than `k`, you can provide the `size` parameter to limit the final number of results to `k`. + +#### `nprobes` + +You can provide the `nprobes` parameter when searching an index created using the `ivf` method. The `nprobes` parameter specifies the number of `nprobes` clusters to examine in order to find the top k nearest neighbors. Higher `nprobes` values improve recall at the cost of increased search latency. The value must be positive. + +The following table provides information about the `nprobes` parameter for the supported engines. + +Engine | Notes +:--- | :--- +`faiss` | If `nprobes` is present in a query, it overrides the value provided when creating the index. 
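+
+For example, a query against an index created using the `faiss` engine with the `ivf` method could override the probe count at search time. The following is a sketch; the index and field names are illustrative:
+
+```json
+GET my-ivf-index/_search
+{
+  "size": 2,
+  "query": {
+    "knn": {
+      "my_vector": {
+        "vector": [2, 3, 5, 6],
+        "k": 2,
+        "method_parameters": {
+          "nprobes": 8
+        }
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}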
### Using approximate k-NN with filters From b2abf250a4d0d3316e2960bd8910f1d49f79319c Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Mon, 22 Jul 2024 16:32:05 -0500 Subject: [PATCH 052/154] Add Rollover API (#7685) * Add Rollover API. Signed-off-by: Archer * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Make rollover match template. Signed-off-by: Archer * Apply suggestions from code review Co-authored-by: Melissa Vagi Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _api-reference/index-apis/rollover.md Co-authored-by: Melissa Vagi Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _api-reference/index-apis/rollover.md Co-authored-by: Melissa Vagi Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Co-authored-by: Melissa Vagi Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _api-reference/index-apis/rollover.md Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: Archer Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Melissa Vagi Co-authored-by: Nathan Bower --- _api-reference/index-apis/rollover.md | 195 ++++++++++++++++++++++++++ 1 file changed, 195 insertions(+) create mode 100644 _api-reference/index-apis/rollover.md diff --git a/_api-reference/index-apis/rollover.md b/_api-reference/index-apis/rollover.md new file mode 100644 index 0000000000..722dfe196c --- /dev/null +++ b/_api-reference/index-apis/rollover.md @@ -0,0 +1,195 @@ +--- +layout: default +title: Rollover Index +parent: Index APIs +nav_order: 63 +--- + +# Rollover Index +Introduced 1.0 +{: .label .label-purple } + +The Rollover Index API creates a new index for a data stream or index alias based on the `wait_for_active_shards` setting. + +## Path and HTTP methods + +```json +POST //_rollover/ +POST //_rollover/ +``` + +## Rollover types + +You can roll over a data stream, an index alias with one index, or an index alias with a write index. + +### Data stream + +When you perform a rollover operation on a data stream, the API generates a fresh write index for that stream. Simultaneously, the stream's preceding write index transforms into a regular backing index. Additionally, the rollover process increments the generation count of the data stream. Data stream rollovers do not support specifying index settings in the request body. + +### Index alias with one index + +When initiating a rollover on an index alias associated with a single index, the API generates a new index and disassociates the original index from the alias. + +### Index alias with a write index + +When an index alias references multiple indexes, one index must be designated as the write index. During a rollover, the API creates a new write index with its `is_write_index` property set to `true` while updating the previous write index by setting its `is_write_index property` to `false.` + +## Incrementing index names for an alias + +During the index alias rollover process, if you don't specify a custom name and the current index's name ends with a hyphen followed by a number (for example, `my-index-000001` or `my-index-3`), then the rollover operation will automatically increment that number for the new index's name. 
For instance, rolling over `my-index-000001` will generate `my-index-000002`. The numeric portion is always padded with leading zeros to ensure a consistent length of six characters. + +## Using date math with index rollovers + +When using an index alias for time-series data, you can leverage [date math](https://opensearch.org/docs/latest/field-types/supported-field-types/date/) in the index name to track the rollover date. For example, you can create an alias pointing to `my-index-{now/d}-000001`. If you create an alias on June 11, 2029, then the index name would be `my-index-2029.06.11-000001`. For a rollover on June 12, 2029, the new index would be named `my-index-2029.06.12-000002`. See [Roll over an index alias with a write index](#rolling-over-an-index-alias-with-a-write-index) for a practical example. + +## Path parameters + +The Rollover Index API supports the parameters listed in the following table. + +Parameter | Type | Description +:--- | :--- | :--- +`` | String | The name of the data stream or index alias to roll over. Required. | +`` | String | The name of the index to create. Supports date math. Data streams do not support this parameter. If the name of the alias's current write index does not end with `-` and a number, such as `my-index-000001` or `my-index-2`, then the parameter is required. + +## Query parameters + +The following table lists the supported query parameters. + +Parameter | Type | Description +:--- | :--- | :--- +`cluster_manager_timeout` | Time | The amount of time to wait for a connection to the cluster manager node. Default is `30s`. +`timeout` | Time | The amount of time to wait for a response. Default is `30s`. +`wait_for_active_shards` | String | The number of active shards that must be available before OpenSearch processes the request. Default is `1` (only the primary shard). You can also set to `all` or a positive integer. Values greater than `1` require replicas. For example, if you specify a value of `3`, then the index must have two replicas distributed across two additional nodes in order for the operation to succeed. + +## Request body + +The following request body parameters are supported. + +### `alias` + +The `alias` parameter specifies the alias name as the key. It is required when the `template` option exists in the request body. The object body contains the following optional parameters. + + +Parameter | Type | Description +:--- | :--- | :--- +`filter` | Query DSL object | The query that limits the number of documents that the alias can access. +`index_routing` | String | The value that routes indexing operations to a specific shard. When specified, overwrites the `routing` value for indexing operations. +`is_hidden` | Boolean | Hides or unhides the alias. When `true`, the alias is hidden. Default is `false`. Indexes for the alias must have matching values for this setting. +`is_write_index` | Boolean | Specifies the write index. When `true`, the index is the write index for the alias. Default is `false`. +`routing` | String | The value used to route index and search operations to a specific shard. +`search_routing` | String | Routes search operations to a specific shard. When specified, it overwrites `routing` for search operations. + +### `mappings` + +The `mappings` parameter specifies the index field mappings. It is optional. See [Mappings and field types](https://opensearch.org/docs/latest/field-types/) for more information. 
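+
+As a sketch, a rollover request can define a mapping for the index created by the rollover; the field name is illustrative:
+
+```json
+POST my-alias/_rollover
+{
+  "mappings": {
+    "properties": {
+      "message": { "type": "text" }
+    }
+  }
+}
+```
+{% include copy-curl.html %}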
+ +### `conditions` + +The `conditions` parameter is an optional object defining criteria for triggering the rollover. When provided, OpenSearch only rolls over if the current index satisfies one or more specified conditions. If omitted, then the rollover occurs unconditionally without prerequisites. + +The object body supports the following parameters. + +Parameter | Type | Description +:--- | :--- | :--- +| `max_age` | Time units | Triggers a rollover after the maximum elapsed time since index creation is reached. The elapsed time is always calculated since the index creation time, even if the index origination date is configured to a custom date, such as when using the `index.lifecycle.parse_origination_date` or `index.lifecycle.origination_date` settings. Optional. | +`max_docs` | Integer | Triggers a rollover after the specified maximum number of documents, excluding documents added since the last refresh and documents in replica shards. Optional. +`max_size` | Byte units | Triggers a rollover when the index reaches a specified size, calculated as the total size of all primary shards. Replicas are not counted. Use the `_cat indices` API and check the `pri.store.size` value to see the current index size. Optional. +`max_primary_shard_size` | Byte units | Triggers a rollover when the largest primary shard in the index reaches a certain size. This is the maximum size of the primary shards in the index. As with `max_size`, replicas are ignored. To see the current shard size, use the `_cat shards` API. The `store` value shows the size of each shard, and `prirep` indicates whether a shard is a primary (`p`) or a replica (`r`). Optional. + +### `settings` + +The `settings` parameter specifies the index configuration options. See [Index settings](https://opensearch.org/docs/latest/install-and-configure/configuring-opensearch/index-settings/) for more information. + +## Example requests + +The following examples illustrate using the Rollover Index API. A rollover occurs when one or more of the specified conditions are met: + +- The index was created 5 or more days ago. +- The index contains 500 or more documents. +- The index's largest primary shard is 100 GB or larger. + +### Rolling over a data stream + +The following request rolls over the data stream if the current write index meets any of the specified conditions: + +```json +POST my-data-stream/_rollover +{ + "conditions": { + "max_age": "5d", + "max_docs": 500, + "max_primary_shard_size": "100gb" + } +} +``` +{% include copy-curl.html %} + +### Rolling over an index alias with a write index + +The following request creates a date-time index and sets it as the write index for `my-alias`: + +```json +PUT +PUT %3Cmy-index-%7Bnow%2Fd%7D-000001%3E +{ + "aliases": { + "my-alias": { + "is_write_index": true + } + } +} +``` +{% include copy-curl.html %} + +The next request performs a rollover using the alias: + +```json +POST my-alias/_rollover +{ + "conditions": { + "max_age": "5d", + "max_docs": 500, + "max_primary_shard_size": "100gb" + } +} +``` +{% include copy-curl.html %} + +### Specifying settings during a rollover + +In most cases, you can use an index template to automatically configure the indexes created during a rollover operation. 
However, when rolling over an index alias, you can use the Rollover Index API to introduce additional index settings or override the settings defined in the template by sending the following request: + +```json +POST my-alias/_rollover +{ + "settings": { + "index.number_of_shards": 4 + } +} +``` +{% include copy-curl.html %} + + +## Example response + +OpenSearch returns the following response confirming that all conditions except `max_primary_shard_size` were met: + +```json +{ + "acknowledged": true, + "shards_acknowledged": true, + "old_index": ".ds-my-data-stream-2029.06.11-000001", + "new_index": ".ds-my-data-stream-2029.06.12-000002", + "rolled_over": true, + "dry_run": false, + "conditions": { + "[max_age: 5d]": true, + "[max_docs: 500]": true, + "[max_primary_shard_size: 100gb]": false + } +} +``` + + + + From 1d2e4447cd518791d2900d59132db198d83eaae6 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Mon, 22 Jul 2024 16:32:22 -0500 Subject: [PATCH 053/154] Fix liquid syntax errors. (#7785) * Fix liquid syntax errors. Signed-off-by: Archer * Update render-template.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _api-reference/render-template.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _api-reference/render-template.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: Archer Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- _api-reference/render-template.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/_api-reference/render-template.md b/_api-reference/render-template.md index 16bada0290..409fde5e4a 100644 --- a/_api-reference/render-template.md +++ b/_api-reference/render-template.md @@ -44,7 +44,7 @@ Both of the following request examples use the search template with the template "source": { "query": { "match": { - "play_name": "{{play_name}}" + "play_name": "{% raw %}{{play_name}}{% endraw %}" } } }, @@ -76,11 +76,11 @@ If you don't want to use a saved template, or want to test a template before sav ``` { "source": { - "from": "{{from}}{{^from}}10{{/from}}", - "size": "{{size}}{{^size}}10{{/size}}", + "from": "{% raw %}{{from}}{{^from}}0{{/from}}{% endraw %}", + "size": "{% raw %}{{size}}{{^size}}10{{/size}}{% endraw %}", "query": { "match": { - "play_name": "{{play_name}}" + "play_name": "{% raw %}{{play_name}}{% endraw %}" } } }, From 3977152d4d04f6bd2c6520b94a90cf5202541ab9 Mon Sep 17 00:00:00 2001 From: Stavros Macrakis <134456002+smacrakis@users.noreply.github.com> Date: Tue, 23 Jul 2024 11:48:12 -0400 Subject: [PATCH 054/154] Explain ISM + link (#7787) * Explain ISM + link Signed-off-by: Stavros Macrakis <134456002+smacrakis@users.noreply.github.com> * Update _im-plugin/refresh-analyzer.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Stavros Macrakis <134456002+smacrakis@users.noreply.github.com> Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- _im-plugin/refresh-analyzer.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git 
a/_im-plugin/refresh-analyzer.md b/_im-plugin/refresh-analyzer.md index 2e50f06dc0..bff54b739f 100644 --- a/_im-plugin/refresh-analyzer.md +++ b/_im-plugin/refresh-analyzer.md @@ -10,7 +10,7 @@ redirect_from: # Refresh search analyzer -With ISM installed, you can refresh search analyzers in real time with the following API: +You can refresh search analyzers in real time using the following API. This requires the [Index State Management]({{site.url}}{{site.baseurl}}/im-plugin/ism/index/) (ISM) plugin to be installed. For more information, see [Installing plugins]({{site.url}}{{site.baseurl}}/install-and-configure/plugins/). ```json POST /_plugins/_refresh_search_analyzers/ From eb08f04b89d30f7d26f4fd8b0a47965fb52d0bd1 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Tue, 23 Jul 2024 12:36:26 -0400 Subject: [PATCH 055/154] Unify and correct geoshape GeoJSON and WKT examples (#7801) * Unify and correct geoshape GeoJSON and WKT examples Signed-off-by: Fanit Kolchina * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower --- .../supported-field-types/geo-shape.md | 72 +++++++++++-------- 1 file changed, 41 insertions(+), 31 deletions(-) diff --git a/_field-types/supported-field-types/geo-shape.md b/_field-types/supported-field-types/geo-shape.md index cbf63551df..b7b06a0d04 100644 --- a/_field-types/supported-field-types/geo-shape.md +++ b/_field-types/supported-field-types/geo-shape.md @@ -68,7 +68,7 @@ PUT testindex/_doc/1 { "location" : { "type" : "point", - "coordinates" : [74.00, 40.71] + "coordinates" : [74.0060, 40.7128] } } ``` @@ -126,10 +126,12 @@ PUT testindex/_doc/3 "location" : { "type" : "polygon", "coordinates" : [ - [[74.0060, 40.7128], - [71.0589, 42.3601], - [73.7562, 42.6526], - [74.0060, 40.7128]] + [ + [74.0060, 40.7128], + [73.7562, 42.6526], + [71.0589, 42.3601], + [74.0060, 40.7128] + ] ] } } @@ -159,15 +161,18 @@ PUT testindex/_doc/4 "location" : { "type" : "polygon", "coordinates" : [ - [[74.0060, 40.7128], - [71.0589, 42.3601], - [73.7562, 42.6526], - [74.0060, 40.7128]], - - [[72.6734,41.7658], - [72.6506, 41.5623], - [73.0515, 41.5582], - [72.6734, 41.7658]] + [ + [74.0060, 40.7128], + [73.7562, 42.6526], + [71.0589, 42.3601], + [74.0060, 40.7128] + ], + [ + [72.6734,41.7658], + [73.0515, 41.5582], + [72.6506, 41.5623], + [72.6734, 41.7658] + ] ] } } @@ -179,12 +184,12 @@ Index a polygon (triangle) with a triangular hole in WKT format: ```json PUT testindex/_doc/4 { - "location" : "POLYGON ((40.7128 74.0060, 42.3601 71.0589, 42.6526 73.7562, 40.7128 74.0060), (41.7658 72.6734, 41.5623 72.6506, 41.5582 73.0515, 41.7658 72.6734))" + "location" : "POLYGON ((74.0060 40.7128, 71.0589 42.3601, 73.7562 42.6526, 74.0060 40.7128), (72.6734 41.7658, 72.6506 41.5623, 73.0515 41.5582, 72.6734 41.7658))" } ``` {% include copy-curl.html %} -In OpenSearch, you can specify a polygon by listing its vertices clockwise or counterclockwise. This works well for polygons that do not cross the date line (are narrower than 180°). However, a polygon that crosses the date line (is wider than 180°) might be ambiguous because WKT does not impose a specific order on vertices. Thus, you must specify polygons that cross the date line by listing their vertices counterclockwise. 
+You can specify a polygon in OpenSearch by listing its vertices in clockwise or counterclockwise order. This works well for polygons that do not cross the date line (that are narrower than 180°). However, a polygon that crosses the date line (is wider than 180°) might be ambiguous because WKT does not impose a specific order on vertices. Thus, you must specify polygons that cross the date line by listing their vertices in counterclockwise order. You can define an [`orientation`](#parameters) parameter to specify the vertex traversal order at mapping time: @@ -295,23 +300,28 @@ PUT testindex/_doc/4 "type" : "multipolygon", "coordinates" : [ [ - [[74.0060, 40.7128], - [71.0589, 42.3601], - [73.7562, 42.6526], - [74.0060, 40.7128]], - - [[72.6734,41.7658], - [72.6506, 41.5623], - [73.0515, 41.5582], - [72.6734, 41.7658]] + [ + [74.0060, 40.7128], + [73.7562, 42.6526], + [71.0589, 42.3601], + [74.0060, 40.7128] + ], + [ + [73.0515, 41.5582], + [72.6506, 41.5623], + [72.6734, 41.7658], + [73.0515, 41.5582] + ] ], [ - [[73.9776, 40.7614], - [73.9554, 40.7827], - [73.9631, 40.7812], - [73.9776, 40.7614]] + [ + [73.9146, 40.8252], + [73.8871, 41.0389], + [73.6853, 40.9747], + [73.9146, 40.8252] ] ] + ] } } ``` @@ -322,7 +332,7 @@ Index a multipolygon in WKT format: ```json PUT testindex/_doc/4 { - "location" : "MULTIPOLYGON (((40.7128 74.0060, 42.3601 71.0589, 42.6526 73.7562, 40.7128 74.0060), (41.7658 72.6734, 41.5623 72.6506, 41.5582 73.0515, 41.7658 72.6734)), ((73.9776 40.7614, 73.9554 40.7827, 73.9631 40.7812, 73.9776 40.7614)))" + "location" : "MULTIPOLYGON (((74.0060 40.7128, 71.0589 42.3601, 73.7562 42.6526, 74.0060 40.7128), (72.6734 41.7658, 72.6506 41.5623, 73.0515 41.5582, 72.6734 41.7658)), ((73.9146 40.8252, 73.6853 40.9747, 73.8871 41.0389, 73.9146 40.8252)))" } ``` {% include copy-curl.html %} @@ -400,5 +410,5 @@ Parameter | Description :--- | :--- `coerce` | A Boolean value that specifies whether to automatically close unclosed linear rings. Default is `false`. `ignore_malformed` | A Boolean value that specifies to ignore malformed GeoJSON or WKT geoshapes and not to throw an exception. Default is `false` (throw an exception when geoshapes are malformed). -`ignore_z_value` | Specific to points with three coordinates. If `ignore_z_value` is `true`, the third coordinate is not indexed but is still stored in the _source field. If `ignore_z_value` is `false`, an exception is thrown. Default is `true`. +`ignore_z_value` | Specific to points with three coordinates. If `ignore_z_value` is `true`, then the third coordinate is not indexed but is still stored in the `_source` field. If `ignore_z_value` is `false`, then an exception is thrown. Default is `true`. `orientation` | Specifies the traversal order of the vertices in the geoshape's list of coordinates. `orientation` takes the following values:
1. RIGHT: counterclockwise. Specify RIGHT orientation by using one of the following strings (uppercase or lowercase): `right`, `counterclockwise`, `ccw`.
2. LEFT: clockwise. Specify LEFT orientation by using one of the following strings (uppercase or lowercase): `left`, `clockwise`, `cw`. This value can be overridden by individual documents.
Default is `RIGHT`. From 50eed6b420ea74df74621aad704f9538db115983 Mon Sep 17 00:00:00 2001 From: Frank Dattalo <73919354+fddattal@users.noreply.github.com> Date: Tue, 23 Jul 2024 10:14:14 -0700 Subject: [PATCH 056/154] Documentation Updates for plugins.query.datasources.enabled SQL Setting (#7794) * Documentation Updates for plugins.query.datasources.enabled SQL Setting This setting allows users to toggle the data source code paths in the SQL plugin. Ref: https://github.com/opensearch-project/sql/pull/2811/files Signed-off-by: Frank Dattalo * Update _search-plugins/sql/settings.md Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Frank Dattalo Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower --- _search-plugins/sql/settings.md | 1 + 1 file changed, 1 insertion(+) diff --git a/_search-plugins/sql/settings.md b/_search-plugins/sql/settings.md index d4aaac7f40..4842f98449 100644 --- a/_search-plugins/sql/settings.md +++ b/_search-plugins/sql/settings.md @@ -78,6 +78,7 @@ Setting | Default | Description `plugins.sql.cursor.keep_alive` | 1 minute | Configures how long the cursor context is kept open. Cursor contexts are resource-intensive, so we recommend a low value. `plugins.query.memory_limit` | 85% | Configures the heap memory usage limit for the circuit breaker of the query engine. `plugins.query.size_limit` | 200 | Sets the default size of index that the query engine fetches from OpenSearch. +`plugins.query.datasources.enabled` | true | Change to `false` to disable support for data sources in the plugin. ## Spark connector settings From 06ef161814c7bc428d1b05dc4b15f6570ff0b9fe Mon Sep 17 00:00:00 2001 From: gaobinlong Date: Wed, 24 Jul 2024 02:07:39 +0800 Subject: [PATCH 057/154] Add the documentation of create or update alias API (#7641) * Add the documentation of create or update alias API Signed-off-by: gaobinlong * Fix typo Signed-off-by: gaobinlong * Refine the wording Signed-off-by: gaobinlong * Update update-alias.md * Fix typo Signed-off-by: gaobinlong * Add some clarification Signed-off-by: gaobinlong * Update update-alias.md * Update update-alias.md * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _api-reference/index-apis/update-alias.md Signed-off-by: Nathan Bower --------- Signed-off-by: gaobinlong Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Nathan Bower Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Nathan Bower --- _api-reference/index-apis/update-alias.md | 83 +++++++++++++++++++++++ 1 file changed, 83 insertions(+) create mode 100644 _api-reference/index-apis/update-alias.md diff --git a/_api-reference/index-apis/update-alias.md b/_api-reference/index-apis/update-alias.md new file mode 100644 index 0000000000..cac05ceedb --- /dev/null +++ b/_api-reference/index-apis/update-alias.md @@ -0,0 +1,83 @@ +--- +layout: default +title: Create or Update Alias +parent: Index APIs +nav_order: 5 +--- + +# Create or Update Alias +**Introduced 1.0** +{: .label .label-purple } + +The Create or Update Alias API adds a data stream or index to an alias or updates the settings for an existing alias. 
For more alias API operations, see [Index aliases]({{site.url}}{{site.baseurl}}/opensearch/index-alias/).

The Create or Update Alias API is distinct from the [Alias API]({{site.url}}{{site.baseurl}}/opensearch/rest-api/alias/), which supports the addition and removal of aliases and the removal of alias indexes. In contrast, the following API only supports adding or updating an alias without updating the index itself. Each API also uses different request body parameters.
{: .note}

## Path and HTTP methods

```
POST /<target>/_alias/<alias-name>
PUT /<target>/_alias/<alias-name>
POST /_alias/<alias-name>
PUT /_alias/<alias-name>
POST /<target>/_aliases/<alias-name>
PUT /<target>/_aliases/<alias-name>
POST /_aliases/<alias-name>
PUT /_aliases/<alias-name>
PUT /<target>/_alias
PUT /<target>/_aliases
PUT /_alias
```

## Path parameters

| Parameter | Type | Description |
:--- | :--- | :---
| `target` | String | A comma-delimited list of data streams and indexes. Wildcard expressions (`*`) are supported. To target all data streams and indexes in a cluster, use `_all` or `*`. Optional. |
| `alias-name` | String | The alias name to be created or updated. Optional. |

## Query parameters

All query parameters are optional.

Parameter | Type | Description
:--- | :--- | :---
`cluster_manager_timeout` | Time | The amount of time to wait for a response from the cluster manager node. Default is `30s`.
`timeout` | Time | The amount of time to wait for a response from the cluster. Default is `30s`.

## Request body

In the request body, you can specify the index name, the alias name, and the settings for the alias. All fields are optional.

Field | Type | Description
:--- | :--- | :---
`index` | String | A comma-delimited list of data streams or indexes that you want to associate with the alias. If this field is set, it will override the index name specified in the URL path.
`alias` | String | The name of the alias. If this field is set, it will override the alias name specified in the URL path.
`is_write_index` | Boolean | Specifies whether the index should be a write index. An alias can only have one write index at a time. If a write request is submitted to an alias that links to multiple indexes, then OpenSearch runs the request only on the write index.
`routing` | String | Assigns a custom value to a shard for specific operations.
`index_routing` | String | Assigns a custom value to a shard only for index operations.
`search_routing` | String | Assigns a custom value to a shard only for search operations.
`filter` | Object | A filter to use with the alias so that the alias points to a filtered part of the index.

## Example request

The following example request adds a sample alias with a custom routing value:

```json
POST sample-index/_alias/sample-alias
{
  "routing": "test"
}
```
{% include copy-curl.html %}

## Example response

```json
{
  "acknowledged": true
}
```

For more alias API operations, see [Index aliases]({{site.url}}{{site.baseurl}}/opensearch/index-alias/).
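To illustrate the request body fields described above, the following sketch (the index name `logs-2024`, the alias name `logs`, and the `status` field are hypothetical) designates the index as the alias's write index and limits the alias to documents matching a filter:

```json
PUT /logs-2024/_alias/logs
{
  "is_write_index": true,
  "filter": {
    "term": {
      "status": "published"
    }
  }
}
```
{% include copy-curl.html %}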
From 5a788e98efc48e65e818610b14b0c513dca1c5eb Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Wed, 24 Jul 2024 11:36:11 -0500 Subject: [PATCH 058/154] Add Recovery API (#7653) * Add Recover Index API Signed-off-by: Archer * Add examples Signed-off-by: Archer * Fix link and parameter Signed-off-by: Archer * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: gaobinlong Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update recover.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Heather Halter Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _api-reference/index-apis/recover.md Signed-off-by: Heather Halter * Update _api-reference/index-apis/recover.md Signed-off-by: Heather Halter * Update recover.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _api-reference/index-apis/recover.md Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: Archer Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Heather Halter Co-authored-by: gaobinlong Co-authored-by: Heather Halter Co-authored-by: Nathan Bower --- _api-reference/index-apis/recover.md | 291 +++++++++++++++++++++++++++ 1 file changed, 291 insertions(+) create mode 100644 _api-reference/index-apis/recover.md diff --git a/_api-reference/index-apis/recover.md b/_api-reference/index-apis/recover.md new file mode 100644 index 0000000000..dc2df1e5a2 --- /dev/null +++ b/_api-reference/index-apis/recover.md @@ -0,0 +1,291 @@ +--- +layout: default +title: Recovery +parent: Index APIs +nav_order: 40 +--- + +# Recovery API +Introduced 1.0 +{: .label .label-purple } + +The Recovery API provides information about any completed or ongoing shard recoveries for one or more indexes. If a data stream is listed, the API returns information about that data stream's backing indexes. + +Shard recovery involves creating a shard copy to restore a primary shard from a snapshot or to synchronize a replica shard. After the shard recovery process completes, the recovered shard becomes available for use in search and index operations. + +Shard recovery occurs automatically in the following scenarios: + +- Node startup, known as a local store recovery +- Replication of a primary shard +- Relocation of a shard to a different node within the same cluster +- Restoration of a snapshot +- Clone, shrink, or split operations + +The Recovery API reports solely on completed recoveries for shard copies presently stored in the cluster. It reports only the most recent recovery for each shard copy and does not include historical information about previous recoveries or information about recoveries of shard copies that no longer exist. Consequently, if a shard copy completes a recovery and is subsequently relocated to a different node, then the information about the original recovery is not displayed in the Recovery API. 
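For example, to list only the shard recoveries that are still in progress, you can send the following request using the `active_only` query parameter described later on this page:

```json
GET /_recovery?active_only=true
```
{% include copy-curl.html %}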
## Path and HTTP methods

```json
GET /_recovery
GET /<index-name>/_recovery/
```

## Path parameters

Parameter | Data type | Description
:--- | :--- | :---
`index-name` | String | A comma-separated list of indexes, data streams, or index aliases to which the operation is applied. Supports wildcard expressions (`*`). Use `_all` or `*` to specify all indexes and data streams in a cluster.

## Query parameters

All of the following query parameters are optional.

Parameter | Data type | Description
:--- | :--- | :---
`active_only` | Boolean | When `true`, the response only includes active shard recoveries. Default is `false`.
`detailed` | Boolean | When `true`, provides detailed information about shard recoveries. Default is `false`.
`index` | String | A comma-separated list or wildcard expression of index names used to limit the request.

## Response fields

The API responds with the following information about the recovered shard.

Parameter | Data type | Description
:--- | :--- | :---
`id` | Integer | The ID of the shard.
`type` | String | The recovery source for the shard. Returned values include:
- `EMPTY_STORE`: An empty store. Indicates a new primary shard or the forced allocation of an empty primary shard using the Cluster Reroute API.
- `EXISTING_STORE`: The store of an existing primary shard. Indicates that the recovery is related to node startup or the allocation of an existing primary shard.
- `LOCAL_SHARDS`: Shards belonging to another index on the same node. Indicates that the recovery is related to a clone, shrink, or split operation.
- `PEER`: A primary shard on another node. Indicates that the recovery is related to shard replication.
- `SNAPSHOT`: A snapshot. Indicates that the recovery is related to a snapshot restore operation.
`stage` | String | The recovery stage. Returned values can include:
- `INIT`: Recovery has not started.
- `INDEX`: Reading index metadata and copying bytes from the source to the destination.
- `VERIFY_INDEX`: Verifying the integrity of the index.
- `TRANSLOG`: Replaying the transaction log.
- `FINALIZE`: Cleanup.
- `DONE`: Complete. +`primary` | Boolean | When `true`, the shard is a primary shard. +`start_time` | String | The timestamp indicating when the recovery started. +`stop_time` | String | The timestamp indicating when the recovery completed. +`total_time_in_millis` | String | The total amount of time taken to recover a shard, in milliseconds. +`source` | Object | The recovery source. This can include a description of the repository (if the recovery is from a snapshot) or a description of the source node. +`target` | Object | The destination node. +`index` | Object | Statistics about the physical index recovery. +`translog` | Object | Statistics about the translog recovery. + `start` | Object | Statistics about the amount of time taken to open and start the index. + +## Example requests + +The following examples demonstrate how to recover information using the Recovery API. + +### Recover information from several or all indexes + +The following example request returns recovery information about several indexes in a [human-readable format](https://opensearch.org/docs/latest/api-reference/common-parameters/#human-readable-output): + +```json +GET index1,index2/_recovery?human +``` +{% include copy-curl.html %} + +The following example request returns recovery information about all indexes in a human-readable format: + +```json +GET /_recovery?human +``` +{% include copy-curl.html %} + +### Recover detailed information + +The following example request returns detailed recovery information: + +```json +GET _recovery?human&detailed=true +``` +{% include copy-curl.html %} + +## Example response + +The following response returns detailed recovery information about an index named `shakespeare`: + +```json +{ + "shakespeare": { + "shards": [ + { + "id": 0, + "type": "EXISTING_STORE", + "stage": "DONE", + "primary": true, + "start_time": "2024-07-01T18:06:47.415Z", + "start_time_in_millis": 1719857207415, + "stop_time": "2024-07-01T18:06:47.538Z", + "stop_time_in_millis": 1719857207538, + "total_time": "123ms", + "total_time_in_millis": 123, + "source": { + "bootstrap_new_history_uuid": false + }, + "target": { + "id": "uerS7REgRQCbBF3ImY8wOQ", + "host": "172.18.0.3", + "transport_address": "172.18.0.3:9300", + "ip": "172.18.0.3", + "name": "opensearch-node2" + }, + "index": { + "size": { + "total": "17.8mb", + "total_in_bytes": 18708764, + "reused": "17.8mb", + "reused_in_bytes": 18708764, + "recovered": "0b", + "recovered_in_bytes": 0, + "percent": "100.0%" + }, + "files": { + "total": 7, + "reused": 7, + "recovered": 0, + "percent": "100.0%", + "details": [ + { + "name": "_1.cfs", + "length": "9.8mb", + "length_in_bytes": 10325945, + "reused": true, + "recovered": "0b", + "recovered_in_bytes": 0 + }, + { + "name": "_0.cfe", + "length": "479b", + "length_in_bytes": 479, + "reused": true, + "recovered": "0b", + "recovered_in_bytes": 0 + }, + { + "name": "_0.si", + "length": "333b", + "length_in_bytes": 333, + "reused": true, + "recovered": "0b", + "recovered_in_bytes": 0 + }, + { + "name": "_1.cfe", + "length": "479b", + "length_in_bytes": 479, + "reused": true, + "recovered": "0b", + "recovered_in_bytes": 0 + }, + { + "name": "_1.si", + "length": "333b", + "length_in_bytes": 333, + "reused": true, + "recovered": "0b", + "recovered_in_bytes": 0 + }, + { + "name": "_0.cfs", + "length": "7.9mb", + "length_in_bytes": 8380790, + "reused": true, + "recovered": "0b", + "recovered_in_bytes": 0 + }, + { + "name": "segments_3", + "length": "405b", + "length_in_bytes": 405, + "reused": true, + "recovered": 
"0b", + "recovered_in_bytes": 0 + } + ] + }, + "total_time": "6ms", + "total_time_in_millis": 6, + "source_throttle_time": "-1", + "source_throttle_time_in_millis": 0, + "target_throttle_time": "-1", + "target_throttle_time_in_millis": 0 + }, + "translog": { + "recovered": 0, + "total": 0, + "percent": "100.0%", + "total_on_start": 0, + "total_time": "113ms", + "total_time_in_millis": 113 + }, + "verify_index": { + "check_index_time": "0s", + "check_index_time_in_millis": 0, + "total_time": "0s", + "total_time_in_millis": 0 + } + }, + { + "id": 0, + "type": "PEER", + "stage": "DONE", + "primary": false, + "start_time": "2024-07-01T18:06:47.693Z", + "start_time_in_millis": 1719857207693, + "stop_time": "2024-07-01T18:06:47.744Z", + "stop_time_in_millis": 1719857207744, + "total_time": "50ms", + "total_time_in_millis": 50, + "source": { + "id": "uerS7REgRQCbBF3ImY8wOQ", + "host": "172.18.0.3", + "transport_address": "172.18.0.3:9300", + "ip": "172.18.0.3", + "name": "opensearch-node2" + }, + "target": { + "id": "HFYKietmTO6Ud9COgP0k9Q", + "host": "172.18.0.2", + "transport_address": "172.18.0.2:9300", + "ip": "172.18.0.2", + "name": "opensearch-node1" + }, + "index": { + "size": { + "total": "0b", + "total_in_bytes": 0, + "reused": "0b", + "reused_in_bytes": 0, + "recovered": "0b", + "recovered_in_bytes": 0, + "percent": "0.0%" + }, + "files": { + "total": 0, + "reused": 0, + "recovered": 0, + "percent": "0.0%", + "details": [] + }, + "total_time": "1ms", + "total_time_in_millis": 1, + "source_throttle_time": "-1", + "source_throttle_time_in_millis": 0, + "target_throttle_time": "-1", + "target_throttle_time_in_millis": 0 + }, + "translog": { + "recovered": 0, + "total": 0, + "percent": "100.0%", + "total_on_start": -1, + "total_time": "42ms", + "total_time_in_millis": 42 + }, + "verify_index": { + "check_index_time": "0s", + "check_index_time_in_millis": 0, + "total_time": "0s", + "total_time_in_millis": 0 + } + } + ] + } +} +``` From 506b10121166e145a742368c4f12535451c72569 Mon Sep 17 00:00:00 2001 From: AntonEliatra Date: Wed, 24 Jul 2024 17:45:59 +0100 Subject: [PATCH 059/154] Additing error details and escape method to nodes_dn #7681 (#7703) * Additing error details and escape method to nodes_dn #7681 Signed-off-by: AntonEliatra * Update security-settings.md Signed-off-by: AntonEliatra * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: AntonEliatra Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- .../configuring-opensearch/security-settings.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/_install-and-configure/configuring-opensearch/security-settings.md b/_install-and-configure/configuring-opensearch/security-settings.md index 244d601449..ffdad36cb3 100644 --- a/_install-and-configure/configuring-opensearch/security-settings.md +++ b/_install-and-configure/configuring-opensearch/security-settings.md @@ -15,7 +15,7 @@ The following sections describe security-related settings in `opensearch.yml`. T The Security plugin supports the following common settings: -- `plugins.security.nodes_dn` (Static): Specifies a list of distinguished names (DNs) that denote the other nodes in the cluster. This setting supports wildcards and regular expressions. 
The list of DNs are also read from the security index **in addition** to the YAML configuration when `plugins.security.nodes_dn_dynamic_config_enabled` is `true`. +- `plugins.security.nodes_dn` (Static): Specifies a list of distinguished names (DNs) that denote the other nodes in the cluster. This setting supports wildcards and regular expressions. The list of DNs are also read from the security index **in addition** to the YAML configuration when `plugins.security.nodes_dn_dynamic_config_enabled` is `true`. If this setting is not configured correctly, the cluster will fail to form as the nodes will not be able to trust each other and will result in the following error: `Transport client authentication no longer supported`. - `plugins.security.nodes_dn_dynamic_config_enabled` (Static): Relevant for `cross_cluster` use cases where there is a need to manage the allow listed `nodes_dn` without having to restart the nodes every time a new `cross_cluster` remote is configured. Setting `nodes_dn_dynamic_config_enabled` to `true` enables **super-admin callable** Distinguished Names APIs, which provide means to update or retrieve `nodes_dn` dynamically. This setting only has effect if `plugins.security.cert.intercluster_request_evaluator_class` is not set. Default is `false`. @@ -357,6 +357,7 @@ The Security plugin supports the following transport layer security settings: plugins.security.nodes_dn: - "CN=*.example.com, OU=SSL, O=Test, L=Test, C=DE" - "CN=node.other.com, OU=SSL, O=Test, L=Test, C=DE" + - "CN=node.example.com, OU=SSL\, Inc., L=Test, C=DE" # escape additional comma with `\` plugins.security.authcz.admin_dn: - CN=kirk,OU=client,O=client,L=test, C=de plugins.security.roles_mapping_resolution: MAPPING_ONLY From d36fcd376aaf0a253f828b2c3a88ddb8f6c19843 Mon Sep 17 00:00:00 2001 From: leanneeliatra <131779422+leanneeliatra@users.noreply.github.com> Date: Wed, 24 Jul 2024 17:58:39 +0100 Subject: [PATCH 060/154] 20240417 Adding OpenSearch demo configuration mac instructions (#7381) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * adding OpenSearch demo configuration mac instructions Signed-off-by: leanne.laceybyrne@eliatra.com * [MDS] Add security analytics, alerting, feature anaywhere in the multiple data source document (#7328) * Add security analy Signed-off-by: yujin-emma * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update multi-data-sources.md Signed-off-by: Yu Jin <112784385+yujin-emma@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update multi-data-sources.md Signed-off-by: Yu Jin <112784385+yujin-emma@users.noreply.github.com> * Update _dashboards/management/multi-data-sources.md Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Melissa Vagi Signed-off-by: Yu Jin <112784385+yujin-emma@users.noreply.github.com> * Update multi-data-sources.md 
Signed-off-by: Melissa Vagi Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Signed-off-by: Melissa Vagi * Update multi-data-sources.md Signed-off-by: Melissa Vagi Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi 
* Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _dashboards/management/multi-data-sources.md Signed-off-by: Melissa Vagi --------- Signed-off-by: yujin-emma Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Yu Jin <112784385+yujin-emma@users.noreply.github.com> Signed-off-by: Melissa Vagi Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Melissa Vagi Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Added documentation for managed identity support in repository-azure plugin (#7068) * Added documentation for managed identity support in repository-azure plugins Signed-off-by: Chengwu Shi * fixed syntax Signed-off-by: Chengwu Shi * fixed style error Signed-off-by: Chengwu Shi * remove sudo, and added 1 more point when configuring key or sas token Signed-off-by: Chengwu Shi * Update section Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update snapshot-restore.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * improve readability and clarity Signed-off-by: Chengwu Shi * improved naming Signed-off-by: Chengwu Shi * Update _tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore.md Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: chengwushi-netapp <153049940+chengwushi-netapp@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update snapshot-restore.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * fixed syntax based on requested changes Signed-off-by: Chengwu Shi * Apply suggestions from code review 
Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: Chengwu Shi Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: chengwushi-netapp <153049940+chengwushi-netapp@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Updating the documentation for `obfuscate` processor and fixing a issue in `opensearch` sink documentation. (#7251) * Updating the documentation for processor to accomodate change in DataPrepper 2.8 and fixing a issue in sink documentation. Signed-off-by: Utkarsh Agarwal * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: Utkarsh Agarwal Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Utkarsh Agarwal Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: leanne.laceybyrne@eliatra.com * Added documentation for new default workflow templates (#7346) * add new default workflow templates Signed-off-by: Amit Galitzky * Apply suggestions from code review Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Amit Galitzky Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Remote state publication (#7364) * Remote state publication Signed-off-by: Sooraj Sinha * Address comments Signed-off-by: Sooraj Sinha * Simplify the remote publication description Signed-off-by: Sooraj Sinha * Update remote-cluster-state.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _tuning-your-cluster/availability-and-recovery/remote-store/remote-cluster-state.md Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: Sooraj Sinha Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Add documentation of derived fields (#7329) * Add documentation of derived fields Signed-off-by: Rishabh Maurya * improve the documentation Signed-off-by: Rishabh Maurya * Fix style-job comments Signed-off-by: Rishabh Maurya * documentation for object derived field Signed-off-by: Rishabh Maurya * Update one of the search request and date format documentation Signed-off-by: Rishabh Maurya * Doc review Signed-off-by: Fanit Kolchina * Tech review comments Signed-off-by: Fanit Kolchina * One more passive 
voice Signed-off-by: Fanit Kolchina * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Update _field-types/supported-field-types/derived.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Update _field-types/supported-field-types/derived.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Update _field-types/supported-field-types/derived.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * More editorial comments Signed-off-by: Fanit Kolchina --------- Signed-off-by: Rishabh Maurya Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Fanit Kolchina Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Add doc for neural-sparse-query-two-phase-processor. (#7306) * Add doc for neural-sparse-query-two-phase-processor. Signed-off-by: conggguan * Make some edits for the comments. Signed-off-by: conggguan * Fix some typo and style-job. Signed-off-by: conggguan * Update neural-sparse-query-two-phase-processor.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: conggguan Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Add documentation related to removal of source and recovery source in k-NN performance tuning section (#7359) * Add documentation related to removal of source and recovery source in k-NN performance tuning section Signed-off-by: Navneet Verma * Update formatting. Add Doc review. 
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: Navneet Verma Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Update derived.md (#7393) Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: leanne.laceybyrne@eliatra.com * Update common-filters.md (#7386) Community member pointed out the error in the remove_field field. Signed-off-by: Heather Halter Signed-off-by: leanne.laceybyrne@eliatra.com * Documentation for Indices Request Cache Overview and its settings (#7288) * Create doc for Indices Request Cache Signed-off-by: Kiran Prakash * sentence case Signed-off-by: Kiran Prakash * sentence case Signed-off-by: Kiran Prakash * Create doc for Indices Request Cache Signed-off-by: Kiran Prakash * sentence case Signed-off-by: Kiran Prakash * sentence case Signed-off-by: Kiran Prakash * Doc review Signed-off-by: Fanit Kolchina * Corrects dynamic/static settings Signed-off-by: Fanit Kolchina * Move one of the settings to static Signed-off-by: Fanit Kolchina * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Kiran Prakash Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Fanit Kolchina Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * 20240131 Clarification on backend role concept in access control (#7378) * clarification on backend role concept in access control Signed-off-by: leanne.laceybyrne@eliatra.com * taking out unnesesary changes Signed-off-by: leanne.laceybyrne@eliatra.com * Update _security/access-control/index.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: leanne.laceybyrne@eliatra.com Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: leanne.laceybyrne@eliatra.com * Changed order of Best Practices page and linked it from other relevant pages (#7389) * changed order of best practices page and linked it from other relevant pages Signed-off-by: leanne.laceybyrne@eliatra.com * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: leanne.laceybyrne@eliatra.com Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: leanne.laceybyrne@eliatra.com * AD Enhancements in Version 2.15 (#7388) * Enhancements in Version 2.15 Starting from version 2.15, we have introduced several enhancements: 1. Custom Index Management: * Added support for custom index management. For more details, watch the video on the create detector page and detector detail page anomaly-detection-dashboards-plugin#770. * Custom result indices are now managed as aliases. Consequently, additional security permissions are required for management. 2. 
New JVM Heap Usage Threshold Setting: * Introduced a new setting, plugins.anomaly_detection.jvm_heap_usage_threshold, to manage the memory circuit breaker threshold. 3. Documentation Improvements: * Added examples for DSL filters to enhance the documentation. 4. Ruby Version Update: * Updated Ruby version in CONTRIBUTING.md from 3.2 to 3.2.4 as version 3.2 is not available. Testing done: * built and viewed the changed website locally Signed-off-by: Kaituo Li * Update _observing-your-data/ad/index.md Co-authored-by: Melissa Vagi Signed-off-by: Kaituo Li * Update _observing-your-data/ad/index.md Co-authored-by: Melissa Vagi Signed-off-by: Kaituo Li * Update _observing-your-data/ad/index.md Co-authored-by: Melissa Vagi Signed-off-by: Kaituo Li * Update _observing-your-data/ad/index.md Co-authored-by: Melissa Vagi Signed-off-by: Kaituo Li * Update _observing-your-data/ad/index.md Co-authored-by: Melissa Vagi Signed-off-by: Kaituo Li * Update _observing-your-data/ad/index.md Co-authored-by: Melissa Vagi Signed-off-by: Kaituo Li * Update _observing-your-data/ad/index.md Co-authored-by: Melissa Vagi Signed-off-by: Kaituo Li * Update _observing-your-data/ad/index.md Co-authored-by: Melissa Vagi Signed-off-by: Kaituo Li * Update _observing-your-data/ad/index.md Co-authored-by: Melissa Vagi Signed-off-by: Kaituo Li * Update _observing-your-data/ad/index.md Co-authored-by: Melissa Vagi Signed-off-by: Kaituo Li * Update _observing-your-data/ad/index.md Co-authored-by: Melissa Vagi Signed-off-by: Kaituo Li * Update _observing-your-data/ad/settings.md Co-authored-by: Melissa Vagi Signed-off-by: Kaituo Li * Update _observing-your-data/ad/settings.md Co-authored-by: Melissa Vagi Signed-off-by: Kaituo Li * Update _observing-your-data/ad/index.md Signed-off-by: Melissa Vagi * Update _observing-your-data/ad/index.md Signed-off-by: Melissa Vagi * Update index.md Signed-off-by: Melissa Vagi Signed-off-by: Melissa Vagi * Update _observing-your-data/ad/settings.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --------- Signed-off-by: Kaituo Li Signed-off-by: Melissa Vagi Co-authored-by: Melissa Vagi Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Add documentation for the new setting of cardinality aggregation dynamic pruning (#7341) * add documentation for the new setting of cardinality aggregation dynamic pruning Signed-off-by: bowenlan-amzn * Update search-settings.md Signed-off-by: bowenlan-amzn * Update _install-and-configure/configuring-opensearch/search-settings.md Signed-off-by: Melissa Vagi * Update search-settings.md Signed-off-by: Melissa Vagi --------- Signed-off-by: bowenlan-amzn Signed-off-by: Melissa Vagi Co-authored-by: Melissa Vagi Signed-off-by: leanne.laceybyrne@eliatra.com * Change introduced version for derived field (#7403) Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: leanne.laceybyrne@eliatra.com * Add documentation for wildcard field type (#7339) * Add documentation for wildcard field type Signed-off-by: Michael Froh * Doc review Signed-off-by: Fanit Kolchina * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * More explanation Signed-off-by: Fanit Kolchina * Update _field-types/supported-field-types/wildcard.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Michael Froh Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws 
<105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Fanit Kolchina Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Add document for top n queries improvements in 2.15 (#7326) * Add document for top n queries by cpu and memory Signed-off-by: Chenyang Ji * update document to fix style checks Signed-off-by: Chenyang Ji * combine exporter and metrics enhancements documents into one PR Signed-off-by: Chenyang Ji * Doc review Signed-off-by: Fanit Kolchina * add document for default exporter pattern Signed-off-by: Chenyang Ji * Consolidating metric types Signed-off-by: Fanit Kolchina * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Chenyang Ji Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Fanit Kolchina Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Add documentation for innerHit on knn nested field (#7404) * Add documentation for innerHit on knn nested field Signed-off-by: Heemin Kim * Doc review Signed-off-by: Fanit Kolchina * Explain excluding source Signed-off-by: Fanit Kolchina --------- Signed-off-by: Heemin Kim Signed-off-by: Fanit Kolchina Co-authored-by: Fanit Kolchina Signed-off-by: leanne.laceybyrne@eliatra.com * Mark docrep to remote migration as GA and modify settings names (#7342) * Mark docrep to remote migration as GA and modify settings names Signed-off-by: Gaurav Bafna * Update _tuning-your-cluster/availability-and-recovery/remote-store/migrating-to-remote.md Co-authored-by: Bhumika Saini Signed-off-by: Gaurav Bafna <85113518+gbbafna@users.noreply.github.com> * Update migrating-to-remote.md Signed-off-by: Melissa Vagi Signed-off-by: Melissa Vagi * Update _tuning-your-cluster/availability-and-recovery/remote-store/migrating-to-remote.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _tuning-your-cluster/availability-and-recovery/remote-store/migrating-to-remote.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _tuning-your-cluster/availability-and-recovery/remote-store/migrating-to-remote.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --------- Signed-off-by: Gaurav Bafna Signed-off-by: Gaurav Bafna <85113518+gbbafna@users.noreply.github.com> Signed-off-by: Melissa Vagi Co-authored-by: Bhumika Saini Co-authored-by: Melissa Vagi Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Changed VisBuilder status from experimental to GA (#7405) * VisBuilder GA Signed-off-by: Fanit Kolchina * Removed experimental visualization requirement Signed-off-by: Fanit Kolchina * Update screenshots Signed-off-by: Fanit Kolchina --------- Signed-off-by: Fanit Kolchina Signed-off-by: leanne.laceybyrne@eliatra.com * Explicitly add a version question to the doc issue template (#7406) * Separate out the version section in the doc issue Signed-off-by: Fanit Kolchina * typo Signed-off-by: Fanit Kolchina --------- Signed-off-by: Fanit Kolchina Signed-off-by: leanne.laceybyrne@eliatra.com * Add doc for alerting comments (#7360) * alerting comments first documentation rough draft Signed-off-by: Dennis Toepker * formatting and styling changes Signed-off-by: Dennis Toepker * more formatting changes Signed-off-by: Dennis Toepker 
* more spacing changes Signed-off-by: Dennis Toepker * added not about rbac and added some links Signed-off-by: Dennis Toepker * removing comments history enabled setting Signed-off-by: Dennis Toepker * Update comments.md Signed-off-by: Melissa Vagi Signed-off-by: Melissa Vagi * Update _observing-your-data/alerting/api.md Update Update comment API description Co-authored-by: Melissa Vagi Signed-off-by: toepkerd <120457569+toepkerd@users.noreply.github.com> * Update _observing-your-data/alerting/api.md Update Delete comment API description Co-authored-by: Melissa Vagi Signed-off-by: toepkerd <120457569+toepkerd@users.noreply.github.com> * Update _observing-your-data/alerting/settings.md Updated comments_enabled setting description Co-authored-by: Melissa Vagi Signed-off-by: toepkerd <120457569+toepkerd@users.noreply.github.com> * Update _observing-your-data/alerting/settings.md Updated comments_history_max_age setting description Co-authored-by: Melissa Vagi Signed-off-by: toepkerd <120457569+toepkerd@users.noreply.github.com> * Update _observing-your-data/alerting/settings.md Update comments_history_rollover_period setting description Co-authored-by: Melissa Vagi Signed-off-by: toepkerd <120457569+toepkerd@users.noreply.github.com> * Update _observing-your-data/alerting/settings.md Update comments_history_retention_period setting description Co-authored-by: Melissa Vagi Signed-off-by: toepkerd <120457569+toepkerd@users.noreply.github.com> * minor edits and dead link fix attempt Signed-off-by: Dennis Toepker * misc edits Signed-off-by: Dennis Toepker * minor edit Signed-off-by: Dennis Toepker * Update api.md Signed-off-by: Melissa Vagi Signed-off-by: Melissa Vagi * Update _observing-your-data/alerting/comments.md Signed-off-by: Melissa Vagi * Update api.md Signed-off-by: Melissa Vagi Signed-off-by: Melissa Vagi * Update comments.md Signed-off-by: Melissa Vagi Signed-off-by: Melissa Vagi * Update api.md format request and response examples to expand Signed-off-by: Melissa Vagi * Update api.md Signed-off-by: Melissa Vagi Signed-off-by: Melissa Vagi * Update api.md Signed-off-by: Melissa Vagi Signed-off-by: Melissa Vagi * Update api.md Signed-off-by: Melissa Vagi Signed-off-by: Melissa Vagi * Update api.md Signed-off-by: Melissa Vagi Signed-off-by: Melissa Vagi * Update api.md Signed-off-by: Melissa Vagi Signed-off-by: Melissa Vagi * Update _observing-your-data/alerting/api.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _observing-your-data/alerting/api.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _observing-your-data/alerting/api.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _observing-your-data/alerting/api.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _observing-your-data/alerting/api.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _observing-your-data/alerting/settings.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _observing-your-data/alerting/settings.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _observing-your-data/alerting/settings.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _observing-your-data/alerting/comments.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _observing-your-data/alerting/settings.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _observing-your-data/alerting/comments.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update 
_observing-your-data/alerting/comments.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _observing-your-data/alerting/comments.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update api.md Signed-off-by: Melissa Vagi Signed-off-by: Melissa Vagi * Update settings.md Signed-off-by: Melissa Vagi Signed-off-by: Melissa Vagi * Update api.md Signed-off-by: Melissa Vagi Added text to address stacked headings issue Signed-off-by: Melissa Vagi * Update api.md Signed-off-by: Melissa Vagi Signed-off-by: Melissa Vagi * Update _observing-your-data/alerting/api.md Signed-off-by: Melissa Vagi * Update api.md Signed-off-by: Melissa Vagi Signed-off-by: Melissa Vagi * Update _observing-your-data/alerting/api.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Fix broken link Signed-off-by: Melissa Vagi * Fix broken link Signed-off-by: Melissa Vagi --------- Signed-off-by: Dennis Toepker Signed-off-by: Melissa Vagi Signed-off-by: toepkerd <120457569+toepkerd@users.noreply.github.com> Co-authored-by: Dennis Toepker Co-authored-by: Melissa Vagi Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Update the integration page to reflect new integration catalog features (#7324) * update the integration page to reflect the integration catalog and additional capabilities Signed-off-by: YANGDB * update the integration documentation Signed-off-by: YANGDB * Update schema section Signed-off-by: Simeon Widdis * update the metrics analytics documentation Signed-off-by: YANGDB * update the trace analytics documentation Signed-off-by: YANGDB * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md 
Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update index.md Signed-off-by: Melissa Vagi Signed-off-by: Melissa Vagi * Update _integrations/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _integrations/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _integrations/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _integrations/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _integrations/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _integrations/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _integrations/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _integrations/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _integrations/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _integrations/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _integrations/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _integrations/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _integrations/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _integrations/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _integrations/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _integrations/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _integrations/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _integrations/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _integrations/index.md Co-authored-by: Nathan Bower Signed-off-by: 
Melissa Vagi * Update _integrations/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _integrations/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _integrations/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update index.md Signed-off-by: Melissa Vagi Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update index.md Signed-off-by: Melissa Vagi Signed-off-by: Melissa Vagi * Update index.md Signed-off-by: Melissa Vagi Signed-off-by: Melissa Vagi * Update index.md Signed-off-by: Melissa Vagi Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update index.md Signed-off-by: Melissa Vagi Signed-off-by: Melissa Vagi * Update index.md Signed-off-by: Melissa Vagi Signed-off-by: Melissa Vagi * Rewrite tutorials and update or delete graphics Signed-off-by: Melissa Vagi * Update _integrations/index.md Co-authored-by: Nathan Bower Signed-off-by: YANGDB * Update _integrations/index.md Co-authored-by: Nathan Bower Signed-off-by: YANGDB * Update _integrations/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _integrations/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi * Update _integrations/index.md Signed-off-by: Melissa Vagi --------- Signed-off-by: YANGDB Signed-off-by: Simeon Widdis Signed-off-by: Melissa Vagi Co-authored-by: Simeon Widdis Co-authored-by: Melissa Vagi Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Add connector tool (#7384) * add connector tool Signed-off-by: Yaliang Wu * address comments Signed-off-by: Yaliang Wu * Doc review Signed-off-by: Fanit Kolchina * Update _ml-commons-plugin/agents-tools/tools/connector-tool.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Update _ml-commons-plugin/agents-tools/tools/connector-tool.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Update _ml-commons-plugin/agents-tools/tools/connector-tool.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Update _ml-commons-plugin/agents-tools/tools/connector-tool.md Co-authored-by: Yaliang Wu Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Update _ml-commons-plugin/agents-tools/tools/connector-tool.md Co-authored-by: Yaliang Wu Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Update _ml-commons-plugin/remote-models/blueprints.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Yaliang Wu Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Fanit Kolchina Co-authored-by: 
* Alerts in correlations feature documentation (#7410)

* alerts in correlations feature documentation
* Update correlation-eng.md
* Update alerts.md
* Copy edit
* Update _security-analytics/api-tools/correlation-eng.md
* Update _security-analytics/usage/alerts.md

---------

Signed-off-by: Riya Saxena
Signed-off-by: Melissa Vagi
Co-authored-by: Melissa Vagi
Co-authored-by: Nathan Bower
Signed-off-by: leanne.laceybyrne@eliatra.com

* Trace analytics update (#7362)

* update the integration page to reflect the integration catalog and additional capabilities
* update the integration documentation
* Update schema section
* update the metrics analytics documentation
* update the trace analytics documentation
* update service correlation index naming convention
* Update _observing-your-data/trace/ta-dashboards.md

---------

Signed-off-by: YANGDB
Signed-off-by: Simeon Widdis
Signed-off-by: Melissa Vagi
Co-authored-by: Simeon Widdis
Co-authored-by: Melissa Vagi
Co-authored-by: Nathan Bower
Signed-off-by: leanne.laceybyrne@eliatra.com

* Add documentations for batch ingestion feature (#7408)

* Add documents for batch ingestion
* Revert change on other rows of bulk
* Update _ml-commons-plugin/remote-models/batch-ingestion.md
* Apply suggestions from code review

---------

Signed-off-by: Liyun Xiu
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower
Signed-off-by: leanne.laceybyrne@eliatra.com

* Moved batch-enabled processors to a separate section (#7415)

Signed-off-by: Fanit Kolchina
Signed-off-by: leanne.laceybyrne@eliatra.com

* Add link to tutorial in OS Dashboards Assistant documentation (#7416)

Signed-off-by: Fanit Kolchina
Signed-off-by: leanne.laceybyrne@eliatra.com

* Point to the spaces section of approximate k-NN from radial search (#7417)

Signed-off-by: Fanit Kolchina
Signed-off-by: leanne.laceybyrne@eliatra.com

* Update documentation of ml inference processors to support for local models (#7368)

* ml inference processor support for local models
* address comments
* Doc review
* tech review comments
* Update _ingest-pipelines/processors/ml-inference.md
* Apply suggestions from code review

---------

Signed-off-by: Bhavana Ramaram
Signed-off-by: Fanit Kolchina
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Fanit Kolchina
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower
Signed-off-by: leanne.laceybyrne@eliatra.com

* Remove a link that caused link checker fail and update link checker (#7418)

Signed-off-by: Fanit Kolchina
Signed-off-by: leanne.laceybyrne@eliatra.com
* Add remote guardrails model support (#7377)

* remote guardrails model support
* address comments
* Add guardrail information
* Rename response accept parameter
* typo fix
* Fix links
* Update _ml-commons-plugin/remote-models/guardrails.md
* Apply suggestions from code review

---------

Signed-off-by: Jing Zhang
Signed-off-by: Fanit Kolchina
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Fanit Kolchina
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower
Signed-off-by: leanne.laceybyrne@eliatra.com

* Update the metrics analytics documentation (#7352)

* update the metrics analytics documentation
* update the metrics analytics documentation with remote data-sources
* Doc review
* Update _observing-your-data/metricsanalytics.md

---------

Signed-off-by: YANGDB
Signed-off-by: Melissa Vagi
Co-authored-by: Melissa Vagi
Co-authored-by: Nathan Bower
Signed-off-by: leanne.laceybyrne@eliatra.com
* Update key value processor documentation with new config options (#7413)

* Update key value processor documentation with new config options
* Update _data-prepper/pipelines/configuration/processors/key-value.md

---------

Signed-off-by: Kondaka
Signed-off-by: Melissa Vagi
Co-authored-by: Melissa Vagi
Co-authored-by: Nathan Bower
Signed-off-by: leanne.laceybyrne@eliatra.com

* Add version to the warning headers (#7459)

Signed-off-by: Fanit Kolchina
Signed-off-by: leanne.laceybyrne@eliatra.com

* Adds the max_request_length documentation for appropriate Data Prepper sources (#7441)

* Adds the max_request_length documentation. Also, updates and corrects the http source documentation to use the correct name.
* Update _data-prepper/pipelines/configuration/sources/http.md
* Update _data-prepper/pipelines/configuration/sources/otel-logs-source.md
* Update _data-prepper/pipelines/configuration/sources/otel-metrics-source.md
* Update _data-prepper/pipelines/configuration/sources/otel-trace-source.md

---------

Signed-off-by: David Venable
Signed-off-by: Melissa Vagi
Co-authored-by: Melissa Vagi
Co-authored-by: Nathan Bower
Signed-off-by: leanne.laceybyrne@eliatra.com

* adding explanation to jwks validation #7367 #20240522 (#7398)

* adding explanation to jwks validation #7367 #20240522
* Apply suggestions from code review

---------

Signed-off-by: AntonEliatra
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Signed-off-by: leanne.laceybyrne@eliatra.com

* 20240322 Adding absolute path for config.yml (#7380)

* adding absolute path
* adding clarification for mac
* changes committed on incorrect branch - undoing
* Update _security/configuration/configuration.md

---------

Signed-off-by: leanne.laceybyrne@eliatra.com
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Heather Halter
Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Signed-off-by: leanne.laceybyrne@eliatra.com

* Rename metricsanalytics to prometheusmetrics (#7482)

Signed-off-by: Fanit Kolchina
Co-authored-by: Melissa Vagi
Signed-off-by: leanne.laceybyrne@eliatra.com

* Add a video requirement to a PR template (#7488)

* Add a video requirement to a PR template
* Update .github/PULL_REQUEST_TEMPLATE.md

---------

Signed-off-by: Fanit Kolchina
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Melissa Vagi
Signed-off-by: leanne.laceybyrne@eliatra.com

* Add 2.15 version (#7483)

Signed-off-by: Fanit Kolchina
Signed-off-by: leanne.laceybyrne@eliatra.com

* Add 2.15 to version history (#7484)

Signed-off-by: Fanit Kolchina
Signed-off-by: leanne.laceybyrne@eliatra.com

* Add release notes 2.15 (#7486)

Signed-off-by: Fanit Kolchina
Signed-off-by: leanne.laceybyrne@eliatra.com

* Add new s3 sink documentation for Data Prepper 2.8 (#7163)

* Add new s3 sink documentation for Data Prepper 2.8
* Update s3.md Clean up formatting.
* Update _data-prepper/pipelines/configuration/sinks/s3.md
* Apply suggestions from code review

---------

Signed-off-by: Taylor Gray
Signed-off-by: Melissa Vagi
Co-authored-by: Melissa Vagi
Co-authored-by: Nathan Bower
Signed-off-by: leanne.laceybyrne@eliatra.com
* Adding information running script in OpenSearch (#7397)

* adding one liner to explain script can be defined with source or id
* Update _api-reference/document-apis/bulk.md

---------

Signed-off-by: leanne.laceybyrne@eliatra.com
Signed-off-by: leanneeliatra <131779422+leanneeliatra@users.noreply.github.com>
Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Signed-off-by: leanne.laceybyrne@eliatra.com
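For reference, a minimal sketch of the behavior that one-liner documents: in the _bulk API, an update action's script can be defined inline via `source` or referenced by `id`. The index name, document IDs, stored-script name, and the unsecured localhost endpoint are illustrative assumptions, not part of the change.

```python
import json
import requests

# Bulk update actions: the first defines its script inline via "source";
# the second references a previously stored script by "id".
actions = [
    {"update": {"_index": "my-index", "_id": "1"}},
    {"script": {"lang": "painless",
                "source": "ctx._source.counter += params.step",
                "params": {"step": 1}}},
    {"update": {"_index": "my-index", "_id": "2"}},
    {"script": {"id": "my-stored-script", "params": {"step": 1}}},
]

# The _bulk endpoint expects newline-delimited JSON terminated by a newline.
body = "\n".join(json.dumps(a) for a in actions) + "\n"
resp = requests.post("http://localhost:9200/_bulk", data=body,
                     headers={"Content-Type": "application/x-ndjson"})
print(resp.json()["errors"])
```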
* Fix monitoring predict requests in ML settings (#7512)

Signed-off-by: Fanit Kolchina
Signed-off-by: leanne.laceybyrne@eliatra.com

* Add oidc docker example with keycloak #1566 (#7372)

* adding oidc docker example with keycloak #1566
* Update openid-connect.md
* Apply suggestions from code review

---------

Signed-off-by: AntonEliatra
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Nathan Bower
Signed-off-by: leanne.laceybyrne@eliatra.com

* Add General Guidelines for concurrent segment search (#7402)

* Add General Guidelines for concurrent segment search
* Apply suggestions from code review

---------

Signed-off-by: Jay Deng
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: leanne.laceybyrne@eliatra.com

* Fixes Update compare-search-results.md (#7534)

Has parent = true; updated to false so it doesn't look like it has children in the nav pane

Signed-off-by: Heather Halter
Signed-off-by: leanne.laceybyrne@eliatra.com

* Drop repeated key/word in inverted index (#7538)

"the" appears twice as a key in the inverted index; keys should only appear once. The correct key-value pair is kept, with a value of "1, 2" as "the" appears in both documents.

Signed-off-by: Don Fox
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: leanne.laceybyrne@eliatra.com
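A minimal sketch of the structure that fix restores (the two example documents and whitespace tokenization are assumptions for illustration): each term appears exactly once as a key, and its postings list records every document containing it, so "the" maps to both documents rather than appearing as two keys.

```python
from collections import defaultdict

# Two toy documents; "the" occurs in both.
docs = {1: "the quick brown fox", 2: "the lazy dog"}

inverted_index = defaultdict(list)
for doc_id, text in docs.items():
    # dict.fromkeys() deduplicates terms within a document, preserving order.
    for term in dict.fromkeys(text.split()):
        inverted_index[term].append(doc_id)

print(inverted_index["the"])  # [1, 2] -- one key with postings "1, 2"
```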
* Remove Vale from date processor (#7556)

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Signed-off-by: leanne.laceybyrne@eliatra.com

* Add selective download feature to Data Prepper sources section (#6247)

* Add feature to this section
* add content
* Copy edits
* Update selective-download.md
* Address tech review comments
* Update _data-prepper/common-use-cases/s3-logs.md

---------

Signed-off-by: Melissa Vagi
Co-authored-by: David Venable
Co-authored-by: Nathan Bower
Signed-off-by: leanne.laceybyrne@eliatra.com

* Update prometheusmetrics.md (#7561)

Fixed a broken image link on https://opensearch.org/docs/latest/observing-your-data/prometheusmetrics/

Signed-off-by: Heather Halter
Signed-off-by: leanne.laceybyrne@eliatra.com

* Update Benchmark index to reflect TOC (#7560)

* Update Benchmark index to reflect TOC
* Apply suggestions from code review

---------

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Heather Halter
Signed-off-by: leanne.laceybyrne@eliatra.com

* Fix broken link to wildcard field type in index.md (#7565)

Link to wildcard field type documentation is incorrect

Signed-off-by: Thomas Wing
Signed-off-by: leanne.laceybyrne@eliatra.com

* Add Render Template API (#7219)

* Add render/template API
* Add examples
* Add example from `source`.
* Apply suggestions from code review

---------

Signed-off-by: Archer
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Heather Halter
Co-authored-by: Nathan Bower
Signed-off-by: leanne.laceybyrne@eliatra.com
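A minimal sketch of the API those examples document (assuming an unsecured local cluster; the template body and parameters are placeholders): POST _render/template substitutes `params` into an inline mustache template passed via `source` and returns the search body that would be executed, without running it.

```python
import json
import requests

# An inline mustache template; {{keyword}} and {{result_size}} are
# substituted from "params" when the template is rendered.
template = '{"query": {"match": {"title": "{{keyword}}"}}, "size": {{result_size}}}'

resp = requests.post(
    "http://localhost:9200/_render/template",
    json={"source": template,
          "params": {"keyword": "opensearch", "result_size": 5}},
)
print(json.dumps(resp.json(), indent=2))  # the rendered search request
```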
* Update docs for new clause count setting (#7391)

* Update docs for new clause count setting
* Update to 2.16
* Update _install-and-configure/configuring-opensearch/index-settings.md

---------

Signed-off-by: Harsha Vamsi Kalluri
Signed-off-by: Melissa Vagi
Co-authored-by: Melissa Vagi
Co-authored-by: Nathan Bower
Signed-off-by: leanne.laceybyrne@eliatra.com

* Document new ingest and search pipeline allowlist settings (#7414)

* Document new ingest and search pipeline allowlist settings
* Update _ingest-pipelines/processors/index-processors.md
* Update _search-plugins/search-pipelines/search-processors.md

---------

Signed-off-by: Andrew Ross
Signed-off-by: Melissa Vagi
Co-authored-by: Melissa Vagi
Co-authored-by: Nathan Bower
Signed-off-by: leanne.laceybyrne@eliatra.com

* Add documentation for using rerank and normalization processors together (#7513)

* Add documentation for using rerank and normalization processors together
* Update _search-plugins/search-relevance/reranking-search-results.md

---------

Signed-off-by: Fanit Kolchina
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower
Signed-off-by: leanne.laceybyrne@eliatra.com

* Update syntax.md -- where-filter (#7593)

The argument to the source= command should be named for what it does, not its type, and the doc should say what it does, not how it is evaluated.

* Update _search-plugins/sql/ppl/syntax.md

---------

Signed-off-by: Stavros Macrakis <134456002+smacrakis@users.noreply.github.com>
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: leanne.laceybyrne@eliatra.com

* Update syntax.md (#7595)

Rewrite the PPL syntax section. It doesn't seem useful to say that PPL "currently" supports "only" one search command, as though we have plans to support multiple search commands (with what semantics?). Just say what it does. I also removed the incorrect space in [{ : .note}], which caused that text to appear in the formatted output. I have no idea what that command is supposed to do, but I notice another one later in the file that doesn't have the space and doesn't appear in the formatted output, so I'm assuming that [{: .note}] is correct.

* Update _search-plugins/sql/ppl/syntax.md

---------

Signed-off-by: Stavros Macrakis <134456002+smacrakis@users.noreply.github.com>
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: leanne.laceybyrne@eliatra.com
* Update-jwt-docs (#7236)

* Update-jwt-docs
* Update _security/authentication-backends/jwt.md

---------

Signed-off-by: leedonggyu
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower
Signed-off-by: leanne.laceybyrne@eliatra.com

* Add more information to Python docs (#7506)

* add resources
* fixed links
* Update _clients/python-low-level.md

---------

Signed-off-by: Heather Halter
Signed-off-by: Melissa Vagi
Co-authored-by: Melissa Vagi
Signed-off-by: leanne.laceybyrne@eliatra.com

* Correct step 2 of custom local models (#7600)

* Update custom-local-models.md
* Correct response function
* More changes
* Fix rag cluster setting

---------

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Fanit Kolchina
Signed-off-by: leanne.laceybyrne@eliatra.com

* Update options and add more examples for add_entries processor (#7412)

* Update options and add more examples
* A few small edits
* Apply suggestions from code review

---------

Signed-off-by: Hai Yan
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Nathan Bower
Signed-off-by: leanne.laceybyrne@eliatra.com

* Update multiple data sources and add TSVB and Vega viz types (#7229)

* Update multiple data sources and add TSVB and Vega viz types
* add text
* Add tutorials
* Address SME feedback
* Update _dashboards/visualize/tsvb.md
* Update _dashboards/visualize/vega.md
* Update _dashboards/visualize/visbuilder.md
* Update _dashboards/visualize/geojson-regionmaps.md
* Update _dashboards/visualize/selfhost-maps-server.md
* Update _dashboards/management/multi-data-sources.md

---------

Signed-off-by: Melissa Vagi
Co-authored-by: Huy Nguyen <73027756+huyaboo@users.noreply.github.com>
Co-authored-by: Heather Halter
Co-authored-by: Nathan Bower
Signed-off-by: leanne.laceybyrne@eliatra.com

* Add geodistance query documentation (#7607)

* Add geodistance query documentation
* Add optional sentence
* Fix links
* More links
* Fix a link
* Editorial comment
* Apply suggestions from code review

---------

Signed-off-by: Fanit Kolchina
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower
Signed-off-by: leanne.laceybyrne@eliatra.com
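A minimal sketch of the query that documentation covers (the index name, the `location` geopoint field, the center point, and the unsecured local endpoint are illustrative assumptions): a geodistance query filters for documents whose point falls within a given radius of a center point.

```python
import requests

# Filter for documents within 50 km of a center point. The "location"
# field is assumed to be mapped as geo_point.
query = {
    "query": {
        "bool": {
            "filter": {
                "geo_distance": {
                    "distance": "50km",
                    "location": {"lat": 40.71, "lon": -74.01},
                }
            }
        }
    }
}
resp = requests.post("http://localhost:9200/places/_search", json=query)
print(resp.json()["hits"]["total"])
```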
* Fix: expand_wildcards in _refresh API. (#7620)

Signed-off-by: dblock
Signed-off-by: leanne.laceybyrne@eliatra.com

* Added documentation for cat?sort and format. (#7619)

* Added documentation for cat?sort and format.
* Apply suggestions from code review

---------

Signed-off-by: dblock
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: leanne.laceybyrne@eliatra.com
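A minimal sketch of the two parameters that change documents (assuming an unsecured local cluster, and assuming sorting is exposed through the CAT APIs' `s` parameter): `format` selects the response format, and the sort parameter orders rows by a column such as `store.size`.

```python
import requests

# Ask a CAT API for JSON output, sorted by on-disk size, largest first.
resp = requests.get(
    "http://localhost:9200/_cat/indices",
    params={"format": "json", "s": "store.size:desc"},
)
for row in resp.json():
    print(row["index"], row.get("store.size"))
```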
Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Updating index links Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * adding old doc to be merged Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Starting to link things together Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * fix broken link Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * respond to vale Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * more vale violations Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * name files consistently with docs site and fix links. Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * vale Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Minor tweaks. Moved Ubi under SEARCH. Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * add label for versining of spec and OS version Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * try to sort out vale error Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Converting mermaid diagrams to png's Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Updating query_id mermaid code Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Better way to ignore the mermaid scripts in the md files Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * description updates Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * schema.md updating (still need to update the mermaid diagram) Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * schema updates Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * updates Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Rebuilding main Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * merging in images Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Updating UBI spec number Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Use released version Signed-off-by: Eric Pugh * Update _search-plugins/index.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/index.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/index.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Addressing PR feedback Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ 
<145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/data-structures.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/data-structures.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/data-structures.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/data-structures.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Adding dsl intro Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Adding intro sentence Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Title adjust Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Addressing PR feedback Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Addressing pr feedback Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Addressing PR feedback Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Fixing vale errors Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Finishing initial pr feedback Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Next round of PR feedback Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Describing chorus workbench link Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Adding captions for result tables Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * PR clean up Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Apply 
suggestions from code review Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Adding a few more suggestions Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * updating query filter for laptos Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * PR feedback Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * PR feedback Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Apply suggestions from code review Edits to all files to comply with OpenSearch standards; nav_order updates Signed-off-by: Heather Halter * Apply suggestions from code review Missed a file - more commits to the sql-query topic Signed-off-by: Heather Halter * Update _search-plugins/ubi/sql-queries.md Signed-off-by: Heather Halter * Apply suggestions from code review Cleaned up the sample query topic Signed-off-by: Heather Halter * Apply suggestions from code review Signed-off-by: Heather Halter * Update _search-plugins/ubi/dsl-queries.md Signed-off-by: Heather Halter * Update _search-plugins/ubi/dsl-queries.md Signed-off-by: Heather Halter * Apply suggestions from code review Accepted editorial suggestions. 
Co-authored-by: Nathan Bower Signed-off-by: Heather Halter * Update index.md Reformatted table info Signed-off-by: Heather Halter * Update _search-plugins/ubi/dsl-queries.md Signed-off-by: Heather Halter * Apply suggestions from code review Signed-off-by: Heather Halter * Update index.md Signed-off-by: Heather Halter * Update schemas.md Signed-off-by: Heather Halter * Update index.md Added a missing link and fixed the table. Signed-off-by: Heather Halter * Update index.md Changed the bold to italics Signed-off-by: Heather Halter * Update ubi-dashboard-tutorial.md Removed unnecessary note tag. Signed-off-by: Heather Halter * Update schemas.md Inserted comma Signed-off-by: Heather Halter * Update sql-queries.md Signed-off-by: Heather Halter * Apply suggestions from code review There were some hidden comments that I found in this file. Co-authored-by: Nathan Bower Signed-off-by: Heather Halter * Update _search-plugins/ubi/sql-queries.md Signed-off-by: Heather Halter * Update _search-plugins/ubi/sql-queries.md Signed-off-by: Heather Halter * Apply suggestions from code review Signed-off-by: Heather Halter * Update _search-plugins/ubi/schemas.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Update schemas.md Removed links on 'object_id' and 'query_id' Signed-off-by: Heather Halter * Update sql-queries.md removed a note tag and fixed line 326 Signed-off-by: Heather Halter * Update sql-queries.md One more table heading Signed-off-by: Heather Halter --------- Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> Signed-off-by: Eric Pugh Signed-off-by: Heather Halter Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Eric Pugh Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Heather Halter Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Update cat-nodes.md (#7626) 'Local' option is deprecated and no longer has any purpose. 
See Issue #7625 Signed-off-by: Landon Lengyel Signed-off-by: leanne.laceybyrne@eliatra.com * Fix Key value table (#7636) Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: leanne.laceybyrne@eliatra.com * Add reranking search results with MS Marco cross-encoder tutorial (#7634) * Add reranking search results with MS Marco cross-encoder tutorial Signed-off-by: Fanit Kolchina * Update _ml-commons-plugin/tutorials/reranking-cross-encoder.md Co-authored-by: Melissa Vagi Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Update _ml-commons-plugin/tutorials/reranking-cross-encoder.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Melissa Vagi Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Add fingerprint processor (#7631) * Add fingerprint processor Signed-off-by: gaobinlong * Completed doc review Signed-off-by: Melissa Vagi * Update _ingest-pipelines/processors/fingerprint.md Signed-off-by: Melissa Vagi * Update _ingest-pipelines/processors/fingerprint.md Signed-off-by: Melissa Vagi * Update nav order Signed-off-by: Melissa Vagi * Update _ingest-pipelines/processors/fingerprint.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _ingest-pipelines/processors/fingerprint.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _ingest-pipelines/processors/fingerprint.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _ingest-pipelines/processors/fingerprint.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _ingest-pipelines/processors/fingerprint.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _ingest-pipelines/processors/index-processors.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --------- Signed-off-by: gaobinlong Signed-off-by: Melissa Vagi Co-authored-by: Melissa Vagi Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Removed incorrect ignore_malformed query parameter. 
(#7652) Signed-off-by: dblock Signed-off-by: leanne.laceybyrne@eliatra.com * Add geo-centroid and weighted average aggregations documentation (#7613) * Add geo-centroid and weighted average aggregations documentation Signed-off-by: Melissa Vagi * Add geocentroid content and examples Signed-off-by: Melissa Vagi * Add weighted average content and examples Signed-off-by: Melissa Vagi * Update _aggregations/metric/geocentroid.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/geocentroid.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/geocentroid.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/geocentroid.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/geocentroid.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/geocentroid.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/weighted-avg.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/geocentroid.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/weighted-avg.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/weighted-avg.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/weighted-avg.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/weighted-avg.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/geocentroid.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/weighted-avg.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/weighted-avg.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/weighted-avg.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/weighted-avg.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _aggregations/metric/geocentroid.md Signed-off-by: Melissa Vagi * Update _aggregations/metric/geocentroid.md Signed-off-by: Melissa Vagi --------- Signed-off-by: Melissa Vagi Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Setting-envars-docs #3582 (#7400) * setting-envars-docs #3582 Signed-off-by: AntonEliatra * Update index.md Signed-off-by: AntonEliatra * Apply suggestions from code review Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: AntonEliatra * Update index.md Signed-off-by: AntonEliatra --------- Signed-off-by: AntonEliatra Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: leanne.laceybyrne@eliatra.com * mention both Dashboards and endpoint (#7638) * mention both Dashboards and endpoint Old text said that Dashboards are a prerequisite for using PPL, and mentioned only the Query Workbench, not the _ppl endpoint. Is it really true that Dashboards are a prerequisite? Or is it just the SQL plugin that is a prerequisite? 
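For illustration, PPL does not require Dashboards at all: a query can be sent directly to the SQL plugin's `_plugins/_ppl` REST endpoint. This is only a sketch; the `accounts` index and its fields below are hypothetical:

```json
POST _plugins/_ppl
{
  "query": "source=accounts | where age > 30 | fields firstname, lastname"
}
```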
Signed-off-by: Stavros Macrakis <134456002+smacrakis@users.noreply.github.com> * Update index.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: Stavros Macrakis <134456002+smacrakis@users.noreply.github.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: leanne.laceybyrne@eliatra.com * Document '_name' field in 'function_score' query's function definition (#7340) * Document '_name' field in 'function_score' query function definition Signed-off-by: Łukasz Rynek * Ensure real request JSON payload Signed-off-by: Łukasz Rynek * Ensure real response JSON payload + finish the paragraph Signed-off-by: Łukasz Rynek * Add missing copy-curl tag Signed-off-by: Łukasz Rynek * Add missing article Signed-off-by: Łukasz Rynek * Apply suggestions from code review Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Łukasz Rynek <36886649+lrynek@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: Łukasz Rynek <36886649+lrynek@users.noreply.github.com> --------- Signed-off-by: Łukasz Rynek Signed-off-by: Łukasz Rynek <36886649+lrynek@users.noreply.github.com> Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Fix: the value of include_defaults is a boolean. (#7657) * Fix: the value of include_defaults is a boolean. Signed-off-by: dblock * Apply suggestions from code review Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: dblock Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: leanne.laceybyrne@eliatra.com * Update detector-visualization integration documentation to specify real-time AD results only (#7663) * Update doc to specify real-time AD results only Signed-off-by: Tyler Ohlsen * Update _observing-your-data/ad/dashboards-anomaly-detection.md Signed-off-by: Melissa Vagi * Update _observing-your-data/ad/dashboards-anomaly-detection.md Signed-off-by: Melissa Vagi --------- Signed-off-by: Tyler Ohlsen Signed-off-by: Melissa Vagi Co-authored-by: Melissa Vagi Signed-off-by: leanne.laceybyrne@eliatra.com * Fixes table in Data Prepper write_json processor (#7518) * fixtable Signed-off-by: Heather Halter * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update write_json.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: Heather Halter Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: leanne.laceybyrne@eliatra.com * Quote all alphabetic defaults. (#7660) * Quote all default is true/false. Signed-off-by: dblock * Fixed non-boolean defaults. Signed-off-by: dblock * Replaced cluster_manager node by cluster manager node. Signed-off-by: dblock * Replaced master node by cluster manager node. 
Signed-off-by: dblock * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _tuning-your-cluster/availability-and-recovery/snapshots/sm-api.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: dblock Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: leanne.laceybyrne@eliatra.com * adding basic_auth config to ldap #907 (#7671) * adding basic_auth config to ldap #907 Signed-off-by: AntonEliatra * Update ldap.md Signed-off-by: AntonEliatra * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: AntonEliatra Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: leanne.laceybyrne@eliatra.com * Remove Point in Time from Vale terms (#7679) Signed-off-by: Fanit Kolchina Signed-off-by: leanne.laceybyrne@eliatra.com * Adding DLS with write permission recommendation #1273 (#7668) * Adding DLS with write permission recommendation #1273 Signed-off-by: AntonEliatra * Update _security/access-control/document-level-security.md Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: AntonEliatra --------- Signed-off-by: AntonEliatra Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: leanne.laceybyrne@eliatra.com * Add some example results to make functionality clearer (#7686) There doesn't seem to be a detailed spec for these functions. For example, what are the arguments of substring? First and last positions? (no) First position and length? (yes) Is position 0-origin or 1-origin? (1-origin) Does it accept counting position from the end with negative arguments? (yes) I've added an example result which at least clarifies the first 3 questions. Signed-off-by: Stavros Macrakis <134456002+smacrakis@users.noreply.github.com> Signed-off-by: leanne.laceybyrne@eliatra.com * Update functions.md (#7688) Several of the functions mentioned in the SQL/PPL Functions page (https://opensearch.org/docs/latest/search-plugins/sql/functions/) are not in fact implemented in PPL. Signed-off-by: Stavros Macrakis <134456002+smacrakis@users.noreply.github.com> Signed-off-by: leanne.laceybyrne@eliatra.com * Update _security/configuration/tls.md (#7691) Removed a link to a section that referenced itself. 
Signed-off-by: Heather Halter Signed-off-by: leanne.laceybyrne@eliatra.com * Add geopolygon query (#7665) * Add geopolygon query Signed-off-by: Fanit Kolchina * Update _query-dsl/geo-and-xy/geopolygon.md Co-authored-by: Melissa Vagi Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Add link to index file Signed-off-by: Fanit Kolchina * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Melissa Vagi Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Add PR checklist workflow (#7699) * Add PR checklist workflow Signed-off-by: Fanit Kolchina * Assign to user instead of owner Signed-off-by: Fanit Kolchina * Testing Signed-off-by: Fanit Kolchina * Remove test Signed-off-by: Fanit Kolchina * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * If a PR is submitted by a doc team member, assign that member Signed-off-by: Fanit Kolchina * Remove test Signed-off-by: Fanit Kolchina --------- Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Add vector database page (#6238) * Add vector database page Signed-off-by: Fanit Kolchina * Revise wording Signed-off-by: Fanit Kolchina * Add k-NN example and address feedback Signed-off-by: Fanit Kolchina * Update _search-plugins/vector-search.md Co-authored-by: Melissa Vagi Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Update _search-plugins/vector-search.md Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Link fix Signed-off-by: Fanit Kolchina --------- Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Heather Halter Co-authored-by: Melissa Vagi Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Update CONTRIBUTING.md (#7702) * Update CONTRIBUTING.md Tweaked the wording in the troubleshooting section. 
Signed-off-by: Heather Halter * Update CONTRIBUTING.md Co-authored-by: Nathan Bower Signed-off-by: Heather Halter * Update CONTRIBUTING.md Co-authored-by: Nathan Bower Signed-off-by: Heather Halter --------- Signed-off-by: Heather Halter Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Add new update_fields parameter to update workflow API (#7632) * Add new update_fields parameter to update workflow API Signed-off-by: Daniel Widdis * Update _automating-configurations/api/create-workflow.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Daniel Widdis * Update _automating-configurations/api/create-workflow.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Daniel Widdis * Update _automating-configurations/api/create-workflow.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Daniel Widdis * Fixes from doc review Signed-off-by: Daniel Widdis * Update _automating-configurations/api/create-workflow.md Co-authored-by: Nathan Bower Signed-off-by: Daniel Widdis --------- Signed-off-by: Daniel Widdis Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Correct k-NN settings and add more (#7693) * Correct k-NN settings and add more Signed-off-by: Fanit Kolchina * Add heading Signed-off-by: Fanit Kolchina * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Add permission to write on PRs to the PR checklist workflow (#7711) Signed-off-by: Fanit Kolchina Signed-off-by: leanne.laceybyrne@eliatra.com * Remove model requirement from hybrid search documentation (#7511) * Remove model requirement from hybrid search documentation Signed-off-by: Fanit Kolchina * Review comment Signed-off-by: Fanit Kolchina * Revised sentence Signed-off-by: Fanit Kolchina * Update _search-plugins/hybrid-search.md Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Adds IntelliJ's *.iml to the .gitignore. 
(#7705) Signed-off-by: David Venable Signed-off-by: leanne.laceybyrne@eliatra.com * Fix format issue for the split ingest processor documentation (#7695) * Fix format issue for the split ingest processor documentation Signed-off-by: gaobinlong * Update _ingest-pipelines/processors/split.md Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Heather Halter * Update _ingest-pipelines/processors/split.md Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Heather Halter * Apply suggestions from code review Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Heather Halter * Update _ingest-pipelines/processors/split.md Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Heather Halter --------- Signed-off-by: gaobinlong Signed-off-by: Heather Halter Co-authored-by: Heather Halter Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: leanne.laceybyrne@eliatra.com * Update _security/configuration/demo-configuration.md Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: leanneeliatra <131779422+leanneeliatra@users.noreply.github.com> Signed-off-by: leanne.laceybyrne@eliatra.com * Improve wording for the 2 search modes of neural sparse documentation (#7718) * improve wording for ns Signed-off-by: zhichao-aws * Apply suggestions from code review Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: zhichao-aws Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: leanne.laceybyrne@eliatra.com * Update nodes-stats.md (#7721) Fixing typo Signed-off-by: Landon Lengyel Signed-off-by: leanne.laceybyrne@eliatra.com * Update PR comment workflow to use pull request target (#7723) Signed-off-by: Fanit Kolchina Signed-off-by: leanne.laceybyrne@eliatra.com * Add 1.3.18 to version history (#7726) Signed-off-by: Fanit Kolchina Signed-off-by: leanne.laceybyrne@eliatra.com * Updates PPL description (#7637) * Update index.md Updated description of PPL. Signed-off-by: Heather Halter * Update index.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update search-plugins/sql/ppl/index/ Signed-off-by: Heather Halter * Update _search-plugins/sql/ppl/index.md Co-authored-by: Nathan Bower Signed-off-by: Heather Halter --------- Signed-off-by: Heather Halter Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Redirect to `latest` when the latest version is picked from the version selector (#7759) Signed-off-by: Miki Signed-off-by: leanne.laceybyrne@eliatra.com * Fix breadcrumbs by excluding collection index pages from parent relationship (#7758) Signed-off-by: Fanit Kolchina Signed-off-by: leanne.laceybyrne@eliatra.com * Remove redundant source (#7755) Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: leanne.laceybyrne@eliatra.com * Update compare.md (#7765) The `--results-number-align` option for the compare API is actually spelled `--results-numbers-align`. This change updates the spelling for that option. 
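As a usage sketch of the corrected option, it would be passed to the `compare` command as shown below; the test execution IDs and the `right` alignment value are assumed placeholders, not taken from the change itself:

```bash
# Compare two benchmark runs, aligning the numbers in the results table
opensearch-benchmark compare --baseline=<baseline_test_execution_id> --contender=<contender_test_execution_id> --results-numbers-align=right
```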
Signed-off-by: Michael Oviedo Signed-off-by: leanne.laceybyrne@eliatra.com * Document CreateAnomalyDetectorTool (#7742) * Document CreateAnomalyDetectorTool Signed-off-by: gaobinlong * Fix format issue Signed-off-by: gaobinlong * Fix link Signed-off-by: gaobinlong * Update create-anomaly-detector.md Signed-off-by: Melissa Vagi * Update _ml-commons-plugin/agents-tools/tools/index.md Signed-off-by: Melissa Vagi * Update _ml-commons-plugin/agents-tools/tools/index.md Signed-off-by: Melissa Vagi * Update _ml-commons-plugin/agents-tools/tools/create-anomaly-detector.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _ml-commons-plugin/agents-tools/tools/create-anomaly-detector.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _ml-commons-plugin/agents-tools/tools/create-anomaly-detector.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _ml-commons-plugin/agents-tools/tools/create-anomaly-detector.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _ml-commons-plugin/agents-tools/tools/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --------- Signed-off-by: gaobinlong Signed-off-by: Melissa Vagi Co-authored-by: Melissa Vagi Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Add strict_allow_templates option for the dynamic mapping parameter (#7745) * Add strict_allow_templates option for the dynamic mapping parameter Signed-off-by: gaobinlong * Fix typo Signed-off-by: gaobinlong * Fix header Signed-off-by: gaobinlong * Update dynamic.md Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/supported-field-types/object.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/supported-field-types/object.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/dynamic.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/supported-field-types/nested.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update dynamic.md Make changes to address editorial review comments Signed-off-by: Melissa 
Vagi --------- Signed-off-by: gaobinlong Signed-off-by: Melissa Vagi Co-authored-by: Melissa Vagi Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Update per-cluster-metrics-monitors.md (#7769) Fixed typo in example. Signed-off-by: AWSHurneyt Signed-off-by: leanne.laceybyrne@eliatra.com * Update kafka.md (#7774) Fixed capitalization issue. Signed-off-by: Heather Halter Signed-off-by: leanne.laceybyrne@eliatra.com * Fix ISM error prevention setting key is not correct (#7777) Signed-off-by: gaobinlong Signed-off-by: leanne.laceybyrne@eliatra.com * Data Prepper documentation updates: autogeneration campaign (#7707) Updates Data Prepper documentation with some missing fields. Adds support for autogeneration of processors by naming to match the processor and including the autogenerated comment. Signed-off-by: David Venable Signed-off-by: David Venable Signed-off-by: Melissa Vagi Co-authored-by: Melissa Vagi Co-authored-by: Heather Halter Signed-off-by: leanne.laceybyrne@eliatra.com * Adds documentation for the Data Prepper delay processor. (#7708) Adds documentation for the delay processor. Signed-off-by: David Venable Signed-off-by: Melissa Vagi Co-authored-by: Melissa Vagi Signed-off-by: leanne.laceybyrne@eliatra.com * Update index.md (#7779) Community feedback Signed-off-by: Heather Halter Signed-off-by: leanne.laceybyrne@eliatra.com * add acronym for reference (#7786) Signed-off-by: Stavros Macrakis <134456002+smacrakis@users.noreply.github.com> Signed-off-by: leanne.laceybyrne@eliatra.com * add doc for nested_path (#7741) Signed-off-by: zhichao-aws Signed-off-by: leanne.laceybyrne@eliatra.com * Document new Split and Sort SearchResponseProcessors (#7767) * Add documentation for Sort SearchRequestProcessor Signed-off-by: Daniel Widdis * Add documentation for Split SearchRequestProcessor Signed-off-by: Daniel Widdis * Doc review Signed-off-by: Fanit Kolchina * Update _ingest-pipelines/processors/split.md Co-authored-by: Nathan Bower Signed-off-by: Daniel Widdis * Update _search-plugins/search-pipelines/sort-processor.md Co-authored-by: Nathan Bower Signed-off-by: Daniel Widdis * Update _search-plugins/search-pipelines/split-processor.md Co-authored-by: Nathan Bower Signed-off-by: Daniel Widdis * Update _search-plugins/search-pipelines/split-processor.md Co-authored-by: Nathan Bower Signed-off-by: Daniel Widdis * Update _search-plugins/search-pipelines/split-processor.md Co-authored-by: Nathan Bower Signed-off-by: Daniel Widdis * Update _search-plugins/search-pipelines/split-processor.md Co-authored-by: Nathan Bower Signed-off-by: Daniel Widdis --------- Signed-off-by: Daniel Widdis Signed-off-by: Fanit Kolchina Co-authored-by: Fanit Kolchina Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Add documentation for Deprovision Workflow API allow_delete parameter (#7639) * Add documentation for Deprovision Workflow API allow_delete parameter Signed-off-by: Daniel Widdis * Add new steps and missing delete search pipeline doc Signed-off-by: Daniel Widdis * Revert changes to workflow steps. 
Users can't use these new step types Signed-off-by: Daniel Widdis * Update _automating-configurations/api/deprovision-workflow.md Co-authored-by: Nathan Bower Signed-off-by: Daniel Widdis * Update _automating-configurations/api/deprovision-workflow.md Co-authored-by: Nathan Bower Signed-off-by: Daniel Widdis * Update _automating-configurations/api/deprovision-workflow.md Co-authored-by: Nathan Bower Signed-off-by: Daniel Widdis * Remove redundant use of workflow, accept other edits Signed-off-by: Daniel Widdis --------- Signed-off-by: Daniel Widdis Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Adds Documentation for dynamic query parameters for kNN search request (#7761) * Adds documentation for dynamic query parameters Signed-off-by: Tejas Shah * Update _search-plugins/knn/approximate-knn.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Update _search-plugins/knn/approximate-knn.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Tejas Shah * Update _search-plugins/knn/approximate-knn.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Tejas Shah * Update _search-plugins/knn/approximate-knn.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Tejas Shah * Update _search-plugins/knn/approximate-knn.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Tejas Shah * Update _search-plugins/knn/approximate-knn.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Tejas Shah * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Tejas Shah Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Add Rollover API (#7685) * Add Rollover API. Signed-off-by: Archer * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Make rollover match template. Signed-off-by: Archer * Apply suggestions from code review Co-authored-by: Melissa Vagi Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _api-reference/index-apis/rollover.md Co-authored-by: Melissa Vagi Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _api-reference/index-apis/rollover.md Co-authored-by: Melissa Vagi Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Co-authored-by: Melissa Vagi Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _api-reference/index-apis/rollover.md Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: Archer Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Melissa Vagi Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Fix liquid syntax errors. 
(#7785) * Fix liquid syntax errors. Signed-off-by: Archer * Update render-template.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _api-reference/render-template.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _api-reference/render-template.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: Archer Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: leanne.laceybyrne@eliatra.com * Explain ISM + link (#7787) * Explain ISM + link Signed-off-by: Stavros Macrakis <134456002+smacrakis@users.noreply.github.com> * Update _im-plugin/refresh-analyzer.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Stavros Macrakis <134456002+smacrakis@users.noreply.github.com> Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: leanne.laceybyrne@eliatra.com * Unify and correct geoshape GeoJSON and WKT examples (#7801) * Unify and correct geoshape GeoJSON and WKT examples Signed-off-by: Fanit Kolchina * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Documentation Updates for plugins.query.datasources.enabled SQL Setting (#7794) * Documentation Updates for plugins.query.datasources.enabled SQL Setting This setting allows users to toggle the data source code paths in the SQL plugin. 
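A minimal sketch of applying the toggle, assuming the setting is dynamic and exposed through the SQL plugin's settings endpoint (neither assumption is confirmed by the change itself):

```json
PUT _plugins/_query/settings
{
  "transient": {
    "plugins.query.datasources.enabled": false
  }
}
```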
Ref: https://github.com/opensearch-project/sql/pull/2811/files Signed-off-by: Frank Dattalo * Update _search-plugins/sql/settings.md Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Frank Dattalo Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Add the documentation of create or update alias API (#7641) * Add the documentation of create or update alias API Signed-off-by: gaobinlong * Fix typo Signed-off-by: gaobinlong * Refine the wording Signed-off-by: gaobinlong * Update update-alias.md * Fix typo Signed-off-by: gaobinlong * Add some clarification Signed-off-by: gaobinlong * Update update-alias.md * Update update-alias.md * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _api-reference/index-apis/update-alias.md Signed-off-by: Nathan Bower --------- Signed-off-by: gaobinlong Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Nathan Bower Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Nathan Bower Signed-off-by: leanne.laceybyrne@eliatra.com * Update to Mac section Signed-off-by: leanne.laceybyrne@eliatra.com * Update to mac section Signed-off-by: leanne.laceybyrne@eliatra.com * changing & to and Signed-off-by: leanne.laceybyrne@eliatra.com --------- Signed-off-by: leanne.laceybyrne@eliatra.com Signed-off-by: yujin-emma Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Yu Jin <112784385+yujin-emma@users.noreply.github.com> Signed-off-by: Melissa Vagi Signed-off-by: Chengwu Shi Signed-off-by: chengwushi-netapp <153049940+chengwushi-netapp@users.noreply.github.com> Signed-off-by: Utkarsh Agarwal Signed-off-by: Amit Galitzky Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Sooraj Sinha Signed-off-by: Rishabh Maurya Signed-off-by: Fanit Kolchina Signed-off-by: conggguan Signed-off-by: Navneet Verma Signed-off-by: Heather Halter Signed-off-by: Kiran Prakash Signed-off-by: Kaituo Li Signed-off-by: bowenlan-amzn Signed-off-by: Michael Froh Signed-off-by: Chenyang Ji Signed-off-by: Heemin Kim Signed-off-by: Gaurav Bafna Signed-off-by: Gaurav Bafna <85113518+gbbafna@users.noreply.github.com> Signed-off-by: Dennis Toepker Signed-off-by: toepkerd <120457569+toepkerd@users.noreply.github.com> Signed-off-by: YANGDB Signed-off-by: Simeon Widdis Signed-off-by: Yaliang Wu Signed-off-by: Riya Saxena Signed-off-by: Liyun Xiu Signed-off-by: Bhavana Ramaram Signed-off-by: Jing Zhang Signed-off-by: Kondaka Signed-off-by: David Venable Signed-off-by: AntonEliatra Signed-off-by: Taylor Gray Signed-off-by: Taylor Gray Signed-off-by: leanneeliatra <131779422+leanneeliatra@users.noreply.github.com> Signed-off-by: Jay Deng Signed-off-by: Don Fox Signed-off-by: Thomas Wing Signed-off-by: Archer Signed-off-by: Harsha Vamsi Kalluri Signed-off-by: Andrew Ross Signed-off-by: Stavros Macrakis <134456002+smacrakis@users.noreply.github.com> Signed-off-by: leedonggyu Signed-off-by: Heather Halter Signed-off-by: Hai Yan Signed-off-by: dblock Signed-off-by: c-neto Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> Signed-off-by: Eric Pugh Signed-off-by: Landon Lengyel Signed-off-by: gaobinlong 
Signed-off-by: Łukasz Rynek Signed-off-by: Łukasz Rynek <36886649+lrynek@users.noreply.github.com> Signed-off-by: Tyler Ohlsen Signed-off-by: Daniel Widdis Signed-off-by: zhichao-aws Signed-off-by: Miki Signed-off-by: Michael Oviedo Signed-off-by: AWSHurneyt Signed-off-by: Tejas Shah Signed-off-by: Frank Dattalo Signed-off-by: Nathan Bower Co-authored-by: Yu Jin <112784385+yujin-emma@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Melissa Vagi Co-authored-by: Nathan Bower Co-authored-by: chengwushi-netapp <153049940+chengwushi-netapp@users.noreply.github.com> Co-authored-by: Utkarsh Agarwal <126544832+Utkarsh-Aga@users.noreply.github.com> Co-authored-by: Utkarsh Agarwal Co-authored-by: Amit Galitzky Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Sooraj Sinha <81695996+soosinha@users.noreply.github.com> Co-authored-by: Rishabh Maurya Co-authored-by: Fanit Kolchina Co-authored-by: conggguan <157357330+conggguan@users.noreply.github.com> Co-authored-by: Navneet Verma Co-authored-by: Heather Halter Co-authored-by: Kiran Prakash Co-authored-by: Kaituo Li Co-authored-by: bowenlan-amzn Co-authored-by: Michael Froh Co-authored-by: Chenyang Ji Co-authored-by: Heemin Kim Co-authored-by: Gaurav Bafna <85113518+gbbafna@users.noreply.github.com> Co-authored-by: Bhumika Saini Co-authored-by: toepkerd <120457569+toepkerd@users.noreply.github.com> Co-authored-by: Dennis Toepker Co-authored-by: YANGDB Co-authored-by: Simeon Widdis Co-authored-by: Yaliang Wu Co-authored-by: Riya <69919272+riysaxen-amzn@users.noreply.github.com> Co-authored-by: Liyun Xiu Co-authored-by: Bhavana Ramaram Co-authored-by: Jing Zhang Co-authored-by: Krishna Kondaka <41027584+kkondaka@users.noreply.github.com> Co-authored-by: David Venable Co-authored-by: AntonEliatra Co-authored-by: Taylor Gray Co-authored-by: Jay Deng Co-authored-by: Don Fox Co-authored-by: Thomas Wing Co-authored-by: Harsha Vamsi Kalluri Co-authored-by: Andrew Ross Co-authored-by: Stavros Macrakis <134456002+smacrakis@users.noreply.github.com> Co-authored-by: leedonggyu Co-authored-by: Hai Yan <8153134+oeyh@users.noreply.github.com> Co-authored-by: Huy Nguyen <73027756+huyaboo@users.noreply.github.com> Co-authored-by: Daniel (dB.) Doubrovkine Co-authored-by: Carlos Neto Co-authored-by: RasonJ <145287540+RasonJ@users.noreply.github.com> Co-authored-by: Eric Pugh Co-authored-by: Landon Lengyel Co-authored-by: gaobinlong Co-authored-by: Łukasz Rynek <36886649+lrynek@users.noreply.github.com> Co-authored-by: Tyler Ohlsen Co-authored-by: Daniel Widdis Co-authored-by: zhichao-aws Co-authored-by: Miki Co-authored-by: Michael Oviedo Co-authored-by: AWSHurneyt Co-authored-by: Tejas Shah Co-authored-by: Frank Dattalo <73919354+fddattal@users.noreply.github.com> --- _security/configuration/demo-configuration.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/_security/configuration/demo-configuration.md b/_security/configuration/demo-configuration.md index 33794b0d67..be188169ad 100644 --- a/_security/configuration/demo-configuration.md +++ b/_security/configuration/demo-configuration.md @@ -9,7 +9,6 @@ nav_order: 4 Welcome to the OpenSearch Security plugin demo configuration setup guide. This tool provides a quick and easy way to replicate a production environment for testing purposes. 
The demo configuration includes the setup of security-related components, such as internal users, roles, role mappings, audit configuration, basic authentication, tenants, and allow lists. - The demo configuration tool performs the following tasks: 1. Configures security settings, which are then loaded into the security index. @@ -49,9 +48,10 @@ If you want to disable the Security plugin when using Docker, set the `DISABLE_S - One digit [0--9] - One special character -4. Run `docker-compose up`. +4. Make sure that Docker is running on your local machine. +5. Run `docker-compose up` from the file directory where your `docker-compose.yml` file and `.env` file are located. -### TAR (Linux) +### TAR (Linux) and macOS For TAR distributions on Linux, download the Linux setup files from the OpenSearch [Download & Get Started](https://opensearch.org/downloads.html) page. Then use the following command to run the demo configuration: From c561f321801e19360ee4f1c7dda04424b4811d13 Mon Sep 17 00:00:00 2001 From: Dan Cristian Cecoi Date: Wed, 24 Jul 2024 18:23:02 +0100 Subject: [PATCH 061/154] Add documentation for configuring the password hashing algorithm and its properties (#7697) * Add documentation for configuring the password hashing algorithms and their properties Signed-off-by: Dan Cecoi * Small change to the warning message Signed-off-by: Dan Cecoi * Modified the warning message and its placement Signed-off-by: Dan Cecoi * modified the bcrypt.rounds explanation Signed-off-by: Dan Cecoi * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: Dan Cecoi Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Dan Cecoi Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Nathan Bower --- .../security-settings.md | 35 +++++++++++++++++++ 1 file changed, 35 insertions(+) diff --git a/_install-and-configure/configuring-opensearch/security-settings.md b/_install-and-configure/configuring-opensearch/security-settings.md index ffdad36cb3..b9c375d208 100644 --- a/_install-and-configure/configuring-opensearch/security-settings.md +++ b/_install-and-configure/configuring-opensearch/security-settings.md @@ -122,6 +122,41 @@ The Security plugin supports the following expert-level settings: - `plugins.security.check_snapshot_restore_write_privileges` (Static): Enforces write privilege evaluation when creating snapshots. Default is `true`. +If you change any of the following password hashing properties, you must rehash all internal passwords to ensure compatibility and security. +{: .warning} + +- `plugins.security.password.hashing.algorithm` (Static): Specifies the password hashing algorithm to use. + + Valid values are: + + - `BCrypt` (Default) + - `PBKDF2` + +- `plugins.security.password.hashing.bcrypt.rounds` (Static): Specifies the number of rounds to use for password hashing with `BCrypt`. Valid values are between `4` and `31`, inclusive. Default is `12`. + +- `plugins.security.password.hashing.bcrypt.minor` (Static): Specifies the minor version of the `BCrypt` algorithm to use for password hashing. + + Valid values are: + + - `A` + - `B` + - `Y` (Default) + +- `plugins.security.password.hashing.pbkdf2.function` (Static): Specifies the pseudo-random function applied to the password. 
+ + Valid values are: + + - `SHA1` + - `SHA224` + - `SHA256` (Default) + - `SHA384` + - `SHA512` + +- `plugins.security.password.hashing.pbkdf2.iterations` (Static): Specifies the number of times that the pseudo-random function is applied to the password. Default is `600,000`. + +- `plugins.security.password.hashing.pbkdf2.length` (Static): Specifies the desired length of the final derived key. Default is `256`. + + ## Audit log settings The Security plugin supports the following audit log settings: From 279237d3e4bebcf16173a3ad03b2ac151d68c2ac Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Wed, 24 Jul 2024 14:11:51 -0500 Subject: [PATCH 062/154] [Proposal] Update API template (#7709) * [Proposal] Update API template Updates the API template to allow for more dynamic formatting. Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update API_TEMPLATE.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update API_TEMPLATE.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Add H3 level examples. Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Make APIs conform to template Signed-off-by: Archer * Fix cluster API templates Signed-off-by: Archer * Fix link Signed-off-by: Archer --------- Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Archer --- API_STYLE_GUIDE.md | 4 +- _api-reference/analyze-apis.md | 2 +- _api-reference/cat/cat-aliases.md | 43 +++--- _api-reference/cat/cat-allocation.md | 43 +++--- _api-reference/cat/cat-cluster_manager.md | 15 +- _api-reference/cat/cat-count.md | 28 ++-- _api-reference/cat/cat-field-data.md | 43 +++--- _api-reference/cat/cat-health.md | 16 +- _api-reference/cat/cat-indices.md | 43 +++--- _api-reference/cat/cat-nodeattrs.md | 17 ++- _api-reference/cat/cat-nodes.md | 17 ++- _api-reference/cat/cat-pending-tasks.md | 17 ++- _api-reference/cat/cat-plugins.md | 17 ++- _api-reference/cat/cat-recovery.md | 43 +++--- _api-reference/cat/cat-repositories.md | 18 ++- _api-reference/cat/cat-segment-replication.md | 16 +- _api-reference/cat/cat-segments.md | 40 ++--- _api-reference/cat/cat-shards.md | 44 +++--- _api-reference/cat/cat-snapshots.md | 17 ++- _api-reference/cat/cat-tasks.md | 18 ++- _api-reference/cat/cat-templates.md | 31 ++-- _api-reference/cat/cat-thread-pool.md | 41 ++--- .../cluster-api/cluster-allocation.md | 25 +-- .../cluster-api/cluster-decommission.md | 4 +- _api-reference/cluster-api/cluster-health.md | 25 +-- .../cluster-api/cluster-settings.md | 42 +++--- _api-reference/cluster-api/cluster-stats.md | 17 ++- _api-reference/document-apis/bulk.md | 2 +- .../document-apis/delete-by-query.md | 2 +- .../document-apis/delete-document.md | 2 +- _api-reference/document-apis/get-documents.md | 2 +- .../document-apis/index-document.md | 2 +- _api-reference/document-apis/multi-get.md | 15 +- _api-reference/document-apis/reindex.md | 2 +- .../document-apis/update-by-query.md | 2 +- .../document-apis/update-document.md | 2 +- _api-reference/index-apis/alias.md | 2 +- .../index-apis/clear-index-cache.md | 12 +- 
_api-reference/index-apis/clone.md | 20 ++- _api-reference/index-apis/close-index.md | 2 +- _api-reference/index-apis/create-index.md | 2 +- _api-reference/index-apis/delete-index.md | 2 +- _api-reference/index-apis/exists.md | 2 +- _api-reference/index-apis/force-merge.md | 26 +++- _api-reference/index-apis/get-index.md | 2 +- _api-reference/index-apis/get-settings.md | 2 +- _api-reference/index-apis/open-index.md | 2 +- _api-reference/index-apis/split.md | 2 +- _api-reference/index-apis/stats.md | 30 +++- .../nodes-apis/nodes-hot-threads.md | 6 +- _api-reference/nodes-apis/nodes-info.md | 4 +- .../nodes-apis/nodes-reload-secure.md | 4 +- _api-reference/nodes-apis/nodes-stats.md | 4 +- _api-reference/nodes-apis/nodes-usage.md | 4 +- _api-reference/profile.md | 2 +- _api-reference/rank-eval.md | 4 +- .../script-apis/create-stored-script.md | 4 +- _api-reference/script-apis/delete-script.md | 4 +- _api-reference/script-apis/exec-script.md | 4 +- .../script-apis/exec-stored-script.md | 2 +- .../script-apis/get-script-contexts.md | 4 +- .../script-apis/get-script-language.md | 4 +- .../script-apis/get-stored-script.md | 4 +- _api-reference/scroll.md | 2 +- _api-reference/snapshots/create-repository.md | 6 +- _api-reference/snapshots/create-snapshot.md | 4 +- .../snapshots/delete-snapshot-repository.md | 4 +- _api-reference/snapshots/delete-snapshot.md | 4 +- .../snapshots/get-snapshot-repository.md | 4 +- .../snapshots/get-snapshot-status.md | 4 +- _api-reference/snapshots/get-snapshot.md | 4 +- _api-reference/snapshots/restore-snapshot.md | 4 +- .../snapshots/verify-snapshot-repository.md | 4 +- _api-reference/tasks.md | 142 +++--------------- templates/API_TEMPLATE.md | 36 ++++- 75 files changed, 552 insertions(+), 543 deletions(-) diff --git a/API_STYLE_GUIDE.md b/API_STYLE_GUIDE.md index 6dc40df017..a6e0551f17 100644 --- a/API_STYLE_GUIDE.md +++ b/API_STYLE_GUIDE.md @@ -115,7 +115,7 @@ Include a table with these columns: Field | Data type | Description :--- | :--- | :--- -#### Example request +## Example request Provide a sentence that describes what is shown in the example, followed by a cut-and-paste-ready API request in JSON format. Make sure that you test the request yourself in the Dashboards Dev Tools console to make sure it works. See the following examples. @@ -139,7 +139,7 @@ POST _reindex } ``` -#### Example response +## Example response Include a JSON example response to show what the API returns. See the following examples. diff --git a/_api-reference/analyze-apis.md b/_api-reference/analyze-apis.md index 10af71c1ad..ac8e9e249f 100644 --- a/_api-reference/analyze-apis.md +++ b/_api-reference/analyze-apis.md @@ -61,7 +61,7 @@ Field | Data type | Description :--- | :--- | :--- text | String or Array of Strings | Text to analyze. If you provide an array of strings, the text is analyzed as a multi-value field. -#### Example requests +## Example requests [Analyze array of text strings](#analyze-array-of-text-strings) diff --git a/_api-reference/cat/cat-aliases.md b/_api-reference/cat/cat-aliases.md index b0c2d7184e..2d5c5c300a 100644 --- a/_api-reference/cat/cat-aliases.md +++ b/_api-reference/cat/cat-aliases.md @@ -15,26 +15,6 @@ has_children: false The CAT aliases operation lists the mapping of aliases to indexes, plus routing and filtering information. 
-## Example - -```json -GET _cat/aliases?v -``` -{% include copy-curl.html %} - -To limit the information to a specific alias, add the alias name after your query: - -```json -GET _cat/aliases/?v -``` -{% include copy-curl.html %} - -If you want to get information for more than one alias, separate the alias names with commas: - -```json -GET _cat/aliases/alias1,alias2,alias3 -``` -{% include copy-curl.html %} ## Path and HTTP methods @@ -55,7 +35,28 @@ Parameter | Type | Description local | Boolean | Whether to return information from the local node only instead of from the cluster manager node. Default is `false`. expand_wildcards | Enum | Expands wildcard expressions to concrete indexes. Combine multiple values with commas. Supported values are `all`, `open`, `closed`, `hidden`, and `none`. Default is `open`. -## Response +## Example requests + +```json +GET _cat/aliases?v +``` +{% include copy-curl.html %} + +To limit the information to a specific alias, add the alias name after your query: + +```json +GET _cat/aliases/?v +``` +{% include copy-curl.html %} + +If you want to get information for more than one alias, separate the alias names with commas: + +```json +GET _cat/aliases/alias1,alias2,alias3 +``` +{% include copy-curl.html %} + +## Example response The following response shows that `alias1` refers to a `movies` index and has a configured filter: diff --git a/_api-reference/cat/cat-allocation.md b/_api-reference/cat/cat-allocation.md index 23ebed79ff..085a755dc1 100644 --- a/_api-reference/cat/cat-allocation.md +++ b/_api-reference/cat/cat-allocation.md @@ -14,26 +14,6 @@ has_children: false The CAT allocation operation lists the allocation of disk space for indexes and the number of shards on each node. -## Example - -```json -GET _cat/allocation?v -``` -{% include copy-curl.html %} - -To limit the information to a specific node, add the node name after your query: - -```json -GET _cat/allocation/ -``` -{% include copy-curl.html %} - -If you want to get information for more than one node, separate the node names with commas: - -```json -GET _cat/allocation/node_name_1,node_name_2,node_name_3 -``` -{% include copy-curl.html %} ## Path and HTTP methods @@ -54,7 +34,28 @@ bytes | Byte size | Specify the units for byte size. For example, `7kb` or `6gb` local | Boolean | Whether to return information from the local node only instead of from the cluster manager node. Default is `false`. cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster manager node. Default is 30 seconds. -## Response +## Example requests + +```json +GET _cat/allocation?v +``` +{% include copy-curl.html %} + +To limit the information to a specific node, add the node name after your query: + +```json +GET _cat/allocation/ +``` +{% include copy-curl.html %} + +If you want to get information for more than one node, separate the node names with commas: + +```json +GET _cat/allocation/node_name_1,node_name_2,node_name_3 +``` +{% include copy-curl.html %} + +## Example response The following response shows that eight shards are allocated to each of the two nodes available: diff --git a/_api-reference/cat/cat-cluster_manager.md b/_api-reference/cat/cat-cluster_manager.md index abf204ce16..d81e334009 100644 --- a/_api-reference/cat/cat-cluster_manager.md +++ b/_api-reference/cat/cat-cluster_manager.md @@ -14,12 +14,6 @@ has_children: false The CAT cluster manager operation lists information that helps identify the elected cluster manager node. 
-## Example - -``` -GET _cat/cluster_manager?v -``` -{% include copy-curl.html %} ## Path and HTTP methods @@ -37,7 +31,14 @@ Parameter | Type | Description :--- | :--- | :--- cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster manager node. Default is 30 seconds. -## Response +## Example requests + +``` +GET _cat/cluster_manager?v +``` +{% include copy-curl.html %} + +## Example response ```json id | host | ip | node diff --git a/_api-reference/cat/cat-count.md b/_api-reference/cat/cat-count.md index 34baa04dd4..8d0b4fbad2 100644 --- a/_api-reference/cat/cat-count.md +++ b/_api-reference/cat/cat-count.md @@ -15,7 +15,19 @@ redirect_from: The CAT count operation lists the number of documents in your cluster. -## Example + +## Path and HTTP methods + +``` +GET _cat/count?v +GET _cat/count/?v +``` + +## URL parameters + +All CAT count URL parameters are optional. You can specify any of the [common URL parameters]({{site.url}}{{site.baseurl}}/api-reference/cat/index). + +## Example requests ```json GET _cat/count?v @@ -36,19 +48,7 @@ GET _cat/count/index_or_alias_1,index_or_alias_2,index_or_alias_3 ``` {% include copy-curl.html %} -## Path and HTTP methods - -``` -GET _cat/count?v -GET _cat/count/?v -``` - -## URL parameters - -All CAT count URL parameters are optional. You can specify any of the [common URL parameters]({{site.url}}{{site.baseurl}}/api-reference/cat/index). - - -## Response +## Example response The following response shows the overall document count as 1625: diff --git a/_api-reference/cat/cat-field-data.md b/_api-reference/cat/cat-field-data.md index 6481e5cea1..05c720b952 100644 --- a/_api-reference/cat/cat-field-data.md +++ b/_api-reference/cat/cat-field-data.md @@ -8,13 +8,31 @@ redirect_from: - /opensearch/rest-api/cat/cat-field-data/ --- -# CAT fielddata +# CAT Field Data **Introduced 1.0** {: .label .label-purple } -The CAT fielddata operation lists the memory size used by each field per node. +The CAT Field Data operation lists the memory size used by each field per node. -## Example + +## Path and HTTP methods + +``` +GET _cat/fielddata?v +GET _cat/fielddata/?v +``` + +## URL parameters + +All CAT fielddata URL parameters are optional. + +In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-reference/cat/index), you can specify the following parameter: + +Parameter | Type | Description +:--- | :--- | :--- +bytes | Byte size | Specify the units for byte size. For example, `7kb` or `6gb`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/). + +## Example requests ```json GET _cat/fielddata?v @@ -35,24 +53,7 @@ GET _cat/fielddata/field_name_1,field_name_2,field_name_3 ``` {% include copy-curl.html %} -## Path and HTTP methods - -``` -GET _cat/fielddata?v -GET _cat/fielddata/?v -``` - -## URL parameters - -All CAT fielddata URL parameters are optional. - -In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-reference/cat/index), you can specify the following parameter: - -Parameter | Type | Description -:--- | :--- | :--- -bytes | Byte size | Specify the units for byte size. For example, `7kb` or `6gb`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/). 
- -## Response +## Example response The following response shows the memory size for all fields as 284 bytes: diff --git a/_api-reference/cat/cat-health.md b/_api-reference/cat/cat-health.md index 7767cfbc46..1c400916ad 100644 --- a/_api-reference/cat/cat-health.md +++ b/_api-reference/cat/cat-health.md @@ -15,12 +15,6 @@ redirect_from: The CAT health operation lists the status of the cluster, how long the cluster has been up, the number of nodes, and other useful information that helps you analyze the health of your cluster. -## Example - -```json -GET _cat/health?v -``` -{% include copy-curl.html %} ## Path and HTTP methods @@ -38,7 +32,15 @@ Parameter | Type | Description time | Time | Specify the units for time. For example, `5d` or `7h`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/). ts | Boolean | If true, returns HH:MM:SS and Unix epoch timestamps. Default is `true`. -## Response +## Example request + +The following example request give cluster health information for the past 5 days: + +```json +GET _cat/health?v&time=5d +``` + +## Example response ```json GET _cat/health?v&time=5d diff --git a/_api-reference/cat/cat-indices.md b/_api-reference/cat/cat-indices.md index fe9556899e..16c57e5791 100644 --- a/_api-reference/cat/cat-indices.md +++ b/_api-reference/cat/cat-indices.md @@ -14,26 +14,6 @@ redirect_from: The CAT indices operation lists information related to indexes, that is, how much disk space they are using, how many shards they have, their health status, and so on. -## Example - -``` -GET _cat/indices?v -``` -{% include copy-curl.html %} - -To limit the information to a specific index, add the index name after your query. - -``` -GET _cat/indices/?v -``` -{% include copy-curl.html %} - -If you want to get information for more than one index, separate the indexes with commas: - -```json -GET _cat/indices/index1,index2,index3 -``` -{% include copy-curl.html %} ## Path and HTTP methods @@ -58,8 +38,29 @@ pri | Boolean | Whether to return information only from the primary shards. Defa time | Time | Specify the units for time. For example, `5d` or `7h`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/). expand_wildcards | Enum | Expands wildcard expressions to concrete indexes. Combine multiple values with commas. Supported values are `all`, `open`, `closed`, `hidden`, and `none`. Default is `open`. +## Example requests + +``` +GET _cat/indices?v +``` +{% include copy-curl.html %} + +To limit the information to a specific index, add the index name after your query. + +``` +GET _cat/indices/?v +``` +{% include copy-curl.html %} + +If you want to get information for more than one index, separate the indexes with commas: + +```json +GET _cat/indices/index1,index2,index3 +``` +{% include copy-curl.html %} + -## Response +## Example response ```json health | status | index | uuid | pri | rep | docs.count | docs.deleted | store.size | pri.store.size diff --git a/_api-reference/cat/cat-nodeattrs.md b/_api-reference/cat/cat-nodeattrs.md index 6b4cc6d92e..b09e164698 100644 --- a/_api-reference/cat/cat-nodeattrs.md +++ b/_api-reference/cat/cat-nodeattrs.md @@ -14,12 +14,6 @@ redirect_from: The CAT nodeattrs operation lists the attributes of custom nodes. 
-## Example - -``` -GET _cat/nodeattrs?v -``` -{% include copy-curl.html %} ## Path and HTTP methods @@ -38,8 +32,17 @@ Parameter | Type | Description local | Boolean | Whether to return information from the local node only instead of from the cluster manager node. Default is `false`. cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster manager node. Default is 30 seconds. +## Example request + +The following example request returns attributes about custom nodes: + +``` +GET _cat/nodeattrs?v +``` +{% include copy-curl.html %} + -## Response +## Example response ```json node | host | ip | attr | value diff --git a/_api-reference/cat/cat-nodes.md b/_api-reference/cat/cat-nodes.md index 864e5dfdd5..5e7238a0d0 100644 --- a/_api-reference/cat/cat-nodes.md +++ b/_api-reference/cat/cat-nodes.md @@ -16,12 +16,6 @@ The CAT nodes operation lists node-level information, including node roles and l A few important node metrics are `pid`, `name`, `cluster_manager`, `ip`, `port`, `version`, `build`, `jdk`, along with `disk`, `heap`, `ram`, and `file_desc`. -## Example - -``` -GET _cat/nodes?v -``` -{% include copy-curl.html %} ## Path and HTTP methods @@ -43,8 +37,17 @@ cluster_manager_timeout | Time | The amount of time to wait for a connection to time | Time | Specify the units for time. For example, `5d` or `7h`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/). include_unloaded_segments | Boolean | Whether to include information from segments not loaded into memory. Default is `false`. +## Example request + +The following example request lists node level information: + +``` +GET _cat/nodes?v +``` +{% include copy-curl.html %} + -## Response +## Example response ```json ip | heap.percent | ram.percent | cpu load_1m | load_5m | load_15m | node.role | node.roles | cluster_manager | name diff --git a/_api-reference/cat/cat-pending-tasks.md b/_api-reference/cat/cat-pending-tasks.md index 748defd06e..ea224670ac 100644 --- a/_api-reference/cat/cat-pending-tasks.md +++ b/_api-reference/cat/cat-pending-tasks.md @@ -15,12 +15,6 @@ redirect_from: The CAT pending tasks operation lists the progress of all pending tasks, including task priority and time in queue. -## Example - -``` -GET _cat/pending_tasks?v -``` -{% include copy-curl.html %} ## Path and HTTP methods @@ -40,7 +34,16 @@ local | Boolean | Whether to return information from the local node only instead cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster manager node. Default is 30 seconds. time | Time | Specify the units for time. For example, `5d` or `7h`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/). -## Response +## Example request + +The following example request lists the progress of all pending node tasks: + +``` +GET _cat/pending_tasks?v +``` +{% include copy-curl.html %} + +## Example response ```json insertOrder | timeInQueue | priority | source diff --git a/_api-reference/cat/cat-plugins.md b/_api-reference/cat/cat-plugins.md index 519c77f27f..358eb70fbf 100644 --- a/_api-reference/cat/cat-plugins.md +++ b/_api-reference/cat/cat-plugins.md @@ -15,12 +15,6 @@ redirect_from: The CAT plugins operation lists the names, components, and versions of the installed plugins. 
-## Example - -``` -GET _cat/plugins?v -``` -{% include copy-curl.html %} ## Path and HTTP methods @@ -39,7 +33,16 @@ Parameter | Type | Description local | Boolean | Whether to return information from the local node only instead of from the cluster manager node. Default is `false`. cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster manager node. Default is 30 seconds. -## Response +## Example requests + +The following example request lists all installed plugins: + +``` +GET _cat/plugins?v +``` +{% include copy-curl.html %} + +## Example response ```json name component version diff --git a/_api-reference/cat/cat-recovery.md b/_api-reference/cat/cat-recovery.md index da66aa7272..8f251a94e0 100644 --- a/_api-reference/cat/cat-recovery.md +++ b/_api-reference/cat/cat-recovery.md @@ -15,26 +15,6 @@ redirect_from: The CAT recovery operation lists all completed and ongoing index and shard recoveries. -## Example - -``` -GET _cat/recovery?v -``` -{% include copy-curl.html %} - -To see only the recoveries of a specific index, add the index name after your query. - -``` -GET _cat/recovery/?v -``` -{% include copy-curl.html %} - -If you want to get information for more than one index, separate the indexes with commas: - -```json -GET _cat/recovery/index1,index2,index3 -``` -{% include copy-curl.html %} ## Path and HTTP methods @@ -55,7 +35,28 @@ bytes | Byte size | Specify the units for byte size. For example, `7kb` or `6gb` detailed | Boolean | Whether to include detailed information about shard recoveries. Default is `false`. time | Time | Specify the units for time. For example, `5d` or `7h`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/). -## Response +## Example requests + +``` +GET _cat/recovery?v +``` +{% include copy-curl.html %} + +To see only the recoveries of a specific index, add the index name after your query. + +``` +GET _cat/recovery/?v +``` +{% include copy-curl.html %} + +If you want to get information for more than one index, separate the indexes with commas: + +```json +GET _cat/recovery/index1,index2,index3 +``` +{% include copy-curl.html %} + +## Example response ```json index | shard | time | type | stage | source_host | source_node | target_host | target_node | repository | snapshot | files | files_recovered | files_percent | files_total | bytes | bytes_recovered | bytes_percent | bytes_total | translog_ops | translog_ops_recovered | translog_ops_percent diff --git a/_api-reference/cat/cat-repositories.md b/_api-reference/cat/cat-repositories.md index c6d62c9c62..f0fc4bb622 100644 --- a/_api-reference/cat/cat-repositories.md +++ b/_api-reference/cat/cat-repositories.md @@ -15,13 +15,6 @@ redirect_from: The CAT repositories operation lists all snapshot repositories for a cluster. -## Example - -``` -GET _cat/repositories?v -``` -{% include copy-curl.html %} - ## Path and HTTP methods ``` @@ -39,8 +32,17 @@ Parameter | Type | Description local | Boolean | Whether to return information from the local node only instead of from the cluster manager node. Default is `false`. cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster manager node. Default is 30 seconds. 
+## Example request + +The following example request lists all snapshot repositories in the cluster: + +``` +GET _cat/repositories?v +``` +{% include copy-curl.html %} + -## Response +## Example response ```json id type diff --git a/_api-reference/cat/cat-segment-replication.md b/_api-reference/cat/cat-segment-replication.md index e22012ea66..5900b97a7c 100644 --- a/_api-reference/cat/cat-segment-replication.md +++ b/_api-reference/cat/cat-segment-replication.md @@ -47,11 +47,11 @@ Parameter | Data type | Description `v` | Boolean | If `true`, the response includes column headings. Defaults to `false`. `s` | String | Specifies to sort the results. For example, `s=shardId:desc` sorts by shardId in descending order. -## Example +## Example requests The following examples illustrate various segment replication responses. -#### Example 1: No active segment replication events +### No active segment replication events The following query requests segment replication metrics with column headings for all indexes: @@ -67,7 +67,7 @@ shardId target_node target_host checkpoints_behind bytes_behind current_lag last [index-1][0] runTask-1 127.0.0.1 0 0b 0s 7ms 0 ``` -#### Example 2: Shard ID specified +### Shard ID specified The following query requests segment replication metrics with column headings for shards with the ID `0` from indexes `index1` and `index2`: @@ -84,7 +84,7 @@ shardId target_node target_host checkpoints_behind bytes_behind current_lag last [index-2][0] runTask-1 127.0.0.1 0 0b 0s 5ms 0 ``` -#### Example 3: Detailed response +### Detailed response The following query requests detailed segment replication metrics with column headings for all indexes: @@ -101,7 +101,7 @@ shardId target_node target_host checkpoints_behind bytes_behind current_lag last [index-2][0] runTask-1 127.0.0.1 0 0b 0s 5ms 0 done 7ms 3 100.0% 3664 100.0% 2023-03-16T13:53:33.466Z 2023-03-16T13:53:33.474Z 3 3 3.5kb 3.5kb 0s 1ms 0s 2ms 2ms ``` -#### Example 4: Sorting the results +### Sorting the results The following query requests segment replication metrics with column headings for all indexes, sorted by shard ID in descending order: @@ -118,7 +118,7 @@ shardId target_node target_host checkpoints_behind bytes_behind current_lag [test6][0] runTask-2 127.0.0.1 0 0b 0s 4ms 0 ``` -#### Example 5: Using a metric alias +### Using a metric alias In a request, you can either use a metric's full name or one of its aliases. The following query is the same as the preceding query, but it uses the alias `s` instead of `shardID` for sorting: @@ -127,9 +127,9 @@ GET /_cat/segment_replication?v&s=s:desc ``` {% include copy-curl.html %} -## Response metrics +## Example response metrics -The following table lists the response metrics that are returned for all requests. When referring to a metric in a query parameter, you can provide either the metric's full name or any of its aliases, as shown in the previous [example](#example-5-using-a-metric-alias). +The following table lists the response metrics that are returned for all requests. When referring to a metric in a query parameter, you can provide either the metric's full name or any of its aliases, as shown in the previous [example](#using-a-metric-alias). 
Metric | Alias | Description :--- | :--- | :--- diff --git a/_api-reference/cat/cat-segments.md b/_api-reference/cat/cat-segments.md index b860486692..cd9eda38be 100644 --- a/_api-reference/cat/cat-segments.md +++ b/_api-reference/cat/cat-segments.md @@ -15,7 +15,25 @@ redirect_from: The cat segments operation lists Lucene segment-level information for each index. -## Example + +## Path and HTTP methods + +``` +GET _cat/segments +``` + +## URL parameters + +All CAT segments URL parameters are optional. + +In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-reference/cat/index), you can specify the following parameters: + +Parameter | Type | Description +:--- | :--- | :--- +bytes | Byte size | Specify the units for byte size. For example, `7kb` or `6gb`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/).. +cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster manager node. Default is 30 seconds. + +## Example requests ``` GET _cat/segments?v @@ -36,25 +54,7 @@ GET _cat/segments/index1,index2,index3 ``` {% include copy-curl.html %} -## Path and HTTP methods - -``` -GET _cat/segments -``` - -## URL parameters - -All CAT segments URL parameters are optional. - -In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-reference/cat/index), you can specify the following parameters: - -Parameter | Type | Description -:--- | :--- | :--- -bytes | Byte size | Specify the units for byte size. For example, `7kb` or `6gb`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/).. -cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster manager node. Default is 30 seconds. - - -## Response +## Example response ```json index | shard | prirep | ip | segment | generation | docs.count | docs.deleted | size | size.memory | committed | searchable | version | compound diff --git a/_api-reference/cat/cat-shards.md b/_api-reference/cat/cat-shards.md index 9a727b5b11..b07f11aca3 100644 --- a/_api-reference/cat/cat-shards.md +++ b/_api-reference/cat/cat-shards.md @@ -15,26 +15,6 @@ redirect_from: The CAT shards operation lists the state of all primary and replica shards and how they are distributed. -## Example - -``` -GET _cat/shards?v -``` -{% include copy-curl.html %} - -To see only the information about shards of a specific index, add the index name after your query. - -``` -GET _cat/shards/?v -``` -{% include copy-curl.html %} - -If you want to get information for more than one index, separate the indexes with commas: - -``` -GET _cat/shards/index1,index2,index3 -``` -{% include copy-curl.html %} ## Path and HTTP methods @@ -55,8 +35,30 @@ local | Boolean | Whether to return information from the local node only instead cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster manager node. Default is 30 seconds. time | Time | Specify the units for time. For example, `5d` or `7h`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/). +## Example requests + +The following example requests returns information about shards: + +``` +GET _cat/shards?v +``` +{% include copy-curl.html %} + +To see only the information about shards of a specific index, add the index name after your query. 
+ +``` +GET _cat/shards/?v +``` +{% include copy-curl.html %} + +If you want to get information for more than one index, separate the indexes with commas: + +``` +GET _cat/shards/index1,index2,index3 +``` +{% include copy-curl.html %} -## Response +## Example response ```json index | shard | prirep | state | docs | store | ip | | node diff --git a/_api-reference/cat/cat-snapshots.md b/_api-reference/cat/cat-snapshots.md index 82cb5c1b1f..2e1bd514bf 100644 --- a/_api-reference/cat/cat-snapshots.md +++ b/_api-reference/cat/cat-snapshots.md @@ -15,12 +15,6 @@ redirect_from: The CAT snapshots operation lists all snapshots for a repository. -## Example - -``` -GET _cat/snapshots?v -``` -{% include copy-curl.html %} ## Path and HTTP methods @@ -39,8 +33,17 @@ Parameter | Type | Description cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster manager node. Default is 30 seconds. time | Time | Specify the units for time. For example, `5d` or `7h`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/). +## Example request + +The following example request lists all snapshots: + +``` +GET _cat/snapshots?v +``` +{% include copy-curl.html %} + -## Response +## Example response ```json index | shard | prirep | state | docs | store | ip | | node diff --git a/_api-reference/cat/cat-tasks.md b/_api-reference/cat/cat-tasks.md index 4d2a06cced..7a71b592e7 100644 --- a/_api-reference/cat/cat-tasks.md +++ b/_api-reference/cat/cat-tasks.md @@ -15,13 +15,6 @@ redirect_from: The CAT tasks operation lists the progress of all tasks currently running on your cluster. -## Example - -``` -GET _cat/tasks?v -``` -{% include copy-curl.html %} - ## Path and HTTP methods ``` @@ -41,8 +34,17 @@ detailed | Boolean | Returns detailed task information. (Default: false) parent_task_id | String | Returns tasks with a specified parent task ID (node_id:task_number). Keep empty or set to -1 to return all. time | Time | Specify the units for time. For example, `5d` or `7h`. For more information, see [Supported units]({{site.url}}{{site.baseurl}}/opensearch/units/). +## Example request + +The following example request lists all tasks in progress: + +``` +GET _cat/tasks?v +``` +{% include copy-curl.html %} + -## Response +## Example response ```json action | task_id | parent_task_id | type | start_time | timestamp | running_time | ip | node diff --git a/_api-reference/cat/cat-templates.md b/_api-reference/cat/cat-templates.md index d7c7aac90f..ba47ae711d 100644 --- a/_api-reference/cat/cat-templates.md +++ b/_api-reference/cat/cat-templates.md @@ -15,19 +15,6 @@ redirect_from: The CAT templates operation lists the names, patterns, order numbers, and version numbers of index templates. -## Example - -``` -GET _cat/templates?v -``` -{% include copy-curl.html %} - -If you want to get information for a specific template or pattern: - -``` -GET _cat/templates/ -``` -{% include copy-curl.html %} ## Path and HTTP methods @@ -47,8 +34,24 @@ Parameter | Type | Description local | Boolean | Whether to return information from the local node only instead of from the cluster manager node. Default is `false`. cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster manager node. Default is 30 seconds. 
+## Example requests + +The following example request returns information about all templates: + +``` +GET _cat/templates?v +``` +{% include copy-curl.html %} + +If you want to get information for a specific template or pattern: + +``` +GET _cat/templates/ +``` +{% include copy-curl.html %} + -## Response +## Example response ``` name | index_patterns order version composed_of diff --git a/_api-reference/cat/cat-thread-pool.md b/_api-reference/cat/cat-thread-pool.md index 491b523092..de24052175 100644 --- a/_api-reference/cat/cat-thread-pool.md +++ b/_api-reference/cat/cat-thread-pool.md @@ -14,7 +14,27 @@ redirect_from: The CAT thread pool operation lists the active, queued, and rejected threads of different thread pools on each node. -## Example + +## Path and HTTP methods + +``` +GET _cat/thread_pool +``` + +## URL parameters + +All CAT thread pool URL parameters are optional. + +In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-reference/cat/index), you can specify the following parameters: + +Parameter | Type | Description +:--- | :--- | :--- +local | Boolean | Whether to return information from the local node only instead of from the cluster manager node. Default is `false`. +cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster manager node. Default is 30 seconds. + +## Example requests + +The following example request gives information about thread pools on all nodes: ``` GET _cat/thread_pool?v @@ -35,25 +55,8 @@ GET _cat/thread_pool/?v ``` {% include copy-curl.html %} -## Path and HTTP methods - -``` -GET _cat/thread_pool -``` - -## URL parameters - -All CAT thread pool URL parameters are optional. - -In addition to the [common URL parameters]({{site.url}}{{site.baseurl}}/api-reference/cat/index), you can specify the following parameters: - -Parameter | Type | Description -:--- | :--- | :--- -local | Boolean | Whether to return information from the local node only instead of from the cluster manager node. Default is `false`. -cluster_manager_timeout | Time | The amount of time to wait for a connection to the cluster manager node. Default is 30 seconds. - -## Response +## Example response ```json node_name name active queue rejected diff --git a/_api-reference/cluster-api/cluster-allocation.md b/_api-reference/cluster-api/cluster-allocation.md index b1b1c266d6..4ec6e27f2b 100644 --- a/_api-reference/cluster-api/cluster-allocation.md +++ b/_api-reference/cluster-api/cluster-allocation.md @@ -17,17 +17,6 @@ The most basic cluster allocation explain request finds an unassigned shard and If you add some options, you can instead get information on a specific shard, including why OpenSearch assigned it to its current node. -## Example - -```json -GET _cluster/allocation/explain?include_yes_decisions=true -{ - "index": "movies", - "shard": 0, - "primary": true -} -``` -{% include copy-curl.html %} ## Path and HTTP methods @@ -58,8 +47,20 @@ index | String | The name of the shard's index. primary | Boolean | Whether to provide an explanation for the primary shard (true) or its first replica (false), which share the same shard ID. shard | Integer | The shard ID that you want an explanation for. 
+## Example request + +```json +GET _cluster/allocation/explain?include_yes_decisions=true +{ + "index": "movies", + "shard": 0, + "primary": true +} +``` +{% include copy-curl.html %} + -## Response +## Example response ```json { diff --git a/_api-reference/cluster-api/cluster-decommission.md b/_api-reference/cluster-api/cluster-decommission.md index 867f58eda0..c707e5390a 100644 --- a/_api-reference/cluster-api/cluster-decommission.md +++ b/_api-reference/cluster-api/cluster-decommission.md @@ -54,7 +54,7 @@ DELETE /_cluster/decommission/awareness ``` {% include copy-curl.html %} -#### Response +#### Example response ```json @@ -74,7 +74,7 @@ GET /_cluster/decommission/awareness/zone/_status ``` {% include copy-curl.html %} -#### Response +#### Example response ```json { diff --git a/_api-reference/cluster-api/cluster-health.md b/_api-reference/cluster-api/cluster-health.md index 73c83d5ee6..7081a1a587 100644 --- a/_api-reference/cluster-api/cluster-health.md +++ b/_api-reference/cluster-api/cluster-health.md @@ -17,16 +17,6 @@ The most basic cluster health request returns a simple status of the health of y To get the status of a specific index, provide the index name. -## Example - -This request waits 50 seconds for the cluster to reach the yellow status or better: - -``` -GET _cluster/health?wait_for_status=yellow&timeout=50s -``` -{% include copy-curl.html %} - -If the cluster health becomes yellow or green before 50 seconds elapse, it returns a response immediately. Otherwise it returns a response as soon as it exceeds the timeout. ## Path and HTTP methods @@ -55,7 +45,18 @@ wait_for_no_initializing_shards | Boolean | Whether to wait until there are no i wait_for_status | Enum | Wait until the cluster health reaches the specified status or better. Supported values are `green`, `yellow`, and `red`. weights | JSON object | Assigns weights to attributes within the request body of the PUT request. Weights can be set in any ration, for example, 2:3:5. In a 2:3:5 ratio with three zones, for every 100 requests sent to the cluster, each zone would receive either 20, 30, or 50 search requests in a random order. When assigned a weight of `0`, the zone does not receive any search traffic. -#### Example request +## Example requests + +The following examples show how to use the cluster health API. + +This request waits 50 seconds for the cluster to reach the yellow status or better: + +``` +GET _cluster/health?wait_for_status=yellow&timeout=50s +``` +{% include copy-curl.html %} + +If the cluster health becomes yellow or green before 50 seconds elapse, it returns a response immediately. Otherwise it returns a response as soon as it exceeds the timeout. The following example request retrieves cluster health for all indexes in the cluster: @@ -64,7 +65,7 @@ GET _cluster/health ``` {% include copy-curl.html %} -#### Example response +## Example response The response contains cluster health information: diff --git a/_api-reference/cluster-api/cluster-settings.md b/_api-reference/cluster-api/cluster-settings.md index 3538339001..ec682ecdbd 100644 --- a/_api-reference/cluster-api/cluster-settings.md +++ b/_api-reference/cluster-api/cluster-settings.md @@ -32,25 +32,6 @@ include_defaults (GET only) | Boolean | Whether to include default settings as p cluster_manager_timeout | Time unit | The amount of time to wait for a response from the cluster manager node. Default is `30 seconds`. timeout (PUT only) | Time unit | The amount of time to wait for a response from the cluster. Default is `30 seconds`. 
- -#### Example request - -```json -GET _cluster/settings?include_defaults=true -``` -{% include copy-curl.html %} - -#### Example response - -```json -PUT _cluster/settings -{ - "persistent":{ - "action.auto_create_index": false - } -} -``` - ## Request fields The GET operation has no request body options. All cluster setting field parameters are optional. @@ -60,9 +41,24 @@ Not all cluster settings can be updated using the cluster settings API. You will For a listing of all cluster settings, see [Configuring OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/). -#### Example request -For a PUT operation, the request body must contain `transient` or `persistent`, along with the setting you want to update: +## Example requests + +The following example request show how to use the cluster settings API. + +### Check default cluster settings + +The following example request checks for default cluster settings: + +```json +GET _cluster/settings?include_defaults=true +``` +{% include copy-curl.html %} + +### Update cluster setting + +The following example updates the `cluster.max_shards_per_node` setting. For a PUT operation, the request body must contain `transient` or `persistent`, along with the setting you want to update: + ```json PUT _cluster/settings @@ -76,7 +72,9 @@ PUT _cluster/settings For more information about transient settings, persistent settings, and precedence, see [OpenSearch configuration]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/). -#### Example response +## Example response + +The following example response shows that the persistent cluster setting, `max_shard_per_node`, has been updated: ```json { diff --git a/_api-reference/cluster-api/cluster-stats.md b/_api-reference/cluster-api/cluster-stats.md index 8f8b585a6a..fb0ade2c6b 100644 --- a/_api-reference/cluster-api/cluster-stats.md +++ b/_api-reference/cluster-api/cluster-stats.md @@ -15,12 +15,6 @@ redirect_from: The cluster stats API operation returns statistics about your cluster. -## Example - -```json -GET _cluster/stats/nodes/_cluster_manager -``` -{% include copy-curl.html %} ## Path and HTTP methods @@ -41,7 +35,16 @@ Parameter | Type | Description Although the `master` node is now called `cluster_manager` for version 2.0, we retained the `master` field for backwards compatibility. If you have a node that has either a `master` role or a `cluster_manager` role, the `count` increases for both fields by 1. To see an example node count increase, see the Response sample. {: .note } -## Response +## Example request + +The following example requests returns information about the cluster manager node: + +```json +GET _cluster/stats/nodes/_cluster_manager +``` +{% include copy-curl.html %} + +## Example response ```json { diff --git a/_api-reference/document-apis/bulk.md b/_api-reference/document-apis/bulk.md index a020fc459d..a9833a701f 100644 --- a/_api-reference/document-apis/bulk.md +++ b/_api-reference/document-apis/bulk.md @@ -134,7 +134,7 @@ All actions support the same metadata: `_index`, `_id`, and `_require_alias`. If { "script" : { "source": "ctx._source.title = \"World War Z\"" } } ``` -## Response +## Example response In the response, pay particular attention to the top-level `errors` boolean. If true, you can iterate over the individual actions for more detailed information. 
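As a rough sketch of that shape (the document IDs, status codes, and failure reason below are illustrative, not output from a real cluster), a response containing one successful and one failed action might look like the following:

```json
{
  "took": 11,
  "errors": true,
  "items": [
    {
      "index": {
        "_index": "movies",
        "_id": "tt1979320",
        "result": "created",
        "status": 201
      }
    },
    {
      "update": {
        "_index": "movies",
        "_id": "tt0000001",
        "status": 404,
        "error": {
          "type": "document_missing_exception",
          "reason": "[tt0000001]: document missing"
        }
      }
    }
  ]
}
```

Each entry in `items` is keyed by its action type and appears in the same order as the submitted actions, so failures can be matched back to the request.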
diff --git a/_api-reference/document-apis/delete-by-query.md b/_api-reference/document-apis/delete-by-query.md index 6f4104c254..64da909aad 100644 --- a/_api-reference/document-apis/delete-by-query.md +++ b/_api-reference/document-apis/delete-by-query.md @@ -88,7 +88,7 @@ To search your index for specific documents, you must include a [query]({{site.u } ``` -## Response +## Example response ```json { "took": 143, diff --git a/_api-reference/document-apis/delete-document.md b/_api-reference/document-apis/delete-document.md index c3dea2f7e1..ece99a28ca 100644 --- a/_api-reference/document-apis/delete-document.md +++ b/_api-reference/document-apis/delete-document.md @@ -42,7 +42,7 @@ version_type | Enum | Retrieves a specifically typed document. Available options wait_for_active_shards | String | The number of active shards that must be available before OpenSearch processes the delete request. Default is 1 (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the operation to succeed. | No -## Response +## Example response ```json { "_index": "sample-index1", diff --git a/_api-reference/document-apis/get-documents.md b/_api-reference/document-apis/get-documents.md index 3eaeb507d4..d493df136b 100644 --- a/_api-reference/document-apis/get-documents.md +++ b/_api-reference/document-apis/get-documents.md @@ -49,7 +49,7 @@ version | Integer | The version of the document to return, which must match the version_type | Enum | Retrieves a specifically typed document. Available options are `external` (retrieve the document if the specified version number is greater than the document's current version) and `external_gte` (retrieve the document if the specified version number is greater than or equal to the document's current version). For example, to retrieve version 3 of a document, use `/_doc/1?version=3&version_type=external`. -## Response +## Example response ```json { "_index": "sample-index1", diff --git a/_api-reference/document-apis/index-document.md b/_api-reference/document-apis/index-document.md index d131a2f50e..a506e2d9d8 100644 --- a/_api-reference/document-apis/index-document.md +++ b/_api-reference/document-apis/index-document.md @@ -110,7 +110,7 @@ Your request body must contain the information you want to index. } ``` -## Response +## Example response ```json { "_index": "sample-index", diff --git a/_api-reference/document-apis/multi-get.md b/_api-reference/document-apis/multi-get.md index 2d3246fa58..b267b8f3ac 100644 --- a/_api-reference/document-apis/multi-get.md +++ b/_api-reference/document-apis/multi-get.md @@ -54,7 +54,11 @@ _source.excludes | Array | Specifies which fields to exclude in the query respon ids | Array | IDs of the documents to retrieve. Only allowed when an index is specified in the URL. 
| No -#### Example without specifying index in URL +## Example requests + +### Specify an index in the request body + +The following example requests does specifies an index in the request body: ```json GET _mget @@ -76,7 +80,9 @@ GET _mget ``` {% include copy-curl.html %} -#### Example of specifying index in URL +### Specify an index the URL + +The following example specifies an index in the URL: ```json GET sample-index1/_mget @@ -95,7 +101,10 @@ GET sample-index1/_mget ``` {% include copy-curl.html %} -#### Example Response +## Example response + +The following example response returns information about multiple documents: + ```json { "docs": [ diff --git a/_api-reference/document-apis/reindex.md b/_api-reference/document-apis/reindex.md index c2afa347e1..8ac1c48be4 100644 --- a/_api-reference/document-apis/reindex.md +++ b/_api-reference/document-apis/reindex.md @@ -81,7 +81,7 @@ pipeline | Which ingest pipeline to utilize during the reindex. script | A script that OpenSearch uses to apply transformations to the data during the reindex operation. lang | The scripting language. Valid options are `painless`, `expression`, `mustache`, and `java`. -## Response +## Example response ```json { "took": 28829, diff --git a/_api-reference/document-apis/update-by-query.md b/_api-reference/document-apis/update-by-query.md index 217ae69550..09b3bd599f 100644 --- a/_api-reference/document-apis/update-by-query.md +++ b/_api-reference/document-apis/update-by-query.md @@ -102,7 +102,7 @@ To update your indexes and documents by query, you must include a [query]({{site } ``` -## Response +## Example response ```json { "took": 21, diff --git a/_api-reference/document-apis/update-document.md b/_api-reference/document-apis/update-document.md index 3da7030fa5..3f951b5adf 100644 --- a/_api-reference/document-apis/update-document.md +++ b/_api-reference/document-apis/update-document.md @@ -195,7 +195,7 @@ After the upsert operation, the document's `first_name` and `last_name` fields a } ``` -## Response +## Example response ```json { "_index": "sample-index1", diff --git a/_api-reference/index-apis/alias.md b/_api-reference/index-apis/alias.md index a38a3798a4..ebd7bdedfd 100644 --- a/_api-reference/index-apis/alias.md +++ b/_api-reference/index-apis/alias.md @@ -75,7 +75,7 @@ routing | String | Used to assign a custom value to a shard for specific operati index_routing | String | Assigns a custom value to a shard only for index operations. | No search_routing | String | Assigns a custom value to a shard only for search operations. | No -## Response +## Example response ```json { diff --git a/_api-reference/index-apis/clear-index-cache.md b/_api-reference/index-apis/clear-index-cache.md index 55a5ce85d5..9bf873301d 100644 --- a/_api-reference/index-apis/clear-index-cache.md +++ b/_api-reference/index-apis/clear-index-cache.md @@ -38,11 +38,11 @@ All query parameters are optional. | query | Boolean | If `true`, clears the query cache. Defaults to `true`. | | request | Boolean | If `true`, clears the request cache. Defaults to `true`. | -#### Example requests +## Example requests The following example requests show multiple clear cache API uses. 
-##### Clear a specific cache +### Clear a specific cache The following request clears the fields cache only: @@ -69,7 +69,7 @@ POST /my-index/_cache/clear?request=true ``` {% include copy-curl.html %} -#### Clear the cache for specific fields +### Clear the cache for specific fields The following request clears the fields caches of `fielda` and `fieldb`: @@ -78,7 +78,7 @@ POST /my-index/_cache/clear?fields=fielda,fieldb ``` {% include copy-curl.html %} -#### Clear caches for specific data streams or indexes +### Clear caches for specific data streams or indexes The following request clears the cache for two specific indexes: @@ -96,14 +96,14 @@ POST /_cache/clear ``` {% include copy-curl.html %} -#### Clear unused entries from the cache on search-capable nodes +### Clear unused entries from the cache on search-capable nodes ```json POST /*/_cache/clear?file=true ``` {% include copy-curl.html %} -#### Example response +## Example response The `POST /books,hockey/_cache/clear` request returns the following fields: diff --git a/_api-reference/index-apis/clone.md b/_api-reference/index-apis/clone.md index 60228b5894..c1496cbaf8 100644 --- a/_api-reference/index-apis/clone.md +++ b/_api-reference/index-apis/clone.md @@ -66,7 +66,25 @@ task_execution_timeout | Time | The explicit task execution timeout. Only useful The clone index API operation creates a new target index, so you can specify any [index settings]({{site.url}}{{site.baseurl}}/im-plugin/index-settings/) and [aliases]({{site.url}}{{site.baseurl}}/opensearch/index-alias/) to apply to the target index. -## Response +## Example request + +```json +PUT /sample-index1/_clone/cloned-index1 +{ + "settings": { + "index": { + "number_of_shards": 2, + "number_of_replicas": 1 + } + }, + "aliases": { + "sample-alias1": {} + } +} +``` +{% include copy-curl.html %} + +## Example response ```json { diff --git a/_api-reference/index-apis/close-index.md b/_api-reference/index-apis/close-index.md index 7e43198d37..865d17d90a 100644 --- a/_api-reference/index-apis/close-index.md +++ b/_api-reference/index-apis/close-index.md @@ -41,7 +41,7 @@ cluster_manager_timeout | Time | How long to wait for a connection to the cluste timeout | Time | How long to wait for a response from the cluster. Default is `30s`. -## Response +## Example response ```json { "acknowledged": true, diff --git a/_api-reference/index-apis/create-index.md b/_api-reference/index-apis/create-index.md index 53d2dc28f9..ff5d7dbda5 100644 --- a/_api-reference/index-apis/create-index.md +++ b/_api-reference/index-apis/create-index.md @@ -52,7 +52,7 @@ timeout | Time | How long to wait for the request to return. Default is `30s`. As part of your request, you can optionally specify [index settings]({{site.url}}{{site.baseurl}}/im-plugin/index-settings/), [mappings]({{site.url}}{{site.baseurl}}/field-types/index/), and [aliases]({{site.url}}{{site.baseurl}}/opensearch/index-alias/) for your newly created index. -#### Example request +## Example request ```json PUT /sample-index1 diff --git a/_api-reference/index-apis/delete-index.md b/_api-reference/index-apis/delete-index.md index 20e5c51c93..ad00eb7eca 100644 --- a/_api-reference/index-apis/delete-index.md +++ b/_api-reference/index-apis/delete-index.md @@ -38,7 +38,7 @@ cluster_manager_timeout | Time | How long to wait for a connection to the cluste timeout | Time | How long to wait for the response to return. Default is `30s`. 
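## Example request

The following example request deletes a hypothetical index named `sample-index1`:

```
DELETE /sample-index1
```
{% include copy-curl.html %}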
-## Response +## Example response ```json { "acknowledged": true diff --git a/_api-reference/index-apis/exists.md b/_api-reference/index-apis/exists.md index 429ac40745..351e2f2088 100644 --- a/_api-reference/index-apis/exists.md +++ b/_api-reference/index-apis/exists.md @@ -40,6 +40,6 @@ ignore_unavailable | Boolean | If true, OpenSearch does not search for missing o local | Boolean | Whether to return information from only the local node instead of from the cluster manager node. Default is `false`. -## Response +## Example response The index exists API operation returns only one of two possible response codes: `200` -- the index exists, and `404` -- the index does not exist. diff --git a/_api-reference/index-apis/force-merge.md b/_api-reference/index-apis/force-merge.md index 6c2a61bef3..ce7501ebe3 100644 --- a/_api-reference/index-apis/force-merge.md +++ b/_api-reference/index-apis/force-merge.md @@ -74,42 +74,56 @@ The following table lists the available query parameters. All query parameters a | `only_expunge_deletes` | Boolean | If `true`, the merge operation only expunges segments containing a certain percentage of deleted documents. The percentage is 10% by default and is configurable in the `index.merge.policy.expunge_deletes_allowed` setting. Prior to OpenSearch 2.12, `only_expunge_deletes` ignored the `index.merge.policy.max_merged_segment` setting. Starting with OpenSearch 2.12, using `only_expunge_deletes` does not produce segments larger than `index.merge.policy.max_merged_segment` (by default, 5 GB). For more information, see [Deleted documents](#deleted-documents). Default is `false`. | | `primary_only` | Boolean | If set to `true`, then the merge operation is performed only on the primary shards of an index. This can be useful when you want to take a snapshot of the index after the merge is complete. Snapshots only copy segments from the primary shards. Merging the primary shards can reduce resource consumption. Default is `false`. | -#### Example request: Force merge a specific index +## Example requests + +The following examples show how to use the Force merge API. 
+ +### Force merge a specific index + +The following example force merges a specific index: ```json POST /testindex1/_forcemerge ``` {% include copy-curl.html %} -#### Example request: Force merge multiple indexes +### Force merge multiple indexes + +The following example force merges multiple indexes: ```json POST /testindex1,testindex2/_forcemerge ``` {% include copy-curl.html %} -#### Example request: Force merge all indexes +### Force merge all indexes + +The following example force merges all indexes: ```json POST /_forcemerge ``` {% include copy-curl.html %} -#### Example request: Force merge a data stream's backing indexes into one segment +### Force merge a data stream's backing indexes into one segment + +The following example force merges a data stream's backing indexes into one segment: ```json POST /.testindex-logs/_forcemerge?max_num_segments=1 ``` {% include copy-curl.html %} -#### Example request: Force merge primary shards +### Force merge primary shards + +The following example force merges an index's primary shards: ```json POST /.testindex-logs/_forcemerge?primary_only=true ``` {% include copy-curl.html %} -#### Example response +## Example response ```json { diff --git a/_api-reference/index-apis/get-index.md b/_api-reference/index-apis/get-index.md index 733110d63a..e2d2d85c65 100644 --- a/_api-reference/index-apis/get-index.md +++ b/_api-reference/index-apis/get-index.md @@ -41,7 +41,7 @@ local | Boolean | Whether to return information from only the local node instead cluster_manager_timeout | Time | How long to wait for a connection to the cluster manager node. Default is `30s`. -## Response +## Example response ```json { "sample-index1": { diff --git a/_api-reference/index-apis/get-settings.md b/_api-reference/index-apis/get-settings.md index c41b25b4f5..94cb4a7c6c 100644 --- a/_api-reference/index-apis/get-settings.md +++ b/_api-reference/index-apis/get-settings.md @@ -45,7 +45,7 @@ ignore_unavailable | Boolean | If true, OpenSearch does not include missing or c local | Boolean | Whether to return information from the local node only instead of the cluster manager node. Default is `false`. cluster_manager_timeout | Time | How long to wait for a connection to the cluster manager node. Default is `30s`. -## Response +## Example response ```json { diff --git a/_api-reference/index-apis/open-index.md b/_api-reference/index-apis/open-index.md index 12381aa8c6..0d8ef62282 100644 --- a/_api-reference/index-apis/open-index.md +++ b/_api-reference/index-apis/open-index.md @@ -43,7 +43,7 @@ wait_for_completion | Boolean | When set to `false`, the request returns immedia task_execution_timeout | Time | The explicit task execution timeout. Only useful when wait_for_completion is set to `false`. Default is `1h`. -## Response +## Example response ```json { "acknowledged": true, diff --git a/_api-reference/index-apis/split.md b/_api-reference/index-apis/split.md index 03b2f742d1..ad13bffbba 100644 --- a/_api-reference/index-apis/split.md +++ b/_api-reference/index-apis/split.md @@ -66,7 +66,7 @@ task_execution_timeout | Time | The explicit task execution timeout. Only useful The split index API operation creates a new target index, so you can specify any [index settings]({{site.url}}{{site.baseurl}}/im-plugin/index-settings/) and [aliases]({{site.url}}{{site.baseurl}}/opensearch/index-alias/) to apply to the target index. 
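## Example request

The following example request is a sketch that splits a hypothetical `sample-index1` into a new four-shard index and applies both a setting and an alias (the target shard count must be a multiple of the source index's shard count):

```json
PUT /sample-index1/_split/split-index1
{
  "settings": {
    "index": {
      "number_of_shards": 4,
      "number_of_replicas": 1
    }
  },
  "aliases": {
    "sample-alias1": {}
  }
}
```
{% include copy-curl.html %}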
-## Response
+## Example response

```json
{
diff --git a/_api-reference/index-apis/stats.md b/_api-reference/index-apis/stats.md
index 8ccd624de1..7310298594 100644
--- a/_api-reference/index-apis/stats.md
+++ b/_api-reference/index-apis/stats.md
@@ -71,49 +71,65 @@ Parameter | Data type | Description
`include_segment_file_sizes` | Boolean | Specifies whether to report the aggregated disk usage of each Lucene index file. Only applies to `segments` statistics. Default is `false`.
`include_unloaded_segments` | Boolean | Specifies whether to include information from segments that are not loaded into memory. Default is `false`.

-#### Example request: One index
+## Example requests
+
+The following example requests show how to use the Index Stats API.
+
+### One index
+
+The following example returns index stats for a single index:

```json
GET /testindex/_stats
```
{% include copy-curl.html %}

-#### Example request: Comma-separated list of indexes
+### Comma-separated list of indexes
+
+The following example returns stats for multiple indexes:

```json
GET /testindex1,testindex2/_stats
```
{% include copy-curl.html %}

-#### Example request: Wildcard expression
+### Wildcard expression
+
+The following example returns stats for any index whose name starts with `testindex`:

```json
GET /testindex*/_stats
```
{% include copy-curl.html %}

-#### Example request: Specific stats
+### Specific stats
+
+The following example returns index stats related to the refresh and flush operations:

```json
GET /testindex/_stats/refresh,flush
```
{% include copy-curl.html %}

-#### Example request: Expand wildcards
+### Expand wildcards
+
+The following example expands wildcard expressions to return stats for both open and hidden indexes:

```json
GET /testindex*/_stats?expand_wildcards=open,hidden
```
{% include copy-curl.html %}

-#### Example request: Shard-level statistics
+### Shard-level statistics
+
+The following example returns shard-level stats for an index:

```json
GET /testindex/_stats?level=shards
```
{% include copy-curl.html %}

-#### Example response
+## Example response

By default, the returned statistics are aggregated in the `primaries` and `total` aggregations. The `primaries` aggregation contains statistics for the primary shards. The `total` aggregation contains statistics for both primary and replica shards. The following is an example Index Stats API response:

diff --git a/_api-reference/nodes-apis/nodes-hot-threads.md b/_api-reference/nodes-apis/nodes-hot-threads.md
index 3fb6ff65ea..f5e014dd6d 100644
--- a/_api-reference/nodes-apis/nodes-hot-threads.md
+++ b/_api-reference/nodes-apis/nodes-hot-threads.md
@@ -46,14 +46,14 @@ ignore_idle_threads | Boolean | Don’t show threads that are in known idle st
type | String | Supported thread types are `cpu`, `wait`, or `block`. Defaults to `cpu`.
timeout | Time | Sets the time limit for node response. Default value is `30s`.

-#### Example request
+## Example request

```json
GET /_nodes/hot_threads
```
{% include copy-curl.html %}

-#### Example response
+## Example response

```bash
::: {opensearch}{F-ByTQzVQ3GQeYzQJArJGQ}{GxbcLdCATPWggOuQHJAoCw}{127.0.0.1}{127.0.0.1:9300}{dimr}{shard_indexing_pressure_enabled=true}
  0.1% (451.3micros out of 500ms) cpu usage by thread 'opensearch[opensearch][transport_worker][T#28]'
    ...
    org.opensearch.performanceanalyzer.collectors.ScheduledMetricCollectorsExecutor.run(ScheduledMetricCollectorsExecutor.java:100)
```

-## Response
+## Example response

Unlike the majority of OpenSearch API responses, this response is in a text format. 
diff --git a/_api-reference/nodes-apis/nodes-info.md b/_api-reference/nodes-apis/nodes-info.md index d7c810410e..a8953505ff 100644 --- a/_api-reference/nodes-apis/nodes-info.md +++ b/_api-reference/nodes-apis/nodes-info.md @@ -79,7 +79,7 @@ Parameter | Type | Description flat_settings| Boolean | Specifies whether to return the `settings` object of the response in flat format. Default is `false`. timeout | Time | Sets the time limit for node response. Default value is `30s`. -#### Example request +## Example request The following query requests the `process` and `transport` metrics from the cluster manager node: @@ -88,7 +88,7 @@ GET /_nodes/cluster_manager:true/process,transport ``` {% include copy-curl.html %} -#### Example response +## Example response The response contains the metric groups specified in the `` request parameter (in this case, `process` and `transport`): diff --git a/_api-reference/nodes-apis/nodes-reload-secure.md b/_api-reference/nodes-apis/nodes-reload-secure.md index 52b2ef67ab..b4be66ddd4 100644 --- a/_api-reference/nodes-apis/nodes-reload-secure.md +++ b/_api-reference/nodes-apis/nodes-reload-secure.md @@ -36,7 +36,7 @@ The request may include an optional object containing the password for the OpenS } ``` -#### Example request +## Example request The following is an example API request: @@ -45,7 +45,7 @@ POST _nodes/reload_secure_settings ``` {% include copy-curl.html %} -#### Example response +## Example response The following is an example response: diff --git a/_api-reference/nodes-apis/nodes-stats.md b/_api-reference/nodes-apis/nodes-stats.md index ca6810b961..2ccea54390 100644 --- a/_api-reference/nodes-apis/nodes-stats.md +++ b/_api-reference/nodes-apis/nodes-stats.md @@ -118,14 +118,14 @@ level | String | Specifies whether statistics for the `indices` metric are aggre timeout | Time | Sets the time limit for node response. Default is `30s`. include_segment_file_sizes | Boolean | If segment statistics are requested, this field specifies to return the aggregated disk usage of every Lucene index file. Default is `false`. -#### Example request +## Example request ```json GET _nodes/stats/ ``` {% include copy-curl.html %} -#### Example response +## Example response Select the arrow to view the example response. diff --git a/_api-reference/nodes-apis/nodes-usage.md b/_api-reference/nodes-apis/nodes-usage.md index 532ddb626b..355b7f8ff2 100644 --- a/_api-reference/nodes-apis/nodes-usage.md +++ b/_api-reference/nodes-apis/nodes-usage.md @@ -38,7 +38,7 @@ Parameter | Type | Description timeout | Time | Sets the time limit for a response from the node. Default is `30s`. cluster_manager_timeout | Time | Sets the time limit for a response from the cluster manager. Default is `30s`. -#### Example request +## Example request The following request returns usage details for all nodes: @@ -47,7 +47,7 @@ GET _nodes/usage ``` {% include copy-curl.html %} -#### Example response +## Example response The following is an example response: diff --git a/_api-reference/profile.md b/_api-reference/profile.md index 94c7857b80..4f8c69db9c 100644 --- a/_api-reference/profile.md +++ b/_api-reference/profile.md @@ -26,7 +26,7 @@ A slice is the unit of work that can be executed by a thread. Each query can be In general, the max/min/avg slice time captures statistics across all slices for a timing type. 
For example, when profiling aggregations, the `max_slice_time_in_nanos` field in the `aggregations` section shows the maximum time consumed by the aggregation operation and its children across all slices. -#### Example request: Non-concurrent search +## Example request: Non-concurrent search To use the Profile API, include the `profile` parameter set to `true` in the search request sent to the `_search` endpoint: diff --git a/_api-reference/rank-eval.md b/_api-reference/rank-eval.md index 04fd3cf5c0..881ff3a22b 100644 --- a/_api-reference/rank-eval.md +++ b/_api-reference/rank-eval.md @@ -45,7 +45,7 @@ ignore_unlabeled | Defaults to `false`. Unlabeled documents are ignored when set template_id | Template ID. params | Parameters used in the template. -#### Example request +## Example request ````json GET shakespeare/_rank_eval @@ -76,7 +76,7 @@ GET shakespeare/_rank_eval ```` {% include copy-curl.html %} -#### Example response +## Example response ````json { diff --git a/_api-reference/script-apis/create-stored-script.md b/_api-reference/script-apis/create-stored-script.md index 04a73a205a..0a915cd836 100644 --- a/_api-reference/script-apis/create-stored-script.md +++ b/_api-reference/script-apis/create-stored-script.md @@ -47,7 +47,7 @@ All parameters are optional. | lang | String | Scripting language. Required. | | source | String or Object | Required.
For scripts, a string with the contents of the script.
For search templates, an object that defines the search template. Supports the same parameters as the [Search]({{site.url}}{{site.baseurl}}/api-reference/search) API request body. Search templates also support Mustache variables. | -#### Example request +## Example request The sample uses an index called `books` with the following documents: @@ -117,7 +117,7 @@ PUT _scripts/my-first-script See [Execute Painless stored script]({{site.url}}{{site.baseurl}}/api-reference/script-apis/exec-stored-script/) for information about running the script. -#### Example response +## Example response The `PUT _scripts/my-first-script` request returns the following field: diff --git a/_api-reference/script-apis/delete-script.md b/_api-reference/script-apis/delete-script.md index 363b0152df..fe9c272acc 100644 --- a/_api-reference/script-apis/delete-script.md +++ b/_api-reference/script-apis/delete-script.md @@ -26,7 +26,7 @@ Path parameters are optional. | cluster_manager_timeout | Time | Amount of time to wait for a connection to the cluster manager. Optional, defaults to `30s`. | | timeout | Time | The period of time to wait for a response. If a response is not received before the timeout value, the request will be dropped. -#### Example request +## Example request The following request deletes the `my-first-script` script: @@ -35,7 +35,7 @@ DELETE _scripts/my-script ```` {% include copy-curl.html %} -#### Example response +## Example response The `DELETE _scripts/my-first-script` request returns the following field: diff --git a/_api-reference/script-apis/exec-script.md b/_api-reference/script-apis/exec-script.md index 4ecb6a37fc..b6476be980 100644 --- a/_api-reference/script-apis/exec-script.md +++ b/_api-reference/script-apis/exec-script.md @@ -26,7 +26,7 @@ POST /_scripts/painless/_execute | context | A context for the script. Optional. Default is `painless_test`. | | context_setup | Specifies additional parameters for the context. Optional.| -#### Example request +## Example request The following request uses the default `painless_context` for the script: @@ -44,7 +44,7 @@ GET /_scripts/painless/_execute ``` {% include copy-curl.html %} -#### Example response +## Example response The response contains the average of two script parameters: diff --git a/_api-reference/script-apis/exec-stored-script.md b/_api-reference/script-apis/exec-stored-script.md index 7525ec81a4..a7de3b5274 100644 --- a/_api-reference/script-apis/exec-stored-script.md +++ b/_api-reference/script-apis/exec-stored-script.md @@ -21,7 +21,7 @@ OpenSearch provides several ways to run a script; the following sections show ho | script_fields | Object | Fields to include in output. | | script | Object | ID of the script that produces a value for a field. | -#### Example request +## Example request The following request runs the stored script that was created in [Create or update stored script]({{site.url}}{{site.baseurl}}/api-reference/script-apis/create-stored-script/). The script sums the ratings for each book and displays the sum in the `total_ratings` field in the output. diff --git a/_api-reference/script-apis/get-script-contexts.md b/_api-reference/script-apis/get-script-contexts.md index 40a155955f..85421128a1 100644 --- a/_api-reference/script-apis/get-script-contexts.md +++ b/_api-reference/script-apis/get-script-contexts.md @@ -11,14 +11,14 @@ nav_order: 5 Retrieves all contexts for stored scripts. 
-#### Example request +## Example request ````json GET _script_context ```` {% include copy-curl.html %} -#### Example response +## Example response The `GET _script_context` request returns the following fields: diff --git a/_api-reference/script-apis/get-script-language.md b/_api-reference/script-apis/get-script-language.md index f84b7db4a0..76414d52ea 100644 --- a/_api-reference/script-apis/get-script-language.md +++ b/_api-reference/script-apis/get-script-language.md @@ -11,14 +11,14 @@ nav_order: 6 The get script language API operation retrieves all supported script languages and the contexts in which they may be used. -#### Example request +## Example request ```json GET _script_language ``` {% include copy-curl.html %} -#### Example response +## Example response The `GET _script_language` request returns the available contexts for each language: diff --git a/_api-reference/script-apis/get-stored-script.md b/_api-reference/script-apis/get-stored-script.md index cc681cd0f4..d7987974d3 100644 --- a/_api-reference/script-apis/get-stored-script.md +++ b/_api-reference/script-apis/get-stored-script.md @@ -23,7 +23,7 @@ Retrieves a stored script. :--- | :--- | :--- | cluster_manager_timeout | Time | Amount of time to wait for a connection to the cluster manager. Optional, defaults to `30s`. | -#### Example request +## Example request The following retrieves the `my-first-script` stored script. @@ -32,7 +32,7 @@ GET _scripts/my-first-script ```` {% include copy-curl.html %} -#### Example response +## Example response The `GET _scripts/my-first-script` request returns the following fields: diff --git a/_api-reference/scroll.md b/_api-reference/scroll.md index cee599902d..b940c90d86 100644 --- a/_api-reference/scroll.md +++ b/_api-reference/scroll.md @@ -106,7 +106,7 @@ scroll | Time | Specifies the amount of time the search context is maintained. scroll_id | String | The scroll ID for the search. rest_total_hits_as_int | Boolean | Whether the `hits.total` property is returned as an integer (`true`) or an object (`false`). Default is `false`. -## Response +## Example response ```json { diff --git a/_api-reference/snapshots/create-repository.md b/_api-reference/snapshots/create-repository.md index 54807b85d1..8ee7885ca8 100644 --- a/_api-reference/snapshots/create-repository.md +++ b/_api-reference/snapshots/create-repository.md @@ -48,7 +48,7 @@ Request field | Description `remote_store_index_shallow_copy` | Boolean | Determines whether the snapshot of the remote store indexes are captured as a shallow copy. Default is `false`. `readonly` | Whether the repository is read-only. Useful when migrating from one cluster (`"readonly": false` when registering) to another cluster (`"readonly": true` when registering). Optional. -#### Example request +## Example request The following example registers an `fs` repository using the local directory `/mnt/snapshots` as `location`. @@ -85,7 +85,7 @@ Request field | Description For the `base_path` parameter, do not enter the `s3://` prefix when entering your S3 bucket details. Only the name of the bucket is required. {: .note} -#### Example request +## Example request The following request registers a new S3 repository called `my-opensearch-repo` in an existing bucket called `my-open-search-bucket`. By default, all snapshots are stored in the `my/snapshot/directory`. 
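A registration request matching that description would look roughly like the following sketch, which reuses the repository, bucket, and path names mentioned above:

```json
PUT /_snapshot/my-opensearch-repo
{
  "type": "s3",
  "settings": {
    "bucket": "my-open-search-bucket",
    "base_path": "my/snapshot/directory"
  }
}
```
{% include copy-curl.html %}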
@@ -101,7 +101,7 @@ PUT /_snapshot/my-opensearch-repo
```
{% include copy-curl.html %}

-#### Example response
+## Example response

Upon success, the following JSON object is returned:

diff --git a/_api-reference/snapshots/create-snapshot.md b/_api-reference/snapshots/create-snapshot.md
index 6334878d8c..d4c9ef8219 100644
--- a/_api-reference/snapshots/create-snapshot.md
+++ b/_api-reference/snapshots/create-snapshot.md
@@ -46,7 +46,7 @@ Field | Data type | Description
`include_global_state` | Boolean | Whether to include cluster state in the snapshot. Default is `true`.
`partial` | Boolean | Whether to allow partial snapshots. Default is `false`, which fails the entire snapshot if one or more shards fails to store.

-#### Example requests
+## Example requests

##### Request without a body

@@ -72,7 +72,7 @@ PUT _snapshot/my-s3-repository/2
```
{% include copy-curl.html %}

-#### Example responses
+## Example responses

Upon success, the response content depends on whether you include the `wait_for_completion` query parameter.

diff --git a/_api-reference/snapshots/delete-snapshot-repository.md b/_api-reference/snapshots/delete-snapshot-repository.md
index 385205a5df..1fadc21207 100644
--- a/_api-reference/snapshots/delete-snapshot-repository.md
+++ b/_api-reference/snapshots/delete-snapshot-repository.md
@@ -21,7 +21,7 @@ Parameter | Data type | Description
:--- | :--- | :---
repository | String | Repository to delete. |

-#### Example request
+## Example request

The following request deletes the `my-opensearch-repo` repository:

@@ -30,7 +30,7 @@ DELETE _snapshot/my-opensearch-repo
````
{% include copy-curl.html %}

-#### Example response
+## Example response

Upon success, the response returns the following JSON object:

diff --git a/_api-reference/snapshots/delete-snapshot.md b/_api-reference/snapshots/delete-snapshot.md
index e4232c20ec..d231adf74a 100644
--- a/_api-reference/snapshots/delete-snapshot.md
+++ b/_api-reference/snapshots/delete-snapshot.md
@@ -24,7 +24,7 @@ Parameter | Data type | Description
repository | String | Repository that contains the snapshot. |
snapshot | String | Snapshot to delete. |

-#### Example request
+## Example request

The following request deletes a snapshot called `my-first-snapshot` from the `my-opensearch-repo` repository:

@@ -33,7 +33,7 @@ DELETE _snapshot/my-opensearch-repo/my-first-snapshot
```
{% include copy-curl.html %}

-#### Example response
+## Example response

Upon success, the response returns the following JSON object:

diff --git a/_api-reference/snapshots/get-snapshot-repository.md b/_api-reference/snapshots/get-snapshot-repository.md
index 501d0785dd..6617106059 100644
--- a/_api-reference/snapshots/get-snapshot-repository.md
+++ b/_api-reference/snapshots/get-snapshot-repository.md
@@ -29,7 +29,7 @@ You can also get details about a snapshot during and after snapshot creation. Se
| local | Boolean | Whether to get information from the local node. Optional, defaults to `false`.|
| cluster_manager_timeout | Time | Amount of time to wait for a connection to the cluster manager node. Optional, defaults to 30 seconds. |

-#### Example request
+## Example request

The following request retrieves information for the `my-opensearch-repo` repository:

@@ -38,7 +38,7 @@ GET /_snapshot/my-opensearch-repo
````
{% include copy-curl.html %}

-#### Example response
+## Example response

Upon success, the response returns repository information. This sample is for an `s3` repository type. 
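The sample itself lies outside the changed hunk. A hedged sketch of what such a response can look like, reusing the repository details from the registration example (the field layout is an assumption based on the repository settings named above):

```json
{
  "my-opensearch-repo": {
    "type": "s3",
    "settings": {
      "bucket": "my-open-search-bucket",
      "base_path": "my/snapshot/directory"
    }
  }
}
```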
diff --git a/_api-reference/snapshots/get-snapshot-status.md b/_api-reference/snapshots/get-snapshot-status.md
index 6f8320d0b0..c7f919bcb3 100644
--- a/_api-reference/snapshots/get-snapshot-status.md
+++ b/_api-reference/snapshots/get-snapshot-status.md
@@ -42,7 +42,7 @@ Using the API to return state for other than currently running snapshots can be
:--- | :--- | :--- |
ignore_unavailable | Boolean | How to handle requests for unavailable snapshots. If `false`, the request returns an error for unavailable snapshots. If `true`, the request ignores unavailable snapshots, such as those that are corrupted or temporarily cannot be returned. Defaults to `false`.|

-#### Example request
+## Example request

The following request returns the status of `my-first-snapshot` in the `my-opensearch-repo` repository. Unavailable snapshots are ignored.

@@ -54,7 +54,7 @@ GET _snapshot/my-opensearch-repo/my-first-snapshot/_status
````
{% include copy-curl.html %}

-#### Example response
+## Example response

The example that follows corresponds to the request above in the [Example request](#example-request) section.

diff --git a/_api-reference/snapshots/get-snapshot.md b/_api-reference/snapshots/get-snapshot.md
index da44c1f23d..ac55c0370f 100644
--- a/_api-reference/snapshots/get-snapshot.md
+++ b/_api-reference/snapshots/get-snapshot.md
@@ -25,7 +25,7 @@ Retrieves information about a snapshot.
| verbose | Boolean | Whether to show all or just basic snapshot information. If `true`, returns all information. If `false`, omits information like start/end times, failures, and shards. Optional, defaults to `true`.|
| ignore_unavailable | Boolean | How to handle snapshots that are unavailable (corrupted or otherwise temporarily can't be returned). If `true` and the snapshot is unavailable, the request does not return the snapshot. If `false` and the snapshot is unavailable, the request returns an error. Optional, defaults to `false`.|

-#### Example request
+## Example request

The following request retrieves information for the `my-first-snapshot` located in the `my-opensearch-repo` repository:

@@ -34,7 +34,7 @@ GET _snapshot/my-opensearch-repo/my-first-snapshot
````
{% include copy-curl.html %}

-#### Example response
+## Example response

Upon success, the response returns snapshot information:

diff --git a/_api-reference/snapshots/restore-snapshot.md b/_api-reference/snapshots/restore-snapshot.md
index 7b82f72256..cdb9948c28 100644
--- a/_api-reference/snapshots/restore-snapshot.md
+++ b/_api-reference/snapshots/restore-snapshot.md
@@ -57,7 +57,7 @@ All request body parameters are optional.
* Ingest pipelines
* Index lifecycle policies

-#### Example request
+## Example request

The following request restores the `opendistro-reports-definitions` index from `my-first-snapshot`. The `rename_pattern` and `rename_replacement` combination causes the index to be renamed to `opendistro-reports-definitions_restored` because duplicate open index names in a cluster are not allowed.

@@ -73,7 +73,7 @@ POST /_snapshot/my-opensearch-repo/my-first-snapshot/_restore
}
````

-#### Example response
+## Example response

Upon success, the response returns the following JSON object:

diff --git a/_api-reference/snapshots/verify-snapshot-repository.md b/_api-reference/snapshots/verify-snapshot-repository.md
index 12fada3303..e5e6337196 100644
--- a/_api-reference/snapshots/verify-snapshot-repository.md
+++ b/_api-reference/snapshots/verify-snapshot-repository.md
@@ -32,7 +32,7 @@ Path parameters are optional. 
| cluster_manager_timeout | Time | Amount of time to wait for a connection to the cluster manager node. Optional, defaults to `30s`. | | timeout | Time | The period of time to wait for a response. If a response is not received before the timeout value, the request fails and returns an error. Defaults to `30s`. | -#### Example request +## Example request The following request verifies that the my-opensearch-repo is functional: @@ -40,7 +40,7 @@ The following request verifies that the my-opensearch-repo is functional: POST /_snapshot/my-opensearch-repo/_verify?timeout=0s&cluster_manager_timeout=50s ```` -#### Example response +## Example response The example that follows corresponds to the request above in the [Example request](#example-request) section. diff --git a/_api-reference/tasks.md b/_api-reference/tasks.md index 5c3a41fd34..e4ca0b6049 100644 --- a/_api-reference/tasks.md +++ b/_api-reference/tasks.md @@ -28,59 +28,6 @@ GET _tasks/ Note that if a task finishes running, it won't be returned as part of your request. For an example of a task that takes a little longer to finish, you can run the [`_reindex`]({{site.url}}{{site.baseurl}}/opensearch/reindex-data) API operation on a larger document, and then run `tasks`. -#### Example response -```json -{ - "nodes": { - "Mgqdm0r9SEGClWxp_RbnaQ": { - "name": "opensearch-node1", - "transport_address": "172.18.0.3:9300", - "host": "172.18.0.3", - "ip": "172.18.0.3:9300", - "roles": [ - "data", - "ingest", - "master", - "remote_cluster_client" - ], - "tasks": { - "Mgqdm0r9SEGClWxp_RbnaQ:17416": { - "node": "Mgqdm0r9SEGClWxp_RbnaQ", - "id": 17416, - "type": "transport", - "action": "cluster:monitor/tasks/lists", - "start_time_in_millis": 1613599752458, - "running_time_in_nanos": 994000, - "cancellable": false, - "headers": {} - } - }, - "Mgqdm0r9SEGClWxp_RbnaQ:17413": { - "node": "Mgqdm0r9SEGClWxp_RbnaQ", - "id": 17413, - "type": "transport", - "action": "indices:data/write/bulk", - "start_time_in_millis": 1613599752286, - "running_time_in_nanos": 172846500, - "cancellable": false, - "parent_task_id": "Mgqdm0r9SEGClWxp_RbnaQ:17366", - "headers": {} - }, - "Mgqdm0r9SEGClWxp_RbnaQ:17366": { - "node": "Mgqdm0r9SEGClWxp_RbnaQ", - "id": 17366, - "type": "transport", - "action": "indices:data/write/reindex", - "start_time_in_millis": 1613599750929, - "running_time_in_nanos": 1529733100, - "cancellable": true, - "headers": {} - } - } - } - } -} -``` You can also use the following parameters with your query. 
@@ -97,14 +44,29 @@ Parameter | Data type | Description | For example, this request returns tasks currently running on a node named `opensearch-node1`: -#### Example request +## Example requests + +### Return information about running tasks + +The following request returns tasks currently running on a node named `opensearch-node1`: ```json GET /_tasks?nodes=opensearch-node1 ``` {% include copy-curl.html %} -#### Example response +### Return information about active search tasks + +The following request returns detailed information about active search tasks: + +```bash +curl -XGET "localhost:9200/_tasks?actions=*search&detailed +``` +{% include copy.html %} + +## Example response + +The following example response shows information about running tasks: ```json { @@ -148,76 +110,6 @@ GET /_tasks?nodes=opensearch-node1 } ``` -The following request returns detailed information about active search tasks: - -#### Example request - -```bash -curl -XGET "localhost:9200/_tasks?actions=*search&detailed -``` -{% include copy.html %} - -#### Example response - -```json -{ - "nodes" : { - "CRqNwnEeRXOjeTSYYktw-A" : { - "name" : "runTask-0", - "transport_address" : "127.0.0.1:9300", - "host" : "127.0.0.1", - "ip" : "127.0.0.1:9300", - "roles" : [ - "cluster_manager", - "data", - "ingest", - "remote_cluster_client" - ], - "attributes" : { - "testattr" : "test", - "shard_indexing_pressure_enabled" : "true" - }, - "tasks" : { - "CRqNwnEeRXOjeTSYYktw-A:677" : { - "node" : "CRqNwnEeRXOjeTSYYktw-A", - "id" : 677, - "type" : "transport", - "action" : "indices:data/read/search", - "description" : "indices[], search_type[QUERY_THEN_FETCH], source[{\"query\":{\"query_string\":}}]", - "start_time_in_millis" : 1660106254525, - "running_time_in_nanos" : 1354236, - "cancellable" : true, - "cancelled" : false, - "headers" : { }, - "resource_stats" : { - "average" : { - "cpu_time_in_nanos" : 0, - "memory_in_bytes" : 0 - }, - "total" : { - "cpu_time_in_nanos" : 0, - "memory_in_bytes" : 0 - }, - "min" : { - "cpu_time_in_nanos" : 0, - "memory_in_bytes" : 0 - }, - "max" : { - "cpu_time_in_nanos" : 0, - "memory_in_bytes" : 0 - }, - "thread_info" : { - "thread_executions" : 0, - "active_threads" : 0 - } - } - } - } - } - } -} - -``` ### The `resource_stats` object diff --git a/templates/API_TEMPLATE.md b/templates/API_TEMPLATE.md index c4c46fc5ce..02c0f341d9 100644 --- a/templates/API_TEMPLATE.md +++ b/templates/API_TEMPLATE.md @@ -30,13 +30,13 @@ The following table lists the available path parameters. All path parameters are The following table lists the available query parameters. All query parameters are optional. -| Parameter | Data type | Description | +| Parameter | Data type | Description | | :--- | :--- | :--- | | `query_parameter` | String | Example query parameter description. Default is ... | -## Request fields +## Request body fields -The following table lists the available request fields. +The following table lists the available request body fields. | Field | Data type | Description | | :--- | :--- | :--- | @@ -44,7 +44,13 @@ The following table lists the available request fields. | `example_object.required_request_field` | Type | Required request field description. Required. | | `example_object.optional_request_field` | Type | Optional request field description. Optional. Default is ... | -#### Example request +## Example request(s) + +**TIP:** If multiple examples exist for the request, seperate those examples using an `h3` header underneath this section. 
+
+### Request with an example object
+
+The following example shows an API request with an example object:

```json
POST /_example/endpoint/
@@ -57,7 +63,21 @@ POST /_example/endpoint/
```
{% include copy-curl.html %}

-#### Example response
+### Request without an example object
+
+The following example shows an API request without an example object:
+
+```json
+POST /_example/endpoint/
+```
+{% include copy-curl.html %}
+
+
+## Example response
+
+**TIP:** If multiple response examples exist for the request, separate those examples using an `h3` header underneath this section, similar to the [Example requests](#example-requests).
+
+The following example shows an API response:
@@ -76,9 +96,9 @@ POST /_example/endpoint/ ```
-## Response fields
+## Response body fields

-The following table lists all response fields.
+The following table lists all response body fields.

| Field | Data type | Description |
| :--- | :--- | :--- |
@@ -87,3 +107,5 @@ If you use the Security plugin, make sure you have the appropriate permissions:
## Required permissions

If you use the Security plugin, make sure you have the appropriate permissions: `cluster:example/permission/name`.
+
+

From beb3bc225ac83b19e0d4545ea1fb5285b4d062d9 Mon Sep 17 00:00:00 2001
From: zane-neo 
Date: Thu, 25 Jul 2024 22:33:36 +0800
Subject: [PATCH 063/154] Add disk free space cluster settings (#7799)

* Add disk free space cluster settings

Signed-off-by: zane-neo 

* Add value range to disk free space setting

Signed-off-by: zane-neo 

* Update _ml-commons-plugin/cluster-settings.md

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower 
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

---------

Signed-off-by: zane-neo 
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower 
---
 _ml-commons-plugin/cluster-settings.md | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/_ml-commons-plugin/cluster-settings.md b/_ml-commons-plugin/cluster-settings.md
index 0c1f433bf2..efb13dd73a 100644
--- a/_ml-commons-plugin/cluster-settings.md
+++ b/_ml-commons-plugin/cluster-settings.md
@@ -256,6 +256,23 @@ plugins.ml_commons.jvm_heap_memory_threshold: 85
- Default value: 85
- Value range: [0, 100]

+## Set a disk free space threshold
+
+Sets a disk circuit breaker that checks disk usage before running an ML task. If the amount of free disk space falls below the threshold, then OpenSearch triggers a circuit breaker and throws an exception to maintain optimal performance.
+
+Valid values are in byte units. To disable the circuit breaker, set this value to -1.
+
+### Setting
+
+```
+plugins.ml_commons.disk_free_space_threshold: 5G
+```
+
+### Values
+
+- Default value: 5G
+- Value range: [-1, Long.MAX_VALUE]
+
## Exclude node names

Use this setting to specify the names of nodes on which you don't want to run ML tasks. The value should be a valid node name or a comma-separated node name list. 
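The threshold added above can be updated at runtime through the Cluster Settings API, like the other ML Commons settings in this file. A minimal sketch, assuming the setting is dynamic, as most ML Commons cluster settings are; the `10G` value is illustrative:

```json
PUT /_cluster/settings
{
  "persistent": {
    "plugins.ml_commons.disk_free_space_threshold": "10G"
  }
}
```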
From b56abe2fce09a7f2b54da00d7b176b4a67035463 Mon Sep 17 00:00:00 2001
From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Date: Thu, 25 Jul 2024 10:48:36 -0500
Subject: [PATCH 064/154] Fix alias page (#7831)

Signed-off-by: Archer 
---
 _api-reference/index-apis/update-alias.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/_api-reference/index-apis/update-alias.md b/_api-reference/index-apis/update-alias.md
index cac05ceedb..f32d34025e 100644
--- a/_api-reference/index-apis/update-alias.md
+++ b/_api-reference/index-apis/update-alias.md
@@ -1,11 +1,12 @@
---
layout: default
-title: Create or Update Alias
+title: Create or update alias
parent: Index APIs
nav_order: 5
---

# Create or Update Alias
+
**Introduced 1.0**
{: .label .label-purple }

@@ -16,7 +17,7 @@ The Create or Update Alias API is distinct from the [Alias API]({{site.url}}{{si

## Path and HTTP methods

-```
+```json
POST /<target>/_alias/<alias-name>
PUT /<target>/_alias/<alias-name>
POST /_alias/<alias-name>

From fdfd53f29f0d6ea1a04761e33b8fc837b25be770 Mon Sep 17 00:00:00 2001
From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Date: Thu, 25 Jul 2024 12:19:07 -0500
Subject: [PATCH 065/154] Add Segment API (#7768)

* Add Segment API

Signed-off-by: Archer 

* Update segment.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Heather Halter 
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Update _api-reference/index-apis/segment.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Update _api-reference/index-apis/segment.md

Co-authored-by: Heather Halter 
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Update segment.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower 
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

---------

Signed-off-by: Archer 
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Heather Halter 
Co-authored-by: Nathan Bower 
---
 _api-reference/index-apis/segment.md | 122 +++++++++++++++++++++++++++
 1 file changed, 122 insertions(+)
 create mode 100644 _api-reference/index-apis/segment.md

diff --git a/_api-reference/index-apis/segment.md b/_api-reference/index-apis/segment.md
new file mode 100644
index 0000000000..a8a7ccaee1
--- /dev/null
+++ b/_api-reference/index-apis/segment.md
@@ -0,0 +1,122 @@
---
layout: default
title: Segment
parent: Index APIs
nav_order: 64
---

# Segment
Introduced 1.0
{: .label .label-purple }

The Segment API provides details about the Lucene segments within index shards as well as information about the backing indexes of data streams.


## Path and HTTP methods

```json
GET /<index>/_segments
GET /_segments
```

## Path parameters

The following table lists the available path parameters. All path parameters are optional.

Parameter | Data type | Description
:--- | :--- | :---
`<index>` | String | A comma-separated list of indexes, data streams, or index aliases to which the operation is applied. Supports wildcard expressions (`*`). Use `_all` or `*` to specify all indexes and data streams in a cluster. |

## Query parameters

The Segment API supports the following optional query parameters.

Parameter | Data type | Description
:--- | :--- | :---
`allow_no_indices` | Boolean | Whether to ignore wildcards that don't match any indexes. Default is `true`. 
`allow_partial_search_results` | Boolean | Whether to return partial results if the request encounters an error or times out. Default is `true`.
`expand_wildcards` | String | Specifies the type of index that wildcard expressions can match. Supports comma-separated values. Valid values are `all` (match any index), `open` (match open, non-hidden indexes), `closed` (match closed, non-hidden indexes), `hidden` (match hidden indexes), and `none` (deny wildcard expressions). Default is `open`.
`ignore_unavailable` | Boolean | When `true`, OpenSearch ignores missing or closed indexes. If `false`, OpenSearch returns an error if the operation encounters missing or closed indexes. Default is `false`.
`verbose` | Boolean | When `true`, provides information about Lucene's memory usage. Default is `false`.

## Response body fields

Parameter | Data type | Description
 :--- | :--- | :---
`<segment_name>` | String | The name of the segment used to create internal file names in the shard directory.
`generation` | Integer | The generation number, such as `0`, incremented for each written segment and used to name the segment.
`num_docs` | Integer | The number of documents, obtained from Lucene. Nested documents are counted separately from their parents. Deleted documents, as well as recently indexed documents that are not yet assigned to a segment, are excluded.
`deleted_docs` | Integer | The number of deleted documents, obtained from Lucene, which may not match the actual number of delete operations performed. Recently deleted documents that are not yet assigned to a segment are excluded. Deleted documents are automatically merged when appropriate. OpenSearch will occasionally delete extra documents in order to track recent shard operations.
`size_in_bytes` | Integer | The amount of disk space used by the segment, for example, `50kb`.
`memory_in_bytes` | Integer | The amount of segment data, measured in bytes, that is kept in memory to facilitate efficient search operations, such as `1264`. A value of `-1` indicates that OpenSearch was unable to compute this number.
`committed` | Boolean | When `true`, the segments are synced to disk. Segments synced to disk can survive a hard reboot. If `false`, then uncommitted segment data is stored in the transaction log as well so that changes can be replayed at the next startup.
`search` | Boolean | When `true`, segment search is enabled. When `false`, the segment may have already been written to disk and require a refresh in order to be searchable.
`version` | String | The Lucene version used to write the segment.
`compound` | Boolean | When `true`, indicates that Lucene merged all segment files into one file in order to save any file descriptors.
`attributes` | Object | Shows if high compression was enabled.

## Example requests

The following example requests show you how to use the Segment API.

### Specific data stream or index

```json
GET /index1/_segments
```
{% include copy-curl.html %}

### Several data streams and indexes

```json
GET /index1,index2/_segments
```
{% include copy-curl.html %}

### All data streams and indexes in a cluster

```json
GET /_segments
```
{% include copy-curl.html %}

## Example response

```json
{
  "_shards": ... 
+ "indices": { + "test": { + "shards": { + "0": [ + { + "routing": { + "state": "STARTED", + "primary": true, + "node": "zDC_RorJQCao9xf9pg3Fvw" + }, + "num_committed_segments": 0, + "num_search_segments": 1, + "segments": { + "_0": { + "generation": 0, + "num_docs": 1, + "deleted_docs": 0, + "size_in_bytes": 3800, + "memory_in_bytes": 1410, + "committed": false, + "search": true, + "version": "7.0.0", + "compound": true, + "attributes": { + } + } + } + } + ] + } + } + } +} +``` + From 79a422bc7d8c9eed77a8971ed74abc6edade1bb5 Mon Sep 17 00:00:00 2001 From: Naveen Tatikonda Date: Fri, 26 Jul 2024 14:11:26 -0500 Subject: [PATCH 066/154] [Doc] Lucene inbuilt scalar quantization (#7797) * [Doc] Lucene inbuilt scalar quantization in k-NN Signed-off-by: Naveen Tatikonda * Address Review Comments Signed-off-by: Naveen Tatikonda * Doc review Signed-off-by: Fanit Kolchina * Clarified M Signed-off-by: Fanit Kolchina * Tech review comments Signed-off-by: Fanit Kolchina * One more change Signed-off-by: Fanit Kolchina * Update _search-plugins/knn/knn-vector-quantization.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Reword search time sentence Signed-off-by: Fanit Kolchina * Update _search-plugins/knn/knn-vector-quantization.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Naveen Tatikonda Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Fanit Kolchina Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower --- .../styles/Vocab/OpenSearch/Words/accept.txt | 3 +- .../knn/knn-vector-quantization.md | 118 +++++++++++++++++- 2 files changed, 114 insertions(+), 7 deletions(-) diff --git a/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt b/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt index b588586138..9e09f21c3a 100644 --- a/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt +++ b/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt @@ -93,7 +93,8 @@ pebibyte [Pp]reprocess [Pp]retrain [Pp]seudocode -[Quantiz](e|ation|ing|er) +[Qq]uantiles? +[Qq]uantiz(e|ation|ing|er) [Rr]ebalance [Rr]ebalancing [Rr]edownload diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index fe4833ee47..656ce72fd2 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -15,7 +15,113 @@ OpenSearch supports many varieties of quantization. In general, the level of qua ## Lucene byte vector -Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine in order to reduce the amount of required memory. This requires quantizing the vectors outside of OpenSearch before ingesting them into an OpenSearch index. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector). +Starting with k-NN plugin version 2.9, you can use `byte` vectors with the Lucene engine in order to reduce the amount of required memory. This requires quantizing the vectors outside of OpenSearch before ingesting them into an OpenSearch index. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector). 
+ +## Lucene scalar quantization + +Starting with version 2.16, the k-NN plugin supports built-in scalar quantization for the Lucene engine. Unlike the [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector), which requires you to quantize vectors before ingesting the documents, the Lucene scalar quantizer quantizes input vectors in OpenSearch during ingestion. The Lucene scalar quantizer converts 32-bit floating-point input vectors into 7-bit integer vectors in each segment using the minimum and maximum quantiles computed based on the [`confidence_interval`](#confidence-interval) parameter. During search, the query vector is quantized in each segment using the segment's minimum and maximum quantiles in order to compute the distance between the query vector and the segment's quantized input vectors. + +Quantization can decrease the memory footprint by a factor of 4 in exchange for some loss in recall. Additionally, quantization slightly increases disk usage because it requires storing both the raw input vectors and the quantized vectors. + +### Using Lucene scalar quantization + +To use the Lucene scalar quantizer, set the k-NN vector field's `method.parameters.encoder.name` to `sq` when creating a k-NN index: + +```json +PUT /test-index +{ + "settings": { + "index": { + "knn": true + } + }, + "mappings": { + "properties": { + "my_vector1": { + "type": "knn_vector", + "dimension": 2, + "method": { + "name": "hnsw", + "engine": "lucene", + "space_type": "l2", + "parameters": { + "encoder": { + "name": "sq" + }, + "ef_construction": 256, + "m": 8 + } + } + } + } + } +} +``` +{% include copy-curl.html %} + +### Confidence interval + +Optionally, you can specify the `confidence_interval` parameter in the `method.parameters.encoder` object. +The `confidence_interval` is used to compute the minimum and maximum quantiles in order to quantize the vectors: +- If you set the `confidence_interval` to a value in the `0.9` to `1.0` range, inclusive, then the quantiles are calculated statically. For example, setting the `confidence_interval` to `0.9` specifies to compute the minimum and maximum quantiles based on the middle 90% of the vector values, excluding the minimum 5% and maximum 5% of the values. +- Setting `confidence_interval` to `0` specifies to compute the quantiles dynamically, which involves oversampling and additional computations performed on the input data. +- When `confidence_interval` is not set, it is computed based on the vector dimension $$d$$ using the formula $$max(0.9, 1 - \frac{1}{1 + d})$$. + +Lucene scalar quantization is applied only to `float` vectors. If you change the default value of the `data_type` parameter from `float` to `byte` or any other type when mapping a [k-NN vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/), then the request is rejected. +{: .warning} + +The following example method definition specifies the Lucene `sq` encoder with the `confidence_interval` set to `1.0`. This `confidence_interval` specifies to consider all the input vectors when computing the minimum and maximum quantiles. 
Vectors are quantized to 7 bits by default: + +```json +PUT /test-index +{ + "settings": { + "index": { + "knn": true + } + }, + "mappings": { + "properties": { + "my_vector1": { + "type": "knn_vector", + "dimension": 2, + "method": { + "name": "hnsw", + "engine": "lucene", + "space_type": "l2", + "parameters": { + "encoder": { + "name": "sq", + "parameters": { + "confidence_interval": 1.0 + } + }, + "ef_construction": 256, + "m": 8 + } + } + } + } + } +} +``` +{% include copy-curl.html %} + +There are no changes to ingestion or query mapping and no range limitations for the input vectors. + +### Memory estimation + +In the ideal scenario, 7-bit vectors created by the Lucene scalar quantizer use only 25% of the memory required by 32-bit vectors. + +#### HNSW memory estimation + +The memory required for the Hierarchical Navigable Small World (HNSW) graph can be estimated as `1.1 * (dimension + 8 * M)` bytes/vector, where `M` is the maximum number of bidirectional links created for each element during the construction of the graph. + +As an example, assume that you have 1 million vectors with a dimension of 256 and M of 16. The memory requirement can be estimated as follows: + +```r +1.1 * (256 + 8 * 16) * 1,000,000 ~= 0.4 GB +``` ## Faiss 16-bit scalar quantization @@ -148,7 +254,7 @@ The memory required for Hierarchical Navigable Small Worlds (HNSW) is estimated As an example, assume that you have 1 million vectors with a dimension of 256 and M of 16. The memory requirement can be estimated as follows: -```bash +```r 1.1 * (2 * 256 + 8 * 16) * 1,000,000 ~= 0.656 GB ``` @@ -158,7 +264,7 @@ The memory required for IVF is estimated to be `1.1 * (((2 * dimension) * num_ve As an example, assume that you have 1 million vectors with a dimension of 256 and `nlist` of 128. The memory requirement can be estimated as follows: -```bash +```r 1.1 * (((2 * 256) * 1,000,000) + (4 * 128 * 256)) ~= 0.525 GB ``` @@ -191,8 +297,8 @@ The memory required for HNSW with PQ is estimated to be `1.1*(((pq_code_size / 8 As an example, assume that you have 1 million vectors with a dimension of 256, `hnsw_m` of 16, `pq_m` of 32, `pq_code_size` of 8, and 100 segments. The memory requirement can be estimated as follows: -```bash -1.1*((8 / 8 * 32 + 24 + 8 * 16) * 1000000 + 100 * (2^8 * 4 * 256)) ~= 0.215 GB +```r +1.1 * ((8 / 8 * 32 + 24 + 8 * 16) * 1000000 + 100 * (2^8 * 4 * 256)) ~= 0.215 GB ``` #### IVF memory estimation @@ -201,6 +307,6 @@ The memory required for IVF with PQ is estimated to be `1.1*(((pq_code_size / 8) For example, assume that you have 1 million vectors with a dimension of 256, `ivf_nlist` of 512, `pq_m` of 32, `pq_code_size` of 8, and 100 segments. 
The memory requirement can be estimated as follows: -```bash +```r 1.1*((8 / 8 * 64 + 24) * 1000000 + 100 * (2^8 * 4 * 256 + 4 * 512 * 256)) ~= 0.171 GB ``` From c1471b686b623bf6bb652bde7cbbecabd8ab42b8 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Fri, 26 Jul 2024 15:12:01 -0400 Subject: [PATCH 067/154] Add geoshape query documentation (#7829) * Add geoshape query documentation Signed-off-by: Fanit Kolchina * Update _query-dsl/geo-and-xy/geoshape.md Co-authored-by: Heather Halter Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Update _query-dsl/geo-and-xy/geoshape.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Heather Halter Co-authored-by: Nathan Bower --- _query-dsl/geo-and-xy/geoshape.md | 730 ++++++++++++++++++++++++++++++ _query-dsl/geo-and-xy/index.md | 2 +- 2 files changed, 731 insertions(+), 1 deletion(-) create mode 100644 _query-dsl/geo-and-xy/geoshape.md diff --git a/_query-dsl/geo-and-xy/geoshape.md b/_query-dsl/geo-and-xy/geoshape.md new file mode 100644 index 0000000000..42948666f4 --- /dev/null +++ b/_query-dsl/geo-and-xy/geoshape.md @@ -0,0 +1,730 @@ +--- +layout: default +title: Geoshape +parent: Geographic and xy queries +nav_order: 40 +--- + +# Geoshape query + +Use a geoshape query to search for documents that contain [geopoint]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-point/) or [geoshape]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-shape/) fields. You can filter documents using a [geoshape that is defined within a query](#using-a-new-shape-definition) or use a [pre-indexed geoshape](#using-a-pre-indexed-shape-definition). + +The searched document field must be mapped as `geo_point` or `geo_shape`. +{: .note} + +## Spatial relations + +When you provide a geoshape to the geoshape query, the geopoint and geoshape fields in the documents are matched using the following spatial relations to the provided shape. + +Relation | Description | Supporting geographic field type +:--- | :--- | :--- +`INTERSECTS` | (Default) Matches documents whose geopoint or geoshape intersects with the shape provided in the query. | `geo_point`, `geo_shape` +`DISJOINT` | Matches documents whose geoshape does not intersect with the shape provided in the query. | `geo_shape` +`WITHIN` | Matches documents whose geoshape is completely within the shape provided in the query. | `geo_shape` +`CONTAINS` | Matches documents whose geoshape completely contains the shape provided in the query. | `geo_shape` + +## Defining the shape in a geoshape query + +You can define the shape to filter documents in a geoshape query either by providing a new shape definition at query time or by referencing the name of a shape pre-indexed in another index. + +### Using a new shape definition + +To provide a new shape to a geoshape query, define it in the `geo_shape` field. You must define the geoshape in [GeoJSON format](https://geojson.org/). + +The following example illustrates searching for documents containing geoshapes that match a geoshape defined at query time. 
+ +#### Step 1: Create an index + +First, create an index and map the `location` field as a `geo_shape`: + +```json +PUT /testindex +{ + "mappings": { + "properties": { + "location": { + "type": "geo_shape" + } + } + } +} +``` +{% include copy-curl.html %} + +### Step 2: Index documents + +Index one document containing a point and another containing a polygon: + +```json +PUT testindex/_doc/1 +{ + "location": { + "type": "point", + "coordinates": [ 73.0515, 41.5582 ] + } +} +``` +{% include copy-curl.html %} + +```json +PUT testindex/_doc/2 +{ + "location": { + "type": "polygon", + "coordinates": [ + [ + [ + 73.0515, + 41.5582 + ], + [ + 72.6506, + 41.5623 + ], + [ + 72.6734, + 41.7658 + ], + [ + 73.0515, + 41.5582 + ] + ] + ] + } +} +``` +{% include copy-curl.html %} + +### Step 3: Run a geoshape query + +Finally, define a geoshape to filter the documents. The following sections illustrate providing various geoshapes in a query. For more information about various geoshape formats, see [Geoshape field type]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geo-shape/). + +#### Envelope + +An [`envelope`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geo-shape#envelope) is a bounding rectangle in the `[[minLon, maxLat], [maxLon, minLat]]` format. Search for documents containing geoshape fields that intersect with the provided envelope: + +```json +GET /testindex/_search +{ + "query": { + "geo_shape": { + "location": { + "shape": { + "type": "envelope", + "coordinates": [ + [ + 71.0589, + 42.3601 + ], + [ + 74.006, + 40.7128 + ] + ] + }, + "relation": "WITHIN" + } + } + } +} +``` +{% include copy-curl.html %} + +The response contains both documents: + +```json +{ + "took": 5, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 2, + "relation": "eq" + }, + "max_score": 0, + "hits": [ + { + "_index": "testindex", + "_id": "1", + "_score": 0, + "_source": { + "location": { + "type": "point", + "coordinates": [ + 73.0515, + 41.5582 + ] + } + } + }, + { + "_index": "testindex", + "_id": "2", + "_score": 0, + "_source": { + "location": { + "type": "polygon", + "coordinates": [ + [ + [ + 73.0515, + 41.5582 + ], + [ + 72.6506, + 41.5623 + ], + [ + 72.6734, + 41.7658 + ], + [ + 73.0515, + 41.5582 + ] + ] + ] + } + } + } + ] + } +} +``` + +#### Point + +Search for documents whose geoshape fields contain the provided point: + +```json +GET /testindex/_search +{ + "query": { + "geo_shape": { + "location": { + "shape": { + "type": "point", + "coordinates": [ + 72.8000, + 41.6300 + ] + }, + "relation": "CONTAINS" + } + } + } +} +``` +{% include copy-curl.html %} + +#### Linestring + +Search for documents whose geoshape fields do not intersect with the provided linestring: + +```json +GET /testindex/_search +{ + "query": { + "geo_shape": { + "location": { + "shape": { + "type": "linestring", + "coordinates": [[74.0060, 40.7128], [71.0589, 42.3601]] + }, + "relation": "DISJOINT" + } + } + } +} +``` +{% include copy-curl.html %} + +Linestring geoshape queries do not support the `WITHIN` relation. +{: .note} + +#### Polygon + +In GeoJSON format, you must list the vertices of the polygon in counterclockwise order and close the polygon so that the first vertex and the last vertex are the same. 
+ +Search for documents whose geoshape fields are within the provided polygon: + +```json +GET /testindex/_search +{ + "query": { + "geo_shape": { + "location": { + "shape": { + "type": "polygon", + "coordinates": [ + [ + [74.0060, 40.7128], + [73.7562, 42.6526], + [71.0589, 42.3601], + [74.0060, 40.7128] + ] + ] + }, + "relation": "WITHIN" + } + } + } +} +``` +{% include copy-curl.html %} + +#### Multipoint + +Search for documents whose geoshape fields do not intersect with the provided points: + +```json +GET /testindex/_search +{ + "query": { + "geo_shape": { + "location": { + "shape": { + "type": "multipoint", + "coordinates" : [ + [74.0060, 40.7128], + [71.0589, 42.3601] + ] + }, + "relation": "DISJOINT" + } + } + } +} +``` +{% include copy-curl.html %} + +#### Multilinestring + +Search for documents whose geoshape fields do not intersect with the provided lines: + +```json +GET /testindex/_search +{ + "query": { + "geo_shape": { + "location": { + "shape": { + "type": "multilinestring", + "coordinates" : [ + [[74.0060, 40.7128], [71.0589, 42.3601]], + [[73.7562, 42.6526], [72.6734, 41.7658]] + ] + }, + "relation": "disjoint" + } + } + } +} +``` +{% include copy-curl.html %} + +Multilinestring geoshape queries do not support the `WITHIN` relation. +{: .note} + +#### Multipolygon + +Search for documents whose geoshape fields are within the provided multipolygon: + +```json +GET /testindex/_search +{ + "query": { + "geo_shape": { + "location": { + "shape": { + "type" : "multipolygon", + "coordinates" : [ + [ + [ + [74.0060, 40.7128], + [73.7562, 42.6526], + [71.0589, 42.3601], + [74.0060, 40.7128] + ], + [ + [73.0515, 41.5582], + [72.6506, 41.5623], + [72.6734, 41.7658], + [73.0515, 41.5582] + ] + ], + [ + [ + [73.9146, 40.8252], + [73.8871, 41.0389], + [73.6853, 40.9747], + [73.9146, 40.8252] + ] + ] + ] + }, + "relation": "WITHIN" + } + } + } +} +``` +{% include copy-curl.html %} + +#### Geometry collection + +Search for documents whose geoshape fields are within the provided polygons: + +```json +GET /testindex/_search +{ + "query": { + "geo_shape": { + "location": { + "shape": { + "type": "geometrycollection", + "geometries": [ + { + "type": "polygon", + "coordinates": [[ + [74.0060, 40.7128], + [73.7562, 42.6526], + [71.0589, 42.3601], + [74.0060, 40.7128] + ]] + }, + { + "type": "polygon", + "coordinates": [[ + [73.0515, 41.5582], + [72.6506, 41.5623], + [72.6734, 41.7658], + [73.0515, 41.5582] + ]] + } + ] + }, + "relation": "WITHIN" + } + } + } +} +``` +{% include copy-curl.html %} + +Geoshape queries whose geometry collection contains a linestring or a multilinestring do not support the `WITHIN` relation. +{: .note} + +### Using a pre-indexed shape definition + +When constructing a geoshape query, you can also reference the name of a shape pre-indexed in another index. Using this method, you can define a geoshape at index time and refer to it by name at search time. + +You can define a pre-indexed geoshape in [GeoJSON](https://geojson.org/) or [Well-Known Text (WKT)](https://docs.opengeospatial.org/is/12-063r5/12-063r5.html) format. For more information about various geoshape formats, see [Geoshape field type]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geo-shape/). + +The `indexed_shape` object supports the following parameters. + +Parameter | Required/Optional | Description +:--- | :--- | :--- +`id` | Required | The document ID of the document containing the pre-indexed shape. +`index` | Optional | The name of the index containing the pre-indexed shape. 
Default is `shapes`. +`path` | Optional | The field name of the field containing the pre-indexed shape as a path. Default is `shape`. +`routing` | Optional | The routing of the document containing the pre-indexed shape. + +The following example illustrates how to reference the name of a shape pre-indexed in another index. In this example, the index `pre-indexed-shapes` contains the shape that defines the boundaries, and the index `testindex` contains the shapes that are checked against those boundaries. + +First, create the `pre-indexed-shapes` index and map the `boundaries` field for this index as a `geo_shape`: + +```json +PUT /pre-indexed-shapes +{ + "mappings": { + "properties": { + "boundaries": { + "type": "geo_shape", + "orientation" : "left" + } + } + } +} +``` +{% include copy-curl.html %} + +For more information about specifying a different vertex orientation for polygons, see [Polygon]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geo-shape/#polygon). + +Index a polygon specifying the search boundaries into the `pre-indexed-shapes` index. The polygon's ID is `search_triangle`. In this example, you'll index the polygon in WKT format: + +```json +PUT /pre-indexed-shapes/_doc/search_triangle +{ + "boundaries": + "POLYGON ((74.0060 40.7128, 71.0589 42.3601, 73.7562 42.6526, 74.0060 40.7128))" +} +``` +{% include copy-curl.html %} + +If you haven't already done so, index one document containing a point and another document containing a polygon into the `testindex` index: + +```json +PUT /testindex/_doc/1 +{ + "location": { + "type": "point", + "coordinates": [ 73.0515, 41.5582 ] + } +} +``` +{% include copy-curl.html %} + +```json +PUT /testindex/_doc/2 +{ + "location": { + "type": "polygon", + "coordinates": [ + [ + [ + 73.0515, + 41.5582 + ], + [ + 72.6506, + 41.5623 + ], + [ + 72.6734, + 41.7658 + ], + [ + 73.0515, + 41.5582 + ] + ] + ] + } +} +``` +{% include copy-curl.html %} + +Search for documents whose geoshapes are within the `search_triangle`: + +```json +GET /testindex/_search +{ + "query": { + "bool": { + "must": { + "match_all": {} + }, + "filter": { + "geo_shape": { + "location": { + "indexed_shape": { + "index": "pre-indexed-shapes", + "id": "search_triangle", + "path": "boundaries" + }, + "relation": "WITHIN" + } + } + } + } + } +} +``` +{% include copy-curl.html %} + +The response contains both documents: + +```json +{ + "took": 11, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 2, + "relation": "eq" + }, + "max_score": 1, + "hits": [ + { + "_index": "testindex", + "_id": "1", + "_score": 1, + "_source": { + "location": { + "type": "point", + "coordinates": [ + 73.0515, + 41.5582 + ] + } + } + }, + { + "_index": "testindex", + "_id": "2", + "_score": 1, + "_source": { + "location": { + "type": "polygon", + "coordinates": [ + [ + [ + 73.0515, + 41.5582 + ], + [ + 72.6506, + 41.5623 + ], + [ + 72.6734, + 41.7658 + ], + [ + 73.0515, + 41.5582 + ] + ] + ] + } + } + } + ] + } +} +``` + +## Querying geopoints + +You can also use a geoshape query to search for documents containing geopoints. + +Geoshape queries on geopoint fields only support the default `INTERSECTS` spatial relation, so you don't need to provide the `relation` parameter. 
+{: .note} + +{: .important } +> Geoshape queries on geopoint fields do not support the following geoshapes: +> +> - Points +> - Linestrings +> - Multipoints +> - Multilinestrings +> - Geometry collections containing one of the preceding geoshape types + +Create a mapping where `location` is a `geo_point`: + +```json +PUT /testindex1 +{ + "mappings": { + "properties": { + "location": { + "type": "geo_point" + } + } + } +} +``` +{% include copy-curl.html %} + +Index two points into the index. In this example, you'll provide the geopoint coordinates as strings: + +```json +PUT /testindex1/_doc/1 +{ + "location": "41.5623, 72.6506" +} +``` +{% include copy-curl.html %} + +```json +PUT /testindex1/_doc/2 +{ + "location": "76.0254, 39.2467" +} +``` +{% include copy-curl.html %} + + For information about providing geopoint coordinates in various formats, see [Formats]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geo-point/#formats). + +Search for geopoints that intersect with the provided polygon: + +```json +GET /testindex1/_search +{ + "query": { + "geo_shape": { + "location": { + "shape": { + "type": "polygon", + "coordinates": [ + [ + [74.0060, 40.7128], + [73.7562, 42.6526], + [71.0589, 42.3601], + [74.0060, 40.7128] + ] + ] + } + } + } + } +} + +``` +{% include copy-curl.html %} + +The response returns document 1: + +```json +{ + "took": 21, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 1, + "relation": "eq" + }, + "max_score": 0, + "hits": [ + { + "_index": "testindex1", + "_id": "1", + "_score": 0, + "_source": { + "location": "41.5623, 72.6506" + } + } + ] + } +} +``` + +Note that when you indexed the geopoints, you specified their coordinates in `"latitude, longitude"` format. When you search for matching documents, the coordinate array is in `[longitude, latitude]` format. Thus, document 1 is returned in the results but document 2 is not. + +## Request fields + +Geoshape queries accept the following fields. + +Field | Data type | Description +:--- | :--- | :--- +`ignore_unmapped` | Boolean | Specifies whether to ignore an unmapped field. If set to `true`, then the query does not return any documents that contain an unmapped field. If set to `false`, then an exception is thrown when the field is unmapped. Optional. Default is `false`. \ No newline at end of file diff --git a/_query-dsl/geo-and-xy/index.md b/_query-dsl/geo-and-xy/index.md index 83cdbf08d7..ee51e1e523 100644 --- a/_query-dsl/geo-and-xy/index.md +++ b/_query-dsl/geo-and-xy/index.md @@ -31,6 +31,6 @@ OpenSearch provides the following geographic query types: - [**Geo-bounding box queries**]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/geo-and-xy/geo-bounding-box/): Return documents with geopoint field values that are within a bounding box. - [**Geodistance queries**]({{site.url}}{{site.baseurl}}/query-dsl/geo-and-xy/geodistance/): Return documents with geopoints that are within a specified distance from the provided geopoint. - [**Geopolygon queries**]({{site.url}}{{site.baseurl}}/query-dsl/geo-and-xy/geodistance/): Return documents containing geopoints that are within a polygon. -- **Geoshape queries**: Return documents that contain: +- [**Geoshape queries**]({{site.url}}{{site.baseurl}}/query-dsl/geo-and-xy/geoshape/): Return documents that contain: - Geoshapes and geopoints that have one of four spatial relations to the provided shape: `INTERSECTS`, `DISJOINT`, `WITHIN`, or `CONTAINS`. 
- Geopoints that intersect the provided shape. \ No newline at end of file From 98886f8666cfb66a86ed5fc289c777ae147dd6e6 Mon Sep 17 00:00:00 2001 From: Varun Jain Date: Fri, 26 Jul 2024 13:10:11 -0700 Subject: [PATCH 068/154] Sorting and Search After in Hybrid Search (#7820) * Sorting and Search After Signed-off-by: Varun Jain * Change of keywords Signed-off-by: Varun Jain * Addressing Martin Comments Signed-off-by: Varun Jain * Addressing Martin Comments Signed-off-by: Varun Jain * Addressing Martin Comments Signed-off-by: Varun Jain * Doc review Signed-off-by: Fanit Kolchina * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Varun Jain Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Fanit Kolchina Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower --- _search-plugins/hybrid-search.md | 449 +++++++++++++++++++++ _search-plugins/searching-data/paginate.md | 2 +- 2 files changed, 450 insertions(+), 1 deletion(-) diff --git a/_search-plugins/hybrid-search.md b/_search-plugins/hybrid-search.md index 7f08d63d0f..6d68645421 100644 --- a/_search-plugins/hybrid-search.md +++ b/_search-plugins/hybrid-search.md @@ -569,4 +569,453 @@ The response contains the matching documents and the aggregation results: } } } +``` + +## Using sorting with a hybrid query +**Introduced 2.16** +{: .label .label-purple } + +By default, hybrid search returns results ordered by scores in descending order. You can apply sorting to hybrid query results by providing the `sort` criteria in the search request. For more information about sort criteria, see [Sort results]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/sort/). +When sorting is applied to a hybrid search, results are fetched from the shards based on the specified sort criteria. As a result, the search results are sorted accordingly, and the document scores are `null`. Scores are only present in the hybrid search sorting results if documents are sorted by `_score`. 
+ +In the following example, sorting is applied by `doc_price` in the hybrid query search request: + +```json +GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline +{ + "query": { + "hybrid": { + "queries": [ + { + "term": { + "category": "permission" + } + }, + { + "bool": { + "should": [ + { + "term": { + "category": "editor" + } + }, + { + "term": { + "category": "statement" + } + } + ] + } + } + ] + } + }, + "sort":[ + { + "doc_price": { + "order": "desc" + } + } + ] +} +``` +{% include copy-curl.html %} + +The response contains the matching documents sorted by `doc_price` in descending order: + +```json +{ + "took": 35, + "timed_out": false, + "_shards": { + "total": 3, + "successful": 3, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 4, + "relation": "eq" + }, + "max_score": 0.5, + "hits": [ + { + "_index": "my-nlp-index", + "_id": "7yaM4JABZkI1FQv8AwoN", + "_score": null, + "_source": { + "category": "statement", + "doc_keyword": "entire", + "doc_index": 8242, + "doc_price": 350 + }, + "sort": [ + 350 + ] + }, + { + "_index": "my-nlp-index", + "_id": "8CaM4JABZkI1FQv8AwoN", + "_score": null, + "_source": { + "category": "statement", + "doc_keyword": "idea", + "doc_index": 5212, + "doc_price": 200 + }, + "sort": [ + 200 + ] + }, + { + "_index": "my-nlp-index", + "_id": "6yaM4JABZkI1FQv8AwoM", + "_score": null, + "_source": { + "category": "permission", + "doc_keyword": "workable", + "doc_index": 4976, + "doc_price": 100 + }, + "sort": [ + 100 + ] + }, + { + "_index": "my-nlp-index", + "_id": "7iaM4JABZkI1FQv8AwoN", + "_score": null, + "_source": { + "category": "editor", + "doc_index": 9871, + "doc_price": 30 + }, + "sort": [ + 30 + ] + } + ] + } +} +``` + +In the following example, sorting is applied by `_id`: + +```json +GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline +{ + "query": { + "hybrid": { + "queries": [ + { + "term": { + "category": "permission" + } + }, + { + "bool": { + "should": [ + { + "term": { + "category": "editor" + } + }, + { + "term": { + "category": "statement" + } + } + ] + } + } + ] + } + }, + "sort":[ + { + "_id": { + "order": "desc" + } + } + ] +} +``` +{% include copy-curl.html %} + +The response contains the matching documents sorted by `_id` in descending order: + +```json +{ + "took": 33, + "timed_out": false, + "_shards": { + "total": 3, + "successful": 3, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 4, + "relation": "eq" + }, + "max_score": 0.5, + "hits": [ + { + "_index": "my-nlp-index", + "_id": "8CaM4JABZkI1FQv8AwoN", + "_score": null, + "_source": { + "category": "statement", + "doc_keyword": "idea", + "doc_index": 5212, + "doc_price": 200 + }, + "sort": [ + "8CaM4JABZkI1FQv8AwoN" + ] + }, + { + "_index": "my-nlp-index", + "_id": "7yaM4JABZkI1FQv8AwoN", + "_score": null, + "_source": { + "category": "statement", + "doc_keyword": "entire", + "doc_index": 8242, + "doc_price": 350 + }, + "sort": [ + "7yaM4JABZkI1FQv8AwoN" + ] + }, + { + "_index": "my-nlp-index", + "_id": "7iaM4JABZkI1FQv8AwoN", + "_score": null, + "_source": { + "category": "editor", + "doc_index": 9871, + "doc_price": 30 + }, + "sort": [ + "7iaM4JABZkI1FQv8AwoN" + ] + }, + { + "_index": "my-nlp-index", + "_id": "6yaM4JABZkI1FQv8AwoM", + "_score": null, + "_source": { + "category": "permission", + "doc_keyword": "workable", + "doc_index": 4976, + "doc_price": 100 + }, + "sort": [ + "6yaM4JABZkI1FQv8AwoM" + ] + } + ] + } +} +``` + +## Hybrid search with search_after +**Introduced 2.16** +{: .label .label-purple } 
You can control sorting results by applying a `search_after` condition that provides a live cursor and uses the previous page's results to obtain the next page's results. For more information about `search_after`, see [The search_after parameter]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/paginate/#the-search_after-parameter).

You can paginate the sorted results by applying a `search_after` condition in the sort queries.

In the following example, sorting is applied by `doc_price` with a `search_after` condition:

```json
GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "term": {
            "category": "permission"
          }
        },
        {
          "bool": {
            "should": [
              {
                "term": {
                  "category": "editor"
                }
              },
              {
                "term": {
                  "category": "statement"
                }
              }
            ]
          }
        }
      ]
    }
  },
  "sort":[
    {
      "doc_price": {
        "order": "desc"
      }
    }
  ],
  "search_after":[200]
}
```
{% include copy-curl.html %}

The response contains the matching documents that are listed after the `200` sort value, sorted by `doc_price` in descending order:

```json
{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": 0.5,
    "hits": [
      {
        "_index": "my-nlp-index",
        "_id": "6yaM4JABZkI1FQv8AwoM",
        "_score": null,
        "_source": {
          "category": "permission",
          "doc_keyword": "workable",
          "doc_index": 4976,
          "doc_price": 100
        },
        "sort": [
          100
        ]
      },
      {
        "_index": "my-nlp-index",
        "_id": "7iaM4JABZkI1FQv8AwoN",
        "_score": null,
        "_source": {
          "category": "editor",
          "doc_index": 9871,
          "doc_price": 30
        },
        "sort": [
          30
        ]
      }
    ]
  }
}
```

In the following example, sorting is applied by `_id` with a `search_after` condition:

```json
GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "term": {
            "category": "permission"
          }
        },
        {
          "bool": {
            "should": [
              {
                "term": {
                  "category": "editor"
                }
              },
              {
                "term": {
                  "category": "statement"
                }
              }
            ]
          }
        }
      ]
    }
  },
  "sort":[
    {
      "_id": {
        "order": "desc"
      }
    }
  ],
  "search_after":["7yaM4JABZkI1FQv8AwoN"]
}
```
{% include copy-curl.html %}

The response contains the matching documents that are listed after the `7yaM4JABZkI1FQv8AwoN` sort value, sorted by `_id` in descending order:

```json
{
  "took": 17,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": 0.5,
    "hits": [
      {
        "_index": "my-nlp-index",
        "_id": "7iaM4JABZkI1FQv8AwoN",
        "_score": null,
        "_source": {
          "category": "editor",
          "doc_index": 9871,
          "doc_price": 30
        },
        "sort": [
          "7iaM4JABZkI1FQv8AwoN"
        ]
      },
      {
        "_index": "my-nlp-index",
        "_id": "6yaM4JABZkI1FQv8AwoM",
        "_score": null,
        "_source": {
          "category": "permission",
          "doc_keyword": "workable",
          "doc_index": 4976,
          "doc_price": 100
        },
        "sort": [
          "6yaM4JABZkI1FQv8AwoM"
        ]
      }
    ]
  }
}
```

diff --git a/_search-plugins/searching-data/paginate.md b/_search-plugins/searching-data/paginate.md
index ca6e9544be..6040065991 100644
--- a/_search-plugins/searching-data/paginate.md
+++ b/_search-plugins/searching-data/paginate.md
@@ -157,7 +157,7 @@ Because open search contexts consume a lot of memory, we suggest you don't use t

## The 
`search_after` parameter -The `search_after` parameter provides a live cursor that uses the previous page's results to obtain the next page's results. It is similar to the `scroll` operation in that it is meant to scroll many queries in parallel. +The `search_after` parameter provides a live cursor that uses the previous page's results to obtain the next page's results. It is similar to the `scroll` operation in that it is meant to scroll many queries in parallel. You can use `search_after` only when sorting is applied. For example, the following query sorts all lines from the play "Hamlet" by the speech number and then the ID and retrieves the first three results: From 9f9e6d5b0bb5844b465f66fd7450ce3da9bafcfd Mon Sep 17 00:00:00 2001 From: Liyun Xiu Date: Tue, 30 Jul 2024 01:39:33 +0800 Subject: [PATCH 069/154] Move bulk API's batch_size parameter to processors (#7719) * Deprecate batch_size from bulk API & introduce batch_size in two processors Signed-off-by: Liyun Xiu * Remove empty line Signed-off-by: Liyun Xiu * Update _api-reference/document-apis/bulk.md Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Liyun Xiu * Update _ingest-pipelines/processors/sparse-encoding.md Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Liyun Xiu * Update _ingest-pipelines/processors/text-embedding.md Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Liyun Xiu * Update _ml-commons-plugin/remote-models/batch-ingestion.md Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _ml-commons-plugin/remote-models/batch-ingestion.md Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: Liyun Xiu Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Nathan Bower --- _api-reference/document-apis/bulk.md | 2 +- _ingest-pipelines/processors/sparse-encoding.md | 1 + _ingest-pipelines/processors/text-embedding.md | 1 + _ml-commons-plugin/remote-models/batch-ingestion.md | 8 +++++--- 4 files changed, 8 insertions(+), 4 deletions(-) diff --git a/_api-reference/document-apis/bulk.md b/_api-reference/document-apis/bulk.md index a9833a701f..0475aa573d 100644 --- a/_api-reference/document-apis/bulk.md +++ b/_api-reference/document-apis/bulk.md @@ -59,7 +59,7 @@ routing | String | Routes the request to the specified shard. timeout | Time | How long to wait for the request to return. Default `1m`. type | String | (Deprecated) The default document type for documents that don't specify a type. Default is `_doc`. We highly recommend ignoring this parameter and using a type of `_doc` for all indexes. wait_for_active_shards | String | Specifies the number of active shards that must be available before OpenSearch processes the bulk request. Default is 1 (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the request to succeed. -batch_size | Integer | Specifies the number of documents to be batched and sent to an ingest pipeline to be processed together. Default is `1` (documents are ingested by an ingest pipeline one at a time). 
If the bulk request doesn't explicitly specify an ingest pipeline or the index doesn't have a default ingest pipeline, then this parameter is ignored. Only documents with `create`, `index`, or `update` actions can be grouped into batches.
+batch_size | Integer | **(Deprecated)** Specifies the number of documents to be batched and sent to an ingest pipeline to be processed together. Default is `2147483647` (documents are ingested by an ingest pipeline all at once). If the bulk request doesn't explicitly specify an ingest pipeline and the index doesn't have a default ingest pipeline, then this parameter is ignored. Only documents with `create`, `index`, or `update` actions can be grouped into batches.
 {% comment %}_source | List | asdf
 _source_excludes | list | asdf
 _source_includes | list | asdf{% endcomment %}
diff --git a/_ingest-pipelines/processors/sparse-encoding.md b/_ingest-pipelines/processors/sparse-encoding.md
index 1f86447ed5..38b44320b1 100644
--- a/_ingest-pipelines/processors/sparse-encoding.md
+++ b/_ingest-pipelines/processors/sparse-encoding.md
@@ -41,6 +41,7 @@ The following table lists the required and optional parameters for the `sparse_e
 `field_map.` | String | Required | The name of the vector field in which to store the generated vector embeddings.
 `description` | String | Optional | A brief description of the processor. |
 `tag` | String | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. |
+`batch_size` | Integer | Optional | Specifies the number of documents to be batched and processed together in a single request. Default is `1`. |
 
 ## Using the processor
 
diff --git a/_ingest-pipelines/processors/text-embedding.md b/_ingest-pipelines/processors/text-embedding.md
index 28b18c2ebf..6d263a0fec 100644
--- a/_ingest-pipelines/processors/text-embedding.md
+++ b/_ingest-pipelines/processors/text-embedding.md
@@ -41,6 +41,7 @@ The following table lists the required and optional parameters for the `text_emb
 `field_map.` | String | Required | The name of the vector field in which to store the generated text embeddings.
 `description` | String | Optional | A brief description of the processor. |
 `tag` | String | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. |
+`batch_size` | Integer | Optional | Specifies the number of documents to be batched and processed together in a single request. Default is `1`. |
 
 ## Using the processor
 
diff --git a/_ml-commons-plugin/remote-models/batch-ingestion.md b/_ml-commons-plugin/remote-models/batch-ingestion.md
index 64f434e652..80b31a9fe4 100644
--- a/_ml-commons-plugin/remote-models/batch-ingestion.md
+++ b/_ml-commons-plugin/remote-models/batch-ingestion.md
@@ -14,10 +14,11 @@ grand_parent: Integrating ML models
 
 If you are ingesting multiple documents and generating embeddings by invoking an externally hosted model, you can use batch ingestion to improve performance.
 
-The [Bulk API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/) accepts a `batch_size` parameter that specifies to process documents in batches of a specified size. Processors that support batch ingestion will send each batch of documents to an externally hosted model in a single request.
+When using the [Bulk API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/) to ingest documents, processors that support batch ingestion will split documents into batches and send each batch of documents to an externally hosted model in a single request. 
The [`text_embedding`]({{site.url}}{{site.baseurl}}/ingest-pipelines/processors/text-embedding/) and [`sparse_encoding`]({{site.url}}{{site.baseurl}}/ingest-pipelines/processors/sparse-encoding/) processors currently support batch ingestion. + ## Step 1: Register a model group You can register a model in two ways: @@ -212,7 +213,8 @@ PUT /_ingest/pipeline/nlp-ingest-pipeline "model_id": "cleMb4kBJ1eYAeTMFFg4", "field_map": { "passage_text": "passage_embedding" - } + }, + "batch_size": 5 } } ] @@ -222,7 +224,7 @@ PUT /_ingest/pipeline/nlp-ingest-pipeline ## Step 6: Perform bulk indexing -To ingest documents in bulk, call the Bulk API and provide the `batch_size` and `pipeline` parameters. If you don't provide a `pipeline` parameter, the default ingest pipeline for the index will be used for ingestion: +To ingest documents in bulk, call the Bulk API and provide the `pipeline` parameter. If you don't provide a `pipeline` parameter, then the default ingest pipeline for the index will be used for ingestion: ```json POST _bulk?batch_size=5&pipeline=nlp-ingest-pipeline From e7b36f7ceb95e4084720acdb365263f9f0002622 Mon Sep 17 00:00:00 2001 From: Craig Perkins Date: Mon, 29 Jul 2024 19:30:33 -0400 Subject: [PATCH 070/154] Rework plugins doc and add the mapper-size plugin (#7646) * Add documentation about the mapper-size plugin Signed-off-by: Craig Perkins * Fix link checker Signed-off-by: Craig Perkins * Apply feedback Signed-off-by: Craig Perkins * Fix links Signed-off-by: Craig Perkins * Respond to feedback Signed-off-by: Craig Perkins * fix-nav-pane Signed-off-by: Heather Halter * fixed-parent Signed-off-by: Heather Halter * additional-formatting-changes Signed-off-by: Heather Halter * moretweaks Signed-off-by: Heather Halter * more tweaks Signed-off-by: Heather Halter * reworded the intros and added collapsible blocks Signed-off-by: Heather Halter * small tweak Signed-off-by: Heather Halter * more tweaks to wording Signed-off-by: Heather Halter * Update installation message text Signed-off-by: Craig Perkins * Update _install-and-configure/additional-plugins/index.md Co-authored-by: Heather Halter Signed-off-by: Craig Perkins * Mention more core functionality Signed-off-by: Craig Perkins * final changes Signed-off-by: Heather Halter * Update _install-and-configure/plugins.md Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Heather Halter * Update _install-and-configure/additional-plugins/index.md Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Heather Halter * Update _install-and-configure/additional-plugins/index.md Co-authored-by: Nathan Bower Signed-off-by: Heather Halter * Update _install-and-configure/additional-plugins/index.md Co-authored-by: Nathan Bower Signed-off-by: Heather Halter * Update _install-and-configure/additional-plugins/mapper-size-plugin.md Co-authored-by: Nathan Bower Signed-off-by: Heather Halter * Update _install-and-configure/additional-plugins/mapper-size-plugin.md Co-authored-by: Nathan Bower Signed-off-by: Heather Halter * Update _install-and-configure/plugins.md Co-authored-by: Nathan Bower Signed-off-by: Heather Halter * Update _install-and-configure/plugins.md Co-authored-by: Nathan Bower Signed-off-by: Heather Halter * Update _install-and-configure/plugins.md Co-authored-by: Nathan Bower Signed-off-by: Heather Halter * Update _install-and-configure/plugins.md Co-authored-by: Nathan Bower Signed-off-by: Heather Halter * Apply suggestions from code review Co-authored-by: Nathan 
Bower Signed-off-by: Heather Halter * Update _install-and-configure/plugins.md Co-authored-by: Heather Halter Signed-off-by: Craig Perkins * Update _install-and-configure/plugins.md Co-authored-by: Heather Halter Signed-off-by: Craig Perkins --------- Signed-off-by: Craig Perkins Signed-off-by: Heather Halter Signed-off-by: Craig Perkins Signed-off-by: Heather Halter Co-authored-by: Heather Halter Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Nathan Bower --- .../additional-plugins/index.md | 40 ++++++ .../additional-plugins/mapper-size-plugin.md | 100 ++++++++++++++ _install-and-configure/plugins.md | 123 +++++++++--------- 3 files changed, 200 insertions(+), 63 deletions(-) create mode 100644 _install-and-configure/additional-plugins/index.md create mode 100644 _install-and-configure/additional-plugins/mapper-size-plugin.md diff --git a/_install-and-configure/additional-plugins/index.md b/_install-and-configure/additional-plugins/index.md new file mode 100644 index 0000000000..de97af0b1a --- /dev/null +++ b/_install-and-configure/additional-plugins/index.md @@ -0,0 +1,40 @@ +--- +layout: default +title: Additional plugins +parent: Installing plugins +nav_order: 10 +--- + +# Additional plugins + +There are many more plugins available in addition to those provided by the standard distribution of OpenSearch. These additional plugins have been built by OpenSearch developers or members of the OpenSearch community. While it isn't possible to provide an exhaustive list (because many plugins are not maintained in an OpenSearch GitHub repository), the following plugins, available in the [OpenSearch/plugins](https://github.com/opensearch-project/OpenSearch/tree/main/plugins) directory on GitHub, are some of the plugins that can be installed using one of the installation options, for example, using the command `bin/opensearch-plugin install `. + + +| Plugin name | Earliest available version | +| :--- | :--- | +| analysis-icu | 1.0.0 | +| analysis-kuromoji | 1.0.0 | +| analysis-nori | 1.0.0 | +| analysis-phonetic | 1.0.0 | +| analysis-smartcn | 1.0.0 | +| analysis-stempel | 1.0.0 | +| analysis-ukrainian | 1.0.0 | +| discovery-azure-classic | 1.0.0 | +| discovery-ec2 | 1.0.0 | +| discovery-gce | 1.0.0 | +| ingest-attachment | 1.0.0 | +| mapper-annotated-text | 1.0.0 | +| mapper-murmur3 | 1.0.0 | +| [`mapper-size`]({{site.url}}{{site.baseurl}}/install-and-configure/additional-plugins/mapper-size-plugin/) | 1.0.0 | +| query-insights | 2.12.0 | +| repository-azure | 1.0.0 | +| repository-gcs | 1.0.0 | +| repository-hdfs | 1.0.0 | +| repository-s3 | 1.0.0 | +| store-smb | 1.0.0 | +| transport-nio | 1.0.0 | + + +## Related articles +[Installing plugins]({{site.url}}{{site.baseurl}}/install-and-configure/plugins/) +[`mapper-size` plugin]({{site.url}}{{site.baseurl}}/install-and-configure/additional-plugins/mapper-size-plugin/) diff --git a/_install-and-configure/additional-plugins/mapper-size-plugin.md b/_install-and-configure/additional-plugins/mapper-size-plugin.md new file mode 100644 index 0000000000..4c68d9a5a6 --- /dev/null +++ b/_install-and-configure/additional-plugins/mapper-size-plugin.md @@ -0,0 +1,100 @@ +--- +layout: default +title: Mapper-size plugin +parent: Installing plugins +nav_order: 20 + +--- + +# Mapper-size plugin + +The `mapper-size` plugin enables the use of the `_size` field in OpenSearch indexes. The `_size` field stores the size, in bytes, of each document. 
+
+## Installing the plugin
+
+You can install the `mapper-size` plugin using the following command:
+
+```sh
+./bin/opensearch-plugin install mapper-size
+```
+
+## Examples
+
+After starting up a cluster, you can create an index with size mapping enabled, index a document, and search for documents, as shown in the following examples.
+
+### Create an index with size mapping enabled
+
+```sh
+curl -XPUT example-index -H "Content-Type: application/json" -d '{
+  "mappings": {
+    "_size": {
+      "enabled": true
+    },
+    "properties": {
+      "name": {
+        "type": "text"
+      },
+      "age": {
+        "type": "integer"
+      }
+    }
+  }
+}'
+```
+
+### Index a document
+
+```sh
+curl -XPOST example-index/_doc -H "Content-Type: application/json" -d '{
+  "name": "John Doe",
+  "age": 30
+}'
+```
+
+### Query the index
+
+```sh
+curl -XGET example-index/_search -H "Content-Type: application/json" -d '{
+  "query": {
+    "match_all": {}
+  },
+  "stored_fields": ["_size", "_source"]
+}'
+```
+
+### Query results
+
+In the following example, the `_size` field is included in the query results and shows the size, in bytes, of the indexed document:
+
+```json
+{
+  "took": 2,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 1.0,
+    "hits": [
+      {
+        "_index": "example_index",
+        "_id": "Pctw0I8BLto8I5f_NLKK",
+        "_score": 1.0,
+        "_size": 37,
+        "_source": {
+          "name": "John Doe",
+          "age": 30
+        }
+      }
+    ]
+  }
+}
+```
+
diff --git a/_install-and-configure/plugins.md b/_install-and-configure/plugins.md
index d4fc35507f..3a5d6a1834 100644
--- a/_install-and-configure/plugins.md
+++ b/_install-and-configure/plugins.md
@@ -2,6 +2,7 @@
 layout: default
 title: Installing plugins
 nav_order: 90
+has_children: true
 redirect_from:
   - /opensearch/install/plugins/
   - /install-and-configure/install-opensearch/plugins/
@@ -9,22 +10,26 @@ redirect_from:
 
 # Installing plugins
 
-You can install individual plugins for OpenSearch based on your needs. For information about available plugins, see [Available plugins](#available-plugins).
+OpenSearch comprises a number of plugins that add features and capabilities to the core platform. The plugins available to you depend on how OpenSearch was installed and which plugins were subsequently added or removed. For example, the minimal distribution of OpenSearch enables only core functionality, such as indexing and search. Using the minimal distribution of OpenSearch is beneficial when you are working in a testing environment, have custom plugins, or are intending to integrate OpenSearch with other services.
+
+The standard distribution of OpenSearch has much more functionality included. You can choose to add additional plugins or remove any of the plugins you don't need.
+
+For a list of the available plugins, see [Available plugins](#available-plugins).
+
+For a plugin to work properly with OpenSearch, it may request certain permissions as part of the installation process. Review the requested permissions and proceed accordingly. 
It is important that you understand a plugin's functionality before installation. When opting for a community-provided plugin, ensure that the source is trustworthy and reliable. {: .warning} ## Managing plugins -OpenSearch uses a command line tool called `opensearch-plugin` for managing plugins. This tool allows you to: +To manage plugins in OpenSearch, you can use a command line tool called `opensearch-plugin`. This tool allows you to perform the following actions: - [List](#list) installed plugins. - [Install](#install) plugins. - [Remove](#remove) an installed plugin. -Print help text by passing `-h` or `--help`. Depending on your host configuration, you might also need to run the command with `sudo` privileges. +You can print help text by passing `-h` or `--help`. Depending on your host configuration, you might also need to run the command with `sudo` privileges. -If you are running OpenSearch in a Docker container, plugins must be installed, removed, and configured by modifying the Docker image. For information, see [Working with plugins]({{site.url}}{{site.baseurl}}/install-and-configure/install-opensearch/docker#working-with-plugins) +If you're running OpenSearch in a Docker container, plugins must be installed, removed, and configured by modifying the Docker image. For more information, see [Working with plugins]({{site.url}}{{site.baseurl}}/install-and-configure/install-opensearch/docker#working-with-plugins). {: .note} ## List @@ -57,9 +62,10 @@ opensearch-security opensearch-sql ``` +## List (with CAT API) You can also list installed plugins by using the [CAT API]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-plugins/). -#### Path and HTTP method +#### Usage ```bash GET _cat/plugins @@ -82,15 +88,15 @@ opensearch-node1 opensearch-notifications-core 2.0.1.0 ## Install -There are three ways to install plugins using the `opensearch-plugin`: +There are three ways to install plugins using the `opensearch-plugin` tool: - [Install a plugin by name](#install-a-plugin-by-name). -- [Install a plugin from a ZIP file](#install-a-plugin-from-a-zip-file). +- [Install a plugin from a zip file](#install-a-plugin-from-a-zip-file). - [Install a plugin using Maven coordinates](#install-a-plugin-using-maven-coordinates). -### Install a plugin by name: +### Install a plugin by name -For a list of plugins that can be installed by name, see [Additional plugins](#additional-plugins). +You can install plugins that aren't already preinstalled in your installation by using the plugin name. For a list of plugins that may not be preinstalled, see [Additional plugins](#additional-plugins). #### Usage ```bash @@ -108,7 +114,7 @@ $ sudo ./opensearch-plugin install analysis-icu ### Install a plugin from a zip file -Remote zip files can be installed by replacing `` with the URL of the hosted file. The tool only supports downloading over HTTP/HTTPS protocols. For local zip files, replace `` with `file:` followed by the absolute or relative path to the plugin zip file as in the second example below. +You can install remote zip files by replacing `` with the URL of the hosted file. The tool supports downloading over HTTP/HTTPS protocols only. For local zip files, replace `` with `file:` followed by the absolute or relative path to the plugin zip file, as shown in the second example that follows. #### Usage ```bash @@ -116,6 +122,12 @@ bin/opensearch-plugin install ``` #### Example +
+ + Select to expand the example + + {: .text-delta} + ```bash # Zip file is hosted on a remote server - in this case, Maven central repository. $ sudo ./opensearch-plugin install https://repo1.maven.org/maven2/org/opensearch/plugin/opensearch-anomaly-detection/2.2.0.0/opensearch-anomaly-detection-2.2.0.0.zip @@ -165,10 +177,11 @@ for descriptions of what these permissions allow and the associated risks. Continue with installation? [y/N]y -> Installed opensearch-anomaly-detection with folder name opensearch-anomaly-detection ``` +
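+
+As a quick check after a zip installation, you can confirm that the plugin was registered and then restart the node so it is loaded. The following is a minimal sketch only; it assumes a systemd-managed installation and reuses the plugin name from the preceding example:
+
+```bash
+# Confirm that the plugin from the example above now appears in the plugin list.
+bin/opensearch-plugin list | grep opensearch-anomaly-detection
+
+# Restart the node so the newly installed plugin is loaded (systemd assumed).
+sudo systemctl restart opensearch
+```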
### Install a plugin using Maven coordinates -The `opensearch-plugin install` tool also accepts Maven coordinates for available artifacts and versions hosted on [Maven Central](https://search.maven.org/search?q=org.opensearch.plugin). `opensearch-plugin` will parse the Maven coordinates you provide and construct a URL. As a result, the host must be able to connect directly to [Maven Central](https://search.maven.org/search?q=org.opensearch.plugin). The plugin installation will fail if you pass coordinates to a proxy or local repository. +The `opensearch-plugin install` tool also allows you to specify Maven coordinates for available artifacts and versions hosted on [Maven Central](https://search.maven.org/search?q=org.opensearch.plugin). The tool parses the Maven coordinates you provide and constructs a URL. As a result, the host must be able to connect directly to the Maven Central site. The plugin installation fails if you pass coordinates to a proxy or local repository. #### Usage ```bash @@ -176,6 +189,13 @@ bin/opensearch-plugin install :: ``` #### Example + +
+ + Select to expand the example + + {: .text-delta} + ```console $ sudo ./opensearch-plugin install org.opensearch.plugin:opensearch-anomaly-detection:2.2.0.0 -> Installing org.opensearch.plugin:opensearch-anomaly-detection:2.2.0.0 @@ -200,11 +220,12 @@ for descriptions of what these permissions allow and the associated risks. Continue with installation? [y/N]y -> Installed opensearch-anomaly-detection with folder name opensearch-anomaly-detection ``` +
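+
+Because the Maven coordinate embeds a version, it helps to check the version of the running cluster first so that the artifact matches it. The following is a sketch only; the `localhost:9200` endpoint is an assumption, and a cluster with the Security plugin enabled will additionally require credentials:
+
+```bash
+# Read the cluster's version from the root endpoint (add credentials if the
+# Security plugin is enabled).
+curl -s http://localhost:9200 | grep '"number"'
+
+# Then install the artifact whose version matches, as in the example above.
+sudo ./opensearch-plugin install org.opensearch.plugin:opensearch-anomaly-detection:2.2.0.0
+```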
Restart your OpenSearch node after installing a plugin. {: .note} -### Installing multiple plugins +## Installing multiple plugins Multiple plugins can be installed in a single invocation. @@ -238,7 +259,7 @@ Restart your OpenSearch node after removing a plugin. ## Batch mode -When installing plugins that require additional privileges not included by default, the plugins will prompt the user for confirmation of the required privileges. To grant all requested privileges, use batch mode to skip the confirmation prompt. +When installing a plugin that requires additional privileges that are not included by default, the plugin will prompt you for confirmation of the required privileges. To grant all requested privileges, use batch mode to skip the confirmation prompt. To force batch mode when installing plugins, add the `-b` or `--batch` option: ```bash @@ -247,27 +268,11 @@ bin/opensearch-plugin install --batch ## Available plugins -OpenSearch provides several bundled and additional plugins. - -### Plugin compatibility - -A plugin can explicitly specify compatibility with a specific OpenSearch version by listing that version in its `plugin-descriptor.properties` file. For example, a plugin with the following property is compatible only with OpenSearch 2.3.0: - -```properties -opensearch.version=2.3.0 -``` -Alternatively, a plugin can specify a range of compatible OpenSearch versions by setting the `dependencies` property in its `plugin-descriptor.properties` file using one of the following notations: -- `dependencies={ opensearch: "2.3.0" }`: The plugin is compatible only with OpenSearch version 2.3.0. -- `dependencies={ opensearch: "=2.3.0" }`: The plugin is compatible only with OpenSearch version 2.3.0. -- `dependencies={ opensearch: "~2.3.0" }`: The plugin is compatible with all versions starting from 2.3.0 up to the next minor version, in this example, 2.4.0 (exclusive). -- `dependencies={ opensearch: "^2.3.0" }`: The plugin is compatible with all versions starting from 2.3.0 up to the next major version, in this example, 3.0.0 (exclusive). - -You can specify only one of the `opensearch.version` or `dependencies` properties. -{: .note} +OpenSearch provides several bundled plugins that are available for immediate use with all OpenSearch distributions except for the minimal distribution. Additional plugins are available but must be installed separately using one of the installation options. ### Bundled plugins -The following plugins are bundled with all OpenSearch distributions except for minimum distribution packages. +The following plugins are bundled with all OpenSearch distributions except for the minimal distribution. If you are using the minimal distribution, you can add these plugins by using one of the installation methods. | Plugin name | Repository | Earliest available version | | :--- | :--- | :--- | @@ -299,45 +304,37 @@ _2Performance Analyzer is not available on Windows._ ### Additional plugins -Members of the OpenSearch community have built countless plugins for the service. Although it isn't possible to build an exhaustive list of every plugin, since many plugins are not maintained within the OpenSearch GitHub repository, the following list of plugins are available to be installed by name using `bin/opensearch-plugin install `. 
- -| Plugin name | Earliest available version | -| :--- | :--- | -| analysis-icu | 1.0.0 | -| analysis-kuromoji | 1.0.0 | -| analysis-nori | 1.0.0 | -| analysis-phonetic | 1.0.0 | -| analysis-smartcn | 1.0.0 | -| analysis-stempel | 1.0.0 | -| analysis-ukrainian | 1.0.0 | -| discovery-azure-classic | 1.0.0 | -| discovery-ec2 | 1.0.0 | -| discovery-gce | 1.0.0 | -| ingest-attachment | 1.0.0 | -| mapper-annotated-text | 1.0.0 | -| mapper-murmur3 | 1.0.0 | -| mapper-size | 1.0.0 | -| query-insights | 2.12.0 | -| repository-azure | 1.0.0 | -| repository-gcs | 1.0.0 | -| repository-hdfs | 1.0.0 | -| repository-s3 | 1.0.0 | -| store-smb | 1.0.0 | -| transport-nio | 1.0.0 | +There are many more plugins available in addition to those provided by the default distribution. These additional plugins have been built by OpenSearch developers or members of the OpenSearch community. For a list of additional plugins you can install, see [Additional plugins]({{site.url}}{{site.baseurl}}/install-and-configure/additional-plugins/index/). + +## Plugin compatibility + +You can specify plugin compatibility with a particular OpenSearch version in the `plugin-descriptor.properties` file. For example, a plugin with the following property is compatible only with OpenSearch 2.3.0: + +```properties +opensearch.version=2.3.0 +``` +Alternatively, you can specify a range of compatible OpenSearch versions by setting the `dependencies` property in the `plugin-descriptor.properties` file to one of the following notations: +- `dependencies={ opensearch: "2.3.0" }`: The plugin is compatible only with OpenSearch version 2.3.0. +- `dependencies={ opensearch: "=2.3.0" }`: The plugin is compatible only with OpenSearch version 2.3.0. +- `dependencies={ opensearch: "~2.3.0" }`: The plugin is compatible with all versions from 2.3.0 up to the next minor version, in this example, 2.4.0 (exclusive). +- `dependencies={ opensearch: "^2.3.0" }`: The plugin is compatible with all versions from 2.3.0 up to the next major version, in this example, 3.0.0 (exclusive). + +You can specify only one of the `opensearch.version` or `dependencies` properties. 
+{: .note} ## Related links -- [About Observability]({{site.url}}{{site.baseurl}}/observability-plugin/index/) -- [About security analytics]({{site.url}}{{site.baseurl}}/security-analytics/index/) -- [About the Security plugin]({{site.url}}{{site.baseurl}}/security/index/) +- [Observability]({{site.url}}{{site.baseurl}}/observability-plugin/index/) +- [Security Analytics]({{site.url}}{{site.baseurl}}/security-analytics/index/) +- [Security]({{site.url}}{{site.baseurl}}/security/index/) - [Alerting]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/index/) - [Anomaly detection]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/index/) - [Asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/index/) - [Cross-cluster replication]({{site.url}}{{site.baseurl}}/replication-plugin/index/) - [Index State Management]({{site.url}}{{site.baseurl}}/im-plugin/ism/index/) -- [k-NN]({{site.url}}{{site.baseurl}}/search-plugins/knn/index/) -- [ML Commons plugin]({{site.url}}{{site.baseurl}}/ml-commons-plugin/index/) -- [Neural Search]({{site.url}}{{site.baseurl}}/neural-search-plugin/index/) +- [k-NN search]({{site.url}}{{site.baseurl}}/search-plugins/knn/index/) +- [ML Commons]({{site.url}}{{site.baseurl}}/ml-commons-plugin/index/) +- [Neural search]({{site.url}}{{site.baseurl}}/neural-search-plugin/index/) - [Notifications]({{site.url}}{{site.baseurl}}/notifications-plugin/index/) - [OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/dashboards/index/) - [Performance Analyzer]({{site.url}}{{site.baseurl}}/monitoring-plugins/pa/index/) From 8e03f53e207b56d7bfa1bd590065fb272ca64a73 Mon Sep 17 00:00:00 2001 From: Sicheng Song Date: Tue, 30 Jul 2024 08:13:13 -0700 Subject: [PATCH 071/154] Add predefined model interface doc (#7830) * Add predefined model interface doc Signed-off-by: Sicheng Song * Update _ml-commons-plugin/api/model-apis/register-model.md Co-authored-by: Nathan Bower Signed-off-by: Sicheng Song * Update _ml-commons-plugin/api/model-apis/register-model.md Co-authored-by: Nathan Bower Signed-off-by: Sicheng Song * Update _ml-commons-plugin/api/model-apis/register-model.md Co-authored-by: Nathan Bower Signed-off-by: Sicheng Song * Update _ml-commons-plugin/api/model-apis/register-model.md Co-authored-by: Nathan Bower Signed-off-by: Sicheng Song * Update _ml-commons-plugin/api/model-apis/register-model.md Co-authored-by: Nathan Bower Signed-off-by: Sicheng Song * Address comments Signed-off-by: Sicheng Song * Doc review Signed-off-by: Fanit Kolchina * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Sicheng Song Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower Co-authored-by: Fanit Kolchina Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- .../Vocab/OpenSearch/Products/accept.txt | 1 + .../api/model-apis/register-model.md | 23 ++++++++++++++++--- 2 files changed, 21 insertions(+), 3 deletions(-) diff --git a/.github/vale/styles/Vocab/OpenSearch/Products/accept.txt b/.github/vale/styles/Vocab/OpenSearch/Products/accept.txt index 9be8da79a9..4ea310a086 100644 --- a/.github/vale/styles/Vocab/OpenSearch/Products/accept.txt +++ b/.github/vale/styles/Vocab/OpenSearch/Products/accept.txt @@ -86,6 +86,7 @@ RPM Package Manager Ruby Simple Schema for Observability Tableau +Textract TorchScript Tribuo VisBuilder diff --git 
a/_ml-commons-plugin/api/model-apis/register-model.md b/_ml-commons-plugin/api/model-apis/register-model.md
index ec830a7821..2a0e9706e9 100644
--- a/_ml-commons-plugin/api/model-apis/register-model.md
+++ b/_ml-commons-plugin/api/model-apis/register-model.md
@@ -357,7 +357,8 @@ OpenSearch responds with the `task_id`, task `status`, and `model_id`:
 
 ### The `interface` parameter
 
-The model interface provides a highly flexible way to add arbitrary metadata annotations to all local deep learning models and remote models in a JSON schema syntax. This annotation initiates a validation check on the input and output fields of the model during the model's invocation. The validation check ensures that the input and output fields are in the correct format both before and after the model performs a prediction.
+The model interface provides a highly flexible way to add arbitrary metadata annotations to all local deep learning models and externally hosted models in a JSON schema syntax. This annotation initiates a validation check on the input and output fields of the model during the model's invocation. The validation check ensures that the input and output fields are in the correct format both before and after the model performs inference.
+
 To register a model with a model interface, provide the `interface` parameter, which supports the following fields.
 
 Field | Data type | Description
@@ -365,9 +366,25 @@ Field | Data type | Description
 `input`| Object | The JSON schema for the model input. |
 `output`| Object | The JSON schema for the model output. |
 
-The input and output fields will be evaluated against the separately provided JSON schema. You do not necessarily need to provide both input and output fields simultaneously.
+The input and output fields are evaluated against the provided JSON schema. You do not need to provide both fields simultaneously.
+
+#### Connector model interfaces
+
+To simplify your workflow, you can register an externally hosted model using a connector in one of the [connector blueprint]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/blueprints/) formats. If you do so, a predefined model interface for this connector is generated automatically during model registration. The predefined model interface is generated based on the connector blueprint and the model's metadata, so you must strictly follow the blueprint when creating the connector in order to avoid errors.
+
+The following connector blueprints currently support creating predefined model interfaces:
+
+- [Amazon Comprehend](https://github.com/opensearch-project/ml-commons/blob/2.x/docs/remote_inference_blueprints/amazon_comprehend_connector_blueprint.md)
+- [Amazon Textract](https://github.com/opensearch-project/ml-commons/blob/2.x/docs/remote_inference_blueprints/amazon_textract_connector_blueprint.md) (Note that a predefined model interface is only available for the `DetectDocumentText` API; the `DetectEntities` API is not currently supported). 
+- [Amazon Bedrock AI21 Labs Jurassic](https://github.com/opensearch-project/ml-commons/blob/2.x/docs/remote_inference_blueprints/bedrock_connector_ai21labs_jurassic_blueprint.md) +- [Amazon Bedrock Anthropic Claude 3](https://github.com/opensearch-project/ml-commons/blob/2.x/docs/remote_inference_blueprints/bedrock_connector_anthropic_claude3_blueprint.md) +- [Amazon Bedrock Anthropic Claude](https://github.com/opensearch-project/ml-commons/blob/2.x/docs/remote_inference_blueprints/bedrock_connector_anthropic_claude_blueprint.md) +- [Amazon Bedrock Cohere Embed English v3](https://github.com/opensearch-project/ml-commons/blob/2.x/docs/remote_inference_blueprints/bedrock_connector_cohere_cohere.embed-english-v3_blueprint.md) +- [Amazon Bedrock Cohere Embed Multilingual v3](https://github.com/opensearch-project/ml-commons/blob/2.x/docs/remote_inference_blueprints/bedrock_connector_cohere_cohere.embed-multilingual-v3_blueprint.md) +- [Amazon Bedrock Titan Text Embeddings](https://github.com/opensearch-project/ml-commons/blob/2.x/docs/remote_inference_blueprints/bedrock_connector_titan_embedding_blueprint.md) +- [Amazon Bedrock Titan Multimodal Embeddings](https://github.com/opensearch-project/ml-commons/blob/2.x/docs/remote_inference_blueprints/bedrock_connector_titan_multimodal_embedding_blueprint.md) -To learn more about the JSON schema syntax, see [Understanding JSON Schema](https://json-schema.org/understanding-json-schema/). +To learn more about connector blueprints, see [Connector blueprints]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/blueprints/). #### Example request: Externally hosted model with an interface From f58bf565d47720ae6eedf850595bdd4d23c94d92 Mon Sep 17 00:00:00 2001 From: Sander van de Geijn Date: Tue, 30 Jul 2024 17:26:47 +0200 Subject: [PATCH 072/154] Update index-codecs.md (#7837) * Update index-codecs.md qat_deflate does not work on 2.14, it does on 2.15. 2.14: { "error": { "root_cause": [ { "type": "illegal_argument_exception", "reason": "unknown value for [index.codec] must be one of [default, lz4, best_compression, zlib] but was: qat_deflate" } ], "type": "illegal_argument_exception", "reason": "unknown value for [index.codec] must be one of [default, lz4, best_compression, zlib] but was: qat_deflate" }, "status": 400 } 2.15: { "acknowledged": true, "shards_acknowledged": true, "index": "x" } Signed-off-by: Sander van de Geijn * Update to 2.15 for all. Signed-off-by: Sander van de Geijn --------- Signed-off-by: Sander van de Geijn --- _im-plugin/index-codecs.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/_im-plugin/index-codecs.md b/_im-plugin/index-codecs.md index ed28aca8e8..81af8b899f 100644 --- a/_im-plugin/index-codecs.md +++ b/_im-plugin/index-codecs.md @@ -32,11 +32,11 @@ For the `zstd` and `zstd_no_dict` codecs, you can optionally specify a compressi When an index segment is created, it uses the current index codec for compression. If you update the index codec, any segment created after the update will use the new compression algorithm. For specific operation considerations, see [Index codec considerations for index operations](#index-codec-considerations-for-index-operations). {: .note} -As of OpenSearch 2.14, hardware-accelerated compression codecs for the `DEFLATE` and `LZ4` compression algorithms are available. These hardware-accelerated codecs are available on the latest 4th and 5th Gen Intel®️ Xeon®️ processors running Linux kernel 3.10 and later. 
For all other systems and platforms, the codecs use that platform's corresponding software implementations. +As of OpenSearch 2.15, hardware-accelerated compression codecs for the `DEFLATE` and `LZ4` compression algorithms are available. These hardware-accelerated codecs are available on the latest 4th and 5th Gen Intel®️ Xeon®️ processors running Linux kernel 3.10 and later. For all other systems and platforms, the codecs use that platform's corresponding software implementations. The new hardware-accelerated codecs can be used by setting one of the following `index.codec` values: -* `qat_lz4` (OpenSearch 2.14 and later): Hardware-accelerated `LZ4` -* `qat_deflate` (OpenSearch 2.14 and later): Hardware-accelerated `DEFLATE` +* `qat_lz4` (OpenSearch 2.15 and later): Hardware-accelerated `LZ4` +* `qat_deflate` (OpenSearch 2.15 and later): Hardware-accelerated `DEFLATE` `qat_deflate` offers a much better compression ratio than `qat_lz4`, with a modest drop in compression and decompression speed. {: .note} @@ -78,7 +78,7 @@ When creating a [snapshot]({{site.url}}{{site.baseurl}}/tuning-your-cluster/avai When you restore the indexes from a snapshot of a cluster to another cluster, it is important to verify that the target cluster supports the codecs of the segments in the source snapshot. For example, if the source snapshot contains segments of the `zstd` or `zstd_no_dict` codecs (introduced in OpenSearch 2.9), you won't be able to restore the snapshot to a cluster that runs on an older OpenSearch version because it doesn't support these codecs. -For hardware-accelerated compression codecs, available in OpenSearch 2.14 and later, the value of `index.codec.qatmode` affects how snapshots and restores are performed. If the value is `auto` (the default), then snapshots and restores work without issue. However, if the value is `hardware`, then it must be reset to `auto` in order for the restore process to succeed on systems lacking the hardware accelerator. +For hardware-accelerated compression codecs, available in OpenSearch 2.15 and later, the value of `index.codec.qatmode` affects how snapshots and restores are performed. If the value is `auto` (the default), then snapshots and restores work without issue. However, if the value is `hardware`, then it must be reset to `auto` in order for the restore process to succeed on systems lacking the hardware accelerator. You can modify the value of `index.codec.qatmode` during the restore process by setting its value as follows: `"index_settings": {"index.codec.qatmode": "auto"}`. {: .note} From 4993689ef57468dd7c4861071302cda0641d3a6e Mon Sep 17 00:00:00 2001 From: Tim <2527559+svitlo@users.noreply.github.com> Date: Tue, 30 Jul 2024 18:49:21 +0300 Subject: [PATCH 073/154] Update match.md (#7849) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Update match.md Corrected the name of the metric used in the field `fuzziness`. It is called the Damerau–Levenshtein distance because the field `fuzzy_transpositions` is equal to `true` by default. 
Signed-off-by: Tim <2527559+svitlo@users.noreply.github.com> Signed-off-by: svitlo * correct the description of the field 'fuzziness' Signed-off-by: svitlo * Update _query-dsl/full-text/match.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Tim <2527559+svitlo@users.noreply.github.com> Signed-off-by: svitlo --------- Signed-off-by: Tim <2527559+svitlo@users.noreply.github.com> Signed-off-by: svitlo Co-authored-by: svitlo Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- _query-dsl/full-text/match.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/_query-dsl/full-text/match.md b/_query-dsl/full-text/match.md index b4db30ec1f..056ef76890 100644 --- a/_query-dsl/full-text/match.md +++ b/_query-dsl/full-text/match.md @@ -289,7 +289,7 @@ GET testindex/_search To account for typos, you can specify `fuzziness` for your query as either of the following: -- An integer that specifies the maximum allowed [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) for this edit. +- An integer that specifies the maximum allowed [Damerau–Levenshtein distance](https://en.wikipedia.org/wiki/Damerau–Levenshtein_distance) for this edit. - `AUTO`: - Strings of 0–2 characters must match exactly. - Strings of 3–5 characters allow 1 edit. @@ -454,7 +454,7 @@ Parameter | Data type | Description `analyzer` | String | The [analyzer]({{site.url}}{{site.baseurl}}/analyzers/index/) used to tokenize the query string text. Default is the index-time analyzer specified for the `default_field`. If no analyzer is specified for the `default_field`, the `analyzer` is the default analyzer for the index. `boost` | Floating-point | Boosts the clause by the given multiplier. Useful for weighing clauses in compound queries. Values in the [0, 1) range decrease relevance, and values greater than 1 increase relevance. Default is `1`. `enable_position_increments` | Boolean | When `true`, resulting queries are aware of position increments. This setting is useful when the removal of stop words leaves an unwanted "gap" between terms. Default is `true`. -`fuzziness` | String | The number of character edits (insert, delete, substitute) that it takes to change one word to another when determining whether a term matched a value. For example, the distance between `wined` and `wind` is 1. Valid values are non-negative integers or `AUTO`. The default, `AUTO`, chooses a value based on the length of each term and is a good choice for most use cases. +`fuzziness` | String | The number of character edits (insertions, deletions, substitutions, or transpositions) that it takes to change one word to another when determining whether a term matched a value. For example, the distance between `wined` and `wind` is 1. Valid values are non-negative integers or `AUTO`. The default, `AUTO`, chooses a value based on the length of each term and is a good choice for most use cases. `fuzzy_rewrite` | String | Determines how OpenSearch rewrites the query. Valid values are `constant_score`, `scoring_boolean`, `constant_score_boolean`, `top_terms_N`, `top_terms_boost_N`, and `top_terms_blended_freqs_N`. If the `fuzziness` parameter is not `0`, the query uses a `fuzzy_rewrite` method of `top_terms_blended_freqs_${max_expansions}` by default. Default is `constant_score`. 
`fuzzy_transpositions` | Boolean | Setting `fuzzy_transpositions` to `true` (default) adds swaps of adjacent characters to the insert, delete, and substitute operations of the `fuzziness` option. For example, the distance between `wind` and `wnid` is 1 if `fuzzy_transpositions` is true (swap "n" and "i") and 2 if it is false (delete "n", insert "n"). If `fuzzy_transpositions` is false, `rewind` and `wnid` have the same distance (2) from `wind`, despite the more human-centric opinion that `wnid` is an obvious typo. The default is a good choice for most use cases. `lenient` | Boolean | Setting `lenient` to `true` ignores data type mismatches between the query and the document field. For example, a query string of `"8.2"` could match a field of type `float`. Default is `false`. @@ -462,4 +462,4 @@ Parameter | Data type | Description `minimum_should_match` | Positive or negative integer, positive or negative percentage, combination | If the query string contains multiple search terms and you use the `or` operator, the number of terms that need to match for the document to be considered a match. For example, if `minimum_should_match` is 2, `wind often rising` does not match `The Wind Rises.` If `minimum_should_match` is `1`, it matches. For details, see [Minimum should match]({{site.url}}{{site.baseurl}}/query-dsl/minimum-should-match/). `operator` | String | If the query string contains multiple search terms, whether all terms need to match (`AND`) or only one term needs to match (`OR`) for a document to be considered a match. Valid values are:
- `OR`: The string `to be` is interpreted as `to OR be`
- `AND`: The string `to be` is interpreted as `to AND be`
Default is `OR`. `prefix_length` | Non-negative integer | The number of leading characters that are not considered in fuzziness. Default is `0`. -`zero_terms_query` | String | In some cases, the analyzer removes all terms from a query string. For example, the `stop` analyzer removes all terms from the string `an but this`. In those cases, `zero_terms_query` specifies whether to match no documents (`none`) or all documents (`all`). Valid values are `none` and `all`. Default is `none`. \ No newline at end of file +`zero_terms_query` | String | In some cases, the analyzer removes all terms from a query string. For example, the `stop` analyzer removes all terms from the string `an but this`. In those cases, `zero_terms_query` specifies whether to match no documents (`none`) or all documents (`all`). Valid values are `none` and `all`. Default is `none`. From 10f4b1e43bd6c112249150503ee14648f357868b Mon Sep 17 00:00:00 2001 From: Ashish Singh Date: Tue, 30 Jul 2024 21:27:34 +0530 Subject: [PATCH 074/154] Update documentation for create / update repository api (#7851) * Update documentation for create / update repository api Signed-off-by: Ashish Singh * Update create-repository.md * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: Ashish Singh Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Nathan Bower --- _api-reference/snapshots/create-repository.md | 48 ++++++++++++------- 1 file changed, 30 insertions(+), 18 deletions(-) diff --git a/_api-reference/snapshots/create-repository.md b/_api-reference/snapshots/create-repository.md index 8ee7885ca8..ca4c04114c 100644 --- a/_api-reference/snapshots/create-repository.md +++ b/_api-reference/snapshots/create-repository.md @@ -30,12 +30,20 @@ PUT /_snapshot/my-first-repo/ Parameter | Data type | Description :--- | :--- | :--- -repository | String | Repository name | +`repository` | String | Repository name | ## Request parameters Request parameters depend on the type of repository: `fs` or `s3`. +### Common parameters + +The following table lists parameters that can be used with both the `fs` and `s3` repositories. + +Request field | Description +:--- | :--- +`prefix_mode_verification` | When enabled, adds a hashed value of a random seed to the prefix for repository verification. For remote-store-enabled clusters, you can add the `setting.prefix_mode_verification` setting to the node attributes for the supplied repository. This field works with both new and existing repositories. Optional. + ### fs repository Request field | Description @@ -48,20 +56,6 @@ Request field | Description `remote_store_index_shallow_copy` | Boolean | Determines whether the snapshot of the remote store indexes are captured as a shallow copy. Default is `false`. `readonly` | Whether the repository is read-only. Useful when migrating from one cluster (`"readonly": false` when registering) to another cluster (`"readonly": true` when registering). Optional. -## Example request - -The following example registers an `fs` repository using the local directory `/mnt/snapshots` as `location`. 
- -```json -PUT /_snapshot/my-fs-repository -{ - "type": "fs", - "settings": { - "location": "/mnt/snapshots" - } -} -``` -{% include copy-curl.html %} #### s3 repository @@ -85,9 +79,27 @@ Request field | Description For the `base_path` parameter, do not enter the `s3://` prefix when entering your S3 bucket details. Only the name of the bucket is required. {: .note} -## Example request +## Example requests -The following request registers a new S3 repository called `my-opensearch-repo` in an existing bucket called `my-open-search-bucket`. By default, all snapshots are stored in the `my/snapshot/directory`. +### `fs` + +The following example registers an `fs` repository using the local directory `/mnt/snapshots` as `location`: + +```json +PUT /_snapshot/my-fs-repository +{ + "type": "fs", + "settings": { + "location": "/mnt/snapshots" + } +} +``` +{% include copy-curl.html %} + +### `s3` + + +The following request registers a new S3 repository called `my-opensearch-repo` in an existing bucket called `my-open-search-bucket`. By default, all snapshots are stored in the `my/snapshot/directory`: ```json PUT /_snapshot/my-opensearch-repo @@ -112,4 +124,4 @@ Upon success, the following JSON object is returned: ``` To verify that the repository was registered, use the [Get snapshot repository]({{site.url}}{{site.baseurl}}/api-reference/snapshots/get-snapshot-repository) API, passing the repository name as the `repository` path parameter. -{: .note} \ No newline at end of file +{: .note} From 4388aa0852a425d27081f0347a67625f2ea0a311 Mon Sep 17 00:00:00 2001 From: Sokratis Papadopoulos Date: Tue, 30 Jul 2024 18:08:33 +0200 Subject: [PATCH 075/154] Add documentation on Kerberos configuration (#7844) * Add documentation on Kerberos configuration. Signed-off-by: Sokratis Papadopoulos * Add krb doc Signed-off-by: Sokratis Papadopoulos * Reorder kerberos in backend list Signed-off-by: Sokratis Papadopoulos * Reformat Signed-off-by: Sokratis Papadopoulos * Typo on acceptor_principal Signed-off-by: Sokratis Papadopoulos * Fix style Signed-off-by: Sokratis Papadopoulos * Fix style Signed-off-by: Sokratis Papadopoulos * Typo on acceptor_principal Signed-off-by: Sokratis Papadopoulos * Add default value for boolean params Signed-off-by: Sokratis Papadopoulos * Update kerberos.md --------- Signed-off-by: Sokratis Papadopoulos Co-authored-by: Sokratis Papadopoulos Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- _security/authentication-backends/kerberos.md | 62 ++++++++++++++++++ _security/configuration/configuration.md | 64 +------------------ 2 files changed, 64 insertions(+), 62 deletions(-) create mode 100644 _security/authentication-backends/kerberos.md diff --git a/_security/authentication-backends/kerberos.md b/_security/authentication-backends/kerberos.md new file mode 100644 index 0000000000..40b041abcb --- /dev/null +++ b/_security/authentication-backends/kerberos.md @@ -0,0 +1,62 @@ +--- +layout: default +title: Kerberos +parent: Authentication backends +nav_order: 75 +--- + +# Kerberos + +Kerberos is a robust and secure method for user authentication that prevents passwords from being sent over the internet by issuing "tickets" for secure identity verification. + +In order to use Kerberos authentication, you must set the following settings in `opensearch.yml` and `config.yml`. 
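+
+Before wiring these settings together, it can help to confirm that the host can reach your key distribution center (KDC) and obtain a ticket at all. The following is a minimal sketch that assumes the MIT Kerberos client tools are installed; `testuser` and `EXAMPLE.COM` are placeholders for your own principal and realm:
+
+```bash
+# Request a ticket-granting ticket for a test user (principal and realm are placeholders).
+kinit testuser@EXAMPLE.COM
+
+# List the cached tickets to confirm that the KDC configured in krb5.conf responded.
+klist
+```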
+
+## OpenSearch node configuration
+
+In `opensearch.yml`, define the following settings:
+
+```yml
+plugins.security.kerberos.krb5_filepath: '/etc/krb5.conf'
+plugins.security.kerberos.acceptor_keytab_filepath: 'opensearch_keytab.tab'
+plugins.security.kerberos.acceptor_principal: 'HTTP/localhost'
+```
+
+Name | Description
+:--- | :---
+`krb5_filepath` | The path to your Kerberos configuration file. This file contains various settings regarding your Kerberos installation, for example, the `realm` names, `hostnames`, and ports of the Kerberos key distribution center (KDC).
+`acceptor_keytab_filepath` | The path to the `keytab` file, which contains the principal that the Security plugin uses to issue requests through Kerberos.
+`acceptor_principal` | The principal that the Security plugin uses to issue requests through Kerberos. This value must be present in the `keytab` file.
+
+Due to security restrictions, the `keytab` file must be placed in `config` or a subdirectory, and the path in `opensearch.yml` must be relative, not absolute.
+{: .note }
+
+## Cluster security configuration
+
+The following example shows a typical Kerberos authentication domain in `config.yml`:
+
+```yml
+kerberos_auth_domain:
+  enabled: true
+  order: 1
+  http_authenticator:
+    type: kerberos
+    challenge: true
+    config:
+      krb_debug: false
+      strip_realm_from_principal: true
+  authentication_backend:
+    type: noop
+```
+
+Authentication through Kerberos when using a browser on an HTTP level is achieved using SPNEGO. Kerberos/SPNEGO implementations vary, depending on your browser and operating system. This is important when deciding if you need to set the `challenge` flag to `true` or `false`.
+
+As with [HTTP Basic Authentication]({{site.url}}{{site.baseurl}}/security/authentication-backends/basic-authc/), this flag determines how the Security plugin should react when no `Authorization` header is found in the HTTP request or if this header does not equal `negotiate`.
+
+If set to `true`, the Security plugin sends a response with status code 401 and a `WWW-Authenticate` header set to `negotiate`. This tells the client (browser) to resend the request with the `Authorization` header set. If set to `false`, the Security plugin cannot extract the credentials from the request, and authentication fails. Setting `challenge` to `false` thus makes sense only if the Kerberos credentials are sent in the initial request.
+
+Name | Description
+:--- | :---
+`krb_debug` | As the name implies, setting it to `true` outputs Kerberos-specific debugging messages to `stdout`. Use this setting if you encounter problems with your Kerberos integration. Default is `false`.
+`strip_realm_from_principal` | When set to `true`, the Security plugin strips the realm from the user name. Default is `true`.
+
+Because Kerberos/SPNEGO authenticates users on an HTTP level, no additional `authentication_backend` is needed. Set this value to `noop`.
diff --git a/_security/configuration/configuration.md b/_security/configuration/configuration.md
index 2a038b7fb9..57008a4158 100755
--- a/_security/configuration/configuration.md
+++ b/_security/configuration/configuration.md
@@ -97,6 +97,7 @@ The `type` setting for `http_authenticator` accepts the following values. For mo
 | Value | Description |
 | :--- | :--- |
 | `basic` | HTTP basic authentication. For more information about using basic authentication, see the HTTP basic authentication documentation. |
+| `kerberos` | Kerberos authentication. 
See the Kerberos documentation for additional configuration information. | | `jwt` | JSON Web Token (JWT) authentication. See the JSON Web Token documentation for additional configuration information. | | `openid` | OpenID Connect authentication. See the OpenID Connect documentation for additional configuration information. | | `saml` | SAML authentication. See the SAML documentation for additional configuration information. | @@ -162,65 +163,4 @@ To learn about configuring the authentication backends, see the [Authentication * [Active Directory and LDAP]({{site.url}}{{site.baseurl}}/security/authentication-backends/ldap/) * [Proxy-based authentication]({{site.url}}{{site.baseurl}}/security/authentication-backends/proxy/) * [Client certificate authentication]({{site.url}}{{site.baseurl}}/security/authentication-backends/client-auth/) - - - - +* [Kerberos authentication]({{site.url}}{{site.baseurl}}/security/authentication-backends/kerberos/) From b672be7c7ec0f7b185dd3c2004427b2f788dcc46 Mon Sep 17 00:00:00 2001 From: Tim <2527559+svitlo@users.noreply.github.com> Date: Tue, 30 Jul 2024 20:38:26 +0300 Subject: [PATCH 076/154] Update index.md (#7865) Corrected the name of the metric used on the field `fuzzy`. Signed-off-by: Tim <2527559+svitlo@users.noreply.github.com> --- _query-dsl/term/index.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_query-dsl/term/index.md b/_query-dsl/term/index.md index 4a789b0b72..e262f31975 100644 --- a/_query-dsl/term/index.md +++ b/_query-dsl/term/index.md @@ -30,6 +30,6 @@ Query type | Description [`range`]({{site.url}}{{site.baseurl}}/query-dsl/term/range/) | Searches for documents with field values in a specific range. [`prefix`]({{site.url}}{{site.baseurl}}/query-dsl/term/prefix/) | Searches for documents containing terms that begin with a specific prefix. [`exists`]({{site.url}}{{site.baseurl}}/query-dsl/term/exists/) | Searches for documents with any indexed value in a specific field. -[`fuzzy`]({{site.url}}{{site.baseurl}}/query-dsl/term/fuzzy/) | Searches for documents containing terms that are similar to the search term within the maximum allowed [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance). The Levenshtein distance measures the number of one-character changes needed to change one term to another term. +[`fuzzy`]({{site.url}}{{site.baseurl}}/query-dsl/term/fuzzy/) | Searches for documents containing terms that are similar to the search term within the maximum allowed [Damerau–Levenshtein distance](https://en.wikipedia.org/wiki/Damerau–Levenshtein_distance). The Damerau–Levenshtein distance measures the number of one-character changes needed to change one term to another term. [`wildcard`]({{site.url}}{{site.baseurl}}/query-dsl/term/wildcard/) | Searches for documents containing terms that match a wildcard pattern. -[`regexp`]({{site.url}}{{site.baseurl}}/query-dsl/term/regexp/) | Searches for documents containing terms that match a regular expression. \ No newline at end of file +[`regexp`]({{site.url}}{{site.baseurl}}/query-dsl/term/regexp/) | Searches for documents containing terms that match a regular expression. From e8e340ea5b62a47a36812a609eb902ec53401413 Mon Sep 17 00:00:00 2001 From: AWSHurneyt Date: Tue, 30 Jul 2024 12:16:10 -0700 Subject: [PATCH 077/154] Update settings.md (#7868) Changed default setting to true for v2.16. 
Signed-off-by: AWSHurneyt --- _observing-your-data/alerting/settings.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_observing-your-data/alerting/settings.md b/_observing-your-data/alerting/settings.md index c7eefe528b..ba5ed0e8c7 100644 --- a/_observing-your-data/alerting/settings.md +++ b/_observing-your-data/alerting/settings.md @@ -54,7 +54,7 @@ Setting | Default | Description `plugins.alerting.alert_history_retention_period` | 60d | The amount of time to store history indexes before automatically deleting them. `plugins.alerting.destination.allow_list` | ["chime", "slack", "custom_webhook", "email", "test_action"] | The list of allowed destinations. If you don't want to allow users to a certain type of destination, you can remove it from this list, but we recommend leaving this setting as-is. `plugins.alerting.filter_by_backend_roles` | "false" | Restricts access to monitors by backend role. See [Alerting security]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/security/). -`plugins.alerting.cross_cluster_monitoring_enabled` | "false" | Toggles whether cluster metrics monitors support running against remote clusters. +`plugins.alerting.cross_cluster_monitoring_enabled` | "true" | Toggles whether cluster metrics monitors support running against remote clusters. `plugins.scheduled_jobs.sweeper.period` | 5m | The alerting feature uses its "job sweeper" component to periodically check for new or updated jobs. This setting is the rate at which the sweeper checks to see if any jobs (monitors) have changed and need to be rescheduled. `plugins.scheduled_jobs.sweeper.page_size` | 100 | The page size for the sweeper. You shouldn't need to change this value. `plugins.scheduled_jobs.sweeper.backoff_millis` | 50ms | The amount of time the sweeper waits between retries---increases exponentially after each failed retry. 
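The `plugins.alerting.cross_cluster_monitoring_enabled` setting changed by the patch above is a cluster setting, so clusters that do not run monitors against remote clusters can override the new 2.16 default at runtime. The following request is an illustrative sketch rather than part of any patch in this series; it assumes the standard cluster settings API and simply sets the documented setting back to `false`:

```json
PUT _cluster/settings
{
  "persistent": {
    "plugins.alerting.cross_cluster_monitoring_enabled": false
  }
}
```

Setting the value to `null` in a later request removes the override and returns the cluster to the default behavior.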
From da34229754739f870a5ab831b340987e422d239d Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Tue, 30 Jul 2024 14:35:38 -0500 Subject: [PATCH 078/154] Add compatibility page (#7821) * Adding OS matrix changelog and remove CentOS7 as it is approaching EOL Signed-off-by: Peter Zhu * Add new compatibility page Signed-off-by: Archer * Fix links Signed-off-by: Archer * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _install-and-configure/install-opensearch/index.md Co-authored-by: Heather Halter Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _install-and-configure/install-opensearch/index.md Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: Peter Zhu Signed-off-by: Archer Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Peter Zhu Co-authored-by: Heather Halter Co-authored-by: Nathan Bower --- .../install-opensearch/index.md | 17 ++-------- .../install-opensearch/rpm.md | 2 +- _install-and-configure/os-comp.md | 32 +++++++++++++++++++ 3 files changed, 35 insertions(+), 16 deletions(-) create mode 100644 _install-and-configure/os-comp.md diff --git a/_install-and-configure/install-opensearch/index.md b/_install-and-configure/install-opensearch/index.md index 541321bcdd..e1d63927b0 100644 --- a/_install-and-configure/install-opensearch/index.md +++ b/_install-and-configure/install-opensearch/index.md @@ -13,22 +13,9 @@ redirect_from: # Installing OpenSearch -This section details how to install OpenSearch on your host, including which operating systems are [compatible with OpenSearch](#operating-system-compatibility), which [ports to open](#network-requirements), and which [important settings](#important-settings) to configure on your host. +This section provides information about how to install OpenSearch on your host, including which [ports to open](#network-requirements) and which [important settings](#important-settings) to configure on your host. -## Operating system compatibility - -OpenSearch and OpenSearch Dashboards are compatible with Red Hat Enterprise Linux (RHEL) and Debian-based Linux distributions that use [`systemd`](https://en.wikipedia.org/wiki/Systemd), such as Amazon Linux, and Ubuntu Long-Term Support (LTS). While OpenSearch and OpenSearch Dashboards should work on most Linux distributions, we only test a subset. - -The following table lists the operating system versions that we are currently testing on: - -OS | Version -:---------- | :-------- -CentOS | 7 -Rocky Linux | 8 -Alma Linux | 8 -Amazon Linux | 2/2023 -Ubuntu | 20.04 -Windows Server | 2019 +For operating system compatibility, see [Compatible operating systems]({{site.url}}{{site.baseurl}}/install-and-configure/os-comp/). 
## File system recommendations diff --git a/_install-and-configure/install-opensearch/rpm.md b/_install-and-configure/install-opensearch/rpm.md index 1810273fb1..85872b7c34 100644 --- a/_install-and-configure/install-opensearch/rpm.md +++ b/_install-and-configure/install-opensearch/rpm.md @@ -27,7 +27,7 @@ Generally speaking, installing OpenSearch from the RPM distribution can be broke 1. **Configure OpenSearch for your environment.** - Apply basic settings to OpenSearch and start using it in your environment. -The RPM distribution provides everything you need to run OpenSearch inside Red Hat or Red Hat–based Linux Distributions. For a list of supported operating systems, see [Operating system compatibility]({{site.url}}{{site.baseurl}}/install-and-configure/install-opensearch/index/#operating-system-compatibility). +The RPM distribution provides everything you need to run OpenSearch inside Red Hat or Red Hat–based Linux Distributions. For a list of supported operating systems, see [Operating system compatibility]({{site.url}}{{site.baseurl}}/install-and-configure/os-comp/). This guide assumes that you are comfortable working from the Linux command line interface (CLI). You should understand how to input commands, navigate between directories, and edit text files. Some example commands reference the `vi` text editor, but you may use any text editor available. {:.note} diff --git a/_install-and-configure/os-comp.md b/_install-and-configure/os-comp.md new file mode 100644 index 0000000000..a62b82b7da --- /dev/null +++ b/_install-and-configure/os-comp.md @@ -0,0 +1,32 @@ +--- +layout: default +title: Compatible operating systems +nav_order: 12 +--- + +OpenSearch and OpenSearch Dashboards are compatible with Red Hat Enterprise Linux (RHEL) and Debian-based Linux distributions that use [`systemd`](https://en.wikipedia.org/wiki/Systemd), such as Amazon Linux, and Ubuntu Long-Term Support (LTS). While OpenSearch and OpenSearch Dashboards should work on most Linux distributions, we only test a subset. + +## Supported operating systems + +The following table lists the operating system versions that we are currently testing: + +OS | Version +:---------- | :-------- +Rocky Linux | 8 +Alma Linux | 8 +Amazon Linux | 2/2023 +Ubuntu | 20.04 +Windows Server | 2019 + + +## Change log + +The following table lists changes made to operating system compatibility. + +
+
+
| Date | Issue | PR | Details |
|:-----------|:-------|:-------|:--------------------------|
| 2024-07-23 | [opensearch-build Issue 4379](https://github.com/opensearch-project/opensearch-build/issues/4379) | [PR 7821](https://github.com/opensearch-project/documentation-website/pull/7821) | Remove [CentOS7](https://blog.centos.org/2023/04/end-dates-are-coming-for-centos-stream-8-and-centos-linux-7/). |
| 2024-03-08 | [opensearch-build Issue 4573](https://github.com/opensearch-project/opensearch-build/issues/4573) | [PR 6637](https://github.com/opensearch-project/documentation-website/pull/6637) | Remove CentOS8, add Almalinux8/Rockylinux8, and remove Ubuntu 16.04/18.04 because we currently only test on 20.04. |
| 2023-06-06 | [documentation-website Issue 4217](https://github.com/opensearch-project/documentation-website/issues/4217) | [PR 4218](https://github.com/opensearch-project/documentation-website/pull/4218) | Support matrix creation. |
\ No newline at end of file
From 3dcff57d359527f161fd8f8608c8d2f160322fef Mon Sep 17 00:00:00 2001
From: Tim <2527559+svitlo@users.noreply.github.com>
Date: Wed, 31 Jul 2024 00:20:20 +0300
Subject: [PATCH 079/154] Update fuzzy.md (#7842)

* Update fuzzy.md

Corrected the name of the string metric used by default

Signed-off-by: Tim <2527559+svitlo@users.noreply.github.com>
Signed-off-by: svitlo

* correct the line 17 of fuzzy.md to clarify the distance name and the role of the field 'transpositions'

Signed-off-by: svitlo

* correct the line 17 to not break the thought regarding the usage of the field 'max_expansions'

Signed-off-by: svitlo

* Update _query-dsl/term/fuzzy.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Tim <2527559+svitlo@users.noreply.github.com>
Signed-off-by: svitlo

---------

Signed-off-by: Tim <2527559+svitlo@users.noreply.github.com>
Signed-off-by: svitlo
Co-authored-by: svitlo
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
---
 _query-dsl/term/fuzzy.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/_query-dsl/term/fuzzy.md b/_query-dsl/term/fuzzy.md
index bf2bd43bba..7a426fd794 100644
--- a/_query-dsl/term/fuzzy.md
+++ b/_query-dsl/term/fuzzy.md
@@ -7,14 +7,14 @@ nav_order: 20

# Fuzzy query

-A fuzzy query searches for documents containing terms that are similar to the search term within the maximum allowed [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance). The Levenshtein distance measures the number of one-character changes needed to change one term to another term. These changes include:
+A fuzzy query searches for documents containing terms that are similar to the search term within the maximum allowed [Damerau–Levenshtein distance](https://en.wikipedia.org/wiki/Damerau–Levenshtein_distance). The Damerau–Levenshtein distance measures the number of one-character changes needed to change one term to another term. These changes include:

- Replacements: **c**at to **b**at
- Insertions: cat to cat**s**
- Deletions: **c**at to at
- Transpositions: **ca**t to **ac**t

-A fuzzy query creates a list of all possible expansions of the search term that fall within the Levenshtein distance. You can specify the maximum number of such expansions in the `max_expansions` field. Then it searches for documents that match any of the expansions.
+A fuzzy query creates a list of all possible expansions of the search term that fall within the Damerau–Levenshtein distance. 
You can specify the maximum number of such expansions in the `max_expansions` field. The query then searches for documents that match any of the expansions. If you set the `transpositions` parameter to `false`, then your search will use the classic [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance). The following example query searches for the speaker `HALET` (misspelled `HAMLET`). The maximum edit distance is not specified, so the default `AUTO` edit distance is used: @@ -90,4 +90,4 @@ Specifying a large value in `max_expansions` can lead to poor performance, espec {: .warning} If [`search.allow_expensive_queries`]({{site.url}}{{site.baseurl}}/query-dsl/index/#expensive-queries) is set to `false`, fuzzy queries are not run. -{: .important} \ No newline at end of file +{: .important} From 6187028068aa3120c38b486655ce908d52e6b121 Mon Sep 17 00:00:00 2001 From: Tim <2527559+svitlo@users.noreply.github.com> Date: Wed, 31 Jul 2024 19:57:59 +0300 Subject: [PATCH 080/154] Update completion.md (#7872) Corrected the name of the metric used on the descriptions of the fields `fuzziness` and `prefix_length`. Signed-off-by: Tim <2527559+svitlo@users.noreply.github.com> --- _field-types/supported-field-types/completion.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/_field-types/supported-field-types/completion.md b/_field-types/supported-field-types/completion.md index 9214c25857..85c803baa1 100644 --- a/_field-types/supported-field-types/completion.md +++ b/_field-types/supported-field-types/completion.md @@ -315,9 +315,9 @@ The following table lists the parameters accepted by the fuzzy completion sugges Parameter | Description :--- | :--- -`fuzziness` | Fuzziness can be set as one of the following:
1. An integer that specifies the maximum allowed [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) for this edit.
2. `AUTO`: Strings of 0–2 characters must match exactly, strings of 3–5 characters allow 1 edit, and strings longer than 5 characters allow 2 edits.
Default is `AUTO`. +`fuzziness` | Fuzziness can be set as one of the following:
1. An integer that specifies the maximum allowed [Damerau–Levenshtein distance](https://en.wikipedia.org/wiki/Damerau–Levenshtein_distance) for this edit.
2. `AUTO`: Strings of 0–2 characters must match exactly, strings of 3–5 characters allow 1 edit, and strings longer than 5 characters allow 2 edits.
Default is `AUTO`. `min_length` | An integer that specifies the minimum length the input must be to start returning suggestions. If the search term is shorter than `min_length`, no suggestions are returned. Default is 3. -`prefix_length` | An integer that specifies the minimum length the matched prefix must be to start returning suggestions. If the prefix of `prefix_length` is not matched, but the search term is still within the Levenshtein distance, no suggestions are returned. Default is 1. +`prefix_length` | An integer that specifies the minimum length the matched prefix must be to start returning suggestions. If the prefix of `prefix_length` is not matched, but the search term is still within the Damerau–Levenshtein distance, no suggestions are returned. Default is 1. `transpositions` | A Boolean value that specifies to count transpositions (interchanges of adjacent characters) as one edit instead of two. Example: The suggestion's `input` parameter is `abcde` and the `fuzziness` is 1. If `transpositions` is set to `true`, `abdce` will match, but if `transpositions` is set to `false`, `abdce` will not match. Default is `true`. `unicode_aware` | A Boolean value that specifies whether to use Unicode code points when measuring the edit distance, transposition, and length. If `unicode_aware` is set to `true`, the measurement is slower. Default is `false`, in which case distances are measured in bytes. @@ -389,4 +389,4 @@ The response matches the string "abcde": ] } } -``` \ No newline at end of file +``` From c84fc04e84d9d2c22b5341d518cc2e55798f4073 Mon Sep 17 00:00:00 2001 From: Songkan Tang Date: Thu, 1 Aug 2024 02:53:40 +0800 Subject: [PATCH 081/154] Add experimental feature flag to dashboard assistant (#7855) * Add experimental feature flag to dashboard assistant Signed-off-by: Songkan Tang * Update _dashboards/dashboards-assistant/index.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Songkan Tang * Update _dashboards/dashboards-assistant/index.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Songkan Tang Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- _dashboards/dashboards-assistant/index.md | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/_dashboards/dashboards-assistant/index.md b/_dashboards/dashboards-assistant/index.md index 1c2f0f8299..bf2d754be8 100644 --- a/_dashboards/dashboards-assistant/index.md +++ b/_dashboards/dashboards-assistant/index.md @@ -120,6 +120,17 @@ The following screenshot shows a saved conversation, along with actions you can Notebooks interface with saved OpenSearch Assistant conversations +## Enabling Dashboards Assistant experimental features +**Introduced 2.16** +{: .label .label-purple } + +To enable experimental assistant features, such as text to visualization, locate your copy of the `opensearch_dashboards.yml` file and set the following option: + +```yaml +assistant.next.enabled: true +``` +{% include copy-curl.html %} + ## Related articles - [Getting started guide for OpenSearch Assistant in OpenSearch Dashboards](https://github.com/opensearch-project/dashboards-assistant/blob/main/GETTING_STARTED_GUIDE.md) From cd3e5f51976daa51b35468a951626c0784312bc8 Mon Sep 17 00:00:00 2001 From: Qi Chen Date: Wed, 31 Jul 2024 13:56:18 -0500 Subject: [PATCH 082/154] MAINT: documentation update for certain processors (#7713) MAINT: 
documentation update for processors Signed-off-by: George Chen --- _data-prepper/common-use-cases/log-enrichment.md | 2 +- _data-prepper/common-use-cases/trace-analytics.md | 6 +++--- _data-prepper/pipelines/configuration/processors/date.md | 7 +++++++ .../pipelines/configuration/processors/delete_entries.md | 7 +++++++ _data-prepper/pipelines/configuration/processors/grok.md | 9 ++++++++- .../processors/{otel-metrics.md => otel_metrics.md} | 9 ++++++++- .../processors/{otel-trace-raw.md => otel_traces.md} | 9 ++++++++- .../{service-map-stateful.md => service_map.md} | 9 ++++++++- .../processors/{split-string.md => split_string.md} | 7 +++++++ .../{string-converter.md => string_converter.md} | 7 +++++++ .../{substitute-string.md => substitute_string.md} | 7 +++++++ .../processors/{trim-string.md => trim_string.md} | 7 +++++++ .../pipelines/configuration/processors/truncate.md | 7 +++++++ _observing-your-data/trace/ta-dashboards.md | 2 +- 14 files changed, 86 insertions(+), 9 deletions(-) rename _data-prepper/pipelines/configuration/processors/{otel-metrics.md => otel_metrics.md} (93%) rename _data-prepper/pipelines/configuration/processors/{otel-trace-raw.md => otel_traces.md} (82%) rename _data-prepper/pipelines/configuration/processors/{service-map-stateful.md => service_map.md} (77%) rename _data-prepper/pipelines/configuration/processors/{split-string.md => split_string.md} (76%) rename _data-prepper/pipelines/configuration/processors/{string-converter.md => string_converter.md} (66%) rename _data-prepper/pipelines/configuration/processors/{substitute-string.md => substitute_string.md} (75%) rename _data-prepper/pipelines/configuration/processors/{trim-string.md => trim_string.md} (67%) diff --git a/_data-prepper/common-use-cases/log-enrichment.md b/_data-prepper/common-use-cases/log-enrichment.md index 0d8ce4ab7d..0c878dd76e 100644 --- a/_data-prepper/common-use-cases/log-enrichment.md +++ b/_data-prepper/common-use-cases/log-enrichment.md @@ -370,7 +370,7 @@ The `date` processor can generate timestamps for incoming events if you specify ### Deriving punctuation patterns -The [`substitute_string`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/substitute-string/) processor (which is one of the mutate string processors) lets you derive a punctuation pattern from incoming events. In the following example pipeline, the processor will scan incoming Apache log events and derive punctuation patterns from them: +The [`substitute_string`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/substitute_string/) processor (which is one of the mutate string processors) lets you derive a punctuation pattern from incoming events. In the following example pipeline, the processor will scan incoming Apache log events and derive punctuation patterns from them: ```yaml processor: diff --git a/_data-prepper/common-use-cases/trace-analytics.md b/_data-prepper/common-use-cases/trace-analytics.md index 033830351a..3deca7b632 100644 --- a/_data-prepper/common-use-cases/trace-analytics.md +++ b/_data-prepper/common-use-cases/trace-analytics.md @@ -32,7 +32,7 @@ To monitor trace analytics in Data Prepper, we provide three pipelines: `entry-p ### OpenTelemetry trace source -The [OpenTelemetry source]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/otel-trace-raw/) accepts trace data from the OpenTelemetry Collector. 
The source follows the [OpenTelemetry Protocol](https://github.com/open-telemetry/opentelemetry-specification/tree/master/specification/protocol) and officially supports transport over gRPC and the use of industry-standard encryption (TLS/HTTPS). +The [OpenTelemetry source]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/otel_traces/) accepts trace data from the OpenTelemetry Collector. The source follows the [OpenTelemetry Protocol](https://github.com/open-telemetry/opentelemetry-specification/tree/master/specification/protocol) and officially supports transport over gRPC and the use of industry-standard encryption (TLS/HTTPS). ### Processor @@ -49,8 +49,8 @@ OpenSearch provides a generic sink that writes data to OpenSearch as the destina The sink provides specific configurations for the trace analytics feature. These configurations allow the sink to use indexes and index templates specific to trace analytics. The following OpenSearch indexes are specific to trace analytics: -* otel-v1-apm-span –- The *otel-v1-apm-span* index stores the output from the [otel_traces_raw]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/otel-trace-raw/) processor. -* otel-v1-apm-service-map –- The *otel-v1-apm-service-map* index stores the output from the [service_map_stateful]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/service-map-stateful/) processor. +* otel-v1-apm-span –- The *otel-v1-apm-span* index stores the output from the [otel_traces_raw]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/otel_traces/) processor. +* otel-v1-apm-service-map –- The *otel-v1-apm-service-map* index stores the output from the [service_map_stateful]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/service_map/) processor. ## Trace tuning diff --git a/_data-prepper/pipelines/configuration/processors/date.md b/_data-prepper/pipelines/configuration/processors/date.md index c44a10ba16..4f65f7b593 100644 --- a/_data-prepper/pipelines/configuration/processors/date.md +++ b/_data-prepper/pipelines/configuration/processors/date.md @@ -15,6 +15,13 @@ The `date` processor adds a default timestamp to an event, parses timestamp fiel The following table describes the options you can use to configure the `date` processor. + + Option | Required | Type | Description :--- | :--- | :--- | :--- diff --git a/_data-prepper/pipelines/configuration/processors/delete_entries.md b/_data-prepper/pipelines/configuration/processors/delete_entries.md index c9a93a1f3e..e7c022c6a7 100644 --- a/_data-prepper/pipelines/configuration/processors/delete_entries.md +++ b/_data-prepper/pipelines/configuration/processors/delete_entries.md @@ -14,6 +14,13 @@ The `delete_entries` processor deletes entries, such as key-value pairs, from an You can configure the `delete_entries` processor with the following options. + + | Option | Required | Description | :--- | :--- | :--- | `with_keys` | Yes | An array of keys for the entries to be deleted. 
| diff --git a/_data-prepper/pipelines/configuration/processors/grok.md b/_data-prepper/pipelines/configuration/processors/grok.md index 16f72c4968..3724278adf 100644 --- a/_data-prepper/pipelines/configuration/processors/grok.md +++ b/_data-prepper/pipelines/configuration/processors/grok.md @@ -1,6 +1,6 @@ --- layout: default -title: Grok +title: grok parent: Processors grand_parent: Pipelines nav_order: 50 @@ -14,6 +14,13 @@ The Grok processor uses pattern matching to structure and extract important keys The following table describes options you can use with the Grok processor to structure your data and make your data easier to query. + + Option | Required | Type | Description :--- | :--- |:--- | :--- `break_on_match` | No | Boolean | Specifies whether to match all patterns (`true`) or stop once the first successful match is found (`false`). Default is `true`. diff --git a/_data-prepper/pipelines/configuration/processors/otel-metrics.md b/_data-prepper/pipelines/configuration/processors/otel_metrics.md similarity index 93% rename from _data-prepper/pipelines/configuration/processors/otel-metrics.md rename to _data-prepper/pipelines/configuration/processors/otel_metrics.md index 08fb72810e..6fc82f5deb 100644 --- a/_data-prepper/pipelines/configuration/processors/otel-metrics.md +++ b/_data-prepper/pipelines/configuration/processors/otel_metrics.md @@ -16,7 +16,7 @@ To get started, add the following processor to your `pipeline.yaml` configuratio ``` yaml processor: - - otel_metrics_raw_processor: + - otel_metrics: ``` {% include copy.html %} @@ -24,6 +24,13 @@ processor: You can use the following optional parameters to configure histogram buckets and their default values. A histogram displays numerical data by grouping data into buckets. You can use histogram buckets to view sets of events that are organized by the total event count and aggregate sum for all events. For more detailed information, see [OpenTelemetry Histograms](https://opentelemetry.io/docs/reference/specification/metrics/data-model/#histogram). + + | Parameter | Default value | Description | | :--- | :--- | :--- | | `calculate_histogram_buckets` | `True` | Whether or not to calculate histogram buckets. | diff --git a/_data-prepper/pipelines/configuration/processors/otel-trace-raw.md b/_data-prepper/pipelines/configuration/processors/otel_traces.md similarity index 82% rename from _data-prepper/pipelines/configuration/processors/otel-trace-raw.md rename to _data-prepper/pipelines/configuration/processors/otel_traces.md index 395956a668..6d26a5aca8 100644 --- a/_data-prepper/pipelines/configuration/processors/otel-trace-raw.md +++ b/_data-prepper/pipelines/configuration/processors/otel_traces.md @@ -1,6 +1,6 @@ --- layout: default -title: otel_trace +title: otel_traces parent: Processors grand_parent: Pipelines nav_order: 75 @@ -23,6 +23,13 @@ This processor includes the following parameters. The following table describes the options you can use to configure the `otel_trace` processor. + + Option | Required | Type | Description :--- | :--- | :--- | :--- trace_flush_interval | No | Integer | Represents the time interval in seconds to flush all the descendant spans without any root span. Default is 180. 
diff --git a/_data-prepper/pipelines/configuration/processors/service-map-stateful.md b/_data-prepper/pipelines/configuration/processors/service_map.md
similarity index 77%
rename from _data-prepper/pipelines/configuration/processors/service-map-stateful.md
rename to _data-prepper/pipelines/configuration/processors/service_map.md
index a05f44863a..b62e222fd5 100644
--- a/_data-prepper/pipelines/configuration/processors/service-map-stateful.md
+++ b/_data-prepper/pipelines/configuration/processors/service_map.md
@@ -14,6 +14,13 @@ The `service_map` processor uses OpenTelemetry data to create a distributed serv

The following table describes the option you can use to configure the `service_map` processor.

+
+
Option | Required | Type | Description
:--- | :--- | :--- | :---
window_duration | No | Integer | Represents the fixed time window, in seconds, during which service map relationships are evaluated. Default value is 180.
@@ -32,7 +39,7 @@ The following table describes common [Abstract processor](https://github.com/ope
| `recordsOut` | Counter | Metric representing the egress of records from a pipeline component. |
| `timeElapsed` | Timer | Metric representing the time elapsed during execution of a pipeline component. |

-The `service-map-stateful` processor includes following custom metrics:
+The `service_map` processor includes the following custom metrics:

* `traceGroupCacheCount`: The number of trace groups in the trace group cache.
* `spanSetCount`: The number of span sets in the span set collection.
\ No newline at end of file
diff --git a/_data-prepper/pipelines/configuration/processors/split-string.md b/_data-prepper/pipelines/configuration/processors/split_string.md
similarity index 76%
rename from _data-prepper/pipelines/configuration/processors/split-string.md
rename to _data-prepper/pipelines/configuration/processors/split_string.md
index 3959ae5acd..a8058dd530 100644
--- a/_data-prepper/pipelines/configuration/processors/split-string.md
+++ b/_data-prepper/pipelines/configuration/processors/split_string.md
@@ -11,6 +11,13 @@ nav_order: 100

The `split_string` processor splits a field into an array using a delimiting character and is a [mutate string](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/mutate-string-processors#mutate-string-processors) processor. The following table describes the options you can use to configure the `split_string` processor.

+
+
Option | Required | Type | Description
:--- | :--- | :--- | :---
entries | Yes | List | List of entries. Valid values are `source`, `delimiter`, and `delimiter_regex`.
diff --git a/_data-prepper/pipelines/configuration/processors/string-converter.md b/_data-prepper/pipelines/configuration/processors/string_converter.md
similarity index 66%
rename from _data-prepper/pipelines/configuration/processors/string-converter.md
rename to _data-prepper/pipelines/configuration/processors/string_converter.md
index 32055791b8..8d3df165fb 100644
--- a/_data-prepper/pipelines/configuration/processors/string-converter.md
+++ b/_data-prepper/pipelines/configuration/processors/string_converter.md
@@ -11,6 +11,13 @@ nav_order: 105

The `string_converter` processor converts a string to uppercase or lowercase. You can use it as an example for developing your own processor. The following table describes the option you can use to configure the `string_converter` processor. 
+ + Option | Required | Type | Description :--- | :--- | :--- | :--- upper_case | No | Boolean | Whether to convert to uppercase (`true`) or lowercase (`false`). diff --git a/_data-prepper/pipelines/configuration/processors/substitute-string.md b/_data-prepper/pipelines/configuration/processors/substitute_string.md similarity index 75% rename from _data-prepper/pipelines/configuration/processors/substitute-string.md rename to _data-prepper/pipelines/configuration/processors/substitute_string.md index 5d18bf6a4f..6958ff8e42 100644 --- a/_data-prepper/pipelines/configuration/processors/substitute-string.md +++ b/_data-prepper/pipelines/configuration/processors/substitute_string.md @@ -14,6 +14,13 @@ The `substitute_string` processor matches a key's value against a regular expres The following table describes the options you can use to configure the `substitute_string` processor. + + Option | Required | Type | Description :--- | :--- | :--- | :--- entries | Yes | List | List of entries. Valid values are `source`, `from`, and `to`. diff --git a/_data-prepper/pipelines/configuration/processors/trim-string.md b/_data-prepper/pipelines/configuration/processors/trim_string.md similarity index 67% rename from _data-prepper/pipelines/configuration/processors/trim-string.md rename to _data-prepper/pipelines/configuration/processors/trim_string.md index 46b6ad4af1..97927949a2 100644 --- a/_data-prepper/pipelines/configuration/processors/trim-string.md +++ b/_data-prepper/pipelines/configuration/processors/trim_string.md @@ -10,6 +10,13 @@ nav_order: 120 The `trim_string` processor removes white space from the beginning and end of a key and is a [mutate string](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/mutate-string-processors#mutate-string-processors) processor. The following table describes the option you can use to configure the `trim_string` processor. + + Option | Required | Type | Description :--- | :--- | :--- | :--- with_keys | Yes | List | A list of keys to trim the white space from. diff --git a/_data-prepper/pipelines/configuration/processors/truncate.md b/_data-prepper/pipelines/configuration/processors/truncate.md index 3714d80847..8b4c3d19e9 100644 --- a/_data-prepper/pipelines/configuration/processors/truncate.md +++ b/_data-prepper/pipelines/configuration/processors/truncate.md @@ -14,6 +14,13 @@ The `truncate` processor truncates a key's value at the beginning, the end, or o You can configure the `truncate` processor using the following options. + + Option | Required | Type | Description :--- | :--- | :--- | :--- `entries` | Yes | String list | A list of entries to add to an event. 
diff --git a/_observing-your-data/trace/ta-dashboards.md b/_observing-your-data/trace/ta-dashboards.md index 595dce6ca2..c7cf0a5091 100644 --- a/_observing-your-data/trace/ta-dashboards.md +++ b/_observing-your-data/trace/ta-dashboards.md @@ -48,7 +48,7 @@ The **Trace Analytics** application includes two options: **Services** and **Tra The plugin requires you to use [Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/) to process and visualize OTel data and relies on the following Data Prepper pipelines for OTel correlations and service map calculations: - [Trace analytics pipeline]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/trace-analytics/) -- [Service map pipeline]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/service-map-stateful/) +- [Service map pipeline]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/service_map/) ### Standardized telemetry data From 47d64f87605669bfa505da79fc43febe86f3f980 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Wed, 31 Jul 2024 16:14:00 -0500 Subject: [PATCH 083/154] Add Index Template APIs (#7635) * Add Index Template APIs Signed-off-by: Archer * Add Index Template APIs Signed-off-by: Archer * Add Delete template API Signed-off-by: Archer * Add delete API Signed-off-by: Archer * Add component template. Signed-off-by: Archer * Add missing parameter Signed-off-by: Archer * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _api-reference/index-apis/create-index-template.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _api-reference/index-apis/create-index-template.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Heather Halter Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _api-reference/index-apis/component-template.md Signed-off-by: Heather Halter * Apply suggestions from code review Co-authored-by: Heather Halter Co-authored-by: Sarthak Aggarwal Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Heather Halter Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Heather Halter Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update component-template.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update create-index-template.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update delete-index-template.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Heather Halter Signed-off-by: Naarcha-AWS 
<97990722+Naarcha-AWS@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Heather Halter
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

---------

Signed-off-by: Archer
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Signed-off-by: Heather Halter
Co-authored-by: Heather Halter
Co-authored-by: Sarthak Aggarwal
Co-authored-by: Nathan Bower
---
 .../index-apis/component-template.md | 154 +++++++++++
 .../index-apis/create-index-template.md | 240 ++++++++++++++++++
 _api-reference/index-apis/create-index.md | 6 +-
 .../index-apis/delete-index-template.md | 31 +++
 .../index-apis/get-index-template.md | 42 +++
 5 files changed, 470 insertions(+), 3 deletions(-)
 create mode 100644 _api-reference/index-apis/component-template.md
 create mode 100644 _api-reference/index-apis/create-index-template.md
 create mode 100644 _api-reference/index-apis/delete-index-template.md
 create mode 100644 _api-reference/index-apis/get-index-template.md

diff --git a/_api-reference/index-apis/component-template.md b/_api-reference/index-apis/component-template.md
new file mode 100644
index 0000000000..bafdfa95c7
--- /dev/null
+++ b/_api-reference/index-apis/component-template.md
@@ -0,0 +1,154 @@
---
layout: default
title: Create or update component template
parent: Index APIs
nav_order: 29
---

# Create or update component template

You can use the Component Template API to create or update a component template. A component template is a reusable building block that defines settings, mappings, and aliases that can be shared across multiple index templates.

An index template can be constructed using multiple component templates. To incorporate a component template into an index template, you need to list it in the `composed_of` section of the index template. Component templates are applied only to newly created data streams and indexes that match the criteria specified in the index template.

If any settings or mappings are defined directly in the index template or the index creation request, those settings take precedence over the settings or mappings specified in a component template.

Component templates are used solely during the process of index creation. For data streams, this includes the creation of the data stream itself and the creation of the backing indexes that support the stream. Modifications made to component templates will not affect existing indexes, including the backing indexes of a data stream.

## Path and HTTP methods

The PUT method adds a component template and accepts both query parameters and a request body. The GET method retrieves information about an existing component template and accepts only query parameters:

```json
PUT _component_template/<component-template-name>
GET _component_template/<component-template-name>
```

## Path parameters

Parameter | Data type | Description
:--- | :--- | :---
`component-template-name` | String | The name of the component template.

## Query parameters

The following optional query parameters are supported.

Parameter | Data type | Description
:--- | :--- | :---
`create` | Boolean | When `true`, the API cannot replace or update any existing component templates. Default is `false`. 
`cluster_manager_timeout` | Time | The amount of time to wait for a connection to the cluster manager node. Default is `30s`.
`timeout` | Time | The amount of time for the operation to wait for a response. Default is `30s`.

## Request fields

The following options can be used in the request body to customize the component template.


Parameter | Data type | Description
:--- | :--- | :---
`template` | Object | The template that includes the `aliases`, `mappings`, or `settings` for the index. For more information, see [Template](#template). Required.
`version` | Integer | The version number used to manage component templates. Version numbers are not automatically set by OpenSearch. Optional.
`_meta` | Object | The metadata that provides details about the component template. Optional.
`allow_auto_create` | Boolean | When `true`, indexes can be automatically created with this template even if `actions.auto_create_index` is disabled. When `false`, indexes and data streams matching the template cannot be automatically created. Optional.
`deprecated` | Boolean | When `true`, the component template is deprecated. If deprecated, OpenSearch will output a warning whenever the template is referenced.


### Template

You can use the following objects with the `template` option in the request body.

#### `alias`

The name of the alias to associate with the template as a key. Required when the `template` option exists in the request body. This option supports multiple aliases.

The object body contains the following optional alias parameters.

Parameter | Data type | Description
:--- | :--- | :---
`filter` | Query DSL object | The query that limits the number of documents that the alias can access.
`index_routing` | String | The value that routes indexing operations to a specific shard. When specified, overwrites the `routing` value for indexing operations.
`is_hidden` | Boolean | When `true`, the alias is hidden. Default is `false`. All alias indexes must have matching values for this setting.
`is_write_index` | Boolean | When `true`, the index is the write index for the alias. Default is `false`.
`routing` | String | The value used to route index and search operations to a specific shard.
`search_routing` | String | The value used to route search operations to a specific shard. When specified, this option overwrites the `routing` value for search operations.

#### `mappings`

The field mappings that exist in the index. For more information, see [Mappings and field types](https://opensearch.org/docs/latest/field-types/). Optional.

#### `settings`

Any configuration options for the index. For more information, see [Index settings](https://opensearch.org/docs/latest/install-and-configure/configuring-opensearch/index-settings/).

## Example requests

The following example requests show how to use the Component Template API. 
### Create with index aliases

The following example request creates a component template including index aliases:

```json
PUT _component_template/alias_template
{
  "template": {
    "settings" : {
        "number_of_shards" : 1
    },
    "aliases" : {
        "alias1" : {},
        "alias2" : {
            "filter" : {
                "term" : {"user.id" : "hamlet" }
            },
            "routing" : "shard-1"
        },
        "{index}-alias" : {}
    }
  }
}
```

### Adding component versioning

The following example adds a `version` number to a component template, which simplifies template management for external systems:

```json
PUT /_component_template/version_template
{
  "template": {
    "settings" : {
        "number_of_shards" : 1
    }
  },
  "version": 3
}
```
{% include copy-curl.html %}

### Adding template metadata

The following example request uses the `meta` parameter to add metadata to the component template. All metadata is stored in the cluster state.

```json
PUT /_component_template/meta_template
{
  "template": {
    "settings" : {
        "number_of_shards" : 1
    }
  },
  "_meta": {
    "description": "Where art thou",
    "serialization": {
      "class": "MyIndexTemplate",
      "id": 12
    }
  }
}
```

diff --git a/_api-reference/index-apis/create-index-template.md b/_api-reference/index-apis/create-index-template.md
new file mode 100644
index 0000000000..2a92e3f4c4
--- /dev/null
+++ b/_api-reference/index-apis/create-index-template.md
@@ -0,0 +1,240 @@
---
layout: default
title: Create or update index template
parent: Index APIs
nav_order: 26
---

# Create or update index template

You can use the Create or Update Index Template API to create indexes with predefined mappings and settings as well as to update existing index templates.

## Path and HTTP methods

```json
PUT _index_template/<template-name>
POST _index_template/<template-name>
```

## Path parameters

Parameter | Data type | Description
:--- | :--- | :---
`template-name` | String | The name of the index template.

## Query parameters

The following optional query parameters are supported.

Parameter | Data type | Description
:--- | :--- | :---
`create` | Boolean | When `true`, the API cannot replace or update any existing index templates. Default is `false`.
`cluster_manager_timeout` | Time | The amount of time to wait for a connection to the cluster manager node. Default is `30s`.

## Request body options

The following options can be used in the request body to customize the index template.

Parameter | Type | Description
:--- | :--- | :---
`index_patterns` | String array | An array of wildcard expressions that match the names of data streams and indexes created during template creation. Required.
`composed_of` | String array | An ordered list of component template names. These templates are merged in the specified order. For more information, see [Using multiple component templates](#using-multiple-component-templates). Optional.
`data_stream` | Object | When used, the request creates data streams and any backing indexes based on the template. This setting requires a matching index template. It can also be used with the `hidden` setting, which, when set to `true`, hides the data stream backing indexes. Optional.
`_meta` | Object | The metadata that provides details about the index template. Optional.
`priority` | Integer | A number that determines which index templates take precedence during the creation of a new index or data stream. OpenSearch chooses the template with the highest priority. 
When no priority is given, the template is assigned a `0`, signifying the lowest priority. Optional.
`template` | Object | The template that includes the `aliases`, `mappings`, or `settings` for the index. For more information, see [Template](#template). Optional.
`version` | Integer | The version number used to manage index templates. Version numbers are not automatically set by OpenSearch. Optional.


### Template

You can use the following objects with the `template` option in the request body.

#### `alias`

The name of the alias to associate with the template as a key. Required when the `template` option exists in the request body. This option supports multiple aliases.

The object body contains the following optional alias parameters.

Parameter | Data type | Description
:--- | :--- | :---
`filter` | Query DSL object | The query that limits the number of documents that the alias can access.
`index_routing` | String | The value that routes indexing operations to a specific shard. When specified, overwrites the `routing` value for indexing operations.
`is_hidden` | Boolean | When `true`, the alias is hidden. Default is `false`. All alias indexes must have matching values for this setting.
`is_write_index` | Boolean | When `true`, the index is the write index for the alias. Default is `false`.
`routing` | String | The value used to route index and search operations to a specific shard.
`search_routing` | String | The value used to route specific search operations to a specific shard. When specified, this option overwrites the `routing` value for search operations.

#### `mappings`

The field mappings that exist in the index. For more information, see [Mappings and field types](https://opensearch.org/docs/latest/field-types/). Optional.

#### `settings`

Any configuration options for the index. For more information, see [Index settings](https://opensearch.org/docs/latest/install-and-configure/configuring-opensearch/index-settings/).

## Example requests

The following examples show how to use the Create or Update Index Template API.

### Index template with index aliases

The following example request includes index aliases in the template:

```json
PUT _index_template/alias-template
{
  "index_patterns" : ["sh*"],
  "template": {
    "settings" : {
        "number_of_shards" : 1
    },
    "aliases" : {
        "alias1" : {},
        "alias2" : {
            "filter" : {
                "term" : {"user.id" : "hamlet" }
            },
            "routing" : "shard-1"
        },
        "{index}-alias" : {}
    }
  }
}
```
{% include copy-curl.html %}

### Using multiple matching templates

When multiple index templates match the name of a new index or data stream, the template with the highest priority is used. For example, the following two requests create index templates with different priorities:

```json
PUT /_index_template/template_one
{
  "index_patterns" : ["h*"],
  "priority" : 0,
  "template": {
    "settings" : {
      "number_of_shards" : 1,
      "number_of_replicas": 0
    },
    "mappings" : {
      "_source" : { "enabled" : false }
    }
  }
}

PUT /_index_template/template_two
{
  "index_patterns" : ["ha*"],
  "priority" : 1,
  "template": {
    "settings" : {
      "number_of_shards" : 2
    },
    "mappings" : {
      "_source" : { "enabled" : true }
    }
  }
}
```
{% include copy-curl.html %}

For indexes that start with `ha`, the `_source` field is enabled. Because only `template_two` is applied, the index will have two primary shards and one replica.

Overlapping index patterns given the same priority are not allowed. 
An error will occur when attempting to create a template matching an existing index template with identical priorities. +{: .note} + +### Adding template versioning + +The following example request adds a `version` number to an index template, which simplifies template management for external systems: + +```json +PUT /_index_template/template_one +{ + "index_patterns" : ["mac", "cheese"], + "priority" : 0, + "template": { + "settings" : { + "number_of_shards" : 1 + } + }, + "version": 1 +} +``` +{% include copy-curl.html %} + + +### Adding template metadata + +The following example request uses the `meta` parameter to add metadata to the index template. All metadata is stored in the cluster state: + +```json +PUT /_index_template/template_one +{ + "index_patterns": ["rom", "juliet"], + "template": { + "settings" : { + "number_of_shards" : 2 + } + }, + "_meta": { + "description": "Where art thou", + "serialization": { + "class": "MyIndexTemplate", + "id": 12 + } + } +} +``` + +### Data stream definition + +Include a `data_stream` object to use an index template for data streams, as shown in the following example request: + +```json +PUT /_index_template/template_1 +{ + "index_patterns": ["logs-*"], + "data_stream": { } +} +``` + +## Using multiple component templates + +When using multiple component templates with the `composed_of` field, the component templates are merged in the specified order. Next, all mappings, settings, and aliases from the parent index template of the component are merged. Lastly, any configuration options added to the index requests are merged. + +In the following example request, an index with `h*` has two merged primary shards. If the order in the request body were reversed, then the index would have one primary shard: + +```json +PUT /_component_template/template_with_1_shard +{ + "template": { + "settings": { + "index.number_of_shards": 1 + } + } +} + +PUT /_component_template/template_with_2_shards +{ + "template": { + "settings": { + "index.number_of_shards": 2 + } + } +} + +PUT /_index_template/template_1 +{ + "index_patterns": ["h*"], + "composed_of": ["template_with_1_shard", "template_with_2_shards"] +} +``` +{% include copy-curl.html %} + + +Recursive merging is used for mapping definition and root options such as `dynamic_templates` and `meta`, meaning that when an earlier component contains a `meta` block, new `meta` entries are added to the end of the metadata in the index. Any entries containing a preexisting key are overwritten. + + diff --git a/_api-reference/index-apis/create-index.md b/_api-reference/index-apis/create-index.md index ff5d7dbda5..2f4c1041bc 100644 --- a/_api-reference/index-apis/create-index.md +++ b/_api-reference/index-apis/create-index.md @@ -34,9 +34,9 @@ OpenSearch indexes have the following naming restrictions: ## Path parameters -| Parameter | Description | -:--- | :--- -| index | String | The index name. Must conform to the [index naming restrictions](#index-naming-restrictions). Required. | +Parameter | Data type | Description +:--- | :--- | :--- +index | String | The index name. Must conform to the [index naming restrictions](#index-naming-restrictions). Required. 
## Query parameters
diff --git a/_api-reference/index-apis/delete-index-template.md b/_api-reference/index-apis/delete-index-template.md
new file mode 100644
index 0000000000..f6e2f38773
--- /dev/null
+++ b/_api-reference/index-apis/delete-index-template.md
@@ -0,0 +1,31 @@
---
layout: default
title: Delete index template
parent: Index APIs
nav_order: 28
---

# Delete index template

The Delete Index Template API deletes one or more index templates.

## Path and HTTP methods

```json
DELETE /_index_template/<template-name>
```

## Path parameters

Parameter | Type | Description
:--- | :--- | :---
`template-name` | String | The name of the index template. You can delete multiple templates in one request by separating the template names with commas. When multiple template names are used in the request, wildcards are not supported.

## Query parameters

The following optional query parameters are supported.

Parameter | Type | Description
:--- | :--- | :---
`cluster_manager_timeout` | Time | The amount of time to wait for a connection to the cluster manager node. Default is `30s`.
`timeout` | Time | The amount of time that the operation will wait for a response. Default is `30s`.
diff --git a/_api-reference/index-apis/get-index-template.md b/_api-reference/index-apis/get-index-template.md
new file mode 100644
index 0000000000..7e2d383640
--- /dev/null
+++ b/_api-reference/index-apis/get-index-template.md
@@ -0,0 +1,42 @@
---
layout: default
title: Get index template
parent: Index APIs
nav_order: 27
---

# Get index template

The Get Index Template API returns information about one or more index templates.

## Path and HTTP methods

```json
GET /_index_template/<template-name>
```

## Query parameters

The following optional query parameters are supported.

Parameter | Type | Description
:--- | :--- | :---
`cluster_manager_timeout` | Time | The amount of time to wait for a connection to the cluster manager node. Default is `30s`.
`flat_settings` | Boolean | Whether to return settings in flat form, which can improve readability, especially for heavily nested settings. For example, the flat form of `"index": { "creation_date": "123456789" }` is `"index.creation_date": "123456789"`. 
+ +## Example requests + +The following example request gets information about an index template by using a wildcard expression: + +```json +GET /_index_template/h* +``` +{% include copy-curl.html %} + +The following example request gets information about all index templates: + +```json +GET /_index_template +``` +{% include copy-curl.html %} From d4bbdd919cab989bc0744e2fa155a85ef4d0d8d6 Mon Sep 17 00:00:00 2001 From: Siddhant Deshmukh Date: Wed, 31 Jul 2024 14:20:20 -0700 Subject: [PATCH 084/154] Add documentation for query insights - query metrics feature (#7846) * Add documentation for query insigts - query metrics feature Signed-off-by: Siddhant Deshmukh * Address auto comments Signed-off-by: Siddhant Deshmukh * Fix dead link Signed-off-by: Siddhant Deshmukh * Address auto comments Signed-off-by: Siddhant Deshmukh * Doc review Signed-off-by: Fanit Kolchina * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Update _observing-your-data/query-insights/query-metrics.md Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Update _observing-your-data/query-insights/query-metrics.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Siddhant Deshmukh Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Fanit Kolchina Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower --- _observing-your-data/query-insights/index.md | 5 +- .../query-insights/query-metrics.md | 84 +++++++++++++++++++ .../query-insights/top-n-queries.md | 2 +- 3 files changed, 88 insertions(+), 3 deletions(-) create mode 100644 _observing-your-data/query-insights/query-metrics.md diff --git a/_observing-your-data/query-insights/index.md b/_observing-your-data/query-insights/index.md index 7bad169d1d..549371240f 100644 --- a/_observing-your-data/query-insights/index.md +++ b/_observing-your-data/query-insights/index.md @@ -31,8 +31,9 @@ bin/opensearch-plugin install query-insights ``` For information about installing plugins, see [Installing plugins]({{site.url}}{{site.baseurl}}/install-and-configure/plugins/). -## Query insights settings +## Query Insights settings -Query insights features support the following settings: +You can obtain the following information using Query Insights: - [Top n queries]({{site.url}}{{site.baseurl}}/observing-your-data/query-insights/top-n-queries/) +- [Query metrics]({{site.url}}{{site.baseurl}}/observing-your-data/query-insights/query-metrics/) diff --git a/_observing-your-data/query-insights/query-metrics.md b/_observing-your-data/query-insights/query-metrics.md new file mode 100644 index 0000000000..c8caf21d65 --- /dev/null +++ b/_observing-your-data/query-insights/query-metrics.md @@ -0,0 +1,84 @@ +--- +layout: default +title: Query metrics +parent: Query insights +nav_order: 20 +--- + +# Query metrics + +Key query [metrics](#metrics), such as aggregation types, query types, latency, and resource usage per query type, are captured along the search path by using the OpenTelemetry (OTel) instrumentation framework. The telemetry data can be consumed using OTel metrics [exporters]({{site.url}}{{site.baseurl}}/observing-your-data/trace/distributed-tracing/#exporters). 
+ +## Configuring query metric generation + +To configure query metric generation, use the following steps. + +### Step 1: Install the Query Insights plugin + +For information about installing the Query Insights plugin, see [Installing the Query Insights plugin]({{site.url}}{{site.baseurl}}/observing-your-data/query-insights/index/#installing-the-query-insights-plugin). + +### Step 2: Install the OpenTelemetry plugin + +For information about installing the OpenTelemetry plugin, see [Distributed tracing]({{site.url}}{{site.baseurl}}/observing-your-data/trace/distributed-tracing/). + +### Step 3: Enable query metrics + +Enable query metrics by configuring the following `opensearch.yml` settings: + +```yaml +telemetry.feature.metrics.enabled: true +search.query.metrics.enabled: true +``` +{% include copy.html %} + +The following is a complete sample configuration that includes a telemetry configuration: + +```yaml +# Enable query metrics feature +search.query.metrics.enabled: true +telemetry.feature.metrics.enabled: true + +# OTel-related configuration +opensearch.experimental.feature.telemetry.enabled: true +telemetry.tracer.sampler.probability: 1.0 +telemetry.feature.tracer.enabled: true +``` +{% include copy.html %} + +Alternatively, you can configure query metric generation using the API: + +```json +PUT _cluster/settings +{ + "persistent" : { + "search.query.metrics.enabled" : true + } +} +``` +{% include copy-curl.html %} + +Configure the export of metrics and traces using a gRPC exporter. For more information, see [Exporters]({{site.url}}{{site.baseurl}}/observing-your-data/trace/distributed-tracing/#exporters). You can skip this step if you use the [default logging exporter](#default-logging-exporter): + +```yaml +telemetry.otel.tracer.span.exporter.class: io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter +telemetry.otel.metrics.exporter.class: io.opentelemetry.exporter.otlp.metrics.OtlpGrpcMetricExporter +``` +{% include copy.html %} + +## Metrics + +Query metrics provide the following measurements: + +- The number of queries per query type (for example, the number of `match` or `regex` queries) +- The number of queries per aggregation type (for example, the number of `terms` aggregation queries) +- The number of queries per sort order (for example, the number of ascending and descending `sort` queries) +- Histograms of `latency` for each query type, aggregation type, and sort order +- Histograms of `cpu` for each query type, aggregation type, and sort order +- Histograms of `memory` for each query type, aggregation type, and sort order + +## Default logging exporter + +By default, if no gRPC exporters are configured, then the metrics and traces are exported to log files. 
The data is saved in the `opensearch/logs` directory in the following files: + +- `opensearch_otel_metrics.log` +- `opensearch_otel_traces.log` diff --git a/_observing-your-data/query-insights/top-n-queries.md b/_observing-your-data/query-insights/top-n-queries.md index e6dadf33c5..f07fd2dfef 100644 --- a/_observing-your-data/query-insights/top-n-queries.md +++ b/_observing-your-data/query-insights/top-n-queries.md @@ -2,7 +2,7 @@ layout: default title: Top N queries parent: Query insights -nav_order: 65 +nav_order: 10 --- # Top N queries From 5510bdc64c5cfc670585249c52ff0e86c309b7b9 Mon Sep 17 00:00:00 2001 From: Sander van de Geijn Date: Thu, 1 Aug 2024 17:40:01 +0200 Subject: [PATCH 085/154] Added target_bulk_bytes to the docs for logstash-output plugin (#7869) * Added target_bulk_bytes Signed-off-by: Sander van de Geijn * Update _tools/logstash/ship-to-opensearch.md Nice Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Sander van de Geijn * Update _tools/logstash/ship-to-opensearch.md Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update ship-to-opensearch.md * Remove "we" * Update ship-to-opensearch.md * Update ship-to-opensearch.md * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: Sander van de Geijn Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Nathan Bower --- _tools/logstash/ship-to-opensearch.md | 74 ++++++++++++++------------- 1 file changed, 38 insertions(+), 36 deletions(-) diff --git a/_tools/logstash/ship-to-opensearch.md b/_tools/logstash/ship-to-opensearch.md index e56163c288..6ea355b34f 100644 --- a/_tools/logstash/ship-to-opensearch.md +++ b/_tools/logstash/ship-to-opensearch.md @@ -9,7 +9,7 @@ redirect_from: # Ship events to OpenSearch -You can Ship Logstash events to an OpenSearch cluster and then visualize your events with OpenSearch Dashboards. +You can ship Logstash events to an OpenSearch cluster and then visualize your events with OpenSearch Dashboards. Make sure you have [Logstash]({{site.url}}{{site.baseurl}}/tools/logstash/index#install-logstash), [OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/install-opensearch/index/), and [OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/install-and-configure/install-dashboards/index/). {: .note } @@ -30,9 +30,10 @@ output { } ``` - ## Sample walkthrough +The following walkthrough shows an example of how the ship a Logstash event. + 1. Open the `config/pipeline.conf` file and add in the following configuration: ```yml @@ -53,7 +54,7 @@ output { } ``` - This Logstash pipeline accepts JSON input through the terminal and ships the events to an OpenSearch cluster running locally. Logstash writes the events to an index with the `logstash-logs-%{+YYYY.MM.dd}` naming convention. +The Logstash pipeline accepts JSON input through the terminal and ships the events to an OpenSearch cluster running locally. Logstash writes the events to an index with the `logstash-logs-%{+YYYY.MM.dd}` naming convention. 2. 
Start Logstash: @@ -78,13 +79,9 @@ output { green | open | logstash-logs-2021.07.01 | iuh648LYSnmQrkGf70pplA | 1 | 1 | 1 | 0 | 10.3kb | 5.1kb ``` -## Adding different Authentication mechanisms in the Output plugin - -## auth_type to support different authentication mechanisms +## Adding different authentication mechanisms in the Output plugin -In addition to the existing authentication mechanisms, if we want to add new authentication then we will be adding them in the configuration by using auth_type - -Example Configuration for basic authentication: +In addition to the existing authentication mechanisms, you can add a new authentication mechanism using the `auth_type` setting, as shown in the following example configuration: ```yml output { @@ -101,15 +98,15 @@ output { ``` ### Parameters inside auth_type -- type (string) - We should specify the type of authentication -- We should add credentials required for that authentication like 'user' and 'password' for 'basic' authentication -- We should also add other parameters required for that authentication mechanism like we added 'region' for 'aws_iam' authentication +The following parameters are supported in the `auth_type` setting: -## Configuration for AWS IAM Authentication +- `type` (string): The type of authentication. +- `user`: A user name. +- `password`: The password used for basic authentication. -To run the Logstash Output Opensearch plugin using aws_iam authentication, simply add a configuration following the below documentation. +## Configuration for AWS IAM Authentication -Example Configuration: +To run the Logstash Output OpenSearch plugin using `aws_iam` authentication, add the following configuration: ```yml output { @@ -129,36 +126,41 @@ output { ### Required Parameters -- hosts (array of string) - AmazonOpensearchService domain endpoint : port number -- auth_type (Json object) - Which holds other parameters required for authentication - - type (string) - "aws_iam" - - aws_access_key_id (string) - AWS access key - - aws_secret_access_key (string) - AWS secret access key - - region (string, :default => "us-east-1") - region in which the domain is located - - if we want to pass other optional parameters like profile, session_token,etc. They needs to be added in auth_type -- port (string) - AmazonOpensearchService listens on port 443 for HTTPS -- protocol (string) - The protocol used to connect to AmazonOpensearchService is 'https' +- `hosts` (array of string): The `AmazonOpensearchService` domain endpoint and port number. +- `auth_type` (JSON object): The authentication settings. + - `type` (string): "aws_iam". + - `aws_access_key_id` (string): AWS access key. + - `aws_secret_access_key` (string): AWS secret access key. + - `region` (string, :default => "us-east-1"): The region in which the domain is located. +- port (string): AmazonOpensearchService listens on port 443 for `HTTPS`. +- protocol (string): The protocol used to connect. For `AmazonOpensearchService`, the protocol is `https`. 
 ### Optional Parameters
-
-- The credential resolution logic can be described as follows:
-  - User passed aws_access_key_id and aws_secret_access_key in configuration
-  - Environment variables - AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (RECOMMENDED since they are recognized by all the AWS SDKs and CLI except for .NET), or AWS_ACCESS_KEY and AWS_SECRET_KEY (only recognized by Java SDK)
-  - Credential profiles file at the default location (~/.aws/credentials) shared by all AWS SDKs and the AWS CLI
-  - Instance profile credentials delivered through the Amazon EC2 metadata service
-- template (path) - You can set the path to your own template here. If no template is specified, the plugin uses the default template.
-- template_name (string, default => "logstash") - Defines how the template is named inside Opensearch
-- service_name (string, default => "es") - Defines the service name to be used for `aws_iam` authentication.
-- legacy_template (boolean, default => true) - Selects the OpenSearch template API. When `true`, uses legacy templates via the _template API. When `false`, uses composable templates via the _index_template API.
-- default_server_major_version (number) - The OpenSearch server major version to use when it's not available from the OpenSearch root URL. If not set, the plugin throws an exception when the version can't be fetched.
+
+- `template` (path): You can set the path to your own template here. If no template is specified, the plugin uses the default template.
+- `template_name` (string, default => `logstash`): Defines how the template is named inside OpenSearch.
+- `service_name` (string): Defines the service name to be used for `aws_iam` authentication.
+- `legacy_template` (Boolean, default => `true`): Selects the OpenSearch template API. When `true`, uses legacy templates with the `_template` API. When `false`, uses composable templates with the `_index_template` API.
+- `default_server_major_version` (number): The OpenSearch server major version to use when it's not available from the OpenSearch root URL. If not set, the plugin throws an exception when the version can't be fetched.
+- `target_bulk_bytes` (number): The maximum number of bytes in the buffer. When the maximum is reached, Logstash flushes the data to OpenSearch. This is useful when bulk requests are too large for the OpenSearch cluster and the cluster returns a `429` error.
+
+### Credential resolution logic
+
+The following list describes the credential resolution order:
+
+- A user passes `aws_access_key_id` and `aws_secret_access_key` in the configuration.
+- Environment variables, such as `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, are recommended because they are recognized by all of the AWS SDKs and CLIs except for `.NET`. You can also use `AWS_ACCESS_KEY` and `AWS_SECRET_KEY`, which are recognized only by the Java SDK.
+- The credential profiles file found in the `~/.aws/credentials` directory is shared by all AWS SDKs and the AWS CLI.
+- Instance profile credentials are delivered through the Amazon EC2 metadata service.
 
 ## Data streams
 
 The OpenSearch output plugin can store both time series datasets (such as logs, events, and metrics) and non-time series data in OpenSearch. The data stream is recommended to index time series datasets (such as logs, metrics, and events) into OpenSearch.
-To know more about data streams, refer to this [documentation](https://opensearch.org/docs/latest/opensearch/data-streams/).
+To learn more about data streams, see the [data stream documentation](https://opensearch.org/docs/latest/opensearch/data-streams/). -We can ingest data into a data stream through logstash. We need to create the data stream and specify the name of data stream and the `op_type` of `create` in the output configuration. The sample configuration is shown below: +To ingest data into a data stream through Logstash, create the data stream and specify the name of the data stream and set the `action` setting to `create`, as shown in the following example configuration: ```yml output { From 73ab08c996ce741bb8510c941eeecc32c4728b5c Mon Sep 17 00:00:00 2001 From: Junqiu Lei Date: Thu, 1 Aug 2024 11:40:49 -0700 Subject: [PATCH 086/154] Add doc for binary format support in k-NN (#7840) * Add doc for binary format support in k-NN Signed-off-by: Junqiu Lei * Resolve tech feedback Signed-off-by: Junqiu Lei * Doc review Signed-off-by: Fanit Kolchina * Add newline Signed-off-by: Fanit Kolchina * Formatting Signed-off-by: Fanit Kolchina * Link fix Signed-off-by: Fanit Kolchina * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Add query results to examples Signed-off-by: Junqiu Lei * Rephrased sentences and changed vector field name Signed-off-by: Fanit Kolchina * Editorial review Signed-off-by: Fanit Kolchina * Remove details from one of the requests Signed-off-by: Fanit Kolchina --------- Signed-off-by: Junqiu Lei Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Fanit Kolchina Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower --- .../supported-field-types/knn-vector.md | 416 +++++++++++++++++- _search-plugins/knn/approximate-knn.md | 18 +- _search-plugins/knn/knn-index.md | 13 +- _search-plugins/knn/knn-score-script.md | 14 +- _search-plugins/knn/painless-functions.md | 4 + _search-plugins/vector-search.md | 2 +- 6 files changed, 447 insertions(+), 20 deletions(-) diff --git a/_field-types/supported-field-types/knn-vector.md b/_field-types/supported-field-types/knn-vector.md index c7f9ec7f2b..a2a7137733 100644 --- a/_field-types/supported-field-types/knn-vector.md +++ b/_field-types/supported-field-types/knn-vector.md @@ -13,7 +13,7 @@ The [k-NN plugin]({{site.url}}{{site.baseurl}}/search-plugins/knn/index/) introd ## Example -For example, to map `my_vector1` as a `knn_vector`, use the following request: +For example, to map `my_vector` as a `knn_vector`, use the following request: ```json PUT test-index @@ -26,7 +26,7 @@ PUT test-index }, "mappings": { "properties": { - "my_vector1": { + "my_vector": { "type": "knn_vector", "dimension": 3, "method": { @@ -67,8 +67,7 @@ PUT test-index ## Model IDs -Model IDs are used when the underlying Approximate k-NN algorithm requires a training step. As a prerequisite, the -model has to be created with the [Train API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-a-model). The +Model IDs are used when the underlying Approximate k-NN algorithm requires a training step. As a prerequisite, the model must be created with the [Train API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-a-model). The model contains the information needed to initialize the native library segment files. 
```json @@ -111,7 +110,7 @@ PUT test-index }, "mappings": { "properties": { - "my_vector1": { + "my_vector": { "type": "knn_vector", "dimension": 3, "data_type": "byte", @@ -136,7 +135,7 @@ Then ingest documents as usual. Make sure each dimension in the vector is in the ```json PUT test-index/_doc/1 { - "my_vector1": [-126, 28, 127] + "my_vector": [-126, 28, 127] } ``` {% include copy-curl.html %} @@ -144,7 +143,7 @@ PUT test-index/_doc/1 ```json PUT test-index/_doc/2 { - "my_vector1": [100, -128, 0] + "my_vector": [100, -128, 0] } ``` {% include copy-curl.html %} @@ -157,7 +156,7 @@ GET test-index/_search "size": 2, "query": { "knn": { - "my_vector1": { + "my_vector": { "vector": [26, -120, 99], "k": 2 } @@ -267,3 +266,404 @@ else: return Byte(bval) ``` {% include copy.html %} + +## Binary k-NN vectors + +You can reduce memory costs by a factor of 32 by switching from float to binary vectors. +Using binary vector indexes can lower operational costs while maintaining high recall performance, making large-scale deployment more economical and efficient. + +Binary format is available for the following k-NN search types: + +- [Approximate k-NN]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/): Supports binary vectors only for the Faiss engine with the HNSW and IVF algorithms. +- [Script score k-NN]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-score-script/): Enables the use of binary vectors in script scoring. +- [Painless extensions]({{site.url}}{{site.baseurl}}/search-plugins/knn/painless-functions/): Allows the use of binary vectors with Painless scripting extensions. + +### Requirements + +There are several requirements for using binary vectors in the OpenSearch k-NN plugin: + +- The `data_type` of the binary vector index must be `binary`. +- The `space_type` of the binary vector index must be `hamming`. +- The `dimension` of the binary vector index must be a multiple of 8. +- You must convert your binary data into 8-bit signed integers (`int8`) in the [-128, 127] range. For example, the binary sequence of 8 bits `0, 1, 1, 0, 0, 0, 1, 1` must be converted into its equivalent byte value of `99` to be used as a binary vector input. 
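+
+The following is a minimal client-side sketch of this conversion (the `bits_to_int8` helper is illustrative and not part of any OpenSearch API). It packs groups of 8 bits into the signed 8-bit integers expected by a binary vector field:
+
+```python
+def bits_to_int8(bits):
+    """Pack a bit list (length must be a multiple of 8) into signed int8 values."""
+    if len(bits) % 8 != 0:
+        raise ValueError("The number of bits must be a multiple of 8.")
+    values = []
+    for i in range(0, len(bits), 8):
+        byte = 0
+        for bit in bits[i:i + 8]:
+            byte = (byte << 1) | bit
+        # Map the unsigned range [0, 255] to the signed range [-128, 127].
+        values.append(byte - 256 if byte > 127 else byte)
+    return values
+
+print(bits_to_int8([0, 1, 1, 0, 0, 0, 1, 1]))  # [99]
+```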
+ +### Example: HNSW + +To create a binary vector index with the Faiss engine and HNSW algorithm, send the following request: + +```json +PUT /test-binary-hnsw +{ + "settings": { + "index": { + "knn": true + } + }, + "mappings": { + "properties": { + "my_vector": { + "type": "knn_vector", + "dimension": 8, + "data_type": "binary", + "method": { + "name": "hnsw", + "space_type": "hamming", + "engine": "faiss", + "parameters": { + "ef_construction": 128, + "m": 24 + } + } + } + } + } +} +``` +{% include copy-curl.html %} + +Then ingest some documents containing binary vectors: + +```json +PUT _bulk +{"index": {"_index": "test-binary-hnsw", "_id": "1"}} +{"my_vector": [7], "price": 4.4} +{"index": {"_index": "test-binary-hnsw", "_id": "2"}} +{"my_vector": [10], "price": 14.2} +{"index": {"_index": "test-binary-hnsw", "_id": "3"}} +{"my_vector": [15], "price": 19.1} +{"index": {"_index": "test-binary-hnsw", "_id": "4"}} +{"my_vector": [99], "price": 1.2} +{"index": {"_index": "test-binary-hnsw", "_id": "5"}} +{"my_vector": [80], "price": 16.5} +``` +{% include copy-curl.html %} + +When querying, be sure to use a binary vector: + +```json +GET /test-binary-hnsw/_search +{ + "size": 2, + "query": { + "knn": { + "my_vector": { + "vector": [9], + "k": 2 + } + } + } +} +``` +{% include copy-curl.html %} + +The response contains the two vectors closest to the query vector: + +
+<details markdown="block">
+  <summary>
+    Response
+  </summary>
+  {: .text-delta}
+
+```json
+{
+  "took": 8,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 2,
+      "relation": "eq"
+    },
+    "max_score": 0.5,
+    "hits": [
+      {
+        "_index": "test-binary-hnsw",
+        "_id": "2",
+        "_score": 0.5,
+        "_source": {
+          "my_vector": [
+            10
+          ],
+          "price": 14.2
+        }
+      },
+      {
+        "_index": "test-binary-hnsw",
+        "_id": "5",
+        "_score": 0.25,
+        "_source": {
+          "my_vector": [
+            80
+          ],
+          "price": 16.5
+        }
+      }
+    ]
+  }
+}
+```
+</details>
+
+ +### Example: IVF + +The IVF method requires a training step that creates and trains the model used to initialize the native library index during segment creation. For more information, see [Building a k-NN index from a model]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model). + +First, create an index that will contain binary vector training data. Specify the Faiss engine and IVF algorithm and make sure that the `dimension` matches the dimension of the model you want to create: + +```json +PUT train-index +{ + "mappings": { + "properties": { + "train-field": { + "type": "knn_vector", + "dimension": 8, + "data_type": "binary" + } + } + } +} +``` +{% include copy-curl.html %} + +Ingest training data containing binary vectors into the training index: + +
+<details markdown="block">
+  <summary>
+    Bulk ingest request
+  </summary>
+  {: .text-delta}
+
+```json
+PUT _bulk
+{ "index": { "_index": "train-index", "_id": "1" } }
+{ "train-field": [1] }
+{ "index": { "_index": "train-index", "_id": "2" } }
+{ "train-field": [2] }
+{ "index": { "_index": "train-index", "_id": "3" } }
+{ "train-field": [3] }
+{ "index": { "_index": "train-index", "_id": "4" } }
+{ "train-field": [4] }
+{ "index": { "_index": "train-index", "_id": "5" } }
+{ "train-field": [5] }
+{ "index": { "_index": "train-index", "_id": "6" } }
+{ "train-field": [6] }
+{ "index": { "_index": "train-index", "_id": "7" } }
+{ "train-field": [7] }
+{ "index": { "_index": "train-index", "_id": "8" } }
+{ "train-field": [8] }
+{ "index": { "_index": "train-index", "_id": "9" } }
+{ "train-field": [9] }
+{ "index": { "_index": "train-index", "_id": "10" } }
+{ "train-field": [10] }
+{ "index": { "_index": "train-index", "_id": "11" } }
+{ "train-field": [11] }
+{ "index": { "_index": "train-index", "_id": "12" } }
+{ "train-field": [12] }
+{ "index": { "_index": "train-index", "_id": "13" } }
+{ "train-field": [13] }
+{ "index": { "_index": "train-index", "_id": "14" } }
+{ "train-field": [14] }
+{ "index": { "_index": "train-index", "_id": "15" } }
+{ "train-field": [15] }
+{ "index": { "_index": "train-index", "_id": "16" } }
+{ "train-field": [16] }
+{ "index": { "_index": "train-index", "_id": "17" } }
+{ "train-field": [17] }
+{ "index": { "_index": "train-index", "_id": "18" } }
+{ "train-field": [18] }
+{ "index": { "_index": "train-index", "_id": "19" } }
+{ "train-field": [19] }
+{ "index": { "_index": "train-index", "_id": "20" } }
+{ "train-field": [20] }
+{ "index": { "_index": "train-index", "_id": "21" } }
+{ "train-field": [21] }
+{ "index": { "_index": "train-index", "_id": "22" } }
+{ "train-field": [22] }
+{ "index": { "_index": "train-index", "_id": "23" } }
+{ "train-field": [23] }
+{ "index": { "_index": "train-index", "_id": "24" } }
+{ "train-field": [24] }
+{ "index": { "_index": "train-index", "_id": "25" } }
+{ "train-field": [25] }
+{ "index": { "_index": "train-index", "_id": "26" } }
+{ "train-field": [26] }
+{ "index": { "_index": "train-index", "_id": "27" } }
+{ "train-field": [27] }
+{ "index": { "_index": "train-index", "_id": "28" } }
+{ "train-field": [28] }
+{ "index": { "_index": "train-index", "_id": "29" } }
+{ "train-field": [29] }
+{ "index": { "_index": "train-index", "_id": "30" } }
+{ "train-field": [30] }
+{ "index": { "_index": "train-index", "_id": "31" } }
+{ "train-field": [31] }
+{ "index": { "_index": "train-index", "_id": "32" } }
+{ "train-field": [32] }
+{ "index": { "_index": "train-index", "_id": "33" } }
+{ "train-field": [33] }
+{ "index": { "_index": "train-index", "_id": "34" } }
+{ "train-field": [34] }
+{ "index": { "_index": "train-index", "_id": "35" } }
+{ "train-field": [35] }
+{ "index": { "_index": "train-index", "_id": "36" } }
+{ "train-field": [36] }
+{ "index": { "_index": "train-index", "_id": "37" } }
+{ "train-field": [37] }
+{ "index": { "_index": "train-index", "_id": "38" } }
+{ "train-field": [38] }
+{ "index": { "_index": "train-index", "_id": "39" } }
+{ "train-field": [39] }
+{ "index": { "_index": "train-index", "_id": "40" } }
+{ "train-field": [40] }
+```
+{% include copy-curl.html %}
+</details>
+
+ +Then, create and train the model named `test-binary-model`. The model will be trained using the training data from the `train_field` in the `train-index`. Specify the `binary` data type and `hamming` space type: + +```json +POST _plugins/_knn/models/test-binary-model/_train +{ + "training_index": "train-index", + "training_field": "train-field", + "dimension": 8, + "description": "model with binary data", + "data_type": "binary", + "method": { + "name": "ivf", + "engine": "faiss", + "space_type": "hamming", + "parameters": { + "nlist": 1, + "nprobes": 1 + } + } +} +``` +{% include copy-curl.html %} + +To check the model training status, call the Get Model API: + +```json +GET _plugins/_knn/models/test-binary-model?filter_path=state +``` +{% include copy-curl.html %} + +Once the training is complete, the `state` changes to `created`. + +Next, create an index that will initialize its native library indexes using the trained model: + +```json +PUT test-binary-ivf +{ + "settings": { + "index": { + "knn": true + } + }, + "mappings": { + "properties": { + "my_vector": { + "type": "knn_vector", + "model_id": "test-binary-model" + } + } + } +} +``` +{% include copy-curl.html %} + +Ingest the data containing the binary vectors that you want to search into the created index: + +```json +PUT _bulk?refresh=true +{"index": {"_index": "test-binary-ivf", "_id": "1"}} +{"my_vector": [7], "price": 4.4} +{"index": {"_index": "test-binary-ivf", "_id": "2"}} +{"my_vector": [10], "price": 14.2} +{"index": {"_index": "test-binary-ivf", "_id": "3"}} +{"my_vector": [15], "price": 19.1} +{"index": {"_index": "test-binary-ivf", "_id": "4"}} +{"my_vector": [99], "price": 1.2} +{"index": {"_index": "test-binary-ivf", "_id": "5"}} +{"my_vector": [80], "price": 16.5} +``` +{% include copy-curl.html %} + +Finally, search the data. Be sure to provide a binary vector in the k-NN vector field: + +```json +GET test-binary-ivf/_search +{ + "size": 2, + "query": { + "knn": { + "my_vector": { + "vector": [8], + "k": 2 + } + } + } +} +``` +{% include copy-curl.html %} + +The response contains the two vectors closest to the query vector: + +
+<details markdown="block">
+  <summary>
+    Response
+  </summary>
+  {: .text-delta}
+
+```json
+{
+  "took": 7,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 2,
+      "relation": "eq"
+    },
+    "max_score": 0.5,
+    "hits": [
+      {
+        "_index": "test-binary-ivf",
+        "_id": "2",
+        "_score": 0.5,
+        "_source": {
+          "my_vector": [
+            10
+          ],
+          "price": 14.2
+        }
+      },
+      {
+        "_index": "test-binary-ivf",
+        "_id": "3",
+        "_score": 0.25,
+        "_source": {
+          "my_vector": [
+            15
+          ],
+          "price": 19.1
+        }
+      }
+    ]
+  }
+}
+```
+</details>
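+
+As a quick sanity check, these scores follow the `hamming` scoring formula `1 / (1 + d)`, where `d` is the number of differing bits. The following illustrative sketch (not part of any OpenSearch API) reproduces the scores for the query vector `[8]`:
+
+```python
+def hamming_score(a: int, b: int) -> float:
+    # Count the differing bits, then convert the distance to an OpenSearch score.
+    d = bin(a ^ b).count("1")
+    return 1 / (1 + d)
+
+print(hamming_score(8, 10))  # 0.5, matches document 2
+print(hamming_score(8, 15))  # 0.25, matches document 3
+```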
diff --git a/_search-plugins/knn/approximate-knn.md b/_search-plugins/knn/approximate-knn.md
index fa1b4096c7..0b5a48059b 100644
--- a/_search-plugins/knn/approximate-knn.md
+++ b/_search-plugins/knn/approximate-knn.md
@@ -314,6 +314,10 @@ To learn about using k-NN search with nested fields, see [k-NN search with neste
 
 To learn more about the radial search feature, see [k-NN radial search]({{site.url}}{{site.baseurl}}/search-plugins/knn/radial-search-knn/).
 
+### Using approximate k-NN with binary vectors
+
+To learn more about using binary vectors with k-NN search, see [Binary k-NN vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-k-nn-vectors).
+
 ## Spaces
 
 A space corresponds to the function used to measure the distance between two points in order to determine the k-nearest neighbors. From the k-NN perspective, a lower score equates to a closer and better result. This is the opposite of how OpenSearch scores results, where a greater score equates to a better result. To convert distances to OpenSearch scores, we take 1 / (1 + distance). The k-NN plugin supports the following spaces.
 
@@ -325,9 +329,9 @@ Not every method supports each of these spaces. Be sure to check out [the method
 
-    <th>spaceType</th>
-    <th>Distance Function (d)</th>
-    <th>OpenSearch Score</th>
+    <th>Space type</th>
+    <th>Distance function (d)</th>
+    <th>OpenSearch score</th>
 
@@ -363,6 +367,11 @@ Not every method supports each of these spaces. Be sure to check out [the method
       \[ \text{If} d > 0, score = d + 1 \] \[\text{If} d \le 0\] \[score = {1 \over 1 + (-1 · d) }\]
+  <tr>
+    <td>hamming (supported for binary vectors in OpenSearch version 2.16 and later)</td>
+    <td>\[ d(\mathbf{x}, \mathbf{y}) = \text{countSetBits}(\mathbf{x} \oplus \mathbf{y})\]</td>
+    <td>\[ score = {1 \over 1 + d } \]</td>
+  </tr>
 </table>
The cosine similarity formula does not include the `1 -` prefix. However, because similarity search libraries equates @@ -374,3 +383,6 @@ With cosine similarity, it is not valid to pass a zero vector (`[0, 0, ...]`) as such a vector is 0, which raises a `divide by 0` exception in the corresponding formula. Requests containing the zero vector will be rejected and a corresponding exception will be thrown. {: .note } + +The `hamming` space type is supported for binary vectors in OpenSearch version 2.16 and later. For more information, see [Binary k-NN vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-k-nn-vectors). +{: .note} diff --git a/_search-plugins/knn/knn-index.md b/_search-plugins/knn/knn-index.md index ed8b9217f5..a6ffd922eb 100644 --- a/_search-plugins/knn/knn-index.md +++ b/_search-plugins/knn/knn-index.md @@ -45,6 +45,10 @@ PUT /test-index Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine to reduce the amount of storage space needed. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector). +## Binary vector + +Starting with k-NN plugin version 2.16, you can use `binary` vectors with the `faiss` engine to reduce the amount of required storage space. For more information, see [Binary k-NN vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-k-nn-vectors). + ## SIMD optimization for the Faiss engine Starting with version 2.13, the k-NN plugin supports [Single Instruction Multiple Data (SIMD)](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) processing if the underlying hardware supports SIMD instructions (AVX2 on x64 architecture and Neon on ARM64 architecture). SIMD is supported by default on Linux machines only for the Faiss engine. SIMD architecture helps boost overall performance by improving indexing throughput and reducing search latency. @@ -105,13 +109,16 @@ An index created in OpenSearch version 2.11 or earlier will still use the old `e ### Supported Faiss methods Method name | Requires training | Supported spaces | Description -:--- | :--- | :--- | :--- -`hnsw` | false | l2, innerproduct | Hierarchical proximity graph approach to approximate k-NN search. -`ivf` | true | l2, innerproduct | Stands for _inverted file index_. Bucketing approach where vectors are assigned different buckets based on clustering and, during search, only a subset of the buckets is searched. +:--- | :--- |:---| :--- +`hnsw` | false | l2, innerproduct, hamming | Hierarchical proximity graph approach to approximate k-NN search. +`ivf` | true | l2, innerproduct, hamming | Stands for _inverted file index_. Bucketing approach where vectors are assigned different buckets based on clustering and, during search, only a subset of the buckets is searched. For hnsw, "innerproduct" is not available when PQ is used. {: .note} +The `hamming` space type is supported for binary vectors in OpenSearch version 2.16 and later. For more information, see [Binary k-NN vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-k-nn-vectors). 
{: .note}
 
 #### HNSW parameters
 
 Parameter name | Required | Default | Updatable | Description
diff --git a/_search-plugins/knn/knn-score-script.md b/_search-plugins/knn/knn-score-script.md
index 1696bd4cad..1a21f49513 100644
--- a/_search-plugins/knn/knn-score-script.md
+++ b/_search-plugins/knn/knn-score-script.md
@@ -319,7 +319,10 @@ A space corresponds to the function used to measure the distance between two poi
 
-    hammingbit
+
+    hammingbit (supported for binary and long vectors) <br>
+    hamming (supported for binary vectors in OpenSearch version 2.16 and later)
+
     \[ d(\mathbf{x}, \mathbf{y}) = \text{countSetBits}(\mathbf{x} \oplus \mathbf{y})\]
     \[ score = {1 \over 1 + d } \]
 
@@ -328,7 +331,8 @@ A space corresponds to the function used to measure the distance between two poi
 
 Cosine similarity returns a number between -1 and 1, and because OpenSearch relevance scores can't be below 0, the k-NN plugin adds 1 to get the final score.
 
-With cosine similarity, it is not valid to pass a zero vector (`[0, 0, ...`]) as input. This is because the magnitude of
-such a vector is 0, which raises a `divide by 0` exception in the corresponding formula. Requests
-containing the zero vector will be rejected and a corresponding exception will be thrown.
-{: .note }
\ No newline at end of file
+With cosine similarity, it is not valid to pass a zero vector (`[0, 0, ... ]`) as input. This is because the magnitude of such a vector is 0, which raises a `divide by 0` exception in the corresponding formula. Requests containing the zero vector will be rejected, and a corresponding exception will be thrown.
+{: .note }
+
+The `hamming` space type is supported for binary vectors in OpenSearch version 2.16 and later. For more information, see [Binary k-NN vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-k-nn-vectors).
+{: .note}
diff --git a/_search-plugins/knn/painless-functions.md b/_search-plugins/knn/painless-functions.md
index 09eb989702..85840ff535 100644
--- a/_search-plugins/knn/painless-functions.md
+++ b/_search-plugins/knn/painless-functions.md
@@ -52,6 +52,10 @@ Function name | Function signature | Description
 l2Squared | `float l2Squared (float[] queryVector, doc['vector field'])` | This function calculates the square of the L2 distance (Euclidean distance) between a given query vector and document vectors. The shorter the distance, the more relevant the document is, so this example inverts the return value of the l2Squared function. If the document vector matches the query vector, the result is 0, so this example also adds 1 to the distance to avoid divide by zero errors.
 l1Norm | `float l1Norm (float[] queryVector, doc['vector field'])` | This function calculates the square of the L2 distance (Euclidean distance) between a given query vector and document vectors. The shorter the distance, the more relevant the document is, so this example inverts the return value of the l2Squared function. If the document vector matches the query vector, the result is 0, so this example also adds 1 to the distance to avoid divide by zero errors.
 cosineSimilarity | `float cosineSimilarity (float[] queryVector, doc['vector field'])` | Cosine similarity is an inner product of the query vector and document vector normalized to both have a length of 1. If the magnitude of the query vector doesn't change throughout the query, you can pass the magnitude of the query vector to improve performance, instead of calculating the magnitude every time for every filtered document:<br> `float cosineSimilarity (float[] queryVector, doc['vector field'], float normQueryVector)`<br> In general, the range of cosine similarity is [-1, 1]. However, in the case of information retrieval, the cosine similarity of two documents ranges from 0 to 1 because the tf-idf statistic can't be negative. Therefore, the k-NN plugin adds 1.0 in order to always yield a positive cosine similarity score.
+hamming | `float hamming (float[] queryVector, doc['vector field'])` | This function calculates the Hamming distance between a given query vector and document vectors. The Hamming distance is the number of positions at which the corresponding elements are different. The shorter the distance, the more relevant the document is, so this example inverts the return value of the Hamming distance.
+
+The `hamming` space type is supported for binary vectors in OpenSearch version 2.16 and later. For more information, see [Binary k-NN vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-k-nn-vectors).
+{: .note}
 
 ## Constraints
diff --git a/_search-plugins/vector-search.md b/_search-plugins/vector-search.md
index 862b26b375..68f6dea08c 100644
--- a/_search-plugins/vector-search.md
+++ b/_search-plugins/vector-search.md
@@ -57,7 +57,7 @@ PUT test-index
 
 You must designate the field that will store vectors as a [`knn_vector`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/) field type. OpenSearch supports vectors of up to 16,000 dimensions, each of which is represented as a 32-bit or 16-bit float.
 
-To save storage space, you can use `byte` vectors. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector).
+To save storage space, you can use `byte` or `binary` vectors. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector) and [Binary k-NN vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-k-nn-vectors).
### k-NN vector search From bc28bf9c745de3cb4f5ff147f148eb5c19a8c8f5 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 1 Aug 2024 15:14:46 -0600 Subject: [PATCH 087/154] Edit for redundant information and sections across Data Prepper (#7127) * Edit for redundant information and sections across Data Prepper Signed-off-by: Melissa Vagi * Edit for redundant information and sections across Data Prepper Signed-off-by: Melissa Vagi * Rewrite expression syntax and reorganize doc structure for readability Signed-off-by: Melissa Vagi * Rewrite expression syntax and reorganize doc structure for readability Signed-off-by: Melissa Vagi * Rewrite expression syntax and reorganize doc structure for readability Signed-off-by: Melissa Vagi * Rewrite expression syntax and reorganize doc structure for readability Signed-off-by: Melissa Vagi * Rewrite expression syntax and reorganize doc structure for readability Signed-off-by: Melissa Vagi * Update _data-prepper/index.md Signed-off-by: Melissa Vagi * Update configuring-data-prepper.md Signed-off-by: Melissa Vagi Signed-off-by: Melissa Vagi * Update _data-prepper/pipelines/expression-syntax.md Signed-off-by: Melissa Vagi * Update _data-prepper/pipelines/expression-syntax.md Signed-off-by: Melissa Vagi * Update _data-prepper/pipelines/pipelines.md Signed-off-by: Melissa Vagi * Update expression-syntax.md Signed-off-by: Melissa Vagi * Create Functions subpages Signed-off-by: Melissa Vagi * Create functions subpages Signed-off-by: Melissa Vagi * Copy edit Signed-off-by: Melissa Vagi * add remaining subpages Signed-off-by: Melissa Vagi * Update _data-prepper/index.md Co-authored-by: Nathan Bower Signed-off-by: Heather Halter * Apply suggestions from code review Accepted editorial suggestions. Co-authored-by: Nathan Bower Signed-off-by: Heather Halter * Apply suggestions from code review Accepted more editorial suggestions that were hidden. 
Co-authored-by: Nathan Bower Signed-off-by: Heather Halter * Apply suggestions from code review Co-authored-by: Heather Halter Signed-off-by: David Venable * removed-line Signed-off-by: Heather Halter * Fixed broken link to pipelines Signed-off-by: Heather Halter * Fixed broken links on Update add-entries.md Signed-off-by: Heather Halter * Fixed broken link in Update dynamo-db.md Signed-off-by: Heather Halter * Fixed link syntax in Update index.md Signed-off-by: Heather Halter --------- Signed-off-by: Melissa Vagi Signed-off-by: Heather Halter Signed-off-by: David Venable Signed-off-by: Heather Halter Co-authored-by: Heather Halter Co-authored-by: Nathan Bower Co-authored-by: David Venable --- _data-prepper/index.md | 55 ++- .../configuring-data-prepper.md | 3 +- _data-prepper/pipelines/cidrcontains.md | 24 ++ .../configuration/buffers/buffers.md | 8 +- .../configuration/processors/add-entries.md | 6 +- .../configuration/processors/processors.md | 10 +- .../pipelines/configuration/sinks/sinks.md | 16 +- .../configuration/sources/dynamo-db.md | 2 +- .../configuration/sources/sources.md | 6 +- _data-prepper/pipelines/contains.md | 36 ++ _data-prepper/pipelines/dlq.md | 2 +- _data-prepper/pipelines/expression-syntax.md | 263 +++++---------- _data-prepper/pipelines/functions.md | 18 + _data-prepper/pipelines/get-metadata.md | 42 +++ _data-prepper/pipelines/has-tags.md | 45 +++ _data-prepper/pipelines/join.md | 16 + _data-prepper/pipelines/length.md | 24 ++ .../pipelines-configuration-options.md | 18 - _data-prepper/pipelines/pipelines.md | 313 ++---------------- 19 files changed, 364 insertions(+), 543 deletions(-) create mode 100644 _data-prepper/pipelines/cidrcontains.md create mode 100644 _data-prepper/pipelines/contains.md create mode 100644 _data-prepper/pipelines/functions.md create mode 100644 _data-prepper/pipelines/get-metadata.md create mode 100644 _data-prepper/pipelines/has-tags.md create mode 100644 _data-prepper/pipelines/join.md create mode 100644 _data-prepper/pipelines/length.md delete mode 100644 _data-prepper/pipelines/pipelines-configuration-options.md diff --git a/_data-prepper/index.md b/_data-prepper/index.md index 423fe9fe95..e418aa1966 100644 --- a/_data-prepper/index.md +++ b/_data-prepper/index.md @@ -18,42 +18,24 @@ Data Prepper is a server-side data collector capable of filtering, enriching, tr With Data Prepper you can build custom pipelines to improve the operational view of applications. Two common use cases for Data Prepper are trace analytics and log analytics. [Trace analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/trace-analytics/) can help you visualize event flows and identify performance problems. [Log analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/log-analytics/) equips you with tools to enhance your search capabilities, conduct comprehensive analysis, and gain insights into your applications' performance and behavior. -## Concepts +## Key concepts and fundamentals -Data Prepper includes one or more **pipelines** that collect and filter data based on the components set within the pipeline. Each component is pluggable, enabling you to use your own custom implementation of each component. These components include the following: +Data Prepper ingests data through customizable [pipelines]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines/). These pipelines consist of pluggable components that you can customize to fit your needs, even allowing you to plug in your own implementations. 
A Data Prepper pipeline consists of the following components:
 
-- One [source](#source)
-- One or more [sinks](#sink)
-- (Optional) One [buffer](#buffer)
-- (Optional) One or more [processors](#processor)
+- One [source]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/sources/)
+- One or more [sinks]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sinks/sinks/)
+- (Optional) One [buffer]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/buffers/buffers/)
+- (Optional) One or more [processors]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/processors/)
 
-A single instance of Data Prepper can have one or more pipelines.
+Each pipeline contains two required components: `source` and `sink`. If a `buffer`, a `processor`, or both are missing from the pipeline, then Data Prepper uses the default `bounded_blocking` buffer and a no-op processor. Note that a single instance of Data Prepper can have one or more pipelines.
 
-Each pipeline definition contains two required components: **source** and **sink**. If buffers and processors are missing from the Data Prepper pipeline, Data Prepper uses the default buffer and a no-op processor.
+## Basic pipeline configurations
 
-### Source
+To understand how the pipeline components function within a Data Prepper configuration, see the following examples. Each pipeline configuration uses a `yaml` file format. For more information and examples, see [Pipelines]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines/).
 
-Source is the input component that defines the mechanism through which a Data Prepper pipeline will consume events. A pipeline can have only one source. The source can consume events either by receiving the events over HTTP or HTTPS or by reading from external endpoints like OTeL Collector for traces and metrics and Amazon Simple Storage Service (Amazon S3). Sources have their own configuration options based on the format of the events (such as string, JSON, Amazon CloudWatch logs, or open telemetry trace). The source component consumes events and writes them to the buffer component.
+### Minimal configuration
 
-### Buffer
+The following minimal pipeline configuration reads from the file source and writes the data to another file on the same path. It uses the default options for the `buffer` and `processor` components.
 
-The buffer component acts as the layer between the source and the sink. Buffer can be either in-memory or disk based. The default buffer uses an in-memory queue called `bounded_blocking` that is bounded by the number of events. If the buffer component is not explicitly mentioned in the pipeline configuration, Data Prepper uses the default `bounded_blocking`.
-
-### Sink
-
-Sink is the output component that defines the destination(s) to which a Data Prepper pipeline publishes events. A sink destination could be a service, such as OpenSearch or Amazon S3, or another Data Prepper pipeline. When using another Data Prepper pipeline as the sink, you can chain multiple pipelines together based on the needs of the data. Sink contains its own configuration options based on the destination type.
-
-### Processor
-
-Processors are units within the Data Prepper pipeline that can filter, transform, and enrich events using your desired format before publishing the record to the sink component. The processor is not defined in the pipeline configuration; the events publish in the format defined in the source component. You can have more than one processor within a pipeline. When using multiple processors, the processors are run in the order they are defined inside the pipeline specification.
- -## Sample pipeline configurations - -To understand how all pipeline components function within a Data Prepper configuration, see the following examples. Each pipeline configuration uses a `yaml` file format. - -### Minimal component - -This pipeline configuration reads from the file source and writes to another file in the same path. It uses the default options for the buffer and processor. +The following minimal pipeline configuration reads from the file source and writes the data to another file on the same path. It uses the default options for the `buffer` and `processor` components. ```yml sample-pipeline: @@ -65,13 +47,13 @@ sample-pipeline: path: ``` -### All components +### Comprehensive configuration -The following pipeline uses a source that reads string events from the `input-file`. The source then pushes the data to the buffer, bounded by a max size of `1024`. The pipeline is configured to have `4` workers, each of them reading a maximum of `256` events from the buffer for every `100 milliseconds`. Each worker runs the `string_converter` processor and writes the output of the processor to the `output-file`. +The following comprehensive pipeline configuration uses both required and optional components: ```yml sample-pipeline: - workers: 4 #Number of workers + workers: 4 # Number of workers delay: 100 # in milliseconds, how often the workers should run source: file: @@ -88,9 +70,10 @@ sample-pipeline: path: ``` -## Next steps - -To get started building your own custom pipelines with Data Prepper, see [Getting started]({{site.url}}{{site.baseurl}}/clients/data-prepper/get-started/). +In the given pipeline configuration, the `source` component reads string events from the `input-file` and pushes the data to a bounded buffer with a maximum size of `1024`. The `workers` component specifies `4` concurrent threads that will process events from the buffer, each reading a maximum of `256` events from the buffer every `100` milliseconds. Each `workers` component runs the `string_converter` processor, which converts the strings to uppercase and writes the processed output to the `output-file`. - +## Next steps +- [Get started with Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/). +- [Get familiar with Data Prepper pipelines]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines/). +- [Explore common use cases]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/common-use-cases/). diff --git a/_data-prepper/managing-data-prepper/configuring-data-prepper.md b/_data-prepper/managing-data-prepper/configuring-data-prepper.md index d890b741cc..e42a9e9449 100644 --- a/_data-prepper/managing-data-prepper/configuring-data-prepper.md +++ b/_data-prepper/managing-data-prepper/configuring-data-prepper.md @@ -103,8 +103,7 @@ check_interval | No | Duration | Specifies the time between checks of the heap s ### Extension plugins -Since Data Prepper 2.5, Data Prepper provides support for user configurable extension plugins. Extension plugins are shared common -configurations shared across pipeline plugins, such as [sources, buffers, processors, and sinks]({{site.url}}{{site.baseurl}}/data-prepper/index/#concepts). +Data Prepper provides support for user-configurable extension plugins. Extension plugins are common configurations shared across pipeline plugins, such as [sources, buffers, processors, and sinks]({{site.url}}{{site.baseurl}}/data-prepper/index/#key-concepts-and-fundamentals). 
### AWS extension plugins diff --git a/_data-prepper/pipelines/cidrcontains.md b/_data-prepper/pipelines/cidrcontains.md new file mode 100644 index 0000000000..898f1bc1f5 --- /dev/null +++ b/_data-prepper/pipelines/cidrcontains.md @@ -0,0 +1,24 @@ +--- +layout: default +title: cidrContains() +parent: Functions +grand_parent: Pipelines +nav_order: 5 +--- + +# cidrContains() + +The `cidrContains()` function is used to check if an IP address is contained within a specified Classless Inter-Domain Routing (CIDR) block or range of CIDR blocks. It accepts two or more arguments: + +- The first argument is a JSON pointer, which represents the key or path to the field containing the IP address to be checked. It supports both IPv4 and IPv6 address formats. + +- The subsequent arguments are strings representing one or more CIDR blocks or IP address ranges. The function checks if the IP address specified in the first argument matches or is contained within any of these CIDR blocks. + +For example, if your data contains an IP address field named `client.ip` and you want to check if it belongs to the CIDR blocks `192.168.0.0/16` or `10.0.0.0/8`, you can use the `cidrContains()` function as follows: + +``` +cidrContains('/client.ip', '192.168.0.0/16', '10.0.0.0/8') +``` +{% include copy-curl.html %} + +This function returns `true` if the IP address matches any of the specified CIDR blocks or `false` if it does not. \ No newline at end of file diff --git a/_data-prepper/pipelines/configuration/buffers/buffers.md b/_data-prepper/pipelines/configuration/buffers/buffers.md index eeb68260ea..287825b549 100644 --- a/_data-prepper/pipelines/configuration/buffers/buffers.md +++ b/_data-prepper/pipelines/configuration/buffers/buffers.md @@ -3,9 +3,13 @@ layout: default title: Buffers parent: Pipelines has_children: true -nav_order: 20 +nav_order: 30 --- # Buffers -Buffers store data as it passes through the pipeline. If you implement a custom buffer, it can be memory based, which provides better performance, or disk based, which is larger in size. \ No newline at end of file +The `buffer` component acts as an intermediary layer between the `source` and `sink` components in a Data Prepper pipeline. It serves as temporary storage for events, decoupling the `source` from the downstream processors and sinks. Buffers can be either in-memory or disk based. + +If not explicitly specified in the pipeline configuration, Data Prepper uses the default `bounded_blocking` buffer, which is an in-memory queue bounded by the number of events it can store. The `bounded_blocking` buffer is a convenient option when the event volume and processing rates are manageable within the available memory constraints. + + diff --git a/_data-prepper/pipelines/configuration/processors/add-entries.md b/_data-prepper/pipelines/configuration/processors/add-entries.md index 26b95c7b64..c32e8adb3d 100644 --- a/_data-prepper/pipelines/configuration/processors/add-entries.md +++ b/_data-prepper/pipelines/configuration/processors/add-entries.md @@ -21,8 +21,8 @@ You can configure the `add_entries` processor with the following options. | `metadata_key` | No | The key for the new metadata attribute. The argument must be a literal string key and not a JSON Pointer. Either one string key or `metadata_key` is required. | | `value` | No | The value of the new entry to be added, which can be used with any of the following data types: strings, Booleans, numbers, null, nested objects, and arrays. 
| | `format` | No | A format string to use as the value of the new entry, for example, `${key1}-${key2}`, where `key1` and `key2` are existing keys in the event. Required if neither `value` nor `value_expression` is specified. | -| `value_expression` | No | An expression string to use as the value of the new entry. For example, `/key` is an existing key in the event with a type of either a number, a string, or a Boolean. Expressions can also contain functions returning number/string/integer. For example, `length(/key)` will return the length of the key in the event when the key is a string. For more information about keys, see [Expression syntax](https://opensearch.org/docs/latest/data-prepper/pipelines/expression-syntax/). | -| `add_when` | No | A [conditional expression](https://opensearch.org/docs/latest/data-prepper/pipelines/expression-syntax/), such as `/some-key == "test"'`, that will be evaluated to determine whether the processor will be run on the event. | +| `value_expression` | No | An expression string to use as the value of the new entry. For example, `/key` is an existing key in the event with a type of either a number, a string, or a Boolean. Expressions can also contain functions returning number/string/integer. For example, `length(/key)` will return the length of the key in the event when the key is a string. For more information about keys, see [Expression syntax]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/). | +| `add_when` | No | A [conditional expression]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/), such as `/some-key == "test"'`, that will be evaluated to determine whether the processor will be run on the event. | | `overwrite_if_key_exists` | No | When set to `true`, the existing value is overwritten if `key` already exists in the event. The default value is `false`. | | `append_if_key_exists` | No | When set to `true`, the existing value will be appended if a `key` already exists in the event. An array will be created if the existing value is not an array. Default is `false`. | @@ -135,7 +135,7 @@ When the input event contains the following data: {"message": "hello"} ``` -The processed event will have the same data, with the metadata, `{"length": 5}`, attached. You can subsequently use expressions like `getMetadata("length")` in the pipeline. For more information, see the [`getMetadata` function](https://opensearch.org/docs/latest/data-prepper/pipelines/expression-syntax/#getmetadata) documentation. +The processed event will have the same data, with the metadata, `{"length": 5}`, attached. You can subsequently use expressions like `getMetadata("length")` in the pipeline. For more information, see [`getMetadata` function]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/get-metadata/). ### Example: Add a dynamic key diff --git a/_data-prepper/pipelines/configuration/processors/processors.md b/_data-prepper/pipelines/configuration/processors/processors.md index 3000d71670..1fa7120551 100644 --- a/_data-prepper/pipelines/configuration/processors/processors.md +++ b/_data-prepper/pipelines/configuration/processors/processors.md @@ -3,12 +3,14 @@ layout: default title: Processors has_children: true parent: Pipelines -nav_order: 25 +nav_order: 35 --- # Processors -Processors perform an action on your data, such as filtering, transforming, or enriching. 
+Processors are components within a Data Prepper pipeline that enable you to filter, transform, and enrich events using your desired format before publishing records to the `sink` component. If no `processor` is defined in the pipeline configuration, then the events are published in the format specified by the `source` component. You can incorporate multiple processors within a single pipeline, and they are executed sequentially as defined in the pipeline. + +Prior to Data Prepper 1.3, these components were named *preppers*. In Data Prepper 1.3, the term *prepper* was deprecated in favor of *processor*. In Data Prepper 2.0, the term *prepper* was removed. +{: .note } + -Prior to Data Prepper 1.3, processors were named preppers. Starting in Data Prepper 1.3, the term *prepper* is deprecated in favor of the term *processor*. Data Prepper will continue to support the term *prepper* until 2.0, where it will be removed. -{: .note } \ No newline at end of file diff --git a/_data-prepper/pipelines/configuration/sinks/sinks.md b/_data-prepper/pipelines/configuration/sinks/sinks.md index 0f3af6ab25..51bf3b1c9c 100644 --- a/_data-prepper/pipelines/configuration/sinks/sinks.md +++ b/_data-prepper/pipelines/configuration/sinks/sinks.md @@ -3,20 +3,22 @@ layout: default title: Sinks parent: Pipelines has_children: true -nav_order: 30 +nav_order: 25 --- # Sinks -Sinks define where Data Prepper writes your data to. +A `sink` is an output component that specifies the destination(s) to which a Data Prepper pipeline publishes events. Sink destinations can be services like OpenSearch, Amazon Simple Storage Service (Amazon S3), or even another Data Prepper pipeline, enabling chaining of multiple pipelines. The sink component has the following configurable options that you can use to customize the destination type. -## General options for all sink types +## Configuration options The following table describes options you can use to configure the `sinks` sink. Option | Required | Type | Description :--- | :--- |:------------| :--- -routes | No | String list | A list of routes for which this sink applies. If not provided, this sink receives all events. See [conditional routing]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines#conditional-routing) for more information. -tags_target_key | No | String | When specified, includes event tags in the output of the provided key. -include_keys | No | String list | When specified, provides the keys in this list in the data sent to the sink. Some codecs and sinks do not allow use of this field. -exclude_keys | No | String list | When specified, excludes the keys given from the data sent to the sink. Some codecs and sinks do not allow use of this field. +`routes` | No | String list | A list of routes to which the sink applies. If not provided, then the sink receives all events. See [conditional routing]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines#conditional-routing) for more information. +`tags_target_key` | No | String | When specified, includes event tags in the output under the provided key. +`include_keys` | No | String list | When specified, provides only the listed keys in the data sent to the sink. Some codecs and sinks may not support this field. +`exclude_keys` | No | String list | When specified, excludes the listed keys from the data sent to the sink. Some codecs and sinks may not support this field. 
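For illustration, the following minimal sketch shows how several of these options can be combined on a single `opensearch` sink. The host, index name, route name, and excluded key are assumptions made for this example rather than values taken from the table above:

```yml
sink:
  - opensearch:
      hosts: [ "https://localhost:9200" ]
      index: application_logs
      # Accept only events that matched this named route (assumed to be
      # defined under the pipeline's `route` component).
      routes:
        - application-logs
      # Write any event tags into the output document under this key.
      tags_target_key: "tags"
      # Omit a hypothetical debug field from the data sent to the sink.
      exclude_keys: [ "debug_payload" ]
```

As noted in the table, some codecs and sinks may not support `include_keys` and `exclude_keys`.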
+
+
diff --git a/_data-prepper/pipelines/configuration/sources/dynamo-db.md b/_data-prepper/pipelines/configuration/sources/dynamo-db.md
index f75489f103..e465f45044 100644
--- a/_data-prepper/pipelines/configuration/sources/dynamo-db.md
+++ b/_data-prepper/pipelines/configuration/sources/dynamo-db.md
@@ -92,7 +92,7 @@ Option | Required | Type | Description
## Exposed metadata attributes
-The following metadata will be added to each event that is processed by the `dynamodb` source. These metadata attributes can be accessed using the [expression syntax `getMetadata` function](https://opensearch.org/docs/latest/data-prepper/pipelines/expression-syntax/#getmetadata).
+The following metadata will be added to each event that is processed by the `dynamodb` source. These metadata attributes can be accessed using the [expression syntax `getMetadata` function]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/get-metadata/).
* `primary_key`: The primary key of the DynamoDB item. For tables that only contain a partition key, this value provides the partition key. For tables that contain both a partition and sort key, the `primary_key` attribute will be equal to the partition and sort key, separated by a `|`, for example, `partition_key|sort_key`.
* `partition_key`: The partition key of the DynamoDB item.
diff --git a/_data-prepper/pipelines/configuration/sources/sources.md b/_data-prepper/pipelines/configuration/sources/sources.md
index b684db56e9..811b161e16 100644
--- a/_data-prepper/pipelines/configuration/sources/sources.md
+++ b/_data-prepper/pipelines/configuration/sources/sources.md
@@ -3,9 +3,11 @@ layout: default
title: Sources
parent: Pipelines
has_children: true
-nav_order: 15
+nav_order: 20
---
# Sources
-Sources define where your data comes from within a Data Prepper pipeline.
+A `source` is an input component that specifies how a Data Prepper pipeline ingests events. Each pipeline has a single source that either receives events over HTTP(S) or reads from external endpoints, such as OpenTelemetry Collector or Amazon Simple Storage Service (Amazon S3). Sources have configurable options based on the event format (string, JSON, Amazon CloudWatch logs, OpenTelemetry traces). The source consumes events and passes them to the [`buffer`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/buffers/buffers/) component.
+
+
diff --git a/_data-prepper/pipelines/contains.md b/_data-prepper/pipelines/contains.md
new file mode 100644
index 0000000000..657f66bd28
--- /dev/null
+++ b/_data-prepper/pipelines/contains.md
@@ -0,0 +1,36 @@
+---
+layout: default
+title: contains()
+parent: Functions
+grand_parent: Pipelines
+nav_order: 10
+---
+
+# contains()
+
+The `contains()` function is used to check if a substring exists within a given string or the value of a field in an event. It takes two arguments:
+
+- The first argument is either a literal string or a JSON pointer that represents the field or value to be searched.
+
+- The second argument is the substring to be searched for within the first argument.
+
+The function returns `true` if the substring specified in the second argument is found within the string or field value represented by the first argument. It returns `false` if it is not. 
+ +For example, if you want to check if the string `"abcd"` is contained within the value of a field named `message`, you can use the `contains()` function as follows: + +``` +contains('/message', 'abcd') +``` +{% include copy-curl.html %} + +This will return `true` if the field `message` contains the substring `abcd` or `false` if it does not. + +Alternatively, you can also use a literal string as the first argument: + +``` +contains('This is a test message', 'test') +``` +{% include copy-curl.html %} + +In this case, the function will return `true` because the substring `test` is present within the string `This is a test message`. + +Note that the `contains()` function performs a case-sensitive search by default. If you need to perform a case-insensitive search, you can use the `containsIgnoreCase()` function instead. diff --git a/_data-prepper/pipelines/dlq.md b/_data-prepper/pipelines/dlq.md index 3032536e93..ac1d868ea4 100644 --- a/_data-prepper/pipelines/dlq.md +++ b/_data-prepper/pipelines/dlq.md @@ -2,7 +2,7 @@ layout: default title: Dead-letter queues parent: Pipelines -nav_order: 13 +nav_order: 15 --- # Dead-letter queues diff --git a/_data-prepper/pipelines/expression-syntax.md b/_data-prepper/pipelines/expression-syntax.md index b4603e34f9..383b54c19b 100644 --- a/_data-prepper/pipelines/expression-syntax.md +++ b/_data-prepper/pipelines/expression-syntax.md @@ -2,70 +2,41 @@ layout: default title: Expression syntax parent: Pipelines -nav_order: 12 +nav_order: 5 --- -# Expression syntax +# Expression syntax -The following sections provide information about expression syntax in Data Prepper. +Expressions provide flexibility in manipulating, filtering, and routing data. The following sections provide information about expression syntax in Data Prepper. -## Supported operators +## Key terms -Operators are listed in order of precedence (top to bottom, left to right). +The following key terms are used in the context of expressions. -| Operator | Description | Associativity | -|----------------------|-------------------------------------------------------|---------------| -| `()` | Priority Expression | left-to-right | -| `not`
`+`
`-`| Unary Logical NOT
Unary Positive
Unary negative | right-to-left |
-| `<`, `<=`, `>`, `>=` | Relational Operators | left-to-right |
-| `==`, `!=` | Equality Operators | left-to-right |
-| `and`, `or` | Conditional Expression | left-to-right |
-
-## Reserved for possible future functionality
-
-Reserved symbol set: `^`, `*`, `/`, `%`, `+`, `-`, `xor`, `=`, `+=`, `-=`, `*=`, `/=`, `%=`, `++`, `--`, `${}`
-
-## Set initializer
+Term | Definition
+-----|-----------
+**Expression** | A generic component that contains a primary or an operator. Expressions can be nested within other expressions. An expression's immediate children can contain 0–1 operators.
+**Expression string** | Takes the highest priority in a Data Prepper expression and supports only one expression string, resulting in a return value. An expression string is not the same as an expression.
+**Literal** | A fundamental value that has no children. A literal can be one of the following: float, integer, Boolean, JSON pointer, string, or null. See [Literals](#literals).
+**Operator** | A hardcoded token that identifies the operation used in an expression.
+**Primary** | Can be one of the following: set initializer, priority expression, or literal.
+**Statement** | The highest-priority component within an expression string.
-The set initializer defines a set or term and/or expressions.
-
-### Examples
-
-The following are examples of set initializer syntax.
-
-#### HTTP status codes
-
-```
-{200, 201, 202}
-```
-
-#### HTTP response payloads
-
-```
-{"Created", "Accepted"}
-```
-
-#### Handle multiple event types with different keys
-
-```
-{/request_payload, /request_message}
-```
+## Operators
-## Priority expression
+The following table lists the supported operators. Operators are listed in order of precedence (top to bottom, left to right).
-A priority expression identifies an expression that will be evaluated at the highest priority level. A priority expression must contain an expression or value; empty parentheses are not supported.
-
-### Example
-
-```
-/is_cool == (/name == "Steven")
-```
-
-## Relational operators
| Operator | Description | Associativity |
|----------------------|-------------------------------------------------------|---------------|
| `()` | Priority expression | Left to right |
| `not`
`+`
`-`| Unary logical NOT
Unary positive
Unary negative | Right to left | +| `<`, `<=`, `>`, `>=` | Relational operators | Left to right | +| `==`, `!=` | Equality operators | Left to right | +| `and`, `or` | Conditional expression | Left to right | -Relational operators are used to test the relationship of two numeric values. The operands must be numbers or JSON Pointers that resolve to numbers. +### Relational operators -### Syntax +Relational operators compare numeric values or JSON pointers that resolve to numeric values. The operators are used to test the relationship between two operands, determining if one is greater than, less than, or equal to the other. The syntax for using relational operators is as follows: ``` < @@ -73,75 +44,44 @@ Relational operators are used to test the relationship of two numeric values. Th > >= ``` +{% include copy-curl.html %} -### Example +For example, to check if the value of the `status_code` field in an event is within the range of successful HTTP responses (200--299), you can use the following expression: ``` /status_code >= 200 and /status_code < 300 ``` +{% include copy-curl.html %} -## Equality operators +### Equality operators -Equality operators are used to test whether two values are equivalent. +Equality operators are used to test whether two values are equivalent. These operators compare values of any type, including JSON pointers, literals, and expressions. The syntax for using equality operators is as follows: -### Syntax ``` == != ``` +{% include copy-curl.html %} -### Examples -``` -/is_cool == true -3.14 != /status_code -{1, 2} == /event/set_property -``` -## Using equality operators to check for a JSON Pointer - -Equality operators can also be used to check whether a JSON Pointer exists by comparing the value with `null`. +The following are some example equality operators: -### Syntax -``` - == null - != null -null == -null != -``` +- `/is_cool == true`: Checks if the value referenced by the JSON pointer is equal to the Boolean value. +- `3.14 != /status_code`: Checks if the numeric value is not equal to the value referenced by the JSON pointer. +- `{1, 2} == /event/set_property`: Checks if the array is equal to the value referenced by the JSON pointer. -### Example -``` -/response == null -null != /response -``` +### Conditional expressions -## Type check operator +Conditional expressions allow you to combine multiple expressions or values using logical operators to create more complex evaluation criteria. The available conditional operators are `and`, `or`, and `not`. The syntax for using these conditional operators is as follows: -The type check operator tests whether a JSON Pointer is of a certain data type. - -### Syntax -``` - typeof -``` -Supported data types are `integer`, `long`, `boolean`, `double`, `string`, `map`, and `array`. - -#### Example -``` -/response typeof integer -/message typeof string -``` - -### Conditional expression - -A conditional expression is used to chain together multiple expressions and/or values. - -#### Syntax ``` and or not ``` +{% include copy-curl.html %} + +The following are some example conditional expressions: -### Example ``` /status_code == 200 and /message == "Hello world" /status_code == 200 or /status_code == 202 @@ -149,80 +89,80 @@ not /status_code in {200, 202} /response == null /response != null ``` +{% include copy-curl.html %} -## Definitions +### Reserved symbols -This section provides expression definitions. 
+Reserved symbols are symbols that are not currently used in the expression syntax but are reserved for possible future functionality or extensions. Reserved symbols include `^`, `*`, `/`, `%`, `+`, `-`, `xor`, `=`, `+=`, `-=`, `*=`, `/=`, `%=`, `++`, `--`, and `${}`.
-### Literal
-A literal is a fundamental value that has no children:
-- Float: Supports values from 3.40282347 × 1038 to 1.40239846 × 10−45.
-- Integer: Supports values from −2,147,483,648 to 2,147,483,647.
-- Boolean: Supports true or false.
-- JSON Pointer: See the [JSON Pointer](#json-pointer) section for details.
-- String: Supports valid Java strings.
-- Null: Supports null check to see whether a JSON Pointer exists.
+## Syntax components
-### Expression string
-An expression string takes the highest priority in a Data Prepper expression and only supports one expression string resulting in a return value. An _expression string_ is not the same as an _expression_.
+Syntax components are the building blocks of expressions in Data Prepper. They allow you to define sets, specify evaluation order, reference values within events, use literal values, and follow specific white space rules. Understanding these components is crucial for creating and working with expressions effectively in Data Prepper pipelines.
-### Statement
-A statement is the highest-priority component of an expression string.
+### Priority expressions
-### Expression
-An expression is a generic component that contains a _Primary_ or an _Operator_. Expressions may contain expressions. An expression's imminent children can contain 0–1 _Operators_.
+Priority expressions specify the evaluation order of expressions. They are enclosed in parentheses `()`. Priority expressions must contain an expression or value (empty parentheses are not supported). The following is an example priority expression:
-### Primary
+
-- _Set_
-- _Priority Expression_
-- _Literal_
```
/is_cool == (/name == "Steven")
```
-### Operator
{% include copy-curl.html %}
-An operator is a hardcoded token that identifies the operation used in an _expression_.
-### JSON Pointer
+### JSON pointers
-A JSON Pointer is a literal used to reference a value within an event and provided as context for an _expression string_. JSON Pointers are identified by a leading `/` containing alphanumeric characters or underscores, delimited by `/`. JSON Pointers can use an extended character set if wrapped in double quotes (`"`) using the escape character `\`. Note that JSON Pointers require `~` and `/` characters, which should be used as part of the path and not as a delimiter that needs to be escaped.
+JSON pointers are used to reference values within an event. They start with a leading forward slash `/` followed by alphanumeric characters or underscores that are separated by additional forward slashes `/`.
-The following are examples of JSON Pointers:
+JSON pointers can use an extended character set by wrapping the entire pointer in double quotation marks `""` and escaping characters using a backslash `\`. Note that the `~` and `/` characters are considered to be part of the pointer path and do not need to be escaped. For example, the escape sequence `~0` represents the literal character `~`, and `~1` represents the literal character `/`. 
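As a concrete sketch of those escape sequences (the key name here is hypothetical), a field whose name contains a literal `/` would be referenced by substituting `~1` for that character:

```
# Path
# { "a/b": true }
"/a~1b"
```

A literal `~` in a key name would similarly be written as `~0`.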
-The following are examples of JSON Pointers: +#### Shorthand syntax -- `~0` representing `~` -- `~1` representing `/` +The shorthand syntax for a JSON pointer can be expressed using the following regular expression pattern, where `\w` represents any word character (A--Z, a-z, 0--9, or underscore): -#### Shorthand syntax (Regex, `\w` = `[A-Za-z_]`) ``` -/\w+(/\w+)* +/\w+(/\w+)*` ``` +{% include copy-curl.html %} + -#### Example of shorthand - -The following is an example of shorthand: +The following is an example of this shorthand syntax: ``` /Hello/World/0 ``` +{% include copy-curl.html %} -#### Example of escaped syntax +#### Escaped syntax + +The escaped syntax for a JSON pointer can be expressed as follows: -The following is an example of escaped syntax: ``` "/(/)*" ``` +{% include copy-curl.html %} -#### Example of an escaped JSON Pointer +The following is an example of an escaped JSON pointer: -The following is an example of an escaped JSON Pointer: ``` # Path # { "Hello - 'world/" : [{ "\"JsonPointer\"": true }] } "/Hello - 'world\//0/\"JsonPointer\"" ``` +{% include copy-curl.html %} + +### Literals -## White space +Literals are fundamental values that have no children. Data Prepper supports the following literal types: -White space is **optional** surrounding relational operators, regex equality operators, equality operators, and commas. -White space is **required** surrounding set initializers, priority expressions, set operators, and conditional expressions. +- **Float:** Supports values from 3.40282347 x 10^38 to 1.40239846 x 10^-45. +- **Integer:** Supports values from -2,147,483,648 to 2,147,483,647. +- **Boolean:** Supports `true` or `false`. +- **JSON pointer:** See [JSON pointers](#json-pointers) for more information. +- **String:** Supports valid Java strings. +- **Null:** Supports `null` to check if a JSON pointer exists. +### White space rules + +White space is optional around relational operators, regex equality operators, equality operators, and commas. White space is required around set initializers, priority expressions, set operators, and conditional expressions. | Operator | Description | White space required | ✅ Valid examples | ❌ Invalid examples | |----------------------|--------------------------|----------------------|----------------------------------------------------------------|---------------------------------------| @@ -230,53 +170,12 @@ White space is **required** surrounding set initializers, priority expressions, | `()` | Priority expression | Yes | `/a==(/b==200)`
`/a in ({200})` | `/status in({200})` | | `in`, `not in` | Set operators | Yes | `/a in {200}`
`/a not in {400}` | `/a in{200, 202}`
`/a not in{400}` | | `<`, `<=`, `>`, `>=` | Relational operators | No | `/status < 300`
`/status>=300` | | -| `=~`, `!~` | Regex equality pperators | No | `/msg =~ "^\w*$"`
`/msg=~"^\w*$"` | | +| `=~`, `!~` | Regex equality operators | No | `/msg =~ "^\w*$"`
`/msg=~"^\w*$"` | | | `==`, `!=` | Equality operators | No | `/status == 200`
`/status_code==200` | | | `and`, `or`, `not` | Conditional operators | Yes | `/a<300 and /b>200` | `/b<300and/b>200` | | `,` | Set value delimiter | No | `/a in {200, 202}`
`/a in {200,202}`
`/a in {200 , 202}` | `/a in {200,}` | | `typeof` | Type check operator | Yes | `/a typeof integer`
`/a typeof long`
`/a typeof string`
`/a typeof double`
`/a typeof boolean`
`/a typeof map`
`/a typeof array` |`/a typeof /b`
`/a typeof 2` | +## Related articles -## Functions - -Data Prepper supports the following built-in functions that can be used in an expression. - -### `length()` - -The `length()` function takes one argument of the JSON pointer type and returns the length of the value passed. For example, `length(/message)` returns a length of `10` when a key message exists in the event and has a value of `1234567890`. - -### `hasTags()` - -The `hasTags()` function takes one or more string type arguments and returns `true` if all of the arguments passed are present in an event's tags. When an argument does not exist in the event's tags, the function returns `false`. For example, if you use the expression `hasTags("tag1")` and the event contains `tag1`, Data Prepper returns `true`. If you use the expression `hasTags("tag2")` but the event only contains `tag1`, Data Prepper returns `false`. - -### `getMetadata()` - -The `getMetadata()` function takes one literal string argument to look up specific keys in a an event's metadata. If the key contains a `/`, then the function looks up the metadata recursively. When passed, the expression returns the value corresponding to the key. The value returned can be of any type. For example, if the metadata contains `{"key1": "value2", "key2": 10}`, then the function, `getMetadata("key1")`, returns `value2`. The function, `getMetadata("key2")`, returns 10. - -### `contains()` - -The `contains()` function takes two string arguments and determines whether either a literal string or a JSON pointer is contained within an event. When the second argument contains a substring of the first argument, such as `contains("abcde", "abcd")`, the function returns `true`. If the second argument does not contain any substrings, such as `contains("abcde", "xyz")`, it returns `false`. - -### `cidrContains()` - -The `cidrContains()` function takes two or more arguments. The first argument is a JSON pointer, which represents the key to the IP address that is checked. It supports both IPv4 and IPv6 addresses. Every argument that comes after the key is a string type that represents CIDR blocks that are checked against. - -If the IP address in the first argument is in the range of any of the given CIDR blocks, the function returns `true`. If the IP address is not in the range of the CIDR blocks, the function returns `false`. For example, `cidrContains(/sourceIp,"192.0.2.0/24","10.0.1.0/16")` will return `true` if the `sourceIp` field indicated in the JSON pointer has a value of `192.0.2.5`. - -### `join()` - -The `join()` function joins elements of a list to form a string. The function takes a JSON pointer, which represents the key to a list or a map where values are of the list type, and joins the lists as strings using commas (`,`), the default delimiter between strings. - -If `{"source": [1, 2, 3]}` is the input data, as shown in the following example: - - -```json -{"source": {"key1": [1, 2, 3], "key2": ["a", "b", "c"]}} -``` - -Then `join(/source)` will return `"1,2,3"` in the following format: - -```json -{"key1": "1,2,3", "key2": "a,b,c"} -``` -You can also specify a delimiter other than the default inside the expression. For example, `join("-", /source)` joins each `source` field using a hyphen (`-`) as the delimiter. 
+- [Functions]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/functions/) diff --git a/_data-prepper/pipelines/functions.md b/_data-prepper/pipelines/functions.md new file mode 100644 index 0000000000..f0661faba4 --- /dev/null +++ b/_data-prepper/pipelines/functions.md @@ -0,0 +1,18 @@ +--- +layout: default +title: Functions +parent: Pipelines +nav_order: 10 +has_children: true +--- + +# Functions + +Data Prepper offers a range of built-in functions that can be used within expressions to perform common data preprocessing tasks, such as calculating lengths, checking for tags, retrieving metadata, searching for substrings, checking IP address ranges, and joining list elements. These functions include the following: + +- [`cidrContains()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/cidrcontains/) +- [`contains()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/contains/) +- [`getMetadata()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/get-metadata/) +- [`hasTags()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/has-tags/) +- [`join()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/join/) +- [`length()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/length/) \ No newline at end of file diff --git a/_data-prepper/pipelines/get-metadata.md b/_data-prepper/pipelines/get-metadata.md new file mode 100644 index 0000000000..fc89ed51d6 --- /dev/null +++ b/_data-prepper/pipelines/get-metadata.md @@ -0,0 +1,42 @@ +--- +layout: default +title: getMetadata() +parent: Functions +grand_parent: Pipelines +nav_order: 15 +--- + +# getMetadata() + +The `getMetadata()` function takes one literal string argument and looks up specific keys in event metadata. + +If the key contains a `/`, then the function looks up the metadata recursively. When passed, the expression returns the value corresponding to the key. + +The value returned can be of any type. For example, if the metadata contains `{"key1": "value2", "key2": 10}`, then the function `getMetadata("key1")` returns `value2`. The function `getMetadata("key2")` returns `10`. + +#### Example + +```json +{ + "event": { + "metadata": { + "key1": "value2", + "key2": 10 + }, + "data": { + // ... + } + }, + "output": [ + { + "key": "key1", + "value": "value2" + }, + { + "key": "key2", + "value": 10 + } + ] +} +``` +{% include copy-curl.html %} diff --git a/_data-prepper/pipelines/has-tags.md b/_data-prepper/pipelines/has-tags.md new file mode 100644 index 0000000000..d6cb498b11 --- /dev/null +++ b/_data-prepper/pipelines/has-tags.md @@ -0,0 +1,45 @@ +--- +layout: default +title: hasTags() +parent: Functions +grand_parent: Pipelines +nav_order: 20 +--- + +# hasTags() + +The `hasTags()` function takes one or more string type arguments and returns `true` if all of the arguments passed are present in an event's tags. If an argument does not exist in the event's tags, then the function returns `false`. + +For example, if you use the expression `hasTags("tag1")` and the event contains `tag1`, then Data Prepper returns `true`. If you use the expression `hasTags("tag2")` but the event only contains `tag1`, then Data Prepper returns `false`. + +#### Example + +```json +{ + "events": [ + { + "tags": ["tag1"], + "data": { + // ... + } + }, + { + "tags": ["tag1", "tag2"], + "data": { + // ... 
+      }
+    }
+  ],
+  "expressions": [
+    {
+      "expression": "hasTags(\"tag1\")",
+      "expected_results": [true, true]
+    },
+    {
+      "expression": "hasTags(\"tag2\")",
+      "expected_results": [false, true]
+    }
+  ]
+}
+```
+{% include copy-curl.html %}
diff --git a/_data-prepper/pipelines/join.md b/_data-prepper/pipelines/join.md
new file mode 100644
index 0000000000..3a4d77d5c2
--- /dev/null
+++ b/_data-prepper/pipelines/join.md
@@ -0,0 +1,16 @@
+---
+layout: default
+title: join()
+parent: Functions
+grand_parent: Pipelines
+nav_order: 25
+---
+
+# join()
+
+
+The `join()` function joins elements of a list to form a string. The function takes a JSON pointer, which represents the key to a list or map where values are of the list type, and joins the lists as strings using commas `,`. Commas are the default delimiter between strings.
+
+For example, given the input `{"source": {"key1": [1, 2, 3], "key2": ["a", "b", "c"]}}`, `join(/source)` joins each list into a string and returns `{"key1": "1,2,3", "key2": "a,b,c"}`.
+
+You can specify an alternative delimiter inside the expression. For example, `join("-", /source)` joins each `source` field using a hyphen `-` as the delimiter.
diff --git a/_data-prepper/pipelines/length.md b/_data-prepper/pipelines/length.md
new file mode 100644
index 0000000000..fca4b10df2
--- /dev/null
+++ b/_data-prepper/pipelines/length.md
@@ -0,0 +1,24 @@
+---
+layout: default
+title: length()
+parent: Functions
+grand_parent: Pipelines
+nav_order: 30
+---
+
+# length()
+
+The `length()` function takes one argument of the JSON pointer type and returns the length of the passed value. For example, `length(/message)` returns a length of `10` when a key `message` exists in the event and has a value of `1234567890`.
+
+#### Example
+
+```json
+{
+  "event": {
+    "/message": "1234567890"
+  },
+  "expression": "length(/message)",
+  "expected_output": 10
+}
+```
+{% include copy-curl.html %}
diff --git a/_data-prepper/pipelines/pipelines-configuration-options.md b/_data-prepper/pipelines/pipelines-configuration-options.md
deleted file mode 100644
index 5667906af1..0000000000
--- a/_data-prepper/pipelines/pipelines-configuration-options.md
+++ /dev/null
@@ -1,18 +0,0 @@
----
-layout: default
-title: Pipeline options
-parent: Pipelines
-nav_order: 11
----
-
-# Pipeline options
-
-This page provides information about pipeline configuration options in Data Prepper.
-
-## General pipeline options
-
-Option | Required | Type | Description
-:--- | :--- | :--- | :---
-workers | No | Integer | Essentially the number of application threads. As a starting point for your use case, try setting this value to the number of CPU cores on the machine. Default is 1.
-delay | No | Integer | Amount of time in milliseconds workers wait between buffer read attempts. Default is `3000`.
-
diff --git a/_data-prepper/pipelines/pipelines.md b/_data-prepper/pipelines/pipelines.md
index e897ed5596..d519f0da80 100644
--- a/_data-prepper/pipelines/pipelines.md
+++ b/_data-prepper/pipelines/pipelines.md
@@ -10,11 +10,15 @@ redirect_from:
# Pipelines
-The following image illustrates how a pipeline works.
+Pipelines are critical components that streamline the process of acquiring, transforming, and loading data from various sources into a centralized data repository or processing system. The following diagram illustrates how Data Prepper ingests data into OpenSearch.
Data Prepper pipeline{: .img-fluid}
-To use Data Prepper, you define pipelines in a configuration YAML file. 
Each pipeline is a combination of a source, a buffer, zero or more processors, and one or more sinks. For example:
+
+## Configuring Data Prepper pipelines
+
+Pipelines are defined in the configuration YAML file. Starting with Data Prepper 2.0, you can define pipelines across multiple YAML configuration files, with each file containing the configuration for one or more pipelines. This gives you flexibility to organize and chain together complex pipeline configurations. To ensure proper loading of your pipeline configurations, place the YAML configuration files in the `pipelines` folder in your application's home directory, for example, `/usr/share/data-prepper`.
+
+The following is an example configuration:
```yml
simple-sample-pipeline:
  workers: 2
  delay: 5000
  source:
    random:
  buffer:
    bounded_blocking:
      buffer_size: 1024
      batch_size: 256
  processor:
    - string_converter:
        upper_case: true
  sink:
    - stdout:
```
+{% include copy-curl.html %}
-- Sources define where your data comes from. In this case, the source is a random UUID generator (`random`).
-
-- Buffers store data as it passes through the pipeline.
-
-  By default, Data Prepper uses its one and only buffer, the `bounded_blocking` buffer, so you can omit this section unless you developed a custom buffer or need to tune the buffer settings.
+### Pipeline components
-- Processors perform some action on your data: filter, transform, enrich, etc.
+The following table describes the components used in the given pipeline.
-  You can have multiple processors, which run sequentially from top to bottom, not in parallel. The `string_converter` processor transform the strings by making them uppercase.
+Option | Required | Type | Description
-- Sinks define where your data goes. In this case, the sink is stdout.
+:--- | :--- |:------------| :---
+`workers` | No | Integer | The number of application threads. Set to the number of CPU cores. Default is `1`.
+`delay` | No | Integer | The number of milliseconds that `workers` wait between buffer read attempts. Default is `3000`.
+`source` | Yes | String list | `random` generates random values by using a Universally Unique Identifier (UUID) generator.
+`bounded_blocking` | No | String list | The default buffer in Data Prepper.
+`processor` | No | String list | A `string_converter` with an `upper_case` processor that converts strings to uppercase.
+`sink` | Yes | String list | `stdout` outputs to standard output.
-Starting from Data Prepper 2.0, you can define pipelines across multiple configuration YAML files, where each file contains the configuration for one or more pipelines. This gives you more freedom to organize and chain complex pipeline configurations. For Data Prepper to load your pipeline configuration properly, place your configuration YAML files in the `pipelines` folder under your application's home directory (e.g. `/usr/share/data-prepper`).
-{: .note }
+## Pipeline concepts
-## End-to-end acknowledgments
+The following are fundamental concepts relating to Data Prepper pipelines.
-Data Prepper ensures the durability and reliability of data written from sources and delivered to sinks through end-to-end (E2E) acknowledgments. An E2E acknowledgment begins at the source, which monitors a batch of events set inside pipelines and waits for a positive acknowledgment when those events are successfully pushed to sinks. When a pipeline contains multiple sinks, including sinks set as additional Data Prepper pipelines, the E2E acknowledgment sends when events are received by the final sink in a pipeline chain. 
+Data Prepper ensures reliable and durable data delivery from sources to sinks through end-to-end (E2E) acknowledgments. The E2E acknowledgment process begins at the source, which monitors event batches within pipelines and waits for a positive acknowledgment upon successful delivery to the sinks. In pipelines with multiple sinks, including nested Data Prepper pipelines, the E2E acknowledgment is sent when events reach the final sink in the pipeline chain. Conversely, the source sends a negative acknowledgment if an event cannot be delivered to a sink for any reason. -Alternatively, the source sends a negative acknowledgment when an event cannot be delivered to a sink for any reason. +If a pipeline component fails to process and send an event, then the source receives no acknowledgment. In the case of a failure, the pipeline's source times out, allowing you to take necessary actions, such as rerunning the pipeline or logging the failure. -When any component of a pipeline fails and is unable to send an event, the source receives no acknowledgment. In the case of a failure, the pipeline's source times out. This gives you the ability to take any necessary actions to address the source failure, including rerunning the pipeline or logging the failure. +### Conditional routing +Pipelines also support conditional routing, which enables the routing of events to different sinks based on specific conditions. To add conditional routing, specify a list of named routes using the `route` component and assign specific routes to sinks using the `routes` property. Any sink with the `routes` property will only accept events matching at least one of the routing conditions. -## Conditional routing - -Pipelines also support **conditional routing** which allows you to route events to different sinks based on specific conditions. To add conditional routing to a pipeline, specify a list of named routes under the `route` component and add specific routes to sinks under the `routes` property. Any sink with the `routes` property will only accept events that match at least one of the routing conditions. - -In the following example, `application-logs` is a named route with a condition set to `/log_type == "application"`. The route uses [Data Prepper expressions](https://github.com/opensearch-project/data-prepper/tree/main/examples) to define the conditions. Data Prepper only routes events that satisfy the condition to the first OpenSearch sink. By default, Data Prepper routes all events to a sink which does not define a route. In the example, all events route into the third OpenSearch sink. +In the following example pipeline, `application-logs` is a named route with a condition set to `/log_type == "application"`. The route uses [Data Prepper expressions](https://github.com/opensearch-project/data-prepper/tree/main/examples) to define the condition. Data Prepper routes events satisfying this condition to the first OpenSearch sink. By default, Data Prepper routes all events to sinks without a defined route, as shown in the third OpenSearch sink of the given pipeline: ```yml conditional-routing-sample-pipeline: @@ -84,269 +88,8 @@ conditional-routing-sample-pipeline: hosts: [ "https://opensearch:9200" ] index: all_logs ``` +{% include copy-curl.html %} +## Next steps -## Examples - -This section provides some pipeline examples that you can use to start creating your own pipelines. 
For more pipeline configurations, select from the following options for each component: - -- [Buffers]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/buffers/buffers/) -- [Processors]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/processors/) -- [Sinks]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sinks/sinks/) -- [Sources]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/sources/) - -The Data Prepper repository has several [sample applications](https://github.com/opensearch-project/data-prepper/tree/main/examples) to help you get started. - -### Log ingestion pipeline - -The following example `pipeline.yaml` file with SSL and basic authentication enabled for the `http-source` demonstrates how to use the HTTP Source and Grok Prepper plugins to process unstructured log data: - - -```yaml -log-pipeline: - source: - http: - ssl_certificate_file: "/full/path/to/certfile.crt" - ssl_key_file: "/full/path/to/keyfile.key" - authentication: - http_basic: - username: "myuser" - password: "mys3cret" - processor: - - grok: - match: - # This will match logs with a "log" key against the COMMONAPACHELOG pattern (ex: { "log": "actual apache log..." } ) - # You should change this to match what your logs look like. See the grok documenation to get started. - log: [ "%{COMMONAPACHELOG}" ] - sink: - - opensearch: - hosts: [ "https://localhost:9200" ] - # Change to your credentials - username: "admin" - password: "admin" - # Add a certificate file if you are accessing an OpenSearch cluster with a self-signed certificate - #cert: /path/to/cert - # If you are connecting to an Amazon OpenSearch Service domain without - # Fine-Grained Access Control, enable these settings. Comment out the - # username and password above. - #aws_sigv4: true - #aws_region: us-east-1 - # Since we are Grok matching for Apache logs, it makes sense to send them to an OpenSearch index named apache_logs. - # You should change this to correspond with how your OpenSearch indexes are set up. - index: apache_logs -``` - -This example uses weak security. We strongly recommend securing all plugins which open external ports in production environments. -{: .note} - -### Trace analytics pipeline - -The following example demonstrates how to build a pipeline that supports the [Trace Analytics OpenSearch Dashboards plugin]({{site.url}}{{site.baseurl}}/observability-plugin/trace/ta-dashboards/). This pipeline takes data from the OpenTelemetry Collector and uses two other pipelines as sinks. These two separate pipelines index trace and the service map documents for the dashboard plugin. - -Starting from Data Prepper 2.0, Data Prepper no longer supports `otel_trace_raw_prepper` processor due to the Data Prepper internal data model evolution. -Instead, users should use `otel_trace_raw`. 
- -```yml -entry-pipeline: - delay: "100" - source: - otel_trace_source: - ssl: false - buffer: - bounded_blocking: - buffer_size: 10240 - batch_size: 160 - sink: - - pipeline: - name: "raw-pipeline" - - pipeline: - name: "service-map-pipeline" -raw-pipeline: - source: - pipeline: - name: "entry-pipeline" - buffer: - bounded_blocking: - buffer_size: 10240 - batch_size: 160 - processor: - - otel_trace_raw: - sink: - - opensearch: - hosts: ["https://localhost:9200"] - insecure: true - username: admin - password: admin - index_type: trace-analytics-raw -service-map-pipeline: - delay: "100" - source: - pipeline: - name: "entry-pipeline" - buffer: - bounded_blocking: - buffer_size: 10240 - batch_size: 160 - processor: - - service_map_stateful: - sink: - - opensearch: - hosts: ["https://localhost:9200"] - insecure: true - username: admin - password: admin - index_type: trace-analytics-service-map -``` - -To maintain similar ingestion throughput and latency, scale the `buffer_size` and `batch_size` by the estimated maximum batch size in the client request payload. -{: .tip} - -### Metrics pipeline - -Data Prepper supports metrics ingestion using OTel. It currently supports the following metric types: - -* Gauge -* Sum -* Summary -* Histogram - -Other types are not supported. Data Prepper drops all other types, including Exponential Histogram and Summary. Additionally, Data Prepper does not support Scope instrumentation. - -To set up a metrics pipeline: - -```yml -metrics-pipeline: - source: - otel_metrics_source: - processor: - - otel_metrics_raw_processor: - sink: - - opensearch: - hosts: ["https://localhost:9200"] - username: admin - password: admin -``` - -### S3 log ingestion pipeline - -The following example demonstrates how to use the S3Source and Grok Processor plugins to process unstructured log data from [Amazon Simple Storage Service](https://aws.amazon.com/s3/) (Amazon S3). This example uses application load balancer logs. As the application load balancer writes logs to S3, S3 creates notifications in Amazon SQS. Data Prepper monitors those notifications and reads the S3 objects to get the log data and process it. - -```yml -log-pipeline: - source: - s3: - notification_type: "sqs" - compression: "gzip" - codec: - newline: - sqs: - queue_url: "https://sqs.us-east-1.amazonaws.com/12345678910/ApplicationLoadBalancer" - aws: - region: "us-east-1" - sts_role_arn: "arn:aws:iam::12345678910:role/Data-Prepper" - - processor: - - grok: - match: - message: ["%{DATA:type} %{TIMESTAMP_ISO8601:time} %{DATA:elb} %{DATA:client} %{DATA:target} %{BASE10NUM:request_processing_time} %{DATA:target_processing_time} %{BASE10NUM:response_processing_time} %{BASE10NUM:elb_status_code} %{DATA:target_status_code} %{BASE10NUM:received_bytes} %{BASE10NUM:sent_bytes} \"%{DATA:request}\" \"%{DATA:user_agent}\" %{DATA:ssl_cipher} %{DATA:ssl_protocol} %{DATA:target_group_arn} \"%{DATA:trace_id}\" \"%{DATA:domain_name}\" \"%{DATA:chosen_cert_arn}\" %{DATA:matched_rule_priority} %{TIMESTAMP_ISO8601:request_creation_time} \"%{DATA:actions_executed}\" \"%{DATA:redirect_url}\" \"%{DATA:error_reason}\" \"%{DATA:target_list}\" \"%{DATA:target_status_code_list}\" \"%{DATA:classification}\" \"%{DATA:classification_reason}"] - - grok: - match: - request: ["(%{NOTSPACE:http_method})? (%{NOTSPACE:http_uri})? 
(%{NOTSPACE:http_version})?"] - - grok: - match: - http_uri: ["(%{WORD:protocol})?(://)?(%{IPORHOST:domain})?(:)?(%{INT:http_port})?(%{GREEDYDATA:request_uri})?"] - - date: - from_time_received: true - destination: "@timestamp" - - - sink: - - opensearch: - hosts: [ "https://localhost:9200" ] - username: "admin" - password: "admin" - index: alb_logs -``` - -## Migrating from Logstash - -Data Prepper supports Logstash configuration files for a limited set of plugins. Simply use the logstash config to run Data Prepper. - -```bash -docker run --name data-prepper \ - -v /full/path/to/logstash.conf:/usr/share/data-prepper/pipelines/pipelines.conf \ - opensearchproject/opensearch-data-prepper:latest -``` - -This feature is limited by feature parity of Data Prepper. As of Data Prepper 1.2 release, the following plugins from the Logstash configuration are supported: - -- HTTP Input plugin -- Grok Filter plugin -- Elasticsearch Output plugin -- Amazon Elasticsearch Output plugin - -## Configure the Data Prepper server - -Data Prepper itself provides administrative HTTP endpoints such as `/list` to list pipelines and `/metrics/prometheus` to provide Prometheus-compatible metrics data. The port that has these endpoints has a TLS configuration and is specified by a separate YAML file. By default, these endpoints are secured by Data Prepper docker images. We strongly recommend providing your own configuration file for securing production environments. Here is an example `data-prepper-config.yaml`: - -```yml -ssl: true -keyStoreFilePath: "/usr/share/data-prepper/keystore.jks" -keyStorePassword: "password" -privateKeyPassword: "other_password" -serverPort: 1234 -``` - -To configure the Data Prepper server, run Data Prepper with the additional yaml file. - -```bash -docker run --name data-prepper \ - -v /full/path/to/my-pipelines.yaml:/usr/share/data-prepper/pipelines/my-pipelines.yaml \ - -v /full/path/to/data-prepper-config.yaml:/usr/share/data-prepper/data-prepper-config.yaml \ - opensearchproject/data-prepper:latest -``` - -## Configure peer forwarder - -Data Prepper provides an HTTP service to forward events between Data Prepper nodes for aggregation. This is required for operating Data Prepper in a clustered deployment. Currently, peer forwarding is supported in `aggregate`, `service_map_stateful`, and `otel_trace_raw` processors. Peer forwarder groups events based on the identification keys provided by the processors. For `service_map_stateful` and `otel_trace_raw` it's `traceId` by default and can not be configured. For `aggregate` processor, it is configurable using `identification_keys` option. - -Peer forwarder supports peer discovery through one of three options: a static list, a DNS record lookup , or AWS Cloud Map. Peer discovery can be configured using `discovery_mode` option. Peer forwarder also supports SSL for verification and encryption, and mTLS for mutual authentication in a peer forwarding service. 
- -To configure peer forwarder, add configuration options to `data-prepper-config.yaml` mentioned in the [Configure the Data Prepper server](#configure-the-data-prepper-server) section: - -```yml -peer_forwarder: - discovery_mode: dns - domain_name: "data-prepper-cluster.my-domain.net" - ssl: true - ssl_certificate_file: "" - ssl_key_file: "" - authentication: - mutual_tls: -``` - - -## Pipeline Configurations - -Since Data Prepper 2.5, shared pipeline components can be configured under the reserved section `pipeline_configurations` when all pipelines are defined in a single pipeline configuration YAML file. -Shared pipeline configurations can include certain components within [Extension Plugins]({{site.url}}{{site.baseurl}}/data-prepper/managing-data-prepper/configuring-data-prepper/#extension-plugins), as shown in the following example that refers to secrets configurations for an `opensearch` sink: - -```json -pipeline_configurations: - aws: - secrets: - credential-secret-config: - secret_id: - region: - sts_role_arn: -simple-sample-pipeline: - ... - sink: - - opensearch: - hosts: [ {% raw %}"${{aws_secrets:host-secret-config}}"{% endraw %} ] - username: {% raw %}"${{aws_secrets:credential-secret-config:username}}"{% endraw %} - password: {% raw %}"${{aws_secrets:credential-secret-config:password}}"{% endraw %} - index: "test-migration" -``` - -When the same component is defined in both `pipelines.yaml` and `data-prepper-config.yaml`, the definition in the `pipelines.yaml` will overwrite the counterpart in `data-prepper-config.yaml`. For more information on shared pipeline components, see [AWS secrets extension plugin]({{site.url}}{{site.baseurl}}/data-prepper/managing-data-prepper/configuring-data-prepper/#aws-secrets-extension-plugin) for details. +- See [Common uses cases]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/common-use-cases/) for example configurations. From cf1806532ef0be2d7deb04a275c8fe2ea6e3cbe0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Philipp=20D=C3=BCnnebeil?= <53494432+PhilD90@users.noreply.github.com> Date: Fri, 2 Aug 2024 15:00:44 +0200 Subject: [PATCH 088/154] Update index.md (#7893) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit fixed typo Signed-off-by: Philipp Dünnebeil <53494432+PhilD90@users.noreply.github.com> --- _api-reference/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/index.md b/_api-reference/index.md index 815b4af365..f87d40214e 100644 --- a/_api-reference/index.md +++ b/_api-reference/index.md @@ -48,7 +48,7 @@ This reference includes the REST APIs supported by OpenSearch. 
If a REST API is - [Popular APIs]({{site.url}}{{site.baseurl}}/api-reference/popular-api/) - [Ranking evaluation]({{site.url}}{{site.baseurl}}/api-reference/rank-eval/) - [Refresh search analyzer]({{site.url}}{{site.baseurl}}/im-plugin/refresh-analyzer/) -- [Remove cluster information]({{site.url}}{{site.baseurl}}/api-reference/remote-info/) +- [Remote cluster information]({{site.url}}{{site.baseurl}}/api-reference/remote-info/) - [Root cause analysis API]({{site.url}}{{site.baseurl}}/monitoring-your-cluster/pa/rca/api/) - [Snapshot management API]({{site.url}}{{site.baseurl}}/tuning-your-cluster/availability-and-recovery/snapshots/sm-api/) - [Script APIs]({{site.url}}{{site.baseurl}}/api-reference/script-apis/index/) From 5e11b720a61dc3546ff93af357e20b362585042e Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Fri, 2 Aug 2024 11:09:16 -0400 Subject: [PATCH 089/154] Fix typo and make left nav heading uniform for neural sparse processor (#7895) Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- .../search-pipelines/neural-sparse-query-two-phase-processor.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md b/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md index 53d69c1cc2..de36225a99 100644 --- a/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md +++ b/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md @@ -1,6 +1,6 @@ --- layout: default -title: Neural spare query two-phase processor +title: Neural sparse query two-phase nav_order: 13 parent: Search processors grand_parent: Search pipelines From 23e1ffb13326ccb5ff5e252b8ac8256fc7e16364 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Fri, 2 Aug 2024 11:15:23 -0400 Subject: [PATCH 090/154] Add custom JSON lexer and highlighting color scheme (#7892) * Add custom JSON lexer and highlighting color scheme Signed-off-by: Fanit Kolchina * Update _getting-started/quickstart.md Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower --- _getting-started/communicate.md | 8 ++--- _getting-started/quickstart.md | 4 ++- _plugins/custom-lexer.rb | 18 +++++++++++ _sass/color_schemes/opensearch.scss | 50 ++++++++++++++--------------- _sass/custom/custom.scss | 5 +++ 5 files changed, 55 insertions(+), 30 deletions(-) create mode 100644 _plugins/custom-lexer.rb diff --git a/_getting-started/communicate.md b/_getting-started/communicate.md index 391bc9bef0..9960f63b2c 100644 --- a/_getting-started/communicate.md +++ b/_getting-started/communicate.md @@ -21,7 +21,7 @@ When sending cURL requests in a terminal, the request format varies depending on If you're not using the Security plugin, send the following request: ```bash -curl -XGET "http://localhost:9200/_cluster/health" +curl -X GET "http://localhost:9200/_cluster/health" ``` {% include copy.html %} @@ -37,7 +37,7 @@ The default username is `admin`, and the password is set in your `docker-compose OpenSearch generally returns responses in a flat JSON format by default. 
For a human-readable response body, provide the `pretty` query parameter: ```bash -curl -XGET "http://localhost:9200/_cluster/health?pretty" +curl -X GET "http://localhost:9200/_cluster/health?pretty" ``` {% include copy.html %} @@ -46,7 +46,7 @@ For more information about `pretty` and other useful query parameters, see [Comm For requests that contain a body, specify the `Content-Type` header and provide the request payload in the `-d` (data) option: ```json -curl -XGET "http://localhost:9200/students/_search?pretty" -H 'Content-Type: application/json' -d' +curl -X GET "http://localhost:9200/students/_search?pretty" -H 'Content-Type: application/json' -d' { "query": { "match_all": {} @@ -135,7 +135,7 @@ OpenSearch responds with the field `type` for each field: OpenSearch mapped the numeric fields to the `float` and `long` types. Notice that OpenSearch mapped the `name` text field to `text` and added a `name.keyword` subfield mapped to `keyword`. Fields mapped to `text` are analyzed (lowercased and split into terms) and can be used for full-text search. Fields mapped to `keyword` are used for exact term search. -OpenSearch mapped the `grad_year` field to `long`. If you want to map it to the `date` type instead, you need to [delete the index](#deleting-an-index) and then recreate it, explicitly specifying the mappings. For instructions on how to explicitly specify mappings, see [Index settings and mappings](#index-mappings-and-settings). +OpenSearch mapped the `grad_year` field to `long`. If you want to map it to the `date` type instead, you need to [delete the index](#deleting-an-index) and then recreate it, explicitly specifying the mappings. For instructions on how to explicitly specify mappings, see [Index mappings and settings](#index-mappings-and-settings). ## Searching for documents diff --git a/_getting-started/quickstart.md b/_getting-started/quickstart.md index 78104b1913..0a28e29a04 100644 --- a/_getting-started/quickstart.md +++ b/_getting-started/quickstart.md @@ -10,7 +10,9 @@ redirect_from: # Installation quickstart -Get started using OpenSearch and OpenSearch Dashboards by deploying your containers with [Docker](https://www.docker.com/). Before proceeding, you need to [get Docker](https://docs.docker.com/get-docker/) and [Docker Compose](https://github.com/docker/compose) installed on your local machine. +To quickly get started using OpenSearch and OpenSearch Dashboards, deploy your containers using [Docker](https://www.docker.com/). For all installation guides, see [Install and upgrade OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/). + +Before proceeding, you need to install [Docker](https://docs.docker.com/get-docker/) and [Docker Compose](https://github.com/docker/compose) on your local machine. The Docker Compose commands used in this guide are written with a hyphen (for example, `docker-compose`). If you installed Docker Desktop on your machine, which automatically installs a bundled version of Docker Compose, then you should remove the hyphen. For example, change `docker-compose` to `docker compose`. 
{: .note} diff --git a/_plugins/custom-lexer.rb b/_plugins/custom-lexer.rb new file mode 100644 index 0000000000..da58db6304 --- /dev/null +++ b/_plugins/custom-lexer.rb @@ -0,0 +1,18 @@ +require 'jekyll' +require 'rouge' + +module Rouge + module Lexers + class CustomJSON < JSON + tag 'json' + + prepend :root do + rule %r/\b(GET|PUT|POST|DELETE|PATCH|HEAD)\b/, Keyword::Reserved + end + end + end +end + +Jekyll::Hooks.register :site, :pre_render do |site| + # Ensure the custom lexer is loaded +end diff --git a/_sass/color_schemes/opensearch.scss b/_sass/color_schemes/opensearch.scss index 36223b5811..7a683e3bcb 100644 --- a/_sass/color_schemes/opensearch.scss +++ b/_sass/color_schemes/opensearch.scss @@ -119,7 +119,7 @@ $media-queries: ( color: $purple-100; } // operator // .highlight .x { - color: #cb4b16; + color: #a31521; } // other // .highlight .p { color: $grey-dk-300; @@ -137,7 +137,7 @@ $media-queries: ( color: $purple-000; } // comment.special // .highlight .gd { - color: #2aa198; + color: #279a8f; } // generic.deleted // .highlight .ge { font-style: italic; @@ -147,7 +147,7 @@ $media-queries: ( color: #dc322f; } // generic.error // .highlight .gh { - color: #cb4b16; + color: #a31521; } // generic.heading // .highlight .gi { color: $purple-000; @@ -163,13 +163,13 @@ $media-queries: ( color: $grey-dk-300; } // generic.strong // .highlight .gu { - color: #cb4b16; + color: #a31521; } // generic.subheading // .highlight .gt { color: $grey-dk-300; } // generic.traceback // .highlight .kc { - color: #cb4b16; + color: #a31521; } // keyword.constant // .highlight .kd { color: #268bd2; @@ -190,13 +190,13 @@ $media-queries: ( color: $grey-dk-300; } // literal.date // .highlight .m { - color: #2aa198; + color: #2aa18e; } // literal.number // .highlight .s { - color: #2aa198; + color: #2aa18e; } // literal.string // .highlight .na { - color: #555; + color: #0451a3; } // name.attribute // .highlight .nb { color: #b58900; @@ -205,28 +205,28 @@ $media-queries: ( color: #268bd2; } // name.class // .highlight .no { - color: #cb4b16; + color: #a31521; } // name.constant // .highlight .nd { color: #268bd2; } // name.decorator // .highlight .ni { - color: #cb4b16; + color: #a31521; } // name.entity // .highlight .ne { - color: #cb4b16; + color: #a31521; } // name.exception // .highlight .nf { color: #268bd2; } // name.function // .highlight .nl { - color: #555; + color: #0451a3; } // name.label // .highlight .nn { color: $grey-dk-300; } // name.namespace // .highlight .nx { - color: #555; + color: #0451a3; } // name.other // .highlight .py { color: $grey-dk-300; @@ -244,49 +244,49 @@ $media-queries: ( color: $grey-dk-300; } // text.whitespace // .highlight .mf { - color: #2aa198; + color: #2aa18e; } // literal.number.float // .highlight .mh { - color: #2aa198; + color: #2aa18e; } // literal.number.hex // .highlight .mi { - color: #2aa198; + color: #2aa18e; } // literal.number.integer // .highlight .mo { - color: #2aa198; + color: #2aa18e; } // literal.number.oct // .highlight .sb { color: $blue-dk-200; } // literal.string.backtick // .highlight .sc { - color: #2aa198; + color: #2aa18e; } // literal.string.char // .highlight .sd { color: $grey-dk-300; } // literal.string.doc // .highlight .s2 { - color: #2aa198; + color: #2aa18e; } // literal.string.double // .highlight .se { - color: #cb4b16; + color: #a31521; } // literal.string.escape // .highlight .sh { color: $grey-dk-300; } // literal.string.heredoc // .highlight .si { - color: #2aa198; + color: #2aa18e; } // literal.string.interpol // .highlight 
.sx { - color: #2aa198; + color: #2aa18e; } // literal.string.other // .highlight .sr { color: #dc322f; } // literal.string.regex // .highlight .s1 { - color: #2aa198; + color: #2aa18e; } // literal.string.single // .highlight .ss { - color: #2aa198; + color: #2aa18e; } // literal.string.symbol // .highlight .bp { color: #268bd2; @@ -301,5 +301,5 @@ $media-queries: ( color: #268bd2; } // name.variable.instance // .highlight .il { - color: #2aa198; + color: #2aa18e; } // literal.number.integer.long // \ No newline at end of file diff --git a/_sass/custom/custom.scss b/_sass/custom/custom.scss index 0f1c549504..7d7a168fb4 100755 --- a/_sass/custom/custom.scss +++ b/_sass/custom/custom.scss @@ -335,6 +335,11 @@ img { } } +// Needed so the panel and copy-button-wrap panel blend together +div.highlighter-rouge { + border-radius: 0; +} + .copy-button-wrap { background-color: $sidebar-color; padding: 0.25rem 2rem 0.5rem 2rem; From 8a321e5621451ad07f025bd664ce3cebfa07d3bb Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Fri, 2 Aug 2024 14:05:11 -0400 Subject: [PATCH 091/154] Add model names to Vale (#7901) Signed-off-by: Fanit Kolchina --- .github/vale/styles/Vocab/OpenSearch/Products/accept.txt | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/.github/vale/styles/Vocab/OpenSearch/Products/accept.txt b/.github/vale/styles/Vocab/OpenSearch/Products/accept.txt index 4ea310a086..e33ac09744 100644 --- a/.github/vale/styles/Vocab/OpenSearch/Products/accept.txt +++ b/.github/vale/styles/Vocab/OpenSearch/Products/accept.txt @@ -1,14 +1,18 @@ Active Directory Adoptium +AI21 Labs Jurassic Amazon Amazon OpenSearch Serverless Amazon OpenSearch Service Amazon Bedrock Amazon SageMaker Ansible +Anthropic Claude Auditbeat AWS Cloud Cohere Command +Cohere Embed English +Cohere Embed Multilingual Cognito Dashboards Query Language Data Prepper @@ -87,6 +91,8 @@ Ruby Simple Schema for Observability Tableau Textract +Titan Multimodal Embeddings +Titan Text Embeddings TorchScript Tribuo VisBuilder From 76a29ff8827c937fc318d2481c10a0e8d66d7d3f Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Fri, 2 Aug 2024 14:29:13 -0400 Subject: [PATCH 092/154] Renamed data prepper files to have dashes for consistency (#7790) * Renamed data prepper files to have dashes for consistency Signed-off-by: Fanit Kolchina * More files Signed-off-by: Fanit Kolchina --------- Signed-off-by: Fanit Kolchina --- _data-prepper/common-use-cases/log-enrichment.md | 2 +- _data-prepper/common-use-cases/trace-analytics.md | 6 +++--- .../extensions/{geoip_service.md => geoip-service.md} | 0 .../{convert_entry_type.md => convert-entry-type.md} | 0 .../processors/{delete_entries.md => delete-entries.md} | 0 _data-prepper/pipelines/configuration/processors/geoip.md | 2 +- .../pipelines/configuration/processors/mutate-event.md | 4 ++-- .../processors/{otel_metrics.md => otel-metrics.md} | 0 .../processors/{otel_traces.md => otel-traces.md} | 0 .../configuration/processors/{parse_ion.md => parse-ion.md} | 0 .../processors/{parse_json.md => parse-json.md} | 0 .../configuration/processors/{parse_xml.md => parse-xml.md} | 0 .../processors/{service_map.md => service-map.md} | 0 .../processors/{split_string.md => split-string.md} | 0 .../processors/{string_converter.md => string-converter.md} | 0 .../{substitute_string.md => substitute-string.md} | 0 .../processors/{trim_string.md => trim-string.md} | 0 .../processors/{write_json.md => write-json.md} | 0 
_data-prepper/pipelines/configuration/sources/s3.md | 2 +- _ingest-pipelines/processors/convert.md | 2 +- _observing-your-data/trace/ta-dashboards.md | 2 +- 21 files changed, 10 insertions(+), 10 deletions(-) rename _data-prepper/managing-data-prepper/extensions/{geoip_service.md => geoip-service.md} (100%) rename _data-prepper/pipelines/configuration/processors/{convert_entry_type.md => convert-entry-type.md} (100%) rename _data-prepper/pipelines/configuration/processors/{delete_entries.md => delete-entries.md} (100%) rename _data-prepper/pipelines/configuration/processors/{otel_metrics.md => otel-metrics.md} (100%) rename _data-prepper/pipelines/configuration/processors/{otel_traces.md => otel-traces.md} (100%) rename _data-prepper/pipelines/configuration/processors/{parse_ion.md => parse-ion.md} (100%) rename _data-prepper/pipelines/configuration/processors/{parse_json.md => parse-json.md} (100%) rename _data-prepper/pipelines/configuration/processors/{parse_xml.md => parse-xml.md} (100%) rename _data-prepper/pipelines/configuration/processors/{service_map.md => service-map.md} (100%) rename _data-prepper/pipelines/configuration/processors/{split_string.md => split-string.md} (100%) rename _data-prepper/pipelines/configuration/processors/{string_converter.md => string-converter.md} (100%) rename _data-prepper/pipelines/configuration/processors/{substitute_string.md => substitute-string.md} (100%) rename _data-prepper/pipelines/configuration/processors/{trim_string.md => trim-string.md} (100%) rename _data-prepper/pipelines/configuration/processors/{write_json.md => write-json.md} (100%) diff --git a/_data-prepper/common-use-cases/log-enrichment.md b/_data-prepper/common-use-cases/log-enrichment.md index 0c878dd76e..0d8ce4ab7d 100644 --- a/_data-prepper/common-use-cases/log-enrichment.md +++ b/_data-prepper/common-use-cases/log-enrichment.md @@ -370,7 +370,7 @@ The `date` processor can generate timestamps for incoming events if you specify ### Deriving punctuation patterns -The [`substitute_string`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/substitute_string/) processor (which is one of the mutate string processors) lets you derive a punctuation pattern from incoming events. In the following example pipeline, the processor will scan incoming Apache log events and derive punctuation patterns from them: +The [`substitute_string`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/substitute-string/) processor (which is one of the mutate string processors) lets you derive a punctuation pattern from incoming events. In the following example pipeline, the processor will scan incoming Apache log events and derive punctuation patterns from them: ```yaml processor: diff --git a/_data-prepper/common-use-cases/trace-analytics.md b/_data-prepper/common-use-cases/trace-analytics.md index 3deca7b632..1a961077fe 100644 --- a/_data-prepper/common-use-cases/trace-analytics.md +++ b/_data-prepper/common-use-cases/trace-analytics.md @@ -32,7 +32,7 @@ To monitor trace analytics in Data Prepper, we provide three pipelines: `entry-p ### OpenTelemetry trace source -The [OpenTelemetry source]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/otel_traces/) accepts trace data from the OpenTelemetry Collector. 
The source follows the [OpenTelemetry Protocol](https://github.com/open-telemetry/opentelemetry-specification/tree/master/specification/protocol) and officially supports transport over gRPC and the use of industry-standard encryption (TLS/HTTPS). +The [OpenTelemetry source]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/otel-traces/) accepts trace data from the OpenTelemetry Collector. The source follows the [OpenTelemetry Protocol](https://github.com/open-telemetry/opentelemetry-specification/tree/master/specification/protocol) and officially supports transport over gRPC and the use of industry-standard encryption (TLS/HTTPS). ### Processor @@ -49,8 +49,8 @@ OpenSearch provides a generic sink that writes data to OpenSearch as the destina The sink provides specific configurations for the trace analytics feature. These configurations allow the sink to use indexes and index templates specific to trace analytics. The following OpenSearch indexes are specific to trace analytics: -* otel-v1-apm-span –- The *otel-v1-apm-span* index stores the output from the [otel_traces_raw]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/otel_traces/) processor. -* otel-v1-apm-service-map –- The *otel-v1-apm-service-map* index stores the output from the [service_map_stateful]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/service_map/) processor. +* otel-v1-apm-span –- The *otel-v1-apm-span* index stores the output from the [otel_traces_raw]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/otel-traces/) processor. +* otel-v1-apm-service-map –- The *otel-v1-apm-service-map* index stores the output from the [service_map_stateful]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/service-map/) processor. ## Trace tuning diff --git a/_data-prepper/managing-data-prepper/extensions/geoip_service.md b/_data-prepper/managing-data-prepper/extensions/geoip-service.md similarity index 100% rename from _data-prepper/managing-data-prepper/extensions/geoip_service.md rename to _data-prepper/managing-data-prepper/extensions/geoip-service.md diff --git a/_data-prepper/pipelines/configuration/processors/convert_entry_type.md b/_data-prepper/pipelines/configuration/processors/convert-entry-type.md similarity index 100% rename from _data-prepper/pipelines/configuration/processors/convert_entry_type.md rename to _data-prepper/pipelines/configuration/processors/convert-entry-type.md diff --git a/_data-prepper/pipelines/configuration/processors/delete_entries.md b/_data-prepper/pipelines/configuration/processors/delete-entries.md similarity index 100% rename from _data-prepper/pipelines/configuration/processors/delete_entries.md rename to _data-prepper/pipelines/configuration/processors/delete-entries.md diff --git a/_data-prepper/pipelines/configuration/processors/geoip.md b/_data-prepper/pipelines/configuration/processors/geoip.md index b7418c66c6..d0b6bd1cbb 100644 --- a/_data-prepper/pipelines/configuration/processors/geoip.md +++ b/_data-prepper/pipelines/configuration/processors/geoip.md @@ -10,7 +10,7 @@ nav_order: 49 The `geoip` processor enriches events with geographic information extracted from IP addresses contained in the events. By default, Data Prepper uses the [MaxMind GeoLite2](https://dev.maxmind.com/geoip/geolite2-free-geolocation-data) geolocation database. 
-Data Prepper administrators can configure the databases using the [`geoip_service`]({{site.url}}{{site.baseurl}}/data-prepper/managing-data-prepper/extensions/geoip_service) extension configuration. +Data Prepper administrators can configure the databases using the [`geoip_service`]({{site.url}}{{site.baseurl}}/data-prepper/managing-data-prepper/extensions/geoip-service/) extension configuration. ## Usage diff --git a/_data-prepper/pipelines/configuration/processors/mutate-event.md b/_data-prepper/pipelines/configuration/processors/mutate-event.md index 1afb34a970..ff2da6b527 100644 --- a/_data-prepper/pipelines/configuration/processors/mutate-event.md +++ b/_data-prepper/pipelines/configuration/processors/mutate-event.md @@ -11,9 +11,9 @@ nav_order: 65 Mutate event processors allow you to modify events in Data Prepper. The following processors are available: * [add_entries]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/add-entries/) allows you to add entries to an event. -* [convert_entry_type]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/convert_entry_type/) allows you to convert value types in an event. +* [convert_entry_type]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/convert-entry-type/) allows you to convert value types in an event. * [copy_values]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/copy-values/) allows you to copy values within an event. -* [delete_entries]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/delete_entries/) allows you to delete entries from an event. +* [delete_entries]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/delete-entries/) allows you to delete entries from an event. * [list_to_map]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/list-to-map) allows you to convert list of objects from an event where each object contains a `key` field into a map of target keys. * `map_to_list` allows you to convert a map of objects from an event, where each object contains a `key` field, into a list of target keys. * [rename_keys]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/rename-keys/) allows you to rename keys in an event. 
diff --git a/_data-prepper/pipelines/configuration/processors/otel_metrics.md b/_data-prepper/pipelines/configuration/processors/otel-metrics.md similarity index 100% rename from _data-prepper/pipelines/configuration/processors/otel_metrics.md rename to _data-prepper/pipelines/configuration/processors/otel-metrics.md diff --git a/_data-prepper/pipelines/configuration/processors/otel_traces.md b/_data-prepper/pipelines/configuration/processors/otel-traces.md similarity index 100% rename from _data-prepper/pipelines/configuration/processors/otel_traces.md rename to _data-prepper/pipelines/configuration/processors/otel-traces.md diff --git a/_data-prepper/pipelines/configuration/processors/parse_ion.md b/_data-prepper/pipelines/configuration/processors/parse-ion.md similarity index 100% rename from _data-prepper/pipelines/configuration/processors/parse_ion.md rename to _data-prepper/pipelines/configuration/processors/parse-ion.md diff --git a/_data-prepper/pipelines/configuration/processors/parse_json.md b/_data-prepper/pipelines/configuration/processors/parse-json.md similarity index 100% rename from _data-prepper/pipelines/configuration/processors/parse_json.md rename to _data-prepper/pipelines/configuration/processors/parse-json.md diff --git a/_data-prepper/pipelines/configuration/processors/parse_xml.md b/_data-prepper/pipelines/configuration/processors/parse-xml.md similarity index 100% rename from _data-prepper/pipelines/configuration/processors/parse_xml.md rename to _data-prepper/pipelines/configuration/processors/parse-xml.md diff --git a/_data-prepper/pipelines/configuration/processors/service_map.md b/_data-prepper/pipelines/configuration/processors/service-map.md similarity index 100% rename from _data-prepper/pipelines/configuration/processors/service_map.md rename to _data-prepper/pipelines/configuration/processors/service-map.md diff --git a/_data-prepper/pipelines/configuration/processors/split_string.md b/_data-prepper/pipelines/configuration/processors/split-string.md similarity index 100% rename from _data-prepper/pipelines/configuration/processors/split_string.md rename to _data-prepper/pipelines/configuration/processors/split-string.md diff --git a/_data-prepper/pipelines/configuration/processors/string_converter.md b/_data-prepper/pipelines/configuration/processors/string-converter.md similarity index 100% rename from _data-prepper/pipelines/configuration/processors/string_converter.md rename to _data-prepper/pipelines/configuration/processors/string-converter.md diff --git a/_data-prepper/pipelines/configuration/processors/substitute_string.md b/_data-prepper/pipelines/configuration/processors/substitute-string.md similarity index 100% rename from _data-prepper/pipelines/configuration/processors/substitute_string.md rename to _data-prepper/pipelines/configuration/processors/substitute-string.md diff --git a/_data-prepper/pipelines/configuration/processors/trim_string.md b/_data-prepper/pipelines/configuration/processors/trim-string.md similarity index 100% rename from _data-prepper/pipelines/configuration/processors/trim_string.md rename to _data-prepper/pipelines/configuration/processors/trim-string.md diff --git a/_data-prepper/pipelines/configuration/processors/write_json.md b/_data-prepper/pipelines/configuration/processors/write-json.md similarity index 100% rename from _data-prepper/pipelines/configuration/processors/write_json.md rename to _data-prepper/pipelines/configuration/processors/write-json.md diff --git 
a/_data-prepper/pipelines/configuration/sources/s3.md b/_data-prepper/pipelines/configuration/sources/s3.md index 7b1599f838..5a7d9986e5 100644 --- a/_data-prepper/pipelines/configuration/sources/s3.md +++ b/_data-prepper/pipelines/configuration/sources/s3.md @@ -138,7 +138,7 @@ The `codec` determines how the `s3` source parses each Amazon S3 object. For inc ### `newline` codec -The `newline` codec parses each single line as a single log event. This is ideal for most application logs because each event parses per single line. It can also be suitable for S3 objects that have individual JSON objects on each line, which matches well when used with the [parse_json]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/parse_json/) processor to parse each line. +The `newline` codec parses each single line as a single log event. This is ideal for most application logs because each event parses per single line. It can also be suitable for S3 objects that have individual JSON objects on each line, which matches well when used with the [parse_json]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/parse-json/) processor to parse each line. Use the following options to configure the `newline` codec. diff --git a/_ingest-pipelines/processors/convert.md b/_ingest-pipelines/processors/convert.md index fe92dfebe7..c86f86c9a7 100644 --- a/_ingest-pipelines/processors/convert.md +++ b/_ingest-pipelines/processors/convert.md @@ -7,7 +7,7 @@ redirect_from: - /api-reference/ingest-apis/processors/convert/ --- -This documentation describes using the `convert` processor in OpenSearch ingest pipelines. Consider using the [Data Prepper `convert_entry_type` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/convert_entry_type/), which runs on the OpenSearch cluster, if your use case involves large or complex datasets. +This documentation describes using the `convert` processor in OpenSearch ingest pipelines. Consider using the [Data Prepper `convert_entry_type` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/convert-entry-type/), which runs on the OpenSearch cluster, if your use case involves large or complex datasets. 
{: .note} # Convert processor diff --git a/_observing-your-data/trace/ta-dashboards.md b/_observing-your-data/trace/ta-dashboards.md index c7cf0a5091..c7ef2117ad 100644 --- a/_observing-your-data/trace/ta-dashboards.md +++ b/_observing-your-data/trace/ta-dashboards.md @@ -48,7 +48,7 @@ The **Trace Analytics** application includes two options: **Services** and **Tra The plugin requires you to use [Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/) to process and visualize OTel data and relies on the following Data Prepper pipelines for OTel correlations and service map calculations: - [Trace analytics pipeline]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/trace-analytics/) -- [Service map pipeline]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/service_map/) +- [Service map pipeline]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/service-map/) ### Standardized telemetry data From 40943f4ea1bbf9ad2f25d9d77d0f50baf31cbdca Mon Sep 17 00:00:00 2001 From: Mingshi Liu Date: Fri, 2 Aug 2024 12:56:34 -0700 Subject: [PATCH 093/154] Add documentation for ml inference search request processor/ search response processor (#7852) * draft ml inference search request processor Signed-off-by: Mingshi Liu * add doc Signed-off-by: Mingshi Liu * add doc Signed-off-by: Mingshi Liu * Doc review Signed-off-by: Fanit Kolchina * Fixed links Signed-off-by: Fanit Kolchina * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Unify processor docs Signed-off-by: Fanit Kolchina * Update _query-dsl/geo-and-xy/xy.md Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Remove note Signed-off-by: Fanit Kolchina * Fix link Signed-off-by: Fanit Kolchina --------- Signed-off-by: Mingshi Liu Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Fanit Kolchina Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower --- _query-dsl/geo-and-xy/xy.md | 4 +- .../search-pipelines/collapse-processor.md | 4 +- .../filter-query-processor.md | 4 +- .../ml-inference-search-request.md | 531 ++++++++++++++++++ .../ml-inference-search-response.md | 391 +++++++++++++ .../search-pipelines/neural-query-enricher.md | 4 +- ...neural-sparse-query-two-phase-processor.md | 2 +- .../normalization-processor.md | 4 +- .../search-pipelines/oversample-processor.md | 4 +- .../personalize-search-ranking.md | 2 + .../search-pipelines/rag-processor.md | 4 +- .../rename-field-processor.md | 4 +- .../search-pipelines/rerank-processor.md | 4 +- .../search-pipelines/script-processor.md | 4 +- .../search-pipelines/search-processors.md | 8 +- .../search-pipelines/sort-processor.md | 4 +- .../search-pipelines/split-processor.md | 4 +- .../truncate-hits-processor.md | 4 +- 18 files changed, 968 insertions(+), 18 deletions(-) create mode 100644 _search-plugins/search-pipelines/ml-inference-search-request.md create mode 100644 _search-plugins/search-pipelines/ml-inference-search-response.md diff --git a/_query-dsl/geo-and-xy/xy.md b/_query-dsl/geo-and-xy/xy.md index d0ed61c050..c62e4a94eb 100644 --- a/_query-dsl/geo-and-xy/xy.md +++ b/_query-dsl/geo-and-xy/xy.md @@ -14,7 +14,7 @@ To search for documents that contain [xy point]({{site.url}}{{site.baseurl}}/ope ## Spatial relations -When you provide an xy shape to the xy query, the xy 
fields are matched using the following spatial relations to the provided shape. +When you provide an xy shape to the xy query, the xy fields in the documents are matched using the following spatial relations to the provided shape. Relation | Description | Supporting xy field type :--- | :--- | :--- @@ -33,7 +33,7 @@ You can define the shape in an xy query either by providing a new shape definiti To provide a new shape to an xy query, define it in the `xy_shape` field. -The following example illustrates searching for documents with xy shapes that match an xy shape defined at query time. +The following example illustrates how to search for documents containing xy shapes that match an xy shape defined at query time. First, create an index and map the `geometry` field as an `xy_shape`: diff --git a/_search-plugins/search-pipelines/collapse-processor.md b/_search-plugins/search-pipelines/collapse-processor.md index cea0a15396..8a2723efa7 100644 --- a/_search-plugins/search-pipelines/collapse-processor.md +++ b/_search-plugins/search-pipelines/collapse-processor.md @@ -1,13 +1,15 @@ --- layout: default title: Collapse -nav_order: 7 +nav_order: 10 has_children: false parent: Search processors grand_parent: Search pipelines --- # Collapse processor +Introduced 2.12 +{: .label .label-purple } The `collapse` response processor discards hits that have the same value for a particular field as a previous document in the result set. This is similar to passing the `collapse` parameter in a search request, but the response processor is applied to the diff --git a/_search-plugins/search-pipelines/filter-query-processor.md b/_search-plugins/search-pipelines/filter-query-processor.md index 6c68821a27..799d393e42 100644 --- a/_search-plugins/search-pipelines/filter-query-processor.md +++ b/_search-plugins/search-pipelines/filter-query-processor.md @@ -1,13 +1,15 @@ --- layout: default title: Filter query -nav_order: 10 +nav_order: 20 has_children: false parent: Search processors grand_parent: Search pipelines --- # Filter query processor +Introduced 2.8 +{: .label .label-purple } The `filter_query` search request processor intercepts a search request and applies an additional query to the request, filtering the results. This is useful when you don't want to rewrite existing queries in your application but need additional filtering of the results. diff --git a/_search-plugins/search-pipelines/ml-inference-search-request.md b/_search-plugins/search-pipelines/ml-inference-search-request.md new file mode 100644 index 0000000000..a072458a41 --- /dev/null +++ b/_search-plugins/search-pipelines/ml-inference-search-request.md @@ -0,0 +1,531 @@ +--- +layout: default +title: ML inference (request) +nav_order: 30 +has_children: false +parent: Search processors +grand_parent: Search pipelines +--- + +# ML inference search request processor +Introduced 2.16 +{: .label .label-purple } + +The `ml_inference` search request processor is used to invoke registered machine learning (ML) models in order to rewrite queries using the model output. + +**PREREQUISITE**
+Before using the `ml_inference` search request processor, you must have either a local ML model hosted on your OpenSearch cluster or an externally hosted model connected to your OpenSearch cluster through the ML Commons plugin. For more information about local models, see [Using ML models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/).
+For more information about externally hosted models, see [Connecting to externally hosted models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/index/).
+{: .note}
+
+## Syntax
+
+The following is the syntax for the `ml_inference` search request processor:
+
+```json
+{
+  "ml_inference": {
+    "model_id": "<model_id>",
+    "function_name": "<function_name>",
+    "full_response_path": "<full_response_path>",
+    "query_template": "<query_template>",
+    "model_config": {
+      "<model_config_field>": "<model_config_value>"
+    },
+    "model_input": "<model_input>",
+    "input_map": [
+      {
+        "<model_input_field>": "<query_field_name>"
+      }
+    ],
+    "output_map": [
+      {
+        "<query_output_field>": "<model_output_field>"
+      }
+    ]
+  }
+}
+```
+{% include copy-curl.html %}
+
+## Configuration parameters
+
+The following table lists the required and optional parameters for the `ml_inference` search request processor.
+
+| Parameter | Data type | Required/Optional | Description |
+|:--| :--- |:---|:---|
+| `model_id` | String | Required | The ID of the ML model used by the processor. |
+| `query_template` | String | Optional | A query string template used to construct a new query containing the fields generated by the model. Often used when rewriting a search query to a new query type. |
+| `function_name` | String | Optional for externally hosted models

Required for local models | The function name of the ML model configured in the processor. For local models, valid values are `sparse_encoding`, `sparse_tokenize`, `text_embedding`, and `text_similarity`. For externally hosted models, the valid value is `remote`. Default is `remote`. |
+| `model_config` | Object | Optional | Custom configuration options for the ML model. For more information, see [The `model_config` object]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/register-model/#the-model_config-object). |
+| `model_input` | String | Optional for externally hosted models

Required for local models | A template that defines the input field format expected by the model. Each local model type might use a different set of inputs. For externally hosted models, the default is `"{ \"parameters\": ${ml_inference.parameters} }"`. |
+| `input_map` | Array | Required | An array specifying how to map query string fields to the model input fields. Each element of the array is a map in the `"<model_input_field>": "<query_field_name>"` format and corresponds to one model invocation of a document field. If no input mapping is specified for an externally hosted model, then all document fields are passed to the model directly as input. The `input_map` size indicates the number of times the model is invoked (the number of Predict API requests). |
+| `<model_input_field>` | String | Required | The model input field name. |
+| `<query_field_name>` | String | Required | The name or JSON path of the query field used as the model input. |
+| `output_map` | Array | Required | An array specifying how to map the model output fields to new fields in the query string. Each element of the array is a map in the `"<query_output_field>": "<model_output_field>"` format. |
+| `<query_output_field>` | String | Required | The name of the query field in which the model's output (specified by `<model_output_field>`) is stored. |
+| `<model_output_field>` | String | Required | The name or JSON path of the field in the model output to be stored in the `<query_output_field>`. |
+| `full_response_path` | Boolean | Optional | Set this parameter to `true` if the `model_output_field` contains a full JSON path to the field instead of the field name. The model output will then be fully parsed to get the value of the field. Default is `true` for local models and `false` for externally hosted models. |
+| `ignore_missing` | Boolean | Optional | If `true` and any of the input fields defined in the `input_map` or `output_map` are missing, then the missing fields are ignored. Otherwise, a missing field causes a failure. Default is `false`. |
+| `ignore_failure` | Boolean | Optional | Specifies whether the processor continues execution even if it encounters an error. If `true`, then any failure is ignored and the search continues. If `false`, then any failure causes the search to be canceled. Default is `false`. |
+| `max_prediction_tasks` | Integer | Optional | The maximum number of concurrent model invocations that can run during query search. Default is `10`. |
+| `description` | String | Optional | A brief description of the processor. |
+| `tag` | String | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. |
+
+The `input_map` and `output_map` mappings support standard [JSON path](https://github.com/json-path/JsonPath) notation for specifying complex data structures.
+{: .note}
+
+## Using the processor
+
+Follow these steps to use the processor in a pipeline. You must provide a model ID, `input_map`, and `output_map` when creating the processor. Before testing a pipeline using the processor, make sure that the model is successfully deployed. You can check the model state using the [Get Model API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/get-model/).
+
+For local models, you must provide a `model_input` field that specifies the model input format. Add any input fields in `model_config` to `model_input`.
+
+For externally hosted models, the `model_input` field is optional, and its default value is `"{ \"parameters\": ${ml_inference.parameters} }"`.
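+
+For example, given the `input_map` used later in this document, which maps the model's `inputs` field to `query.term.label.value`, the default template would produce a Predict API request body similar to the following (shown for illustration only; the exact payload depends on your model's connector configuration):
+
+```json
+{
+  "parameters": {
+    "inputs": "happy moments"
+  }
+}
+```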
+ +### Setup + +Create an index named `my_index` and index two documents: + +```json +POST /my_index/_doc/1 +{ + "passage_text": "I am excited", + "passage_language": "en", + "label": "POSITIVE", + "passage_embedding": [ + 2.3886719, + 0.032714844, + -0.22229004 + ...] +} +``` +{% include copy-curl.html %} + +```json +POST /my_index/_doc/2 +{ + "passage_text": "I am sad", + "passage_language": "en", + "label": "NEGATIVE", + "passage_embedding": [ + 1.7773438, + 0.4309082, + 1.8857422, + 0.95996094, + ...] +} +``` +{% include copy-curl.html %} + +When you run a term query on the created index without a search pipeline, the query searches for documents that contain the exact term specified in the query. The following query does not return any results because the query text does not match any of the documents in the index: + +```json +GET /my_index/_search +{ + "query": { + "term": { + "passage_text": { + "value": "happy moments", + "boost": 1 + } + } + } +} +``` + +By using a model, the search pipeline can dynamically rewrite the term value to enhance or alter the search results based on the model inference. This means the model takes an initial input from the search query, processes it, and then updates the query term to reflect the model inference, potentially improving the relevance of the search results. + +### Example: Externally hosted model + +The following example configures an `ml_inference` processor with an externally hosted model. + +**Step 1: Create a pipeline** + +This example demonstrates how to create a search pipeline for an externally hosted sentiment analysis model that rewrites the term query value. The model requires an `inputs` field and produces results in a `label` field. Because the `function_name` is not specified, it defaults to `remote`, indicating an externally hosted model. + +The term query value is rewritten based on the model's output. The `ml_inference` processor in the search request needs an `input_map` to retrieve the query field value for the model input and an `output_map` to assign the model output to the query string. + +In this example, an `ml_inference` search request processor is used for the following term query: + +```json + { + "query": { + "term": { + "label": { + "value": "happy moments", + "boost": 1 + } + } + } +} +``` + +The following request creates a search pipeline that rewrites the preceding term query: + +```json +PUT /_search/pipeline/ml_inference_pipeline +{ + "description": "Generate passage_embedding for searched documents", + "processors": [ + { + "ml_inference": { + "model_id": "", + "input_map": [ + { + "inputs": "query.term.label.value" + } + ], + "output_map": [ + { + "query.term.label.value": "label" + } + ] + } + } + ] +} +``` +{% include copy-curl.html %} + +When making a Predict API request to an externally hosted model, all necessary fields and parameters are usually contained within a `parameters` object: + +```json +POST /_plugins/_ml/models/cleMb4kBJ1eYAeTMFFg4/_predict +{ + "parameters": { + "inputs": [ + { + ... + } + ] + } +} +``` + +Thus, to use an externally hosted sentiment analysis model, send a Predict API request in the following format: + +```json +POST /_plugins/_ml/models/cywgD5EB6KAJXDLxyDp1/_predict +{ + "parameters": { + "inputs": "happy moments" + } +} +``` +{% include copy-curl.html %} + +The model processes the input and generates a prediction based on the sentiment of the input text. 
+In this case, the sentiment is positive:
+
+```json
+{
+  "inference_results": [
+    {
+      "output": [
+        {
+          "name": "response",
+          "dataAsMap": {
+            "label": "POSITIVE",
+            "score": "0.948"
+          }
+        }
+      ],
+      "status_code": 200
+    }
+  ]
+}
+```
+
+When specifying the `input_map` for an externally hosted model, you can directly reference the `inputs` field instead of providing its dot path `parameters.inputs`:
+
+```json
+"input_map": [
+  {
+    "inputs": "query.term.label.value"
+  }
+]
+```
+
+**Step 2: Run the pipeline**
+
+Once you have created a search pipeline, you can run the same term query with the search pipeline:
+
+```json
+GET /my_index/_search?search_pipeline=ml_inference_pipeline
+{
+  "query": {
+    "term": {
+      "label": {
+        "value": "happy moments",
+        "boost": 1
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+The query term value is rewritten based on the model's output. The model determines that the sentiment of the query term is positive, so the rewritten query appears as follows:
+
+```json
+{
+  "query": {
+    "term": {
+      "label": {
+        "value": "POSITIVE",
+        "boost": 1
+      }
+    }
+  }
+}
+```
+
+The response includes the document whose `label` field has the value `POSITIVE`:
+
+```json
+{
+  "took": 288,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 0.00009405752,
+    "hits": [
+      {
+        "_index": "my_index",
+        "_id": "1",
+        "_score": 0.00009405752,
+        "_source": {
+          "passage_text": "I am excited",
+          "passage_language": "en",
+          "label": "POSITIVE"
+        }
+      }
+    ]
+  }
+}
+```
+
+### Example: Local model
+
+The following example shows you how to configure an `ml_inference` processor with a local model to rewrite a term query into a k-NN query.
+
+**Step 1: Create a pipeline**
+
+The following example shows you how to create a search pipeline for the `huggingface/sentence-transformers/all-distilroberta-v1` local model. The model is a [pretrained sentence transformer model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/#sentence-transformers) hosted in your OpenSearch cluster.
+
+If you invoke the model using the Predict API, then the request appears as follows:
+
+```json
+POST /_plugins/_ml/_predict/text_embedding/cleMb4kBJ1eYAeTMFFg4
+{
+  "text_docs": [
+    "today is sunny"
+  ],
+  "return_number": true,
+  "target_response": [
+    "sentence_embedding"
+  ]
+}
+```
+
+Using this schema, specify the `model_input` as follows:
+
+```json
+  "model_input": "{ \"text_docs\": ${input_map.text_docs}, \"return_number\": ${model_config.return_number}, \"target_response\": ${model_config.target_response} }"
+```
+
+In the `input_map`, map the `query.term.passage_embedding.value` query field to the `text_docs` field expected by the model:
+
+```json
+"input_map": [
+  {
+    "text_docs": "query.term.passage_embedding.value"
+  }
+]
+```
+
+Because you specified the field to be converted into embeddings as a JSON path, you need to set the `full_response_path` to `true`.
Then the full JSON document is parsed in order to obtain the input field: + +```json +"full_response_path": true +``` + +The text in the `query.term.passage_embedding.value` field will be used to generate embeddings: + +```json +{ + "text_docs": "happy passage" +} +``` + +The Predict API request returns the following response: + +```json +{ + "inference_results": [ + { + "output": [ + { + "name": "sentence_embedding", + "data_type": "FLOAT32", + "shape": [ + 768 + ], + "data": [ + 0.25517133, + -0.28009856, + 0.48519906, + ... + ] + } + ] + } + ] +} +``` + +The model generates embeddings in the `$.inference_results.*.output.*.data` field. The `output_map` maps this field to the query field in the query template: + +```json +"output_map": [ + { + "modelPredictionOutcome": "$.inference_results.*.output.*.data" + } +] +``` + +To configure an `ml_inference` search request processor with a local model, specify the `function_name` explicitly. In this example, the `function_name` is `text_embedding`. For information about valid `function_name` values, see [Configuration parameters](#configuration-parameters). + +The following is the final configuration of the `ml_inference` processor with the local model: + +```json +PUT /_search/pipeline/ml_inference_pipeline_local +{ + "description": "searchs reviews and generates embeddings", + "processors": [ + { + "ml_inference": { + "function_name": "text_embedding", + "full_response_path": true, + "model_id": "", + "model_config": { + "return_number": true, + "target_response": [ + "sentence_embedding" + ] + }, + "model_input": "{ \"text_docs\": ${input_map.text_docs}, \"return_number\": ${model_config.return_number}, \"target_response\": ${model_config.target_response} }", + "query_template": """{ + "size": 2, + "query": { + "knn": { + "passage_embedding": { + "vector": ${modelPredictionOutcome}, + "k": 5 + } + } + } + }""", + "input_map": [ + { + "text_docs": "query.term.passage_embedding.value" + } + ], + "output_map": [ + { + "modelPredictionOutcome": "$.inference_results.*.output.*.data" + } + ], + "ignore_missing": true, + "ignore_failure": true + } + } + ] +} +``` +{% include copy-curl.html %} + +**Step 2: Run the pipeline** + +Run the following query, providing the pipeline name in the request: + +```json +GET /my_index/_search?search_pipeline=ml_inference_pipeline_local +{ +"query": { + "term": { + "passage_embedding": { + "value": "happy passage" + } + } + } +} +``` +{% include copy-curl.html %} + +The response confirms that the processor ran a k-NN query, which returned document 1 with a higher score: + +```json +{ + "took": 288, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 2, + "relation": "eq" + }, + "max_score": 0.00009405752, + "hits": [ + { + "_index": "my_index", + "_id": "1", + "_score": 0.00009405752, + "_source": { + "passage_text": "I am excited", + "passage_language": "en", + "label": "POSITIVE", + "passage_embedding": [ + 2.3886719, + 0.032714844, + -0.22229004 + ...] + } + }, + { + "_index": "my_index", + "_id": "2", + "_score": 0.00001405052, + "_source": { + "passage_text": "I am sad", + "passage_language": "en", + "label": "NEGATIVE", + "passage_embedding": [ + 1.7773438, + 0.4309082, + 1.8857422, + 0.95996094, + ... 
+ ] + } + } + ] + } +} +``` diff --git a/_search-plugins/search-pipelines/ml-inference-search-response.md b/_search-plugins/search-pipelines/ml-inference-search-response.md new file mode 100644 index 0000000000..e2ed7889c7 --- /dev/null +++ b/_search-plugins/search-pipelines/ml-inference-search-response.md @@ -0,0 +1,391 @@ +--- +layout: default +title: ML inference (response) +nav_order: 40 +has_children: false +parent: Search processors +grand_parent: Search pipelines +--- + +# ML inference search response processor +Introduced 2.16 +{: .label .label-purple } + +The `ml_inference` search response processor is used to invoke registered machine learning (ML) models in order to incorporate their outputs as new fields in documents within search results. + +**PREREQUISITE**
+Before using the `ml_inference` search response processor, you must have either a local ML model hosted on your OpenSearch cluster or an externally hosted model connected to your OpenSearch cluster through the ML Commons plugin. For more information about local models, see [Using ML models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/). For more information about externally hosted models, see [Connecting to externally hosted models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/index/).
+{: .note}
+
+## Syntax
+
+The following is the syntax for the `ml_inference` search response processor:
+
+```json
+{
+  "ml_inference": {
+    "model_id": "<model_id>",
+    "function_name": "<function_name>",
+    "full_response_path": "<full_response_path>",
+    "model_config": {
+      "<model_config_field>": "<model_config_value>"
+    },
+    "model_input": "<model_input>",
+    "input_map": [
+      {
+        "<model_input_field>": "<document_field>"
+      }
+    ],
+    "output_map": [
+      {
+        "<new_document_field>": "<model_output_field_name>"
+      }
+    ],
+    "override": "<override>",
+    "one_to_one": false
+  }
+}
+```
+{% include copy-curl.html %}
+
+## Request fields
+
+The following table lists the required and optional parameters for the `ml_inference` search response processor.
+
+| Parameter | Data type | Required/Optional | Description |
+|:--| :--- | :--- |:---|
+| `model_id` | String | Required | The ID of the ML model used by the processor. |
+| `function_name` | String | Optional for externally hosted models

Required for local models | The function name of the ML model configured in the processor. For local models, valid values are `sparse_encoding`, `sparse_tokenize`, `text_embedding`, and `text_similarity`. For externally hosted models, the valid value is `remote`. Default is `remote`. |
+| `model_config` | Object | Optional | Custom configuration options for the ML model. For more information, see [The `model_config` object]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/register-model/#the-model_config-object). |
+| `model_input` | String | Optional for externally hosted models

Required for local models | A template that defines the input field format expected by the model. Each local model type might use a different set of inputs. For externally hosted models, the default is `"{ \"parameters\": ${ml_inference.parameters} }"`. |
+| `input_map` | Array | Optional for externally hosted models

Required for local models | An array specifying how to map document fields in the search response to the model input fields. Each element of the array is a map in the `"<model_input_field>": "<document_field>"` format and corresponds to one model invocation of a document field. If no input mapping is specified for an externally hosted model, then all document fields are passed to the model directly as input. The `input_map` size indicates the number of times the model is invoked (the number of Predict API requests). |
+| `<model_input_field>` | String | Optional for externally hosted models

Required for local models | The model input field name. |
+| `<document_field>` | String | Optional for externally hosted models

Required for local models | The name or JSON path of the document field in the search response used as the model input. |
+| `output_map` | Array | Optional for externally hosted models

Required for local models | An array specifying how to map the model output fields to new fields in the search response document. Each element of the array is a map in the `"<new_document_field>": "<model_output_field_name>"` format. |
+| `<new_document_field>` | String | Optional for externally hosted models

Required for local models | The name of the new field in the document in which the model's output (specified by `model_output`) is stored. If no output mapping is specified for externally hosted models, then all fields from the model output are added to the new document field. |
+| `<model_output_field_name>` | String | Optional for externally hosted models

Required for local models | The name or JSON path of the field in the model output to be stored in the `new_document_field`. |
+| `full_response_path` | Boolean | Optional | Set this parameter to `true` if the `model_output_field` contains a full JSON path to the field instead of the field name. The model output will then be fully parsed to get the value of the field. Default is `true` for local models and `false` for externally hosted models. |
+| `ignore_missing` | Boolean | Optional | If `true` and any of the input fields defined in the `input_map` or `output_map` are missing, then the missing fields are ignored. Otherwise, a missing field causes a failure. Default is `false`. |
+| `ignore_failure` | Boolean | Optional | Specifies whether the processor continues execution even if it encounters an error. If `true`, then any failure is ignored and the search continues. If `false`, then any failure causes the search to be canceled. Default is `false`. |
+| `override` | Boolean | Optional | Relevant if a document in the response already contains a field with the name specified in `<new_document_field>`. If `override` is `false`, then the input field is skipped. If `true`, then the existing field value is overridden by the new model output. Default is `false`. |
+| `max_prediction_tasks` | Integer | Optional | The maximum number of concurrent model invocations that can run during document search. Default is `10`. |
+| `one_to_one` | Boolean | Optional | Set this parameter to `true` to invoke the model once (make one Predict API request) for each document. By default (`false`), the model is invoked once with all documents from the search response, making one Predict API request. |
+| `description` | String | Optional | A brief description of the processor. |
+| `tag` | String | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. |
+
+The `input_map` and `output_map` mappings support standard [JSON path](https://github.com/json-path/JsonPath) notation for specifying complex data structures.
+{: .note}
+
+### Setup
+
+Create an index named `my_index` and index one document to explain the mappings:
+
+```json
+POST /my_index/_doc/1
+{
+  "passage_text": "hello world"
+}
+```
+{% include copy-curl.html %}
+
+## Using the processor
+
+Follow these steps to use the processor in a pipeline. You must provide a model ID when creating the processor. Before testing a pipeline using the processor, make sure that the model is successfully deployed. You can check the model state using the [Get Model API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/get-model/).
+
+For local models, you must provide a `model_input` field that specifies the model input format. Add any input fields in `model_config` to `model_input`.
+
+For externally hosted models, the `model_input` field is optional, and its default value is `"{ \"parameters\": ${ml_inference.parameters} }"`.
+
+### Example: Externally hosted model
+
+The following example shows you how to configure an `ml_inference` search response processor with an externally hosted model.
+
+**Step 1: Create a pipeline**
+
+The following example shows you how to create a search pipeline for an externally hosted text embedding model. The model requires an `input` field and generates results in a `data` field. It converts the text in the `passage_text` field into text embeddings and stores the embeddings in the `passage_embedding` field.
+The `function_name` is not explicitly specified in the processor configuration, so it defaults to `remote`, signifying an externally hosted model:
+
+```json
+PUT /_search/pipeline/ml_inference_pipeline
+{
+  "description": "Generate passage_embedding when searching documents",
+  "processors": [
+    {
+      "ml_inference": {
+        "model_id": "<model_id>",
+        "input_map": [
+          {
+            "input": "passage_text"
+          }
+        ],
+        "output_map": [
+          {
+            "passage_embedding": "data"
+          }
+        ]
+      }
+    }
+  ]
+}
+```
+{% include copy-curl.html %}
+
+When making a Predict API request to an externally hosted model, all necessary fields and parameters are usually contained within a `parameters` object:
+
+```json
+POST /_plugins/_ml/models/cleMb4kBJ1eYAeTMFFg4/_predict
+{
+  "parameters": {
+    "input": [
+      {
+        ...
+      }
+    ]
+  }
+}
+```
+
+When specifying the `input_map` for an externally hosted model, you can directly reference the `input` field instead of providing its dot path `parameters.input`:
+
+```json
+"input_map": [
+  {
+    "input": "passage_text"
+  }
+]
+```
+
+**Step 2: Run the pipeline**
+
+Run the following query, providing the pipeline name in the request:
+
+```json
+GET /my_index/_search?search_pipeline=ml_inference_pipeline
+{
+  "query": {
+    "match_all": {}
+  }
+}
+```
+{% include copy-curl.html %}
+
+The response confirms that the processor has generated text embeddings in the `passage_embedding` field. The document within `_source` now contains both the `passage_text` and `passage_embedding` fields:
+
+```json
+{
+  "took": 288,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 0.00009405752,
+    "hits": [
+      {
+        "_index": "my_index",
+        "_id": "1",
+        "_score": 0.00009405752,
+        "_source": {
+          "passage_text": "hello world",
+          "passage_embedding": [
+            0.017304314,
+            -0.021530833,
+            0.050184276,
+            0.08962978,
+            ...]
+        }
+      }
+    ]
+  }
+}
+```
+
+### Example: Local model
+
+The following example shows you how to configure an `ml_inference` search response processor with a local model.
+
+**Step 1: Create a pipeline**
+
+The following example shows you how to create a search pipeline for the `huggingface/sentence-transformers/all-distilroberta-v1` local model. The model is a [pretrained sentence transformer model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/#sentence-transformers) hosted in your OpenSearch cluster.
+
+If you invoke the model using the Predict API, then the request appears as follows:
+
+```json
+POST /_plugins/_ml/_predict/text_embedding/cleMb4kBJ1eYAeTMFFg4
+{
+  "text_docs": [ "today is sunny" ],
+  "return_number": true,
+  "target_response": [ "sentence_embedding" ]
+}
+```
+
+Using this schema, specify the `model_input` as follows:
+
+```json
+  "model_input": "{ \"text_docs\": ${input_map.text_docs}, \"return_number\": ${model_config.return_number}, \"target_response\": ${model_config.target_response} }"
+```
+
+In the `input_map`, map the `passage_text` document field to the `text_docs` field expected by the model:
+
+```json
+"input_map": [
+  {
+    "text_docs": "passage_text"
+  }
+]
+```
+
+Because you specified the field to be converted into embeddings as a JSON path, you need to set the `full_response_path` to `true`.
Then the full JSON document is parsed in order to obtain the input field: + +```json +"full_response_path": true +``` + +The text in the `passage_text` field will be used to generate embeddings: + +```json +{ + "passage_text": "hello world" +} +``` + +The Predict API request returns the following response: + +```json +{ + "inference_results" : [ + { + "output" : [ + { + "name" : "sentence_embedding", + "data_type" : "FLOAT32", + "shape" : [ + 768 + ], + "data" : [ + 0.25517133, + -0.28009856, + 0.48519906, + ... + ] + } + ] + } + ] +} +``` + +The model generates embeddings in the `$.inference_results.*.output.*.data` field. The `output_map` maps this field to the newly created `passage_embedding` field in the search response document: + +```json +"output_map": [ + { + "passage_embedding": "$.inference_results.*.output.*.data" + } +] +``` + +To configure an `ml_inference` search response processor with a local model, specify the `function_name` explicitly. In this example, the `function_name` is `text_embedding`. For information about valid `function_name` values, see [Request fields](#request-fields). + +The following is the final configuration of the `ml_inference` search response processor with the local model: + +```json +PUT /_search/pipeline/ml_inference_pipeline_local +{ + "description": "search passage and generates embeddings", + "processors": [ + { + "ml_inference": { + "function_name": "text_embedding", + "full_response_path": true, + "model_id": "", + "model_config": { + "return_number": true, + "target_response": ["sentence_embedding"] + }, + "model_input": "{ \"text_docs\": ${input_map.text_docs}, \"return_number\": ${model_config.return_number}, \"target_response\": ${model_config.target_response} }", + "input_map": [ + { + "text_docs": "passage_text" + } + ], + "output_map": [ + { + "passage_embedding": "$.inference_results.*.output.*.data" + } + ], + "ignore_missing": true, + "ignore_failure": true + } + } + ] +} +``` +{% include copy-curl.html %} + +**Step 2: Run the pipeline** + +Run the following query, providing the pipeline name in the request: + +```json +GET /my_index/_search?search_pipeline=ml_inference_pipeline_local +{ +"query": { + "term": { + "passage_text": { + "value": "hello" + } + } + } +} +``` +{% include copy-curl.html %} + +#### Response + +The response confirms that the processor has generated text embeddings in the `passage_embedding` field: + +```json +{ + "took": 288, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 1, + "relation": "eq" + }, + "max_score": 0.00009405752, + "hits": [ + { + "_index": "my_index", + "_id": "1", + "_score": 0.00009405752, + "_source": { + "passage_text": "hello world", + "passage_embedding": [ + 0.017304314, + -0.021530833, + 0.050184276, + 0.08962978, + ...] 
+ } + } + ] + } +} +``` \ No newline at end of file diff --git a/_search-plugins/search-pipelines/neural-query-enricher.md b/_search-plugins/search-pipelines/neural-query-enricher.md index e187ea17a9..683eaa7b85 100644 --- a/_search-plugins/search-pipelines/neural-query-enricher.md +++ b/_search-plugins/search-pipelines/neural-query-enricher.md @@ -1,13 +1,15 @@ --- layout: default title: Neural query enricher -nav_order: 12 +nav_order: 50 has_children: false parent: Search processors grand_parent: Search pipelines --- # Neural query enricher processor +Introduced 2.11 +{: .label .label-purple } The `neural_query_enricher` search request processor is designed to set a default machine learning (ML) model ID at the index or field level for [neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/) queries. To learn more about ML models, see [Using ML models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/) and [Connecting to remote models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/index/). diff --git a/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md b/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md index de36225a99..3ba1e21405 100644 --- a/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md +++ b/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md @@ -1,7 +1,7 @@ --- layout: default title: Neural sparse query two-phase -nav_order: 13 +nav_order: 60 parent: Search processors grand_parent: Search pipelines --- diff --git a/_search-plugins/search-pipelines/normalization-processor.md b/_search-plugins/search-pipelines/normalization-processor.md index a8fad2e40d..ac29b079f1 100644 --- a/_search-plugins/search-pipelines/normalization-processor.md +++ b/_search-plugins/search-pipelines/normalization-processor.md @@ -1,13 +1,15 @@ --- layout: default title: Normalization -nav_order: 15 +nav_order: 70 has_children: false parent: Search processors grand_parent: Search pipelines --- # Normalization processor +Introduced 2.10 +{: .label .label-purple } The `normalization-processor` is a search phase results processor that runs between the query and fetch phases of search execution. It intercepts the query phase results and then normalizes and combines the document scores from different query clauses before passing the documents to the fetch phase. diff --git a/_search-plugins/search-pipelines/oversample-processor.md b/_search-plugins/search-pipelines/oversample-processor.md index 698d9572cf..81f4252f3d 100644 --- a/_search-plugins/search-pipelines/oversample-processor.md +++ b/_search-plugins/search-pipelines/oversample-processor.md @@ -1,13 +1,15 @@ --- layout: default title: Oversample -nav_order: 17 +nav_order: 80 has_children: false parent: Search processors grand_parent: Search pipelines --- # Oversample processor +Introduced 2.12 +{: .label .label-purple } The `oversample` request processor multiplies the `size` parameter of the search request by a specified `sample_factor` (>= 1.0), saving the original value in the `original_size` pipeline variable. The `oversample` processor is designed to work with the [`truncate_hits` response processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/truncate-hits-processor/) but may be used on its own. 
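
For reference, a minimal search pipeline pairing the two processors might look like the following sketch. The pipeline name and `sample_factor` value are illustrative, and the `truncate_hits` processor is assumed to fall back to the `original_size` pipeline variable when no explicit target size is configured:

```json
PUT /_search/pipeline/oversample_then_truncate
{
  "request_processors": [
    {
      "oversample": {
        "sample_factor": 2.0
      }
    }
  ],
  "response_processors": [
    {
      "truncate_hits": {}
    }
  ]
}
```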
diff --git a/_search-plugins/search-pipelines/personalize-search-ranking.md b/_search-plugins/search-pipelines/personalize-search-ranking.md index c7a7dd8dde..b63ba4b966 100644 --- a/_search-plugins/search-pipelines/personalize-search-ranking.md +++ b/_search-plugins/search-pipelines/personalize-search-ranking.md @@ -8,6 +8,8 @@ grand_parent: Search pipelines --- # Personalize search ranking processor +Introduced 2.9 +{: .label .label-purple } The `personalize_search_ranking` search response processor intercepts a search response and uses [Amazon Personalize](https://aws.amazon.com/personalize/) to rerank search results according to their Amazon Personalize ranking. This ranking is based on the user's past behavior and metadata about the search items and the user. diff --git a/_search-plugins/search-pipelines/rag-processor.md b/_search-plugins/search-pipelines/rag-processor.md index 7137134aff..60257ebd05 100644 --- a/_search-plugins/search-pipelines/rag-processor.md +++ b/_search-plugins/search-pipelines/rag-processor.md @@ -1,13 +1,15 @@ --- layout: default title: Retrieval-augmented generation -nav_order: 18 +nav_order: 90 has_children: false parent: Search processors grand_parent: Search pipelines --- # Retrieval-augmented generation processor +Introduced 2.12 +{: .label .label-purple } The `retrieval_augmented_generation` processor is a search results processor that you can use in [conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/) for retrieval-augmented generation (RAG). The processor intercepts query results, retrieves previous messages from the conversation from the conversational memory, and sends a prompt to a large language model (LLM). After the processor receives a response from the LLM, it saves the response in conversational memory and returns both the original OpenSearch query results and the LLM response. diff --git a/_search-plugins/search-pipelines/rename-field-processor.md b/_search-plugins/search-pipelines/rename-field-processor.md index cb01125df5..9c734af656 100644 --- a/_search-plugins/search-pipelines/rename-field-processor.md +++ b/_search-plugins/search-pipelines/rename-field-processor.md @@ -1,13 +1,15 @@ --- layout: default title: Rename field -nav_order: 20 +nav_order: 100 has_children: false parent: Search processors grand_parent: Search pipelines --- # Rename field processor +Introduced 2.8 +{: .label .label-purple } The `rename_field` search response processor intercepts a search response and renames the specified field. This is useful when your index and your application use different names for the same field. For example, if you rename a field in your index, the `rename_field` processor can change the new name to the old one before sending the response to your application. diff --git a/_search-plugins/search-pipelines/rerank-processor.md b/_search-plugins/search-pipelines/rerank-processor.md index 73bacd35c9..313ae5f74d 100644 --- a/_search-plugins/search-pipelines/rerank-processor.md +++ b/_search-plugins/search-pipelines/rerank-processor.md @@ -1,13 +1,15 @@ --- layout: default title: Rerank -nav_order: 25 +nav_order: 110 has_children: false parent: Search processors grand_parent: Search pipelines --- # Rerank processor +Introduced 2.12 +{: .label .label-purple } The `rerank` search request processor intercepts search results and passes them to a cross-encoder model to be reranked. The model reranks the results, taking into account the scoring context. 
Then the processor orders documents in the search results based on their new scores. diff --git a/_search-plugins/search-pipelines/script-processor.md b/_search-plugins/search-pipelines/script-processor.md index e1e629e398..1fd1d08e57 100644 --- a/_search-plugins/search-pipelines/script-processor.md +++ b/_search-plugins/search-pipelines/script-processor.md @@ -1,13 +1,15 @@ --- layout: default title: Script -nav_order: 30 +nav_order: 120 has_children: false parent: Search processors grand_parent: Search pipelines --- # Script processor +Introduced 2.8 +{: .label .label-purple } The `script` search request processor intercepts a search request and adds an inline Painless script that is run on incoming requests. The script can only run on the following request fields: diff --git a/_search-plugins/search-pipelines/search-processors.md b/_search-plugins/search-pipelines/search-processors.md index ad515cc541..d696859a78 100644 --- a/_search-plugins/search-pipelines/search-processors.md +++ b/_search-plugins/search-pipelines/search-processors.md @@ -24,10 +24,11 @@ The following table lists all supported search request processors. Processor | Description | Earliest available version :--- | :--- | :--- [`filter_query`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/filter-query-processor/) | Adds a filtering query that is used to filter requests. | 2.8 -[`neural_query_enricher`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) | Sets a default model for neural search and neural sparse search at the index or field level. | 2.11(neural), 2.13(neural sparse) -[`script`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/script-processor/) | Adds a script that is run on newly indexed documents. | 2.8 +[`ml_inference`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/ml-inference-search-request/) | Invokes registered machine learning (ML) models in order to rewrite queries. | 2.16 +[`neural_query_enricher`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) | Sets a default model for neural search and neural sparse search at the index or field level. | 2.11 (neural), 2.13 (neural sparse) +[`neural_sparse_two_phase_processor`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-sparse-query-two-phase-processor/) | Accelerates the neural sparse query. | 2.15 [`oversample`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/oversample-processor/) | Increases the search request `size` parameter, storing the original value in the pipeline state. | 2.12 - +[`script`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/script-processor/) | Adds a script that is run on newly indexed documents. | 2.8 ## Search response processors @@ -38,6 +39,7 @@ The following table lists all supported search response processors. Processor | Description | Earliest available version :--- | :--- | :--- [`collapse`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/collapse-processor/)| Deduplicates search hits based on a field value, similarly to `collapse` in a search request. | 2.12 +[`ml_inference`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/ml-inference-search-response/) | Invokes registered machine learning (ML) models in order to incorporate model output as additional search response fields. 
| 2.16 [`personalize_search_ranking`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/personalize-search-ranking/) | Uses [Amazon Personalize](https://aws.amazon.com/personalize/) to rerank search results (requires setting up the Amazon Personalize service). | 2.9 [`rename_field`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rename-field-processor/)| Renames an existing field. | 2.8 [`rerank`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/)| Reranks search results using a cross-encoder model. | 2.12 diff --git a/_search-plugins/search-pipelines/sort-processor.md b/_search-plugins/search-pipelines/sort-processor.md index dde05c1b3a..6df2352c1e 100644 --- a/_search-plugins/search-pipelines/sort-processor.md +++ b/_search-plugins/search-pipelines/sort-processor.md @@ -1,13 +1,15 @@ --- layout: default title: Sort -nav_order: 32 +nav_order: 130 has_children: false parent: Search processors grand_parent: Search pipelines --- # Sort processor +Introduced 2.16 +{: .label .label-purple } The `sort` processor sorts an array of items in either ascending or descending order. Numeric arrays are sorted numerically, while string or mixed arrays (strings and numbers) are sorted lexicographically. The processor throws an error if the input is not an array. diff --git a/_search-plugins/search-pipelines/split-processor.md b/_search-plugins/search-pipelines/split-processor.md index 6830f81ec3..4afe49e6d2 100644 --- a/_search-plugins/search-pipelines/split-processor.md +++ b/_search-plugins/search-pipelines/split-processor.md @@ -1,13 +1,15 @@ --- layout: default title: Split -nav_order: 33 +nav_order: 140 has_children: false parent: Search processors grand_parent: Search pipelines --- # Split processor +Introduced 2.16 +{: .label .label-purple } The `split` processor splits a string field into an array of substrings based on a specified delimiter. diff --git a/_search-plugins/search-pipelines/truncate-hits-processor.md b/_search-plugins/search-pipelines/truncate-hits-processor.md index 871879efe3..7bba627734 100644 --- a/_search-plugins/search-pipelines/truncate-hits-processor.md +++ b/_search-plugins/search-pipelines/truncate-hits-processor.md @@ -1,13 +1,15 @@ --- layout: default title: Truncate hits -nav_order: 35 +nav_order: 150 has_children: false parent: Search processors grand_parent: Search pipelines --- # Truncate hits processor +Introduced 2.12 +{: .label .label-purple } The `truncate_hits` response processor discards returned search hits after a given hit count is reached. The `truncate_hits` processor is designed to work with the [`oversample` request processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/oversample-processor/) but may be used on its own. 
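
Used together, the two processors might be configured as in the following minimal sketch (the pipeline name is illustrative; with no explicit `target_size`, `truncate_hits` is expected to fall back to the `original_size` value saved by `oversample`):

```json
PUT /_search/pipeline/oversample_then_truncate
{
  "request_processors": [
    {
      "oversample": {
        "sample_factor": 2.0
      }
    }
  ],
  "response_processors": [
    {
      "truncate_hits": {}
    }
  ]
}
```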
From ae9ab080ea0a0968485a2057646fab234ec65508 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Mon, 5 Aug 2024 10:51:38 -0400 Subject: [PATCH 094/154] Refactor k-NN documentation (#7890) * Refactor k-NN documentation Signed-off-by: Fanit Kolchina * Change field name for cohesiveness Signed-off-by: Fanit Kolchina * Apply suggestions from code review Co-authored-by: Heather Halter Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Heather Halter Co-authored-by: Nathan Bower --- _search-plugins/knn/approximate-knn.md | 98 +++++++---------------- _search-plugins/knn/knn-score-script.md | 70 +++++----------- _search-plugins/knn/painless-functions.md | 5 +- 3 files changed, 51 insertions(+), 122 deletions(-) diff --git a/_search-plugins/knn/approximate-knn.md b/_search-plugins/knn/approximate-knn.md index 0b5a48059b..e9cff8562f 100644 --- a/_search-plugins/knn/approximate-knn.md +++ b/_search-plugins/knn/approximate-knn.md @@ -76,8 +76,9 @@ PUT my-knn-index-1 } } ``` +{% include copy-curl.html %} -In the example above, both `knn_vector` fields are configured from method definitions. Additionally, `knn_vector` fields can also be configured from models. You can learn more about this in the [knn_vector data type]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/) section. +In the preceding example, both `knn_vector` fields are configured using method definitions. Additionally, `knn_vector` fields can be configured using models. For more information, see [k-NN vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/). The `knn_vector` data type supports a vector of floats that can have a dimension count of up to 16,000 for the NMSLIB, Faiss, and Lucene engines, as set by the dimension mapping parameter. @@ -106,8 +107,8 @@ POST _bulk { "my_vector2": [4.5, 5.5, 6.7, 3.7], "price": 4.4 } { "index": { "_index": "my-knn-index-1", "_id": "9" } } { "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 8.9 } - ``` +{% include copy-curl.html %} Then you can execute an approximate nearest neighbor search on the data using the `knn` query type: @@ -125,6 +126,7 @@ GET my-knn-index-1/_search } } ``` +{% include copy-curl.html %} ### The number of returned results @@ -148,10 +150,9 @@ Starting in OpenSearch 2.14, you can use `k`, `min_score`, or `max_distance` for ### Building a k-NN index from a model -For some of the algorithms that we support, the native library index needs to be trained before it can be used. It would be expensive to training every newly created segment, so, instead, we introduce the concept of a *model* that is used to initialize the native library index during segment creation. A *model* is created by calling the [Train API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-a-model), passing in the source of training data as well as the method definition of the model. Once training is complete, the model will be serialized to a k-NN model system index. Then, during indexing, the model is pulled from this index to initialize the segments. +For some of the algorithms that the k-NN plugin supports, the native library index needs to be trained before it can be used. 
It would be expensive to train every newly created segment, so, instead, the plugin features the concept of a *model* that initializes the native library index during segment creation. You can create a model by calling the [Train API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-a-model) and passing in the source of the training data and the method definition of the model. Once training is complete, the model is serialized to a k-NN model system index. Then, during indexing, the model is pulled from this index to initialize the segments. -To train a model, we first need an OpenSearch index with training data in it. Training data can come from -any `knn_vector` field that has a dimension matching the dimension of the model you want to create. Training data can be the same data that you are going to index or have in a separate set. Let's create a training index: +To train a model, you first need an OpenSearch index containing training data. Training data can come from any `knn_vector` field that has a dimension matching the dimension of the model you want to create. Training data can be the same data that you are going to index or data in a separate set. To create a training index, send the following request: ```json PUT /train-index @@ -170,6 +171,7 @@ PUT /train-index } } ``` +{% include copy-curl.html %} Notice that `index.knn` is not set in the index settings. This ensures that you do not create native library indexes for this index. @@ -186,8 +188,9 @@ POST _bulk { "index": { "_index": "train-index", "_id": "4" } } { "train-field": [1.5, 5.5, 4.5, 6.4]} ``` +{% include copy-curl.html %} -After indexing into the training index completes, we can call the Train API: +After indexing into the training index completes, you can call the Train API: ```json POST /_plugins/_knn/models/my-model/_train @@ -207,8 +210,9 @@ POST /_plugins/_knn/models/my-model/_train } } ``` +{% include copy-curl.html %} -The Train API will return as soon as the training job is started. To check its status, we can use the Get Model API: +The Train API returns as soon as the training job is started. To check the job status, use the Get Model API: ```json GET /_plugins/_knn/models/my-model?filter_path=state&pretty @@ -216,9 +220,9 @@ GET /_plugins/_knn/models/my-model?filter_path=state&pretty "state": "training" } ``` +{% include copy-curl.html %} -Once the model enters the "created" state, you can create an index that will use this model to initialize its native -library indexes: +Once the model enters the `created` state, you can create an index that will use this model to initialize its native library indexes: ```json PUT /target-index @@ -238,8 +242,10 @@ PUT /target-index } } ``` +{% include copy-curl.html %} + +Lastly, you can add the documents you want to be searched to the index: -Lastly, we can add the documents we want to be searched to the index: ```json POST _bulk { "index": { "_index": "target-index", "_id": "1" } } @@ -250,8 +256,8 @@ POST _bulk { "target-field": [4.5, 5.5, 6.7, 3.7]} { "index": { "_index": "target-index", "_id": "4" } } { "target-field": [1.5, 5.5, 4.5, 6.4]} -... ``` +{% include copy-curl.html %} After data is ingested, it can be searched in the same way as any other `knn_vector` field. 
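
A query against the index created from the model follows the same pattern as any other approximate k-NN search. The following is a minimal sketch that reuses `target-index` and `target-field` from the preceding steps:

```json
GET /target-index/_search
{
  "size": 2,
  "query": {
    "knn": {
      "target-field": {
        "vector": [2, 3, 5, 6],
        "k": 2
      }
    }
  }
}
```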
@@ -265,7 +271,7 @@ GET my-knn-index-1/_search "size": 2, "query": { "knn": { - "my_vector2": { + "target-field": { "vector": [2, 3, 5, 6], "k": 2, "method_parameters" : { @@ -294,7 +300,7 @@ Engine | Radial query support | Notes #### `nprobes` -You can provide the `nprobes` parameter when searching an index created using the `ivf` method. The `nprobes` parameter specifies the number of `nprobes` clusters to examine in order to find the top k nearest neighbors. Higher `nprobes` values improve recall at the cost of increased search latency. The value must be positive. +You can provide the `nprobes` parameter when searching an index created using the `ivf` method. The `nprobes` parameter specifies the number of buckets to examine in order to find the top k nearest neighbors. Higher `nprobes` values improve recall at the cost of increased search latency. The value must be positive. The following table provides information about the `nprobes` parameter for the supported engines. @@ -320,68 +326,24 @@ To learn more about using binary vectors with k-NN search, see [Binary k-NN vect ## Spaces -A space corresponds to the function used to measure the distance between two points in order to determine the k-nearest neighbors. From the k-NN perspective, a lower score equates to a closer and better result. This is the opposite of how OpenSearch scores results, where a greater score equates to a better result. To convert distances to OpenSearch scores, we take 1 / (1 + distance). The k-NN plugin supports the following spaces. +A _space_ corresponds to the function used to measure the distance between two points in order to determine the k-nearest neighbors. From the k-NN perspective, a lower score equates to a closer and better result. This is the opposite of how OpenSearch scores results, where a higher score equates to a better result. The k-NN plugin supports the following spaces. Not every method supports each of these spaces. Be sure to check out [the method documentation]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#method-definitions) to make sure the space you are interested in is supported. {: note.} +| Space type | Distance function ($$d$$ ) | OpenSearch score | +| :--- | :--- | :--- | +| `l1` | $$ d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^n \lvert x_i - y_i \rvert $$ | $$ score = {1 \over {1 + d} } $$ | +| `l2` | $$ d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^n (x_i - y_i)^2 $$ | $$ score = {1 \over 1 + d } $$ | +| `linf` | $$ d(\mathbf{x}, \mathbf{y}) = max(\lvert x_i - y_i \rvert) $$ | $$ score = {1 \over 1 + d } $$ | +| `cosinesimil` | $$ d(\mathbf{x}, \mathbf{y}) = 1 - cos { \theta } = 1 - {\mathbf{x} \cdot \mathbf{y} \over \lVert \mathbf{x}\rVert \cdot \lVert \mathbf{y}\rVert}$$$$ = 1 - {\sum_{i=1}^n x_i y_i \over \sqrt{\sum_{i=1}^n x_i^2} \cdot \sqrt{\sum_{i=1}^n y_i^2}}$$,
where $$\lVert \mathbf{x}\rVert$$ and $$\lVert \mathbf{y}\rVert$$ represent the norms of vectors $$\mathbf{x}$$ and $$\mathbf{y}$$, respectively. | **NMSLIB** and **Faiss**:<br>$$ score = {1 \over 1 + d } $$<br><br>**Lucene**:<br>$$ score = {2 - d \over 2}$$ |
+| `innerproduct` (supported for Lucene in OpenSearch version 2.13 and later) | **NMSLIB** and **Faiss**:<br>$$ d(\mathbf{x}, \mathbf{y}) = - {\mathbf{x} \cdot \mathbf{y}} = - \sum_{i=1}^n x_i y_i $$<br><br>**Lucene**:<br>$$ d(\mathbf{x}, \mathbf{y}) = {\mathbf{x} \cdot \mathbf{y}} = \sum_{i=1}^n x_i y_i $$ | **NMSLIB** and **Faiss**:<br>$$ \text{If } d \ge 0, score = {1 \over 1 + d }$$<br>$$ \text{If } d < 0, score = -d + 1$$<br><br>**Lucene**:<br>$$ \text{If } d > 0, score = d + 1 $$<br>$$ \text{If } d \le 0, score = {1 \over 1 + (-1 \cdot d) }$$ |
+| `hamming` (supported for binary vectors in OpenSearch version 2.16 and later) | $$ d(\mathbf{x}, \mathbf{y}) = \text{countSetBits}(\mathbf{x} \oplus \mathbf{y})$$ | $$ score = {1 \over 1 + d } $$ |
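
As a quick check of the conversion, an `l2` distance of $$d = 3$$ produces $$score = {1 \over 1 + 3} = 0.25$$, while a `cosinesimil` distance of $$d = 0.2$$ produces $$score = {2 - 0.2 \over 2} = 0.9$$ on the Lucene engine.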

-Space type | Distance function (d) | OpenSearch score
-l1 | \[ d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^n |x_i - y_i| \] | \[ score = {1 \over 1 + d } \]
-l2 | \[ d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^n (x_i - y_i)^2 \] | \[ score = {1 \over 1 + d } \]
-linf | \[ d(\mathbf{x}, \mathbf{y}) = max(|x_i - y_i|) \] | \[ score = {1 \over 1 + d } \]
-cosinesimil | \[ d(\mathbf{x}, \mathbf{y}) = 1 - cos { \theta } = 1 - {\mathbf{x} \cdot \mathbf{y} \over \|\mathbf{x}\| \cdot \|\mathbf{y}\|} = 1 - {\sum_{i=1}^n x_i y_i \over \sqrt{\sum_{i=1}^n x_i^2} \cdot \sqrt{\sum_{i=1}^n y_i^2}} \], where \(\|\mathbf{x}\|\) and \(\|\mathbf{y}\|\) represent the norms of vectors x and y respectively | nmslib and faiss: \[ score = {1 \over 1 + d } \] Lucene: \[ score = {2 - d \over 2}\]
-innerproduct (supported for Lucene in OpenSearch version 2.13 and later) | \[ d(\mathbf{x}, \mathbf{y}) = - {\mathbf{x} \cdot \mathbf{y}} = - \sum_{i=1}^n x_i y_i \] Lucene: \[ d(\mathbf{x}, \mathbf{y}) = {\mathbf{x} \cdot \mathbf{y}} = \sum_{i=1}^n x_i y_i \] | \[ \text{If } d \ge 0, score = {1 \over 1 + d } \] \[ \text{If } d < 0, score = -d + 1 \] Lucene: \[ \text{If } d > 0, score = d + 1 \] \[ \text{If } d \le 0, score = {1 \over 1 + (-1 \cdot d) } \]
-hamming (supported for binary vectors in OpenSearch version 2.16 and later) | \[ d(\mathbf{x}, \mathbf{y}) = \text{countSetBits}(\mathbf{x} \oplus \mathbf{y})\] | \[ score = {1 \over 1 + d } \]
- -The cosine similarity formula does not include the `1 -` prefix. However, because similarity search libraries equates -smaller scores with closer results, they return `1 - cosineSimilarity` for cosine similarity space---that's why `1 -` is -included in the distance function. +The cosine similarity formula does not include the `1 -` prefix. However, because similarity search libraries equate lower scores with closer results, they return `1 - cosineSimilarity` for the cosine similarity space---this is why `1 -` is included in the distance function. {: .note } -With cosine similarity, it is not valid to pass a zero vector (`[0, 0, ...]`) as input. This is because the magnitude of -such a vector is 0, which raises a `divide by 0` exception in the corresponding formula. Requests -containing the zero vector will be rejected and a corresponding exception will be thrown. +With cosine similarity, it is not valid to pass a zero vector (`[0, 0, ...]`) as input. This is because the magnitude of such a vector is 0, which raises a `divide by 0` exception in the corresponding formula. Requests containing the zero vector will be rejected, and a corresponding exception will be thrown. {: .note } The `hamming` space type is supported for binary vectors in OpenSearch version 2.16 and later. For more information, see [Binary k-NN vectors]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#binary-k-nn-vectors). diff --git a/_search-plugins/knn/knn-score-script.md b/_search-plugins/knn/knn-score-script.md index 1a21f49513..d2fd883e74 100644 --- a/_search-plugins/knn/knn-score-script.md +++ b/_search-plugins/knn/knn-score-script.md @@ -38,6 +38,7 @@ PUT my-knn-index-1 } } ``` +{% include copy-curl.html %} If you *only* want to use the score script, you can omit `"index.knn": true`. The benefit of this approach is faster indexing speed and lower memory usage, but you lose the ability to perform standard k-NN queries on the index. {: .tip} @@ -64,8 +65,8 @@ POST _bulk { "my_vector2": [4.5, 5.5, 6.7, 3.7], "price": 4.4 } { "index": { "_index": "my-knn-index-1", "_id": "9" } } { "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 8.9 } - ``` +{% include copy-curl.html %} Finally, you can execute an exact nearest neighbor search on the data using the `knn` script: ```json @@ -90,6 +91,7 @@ GET my-knn-index-1/_search } } ``` +{% include copy-curl.html %} All parameters are required. @@ -122,6 +124,7 @@ PUT my-knn-index-2 } } ``` +{% include copy-curl.html %} Then add some documents: @@ -139,8 +142,8 @@ POST _bulk { "my_vector": [20, 20], "color" : "BLUE" } { "index": { "_index": "my-knn-index-2", "_id": "6" } } { "my_vector": [30, 30], "color" : "BLUE" } - ``` +{% include copy-curl.html %} Finally, use the `script_score` query to pre-filter your documents before identifying nearest neighbors: @@ -172,6 +175,7 @@ GET my-knn-index-2/_search } } ``` +{% include copy-curl.html %} ## Getting started with the score script for binary data The k-NN score script also allows you to run k-NN search on your binary data with the Hamming distance space. 
@@ -195,6 +199,7 @@ PUT my-index } } ``` +{% include copy-curl.html %} Then add some documents: @@ -212,8 +217,8 @@ POST _bulk { "my_binary": "QSBjb3VwbGUgbW9yZSBkb2NzLi4u", "color" : "BLUE" } { "index": { "_index": "my-index", "_id": "6" } } { "my_binary": "TGFzdCBvbmUh", "color" : "BLUE" } - ``` +{% include copy-curl.html %} Finally, use the `script_score` query to pre-filter your documents before identifying nearest neighbors: @@ -245,6 +250,7 @@ GET my-index/_search } } ``` +{% include copy-curl.html %} Similarly, you can encode your data with the `long` field and run a search: @@ -276,58 +282,20 @@ GET my-long-index/_search } } ``` +{% include copy-curl.html %} ## Spaces -A space corresponds to the function used to measure the distance between two points in order to determine the k-nearest neighbors. From the k-NN perspective, a lower score equates to a closer and better result. This is the opposite of how OpenSearch scores results, where a greater score equates to a better result. The following table illustrates how OpenSearch converts spaces to scores: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-spaceType | Distance Function (d) | OpenSearch Score
-l1 | \[ d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^n |x_i - y_i| \] | \[ score = {1 \over 1 + d } \]
-l2 | \[ d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^n (x_i - y_i)^2 \] | \[ score = {1 \over 1 + d } \]
-linf | \[ d(\mathbf{x}, \mathbf{y}) = max(|x_i - y_i|) \] | \[ score = {1 \over 1 + d } \]
-cosinesimil | \[ d(\mathbf{x}, \mathbf{y}) = 1 - cos { \theta } = 1 - {\mathbf{x} \cdot \mathbf{y} \over \|\mathbf{x}\| \cdot \|\mathbf{y}\|} = 1 - {\sum_{i=1}^n x_i y_i \over \sqrt{\sum_{i=1}^n x_i^2} \cdot \sqrt{\sum_{i=1}^n y_i^2}} \], where \(\|\mathbf{x}\|\) and \(\|\mathbf{y}\|\) represent the norms of vectors x and y respectively | \[ score = 2 - d \]
-innerproduct (supported for Lucene in OpenSearch version 2.13 and later) | \[ d(\mathbf{x}, \mathbf{y}) = - {\mathbf{x} \cdot \mathbf{y}} = - \sum_{i=1}^n x_i y_i \] | \[ \text{If } d \ge 0, score = {1 \over 1 + d } \] \[ \text{If } d < 0, score = -d + 1 \]
-hammingbit (supported for binary and long vectors); hamming (supported for binary vectors in OpenSearch version 2.16 and later) | \[ d(\mathbf{x}, \mathbf{y}) = \text{countSetBits}(\mathbf{x} \oplus \mathbf{y})\] | \[ score = {1 \over 1 + d } \]
+A _space_ corresponds to the function used to measure the distance between two points in order to determine the k-nearest neighbors. From the k-NN perspective, a lower score equates to a closer and better result. This is the opposite of how OpenSearch scores results, where a higher score equates to a better result. The following table illustrates how OpenSearch converts spaces to scores. +| Space type | Distance function ($$d$$ ) | OpenSearch score | +| :--- | :--- | :--- | +| `l1` | $$ d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^n \lvert x_i - y_i \rvert $$ | $$ score = {1 \over {1 + d} } $$ | +| `l2` | $$ d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^n (x_i - y_i)^2 $$ | $$ score = {1 \over 1 + d } $$ | +| `linf` | $$ d(\mathbf{x}, \mathbf{y}) = max(\lvert x_i - y_i \rvert) $$ | $$ score = {1 \over 1 + d } $$ | +| `cosinesimil` | $$ d(\mathbf{x}, \mathbf{y}) = 1 - cos { \theta } = 1 - {\mathbf{x} \cdot \mathbf{y} \over \lVert \mathbf{x}\rVert \cdot \lVert \mathbf{y}\rVert}$$$$ = 1 - {\sum_{i=1}^n x_i y_i \over \sqrt{\sum_{i=1}^n x_i^2} \cdot \sqrt{\sum_{i=1}^n y_i^2}}$$,
where $$\lVert \mathbf{x}\rVert$$ and $$\lVert \mathbf{y}\rVert$$ represent the norms of vectors $$\mathbf{x}$$ and $$\mathbf{y}$$, respectively. | $$ score = 2 - d $$ |
+| `innerproduct` (supported for Lucene in OpenSearch version 2.13 and later) | $$ d(\mathbf{x}, \mathbf{y}) = - {\mathbf{x} \cdot \mathbf{y}} = - \sum_{i=1}^n x_i y_i $$ | $$ \text{If } d \ge 0, score = {1 \over 1 + d }$$<br>$$ \text{If } d < 0, score = -d + 1$$ |
+| `hammingbit` (supported for binary and long vectors)

`hamming` (supported for binary vectors in OpenSearch version 2.16 and later) | $$ d(\mathbf{x}, \mathbf{y}) = \text{countSetBits}(\mathbf{x} \oplus \mathbf{y})$$ | $$ score = {1 \over 1 + d } $$ | Cosine similarity returns a number between -1 and 1, and because OpenSearch relevance scores can't be below 0, the k-NN plugin adds 1 to get the final score. diff --git a/_search-plugins/knn/painless-functions.md b/_search-plugins/knn/painless-functions.md index 85840ff535..cc27776fc4 100644 --- a/_search-plugins/knn/painless-functions.md +++ b/_search-plugins/knn/painless-functions.md @@ -41,6 +41,7 @@ GET my-knn-index-2/_search } } ``` +{% include copy-curl.html %} `field` needs to map to a `knn_vector` field, and `query_value` needs to be a floating point array with the same dimension as `field`. @@ -71,7 +72,5 @@ The `hamming` space type is supported for binary vectors in OpenSearch version 2 Because scores can only be positive, this script ranks documents with vector fields higher than those without. -With cosine similarity, it is not valid to pass a zero vector (`[0, 0, ...`]) as input. This is because the magnitude of -such a vector is 0, which raises a `divide by 0` exception when computing the value. Requests -containing the zero vector will be rejected and a corresponding exception will be thrown. +With cosine similarity, it is not valid to pass a zero vector (`[0, 0, ...]`) as input. This is because the magnitude of such a vector is 0, which raises a `divide by 0` exception in the corresponding formula. Requests containing the zero vector will be rejected, and a corresponding exception will be thrown. {: .note } \ No newline at end of file From dfc4c9cedfe8cdef5e321f9c4b31952dc35a74d2 Mon Sep 17 00:00:00 2001 From: Xun Zhang Date: Mon, 5 Aug 2024 07:52:12 -0700 Subject: [PATCH 095/154] Ml commons batch inference (#7899) * add batch inference API Signed-off-by: Xun Zhang * add more links and mark the api as experimental Signed-off-by: Xun Zhang * use openAI as the blueprint example details Signed-off-by: Xun Zhang * address comments Signed-off-by: Xun Zhang * Doc review Signed-off-by: Fanit Kolchina * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Xun Zhang Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Fanit Kolchina Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower --- _ml-commons-plugin/api/index.md | 1 - .../api/model-apis/batch-predict.md | 167 ++++++++++++++++++ _ml-commons-plugin/api/model-apis/index.md | 44 +++-- _ml-commons-plugin/api/train-predict/index.md | 24 --- .../api/train-predict/predict.md | 4 +- .../api/train-predict/train-and-predict.md | 4 +- _ml-commons-plugin/api/train-predict/train.md | 4 +- 7 files changed, 207 insertions(+), 41 deletions(-) create mode 100644 _ml-commons-plugin/api/model-apis/batch-predict.md delete mode 100644 _ml-commons-plugin/api/train-predict/index.md diff --git a/_ml-commons-plugin/api/index.md b/_ml-commons-plugin/api/index.md index ec4cf12492..65171b163f 100644 --- a/_ml-commons-plugin/api/index.md +++ b/_ml-commons-plugin/api/index.md @@ -21,6 +21,5 @@ ML Commons supports the following APIs: - [Controller APIs]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/controller-apis/index/) - [Execute Algorithm API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/execute-algorithm/) - [Tasks 
APIs]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/tasks-apis/index/) -- [Train and Predict APIs]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/train-predict/index/) - [Profile API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/profile/) - [Stats API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/stats/) diff --git a/_ml-commons-plugin/api/model-apis/batch-predict.md b/_ml-commons-plugin/api/model-apis/batch-predict.md new file mode 100644 index 0000000000..b32fbb108d --- /dev/null +++ b/_ml-commons-plugin/api/model-apis/batch-predict.md @@ -0,0 +1,167 @@ +--- +layout: default +title: Batch predict +parent: Model APIs +grand_parent: ML Commons APIs +nav_order: 65 +--- + +# Batch predict + +This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/ml-commons/issues/2488). +{: .warning} + +ML Commons can perform inference on large datasets in an offline asynchronous mode using a model deployed on external model servers. To use the Batch Predict API, you must provide the `model_id` for an externally hosted model. Amazon SageMaker, Cohere, and OpenAI are currently the only verified external servers that support this API. + +For information about user access for this API, see [Model access control considerations]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/index/#model-access-control-considerations). + +For information about externally hosted models, see [Connecting to externally hosted models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/index/). + +For instructions on how set up batch inference and connector blueprints, see the following: + +- [Amazon SageMaker batch predict connector blueprint](https://github.com/opensearch-project/ml-commons/blob/main/docs/remote_inference_blueprints/batch_inference_sagemaker_connector_blueprint.md) + +- [OpenAI batch predict connector blueprint](https://github.com/opensearch-project/ml-commons/blob/main/docs/remote_inference_blueprints/batch_inference_openAI_connector_blueprint.md) + +## Path and HTTP methods + +```json +POST /_plugins/_ml/models//_batch_predict +``` + +## Prerequisites + +Before using the Batch Predict API, you need to create a connector to the externally hosted model. 
For example, to create a connector to an OpenAI `text-embedding-ada-002` model, send the following request: + +```json +POST /_plugins/_ml/connectors/_create +{ + "name": "OpenAI Embedding model", + "description": "OpenAI embedding model for testing offline batch", + "version": "1", + "protocol": "http", + "parameters": { + "model": "text-embedding-ada-002", + "input_file_id": "", + "endpoint": "/v1/embeddings" + }, + "credential": { + "openAI_key": "" + }, + "actions": [ + { + "action_type": "predict", + "method": "POST", + "url": "https://api.openai.com/v1/embeddings", + "headers": { + "Authorization": "Bearer ${credential.openAI_key}" + }, + "request_body": "{ \"input\": ${parameters.input}, \"model\": \"${parameters.model}\" }", + "pre_process_function": "connector.pre_process.openai.embedding", + "post_process_function": "connector.post_process.openai.embedding" + }, + { + "action_type": "batch_predict", + "method": "POST", + "url": "https://api.openai.com/v1/batches", + "headers": { + "Authorization": "Bearer ${credential.openAI_key}" + }, + "request_body": "{ \"input_file_id\": \"${parameters.input_file_id}\", \"endpoint\": \"${parameters.endpoint}\", \"completion_window\": \"24h\" }" + } + ] +} +``` +{% include copy-curl.html %} + +The response contains a connector ID that you'll use in the next steps: + +```json +{ + "connector_id": "XU5UiokBpXT9icfOM0vt" +} +``` + +Next, register an externally hosted model and provide the connector ID of the created connector: + +```json +POST /_plugins/_ml/models/_register?deploy=true +{ + "name": "OpenAI model for realtime embedding and offline batch inference", + "function_name": "remote", + "description": "OpenAI text embedding model", + "connector_id": "XU5UiokBpXT9icfOM0vt" +} +``` +{% include copy-curl.html %} + +The response contains the task ID for the register operation: + +```json +{ + "task_id": "rMormY8B8aiZvtEZIO_j", + "status": "CREATED", + "model_id": "lyjxwZABNrAVdFa9zrcZ" +} +``` + +To check the status of the operation, provide the task ID to the [Tasks API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/tasks-apis/get-task/). Once the registration is complete, the task `state` changes to `COMPLETED`. + +#### Example request + +Once you have completed the prerequisite steps, you can call the Batch Predict API. The parameters in the batch predict request override those defined in the connector: + +```json +POST /_plugins/_ml/models/lyjxwZABNrAVdFa9zrcZ/_batch_predict +{ + "parameters": { + "model": "text-embedding-3-large" + } +} +``` +{% include copy-curl.html %} + +#### Example response + +```json +{ + "inference_results": [ + { + "output": [ + { + "name": "response", + "dataAsMap": { + "id": "batch_", + "object": "batch", + "endpoint": "/v1/embeddings", + "errors": null, + "input_file_id": "file-", + "completion_window": "24h", + "status": "validating", + "output_file_id": null, + "error_file_id": null, + "created_at": 1722037257, + "in_progress_at": null, + "expires_at": 1722123657, + "finalizing_at": null, + "completed_at": null, + "failed_at": null, + "expired_at": null, + "cancelling_at": null, + "cancelled_at": null, + "request_counts": { + "total": 0, + "completed": 0, + "failed": 0 + }, + "metadata": null + } + } + ], + "status_code": 200 + } + ] +} +``` + +For the definition of each field in the result, see [OpenAI Batch API](https://platform.openai.com/docs/guides/batch). 
Once the batch inference is complete, you can download the output by calling the [OpenAI Files API](https://platform.openai.com/docs/api-reference/files) and providing the file name specified in the `id` field of the response. \ No newline at end of file diff --git a/_ml-commons-plugin/api/model-apis/index.md b/_ml-commons-plugin/api/model-apis/index.md index 444da1fe70..9cf992d54b 100644 --- a/_ml-commons-plugin/api/model-apis/index.md +++ b/_ml-commons-plugin/api/model-apis/index.md @@ -9,16 +9,40 @@ has_toc: false # Model APIs -ML Commons supports the following model-level APIs: - -- [Register model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/register-model/) -- [Deploy model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/deploy-model/) -- [Get model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/get-model/) -- [Search model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/search-model/) -- [Update model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/update-model/) -- [Undeploy model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/undeploy-model/) -- [Delete model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/delete-model/) -- [Predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/train-predict/predict/) (invokes a model) +ML Commons supports the following model-level CRUD APIs: + +- [Register Model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/register-model/) +- [Deploy Model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/deploy-model/) +- [Get Model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/get-model/) +- [Search Model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/search-model/) +- [Update Model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/update-model/) +- [Undeploy Model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/undeploy-model/) +- [Delete Model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/delete-model/) + +# Predict APIs + +Predict APIs are used to invoke machine learning (ML) models. ML Commons supports the following Predict APIs: + +- [Predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/train-predict/predict/) +- [Batch Predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/batch-predict/) (experimental) + +# Train API + +The ML Commons Train API lets you train ML algorithms synchronously and asynchronously: + +- [Train]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/train-predict/train/) + +To train tasks through the API, three inputs are required: + +- Algorithm name: Must be a [FunctionName](https://github.com/opensearch-project/ml-commons/blob/1.3/common/src/main/java/org/opensearch/ml/common/parameter/FunctionName.java). This determines what algorithm the ML model runs. To add a new function, see [How To Add a New Function](https://github.com/opensearch-project/ml-commons/blob/main/docs/how-to-add-new-function.md). +- Model hyperparameters: Adjust these parameters to improve model accuracy. +- Input data: The data that trains the ML model or applies it to predictions. You can input data in two ways: query against your index or use a data frame. 
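
As a minimal sketch, a synchronous k-means training request might look like the following (the `iris_data` index and its field names are illustrative):

```json
POST /_plugins/_ml/_train/kmeans
{
  "parameters": {
    "centroids": 3,
    "iterations": 10,
    "distance_type": "COSINE"
  },
  "input_query": {
    "_source": ["petal_length_in_cm", "petal_width_in_cm"],
    "size": 10000
  },
  "input_index": ["iris_data"]
}
```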
+ +# Train and Predict API + +The Train and Predict API lets you train and invoke the model using the same dataset: + +- [Train and Predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/train-predict/train-and-predict/) ## Model access control considerations diff --git a/_ml-commons-plugin/api/train-predict/index.md b/_ml-commons-plugin/api/train-predict/index.md deleted file mode 100644 index 8486b4beb9..0000000000 --- a/_ml-commons-plugin/api/train-predict/index.md +++ /dev/null @@ -1,24 +0,0 @@ ---- -layout: default -title: Train and Predict APIs -parent: ML Commons APIs -has_children: true -has_toc: false -nav_order: 30 ---- - -# Train and Predict APIs - -The ML Commons API lets you train machine learning (ML) algorithms synchronously and asynchronously, make predictions with that trained model, and train and predict with the same dataset. - -To train tasks through the API, three inputs are required: - -- Algorithm name: Must be one of a [FunctionName](https://github.com/opensearch-project/ml-commons/blob/1.3/common/src/main/java/org/opensearch/ml/common/parameter/FunctionName.java). This determines what algorithm the ML Engine runs. To add a new function, see [How To Add a New Function](https://github.com/opensearch-project/ml-commons/blob/main/docs/how-to-add-new-function.md). -- Model hyperparameters: Adjust these parameters to improve model accuracy. -- Input data: The data that trains the ML model, or applies the ML models to predictions. You can input data in two ways, query against your index or use a data frame. - -ML Commons supports the following Train and Predict APIs: - -- [Train]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/train-predict/train/) -- [Predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/train-predict/predict/) -- [Train and Predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/train-predict/train-and-predict/) diff --git a/_ml-commons-plugin/api/train-predict/predict.md b/_ml-commons-plugin/api/train-predict/predict.md index 299c957122..ea0938da36 100644 --- a/_ml-commons-plugin/api/train-predict/predict.md +++ b/_ml-commons-plugin/api/train-predict/predict.md @@ -1,9 +1,9 @@ --- layout: default title: Predict -parent: Train and Predict APIs +parent: Model APIs grand_parent: ML Commons APIs -nav_order: 20 +nav_order: 60 --- # Predict diff --git a/_ml-commons-plugin/api/train-predict/train-and-predict.md b/_ml-commons-plugin/api/train-predict/train-and-predict.md index 1df0e5e3be..f8f8f7893a 100644 --- a/_ml-commons-plugin/api/train-predict/train-and-predict.md +++ b/_ml-commons-plugin/api/train-predict/train-and-predict.md @@ -1,9 +1,9 @@ --- layout: default title: Train and predict -parent: Train and Predict APIs +parent: Model APIs grand_parent: ML Commons APIs -nav_order: 10 +nav_order: 70 --- ## Train and predict diff --git a/_ml-commons-plugin/api/train-predict/train.md b/_ml-commons-plugin/api/train-predict/train.md index 8de486198d..80cbf8abdb 100644 --- a/_ml-commons-plugin/api/train-predict/train.md +++ b/_ml-commons-plugin/api/train-predict/train.md @@ -1,9 +1,9 @@ --- layout: default title: Train -parent: Train and Predict APIs +parent: Model APIs grand_parent: ML Commons APIs -nav_order: 10 +nav_order: 50 --- # Train From e9bbf6ab27a6ef8612062196335cd614eda49ae0 Mon Sep 17 00:00:00 2001 From: Peter Alfonsi Date: Mon, 5 Aug 2024 17:19:44 -0700 Subject: [PATCH 096/154] Remove repeated sentence in distributed tracing doc (#7906) Signed-off-by: Peter Alfonsi Co-authored-by: Peter Alfonsi --- 
_observing-your-data/trace/distributed-tracing.md | 1 - 1 file changed, 1 deletion(-) diff --git a/_observing-your-data/trace/distributed-tracing.md b/_observing-your-data/trace/distributed-tracing.md index fb38933711..4fb464f67c 100644 --- a/_observing-your-data/trace/distributed-tracing.md +++ b/_observing-your-data/trace/distributed-tracing.md @@ -155,7 +155,6 @@ Currently, the distributed tracing feature generates traces and spans for HTTP r 2. **Exporters:** Exporters are responsible for persisting the data. OpenTelemetry provides several out-of-the-box exporters, and OpenSearch supports the following: - `LoggingSpanExporter`: Exports spans to a log file, generating a separate file in the logs directory `_otel_traces.log`. Default is `telemetry.otel.tracer.span.exporter.class=io.opentelemetry.exporter.logging.LoggingSpanExporter`. - `OtlpGrpcSpanExporter`: Exports spans through gRPC. To use this exporter, you need to install the `otel-collector` on the node. By default, it writes to the http://localhost:4317/ endpoint. To use this exporter, set the following static setting: `telemetry.otel.tracer.span.exporter.class=io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter`. - - `LoggingSpanExporter`: Exports spans to a log file, generating a separate file in the logs directory `_otel_traces.log`. Default is `telemetry.otel.tracer.span.exporter.class=io.opentelemetry.exporter.logging.LoggingSpanExporter`. ### Sampling From 6eccc888acf2ff6ec46d2de416aa9b397c177fef Mon Sep 17 00:00:00 2001 From: AntonEliatra Date: Tue, 6 Aug 2024 14:36:32 +0100 Subject: [PATCH 097/154] Add apostrophe token filter page #7871 (#7884) * adding apostrophe token filter page #7871 Signed-off-by: AntonEliatra * fixing vale error Signed-off-by: AntonEliatra * Update apostrophe-token-filter.md Signed-off-by: AntonEliatra * updating the naming Signed-off-by: AntonEliatra * updating as per the review comments Signed-off-by: AntonEliatra * updating the heading to Apostrophe token filter Signed-off-by: AntonEliatra * updating as per PR comments Signed-off-by: AntonEliatra * Apply suggestions from code review Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: AntonEliatra * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: AntonEliatra --------- Signed-off-by: AntonEliatra Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower --- _analyzers/token-filters/apostrophe.md | 118 +++++++++++++++++++++++++ _analyzers/token-filters/index.md | 2 +- 2 files changed, 119 insertions(+), 1 deletion(-) create mode 100644 _analyzers/token-filters/apostrophe.md diff --git a/_analyzers/token-filters/apostrophe.md b/_analyzers/token-filters/apostrophe.md new file mode 100644 index 0000000000..e393bcfdb4 --- /dev/null +++ b/_analyzers/token-filters/apostrophe.md @@ -0,0 +1,118 @@ +--- +layout: default +title: Apostrophe +parent: Token filters +nav_order: 110 +--- + +# Apostrophe token filter + +The `apostrophe` token filter's primary function is to remove possessive apostrophes and anything following them. This can be very useful in analyzing text in languages that rely heavily on apostrophes, such as Turkish, in which apostrophes separate the root word from suffixes, including possessive suffixes, case markers, and other grammatical endings. 
+ + +## Example + +The following example request creates a new index named `custom_text_index` with a custom analyzer configured in `settings` and used in `mappings`: + +```json +PUT /custom_text_index +{ + "settings": { + "analysis": { + "analyzer": { + "custom_analyzer": { + "type": "custom", + "tokenizer": "standard", // splits text into words + "filter": [ + "lowercase", + "apostrophe" + ] + } + } + } + }, + "mappings": { + "properties": { + "content": { + "type": "text", + "analyzer": "custom_analyzer" + } + } + } +} +``` +{% include copy-curl.html %} + +## Generated tokens + +Use the following request to examine the tokens generated using the created analyzer: + +```json +POST /custom_text_index/_analyze +{ + "analyzer": "custom_analyzer", + "text": "John's car is faster than Peter's bike" +} +``` +{% include copy-curl.html %} + +The response contains the generated tokens: + +```json +{ + "tokens": [ + { + "token": "john", + "start_offset": 0, + "end_offset": 6, + "type": "", + "position": 0 + }, + { + "token": "car", + "start_offset": 7, + "end_offset": 10, + "type": "", + "position": 1 + }, + { + "token": "is", + "start_offset": 11, + "end_offset": 13, + "type": "", + "position": 2 + }, + { + "token": "faster", + "start_offset": 14, + "end_offset": 20, + "type": "", + "position": 3 + }, + { + "token": "than", + "start_offset": 21, + "end_offset": 25, + "type": "", + "position": 4 + }, + { + "token": "peter", + "start_offset": 26, + "end_offset": 33, + "type": "", + "position": 5 + }, + { + "token": "bike", + "start_offset": 34, + "end_offset": 38, + "type": "", + "position": 6 + } + ] +} +``` + +The built-in `apostrophe` token filter is not suitable for languages such as French, in which apostrophes are used at the beginning of words. For example, `"C'est l'amour de l'école"` will result in four tokens: "C", "l", "de", and "l". +{: .note} diff --git a/_analyzers/token-filters/index.md b/_analyzers/token-filters/index.md index e6d9875736..f4e9c434e7 100644 --- a/_analyzers/token-filters/index.md +++ b/_analyzers/token-filters/index.md @@ -13,7 +13,7 @@ Token filters receive the stream of tokens from the tokenizer and add, remove, o The following table lists all token filters that OpenSearch supports. Token filter | Underlying Lucene token filter| Description -`apostrophe` | [ApostropheFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/tr/ApostropheFilter.html) | In each token that contains an apostrophe, the `apostrophe` token filter removes the apostrophe itself and all characters following the apostrophe. +[`apostrophe`]({{site.url}}{{site.baseurl}}/analyzers/token-filters/apostrophe/) | [ApostropheFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/tr/ApostropheFilter.html) | In each token containing an apostrophe, the `apostrophe` token filter removes the apostrophe itself and all characters following it. `asciifolding` | [ASCIIFoldingFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html) | Converts alphabetic, numeric, and symbolic characters. `cjk_bigram` | [CJKBigramFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/cjk/CJKBigramFilter.html) | Forms bigrams of Chinese, Japanese, and Korean (CJK) tokens. 
`cjk_width` | [CJKWidthFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/cjk/CJKWidthFilter.html) | Normalizes Chinese, Japanese, and Korean (CJK) tokens according to the following rules:
- Folds full-width ASCII character variants into the equivalent basic Latin characters.
- Folds half-width Katakana character variants into the equivalent Kana characters. From 47599b30cd020c2723aeb6784c6a786ef279023a Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Tue, 6 Aug 2024 16:22:59 -0400 Subject: [PATCH 098/154] Change the person PRs are assigned to by default (#7920) Signed-off-by: Fanit Kolchina --- .github/workflows/pr_checklist.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/pr_checklist.yml b/.github/workflows/pr_checklist.yml index 4130f5e2bd..b56174793e 100644 --- a/.github/workflows/pr_checklist.yml +++ b/.github/workflows/pr_checklist.yml @@ -32,7 +32,7 @@ jobs: const prOwners = ['Naarcha-AWS', 'kolchfa-aws', 'vagimeli', 'natebower']; if (!prOwners.includes(assignee)) { - assignee = 'hdhalter' + assignee = 'kolchfa-aws' } github.rest.issues.addAssignees({ From 0d9a062dca5ffa1d1019ab0f805b527ab45a210f Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Wed, 7 Aug 2024 09:51:12 -0400 Subject: [PATCH 099/154] Update apostrophe filter nav order (#7925) Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- _analyzers/token-filters/apostrophe.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_analyzers/token-filters/apostrophe.md b/_analyzers/token-filters/apostrophe.md index e393bcfdb4..27ee92266b 100644 --- a/_analyzers/token-filters/apostrophe.md +++ b/_analyzers/token-filters/apostrophe.md @@ -2,7 +2,7 @@ layout: default title: Apostrophe parent: Token filters -nav_order: 110 +nav_order: 10 --- # Apostrophe token filter From e1e0503dcfb274aa4590d97163fda5dc6613386a Mon Sep 17 00:00:00 2001 From: AntonEliatra Date: Wed, 7 Aug 2024 15:04:56 +0100 Subject: [PATCH 100/154] Update users-roles.md (#7924) Signed-off-by: AntonEliatra --- _security/access-control/users-roles.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_security/access-control/users-roles.md b/_security/access-control/users-roles.md index b6157bf2d9..b182e1576a 100644 --- a/_security/access-control/users-roles.md +++ b/_security/access-control/users-roles.md @@ -136,7 +136,7 @@ As with any role in OpenSearch, a read-only role can be configured using the fol - Using the Cluster Settings API The simplest way to get familiar with roles and role mappings is to use OpenSearch Dashboards. The interface simplifies creating roles and assigning those roles to users, with an easy-to-navigate workflow. 
-{ .tip} +{: .tip} ### Defining a basic read-only role From ca1c9d9da8908b17e8787747cb4768af197ffa8e Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Wed, 7 Aug 2024 10:05:50 -0400 Subject: [PATCH 101/154] Fix typo (#7921) * gathering potential documentation attempts Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * considering the dashboard tutorial Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * place holder for js data structure usage Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * data-structures placeholder Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Updating index links Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * adding old doc to be merged Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Starting to link things together Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * fix broken link Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * respond to vale Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * more vale violations Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * name files consistently with docs site and fix links. Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * vale Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Minor tweaks. Moved Ubi under SEARCH. Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * add label for versining of spec and OS version Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * try to sort out vale error Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Converting mermaid diagrams to png's Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Updating query_id mermaid code Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Better way to ignore the mermaid scripts in the md files Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * description updates Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * schema.md updating (still need to update the mermaid diagram) Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * schema updates Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * updates Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Rebuilding main Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * merging in images Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Updating UBI spec number Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Use released version Signed-off-by: Eric Pugh * Update _search-plugins/index.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/index.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/index.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: kolchfa-aws 
<105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Addressing PR feedback Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/data-structures.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/data-structures.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/data-structures.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/data-structures.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Adding dsl intro Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Adding intro sentence Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Title adjust Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Addressing PR feedback Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Addressing pr feedback Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Addressing PR feedback Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Fixing vale errors Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> 
Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Finishing initial pr feedback Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Next round of PR feedback Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Describing chorus workbench link Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Adding captions for result tables Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * PR clean up Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Adding a few more suggestions Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * updating query filter for laptos Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * PR feedback Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Update _search-plugins/ubi/ubi-dashboard-tutorial.md Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * PR feedback Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Heather Halter Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> * Apply suggestions from code review Edits to all files to comply with OpenSearch standards; nav_order updates Signed-off-by: Heather Halter * Apply suggestions from code review Missed a file - more commits to the sql-query topic Signed-off-by: Heather Halter * Update _search-plugins/ubi/sql-queries.md 
Signed-off-by: Heather Halter * Apply suggestions from code review Cleaned up the sample query topic Signed-off-by: Heather Halter * Apply suggestions from code review Signed-off-by: Heather Halter * Update _search-plugins/ubi/dsl-queries.md Signed-off-by: Heather Halter * Update _search-plugins/ubi/dsl-queries.md Signed-off-by: Heather Halter * Apply suggestions from code review Accepted editorial suggestions. Co-authored-by: Nathan Bower Signed-off-by: Heather Halter * Update index.md Reformatted table info Signed-off-by: Heather Halter * Update _search-plugins/ubi/dsl-queries.md Signed-off-by: Heather Halter * Apply suggestions from code review Signed-off-by: Heather Halter * Update index.md Signed-off-by: Heather Halter * Update schemas.md Signed-off-by: Heather Halter * Update index.md Added a missing link and fixed the table. Signed-off-by: Heather Halter * Update index.md Changed the bold to italics Signed-off-by: Heather Halter * Update ubi-dashboard-tutorial.md Removed unnecessary note tag. Signed-off-by: Heather Halter * Update schemas.md Inserted comma Signed-off-by: Heather Halter * Update sql-queries.md Signed-off-by: Heather Halter * Apply suggestions from code review There were some hidden comments that I found in this file. Co-authored-by: Nathan Bower Signed-off-by: Heather Halter * Update _search-plugins/ubi/sql-queries.md Signed-off-by: Heather Halter * Update _search-plugins/ubi/sql-queries.md Signed-off-by: Heather Halter * Apply suggestions from code review Signed-off-by: Heather Halter * Update _search-plugins/ubi/schemas.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Update schemas.md Removed links on 'object_id' and 'query_id' Signed-off-by: Heather Halter * Update sql-queries.md removed a note tag and fixed line 326 Signed-off-by: Heather Halter * Update sql-queries.md One more table heading Signed-off-by: Heather Halter * found a typo Signed-off-by: Eric Pugh --------- Signed-off-by: RasonJ <145287540+RasonJ@users.noreply.github.com> Signed-off-by: Eric Pugh Signed-off-by: Heather Halter Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: RasonJ <145287540+RasonJ@users.noreply.github.com> Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Heather Halter Co-authored-by: Nathan Bower --- _ml-commons-plugin/agents-tools/index.md | 4 ++-- _ml-commons-plugin/api/agent-apis/register-agent.md | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/_ml-commons-plugin/agents-tools/index.md b/_ml-commons-plugin/agents-tools/index.md index 009906d4cf..1fa86bdf67 100644 --- a/_ml-commons-plugin/agents-tools/index.md +++ b/_ml-commons-plugin/agents-tools/index.md @@ -130,7 +130,7 @@ POST /_plugins/_ml/agents/_register { "type": "VectorDBTool", "name": "VectorDBTool", - "description": "A tool to search opensearch index with natural language quesiotn. If you don't know answer for some question, you should always try to search data with this tool. Action Input: ", + "description": "A tool to search opensearch index with natural language question. If you don't know answer for some question, you should always try to search data with this tool. 
Action Input: ", "parameters": { "model_id": "YOUR_TEXT_EMBEDDING_MODEL_ID", "index": "my_test_data", @@ -157,4 +157,4 @@ It is important to provide thorough descriptions of the tools so that the LLM ca - For a list of supported tools, see [Tools]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/tools/index/). - For a step-by-step tutorial, see [Agents and tools tutorial]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/agents-tools-tutorial/). - For supported APIs, see [Agent APIs]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/agent-apis/). -- To use agents and tools in configuration automation, see [Automating configurations]({{site.url}}{{site.baseurl}}/automating-configurations/index/). \ No newline at end of file +- To use agents and tools in configuration automation, see [Automating configurations]({{site.url}}{{site.baseurl}}/automating-configurations/index/). diff --git a/_ml-commons-plugin/api/agent-apis/register-agent.md b/_ml-commons-plugin/api/agent-apis/register-agent.md index 820bb923f7..339c25bf0e 100644 --- a/_ml-commons-plugin/api/agent-apis/register-agent.md +++ b/_ml-commons-plugin/api/agent-apis/register-agent.md @@ -161,7 +161,7 @@ POST /_plugins/_ml/agents/_register { "type": "VectorDBTool", "name": "VectorDBTool", - "description": "A tool to search opensearch index with natural language quesiotn. If you don't know answer for some question, you should always try to search data with this tool. Action Input: ", + "description": "A tool to search opensearch index with natural language question. If you don't know answer for some question, you should always try to search data with this tool. Action Input: ", "parameters": { "model_id": "", "index": "", @@ -190,4 +190,4 @@ OpenSearch responds with an agent ID that you can use to refer to the agent: { "agent_id": "bpV_Zo0BRhAwb9PZqGja" } -``` \ No newline at end of file +``` From 237e83934779909d8d3f2c66b6fc519c364fa435 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Wed, 7 Aug 2024 09:08:57 -0500 Subject: [PATCH 102/154] Add Threat Intelligence Section (#7905) * added threat intel source apis for create delete get search operations Signed-off-by: Surya Sashank Nistala * add threat intel findings and alerts APIs Signed-off-by: Surya Sashank Nistala * Update _security-analytics/api-tools/threat-intel/threat-intel-source.md Co-authored-by: Heather Halter Signed-off-by: Surya Sashank Nistala * Update _security-analytics/api-tools/threat-intel/threat-intel-source.md Co-authored-by: Heather Halter Signed-off-by: Surya Sashank Nistala * Update _security-analytics/api-tools/threat-intel/threat-intel-source.md Co-authored-by: Heather Halter Signed-off-by: Surya Sashank Nistala * Update _security-analytics/api-tools/threat-intel/threat-intel-source.md Co-authored-by: Heather Halter Signed-off-by: Surya Sashank Nistala * Update _security-analytics/api-tools/threat-intel/threat-intel-source.md Co-authored-by: Heather Halter Signed-off-by: Surya Sashank Nistala * Update _security-analytics/api-tools/threat-intel/threat-intel-source.md Co-authored-by: Heather Halter Signed-off-by: Surya Sashank Nistala * Update _security-analytics/api-tools/threat-intel/threat-intel-source.md Co-authored-by: Heather Halter Signed-off-by: Surya Sashank Nistala * Update _security-analytics/api-tools/threat-intel/threat-intel-source.md Co-authored-by: Heather Halter Signed-off-by: Surya Sashank Nistala * change the word intel to intelligence across files Signed-off-by: Surya Sashank 
Nistala * threat intel monitors apis Signed-off-by: Surya Sashank Nistala * add threat intelligence analytics overview documentation Signed-off-by: Surya Sashank Nistala * adds threat intel iocs example file for S3 or local file upload Signed-off-by: Surya Sashank Nistala * Edit alert-findings page Signed-off-by: Archer * Edit and streamline monitor APIs Signed-off-by: Archer * Update source API. Signed-off-by: Archer * Add threat intelligence directory and default pages Signed-off-by: Archer * Fix broken link. Signed-off-by: Archer * Fix metadata. Signed-off-by: Archer * Fix parent relationship Signed-off-by: Archer * Add UI text Signed-off-by: Archer * Add additional info about Threat intel view Signed-off-by: Archer * Fix capitalization. Add more consistent formatting Signed-off-by: Archer * Delete redundant file Signed-off-by: Archer * Add example link Signed-off-by: Archer * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * A couple more typo fixes. Signed-off-by: Archer * Fix title Signed-off-by: Archer * Update _security-analytics/threat-intelligence/api/monitor.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _security-analytics/threat-intelligence/api/monitor.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Heather Halter Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Delete redundant section. Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Heather Halter Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _security-analytics/threat-intelligence/api/source.md Co-authored-by: Heather Halter Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Heather Halter Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Heather Halter Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Fix IOC acryonym to be in line with AWS Signed-off-by: Archer * Fix remaining typos Signed-off-by: Archer * Fix example link Signed-off-by: Archer * Apply suggestions from code review Co-authored-by: Heather Halter Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _security-analytics/threat-intelligence/index.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _security-analytics/threat-intelligence/api/findings.md Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _security-analytics/threat-intelligence/api/monitor.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update 
_security-analytics/threat-intelligence/api/source.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _security-analytics/threat-intelligence/api/source.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Fix header Signed-off-by: Archer --------- Signed-off-by: Surya Sashank Nistala Signed-off-by: Surya Sashank Nistala Signed-off-by: Archer Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Surya Sashank Nistala Co-authored-by: Surya Sashank Nistala Co-authored-by: Heather Halter Co-authored-by: Nathan Bower --- .../threat-intelligence/api/findings.md | 267 ++++++++++ .../threat-intelligence/api/monitor.md | 326 ++++++++++++ .../threat-intelligence/api/source.md | 488 ++++++++++++++++++ .../api/threat-intel-api.md | 14 + .../threat-intelligence/getting-started.md | 85 +++ .../threat-intelligence/index.md | 16 + _security-analytics/usage/detectors.md | 2 +- assets/examples/all-ioc-type-examples.json | 20 + 8 files changed, 1217 insertions(+), 1 deletion(-) create mode 100644 _security-analytics/threat-intelligence/api/findings.md create mode 100644 _security-analytics/threat-intelligence/api/monitor.md create mode 100644 _security-analytics/threat-intelligence/api/source.md create mode 100644 _security-analytics/threat-intelligence/api/threat-intel-api.md create mode 100644 _security-analytics/threat-intelligence/getting-started.md create mode 100644 _security-analytics/threat-intelligence/index.md create mode 100644 assets/examples/all-ioc-type-examples.json diff --git a/_security-analytics/threat-intelligence/api/findings.md b/_security-analytics/threat-intelligence/api/findings.md new file mode 100644 index 0000000000..3d1b3e8951 --- /dev/null +++ b/_security-analytics/threat-intelligence/api/findings.md @@ -0,0 +1,267 @@ +--- +layout: default +title: Alerts and Findings API +parent: Threat intelligence APIs +grand_parent: Threat intelligence +nav_order: 50 +--- + + +# Alerts and Findings API + +The threat intelligence Alerts and Findings API retrieves information about alerts and findings from threat intelligence feeds. + + +--- + +## Get threat intelligence alerts + +Retrieves any alerts related to threat intelligence monitors. + +### Path and HTTP methods + +```json +GET /_plugins/_security_analytics/threat_intel/alerts +``` +{% include copy-curl.html %} + + +### Query parameters + +You can specify the following query parameters when requesting alerts. + +Parameter | Description +:--- | :---- +`severityLevel` | Filter alerts by severity level. Optional. +`alertState` | Used to filter by alert state. Possible values are `ACTIVE`, `ACKNOWLEDGED`, `COMPLETED`, `ERROR`, or `DELETED`. Optional. +`sortString` | The string Security Analytics uses to sort the alerts. Optional. +`sortOrder` | The order used to sort the list of alerts. Possible values are `asc` or `desc`. Optional. +`missing` | A list of fields for which no alias mappings were found. Optional. +`size` | The maximum number of results to be returned in the response. Optional. +`startIndex` | The pagination indicator. Optional. +`searchString` | The alert attribute you want returned in the search. Optional.
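+ +For example, the following request combines several of these parameters to return only active, high-severity alerts. This is a minimal illustration: the parameter values shown are assumptions for demonstration, not output from a real cluster. + +```json +GET /_plugins/_security_analytics/threat_intel/alerts?severityLevel=high&alertState=ACTIVE&size=10 +``` +{% include copy-curl.html %}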
+ +### Example request + +```json +GET /_plugins/_security_analytics/threat_intel/alerts +``` +{% include copy-curl.html %} + +### Example response + +```json +{ + "alerts": [{ + "id": "906669ee-56e8-4f40-a12f-ab4c274d7521", + "version": 1, + "schema_version": 0, + "seq_no": 0, + "primary_term": 1, + "trigger_id": "regwarg", + "trigger_name": "regwarg", + "state": "ACTIVE", + "error_message": null, + "ioc_value": "example-has00001", + "ioc_type": "hashes", + "severity": "high", + "finding_ids": [ + "a9c10094-6139-42b3-81a8-867dffbe381d" + ], + "acknowledged_time": 1722038395105, + "last_updated_time": null, + "start_time": 1722038395105, + "end_time": null + }], + "total_alerts": 1 +} +``` + +### Response fields + +A threat intelligence alert can have one of the following states. + +| State | Description | +| :---- | :--- | +| `ACTIVE` | The alert is ongoing and unacknowledged. Alerts remain in this state until they are acknowledged, the trigger associated with the alert is deleted, or the threat intelligence monitor is deleted entirely. | +| `ACKNOWLEDGED` | The alert is acknowledged, but the root cause of the alert has not been addressed. | +| `COMPLETED` | The alert is no longer ongoing. Alerts enter this state after the corresponding trigger evaluates to `false`. | +| `DELETED` | The monitor or trigger for the alert was deleted while the alert was active. | + +--- + +## Update Alerts Status API + +Updates the status of the specified alerts to `ACKNOWLEDGED` or `COMPLETED`. Only alerts in the `ACTIVE` state can be updated. + +### Path and HTTP methods + +```json +PUT /_plugins/_security_analytics/threat_intel/alerts/status +``` + +### Example requests + +The following example updates the status of the specified alerts to `ACKNOWLEDGED`: + +```json +PUT /_plugins/_security_analytics/threat_intel/alerts/status?state=ACKNOWLEDGED&alert_ids=, +``` + +The following example updates the status of the specified alerts to `COMPLETED`: + +```json +PUT /_plugins/_security_analytics/threat_intel/alerts/status?state=COMPLETED&alert_ids=, +``` + +### Example response + +```json +{ + "updated_alerts": [ + { + "id": "906669ee-56e8-4f40-a12f-ab4c274d7521", + "version": 1, + "schema_version": 0, + "seq_no": 2, + "primary_term": 1, + "trigger_id": "regwarg", + "trigger_name": "regwarg", + "state": "ACKNOWLEDGED", + "error_message": null, + "ioc_value": "example-has00001", + "ioc_type": "hashes", + "severity": "high", + "finding_ids": [ + "a9c10094-6139-42b3-81a8-867dffbe381d" + ], + "acknowledged_time": 1722039091209, + "last_updated_time": 1722039091209, + "start_time": 1722038395105, + "end_time": null + }, + { + "id": "56e8-4f40-a12f-ab4c274d7521-906669ee", + "version": 1, + "schema_version": 0, + "seq_no": 2, + "primary_term": 1, + "trigger_id": "regwarg", + "trigger_name": "regwarg", + "state": "ACKNOWLEDGED", + "error_message": null, + "ioc_value": "example-has00001", + "ioc_type": "hashes", + "severity": "high", + "finding_ids": [ + "a9c10094-6139-42b3-81a8-867dffbe381d" + ], + "acknowledged_time": 1722039091209, + "last_updated_time": 1722039091209, + "start_time": 1722038395105, + "end_time": null + } + ], + "failure_messages": [] +} +``` + + + +--- + +## Get findings + +Returns threat intelligence indicator of compromise (IOC) findings. When the threat intelligence monitor finds a malicious IOC during a data scan, a finding is automatically generated.
+ +### Path and HTTP methods + +```json +GET /_plugins/_security_analytics/threat_intel/findings/_search +``` + +### Query parameters + +| Parameter | Description | +|:---------------|:--------------------------------------------------------------------------------------------| +| `sortString` | Specifies which string Security Analytics uses to sort the alerts. Optional. | +| `sortOrder` | The order used to sort the list of findings. Possible values are `asc` or `desc`. Optional. | +| `missing` | A list of fields for which no alias mappings were found. Optional. | +| `size` | The maximum number of results to be returned in the response. Optional. | +| `startIndex` | The pagination indicator. Optional. | +| `searchString` | The alert attribute you want returned in the search. Optional. | + +### Example request + +```json +GET /_plugins/_security_analytics/threat_intel/findings/_search?size=3 +``` + +### Example response + +```json +{ + "total_findings": 10, + "ioc_findings": [ + { + "id": "a9c10094-6139-42b3-81a8-867dffbe381d", + "related_doc_ids": [ + "Ccp88ZAB1vBjq44wmTEu:windows" + ], + "ioc_feed_ids": [ + { + "ioc_id": "2", + "feed_id": "Bsp88ZAB1vBjq44wiDGo", + "feed_name": "my_custom_feed", + "index": "" + } + ], + "monitor_id": "B8p88ZAB1vBjq44wkjEy", + "monitor_name": "Threat intelligence monitor", + "ioc_value": "example-has00001", + "ioc_type": "hashes", + "timestamp": 1722038394501, + "execution_id": "01cae635-93dc-4f07-9e39-31076b9535d1" + }, + { + "id": "8d87aee0-aaa4-4c12-b4e2-b4b1f4ec80f9", + "related_doc_ids": [ + "GsqI8ZAB1vBjq44wXTHa:windows" + ], + "ioc_feed_ids": [ + { + "ioc_id": "2", + "feed_id": "Bsp88ZAB1vBjq44wiDGo", + "feed_name": "my_custom_feed", + "index": "" + } + ], + "monitor_id": "B8p88ZAB1vBjq44wkjEy", + "monitor_name": "Threat intelligence monitor", + "ioc_value": "example-has00001", + "ioc_type": "hashes", + "timestamp": 1722039165824, + "execution_id": "54899e32-aeeb-401e-a031-b1728772f0aa" + }, + { + "id": "2419f624-ba1a-4873-978c-760183b449b7", + "related_doc_ids": [ + "H8qI8ZAB1vBjq44woDHU:windows" + ], + "ioc_feed_ids": [ + { + "ioc_id": "2", + "feed_id": "Bsp88ZAB1vBjq44wiDGo", + "feed_name": "my_custom_feed", + "index": "" + } + ], + "monitor_id": "B8p88ZAB1vBjq44wkjEy", + "monitor_name": "Threat intelligence monitor", + "ioc_value": "example-has00001", + "ioc_type": "hashes", + "timestamp": 1722039182616, + "execution_id": "32ad2544-4b8b-4c9b-b2b4-2ba6d31ece12" + } + ] +} + +``` diff --git a/_security-analytics/threat-intelligence/api/monitor.md b/_security-analytics/threat-intelligence/api/monitor.md new file mode 100644 index 0000000000..965fd79af3 --- /dev/null +++ b/_security-analytics/threat-intelligence/api/monitor.md @@ -0,0 +1,326 @@ +--- +layout: default +title: Monitor API +parent: Threat intelligence APIs +grand_parent: Threat intelligence +nav_order: 35 +--- + +# Monitor API + +You can use the threat intelligence Monitor API to create, search, and update [monitors](https://opensearch.org/docs/latest/observing-your-data/alerting/monitors/) for your threat intelligence feeds. + + +--- +## Create or update a threat intelligence monitor + +Creates or updates a threat intelligence monitor. + +### Path and HTTP methods + +The `POST` method creates a new monitor. The `PUT` method updates a monitor. + +```json +POST _plugins/_security_analytics/threat_intel/monitors +PUT _plugins/_security_analytics/threat_intel/monitors/ +``` + +### Request fields + +You can specify the following fields in the request body.
+ +| Field | Type | Description | +| :--- | :--- | :--- | +| `name` | String | The name of the monitor. Required. | +| `schedule` | Object | The schedule that determines how often the monitor runs. Required. | +| `schedule.period` | Object | Information about the frequency of the schedule. Required. | +| `schedule.period.interval` | Integer | The interval at which the monitor runs. Required. | +| `schedule.period.unit` | String | The unit of time for the interval. | +| `enabled` | Boolean | Indicates whether the monitor is enabled. Required. | +| `user` | Object | Information about the user who created the monitor. Required. | +| `user.backend_roles` | Array | The backend roles associated with the user. Optional. | +| `user.roles` | Array | The roles associated with the user. Optional. | +| `user.custom_attribute_names` | Array | Custom attribute names associated with the user. Optional. | +| `user.user_requested_tenant` | String | The tenant requested by the user. Optional. | +| `indices` | Array | The log data sources used for the monitor. Required. | +| `per_ioc_type_scan_input_list` | Array | A list of inputs to scan based on the indicator of compromise (IOC) types. Required. | +| `per_ioc_type_scan_input_list.ioc_type` | String | The type of IOC (for example, hashes). Required. | +| `per_ioc_type_scan_input_list.index_to_fields_map` | Object | The index field mappings that contain values for the given IOC type. Required. | +| `per_ioc_type_scan_input_list.index_to_fields_map.` | Array | A list of fields contained in the specified index. Required. | +| `triggers` | Array | The trigger settings for alerts. Required. | +| `triggers.data_sources` | Array | A list of data sources associated with the trigger. Required. | +| `triggers.name` | String | The name of the trigger. Required. | +| `triggers.severity` | String | The severity level of the trigger (for example, high, medium, or low). Required. | + +### Example requests + +The following section provides example requests for the Monitor API.
+ + +#### Create a monitor + +```json +{ + "name": "Threat intel monitor", + "schedule": { + "period": { + "interval": 1, + "unit": "MINUTES" + } + }, + "enabled": false, + "user": { + "name": "", + "backend_roles": [], + "roles": [], + "custom_attribute_names": [], + "user_requested_tenant": null + }, + "indices": [ + "windows" + ], + "per_ioc_type_scan_input_list": [ + { + "ioc_type": "hashes", + "index_to_fields_map": { + "windows": [ + "file_hash" + ] + } + } + ], + "triggers": [ + { + "data_sources": [ + "windows", + "random" + ], + "name": "regwarg", + "severity": "high" + } + ] +} +``` + +### Update a monitor + +```json +{ + "name": "Threat intel monitor", + "schedule": { + "period": { + "interval": 1, + "unit": "MINUTES" + } + }, + "enabled": false, + "user": { + "name": "", + "backend_roles": [], + "roles": [], + "custom_attribute_names": [], + "user_requested_tenant": null + }, + "indices": [ + "windows" + ], + "per_ioc_type_scan_input_list": [ + { + "ioc_type": "hashes", + "index_to_fields_map": { + "windows": [ + "file_hash" + ] + } + } + ], + "triggers": [ + { + "data_sources": [ + "windows", + "random" + ], + "name": "regwarg", + "severity": "high" + } + ] +} +``` + + +### Example response + +```json +{ + "id": "B8p88ZAB1vBjq44wkjEy", + "name": 1, + "seq_no": 0, + "primary_term": 1, + "monitor": { + "id": "B8p88ZAB1vBjq44wkjEy", + "name": "Threat intel monitor", + "per_ioc_type_scan_input_list": [ + { + "ioc_type": "hashes", + "index_to_fields_map": { + "windows": [ + "file_hash" + ] + } + } + ], + "schedule": { + "period": { + "interval": 1, + "unit": "MINUTES" + } + }, + "enabled": false, + "user": { + "name": "", + "backend_roles": [], + "roles": [], + "custom_attribute_names": [], + "user_requested_tenant": null + }, + "indices": [ + "windows" + ], + "triggers": [ + { + "data_sources": [ + "windows", + "random" + ], + "ioc_types": [], + "actions": [], + "id": "afdd80cc-a669-4487-98a0-d84bea8e1e39", + "name": "regwarg", + "severity": "high" + } + ] + } +} +``` +--- + +## Delete a monitor + +Deletes an existing threat intelligence monitor. + +### Path and HTTP methods + +```json +DELETE /_plugins/_security_analytics/threat_intel/monitors/ +``` + +### Example request + +```json +DELETE /_plugins/_security_analytics/threat_intel/monitors/B8p88ZAB1vBjq44wkjEy +``` +{% include copy-curl.html %} + +### Example response + +```json +{ + "_id" : "B8p88ZAB1vBjq44wkjEy", + "_version" : 1 +} +``` + +## Search for a monitor + +Searches for an existing monitor using a query. The request body expects a search query. For query options, see [Query DSL]({{site.url}}{{site.baseurl}}/query-dsl/). 
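+ +For instance, a minimal sketch of a search by monitor name follows. The top-level `name` field is taken from the monitor document shown in the example response below; treat this query as an illustration rather than the canonical search pattern: + +```json +{ + "query": { + "match": { + "name": "Threat intel monitor" + } + } +} +```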
+ +### Example request + +The following example request uses a match query with the monitor's ID to search for the monitor: + +```json +POST /_plugins/_security_analytics/detectors/_search +{ + "query": { + "match": { + "_id": "HMqq_5AB1vBjq44wpTIN" + } + } +} +``` +{% include copy-curl.html %} + +### Example response + +```json +{ + "took": 11, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 1, + "relation": "eq" + }, + "max_score": 2.0, + "hits": [ + { + "_index": ".opendistro-alerting-config", + "_id": "HMqq_5AB1vBjq44wpTIN", + "_version": 1, + "_seq_no": 8, + "_primary_term": 1, + "_score": 2.0, + "_source": { + "id": "HMqq_5AB1vBjq44wpTIN", + "name": "Threat intel monitor", + "per_ioc_type_scan_input_list": [ + { + "ioc_type": "hashes", + "index_to_fields_map": { + "windows": [ + "file_hash" + ] + } + } + ], + "schedule": { + "period": { + "interval": 1, + "unit": "MINUTES" + } + }, + "enabled": false, + "user": { + "name": "", + "backend_roles": [], + "roles": [], + "custom_attribute_names": [], + "user_requested_tenant": null + }, + "indices": [ + "windows" + ], + "triggers": [ + { + "data_sources": [ + "windows", + "random" + ], + "ioc_types": [], + "actions": [], + "id": "63426758-c82d-4c87-a52c-f86ee6a8a06d", + "name": "regwarg", + "severity": "high" + } + ] + } + } + ] + } +} +``` \ No newline at end of file diff --git a/_security-analytics/threat-intelligence/api/source.md b/_security-analytics/threat-intelligence/api/source.md new file mode 100644 index 0000000000..7cfadfd813 --- /dev/null +++ b/_security-analytics/threat-intelligence/api/source.md @@ -0,0 +1,488 @@ +--- +layout: default +title: Source API +parent: Threat intelligence APIs +grand_parent: Threat intelligence +nav_order: 50 +--- + +# Source API + +The threat intelligence Source API updates and returns information about tasks related to threat intelligence source configurations. + +## Create or update a threat intelligence source + +Creates or updates a threat intelligence source and loads indicators of compromise (IOCs) from that source. + +### Path and HTTP methods + +```json +POST _plugins/_security_analytics/threat_intel/sources +PUT _plugins/_security_analytics/threat_intel/sources/ +``` + +### Request fields + +| Field | Type | Description | +| :--- | :--- | :---- | +| `type` | String | The type of threat intelligence source, such as `S3_CUSTOM` or `IOC_UPLOAD`. | +| `name` | String | The name of the threat intelligence source. | +| `format` | String | The format of the threat intelligence data, such as `STIX2`. | +| `description` | String | A description of the threat intelligence source. | +| `enabled` | Boolean | Indicates whether the scheduled refresh of IOCs from the source is enabled. | +| `ioc_types` | Array of strings | The `STIX2` types of IOCs that the source supports, for example, `hashes`, `domain-name`, `ipv4-addr`, or `ipv6-addr`. | +| `source` | Object | The source information for the threat intelligence data. | +| `source.ioc_upload` | Object | Information about the IOC upload. Applicable to the `IOC_UPLOAD` type. | +| `source.ioc_upload.file_name` | String | The name of the file containing IOCs, such as `test`. Applicable to the `IOC_UPLOAD` type. | +| `source.ioc_upload.iocs` | Array of objects | A list of IOCs in `STIX2` format. Applicable to the `IOC_UPLOAD` type. | +| `source_config.source.s3` | Object | Information about the Amazon Simple Storage Service (Amazon S3) source.
Applicable to the `S3_CUSTOM` type. | +| `source_config.source.s3.bucket_name` | String | The name of the S3 bucket, such as `threat-intel-s3-test-bucket`. Applicable to the `S3_CUSTOM` type. | +| `source_config.source.s3.object_key` | String | The key for the object in the S3 bucket, such as `alltypess3object`. Applicable to the `S3_CUSTOM` type. | +| `source_config.source.s3.region` | String | The AWS Region in which the S3 bucket is located. Example: `us-west-2`. Applicable to the `S3_CUSTOM` type. | +| `source_config.source.s3.role_arn` | String | The Amazon Resource Name (ARN) of the role used to access the S3 bucket, such as `arn:aws:iam::248279774929:role/threat_intel_s3_test_role`. Applicable to the `S3_CUSTOM` type. | + +#### IOC fields (STIX2) + +The following fields modify the `ioc_types` option. + +| Field | Type | Description | +| :--- | :---- | :---- | +| `id` | String | A unique identifier for the IOC, such as `1`. | +| `name` | String | A human-readable name for the IOC, such as `ioc-name`. | +| `type` | String | The type of IOC, such as `hashes`. | +| `value` | String | The value of the IOC, which can be a hash value, such as `gof`. | +| `severity` | String | The severity level of the IOC. Example: `thvvz`. | +| `created` | Integer/String | The timestamp indicating when the IOC was created, either in UNIX epoch format or ISO_8601 format, for example, `1719519073` or `2024-06-20T01:06:20.562008Z`. | +| `modified` | Integer/String | The timestamp indicating when the IOC was last modified, either in UNIX epoch format or ISO_8601 format, for example, `1719519073` or `2024-06-20T01:06:20.562008Z`. | +| `description` | String | A description of the IOC. | +| `labels` | Array of strings | Any labels or tags associated with the IOC. | +| `feed_id` | String | A unique identifier for the feed to which the IOC belongs. | +| `spec_version` | String | The specification version used for the IOC. | +| `version` | Integer | A version number for the IOC. | + +### Response fields + +| Field | Data type | Description | +| :---- | :--- |:----- | +| `_id` | String | The unique identifier for the threat intelligence source. | +| `_version` | Integer | The version number of the threat intelligence source. | +| `source_config` | Object | The configuration details of the threat intelligence source. | +| `source_config.name` | String | The name of the threat intelligence source. | +| `source_config.format` | String | The format of the threat intelligence data. | +| `source_config.type` | String | The type of the threat intelligence source. | +| `source_config.ioc_types` | Array of strings | The types of IOCs supported by the source. | +| `source_config.description` | String | A description of the threat intelligence source. | +| `source_config.created_by_user` | String or null | The user who created the threat intelligence source. | +| `source_config.created_at` | String (DateTime) | The date and time when the threat intelligence source was created. | +| `source_config.source` | Object | Contains information about the source of the threat intelligence data. | +| `source_config.source.ioc_upload` | Object | Information about the IOC upload. | +| `source_config.source.ioc_upload.file_name` | String | The name of the uploaded file. Example: `test`. | +| `source_config.source.ioc_upload.iocs` | Array of objects | Any additional information about the IOC upload. When the IOC is stored successfully, this appears as an empty array.
| +| `source_config.enabled` | Boolean | Indicates whether the threat intelligence source is enabled. | +| `source_config.enabled_time` | String or null | The date and time when the source was enabled. | +| `source_config.last_update_time` | String (DateTime) | The date and time when the threat intelligence source was last updated. | +| `source_config.schedule` | String or null | The schedule for the threat intelligence source. | +| `source_config.state` | String | The current state of the threat intelligence source. | +| `source_config.refresh_type` | String | The type of refresh applied to the source. | +| `source_config.last_refreshed_user` | String or null | The user who last refreshed the source. | +| `source_config.last_refreshed_time` | String (DateTime) | The date and time when the source was last refreshed. | + +### Example requests + +The following example requests show you how to use the Source API. + +#### IOC_UPLOAD type + +```json +POST _plugins/_security_analytics/threat_intel/sources/ +{ + "type": "IOC_UPLOAD", + "name": "my_custom_feed", + "format": "STIX2", + "description": "this is the description", + "store_type": "OS", + "enabled": "false", + "ioc_types": [ + "hashes" + ], + "source": { + "ioc_upload": { + "file_name": "test", + "iocs": [ + { + "id": "1", + "name": "uldzafothwgik", + "type": "hashes", + "value": "gof", + "severity": "thvvz", + "created": 1719519073, + "modified": 1719519073, + "description": "first one here", + "labels": [ + "ik" + ], + "feed_id": "jl", + "spec_version": "gavvnespe", + "version": -4356924786557562654 + }, + { + "id": "2", + "name": "uldzafothwgik", + "type": "hashes", + "value": "example-has00001", + "severity": "thvvz", + "created": "2024-06-20T01:06:20.562008Z", + "modified": "2024-06-20T02:06:20.56201Z", + "description": "first one here", + "labels": [ + "ik" + ], + "feed_id": "jl", + "spec_version": "gavvnespe", + "version": -4356924786557562654 + } + ] + } + } +} +``` +{% include copy-curl.html %} + +#### S3_CUSTOM type source + +```json +POST _plugins/_security_analytics/threat_intel/sources/ +{ + "type": "S3_CUSTOM", + "name": "example-ipv4-from-SAP-account", + "format": "STIX2", + "store_type": "OS", + "enabled": "true", + "schedule": { + "interval": { + "start_time": 1717097122, + "period": "10", + "unit": "DAYS" + } + }, + "source": { + "s3": { + "bucket_name": "threat-intel-s3-test-bucket", + "object_key": "alltypess3object", + "region": "us-west-2", + "role_arn": "arn:aws:iam::248279774929:role/threat_intel_s3_test_role" + } + }, + "ioc_types": [ + "domain-name", + "ipv4-addr" + ] +} +``` +{% include copy-curl.html %} + +### Example responses + +The following example responses show what OpenSearch returns after a successful request. 
+ + +#### IOC_UPLOAD type + +```json +{ + "_id": "2c0u7JAB9IJUg27gcjUp", + "_version": 2, + "source_config": { + "name": "my_custom_feed", + "format": "STIX2", + "type": "IOC_UPLOAD", + "ioc_types": [ + "hashes" + ], + "description": "this is the description", + "created_by_user": null, + "created_at": "2024-07-25T23:16:25.257697Z", + "source": { + "ioc_upload": { + "file_name": "test", + "iocs": [] + } + }, + "enabled": false, + "enabled_time": null, + "last_update_time": "2024-07-25T23:16:26.011774Z", + "schedule": null, + "state": "AVAILABLE", + "refresh_type": "FULL", + "last_refreshed_user": null, + "last_refreshed_time": "2024-07-25T23:16:25.522735Z" + } +} +``` + +#### S3_CUSTOM type source + +```json +{ + "id": "rGO5zJABLVyN2kq1wbFS", + "version": 206, + "name": "example-ipv4-from-SAP-account", + "format": "STIX2", + "type": "S3_CUSTOM", + "ioc_types": [ + "domain-name", + "ipv4-addr" + ], + "created_by_user": { + "name": "admin", + "backend_roles": [], + "roles": [ + "security_manager", + "all_access" + ], + "custom_attribute_names": [] + }, + "created_at": "2024-07-19T20:40:44.114Z", + "source": { + "s3": { + "bucket_name": "threat-intel-s3-test-bucket", + "object_key": "alltypess3object", + "region": "us-west-2", + "role_arn": "arn:aws:iam::248279774929:role/threat_intel_s3_test_role" + } + }, + "enabled": true, + "enabled_time": "2024-07-19T20:40:44.114Z", + "last_update_time": "2024-07-25T20:58:18.213Z", + "schedule": { + "interval": { + "start_time": 1717097122, + "period": 10, + "unit": "Days" + } + }, + "state": "AVAILABLE", + "refresh_type": "FULL", + "last_refreshed_user": { + "name": "admin", + "backend_roles": [], + "roles": [ + "security_manager", + "all_access" + ], + "custom_attribute_names": [], + "user_requested_tenant": null + }, + "last_refreshed_time": "2024-07-25T20:58:17.131Z" +} +``` + +--- + +## Get threat intelligence source configuration details + +Retrieves the threat intelligence source configuration details. + +### Path and HTTP methods + + +```json +GET /_plugins/_security_analytics/threat_intel/sources/ +``` + +### Example request + +```json +GET /_plugins/_security_analytics/threat_intel/sources/ +``` +{% include copy-curl.html %} + +### Example response + +```json +{ + "_id": "a-jnfjkAF_uQjn8Weo4", + "_version": 2, + "source_config": { + "name": "my_custom_feed_2", + "format": "STIX2", + "type": "S3_CUSTOM", + "ioc_types": [ + "ipv4_addr", + "hashes" + ], + "description": "this is the description", + "created_by_user": null, + "created_at": "2024-06-27T00:52:56.373Z", + "source": { + "s3": { + "bucket_name": "threat-intel-s3-test-bucket", + "object_key": "bd", + "region": "us-west-2", + "role_arn": "arn:aws:iam::540654354201:role/threat_intel_s3_test_role" + } + }, + "enabled": true, + "enabled_time": "2024-06-27T00:52:56.373Z", + "last_update_time": "2024-06-27T00:52:57.824Z", + "schedule": { + "interval": { + "start_time": 1717097122, + "period": 1, + "unit": "Days" + } + }, + "state": "AVAILABLE", + "refresh_type": "FULL", + "last_refreshed_user": null, + "last_refreshed_time": "2024-06-27T00:52:56.533Z" + } +} +``` +--- + +## Search for a threat intelligence source + +Searches for threat intelligence source matches based on the search query. The request body expects a search query. For query options, see [Query DSL]({{site.url}}{{site.baseurl}}/query-dsl/).
+ + +### Path and HTTP methods + +```json +POST /_plugins/_security_analytics/threat_intel/sources/_search +``` + +### Example request + +```json +POST /_plugins/_security_analytics/threat_intel/sources/_search +{ + "query": { + "match": { + "source_config.type": "S3_CUSTOM" + } + } +} +``` +{% include copy-curl.html %} + +### Example response + +```json +{ + "took": 20, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 1, + "relation": "eq" + }, + "max_score": 1.0, + "hits": [ + { + "_index": ".opensearch-sap--job", + "_id": "YEAuV5ABx0lQn6qhY5C1", + "_version": 2, + "_seq_no": 1, + "_primary_term": 1, + "_score": 1.0, + "_source": { + "source_config": { + "name": "my_custom_feed_2", + "format": "STIX2", + "type": "S3_CUSTOM", + "description": "this is the description", + "created_by_user": null, + "source": { + "s3": { + "bucket_name": "threat-intelligence-s3-test-bucket", + "object_key": "bd", + "region": "us-west-2", + "role_arn": "arn:aws:iam::540654354201:role/threat_intel_s3_test_role" + } + }, + "created_at": 1719449576373, + "enabled_time": 1719449576373, + "last_update_time": 1719449577824, + "schedule": { + "interval": { + "start_time": 1717097122, + "period": 1, + "unit": "Days" + } + }, + "state": "AVAILABLE", + "refresh_type": "FULL", + "last_refreshed_time": 1719449576533, + "last_refreshed_user": null, + "enabled": true, + "ioc_types": [ + "ip", + "hash" + ] + } + } + } + ] + } +} +``` + +--- + +## Delete Threat Intelligence Source API + +Deletes a threat intelligence source. + +### Path and HTTP methods + +```json +DELETE /_plugins/_security_analytics/threat_intel/sources/ +``` + +### Example request + +```json +DELETE /_plugins/_security_analytics/threat_intel/sources/2c0u7JAB9IJUg27gcjUp +``` +{% include copy-curl.html %} + +### Example response + +```json +{ + "_id": "2c0u7JAB9IJUg27gcjUp" +} +``` +--- + +## Refresh the source + +Downloads any IOCs from the threat intelligence source. Only supports the `S3_CUSTOM` type source. + +### Path and HTTP methods + +```json +POST /_plugins/_security_analytics/threat_intel/sources//_refresh +``` + +### Example request + +```json +POST /_plugins/_security_analytics/threat_intel/sources/IJAXz4QBrmVplM4JYxx_/_refresh +``` +{% include copy-curl.html %} + +### Example response + +```json +{ + "acknowledged": true +} +``` diff --git a/_security-analytics/threat-intelligence/api/threat-intel-api.md b/_security-analytics/threat-intelligence/api/threat-intel-api.md new file mode 100644 index 0000000000..c87133b700 --- /dev/null +++ b/_security-analytics/threat-intelligence/api/threat-intel-api.md @@ -0,0 +1,14 @@ +--- +layout: default +title: Threat intelligence APIs +nav_order: 50 +parent: Threat intelligence +has_children: true +has_toc: true +--- + +# Threat intelligence APIs + +OpenSearch provides several APIs that allow you to set up and interact with your threat intelligence feeds. + + diff --git a/_security-analytics/threat-intelligence/getting-started.md b/_security-analytics/threat-intelligence/getting-started.md new file mode 100644 index 0000000000..366bc2674c --- /dev/null +++ b/_security-analytics/threat-intelligence/getting-started.md @@ -0,0 +1,85 @@ +--- +layout: default +title: Getting started +parent: Threat intelligence +nav_order: 41 +--- + +# Getting started + +To get started with threat intelligence, you'll need to set up your threat intelligence sources and set up monitors to scan your log sources. 
The following tutorial shows you how to get started using OpenSearch Dashboards. Alternatively, you can use the [API]({{site.url}}{{site.baseurl}}/security-analytics/threat-intelligence/api/threat-intel-api/). + +## Threat intelligence view + +To access threat intelligence, log in to OpenSearch Dashboards and select **Security Analytics** > **Threat Intelligence**. + +In the threat intelligence view, you can access the following tabs: + +- **Threat intel sources**: Shows a list of all active and inactive threat intelligence sources, including the default IP reputation feed, [AlienVault OTX](https://otx.alienvault.com/), which comes prepackaged when downloading OpenSearch. +- **Scan configuration**: Shows an overview of your scan configuration, including the configured **Log sources**, **Scan schedule**, and **Alert triggers**. From the **Actions** dropdown list, you can also **Stop scan**, **Edit scan configuration**, or **Delete scan configuration**. + + +## Step 1: Set up threat intelligence sources + +To add a threat intelligence source, select **Add threat intel source** from the threat intelligence page. The **Add custom threat intelligence source** page appears. + +On the threat intelligence source page, add the following information: + +- **Name**: A name for the source. +- **Description**: An optional description of the source. +- **Threat intel source type**: The source type determines where the `STIX2` file is stored. You can choose one of the following options: + - **Remote data store location**: Connects to a custom data store. As of OpenSearch 2.16, only the `S3_SOURCE` type is supported. This setting also gives you the ability to set a download schedule, where OpenSearch downloads the newest `STIX2` file from the data store. For more information, see [S3_SOURCE connection details](#s3_source-connection-information). + - **Local file upload**: Uploads a custom threat intelligence IOC file. Custom files cannot be downloaded based on a schedule and must be uploaded manually in order to update the IOCs. For more information, see [Local file upload](#local-file-upload). +- **Types of malicious indicators**: Determines the types of malicious IOCs to pull from the `STIX2` file. The following IOCs are supported: + - IPv4-Address + - IPv6-Address + - Domains + - File hash + +After all the relevant information has been entered, select **Add threat intel source**. + +### Local file upload + +Local files uploaded as the threat intelligence source must use the following specifications: + +- Upload as a JSON file in the `STIX2` format. For an example `STIX2` file, download [this file]({{site.url}}{{site.baseurl}}/assets/examples/all-ioc-type-examples.json), which contains example formatting for all supported IOC types. +- Be less than 500 kB. + + +### S3_SOURCE connection information + +When using the `S3_SOURCE` as a remote store, the following connection information must be provided: + +- **IAM Role ARN**: The Amazon Resource Name (ARN) for an AWS Identity and Access Management (IAM) role. +- **S3 bucket directory**: The name of the Amazon Simple Storage Service (Amazon S3) bucket in which the `STIX2` file is stored. +- **Specify a directory or file**: The object key or directory path for the `STIX2` file in the S3 bucket. +- **Region**: The AWS Region for the S3 bucket. + +You can also set the **Download schedule**, which determines when OpenSearch downloads an updated `STIX2` file from the connected S3 bucket. The default interval is once a day. Only daily intervals are supported.
+ +Alternatively, you can check the **Download on demand** option, which prevents new data from the bucket from being automatically downloaded. + + +## Step 2: Set up scanning for your log sources + +You can configure threat intelligence monitors to scan your aliases and data streams. The monitor scans for newly ingested data from your indexes and matches that data against any IOCs present in the threat intelligence sources. The scan applies to all threat intelligence sources added to OpenSearch. By default, the scan runs once each minute. + +To add or edit a scan configuration: + +1. From the threat intelligence view, select **Add scan configuration** or **Edit scan configuration**. +2. Select the indexes or aliases to scan. +3. Select the **fields** from your indexes or aliases to scan based on their IOC type. For example, if an alias has two fields called `src_ip` and `dst_ip` that contain `ipv4` addresses, then those fields must be entered into the `ipv4-addr` section of the monitor request. +4. Determine a **Scan schedule** for the indicated indexes or aliases. By default, OpenSearch scans for IOCs once each minute. +5. Set up any alert triggers and trigger conditions. You can add multiple triggers: + 1. Add a name for the trigger. + 2. Choose an indicator type. The indicator type matches the IOC type. + 3. Select a severity for the alert. + 4. Select whether to send a notification when the alert is triggered. When enabled, you can customize which channels the notification is sent to as well as the notification message. The notification message can be customized using a [Mustache template](https://mustache.github.io/mustache.5.html). +6. Once your settings have been entered, select **Save and start monitoring**. + +When malicious IOCs are found, OpenSearch creates **findings**, which provide information about the threat. You can also configure triggers to create alerts, which send notifications to configured webhooks or endpoints. + + +## Viewing alerts and findings + +You can view the alerts and findings generated by threat intelligence monitors to analyze which malicious indicators have occurred in your security logs. To view alerts or findings, select **View findings** or **View alerts** from the threat intelligence view. diff --git a/_security-analytics/threat-intelligence/index.md b/_security-analytics/threat-intelligence/index.md new file mode 100644 index 0000000000..b116d045c1 --- /dev/null +++ b/_security-analytics/threat-intelligence/index.md @@ -0,0 +1,16 @@ +--- +layout: default +title: Threat intelligence +nav_order: 40 +has_children: true +--- + +# Threat intelligence + +Threat intelligence in Security Analytics offers the capability to integrate your threat intelligence feeds. Feeds comprise indicators of compromise (IOCs); by setting up threat intelligence monitors, you can search your data for these malicious indicators. These monitors generate findings and can send notifications when malicious IPs, domains, or hashes from the threat intelligence feeds match your data. + + +You can interact with threat intelligence in the following ways: + +- Threat intelligence APIs: To configure threat intelligence using API operations, see [Threat Intelligence APIs]({{site.url}}{{site.baseurl}}/security-analytics/threat-intelligence/api/threat-intel-api/). +- OpenSearch Dashboards: To configure and use threat intelligence through the OpenSearch Dashboards interface, see [Getting started]({{site.url}}{{site.baseurl}}/security-analytics/threat-intelligence/getting-started/).
\ No newline at end of file diff --git a/_security-analytics/usage/detectors.md b/_security-analytics/usage/detectors.md index bd7868bc37..1246812a22 100644 --- a/_security-analytics/usage/detectors.md +++ b/_security-analytics/usage/detectors.md @@ -37,7 +37,7 @@ After you select the **Alert triggers** tab, you also have the option to add add ### Threat intelligence feeds -A threat intelligence feed is a real-time, continuous data stream that gathers information related to risks or threats. A piece of information in the tactical threat intelligence feed suggesting that your cluster may have been compromised, such as a login from an unknown user or location or anomalous activity like an increase in read volume, is called an *indicator of compromise* (IoC). These IoCs can be used by investigators to help isolate security incidents. +A threat intelligence feed is a real-time, continuous data stream that gathers information related to risks or threats. A piece of information in the tactical threat intelligence feed suggesting that your cluster may have been compromised, such as a login from an unknown user or location or anomalous activity like an increase in read volume, is called an *indicator of compromise (IOC)*. These IOCs can be used by investigators to help isolate security incidents. As of OpenSearch 2.12, you can enable threat intelligence for Sigma rules related to malicious IP addresses. diff --git a/assets/examples/all-ioc-type-examples.json b/assets/examples/all-ioc-type-examples.json new file mode 100644 index 0000000000..46202f7933 --- /dev/null +++ b/assets/examples/all-ioc-type-examples.json @@ -0,0 +1,20 @@ +{"name":"test-domain-ioc-1","type":"domain-name","value":"example1.com","severity":"3","created":"2024-06-24T23:38:59.817536Z","modified":"2024-06-25T00:38:59.81754Z","description":"test ioc description","labels":["label1"],"spec_version":"spec1"} +{"name":"test-domain-ioc-2","type":"domain-name","value":"example2.com","severity":"3","created":"2024-06-24T23:38:59.817536Z","modified":"2024-06-25T00:38:59.81754Z","description":"test ioc description","labels":["label1"],"spec_version":"spec1"} +{"name":"test-domain-ioc-3","type":"domain-name","value":"example3.com","severity":"3","created":"2024-06-24T23:38:59.817536Z","modified":"2024-06-25T00:38:59.81754Z","description":"test ioc description","labels":["label1"],"spec_version":"spec1"} +{"name":"test-domain-ioc-4","type":"domain-name","value":"example4.com","severity":"3","created":"2024-06-24T23:38:59.817536Z","modified":"2024-06-25T00:38:59.81754Z","description":"test ioc description","labels":["label1"],"spec_version":"spec1"} +{"name":"test-domain-ioc-5","type":"domain-name","value":"example5.com","severity":"3","created":"2024-06-24T23:38:59.817536Z","modified":"2024-06-25T00:38:59.81754Z","description":"test ioc description","labels":["label1"],"spec_version":"spec1"} +{"name":"test-hash-ioc1","type":"hashes","value":"examplehash0000000000000000000000000000000000000000000000000001","severity":"3","created":"2024-06-24T23:38:59.817536Z","modified":"2024-06-25T00:38:59.81754Z","description":"test ioc description","labels":["label1"],"spec_version":"spec1"} +{"name":"test-hash-ioc2","type":"hashes","value":"examplehash0000000000000000000000000000000000000000000000000002","severity":"3","created":"2024-06-24T23:38:59.817536Z","modified":"2024-06-25T00:38:59.81754Z","description":"test ioc description","labels":["label1"],"spec_version":"spec1"} 
+{"name":"test-hash-ioc3","type":"hashes","value":"examplehash0000000000000000000000000000000000000000000000000003","severity":"3","created":"2024-06-24T23:38:59.817536Z","modified":"2024-06-25T00:38:59.81754Z","description":"test ioc description","labels":["label1"],"spec_version":"spec1"} +{"name":"test-hash-ioc4","type":"hashes","value":"examplehash0000000000000000000000000000000000000000000000000004","severity":"3","created":"2024-06-24T23:38:59.817536Z","modified":"2024-06-25T00:38:59.81754Z","description":"test ioc description","labels":["label1"],"spec_version":"spec1"} +{"name":"test-hash-ioc5","type":"hashes","value":"examplehash0000000000000000000000000000000000000000000000000005","severity":"3","created":"2024-06-24T23:38:59.817536Z","modified":"2024-06-25T00:38:59.81754Z","description":"test ioc description","labels":["label1"],"spec_version":"spec1"} +{"name":"test-ipv4-ioc1","type":"ipv4-addr","value":"1.0.0.0","severity":"3","created":"2024-06-24T23:38:59.817536Z","modified":"2024-06-25T00:38:59.81754Z","description":"test ioc description","labels":["label1"],"spec_version":"spec1"} +{"name":"test-ipv4-ioc2","type":"ipv4-addr","value":"2.0.0.0","severity":"3","created":"2024-06-24T23:38:59.817536Z","modified":"2024-06-25T00:38:59.81754Z","description":"test ioc description","labels":["label1"],"spec_version":"spec1"} +{"name":"test-ipv4-ioc3","type":"ipv4-addr","value":"3.0.0.0","severity":"3","created":"2024-06-24T23:38:59.817536Z","modified":"2024-06-25T00:38:59.81754Z","description":"test ioc description","labels":["label1"],"spec_version":"spec1"} +{"name":"test-ipv4-ioc4","type":"ipv4-addr","value":"4.0.0.0","severity":"3","created":"2024-06-24T23:38:59.817536Z","modified":"2024-06-25T00:38:59.81754Z","description":"test ioc description","labels":["label1"],"spec_version":"spec1"} +{"name":"test-ipv4-ioc5","type":"ipv4-addr","value":"5.0.0.0","severity":"3","created":"2024-06-24T23:38:59.817536Z","modified":"2024-06-25T00:38:59.81754Z","description":"test ioc description","labels":["label1"],"spec_version":"spec1"} +{"name":"test-ipv6-ioc1","type":"ipv6-addr","value":"1000:0000:0000:0000:0000:0000:0000:0000","severity":"3","created":"2024-06-24T23:38:59.817536Z","modified":"2024-06-25T00:38:59.81754Z","description":"test ioc description","labels":["label1"],"spec_version":"spec1"} +{"name":"test-ipv6-ioc2","type":"ipv6-addr","value":"2000:0000:0000:0000:0000:0000:0000:0000","severity":"3","created":"2024-06-24T23:38:59.817536Z","modified":"2024-06-25T00:38:59.81754Z","description":"test ioc description","labels":["label1"],"spec_version":"spec1"} +{"name":"test-ipv6-ioc3","type":"ipv6-addr","value":"3000:0000:0000:0000:0000:0000:0000:0000","severity":"3","created":"2024-06-24T23:38:59.817536Z","modified":"2024-06-25T00:38:59.81754Z","description":"test ioc description","labels":["label1"],"spec_version":"spec1"} +{"name":"test-ipv6-ioc4","type":"ipv6-addr","value":"4000:0000:0000:0000:0000:0000:0000:0000","severity":"3","created":"2024-06-24T23:38:59.817536Z","modified":"2024-06-25T00:38:59.81754Z","description":"test ioc description","labels":["label1"],"spec_version":"spec1"} +{"name":"test-ipv6-ioc5","type":"ipv6-addr","value":"5000:0000:0000:0000:0000:0000:0000:0000","severity":"3","created":"2024-06-24T23:38:59.817536Z","modified":"2024-06-25T00:38:59.81754Z","description":"test ioc description","labels":["label1"],"spec_version":"spec1"} From a0bf74ae9dc0880c801873be16daa7934064b7b7 Mon Sep 17 00:00:00 2001 From: AntonEliatra Date: Wed, 7 Aug 2024 15:23:51 +0100 
Subject: [PATCH 103/154] Add a comment regarding read/write control via HTTP verbs #863 (#7909) * adding a comment regarding read/write control via HTTP verbs #863 Signed-off-by: AntonEliatra * adding a comment regarding read/write control via HTTP verbs #863 Signed-off-by: AntonEliatra * Update rest-layer-authz.md Signed-off-by: AntonEliatra * Update _security/access-control/rest-layer-authz.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: AntonEliatra Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- _security/access-control/rest-layer-authz.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/_security/access-control/rest-layer-authz.md b/_security/access-control/rest-layer-authz.md index 71960481ff..b882cf04e0 100644 --- a/_security/access-control/rest-layer-authz.md +++ b/_security/access-control/rest-layer-authz.md @@ -16,6 +16,8 @@ Developers, on the other hand, will need to understand the ideas behind `NamedRo The benefits of using the REST layer for authorization include the ability to authorize requests at the REST layer and filter out unauthorized requests. As a result, this decreases the processing burden on the transport layer while allowing granular control over access to APIs. +Some read operations, such as [scroll]({{site.url}}{{site.baseurl}}/api-reference/scroll/), manage state. Therefore, it is recommended to control read and write access using the Security plugin [permissions]({{site.url}}{{site.baseurl}}/security/access-control/permissions/), instead of allowing/blocking HTTP request verbs. + You must have the Security plugin enabled to use REST layer authorization. {: .note } From 646cbd2e4a45483a7386ad080ab915bcac4418cf Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Wed, 7 Aug 2024 11:50:35 -0400 Subject: [PATCH 104/154] Remove comment from apostrophe token filter request (#7930) Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- _analyzers/token-filters/apostrophe.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_analyzers/token-filters/apostrophe.md b/_analyzers/token-filters/apostrophe.md index 27ee92266b..3c4aaca216 100644 --- a/_analyzers/token-filters/apostrophe.md +++ b/_analyzers/token-filters/apostrophe.md @@ -22,7 +22,7 @@ PUT /custom_text_index "analyzer": { "custom_analyzer": { "type": "custom", - "tokenizer": "standard", // splits text into words + "tokenizer": "standard" "filter": [ "lowercase", "apostrophe" From a1b7736863e1f18577de0dbfaef5282ef63df116 Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Wed, 7 Aug 2024 12:23:57 -0400 Subject: [PATCH 105/154] fix a typo (#7934) Signed-off-by: Eric Pugh --- _ml-commons-plugin/agents-tools/index.md | 2 +- _ml-commons-plugin/api/agent-apis/register-agent.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/_ml-commons-plugin/agents-tools/index.md b/_ml-commons-plugin/agents-tools/index.md index 1fa86bdf67..f1c2c49b20 100644 --- a/_ml-commons-plugin/agents-tools/index.md +++ b/_ml-commons-plugin/agents-tools/index.md @@ -130,7 +130,7 @@ POST /_plugins/_ml/agents/_register { "type": "VectorDBTool", "name": "VectorDBTool", - "description": "A tool to search opensearch index with natural language question. If you don't know answer for some question, you should always try to search data with this tool. 
Action Input: ", + "description": "A tool to search opensearch index with natural language question. If you don't know answer for some question, you should always try to search data with this tool. Action Input: ", "parameters": { "model_id": "YOUR_TEXT_EMBEDDING_MODEL_ID", "index": "my_test_data", diff --git a/_ml-commons-plugin/api/agent-apis/register-agent.md b/_ml-commons-plugin/api/agent-apis/register-agent.md index 339c25bf0e..eeea2af715 100644 --- a/_ml-commons-plugin/api/agent-apis/register-agent.md +++ b/_ml-commons-plugin/api/agent-apis/register-agent.md @@ -161,7 +161,7 @@ POST /_plugins/_ml/agents/_register { "type": "VectorDBTool", "name": "VectorDBTool", - "description": "A tool to search opensearch index with natural language question. If you don't know answer for some question, you should always try to search data with this tool. Action Input: ", + "description": "A tool to search opensearch index with natural language question. If you don't know answer for some question, you should always try to search data with this tool. Action Input: ", "parameters": { "model_id": "", "index": "", From 106009c59bb1786e3dadcbe126729efc61d7a0f7 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Wed, 7 Aug 2024 12:25:00 -0400 Subject: [PATCH 106/154] Update apostrophe.md (#7932) Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- _analyzers/token-filters/apostrophe.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_analyzers/token-filters/apostrophe.md b/_analyzers/token-filters/apostrophe.md index 3c4aaca216..47d7698081 100644 --- a/_analyzers/token-filters/apostrophe.md +++ b/_analyzers/token-filters/apostrophe.md @@ -22,7 +22,7 @@ PUT /custom_text_index "analyzer": { "custom_analyzer": { "type": "custom", - "tokenizer": "standard" + "tokenizer": "standard", "filter": [ "lowercase", "apostrophe" From 8b731c55e23b63a68ba77c73370fd4116f5c4604 Mon Sep 17 00:00:00 2001 From: ldrick <3674067+ldrick@users.noreply.github.com> Date: Wed, 7 Aug 2024 20:30:46 +0200 Subject: [PATCH 107/154] Add documentation for ingest-attachment plugin (#7891) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * add ingest-attachment plugin doc Signed-off-by: Ricky Lippmann * extend ingest-attachment with information how to limit content Signed-off-by: Ricky Lippmann * Added target_bulk_bytes to the docs for logstash-output plugin (#7869) * Added target_bulk_bytes Signed-off-by: Sander van de Geijn * Update _tools/logstash/ship-to-opensearch.md Nice Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Sander van de Geijn * Update _tools/logstash/ship-to-opensearch.md Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update ship-to-opensearch.md * Remove "we" * Update ship-to-opensearch.md * Update ship-to-opensearch.md * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: Sander van de Geijn Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Nathan Bower Signed-off-by: Ricky 
Lippmann * Add doc for binary format support in k-NN (#7840) * Add doc for binary format support in k-NN Signed-off-by: Junqiu Lei * Resolve tech feedback Signed-off-by: Junqiu Lei * Doc review Signed-off-by: Fanit Kolchina * Add newline Signed-off-by: Fanit Kolchina * Formatting Signed-off-by: Fanit Kolchina * Link fix Signed-off-by: Fanit Kolchina * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Add query results to examples Signed-off-by: Junqiu Lei * Rephrased sentences and changed vector field name Signed-off-by: Fanit Kolchina * Editorial review Signed-off-by: Fanit Kolchina * Remove details from one of the requests Signed-off-by: Fanit Kolchina --------- Signed-off-by: Junqiu Lei Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Fanit Kolchina Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower Signed-off-by: Ricky Lippmann * Edit for redundant information and sections across Data Prepper (#7127) * Edit for redundant information and sections across Data Prepper Signed-off-by: Melissa Vagi * Edit for redundant information and sections across Data Prepper Signed-off-by: Melissa Vagi * Rewrite expression syntax and reorganize doc structure for readability Signed-off-by: Melissa Vagi * Rewrite expression syntax and reorganize doc structure for readability Signed-off-by: Melissa Vagi * Rewrite expression syntax and reorganize doc structure for readability Signed-off-by: Melissa Vagi * Rewrite expression syntax and reorganize doc structure for readability Signed-off-by: Melissa Vagi * Rewrite expression syntax and reorganize doc structure for readability Signed-off-by: Melissa Vagi * Update _data-prepper/index.md Signed-off-by: Melissa Vagi * Update configuring-data-prepper.md Signed-off-by: Melissa Vagi Signed-off-by: Melissa Vagi * Update _data-prepper/pipelines/expression-syntax.md Signed-off-by: Melissa Vagi * Update _data-prepper/pipelines/expression-syntax.md Signed-off-by: Melissa Vagi * Update _data-prepper/pipelines/pipelines.md Signed-off-by: Melissa Vagi * Update expression-syntax.md Signed-off-by: Melissa Vagi * Create Functions subpages Signed-off-by: Melissa Vagi * Create functions subpages Signed-off-by: Melissa Vagi * Copy edit Signed-off-by: Melissa Vagi * add remaining subpages Signed-off-by: Melissa Vagi * Update _data-prepper/index.md Co-authored-by: Nathan Bower Signed-off-by: Heather Halter * Apply suggestions from code review Accepted editorial suggestions. Co-authored-by: Nathan Bower Signed-off-by: Heather Halter * Apply suggestions from code review Accepted more editorial suggestions that were hidden. 
Co-authored-by: Nathan Bower Signed-off-by: Heather Halter * Apply suggestions from code review Co-authored-by: Heather Halter Signed-off-by: David Venable * removed-line Signed-off-by: Heather Halter * Fixed broken link to pipelines Signed-off-by: Heather Halter * Fixed broken links on Update add-entries.md Signed-off-by: Heather Halter * Fixed broken link in Update dynamo-db.md Signed-off-by: Heather Halter * Fixed link syntax in Update index.md Signed-off-by: Heather Halter --------- Signed-off-by: Melissa Vagi Signed-off-by: Heather Halter Signed-off-by: David Venable Signed-off-by: Heather Halter Co-authored-by: Heather Halter Co-authored-by: Nathan Bower Co-authored-by: David Venable Signed-off-by: Ricky Lippmann * Update index.md (#7893) fixed typo Signed-off-by: Philipp Dünnebeil <53494432+PhilD90@users.noreply.github.com> Signed-off-by: Ricky Lippmann * Fix typo and make left nav heading uniform for neural sparse processor (#7895) Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Ricky Lippmann * Add custom JSON lexer and highlighting color scheme (#7892) * Add custom JSON lexer and highlighting color scheme Signed-off-by: Fanit Kolchina * Update _getting-started/quickstart.md Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower Signed-off-by: Ricky Lippmann * Add model names to Vale (#7901) Signed-off-by: Fanit Kolchina Signed-off-by: Ricky Lippmann * Renamed data prepper files to have dashes for consistency (#7790) * Renamed data prepper files to have dashes for consistency Signed-off-by: Fanit Kolchina * More files Signed-off-by: Fanit Kolchina --------- Signed-off-by: Fanit Kolchina Signed-off-by: Ricky Lippmann * Add documentation for ml inference search request processor/ search response processor (#7852) * draft ml inference search request processor Signed-off-by: Mingshi Liu * add doc Signed-off-by: Mingshi Liu * add doc Signed-off-by: Mingshi Liu * Doc review Signed-off-by: Fanit Kolchina * Fixed links Signed-off-by: Fanit Kolchina * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Unify processor docs Signed-off-by: Fanit Kolchina * Update _query-dsl/geo-and-xy/xy.md Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Remove note Signed-off-by: Fanit Kolchina * Fix link Signed-off-by: Fanit Kolchina --------- Signed-off-by: Mingshi Liu Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Fanit Kolchina Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower Signed-off-by: Ricky Lippmann * Refactor k-NN documentation (#7890) * Refactor k-NN documentation Signed-off-by: Fanit Kolchina * Change field name for cohesiveness Signed-off-by: Fanit Kolchina * Apply suggestions from code review Co-authored-by: Heather Halter Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Heather Halter 
Co-authored-by: Nathan Bower Signed-off-by: Ricky Lippmann * Ml commons batch inference (#7899) * add batch inference API Signed-off-by: Xun Zhang * add more links and mark the api as experimental Signed-off-by: Xun Zhang * use openAI as the blueprint example details Signed-off-by: Xun Zhang * address comments Signed-off-by: Xun Zhang * Doc review Signed-off-by: Fanit Kolchina * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Xun Zhang Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Fanit Kolchina Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower Signed-off-by: Ricky Lippmann * Remove repeated sentence in distributed tracing doc (#7906) Signed-off-by: Peter Alfonsi Co-authored-by: Peter Alfonsi Signed-off-by: Ricky Lippmann * Add apostrophe token filter page #7871 (#7884) * adding apostrophe token filter page #7871 Signed-off-by: AntonEliatra * fixing vale error Signed-off-by: AntonEliatra * Update apostrophe-token-filter.md Signed-off-by: AntonEliatra * updating the naming Signed-off-by: AntonEliatra * updating as per the review comments Signed-off-by: AntonEliatra * updating the heading to Apostrophe token filter Signed-off-by: AntonEliatra * updating as per PR comments Signed-off-by: AntonEliatra * Apply suggestions from code review Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: AntonEliatra * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: AntonEliatra --------- Signed-off-by: AntonEliatra Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower Signed-off-by: Ricky Lippmann * removed unnecessary backslash Signed-off-by: Ricky Lippmann * fix:add missing whitespace in table Signed-off-by: Ricky Lippmann * docs: add link to tika supported file formats Signed-off-by: Ricky Lippmann * Update ingest-attachment-plugin.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * adjust to keep technical specific information with improved wording Signed-off-by: Ricky Lippmann * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: Ricky Lippmann Signed-off-by: Sander van de Geijn Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Junqiu Lei Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi Signed-off-by: Heather Halter Signed-off-by: David Venable Signed-off-by: Heather Halter Signed-off-by: Philipp Dünnebeil <53494432+PhilD90@users.noreply.github.com> Signed-off-by: Mingshi Liu Signed-off-by: Xun Zhang Signed-off-by: Peter Alfonsi Signed-off-by: AntonEliatra Co-authored-by: Sander van de Geijn Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Nathan Bower Co-authored-by: Junqiu Lei Co-authored-by: Fanit Kolchina Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Melissa Vagi 
Co-authored-by: Heather Halter
Co-authored-by: David Venable
Co-authored-by: Philipp Dünnebeil <53494432+PhilD90@users.noreply.github.com>
Co-authored-by: Mingshi Liu
Co-authored-by: Xun Zhang
Co-authored-by: Peter Alfonsi
Co-authored-by: Peter Alfonsi
Co-authored-by: AntonEliatra
---
 .../additional-plugins/index.md | 6 +-
 .../ingest-attachment-plugin.md | 228 ++++++++++++++++++
 2 files changed, 231 insertions(+), 3 deletions(-)
 create mode 100644 _install-and-configure/additional-plugins/ingest-attachment-plugin.md

diff --git a/_install-and-configure/additional-plugins/index.md b/_install-and-configure/additional-plugins/index.md
index de97af0b1a..87d0662442 100644
--- a/_install-and-configure/additional-plugins/index.md
+++ b/_install-and-configure/additional-plugins/index.md
@@ -9,7 +9,6 @@ nav_order: 10
 
 There are many more plugins available in addition to those provided by the standard distribution of OpenSearch. These additional plugins have been built by OpenSearch developers or members of the OpenSearch community. While it isn't possible to provide an exhaustive list (because many plugins are not maintained in an OpenSearch GitHub repository), the following plugins, available in the [OpenSearch/plugins](https://github.com/opensearch-project/OpenSearch/tree/main/plugins) directory on GitHub, are some of the plugins that can be installed using one of the installation options, for example, using the command `bin/opensearch-plugin install <plugin-name>`.
-
 | Plugin name | Earliest available version |
 | :--- | :--- |
 | analysis-icu | 1.0.0 |
@@ -22,7 +21,7 @@ There are many more plugins available in addition to those provided by the stand
 | discovery-azure-classic | 1.0.0 |
 | discovery-ec2 | 1.0.0 |
 | discovery-gce | 1.0.0 |
-| ingest-attachment | 1.0.0 |
+| [`ingest-attachment`]({{site.url}}{{site.baseurl}}/install-and-configure/additional-plugins/ingest-attachment-plugin/) | 1.0.0 |
 | mapper-annotated-text | 1.0.0 |
 | mapper-murmur3 | 1.0.0 |
 | [`mapper-size`]({{site.url}}{{site.baseurl}}/install-and-configure/additional-plugins/mapper-size-plugin/) | 1.0.0 |
@@ -34,7 +33,8 @@ There are many more plugins available in addition to those provided by the stand
 | store-smb | 1.0.0 |
 | transport-nio | 1.0.0 |
 
-
 ## Related articles
+
 [Installing plugins]({{site.url}}{{site.baseurl}}/install-and-configure/plugins/)
+[`ingest-attachment` plugin]({{site.url}}{{site.baseurl}}/install-and-configure/additional-plugins/ingest-attachment-plugin/)
 [`mapper-size` plugin]({{site.url}}{{site.baseurl}}/install-and-configure/additional-plugins/mapper-size-plugin/)

diff --git a/_install-and-configure/additional-plugins/ingest-attachment-plugin.md b/_install-and-configure/additional-plugins/ingest-attachment-plugin.md
new file mode 100644
index 0000000000..d2062f441b
--- /dev/null
+++ b/_install-and-configure/additional-plugins/ingest-attachment-plugin.md
@@ -0,0 +1,228 @@
+---
+layout: default
+title: Ingest-attachment plugin
+parent: Installing plugins
+nav_order: 20
+
+---
+
+# Ingest-attachment plugin
+
+The `ingest-attachment` plugin enables OpenSearch to extract content and other information from files using the Apache text extraction library [Tika](https://tika.apache.org/).
+Supported document formats include PPT, PDF, RTF, ODF, and many more; for the complete list, see the Tika [supported document formats](https://tika.apache.org/2.9.2/formats.html) page.
+
+The input field must be a base64-encoded binary.
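+
+For example, a document sent through an attachment pipeline carries the encoded file in an ordinary field. The following minimal sketch uses the `data` field name, matching the pipeline example later on this page, and the base64 string for the `lorem.rtf` sample used throughout:
+
+```json
+{
+  "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
+}
+```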
+
+## Installing the plugin
+
+Install the `ingest-attachment` plugin using the following command:
+
+```sh
+./bin/opensearch-plugin install ingest-attachment
+```
+
+## Attachment processor options
+
+| Name | Required | Default | Description |
+| :--- | :--- | :--- | :--- |
+| `field` | Yes | N/A | The field from which to get the base64-encoded binary. |
+| `target_field` | No | `attachment` | The field that stores the attachment information. |
+| `properties` | No | All properties | An array of properties that should be stored. Can be `content`, `language`, `date`, `title`, `author`, `keywords`, `content_type`, or `content_length`. |
+| `indexed_chars` | No | `100_000` | The number of characters used for extraction to prevent fields from becoming too large. Use `-1` for no limit. |
+| `indexed_chars_field` | No | `null` | The name of a document field whose value overrides the `indexed_chars` setting for that document, for example, `max_chars`. |
+| `ignore_missing` | No | `false` | When `true`, the processor exits without modifying the document when the specified field doesn't exist. |
+
+## Example
+
+The following steps show you how to get started with the `ingest-attachment` plugin.
+
+### Step 1: Create an index for storing your attachments
+
+The following command creates an index for storing your attachments:
+
+```json
+PUT /example-attachment-index
+{
+  "mappings": {
+    "properties": {}
+  }
+}
+```
+
+### Step 2: Create a pipeline
+
+The following command creates a pipeline containing the attachment processor:
+
+```json
+PUT _ingest/pipeline/attachment
+{
+  "description" : "Extract attachment information",
+  "processors" : [
+    {
+      "attachment" : {
+        "field" : "data"
+      }
+    }
+  ]
+}
+```
+
+### Step 3: Store an attachment
+
+Convert the attachment to a base64 string to pass it as `data`.
+In this example, the `base64` command converts the file `lorem.rtf`:
+
+```sh
+base64 lorem.rtf
+```
+
+Alternatively, you can use Node.js to read the file as `base64`, as shown in the following code:
+
+```typescript
+import * as fs from "node:fs/promises";
+import path from "node:path";
+
+// Read the attachment file and encode its contents as a base64 string.
+const filePath = path.join(import.meta.dirname, "lorem.rtf");
+const base64File = await fs.readFile(filePath, { encoding: "base64" });
+
+console.log(base64File);
+```
+
+The `.rtf` file containing the text `Lorem ipsum dolor sit amet` encodes to the following base64 string:
+`e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=`.
+
+```json
+PUT example-attachment-index/_doc/lorem_rtf?pipeline=attachment
+{
+  "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
+}
+```
+
+### Query results
+
+With the attachment processed, you can now search through the data using search queries, as shown in the following example:
+
+```json
+POST example-attachment-index/_search
+{
+  "query": {
+    "match": {
+      "attachment.content": "ipsum"
+    }
+  }
+}
+```
+
+OpenSearch responds with the following:
+
+```json
+{
+  "took": 5,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 1.1724279,
+    "hits": [
+      {
+        "_index": "example-attachment-index",
+        "_id": "lorem_rtf",
+        "_score": 1.1724279,
+        "_source": {
+          "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=",
+          "attachment": {
+            "content_type": "application/rtf",
+            "language": "pt",
+            "content": "Lorem ipsum dolor sit amet",
+            "content_length": 28
+          }
+        }
+      }
+    ]
+  }
+}
+```
+
+## Extracted information
+
+The following fields can be extracted using the plugin:
+
+- `content`
+- `language`
+- `date`
+- `title`
+- `author`
+- `keywords`
+- `content_type`
+- `content_length`
+
+To extract only a subset of these fields, define them in the `properties` of the
+pipeline processor, as shown in the following example:
+
+```json
+PUT _ingest/pipeline/attachment
+{
+  "description" : "Extract attachment information",
+  "processors" : [
+    {
+      "attachment" : {
+        "field" : "data",
+        "properties": ["content", "title", "author"]
+      }
+    }
+  ]
+}
+```
+
+## Limit the extracted content
+
+To prevent extracting too many characters and overloading the node memory, the default extraction limit is `100_000` characters.
+You can change this value using the `indexed_chars` setting. For example, you can use `-1` for unlimited characters, but you need to make sure you have enough heap space on your OpenSearch node to extract the content of large documents.
+
+You can also define this limit per document using the `indexed_chars_field` request field.
+If a document contains the field specified by `indexed_chars_field`, its value overrides the `indexed_chars` setting, as shown in the following example:
+
+```json
+PUT _ingest/pipeline/attachment
+{
+  "description" : "Extract attachment information",
+  "processors" : [
+    {
+      "attachment" : {
+        "field" : "data",
+        "indexed_chars" : 10,
+        "indexed_chars_field" : "max_chars"
+      }
+    }
+  ]
+}
+```
+
+With the attachment pipeline configured, a document that doesn't specify `max_chars` is extracted using the pipeline's default limit of `10` characters, as shown in the following example:
+
+```json
+PUT example-attachment-index/_doc/lorem_rtf?pipeline=attachment
+{
+  "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
+}
+```
+
+Alternatively, you can set `max_chars` per document in order to extract up to `15` characters, as shown in the following example:
+
+```json
+PUT example-attachment-index/_doc/lorem_rtf?pipeline=attachment
+{
+  "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=",
+  "max_chars": 15
+}
+```
From 1edf5ae33618e401e59603bf5659c7d361c90e74 Mon Sep 17 00:00:00 2001
From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Date: Wed, 7 Aug 2024 16:54:23 -0400
Subject: [PATCH 108/154] Add 2.16 version (#7910)

Signed-off-by: Fanit Kolchina
---
 _config.yml | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/_config.yml b/_config.yml
index be015cec06..8a43e2f61a 100644
--- a/_config.yml
+++ b/_config.yml
@@ -5,10 +5,10 @@ baseurl: "/docs/latest" # the subpath of your site, e.g. /blog
 url: "https://opensearch.org" # the base hostname & protocol for your site, e.g. http://example.com
 permalink: /:path/
 
-opensearch_version: '2.15.0'
-opensearch_dashboards_version: '2.15.0'
-opensearch_major_minor_version: '2.15'
-lucene_version: '9_10_0'
+opensearch_version: '2.16.0'
+opensearch_dashboards_version: '2.16.0'
+opensearch_major_minor_version: '2.16'
+lucene_version: '9_11_1'
 
 # Build settings
 markdown: kramdown
From 9ddbb579ff2e2f452be0ef64fb31e36f509985b0 Mon Sep 17 00:00:00 2001
From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Date: Wed, 7 Aug 2024 16:54:34 -0400
Subject: [PATCH 109/154] Add 2.16 to version history (#7911)

Signed-off-by: Fanit Kolchina
---
 _about/version-history.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/_about/version-history.md b/_about/version-history.md
index 09f331b235..d7273ffedb 100644
--- a/_about/version-history.md
+++ b/_about/version-history.md
@@ -9,6 +9,7 @@ permalink: /version-history/
 
 OpenSearch version | Release highlights | Release date
 :--- | :--- | :---
+[2.16.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.16.0.md) | Includes built-in byte vector quantization and binary vector support in k-NN. Adds new sort, split, and ML inference search processors for search pipelines. Provides application-based configuration templates and additional plugins to integrate multiple data sources in OpenSearch Dashboards. Includes an experimental Batch Predict ML Commons API. For a full list of release highlights, see the Release Notes. | 06 August 2024
 [2.15.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.15.0.md) | Includes parallel ingestion processing, SIMD support for exact search, and the ability to disable doc values for the k-NN field. Adds wildcard and derived field types.
Improves performance for single-cardinality aggregations, rolling upgrades to remote-backed clusters, and more metrics for top N queries. For a full list of release highlights, see the Release Notes. | 25 June 2024 [2.14.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.14.0.md) | Includes performance improvements to hybrid search and date histogram queries with multi-range traversal, ML model integration within the Ingest API, semantic cache for LangChain applications, low-level vector query interface for neural sparse queries, and improved k-NN search filtering. Provides an experimental tiered cache feature. For a full list of release highlights, see the Release Notes. | 14 May 2024 [2.13.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.13.0.md) | Makes agents and tools and the OpenSearch Assistant Toolkit generally available. Introduces vector quantization within OpenSearch. Adds LLM guardrails and hybrid search with aggregations. Adds the Bloom filter skipping index for Apache Spark data sources, I/O-based admission control, and the ability to add an alerting cluster that manages all alerting tasks. For a full list of release highlights, see the Release Notes. | 2 April 2024 From 520cf14c92fefb905eb028f316aad98c08d5428c Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Wed, 7 Aug 2024 17:00:31 -0400 Subject: [PATCH 110/154] Add 2.16 release notes (#7914) * Add 2.16 release notes Signed-off-by: Fanit Kolchina * Apply suggestions from code review Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Update release-notes/opensearch-documentation-release-notes-2.16.0.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- ...arch-documentation-release-notes-2.16.0.md | 35 +++++++++++++++++++ 1 file changed, 35 insertions(+) create mode 100644 release-notes/opensearch-documentation-release-notes-2.16.0.md diff --git a/release-notes/opensearch-documentation-release-notes-2.16.0.md b/release-notes/opensearch-documentation-release-notes-2.16.0.md new file mode 100644 index 0000000000..8c0092d7bb --- /dev/null +++ b/release-notes/opensearch-documentation-release-notes-2.16.0.md @@ -0,0 +1,35 @@ +# OpenSearch Documentation Website 2.16.0 Release Notes + +The OpenSearch 2.16.0 documentation includes the following additions and updates. + +## New documentation for 2.16.0 + +- Alerting plugin - changed default setting to true for v2.16. 
[#7868](https://github.com/opensearch-project/documentation-website/pull/7868) +- Add experimental feature flag to dashboard assistant [#7855](https://github.com/opensearch-project/documentation-website/pull/7855) +- Add documentation for ml inference search request processor/ search response processor [#7852](https://github.com/opensearch-project/documentation-website/pull/7852) +- Update documentation for create / update repository api [#7851](https://github.com/opensearch-project/documentation-website/pull/7851) +- Add documentation for query insights - query metrics feature [#7846](https://github.com/opensearch-project/documentation-website/pull/7846) +- Add doc for binary format support in k-NN [#7840](https://github.com/opensearch-project/documentation-website/pull/7840) +- Add predefined model interface doc [#7830](https://github.com/opensearch-project/documentation-website/pull/7830) +- Add compatibility page [#7821](https://github.com/opensearch-project/documentation-website/pull/7821) +- Sorting and Search After in Hybrid Search [#7820](https://github.com/opensearch-project/documentation-website/pull/7820) +- Add disk free space cluster settings [#7799](https://github.com/opensearch-project/documentation-website/pull/7799) +- [Doc] Lucene inbuilt scalar quantization [#7797](https://github.com/opensearch-project/documentation-website/pull/7797) +- Documentation Updates for plugins.query.datasources.enabled SQL Setting [#7794](https://github.com/opensearch-project/documentation-website/pull/7794) +- Document new Split and Sort SearchResponseProcessors [#7767](https://github.com/opensearch-project/documentation-website/pull/7767) +- Adds Documentation for dynamic query parameters for kNN search request [#7761](https://github.com/opensearch-project/documentation-website/pull/7761) +- Add strict_allow_templates option for the dynamic mapping parameter [#7745](https://github.com/opensearch-project/documentation-website/pull/7745) +- Document CreateAnomalyDetectorTool [#7742](https://github.com/opensearch-project/documentation-website/pull/7742) +- Document the nested_path parameter in agent search tools [#7741](https://github.com/opensearch-project/documentation-website/pull/7741) +- Move bulk API's batch_size parameter to processors [#7719](https://github.com/opensearch-project/documentation-website/pull/7719) +- Add documentation for configuring the password hashing algorithm and its properties [#7697](https://github.com/opensearch-project/documentation-website/pull/7697) +- Add documentation for Deprovision Workflow API allow_delete parameter [#7639](https://github.com/opensearch-project/documentation-website/pull/7639) +- Add new update_fields parameter to update workflow API [#7632](https://github.com/opensearch-project/documentation-website/pull/7632) +- Add fingerprint processor [#7631](https://github.com/opensearch-project/documentation-website/pull/7631) +- Document new ingest and search pipeline allowlist settings [#7414](https://github.com/opensearch-project/documentation-website/pull/7414) +- Update docs for new clause count setting [#7391](https://github.com/opensearch-project/documentation-website/pull/7391) +- Add Threat Intelligence Section [#7905](https://github.com/opensearch-project/documentation-website/pull/7905) + +## Documentation for 2.16.0 experimental features + +- Ml commons batch inference [#7899](https://github.com/opensearch-project/documentation-website/pull/7899) From 912d2af0a2ed04abd861be8b6fb9becf800a8b73 Mon Sep 17 00:00:00 2001 From: kolchfa-aws 
<105444904+kolchfa-aws@users.noreply.github.com> Date: Wed, 7 Aug 2024 17:51:47 -0400 Subject: [PATCH 111/154] Add 2.16 to version selector on main (#7941) Signed-off-by: Fanit Kolchina --- _data/versions.json | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/_data/versions.json b/_data/versions.json index 0c99ed871e..4f7e55c21b 100644 --- a/_data/versions.json +++ b/_data/versions.json @@ -1,10 +1,11 @@ { - "current": "2.15", + "current": "2.16", "all": [ - "2.15", + "2.16", "1.3" ], "archived": [ + "2.15", "2.14", "2.13", "2.12", @@ -24,7 +25,7 @@ "1.1", "1.0" ], - "latest": "2.15" + "latest": "2.16" } From f959fcb8245012f5b5ade92630e36064d2e31563 Mon Sep 17 00:00:00 2001 From: Pawel Wlodarczyk Date: Thu, 8 Aug 2024 15:50:35 +0100 Subject: [PATCH 112/154] Update rolling-upgrade.md (#7943) The current procedure doesn't follow the correct order. The shard allocation must be re-enabled after each node restart and the cluster status must be checked before upgrading the next node. Signed-off-by: Pawel Wlodarczyk --- .../upgrade-opensearch/rolling-upgrade.md | 24 +++++++++---------- 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/_install-and-configure/upgrade-opensearch/rolling-upgrade.md b/_install-and-configure/upgrade-opensearch/rolling-upgrade.md index 9052cd4c93..f6b0470b66 100644 --- a/_install-and-configure/upgrade-opensearch/rolling-upgrade.md +++ b/_install-and-configure/upgrade-opensearch/rolling-upgrade.md @@ -131,18 +131,6 @@ Review [Upgrading OpenSearch]({{site.url}}{{site.baseurl}}/upgrade-opensearch/in ```bash os-node-01 v1.3.7 ``` -1. Repeat steps 5 through 9 for each node in your cluster. Remember to upgrade an eligible cluster manager node last. After replacing the last node, query the `_cat/nodes` endpoint to confirm that all nodes have joined the cluster. The cluster is now bootstrapped to the new version of OpenSearch. You can verify the cluster version by querying the `_cat/nodes` API endpoint: - ```bash - GET "/_cat/nodes?v&h=name,version,node.role,master" | column -t - ``` - The response should look similar to the following example: - ```bash - name version node.role master - os-node-04 1.3.7 dimr - - os-node-02 1.3.7 dimr * - os-node-01 1.3.7 dimr - - os-node-03 1.3.7 dimr - - ``` 1. Reenable shard replication: ```json PUT "/_cluster/settings?pretty" @@ -193,6 +181,18 @@ Review [Upgrading OpenSearch]({{site.url}}{{site.baseurl}}/upgrade-opensearch/in "active_shards_percent_as_number" : 100.0 } ``` +1. Repeat steps 5 through 11 for each node in your cluster. Remember to upgrade an eligible cluster manager node last. After replacing the last node, query the `_cat/nodes` endpoint to confirm that all nodes have joined the cluster. The cluster is now bootstrapped to the new version of OpenSearch. You can verify the cluster version by querying the `_cat/nodes` API endpoint: + ```bash + GET "/_cat/nodes?v&h=name,version,node.role,master" | column -t + ``` + The response should look similar to the following example: + ```bash + name version node.role master + os-node-04 1.3.7 dimr - + os-node-02 1.3.7 dimr * + os-node-01 1.3.7 dimr - + os-node-03 1.3.7 dimr - + ``` 1. The upgrade is now complete, and you can begin enjoying the latest features and fixes! ### Related articles From 4f1c22014d29318c15c51a6daf8220687870e8ef Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Thu, 8 Aug 2024 09:54:34 -0500 Subject: [PATCH 113/154] Add Blocks API and add additional Get document descriptions. 
(#7836) * Add additional Get document information Signed-off-by: Archer * Add Block Index APi Signed-off-by: Archer * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Remove redundant information Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Heather Halter Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Heather Halter Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: Archer Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Heather Halter Co-authored-by: Nathan Bower --- _api-reference/document-apis/get-documents.md | 98 ++++++++++++++++--- _api-reference/index-apis/blocks.md | 59 +++++++++++ 2 files changed, 146 insertions(+), 11 deletions(-) create mode 100644 _api-reference/index-apis/blocks.md diff --git a/_api-reference/document-apis/get-documents.md b/_api-reference/document-apis/get-documents.md index d493df136b..232e9083c7 100644 --- a/_api-reference/document-apis/get-documents.md +++ b/_api-reference/document-apis/get-documents.md @@ -11,29 +11,28 @@ redirect_from: **Introduced 1.0** {: .label .label-purple } -After adding a JSON document to your index, you can use the get document API operation to retrieve the document's information and data. +After adding a JSON document to your index, you can use the Get Document API operation to retrieve the document's information and data. -## Example - -```json -GET sample-index1/_doc/1 -``` -{% include copy-curl.html %} ## Path and HTTP methods -``` +Use the GET method to retrieve a document and its source or stored fields from a particular index. Use the HEAD method to verify that a document exists: + +```json GET /_doc/<_id> HEAD /_doc/<_id> ``` -``` + +Use `_source` to retrieve the document source or to verify that it exists: + +```json GET /_source/<_id> HEAD /_source/<_id> ``` -## URL parameters +## Query parameters -All get document URL parameters are optional. +All query parameters are optional. Parameter | Type | Description :--- | :--- | :--- @@ -48,6 +47,83 @@ _source_includes | String | A comma-separated list of source fields to include i version | Integer | The version of the document to return, which must match the current version of the document. version_type | Enum | Retrieves a specifically typed document. Available options are `external` (retrieve the document if the specified version number is greater than the document's current version) and `external_gte` (retrieve the document if the specified version number is greater than or equal to the document's current version). For example, to retrieve version 3 of a document, use `/_doc/1?version=3&version_type=external`. +### Real time + +The OpenSearch Get Document API operates in real time by default, which means that it retrieves the latest version of the document regardless of the index's refresh rate or the rate at which new data becomes searchable. 
However, if you request stored fields (using the `stored_fields` parameter) for a document that has been updated but not yet refreshed, then the Get Document API parses and analyzes the document's source to extract those stored fields. + +To disable the real-time behavior and retrieve the document based on the last refreshed state of the index, set the `realtime` parameter to `false`. + +### Source filtering + +By default, the Get Document API returns the entire contents of the `_source` field for the requested document. However, you can choose to exclude the `_source` field from the response by using the `_source` URL parameter and setting it to `false`, as shown in the following example: + +```json +GET test-index/_doc/0?_source=false +``` + +#### `source` includes and excludes + +If you only want to retrieve specific fields from the source, use the `_source_includes` or `_source_excludes` parameters to include or exclude particular fields, respectively. This can be beneficial for large documents because retrieving only the required fields can reduce network overhead. + +Both parameters accept a comma-separated list of fields and wildcard expressions, as shown in the following example, where any `_source` that contains `*.play` is included in the response but sources with the field `entities` are excluded: + +```json +GET test-index/_doc/0?_source_includes=*.play&_source_excludes=entities +``` + +#### Shorter notation + +If you only want to include certain fields and don't need to exclude any, you can use a shorter notation by specifying the desired fields directly in the `_source` parameter: + +```json +GET test-index/_doc/0?_source=*.id +``` + +### Routing + +When indexing documents in OpenSearch, you can specify a `routing` value to control the shard assignments for documents. If routing was used during indexing, you must provide the same routing value when retrieving the document using the Get Document API, as shown in the following example: + +```json +GET test-index/_doc/1?routing=user1 +``` + +This request retrieves the document with the ID `1`, but it uses the routing value "user1" to determine on which shard the document is stored. If the correct routing value is not specified, the Get Document API is not able to locate and fetch the requested document. + +### Preference + +The Get Document API allows you to control which shard replica handles the request. By default, the operation is randomly distributed across the available shard replicas. + +However, you can specify a preference to influence the replica selection. The preference can be set to one of the following values: + +- `_local`: The operation attempts to execute on a locally allocated shard replica, if possible. This can improve performance by reducing network overhead. +- Custom (string) value: Specifying a custom string value ensures that requests with the same value are routed to the same set of shards. This consistency can be beneficial when managing shards in different refresh states because it prevents "jumping values" that may occur when hitting shards with varying data visibility. A common practice is to use a web session ID or a user name as the custom value. + + +### Refresh + +Set the `refresh` parameter to `true` to force a refresh of the relevant shard before running the Get Document API operation. This ensures that the most recent data changes are made searchable and visible to the API. 
However, a refresh should be performed judiciously because it can potentially impose a heavy load on the system and slow down indexing performance. Carefully evaluate the trade-off between data freshness and system load before enabling the `refresh` parameter.
+
+### Distributed
+
+When running the Get Document API, OpenSearch first calculates a hash value based on the document ID, which determines the ID of the shard on which the document resides. The operation is then redirected to one of the replicas (including the primary shard and its replica shards) in that shard ID group, and the result is returned from that replica.
+
+A higher number of shard replicas improves the scalability and performance of GET operations because the load can be distributed across multiple replica shards. This means that as the number of replicas increases, you can achieve better scaling and throughput for Get Document API requests.
+
+### Versioning support
+
+Use the `version` parameter to retrieve a document only if its current version matches the specified version number. This can be useful for ensuring data consistency and preventing conflicts when working with versioned documents.
+
+Internally, when a document is updated in OpenSearch, the original version is marked as deleted, and a new version of the document is added. However, the original version doesn't immediately disappear from the system. While you won't be able to access it through the Get Document API, OpenSearch manages the cleanup of deleted document versions in the background as you continue indexing new data.
+
+## Example request
+
+The following example request retrieves information about a document with the ID `1`:
+
+```json
+GET sample-index1/_doc/1
+```
+{% include copy-curl.html %}
+
 ## Example response
 
 ```json
diff --git a/_api-reference/index-apis/blocks.md b/_api-reference/index-apis/blocks.md
new file mode 100644
index 0000000000..61b0e1ddd6
--- /dev/null
+++ b/_api-reference/index-apis/blocks.md
@@ -0,0 +1,59 @@
+---
+layout: default
+title: Blocks
+parent: Index APIs
+nav_order: 6
+---
+
+# Blocks
+**Introduced 1.0**
+{: .label .label-purple }
+
+Use the Blocks API to limit certain operations on a specified index. Different types of blocks allow you to restrict index write, read, or metadata operations.
+For example, adding a `write` block through the API ensures that all index shards have properly accounted for the block before returning a successful response. Any in-flight write operations to the index must be complete before the `write` block takes effect.
+
+## Path and HTTP methods
+
+```json
+PUT /<index>/_block/<block>
+```
+
+## Path parameters
+
+| Parameter | Data type | Description |
+:--- | :--- | :---
+| `index` | String | A comma-delimited list of index names. Wildcard expressions (`*`) are supported. To target all data streams and indexes in a cluster, use `_all` or `*`. Optional. |
+| `<block>` | String | Specifies the type of block to apply to the index. Valid values are:<br>
`metadata`: Disables all metadata changes, such as closing the index.<br>
`read`: Disables any read operations.<br>
`read_only`: Disables any write operations and metadata changes.<br>
`write`: Disables write operations. However, metadata changes are still allowed. |
+
+## Query parameters
+
+The following table lists the available query parameters. All query parameters are optional.
+
+| Parameter | Data type | Description |
+| :--- | :--- | :--- |
+| `ignore_unavailable` | Boolean | When `false`, the request returns an error when it targets a missing or closed index. Default is `false`. |
+| `allow_no_indices` | Boolean | When `false`, the request returns an error when a wildcard expression, index alias, or `_all` targets only closed or missing indexes, even when the request is made against open indexes. Default is `true`. |
+| `expand_wildcards` | String | The type of index that the wildcard patterns can match. If the request targets data streams, this argument determines whether the wildcard expressions match any hidden data streams. Supports comma-separated values, such as `open,hidden`. Valid values are `all`, `open`, `closed`, `hidden`, and `none`. |
+| `cluster_manager_timeout` | Time | The amount of time to wait for a connection to the cluster manager node. Default is `30s`. |
+| `timeout` | Time | The amount of time to wait for the request to return. Default is `30s`. |
+
+## Example request
+
+The following example request disables any `write` operations made to the test index:
+
+```json
+PUT /test-index/_block/write
+```
+
+## Example response
+
+```json
+{
+  "acknowledged" : true,
+  "shards_acknowledged" : true,
+  "indices" : [ {
+    "name" : "test-index",
+    "blocked" : true
+  } ]
+}
+```
\ No newline at end of file
From cac23d00f69302830346d8bede0f718bc809db09 Mon Sep 17 00:00:00 2001
From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Date: Thu, 8 Aug 2024 10:36:37 -0500
Subject: [PATCH 114/154] Remove missing parameter (#7944)

Removes a parameter that does not exist in the Segments API spec.

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
---
 _api-reference/index-apis/segment.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/_api-reference/index-apis/segment.md b/_api-reference/index-apis/segment.md
index a8a7ccaee1..0ecee63e77 100644
--- a/_api-reference/index-apis/segment.md
+++ b/_api-reference/index-apis/segment.md
@@ -34,7 +34,6 @@ The Segment API supports the following optional query parameters.
 Parameter | Data type | Description
 :--- | :--- | :---
 `allow_no_indices` | Boolean | Whether to ignore wildcards that don't match any indexes. Default is `true`.
-`allow_partial_search_results` | Boolean | Whether to return partial results if the request encounters an error or times out. Default is `true`.
 `expand_wildcards` | String | Specifies the type of index that wildcard expressions can match. Supports comma-separated values. Valid values are `all` (match any index), `open` (match open, non-hidden indexes), `closed` (match closed, non-hidden indexes), `hidden` (match hidden indexes), and `none` (deny wildcard expressions). Default is `open`.
 `ignore_unavailable` | Boolean | When `true`, OpenSearch ignores missing or closed indexes. If `false`, OpenSearch returns an error if the force merge operation encounters missing or closed indexes. Default is `false`.
 `verbose` | Boolean | When `true`, provides information about Lucene's memory usage. Default is `false`.
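For reference, a request that uses these Segments API parameters against a single index might look like the following; the index name `test-index` is illustrative:

```json
GET /test-index/_segments?verbose=true
```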
From 256ec4e105f5243d2d5df1647bdeb1f0cec76854 Mon Sep 17 00:00:00 2001 From: Craig Perkins Date: Mon, 12 Aug 2024 12:51:29 -0400 Subject: [PATCH 115/154] Add documentation for ignore_hosts config option for ip-based rate limiting (#7859) * Add documentation for ignore_hosts config option for ip-based rate limiting Signed-off-by: Craig Perkins * Update _security/configuration/api-rate-limiting.md Signed-off-by: Melissa Vagi --------- Signed-off-by: Craig Perkins Signed-off-by: Melissa Vagi Co-authored-by: Melissa Vagi --- _security/configuration/api-rate-limiting.md | 31 ++++++++++---------- 1 file changed, 16 insertions(+), 15 deletions(-) diff --git a/_security/configuration/api-rate-limiting.md b/_security/configuration/api-rate-limiting.md index d5dc230731..a5481bfee1 100644 --- a/_security/configuration/api-rate-limiting.md +++ b/_security/configuration/api-rate-limiting.md @@ -19,14 +19,14 @@ The username rate limiting configuration limits login attempts by username. When ```yml auth_failure_listeners: - internal_authentication_backend_limiting: - type: username - authentication_backend: internal - allowed_tries: 3 - time_window_seconds: 60 - block_expiry_seconds: 60 - max_blocked_clients: 100000 - max_tracked_clients: 100000 + internal_authentication_backend_limiting: + type: username + authentication_backend: internal + allowed_tries: 3 + time_window_seconds: 60 + block_expiry_seconds: 60 + max_blocked_clients: 100000 + max_tracked_clients: 100000 ``` {% include copy.html %} @@ -61,13 +61,13 @@ Second, configure the IP address rate limiting settings. The following example s ```yml auth_failure_listeners: - ip_rate_limiting: - type: ip - allowed_tries: 1 - time_window_seconds: 20 - block_expiry_seconds: 180 - max_blocked_clients: 100000 - max_tracked_clients: 100000 + ip_rate_limiting: + type: ip + allowed_tries: 1 + time_window_seconds: 20 + block_expiry_seconds: 180 + max_blocked_clients: 100000 + max_tracked_clients: 100000 ``` {% include copy.html %} @@ -81,4 +81,5 @@ The following table describes the individual settings for this type of configura | `block_expiry_seconds` | The window of time during which login attempts remain blocked after a failed login. After this time elapses, login is reset and the IP address can attempt to log in again. | | `max_blocked_clients` | The maximum number of blocked IP addresses. This limits heap usage to avoid a potential DoS attack. | | `max_tracked_clients` | The maximum number of tracked IP addresses with failed login attempts. This limits heap usage to avoid a potential DoS attack. | +| `ignore_hosts` | A list of IP addresses or hostname patterns to ignore for rate limiting. `config.dynamic.hosts_resolver_mode` must be set to `ip-hostname` to support hostname matching. 
| From 43ffd8f769f34c99cf4dc08c20bdf879ff9824b7 Mon Sep 17 00:00:00 2001 From: Andriy Redko Date: Mon, 12 Aug 2024 12:57:34 -0400 Subject: [PATCH 116/154] OpenJDK Update (July 2024 Patch releases) (#7861) Signed-off-by: Andriy Redko --- _install-and-configure/install-opensearch/index.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_install-and-configure/install-opensearch/index.md b/_install-and-configure/install-opensearch/index.md index e1d63927b0..d0c6e242cd 100644 --- a/_install-and-configure/install-opensearch/index.md +++ b/_install-and-configure/install-opensearch/index.md @@ -29,9 +29,9 @@ The OpenSearch distribution for Linux ships with a compatible [Adoptium JDK](htt OpenSearch Version | Compatible Java Versions | Bundled Java Version :---------- | :-------- | :----------- 1.0--1.2.x | 11, 15 | 15.0.1+9 -1.3.x | 8, 11, 14 | 11.0.23+9 +1.3.x | 8, 11, 14 | 11.0.24+8 2.0.0--2.11.x | 11, 17 | 17.0.2+8 -2.12.0 | 11, 17, 21 | 21.0.3+9 +2.12.0+ | 11, 17, 21 | 21.0.4+7 To use a different Java installation, set the `OPENSEARCH_JAVA_HOME` or `JAVA_HOME` environment variable to the Java install location. For example: ```bash From 1e6d6facca461c5f75a5b215218e53c78059c430 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Tue, 13 Aug 2024 08:53:04 -0500 Subject: [PATCH 117/154] Fix links in Operations reference (#7975) * Fix links in Operations reference Signed-off-by: Archer * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _benchmark/reference/workloads/operations.md Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: Archer Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Nathan Bower --- _benchmark/reference/workloads/operations.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/_benchmark/reference/workloads/operations.md b/_benchmark/reference/workloads/operations.md index ed6e6b8527..4e89b4ac42 100644 --- a/_benchmark/reference/workloads/operations.md +++ b/_benchmark/reference/workloads/operations.md @@ -10,13 +10,13 @@ nav_order: 100 # operations -The `operations` element contains a list of all available operations for specifying a schedule. +The `operations` element contains a list of all available operations for specifying a schedule. ## bulk -The `bulk` operation type allows you to run [bulk](/api-reference/document-apis/bulk/) requests as a task. +The `bulk` operation type allows you to run [bulk]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/) requests as a task. ### Usage @@ -82,7 +82,7 @@ If `detailed-results` is `true`, the following metadata is returned: ## create-index -The `create-index` operation runs the [Create Index API](/api-reference/index-apis/create-index/). It supports the following two modes of index creation: +The `create-index` operation runs the [Create Index API]({{site.url}}{{site.baseurl}}/api-reference/index-apis/create-index/). 
It supports the following two index creation modes: - Creating all indexes specified in the workloads `indices` section - Creating one specific index defined within the operation itself @@ -157,7 +157,7 @@ The `create-index` operation returns the following metadata: ## delete-index -The `delete-index` operation runs the [Delete Index API](api-reference/index-apis/delete-index/). Like with the [`create-index`](#create-index) operation, you can delete all indexes found in the `indices` section of the workload or delete one or more indexes based on the string passed in the `index` setting. +The `delete-index` operation runs the [Delete Index API]({{site.url}}{{site.baseurl}}/api-reference/index-apis/delete-index/). As with the [`create-index`](#create-index) operation, you can delete all indexes found in the `indices` section of the workload or delete one or more indexes based on the string passed in the `index` setting. ### Usage @@ -215,7 +215,7 @@ The `delete-index` operation returns the following metadata: ## cluster-health -The `cluster-health` operation runs the [Cluster Health API](api-reference/cluster-api/cluster-health/), which checks the cluster health status and returns the expected status according to the parameters set for `request-params`. If an unexpected cluster health status is returned, the operation reports a failure. You can use the `--on-error` option in the OpenSearch Benchmark `execute-test` command to control how OpenSearch Benchmark behaves when the health check fails. +The `cluster-health` operation runs the [Cluster Health API]({{site.url}}{{site.baseurl}}/api-reference/cluster-api/cluster-health/), which checks the cluster health status and returns the expected status according to the parameters set for `request-params`. If an unexpected cluster health status is returned, then the operation reports a failure. You can use the `--on-error` option in the OpenSearch Benchmark `execute-test` command to control how OpenSearch Benchmark behaves when the health check fails. ### Usage @@ -285,7 +285,7 @@ Parameter | Required | Type | Description ## search -The `search` operation runs the [Search API](/api-reference/search/), which you can use to run queries in OpenSearch Benchmark indexes. +The `search` operation runs the [Search API]({{site.url}}{{site.baseurl}}/api-reference/search/), which you can use to run queries in OpenSearch Benchmark indexes. 
### Usage From 22daf21c2d98ca7da63db18957270a094c1132b0 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Tue, 13 Aug 2024 14:37:23 -0500 Subject: [PATCH 118/154] Add IP option to SAN certificate (#7972) * Add IP option to SAN certificate Signed-off-by: Archer * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _security/configuration/generate-certificates.md Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _security/configuration/generate-certificates.md Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: Archer Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Nathan Bower --- _security/configuration/generate-certificates.md | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/_security/configuration/generate-certificates.md b/_security/configuration/generate-certificates.md index 4e83ff83d1..2316fd33be 100755 --- a/_security/configuration/generate-certificates.md +++ b/_security/configuration/generate-certificates.md @@ -115,13 +115,21 @@ openssl req -new -key node1-key.pem -out node1.csr For all host and client certificates, you should specify a subject alternative name (SAN) to ensure compliance with [RFC 2818 (HTTP Over TLS)](https://datatracker.ietf.org/doc/html/rfc2818). The SAN should match the corresponding CN so that both refer to the same DNS A record. {: .note } -Before generating a signed certificate, create a SAN extension file which describes the DNS A record for the host: +Before generating a signed certificate, create a SAN extension file that describes the DNS A record for the host. 
If you're connecting to a host that only has an IP address, either IPv4 or IPv6, use the `IP` syntax: + +**No IP** ```bash echo 'subjectAltName=DNS:node1.dns.a-record' > node1.ext ``` -Generate the certificate: +**With IP** + +```bash +echo subjectAltName=IP:127.0.0.1 > node1.ext +``` + +With the DNS A record described, generate the certificate: ```bash openssl x509 -req -in node1.csr -CA root-ca.pem -CAkey root-ca-key.pem -CAcreateserial -sha256 -out node1.pem -days 730 -extfile node1.ext From ecd2232ac2b6c97dad9f5be5ab4e528a629c5a71 Mon Sep 17 00:00:00 2001 From: zhichao-aws Date: Wed, 14 Aug 2024 04:54:00 +0800 Subject: [PATCH 119/154] Refactor of the neural sparse search tutorial (#7922) * refactor Signed-off-by: zhichao-aws * fix Signed-off-by: zhichao-aws * Doc review Signed-off-by: Fanit Kolchina * Link fix Signed-off-by: Fanit Kolchina * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: zhichao-aws Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Fanit Kolchina Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower --- .../processors/sparse-encoding.md | 2 +- _ml-commons-plugin/pretrained-models.md | 8 +- _search-plugins/neural-sparse-search.md | 421 +-------------- .../neural-sparse-with-pipelines.md | 486 ++++++++++++++++++ .../neural-sparse-with-raw-vectors.md | 99 ++++ 5 files changed, 607 insertions(+), 409 deletions(-) create mode 100644 _search-plugins/neural-sparse-with-pipelines.md create mode 100644 _search-plugins/neural-sparse-with-raw-vectors.md diff --git a/_ingest-pipelines/processors/sparse-encoding.md b/_ingest-pipelines/processors/sparse-encoding.md index 38b44320b1..3af6f4e987 100644 --- a/_ingest-pipelines/processors/sparse-encoding.md +++ b/_ingest-pipelines/processors/sparse-encoding.md @@ -141,7 +141,7 @@ The response confirms that in addition to the `passage_text` field, the processo } ``` -Once you have created an ingest pipeline, you need to create an index for ingestion and ingest documents into the index. To learn more, see [Step 2: Create an index for ingestion]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/#step-2-create-an-index-for-ingestion) and [Step 3: Ingest documents into the index]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/#step-3-ingest-documents-into-the-index) of [Neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/). +Once you have created an ingest pipeline, you need to create an index for ingestion and ingest documents into the index. To learn more, see [Create an index for ingestion]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-with-pipelines/#step-2b-create-an-index-for-ingestion) and [Step 3: Ingest documents into the index]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-with-pipelines/#step-2c-ingest-documents-into-the-index) of [Neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/). 
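+As a minimal sketch of such an index (the index name is illustrative, and the only strictly required piece is mapping the embedding field as `rank_features`), the request might look like the following:
+
+```json
+PUT /my-index
+{
+  "mappings": {
+    "properties": {
+      "passage_embedding": {
+        "type": "rank_features"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}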
---

diff --git a/_ml-commons-plugin/pretrained-models.md b/_ml-commons-plugin/pretrained-models.md
index 30540cfe49..154b8b530f 100644
--- a/_ml-commons-plugin/pretrained-models.md
+++ b/_ml-commons-plugin/pretrained-models.md
@@ -46,11 +46,13 @@ The following table provides a list of sentence transformer models and artifact
 
 Sparse encoding models transfer text into a sparse vector and convert the vector to a list of `<token: weight>` pairs representing the text entry and its corresponding weight in the sparse vector. You can use these models for use cases such as clustering or sparse neural search.
 
-We recommend the following models for optimal performance:
+We recommend the following combinations for optimal performance:
 
 - Use the `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1` model during both ingestion and search.
 - Use the `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1` model during ingestion and the
-`amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` model during search.
+`amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` tokenizer during search.
+
+For more information about the preceding options for running neural sparse search, see [Generating sparse vector embeddings within OpenSearch]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-with-pipelines/).
 
 The following table provides a list of sparse encoding models and artifact links you can use to download them.
 
 | Model name | Version | Auto-truncation | TorchScript artifact | Description |
 |:---|:---|:---|:---|:---|
 | `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1` | 1.0.1 | Yes | - [model_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-v1/1.0.1/torch_script/neural-sparse_opensearch-neural-sparse-encoding-v1-1.0.1-torch_script.zip)<br>
- [config_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-v1/1.0.1/torch_script/config.json) | A neural sparse encoding model. The model transforms text into a sparse vector, identifies the indexes of non-zero elements in the vector, and then converts the vector into `<entry: weight>` pairs, where each entry corresponds to a non-zero element index. To experiment with this model using transformers and the PyTorch API, see the [HuggingFace documentation](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-v1). |
 | `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1` | 1.0.1 | Yes | - [model_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1/1.0.1/torch_script/neural-sparse_opensearch-neural-sparse-encoding-doc-v1-1.0.1-torch_script.zip)<br>
- [config_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1/1.0.1/torch_script/config.json) | A neural sparse encoding model. The model transforms text into a sparse vector, identifies the indexes of non-zero elements in the vector, and then converts the vector into `<entry: weight>` pairs, where each entry corresponds to a non-zero element index. To experiment with this model using transformers and the PyTorch API, see the [HuggingFace documentation](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-doc-v1). |
-| `amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` | 1.0.1 | Yes | - [model_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1/1.0.1/torch_script/neural-sparse_opensearch-neural-sparse-tokenizer-v1-1.0.1-torch_script.zip)<br>
- [config_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1/1.0.1/torch_script/config.json) | A neural sparse tokenizer model. The model tokenizes text into tokens and assigns each token a predefined weight, which is the token's inverse document frequency (IDF). If the IDF file is not provided, the weight defaults to 1. For more information, see [Preparing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/custom-local-models/#preparing-a-model). | +| `amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` | 1.0.1 | Yes | - [model_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1/1.0.1/torch_script/neural-sparse_opensearch-neural-sparse-tokenizer-v1-1.0.1-torch_script.zip)
- [config_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1/1.0.1/torch_script/config.json) | A neural sparse tokenizer. The tokenizer splits text into tokens and assigns each token a predefined weight, which is the token's inverse document frequency (IDF). If the IDF file is not provided, the weight defaults to 1. For more information, see [Preparing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/custom-local-models/#preparing-a-model). | ### Cross-encoder models **Introduced 2.12** diff --git a/_search-plugins/neural-sparse-search.md b/_search-plugins/neural-sparse-search.md index 8aa2ff7dbf..0beee26ef0 100644 --- a/_search-plugins/neural-sparse-search.md +++ b/_search-plugins/neural-sparse-search.md @@ -2,7 +2,7 @@ layout: default title: Neural sparse search nav_order: 50 -has_children: false +has_children: true redirect_from: - /search-plugins/neural-sparse-search/ - /search-plugins/sparse-search/ @@ -14,261 +14,20 @@ Introduced 2.11 [Semantic search]({{site.url}}{{site.baseurl}}/search-plugins/semantic-search/) relies on dense retrieval that is based on text embedding models. However, dense methods use k-NN search, which consumes a large amount of memory and CPU resources. An alternative to semantic search, neural sparse search is implemented using an inverted index and is thus as efficient as BM25. Neural sparse search is facilitated by sparse embedding models. When you perform a neural sparse search, it creates a sparse vector (a list of `token: weight` key-value pairs representing an entry and its weight) and ingests data into a rank features index. -When selecting a model, choose one of the following options: +To further boost search relevance, you can combine neural sparse search with dense [semantic search]({{site.url}}{{site.baseurl}}/search-plugins/semantic-search/) using a [hybrid query]({{site.url}}{{site.baseurl}}/query-dsl/compound/hybrid/). -- Use a sparse encoding model at both ingestion time and search time for better search relevance at the expense of relatively high latency. -- Use a sparse encoding model at ingestion time and a tokenizer at search time for lower search latency at the expense of relatively lower search relevance. Tokenization doesn't involve model inference, so you can deploy and invoke a tokenizer using the ML Commons Model API for a more streamlined experience. +You can configure neural sparse search in the following ways: -**PREREQUISITE**
-Before using neural sparse search, make sure to set up a [pretrained sparse embedding model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/#sparse-encoding-models) or your own sparse embedding model. For more information, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model). -{: .note} +- Generate vector embeddings within OpenSearch: Configure an ingest pipeline to generate and store sparse vector embeddings from document text at ingestion time. At query time, input plain text, which will be automatically converted into vector embeddings for search. For complete setup steps, see [Configuring ingest pipelines for neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-with-pipelines/). +- Ingest raw sparse vectors and search using sparse vectors directly. For complete setup steps, see [Ingesting and searching raw vectors]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-with-raw-vectors/). -## Using neural sparse search +To learn more about splitting long text into passages for neural search, see [Text chunking]({{site.url}}{{site.baseurl}}/search-plugins/text-chunking/). -To use neural sparse search, follow these steps: +## Accelerating neural sparse search -1. [Create an ingest pipeline](#step-1-create-an-ingest-pipeline). -1. [Create an index for ingestion](#step-2-create-an-index-for-ingestion). -1. [Ingest documents into the index](#step-3-ingest-documents-into-the-index). -1. [Search the index using neural search](#step-4-search-the-index-using-neural-sparse-search). -1. _Optional_ [Create and enable the two-phase processor](#step-5-create-and-enable-the-two-phase-processor-optional). +Starting with OpenSearch version 2.15, you can significantly accelerate the search process by creating a search pipeline with a `neural_sparse_two_phase_processor`. -## Step 1: Create an ingest pipeline - -To generate vector embeddings, you need to create an [ingest pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/) that contains a [`sparse_encoding` processor]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/processors/sparse-encoding/), which will convert the text in a document field to vector embeddings. The processor's `field_map` determines the input fields from which to generate vector embeddings and the output fields in which to store the embeddings. - -The following example request creates an ingest pipeline where the text from `passage_text` will be converted into text embeddings and the embeddings will be stored in `passage_embedding`: - -```json -PUT /_ingest/pipeline/nlp-ingest-pipeline-sparse -{ - "description": "An sparse encoding ingest pipeline", - "processors": [ - { - "sparse_encoding": { - "model_id": "aP2Q8ooBpBj3wT4HVS8a", - "field_map": { - "passage_text": "passage_embedding" - } - } - } - ] -} -``` -{% include copy-curl.html %} - -To split long text into passages, use the `text_chunking` ingest processor before the `sparse_encoding` processor. For more information, see [Text chunking]({{site.url}}{{site.baseurl}}/search-plugins/text-chunking/). - - -## Step 2: Create an index for ingestion - -In order to use the text embedding processor defined in your pipeline, create a rank features index, adding the pipeline created in the previous step as the default pipeline. Ensure that the fields defined in the `field_map` are mapped as correct types. 
Continuing with the example, the `passage_embedding` field must be mapped as [`rank_features`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/rank/#rank-features). Similarly, the `passage_text` field should be mapped as `text`. - -The following example request creates a rank features index that is set up with a default ingest pipeline: - -```json -PUT /my-nlp-index -{ - "settings": { - "default_pipeline": "nlp-ingest-pipeline-sparse" - }, - "mappings": { - "properties": { - "id": { - "type": "text" - }, - "passage_embedding": { - "type": "rank_features" - }, - "passage_text": { - "type": "text" - } - } - } -} -``` -{% include copy-curl.html %} - -To save disk space, you can exclude the embedding vector from the source as follows: - -```json -PUT /my-nlp-index -{ - "settings": { - "default_pipeline": "nlp-ingest-pipeline-sparse" - }, - "mappings": { - "_source": { - "excludes": [ - "passage_embedding" - ] - }, - "properties": { - "id": { - "type": "text" - }, - "passage_embedding": { - "type": "rank_features" - }, - "passage_text": { - "type": "text" - } - } - } -} -``` -{% include copy-curl.html %} - -Once the `` pairs are excluded from the source, they cannot be recovered. Before applying this optimization, make sure you don't need the `` pairs for your application. -{: .important} - -## Step 3: Ingest documents into the index - -To ingest documents into the index created in the previous step, send the following requests: - -```json -PUT /my-nlp-index/_doc/1 -{ - "passage_text": "Hello world", - "id": "s1" -} -``` -{% include copy-curl.html %} - -```json -PUT /my-nlp-index/_doc/2 -{ - "passage_text": "Hi planet", - "id": "s2" -} -``` -{% include copy-curl.html %} - -Before the document is ingested into the index, the ingest pipeline runs the `sparse_encoding` processor on the document, generating vector embeddings for the `passage_text` field. The indexed document includes the `passage_text` field, which contains the original text, and the `passage_embedding` field, which contains the vector embeddings. - -## Step 4: Search the index using neural sparse search - -To perform a neural sparse search on your index, use the `neural_sparse` query clause in [Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index/) queries. - -The following example request uses a `neural_sparse` query to search for relevant documents using a raw text query: - -```json -GET my-nlp-index/_search -{ - "query": { - "neural_sparse": { - "passage_embedding": { - "query_text": "Hi world", - "model_id": "aP2Q8ooBpBj3wT4HVS8a" - } - } - } -} -``` -{% include copy-curl.html %} - -The response contains the matching documents: - -```json -{ - "took" : 688, - "timed_out" : false, - "_shards" : { - "total" : 1, - "successful" : 1, - "skipped" : 0, - "failed" : 0 - }, - "hits" : { - "total" : { - "value" : 2, - "relation" : "eq" - }, - "max_score" : 30.0029, - "hits" : [ - { - "_index" : "my-nlp-index", - "_id" : "1", - "_score" : 30.0029, - "_source" : { - "passage_text" : "Hello world", - "passage_embedding" : { - "!" 
: 0.8708904, - "door" : 0.8587369, - "hi" : 2.3929274, - "worlds" : 2.7839446, - "yes" : 0.75845814, - "##world" : 2.5432441, - "born" : 0.2682308, - "nothing" : 0.8625516, - "goodbye" : 0.17146169, - "greeting" : 0.96817183, - "birth" : 1.2788506, - "come" : 0.1623208, - "global" : 0.4371151, - "it" : 0.42951578, - "life" : 1.5750692, - "thanks" : 0.26481047, - "world" : 4.7300377, - "tiny" : 0.5462298, - "earth" : 2.6555297, - "universe" : 2.0308156, - "worldwide" : 1.3903781, - "hello" : 6.696973, - "so" : 0.20279501, - "?" : 0.67785245 - }, - "id" : "s1" - } - }, - { - "_index" : "my-nlp-index", - "_id" : "2", - "_score" : 16.480486, - "_source" : { - "passage_text" : "Hi planet", - "passage_embedding" : { - "hi" : 4.338913, - "planets" : 2.7755864, - "planet" : 5.0969057, - "mars" : 1.7405145, - "earth" : 2.6087382, - "hello" : 3.3210192 - }, - "id" : "s2" - } - } - ] - } -} -``` - -You can also use the `neural_sparse` query with sparse vector embeddings: -```json -GET my-nlp-index/_search -{ - "query": { - "neural_sparse": { - "passage_embedding": { - "query_tokens": { - "hi" : 4.338913, - "planets" : 2.7755864, - "planet" : 5.0969057, - "mars" : 1.7405145, - "earth" : 2.6087382, - "hello" : 3.3210192 - } - } - } - } -} -``` -## Step 5: Create and enable the two-phase processor (Optional) - - -The `neural_sparse_two_phase_processor` is a new feature introduced in OpenSearch 2.15. Using the two-phase processor can significantly improve the performance of neural sparse queries. - -To quickly launch a search pipeline with neural sparse search, use the following example pipeline: +To create a search pipeline with a two-phase processor for neural sparse search, use the following request: ```json PUT /_search/pipeline/two_phase_search_pipeline @@ -277,7 +36,7 @@ PUT /_search/pipeline/two_phase_search_pipeline { "neural_sparse_two_phase_processor": { "tag": "neural-sparse", - "description": "This processor is making two-phase processor." + "description": "Creates a two-phase processor for neural sparse search." } } ] @@ -286,166 +45,18 @@ PUT /_search/pipeline/two_phase_search_pipeline {% include copy-curl.html %} Then choose the index you want to configure with the search pipeline and set the `index.search.default_pipeline` to the pipeline name, as shown in the following example: -```json -PUT /index-name/_settings -{ - "index.search.default_pipeline" : "two_phase_search_pipeline" -} -``` -{% include copy-curl.html %} - - - -## Setting a default model on an index or field - -A [`neural_sparse`]({{site.url}}{{site.baseurl}}/query-dsl/specialized/neural-sparse/) query requires a model ID for generating sparse embeddings. To eliminate passing the model ID with each neural_sparse query request, you can set a default model on index-level or field-level. - -First, create a [search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) with a [`neural_query_enricher`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) request processor. To set a default model for an index, provide the model ID in the `default_model_id` parameter. To set a default model for a specific field, provide the field name and the corresponding model ID in the `neural_field_default_id` map. 
If you provide both `default_model_id` and `neural_field_default_id`, `neural_field_default_id` takes precedence: - -```json -PUT /_search/pipeline/default_model_pipeline -{ - "request_processors": [ - { - "neural_query_enricher" : { - "default_model_id": "bQ1J8ooBpBj3wT4HVUsb", - "neural_field_default_id": { - "my_field_1": "uZj0qYoBMtvQlfhaYeud", - "my_field_2": "upj0qYoBMtvQlfhaZOuM" - } - } - } - ] -} -``` -{% include copy-curl.html %} - -Then set the default model for your index: - -```json -PUT /my-nlp-index/_settings -{ - "index.search.default_pipeline" : "default_model_pipeline" -} -``` -{% include copy-curl.html %} - -You can now omit the model ID when searching: ```json -GET /my-nlp-index/_search +PUT /my-nlp-index/_settings { - "query": { - "neural_sparse": { - "passage_embedding": { - "query_text": "Hi world" - } - } - } + "index.search.default_pipeline" : "two_phase_search_pipeline" } ``` {% include copy-curl.html %} -The response contains both documents: - -```json -{ - "took" : 688, - "timed_out" : false, - "_shards" : { - "total" : 1, - "successful" : 1, - "skipped" : 0, - "failed" : 0 - }, - "hits" : { - "total" : { - "value" : 2, - "relation" : "eq" - }, - "max_score" : 30.0029, - "hits" : [ - { - "_index" : "my-nlp-index", - "_id" : "1", - "_score" : 30.0029, - "_source" : { - "passage_text" : "Hello world", - "passage_embedding" : { - "!" : 0.8708904, - "door" : 0.8587369, - "hi" : 2.3929274, - "worlds" : 2.7839446, - "yes" : 0.75845814, - "##world" : 2.5432441, - "born" : 0.2682308, - "nothing" : 0.8625516, - "goodbye" : 0.17146169, - "greeting" : 0.96817183, - "birth" : 1.2788506, - "come" : 0.1623208, - "global" : 0.4371151, - "it" : 0.42951578, - "life" : 1.5750692, - "thanks" : 0.26481047, - "world" : 4.7300377, - "tiny" : 0.5462298, - "earth" : 2.6555297, - "universe" : 2.0308156, - "worldwide" : 1.3903781, - "hello" : 6.696973, - "so" : 0.20279501, - "?" : 0.67785245 - }, - "id" : "s1" - } - }, - { - "_index" : "my-nlp-index", - "_id" : "2", - "_score" : 16.480486, - "_source" : { - "passage_text" : "Hi planet", - "passage_embedding" : { - "hi" : 4.338913, - "planets" : 2.7755864, - "planet" : 5.0969057, - "mars" : 1.7405145, - "earth" : 2.6087382, - "hello" : 3.3210192 - }, - "id" : "s2" - } - } - ] - } -} -``` - -## Next steps - -- To learn more about splitting long text into passages for neural search, see [Text chunking]({{site.url}}{{site.baseurl}}/search-plugins/text-chunking/). - -## FAQ - -Refer to the following frequently asked questions for more information about neural sparse search. - -### How do I mitigate remote connector throttling exceptions? - -When using connectors to call a remote service like SageMaker, ingestion and search calls sometimes fail due to remote connector throttling exceptions. - -To mitigate throttling exceptions, modify the connector's [`client_config`]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/blueprints/#configuration-parameters) parameter to decrease the number of maximum connections, using the `max_connection` setting to prevent the maximum number of concurrent connections from exceeding the threshold of the remote service. You can also modify the retry settings to flatten the request spike during ingestion. 
-
-For versions earlier than OpenSearch 2.15, the SageMaker throttling exception will be thrown as the following "error":
-
-```
- {
-     "type": "status_exception",
-     "reason": "Error from remote service: {\"message\":null}"
- }
-```
-
+For information about `two_phase_search_pipeline`, see [Neural sparse query two-phase processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-sparse-query-two-phase-processor/).
 
-## Next steps
+## Further reading
 
-- To learn more about splitting long text into passages for neural search, see [Text chunking]({{site.url}}{{site.baseurl}}/search-plugins/text-chunking/).
+- Learn more about how sparse encoding models work and explore OpenSearch neural sparse search benchmarks in [Improving document retrieval with sparse semantic encoders](https://opensearch.org/blog/improving-document-retrieval-with-sparse-semantic-encoders/).
+- Learn the fundamentals of neural sparse search and its efficiency in [A deep dive into faster semantic sparse retrieval in OpenSearch 2.12](https://opensearch.org/blog/A-deep-dive-into-faster-semantic-sparse-retrieval-in-OS-2.12/).
diff --git a/_search-plugins/neural-sparse-with-pipelines.md b/_search-plugins/neural-sparse-with-pipelines.md
new file mode 100644
index 0000000000..fea2f0d795
--- /dev/null
+++ b/_search-plugins/neural-sparse-with-pipelines.md
@@ -0,0 +1,486 @@
+---
+layout: default
+title: Configuring ingest pipelines
+parent: Neural sparse search
+nav_order: 10
+has_children: false
+---
+
+# Configuring ingest pipelines for neural sparse search
+
+Generating sparse vector embeddings within OpenSearch enables neural sparse search to function like lexical search. To take advantage of this encapsulation, set up an ingest pipeline to create and store sparse vector embeddings from document text during ingestion. At query time, input plain text, which will be automatically converted into vector embeddings for search.
+
+For this tutorial, you'll use neural sparse search with OpenSearch's built-in machine learning (ML) model hosting and ingest pipelines. Because the transformation of text to embeddings is performed within OpenSearch, you'll use text when ingesting and searching documents.
+
+At ingestion time, neural sparse search uses a sparse encoding model to generate sparse vector embeddings from text fields.
+
+At query time, neural sparse search operates in one of two search modes:
+
+- **Bi-encoder mode** (requires a sparse encoding model): A sparse encoding model generates sparse vector embeddings from query text. This approach provides better search relevance at the cost of a slight increase in latency.
+
+- **Doc-only mode** (requires a sparse encoding model and a tokenizer): A sparse encoding model generates sparse vector embeddings from documents. In this mode, neural sparse search tokenizes query text using a tokenizer and obtains the token weights from a lookup table. This approach provides faster retrieval at the cost of a slight decrease in search relevance. The tokenizer is deployed and invoked using the [Model API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/index/) for a uniform neural sparse search experience.
+
+For more information about choosing the neural sparse search mode that best suits your workload, see [Choose the search mode](#step-1a-choose-the-search-mode).
+
+## Tutorial
+
+This tutorial consists of the following steps:
+
+1. [**Configure a sparse encoding model/tokenizer**](#step-1-configure-a-sparse-encoding-modeltokenizer).
+    1. 
[Choose the search mode](#step-1a-choose-the-search-mode) + 1. [Register the model/tokenizer](#step-1b-register-the-modeltokenizer) + 1. [Deploy the model/tokenizer](#step-1c-deploy-the-modeltokenizer) +1. [**Ingest data**](#step-2-ingest-data) + 1. [Create an ingest pipeline](#step-2a-create-an-ingest-pipeline) + 1. [Create an index for ingestion](#step-2b-create-an-index-for-ingestion) + 1. [Ingest documents into the index](#step-2c-ingest-documents-into-the-index) +1. [**Search the data**](#step-3-search-the-data) + +### Prerequisites + +Before you start, complete the [prerequisites]({{site.url}}{{site.baseurl}}/search-plugins/neural-search-tutorial/#prerequisites) for neural search. + +## Step 1: Configure a sparse encoding model/tokenizer + +Both the bi-encoder and doc-only search modes require you to configure a sparse encoding model. Doc-only mode requires you to configure a tokenizer in addition to the model. + +### Step 1(a): Choose the search mode + +Choose the search mode and the appropriate model/tokenizer combination: + +- **Bi-encoder**: Use the `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1` model during both ingestion and search. + +- **Doc-only**: Use the `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1` model during ingestion and the `amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` tokenizer during search. + +The following table provides a search relevance comparison for the two search modes so that you can choose the best mode for your use case. + +| Mode | Ingestion model | Search model | Avg search relevance on BEIR | Model parameters | +|-----------|---------------------------------------------------------------|---------------------------------------------------------------|------------------------------|------------------| +| Doc-only | `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1` | `amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` | 0.49 | 133M | +| Bi-encoder| `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1` | `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1` | 0.524 | 133M | + +### Step 1(b): Register the model/tokenizer + +When you register a model/tokenizer, OpenSearch creates a model group for the model/tokenizer. You can also explicitly create a model group before registering models. For more information, see [Model access control]({{site.url}}{{site.baseurl}}/ml-commons-plugin/model-access-control/). + +#### Bi-encoder mode + +When using bi-encoder mode, you only need to register the `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1` model. + +Register the sparse encoding model: + +```json +POST /_plugins/_ml/models/_register?deploy=true +{ + "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-v1", + "version": "1.0.1", + "model_format": "TORCH_SCRIPT" +} +``` +{% include copy-curl.html %} + +Registering a model is an asynchronous task. 
OpenSearch returns a task ID for every model you register:
+
+```json
+{
+  "task_id": "aFeif4oB5Vm0Tdw8yoN7",
+  "status": "CREATED"
+}
+```
+
+You can check the status of the task by calling the Tasks API:
+
+```json
+GET /_plugins/_ml/tasks/aFeif4oB5Vm0Tdw8yoN7
+```
+{% include copy-curl.html %}
+
+Once the task is complete, the task state will change to `COMPLETED` and the Tasks API response will contain the model ID of the registered model:
+
+```json
+{
+  "model_id": "<model ID>",
+  "task_type": "REGISTER_MODEL",
+  "function_name": "SPARSE_ENCODING",
+  "state": "COMPLETED",
+  "worker_node": [
+    "4p6FVOmJRtu3wehDD74hzQ"
+  ],
+  "create_time": 1694358489722,
+  "last_update_time": 1694358499139,
+  "is_async": true
+}
+```
+
+Note the `model_id` of the model you've created; you'll need it for the following steps.
+
+#### Doc-only mode
+
+When using doc-only mode, you need to register the `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1` model, which you'll use at ingestion time, and the `amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` tokenizer, which you'll use at search time.
+
+Register the sparse encoding model:
+
+```json
+POST /_plugins/_ml/models/_register?deploy=true
+{
+  "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1",
+  "version": "1.0.1",
+  "model_format": "TORCH_SCRIPT"
+}
+```
+{% include copy-curl.html %}
+
+Register the tokenizer:
+
+```json
+POST /_plugins/_ml/models/_register?deploy=true
+{
+  "name": "amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1",
+  "version": "1.0.1",
+  "model_format": "TORCH_SCRIPT"
+}
+```
+{% include copy-curl.html %}
+
+Like in the bi-encoder mode, use the Tasks API to check the status of the registration task. Once the Tasks API returns the task state as `COMPLETED`, note the `model_id` of the model and the tokenizer you've created; you'll need them for the following steps.
+
+### Step 1(c): Deploy the model/tokenizer
+
+Next, you'll need to deploy the model/tokenizer you registered. Deploying a model creates a model instance and caches the model in memory.
+
+#### Bi-encoder mode
+
+To deploy the model, provide its model ID to the `_deploy` endpoint:
+
+```json
+POST /_plugins/_ml/models/<model ID>/_deploy
+```
+{% include copy-curl.html %}
+
+As with the register operation, the deploy operation is asynchronous, so you'll get a task ID in the response:
+
+```json
+{
+  "task_id": "ale6f4oB5Vm0Tdw8NINO",
+  "status": "CREATED"
+}
+```
+
+You can check the status of the task by using the Tasks API:
+
+```json
+GET /_plugins/_ml/tasks/ale6f4oB5Vm0Tdw8NINO
+```
+{% include copy-curl.html %}
+
+Once the task is complete, the task state will change to `COMPLETED`:
+
+```json
+{
+  "model_id": "<model ID>",
+  "task_type": "DEPLOY_MODEL",
+  "function_name": "SPARSE_ENCODING",
+  "state": "COMPLETED",
+  "worker_node": [
+    "4p6FVOmJRtu3wehDD74hzQ"
+  ],
+  "create_time": 1694360024141,
+  "last_update_time": 1694360027940,
+  "is_async": true
+}
+```
+
+#### Doc-only mode
+
+To deploy the model, provide its model ID to the `_deploy` endpoint:
+
+```json
+POST /_plugins/_ml/models/<model ID>/_deploy
+```
+{% include copy-curl.html %}
+
+You can deploy the tokenizer in the same way:
+
+```json
+POST /_plugins/_ml/models/<tokenizer ID>/_deploy
+```
+{% include copy-curl.html %}
+
+As with bi-encoder mode, you can check the status of both deploy tasks by using the Tasks API. Once the task is complete, the task state will change to `COMPLETED`.
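+If you prefer to confirm a deployment directly rather than through its task, one option (shown here as a sketch; it is not required for this tutorial) is to call the Get Model API with the model ID and check that the returned `model_state` is `DEPLOYED`:
+
+```json
+GET /_plugins/_ml/models/<model ID>
+```
+{% include copy-curl.html %}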
+
+## Step 2: Ingest data
+
+In both the bi-encoder and doc-only modes, you'll use a sparse encoding model at ingestion time to generate sparse vector embeddings.
+
+### Step 2(a): Create an ingest pipeline
+
+To generate sparse vector embeddings, you need to create an [ingest pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/) that contains a [`sparse_encoding` processor]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/processors/sparse-encoding/), which will convert the text in a document field to vector embeddings. The processor's `field_map` determines the input fields from which to generate vector embeddings and the output fields in which to store the embeddings.
+
+The following example request creates an ingest pipeline where the text from `passage_text` will be converted into sparse vector embeddings, which will be stored in `passage_embedding`. Provide the model ID of the registered model in the request:
+
+```json
+PUT /_ingest/pipeline/nlp-ingest-pipeline-sparse
+{
+  "description": "A sparse encoding ingest pipeline",
+  "processors": [
+    {
+      "sparse_encoding": {
+        "model_id": "<model ID>",
+        "field_map": {
+          "passage_text": "passage_embedding"
+        }
+      }
+    }
+  ]
+}
+```
+{% include copy-curl.html %}
+
+To split long text into passages, use the `text_chunking` ingest processor before the `sparse_encoding` processor. For more information, see [Text chunking]({{site.url}}{{site.baseurl}}/search-plugins/text-chunking/).
+
+### Step 2(b): Create an index for ingestion
+
+In order to use the sparse encoding processor defined in your pipeline, create a rank features index, adding the pipeline created in the previous step as the default pipeline. Ensure that the fields defined in the `field_map` are mapped as correct types. Continuing with the example, the `passage_embedding` field must be mapped as [`rank_features`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/rank/#rank-features). Similarly, the `passage_text` field must be mapped as `text`.
+
+The following example request creates a rank features index configured with a default ingest pipeline:
+
+```json
+PUT /my-nlp-index
+{
+  "settings": {
+    "default_pipeline": "nlp-ingest-pipeline-sparse"
+  },
+  "mappings": {
+    "properties": {
+      "id": {
+        "type": "text"
+      },
+      "passage_embedding": {
+        "type": "rank_features"
+      },
+      "passage_text": {
+        "type": "text"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+To save disk space, you can exclude the embedding vector from the source as follows:
+
+```json
+PUT /my-nlp-index
+{
+  "settings": {
+    "default_pipeline": "nlp-ingest-pipeline-sparse"
+  },
+  "mappings": {
+    "_source": {
+      "excludes": [
+        "passage_embedding"
+      ]
+    },
+    "properties": {
+      "id": {
+        "type": "text"
+      },
+      "passage_embedding": {
+        "type": "rank_features"
+      },
+      "passage_text": {
+        "type": "text"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+Once the `<token: weight>` pairs are excluded from the source, they cannot be recovered. Before applying this optimization, make sure you don't need the `<token: weight>` pairs for your application.
+{: .important}
+
+### Step 2(c): Ingest documents into the index
+
+To ingest documents into the index created in the previous step, send the following requests:
+
+```json
+PUT /my-nlp-index/_doc/1
+{
+  "passage_text": "Hello world",
+  "id": "s1"
+}
+```
+{% include copy-curl.html %}
+
+```json
+PUT /my-nlp-index/_doc/2
+{
+  "passage_text": "Hi planet",
+  "id": "s2"
+}
+```
+{% include copy-curl.html %}
+
+Before the document is ingested into the index, the ingest pipeline runs the `sparse_encoding` processor on the document, generating vector embeddings for the `passage_text` field. The indexed document includes the `passage_text` field, which contains the original text, and the `passage_embedding` field, which contains the vector embeddings.
+
+## Step 3: Search the data
+
+To perform a neural sparse search on your index, use the `neural_sparse` query clause in [Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index/) queries.
+
+The following example request uses a `neural_sparse` query to search for relevant documents using a raw text query. Provide the model ID for bi-encoder mode or the tokenizer ID for doc-only mode:
+
+```json
+GET my-nlp-index/_search
+{
+  "query": {
+    "neural_sparse": {
+      "passage_embedding": {
+        "query_text": "Hi world",
+        "model_id": "<bi-encoder or tokenizer model ID>"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+The response contains the matching documents:
+
+```json
+{
+  "took" : 688,
+  "timed_out" : false,
+  "_shards" : {
+    "total" : 1,
+    "successful" : 1,
+    "skipped" : 0,
+    "failed" : 0
+  },
+  "hits" : {
+    "total" : {
+      "value" : 2,
+      "relation" : "eq"
+    },
+    "max_score" : 30.0029,
+    "hits" : [
+      {
+        "_index" : "my-nlp-index",
+        "_id" : "1",
+        "_score" : 30.0029,
+        "_source" : {
+          "passage_text" : "Hello world",
+          "passage_embedding" : {
+            "!" : 0.8708904,
+            "door" : 0.8587369,
+            "hi" : 2.3929274,
+            "worlds" : 2.7839446,
+            "yes" : 0.75845814,
+            "##world" : 2.5432441,
+            "born" : 0.2682308,
+            "nothing" : 0.8625516,
+            "goodbye" : 0.17146169,
+            "greeting" : 0.96817183,
+            "birth" : 1.2788506,
+            "come" : 0.1623208,
+            "global" : 0.4371151,
+            "it" : 0.42951578,
+            "life" : 1.5750692,
+            "thanks" : 0.26481047,
+            "world" : 4.7300377,
+            "tiny" : 0.5462298,
+            "earth" : 2.6555297,
+            "universe" : 2.0308156,
+            "worldwide" : 1.3903781,
+            "hello" : 6.696973,
+            "so" : 0.20279501,
+            "?" : 0.67785245
+          },
+          "id" : "s1"
+        }
+      },
+      {
+        "_index" : "my-nlp-index",
+        "_id" : "2",
+        "_score" : 16.480486,
+        "_source" : {
+          "passage_text" : "Hi planet",
+          "passage_embedding" : {
+            "hi" : 4.338913,
+            "planets" : 2.7755864,
+            "planet" : 5.0969057,
+            "mars" : 1.7405145,
+            "earth" : 2.6087382,
+            "hello" : 3.3210192
+          },
+          "id" : "s2"
+        }
+      }
+    ]
+  }
+}
+```
+
+## Accelerating neural sparse search
+
+To learn more about improving retrieval time for neural sparse search, see [Accelerating neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/#accelerating-neural-sparse-search).
+
+## Creating a search pipeline for neural sparse search
+
+You can create a search pipeline that augments neural sparse search functionality by:
+
+- Accelerating neural sparse search for faster retrieval.
+- Setting the default model ID on an index for easier use.
+
+To configure the pipeline, add a [`neural_sparse_two_phase_processor`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-sparse-query-two-phase-processor/) or a [`neural_query_enricher`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) processor.
The following request creates a pipeline with both processors:
+
+```json
+PUT /_search/pipeline/neural_search_pipeline
+{
+  "request_processors": [
+    {
+      "neural_sparse_two_phase_processor": {
+        "tag": "neural-sparse",
+        "description": "Creates a two-phase processor for neural sparse search."
+      }
+    },
+    {
+      "neural_query_enricher" : {
+        "default_model_id": "<model ID>"
+      }
+    }
+  ]
+}
+```
+{% include copy-curl.html %}
+
+Then set the default pipeline for your index to the newly created search pipeline:
+
+```json
+PUT /my-nlp-index/_settings
+{
+  "index.search.default_pipeline" : "neural_search_pipeline"
+}
+```
+{% include copy-curl.html %}
+
+For more information about setting a default model on an index, or to learn how to set a default model on a specific field, see [Setting a default model on an index or field]({{site.url}}{{site.baseurl}}/search-plugins/semantic-search/#setting-a-default-model-on-an-index-or-field).
+
+## Troubleshooting
+
+This section contains information about resolving common issues encountered while running neural sparse search.
+
+### Remote connector throttling exceptions
+
+When using connectors to call a remote service such as Amazon SageMaker, ingestion and search calls sometimes fail because of remote connector throttling exceptions.
+
+For OpenSearch versions earlier than 2.15, a throttling exception will be returned as an error from the remote service:
+
+```json
+{
+  "type": "status_exception",
+  "reason": "Error from remote service: {\"message\":null}"
+}
+```
+
+To mitigate throttling exceptions, decrease the maximum number of connections specified in the `max_connection` setting in the connector's [`client_config`]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/blueprints/#configuration-parameters) object. Doing so will prevent the maximum number of concurrent connections from exceeding the threshold of the remote service. You can also modify the retry settings to avoid a request spike during ingestion.
\ No newline at end of file
diff --git a/_search-plugins/neural-sparse-with-raw-vectors.md b/_search-plugins/neural-sparse-with-raw-vectors.md
new file mode 100644
index 0000000000..d69a789a1d
--- /dev/null
+++ b/_search-plugins/neural-sparse-with-raw-vectors.md
@@ -0,0 +1,99 @@
+---
+layout: default
+title: Using raw vectors
+parent: Neural sparse search
+nav_order: 20
+has_children: false
+---
+
+# Using raw vectors for neural sparse search
+
+If you're using self-hosted sparse embedding models, you can ingest raw sparse vectors and use neural sparse search.
+
+## Tutorial
+
+This tutorial consists of the following steps:
+
+1. [**Ingest sparse vectors**](#step-1-ingest-sparse-vectors)
+    1. [Create an index](#step-1a-create-an-index)
+    1. [Ingest documents into the index](#step-1b-ingest-documents-into-the-index)
+1. [**Search the data using a raw sparse vector**](#step-2-search-the-data-using-a-sparse-vector).
+
+
+## Step 1: Ingest sparse vectors
+
+Once you have generated sparse vector embeddings, you can directly ingest them into OpenSearch.
+ +### Step 1(a): Create an index + +In order to ingest documents containing raw sparse vectors, create a rank features index: + +```json +PUT /my-nlp-index +{ + "mappings": { + "properties": { + "id": { + "type": "text" + }, + "passage_embedding": { + "type": "rank_features" + }, + "passage_text": { + "type": "text" + } + } + } +} +``` +{% include copy-curl.html %} + +### Step 1(b): Ingest documents into the index + +To ingest documents into the index created in the previous step, send the following request: + +```json +PUT /my-nlp-index/_doc/1 +{ + "passage_text": "Hello world", + "id": "s1", + "passage_embedding": { + "hi" : 4.338913, + "planets" : 2.7755864, + "planet" : 5.0969057, + "mars" : 1.7405145, + "earth" : 2.6087382, + "hello" : 3.3210192 + } +} +``` +{% include copy-curl.html %} + +## Step 2: Search the data using a sparse vector + +To search the documents using a sparse vector, provide the sparse embeddings in the `neural_sparse` query: + +```json +GET my-nlp-index/_search +{ + "query": { + "neural_sparse": { + "passage_embedding": { + "query_tokens": { + "hi" : 4.338913, + "planets" : 2.7755864, + "planet" : 5.0969057, + "mars" : 1.7405145, + "earth" : 2.6087382, + "hello" : 3.3210192 + } + } + } + } +} +``` +{% include copy-curl.html %} + +## Accelerating neural sparse search + +To learn more about improving retrieval time for neural sparse search, see [Accelerating neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/#accelerating-neural-sparse-search). From 61396f73690bfb6bc8694cbe7cc38135e41ca01f Mon Sep 17 00:00:00 2001 From: Daniel Widdis Date: Thu, 15 Aug 2024 06:39:39 -0700 Subject: [PATCH 120/154] Add missing link to Get model group API (#7992) Signed-off-by: Daniel Widdis --- _ml-commons-plugin/api/model-group-apis/index.md | 1 + 1 file changed, 1 insertion(+) diff --git a/_ml-commons-plugin/api/model-group-apis/index.md b/_ml-commons-plugin/api/model-group-apis/index.md index 6df8b3e8fe..85dabf3c3b 100644 --- a/_ml-commons-plugin/api/model-group-apis/index.md +++ b/_ml-commons-plugin/api/model-group-apis/index.md @@ -13,5 +13,6 @@ ML Commons supports the following model-group-level APIs: - [Register model group]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-group-apis/register-model-group/) - [Update model group]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-group-apis/update-model-group/) +- [Get model group]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-group-apis/get-model-group/) - [Search model group]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-group-apis/search-model-group/) - [Delete model group]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-group-apis/delete-model-group/) \ No newline at end of file From d9d19fcec6da1d0922c96350214fec1215be195f Mon Sep 17 00:00:00 2001 From: Pawel Wlodarczyk Date: Thu, 15 Aug 2024 14:40:29 +0100 Subject: [PATCH 121/154] Update rolling-upgrade.md (#7993) Signed-off-by: Pawel Wlodarczyk --- _install-and-configure/upgrade-opensearch/rolling-upgrade.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_install-and-configure/upgrade-opensearch/rolling-upgrade.md b/_install-and-configure/upgrade-opensearch/rolling-upgrade.md index f6b0470b66..1e4145e7ba 100644 --- a/_install-and-configure/upgrade-opensearch/rolling-upgrade.md +++ b/_install-and-configure/upgrade-opensearch/rolling-upgrade.md @@ -181,7 +181,7 @@ Review [Upgrading OpenSearch]({{site.url}}{{site.baseurl}}/upgrade-opensearch/in 
"active_shards_percent_as_number" : 100.0 } ``` -1. Repeat steps 5 through 11 for each node in your cluster. Remember to upgrade an eligible cluster manager node last. After replacing the last node, query the `_cat/nodes` endpoint to confirm that all nodes have joined the cluster. The cluster is now bootstrapped to the new version of OpenSearch. You can verify the cluster version by querying the `_cat/nodes` API endpoint: +1. Repeat steps 2 through 11 for each node in your cluster. Remember to upgrade an eligible cluster manager node last. After replacing the last node, query the `_cat/nodes` endpoint to confirm that all nodes have joined the cluster. The cluster is now bootstrapped to the new version of OpenSearch. You can verify the cluster version by querying the `_cat/nodes` API endpoint: ```bash GET "/_cat/nodes?v&h=name,version,node.role,master" | column -t ``` From e5c6395507d12f0c3ec8ce681f9be5e5d9c0ecd0 Mon Sep 17 00:00:00 2001 From: Jun Ohtani Date: Thu, 15 Aug 2024 22:41:00 +0900 Subject: [PATCH 122/154] Fix typo in the logs index mappings (#7986) Signed-off-by: Jun Ohtani --- _field-types/supported-field-types/derived.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_field-types/supported-field-types/derived.md b/_field-types/supported-field-types/derived.md index 2ca00927d1..d989c3e4a4 100644 --- a/_field-types/supported-field-types/derived.md +++ b/_field-types/supported-field-types/derived.md @@ -69,7 +69,7 @@ PUT logs } } }, - "client_ip": { + "clientip": { "type": "keyword" } } From e3396512a913948dbbc8c6d525cd407bd3abb37a Mon Sep 17 00:00:00 2001 From: AntonEliatra Date: Thu, 15 Aug 2024 14:50:15 +0100 Subject: [PATCH 123/154] Adding search shard routing docs (#7656) * Adding documentation for search-shard-routing #7507 Signed-off-by: AntonEliatra * Adding documentation for search-shard-routing #7507 Signed-off-by: AntonEliatra * Update search-shard-routing.md Signed-off-by: AntonEliatra * fixing typo Signed-off-by: AntonEliatra * updating details as per comments Signed-off-by: AntonEliatra * Update search-shard-routing.md Signed-off-by: AntonEliatra * Apply suggestions from code review Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: AntonEliatra * moving search shard routing to a new location and updating as per PR comments Signed-off-by: Anton Rubin * adding link to configuring static and dymanic settings Signed-off-by: Anton Rubin * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: AntonEliatra * Update search-shard-routing.md Signed-off-by: AntonEliatra * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: AntonEliatra --------- Signed-off-by: AntonEliatra Signed-off-by: Anton Rubin Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower --- .../cluster-settings.md | 2 + .../searching-data/search-shard-routing.md | 212 ++++++++++++++++++ 2 files changed, 214 insertions(+) create mode 100644 _search-plugins/searching-data/search-shard-routing.md diff --git a/_install-and-configure/configuring-opensearch/cluster-settings.md b/_install-and-configure/configuring-opensearch/cluster-settings.md index 1bda6db262..9af0f5c5b1 100644 --- a/_install-and-configure/configuring-opensearch/cluster-settings.md +++ b/_install-and-configure/configuring-opensearch/cluster-settings.md @@ -106,6 +106,8 @@ OpenSearch supports the following cluster-level routing and shard allocation set OpenSearch supports the following 
cluster-level shard, block, and task settings: +- `action.search.shard_count.limit` (Integer): Limits the maximum number of shards to be hit during search. Requests that exceed this limit will be rejected. + - `cluster.blocks.read_only` (Boolean): Sets the entire cluster to read-only. Default is `false`. - `cluster.blocks.read_only_allow_delete` (Boolean): Similar to `cluster.blocks.read_only`, but allows you to delete indexes. diff --git a/_search-plugins/searching-data/search-shard-routing.md b/_search-plugins/searching-data/search-shard-routing.md new file mode 100644 index 0000000000..77c5fc7ce4 --- /dev/null +++ b/_search-plugins/searching-data/search-shard-routing.md @@ -0,0 +1,212 @@ +--- +layout: default +parent: Searching data +title: Search shard routing +nav_order: 70 +--- + +# Search shard routing + +To ensure redundancy and improve search performance, OpenSearch distributes index data across multiple primary shards, with each primary shard having one or more replica shards. When a search query is executed, OpenSearch routes the request to a node containing either a primary or replica index shard. This technique is known as _search shard routing_. + + +## Adaptive replica selection + +In order to improve latency, search requests are routed using _adaptive replica selection_, which chooses the nodes based on the following factors: + +- The amount of time it took a particular node to run previous requests. +- The latency between the coordinating node and the selected node. +- The queue size of the node's search thread pool. + +If you have permissions to call the OpenSearch REST APIs, you can turn off search shard routing. For more information about REST API user access, see [REST management API settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/security-settings/#rest-management-api-settings). To disable search shard routing, update the cluster settings as follows: + +```json +PUT /_cluster/settings +{ + "persistent": { + "cluster.routing.use_adaptive_replica_selection": false + } +} +``` +{% include copy-curl.html %} + +If you turn off search shard routing, OpenSearch will use round-robin routing, which can negatively impact search latency. +{: .note} + +## Node and shard selection during searches + +OpenSearch uses all nodes to choose the best routing for search requests. However, in some cases you may want to manually select the nodes or shards to which the search request is sent, including the following: + +- Using cached previous searches. +- Dedicating specific hardware to searches. +- Using only local nodes for searches. + +You can use the `preference` parameter in the search query to indicate the search destination. The following is a complete list of available options: + +1. `_primary`: Forces the search to execute only on primary shards. + + ```json + GET /my-index/_search?preference=_primary + ``` + {% include copy-curl.html %} + +2. `_primary_first`: Prefers primary shards but will use replica shards if the primary shards are not available. + + ```json + GET /my-index/_search?preference=_primary_first + ``` + {% include copy-curl.html %} + +3. `_replica`: Forces the search to execute only on replica shards. + + ```json + GET /my-index/_search?preference=_replica + ``` + {% include copy-curl.html %} + +4. `_replica_first`: Prefers replica shards but will use primary shards if no replica shards are available. + + ```json + GET /my-index/_search?preference=_replica_first + ``` + {% include copy-curl.html %} + +5. 
`_only_nodes:<node-id>,<node-id>`: Limits the search to execute only on specific nodes according to their IDs.
The following sections describe these methods. Before applying them, it helps to know how many shards a request will actually target; you can check this using the search shards API, as shown in the following example.
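
The following request returns the nodes and shards that a search of the `index1` index would be executed against:

```json
GET /index1/_search_shards
```
{% include copy-curl.html %}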
`-sniff` | Sniff cluster nodes. Sniffing detects available nodes using the OpenSearch `_cluster/state` API. diff --git a/_troubleshoot/security-admin.md b/_troubleshoot/security-admin.md index f36f1e3b0b..f4770c1ddb 100644 --- a/_troubleshoot/security-admin.md +++ b/_troubleshoot/security-admin.md @@ -24,8 +24,8 @@ If `securityadmin.sh` can't reach the cluster, it outputs: ``` OpenSearch Security Admin v6 -Will connect to localhost:9300 -ERR: Seems there is no opensearch running on localhost:9300 - Will exit +Will connect to localhost:9200 +ERR: Seems there is no opensearch running on localhost:9200 - Will exit ``` @@ -36,9 +36,9 @@ By default, `securityadmin.sh` uses `localhost`. If your cluster runs on any oth ### Check the port -Check that you are running `securityadmin.sh` against the transport port, **not** the HTTP port. +Check that you are running `securityadmin.sh` against the HTTP port, **not** the transport port. -By default, `securityadmin.sh` uses `9300`. If your cluster runs on a different port, use the `-p` option to specify the port number. +By default, `securityadmin.sh` uses `9200`. If your cluster runs on a different port, use the `-p` option to specify the port number. ## None of the configured nodes are available From 3f3364a46cd990d087b448944633b995c14a4033 Mon Sep 17 00:00:00 2001 From: Qi Chen Date: Thu, 15 Aug 2024 10:37:04 -0500 Subject: [PATCH 125/154] [Data Prepper] MAINT: add HTML comment on obfuscate processor config table (#7651) * MAINT: add HTML comment Signed-off-by: George Chen * MNT: address comments Signed-off-by: George Chen * MAINT: period Signed-off-by: George Chen * Update _data-prepper/pipelines/configuration/processors/obfuscate.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Qi Chen --------- Signed-off-by: George Chen Signed-off-by: Qi Chen Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- .../pipelines/configuration/processors/obfuscate.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/_data-prepper/pipelines/configuration/processors/obfuscate.md b/_data-prepper/pipelines/configuration/processors/obfuscate.md index 8d6bf901da..96b03e7405 100644 --- a/_data-prepper/pipelines/configuration/processors/obfuscate.md +++ b/_data-prepper/pipelines/configuration/processors/obfuscate.md @@ -62,6 +62,13 @@ When run, the `obfuscate` processor parses the fields into the following output: Use the following configuration options with the `obfuscate` processor. + + | Parameter | Required | Description | | :--- | :--- | :--- | | `source` | Yes | The source field to obfuscate. 
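
For example, assuming a default installation with the demo certificates (the paths and certificate file names in this sketch are assumptions, not requirements), a typical `securityadmin.sh` invocation connects to the HTTP port as follows:

```bash
./securityadmin.sh -cd ../../../config/opensearch-security/ \
  -h localhost -p 9200 \
  -cacert ../../../config/root-ca.pem \
  -cert ../../../config/kirk.pem \
  -key ../../../config/kirk-key.pem \
  -icl -nhnv
```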
| From de627129900046347ce968fbe16dd037c3ce8c9b Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 15 Aug 2024 14:07:11 -0600 Subject: [PATCH 126/154] Update README.md (#7990) Update points of contact Signed-off-by: Melissa Vagi --- README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/README.md b/README.md index 7d8de14151..66beb1948c 100644 --- a/README.md +++ b/README.md @@ -21,7 +21,6 @@ The following resources provide important guidance regarding contributions to th If you encounter problems or have questions when contributing to the documentation, these people can help: -- [hdhalter](https://github.com/hdhalter) - [kolchfa-aws](https://github.com/kolchfa-aws) - [Naarcha-AWS](https://github.com/Naarcha-AWS) - [vagimeli](https://github.com/vagimeli) From e1fc06541e543ebc552fd4a6cd83dcc825ec1b42 Mon Sep 17 00:00:00 2001 From: Jay Deng Date: Thu, 15 Aug 2024 13:46:21 -0700 Subject: [PATCH 127/154] Remove composite agg limitations for concurrent search (#7904) Signed-off-by: Jay Deng Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- _search-plugins/concurrent-segment-search.md | 1 - 1 file changed, 1 deletion(-) diff --git a/_search-plugins/concurrent-segment-search.md b/_search-plugins/concurrent-segment-search.md index 9c0e2da7c6..cbbb993ac9 100644 --- a/_search-plugins/concurrent-segment-search.md +++ b/_search-plugins/concurrent-segment-search.md @@ -95,7 +95,6 @@ Concurrent segment search helps to improve the performance of search requests at The following aggregations do not support the concurrent search model. If a search request contains one of these aggregations, the request will be executed using the non-concurrent path even if concurrent segment search is enabled at the cluster level or index level. - Parent aggregations on [join]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/join/) fields. See [this GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/9316) for more information. - `sampler` and `diversified_sampler` aggregations. See [this GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/11075) for more information. -- Composite aggregations that use scripts. See [this GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/12947) for more information. Composite aggregations without scripts do support concurrent segment search. ## Other considerations From 9b8c68de498c327ee180c25aefb5996e8baf15d3 Mon Sep 17 00:00:00 2001 From: Zelin Hao Date: Thu, 15 Aug 2024 14:29:24 -0700 Subject: [PATCH 128/154] Add new results landing page for website search (#7942) * Add new results landing page for website Signed-off-by: Zelin Hao * Update some features Signed-off-by: Zelin Hao * Update the margin between search results Signed-off-by: Zelin Hao * Update display when no checkbox selected Signed-off-by: Zelin Hao --------- Signed-off-by: Zelin Hao --- _layouts/search_layout.html | 195 ++++++++++++++++++++++++++++++++++++ _sass/custom/custom.scss | 65 ++++++++++++ assets/js/search.js | 81 ++++++++++++++- search.md | 12 +++ 4 files changed, 349 insertions(+), 4 deletions(-) create mode 100644 _layouts/search_layout.html create mode 100644 search.md diff --git a/_layouts/search_layout.html b/_layouts/search_layout.html new file mode 100644 index 0000000000..47b8f25d1c --- /dev/null +++ b/_layouts/search_layout.html @@ -0,0 +1,195 @@ +--- +layout: table_wrappers +--- + + + + +{% include head.html %} + + + + Expand + + + + + + +{% include header.html %} + +
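
As a sketch of how the `source` option combines with a masking action (the pipeline, field names, and pattern below are illustrative and not taken from this table):

```yaml
processor:
  - obfuscate:
      source: "log"
      target: "new_log"
      patterns:
        - "[A-Za-z0-9+_.-]+@([\\w-]+\\.)+[\\w-]{2,4}"
      action:
        mask:
          mask_character: "#"
          mask_character_length: 6
```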
+
+ + + + Results Page Head from layout + +
+ +
+
+ + + + + +
+
+

+
+ +
+
+
+
+ + + +
+
+ +{% include footer.html %} + + + + + + + + diff --git a/_sass/custom/custom.scss b/_sass/custom/custom.scss index 7d7a168fb4..3a9dcc5e6d 100755 --- a/_sass/custom/custom.scss +++ b/_sass/custom/custom.scss @@ -1035,6 +1035,71 @@ body { border-bottom: 1px solid #eeebee; } +.search-page { + display: flex; + align-items: flex-start; + justify-content: center; + gap: 20px; + margin: 0 auto; +} + +.search-page--sidebar { + flex: 1; + max-width: 200px; + flex: 0 0 200px; +} + +.search-page--sidebar--category-filter--checkbox-child { + padding-left: 20px; +} + +.search-page--results { + flex: 3; + display: flex; + flex-direction: column; + align-items: center; + max-width: 60%; +} + +.search-page--results--input { + width: 100%; + position: relative; +} + +.search-page--results--input-box { + width: 100%; + padding: 10px; + margin-bottom: 20px; + border: 1px solid #ccc; + border-radius: 4px; +} + +.search-page--results--input-icon { + position: absolute; + top: 35%; + right: 10px; + transform: translateY(-50%); + pointer-events: none; + color: #333; +} + +.search-page--results--diplay { + width: 100%; + position: relative; + flex-flow: column nowrap; +} + +.search-page--results--diplay--header { + text-align: center; + margin-bottom: 20px; + background-color: transparent; +} + +.search-page--results--diplay--container--item { + margin-bottom: 1%; + display: block; +} + @mixin body-text($color: #000) { color: $color; font-family: 'Open Sans'; diff --git a/assets/js/search.js b/assets/js/search.js index 37de270ebd..8d9cab2ec5 100644 --- a/assets/js/search.js +++ b/assets/js/search.js @@ -13,7 +13,11 @@ const CLASSNAME_HIGHLIGHTED = 'highlighted'; const canSmoothScroll = 'scrollBehavior' in document.documentElement.style; - const docsVersion = elInput.getAttribute('data-docs-version'); + + //Extract version from the URL path + const urlPath = window.location.pathname; + const versionMatch = urlPath.match(/(\d+\.\d+)/); + const docsVersion = versionMatch ? 
versionMatch[1] : elInput.getAttribute('data-docs-version'); let _showingResults = false, animationFrame, @@ -46,7 +50,7 @@ case 'Enter': e.preventDefault(); - navToHighlightedResult(); + navToResult(); break; } }); @@ -247,9 +251,19 @@ } }; - const navToHighlightedResult = () => { + const navToResultsPage = () => { + const query = encodeURIComponent(elInput.value); + window.location.href = `/docs/${docsVersion}/search.html?q=${query}`; + } + + const navToResult = () => { const searchResultClassName = 'top-banner-search--field-with-results--field--wrapper--search-component--search-results--result'; - elResults.querySelector(`.${searchResultClassName}.highlighted a[href]`)?.click?.(); + const element = elResults.querySelector(`.${searchResultClassName}.highlighted a[href]`); + if (element) { + element.click?.(); + } else { + navToResultsPage(); + } }; const recordEvent = (name, data) => { @@ -261,3 +275,62 @@ }; }); })(); + + +window.doResultsPageSearch = async (query, type, version) => { + console.log("Running results page search!"); + + const searchResultsContainer = document.getElementById('searchPageResultsContainer'); + + try { + const response = await fetch(`https://search-api.opensearch.org/search?q=${query}&v=${version}&t=${type}`); + const data = await response.json(); + // Clear any previous search results + searchResultsContainer.innerHTML = ''; + + if (data.results && data.results.length > 0) { + data.results.forEach(result => { + const resultElement = document.createElement('div'); + resultElement.classList.add('search-page--results--diplay--container--item'); + + const contentCite = document.createElement('cite'); + const crumbs = [...result.ancestors]; + if (result.type === 'DOCS') crumbs.unshift(`OpenSearch ${result.versionLabel || result.version}`); + else if (result.type) crumbs.unshift(result.type); + contentCite.textContent = crumbs.join(' › ')?.replace?.(/ Date: Fri, 16 Aug 2024 22:27:25 +0800 Subject: [PATCH 129/154] Add documentation for v2 neural sparse models (#7987) * update for v2 model Signed-off-by: zhichao-aws * exclude source Signed-off-by: zhichao-aws * Doc review Signed-off-by: Fanit Kolchina * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: zhichao-aws Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Fanit Kolchina Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower --- .../agents-tools/tools/neural-sparse-tool.md | 6 +-- .../api/model-apis/register-model.md | 4 +- _ml-commons-plugin/pretrained-models.md | 11 ++-- .../neural-sparse-with-pipelines.md | 51 ++++++++++++++----- 4 files changed, 50 insertions(+), 22 deletions(-) diff --git a/_ml-commons-plugin/agents-tools/tools/neural-sparse-tool.md b/_ml-commons-plugin/agents-tools/tools/neural-sparse-tool.md index 9014c585c8..b78d3d641e 100644 --- a/_ml-commons-plugin/agents-tools/tools/neural-sparse-tool.md +++ b/_ml-commons-plugin/agents-tools/tools/neural-sparse-tool.md @@ -20,13 +20,13 @@ The `NeuralSparseSearchTool` performs sparse vector retrieval. For more informat OpenSearch supports several pretrained sparse encoding models. You can either use one of those models or your own custom model. For a list of supported pretrained models, see [Sparse encoding models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/#sparse-encoding-models). 
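+    // Fall back to the version embedded in the input's data attribute when the URL has no version segment.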
For more information, see [OpenSearch-provided pretrained models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/) and [Custom local models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/custom-local-models/). -In this example, you'll use the `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1` pretrained model for both ingestion and search. To register and deploy the model to OpenSearch, send the following request: +In this example, you'll use the `amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill` pretrained model for both ingestion and search. To register the model and deploy it to OpenSearch, send the following request: ```json POST /_plugins/_ml/models/_register?deploy=true { - "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-v1", - "version": "1.0.1", + "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill", + "version": "1.0.0", "model_format": "TORCH_SCRIPT" } ``` diff --git a/_ml-commons-plugin/api/model-apis/register-model.md b/_ml-commons-plugin/api/model-apis/register-model.md index 2a0e9706e9..7d8f6d8cc6 100644 --- a/_ml-commons-plugin/api/model-apis/register-model.md +++ b/_ml-commons-plugin/api/model-apis/register-model.md @@ -95,8 +95,8 @@ Field | Data type | Required/Optional | Description ```json POST /_plugins/_ml/models/_register { - "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1", - "version": "1.0.1", + "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-distill", + "version": "1.0.0", "model_group_id": "Z1eQf4oB5Vm0Tdw8EIP2", "model_format": "TORCH_SCRIPT" } diff --git a/_ml-commons-plugin/pretrained-models.md b/_ml-commons-plugin/pretrained-models.md index 154b8b530f..1b0c726c33 100644 --- a/_ml-commons-plugin/pretrained-models.md +++ b/_ml-commons-plugin/pretrained-models.md @@ -48,8 +48,8 @@ Sparse encoding models transfer text into a sparse vector and convert the vector We recommend the following combinations for optimal performance: -- Use the `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1` model during both ingestion and search. -- Use the `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1` model during ingestion and the +- Use the `amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill` model during both ingestion and search. +- Use the `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-distill` model during ingestion and the `amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` tokenizer during search. For more information about the preceding options for running neural sparse search, see [Generating sparse vector embeddings within OpenSearch]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-with-pipelines/). @@ -58,8 +58,11 @@ The following table provides a list of sparse encoding models and artifact links | Model name | Version | Auto-truncation | TorchScript artifact | Description | |:---|:---|:---|:---|:---| -| `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1` | 1.0.1 | Yes | - [model_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-v1/1.0.1/torch_script/neural-sparse_opensearch-neural-sparse-encoding-v1-1.0.1-torch_script.zip)
- [config_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-v1/1.0.1/torch_script/config.json) | A neural sparse encoding model. The model transforms text into a sparse vector, identifies the indexes of non-zero elements in the vector, and then converts the vector into `` pairs, where each entry corresponds to a non-zero element index. To experiment with this model using transformers and the PyTorch API, see the [HuggingFace documentation](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-v1). | -| `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1` | 1.0.1 | Yes | - [model_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1/1.0.1/torch_script/neural-sparse_opensearch-neural-sparse-encoding-doc-v1-1.0.1-torch_script.zip)
- [config_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1/1.0.1/torch_script/config.json) | A neural sparse encoding model. The model transforms text into a sparse vector, identifies the indexes of non-zero elements in the vector, and then converts the vector into `` pairs, where each entry corresponds to a non-zero element index. To experiment with this model using transformers and the PyTorch API, see the [HuggingFace documentation](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-doc-v1). | +| `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1` | 1.0.1 | Yes | - [model_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-v1/1.0.1/torch_script/neural-sparse_opensearch-neural-sparse-encoding-v1-1.0.1-torch_script.zip)
- [config_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-v1/1.0.1/torch_script/config.json) | A neural sparse encoding model. The model transforms text into a sparse vector, identifies the indices of non-zero elements in the vector, and then converts the vector into `` pairs, where each entry corresponds to a non-zero element index. To experiment with this model using transformers and the PyTorch API, see the [Hugging Face documentation](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-v1). | +| `amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill` | 1.0.0 | Yes | - [model_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill/1.0.0/torch_script/neural-sparse_opensearch-neural-sparse-encoding-v2-distill-1.0.0-torch_script.zip)
- [config_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill/1.0.0/torch_script/config.json) | A neural sparse encoding model. The model transforms text into a sparse vector, identifies the indices of non-zero elements in the vector, and then converts the vector into `` pairs, where each entry corresponds to a non-zero element index. To experiment with this model using transformers and the PyTorch API, see the [Hugging Face documentation](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-v2-distill). | +| `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1` | 1.0.1 | Yes | - [model_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1/1.0.1/torch_script/neural-sparse_opensearch-neural-sparse-encoding-doc-v1-1.0.1-torch_script.zip)
- [config_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1/1.0.1/torch_script/config.json) | A neural sparse encoding model. The model transforms text into a sparse vector, identifies the indices of non-zero elements in the vector, and then converts the vector into `` pairs, where each entry corresponds to a non-zero element index. To experiment with this model using transformers and the PyTorch API, see the [Hugging Face documentation](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-doc-v1). | +| `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-distill` | 1.0.0 | Yes | - [model_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-distill/1.0.0/torch_script/neural-sparse_opensearch-neural-sparse-encoding-doc-v2-distill-1.0.0-torch_script.zip)
- [config_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-distill/1.0.0/torch_script/config.json) | A neural sparse encoding model. The model transforms text into a sparse vector, identifies the indices of non-zero elements in the vector, and then converts the vector into `` pairs, where each entry corresponds to a non-zero element index. To experiment with this model using transformers and the PyTorch API, see the [Hugging Face documentation](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill). | +| `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-mini` | 1.0.0 | Yes | - [model_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-mini/1.0.0/torch_script/neural-sparse_opensearch-neural-sparse-encoding-doc-v2-mini-1.0.0-torch_script.zip)
- [config_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-mini/1.0.0/torch_script/config.json) | A neural sparse encoding model. The model transforms text into a sparse vector, identifies the indices of non-zero elements in the vector, and then converts the vector into `` pairs, where each entry corresponds to a non-zero element index. To experiment with this model using transformers and the PyTorch API, see the [Hugging Face documentation](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-doc-v2-mini). | | `amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` | 1.0.1 | Yes | - [model_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1/1.0.1/torch_script/neural-sparse_opensearch-neural-sparse-tokenizer-v1-1.0.1-torch_script.zip)
- [config_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1/1.0.1/torch_script/config.json) | A neural sparse tokenizer. The tokenizer splits text into tokens and assigns each token a predefined weight, which is the token's inverse document frequency (IDF). If the IDF file is not provided, the weight defaults to 1. For more information, see [Preparing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/custom-local-models/#preparing-a-model). | ### Cross-encoder models diff --git a/_search-plugins/neural-sparse-with-pipelines.md b/_search-plugins/neural-sparse-with-pipelines.md index fea2f0d795..ef7044494a 100644 --- a/_search-plugins/neural-sparse-with-pipelines.md +++ b/_search-plugins/neural-sparse-with-pipelines.md @@ -16,9 +16,9 @@ At ingestion time, neural sparse search uses a sparse encoding model to generate At query time, neural sparse search operates in one of two search modes: -- **Bi-encoder mode** (requires a sparse encoding model): A sparse encoding model generates sparse vector embeddings from query text. This approach provides better search relevance at the cost of a slight increase in latency. +- **Bi-encoder mode** (requires a sparse encoding model): A sparse encoding model generates sparse vector embeddings from both documents and query text. This approach provides better search relevance at the cost of an increase in latency. -- **Doc-only mode** (requires a sparse encoding model and a tokenizer): A sparse encoding model generates sparse vector embeddings from query text. In this mode, neural sparse search tokenizes query text using a tokenizer and obtains the token weights from a lookup table. This approach provides faster retrieval at the cost of a slight decrease in search relevance. The tokenizer is deployed and invoked using the [Model API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/index/) for a uniform neural sparse search experience. +- **Doc-only mode** (requires a sparse encoding model and a tokenizer): A sparse encoding model generates sparse vector embeddings from documents. In this mode, neural sparse search tokenizes query text using a tokenizer and obtains the token weights from a lookup table. This approach provides faster retrieval at the cost of a slight decrease in search relevance. The tokenizer is deployed and invoked using the [Model API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/index/) for a uniform neural sparse search experience. For more information about choosing the neural sparse search mode that best suits your workload, see [Choose the search mode](#step-1a-choose-the-search-mode). @@ -48,32 +48,35 @@ Both the bi-encoder and doc-only search modes require you to configure a sparse Choose the search mode and the appropriate model/tokenizer combination: -- **Bi-encoder**: Use the `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1` model during both ingestion and search. +- **Bi-encoder**: Use the `amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill` model during both ingestion and search. -- **Doc-only**: Use the `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1` model during ingestion and the `amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` tokenizer during search. +- **Doc-only**: Use the `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-distill` model during ingestion and the `amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` tokenizer during search. 
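
To use one of the models in the preceding table, register it with the same request pattern shown in the Register Model API examples, substituting the model name and version from the table. For example:

```json
POST /_plugins/_ml/models/_register?deploy=true
{
  "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-mini",
  "version": "1.0.0",
  "model_format": "TORCH_SCRIPT"
}
```
{% include copy-curl.html %}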
-The following table provides a search relevance comparison for the two search modes so that you can choose the best mode for your use case. +The following table provides a search relevance comparison for all available combinations of the two search modes so that you can choose the best combination for your use case. | Mode | Ingestion model | Search model | Avg search relevance on BEIR | Model parameters | |-----------|---------------------------------------------------------------|---------------------------------------------------------------|------------------------------|------------------| | Doc-only | `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1` | `amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` | 0.49 | 133M | +| Doc-only | `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-distill` | `amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` | 0.504 | 67M | +| Doc-only | `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-mini` | `amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` | 0.497 | 23M | | Bi-encoder| `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1` | `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1` | 0.524 | 133M | +| Bi-encoder| `amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill` | `amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill` | 0.528 | 67M | -### Step 1(b): Register the model/tokenizer +### Step 1(b): Register the model/tokenizer When you register a model/tokenizer, OpenSearch creates a model group for the model/tokenizer. You can also explicitly create a model group before registering models. For more information, see [Model access control]({{site.url}}{{site.baseurl}}/ml-commons-plugin/model-access-control/). #### Bi-encoder mode -When using bi-encoder mode, you only need to register the `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1` model. +When using bi-encoder mode, you only need to register the `amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill` model. Register the sparse encoding model: ```json POST /_plugins/_ml/models/_register?deploy=true { - "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-v1", - "version": "1.0.1", + "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill", + "version": "1.0.0", "model_format": "TORCH_SCRIPT" } ``` @@ -116,15 +119,15 @@ Note the `model_id` of the model you've created; you'll need it for the followin #### Doc-only mode -When using doc-only mode, you need to register the `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1` model, which you'll use at ingestion time, and the `amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` tokenizer, which you'll use at search time. +When using doc-only mode, you need to register the `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-distill` model, which you'll use at ingestion time, and the `amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` tokenizer, which you'll use at search time. 
Register the sparse encoding model: ```json POST /_plugins/_ml/models/_register?deploy=true { - "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1", - "version": "1.0.1", + "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-distill", + "version": "1.0.0", "model_format": "TORCH_SCRIPT" } ``` @@ -276,7 +279,7 @@ PUT /my-nlp-index "default_pipeline": "nlp-ingest-pipeline-sparse" }, "mappings": { - "_source": { + "_source": { "excludes": [ "passage_embedding" ] @@ -421,6 +424,28 @@ The response contains the matching documents: } ``` +To minimize disk and network I/O latency related to sparse embedding sources, you can exclude the embedding vector source from the query as follows: + +```json +GET my-nlp-index/_search +{ + "_source": { + "excludes": [ + "passage_embedding" + ] + }, + "query": { + "neural_sparse": { + "passage_embedding": { + "query_text": "Hi world", + "model_id": "" + } + } + } +} +``` +{% include copy-curl.html %} + ## Accelerating neural sparse search To learn more about improving retrieval time for neural sparse search, see [Accelerating neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/#accelerating-neural-sparse-search). From c088e5741dd7eb4675756d71b1be7d7f13066718 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 16 Aug 2024 10:17:41 -0600 Subject: [PATCH 130/154] Update CODEOWNERS (#8003) * Update CODEOWNERS Updated with current list of codeowners Signed-off-by: Melissa Vagi * Update .github/CODEOWNERS Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Melissa Vagi --------- Signed-off-by: Melissa Vagi Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- .github/CODEOWNERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index 0ec6c5e009..815687fa17 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -1 +1 @@ -* @hdhalter @kolchfa-aws @Naarcha-AWS @vagimeli @AMoo-Miki @natebower @dlvenable @scrawfor99 @epugh +* @kolchfa-aws @Naarcha-AWS @vagimeli @AMoo-Miki @natebower @dlvenable @stephen-crawford @epugh From 8b99242c541ff95e74450ba2adca5e55272b8135 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Mon, 19 Aug 2024 08:26:37 -0400 Subject: [PATCH 131/154] Revert the header to the previous version to fix hamburger menu on mobile (#8044) Signed-off-by: Fanit Kolchina --- _includes/header.html | 474 +++++++++++++++++------------------------- 1 file changed, 196 insertions(+), 278 deletions(-) diff --git a/_includes/header.html b/_includes/header.html index b7dce4c317..20d82c451e 100644 --- a/_includes/header.html +++ b/_includes/header.html @@ -1,3 +1,71 @@ +{% assign url_parts = page.url | split: "/" %} +{% if url_parts.size > 0 %} + + {% assign last_url_part = url_parts | last %} + + {% comment %} Does the URL contain a filename, and is it an index.html or not? {% endcomment %} + {% if last_url_part contains ".html" %} + {% assign url_has_filename = true %} + {% if last_url_part == 'index.html' %} + {% assign url_filename_is_index = true %} + {% else %} + {% assign url_filename_is_index = false %} + {% endif %} + {% else %} + {% assign url_has_filename = false %} + {% endif %} + + {% comment %} + OpenSearchCon URLs require some special consideration, because it's a specialization + of the /events URL which is itself a child of Community; te OpenSearchCon menu is NOT + a child of Community. 
+ {% endcomment %} + {% if page.url contains "opensearchcon" %} + {% assign is_conference_page = true %} + {% else %} + {% assign is_conference_page = false %} + {% endif %} + + {% if is_conference_page %} + {% comment %} + If the page is a confernce page and it has a filename then its the penultimate + path component that has the child menu item of the OpenSearchCon that needs + to be marked as in-category. If there's no filename then reference the ultimate + path component. + Unless the filename is opensearchcon2023-cfp, because it's a one off that is not + within the /events/opensearchcon/... structure. + {% endcomment %} + {% if url_has_filename %} + {% unless page.url contains 'opensearchcon2023-cfp' %} + {% assign url_fragment_index = url_parts | size | minus: 2 %} + {% assign url_fragment = url_parts[url_fragment_index] %} + {% else %} + {% assign url_fragment = 'opensearchcon2023-cfp' %} + {% endunless %} + {% else %} + {% assign url_fragment = last_url_part %} + {% endif %} + {% else %} + {% comment %} + If the page is NOT a conference page, the URL has a filename, and the filename + is NOT index.html then refer to the filename without the .html extension. + If the filename is index.html then refer to the penultimate path component. + If there is not filename then refer to the ultimate path component. + {% endcomment %} + {% if url_has_filename %} + {% unless url_filename_is_index %} + {% assign url_fragment = last_url_part | replace: '.html', '' %} + {% else %} + {% assign url_fragment_index = url_parts | size | minus: 2 %} + {% assign url_fragment = url_parts[url_fragment_index] %} + {% endunless %} + {% else %} + {% assign url_fragment = last_url_part %} + {% endif %} + {% endif %} +{% else %} + {% assign url_fragment = '' %} +{% endif %} {% if page.alert %}