From 67682f2f7997606ed6949b81634c19cc772804f1 Mon Sep 17 00:00:00 2001
From: Melissa Vagi
Date: Fri, 30 Aug 2024 09:51:34 -0600
Subject: [PATCH 1/7] Delete outdated images (#8130)

* Delete outdated images

Signed-off-by: Melissa Vagi

* Delete outdated images

Signed-off-by: Melissa Vagi

---------

Signed-off-by: Melissa Vagi
---
 .../management/accelerate-external-data.md  | 46 ++++++-------------
 _dashboards/management/query-data-source.md | 25 +++-------
 2 files changed, 21 insertions(+), 50 deletions(-)

diff --git a/_dashboards/management/accelerate-external-data.md b/_dashboards/management/accelerate-external-data.md
index 00e4600ffd..6d1fa030e4 100644
--- a/_dashboards/management/accelerate-external-data.md
+++ b/_dashboards/management/accelerate-external-data.md
@@ -12,55 +12,37 @@ Introduced 2.11
 {: .label .label-purple }
 
-Query performance can be slow when using external data sources for reasons such as network latency, data transformation, and data volume. You can optimize your query performance by using OpenSearch indexes, such as a skipping index or a covering index. A _skipping index_ uses skip acceleration methods, such as partition, minimum and maximum values, and value sets, to ingest and create compact aggregate data structures. This makes them an economical option for direct querying scenarios. A _covering index_ ingests all or some of the data from the source into OpenSearch and makes it possible to use all OpenSearch Dashboards and plugin functionality. See the [Flint Index Reference Manual](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md) for comprehensive guidance on this feature's indexing process.
+Query performance can be slow when using external data sources for reasons such as network latency, data transformation, and data volume. You can optimize your query performance by using OpenSearch indexes, such as a skipping index or a covering index.
+
+A _skipping index_ uses skip acceleration methods, such as partition, minimum and maximum values, and value sets, to ingest and create compact aggregate data structures. This makes skipping indexes an economical option for direct querying scenarios.
+
+A _covering index_ ingests all or some of the data from the source into OpenSearch and makes it possible to use all OpenSearch Dashboards and plugin functionality. See the [Flint Index Reference Manual](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md) for comprehensive guidance on this feature's indexing process.
 
 ## Data sources use case: Accelerate performance
 
 To get started with the **Accelerate performance** use case available in **Data sources**, follow these steps:
 
 1. Go to **OpenSearch Dashboards** > **Query Workbench** and select your Amazon S3 data source from the **Data sources** dropdown menu in the upper-left corner.
-2. From the left-side navigation menu, select a database. An example using the `http_logs` database is shown in the following image.
-
-    Query Workbench accelerate data UI
-
+2. From the left-side navigation menu, select a database.
 3. View the results in the table and confirm that you have the desired data.
 4. Create an OpenSearch index by following these steps:
-    1. Select the **Accelerate data** button. A pop-up window appears. An example is shown in the following image.
-
-      Accelerate data pop-up window
-
+    1. Select the **Accelerate data** button. A pop-up window appears.
     2. Enter your details in **Select data fields**.
In the **Database** field, select the desired acceleration index: **Skipping index** or **Covering index**. A _skipping index_ uses skip acceleration methods, such as partition, min/max, and value sets, to ingest data using compact aggregate data structures. This makes them an economical option for direct querying scenarios. A _covering index_ ingests all or some of the data from the source into OpenSearch and makes it possible to use all OpenSearch Dashboards and plugin functionality. - -5. Under **Index settings**, enter the information for your acceleration index. For information about naming, select **Help**. Note that an Amazon S3 table can only have one skipping index at a time. An example is shown in the following image. - - Skipping index settings +5. Under **Index settings**, enter the information for your acceleration index. For information about naming, select **Help**. Note that an Amazon S3 table can only have one skipping index at a time. ### Define skipping index settings -1. Under **Skipping index definition**, select the **Add fields** button to define the skipping index acceleration method and choose the fields you want to add. An example is shown in the following image. - - Skipping index add fields - +1. Under **Skipping index definition**, select the **Add fields** button to define the skipping index acceleration method and choose the fields you want to add. 2. Select the **Copy Query to Editor** button to apply your skipping index settings. -3. View the skipping index query details in the table pane and then select the **Run** button. Your index is added to the left-side navigation menu containing the list of your databases. An example is shown in the following image. - - Run a skippping or covering index UI +3. View the skipping index query details in the table pane and then select the **Run** button. Your index is added to the left-side navigation menu containing the list of your databases. ### Define covering index settings -1. Under **Index settings**, enter a valid index name. Note that each Amazon S3 table can have multiple covering indexes. An example is shown in the following image. - - Covering index settings - -2. Once you have added the index name, define the covering index fields by selecting `(add fields here)` under **Covering index definition**. An example is shown in the following image. - - Covering index field naming - +1. Under **Index settings**, enter a valid index name. Note that each Amazon S3 table can have multiple covering indexes. +2. Once you have added the index name, define the covering index fields by selecting `(add fields here)` under **Covering index definition**. 3. Select the **Copy Query to Editor** button to apply your covering index settings. -4. View the covering index query details in the table pane and then select the **Run** button. Your index is added to the left-side navigation menu containing the list of your databases. An example UI is shown in the following image. - - Run index in Query Workbench +4. View the covering index query details in the table pane and then select the **Run** button. Your index is added to the left-side navigation menu containing the list of your databases. ## Limitations -This feature is still under development, so there are some limitations. For real-time updates, see the [developer documentation on GitHub](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md#limitations). +This feature is still under development, so there are some limitations. 
For real-time updates, refer to the [developer documentation on GitHub](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md#limitations). diff --git a/_dashboards/management/query-data-source.md b/_dashboards/management/query-data-source.md index f1496b3e17..a3392c073e 100644 --- a/_dashboards/management/query-data-source.md +++ b/_dashboards/management/query-data-source.md @@ -11,7 +11,7 @@ has_children: false Introduced 2.11 {: .label .label-purple } -This tutorial guides you through using the **Query data** use case for querying and visualizing your Amazon Simple Storage Service (Amazon S3) data using OpenSearch Dashboards. +This tutorial guides you through using the **Query data** use case for querying and visualizing your Amazon Simple Storage Service (Amazon S3) data using OpenSearch Dashboards. ## Prerequisites @@ -22,15 +22,9 @@ You must be using the `opensearch-security` plugin and have the appropriate role To get started, follow these steps: 1. On the **Manage data sources** page, select your data source from the list. -2. On the data source's detail page, select the **Query data** card. This option takes you to the **Observability** > **Logs** page, which is shown in the following image. - - Observability Logs UI - +2. On the data source's detail page, select the **Query data** card. This option takes you to the **Observability** > **Logs** page. 3. Select the **Event Explorer** button. This option creates and saves frequently searched queries and visualizations using [Piped Processing Language (PPL)]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/index/) or [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/), which connects to Spark SQL. -4. Select the Amazon S3 data source from the dropdown menu in the upper-left corner. An example is shown in the following image. - - Observability Logs Amazon S3 dropdown menu - +4. Select the Amazon S3 data source from the dropdown menu in the upper-left corner. 5. Enter the query in the **Enter PPL query** field. Note that the default language is SQL. To change the language, select PPL from the dropdown menu. 6. Select the **Search** button. The **Query Processing** message is shown, confirming that your query is being processed. 7. View the results, which are listed in a table on the **Events** tab. On this page, details such as available fields, source, and time are shown in a table format. @@ -40,10 +34,7 @@ To get started, follow these steps: To create visualizations, follow these steps: -1. On the **Explorer** page, select the **Visualizations** tab. An example is shown in the following image. - - Explorer Amazon S3 visualizations UI - +1. On the **Explorer** page, select the **Visualizations** tab. 2. Select **Index data to visualize**. This option currently only creates [acceleration indexes]({{site.url}}{{site.baseurl}}/dashboards/management/accelerate-external-data/), which give you views of the data visualizations from the **Visualizations** tab. To create a visualization of your Amazon S3 data, go to **Discover**. See the [Discover documentation]({{site.url}}{{site.baseurl}}/dashboards/discover/index-discover/) for information and a tutorial. ## Use Query Workbench with your Amazon S3 data source @@ -53,14 +44,12 @@ To create visualizations, follow these steps: To use Query Workbench with your Amazon S3 data, follow these steps: 1. From the OpenSearch Dashboards main menu, select **OpenSearch Plugins** > **Query Workbench**. -2. 
From the **Data Sources** dropdown menu in the upper-left corner, choose your Amazon S3 data source. Your data begins loading the databases that are part of your data source. An example is shown in the following image.
-
-    Query Workbench Amazon S3 data loading UI
-
+2. From the **Data Sources** dropdown menu in the upper-left corner, choose your Amazon S3 data source. The databases that are part of your data source then begin to load.
 3. View the databases listed in the left-side navigation menu and select a database to view its details. Any information about acceleration indexes is listed under **Acceleration index destination**.
 4. Choose the **Describe Index** button to learn more about how data is stored in that particular index.
 5. Choose the **Drop index** button to delete and clear both the OpenSearch index and the Amazon S3 Spark job that refreshes the data.
-6. Enter your SQL query and select **Run**.
+6. Enter your SQL query and select **Run**.
+
 ## Next steps
 
 - Learn about [accelerating the query performance of your external data sources]({{site.url}}{{site.baseurl}}/dashboards/management/accelerate-external-data/).

From 2648797663b2f86a4329051d22257d25f33aca96 Mon Sep 17 00:00:00 2001
From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Date: Fri, 30 Aug 2024 18:12:16 -0400
Subject: [PATCH 2/7] Update network-settings.md (#8138)

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
---
 .../configuring-opensearch/network-settings.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_install-and-configure/configuring-opensearch/network-settings.md b/_install-and-configure/configuring-opensearch/network-settings.md
index f96dde97e1..dc61ccc49b 100644
--- a/_install-and-configure/configuring-opensearch/network-settings.md
+++ b/_install-and-configure/configuring-opensearch/network-settings.md
@@ -51,7 +51,7 @@ OpenSearch supports the following advanced network settings for transport commun
 
 ## Selecting the transport
 
-The default OpenSearch transport is provided by the `transport-netty4` module and uses the [Netty 4](https://netty.io/) engine for both internal TCP-based communication between nodes in the cluster and external HTTP-based communication with clients. This communication is fully asynchronous and non-blocking. However, there are other transport plugins available that can be used interchangeably:
+The default OpenSearch transport is provided by the `transport-netty4` module and uses the [Netty 4](https://netty.io/) engine for both internal TCP-based communication between nodes in the cluster and external HTTP-based communication with clients. This communication is fully asynchronous and non-blocking. The following table lists other available transport plugins that can be used interchangeably.
 
 Plugin | Description
 :---------- | :--------

From 0427252ba7b2dad26b6515cd04a47520f480581b Mon Sep 17 00:00:00 2001
From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Date: Sun, 1 Sep 2024 23:49:19 -0500
Subject: [PATCH 3/7] Add common operations section to User Guide. (#7974)

* Add common operations section to User Guide.
Signed-off-by: Archer

* Fix link

Signed-off-by: Archer

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

---------

Signed-off-by: Archer
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
---
 .../choosing-a-workload.md                  |   2 +-
 .../common-operations.md                    | 181 ++++++++++++++++++
 2 files changed, 182 insertions(+), 1 deletion(-)
 create mode 100644 _benchmark/user-guide/understanding-workloads/common-operations.md

diff --git a/_benchmark/user-guide/understanding-workloads/choosing-a-workload.md b/_benchmark/user-guide/understanding-workloads/choosing-a-workload.md
index d7ae48ad0a..6016caee0a 100644
--- a/_benchmark/user-guide/understanding-workloads/choosing-a-workload.md
+++ b/_benchmark/user-guide/understanding-workloads/choosing-a-workload.md
@@ -18,7 +18,7 @@ Consider the following criteria when deciding which workload would work best for
 
 - The cluster's use case.
 - The data types that your cluster uses compared to the data structure of the documents contained in the workload. Each workload contains an example document so that you can compare data types, or you can view the index mappings and data types in the `index.json` file.
-- The query types most commonly used inside your cluster. The `operations/default.json` file contains information about the query types and workload operations.
+- The query types most commonly used inside your cluster. The `operations/default.json` file contains information about the query types and workload operations. For a list of common operations, see [Common operations]({{site.url}}{{site.baseurl}}/benchmark/user-guide/understanding-workloads/common-operations/).
 
 ## General search clusters
 
diff --git a/_benchmark/user-guide/understanding-workloads/common-operations.md b/_benchmark/user-guide/understanding-workloads/common-operations.md
new file mode 100644
index 0000000000..c9fe15c18c
--- /dev/null
+++ b/_benchmark/user-guide/understanding-workloads/common-operations.md
@@ -0,0 +1,181 @@
+---
+layout: default
+title: Common operations
+nav_order: 16
+grand_parent: User guide
+parent: Understanding workloads
+---
+
+# Common operations
+
+[Test procedures]({{site.url}}{{site.baseurl}}/benchmark/user-guide/understanding-workloads/anatomy-of-a-workload#_operations-and-_test-procedures) use a variety of operations, found inside the `operations` directory of a workload. This page details the most common operations found inside OpenSearch Benchmark workloads.
+
+- [Common operations](#common-operations)
+  - [bulk](#bulk)
+  - [create-index](#create-index)
+  - [delete-index](#delete-index)
+  - [cluster-health](#cluster-health)
+  - [refresh](#refresh)
+  - [search](#search)
+
+
+## bulk
+
+The `bulk` operation type allows you to run [bulk](/api-reference/document-apis/bulk/) requests as a task.
+
+The following example shows a `bulk` operation type with a `bulk-size` of `5000` documents:
+
+```yml
+{
+  "name": "index-append",
+  "operation-type": "bulk",
+  "bulk-size": 5000
+}
+```
+
+
+## create-index
+
+The `create-index` operation runs the [Create Index API](/api-reference/index-apis/create-index/). It supports the following two modes of index creation:
+
+- Creating all indexes specified in the workload's `indices` section
+- Creating one specific index defined within the operation itself
+
+The following example creates all indexes defined in the `indices` section of the workload. It uses all of the index settings defined in the workload but overrides the number of shards:
+
+```yml
+{
+  "name": "create-all-indices",
+  "operation-type": "create-index",
+  "settings": {
+    "index.number_of_shards": 1
+  },
+  "request-params": {
+    "wait_for_active_shards": "true"
+  }
+}
+```
+
+The following example creates a new index with all index settings specified in the operation body:
+
+```yml
+{
+  "name": "create-an-index",
+  "operation-type": "create-index",
+  "index": "people",
+  "body": {
+    "settings": {
+      "index.number_of_shards": 1
+    },
+    "mappings": {
+      "properties": {
+        "name": {
+          "type": "text"
+        }
+      }
+    }
+  }
+}
+```
+
+
+## delete-index
+
+The `delete-index` operation runs the [Delete Index API](/api-reference/index-apis/delete-index/). As with the [`create-index`](#create-index) operation, you can delete all indexes found in the `indices` section of the workload or delete one or more indexes based on the string passed in the `index` setting.
+
+The following example deletes all indexes found in the `indices` section of the workload:
+
+```yml
+{
+  "name": "delete-all-indices",
+  "operation-type": "delete-index"
+}
+```
+
+The following example deletes all `logs-*` indexes:
+
+```yml
+{
+  "name": "delete-logs",
+  "operation-type": "delete-index",
+  "index": "logs-*",
+  "only-if-exists": false,
+  "request-params": {
+    "expand_wildcards": "all",
+    "allow_no_indices": "true",
+    "ignore_unavailable": "true"
+  }
+}
+```
+
+
+## cluster-health
+
+The `cluster-health` operation runs the [Cluster Health API](/api-reference/cluster-api/cluster-health/), which checks the cluster health status and returns the expected status according to the parameters set for `request-params`. If an unexpected cluster health status is returned, the operation reports a failure. You can use the `--on-error` option in the OpenSearch Benchmark `execute-test` command to control how OpenSearch Benchmark behaves when the health check fails.
+
+The following example creates a `cluster-health` operation that checks for a `green` health status on any `logs-*` indexes:
+
+```yml
+{
+  "name": "check-cluster-green",
+  "operation-type": "cluster-health",
+  "index": "logs-*",
+  "request-params": {
+    "wait_for_status": "green",
+    "wait_for_no_relocating_shards": "true"
+  },
+  "retry-until-success": true
+}
+```
+
+
+## refresh
+
+The `refresh` operation runs the Refresh API. This operation returns no metadata.
+
+The following example refreshes all `logs-*` indexes:
+
+```yml
+{
+  "name": "refresh",
+  "operation-type": "refresh",
+  "index": "logs-*"
+}
+```
+
+
+## search
+
+The `search` operation runs the [Search API](/api-reference/search/), which you can use to run queries in OpenSearch Benchmark indexes.
+ +The following example runs a `match_all` query inside the `search` operation: + +```yml +{ + "name": "default", + "operation-type": "search", + "body": { + "query": { + "match_all": {} + } + }, + "request-params": { + "_source_include": "some_field", + "analyze_wildcard": "false" + } +} +``` From e3576fba3eed65b9fa1c635fba591723542bddb5 Mon Sep 17 00:00:00 2001 From: Kunal Kotwani Date: Tue, 3 Sep 2024 07:21:49 -0700 Subject: [PATCH 4/7] Update known limitations for kNN based indexes (#8137) * Update known limitations for kNN based indexes Signed-off-by: Kunal Kotwani * Update _tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Kunal Kotwani Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- .../availability-and-recovery/snapshots/searchable_snapshot.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md b/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md index 4af25004a7..b9e35b2697 100644 --- a/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md +++ b/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md @@ -108,4 +108,5 @@ The following are known limitations of the searchable snapshots feature: - Many remote object stores charge on a per-request basis for retrieval, so users should closely monitor any costs incurred. - Searching remote data can impact the performance of other queries running on the same node. We recommend that users provision dedicated nodes with the `search` role for performance-critical applications. - For better search performance, consider [force merging]({{site.url}}{{site.baseurl}}/api-reference/index-apis/force-merge/) indexes into a smaller number of segments before taking a snapshot. For the best performance, at the cost of using compute resources prior to snapshotting, force merge your index into one segment. -- We recommend configuring a maximum ratio of remote data to local disk cache size using the `cluster.filecache.remote_data_ratio` setting. A ratio of 5 is a good starting point for most workloads to ensure good query performance. If the ratio is too large, then there may not be sufficient disk space to handle the search workload. For more details on the maximum ratio of remote data, see issue [#11676](https://github.com/opensearch-project/OpenSearch/issues/11676). +- We recommend configuring a maximum ratio of remote data to local disk cache size using the `cluster.filecache.remote_data_ratio` setting. A ratio of 5 is a good starting point for most workloads to ensure good query performance. If the ratio is too large, then there may not be sufficient disk space to handle the search workload. For more details on the maximum ratio of remote data, see issue [#11676](https://github.com/opensearch-project/OpenSearch/issues/11676). +- k-NN native-engine-based indexes using `faiss` and `nmslib` engines are incompatible with searchable snapshots. 
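As a minimal sketch of the cache-ratio recommendation above, and assuming that `cluster.filecache.remote_data_ratio` can be updated dynamically through the Cluster Settings API, the suggested starting ratio of 5 could be applied as follows:

```json
PUT _cluster/settings
{
  "persistent": {
    "cluster.filecache.remote_data_ratio": 5
  }
}
```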
From 9e7aedc3d11d52fec60513300786c6d2f9ab97a9 Mon Sep 17 00:00:00 2001
From: kkewwei
Date: Tue, 3 Sep 2024 22:25:13 +0800
Subject: [PATCH 5/7] Update binary.md (#8142)

According to the code, the default value of `hasDocValues` is false
https://github.com/opensearch-project/OpenSearch/blob/03d9a249e47b99b33c6de3625f43b12bef29c1cb/server/src/main/java/org/opensearch/index/mapper/BinaryFieldMapper.java#L85

Signed-off-by: kkewwei
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
---
 _field-types/supported-field-types/binary.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/_field-types/supported-field-types/binary.md b/_field-types/supported-field-types/binary.md
index d6974ad4cf..99d468c1dc 100644
--- a/_field-types/supported-field-types/binary.md
+++ b/_field-types/supported-field-types/binary.md
@@ -50,5 +50,5 @@ The following table lists the parameters accepted by binary field types. All par
 
 Parameter | Description
 :--- | :---
-`doc_values` | A Boolean value that specifies whether the field should be stored on disk so that it can be used for aggregations, sorting, or scripting. Optional. Default is `true`.
-`store` | A Boolean value that specifies whether the field value should be stored and can be retrieved separately from the _source field. Optional. Default is `false`.
\ No newline at end of file
+`doc_values` | A Boolean value that specifies whether the field should be stored on disk so that it can be used for aggregations, sorting, or scripting. Optional. Default is `false`.
+`store` | A Boolean value that specifies whether the field value should be stored and can be retrieved separately from the _source field. Optional. Default is `false`.

From a5b230cecdba02b5c4d8f66a4f2d9fa8243f56ec Mon Sep 17 00:00:00 2001
From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Date: Tue, 3 Sep 2024 13:38:34 -0500
Subject: [PATCH 6/7] Fix broken links (#8147)

Closes #8144

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
---
 _data-prepper/pipelines/configuration/sinks/s3.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/_data-prepper/pipelines/configuration/sinks/s3.md b/_data-prepper/pipelines/configuration/sinks/s3.md
index 3ff266cccf..6bae749d38 100644
--- a/_data-prepper/pipelines/configuration/sinks/s3.md
+++ b/_data-prepper/pipelines/configuration/sinks/s3.md
@@ -173,14 +173,14 @@ When you provide your own Avro schema, that schema defines the final structure o
 
 In cases where your data is uniform, you may be able to automatically generate a schema. Automatically generated schemas are based on the first event that the codec receives. The schema will only contain keys from this event, and all keys must be present in all events in order to automatically generate a working schema. Automatically generated schemas make all fields nullable. Use the `include_keys` and `exclude_keys` sink configurations to control which data is included in the automatically generated schema.
 
-Avro fields should use a null [union](https://avro.apache.org/docs/1.10.2/spec.html#Unions) because this will allow missing values. Otherwise, all required fields must be present for each event. Use non-nullable fields only when you are certain they exist.
+Avro fields should use a null [union](https://avro.apache.org/docs/1.12.0/specification/#unions) because this will allow missing values. Otherwise, all required fields must be present for each event. Use non-nullable fields only when you are certain they exist.
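As a minimal sketch of this null-union guidance, the following Avro schema declares a hypothetical optional `status_code` field as a `["null", "int"]` union with a `null` default, while the required `message` field remains non-nullable (both field names are illustrative assumptions, not part of the sink configuration above):

```json
{
  "type": "record",
  "name": "Event",
  "fields": [
    { "name": "message", "type": "string" },
    { "name": "status_code", "type": ["null", "int"], "default": null }
  ]
}
```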
Use the following options to configure the codec. Option | Required | Type | Description :--- | :--- | :--- | :--- -`schema` | Yes | String | The Avro [schema declaration](https://avro.apache.org/docs/1.2.0/spec.html#schemas). Not required if `auto_schema` is set to true. -`auto_schema` | No | Boolean | When set to `true`, automatically generates the Avro [schema declaration](https://avro.apache.org/docs/1.2.0/spec.html#schemas) from the first event. +`schema` | Yes | String | The Avro [schema declaration](https://avro.apache.org/docs/1.12.0/specification/#schema-declaration). Not required if `auto_schema` is set to true. +`auto_schema` | No | Boolean | When set to `true`, automatically generates the Avro [schema declaration](https://avro.apache.org/docs/1.12.0/specification/#schema-declaration) from the first event. ### `ndjson` codec @@ -208,8 +208,8 @@ Use the following options to configure the codec. Option | Required | Type | Description :--- | :--- | :--- | :--- -`schema` | Yes | String | The Avro [schema declaration](https://avro.apache.org/docs/current/specification/#schema-declaration). Not required if `auto_schema` is set to true. -`auto_schema` | No | Boolean | When set to `true`, automatically generates the Avro [schema declaration](https://avro.apache.org/docs/current/specification/#schema-declaration) from the first event. +`schema` | Yes | String | The Avro [schema declaration](https://avro.apache.org/docs/1.12.0/specification/#schema-declaration). Not required if `auto_schema` is set to true. +`auto_schema` | No | Boolean | When set to `true`, automatically generates the Avro [schema declaration](https://avro.apache.org/docs/1.12.0/specification/#schema-declaration) from the first event. ### Setting a schema with Parquet From ef8abd7ae007917e37f53b69f21c377db64353da Mon Sep 17 00:00:00 2001 From: leanneeliatra <131779422+leanneeliatra@users.noreply.github.com> Date: Tue, 3 Sep 2024 19:55:57 +0100 Subject: [PATCH 7/7] Addition of full file paths in security documentation (#8113) * added full file paths for security config files Signed-off-by: leanne.laceybyrne@eliatra.com Signed-off-by: leanne.laceybyrne@eliatra.com * added full file paths for security config files Signed-off-by: leanne.laceybyrne@eliatra.com Signed-off-by: leanne.laceybyrne@eliatra.com # Conflicts: # _security/configuration/yaml.md * small edits to full file paths for security config files Signed-off-by: leanne.laceybyrne@eliatra.com Signed-off-by: leanne.laceybyrne@eliatra.com * updates to file paths following tech review Signed-off-by: leanne.laceybyrne@eliatra.com Signed-off-by: leanne.laceybyrne@eliatra.com * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Take into account previous changes Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: leanne.laceybyrne@eliatra.com Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- .../configuring-opensearch/security-settings.md | 2 +- _security/configuration/index.md | 2 +- _security/configuration/security-admin.md | 4 ++-- _security/configuration/yaml.md | 8 +++++--- 4 files changed, 9 insertions(+), 7 deletions(-) diff --git 
a/_install-and-configure/configuring-opensearch/security-settings.md b/_install-and-configure/configuring-opensearch/security-settings.md
index b9c375d208..2ac09a4819 100644
--- a/_install-and-configure/configuring-opensearch/security-settings.md
+++ b/_install-and-configure/configuring-opensearch/security-settings.md
@@ -9,7 +9,7 @@ nav_order: 40
 
 The Security plugin provides a number of YAML configuration files that are used to store the necessary settings that define the way the Security plugin manages users, roles, and activity within the cluster. For a full list of the Security plugin configuration files, see [Modifying the YAML files]({{site.url}}{{site.baseurl}}/security/configuration/yaml/).
 
-The following sections describe security-related settings in `opensearch.yml`. To learn more about static and dynamic settings, see [Configuring OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/).
+The following sections describe security-related settings in `opensearch.yml`. You can find the `opensearch.yml` file in `/config/opensearch.yml`. To learn more about static and dynamic settings, see [Configuring OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/).
 
 ## Common settings
 
diff --git a/_security/configuration/index.md b/_security/configuration/index.md
index 31292c320a..e351e8865f 100644
--- a/_security/configuration/index.md
+++ b/_security/configuration/index.md
@@ -28,4 +28,4 @@ The Security plugin has several default users, roles, action groups, permissions
 {: .note }
 
 For a full list of `opensearch.yml` Security plugin settings, see [Security settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/security-settings/).
-{: .note}
\ No newline at end of file
+{: .note}
diff --git a/_security/configuration/security-admin.md b/_security/configuration/security-admin.md
index a03d30fd03..b4d23dce5b 100755
--- a/_security/configuration/security-admin.md
+++ b/_security/configuration/security-admin.md
@@ -23,13 +23,13 @@ The `securityadmin.sh` script requires SSL/TLS HTTP to be enabled for your OpenS
 
 ## A word of caution
 
-If you make changes to the configuration files in `config/opensearch-security`, OpenSearch does _not_ automatically apply these changes. Instead, you must run `securityadmin.sh` to load the updated files into the index.
+If you make changes to the configuration files in `config/opensearch-security`, OpenSearch does _not_ automatically apply these changes. Instead, you must run `securityadmin.sh` to load the updated files into the index. The `securityadmin.sh` file can be found in `/plugins/opensearch-security/tools/securityadmin.[sh|bat]`.
 
 Running `securityadmin.sh` **overwrites** one or more portions of the `.opendistro_security` index. Run it with extreme care to avoid losing your existing resources. Consider the following example:
 
 1. You initialize the `.opendistro_security` index.
 1. You create ten users using the REST API.
-1. You decide to create a new [reserved user]({{site.url}}{{site.baseurl}}/security/access-control/api/#reserved-and-hidden-resources) using `internal_users.yml`.
+1. You decide to create a new [reserved user]({{site.url}}{{site.baseurl}}/security/access-control/api/#reserved-and-hidden-resources) using `internal_users.yml`, found in the `/config/opensearch-security/` directory.
 1. You run `securityadmin.sh` again to load the new reserved user into the index.
 1. You lose all ten users that you created using the REST API.
diff --git a/_security/configuration/yaml.md b/_security/configuration/yaml.md
index 4bcb8b0460..1686c8332e 100644
--- a/_security/configuration/yaml.md
+++ b/_security/configuration/yaml.md
@@ -17,7 +17,7 @@ The approach we recommend for using the YAML files is to first configure [reserv
 
 ## action_groups.yml
 
-This file contains any initial action groups that you want to add to the Security plugin.
+This file contains any initial action groups that you want to add to the Security plugin. You can find the `action_groups.yml` file in `/config/opensearch-security/action_groups.yml`.
 
 Aside from some metadata, the default file is empty, because the Security plugin has a number of static action groups that it adds automatically. These static action groups cover a wide variety of use cases and are a great way to get started with the plugin.
 
@@ -43,6 +43,8 @@ _meta:
 
 You can use `allowlist.yml` to add any endpoints and HTTP requests to a list of allowed endpoints and requests. If enabled, all users except the super admin are allowed access to only the specified endpoints and HTTP requests, and all other HTTP requests associated with the endpoint are denied. For example, if GET `_cluster/settings` is added to the allow list, users cannot submit PUT requests to `_cluster/settings` to update cluster settings.
 
+You can find the `allowlist.yml` file in `/config/opensearch-security/allowlist.yml`.
+
 Note that while you can configure access to endpoints this way, for most cases, it is still best to configure permissions using the Security plugin's users and roles, which have more granular settings.
 
@@ -92,7 +94,7 @@ requests: # Only allow GET requests to /sample-index1/_doc/1 and /sample-index2/
 
 ## internal_users.yml
 
-This file contains any initial users that you want to add to the Security plugin's internal user database.
+This file contains any initial users that you want to add to the Security plugin's internal user database. You can find this file in `/config/opensearch-security/internal_users.yml`.
 
 The file format requires a hashed password. To generate one, run `plugins/opensearch-security/tools/hash.sh -p <new-password>`. If you decide to keep any of the demo users, *change their passwords* and re-run [securityadmin.sh]({{site.url}}{{site.baseurl}}/security/configuration/security-admin/) to apply the new passwords.
 
@@ -313,7 +315,7 @@ admin_tenant:
 
 ## opensearch.yml
 
-In addition to many OpenSearch settings, this file contains paths to TLS certificates and their attributes, such as distinguished names and trusted certificate authorities.
+In addition to many OpenSearch settings, the `opensearch.yml` file contains paths to TLS certificates and their attributes, such as distinguished names and trusted certificate authorities. You can find this file in `/config/`.
 
 ```yml
 plugins.security.ssl.transport.pemcert_filepath: esnode.pem