diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index 0ec6c5e009..815687fa17 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -1 +1 @@ -* @hdhalter @kolchfa-aws @Naarcha-AWS @vagimeli @AMoo-Miki @natebower @dlvenable @scrawfor99 @epugh +* @kolchfa-aws @Naarcha-AWS @vagimeli @AMoo-Miki @natebower @dlvenable @stephen-crawford @epugh diff --git a/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt b/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt index 9e09f21c3a..11ff53efe6 100644 --- a/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt +++ b/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt @@ -80,6 +80,7 @@ Levenshtein [Oo]versamples? [Oo]nboarding pebibyte +p\d{2} [Pp]erformant [Pp]laintext [Pp]luggable @@ -126,6 +127,7 @@ stdout [Ss]ubvector [Ss]ubwords? [Ss]uperset +[Ss]uperadmins? [Ss]yslog tebibyte [Tt]emplated diff --git a/MAINTAINERS.md b/MAINTAINERS.md index 1bf2a1d219..55b908e027 100644 --- a/MAINTAINERS.md +++ b/MAINTAINERS.md @@ -6,12 +6,17 @@ This document lists the maintainers in this repo. See [opensearch-project/.githu | Maintainer | GitHub ID | Affiliation | | ---------------- | ----------------------------------------------- | ----------- | -| Heather Halter | [hdhalter](https://github.com/hdhalter) | Amazon | | Fanit Kolchina | [kolchfa-aws](https://github.com/kolchfa-aws) | Amazon | | Nate Archer | [Naarcha-AWS](https://github.com/Naarcha-AWS) | Amazon | | Nathan Bower | [natebower](https://github.com/natebower) | Amazon | | Melissa Vagi | [vagimeli](https://github.com/vagimeli) | Amazon | | Miki Barahmand | [AMoo-Miki](https://github.com/AMoo-Miki) | Amazon | | David Venable | [dlvenable](https://github.com/dlvenable) | Amazon | -| Stephen Crawford | [scraw99](https://github.com/scrawfor99) | Amazon | +| Stephen Crawford | [stephen-crawford](https://github.com/stephen-crawford) | Amazon | | Eric Pugh | [epugh](https://github.com/epugh) | OpenSource Connections | + +## Emeritus + +| Maintainer | GitHub ID | Affiliation | +| ---------------- | ----------------------------------------------- | ----------- | +| Heather Halter | [hdhalter](https://github.com/hdhalter) | Amazon | diff --git a/README.md b/README.md index 7d8de14151..66beb1948c 100644 --- a/README.md +++ b/README.md @@ -21,7 +21,6 @@ The following resources provide important guidance regarding contributions to th If you encounter problems or have questions when contributing to the documentation, these people can help: -- [hdhalter](https://github.com/hdhalter) - [kolchfa-aws](https://github.com/kolchfa-aws) - [Naarcha-AWS](https://github.com/Naarcha-AWS) - [vagimeli](https://github.com/vagimeli) diff --git a/_about/index.md b/_about/index.md index d2cc011b55..041197eeba 100644 --- a/_about/index.md +++ b/_about/index.md @@ -22,16 +22,21 @@ This section contains documentation for OpenSearch and OpenSearch Dashboards. 
## Getting started -- [Intro to OpenSearch]({{site.url}}{{site.baseurl}}/intro/) -- [Quickstart]({{site.url}}{{site.baseurl}}/quickstart/) +To get started, explore the following documentation: + +- [Getting started guide]({{site.url}}{{site.baseurl}}/getting-started/): + - [Intro to OpenSearch]({{site.url}}{{site.baseurl}}/getting-started/intro/) + - [Installation quickstart]({{site.url}}{{site.baseurl}}/getting-started/quickstart/) + - [Communicate with OpenSearch]({{site.url}}{{site.baseurl}}/getting-started/communicate/) + - [Ingest data]({{site.url}}{{site.baseurl}}/getting-started/ingest-data/) + - [Search data]({{site.url}}{{site.baseurl}}/getting-started/search-data/) + - [Getting started with OpenSearch security]({{site.url}}{{site.baseurl}}/getting-started/security/) - [Install OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/install-opensearch/index/) - [Install OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/install-and-configure/install-dashboards/index/) -- [See the FAQ](https://opensearch.org/faq) +- [FAQ](https://opensearch.org/faq) ## Why use OpenSearch? -With OpenSearch, you can perform the following use cases: - @@ -41,35 +46,38 @@ With OpenSearch, you can perform the following use cases: - - - - + + + + - + - +
Operational health tracking
Fast, Scalable Full-text SearchApplication and Infrastructure MonitoringSecurity and Event Information ManagementOperational Health TrackingFast, scalable full-text searchApplication and infrastructure monitoringSecurity and event information managementOperational health tracking
Help users find the right information within your application, website, or data lake catalog. Easily store and analyze log data, and set automated alerts for underperformance.Easily store and analyze log data, and set automated alerts for performance issues. Centralize logs to enable real-time security monitoring and forensic analysis.Use observability logs, metrics, and traces to monitor your applications and business in real time.Use observability logs, metrics, and traces to monitor your applications in real time.
-**Additional features and plugins:** +## Key features + +OpenSearch provides several features to help index, secure, monitor, and analyze your data: -OpenSearch has several features and plugins to help index, secure, monitor, and analyze your data. Most OpenSearch plugins have corresponding OpenSearch Dashboards plugins that provide a convenient, unified user interface. -- [Anomaly detection]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/) - Identify atypical data and receive automatic notifications -- [KNN]({{site.url}}{{site.baseurl}}/search-plugins/knn/) - Find “nearest neighbors” in your vector data -- [Performance Analyzer]({{site.url}}{{site.baseurl}}/monitoring-plugins/pa/) - Monitor and optimize your cluster -- [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/) - Use SQL or a piped processing language to query your data -- [Index State Management]({{site.url}}{{site.baseurl}}/im-plugin/) - Automate index operations -- [ML Commons plugin]({{site.url}}{{site.baseurl}}/ml-commons-plugin/index/) - Train and execute machine-learning models -- [Asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/) - Run search requests in the background -- [Cross-cluster replication]({{site.url}}{{site.baseurl}}/replication-plugin/index/) - Replicate your data across multiple OpenSearch clusters +- [Anomaly detection]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/) -- Identify atypical data and receive automatic notifications. +- [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/) -- Use SQL or Piped Processing Language (PPL) to query your data. +- [Index State Management]({{site.url}}{{site.baseurl}}/im-plugin/) -- Automate index operations. +- [Search methods]({{site.url}}{{site.baseurl}}/search-plugins/knn/) -- From traditional lexical search to advanced vector and hybrid search, discover the optimal search method for your use case. +- [Machine learning]({{site.url}}{{site.baseurl}}/ml-commons-plugin/index/) -- Integrate machine learning models into your workloads. +- [Workflow automation]({{site.url}}{{site.baseurl}}/automating-configurations/index/) -- Automate complex OpenSearch setup and preprocessing tasks. +- [Performance evaluation]({{site.url}}{{site.baseurl}}/monitoring-plugins/pa/) -- Monitor and optimize your cluster. +- [Asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/) -- Run search requests in the background. +- [Cross-cluster replication]({{site.url}}{{site.baseurl}}/replication-plugin/index/) -- Replicate your data across multiple OpenSearch clusters. ## The secure path forward -OpenSearch includes a demo configuration so that you can get up and running quickly, but before using OpenSearch in a production environment, you must [configure the Security plugin manually]({{site.url}}{{site.baseurl}}/security/configuration/index/) with your own certificates, authentication method, users, and passwords. + +OpenSearch includes a demo configuration so that you can get up and running quickly, but before using OpenSearch in a production environment, you must [configure the Security plugin manually]({{site.url}}{{site.baseurl}}/security/configuration/index/) with your own certificates, authentication method, users, and passwords. To get started, see [Getting started with OpenSearch security]({{site.url}}{{site.baseurl}}/getting-started/security/). ## Looking for the Javadoc?
diff --git a/_about/version-history.md b/_about/version-history.md index d7273ffedb..fd635aff5b 100644 --- a/_about/version-history.md +++ b/_about/version-history.md @@ -31,6 +31,7 @@ OpenSearch version | Release highlights | Release date [2.0.1](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.0.1.md) | Includes bug fixes and maintenance updates for Alerting and Anomaly Detection. | 16 June 2022 [2.0.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.0.0.md) | Includes document-level monitors for alerting, OpenSearch Notifications plugins, and Geo Map Tiles in OpenSearch Dashboards. Also adds support for Lucene 9 and bug fixes for all OpenSearch plugins. For a full list of release highlights, see the Release Notes. | 26 May 2022 [2.0.0-rc1](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.0.0-rc1.md) | The Release Candidate for 2.0.0. This version allows you to preview the upcoming 2.0.0 release before the GA release. The preview release adds document-level alerting, support for Lucene 9, and the ability to use term lookup queries in document level security. | 03 May 2022 +[1.3.19](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.19.md) | Includes bug fixes and maintenance updates for OpenSearch security, OpenSearch Dashboards security, and anomaly detection. | 27 August 2024 [1.3.18](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.18.md) | Includes maintenance updates for OpenSearch security. | 16 July 2024 [1.3.17](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.17.md) | Includes maintenance updates for OpenSearch security and OpenSearch Dashboards security. | 06 June 2024 [1.3.16](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.16.md) | Includes bug fixes and maintenance updates for OpenSearch security, index management, performance analyzer, and reporting. | 23 April 2024 diff --git a/_api-reference/cluster-api/cluster-health.md b/_api-reference/cluster-api/cluster-health.md index 7081a1a587..0fd5662d91 100644 --- a/_api-reference/cluster-api/cluster-health.md +++ b/_api-reference/cluster-api/cluster-health.md @@ -25,6 +25,14 @@ GET _cluster/health GET _cluster/health/<index-name> ``` +## Path parameters + +The following table lists the available path parameters. All path parameters are optional. + +Parameter | Data type | Description +:--- | :--- | :--- +`<index-name>` | String | Limits health reporting to a specific index. Can be a single index or a comma-separated list of index names. + ## Query parameters The following table lists the available query parameters. All query parameters are optional. diff --git a/_api-reference/nodes-apis/nodes-stats.md b/_api-reference/nodes-apis/nodes-stats.md index 2ccea54390..479cd8e732 100644 --- a/_api-reference/nodes-apis/nodes-stats.md +++ b/_api-reference/nodes-apis/nodes-stats.md @@ -893,24 +893,27 @@ search.suggest_total | Integer | The total number of shard suggest operations. search.suggest_time_in_millis | Integer | The total amount of time for all shard suggest operations, in milliseconds. search.suggest_current | Integer | The number of shard suggest operations that are currently running.
search.request | Object | Statistics about coordinator search operations for the node. +search.request.took.time_in_millis | Integer | The total amount of time taken for all search requests, in milliseconds. +search.request.took.current | Integer | The number of search requests that are currently running. +search.request.took.total | Integer | The total number of search requests completed. search.request.dfs_pre_query.time_in_millis | Integer | The total amount of time for all coordinator depth-first search (DFS) prequery operations, in milliseconds. search.request.dfs_pre_query.current | Integer | The number of coordinator DFS prequery operations that are currently running. -search.request.dfs_pre_query.total | Integer | The total number of coordinator DFS prequery operations. +search.request.dfs_pre_query.total | Integer | The total number of coordinator DFS prequery operations completed. search.request.query.time_in_millis | Integer | The total amount of time for all coordinator query operations, in milliseconds. search.request.query.current | Integer | The number of coordinator query operations that are currently running. -search.request.query.total | Integer | The total number of coordinator query operations. +search.request.query.total | Integer | The total number of coordinator query operations completed. search.request.fetch.time_in_millis | Integer | The total amount of time for all coordinator fetch operations, in milliseconds. search.request.fetch.current | Integer | The number of coordinator fetch operations that are currently running. -search.request.fetch.total | Integer | The total number of coordinator fetch operations. +search.request.fetch.total | Integer | The total number of coordinator fetch operations completed. search.request.dfs_query.time_in_millis | Integer | The total amount of time for all coordinator DFS prequery operations, in milliseconds. search.request.dfs_query.current | Integer | The number of coordinator DFS prequery operations that are currently running. -search.request.dfs_query.total | Integer | The total number of coordinator DFS prequery operations. +search.request.dfs_query.total | Integer | The total number of coordinator DFS prequery operations completed. search.request.expand.time_in_millis | Integer | The total amount of time for all coordinator expand operations, in milliseconds. search.request.expand.current | Integer | The number of coordinator expand operations that are currently running. -search.request.expand.total | Integer | The total number of coordinator expand operations. +search.request.expand.total | Integer | The total number of coordinator expand operations completed. search.request.can_match.time_in_millis | Integer | The total amount of time for all coordinator match operations, in milliseconds. search.request.can_match.current | Integer | The number of coordinator match operations that are currently running. -search.request.can_match.total | Integer | The total number of coordinator match operations. +search.request.can_match.total | Integer | The total number of coordinator match operations completed. merges | Object | Statistics about merge operations for the node. merges.current | Integer | The number of merge operations that are currently running. merges.current_docs | Integer | The number of document merges that are currently running. 
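+
+For context, the following sketch shows one way to retrieve the coordinator search statistics described above. It uses the nodes stats `indices/search` metric filter to limit the response to search statistics. Note that, depending on your cluster and version, coordinator-level request statistics may need to be enabled first (for example, through a `search.request_stats_enabled` cluster setting; verify this assumption for your deployment) before nonzero `search.request` values appear:
+
+```json
+GET /_nodes/stats/indices/search
+```
+{% include copy-curl.html %}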
diff --git a/_api-reference/validate.md b/_api-reference/validate.md new file mode 100644 index 0000000000..6e1470a505 --- /dev/null +++ b/_api-reference/validate.md @@ -0,0 +1,205 @@ +--- +layout: default +title: Validate Query +nav_order: 87 +--- + +# Validate Query + +You can use the Validate Query API to validate a query without running it. The query can be sent as a query string parameter or included in the request body. + +## Path and HTTP methods + +The Validate Query API supports the following paths: + +```json +GET /_validate/query +GET /<index>/_validate/query +``` + +## Path parameters + +All path parameters are optional. + +Parameter | Data type | Description +:--- | :--- | :--- +`index` | String | A comma-separated list of index names or wildcard expressions (for example, `logs-*`) that limits the indexes against which the query is validated. If omitted, the query is validated against all indexes. + +## Request body fields + +Parameter | Data type | Description +:--- | :--- | :--- +`query` | Query object | The query using [Query DSL]({{site.url}}{{site.baseurl}}/query-dsl/). + +## Query parameters + +The following table lists the available query parameters. All query parameters are optional. + +Parameter | Data type | Description +:--- | :--- | :--- +`all_shards` | Boolean | When `true`, validation is run against [all shards](#rewrite-and-all_shards) instead of against one shard per index. Default is `false`. +`allow_no_indices` | Boolean | Whether to ignore wildcards that don't match any indexes. Default is `true`. +`allow_partial_search_results` | Boolean | Whether to return partial results if the request encounters an error or times out. Default is `true`. +`analyzer` | String | The analyzer to use in the query string. This should only be used with the `q` option. +`analyze_wildcard` | Boolean | Specifies whether to analyze wildcard and prefix queries. Default is `false`. +`default_operator` | String | Indicates whether the default operator for a string query should be `AND` or `OR`. Default is `OR`. +`df` | String | The default field if a field prefix is not provided in the query string. +`expand_wildcards` | String | Specifies the type of index that wildcard expressions can match. Supports comma-separated values. Valid values are `all` (match any index), `open` (match open, non-hidden indexes), `closed` (match closed, non-hidden indexes), `hidden` (match hidden indexes), and `none` (deny wildcard expressions). Default is `open`. +`explain` | Boolean | Whether to return detailed information about [how the query was parsed or why it failed](#explain). Default is `false`. +`ignore_unavailable` | Boolean | Specifies whether to include missing or closed indexes in the response and ignores unavailable shards during the search request. Default is `false`. +`lenient` | Boolean | Specifies whether OpenSearch should ignore format-based query failures (for example, as a result of querying a text field for an integer). Default is `false`. +`rewrite` | String | Determines how OpenSearch [rewrites](#rewrite) and scores multi-term queries. Valid values are `constant_score`, `scoring_boolean`, `constant_score_boolean`, `top_terms_N`, `top_terms_boost_N`, and `top_terms_blended_freqs_N`. Default is `constant_score`. +`q` | String | A query in the Lucene string syntax.
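+
+As a quick illustration of these query parameters, the following sketch validates a Lucene-syntax query against a hypothetical index named `my-index` (the index and field names are assumptions for illustration only) and requests parsing details using `explain`:
+
+```json
+GET my-index/_validate/query?q=message:opensearch&explain=true
+```
+{% include copy-curl.html %}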
+ +## Example request + +The following example request uses an index named `hamlet` created using a `bulk` request: + +```json +PUT hamlet/_bulk?refresh +{"index":{"_id":1}} +{"user" : { "id": "hamlet" }, "@timestamp" : "2099-11-15T14:12:12", "message" : "To Search or Not To Search"} +{"index":{"_id":2}} +{"user" : { "id": "hamlet" }, "@timestamp" : "2099-11-15T14:12:13", "message" : "My dad says that I'm such a ham."} +``` +{% include copy.html %} + +You can then use the Validate Query API to validate an index query, as shown in the following example: + +```json +GET hamlet/_validate/query?q=user.id:hamlet +``` +{% include copy.html %} + +The query can also be sent as a request body, as shown in the following example: + +```json +GET hamlet/_validate/query +{ + "query" : { + "bool" : { + "must" : { + "query_string" : { + "query" : "*:*" + } + }, + "filter" : { + "term" : { "user.id" : "hamlet" } + } + } + } +} +``` +{% include copy.html %} + + +## Example responses + +If the query passes validation, then the response indicates that the query is valid, as shown in the following example response, in which the `valid` parameter is `true`: + +``` +{ + "_shards": { + "total": 1, + "successful": 1, + "failed": 0 + }, + "valid": true +} +``` + +If the query does not pass validation, then the response indicates that the query is not valid. The following example request queries the `@timestamp` field, which is dynamically mapped as a date, using a value that cannot be parsed as a date: + +```json +GET hamlet/_validate/query +{ + "query": { + "query_string": { + "query": "@timestamp:foo", + "lenient": false + } + } +} +``` + +OpenSearch responds with the following, where the `valid` parameter is `false`: + +``` +{ + "_shards": { + "total": 1, + "successful": 1, + "failed": 0 + }, + "valid": false +} +``` + +Certain query parameters can also affect what is included in the response. The following examples show how the [Explain](#explain), [Rewrite](#rewrite), and [all_shards](#rewrite-and-all_shards) query options affect the response. + +### Explain + +The `explain` option returns information about the query failure in the `explanations` field, as shown in the following example response: + +``` +{ + "valid" : false, + "_shards" : { + "total" : 1, + "successful" : 1, + "failed" : 0 + }, + "explanations" : [ { + "index" : "_shakespeare", + "valid" : false, + "error" : "shakespeare/IAEc2nIXSSunQA_suI0MLw] QueryShardException[failed to create query:...failed to parse date field [foo]" + } ] +} +``` + + +### Rewrite + +When the `rewrite` option is set to `true` in the request, the `explanations` field shows the Lucene query that will be run, represented as a string, as shown in the following response: + +``` +{ + "valid": true, + "_shards": { + "total": 1, + "successful": 1, + "failed": 0 + }, + "explanations": [ + { + "index": "", + "valid": true, + "explanation": "((user:hamlet^4.256753 play:hamlet^6.863601 play:romeo^2.8415773 plot:puck^3.4193945 plot:othello^3.8244398 ...
)~4) -ConstantScore(_id:2) #(ConstantScore(_type:_doc))^0.0" + } + ] +} +``` + + +### Rewrite and all_shards + +When both the `rewrite` and `all_shards` options are set to `true`, the Validate Query API responds with detailed information from all available shards as opposed to only one shard (the default), as shown in the following response: + +``` +{ + "valid": true, + "_shards": { + "total": 1, + "successful": 1, + "failed": 0 + }, + "explanations": [ + { + "index": "my-index-000001", + "shard": 0, + "valid": true, + "explanation": "(user.id:hamlet)^0.6333333" + } + ] +} +``` + + + + + + diff --git a/_automating-configurations/api/index.md b/_automating-configurations/api/index.md index 716e19c41f..78bc4eaede 100644 --- a/_automating-configurations/api/index.md +++ b/_automating-configurations/api/index.md @@ -18,4 +18,6 @@ OpenSearch supports the following workflow APIs: * [Search workflow]({{site.url}}{{site.baseurl}}/automating-configurations/api/search-workflow/) * [Search workflow state]({{site.url}}{{site.baseurl}}/automating-configurations/api/search-workflow-state/) * [Deprovision workflow]({{site.url}}{{site.baseurl}}/automating-configurations/api/deprovision-workflow/) -* [Delete workflow]({{site.url}}{{site.baseurl}}/automating-configurations/api/delete-workflow/) \ No newline at end of file +* [Delete workflow]({{site.url}}{{site.baseurl}}/automating-configurations/api/delete-workflow/) + +For information about workflow access control, see [Workflow template security]({{site.url}}{{site.baseurl}}/automating-configurations/workflow-security/). \ No newline at end of file diff --git a/_automating-configurations/index.md b/_automating-configurations/index.md index 144ad445c8..68742f6149 100644 --- a/_automating-configurations/index.md +++ b/_automating-configurations/index.md @@ -44,3 +44,4 @@ Workflow automation provides the following benefits: - For the workflow step syntax, see [Workflow steps]({{site.url}}{{site.baseurl}}/automating-configurations/workflow-steps/). - For a complete example, see [Workflow tutorial]({{site.url}}{{site.baseurl}}/automating-configurations/workflow-tutorial/). - For configurable settings, see [Workflow settings]({{site.url}}{{site.baseurl}}/automating-configurations/workflow-settings/). +- For information about workflow access control, see [Workflow template security]({{site.url}}{{site.baseurl}}/automating-configurations/workflow-security/). \ No newline at end of file diff --git a/_automating-configurations/workflow-security.md b/_automating-configurations/workflow-security.md new file mode 100644 index 0000000000..f3a3d7eeb9 --- /dev/null +++ b/_automating-configurations/workflow-security.md @@ -0,0 +1,93 @@ +--- +layout: default +title: Workflow template security +nav_order: 50 +--- + +# Workflow template security + +In OpenSearch, automated workflow configurations are provided by the Flow Framework plugin. You can use the Security plugin together with the Flow Framework plugin to limit non-admin users to specific actions. For example, you might want some users to only be able to create, update, or delete workflows, while others may only be able to view workflows. + +All Flow Framework indexes are protected as system indexes. Only a superadmin user or an admin user with a TLS certificate can access system indexes. For more information, see [System indexes]({{site.url}}{{site.baseurl}}/security/configuration/system-indices/). 
+ +Security for Flow Framework is set up similarly to [security for anomaly detection]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/security/). + +## Basic permissions + +As an admin user, you can use the Security plugin to assign specific permissions to users based on the APIs they need to access. For a list of supported Flow Framework APIs, see [Workflow APIs]({{site.url}}{{site.baseurl}}/automating-configurations/api/index/). + +The Security plugin has two built-in roles that cover most Flow Framework use cases: `flow_framework_full_access` and `flow_framework_read_access`. For descriptions of each, see [Predefined roles]({{site.url}}{{site.baseurl}}/security/access-control/users-roles#predefined-roles). + +If these roles don't meet your needs, you can assign users individual Flow Framework [permissions]({{site.url}}{{site.baseurl}}/security/access-control/permissions/) to suit your use case. Each action corresponds to an operation in the REST API. For example, the `cluster:admin/opensearch/flow_framework/workflow/search` permission lets you search workflows. + +### Fine-grained access control + +To reduce the chances of unintended users viewing metadata that describes an index, we recommend that administrators enable role-based access control when assigning permissions to the intended user group. For more information, see [Limit access by backend role](#advanced-limit-access-by-backend-role). + +## (Advanced) Limit access by backend role + +Use backend roles to configure fine-grained access to individual workflows based on roles. For example, users in different departments of an organization can view workflows owned by their own department. + +First, make sure your users have the appropriate [backend roles]({{site.url}}{{site.baseurl}}/security/access-control/index/). Backend roles usually come from an [LDAP server]({{site.url}}{{site.baseurl}}/security/configuration/ldap/) or [SAML provider]({{site.url}}{{site.baseurl}}/security/configuration/saml/), but if you use an internal user database, you can [create users manually using the API]({{site.url}}{{site.baseurl}}/security/access-control/api#create-user). + +Next, enable the following setting: + +```json +PUT _cluster/settings +{ + "transient": { + "plugins.flow_framework.filter_by_backend_roles": "true" + } +} +``` +{% include copy-curl.html %} + +Now when users view workflow resources in OpenSearch Dashboards (or make REST API calls), they only see workflows created by users who share at least one backend role. + +For example, consider two users: `alice` and `bob`. + +`alice` has an `analyst` backend role: + +```json +PUT _plugins/_security/api/internalusers/alice +{ + "password": "alice", + "backend_roles": [ + "analyst" + ], + "attributes": {} +} +``` + +`bob` has a `human-resources` backend role: + +```json +PUT _plugins/_security/api/internalusers/bob +{ + "password": "bob", + "backend_roles": [ + "human-resources" + ], + "attributes": {} +} +``` + +Both `alice` and `bob` have full access to the Flow Framework APIs: + +```json +PUT _plugins/_security/api/rolesmapping/flow_framework_full_access +{ + "backend_roles": [], + "hosts": [], + "users": [ + "alice", + "bob" + ] +} +``` + +Because they have different backend roles, `alice` and `bob` cannot view each other's workflows or their results. + +Users without backend roles can still view other users' workflow results if they have `flow_framework_read_access`. 
This also applies to users who have `flow_framework_full_access` because this permission includes all of the permissions of `flow_framework_read_access`. + +Administrators should inform users that the `flow_framework_read_access` permission allows them to view the results of any workflow in a cluster, including data not directly accessible to them. To limit access to the results of a specific workflow, administrators should apply backend role filters when creating the workflow. This ensures that only users with matching backend roles can access that workflow's results. \ No newline at end of file diff --git a/_benchmark/reference/workloads/operations.md b/_benchmark/reference/workloads/operations.md index ed6e6b8527..4e89b4ac42 100644 --- a/_benchmark/reference/workloads/operations.md +++ b/_benchmark/reference/workloads/operations.md @@ -10,13 +10,13 @@ nav_order: 100 # operations -The `operations` element contains a list of all available operations for specifying a schedule. +The `operations` element contains a list of all available operations for specifying a schedule. ## bulk -The `bulk` operation type allows you to run [bulk](/api-reference/document-apis/bulk/) requests as a task. +The `bulk` operation type allows you to run [bulk]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/) requests as a task. ### Usage @@ -82,7 +82,7 @@ If `detailed-results` is `true`, the following metadata is returned: ## create-index -The `create-index` operation runs the [Create Index API](/api-reference/index-apis/create-index/). It supports the following two modes of index creation: +The `create-index` operation runs the [Create Index API]({{site.url}}{{site.baseurl}}/api-reference/index-apis/create-index/). It supports the following two index creation modes: - Creating all indexes specified in the workload's `indices` section - Creating one specific index defined within the operation itself @@ -157,7 +157,7 @@ The `create-index` operation returns the following metadata: ## delete-index -The `delete-index` operation runs the [Delete Index API](api-reference/index-apis/delete-index/). Like with the [`create-index`](#create-index) operation, you can delete all indexes found in the `indices` section of the workload or delete one or more indexes based on the string passed in the `index` setting. +The `delete-index` operation runs the [Delete Index API]({{site.url}}{{site.baseurl}}/api-reference/index-apis/delete-index/). As with the [`create-index`](#create-index) operation, you can delete all indexes found in the `indices` section of the workload or delete one or more indexes based on the string passed in the `index` setting. ### Usage @@ -215,7 +215,7 @@ The `delete-index` operation returns the following metadata: ## cluster-health -The `cluster-health` operation runs the [Cluster Health API](api-reference/cluster-api/cluster-health/), which checks the cluster health status and returns the expected status according to the parameters set for `request-params`. If an unexpected cluster health status is returned, the operation reports a failure. You can use the `--on-error` option in the OpenSearch Benchmark `execute-test` command to control how OpenSearch Benchmark behaves when the health check fails. +The `cluster-health` operation runs the [Cluster Health API]({{site.url}}{{site.baseurl}}/api-reference/cluster-api/cluster-health/), which checks the cluster health status and returns the expected status according to the parameters set for `request-params`.
If an unexpected cluster health status is returned, then the operation reports a failure. You can use the `--on-error` option in the OpenSearch Benchmark `execute-test` command to control how OpenSearch Benchmark behaves when the health check fails. ### Usage @@ -285,7 +285,7 @@ Parameter | Required | Type | Description ## search -The `search` operation runs the [Search API](/api-reference/search/), which you can use to run queries in OpenSearch Benchmark indexes. +The `search` operation runs the [Search API]({{site.url}}{{site.baseurl}}/api-reference/search/), which you can use to run queries in OpenSearch Benchmark indexes. ### Usage diff --git a/_dashboards/management/S3-data-source.md b/_dashboards/management/S3-data-source.md index 585edeac81..1a7cc579b0 100644 --- a/_dashboards/management/S3-data-source.md +++ b/_dashboards/management/S3-data-source.md @@ -10,51 +10,41 @@ has_children: true Introduced 2.11 {: .label .label-purple } -Starting with OpenSearch 2.11, you can connect OpenSearch to your Amazon Simple Storage Service (Amazon S3) data source using the OpenSearch Dashboards UI. You can then query that data, optimize query performance, define tables, and integrate your S3 data within a single UI. +You can connect OpenSearch to your Amazon Simple Storage Service (Amazon S3) data source using the OpenSearch Dashboards interface and then query that data, optimize query performance, define tables, and integrate your S3 data. ## Prerequisites -To connect data from Amazon S3 to OpenSearch using OpenSearch Dashboards, you must have: +Before connecting a data source, verify that the following requirements are met: -- Access to Amazon S3 and the [AWS Glue Data Catalog](https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/connectors/s3glue_connector.rst#id2). -- Access to OpenSearch and OpenSearch Dashboards. -- An understanding of OpenSearch data source and connector concepts. See the [developer documentation](https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/datasources.rst#introduction) for information about these concepts. +- You have access to Amazon S3 and the [AWS Glue Data Catalog](https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/connectors/s3glue_connector.rst#id2). +- You have access to OpenSearch and OpenSearch Dashboards. +- You have an understanding of OpenSearch data source and connector concepts. See the [developer documentation](https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/datasources.rst#introduction) for more information. -## Connect your Amazon S3 data source +## Connect your data source -To connect your Amazon S3 data source, follow these steps: +To connect your data source, follow these steps: -1. From the OpenSearch Dashboards main menu, select **Management** > **Data sources**. -2. On the **Data sources** page, select **New data source** > **S3**. An example UI is shown in the following image. + 1. From the OpenSearch Dashboards main menu, go to **Management** > **Dashboards Management** > **Data sources**. +2. On the **Data sources** page, select **Create data source connection** > **Amazon S3**. +3. On the **Configure Amazon S3 data source** page, enter the data source details, authentication details, and permissions. +4. Select the **Review Configuration** button to verify the connection details. +5. Select the **Connect to Amazon S3** button to establish a connection. - Amazon S3 data sources UI +## Manage your data source -3.
On the **Configure Amazon S3 data source** page, enter the required **Data source details**, **AWS Glue authentication details**, **AWS Glue index store details**, and **Query permissions**. An example UI is shown in the following image. - - Amazon S3 configuration UI - -4. Select the **Review Configuration** button and verify the details. -5. Select the **Connect to Amazon S3** button. - -## Manage your Amazon S3 data source - -Once you've connected your Amazon S3 data source, you can explore your data through the **Manage data sources** tab. The following steps guide you through using this functionality: +To manage your data source, follow these steps: 1. On the **Manage data sources** tab, choose a data source from the list. -2. On that data source's page, you can manage the data source, choose a use case, and manage access controls and configurations. An example UI is shown in the following image. - - Manage data sources UI - -3. (Optional) Explore the Amazon S3 use cases, including querying your data and optimizing query performance. Go to **Next steps** to learn more about each use case. +2. On the page for the data source, you can manage the data source, choose a use case, and configure access controls. +3. (Optional) Explore the Amazon S3 use cases, including querying your data and optimizing query performance. Refer to the [**Next steps**](#next-steps) section to learn more about each use case. ## Limitations -This feature is still under development, including the data integration functionality. For real-time updates, see the [developer documentation on GitHub](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md#limitations). +This feature is currently under development, including the data integration functionality. For up-to-date information, refer to the [developer documentation on GitHub](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md#limitations). ## Next steps - Learn about [querying your data in Data Explorer]({{site.url}}{{site.baseurl}}/dashboards/management/query-data-source/) through OpenSearch Dashboards. -- Learn about ways to [optimize the query performance of your external data sources]({{site.url}}{{site.baseurl}}/dashboards/management/accelerate-external-data/), such as Amazon S3, through Query Workbench. +- Learn about [optimizing the query performance of your external data sources]({{site.url}}{{site.baseurl}}/dashboards/management/accelerate-external-data/), such as Amazon S3, through Query Workbench. - Learn about [Amazon S3 and AWS Glue Data Catalog](https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/connectors/s3glue_connector.rst) and the APIs used with Amazon S3 data sources, including configuration settings and query examples. - Learn about [managing your indexes]({{site.url}}{{site.baseurl}}/dashboards/im-dashboards/index/) through OpenSearch Dashboards. - diff --git a/_dashboards/management/accelerate-external-data.md b/_dashboards/management/accelerate-external-data.md index 00e4600ffd..6d1fa030e4 100644 --- a/_dashboards/management/accelerate-external-data.md +++ b/_dashboards/management/accelerate-external-data.md @@ -12,55 +12,37 @@ Introduced 2.11 {: .label .label-purple } -Query performance can be slow when using external data sources for reasons such as network latency, data transformation, and data volume. You can optimize your query performance by using OpenSearch indexes, such as a skipping index or a covering index.
A _skipping index_ uses skip acceleration methods, such as partition, minimum and maximum values, and value sets, to ingest and create compact aggregate data structures. This makes them an economical option for direct querying scenarios. A _covering index_ ingests all or some of the data from the source into OpenSearch and makes it possible to use all OpenSearch Dashboards and plugin functionality. See the [Flint Index Reference Manual](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md) for comprehensive guidance on this feature's indexing process. +Query performance can be slow when using external data sources for reasons such as network latency, data transformation, and data volume. You can optimize your query performance by using OpenSearch indexes, such as a skipping index or a covering index. + +A _skipping index_ uses skip acceleration methods, such as partition, minimum and maximum values, and value sets, to ingest and create compact aggregate data structures. This makes it an economical option for direct querying scenarios. + +A _covering index_ ingests all or some of the data from the source into OpenSearch and makes it possible to use all OpenSearch Dashboards and plugin functionality. See the [Flint Index Reference Manual](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md) for comprehensive guidance on this feature's indexing process. ## Data sources use case: Accelerate performance To get started with the **Accelerate performance** use case available in **Data sources**, follow these steps: 1. Go to **OpenSearch Dashboards** > **Query Workbench** and select your Amazon S3 data source from the **Data sources** dropdown menu in the upper-left corner. -2. From the left-side navigation menu, select a database. An example using the `http_logs` database is shown in the following image. - - Query Workbench accelerate data UI - +2. From the left-side navigation menu, select a database. 3. View the results in the table and confirm that you have the desired data. 4. Create an OpenSearch index by following these steps: - 1. Select the **Accelerate data** button. A pop-up window appears. An example is shown in the following image. - - Accelerate data pop-up window - + 1. Select the **Accelerate data** button. A pop-up window appears. 2. Enter your details in **Select data fields**. In the **Database** field, select the desired acceleration index: **Skipping index** or **Covering index**. A _skipping index_ uses skip acceleration methods, such as partition, min/max, and value sets, to ingest data using compact aggregate data structures. This makes it an economical option for direct querying scenarios. A _covering index_ ingests all or some of the data from the source into OpenSearch and makes it possible to use all OpenSearch Dashboards and plugin functionality. -5. Under **Index settings**, enter the information for your acceleration index. For information about naming, select **Help**. Note that an Amazon S3 table can only have one skipping index at a time. An example is shown in the following image. - - Skipping index settings +5. Under **Index settings**, enter the information for your acceleration index. For information about naming, select **Help**. Note that an Amazon S3 table can only have one skipping index at a time. ### Define skipping index settings -1. Under **Skipping index definition**, select the **Add fields** button to define the skipping index acceleration method and choose the fields you want to add.
An example is shown in the following image. - - Skipping index add fields - +1. Under **Skipping index definition**, select the **Add fields** button to define the skipping index acceleration method and choose the fields you want to add. 2. Select the **Copy Query to Editor** button to apply your skipping index settings. -3. View the skipping index query details in the table pane and then select the **Run** button. Your index is added to the left-side navigation menu containing the list of your databases. An example is shown in the following image. - - Run a skippping or covering index UI +3. View the skipping index query details in the table pane and then select the **Run** button. Your index is added to the left-side navigation menu containing the list of your databases. ### Define covering index settings -1. Under **Index settings**, enter a valid index name. Note that each Amazon S3 table can have multiple covering indexes. An example is shown in the following image. - - Covering index settings - -2. Once you have added the index name, define the covering index fields by selecting `(add fields here)` under **Covering index definition**. An example is shown in the following image. - - Covering index field naming - +1. Under **Index settings**, enter a valid index name. Note that each Amazon S3 table can have multiple covering indexes. +2. Once you have added the index name, define the covering index fields by selecting `(add fields here)` under **Covering index definition**. 3. Select the **Copy Query to Editor** button to apply your covering index settings. -4. View the covering index query details in the table pane and then select the **Run** button. Your index is added to the left-side navigation menu containing the list of your databases. An example UI is shown in the following image. - - Run index in Query Workbench +4. View the covering index query details in the table pane and then select the **Run** button. Your index is added to the left-side navigation menu containing the list of your databases. ## Limitations -This feature is still under development, so there are some limitations. For real-time updates, see the [developer documentation on GitHub](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md#limitations). +This feature is still under development, so there are some limitations. For real-time updates, refer to the [developer documentation on GitHub](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md#limitations). diff --git a/_dashboards/management/connect-prometheus.md b/_dashboards/management/connect-prometheus.md new file mode 100644 index 0000000000..ed5545bc56 --- /dev/null +++ b/_dashboards/management/connect-prometheus.md @@ -0,0 +1,53 @@ +--- +layout: default +title: Connecting Prometheus to OpenSearch +parent: Data sources +nav_order: 20 +--- + +# Connecting Prometheus to OpenSearch +Introduced 2.16 +{: .label .label-purple } + +This documentation covers the key steps to connect Prometheus to OpenSearch using the OpenSearch Dashboards interface, including setting up the data source connection, modifying the connection details, and creating an index pattern for the Prometheus data. + +## Prerequisites and permissions + +Before connecting a data source, ensure you have met the [Prerequisites]({{site.url}}{{site.baseurl}}/dashboards/management/data-sources/#prerequisites) and have the necessary [Permissions]({{site.url}}{{site.baseurl}}/dashboards/management/data-sources/#permissions). 
+ +## Create a Prometheus data source connection + +A data source connection specifies the parameters needed to connect to a data source. These parameters form a connection string for the data source. Using OpenSearch Dashboards, you can add new **Prometheus** data source connections or manage existing ones. + +Follow these steps to connect your data source: + +1. From the OpenSearch Dashboards main menu, go to **Management** > **Data sources** > **New data source** > **Prometheus**. + +2. From the **Configure Prometheus data source** section: + + - Under **Data source details**, provide a title and optional description. + - Under **Prometheus data location**, enter the Prometheus URI. + - Under **Authentication details**, select the appropriate authentication method from the dropdown list and enter the required details: + - **Basic authentication**: Enter a username and password. + - **AWS Signature Version 4**: Specify the **Region**, select the OpenSearch service from the **Service Name** list (**Amazon OpenSearch Service** or **Amazon OpenSearch Serverless**), and enter the **Access Key** and **Secret Key**. + - Under **Query permissions**, choose the role needed to search and index data. If you select **Restricted**, an additional field will become available to configure the required role. + +3. Select **Review Configuration** > **Connect to Prometheus** to save your settings. The new connection will appear in the list of data sources. + +## Modify a data source connection + +To modify a data source connection, follow these steps: + +1. Select the desired connection from the list on the **Data sources** main page. This will open the **Connection Details** window. +2. Within the **Connection Details** window, edit the **Title** and **Description** fields. Select the **Save changes** button to apply the changes. +3. To update the **Authentication Method**, choose the method from the dropdown list and enter any necessary credentials. Select **Save changes** to apply the changes. + - To update the **Basic authentication** method, select the **Update stored password** button. Within the pop-up window, enter and confirm the updated password, and then select **Update stored password** to save the changes. To test the connection, select the **Test connection** button. + - To update the **AWS Signature Version 4** authentication method, select the **Update stored AWS credential** button. Within the pop-up window, enter the updated access and secret keys, and then select **Update stored AWS credential** to save the changes. To test the connection, select the **Test connection** button. + +## Delete a data source connection + +To delete the data source connection, select the delete icon ({::nomarkdown}delete icon{:/}). + +## Create an index pattern + +After creating a data source connection, the next step is to create an index pattern for that data source. For more information and a tutorial on index patterns, refer to [Index patterns]({{site.url}}{{site.baseurl}}/dashboards/management/index-patterns/). diff --git a/_dashboards/management/data-sources.md b/_dashboards/management/data-sources.md index fdd4edc150..62d3a5aab2 100644 --- a/_dashboards/management/data-sources.md +++ b/_dashboards/management/data-sources.md @@ -13,67 +13,37 @@ This documentation focuses on using the OpenSearch Dashboards interface to connec ## Prerequisites -The first step in connecting your data sources to OpenSearch is to install OpenSearch and OpenSearch Dashboards on your system.
You can follow the installation instructions in the [OpenSearch documentation]({{site.url}}{{site.baseurl}}/install-and-configure/index/) to install these tools. +The first step in connecting your data sources to OpenSearch is to install OpenSearch and OpenSearch Dashboards on your system. Refer to the [installation instructions]({{site.url}}{{site.baseurl}}/install-and-configure/index/) for more information. Once you have installed OpenSearch and OpenSearch Dashboards, you can use Dashboards to connect your data sources to OpenSearch and then use Dashboards to manage data sources, create index patterns based on those data sources, run queries against a specific data source, and combine visualizations in one dashboard. Configuration of the [YAML files]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/#configuration-file) and installation of the `dashboards-observability` and `opensearch-sql` plugins are necessary. For more information, see [OpenSearch plugins]({{site.url}}{{site.baseurl}}/install-and-configure/plugins/). -## Permissions - -To work with data sources in OpenSearch Dashboards, make sure that the user has been assigned the correct cluster-level [data source permission]({{site.url}}{{site.baseurl}}/security/access-control/permissions#data-source-permissions). - - - -## Create a data source connection - -A data source connection specifies the parameters needed to connect to a data source. These parameters form a connection string for the data source. Using Dashboards, you can add new data source connections or manage existing ones. - -The following steps guide you through the basics of creating a data source connection: - -1. From the OpenSearch Dashboards main menu, select **Management** > **Data sources** > **Create data source connection**. The UI is shown in the following image. +To securely store and encrypt data source connections in OpenSearch, you must add the following configuration to the `opensearch.yml` file on all the nodes: - Connecting a data source UI +`plugins.query.datasources.encryption.masterkey: "YOUR_GENERATED_MASTER_KEY_HERE"` -2. Create the data source connection by entering the appropriate information into the **Connection Details** and **Authentication Method** fields. - - - Under **Connection Details**, enter a title and endpoint URL. For this tutorial, use the URL `http://localhost:5601/app/management/opensearch-dashboards/dataSources`. Entering a description is optional. +The key must be 16, 24, or 32 characters. You can use the following command to generate a 24-character key: - - Under **Authentication Method**, select an authentication method from the dropdown list. Once an authentication method is selected, the applicable fields for that method appear. You can then enter the required details. The authentication method options are: - - **No authentication**: No authentication is used to connect to the data source. - - **Username & Password**: A basic username and password are used to connect to the data source. - - **AWS SigV4**: An AWS Signature Version 4 authenticating request is used to connect to the data source. AWS Signature Version 4 requires an access key and a secret key. - - For AWS Signature Version 4 authentication, first specify the **Region**. Next, select the OpenSearch service in the **Service Name** list. The options are **Amazon OpenSearch Service** and **Amazon OpenSearch Serverless**. Lastly, enter the **Access Key** and **Secret Key** for authorization.
`openssl rand -hex 12` - After you have populated the required fields, the **Test connection** and **Create data source** buttons become active. You can select **Test connection** to confirm that the connection is valid. +Generating 12 bytes results in a hexadecimal string that is 12 * 2 = 24 characters. +{: .note} -3. Select **Create data source** to save your settings. The connection is created. The active window returns to the **Data sources** main page, and the new connection appears in the list of data sources. - -4. To delete a data source connection, select the checkbox to the left of the data source **Title** and then select the **Delete 1 connection** button. Selecting multiple checkboxes for multiple connections is supported. An example UI is shown in the following image. - - Deleting a data source UI - -### Modify a data source connection - -To make changes to a data source connection, select a connection in the list on the **Data sources** main page. The **Connection Details** window opens. - -To make changes to **Connection Details**, edit one or both of the **Title** and **Description** fields and select **Save changes** in the lower-right corner of the screen. You can also cancel changes here. To change the **Authentication Method**, choose a different authentication method, enter your credentials (if applicable), and then select **Save changes** in the lower-right corner of the screen. The changes are saved. -When **Username & Password** is the selected authentication method, you can update the password by choosing **Update stored password** next to the **Password** field. In the pop-up window, enter a new password in the first field and then enter it again in the second field to confirm. Select **Update stored password** in the pop-up window. The new password is saved. Select **Test connection** to confirm that the connection is valid. ## Permissions -When **AWS SigV4** is the selected authentication method, you can update the credentials by selecting **Update stored AWS credential**. In the pop-up window, enter a new access key in the first field and a new secret key in the second field. Select **Update stored AWS credential** in the pop-up window. The new credentials are saved. Select **Test connection** in the upper-right corner of the screen to confirm that the connection is valid. +To work with data sources in OpenSearch Dashboards, you must be assigned the correct cluster-level [data source permissions]({{site.url}}{{site.baseurl}}/security/access-control/permissions#data-source-permissions). -To delete the data source connection, select the delete icon ({::nomarkdown}delete icon{:/}). +## Types of data sources -## Create an index pattern +To configure data sources through OpenSearch Dashboards, go to **Management** > **Dashboards Management** > **Data sources**. This flow can be used for OpenSearch data source connections. See [Configuring and using multiple data sources]({{site.url}}{{site.baseurl}}/dashboards/management/multi-data-sources/). -Once you've created a data source connection, you can create an index pattern for the data source. An _index pattern_ is a template that OpenSearch uses to create indexes for data from the data source. See [Index patterns]({{site.url}}{{site.baseurl}}/dashboards/management/index-patterns/) for more information and a tutorial. +Alternatively, if you are running OpenSearch Dashboards 2.16 or later, go to **Management** > **Data sources**.
This flow can be used to connect Amazon Simple Storage Service (Amazon S3) and Prometheus. See [Connecting Amazon S3 to OpenSearch]({{site.url}}{{site.baseurl}}/dashboards/management/S3-data-source/) and [Connecting Prometheus to OpenSearch]({{site.url}}{{site.baseurl}}/dashboards/management/connect-prometheus/) for more information. ## Next steps - Learn about [managing index patterns]({{site.url}}{{site.baseurl}}/dashboards/management/index-patterns/) through OpenSearch Dashboards. - Learn about [indexing data using Index Management]({{site.url}}{{site.baseurl}}/dashboards/im-dashboards/index/) through OpenSearch Dashboards. - Learn about how to connect [multiple data sources]({{site.url}}{{site.baseurl}}/dashboards/management/multi-data-sources/). -- Learn about how to [connect OpenSearch and Amazon S3 through OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/dashboards/management/S3-data-source/). -- Learn about the [Integrations]({{site.url}}{{site.baseurl}}/integrations/index/) tool, which gives you the flexibility to use various data ingestion methods and connect data from the Dashboards UI. - +- Learn about how to connect [OpenSearch and Amazon S3]({{site.url}}{{site.baseurl}}/dashboards/management/S3-data-source/) and [OpenSearch and Prometheus]({{site.url}}{{site.baseurl}}/dashboards/management/connect-prometheus/) using the OpenSearch Dashboards interface. +- Learn about the [Integrations]({{site.url}}{{site.baseurl}}/integrations/index/) plugin, which gives you the flexibility to use various data ingestion methods and connect data to OpenSearch Dashboards. diff --git a/_dashboards/management/query-data-source.md b/_dashboards/management/query-data-source.md index f1496b3e17..a3392c073e 100644 --- a/_dashboards/management/query-data-source.md +++ b/_dashboards/management/query-data-source.md @@ -11,7 +11,7 @@ has_children: false Introduced 2.11 {: .label .label-purple } -This tutorial guides you through using the **Query data** use case for querying and visualizing your Amazon Simple Storage Service (Amazon S3) data using OpenSearch Dashboards. +This tutorial guides you through using the **Query data** use case for querying and visualizing your Amazon Simple Storage Service (Amazon S3) data using OpenSearch Dashboards. ## Prerequisites @@ -22,15 +22,9 @@ You must be using the `opensearch-security` plugin and have the appropriate role To get started, follow these steps: 1. On the **Manage data sources** page, select your data source from the list. -2. On the data source's detail page, select the **Query data** card. This option takes you to the **Observability** > **Logs** page, which is shown in the following image. - - Observability Logs UI - +2. On the data source's detail page, select the **Query data** card. This option takes you to the **Observability** > **Logs** page. 3. Select the **Event Explorer** button. This option creates and saves frequently searched queries and visualizations using [Piped Processing Language (PPL)]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/index/) or [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/), which connects to Spark SQL. -4. Select the Amazon S3 data source from the dropdown menu in the upper-left corner. An example is shown in the following image. - - Observability Logs Amazon S3 dropdown menu - +4. Select the Amazon S3 data source from the dropdown menu in the upper-left corner. 5. Enter the query in the **Enter PPL query** field. Note that the default language is SQL. 
To change the language, select PPL from the dropdown menu.
 6. Select the **Search** button. The **Query Processing** message is shown, confirming that your query is being processed.
 7. View the results, which are listed in a table on the **Events** tab. On this page, details such as available fields, source, and time are shown in a table format.
@@ -40,10 +34,7 @@ To get started, follow these steps:
 
 To create visualizations, follow these steps:
 
-1. On the **Explorer** page, select the **Visualizations** tab. An example is shown in the following image.
-
-   Explorer Amazon S3 visualizations UI
-
+1. On the **Explorer** page, select the **Visualizations** tab.
 2. Select **Index data to visualize**. This option currently only creates [acceleration indexes]({{site.url}}{{site.baseurl}}/dashboards/management/accelerate-external-data/), which give you views of the data visualizations from the **Visualizations** tab. To create a visualization of your Amazon S3 data, go to **Discover**. See the [Discover documentation]({{site.url}}{{site.baseurl}}/dashboards/discover/index-discover/) for information and a tutorial.

## Use Query Workbench with your Amazon S3 data source

@@ -53,14 +44,12 @@ To create visualizations, follow these steps:
 To use Query Workbench with your Amazon S3 data, follow these steps:

 1. From the OpenSearch Dashboards main menu, select **OpenSearch Plugins** > **Query Workbench**.
-2. From the **Data Sources** dropdown menu in the upper-left corner, choose your Amazon S3 data source. Your data begins loading the databases that are part of your data source. An example is shown in the following image.
-
-   Query Workbench Amazon S3 data loading UI
-
+2. From the **Data Sources** dropdown menu in the upper-left corner, choose your Amazon S3 data source. OpenSearch Dashboards then loads the databases that are part of your data source.
 3. View the databases listed in the left-side navigation menu and select a database to view its details. Any information about acceleration indexes is listed under **Acceleration index destination**.
 4. Choose the **Describe Index** button to learn more about how data is stored in that particular index.
 5. Choose the **Drop index** button to delete and clear both the OpenSearch index and the Amazon S3 Spark job that refreshes the data.
-6. Enter your SQL query and select **Run**.
+6. Enter your SQL query and select **Run**. 
+
## Next steps

- Learn about [accelerating the query performance of your external data sources]({{site.url}}{{site.baseurl}}/dashboards/management/accelerate-external-data/).
diff --git a/_data-prepper/pipelines/configuration/processors/obfuscate.md b/_data-prepper/pipelines/configuration/processors/obfuscate.md
index 8d6bf901da..96b03e7405 100644
--- a/_data-prepper/pipelines/configuration/processors/obfuscate.md
+++ b/_data-prepper/pipelines/configuration/processors/obfuscate.md
@@ -62,6 +62,13 @@ When run, the `obfuscate` processor parses the fields into the following output:

Use the following configuration options with the `obfuscate` processor.

+
+
| Parameter | Required | Description |
| :--- | :--- | :--- |
| `source` | Yes | The source field to obfuscate. 
| diff --git a/_data-prepper/pipelines/configuration/sinks/s3.md b/_data-prepper/pipelines/configuration/sinks/s3.md index d1413f6ffc..3ff266cccf 100644 --- a/_data-prepper/pipelines/configuration/sinks/s3.md +++ b/_data-prepper/pipelines/configuration/sinks/s3.md @@ -173,14 +173,14 @@ When you provide your own Avro schema, that schema defines the final structure o In cases where your data is uniform, you may be able to automatically generate a schema. Automatically generated schemas are based on the first event that the codec receives. The schema will only contain keys from this event, and all keys must be present in all events in order to automatically generate a working schema. Automatically generated schemas make all fields nullable. Use the `include_keys` and `exclude_keys` sink configurations to control which data is included in the automatically generated schema. -Avro fields should use a null [union](https://avro.apache.org/docs/current/specification/#unions) because this will allow missing values. Otherwise, all required fields must be present for each event. Use non-nullable fields only when you are certain they exist. +Avro fields should use a null [union](https://avro.apache.org/docs/1.10.2/spec.html#Unions) because this will allow missing values. Otherwise, all required fields must be present for each event. Use non-nullable fields only when you are certain they exist. Use the following options to configure the codec. Option | Required | Type | Description :--- | :--- | :--- | :--- -`schema` | Yes | String | The Avro [schema declaration](https://avro.apache.org/docs/current/specification/#schema-declaration). Not required if `auto_schema` is set to true. -`auto_schema` | No | Boolean | When set to `true`, automatically generates the Avro [schema declaration](https://avro.apache.org/docs/current/specification/#schema-declaration) from the first event. +`schema` | Yes | String | The Avro [schema declaration](https://avro.apache.org/docs/1.2.0/spec.html#schemas). Not required if `auto_schema` is set to true. +`auto_schema` | No | Boolean | When set to `true`, automatically generates the Avro [schema declaration](https://avro.apache.org/docs/1.2.0/spec.html#schemas) from the first event. ### `ndjson` codec diff --git a/_data-prepper/pipelines/configuration/sources/opensearch.md b/_data-prepper/pipelines/configuration/sources/opensearch.md index 7cc0b9a36a..a7ba965729 100644 --- a/_data-prepper/pipelines/configuration/sources/opensearch.md +++ b/_data-prepper/pipelines/configuration/sources/opensearch.md @@ -135,7 +135,7 @@ Option | Required | Type | Description ### Scheduling -The `scheduling` configuration allows the user to configure how indexes are reprocessed in the source based on the the `index_read_count` and recount time `interval`. +The `scheduling` configuration allows the user to configure how indexes are reprocessed in the source based on the `index_read_count` and recount time `interval`. For example, setting `index_read_count` to `3` with an `interval` of `1h` will result in all indexes being reprocessed 3 times, 1 hour apart. By default, indexes will only be processed once. diff --git a/_field-types/index.md b/_field-types/index.md index 7a7e816ada..e9250f409d 100644 --- a/_field-types/index.md +++ b/_field-types/index.md @@ -12,43 +12,77 @@ redirect_from: # Mappings and field types -You can define how documents and their fields are stored and indexed by creating a _mapping_. The mapping specifies the list of fields for a document. 
Every field in the document has a _field type_, which defines the type of data the field contains. For example, you may want to specify that the `year` field should be of type `date`. To learn more, see [Supported field types]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/index/).
+Mappings tell OpenSearch how to store and index your documents and their fields. You can specify the data type for each field (for example, `year` as `date`) to make storage and querying more efficient.

-If you're just starting to build out your cluster and data, you may not know exactly how your data should be stored. In those cases, you can use dynamic mappings, which tell OpenSearch to dynamically add data and its fields. However, if you know exactly what types your data falls under and want to enforce that standard, then you can use explicit mappings.
+While [dynamic mappings](#dynamic-mapping) automatically add new data and fields, using explicit mappings is recommended. Explicit mappings let you define the exact structure and data types upfront. This helps to maintain data consistency and optimize performance, especially for large datasets or high-volume indexing operations.

-For example, if you want to indicate that `year` should be of type `text` instead of an `integer`, and `age` should be an `integer`, you can do so with explicit mappings. By using dynamic mapping, OpenSearch might interpret both `year` and `age` as integers.
+For example, with explicit mappings, you can ensure that `year` is treated as text and `age` as an integer instead of both being interpreted as integers by dynamic mapping.

-This section provides an example for how to create an index mapping and how to add a document to it that will get ip_range validated.
-
-#### Table of contents
-1. TOC
-{:toc}
-
-
----

## Dynamic mapping

When you index a document, OpenSearch adds fields automatically with dynamic mapping. You can also explicitly add fields to an index mapping.

-#### Dynamic mapping types
+### Dynamic mapping types

Type | Description
:--- | :---
-null | A `null` field can't be indexed or searched. When a field is set to null, OpenSearch behaves as if that field has no values.
-boolean | OpenSearch accepts `true` and `false` as boolean values. An empty string is equal to `false.`
-float | A single-precision 32-bit floating point number.
-double | A double-precision 64-bit floating point number.
-integer | A signed 32-bit number.
-object | Objects are standard JSON objects, which can have fields and mappings of their own. For example, a `movies` object can have additional properties such as `title`, `year`, and `director`.
-array | Arrays in OpenSearch can only store values of one type, such as an array of just integers or strings. Empty arrays are treated as though they are fields with no values.
-text | A string sequence of characters that represent full-text values.
-keyword | A string sequence of structured characters, such as an email address or ZIP code.
+`null` | A `null` field can't be indexed or searched. When a field is set to `null`, OpenSearch behaves as if the field has no value.
+`boolean` | OpenSearch accepts `true` and `false` as Boolean values. An empty string is equal to `false`.
+`float` | A single-precision, 32-bit floating-point number.
+`double` | A double-precision, 64-bit floating-point number.
+`integer` | A signed 32-bit number.
+`object` | Objects are standard JSON objects, which can have fields and mappings of their own. For example, a `movies` object can have additional properties such as `title`, `year`, and `director`.
+`array` | OpenSearch does not have a specific array data type. Arrays are represented as a set of values of the same data type (for example, integers or strings) associated with a field. When indexing, you can pass multiple values for a field, and OpenSearch will treat it as an array. Empty arrays are valid and recognized as array fields with zero elements---not as fields with no values. OpenSearch supports querying and filtering arrays, including checking for values, range queries, and array operations like concatenation and intersection. Nested arrays, which may contain complex objects or other arrays, can also be used for advanced data modeling.
+`text` | A string sequence of characters that represent full-text values.
+`keyword` | A string sequence of structured characters, such as an email address or ZIP code.
date detection string | Enabled by default. If new string fields match a date's format, then the string is processed as a `date` field. For example, `date: "2012/03/11"` is processed as a date.
numeric detection string | If disabled, OpenSearch may automatically process numeric values as strings when they should be processed as numbers. When enabled, OpenSearch can process strings into `long`, `integer`, `short`, `byte`, `double`, `float`, `half_float`, `scaled_float`, and `unsigned_long`. Default is disabled.
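+
+As a quick illustration of dynamic mapping, you can index a document into a new index and let OpenSearch infer the types. In the following hypothetical example (the `students` index and its fields are illustrative), `name` is typically mapped to `text` with a `keyword` subfield, `year` to `long`, and `gpa` to `float`, assuming default settings:
+
+```json
+PUT students/_doc/1
+{
+  "name": "John Doe",
+  "year": 2022,
+  "gpa": 3.89
+}
+```
+{% include copy-curl.html %}
+
+You can then run `GET students/_mapping` to review the field types that dynamic mapping assigned.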
+### Dynamic templates
+
+Dynamic templates are used to define custom mappings for dynamically added fields based on the data type, field name, or field path. They allow you to define a flexible schema for your data that can automatically adapt to changes in the structure or format of the input data.
+
+You can use the following syntax to define a dynamic mapping template:
+
+```json
+PUT index
+{
+  "mappings": {
+    "dynamic_templates": [
+      {
+        "fields": {
+          "mapping": {
+            "type": "short"
+          },
+          "match_mapping_type": "string",
+          "path_match": "status*"
+        }
+      }
+    ]
+  }
+}
+```
+{% include copy-curl.html %}
+
+This mapping configuration dynamically maps any field with a name starting with `status` (for example, `status_code`) to the `short` data type if the initial value provided during indexing is a string.
+
+### Dynamic mapping parameters
+
+Dynamic templates support the following parameters for matching conditions and mapping rules. Each parameter's default value is `null`.
+
+Parameter | Description |
+----------|-------------|
+`match_mapping_type` | Specifies the JSON data type (for example, string, long, double, object, binary, Boolean, date) that triggers the mapping.
+`match` | A pattern used to match field names and apply the mapping. Interpreted according to `match_pattern`.
+`unmatch` | A pattern used to exclude field names from the mapping. Interpreted according to `match_pattern`.
+`match_pattern` | Determines the pattern matching behavior, either `regex` or `simple`. Default is `simple`.
+`path_match` | Allows you to match nested field paths using a pattern.
+`path_unmatch` | Excludes nested field paths from the mapping using a pattern.
+`mapping` | The mapping configuration to apply.
+
## Explicit mapping

-If you know exactly what your field data types need to be, you can specify them in your request body when creating your index. 
+If you know exactly which field data types you need to use, then you can specify them in your request body when creating your index, as shown in the following example request:

```json
PUT sample-index1
{
@@ -62,8 +96,9 @@ PUT sample-index1
  }
}
```
+{% include copy-curl.html %}

-### Response
+#### Response
```json
{
  "acknowledged": true,
@@ -71,8 +106,9 @@ PUT sample-index1
  "index": "sample-index1"
}
```
+{% include copy-curl.html %}

-To add mappings to an existing index or data stream, you can send a request to the `_mapping` endpoint using the `PUT` or `POST` HTTP method:
+To add mappings to an existing index or data stream, you can send a request to the `_mapping` endpoint using the `PUT` or `POST` HTTP method, as shown in the following example request:

```json
POST sample-index1/_mapping
{
@@ -84,84 +120,29 @@ POST sample-index1/_mapping
  }
}
```
+{% include copy-curl.html %}

You cannot change the mapping of an existing field; you can only modify the field's mapping parameters.
{: .note}

---
-## Mapping example usage
+## Mapping parameters

-The following example shows how to create a mapping to specify that OpenSearch should ignore any documents with malformed IP addresses that do not conform to the [`ip`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/ip/) data type. You accomplish this by setting the `ignore_malformed` parameter to `true`.
+Mapping parameters are used to configure the behavior of index fields. See [Mappings and field types]({{site.url}}{{site.baseurl}}/field-types/) for more information.

-### Create an index with an `ip` mapping
+## Mapping limit settings

-To create an index, use a PUT request:
+OpenSearch has certain mapping limits and settings, such as the settings listed in the following table. Settings can be configured based on your requirements.

-```json
-PUT /test-index
-{
-  "mappings" : {
-    "properties" : {
-      "ip_address" : {
-        "type" : "ip",
-        "ignore_malformed": true
-      }
-    }
-  }
-}
-```
-
-You can add a document that has a malformed IP address to your index:
-
-```json
-PUT /test-index/_doc/1
-{
-  "ip_address" : "malformed ip address"
-}
-```
-
-This indexed IP address does not throw an error because `ignore_malformed` is set to true.
-
-You can query the index using the following request:
-
-```json
-GET /test-index/_search
-```

| Setting | Default value | Allowed value | Type | Description |
|-|-|-|-|-|
| `index.mapping.nested_fields.limit` | 50 | [0,) | Dynamic | Limits the maximum number of nested fields that can be defined in an index mapping. |
| `index.mapping.nested_objects.limit` | 10,000 | [0,) | Dynamic | Limits the maximum number of nested objects that can be created in a single document. |
| `index.mapping.total_fields.limit` | 1,000 | [0,) | Dynamic | Limits the maximum number of fields that can be defined in an index mapping. |
| `index.mapping.depth.limit` | 20 | [1,100] | Dynamic | Limits the maximum depth of nested objects and nested fields that can be defined in an index mapping. |
| `index.mapping.field_name_length.limit` | 50,000 | [1,50000] | Dynamic | Limits the maximum length of field names that can be defined in an index mapping. |
| `index.mapper.dynamic` | true | {true,false} | Dynamic | Determines whether new fields should be dynamically added to a mapping. |
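
Because these settings are dynamic, you can change them on an existing index by using the update settings API. For example, the following request raises the total field limit for the `sample-index1` index created earlier (the limit value `2000` is illustrative):

```json
PUT sample-index1/_settings
{
  "index.mapping.total_fields.limit": 2000
}
```
{% include copy-curl.html %}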
-The response shows that the `ip_address` field is ignored in the indexed document:
-
-```json
-{
-  "took": 14,
-  "timed_out": false,
-  "_shards": {
-    "total": 1,
-    "successful": 1,
-    "skipped": 0,
-    "failed": 0
-  },
-  "hits": {
-    "total": {
-      "value": 1,
-      "relation": "eq"
-    },
-    "max_score": 1,
-    "hits": [
-      {
-        "_index": "test-index",
-        "_id": "1",
-        "_score": 1,
-        "_ignored": [
-          "ip_address"
-        ],
-        "_source": {
-          "ip_address": "malformed ip address"
-        }
-      }
-    ]
-  }
-}
-```
+
---

## Get a mapping

To get all mappings for one or more indexes, use the following request:

```json
GET <index>/_mapping
```
+{% include copy-curl.html %}

-In the above request, `<index>` may be an index name or a comma-separated list of index names.
+In the previous request, `<index>` may be an index name or a comma-separated list of index names.

To get all mappings for all indexes, use the following request:

```json
GET _mapping
```
+{% include copy-curl.html %}

To get a mapping for a specific field, provide the index name and the field name:

@@ -185,14 +168,14 @@ To get a mapping for a specific field, provide the index name and the field name
GET _mapping/field/<fields>
GET <index>/_mapping/field/<fields>
```
+{% include copy-curl.html %}

-Both `<index>` and `<fields>` can be specified as one value or a comma-separated list.
-
-For example, the following request retrieves the mapping for the `year` and `age` fields in `sample-index1`:
+Both `<index>` and `<fields>` can be specified as either one value or a comma-separated list. For example, the following request retrieves the mapping for the `year` and `age` fields in `sample-index1`:

```json
GET sample-index1/_mapping/field/year,age
```
+{% include copy-curl.html %}

The response contains the specified fields:

@@ -220,3 +203,8 @@ The response contains the specified fields:
  }
 }
```
+{% include copy-curl.html %}
+
+## Mappings use cases
+
+See [Mappings use cases]({{site.url}}{{site.baseurl}}/field-types/mappings-use-cases/) for use case examples, including examples of mapping string fields and ignoring malformed IP addresses.
diff --git a/_field-types/mappings-use-cases.md b/_field-types/mappings-use-cases.md
new file mode 100644
index 0000000000..835e030bab
--- /dev/null
+++ b/_field-types/mappings-use-cases.md
@@ -0,0 +1,122 @@
+---
+layout: default
+title: Mappings use cases
+parent: Mappings and field types
+nav_order: 5
+nav_exclude: true
+---
+
+# Mappings use cases
+
+Mappings provide control over how data is indexed and queried, enabling optimized performance and efficient storage for a range of use cases.
+
+---
+
+## Example: Ignoring malformed IP addresses
+
+The following example shows you how to create a mapping specifying that OpenSearch should ignore any documents containing malformed IP addresses that do not conform to the [`ip`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/ip/) data type. You can accomplish this by setting the `ignore_malformed` parameter to `true`.
+
+### Create an index with an `ip` mapping
+
+To create an index with an `ip` mapping, use a PUT request:
+
+```json
+PUT /test-index
+{
+  "mappings" : {
+    "properties" : {
+      "ip_address" : {
+        "type" : "ip",
+        "ignore_malformed": true
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+Then add a document with a malformed IP address:
+
+```json
+PUT /test-index/_doc/1
+{
+  "ip_address" : "malformed ip address"
+}
+```
+{% include copy-curl.html %}
+
+When you query the index, the `ip_address` field will be ignored. 
You can query the index using the following request:
+
+```json
+GET /test-index/_search
+```
+{% include copy-curl.html %}
+
+#### Response
+
+```json
+{
+  "took": 14,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": "test-index",
+        "_id": "1",
+        "_score": 1,
+        "_ignored": [
+          "ip_address"
+        ],
+        "_source": {
+          "ip_address": "malformed ip address"
+        }
+      }
+    ]
+  }
+}
+```
+{% include copy-curl.html %}
+
+---
+
+## Mapping string fields to `text` and `keyword` types
+
+To create an index named `movies1` with a dynamic template that maps all string fields to both the `text` and `keyword` types, you can use the following request:
+
+```json
+PUT movies1
+{
+  "mappings": {
+    "dynamic_templates": [
+      {
+        "strings": {
+          "match_mapping_type": "string",
+          "mapping": {
+            "type": "text",
+            "fields": {
+              "keyword": {
+                "type": "keyword",
+                "ignore_above": 256
+              }
+            }
+          }
+        }
+      }
+    ]
+  }
+}
```
+{% include copy-curl.html %}
+
+This dynamic template ensures that any string fields in your documents will be indexed as both a full-text `text` type and a `keyword` type.
diff --git a/_field-types/metadata-fields/field-names.md b/_field-types/metadata-fields/field-names.md
new file mode 100644
index 0000000000..b17e94fbb4
--- /dev/null
+++ b/_field-types/metadata-fields/field-names.md
@@ -0,0 +1,43 @@
+---
+layout: default
+title: Field names
+nav_order: 10
+parent: Metadata fields
+---
+
+# Field names
+
+The `_field_names` field indexes the names of fields that contain non-null values. This enables the use of the `exists` query, which can identify documents that either have or do not have non-null values for a specified field.
+
+However, `_field_names` only indexes field names when both `doc_values` and `norms` are disabled. If either `doc_values` or `norms` is enabled, then the `exists` query still functions but will not rely on the `_field_names` field.
+
+## Mapping example
+
+```json
+{
+  "mappings": {
+    "_field_names": {
+      "enabled": true
+    },
+    "properties": {
+      "title": {
+        "type": "text",
+        "doc_values": false,
+        "norms": false
+      },
+      "description": {
+        "type": "text",
+        "doc_values": true,
+        "norms": false
+      },
+      "price": {
+        "type": "float",
+        "doc_values": false,
+        "norms": true
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
diff --git a/_field-types/metadata-fields/id.md b/_field-types/metadata-fields/id.md
new file mode 100644
index 0000000000..f66f4b8e13
--- /dev/null
+++ b/_field-types/metadata-fields/id.md
@@ -0,0 +1,86 @@
+---
+layout: default
+title: ID
+nav_order: 20
+parent: Metadata fields
+---
+
+# ID
+
+Each document in OpenSearch has a unique `_id` field. This field is indexed, allowing you to retrieve documents using the GET API or the [`ids` query]({{site.url}}{{site.baseurl}}/query-dsl/term/ids/).
+
+If you do not provide an `_id` value, then OpenSearch automatically generates one for the document. 
+{: .note}
+
+The following example request creates an index named `test-index1` and adds two documents with different `_id` values:
+
+```json
+PUT test-index1/_doc/1
+{
+  "text": "Document with ID 1"
+}
+
+PUT test-index1/_doc/2?refresh=true
+{
+  "text": "Document with ID 2"
+}
+```
+{% include copy-curl.html %}
+
+You can then query the documents using the `_id` field, as shown in the following example request:
+
+```json
+GET test-index1/_search
+{
+  "query": {
+    "terms": {
+      "_id": ["1", "2"]
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+The response returns both documents with `_id` values of `1` and `2`:
+
+```json
+{
+  "took": 10,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 2,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": "test-index1",
+        "_id": "1",
+        "_score": 1,
+        "_source": {
+          "text": "Document with ID 1"
+        }
+      },
+      {
+        "_index": "test-index1",
+        "_id": "2",
+        "_score": 1,
+        "_source": {
+          "text": "Document with ID 2"
+        }
+      }
+    ]
+  }
+}
+```
+{% include copy-curl.html %}
+
+## Limitations of the `_id` field
+
+While the `_id` field can be used in various queries, it is restricted from use in aggregations, sorting, and scripting. If you need to sort or aggregate on the `_id` field, it is recommended to duplicate the `_id` content into another field with `doc_values` enabled. Refer to [IDs query]({{site.url}}{{site.baseurl}}/query-dsl/term/ids/) for an example.
diff --git a/_field-types/metadata-fields/ignored.md b/_field-types/metadata-fields/ignored.md
new file mode 100644
index 0000000000..e867cfc754
--- /dev/null
+++ b/_field-types/metadata-fields/ignored.md
@@ -0,0 +1,147 @@
+---
+layout: default
+title: Ignored
+nav_order: 25
+parent: Metadata fields
+---
+
+# Ignored
+
+The `_ignored` field helps you manage issues related to malformed data in your documents. This field is used to index and store the names of fields that were ignored during the indexing process as a result of the `ignore_malformed` setting being enabled in the [index mapping]({{site.url}}{{site.baseurl}}/field-types/).
+
+The `_ignored` field allows you to search for documents that contain ignored fields, as well as to search for the specific field names that were ignored. This can be useful for troubleshooting.
+
+You can query the `_ignored` field using the `term`, `terms`, and `exists` queries, and the results will be included in the search hits.
+
+The `_ignored` field is only populated when the `ignore_malformed` setting is enabled in your index mapping. If `ignore_malformed` is set to `false` (the default value), then malformed fields will cause the entire document to be rejected, and the `_ignored` field will not be populated. 
+{: .note}
+
+The following example request shows you how to use the `_ignored` field:
+
+```json
+GET _search
+{
+  "query": {
+    "exists": {
+      "field": "_ignored"
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+---
+
+#### Example indexing request with the `_ignored` field
+
+The following example request adds a new document to the `test-ignored` index with `ignore_malformed` set to `true` so that no error is thrown during indexing:
+
+```json
+PUT test-ignored
+{
+  "mappings": {
+    "properties": {
+      "title": {
+        "type": "text"
+      },
+      "length": {
+        "type": "long",
+        "ignore_malformed": true
+      }
+    }
+  }
+}
+
+POST test-ignored/_doc
+{
+  "title": "correct text",
+  "length": "not a number"
+}
+
+GET test-ignored/_search
+{
+  "query": {
+    "exists": {
+      "field": "_ignored"
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+#### Example response
+
+```json
+{
+  "took": 42,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": "test-ignored",
+        "_id": "qcf0wZABpEYH7Rw9OT7F",
+        "_score": 1,
+        "_ignored": [
+          "length"
+        ],
+        "_source": {
+          "title": "correct text",
+          "length": "not a number"
+        }
+      }
+    ]
+  }
+}
+```
+
+---
+
+## Searching for a specific ignored field
+
+You can use a `term` query to find documents in which a specific field was ignored, as shown in the following example request:
+
+```json
+GET _search
+{
+  "query": {
+    "term": {
+      "_ignored": "created_at"
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+#### Response
+
+```json
+{
+  "took": 51,
+  "timed_out": false,
+  "_shards": {
+    "total": 45,
+    "successful": 45,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 0,
+      "relation": "eq"
+    },
+    "max_score": null,
+    "hits": []
+  }
+}
+```
diff --git a/_field-types/metadata-fields/index-metadata.md b/_field-types/metadata-fields/index-metadata.md
new file mode 100644
index 0000000000..657f7d62a5
--- /dev/null
+++ b/_field-types/metadata-fields/index-metadata.md
@@ -0,0 +1,86 @@
+---
+layout: default
+title: Index
+nav_order: 25
+parent: Metadata fields
+---
+
+# Index
+
+When querying across multiple indexes, you may need to filter results based on the index into which a document was indexed. The `_index` field matches documents based on their index.
+
+The following example request creates two indexes, `products` and `customers`, and adds a document to each index:
+
+```json
+PUT products/_doc/1
+{
+  "name": "Widget X"
+}
+
+PUT customers/_doc/2
+{
+  "name": "John Doe"
+}
+```
+{% include copy-curl.html %}
+
+You can then query both indexes and filter the results using the `_index` field, as shown in the following example request:
+
+```json
+GET products,customers/_search
+{
+  "query": {
+    "terms": {
+      "_index": ["products", "customers"]
+    }
+  },
+  "aggs": {
+    "index_groups": {
+      "terms": {
+        "field": "_index",
+        "size": 10
+      }
+    }
+  },
+  "sort": [
+    {
+      "_index": {
+        "order": "desc"
+      }
+    }
+  ],
+  "script_fields": {
+    "index_name": {
+      "script": {
+        "lang": "painless",
+        "source": "doc['_index'].value"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+In this example:
+
+- The `query` section uses a `terms` query to match documents from the `products` and `customers` indexes.
+- The `aggs` section performs a `terms` aggregation on the `_index` field, grouping the results by index.
+- The `sort` section sorts the results by the `_index` field in descending order. 
+- The `script_fields` section adds a new field called `index_name` to the search results containing the `_index` field value for each document.
+
+## Querying on the `_index` field
+
+The `_index` field represents the index into which a document was indexed. You can use this field in your queries to filter, aggregate, sort, or retrieve index information for your search results.
+
+Because the `_index` field is automatically added to every document, you can use it in your queries like any other field. For example, you can use the `terms` query to match documents from multiple indexes. The following example query returns all documents from the `products` and `customers` indexes:
+
+```json
+GET products,customers/_search
+{
+  "query": {
+    "terms": {
+      "_index": ["products", "customers"]
+    }
+  }
+}
+```
+{% include copy-curl.html %}
diff --git a/_field-types/metadata-fields/index.md b/_field-types/metadata-fields/index.md
new file mode 100644
index 0000000000..cdc079e1e5
--- /dev/null
+++ b/_field-types/metadata-fields/index.md
@@ -0,0 +1,21 @@
+---
+layout: default
+title: Metadata fields
+nav_order: 90
+has_children: true
+has_toc: false
+---
+
+# Metadata fields
+
+OpenSearch provides built-in metadata fields that allow you to access information about the documents in an index. These fields can be used in your queries as needed.
+
+Metadata field | Description
+:--- | :---
+`_field_names` | The document fields with non-empty or non-null values.
+`_ignored` | The document fields that were ignored during the indexing process due to the presence of malformed data, as specified by the `ignore_malformed` setting.
+`_id` | The unique identifier assigned to each document.
+`_index` | The index in which the document is stored.
+`_meta` | Stores custom metadata or additional information specific to the application or use case.
+`_routing` | Allows you to specify a custom value that determines the shard assignment for a document in an OpenSearch cluster.
+`_source` | Contains the original JSON representation of the document data.
diff --git a/_field-types/metadata-fields/meta.md b/_field-types/metadata-fields/meta.md
new file mode 100644
index 0000000000..220d58f106
--- /dev/null
+++ b/_field-types/metadata-fields/meta.md
@@ -0,0 +1,87 @@
+---
+layout: default
+title: Meta
+nav_order: 30
+parent: Metadata fields
+---
+
+# Meta
+
+The `_meta` field is a mapping property that allows you to attach custom metadata to your index mappings. This metadata can be used by your application to store information relevant to your use case, such as versioning, ownership, categorization, or auditing.
+
+## Usage
+
+You can define the `_meta` field when creating a new index or updating an existing index's mapping, as shown in the following example request:
+
+```json
+PUT my-index
+{
+  "mappings": {
+    "_meta": {
+      "application": "MyApp",
+      "version": "1.2.3",
+      "author": "John Doe"
+    },
+    "properties": {
+      "title": {
+        "type": "text"
+      },
+      "description": {
+        "type": "text"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+In this example, three custom metadata fields are added: `application`, `version`, and `author`. These fields can be used by your application to store any relevant information about the index, such as the application it belongs to, the application version, or the author of the index. 
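+
+OpenSearch does not interpret the contents of `_meta`; the object is stored and returned as is, so it can hold arbitrary JSON, including nested objects. For example, a hypothetical mapping (the index name and metadata keys are illustrative) might group related metadata under a single key:
+
+```json
+PUT my-index-2
+{
+  "mappings": {
+    "_meta": {
+      "pipeline": {
+        "name": "logs-enrichment",
+        "last_updated": "2024-05-01"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}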
+
+You can update the `_meta` field using the [Put Mapping API]({{site.url}}{{site.baseurl}}/api-reference/index-apis/put-mapping/) operation, as shown in the following example request:
+
+```json
+PUT my-index/_mapping
+{
+  "_meta": {
+    "application": "MyApp",
+    "version": "1.3.0",
+    "author": "Jane Smith"
+  }
+}
+```
+{% include copy-curl.html %}
+
+## Retrieving `_meta` information
+
+You can retrieve the `_meta` information for an index using the [Get Mapping API]({{site.url}}{{site.baseurl}}/field-types/#get-a-mapping) operation, as shown in the following example request:
+
+```json
+GET my-index/_mapping
+```
+{% include copy-curl.html %}
+
+The response returns the full index mapping, including the `_meta` field:
+
+```json
+{
+  "my-index": {
+    "mappings": {
+      "_meta": {
+        "application": "MyApp",
+        "version": "1.3.0",
+        "author": "Jane Smith"
+      },
+      "properties": {
+        "description": {
+          "type": "text"
+        },
+        "title": {
+          "type": "text"
+        }
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
diff --git a/_field-types/metadata-fields/routing.md b/_field-types/metadata-fields/routing.md
new file mode 100644
index 0000000000..9064e20c49
--- /dev/null
+++ b/_field-types/metadata-fields/routing.md
@@ -0,0 +1,92 @@
+---
+layout: default
+title: Routing
+nav_order: 35
+parent: Metadata fields
+---
+
+# Routing
+
+OpenSearch uses a hashing algorithm to route documents to specific shards in an index. By default, the document's `_id` field is used as the routing value, but you can also specify a custom routing value for each document.
+
+## Default routing
+
+The following is the default OpenSearch routing formula. The `_routing` value is the document's `_id`.
+
+```
+shard_num = hash(_routing) % num_primary_shards
+```
+
+## Custom routing
+
+You can specify a custom routing value when indexing a document, as shown in the following example request:
+
+```json
+PUT sample-index1/_doc/1?routing=JohnDoe1
+{
+  "title": "This is a document"
+}
+```
+{% include copy-curl.html %}
+
+In this example, the document is routed using the value `JohnDoe1` instead of the default `_id`.
+
+You must provide the same routing value when retrieving, deleting, or updating the document, as shown in the following example request:
+
+```json
+GET sample-index1/_doc/1?routing=JohnDoe1
+```
+{% include copy-curl.html %}
+
+## Querying by routing
+
+You can query documents based on their routing value by using the `_routing` field, as shown in the following example. This query only searches the shard(s) associated with the `JohnDoe1` routing value:
+
+```json
+GET sample-index1/_search
+{
+  "query": {
+    "terms": {
+      "_routing": [ "JohnDoe1" ]
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+## Required routing
+
+You can make custom routing required for all CRUD operations on an index, as shown in the following example request. If you try to index a document without providing a routing value, OpenSearch will throw an exception.
+
+```json
+PUT sample-index2
+{
+  "mappings": {
+    "_routing": {
+      "required": true
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+## Routing to specific shards
+
+You can configure an index to route custom values to a subset of shards rather than a single shard. This is done by setting `index.routing_partition_size` at the time of index creation. The formula for calculating the shard is `shard_num = (hash(_routing) + hash(_id) % routing_partition_size) % num_primary_shards`. 
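+
+As an illustration, assume hypothetical hash values `hash(_routing) = 10` and `hash(_id) = 7` for an index with `routing_partition_size = 4` and 8 primary shards:
+
+```
+shard_num = (10 + (7 % 4)) % 8
+          = (10 + 3) % 8
+          = 5
+```
+
+Because `hash(_id) % routing_partition_size` can take the values 0 through 3, documents that share a routing value are spread across 4 of the 8 primary shards, with each document's `_id` determining which of the 4 shards it lands on.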
+ +The following example request routes documents to one of four shards in the index: + +```json +PUT sample-index3 +{ + "settings": { + "index.routing_partition_size": 4 + }, + "mappings": { + "_routing": { + "required": true + } + } +} +``` +{% include copy-curl.html %} diff --git a/_field-types/metadata-fields/source.md b/_field-types/metadata-fields/source.md new file mode 100644 index 0000000000..c9e714f43c --- /dev/null +++ b/_field-types/metadata-fields/source.md @@ -0,0 +1,54 @@ +--- +layout: default +title: Source +nav_order: 40 +parent: Metadata fields +--- + +# Source + +The `_source` field contains the original JSON document body that was indexed. While this field is not searchable, it is stored so that the full document can be returned when executing fetch requests, such as `get` and `search`. + +## Disabling the field + +You can disable the `_source` field by setting the `enabled` parameter to `false`, as shown in the following example request: + +```json +PUT sample-index1 +{ + "mappings": { + "_source": { + "enabled": false + } + } +} +``` +{% include copy-curl.html %} + +Disabling the `_source` field can impact the availability of certain features, such as the `update`, `update_by_query`, and `reindex` APIs, as well as the ability to debug queries or aggregations using the original indexed document. +{: .warning} + +## Including or excluding fields + +You can selectively control the contents of the `_source` field by using the `includes` and `excludes` parameters. This allows you to prune the stored `_source` field after it is indexed but before it is saved, as shown in the following example request: + +```json +PUT logs +{ + "mappings": { + "_source": { + "includes": [ + "*.count", + "meta.*" + ], + "excludes": [ + "meta.description", + "meta.other.*" + ] + } + } +} +``` +{% include copy-curl.html %} + +These fields are not stored in the `_source`, but you can still search them because the data remains indexed. diff --git a/_field-types/supported-field-types/derived.md b/_field-types/supported-field-types/derived.md index 2ca00927d1..d989c3e4a4 100644 --- a/_field-types/supported-field-types/derived.md +++ b/_field-types/supported-field-types/derived.md @@ -69,7 +69,7 @@ PUT logs } } }, - "client_ip": { + "clientip": { "type": "keyword" } } diff --git a/_getting-started/intro.md b/_getting-started/intro.md index edd178a23f..f5eb24ba2b 100644 --- a/_getting-started/intro.md +++ b/_getting-started/intro.md @@ -56,6 +56,7 @@ ID | Name | GPA | Graduation year 1 | John Doe | 3.89 | 2022 2 | Jonathan Powers | 3.85 | 2025 3 | Jane Doe | 3.52 | 2024 +... | | | ## Clusters and nodes diff --git a/_includes/header.html b/_includes/header.html index b7dce4c317..20d82c451e 100644 --- a/_includes/header.html +++ b/_includes/header.html @@ -1,3 +1,71 @@ +{% assign url_parts = page.url | split: "/" %} +{% if url_parts.size > 0 %} + + {% assign last_url_part = url_parts | last %} + + {% comment %} Does the URL contain a filename, and is it an index.html or not? 
{% endcomment %}
+  {% if last_url_part contains ".html" %}
+    {% assign url_has_filename = true %}
+    {% if last_url_part == 'index.html' %}
+      {% assign url_filename_is_index = true %}
+    {% else %}
+      {% assign url_filename_is_index = false %}
+    {% endif %}
+  {% else %}
+    {% assign url_has_filename = false %}
+  {% endif %}
+
+  {% comment %}
+  OpenSearchCon URLs require some special consideration, because it's a specialization
+  of the /events URL, which is itself a child of Community; the OpenSearchCon menu is NOT
+  a child of Community.
+  {% endcomment %}
+  {% if page.url contains "opensearchcon" %}
+    {% assign is_conference_page = true %}
+  {% else %}
+    {% assign is_conference_page = false %}
+  {% endif %}
+
+  {% if is_conference_page %}
+    {% comment %}
+    If the page is a conference page and it has a filename, then it's the penultimate
+    path component that has the child menu item of the OpenSearchCon that needs
+    to be marked as in-category. If there's no filename, then reference the ultimate
+    path component.
+    Unless the filename is opensearchcon2023-cfp, because it's a one-off that is not
+    within the /events/opensearchcon/... structure.
+    {% endcomment %}
+    {% if url_has_filename %}
+      {% unless page.url contains 'opensearchcon2023-cfp' %}
+        {% assign url_fragment_index = url_parts | size | minus: 2 %}
+        {% assign url_fragment = url_parts[url_fragment_index] %}
+      {% else %}
+        {% assign url_fragment = 'opensearchcon2023-cfp' %}
+      {% endunless %}
+    {% else %}
+      {% assign url_fragment = last_url_part %}
+    {% endif %}
+  {% else %}
+    {% comment %}
+    If the page is NOT a conference page, the URL has a filename, and the filename
+    is NOT index.html, then refer to the filename without the .html extension.
+    If the filename is index.html, then refer to the penultimate path component.
+    If there is no filename, then refer to the ultimate path component.
+    {% endcomment %}
+    {% if url_has_filename %}
+      {% unless url_filename_is_index %}
+        {% assign url_fragment = last_url_part | replace: '.html', '' %}
+      {% else %}
+        {% assign url_fragment_index = url_parts | size | minus: 2 %}
+        {% assign url_fragment = url_parts[url_fragment_index] %}
+      {% endunless %}
+    {% else %}
+      {% assign url_fragment = last_url_part %}
+    {% endif %}
+  {% endif %}
+{% else %}
+  {% assign url_fragment = '' %}
+{% endif %}
{% if page.alert %}