diff --git a/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt b/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt index 9e09f21c3a..11ff53efe6 100644 --- a/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt +++ b/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt @@ -80,6 +80,7 @@ Levenshtein [Oo]versamples? [Oo]nboarding pebibyte +p\d{2} [Pp]erformant [Pp]laintext [Pp]luggable @@ -126,6 +127,7 @@ stdout [Ss]ubvector [Ss]ubwords? [Ss]uperset +[Ss]uperadmins? [Ss]yslog tebibyte [Tt]emplated diff --git a/_about/index.md b/_about/index.md index d2cc011b55..041197eeba 100644 --- a/_about/index.md +++ b/_about/index.md @@ -22,16 +22,21 @@ This section contains documentation for OpenSearch and OpenSearch Dashboards. ## Getting started -- [Intro to OpenSearch]({{site.url}}{{site.baseurl}}/intro/) -- [Quickstart]({{site.url}}{{site.baseurl}}/quickstart/) +To get started, explore the following documentation: + +- [Getting started guide]({{site.url}}{{site.baseurl}}/getting-started/): + - [Intro to OpenSearch]({{site.url}}{{site.baseurl}}/getting-started/intro/) + - [Installation quickstart]({{site.url}}{{site.baseurl}}/getting-started/quickstart/) + - [Communicate with OpenSearch]({{site.url}}{{site.baseurl}}/getting-started/communicate/) + - [Ingest data]({{site.url}}{{site.baseurl}}/getting-started/ingest-data/) + - [Search data]({{site.url}}{{site.baseurl}}/getting-started/search-data/) + - [Getting started with OpenSearch security]({{site.url}}{{site.baseurl}}/getting-started/security/) - [Install OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/install-opensearch/index/) - [Install OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/install-and-configure/install-dashboards/index/) -- [See the FAQ](https://opensearch.org/faq) +- [FAQ](https://opensearch.org/faq) ## Why use OpenSearch? -With OpenSearch, you can perform the following use cases: - @@ -41,35 +46,38 @@ With OpenSearch, you can perform the following use cases: - - - - + + + + - + - +
Operational health tracking
Fast, Scalable Full-text SearchApplication and Infrastructure MonitoringSecurity and Event Information ManagementOperational Health TrackingFast, scalable full-text searchApplication and infrastructure monitoringSecurity and event information managementOperational health tracking
Help users find the right information within your application, website, or data lake catalog. Easily store and analyze log data, and set automated alerts for underperformance.Easily store and analyze log data, and set automated alerts for performance issues. Centralize logs to enable real-time security monitoring and forensic analysis.Use observability logs, metrics, and traces to monitor your applications and business in real time.Use observability logs, metrics, and traces to monitor your applications in real time.
-**Additional features and plugins:** +## Key features + +OpenSearch provides several features to help index, secure, monitor, and analyze your data: -OpenSearch has several features and plugins to help index, secure, monitor, and analyze your data. Most OpenSearch plugins have corresponding OpenSearch Dashboards plugins that provide a convenient, unified user interface. -- [Anomaly detection]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/) - Identify atypical data and receive automatic notifications -- [KNN]({{site.url}}{{site.baseurl}}/search-plugins/knn/) - Find “nearest neighbors” in your vector data -- [Performance Analyzer]({{site.url}}{{site.baseurl}}/monitoring-plugins/pa/) - Monitor and optimize your cluster -- [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/) - Use SQL or a piped processing language to query your data -- [Index State Management]({{site.url}}{{site.baseurl}}/im-plugin/) - Automate index operations -- [ML Commons plugin]({{site.url}}{{site.baseurl}}/ml-commons-plugin/index/) - Train and execute machine-learning models -- [Asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/) - Run search requests in the background -- [Cross-cluster replication]({{site.url}}{{site.baseurl}}/replication-plugin/index/) - Replicate your data across multiple OpenSearch clusters +- [Anomaly detection]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/) -- Identify atypical data and receive automatic notifications. +- [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/) -- Use SQL or a Piped Processing Language (PPL) to query your data. +- [Index State Management]({{site.url}}{{site.baseurl}}/im-plugin/) -- Automate index operations. +- [Search methods]({{site.url}}{{site.baseurl}}/search-plugins/knn/) -- From traditional lexical search to advanced vector and hybrid search, discover the optimal search method for your use case. +- [Machine learning]({{site.url}}{{site.baseurl}}/ml-commons-plugin/index/) -- Integrate machine learning models into your workloads. +- [Workflow automation]({{site.url}}{{site.baseurl}}/automating-configurations/index/) -- Automate complex OpenSearch setup and preprocessing tasks. +- [Performance evaluation]({{site.url}}{{site.baseurl}}/monitoring-plugins/pa/) -- Monitor and optimize your cluster. +- [Asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/) -- Run search requests in the background. +- [Cross-cluster replication]({{site.url}}{{site.baseurl}}/replication-plugin/index/) -- Replicate your data across multiple OpenSearch clusters. ## The secure path forward -OpenSearch includes a demo configuration so that you can get up and running quickly, but before using OpenSearch in a production environment, you must [configure the Security plugin manually]({{site.url}}{{site.baseurl}}/security/configuration/index/) with your own certificates, authentication method, users, and passwords. + +OpenSearch includes a demo configuration so that you can get up and running quickly, but before using OpenSearch in a production environment, you must [configure the Security plugin manually]({{site.url}}{{site.baseurl}}/security/configuration/index/) with your own certificates, authentication method, users, and passwords. To get started, see [Getting started with OpenSearch security]({{site.url}}{{site.baseurl}}/getting-started/security/). ## Looking for the Javadoc? diff --git a/_about/version-history.md b/_about/version-history.md index d7273ffedb..fd635aff5b 100644 --- a/_about/version-history.md +++ b/_about/version-history.md @@ -31,6 +31,7 @@ OpenSearch version | Release highlights | Release date [2.0.1](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.0.1.md) | Includes bug fixes and maintenance updates for Alerting and Anomaly Detection. | 16 June 2022 [2.0.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.0.0.md) | Includes document-level monitors for alerting, OpenSearch Notifications plugins, and Geo Map Tiles in OpenSearch Dashboards. Also adds support for Lucene 9 and bug fixes for all OpenSearch plugins. For a full list of release highlights, see the Release Notes. | 26 May 2022 [2.0.0-rc1](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.0.0-rc1.md) | The Release Candidate for 2.0.0. This version allows you to preview the upcoming 2.0.0 release before the GA release. The preview release adds document-level alerting, support for Lucene 9, and the ability to use term lookup queries in document level security. | 03 May 2022 +[1.3.19](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.19.md) | Includes bug fixes and maintenance updates for OpenSearch security, OpenSearch security Dashboards, and anomaly detection. | 27 August 2024 [1.3.18](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.18.md) | Includes maintenance updates for OpenSearch security. | 16 July 2024 [1.3.17](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.17.md) | Includes maintenance updates for OpenSearch security and OpenSearch Dashboards security. | 06 June 2024 [1.3.16](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.16.md) | Includes bug fixes and maintenance updates for OpenSearch security, index management, performance analyzer, and reporting. | 23 April 2024 diff --git a/_api-reference/validate.md b/_api-reference/validate.md new file mode 100644 index 0000000000..6e1470a505 --- /dev/null +++ b/_api-reference/validate.md @@ -0,0 +1,205 @@ +--- +layout: default +title: Validate Query +nav_order: 87 +--- + +# Validate Query + +You can use the Validate Query API to validate a query without running it. The query can be sent as a path parameter or included in the request body. + +## Path and HTTP methods + +The Validate Query API contains the following path: + +```json +GET /_validate/query +``` + +## Path parameters + +All path parameters are optional. + +Parameter | Data type | Description +:--- | :--- | :--- +`index` | String | The index to validate the query against. If you don't specify an index or multiple indexes as part of the URL (or want to override the URL value for an individual search), you can include it here. Examples include `"logs-*"` and `["my-store", "sample_data_ecommerce"]`. +`query` | Query object | The query using [Query DSL]({{site.url}}{{site.baseurl}}/query-dsl/). + +## Query parameters + +The following table lists the available query parameters. All query parameters are optional. + +Parameter | Data type | Description +:--- | :--- | :--- +`all_shards` | Boolean | When `true`, validation is run against [all shards](#rewrite-and-all_shards) instead of against one shard per index. Default is `false`. +`allow_no_indices` | Boolean | Whether to ignore wildcards that don't match any indexes. Default is `true`. +allow_partial_search_results | Boolean | Whether to return partial results if the request encounters an error or times out. Default is `true`. +`analyzer` | String | The analyzer to use in the query string. This should only be used with the `q` option. +`analyze_wildcard` | Boolean | Specifies whether to analyze wildcard and prefix queries. Default is `false`. +`default_operator` | String | Indicates whether the default operator for a string query should be `AND` or `OR`. Default is `OR`. +`df` | String | The default field if a field prefix is not provided in the query string. +`expand_wildcards` | String | Specifies the type of index that wildcard expressions can match. Supports comma-separated values. Valid values are `all` (match any index), `open` (match open, non-hidden indexes), `closed` (match closed, non-hidden indexes), `hidden` (match hidden indexes), and `none` (deny wildcard expressions). Default is `open`. +`explain` | Boolean | Whether to return information about how OpenSearch computed the [document's score](#explain). Default is `false`. +`ignore_unavailable` | Boolean | Specifies whether to include missing or closed indexes in the response and ignores unavailable shards during the search request. Default is `false`. +`lenient` | Boolean | Specifies whether OpenSearch should ignore format-based query failures (for example, as a result of querying a text field for an integer). Default is `false`. +`rewrite` | Determines how OpenSearch [rewrites](#rewrite) and scores multi-term queries. Valid values are `constant_score`, `scoring_boolean`, `constant_score_boolean`, `top_terms_N`, `top_terms_boost_N`, and `top_terms_blended_freqs_N`. Default is `constant_score`. +`q` | String | A query in the Lucene string syntax. + +## Example request + +The following example request uses an index named `Hamlet` created using a `bulk` request: + +```json +PUT hamlet/_bulk?refresh +{"index":{"_id":1}} +{"user" : { "id": "hamlet" }, "@timestamp" : "2099-11-15T14:12:12", "message" : "To Search or Not To Search"} +{"index":{"_id":2}} +{"user" : { "id": "hamlet" }, "@timestamp" : "2099-11-15T14:12:13", "message" : "My dad says that I'm such a ham."} +``` +{% include copy.html %} + +You can then use the Validate Query API to validate an index query, as shown in the following example: + +```json +GET hamlet/_validate/query?q=user.id:hamlet +``` +{% include copy.html %} + +The query can also be sent as a request body, as shown in the following example: + +```json +GET hamlet/_validate/query +{ + "query" : { + "bool" : { + "must" : { + "query_string" : { + "query" : "*:*" + } + }, + "filter" : { + "term" : { "user.id" : "hamlet" } + } + } + } +} +``` +{% include copy.html %} + + +## Example responses + +If the query passes validation, then the response indicates that the query is `true`, as shown in the following example response, where the `valid` parameter is `true`: + +``` +{ + "_shards": { + "total": 1, + "successful": 1, + "failed": 0 + }, + "valid": true +} +``` + +If the query does not pass validation, then OpenSearch responds that the query is `false`. The following example request query includes a dynamic mapping not configured in the `hamlet` index: + +```json +GET hamlet/_validate/query +{ + "query": { + "query_string": { + "query": "@timestamp:foo", + "lenient": false + } + } +} +``` + +OpenSearch responds with the following, where the `valid` parameter is `false`: + +``` +{ + "_shards": { + "total": 1, + "successful": 1, + "failed": 0 + }, + "valid": false +} +``` + +Certain query parameters can also affect what is included in the response. The following examples show how the [Explain](#explain), [Rewrite](#rewrite), and [all_shards](#rewrite-and-all_shards) query options affect the response. + +### Explain + +The `explain` option returns information about the query failure in the `explanations` field, as shown in the following example response: + +``` +{ + "valid" : false, + "_shards" : { + "total" : 1, + "successful" : 1, + "failed" : 0 + }, + "explanations" : [ { + "index" : "_shakespeare", + "valid" : false, + "error" : "shakespeare/IAEc2nIXSSunQA_suI0MLw] QueryShardException[failed to create query:...failed to parse date field [foo]" + } ] +} +``` + + +### Rewrite + +When the `rewrite` option is set to `true` in the request, the `explanations` option shows the Lucene query that is executed as a string, as shown in the following response: + +``` +{ + "valid": true, + "_shards": { + "total": 1, + "successful": 1, + "failed": 0 + }, + "explanations": [ + { + "index": "", + "valid": true, + "explanation": "((user:hamlet^4.256753 play:hamlet^6.863601 play:romeo^2.8415773 plot:puck^3.4193945 plot:othello^3.8244398 ... )~4) -ConstantScore(_id:2) #(ConstantScore(_type:_doc))^0.0" + } + ] +} +``` + + +### Rewrite and all_shards + +When both the `rewrite` and `all_shards` options are set to `true`, the Validate Query API responds with detailed information from all available shards as opposed to only one shard (the default), as shown in the following response: + +``` +{ + "valid": true, + "_shards": { + "total": 1, + "successful": 1, + "failed": 0 + }, + "explanations": [ + { + "index": "my-index-000001", + "shard": 0, + "valid": true, + "explanation": "(user.id:hamlet)^0.6333333" + } + ] +} +``` + + + + + + diff --git a/_automating-configurations/api/index.md b/_automating-configurations/api/index.md index 716e19c41f..78bc4eaede 100644 --- a/_automating-configurations/api/index.md +++ b/_automating-configurations/api/index.md @@ -18,4 +18,6 @@ OpenSearch supports the following workflow APIs: * [Search workflow]({{site.url}}{{site.baseurl}}/automating-configurations/api/search-workflow/) * [Search workflow state]({{site.url}}{{site.baseurl}}/automating-configurations/api/search-workflow-state/) * [Deprovision workflow]({{site.url}}{{site.baseurl}}/automating-configurations/api/deprovision-workflow/) -* [Delete workflow]({{site.url}}{{site.baseurl}}/automating-configurations/api/delete-workflow/) \ No newline at end of file +* [Delete workflow]({{site.url}}{{site.baseurl}}/automating-configurations/api/delete-workflow/) + +For information about workflow access control, see [Workflow template security]({{site.url}}{{site.baseurl}}/automating-configurations/workflow-security/). \ No newline at end of file diff --git a/_automating-configurations/index.md b/_automating-configurations/index.md index 144ad445c8..68742f6149 100644 --- a/_automating-configurations/index.md +++ b/_automating-configurations/index.md @@ -44,3 +44,4 @@ Workflow automation provides the following benefits: - For the workflow step syntax, see [Workflow steps]({{site.url}}{{site.baseurl}}/automating-configurations/workflow-steps/). - For a complete example, see [Workflow tutorial]({{site.url}}{{site.baseurl}}/automating-configurations/workflow-tutorial/). - For configurable settings, see [Workflow settings]({{site.url}}{{site.baseurl}}/automating-configurations/workflow-settings/). +- For information about workflow access control, see [Workflow template security]({{site.url}}{{site.baseurl}}/automating-configurations/workflow-security/). \ No newline at end of file diff --git a/_automating-configurations/workflow-security.md b/_automating-configurations/workflow-security.md new file mode 100644 index 0000000000..f3a3d7eeb9 --- /dev/null +++ b/_automating-configurations/workflow-security.md @@ -0,0 +1,93 @@ +--- +layout: default +title: Workflow template security +nav_order: 50 +--- + +# Workflow template security + +In OpenSearch, automated workflow configurations are provided by the Flow Framework plugin. You can use the Security plugin together with the Flow Framework plugin to limit non-admin users to specific actions. For example, you might want some users to only be able to create, update, or delete workflows, while others may only be able to view workflows. + +All Flow Framework indexes are protected as system indexes. Only a superadmin user or an admin user with a TLS certificate can access system indexes. For more information, see [System indexes]({{site.url}}{{site.baseurl}}/security/configuration/system-indices/). + +Security for Flow Framework is set up similarly to [security for anomaly detection]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/security/). + +## Basic permissions + +As an admin user, you can use the Security plugin to assign specific permissions to users based on the APIs they need to access. For a list of supported Flow Framework APIs, see [Workflow APIs]({{site.url}}{{site.baseurl}}/automating-configurations/api/index/). + +The Security plugin has two built-in roles that cover most Flow Framework use cases: `flow_framework_full_access` and `flow_framework_read_access`. For descriptions of each, see [Predefined roles]({{site.url}}{{site.baseurl}}/security/access-control/users-roles#predefined-roles). + +If these roles don't meet your needs, you can assign users individual Flow Framework [permissions]({{site.url}}{{site.baseurl}}/security/access-control/permissions/) to suit your use case. Each action corresponds to an operation in the REST API. For example, the `cluster:admin/opensearch/flow_framework/workflow/search` permission lets you search workflows. + +### Fine-grained access control + +To reduce the chances of unintended users viewing metadata that describes an index, we recommend that administrators enable role-based access control when assigning permissions to the intended user group. For more information, see [Limit access by backend role](#advanced-limit-access-by-backend-role). + +## (Advanced) Limit access by backend role + +Use backend roles to configure fine-grained access to individual workflows based on roles. For example, users in different departments of an organization can view workflows owned by their own department. + +First, make sure your users have the appropriate [backend roles]({{site.url}}{{site.baseurl}}/security/access-control/index/). Backend roles usually come from an [LDAP server]({{site.url}}{{site.baseurl}}/security/configuration/ldap/) or [SAML provider]({{site.url}}{{site.baseurl}}/security/configuration/saml/), but if you use an internal user database, you can [create users manually using the API]({{site.url}}{{site.baseurl}}/security/access-control/api#create-user). + +Next, enable the following setting: + +```json +PUT _cluster/settings +{ + "transient": { + "plugins.flow_framework.filter_by_backend_roles": "true" + } +} +``` +{% include copy-curl.html %} + +Now when users view workflow resources in OpenSearch Dashboards (or make REST API calls), they only see workflows created by users who share at least one backend role. + +For example, consider two users: `alice` and `bob`. + +`alice` has an `analyst` backend role: + +```json +PUT _plugins/_security/api/internalusers/alice +{ + "password": "alice", + "backend_roles": [ + "analyst" + ], + "attributes": {} +} +``` + +`bob` has a `human-resources` backend role: + +```json +PUT _plugins/_security/api/internalusers/bob +{ + "password": "bob", + "backend_roles": [ + "human-resources" + ], + "attributes": {} +} +``` + +Both `alice` and `bob` have full access to the Flow Framework APIs: + +```json +PUT _plugins/_security/api/rolesmapping/flow_framework_full_access +{ + "backend_roles": [], + "hosts": [], + "users": [ + "alice", + "bob" + ] +} +``` + +Because they have different backend roles, `alice` and `bob` cannot view each other's workflows or their results. + +Users without backend roles can still view other users' workflow results if they have `flow_framework_read_access`. This also applies to users who have `flow_framework_full_access` because this permission includes all of the permissions of `flow_framework_read_access`. + +Administrators should inform users that the `flow_framework_read_access` permission allows them to view the results of any workflow in a cluster, including data not directly accessible to them. To limit access to the results of a specific workflow, administrators should apply backend role filters when creating the workflow. This ensures that only users with matching backend roles can access that workflow's results. \ No newline at end of file diff --git a/_data-prepper/pipelines/configuration/sinks/s3.md b/_data-prepper/pipelines/configuration/sinks/s3.md index d1413f6ffc..3ff266cccf 100644 --- a/_data-prepper/pipelines/configuration/sinks/s3.md +++ b/_data-prepper/pipelines/configuration/sinks/s3.md @@ -173,14 +173,14 @@ When you provide your own Avro schema, that schema defines the final structure o In cases where your data is uniform, you may be able to automatically generate a schema. Automatically generated schemas are based on the first event that the codec receives. The schema will only contain keys from this event, and all keys must be present in all events in order to automatically generate a working schema. Automatically generated schemas make all fields nullable. Use the `include_keys` and `exclude_keys` sink configurations to control which data is included in the automatically generated schema. -Avro fields should use a null [union](https://avro.apache.org/docs/current/specification/#unions) because this will allow missing values. Otherwise, all required fields must be present for each event. Use non-nullable fields only when you are certain they exist. +Avro fields should use a null [union](https://avro.apache.org/docs/1.10.2/spec.html#Unions) because this will allow missing values. Otherwise, all required fields must be present for each event. Use non-nullable fields only when you are certain they exist. Use the following options to configure the codec. Option | Required | Type | Description :--- | :--- | :--- | :--- -`schema` | Yes | String | The Avro [schema declaration](https://avro.apache.org/docs/current/specification/#schema-declaration). Not required if `auto_schema` is set to true. -`auto_schema` | No | Boolean | When set to `true`, automatically generates the Avro [schema declaration](https://avro.apache.org/docs/current/specification/#schema-declaration) from the first event. +`schema` | Yes | String | The Avro [schema declaration](https://avro.apache.org/docs/1.2.0/spec.html#schemas). Not required if `auto_schema` is set to true. +`auto_schema` | No | Boolean | When set to `true`, automatically generates the Avro [schema declaration](https://avro.apache.org/docs/1.2.0/spec.html#schemas) from the first event. ### `ndjson` codec diff --git a/_getting-started/intro.md b/_getting-started/intro.md index edd178a23f..f5eb24ba2b 100644 --- a/_getting-started/intro.md +++ b/_getting-started/intro.md @@ -56,6 +56,7 @@ ID | Name | GPA | Graduation year 1 | John Doe | 3.89 | 2022 2 | Jonathan Powers | 3.85 | 2025 3 | Jane Doe | 3.52 | 2024 +... | | | ## Clusters and nodes diff --git a/_ml-commons-plugin/index.md b/_ml-commons-plugin/index.md index f0355b6be3..50d637379e 100644 --- a/_ml-commons-plugin/index.md +++ b/_ml-commons-plugin/index.md @@ -32,6 +32,10 @@ ML Commons supports various algorithms to help train ML models and make predicti ML Commons provides its own set of REST APIs. For more information, see [ML Commons API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/index/). +## ML-powered search + +For information about available ML-powered search types, see [ML-powered search]({{site.url}}{{site.baseurl}}/search-plugins/index/#ml-powered-search). + ## Tutorials Using the OpenSearch ML framework, you can build various applications, from implementing conversational search to building your own chatbot. For more information, see [Tutorials]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/index/). \ No newline at end of file diff --git a/_query-dsl/joining/index.md b/_query-dsl/joining/index.md new file mode 100644 index 0000000000..20f48c0b16 --- /dev/null +++ b/_query-dsl/joining/index.md @@ -0,0 +1,18 @@ +--- +layout: default +title: Joining queries +has_children: true +nav_order: 55 +--- + +# Joining queries + +OpenSearch is a distributed system in which data is spread across multiple nodes. Thus, running a SQL-like JOIN operation in OpenSearch is resource intensive. As an alternative, OpenSearch provides the following queries that perform join operations and are optimized for scaling across multiple nodes: + +- `nested` queries: Act as wrappers for other queries to search [nested]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/nested/) fields. The nested field objects are searched as though they were indexed as separate documents. +- `has_child` queries: Search for parent documents whose child documents match the query. +- `has_parent` queries: Search for child documents whose parent documents match the query. +- `parent_id` queries: A [join]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/nested/) field type establishes a parent/child relationship between documents in the same index. `parent_id` queries search for child documents that are joined to a specific parent document. + +If [`search.allow_expensive_queries`]({{site.url}}{{site.baseurl}}/query-dsl/index/#expensive-queries) is set to `false`, then joining queries are not executed. +{: .important} \ No newline at end of file diff --git a/_query-dsl/term/fuzzy.md b/_query-dsl/term/fuzzy.md index 7a426fd794..0e448bbbda 100644 --- a/_query-dsl/term/fuzzy.md +++ b/_query-dsl/term/fuzzy.md @@ -89,5 +89,5 @@ Parameter | Data type | Description Specifying a large value in `max_expansions` can lead to poor performance, especially if `prefix_length` is set to `0`, because of the large number of variations of the word that OpenSearch tries to match. {: .warning} -If [`search.allow_expensive_queries`]({{site.url}}{{site.baseurl}}/query-dsl/index/#expensive-queries) is set to `false`, fuzzy queries are not run. +If [`search.allow_expensive_queries`]({{site.url}}{{site.baseurl}}/query-dsl/index/#expensive-queries) is set to `false`, then fuzzy queries are not executed. {: .important} diff --git a/_query-dsl/term/prefix.md b/_query-dsl/term/prefix.md index 2a429c9f0e..087c26cc30 100644 --- a/_query-dsl/term/prefix.md +++ b/_query-dsl/term/prefix.md @@ -66,5 +66,5 @@ Parameter | Data type | Description `case_insensitive` | Boolean | If `true`, allows case-insensitive matching of the value with the indexed field values. Default is `false` (case sensitivity is determined by the field's mapping). `rewrite` | String | Determines how OpenSearch rewrites and scores multi-term queries. Valid values are `constant_score`, `scoring_boolean`, `constant_score_boolean`, `top_terms_N`, `top_terms_boost_N`, and `top_terms_blended_freqs_N`. Default is `constant_score`. -If [`search.allow_expensive_queries`]({{site.url}}{{site.baseurl}}/query-dsl/index/#expensive-queries) is set to `false`, prefix queries are not run. If `index_prefixes` is enabled, the `search.allow_expensive_queries` setting is ignored and an optimized query is built and run. +If [`search.allow_expensive_queries`]({{site.url}}{{site.baseurl}}/query-dsl/index/#expensive-queries) is set to `false`, then prefix queries are not executed. If `index_prefixes` is enabled, then the `search.allow_expensive_queries` setting is ignored and an optimized query is built and run. {: .important} diff --git a/_query-dsl/term/range.md b/_query-dsl/term/range.md index ceb264db76..1e6fece218 100644 --- a/_query-dsl/term/range.md +++ b/_query-dsl/term/range.md @@ -217,5 +217,5 @@ Parameter | Data type | Description `boost` | Floating-point | A floating-point value that specifies the weight of this field toward the relevance score. Values above 1.0 increase the field’s relevance. Values between 0.0 and 1.0 decrease the field’s relevance. Default is 1.0. `time_zone` | String | The time zone used to convert [`date`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date/) values to UTC in the query. Valid values are ISO 8601 [UTC offsets](https://en.wikipedia.org/wiki/List_of_UTC_offsets) and [IANA time zone IDs](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones). For more information, see [Time zone](#time-zone). -If [`search.allow_expensive_queries`]({{site.url}}{{site.baseurl}}/query-dsl/index/#expensive-queries) is set to `false`, range queries on [`text`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/text/) and [`keyword`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/keyword/) fields are not run. +If [`search.allow_expensive_queries`]({{site.url}}{{site.baseurl}}/query-dsl/index/#expensive-queries) is set to `false`, then range queries on [`text`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/text/) and [`keyword`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/keyword/) fields are not executed. {: .important} diff --git a/_query-dsl/term/regexp.md b/_query-dsl/term/regexp.md index 4a038729c0..34a0c916ce 100644 --- a/_query-dsl/term/regexp.md +++ b/_query-dsl/term/regexp.md @@ -61,5 +61,5 @@ Parameter | Data type | Description `max_determinized_states` | Integer | Lucene converts a regular expression to an automaton with a number of determinized states. This parameter specifies the maximum number of automaton states the query requires. Use this parameter to prevent high resource consumption. To run complex regular expressions, you may need to increase the value of this parameter. Default is 10,000. `rewrite` | String | Determines how OpenSearch rewrites and scores multi-term queries. Valid values are `constant_score`, `scoring_boolean`, `constant_score_boolean`, `top_terms_N`, `top_terms_boost_N`, and `top_terms_blended_freqs_N`. Default is `constant_score`. -If [`search.allow_expensive_queries`]({{site.url}}{{site.baseurl}}/query-dsl/index/#expensive-queries) is set to `false`, `regexp` queries are not run. +If [`search.allow_expensive_queries`]({{site.url}}{{site.baseurl}}/query-dsl/index/#expensive-queries) is set to `false`, then `regexp` queries are not executed. {: .important} diff --git a/_query-dsl/term/wildcard.md b/_query-dsl/term/wildcard.md index b2d7238758..c6e0499517 100644 --- a/_query-dsl/term/wildcard.md +++ b/_query-dsl/term/wildcard.md @@ -64,5 +64,5 @@ Parameter | Data type | Description `case_insensitive` | Boolean | If `true`, allows case-insensitive matching of the value with the indexed field values. Default is `false` (case sensitivity is determined by the field's mapping). `rewrite` | String | Determines how OpenSearch rewrites and scores multi-term queries. Valid values are `constant_score`, `scoring_boolean`, `constant_score_boolean`, `top_terms_N`, `top_terms_boost_N`, and `top_terms_blended_freqs_N`. Default is `constant_score`. -If [`search.allow_expensive_queries`]({{site.url}}{{site.baseurl}}/query-dsl/index/#expensive-queries) is set to `false`, wildcard queries are not run. +If [`search.allow_expensive_queries`]({{site.url}}{{site.baseurl}}/query-dsl/index/#expensive-queries) is set to `false`, then wildcard queries are not executed. {: .important} diff --git a/_search-plugins/index.md b/_search-plugins/index.md index 79e0e715d0..3604245f11 100644 --- a/_search-plugins/index.md +++ b/_search-plugins/index.md @@ -16,29 +16,31 @@ OpenSearch provides many features for customizing your search use cases and impr ## Search methods -OpenSearch supports the following search methods: +OpenSearch supports the following search methods. -- **Traditional lexical search** +### Traditional lexical search - - [Keyword (BM25) search]({{site.url}}{{site.baseurl}}/search-plugins/keyword-search/): Searches the document corpus for words that appear in the query. +OpenSearch supports [keyword (BM25) search]({{site.url}}{{site.baseurl}}/search-plugins/keyword-search/), which searches the document corpus for words that appear in the query. -- **Machine learning (ML)-powered search** +### ML-powered search - - **Vector search** +OpenSearch supports the following machine learning (ML)-powered search methods: - - [k-NN search]({{site.url}}{{site.baseurl}}/search-plugins/knn/): Searches for k-nearest neighbors to a search term across an index of vectors. +- **Vector search** - - **Neural search**: [Neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/) facilitates generating vector embeddings at ingestion time and searching them at search time. Neural search lets you integrate ML models into your search and serves as a framework for implementing other search methods. The following search methods are built on top of neural search: + - [k-NN search]({{site.url}}{{site.baseurl}}/search-plugins/knn/): Searches for the k-nearest neighbors to a search term across an index of vectors. - - [Semantic search]({{site.url}}{{site.baseurl}}/search-plugins/semantic-search/): Considers the meaning of the words in the search context. Uses dense retrieval based on text embedding models to search text data. +- **Neural search**: [Neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/) facilitates generating vector embeddings at ingestion time and searching them at search time. Neural search lets you integrate ML models into your search and serves as a framework for implementing other search methods. The following search methods are built on top of neural search: - - [Multimodal search]({{site.url}}{{site.baseurl}}/search-plugins/multimodal-search/): Uses multimodal embedding models to search text and image data. + - [Semantic search]({{site.url}}{{site.baseurl}}/search-plugins/semantic-search/): Considers the meaning of the words in the search context. Uses dense retrieval based on text embedding models to search text data. - - [Neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/): Uses sparse retrieval based on sparse embedding models to search text data. + - [Multimodal search]({{site.url}}{{site.baseurl}}/search-plugins/multimodal-search/): Uses multimodal embedding models to search text and image data. - - [Hybrid search]({{site.url}}{{site.baseurl}}/search-plugins/hybrid-search/): Combines traditional search and vector search to improve search relevance. + - [Neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/): Uses sparse retrieval based on sparse embedding models to search text data. - - [Conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/): Implements a retrieval-augmented generative search. + - [Hybrid search]({{site.url}}{{site.baseurl}}/search-plugins/hybrid-search/): Combines traditional search and vector search to improve search relevance. + + - [Conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/): Implements a retrieval-augmented generative search. ## Query languages diff --git a/_search-plugins/search-pipelines/search-processors.md b/_search-plugins/search-pipelines/search-processors.md index d696859a78..83c46ca69d 100644 --- a/_search-plugins/search-pipelines/search-processors.md +++ b/_search-plugins/search-pipelines/search-processors.md @@ -45,7 +45,7 @@ Processor | Description | Earliest available version [`rerank`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/)| Reranks search results using a cross-encoder model. | 2.12 [`retrieval_augmented_generation`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rag-processor/) | Used for retrieval-augmented generation (RAG) in [conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/). | 2.10 (generally available in 2.12) [`sort`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/sort-processor/)| Sorts an array of items in either ascending or descending order. | 2.16 -[`split`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/split-processor/)| Splits a string field into an array of substrings based on a specified delimiter. | 2.16 +[`split`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/split-processor/)| Splits a string field into an array of substrings based on a specified delimiter. | 2.17 [`truncate_hits`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/truncate-hits-processor/)| Discards search hits after a specified target count is reached. Can undo the effect of the `oversample` request processor. | 2.12 diff --git a/_search-plugins/search-pipelines/split-processor.md b/_search-plugins/search-pipelines/split-processor.md index 4afe49e6d2..c524386262 100644 --- a/_search-plugins/search-pipelines/split-processor.md +++ b/_search-plugins/search-pipelines/split-processor.md @@ -8,7 +8,7 @@ grand_parent: Search pipelines --- # Split processor -Introduced 2.16 +Introduced 2.17 {: .label .label-purple } The `split` processor splits a string field into an array of substrings based on a specified delimiter. diff --git a/_search-plugins/searching-data/index.md b/_search-plugins/searching-data/index.md index dab57d6460..279958d97c 100644 --- a/_search-plugins/searching-data/index.md +++ b/_search-plugins/searching-data/index.md @@ -18,4 +18,5 @@ Feature | Description [Paginate results]({{site.url}}{{site.baseurl}}/opensearch/search/paginate/) | Rather than a single, long list, separate search results into pages. [Sort results]({{site.url}}{{site.baseurl}}/opensearch/search/sort/) | Allow sorting of results by different criteria. [Highlight query matches]({{site.url}}{{site.baseurl}}/opensearch/search/highlight/) | Highlight the search term in the results. +[Retrieve inner hits]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/inner-hits/) | Retrieve underlying hits in nested and parent-join objects. [Retrieve specific fields]({{site.url}}{{site.baseurl}}search-plugins/searching-data/retrieve-specific-fields/) | Retrieve only the specific fields diff --git a/_search-plugins/searching-data/inner-hits.md b/_search-plugins/searching-data/inner-hits.md new file mode 100644 index 0000000000..395e9e748a --- /dev/null +++ b/_search-plugins/searching-data/inner-hits.md @@ -0,0 +1,809 @@ +--- +layout: default +title: Inner hits +parent: Searching data +has_children: false +nav_order: 70 +--- + +# Inner hits + +In OpenSearch, when you perform a search using [nested objects]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/nested/) or [parent-join]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/join/), the underlying hits (nested inner objects or child documents) are hidden by default. You can retrieve inner hits by using the `inner_hits` parameter in the search query. + +You can also use `inner_hits` with the following features: + + - [Highlight query matches]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/highlight/) + - [Explain]({{site.url}}{{site.baseurl}}/api-reference/explain/) + +## Inner hits with nested objects +Nested objects allow you to index an array of objects and maintain their relationship within the same document. The following example request uses the `inner_hits` parameter to retrieve the underlying inner hits. + +1. Create an index mapping with a nested object: + + ```json + PUT /my_index + { + "mappings": { + "properties": { + "user": { + "type": "nested", + "properties": { + "name": { "type": "text" }, + "age": { "type": "integer" } + } + } + } + } + } + ``` + {% include copy-curl.html %} + +2. Index data: + + ```json + POST /my_index/_doc/1 + { + "group": "fans", + "user": [ + { + "name": "John Doe", + "age": 28 + }, + { + "name": "Jane Smith", + "age": 34 + } + ] + } + ``` + {% include copy-curl.html %} + +3. Query with `inner_hits`: + + ```json + GET /my_index/_search + { + "query": { + "nested": { + "path": "user", + "query": { + "bool": { + "must": [ + { "match": { "user.name": "John" } } + ] + } + }, + "inner_hits": {} + } + } + } + ``` + {% include copy-curl.html %} + +The preceding query searches for nested user objects containing the name John and returns the matching nested documents in the `inner_hits` section of the response: + +```json +{ + "hits" : { + "total" : { + "value" : 1, + "relation" : "eq" + }, + "max_score" : 0.6931471, + "hits" : [ + { + "_index" : "my_index", + "_id" : "1", + "_score" : 0.6931471, + "_source" : { + "group" : "fans", + "user" : [ + { + "name" : "John Doe", + "age" : 28 + }, + { + "name" : "Jane Smith", + "age" : 34 + } + ] + }, + "inner_hits" : { + "user" : { + "hits" : { + "total" : { + "value" : 1, + "relation" : "eq" + }, + "max_score" : 0.6931471, + "hits" : [ + { + "_index" : "my_index", + "_id" : "1", + "_nested" : { + "field" : "user", + "offset" : 0 + }, + "_score" : 0.6931471, + "_source" : { + "name" : "John Doe", + "age" : 28 + } + } + ] + } + } + } + } + ] + } +} +``` +## Inner hits with parent-child objects +Parent-join relationships allow you to create relationships between documents of different types within the same index. The following example request searches with `inner_hits` using parent-child objects. + +1. Create an index with a parent-join field: + + ```json + PUT /my_index + { + "mappings": { + "properties": { + "my_join_field": { + "type": "join", + "relations": { + "parent": "child" + } + }, + "text": { + "type": "text" + } + } + } + } + ``` + {% include copy-curl.html %} + +2. Index data: + + ```json + # Index a parent document + PUT /my_index/_doc/1 + { + "text": "This is a parent document", + "my_join_field": "parent" + } + + # Index a child document + PUT /my_index/_doc/2?routing=1 + { + "text": "This is a child document", + "my_join_field": { + "name": "child", + "parent": "1" + } + } + ``` + {% include copy-curl.html %} + +3. Search with `inner_hits`: + + ```json + GET /my_index/_search + { + "query": { + "has_child": { + "type": "child", + "query": { + "match": { + "text": "child" + } + }, + "inner_hits": {} + } + } + } + ``` + {% include copy-curl.html %} + +The preceding query searches for parent documents that have child documents matching the query criteria (in this case, containing the term `"child"`). It returns the matching child documents in the `inner_hits` section of the response: + +```json +{ + "hits" : { + "total" : { + "value" : 1, + "relation" : "eq" + }, + "max_score" : 1.0, + "hits" : [ + { + "_index" : "my_index", + "_id" : "1", + "_score" : 1.0, + "_source" : { + "text" : "This is a parent document", + "my_join_field" : "parent" + }, + "inner_hits" : { + "child" : { + "hits" : { + "total" : { + "value" : 1, + "relation" : "eq" + }, + "max_score" : 0.6931471, + "hits" : [ + { + "_index" : "my_index", + "_id" : "2", + "_score" : 0.6931471, + "_routing" : "1", + "_source" : { + "text" : "This is a child document", + "my_join_field" : { + "name" : "child", + "parent" : "1" + } + } + } + ] + } + } + } + } + ] + } +} +``` + +## Using both parent-join and nested objects with `inner_hits` + +The following example demonstrates using both parent-join and nested objects with `inner_hits`. + +1. Create an index with the following mapping: + + ```json + PUT /my_index + { + "mappings": { + "properties": { + "my_join_field": { + "type": "join", + "relations": { + "parent": "child" + } + }, + "text": { + "type": "text" + }, + "comments": { + "type": "nested", + "properties": { + "user": { "type": "text" }, + "message": { "type": "text" } + } + } + } + } + } + ``` + {% include copy-curl.html %} + +2. Index data: + + ```json + # Index a parent document + PUT /my_index/_doc/1 + { + "text": "This is a parent document", + "my_join_field": "parent" + } + + # Index a child document with nested comments + PUT /my_index/_doc/2?routing=1 + { + "text": "This is a child document", + "my_join_field": { + "name": "child", + "parent": "1" + }, + "comments": [ + { + "user": "John", + "message": "This is a comment" + }, + { + "user": "Jane", + "message": "Another comment" + } + ] + } + ``` + {% include copy-curl.html %} + +3. Query with `inner_hits`: + + ```json + GET /my_index/_search + { + "query": { + "has_child": { + "type": "child", + "query": { + "nested": { + "path": "comments", + "query": { + "bool": { + "must": [ + { "match": { "comments.user": "John" } } + ] + } + }, + "inner_hits": {} + } + }, + "inner_hits": {} + } + } + } + ``` + {% include copy-curl.html %} + +The preceding query searches for parent documents that have child documents containing comments made by John. Specifying `inner_hits` ensures that the matching child documents and their nested comments are returned: + +```json +{ + "hits" : { + "total" : { + "value" : 1, + "relation" : "eq" + }, + "max_score" : 1.0, + "hits" : [ + { + "_index" : "my_index", + "_id" : "1", + "_score" : 1.0, + "_source" : { + "text" : "This is a parent document", + "my_join_field" : "parent" + }, + "inner_hits" : { + "child" : { + "hits" : { + "total" : { + "value" : 1, + "relation" : "eq" + }, + "max_score" : 0.6931471, + "hits" : [ + { + "_index" : "my_index", + "_id" : "2", + "_score" : 0.6931471, + "_routing" : "1", + "_source" : { + "text" : "This is a child document", + "my_join_field" : { + "name" : "child", + "parent" : "1" + }, + "comments" : [ + { + "user" : "John", + "message" : "This is a comment" + }, + { + "user" : "Jane", + "message" : "Another comment" + } + ] + }, + "inner_hits" : { + "comments" : { + "hits" : { + "total" : { + "value" : 1, + "relation" : "eq" + }, + "max_score" : 0.6931471, + "hits" : [ + { + "_index" : "my_index", + "_id" : "2", + "_nested" : { + "field" : "comments", + "offset" : 0 + }, + "_score" : 0.6931471, + "_source" : { + "message" : "This is a comment", + "user" : "John" + } + } + ] + } + } + } + } + ] + } + } + } + } + ] + } +} +``` + + +## inner_hits parameters + +You can pass the following additional parameters to a search with `inner_hits` using both nested objects and parent-join relationships: + +* `from`: The offset from where to start fetching hits in the `inner_hits` results. +* `size`: The maximum number of inner hits to return. +* `sort`: The sorting order for the inner hits. +* `name`: A custom name for the inner hits in the response. This is useful in differentiating between multiple inner hits in a single query. + + +### Example: inner_hits parameters with nested objects + + +1. Create an index with the following mappings: + + ```json + PUT /products + { + "mappings": { + "properties": { + "product_name": { "type": "text" }, + "reviews": { + "type": "nested", + "properties": { + "user": { "type": "text" }, + "comment": { "type": "text" }, + "rating": { "type": "integer" } + } + } + } + } + } + ``` + {% include copy-curl.html %} + +2. Index data: + + ```json + POST /products/_doc/1 + { + "product_name": "Smartphone", + "reviews": [ + { "user": "Alice", "comment": "Great phone", "rating": 5 }, + { "user": "Bob", "comment": "Not bad", "rating": 3 }, + { "user": "Charlie", "comment": "Excellent", "rating": 4 } + ] + } + ``` + {% include copy-curl.html %} + + ```json + POST /products/_doc/2 + { + "product_name": "Laptop", + "reviews": [ + { "user": "Dave", "comment": "Very good", "rating": 5 }, + { "user": "Eve", "comment": "Good value", "rating": 4 } + ] + } + ``` + {% include copy-curl.html %} + +3. Query with `inner_hits` and provide additional parameters: + + ```json + GET /products/_search + { + "query": { + "nested": { + "path": "reviews", + "query": { + "match": { "reviews.comment": "Good" } + }, + "inner_hits": { + "from": 0, + "size": 2, + "sort": [ + { "reviews.rating": { "order": "desc" } } + ], + "name": "top_reviews" + } + } + } + } + ``` + {% include copy-curl.html %} + +The following is the expected result: + +```json +{ + "hits" : { + "total" : { + "value" : 1, + "relation" : "eq" + }, + "max_score" : 0.83740485, + "hits" : [ + { + "_index" : "products", + "_id" : "2", + "_score" : 0.83740485, + "_source" : { + "product_name" : "Laptop", + "reviews" : [ + { + "user" : "Dave", + "comment" : "Very good", + "rating" : 5 + }, + { + "user" : "Eve", + "comment" : "Good value", + "rating" : 4 + } + ] + }, + "inner_hits" : { + "top_reviews" : { + "hits" : { + "total" : { + "value" : 2, + "relation" : "eq" + }, + "max_score" : null, + "hits" : [ + { + "_index" : "products", + "_id" : "2", + "_nested" : { + "field" : "reviews", + "offset" : 0 + }, + "_score" : null, + "_source" : { + "rating" : 5, + "comment" : "Very good", + "user" : "Dave" + }, + "sort" : [ + 5 + ] + }, + { + "_index" : "products", + "_id" : "2", + "_nested" : { + "field" : "reviews", + "offset" : 1 + }, + "_score" : null, + "_source" : { + "rating" : 4, + "comment" : "Good value", + "user" : "Eve" + }, + "sort" : [ + 4 + ] + } + ] + } + } + } + } + ] + } +} +``` + + +### Example: inner_hits parameters with a parent-join relationship + + +1. Create an index with the following mappings: + + ```json + PUT /company + { + "mappings": { + "properties": { + "my_join_field": { + "type": "join", + "relations": { + "employee": "task" + } + }, + "name": { "type": "text" }, + "description": { + "type": "text", + "fields": { + "keyword": { "type": "keyword" } + } + } + } + } + } + ``` + {% include copy-curl.html %} + +2. Index data: + + ```json + # Index a parent document + PUT /company/_doc/1 + { + "name": "Alice", + "my_join_field": "employee" + } + ``` + {% include copy-curl.html %} + + ```json + # Index child documents + PUT /company/_doc/2?routing=1 + { + "description": "Complete the project", + "my_join_field": { + "name": "task", + "parent": "1" + } + } + ``` + {% include copy-curl.html %} + + ```json + PUT /company/_doc/3?routing=1 + { + "description": "Prepare the report", + "my_join_field": { + "name": "task", + "parent": "1" + } + } + ``` + {% include copy-curl.html %} + + ```json + PUT /company/_doc/4?routing=1 + { + "description": "Update project", + "my_join_field": { + "name": "task", + "parent": "1" + } + } + ``` + {% include copy-curl.html %} + +3. Query with `inner_hits` parameters: + + ```json + GET /company/_search + { + "query": { + "has_child": { + "type": "task", + "query": { + "match": { "description": "project" } + }, + "inner_hits": { + "from": 0, + "size": 10, + "sort": [ + { "description.keyword": { "order": "asc" } } + ], + "name": "related_tasks" + } + } + } + } + ``` + {% include copy-curl.html %} + +The following is the expected result: + +```json +{ + "hits" : { + "total" : { + "value" : 1, + "relation" : "eq" + }, + "max_score" : 1.0, + "hits": [ + { + "_index": "company", + "_id": "1", + "_score": 1, + "_source": { + "name": "Alice", + "my_join_field": "employee" + }, + "inner_hits": { + "related_tasks": { + "hits": { + "total": { + "value": 2, + "relation": "eq" + }, + "max_score": null, + "hits": [ + { + "_index": "company", + "_id": "2", + "_score": null, + "_routing": "1", + "_source": { + "description": "Complete the project", + "my_join_field": { + "name": "task", + "parent": "1" + } + }, + "sort": [ + "Complete the project" + ] + }, + { + "_index": "company", + "_id": "4", + "_score": null, + "_routing": "1", + "_source": { + "description": "Update project", + "my_join_field": { + "name": "task", + "parent": "1" + } + }, + "sort": [ + "Update project" + ] + } + ] + } + } + } + } + ] + } +} +``` + +## Benefits of using inner_hits + +* **Detailed query results** + + You can use `inner_hits` to retrieve detailed information about matching nested or child documents directly from the parent document's search results. This is particularly useful for understanding the context and specifics of the match without having to perform additional queries. + + Example use case: In a blog post index, you have comments as nested objects. When searching for blog posts containing specific comments, you can retrieve relevant comments that match the search criteria along with information about the post. + +* **Optimized performance** + + Without `inner_hits`, you may need to run multiple queries to fetch related documents. Using `inner_hits` consolidates these into a single query, reducing the number of round trips to the OpenSearch server and improving overall performance. + + Example use case: In an e-commerce application, you have products as parent documents and reviews as child documents. A single query using `inner_hits` can fetch products and their relevant reviews, avoiding multiple separate queries. + +* **Simplified query logic** + + You can combine parent/child or nested document logic in a single query to simplify the application code and reduce complexity. This helps to ensure that the code is more maintainable and consistent by centralizing the query logic in OpenSearch + + Example use case: In a job portal, you have jobs as parent documents and applications as nested or child documents. You can simplify the application logic by fetching jobs along with specific applications in one query. + +* **Contextual relevance** + + Using `inner_hits` provides contextual relevance by showing exactly which nested or child documents match the query criteria. This is crucial for applications in which the relevance of results depends on a specific part of the document that matches the query. + + Example use case: In a customer support system, you have tickets as parent documents and comments or updates as nested or child documents. You can determine which specific comment matches the search in order to better understand the context of the ticket search. \ No newline at end of file diff --git a/_search-plugins/vector-search.md b/_search-plugins/vector-search.md index 68f6dea08c..cd893f4144 100644 --- a/_search-plugins/vector-search.md +++ b/_search-plugins/vector-search.md @@ -97,7 +97,7 @@ In general, select NMSLIB or Faiss for large-scale use cases. Lucene is a good o | | NMSLIB/HNSW | Faiss/HNSW | Faiss/IVF | Lucene/HNSW | |:---|:---|:---|:---|:---| -| Max dimensions | 16,000 | 16,000 | 16,000 | 1,024 | +| Max dimensions | 16,000 | 16,000 | 16,000 | 16,000 | | Filter | Post-filter | Post-filter | Post-filter | Filter during search | | Training required | No | No | Yes | No | | Similarity metrics | `l2`, `innerproduct`, `cosinesimil`, `l1`, `linf` | `l2`, `innerproduct` | `l2`, `innerproduct` | `l2`, `cosinesimil` |