From 39de58ba6b6e6bc6aaa78588a9488fb9d27af152 Mon Sep 17 00:00:00 2001 From: "opensearch-trigger-bot[bot]" <98922864+opensearch-trigger-bot[bot]@users.noreply.github.com> Date: Tue, 24 Sep 2024 15:28:01 +0000 Subject: [PATCH] Add has_child query (#8354) (#8360) --- _field-types/supported-field-types/join.md | 2 +- _query-dsl/geo-and-xy/geo-bounding-box.md | 6 +- _query-dsl/geo-and-xy/geodistance.md | 6 +- _query-dsl/geo-and-xy/geopolygon.md | 6 +- _query-dsl/geo-and-xy/geoshape.md | 6 +- _query-dsl/joining/has-child.md | 259 +++++++++++++++++++++ _query-dsl/joining/index.md | 5 +- 7 files changed, 275 insertions(+), 15 deletions(-) create mode 100644 _query-dsl/joining/has-child.md diff --git a/_field-types/supported-field-types/join.md b/_field-types/supported-field-types/join.md index c707c66774..1c5b0d1322 100644 --- a/_field-types/supported-field-types/join.md +++ b/_field-types/supported-field-types/join.md @@ -61,7 +61,7 @@ PUT testindex1/_doc/1 ``` {% include copy-curl.html %} -When indexing child documents, you have to specify the `routing` query parameter because parent and child documents in the same relation have to be indexed on the same shard. Each child document refers to its parent's ID in the `parent` field. +When indexing child documents, you need to specify the `routing` query parameter because parent and child documents in the same parent/child hierarchy must be indexed on the same shard. For more information, see [Routing]({{site.url}}{{site.baseurl}}/field-types/metadata-fields/routing/). Each child document refers to its parent's ID in the `parent` field. Index two child documents, one for each parent: diff --git a/_query-dsl/geo-and-xy/geo-bounding-box.md b/_query-dsl/geo-and-xy/geo-bounding-box.md index 1112a4278e..66fcc224d6 100644 --- a/_query-dsl/geo-and-xy/geo-bounding-box.md +++ b/_query-dsl/geo-and-xy/geo-bounding-box.md @@ -173,11 +173,11 @@ GET testindex1/_search ``` {% include copy-curl.html %} -## Request fields +## Parameters -Geo-bounding box queries accept the following fields. +Geo-bounding box queries accept the following parameters. -Field | Data type | Description +Parameter | Data type | Description :--- | :--- | :--- `_name` | String | The name of the filter. Optional. `validation_method` | String | The validation method. Valid values are `IGNORE_MALFORMED` (accept geopoints with invalid coordinates), `COERCE` (try to coerce coordinates to valid values), and `STRICT` (return an error when coordinates are invalid). Default is `STRICT`. diff --git a/_query-dsl/geo-and-xy/geodistance.md b/_query-dsl/geo-and-xy/geodistance.md index b272cad81e..3eef58bc69 100644 --- a/_query-dsl/geo-and-xy/geodistance.md +++ b/_query-dsl/geo-and-xy/geodistance.md @@ -103,11 +103,11 @@ The response contains the matching document: } ``` -## Request fields +## Parameters -Geodistance queries accept the following fields. +Geodistance queries accept the following parameters. -Field | Data type | Description +Parameter | Data type | Description :--- | :--- | :--- `_name` | String | The name of the filter. Optional. `distance` | String | The distance within which to match the points. This distance is the radius of a circle centered at the specified point. For supported distance units, see [Distance units]({{site.url}}{{site.baseurl}}/api-reference/common-parameters/#distance-units). Required. diff --git a/_query-dsl/geo-and-xy/geopolygon.md b/_query-dsl/geo-and-xy/geopolygon.md index 980a0c5a63..810e48f2b7 100644 --- a/_query-dsl/geo-and-xy/geopolygon.md +++ b/_query-dsl/geo-and-xy/geopolygon.md @@ -161,11 +161,11 @@ However, if you specify the vertices in the following order: The response returns no results. -## Request fields +## Parameters -Geopolygon queries accept the following fields. +Geopolygon queries accept the following parameters. -Field | Data type | Description +Parameter | Data type | Description :--- | :--- | :--- `_name` | String | The name of the filter. Optional. `validation_method` | String | The validation method. Valid values are `IGNORE_MALFORMED` (accept geopoints with invalid coordinates), `COERCE` (try to coerce coordinates to valid values), and `STRICT` (return an error when coordinates are invalid). Optional. Default is `STRICT`. diff --git a/_query-dsl/geo-and-xy/geoshape.md b/_query-dsl/geo-and-xy/geoshape.md index 8acc691c3a..5b144b06d6 100644 --- a/_query-dsl/geo-and-xy/geoshape.md +++ b/_query-dsl/geo-and-xy/geoshape.md @@ -721,10 +721,10 @@ The response returns document 1: Note that when you indexed the geopoints, you specified their coordinates in `"latitude, longitude"` format. When you search for matching documents, the coordinate array is in `[longitude, latitude]` format. Thus, document 1 is returned in the results but document 2 is not. -## Request fields +## Parameters -Geoshape queries accept the following fields. +Geoshape queries accept the following parameters. -Field | Data type | Description +Parameter | Data type | Description :--- | :--- | :--- `ignore_unmapped` | Boolean | Specifies whether to ignore an unmapped field. If set to `true`, then the query does not return any documents that contain an unmapped field. If set to `false`, then an exception is thrown when the field is unmapped. Optional. Default is `false`. \ No newline at end of file diff --git a/_query-dsl/joining/has-child.md b/_query-dsl/joining/has-child.md new file mode 100644 index 0000000000..c1cc7a5423 --- /dev/null +++ b/_query-dsl/joining/has-child.md @@ -0,0 +1,259 @@ +--- +layout: default +title: Has child +parent: Joining queries +nav_order: 10 +--- + +# Has child query + +The `has_child` query returns parent documents whose child documents match a specific query. You can establish parent-child relationships between documents in the same index by using a [join]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/join/) field type. + +The `has_child` query is slower than other queries because of the join operation it performs. Performance decreases as the number of matching child documents pointing to different parent documents increases. Each `has_child` query in your search may significantly impact query performance. If you prioritize speed, avoid using this query or limit its usage as much as possible. +{: .warning} + +## Example + +Before you can run a `has_child` query, your index must contain a [join]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/join/) field in order to establish parent-child relationships. The index mapping request uses the following format: + +```json +PUT /example_index +{ + "mappings": { + "properties": { + "relationship_field": { + "type": "join", + "relations": { + "parent_doc": "child_doc" + } + } + } + } +} +``` +{% include copy-curl.html %} + +In this example, you'll configure an index that contains documents representing products and their brands. + +First, create the index and establish the parent-child relationship between `brand` and `product`: + +```json +PUT testindex1 +{ + "mappings": { + "properties": { + "product_to_brand": { + "type": "join", + "relations": { + "brand": "product" + } + } + } + } +} +``` +{% include copy-curl.html %} + +Index two parent (brand) documents: + +```json +PUT testindex1/_doc/1 +{ + "name": "Luxury brand", + "product_to_brand" : "brand" +} +``` +{% include copy-curl.html %} + +```json +PUT testindex1/_doc/2 +{ + "name": "Economy brand", + "product_to_brand" : "brand" +} +``` +{% include copy-curl.html %} + +Index three child (product) documents: + +```json +PUT testindex1/_doc/3?routing=1 +{ + "name": "Mechanical watch", + "sales_count": 150, + "product_to_brand": { + "name": "product", + "parent": "1" + } +} +``` +{% include copy-curl.html %} + +```json +PUT testindex1/_doc/4?routing=2 +{ + "name": "Electronic watch", + "sales_count": 300, + "product_to_brand": { + "name": "product", + "parent": "2" + } +} +``` +{% include copy-curl.html %} + +```json +PUT testindex1/_doc/5?routing=2 +{ + "name": "Digital watch", + "sales_count": 100, + "product_to_brand": { + "name": "product", + "parent": "2" + } +} +``` +{% include copy-curl.html %} + +To search for the parent of a child, use a `has_child` query. The following query returns parent documents (brands) that make watches: + +```json +GET testindex1/_search +{ + "query" : { + "has_child": { + "type":"product", + "query": { + "match" : { + "name": "watch" + } + } + } + } +} +``` +{% include copy-curl.html %} + +The response returns both brands: + +```json +{ + "took": 15, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 2, + "relation": "eq" + }, + "max_score": 1, + "hits": [ + { + "_index": "testindex1", + "_id": "1", + "_score": 1, + "_source": { + "name": "Luxury brand", + "product_to_brand": "brand" + } + }, + { + "_index": "testindex1", + "_id": "2", + "_score": 1, + "_source": { + "name": "Economy brand", + "product_to_brand": "brand" + } + } + ] + } +} +``` + +## Parameters + +The following table lists all top-level parameters supported by `has_child` queries. + +| Parameter | Required/Optional | Description | +|:---|:---|:---| +| `type` | Required | Specifies the name of the child relationship as defined in the `join` field mapping. | +| `query` | Required | The query to run on child documents. If a child document matches the query, the parent document is returned. | +| `ignore_unmapped` | Optional | Indicates whether to ignore unmapped `type` fields and not return documents instead of throwing an error. You can provide this parameter when querying multiple indexes, some of which may not contain the `type` field. Default is `false`. | +| `max_children` | Optional | The maximum number of matching child documents for a parent document. If exceeded, the parent document is excluded from the search results. | +| `min_children` | Optional | The minimum number of matching child documents required for a parent document to be included in the results. If not met, the parent is excluded. Default is `1`.| +| `score_mode` | Optional | Defines how scores of matching child documents influence the parent document's score. Valid values are:
- `none`: Ignores the relevance scores of child documents and assigns a score of `0` to the parent document.
- `avg`: Uses the average relevance score of all matching child documents.
- `max`: Assigns the highest relevance score from the matching child documents to the parent.
- `min`: Assigns the lowest relevance score from the matching child documents to the parent.
- `sum`: Sums the relevance scores of all matching child documents.
Default is `none`. | + + +## Sorting limitations + +The `has_child` query does not support [sorting results]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/sort/) using standard sorting options. If you need to sort parent documents by fields in their child documents, you can use a [`function_score` query]({{site.url}}{{site.baseurl}}/query-dsl/compound/function-score/) and sort by the parent document's score. + +In the preceding example, you can sort parent documents (brands) based on the `sales_count` of their child products. This query multiplies the score by the `sales_count` field of the child documents and assigns the highest relevance score from the matching child documents to the parent: + +```json +GET testindex1/_search +{ + "query": { + "has_child": { + "type": "product", + "query": { + "function_score": { + "script_score": { + "script": "_score * doc['sales_count'].value" + } + } + }, + "score_mode": "max" + } + } +} +``` +{% include copy-curl.html %} + +The response contains the brands sorted by the highest child `sales_count`: + +```json +{ + "took": 6, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 2, + "relation": "eq" + }, + "max_score": 300, + "hits": [ + { + "_index": "testindex1", + "_id": "2", + "_score": 300, + "_source": { + "name": "Economy brand", + "product_to_brand": "brand" + } + }, + { + "_index": "testindex1", + "_id": "1", + "_score": 150, + "_source": { + "name": "Luxury brand", + "product_to_brand": "brand" + } + } + ] + } +} +``` \ No newline at end of file diff --git a/_query-dsl/joining/index.md b/_query-dsl/joining/index.md index 20f48c0b16..4ed46b3e17 100644 --- a/_query-dsl/joining/index.md +++ b/_query-dsl/joining/index.md @@ -3,6 +3,7 @@ layout: default title: Joining queries has_children: true nav_order: 55 +has_toc: false --- # Joining queries @@ -10,9 +11,9 @@ nav_order: 55 OpenSearch is a distributed system in which data is spread across multiple nodes. Thus, running a SQL-like JOIN operation in OpenSearch is resource intensive. As an alternative, OpenSearch provides the following queries that perform join operations and are optimized for scaling across multiple nodes: - `nested` queries: Act as wrappers for other queries to search [nested]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/nested/) fields. The nested field objects are searched as though they were indexed as separate documents. -- `has_child` queries: Search for parent documents whose child documents match the query. +- [`has_child`]({{site.url}}{{site.baseurl}}/query-dsl/joining/has-child/) queries: Search for parent documents whose child documents match the query. - `has_parent` queries: Search for child documents whose parent documents match the query. -- `parent_id` queries: A [join]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/nested/) field type establishes a parent/child relationship between documents in the same index. `parent_id` queries search for child documents that are joined to a specific parent document. +- `parent_id` queries: A [join]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/join/) field type establishes a parent/child relationship between documents in the same index. `parent_id` queries search for child documents that are joined to a specific parent document. If [`search.allow_expensive_queries`]({{site.url}}{{site.baseurl}}/query-dsl/index/#expensive-queries) is set to `false`, then joining queries are not executed. {: .important} \ No newline at end of file