Add metadata fields for mappings (content gap initiative) #6933
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -14,23 +14,17 @@ | |
|
||
You can define how documents and their fields are stored and indexed by creating a _mapping_. The mapping specifies the list of fields for a document. Every field in the document has a _field type_, which defines the type of data the field contains. For example, you may want to specify that the `year` field should be of type `date`. To learn more, see [Supported field types]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/index/). | ||
|
||
If you're just starting to build out your cluster and data, you may not know exactly how your data should be stored. In those cases, you can use dynamic mappings, which tell OpenSearch to dynamically add data and its fields. However, if you know exactly what types your data falls under and want to enforce that standard, then you can use explicit mappings. | ||
If you're starting to build out your cluster and data, you may not know exactly how your data should be stored. In those cases, you can use dynamic mappings, which tell OpenSearch to dynamically add data and its fields. However, if you know exactly what types your data falls under and want to enforce that standard, then you can use explicit mappings. | ||
|
||
|
||
For example, if you want to indicate that `year` should be of type `text` instead of an `integer`, and `age` should be an `integer`, you can do so with explicit mappings. By using dynamic mapping, OpenSearch might interpret both `year` and `age` as integers. | ||
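A minimal sketch of the explicit mapping described above. The field names `year` and `age` come from the text; the index name `sample-index1` is used for illustration only:

```json
PUT sample-index1
{
  "mappings": {
    "properties": {
      "year": { "type": "text" },
      "age": { "type": "integer" }
    }
  }
}
```
{% include copy-curl.html %}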
|
||
This section provides an example for how to create an index mapping and how to add a document to it that will get ip_range validated. | ||
This documentation provides an example for how to create an index mapping and how to add a document to it that will get `ip_range` validated. | ||
|
||
#### Table of contents | ||
1. TOC | ||
{:toc} | ||
|
||
|
||
--- | ||
## Dynamic mapping | ||
|
||
When you index a document, OpenSearch adds fields automatically with dynamic mapping. You can also explicitly add fields to an index mapping. | ||
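To see dynamic mapping at work, one option is to index a document into an index that does not exist yet and then retrieve the mapping that OpenSearch generated. The index name `sample-index3` and the field names are assumptions chosen for illustration:

```json
PUT sample-index3/_doc/1
{
  "title": "An example document",
  "age": 35
}

GET sample-index3/_mapping
```
{% include copy-curl.html %}

The returned mapping shows which field type OpenSearch inferred for each field, which is a quick way to check whether an explicit mapping is needed.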
|
||
#### Dynamic mapping types | ||
### Dynamic mapping types | ||
|
||
Type | Description | ||
:--- | :--- | ||
|
@@ -63,7 +57,7 @@ | |
} | ||
``` | ||
|
||
### Response | ||
#### Response | ||
```json | ||
{ | ||
"acknowledged": true, | ||
|
@@ -88,6 +82,42 @@ | |
You cannot change the mapping of an existing field; you can only modify the field's mapping parameters. | ||
{: .note} | ||
|
||
## Mapping parameters | ||
|
||
Mapping parameters are used to configure the behavior of fields in an index. The following table lists commonly used mapping parameters. | ||
|
||
Parameter | Description | ||
:--- | :--- | ||
`analyzer` | Specifies the analyzer used to analyze string fields. | ||
`boost` | Specifies a field-level boost factor applied at query time. | ||
`coerce` | Tries to convert the value to the specified data type. | ||
`copy_to` | Copies the values of this field to another field. | ||
`doc_values` | Specifies whether the field should be stored on disk to make sorting and aggregation faster. | ||
`dynamic` | Determines whether new fields should be added dynamically. | ||
`enabled` | Specifies whether the field is enabled or disabled. | ||
`format` | Specifies the date format for date fields. | ||
`ignore_above` | Skips indexing values that are longer than the specified length. | ||
`ignore_malformed` | Specifies whether malformed values should be ignored. | ||
`index` | Specifies whether the field should be indexed. | ||
`index_options` | Specifies what information should be stored in the index for scoring purposes. | ||
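
As a sketch of how several of these parameters can be combined in one mapping, the following request sets `ignore_above`, `format`, and `index` on three fields. The index name `sample-index2` and all field names are hypothetical:

```json
PUT sample-index2
{
  "mappings": {
    "properties": {
      "title": {
        "type": "keyword",
        "ignore_above": 256
      },
      "created_at": {
        "type": "date",
        "format": "yyyy-MM-dd"
      },
      "internal_note": {
        "type": "text",
        "index": false
      }
    }
  }
}
```
{% include copy-curl.html %}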
|
||
|
||
## Mapping limit settings | ||
|
||
OpenSearch has certain limits or settings related to mappings, such as the settings listed in the following table. Settings can be configured based on your requirements. | ||
|
||
|
||
| Setting | Default value | Allowed value | Type | Description | | ||
|-|-|-|-|-| | ||
| index.mapping.nested_fields.limit | 50 | [0,) | Dynamic | Limits the maximum number of nested fields that can be defined in an index mapping. | | ||
| index.mapping.nested_objects.limit | 10000 | [0,) | Dynamic | Limits the maximum number of nested objects that can be created within a single document. | | ||
| index.mapping.total_fields.limit | 1000 | [0,) | Dynamic | Limits the maximum number of fields that can be defined in an index mapping. | | ||
| index.mapping.depth.limit | 20 | [1,100] | Dynamic | Limits the maximum depth of nested objects and nested fields that can be defined in an index mapping. | | ||
| index.mapping.field_name_length.limit | 50000 | [1,50000] | Dynamic | Limits the maximum length of field names that can be defined in an index mapping. | | ||
| index.mapper.dynamic | true | {true,false} | Dynamic | Determines whether new fields should be added dynamically to the mapping when they are encountered in a document. | | ||
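
Because the limits in the table are listed as dynamic, they can be changed on an existing index through the index settings API. A minimal sketch, assuming an index named `sample-index1` and an illustrative new limit of `2000`:

```json
PUT sample-index1/_settings
{
  "index.mapping.total_fields.limit": 2000
}
```
{% include copy-curl.html %}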
Check failure on line 115 in _field-types/index.md GitHub Actions / vale[vale] _field-types/index.md#L115
Raw output
|
||
|
||
## Runtime fields | ||
|
||
You can define fields at query time, rather than at index time, by using runtime fields. This can be useful for creating fields based on the values of other fields, or for performing transformations on data during the query process. Runtime fields are defined in the query itself and do not affect the underlying data in the index. | ||
|
||
This is not yet available on OpenSearch
Removed
||
--- | ||
## Mapping example usage | ||
|
||
|
@@ -171,7 +201,7 @@ | |
GET <index>/_mapping | ||
``` | ||
|
||
In the above request, `<index>` may be an index name or a comma-separated list of index names. | ||
In the previous request, `<index>` may be an index name or a comma-separated list of index names. | ||
|
||
To get all mappings for all indexes, use the following request: | ||
|
||
|
@@ -220,3 +250,27 @@ | |
} | ||
} | ||
``` | ||
|
||
## Delete a mapping | ||
I don't think we support
Removed text
||
|
||
The syntax for deleting a mapping depends on whether you want to delete the entire mapping for an index or the mapping for a specific field. The syntax for deleting a mapping is as follows: | ||
|
||
```json | ||
DELETE /<index_name>/_mapping | ||
DELETE /<field_name/_mapping> | ||
``` | ||
|
||
For example, to delete the entire mapping for the `sample-index1` index, you can use the following commands: | ||
|
||
```json | ||
<insert command> | ||
``` | ||
|
||
If you want to delete the mapping for a specific field, you can <insert instructional text> For example, to delete the mapping for the `year` field, use the following command: | ||
|
||
```json | ||
<insert command> | ||
``` | ||
|
||
Deleting a field mapping will remove the mapping definition for that field across all indexes or the specified index. It will not delete the actual data stored in those fields. | ||
{: .note} |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,18 @@ | ||||||
--- | ||||||
layout: default | ||||||
title: Field names | ||||||
|
||||||
nav_order: 10 | ||||||
has_children: false | ||||||
|
||||||
parent: Metadata fields | ||||||
--- | ||||||
|
||||||
# Field names | ||||||
|
||||||
The `_field_names` field indexes the names of fields within a document that contain non-null values. This field supports the `exists` query, which identifies documents with or without non-null values for a specified field. | ||||||
@mgodwan Please review this narrative for technical accuracy.
This should be
Also, term queries on this metadata field are deprecated.
Revised.
|
||||||
|
||||||
The `_field_names` field only indexes field names when both `doc_values` and `norms` are disabled for those fields. If either `doc_values` or `norms` is enabled, the `exists` query remains functional but does not rely on `_field_names`. | ||||||
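
A minimal sketch of the `exists` query mentioned above. The index name `test-index1` and the field name `text` are assumptions used only for illustration:

```json
GET test-index1/_search
{
  "query": {
    "exists": {
      "field": "text"
    }
  }
}
```
{% include copy-curl.html %}

This request matches documents that contain a non-null value in the `text` field.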
|
||||||
|
||||||
## Mapping example | ||||||
|
||||||
|
||||||
<SME: Please provide a mapping example.> | ||||||
Check warning on line 17 in _field-types/metadata-fields/field-names.md GitHub Actions / vale[vale] _field-types/metadata-fields/field-names.md#L17
Raw output
|
||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,84 @@ | ||
--- | ||
layout: default | ||
title: ID | ||
nav_order: 20 | ||
has_children: false | ||
|
||
parent: Metadata fields | ||
--- | ||
|
||
# ID | ||
|
||
Each document has an `_id` field that uniquely identifies it. This field is indexed, allowing documents to be retrieved either through the `GET` API or the [`ids` query]({{site.url}}{{site.baseurl}}/query-dsl/term/ids/). | ||
|
||
Line above: Is "GET" intentionally in code font?
||
The following example creates an index named `test-index1` and adds two documents with different `_id` values: | ||
|
||
```json | ||
PUT test-index1/_doc/1 | ||
{ | ||
"text": "Document with ID 1" | ||
} | ||
|
||
PUT test-index1/_doc/2?refresh=true | ||
{ | ||
"text": "Document with ID 2" | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
Now, you can query the documents using the `_id` field: | ||
|
||
```json | ||
GET test-index1/_search | ||
{ | ||
"query": { | ||
"terms": { | ||
"_id": ["1", "2"] | ||
} | ||
} | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
The following response shows that this query returns both documents with `_id` values of `1` and `2`. | ||
|
||
```json | ||
{ | ||
"took": 10, | ||
"timed_out": false, | ||
"_shards": { | ||
"total": 1, | ||
"successful": 1, | ||
"skipped": 0, | ||
"failed": 0 | ||
}, | ||
"hits": { | ||
"total": { | ||
"value": 2, | ||
"relation": "eq" | ||
}, | ||
"max_score": 1, | ||
"hits": [ | ||
{ | ||
"_index": "test-index1", | ||
"_id": "1", | ||
"_score": 1, | ||
"_source": { | ||
"text": "Document with ID 1" | ||
} | ||
}, | ||
{ | ||
"_index": "test-index1", | ||
"_id": "2", | ||
"_score": 1, | ||
"_source": { | ||
"text": "Document with ID 2" | ||
} | ||
} | ||
] | ||
} | ||
} | ||
``` | ||
|
||
{% include copy-curl.html %} | ||
|
||
|
||
|
||
## Querying on the `_id` field | ||
|
||
While the `_id` field is accessible in various queries, it is restricted from use in aggregations, sorting, and scripting. If you need to sort or aggregate on the `_id` field, it is recommended to duplicate the content of the `_id` field into another field that has `doc_values` enabled. | ||
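
One way to sketch the workaround described above is to write the ID value into a separate `keyword` field when indexing, because `keyword` fields have `doc_values` enabled by default, and then sort on that field. The index name `test-index2` and the field name `doc_id` are hypothetical, and the client is assumed to supply the duplicate value itself:

```json
PUT test-index2
{
  "mappings": {
    "properties": {
      "doc_id": { "type": "keyword" }
    }
  }
}

PUT test-index2/_doc/1?refresh=true
{
  "doc_id": "1",
  "text": "Document with ID 1"
}

GET test-index2/_search
{
  "sort": [
    { "doc_id": { "order": "asc" } }
  ]
}
```
{% include copy-curl.html %}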
@sandeshkr419 Would you be able to confirm this from search perspective?
Hey @mgodwan missed the notification for this one somehow. The explanation seems correct. @vagimeli I'm wondering if we should link the usage example of querying over Something like:
Revised.
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
--- | ||
layout: default | ||
title: Ignored | ||
nav_order: 15 | ||
has_children: false | ||
|
||
parent: Metadata fields | ||
--- | ||
|
||
# Ignored | ||
|
||
The `_ignored` field indexes and stores the names of fields within a document that were ignored during the indexing process due to being malformed. This functionality is enabled when the `ignore_malformed` setting is turned on in the [index mapping]({{site.url}}{{site.baseurl}}/field-types/#mapping-example-usage). | ||
|
||
The `_ignored` field allows you to search and identify documents that contain fields that were ignored, as well as the specific field names that were ignored. This can be useful for troubleshooting and understanding issues related to malformed data in your documents. | ||
|
||
|
||
You can query the `_ignored` field using `term`, `terms`, and `exists` queries, and the results will be in the search hits. | ||
|
||
The `_ignored` field is only populated when the `ignore_malformed` setting is enabled in your index mapping. If `ignore_malformed` is set to `false` (the default value), malformed fields will cause the entire document to be rejected, and the `_ignored` field will not be populated. | ||
{: .note} | ||
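
As a sketch of how the `_ignored` field gets populated, the following requests create a mapping with `ignore_malformed` enabled on a `date` field and then index a document whose date value cannot be parsed. The index name `test-ignored` and the field names are hypothetical:

```json
PUT test-ignored
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "created_at": {
        "type": "date",
        "ignore_malformed": true
      }
    }
  }
}

PUT test-ignored/_doc/1?refresh=true
{
  "title": "Document with a malformed date",
  "created_at": "not-a-date"
}
```
{% include copy-curl.html %}

With this mapping, the document is accepted, the malformed `created_at` value is skipped, and the field name is recorded in `_ignored`.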
|
||
For example, the following query will retrieve all documents that have at least one field that was ignored during indexing: | ||
Can we add indexing example as well for this?
|
||
|
||
```json | ||
GET _search | ||
{ | ||
"query": { | ||
"exists": { | ||
"field": "_ignored" | ||
} | ||
} | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
Similarly, you can use a `term` query to find documents where a specific field, such as `created_at`, was ignored: | ||
|
||
```json | ||
GET _search | ||
{ | ||
"query": { | ||
"term": { | ||
"_ignored": "created_at" | ||
} | ||
} | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
#### Response | ||
|
||
```json | ||
{ | ||
"took": 51, | ||
"timed_out": false, | ||
"_shards": { | ||
"total": 45, | ||
"successful": 45, | ||
"skipped": 0, | ||
"failed": 0 | ||
}, | ||
"hits": { | ||
"total": { | ||
"value": 0, | ||
"relation": "eq" | ||
}, | ||
"max_score": null, | ||
"hits": [] | ||
} | ||
} | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
--- | ||
layout: default | ||
title: Index | ||
nav_order: 25 | ||
has_children: false | ||
|
||
parent: Metadata fields | ||
--- | ||
|
||
# Index | ||
|
||
When querying across multiple indexes, you may need to filter results based on the index a document was indexed into. The `_index` field matches documents based on their index. | ||
|
||
|
||
The following example creates two indexes, `products` and `customers`, and adds a document to each index: | ||
|
||
```json | ||
PUT products/_doc/1 | ||
{ | ||
"name": "Widget X" | ||
} | ||
|
||
PUT customers/_doc/2?refresh=true | ||
We can remove
||
{ | ||
"name": "John Doe" | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
Now, you can query both indexes and filter the results using the `_index` field: | ||
|
||
```json | ||
GET products,customers/_search | ||
{ | ||
"query": { | ||
"terms": { | ||
"_index": ["products", "customers"] | ||
} | ||
}, | ||
"aggs": { | ||
"index_groups": { | ||
"terms": { | ||
"field": "_index", | ||
"size": 10 | ||
} | ||
} | ||
}, | ||
"sort": [ | ||
{ | ||
"_index": { | ||
"order": "desc" | ||
} | ||
} | ||
], | ||
"script_fields": { | ||
"index_name": { | ||
"script": { | ||
"lang": "painless", | ||
"source": "doc['_index'].value" | ||
} | ||
} | ||
} | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
In this example: | ||
|
||
- The `query` section uses a `terms` query to match documents from the `products` and `customers` indexes. | ||
- The `aggs` section performs a `terms` aggregation on the `_index` field, grouping the results by index. | ||
- The `sort` section sorts the results by the `_index` field in descending order. | ||
- The `script_fields` section adds a new field `index_name` to the search results that contains the value of the `_index` field for each document. | ||
|
||
|
||
## Querying on the `_index` field | ||
|
||
<SME: Please provide information necessary for users to understand how this works for them.> | ||
Check warning on line 74 in _field-types/metadata-fields/index-metadata.md GitHub Actions / vale[vale] _field-types/metadata-fields/index-metadata.md#L74
Raw output
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
--- | ||
|
||
layout: default | ||
Check failure on line 2 in _field-types/metadata-fields/index.md GitHub Actions / vale[vale] _field-types/metadata-fields/index.md#L2
Raw output
|
||
title: Metadata fields | ||
nav_order: 90 | ||
has_children: true | ||
Check failure on line 5 in _field-types/metadata-fields/index.md GitHub Actions / vale[vale] _field-types/metadata-fields/index.md#L5
Raw output
|
||
has_toc: false | ||
Check failure on line 6 in _field-types/metadata-fields/index.md GitHub Actions / vale[vale] _field-types/metadata-fields/index.md#L6
Raw output
|
||
--- | ||
|
||
# Metadata fields | ||
|
||
OpenSearch has built-in metadata fields that provide information about the documents in an index. These fields can be accessed or used in queries as needed. | ||
|
||
Metadata fields | Description | ||
|
||
:--- | :--- | ||
`_field_names` | The fields within the document that hold non-empty or non-null values. | ||
|
||
`_ignored` | The fields in the document that were disregarded during the indexing process due to the presence of malformed data, as specified by the `ignore_malformed` setting. | ||
|
||
`_id` | The unique identifier assigned to each individual document. | ||
|
||
`_index` | The specific index within the OpenSearch database where the document is stored and organized. | ||
|
||
`_meta` | Stores custom metadata or additional information specific to the application or use case. | ||
`_routing` | Allows you to specify a custom value that determines the shard assignment for the document within the OpenSearch cluster. | ||
|
||
`_source` | Contains the original JSON representation of the document's data. | ||
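
A brief sketch of two of the fields in the table, `_meta` and `_routing`. The index name `sample-index4`, the `_meta` keys, and the routing value `user1` are all hypothetical:

```json
PUT sample-index4
{
  "mappings": {
    "_meta": {
      "application": "example-app",
      "version": "1.0"
    }
  }
}

PUT sample-index4/_doc/1?routing=user1
{
  "name": "John Doe"
}

GET sample-index4/_doc/1?routing=user1
```
{% include copy-curl.html %}

The `_meta` object is stored with the mapping and is not used by OpenSearch itself, while the `routing` parameter determines the shard that stores the document, so the same value is passed again when retrieving it.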
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
--- | ||
layout: default | ||
title: Meta | ||
nav_order: 30 | ||
has_children: false | ||
|
||
parent: Metadata fields | ||
--- | ||
|
There are certain caveats of using the dynamic mappings (e.g. performance impact). I believe we should highlight the same and recommend to use explicit mappings
Revised