Merge branch 'main' into 20240322-absolute-path-for-config.yml
leanneeliatra authored Jun 18, 2024
2 parents a4c1112 + b3573df commit 7bd5c2b
Showing 8 changed files with 113 additions and 17 deletions.
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
Original file line number Diff line number Diff line change
@@ -82,7 +82,7 @@ Follow these steps to set up your local copy of the repository:

```
curl -sSL https://get.rvm.io | bash -s stable
rvm install 3.2
rvm install 3.2.4
ruby -v
```

2 changes: 1 addition & 1 deletion _field-types/supported-field-types/derived.md
@@ -7,7 +7,7 @@ parent: Supported field types
---

# Derived field type
**Introduced 2.14**
**Introduced 2.15**
{: .label .label-purple }

Derived fields allow you to create new fields dynamically by executing scripts on existing fields. The existing fields can be either retrieved from the `_source` field, which contains the original document, or from a field's doc values. Once you define a derived field either in an index mapping or within a search request, you can use the field in a query in the same way you would use a regular field.
2 changes: 1 addition & 1 deletion _field-types/supported-field-types/index.md
@@ -23,7 +23,7 @@ Boolean | [`boolean`]({{site.url}}{{site.baseurl}}/field-types/supported-field-t
IP | [`ip`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/ip/): An IP address in IPv4 or IPv6 format.
[Range]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/range/) | A range of values (`integer_range`, `long_range`, `double_range`, `float_range`, `date_range`, `ip_range`).
[Object]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/object-fields/)| [`object`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/object/): A JSON object. <br>[`nested`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/nested/): Used when objects in an array need to be indexed independently as separate documents.<br>[`flat_object`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/flat-object/): A JSON object treated as a string.<br>[`join`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/join/): Establishes a parent-child relationship between documents in the same index.
[String]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/string/)|[`keyword`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/keyword/): Contains a string that is not analyzed.<br> [`text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/): Contains a string that is analyzed.<br> [`match_only_text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/match-only-text/): A space-optimized version of a `text` field.<br>[`token_count`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/token-count/): Stores the number of analyzed tokens in a string.
[String]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/string/)|[`keyword`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/keyword/): Contains a string that is not analyzed.<br> [`text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/): Contains a string that is analyzed.<br> [`match_only_text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/match-only-text/): A space-optimized version of a `text` field.<br>[`token_count`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/token-count/): Stores the number of analyzed tokens in a string. <br>[`wildcard`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/wildcard/): A variation of `keyword` with efficient substring and regular expression matching.
[Autocomplete]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/autocomplete/) |[`completion`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/completion/): Provides autocomplete functionality through a completion suggester.<br> [`search_as_you_type`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/search-as-you-type/): Provides search-as-you-type functionality using both prefix and infix completion.
[Geographic]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geographic/)| [`geo_point`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geo-point/): A geographic point.<br>[`geo_shape`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geo-shape/): A geographic shape.
[Rank]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/rank/) | Boosts or decreases the relevance score of documents (`rank_feature`, `rank_features`).
1 change: 1 addition & 0 deletions _field-types/supported-field-types/string.md
@@ -21,3 +21,4 @@ Field data type | Description
[`match_only_text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/match-only-text/) | A space-optimized version of a `text` field.
[`token_count`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/token-count/) | Counts the number of tokens in a string.
[`constant_keyword`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/constant-keyword/) | Similar to `keyword` but uses a single value for all documents.
[`wildcard`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/wildcard/) | A variation of `keyword` with efficient substring and regular expression matching.
50 changes: 50 additions & 0 deletions _field-types/supported-field-types/wildcard.md
@@ -0,0 +1,50 @@
---
layout: default
title: Wildcard
nav_order: 62
has_children: false
parent: String field types
grand_parent: Supported field types
---

# Wildcard field type

A `wildcard` field is a variant of a `keyword` field designed for arbitrary substring and regular expression matching.

Use a `wildcard` field when your content consists of "strings of characters" and not "text". Examples include unstructured log lines and computer code.

The `wildcard` field type is indexed differently from the `keyword` field type. Whereas `keyword` fields write the original field value to the index, the `wildcard` field type splits the field value into substrings with a length that is less than or equal to 3 and writes the substrings to the index. For example, the string `test` is split into strings `t`, `te`, `tes`, `e`, `es`, and `est`.

At search time, required substrings from the query pattern are matched against the index to produce candidate documents, which are then filtered according to the pattern in the query. For example, for the search term `test`, OpenSearch performs an indexed search for `tes AND est`. If the search term contains fewer than three characters, OpenSearch uses character substrings that are one or two characters long. OpenSearch then checks each candidate document and returns it only if its source value is `test`. This second pass excludes false positives such as `nikola tesla felt alternating current was best`, which contains both `tes` and `est`.
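
The two-phase lookup described above can be sketched in a few lines of Python. This is a simplified model and not the OpenSearch implementation: it indexes every substring of up to three characters (a superset of the six substrings listed for `test`), and all function names are hypothetical.

```python
def substrings_up_to(value, max_len=3):
    """All substrings of `value` whose length is between 1 and max_len."""
    return {
        value[start:start + length]
        for start in range(len(value))
        for length in range(1, max_len + 1)
        if start + length <= len(value)
    }


def build_index(docs, max_len=3):
    """Map each substring to the IDs of the documents that contain it."""
    index = {}
    for doc_id, value in docs.items():
        for gram in substrings_up_to(value, max_len):
            index.setdefault(gram, set()).add(doc_id)
    return index


def required_substrings(term, max_len=3):
    """Substrings every match must contain (`tes` and `est` for `test`)."""
    if len(term) <= max_len:
        return {term}
    return {term[i:i + max_len] for i in range(len(term) - max_len + 1)}


def candidates(term, index):
    """Phase 1: indexed lookup, the equivalent of `tes AND est`."""
    postings = [index.get(gram, set()) for gram in required_substrings(term)]
    return set.intersection(*postings)


def search(term, docs, index):
    """Phase 2: keep only candidates whose source value really is the term."""
    return sorted(d for d in candidates(term, index) if docs[d] == term)


docs = {
    1: "test",
    2: "nikola tesla felt alternating current was best",
    3: "contest",
}
index = build_index(docs)
print(candidates("test", index))
print(search("test", docs, index))
```

In this toy data set, all three documents survive the indexed lookup because each contains both `tes` and `est`, but only document 1 passes the verification step.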

In general, exact match queries (like [`term`]({{site.url}}{{site.baseurl}}/query-dsl/term/term/) or [`terms`]({{site.url}}{{site.baseurl}}/query-dsl/term/terms/) queries) are less efficient on `wildcard` fields than on `keyword` fields, while [`wildcard`]({{site.url}}{{site.baseurl}}/query-dsl/term/wildcard/), [`prefix`]({{site.url}}{{site.baseurl}}/query-dsl/term/prefix/), and [`regexp`]({{site.url}}{{site.baseurl}}/query-dsl/term/regexp/) queries perform better on `wildcard` fields.
{: .tip}

## Example

Create a mapping with a `wildcard` field:

```json
PUT logs
{
"mappings" : {
"properties" : {
"log_line" : {
"type" : "wildcard"
}
}
}
}
```
{% include copy-curl.html %}

## Parameters

The following table lists all parameters available for `wildcard` fields.

Parameter | Description
:--- | :---
`doc_values` | A Boolean value that specifies whether the field should be stored on disk so that it can be used for aggregations, sorting, or scripting. Default is `false`.
`ignore_above` | Any string longer than this integer value should not be indexed. Default is `2147483647`.
`normalizer` | The normalizer used to preprocess values for indexing and search. By default, no normalization occurs and the original value is used. You may use the `lowercase` normalizer to perform case-insensitive matching on the field.
`null_value` | A value to be used in place of `null`. Must be of the same type as the field. If this parameter is not specified, then the field is treated as missing when its value is `null`. Default is `null`.
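
As a rough illustration of how the parameters in the table fit into a mapping, the following Python sketch builds a request body for a `wildcard` field. The helper name `wildcard_mapping` and the `ignore_above` value of `1024` are assumptions chosen for the example, not part of the documentation. The resulting JSON would be sent as the body of a `PUT logs` request.

```python
import json


def wildcard_mapping(field, doc_values=None, ignore_above=None,
                     normalizer=None, null_value=None):
    """Build a mapping body for one `wildcard` field, omitting unset parameters."""
    options = {
        "doc_values": doc_values,
        "ignore_above": ignore_above,
        "normalizer": normalizer,
        "null_value": null_value,
    }
    field_def = {"type": "wildcard"}
    # Only parameters that were explicitly set are written to the mapping,
    # so every other parameter keeps its documented default.
    field_def.update({k: v for k, v in options.items() if v is not None})
    return {"mappings": {"properties": {field: field_def}}}


# Hypothetical settings: skip very long strings and match case-insensitively.
body = wildcard_mapping("log_line", ignore_above=1024, normalizer="lowercase")
print(json.dumps(body, indent=2))
```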
@@ -35,7 +35,9 @@ OpenSearch supports the following search settings:

- `search.highlight.term_vector_multi_value` (Static, Boolean): Specifies to highlight snippets across values of a multi-valued field. Default is `true`.

- `search.max_aggregation_rewrite_filters` (Dynamic, integer): Determines the maximum number of rewrite filters allowed during aggregation. Set this value to `0` to disable the filter rewrite optimization for aggregations.
- `search.max_aggregation_rewrite_filters` (Dynamic, integer): Determines the maximum number of rewrite filters allowed during aggregation. Set this value to `0` to disable the filter rewrite optimization for aggregations. This is an experimental feature and may change or be removed in future versions.

- `search.dynamic_pruning.cardinality_aggregation.max_allowed_cardinality` (Dynamic, integer): Determines the threshold for applying dynamic pruning in cardinality aggregation. If a field’s cardinality exceeds this threshold, the aggregation reverts to the default method. This is an experimental feature and may change or be removed in future versions.

## Point in Time settings

66 changes: 54 additions & 12 deletions _observing-your-data/ad/index.md
@@ -30,7 +30,39 @@ A detector is an individual anomaly detection task. You can define multiple dete
- Enter a name and brief description. Make sure the name is unique and descriptive enough to help you to identify the purpose of the detector.
1. Specify the data source.
- For **Data source**, choose the index you want to use as the data source. You can optionally use index patterns to choose multiple indexes.
- (Optional) For **Data filter**, filter the index you chose as the data source. From the **Data filter** menu, choose **Add data filter**, and then design your filter query by selecting **Field**, **Operator**, and **Value**, or choose **Use query DSL** and add your own JSON filter query.
- (Optional) For **Data filter**, filter the index you chose as the data source. From the **Data filter** menu, choose **Add data filter**, and then design your filter query by selecting **Field**, **Operator**, and **Value**, or choose **Use query DSL** and add your own JSON filter query. Only [Boolean queries]({{site.url}}{{site.baseurl}}/query-dsl/compound/bool/) are supported for query domain-specific language (DSL).

#### Example filter using query DSL
The query is designed to retrieve documents in which the `urlPath.keyword` field matches one of the following specified values:

- /domain/{id}/short
- /sub_dir/{id}/short
- /abcd/123/{id}/xyz

```json
{
"bool": {
"should": [
{
"term": {
"urlPath.keyword": "/domain/{id}/short"
}
},
{
"term": {
"urlPath.keyword": "/sub_dir/{id}/short"
}
},
{
"term": {
"urlPath.keyword": "/abcd/123/{id}/xyz"
}
}
]
}
}
```

1. Specify a timestamp.
- Select the **Timestamp field** in your index.
1. Define operation settings.
@@ -44,23 +76,33 @@ A detector is an individual anomaly detection task. You can define multiple dete
- (Optional) To add extra processing time for data collection, specify a **Window delay** value.
- This value tells the detector that the data is not ingested into OpenSearch in real time but with a certain delay. Set the window delay to shift the detector interval to account for this delay.
- For example, say the detector interval is 10 minutes and data is ingested into your cluster with a general delay of 1 minute. Assume the detector runs at 2:00. The detector attempts to get the last 10 minutes of data from 1:50 to 2:00, but because of the 1-minute delay, it only gets 9 minutes of data and misses the data from 1:59 to 2:00. Setting the window delay to 1 minute shifts the interval window to 1:49--1:59, so the detector accounts for all 10 minutes of the detector interval time.
1. Specify custom result index.
- If you want to store the anomaly detection results in your own index, choose **Enable custom result index** and specify the custom index to store the result. The anomaly detection plugin adds an `opensearch-ad-plugin-result-` prefix to the index name that you input. For example, if you input `abc` as the result index name, the final index name is `opensearch-ad-plugin-result-abc`.
1. Specify custom results index.
- The Anomaly Detection plugin allows you to store anomaly detection results in a custom index of your choice. To enable this, select **Enable custom results index** and provide a name for your index, for example, `abc`. The plugin then creates an alias prefixed with `opensearch-ad-plugin-result-` followed by your chosen name, for example, `opensearch-ad-plugin-result-abc`. This alias points to an actual index with a name containing the date and a sequence number, like `opensearch-ad-plugin-result-abc-history-2024.06.12-000002`, where your results are stored.

You can use the dash “-” sign to separate the namespace to manage custom result index permissions. For example, if you use `opensearch-ad-plugin-result-financial-us-group1` as the result index, you can create a permission role based on the pattern `opensearch-ad-plugin-result-financial-us-*` to represent the "financial" department at a granular level for the "us" area.
You can use the dash “-” sign to separate the namespace to manage custom results index permissions. For example, if you use `opensearch-ad-plugin-result-financial-us-group1` as the results index, you can create a permission role based on the pattern `opensearch-ad-plugin-result-financial-us-*` to represent the "financial" department at a granular level for the "us" area.
{: .note }

- If the custom index you specify doesn’t already exist, the Anomaly Detection plugin creates this index when you create the detector and start your real-time or historical analysis.
- If the custom index already exists, the plugin checks if the index mapping of the custom index matches the anomaly result file. You need to make sure the custom index has valid mapping as shown here: [anomaly-results.json](https://github.com/opensearch-project/anomaly-detection/blob/main/src/main/resources/mappings/anomaly-results.json).
- To use the custom result index option, you need the following permissions:
- `indices:admin/create` - If the custom index already exists, you don't need this.
- When the Security plugin (fine-grained access control) is enabled, the default results index becomes a system index and is no longer accessible through the standard Index or Search APIs. To access its content, you must use the Anomaly Detection RESTful API or the dashboard. As a result, you cannot build customized dashboards using the default results index if the Security plugin is enabled. However, you can create a custom results index in order to build customized dashboards.
- If the custom index you specify does not exist, the Anomaly Detection plugin will create it when you create the detector and start your real-time or historical analysis.
- If the custom index already exists, the plugin will verify that the index mapping matches the required structure for anomaly results. In this case, ensure that the custom index has a valid mapping as defined in the [`anomaly-results.json`](https://github.com/opensearch-project/anomaly-detection/blob/main/src/main/resources/mappings/anomaly-results.json) file.
- To use the custom results index option, you need the following permissions:
- `indices:admin/create` - The Anomaly Detection plugin requires the ability to create and roll over the custom index.
- `indices:admin/aliases` - The Anomaly Detection plugin requires access to create and manage an alias for the custom index.
- `indices:data/write/index` - You need the `write` permission for the Anomaly Detection plugin to write results into the custom index for a single-entity detector.
- `indices:data/read/search` - You need the `search` permission because the Anomaly Detection plugin needs to search custom result indexes to show results on the anomaly detection UI.
- `indices:data/read/search` - You need the `search` permission because the Anomaly Detection plugin needs to search custom results indexes to show results on the Anomaly Detection UI.
- `indices:data/write/delete` - Because the detector might generate a large number of anomaly results, you need the `delete` permission to delete old data and save disk space.
- `indices:data/write/bulk*` - You need the `bulk*` permission because the Anomaly Detection plugin uses the bulk API to write results into the custom index.
- Managing the custom result index:
- The anomaly detection dashboard queries all detectors’ results from all custom result indexes. Having too many custom result indexes might impact the performance of the Anomaly Detection plugin.
- You can use [Index State Management]({{site.url}}{{site.baseurl}}/im-plugin/ism/index/) to rollover old result indexes. You can also manually delete or archive any old result indexes. We recommend reusing a custom result index for multiple detectors.
- Managing the custom results index:
- The anomaly detection dashboard queries all detectors’ results from all custom results indexes. Having too many custom results indexes might impact the performance of the Anomaly Detection plugin.
- You can use [Index State Management]({{site.url}}{{site.baseurl}}/im-plugin/ism/index/) to roll over old results indexes. You can also manually delete or archive any old results indexes. We recommend reusing a custom results index for multiple detectors.
- The Anomaly Detection plugin also provides lifecycle management for custom indexes. It rolls an alias over to a new index when the custom results index meets any of the conditions in the following table.

Parameter | Description | Type | Unit | Example | Required
:--- | :--- |:--- |:--- |:--- |:---
`result_index_min_size` | The minimum total primary shard size (excluding replicas) required for index rollover. If set to 100 GiB and the index has 5 primary and 5 replica shards of 20 GiB each, then the total primary shard size is 100 GiB, triggering the rollover. | `integer` | `MB` | `51200` | No
`result_index_min_age` | The minimum index age required for rollover, calculated from its creation time to the current time. | `integer` |`day` | `7` | No
`result_index_ttl` | The minimum age required to permanently delete rolled-over indexes. | `integer` | `day` | `60` | No

1. Choose **Next**.
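
The window delay arithmetic from the operation settings step can be checked with a short Python sketch. The function name `detection_window` and the calendar date are assumptions made for illustration. Only the 10-minute interval, the 1-minute delay, and the 2:00 run time come from the example in the text.

```python
from datetime import datetime, timedelta


def detection_window(run_time, interval_minutes, window_delay_minutes=0):
    """Return the (start, end) of the data window a detector run reads."""
    # The window delay shifts the whole interval back in time so that
    # late-arriving data has been ingested before it is queried.
    end = run_time - timedelta(minutes=window_delay_minutes)
    start = end - timedelta(minutes=interval_minutes)
    return start, end


run_time = datetime(2024, 6, 18, 2, 0)  # arbitrary date, detector runs at 2:00
start, end = detection_window(run_time, interval_minutes=10, window_delay_minutes=1)
print(start.strftime("%H:%M"), end.strftime("%H:%M"))
```

With a 1-minute window delay, the run at 2:00 reads the window from 1:49 to 1:59. Without a delay, the same call returns 1:50 to 2:00.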

After you define the detector, the next step is to configure the model.
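
The rollover conditions for custom results indexes listed in the table above can be modeled as follows. This is a sketch under stated assumptions rather than the plugin's logic: the function name is hypothetical, thresholds are compared with `>=`, the table's `MB` unit is treated as 1/1024 of a GiB, and a condition that is not configured is ignored.

```python
MB_PER_GIB = 1024  # assumption: the table's "MB" unit is treated as mebibytes


def should_roll_over(primary_size_mb, age_days, min_size_mb=None, min_age_days=None):
    """Return True when any configured rollover condition is met."""
    size_met = min_size_mb is not None and primary_size_mb >= min_size_mb
    age_met = min_age_days is not None and age_days >= min_age_days
    return size_met or age_met


# Five primary shards of 20 GiB each; replica shards are not counted.
primary_size_mb = 5 * 20 * MB_PER_GIB

# A 100 GiB size threshold is reached, so the alias would roll over.
print(should_roll_over(primary_size_mb, age_days=2, min_size_mb=100 * MB_PER_GIB))

# Neither a 200 GiB size threshold nor a 7-day age threshold is reached.
print(should_roll_over(primary_size_mb, age_days=2,
                       min_size_mb=200 * MB_PER_GIB, min_age_days=7))
```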
3 changes: 2 additions & 1 deletion _observing-your-data/ad/settings.md
@@ -49,4 +49,5 @@ plugins.anomaly_detection.dedicated_cache_size | 10 | If the real-time analysis
plugins.anomaly_detection.max_concurrent_preview | 2 | The maximum number of concurrent previews. You can use this setting to limit resource usage.
plugins.anomaly_detection.model_max_size_percent | 0.1 | The upper bound of the memory percentage for a model.
plugins.anomaly_detection.door_keeper_in_cache.enabled | False | When set to `true`, OpenSearch places a bloom filter in front of an inactive entity cache to filter out items that are not likely to appear more than once.
plugins.anomaly_detection.hcad_cold_start_interpolation.enabled | False | When set to `true`, enables interpolation in high-cardinality anomaly detection (HCAD) cold start.
plugins.anomaly_detection.hcad_cold_start_interpolation.enabled | False | When set to `true`, enables interpolation for high-cardinality anomaly detection (HCAD) during the initial cold start period.
plugins.anomaly_detection.jvm_heap_usage_threshold | 95 | Specifies the JVM memory usage threshold, as a percentage, at which anomaly detectors will be disabled. The default value is 95%, meaning that detectors will be disabled when JVM heap usage reaches 95%.
