Commit

Merge branch 'main' into bitmap-filtering
bowenlan-amzn authored Sep 4, 2024
2 parents 6abce34 + ef8abd7 commit 4266d1b
Showing 12 changed files with 222 additions and 67 deletions.
@@ -18,7 +18,7 @@ Consider the following criteria when deciding which workload would work best for

- The cluster's use case.
- The data types that your cluster uses compared to the data structure of the documents contained in the workload. Each workload contains an example document so that you can compare data types, or you can view the index mappings and data types in the `index.json` file.
- The query types most commonly used inside your cluster. The `operations/default.json` file contains information about the query types and workload operations.
- The query types most commonly used inside your cluster. The `operations/default.json` file contains information about the query types and workload operations. For a list of common operations, see [Common operations]({{site.url}}{{site.baseurl}}/benchmark/user-guide/understanding-workloads/common-operations/).

## General search clusters

181 changes: 181 additions & 0 deletions _benchmark/user-guide/understanding-workloads/common-operations.md
@@ -0,0 +1,181 @@
---
layout: default
title: Common operations
nav_order: 16
grand_parent: User guide
parent: Understanding workloads
---

# Common operations

[Test procedures]({{site.url}}{{site.baseurl}}/benchmark/user-guide/understanding-workloads/anatomy-of-a-workload#_operations-and-_test-procedures) use a variety of operations, which are defined in the `operations` directory of a workload. This page details the operations most commonly found in OpenSearch Benchmark workloads.

- [Common operations](#common-operations)
- [bulk](#bulk)
- [create-index](#create-index)
- [delete-index](#delete-index)
- [cluster-health](#cluster-health)
- [refresh](#refresh)
- [search](#search)

<!-- vale off -->
## bulk
<!-- vale on -->

The `bulk` operation type allows you to run [bulk](/api-reference/document-apis/bulk/) requests as a task.

The following example shows a `bulk` operation type with a `bulk-size` of `5000` documents:

```yml
{
  "name": "index-append",
  "operation-type": "bulk",
  "bulk-size": 5000
}
```
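
In a workload, a test procedure's schedule references this operation by name. The following is a minimal sketch of such a schedule entry; the warmup period, client count, and target throughput values are illustrative:

```json
{
  "schedule": [
    {
      "operation": "index-append",
      "warmup-time-period": 120,
      "clients": 8,
      "target-throughput": 1000
    }
  ]
}
```

The schedule entry controls how the operation runs, such as how many clients execute it concurrently and at what target throughput.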


<!-- vale off -->
## create-index
<!-- vale on -->

The `create-index` operation runs the [Create Index API](/api-reference/index-apis/create-index/). It supports the following two modes of index creation:

- Creating all indexes specified in the workload's `indices` section
- Creating one specific index defined within the operation itself

The following example creates all indexes defined in the `indices` section of the workload. It uses all of the index settings defined in the workload but overrides the number of shards:

```yml
{
  "name": "create-all-indices",
  "operation-type": "create-index",
  "settings": {
    "index.number_of_shards": 1
  },
  "request-params": {
    "wait_for_active_shards": "true"
  }
}
```

The following example creates a new index with all index settings specified in the operation body:

```yml
{
  "name": "create-an-index",
  "operation-type": "create-index",
  "index": "people",
  "body": {
    "settings": {
      "index.number_of_shards": 1
    },
    "mappings": {
      "properties": {
        "name": {
          "type": "text"
        }
      }
    }
  }
}
```
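
When this task runs, it issues a request roughly equivalent to the following Create Index API call, shown here as a reference sketch:

```json
PUT /people
{
  "settings": {
    "index.number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      }
    }
  }
}
```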



<!-- vale off -->
## delete-index
<!-- vale on -->

The `delete-index` operation runs the [Delete Index API](/api-reference/index-apis/delete-index/). As with the [`create-index`](#create-index) operation, you can delete all indexes found in the `indices` section of the workload or delete one or more indexes based on the string passed in the `index` setting.

The following example deletes all indexes found in the `indices` section of the workload:

```yml
{
  "name": "delete-all-indices",
  "operation-type": "delete-index"
}
```

The following example deletes all `logs-*` indexes:

```yml
{
  "name": "delete-logs",
  "operation-type": "delete-index",
  "index": "logs-*",
  "only-if-exists": false,
  "request-params": {
    "expand_wildcards": "all",
    "allow_no_indices": "true",
    "ignore_unavailable": "true"
  }
}
```
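
For reference, this task roughly corresponds to the following Delete Index API request, with the `request-params` entries appended as query parameters:

```json
DELETE /logs-*?expand_wildcards=all&allow_no_indices=true&ignore_unavailable=true
```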

<!-- vale off -->
## cluster-health
<!-- vale on -->

The `cluster-health` operation runs the [Cluster Health API](/api-reference/cluster-api/cluster-health/), which checks the cluster health status against the expected status set in `request-params`. If the expected cluster health status isn't reached, the operation reports a failure. You can use the `--on-error` option in the OpenSearch Benchmark `execute-test` command to control how OpenSearch Benchmark behaves when the health check fails.

The following example creates a `cluster-health` operation that checks for a `green` health status on any `logs-*` indexes:

```yml
{
  "name": "check-cluster-green",
  "operation-type": "cluster-health",
  "index": "logs-*",
  "request-params": {
    "wait_for_status": "green",
    "wait_for_no_relocating_shards": "true"
  },
  "retry-until-success": true
}
```
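
For reference, the check roughly corresponds to the following Cluster Health API request:

```json
GET _cluster/health/logs-*?wait_for_status=green&wait_for_no_relocating_shards=true
```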

<!-- vale off -->
## refresh
<!-- vale on -->

The `refresh` operation runs the Refresh API. This operation returns no metadata.


The following example refreshes all `logs-*` indexes:

```yml
{
  "name": "refresh",
  "operation-type": "refresh",
  "index": "logs-*"
}
```
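
For reference, this task roughly corresponds to the following Refresh API request:

```json
POST /logs-*/_refresh
```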


<!-- vale off -->
## search
<!-- vale on -->

The `search` operation runs the [Search API](/api-reference/search/), which you can use to run queries against the indexes in your cluster.

The following example runs a `match_all` query inside the `search` operation:

```yml
{
  "name": "default",
  "operation-type": "search",
  "body": {
    "query": {
      "match_all": {}
    }
  },
  "request-params": {
    "_source_include": "some_field",
    "analyze_wildcard": "false"
  }
}
```
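
For reference, this task roughly corresponds to the following Search API request, with the `request-params` entries appended as query parameters:

```json
GET /_search?_source_include=some_field&analyze_wildcard=false
{
  "query": {
    "match_all": {}
  }
}
```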
46 changes: 14 additions & 32 deletions _dashboards/management/accelerate-external-data.md
@@ -12,55 +12,37 @@ Introduced 2.11
{: .label .label-purple }


Query performance can be slow when using external data sources for reasons such as network latency, data transformation, and data volume. You can optimize your query performance by using OpenSearch indexes, such as a skipping index or a covering index. A _skipping index_ uses skip acceleration methods, such as partition, minimum and maximum values, and value sets, to ingest and create compact aggregate data structures. This makes them an economical option for direct querying scenarios. A _covering index_ ingests all or some of the data from the source into OpenSearch and makes it possible to use all OpenSearch Dashboards and plugin functionality. See the [Flint Index Reference Manual](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md) for comprehensive guidance on this feature's indexing process.
Query performance can be slow when using external data sources for reasons such as network latency, data transformation, and data volume. You can optimize your query performance by using OpenSearch indexes, such as a skipping index or a covering index.

A _skipping index_ uses skip acceleration methods, such as partition, minimum and maximum values, and value sets, to ingest and create compact aggregate data structures. This makes them an economical option for direct querying scenarios.

A _covering index_ ingests all or some of the data from the source into OpenSearch and makes it possible to use all OpenSearch Dashboards and plugin functionality. See the [Flint Index Reference Manual](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md) for comprehensive guidance on this feature's indexing process.

## Data sources use case: Accelerate performance

To get started with the **Accelerate performance** use case available in **Data sources**, follow these steps:

1. Go to **OpenSearch Dashboards** > **Query Workbench** and select your Amazon S3 data source from the **Data sources** dropdown menu in the upper-left corner.
2. From the left-side navigation menu, select a database. An example using the `http_logs` database is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/query-workbench-accelerate-data.png" alt="Query Workbench accelerate data UI" width="700"/>

2. From the left-side navigation menu, select a database.
3. View the results in the table and confirm that you have the desired data.
4. Create an OpenSearch index by following these steps:
1. Select the **Accelerate data** button. A pop-up window appears. An example is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/accelerate-data-popup.png" alt="Accelerate data pop-up window" width="700"/>

1. Select the **Accelerate data** button. A pop-up window appears.
2. Enter your details in **Select data fields**. In the **Database** field, select the desired acceleration index: **Skipping index** or **Covering index**. A _skipping index_ uses skip acceleration methods, such as partition, min/max, and value sets, to ingest data using compact aggregate data structures. This makes them an economical option for direct querying scenarios. A _covering index_ ingests all or some of the data from the source into OpenSearch and makes it possible to use all OpenSearch Dashboards and plugin functionality.

5. Under **Index settings**, enter the information for your acceleration index. For information about naming, select **Help**. Note that an Amazon S3 table can only have one skipping index at a time. An example is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/skipping-index-settings.png" alt="Skipping index settings" width="700"/>
5. Under **Index settings**, enter the information for your acceleration index. For information about naming, select **Help**. Note that an Amazon S3 table can only have one skipping index at a time.

### Define skipping index settings

1. Under **Skipping index definition**, select the **Add fields** button to define the skipping index acceleration method and choose the fields you want to add. An example is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/add-fields-skipping-index.png" alt="Skipping index add fields" width="700"/>

1. Under **Skipping index definition**, select the **Add fields** button to define the skipping index acceleration method and choose the fields you want to add.
2. Select the **Copy Query to Editor** button to apply your skipping index settings.
3. View the skipping index query details in the table pane and then select the **Run** button. Your index is added to the left-side navigation menu containing the list of your databases. An example is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/query-workbench-S3.png" alt="Run a skippping or covering index UI" width="700"/>
3. View the skipping index query details in the table pane and then select the **Run** button. Your index is added to the left-side navigation menu containing the list of your databases.

### Define covering index settings

1. Under **Index settings**, enter a valid index name. Note that each Amazon S3 table can have multiple covering indexes. An example is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/covering-index-naming.png" alt="Covering index settings" width="700"/>

2. Once you have added the index name, define the covering index fields by selecting `(add fields here)` under **Covering index definition**. An example is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/covering-index-fields.png" alt="Covering index field naming" width="700"/>

1. Under **Index settings**, enter a valid index name. Note that each Amazon S3 table can have multiple covering indexes.
2. Once you have added the index name, define the covering index fields by selecting `(add fields here)` under **Covering index definition**.
3. Select the **Copy Query to Editor** button to apply your covering index settings.
4. View the covering index query details in the table pane and then select the **Run** button. Your index is added to the left-side navigation menu containing the list of your databases. An example UI is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/run-index-query-workbench.png" alt="Run index in Query Workbench" width="700"/>
4. View the covering index query details in the table pane and then select the **Run** button. Your index is added to the left-side navigation menu containing the list of your databases.

## Limitations

This feature is still under development, so there are some limitations. For real-time updates, see the [developer documentation on GitHub](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md#limitations).
This feature is still under development, so there are some limitations. For real-time updates, refer to the [developer documentation on GitHub](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md#limitations).
25 changes: 7 additions & 18 deletions _dashboards/management/query-data-source.md
@@ -11,7 +11,7 @@ has_children: false
Introduced 2.11
{: .label .label-purple }

This tutorial guides you through using the **Query data** use case for querying and visualizing your Amazon Simple Storage Service (Amazon S3) data using OpenSearch Dashboards.
This tutorial guides you through using the **Query data** use case for querying and visualizing your Amazon Simple Storage Service (Amazon S3) data using OpenSearch Dashboards.

## Prerequisites

@@ -22,15 +22,9 @@ You must be using the `opensearch-security` plugin and have the appropriate role
To get started, follow these steps:

1. On the **Manage data sources** page, select your data source from the list.
2. On the data source's detail page, select the **Query data** card. This option takes you to the **Observability** > **Logs** page, which is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/observability-logs-UI.png" alt="Observability Logs UI" width="700">

2. On the data source's detail page, select the **Query data** card. This option takes you to the **Observability** > **Logs** page.
3. Select the **Event Explorer** button. This option creates and saves frequently searched queries and visualizations using [Piped Processing Language (PPL)]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/index/) or [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/), which connects to Spark SQL.
4. Select the Amazon S3 data source from the dropdown menu in the upper-left corner. An example is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/query-data-sources-UI-2.png" alt="Observability Logs Amazon S3 dropdown menu" width="700">

4. Select the Amazon S3 data source from the dropdown menu in the upper-left corner.
5. Enter the query in the **Enter PPL query** field. Note that the default language is SQL. To change the language, select PPL from the dropdown menu.
6. Select the **Search** button. The **Query Processing** message is shown, confirming that your query is being processed.
7. View the results, which are listed in a table on the **Events** tab. On this page, details such as available fields, source, and time are shown in a table format.
@@ -40,10 +34,7 @@

To create visualizations, follow these steps:

1. On the **Explorer** page, select the **Visualizations** tab. An example is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/explorer-S3viz-UI.png" alt="Explorer Amazon S3 visualizations UI" width="700">

1. On the **Explorer** page, select the **Visualizations** tab.
2. Select **Index data to visualize**. This option currently only creates [acceleration indexes]({{site.url}}{{site.baseurl}}/dashboards/management/accelerate-external-data/), which give you views of the data visualizations from the **Visualizations** tab. To create a visualization of your Amazon S3 data, go to **Discover**. See the [Discover documentation]({{site.url}}{{site.baseurl}}/dashboards/discover/index-discover/) for information and a tutorial.

## Use Query Workbench with your Amazon S3 data source
@@ -53,14 +44,12 @@
To use Query Workbench with your Amazon S3 data, follow these steps:

1. From the OpenSearch Dashboards main menu, select **OpenSearch Plugins** > **Query Workbench**.
2. From the **Data Sources** dropdown menu in the upper-left corner, choose your Amazon S3 data source. Your data begins loading the databases that are part of your data source. An example is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/query-workbench-S3.png" alt="Query Workbench Amazon S3 data loading UI" width="700">

2. From the **Data Sources** dropdown menu in the upper-left corner, choose your Amazon S3 data source. The databases included in your data source begin loading.
3. View the databases listed in the left-side navigation menu and select a database to view its details. Any information about acceleration indexes is listed under **Acceleration index destination**.
4. Choose the **Describe Index** button to learn more about how data is stored in that particular index.
5. Choose the **Drop index** button to delete and clear both the OpenSearch index and the Amazon S3 Spark job that refreshes the data.
6. Enter your SQL query and select **Run**.
6. Enter your SQL query and select **Run**.

## Next steps

- Learn about [accelerating the query performance of your external data sources]({{site.url}}{{site.baseurl}}/dashboards/management/accelerate-external-data/).
