Merge branch 'main' into adding-filter-search-results
vagimeli authored Sep 3, 2024
2 parents c625110 + 9e7aedc commit 59d3f09
Showing 20 changed files with 1,253 additions and 388 deletions.
@@ -18,7 +18,7 @@ Consider the following criteria when deciding which workload would work best for

- The cluster's use case.
- The data types that your cluster uses compared to the data structure of the documents contained in the workload. Each workload contains an example document so that you can compare data types, or you can view the index mappings and data types in the `index.json` file.
- The query types most commonly used inside your cluster. The `operations/default.json` file contains information about the query types and workload operations.
- The query types most commonly used inside your cluster. The `operations/default.json` file contains information about the query types and workload operations. For a list of common operations, see [Common operations]({{site.url}}{{site.baseurl}}/benchmark/user-guide/understanding-workloads/common-operations/).

## General search clusters

181 changes: 181 additions & 0 deletions _benchmark/user-guide/understanding-workloads/common-operations.md
@@ -0,0 +1,181 @@
---
layout: default
title: Common operations
nav_order: 16
grand_parent: User guide
parent: Understanding workloads
---

# Common operations

[Test procedures]({{site.url}}{{site.baseurl}}/benchmark/user-guide/understanding-workloads/anatomy-of-a-workload#_operations-and-_test-procedures) use a variety of operations, found inside the `operations` directory of a workload. This page details the most common operations found inside OpenSearch Benchmark workloads.

- [Common operations](#common-operations)
- [bulk](#bulk)
- [create-index](#create-index)
- [delete-index](#delete-index)
- [cluster-health](#cluster-health)
- [refresh](#refresh)
- [search](#search)

<!-- vale off -->
## bulk
<!-- vale on -->

The `bulk` operation type allows you to run [bulk](/api-reference/document-apis/bulk/) requests as a task.

The following example shows a `bulk` operation type with a `bulk-size` of `5000` documents:

```yml
{
  "name": "index-append",
  "operation-type": "bulk",
  "bulk-size": 5000
}
```
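
To get a feel for what `bulk-size` implies, the following Python sketch estimates how many bulk requests a corpus would need. It assumes that every request carries `bulk-size` documents except possibly the last one. The function name and the document count are hypothetical and are not part of OpenSearch Benchmark.

```python
import math


def estimate_bulk_requests(total_docs: int, bulk_size: int) -> int:
    """Estimate the number of bulk requests needed to index total_docs.

    Assumption: each request holds bulk_size documents, and the final
    request holds whatever remains.
    """
    if bulk_size <= 0:
        raise ValueError("bulk_size must be a positive integer")
    return math.ceil(total_docs / bulk_size)


# Hypothetical corpus of 12,000 documents indexed with "bulk-size": 5000.
print(estimate_bulk_requests(12_000, 5000))  # 3
```

A larger `bulk-size` means fewer, heavier requests, so the value is usually tuned alongside the number of clients in the test procedure.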


<!-- vale off -->
## create-index
<!-- vale on -->

The `create-index` operation runs the [Create Index API](/api-reference/index-apis/create-index/). It supports the following two modes of index creation:

- Creating all indexes specified in the workload's `indices` section
- Creating one specific index defined within the operation itself

The following example creates all indexes defined in the `indices` section of the workload. It uses all of the index settings defined in the workload but overrides the number of shards:

```yml
{
  "name": "create-all-indices",
  "operation-type": "create-index",
  "settings": {
    "index.number_of_shards": 1
  },
  "request-params": {
    "wait_for_active_shards": "true"
  }
}
```
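
One way to picture the override is as a dictionary merge in which the operation's `settings` win over the settings declared for the index in the workload. The Python sketch below illustrates that idea only. The workload settings shown are hypothetical, and OpenSearch Benchmark's actual merge logic may differ.

```python
def effective_settings(workload_settings: dict, operation_settings: dict) -> dict:
    """Return workload settings with operation-level settings applied on top."""
    merged = dict(workload_settings)   # copy so the input is not mutated
    merged.update(operation_settings)  # operation values take precedence
    return merged


# Hypothetical settings from the workload's index definition.
workload_settings = {"index.number_of_shards": 5, "index.number_of_replicas": 1}
# Settings from the create-all-indices operation.
operation_settings = {"index.number_of_shards": 1}

print(effective_settings(workload_settings, operation_settings))
# {'index.number_of_shards': 1, 'index.number_of_replicas': 1}
```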

The following example creates a new index with all index settings specified in the operation body:

```yml
{
  "name": "create-an-index",
  "operation-type": "create-index",
  "index": "people",
  "body": {
    "settings": {
      "index.number_of_shards": 1
    },
    "mappings": {
      "properties": {
        "name": {
          "type": "text"
        }
      }
    }
  }
}
```



<!-- vale off -->
## delete-index
<!-- vale on -->

The `delete-index` operation runs the [Delete Index API](/api-reference/index-apis/delete-index/). As with the [`create-index`](#create-index) operation, you can delete all indexes found in the `indices` section of the workload or delete one or more indexes based on the string passed in the `index` setting.

The following example deletes all indexes found in the `indices` section of the workload:

```yml
{
  "name": "delete-all-indices",
  "operation-type": "delete-index"
}
```

The following example deletes all `logs-*` indexes:

```yml
{
  "name": "delete-logs",
  "operation-type": "delete-index",
  "index": "logs-*",
  "only-if-exists": false,
  "request-params": {
    "expand_wildcards": "all",
    "allow_no_indices": "true",
    "ignore_unavailable": "true"
  }
}
```
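
Because the `index` value is a wildcard pattern, the exact separator matters: `logs-*` and `logs_*` select different indexes. The Python sketch below uses the standard library's `fnmatchcase` as an approximation of how a simple `*` pattern resolves. The index names are hypothetical, and the helper is illustrative rather than the matching code that OpenSearch itself runs.

```python
from fnmatch import fnmatchcase


def matching_indexes(pattern: str, index_names: list[str]) -> list[str]:
    """Return the index names that match a `*` wildcard pattern.

    fnmatchcase only approximates OpenSearch wildcard resolution, which is
    enough for simple patterns such as "logs-*".
    """
    return [name for name in index_names if fnmatchcase(name, pattern)]


# Hypothetical index names.
indexes = ["logs-2024-09", "logs-2024-10", "logs_archive", "metrics-2024"]

print(matching_indexes("logs-*", indexes))  # ['logs-2024-09', 'logs-2024-10']
print(matching_indexes("logs_*", indexes))  # ['logs_archive']
```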

<!-- vale off -->
## cluster-health
<!-- vale on -->

The `cluster-health` operation runs the [Cluster Health API](/api-reference/cluster-api/cluster-health/), which checks the cluster health status and compares it against the status expected by the parameters set in `request-params`. If the cluster returns an unexpected health status, the operation reports a failure. You can use the `--on-error` option of the OpenSearch Benchmark `execute-test` command to control how OpenSearch Benchmark behaves when the health check fails.

The following example creates a `cluster-health` operation that checks for a `green` health status on any `logs-*` indexes:

```yml
{
  "name": "check-cluster-green",
  "operation-type": "cluster-health",
  "index": "logs-*",
  "request-params": {
    "wait_for_status": "green",
    "wait_for_no_relocating_shards": "true"
  },
  "retry-until-success": true
}
```
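
Conceptually, combining `wait_for_status` with `retry-until-success` behaves like a polling loop that keeps checking until the cluster reports the expected status or better. The Python sketch below models that idea with a stand-in status function. The function names, the attempt limit, and the simulated responses are all hypothetical, and this is not how OpenSearch Benchmark implements retries.

```python
import time
from typing import Callable

# Ordering used to compare health statuses: red < yellow < green.
STATUS_RANK = {"red": 0, "yellow": 1, "green": 2}


def wait_for_status(get_status: Callable[[], str], expected: str,
                    max_attempts: int = 5, delay_seconds: float = 0.0) -> bool:
    """Poll get_status until it reports at least the expected status."""
    for _ in range(max_attempts):
        if STATUS_RANK[get_status()] >= STATUS_RANK[expected]:
            return True
        time.sleep(delay_seconds)
    return False


# Hypothetical cluster that turns green on the third check.
responses = iter(["red", "yellow", "green"])
print(wait_for_status(lambda: next(responses), "green"))  # True
```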

<!-- vale off -->
## refresh
<!-- vale on -->

The `refresh` operation runs the Refresh API. The operation returns no metadata.

The following example refreshes all `logs-*` indexes:

```yml
{
  "name": "refresh",
  "operation-type": "refresh",
  "index": "logs-*"
}
```


<!-- vale off -->
## search
<!-- vale on -->

The `search` operation runs the [Search API](/api-reference/search/), which you can use to run queries in OpenSearch Benchmark indexes.

The following example runs a `match_all` query inside the `search` operation:

```yml
{
  "name": "default",
  "operation-type": "search",
  "body": {
    "query": {
      "match_all": {}
    }
  },
  "request-params": {
    "_source_includes": "some_field",
    "analyze_wildcard": "false"
  }
}
```
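
It can help to think of a `search` operation as a description of an HTTP request: `request-params` become the query string, and `body` becomes the request body. The Python sketch below renders that mapping for inspection. The helper name and the default `_all` target are assumptions made for illustration, and the sketch does not send anything to a cluster.

```python
import json
from urllib.parse import urlencode


def describe_search_request(operation: dict, index: str = "_all") -> str:
    """Render a search operation as an approximate HTTP request."""
    query_string = urlencode(operation.get("request-params", {}))
    path = f"/{index}/_search"
    if query_string:
        path += f"?{query_string}"
    body = json.dumps(operation.get("body", {}))
    return f"GET {path}\n{body}"


operation = {
    "name": "default",
    "operation-type": "search",
    "body": {"query": {"match_all": {}}},
    "request-params": {"analyze_wildcard": "false"},
}
print(describe_search_request(operation))
```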
46 changes: 18 additions & 28 deletions _dashboards/management/S3-data-source.md
@@ -10,51 +10,41 @@ has_children: true
Introduced 2.11
{: .label .label-purple }

Starting with OpenSearch 2.11, you can connect OpenSearch to your Amazon Simple Storage Service (Amazon S3) data source using the OpenSearch Dashboards UI. You can then query that data, optimize query performance, define tables, and integrate your S3 data within a single UI.
You can connect OpenSearch to your Amazon Simple Storage Service (Amazon S3) data source using the OpenSearch Dashboards interface and then query that data, optimize query performance, define tables, and integrate your S3 data.

## Prerequisites

To connect data from Amazon S3 to OpenSearch using OpenSearch Dashboards, you must have:
Before connecting a data source, verify that the following requirements are met:

- Access to Amazon S3 and the [AWS Glue Data Catalog](https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/connectors/s3glue_connector.rst#id2).
- Access to OpenSearch and OpenSearch Dashboards.
- An understanding of OpenSearch data source and connector concepts. See the [developer documentation](https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/datasources.rst#introduction) for information about these concepts.
- You have access to Amazon S3 and the [AWS Glue Data Catalog](https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/connectors/s3glue_connector.rst#id2).
- You have access to OpenSearch and OpenSearch Dashboards.
- You have an understanding of OpenSearch data source and connector concepts. See the [developer documentation](https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/datasources.rst#introduction) for more information.

## Connect your Amazon S3 data source
## Connect your data source

To connect your Amazon S3 data source, follow these steps:
To connect your data source, follow these steps:

1. From the OpenSearch Dashboards main menu, select **Management** > **Data sources**.
2. On the **Data sources** page, select **New data source** > **S3**. An example UI is shown in the following image.
1. From the OpenSearch Dashboards main menu, go to **Management** > **Dashboards Management** > **Data sources**.
2. On the **Data sources** page, select **Create data source connection** > **Amazon S3**.
3. On the **Configure Amazon S3 data source** page, enter the data source, authentication details, and permissions.
4. Select the **Review Configuration** button to verify the connection details.
5. Select the **Connect to Amazon S3** button to establish a connection.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/data-sources-UI.png" alt="Amazon S3 data sources UI" width="700"/>
## Manage your data source

3. On the **Configure Amazon S3 data source** page, enter the required **Data source details**, **AWS Glue authentication details**, **AWS Glue index store details**, and **Query permissions**. An example UI is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/S3-config-UI.png" alt="Amazon S3 configuration UI" width="700"/>

4. Select the **Review Configuration** button and verify the details.
5. Select the **Connect to Amazon S3** button.

## Manage your Amazon S3 data source

Once you've connected your Amazon S3 data source, you can explore your data through the **Manage data sources** tab. The following steps guide you through using this functionality:
To manage your data source, follow these steps:

1. On the **Manage data sources** tab, choose a data source from the list.
2. On that data source's page, you can manage the data source, choose a use case, and manage access controls and configurations. An example UI is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/manage-data-source-UI.png" alt="Manage data sources UI" width="700"/>

3. (Optional) Explore the Amazon S3 use cases, including querying your data and optimizing query performance. Go to **Next steps** to learn more about each use case.
2. On the page for the data source, you can manage the data source, choose a use case, and configure access controls.
3. (Optional) Explore the Amazon S3 use cases, including querying your data and optimizing query performance. Refer to the [**Next steps**](#next-steps) section to learn more about each use case.

## Limitations

This feature is still under development, including the data integration functionality. For real-time updates, see the [developer documentation on GitHub](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md#limitations).
This feature is currently under development, including the data integration functionality. For up-to-date information, refer to the [developer documentation on GitHub](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md#limitations).

## Next steps

- Learn about [querying your data in Data Explorer]({{site.url}}{{site.baseurl}}/dashboards/management/query-data-source/) through OpenSearch Dashboards.
- Learn about ways to [optimize the query performance of your external data sources]({{site.url}}{{site.baseurl}}/dashboards/management/accelerate-external-data/), such as Amazon S3, through Query Workbench.
- Learn about [optimizing the query performance of your external data sources]({{site.url}}{{site.baseurl}}/dashboards/management/accelerate-external-data/), such as Amazon S3, through Query Workbench.
- Learn about [Amazon S3 and AWS Glue Data Catalog](https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/connectors/s3glue_connector.rst) and the APIs used with Amazon S3 data sources, including configuration settings and query examples.
- Learn about [managing your indexes]({{site.url}}{{site.baseurl}}/dashboards/im-dashboards/index/) through OpenSearch Dashboards.

46 changes: 14 additions & 32 deletions _dashboards/management/accelerate-external-data.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,55 +12,37 @@ Introduced 2.11
{: .label .label-purple }


Query performance can be slow when using external data sources for reasons such as network latency, data transformation, and data volume. You can optimize your query performance by using OpenSearch indexes, such as a skipping index or a covering index. A _skipping index_ uses skip acceleration methods, such as partition, minimum and maximum values, and value sets, to ingest and create compact aggregate data structures. This makes them an economical option for direct querying scenarios. A _covering index_ ingests all or some of the data from the source into OpenSearch and makes it possible to use all OpenSearch Dashboards and plugin functionality. See the [Flint Index Reference Manual](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md) for comprehensive guidance on this feature's indexing process.
Query performance can be slow when using external data sources for reasons such as network latency, data transformation, and data volume. You can optimize your query performance by using OpenSearch indexes, such as a skipping index or a covering index.

A _skipping index_ uses skip acceleration methods, such as partition, minimum and maximum values, and value sets, to ingest and create compact aggregate data structures. This makes it an economical option for direct querying scenarios.

A _covering index_ ingests all or some of the data from the source into OpenSearch and makes it possible to use all OpenSearch Dashboards and plugin functionality. See the [Flint Index Reference Manual](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md) for comprehensive guidance on this feature's indexing process.
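
The minimum and maximum values method can be pictured as keeping a small summary per source file and using it to rule out files that cannot contain the value being queried. The Python sketch below shows that idea in isolation. The file names, the column, and the `FileStats` structure are hypothetical and do not reflect how the Flint index is actually stored.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Per-file summary of the kind a min/max skipping index might keep."""
    path: str
    min_status: int
    max_status: int


def files_to_scan(stats: list[FileStats], wanted_status: int) -> list[str]:
    """Return only the files whose value range could contain wanted_status."""
    return [s.path for s in stats if s.min_status <= wanted_status <= s.max_status]


# Hypothetical files and value ranges for an HTTP status column.
stats = [
    FileStats("part-0.parquet", 200, 204),
    FileStats("part-1.parquet", 200, 404),
    FileStats("part-2.parquet", 500, 503),
]
print(files_to_scan(stats, 404))  # ['part-1.parquet']
```

Only the summaries are stored, which is why this kind of index stays compact compared to a covering index that ingests the data itself.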

## Data sources use case: Accelerate performance

To get started with the **Accelerate performance** use case available in **Data sources**, follow these steps:

1. Go to **OpenSearch Dashboards** > **Query Workbench** and select your Amazon S3 data source from the **Data sources** dropdown menu in the upper-left corner.
2. From the left-side navigation menu, select a database. An example using the `http_logs` database is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/query-workbench-accelerate-data.png" alt="Query Workbench accelerate data UI" width="700"/>

2. From the left-side navigation menu, select a database.
3. View the results in the table and confirm that you have the desired data.
4. Create an OpenSearch index by following these steps:
1. Select the **Accelerate data** button. A pop-up window appears. An example is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/accelerate-data-popup.png" alt="Accelerate data pop-up window" width="700"/>

1. Select the **Accelerate data** button. A pop-up window appears.
2. Enter your details in **Select data fields**. In the **Database** field, select the desired acceleration index: **Skipping index** or **Covering index**. A _skipping index_ uses skip acceleration methods, such as partition, min/max, and value sets, to ingest data using compact aggregate data structures. This makes it an economical option for direct querying scenarios. A _covering index_ ingests all or some of the data from the source into OpenSearch and makes it possible to use all OpenSearch Dashboards and plugin functionality.

5. Under **Index settings**, enter the information for your acceleration index. For information about naming, select **Help**. Note that an Amazon S3 table can only have one skipping index at a time. An example is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/skipping-index-settings.png" alt="Skipping index settings" width="700"/>
5. Under **Index settings**, enter the information for your acceleration index. For information about naming, select **Help**. Note that an Amazon S3 table can only have one skipping index at a time.

### Define skipping index settings

1. Under **Skipping index definition**, select the **Add fields** button to define the skipping index acceleration method and choose the fields you want to add. An example is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/add-fields-skipping-index.png" alt="Skipping index add fields" width="700"/>

1. Under **Skipping index definition**, select the **Add fields** button to define the skipping index acceleration method and choose the fields you want to add.
2. Select the **Copy Query to Editor** button to apply your skipping index settings.
3. View the skipping index query details in the table pane and then select the **Run** button. Your index is added to the left-side navigation menu containing the list of your databases. An example is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/query-workbench-S3.png" alt="Run a skipping or covering index UI" width="700"/>
3. View the skipping index query details in the table pane and then select the **Run** button. Your index is added to the left-side navigation menu containing the list of your databases.

### Define covering index settings

1. Under **Index settings**, enter a valid index name. Note that each Amazon S3 table can have multiple covering indexes. An example is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/covering-index-naming.png" alt="Covering index settings" width="700"/>

2. Once you have added the index name, define the covering index fields by selecting `(add fields here)` under **Covering index definition**. An example is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/covering-index-fields.png" alt="Covering index field naming" width="700"/>

1. Under **Index settings**, enter a valid index name. Note that each Amazon S3 table can have multiple covering indexes.
2. Once you have added the index name, define the covering index fields by selecting `(add fields here)` under **Covering index definition**.
3. Select the **Copy Query to Editor** button to apply your covering index settings.
4. View the covering index query details in the table pane and then select the **Run** button. Your index is added to the left-side navigation menu containing the list of your databases. An example UI is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/run-index-query-workbench.png" alt="Run index in Query Workbench" width="700"/>
4. View the covering index query details in the table pane and then select the **Run** button. Your index is added to the left-side navigation menu containing the list of your databases.

## Limitations

This feature is still under development, so there are some limitations. For real-time updates, see the [developer documentation on GitHub](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md#limitations).
This feature is still under development, so there are some limitations. For real-time updates, refer to the [developer documentation on GitHub](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md#limitations).