From 87a82044fbde7a127bd7b7d91b25a55c456cd6cf Mon Sep 17 00:00:00 2001 From: Andriy Redko Date: Thu, 29 Aug 2024 13:34:00 -0400 Subject: [PATCH 1/9] Document new experimental ingestion streaming APIs Signed-off-by: Andriy Redko --- .../document-apis/bulk-streaming.md | 79 +++++++++++++++++++ 1 file changed, 79 insertions(+) create mode 100644 _api-reference/document-apis/bulk-streaming.md diff --git a/_api-reference/document-apis/bulk-streaming.md b/_api-reference/document-apis/bulk-streaming.md new file mode 100644 index 0000000000..56b9c6cbca --- /dev/null +++ b/_api-reference/document-apis/bulk-streaming.md @@ -0,0 +1,79 @@ +--- +layout: default +title: Streaming Bulk +parent: Document APIs +nav_order: 20 +redirect_from: + - /opensearch/rest-api/document-apis/bulk/streaming/ +--- + +# Bulk +**Introduced 2.17.0** +{: .label .label-purple } + +The streaming bulk operation lets you add, update, or delete multiple documents in by streaming the request and getting the results as streaming response. In comparison to traditional [bulk]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/) APIs, streaming ingestion eliminates the need to guess the batch size (which is affected by the cluster operational state at any given moment of time) and naturally applies the back pressure between many clients and the cluster. The streaming works over HTTP/2 or HTTP 1.1 (using chunked transfer encoding), depending on the capabilities of the clients and the cluster. + +The streaming support not provided by default HTTP transport. Instead, the [transport-reactor-netty4]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/network-settings/#selecting-the-transport) HTTP transport plugin has to be installed and used as the default HTTP transport. Both the transport and streaming bulk APIs are experimental. 
+{: .note} + +## Example + +```json +POST _bulk/stream -H "Transfer-Encoding: chunked" -H "Content-Type: application/json" +{ "delete": { "_index": "movies", "_id": "tt2229499" } } +{ "index": { "_index": "movies", "_id": "tt1979320" } } +{ "title": "Rush", "year": 2013 } +{ "create": { "_index": "movies", "_id": "tt1392214" } } +{ "title": "Prisoners", "year": 2013 } +{ "update": { "_index": "movies", "_id": "tt0816711" } } +{ "doc" : { "title": "World War Z" } } + +``` +{% include copy-curl.html %} + + +## Path and HTTP methods + +``` +POST _bulk/stream +POST /_bulk/stream +``` + +Specifying the index in the path means you don't need to include it in the [request body chunks]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/#request-body). + +OpenSearch also accepts PUT requests to the `_bulk/steram` path, but we highly recommend using POST. The accepted usage of PUT---adding or replacing a single resource at a given path---doesn't make sense for streaming bulk requests. +{: .note } + + +## URL parameters + +All streaming bulk URL parameters are optional. + +Parameter | Type | Description +:--- | :--- | :--- +pipeline | String | The pipeline ID for preprocessing documents. +refresh | Enum | Whether to refresh the affected shards after performing the indexing operations. Default is `false`. `true` makes the changes show up in search results immediately, but hurts cluster performance. `wait_for` waits for a refresh. Requests take longer to return, but cluster performance doesn't suffer. +require_alias | Boolean | Set to `true` to require that all actions target an index alias rather than an index. Default is `false`. +routing | String | Routes the request to the specified shard. +timeout | Time | How long to wait for the request to return. Default `1m`. +type | String | (Deprecated) The default document type for documents that don't specify a type. Default is `_doc`. 
We highly recommend ignoring this parameter and using a type of `_doc` for all indexes. +wait_for_active_shards | String | Specifies the number of active shards that must be available before OpenSearch processes the bulk request. Default is 1 (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the request to succeed. +batch_interval | Time | Specifies how long bulk operations should be accumulated into batch before sending over to data nodes. +batch_size | Integer | Specifies how many bulk operations should be accumulated into batch before sending over to data nodes. Default `1`. +{% comment %}_source | List | asdf +_source_excludes | list | asdf +_source_includes | list | asdf{% endcomment %} + +## Request body + +The streaming bulk API request body is fully compatible with bulk API [request body]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/#request-body), whereas each bulk operation (create / index / update / delete) is sent as a separate chunk. 
+ +## Example response + +Depending on the batch settings, each streamed response chunk may report the results of one or many (batch) bulk operations, for example for the request from above with no batching (default), the following streaming response will be received: + +```json +{"took": 11, "errors": false, "items": [ { "index": {"_index": "movies", "_id": "tt1979320", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "_seq_no": 1, "_primary_term": 1, "status": 201 } } ] } +{"took": 2, "errors": true, "items": [ { "create": { "_index": "movies", "_id": "tt1392214", "status": 409, "error": { "type": "version_conflict_engine_exception", "reason": "[tt1392214]: version conflict, document already exists (current version [1])", "index": "movies", "shard": "0", "index_uuid": "yhizhusbSWmP0G7OJnmcLg" } } } ] } +{"took": 4, "errors": true, "items": [ { "update": { "_index": "movies", "_id": "tt0816711", "status": 404, "error": { "type": "document_missing_exception", "reason": "[_doc][tt0816711]: document missing", "index": "movies", "shard": "0", "index_uuid": "yhizhusbSWmP0G7OJnmcLg" } } } ] } +``` From a2b50fc70d5be336ba4749c3e2c2de4fa1b704a4 Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Fri, 30 Aug 2024 18:15:14 -0400 Subject: [PATCH 2/9] Doc review Signed-off-by: Fanit Kolchina --- .../document-apis/bulk-streaming.md | 80 ++++++++++--------- 1 file changed, 41 insertions(+), 39 deletions(-) diff --git a/_api-reference/document-apis/bulk-streaming.md b/_api-reference/document-apis/bulk-streaming.md index 56b9c6cbca..9a25b9560b 100644 --- a/_api-reference/document-apis/bulk-streaming.md +++ b/_api-reference/document-apis/bulk-streaming.md @@ -1,76 +1,78 @@ --- layout: default -title: Streaming Bulk +title: Streaming bulk parent: Document APIs -nav_order: 20 +nav_order: 25 redirect_from: - /opensearch/rest-api/document-apis/bulk/streaming/ --- -# Bulk +# Streaming bulk **Introduced 2.17.0** {: .label .label-purple } -The 
streaming bulk operation lets you add, update, or delete multiple documents in by streaming the request and getting the results as streaming response. In comparison to traditional [bulk]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/) APIs, streaming ingestion eliminates the need to guess the batch size (which is affected by the cluster operational state at any given moment of time) and naturally applies the back pressure between many clients and the cluster. The streaming works over HTTP/2 or HTTP 1.1 (using chunked transfer encoding), depending on the capabilities of the clients and the cluster. +This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://example.issue.link). +{: .warning} -The streaming support not provided by default HTTP transport. Instead, the [transport-reactor-netty4]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/network-settings/#selecting-the-transport) HTTP transport plugin has to be installed and used as the default HTTP transport. Both the transport and streaming bulk APIs are experimental. -{: .note} - -## Example - -```json -POST _bulk/stream -H "Transfer-Encoding: chunked" -H "Content-Type: application/json" -{ "delete": { "_index": "movies", "_id": "tt2229499" } } -{ "index": { "_index": "movies", "_id": "tt1979320" } } -{ "title": "Rush", "year": 2013 } -{ "create": { "_index": "movies", "_id": "tt1392214" } } -{ "title": "Prisoners", "year": 2013 } -{ "update": { "_index": "movies", "_id": "tt0816711" } } -{ "doc" : { "title": "World War Z" } } - -``` -{% include copy-curl.html %} +The streaming bulk operation lets you add, update, or delete multiple documents in by streaming the request and getting the results as streaming response. 
In comparison to the traditional [Bulk API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/), streaming ingestion eliminates the need to guess the batch size (which is affected by the cluster operational state at any given time) and naturally applies the backpressure between many clients and the cluster. The streaming works over HTTP/2 or HTTP 1.1 (using chunked transfer encoding), depending on the capabilities of the clients and the cluster. +The default HTTP transport method does not support streaming. You must install the [`transport-reactor-netty4`]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/network-settings/#selecting-the-transport) HTTP transport plugin and use it as the default HTTP transport method. Both the transport method and the Streaming Bulk API are experimental. +{: .note} ## Path and HTTP methods -``` +```json POST _bulk/stream POST /_bulk/stream ``` -Specifying the index in the path means you don't need to include it in the [request body chunks]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/#request-body). +If you specify the index in the path, then you don't need to include it in the [request body chunks]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/#request-body). OpenSearch also accepts PUT requests to the `_bulk/steram` path, but we highly recommend using POST. The accepted usage of PUT---adding or replacing a single resource at a given path---doesn't make sense for streaming bulk requests. {: .note } -## URL parameters +## Query parameters -All streaming bulk URL parameters are optional. +The following table lists the available query parameters. All query parameters are optional. -Parameter | Type | Description +Parameter | Data type | Description :--- | :--- | :--- -pipeline | String | The pipeline ID for preprocessing documents. -refresh | Enum | Whether to refresh the affected shards after performing the indexing operations. Default is `false`. 
`true` makes the changes show up in search results immediately, but hurts cluster performance. `wait_for` waits for a refresh. Requests take longer to return, but cluster performance doesn't suffer. -require_alias | Boolean | Set to `true` to require that all actions target an index alias rather than an index. Default is `false`. -routing | String | Routes the request to the specified shard. -timeout | Time | How long to wait for the request to return. Default `1m`. -type | String | (Deprecated) The default document type for documents that don't specify a type. Default is `_doc`. We highly recommend ignoring this parameter and using a type of `_doc` for all indexes. -wait_for_active_shards | String | Specifies the number of active shards that must be available before OpenSearch processes the bulk request. Default is 1 (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the request to succeed. -batch_interval | Time | Specifies how long bulk operations should be accumulated into batch before sending over to data nodes. -batch_size | Integer | Specifies how many bulk operations should be accumulated into batch before sending over to data nodes. Default `1`. +`pipeline` | String | The pipeline ID for preprocessing documents. +`refresh` | Enum | Whether to refresh the affected shards after performing the indexing operations. Default is `false`. `true` makes the changes show up in search results immediately, but hurts cluster performance. `wait_for` waits for a refresh. Requests take longer to return, but cluster performance doesn't suffer. +`require_alias` | Boolean | Set to `true` to require that all actions target an index alias rather than an index. Default is `false`. +`routing` | String | Routes the request to the specified shard. +`timeout` | Time | How long to wait for the request to return. 
Default `1m`. +`type` | String | (Deprecated) The default document type for documents that don't specify a type. Default is `_doc`. We highly recommend ignoring this parameter and using a type of `_doc` for all indexes. +`wait_for_active_shards` | String | Specifies the number of active shards that must be available before OpenSearch processes the bulk request. Default is 1 (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the request to succeed. +`batch_interval` | Time | Specifies how long bulk operations should be accumulated into batch before sending over to data nodes. +`batch_size` | Integer | Specifies how many bulk operations should be accumulated into batch before sending over to data nodes. Default `1`. {% comment %}_source | List | asdf -_source_excludes | list | asdf -_source_includes | list | asdf{% endcomment %} +`_source_excludes` | list | asdf +`_source_includes` | list | asdf{% endcomment %} ## Request body -The streaming bulk API request body is fully compatible with bulk API [request body]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/#request-body), whereas each bulk operation (create / index / update / delete) is sent as a separate chunk. +The streaming bulk API request body is fully compatible with the [Bulk API request body]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/#request-body), where each bulk operation (create/index/update/delete) is sent as a separate chunk. 
+ +## Example request + +```json +curl -X POST "http://localhost:9200/_bulk/stream" -H "Transfer-Encoding: chunked" -H "Content-Type: application/json" -d' +{ "delete": { "_index": "movies", "_id": "tt2229499" } } +{ "index": { "_index": "movies", "_id": "tt1979320" } } +{ "title": "Rush", "year": 2013 } +{ "create": { "_index": "movies", "_id": "tt1392214" } } +{ "title": "Prisoners", "year": 2013 } +{ "update": { "_index": "movies", "_id": "tt0816711" } } +{ "doc" : { "title": "World War Z" } } +' +``` +{% include copy.html %} ## Example response -Depending on the batch settings, each streamed response chunk may report the results of one or many (batch) bulk operations, for example for the request from above with no batching (default), the following streaming response will be received: +Depending on the batch settings, each streamed response chunk may report the results of one or many (batch) bulk operations, for example for the preceding request with no batching (default), the streaming response is the following: ```json {"took": 11, "errors": false, "items": [ { "index": {"_index": "movies", "_id": "tt1979320", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "_seq_no": 1, "_primary_term": 1, "status": 201 } } ] } From 8dd48ad621f0a730f1ab5cd6093ac11a9b1183ab Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Fri, 30 Aug 2024 18:18:17 -0400 Subject: [PATCH 3/9] Small rewording Signed-off-by: Fanit Kolchina --- _api-reference/document-apis/bulk-streaming.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/document-apis/bulk-streaming.md b/_api-reference/document-apis/bulk-streaming.md index 9a25b9560b..97288ae091 100644 --- a/_api-reference/document-apis/bulk-streaming.md +++ b/_api-reference/document-apis/bulk-streaming.md @@ -16,7 +16,7 @@ This is an experimental feature and is not recommended for use in a production e The streaming bulk operation lets you add, update, or delete 
multiple documents in by streaming the request and getting the results as streaming response. In comparison to the traditional [Bulk API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/), streaming ingestion eliminates the need to guess the batch size (which is affected by the cluster operational state at any given time) and naturally applies the backpressure between many clients and the cluster. The streaming works over HTTP/2 or HTTP 1.1 (using chunked transfer encoding), depending on the capabilities of the clients and the cluster. -The default HTTP transport method does not support streaming. You must install the [`transport-reactor-netty4`]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/network-settings/#selecting-the-transport) HTTP transport plugin and use it as the default HTTP transport method. Both the transport method and the Streaming Bulk API are experimental. +The default HTTP transport method does not support streaming. You must install the [`transport-reactor-netty4`]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/network-settings/#selecting-the-transport) HTTP transport plugin and use it as the default HTTP transport method. Both the `transport-reactor-netty4` plugin and the Streaming Bulk API are experimental. 
{: .note} ## Path and HTTP methods From dd36b316d75cc8ad9893333b47914947c065dc9c Mon Sep 17 00:00:00 2001 From: Andriy Redko Date: Mon, 2 Sep 2024 08:53:41 -0400 Subject: [PATCH 4/9] Address review comments Signed-off-by: Andriy Redko --- _api-reference/document-apis/bulk-streaming.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/_api-reference/document-apis/bulk-streaming.md b/_api-reference/document-apis/bulk-streaming.md index 97288ae091..f5ce490c61 100644 --- a/_api-reference/document-apis/bulk-streaming.md +++ b/_api-reference/document-apis/bulk-streaming.md @@ -11,12 +11,12 @@ redirect_from: **Introduced 2.17.0** {: .label .label-purple } -This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://example.issue.link). +This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/9065). {: .warning} -The streaming bulk operation lets you add, update, or delete multiple documents in by streaming the request and getting the results as streaming response. In comparison to the traditional [Bulk API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/), streaming ingestion eliminates the need to guess the batch size (which is affected by the cluster operational state at any given time) and naturally applies the backpressure between many clients and the cluster. The streaming works over HTTP/2 or HTTP 1.1 (using chunked transfer encoding), depending on the capabilities of the clients and the cluster. +The streaming bulk operation lets you add, update, or delete multiple documents in by streaming the request and getting the results as streaming response. 
In comparison to the traditional [Bulk API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/), streaming ingestion eliminates the need to guess the batch size (which is affected by the cluster operational state at any given time) and naturally applies the backpressure between many clients and the cluster. The streaming works over HTTP/2 or HTTP/1.1 (using chunked transfer encoding), depending on the capabilities of the clients and the cluster. -The default HTTP transport method does not support streaming. You must install the [`transport-reactor-netty4`]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/network-settings/#selecting-the-transport) HTTP transport plugin and use it as the default HTTP transport method. Both the `transport-reactor-netty4` plugin and the Streaming Bulk API are experimental. +The default HTTP transport method does not support streaming. You must install the [`transport-reactor-netty4`]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/network-settings/#selecting-the-transport) HTTP transport plugin and use it as the default HTTP transport layer. Both the `transport-reactor-netty4` plugin and the Streaming Bulk API are experimental. {: .note} ## Path and HTTP methods @@ -28,7 +28,7 @@ POST /_bulk/stream If you specify the index in the path, then you don't need to include it in the [request body chunks]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/#request-body). -OpenSearch also accepts PUT requests to the `_bulk/steram` path, but we highly recommend using POST. The accepted usage of PUT---adding or replacing a single resource at a given path---doesn't make sense for streaming bulk requests. +OpenSearch also accepts PUT requests to the `_bulk/stream` path, but we highly recommend using POST. The accepted usage of PUT---adding or replacing a single resource at a given path---doesn't make sense for streaming bulk requests. 
{: .note } @@ -57,7 +57,7 @@ The streaming bulk API request body is fully compatible with the [Bulk API reque ## Example request -```json +``` curl -X POST "http://localhost:9200/_bulk/stream" -H "Transfer-Encoding: chunked" -H "Content-Type: application/json" -d' { "delete": { "_index": "movies", "_id": "tt2229499" } } { "index": { "_index": "movies", "_id": "tt1979320" } } @@ -72,7 +72,7 @@ curl -X POST "http://localhost:9200/_bulk/stream" -H "Transfer-Encoding: chunked ## Example response -Depending on the batch settings, each streamed response chunk may report the results of one or many (batch) bulk operations, for example for the preceding request with no batching (default), the streaming response is the following: +Depending on the batch settings, each streamed response chunk may report the results of one or many (batch) bulk operations, for example for the preceding request with no batching (default), the streaming response may look like this: ```json {"took": 11, "errors": false, "items": [ { "index": {"_index": "movies", "_id": "tt1979320", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "_seq_no": 1, "_primary_term": 1, "status": 201 } } ] } From e0d2d7a8145b44f1675b3d2966d1c773f73758dc Mon Sep 17 00:00:00 2001 From: Andriy Redko Date: Tue, 3 Sep 2024 11:10:00 -0400 Subject: [PATCH 5/9] Address review comments Signed-off-by: Andriy Redko --- _api-reference/document-apis/bulk-streaming.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/document-apis/bulk-streaming.md b/_api-reference/document-apis/bulk-streaming.md index f5ce490c61..2d94505300 100644 --- a/_api-reference/document-apis/bulk-streaming.md +++ b/_api-reference/document-apis/bulk-streaming.md @@ -57,7 +57,7 @@ The streaming bulk API request body is fully compatible with the [Bulk API reque ## Example request -``` +```json curl -X POST "http://localhost:9200/_bulk/stream" -H "Transfer-Encoding: chunked" -H 
"Content-Type: application/json" -d' { "delete": { "_index": "movies", "_id": "tt2229499" } } { "index": { "_index": "movies", "_id": "tt1979320" } } From 749ec321b9de2466fba7828f091d048ff1d441de Mon Sep 17 00:00:00 2001 From: Andriy Redko Date: Thu, 12 Sep 2024 08:32:01 -0400 Subject: [PATCH 6/9] Address review comments Signed-off-by: Andriy Redko --- _api-reference/document-apis/bulk-streaming.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/_api-reference/document-apis/bulk-streaming.md b/_api-reference/document-apis/bulk-streaming.md index 2d94505300..31fd4fa178 100644 --- a/_api-reference/document-apis/bulk-streaming.md +++ b/_api-reference/document-apis/bulk-streaming.md @@ -14,7 +14,7 @@ redirect_from: This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/9065). {: .warning} -The streaming bulk operation lets you add, update, or delete multiple documents in by streaming the request and getting the results as streaming response. In comparison to the traditional [Bulk API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/), streaming ingestion eliminates the need to guess the batch size (which is affected by the cluster operational state at any given time) and naturally applies the backpressure between many clients and the cluster. The streaming works over HTTP/2 or HTTP/1.1 (using chunked transfer encoding), depending on the capabilities of the clients and the cluster. +The streaming bulk operation lets you add, update, or delete multiple documents by streaming the request and getting the results as a streaming response. 
In comparison to the traditional [Bulk API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/), streaming ingestion eliminates the need to estimate the batch size (which is affected by the cluster operational state at any given time) and naturally applies backpressure between many clients and the cluster. The streaming works over HTTP/2 or HTTP/1.1 (using chunked transfer encoding), depending on the capabilities of the clients and the cluster. The default HTTP transport method does not support streaming. You must install the [`transport-reactor-netty4`]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/network-settings/#selecting-the-transport) HTTP transport plugin and use it as the default HTTP transport layer. Both the `transport-reactor-netty4` plugin and the Streaming Bulk API are experimental. {: .note} @@ -28,7 +28,7 @@ POST /_bulk/stream If you specify the index in the path, then you don't need to include it in the [request body chunks]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/#request-body). -OpenSearch also accepts PUT requests to the `_bulk/stream` path, but we highly recommend using POST. The accepted usage of PUT---adding or replacing a single resource at a given path---doesn't make sense for streaming bulk requests. +OpenSearch also accepts PUT requests to the `_bulk/stream` path, but we highly recommend using POST. The accepted usage of PUT---adding or replacing a single resource on a given path---doesn't make sense for streaming bulk requests. {: .note } @@ -48,12 +48,12 @@ Parameter | Data type | Description `batch_interval` | Time | Specifies how long bulk operations should be accumulated into batch before sending over to data nodes. `batch_size` | Integer | Specifies how many bulk operations should be accumulated into batch before sending over to data nodes. Default `1`. 
{% comment %}_source | List | asdf -`_source_excludes` | list | asdf -`_source_includes` | list | asdf{% endcomment %} +`_source_excludes` | List | asdf +`_source_includes` | List | asdf{% endcomment %} ## Request body -The streaming bulk API request body is fully compatible with the [Bulk API request body]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/#request-body), where each bulk operation (create/index/update/delete) is sent as a separate chunk. +The Streaming Bulk API request body is fully compatible with the [Bulk API request body]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/#request-body), where each bulk operation (create/index/update/delete) is sent as a separate chunk. ## Example request @@ -72,7 +72,7 @@ curl -X POST "http://localhost:9200/_bulk/stream" -H "Transfer-Encoding: chunked ## Example response -Depending on the batch settings, each streamed response chunk may report the results of one or many (batch) bulk operations, for example for the preceding request with no batching (default), the streaming response may look like this: +Depending on the batch settings, each streamed response chunk may report the results of one or many (batch) bulk operations. 
For example, for the preceding request with no batching (default), the streaming response may appear as follows: ```json {"took": 11, "errors": false, "items": [ { "index": {"_index": "movies", "_id": "tt1979320", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "_seq_no": 1, "_primary_term": 1, "status": 201 } } ] } From 57f1b550f41d9a6662733751073bc393f831cd63 Mon Sep 17 00:00:00 2001 From: Andriy Redko Date: Thu, 12 Sep 2024 08:36:12 -0400 Subject: [PATCH 7/9] Address review comments Signed-off-by: Andriy Redko --- _api-reference/document-apis/bulk-streaming.md | 8 ++++---- _api-reference/document-apis/bulk.md | 4 ++-- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/_api-reference/document-apis/bulk-streaming.md b/_api-reference/document-apis/bulk-streaming.md index 31fd4fa178..e4129f3660 100644 --- a/_api-reference/document-apis/bulk-streaming.md +++ b/_api-reference/document-apis/bulk-streaming.md @@ -42,11 +42,11 @@ Parameter | Data type | Description `refresh` | Enum | Whether to refresh the affected shards after performing the indexing operations. Default is `false`. `true` makes the changes show up in search results immediately, but hurts cluster performance. `wait_for` waits for a refresh. Requests take longer to return, but cluster performance doesn't suffer. `require_alias` | Boolean | Set to `true` to require that all actions target an index alias rather than an index. Default is `false`. `routing` | String | Routes the request to the specified shard. -`timeout` | Time | How long to wait for the request to return. Default `1m`. +`timeout` | Time | How long to wait for the request to return. Default is `1m`. `type` | String | (Deprecated) The default document type for documents that don't specify a type. Default is `_doc`. We highly recommend ignoring this parameter and using a type of `_doc` for all indexes. 
-`wait_for_active_shards` | String | Specifies the number of active shards that must be available before OpenSearch processes the bulk request. Default is 1 (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the request to succeed. -`batch_interval` | Time | Specifies how long bulk operations should be accumulated into batch before sending over to data nodes. -`batch_size` | Integer | Specifies how many bulk operations should be accumulated into batch before sending over to data nodes. Default `1`. +`wait_for_active_shards` | String | Specifies the number of active shards that must be available before OpenSearch processes the bulk request. Default is `1` (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have 2 replicas distributed across 2 additional nodes in order for the request to succeed. +`batch_interval` | Time | Specifies for how long bulk operations should be accumulated into a batch before sending the batch to data nodes. +`batch_size` | Integer | Specifies how many bulk operations should be accumulated into a batch before sending the batch to data nodes. Default is `1`. {% comment %}_source | List | asdf `_source_excludes` | List | asdf `_source_includes` | List | asdf{% endcomment %} diff --git a/_api-reference/document-apis/bulk.md b/_api-reference/document-apis/bulk.md index 0475aa573d..50fcb7375c 100644 --- a/_api-reference/document-apis/bulk.md +++ b/_api-reference/document-apis/bulk.md @@ -56,9 +56,9 @@ pipeline | String | The pipeline ID for preprocessing documents. refresh | Enum | Whether to refresh the affected shards after performing the indexing operations. Default is `false`. `true` makes the changes show up in search results immediately, but hurts cluster performance. 
`wait_for` waits for a refresh. Requests take longer to return, but cluster performance doesn't suffer. require_alias | Boolean | Set to `true` to require that all actions target an index alias rather than an index. Default is `false`. routing | String | Routes the request to the specified shard. -timeout | Time | How long to wait for the request to return. Default `1m`. +timeout | Time | How long to wait for the request to return. Default is `1m`. type | String | (Deprecated) The default document type for documents that don't specify a type. Default is `_doc`. We highly recommend ignoring this parameter and using a type of `_doc` for all indexes. -wait_for_active_shards | String | Specifies the number of active shards that must be available before OpenSearch processes the bulk request. Default is 1 (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the request to succeed. +wait_for_active_shards | String | Specifies the number of active shards that must be available before OpenSearch processes the bulk request. Default is `1` (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have 2 replicas distributed across 2 additional nodes in order for the request to succeed. batch_size | Integer | **(Deprecated)** Specifies the number of documents to be batched and sent to an ingest pipeline to be processed together. Default is `2147483647` (documents are ingested by an ingest pipeline all at once). If the bulk request doesn't explicitly specify an ingest pipeline or the index doesn't have a default ingest pipeline, then this parameter is ignored. Only documents with `create`, `index`, or `update` actions can be grouped into batches. 
 {% comment %}_source | List | asdf
 _source_excludes | list | asdf
From 2f0d2d974650b7a18fac502f0b23e362a12eb307 Mon Sep 17 00:00:00 2001
From: Andriy Redko
Date: Thu, 12 Sep 2024 08:40:34 -0400
Subject: [PATCH 8/9] Address review comments

Signed-off-by: Andriy Redko
---
 _api-reference/document-apis/bulk-streaming.md | 2 +-
 _api-reference/document-apis/bulk.md | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/_api-reference/document-apis/bulk-streaming.md b/_api-reference/document-apis/bulk-streaming.md
index e4129f3660..cf0e7dc73c 100644
--- a/_api-reference/document-apis/bulk-streaming.md
+++ b/_api-reference/document-apis/bulk-streaming.md
@@ -43,7 +43,7 @@ Parameter | Data type | Description
 `require_alias` | Boolean | Set to `true` to require that all actions target an index alias rather than an index. Default is `false`.
 `routing` | String | Routes the request to the specified shard.
 `timeout` | Time | How long to wait for the request to return. Default is `1m`.
-`type` | String | (Deprecated) The default document type for documents that don't specify a type. Default is `_doc`. We highly recommend ignoring this parameter and using a type of `_doc` for all indexes.
+`type` | String | (Deprecated) The default document type for documents that don't specify a type. Default is `_doc`. We highly recommend ignoring this parameter and using the `_doc` type for all indexes.
 `wait_for_active_shards` | String | Specifies the number of active shards that must be available before OpenSearch processes the bulk request. Default is `1` (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have 2 replicas distributed across 2 additional nodes in order for the request to succeed.
 `batch_interval` | Time | Specifies for how long bulk operations should be accumulated into a batch before sending the batch to data nodes.
 `batch_size` | Integer | Specifies how many bulk operations should be accumulated into a batch before sending the batch to data nodes. Default is `1`.
diff --git a/_api-reference/document-apis/bulk.md b/_api-reference/document-apis/bulk.md
index 50fcb7375c..34e1ac375e 100644
--- a/_api-reference/document-apis/bulk.md
+++ b/_api-reference/document-apis/bulk.md
@@ -57,12 +57,12 @@ refresh | Enum | Whether to refresh the affected shards after performing the ind
 require_alias | Boolean | Set to `true` to require that all actions target an index alias rather than an index. Default is `false`.
 routing | String | Routes the request to the specified shard.
 timeout | Time | How long to wait for the request to return. Default is `1m`.
-type | String | (Deprecated) The default document type for documents that don't specify a type. Default is `_doc`. We highly recommend ignoring this parameter and using a type of `_doc` for all indexes.
+type | String | (Deprecated) The default document type for documents that don't specify a type. Default is `_doc`. We highly recommend ignoring this parameter and using the `_doc` type for all indexes.
 wait_for_active_shards | String | Specifies the number of active shards that must be available before OpenSearch processes the bulk request. Default is `1` (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have 2 replicas distributed across 2 additional nodes in order for the request to succeed.
 batch_size | Integer | **(Deprecated)** Specifies the number of documents to be batched and sent to an ingest pipeline to be processed together. Default is `2147483647` (documents are ingested by an ingest pipeline all at once). If the bulk request doesn't explicitly specify an ingest pipeline or the index doesn't have a default ingest pipeline, then this parameter is ignored. Only documents with `create`, `index`, or `update` actions can be grouped into batches.
 {% comment %}_source | List | asdf
-_source_excludes | list | asdf
-_source_includes | list | asdf{% endcomment %}
+_source_excludes | List | asdf
+_source_includes | List | asdf{% endcomment %}

 ## Request body

From 5b24c1688fd28a67693ca01f7f64b22514276951 Mon Sep 17 00:00:00 2001
From: Andriy Redko
Date: Thu, 12 Sep 2024 08:42:22 -0400
Subject: [PATCH 9/9] Address review comments

Signed-off-by: Andriy Redko
---
 _api-reference/document-apis/bulk-streaming.md | 2 +-
 _api-reference/document-apis/bulk.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/_api-reference/document-apis/bulk-streaming.md b/_api-reference/document-apis/bulk-streaming.md
index cf0e7dc73c..7d05e93c8a 100644
--- a/_api-reference/document-apis/bulk-streaming.md
+++ b/_api-reference/document-apis/bulk-streaming.md
@@ -39,7 +39,7 @@ The following table lists the available query parameters. All query parameters a
 Parameter | Data type | Description
 :--- | :--- | :---
 `pipeline` | String | The pipeline ID for preprocessing documents.
-`refresh` | Enum | Whether to refresh the affected shards after performing the indexing operations. Default is `false`. `true` makes the changes show up in search results immediately, but hurts cluster performance. `wait_for` waits for a refresh. Requests take longer to return, but cluster performance doesn't suffer.
+`refresh` | Enum | Whether to refresh the affected shards after performing the indexing operations. Default is `false`. `true` causes the changes to show up in search results immediately but degrades cluster performance. `wait_for` waits for a refresh. Requests take longer to return, but cluster performance isn't degraded.
 `require_alias` | Boolean | Set to `true` to require that all actions target an index alias rather than an index. Default is `false`.
 `routing` | String | Routes the request to the specified shard.
 `timeout` | Time | How long to wait for the request to return. Default is `1m`.
diff --git a/_api-reference/document-apis/bulk.md b/_api-reference/document-apis/bulk.md
index 34e1ac375e..4add60ee37 100644
--- a/_api-reference/document-apis/bulk.md
+++ b/_api-reference/document-apis/bulk.md
@@ -53,7 +53,7 @@ All bulk URL parameters are optional.
 Parameter | Type | Description
 :--- | :--- | :---
 pipeline | String | The pipeline ID for preprocessing documents.
-refresh | Enum | Whether to refresh the affected shards after performing the indexing operations. Default is `false`. `true` makes the changes show up in search results immediately, but hurts cluster performance. `wait_for` waits for a refresh. Requests take longer to return, but cluster performance doesn't suffer.
+refresh | Enum | Whether to refresh the affected shards after performing the indexing operations. Default is `false`. `true` causes the changes to show up in search results immediately but degrades cluster performance. `wait_for` waits for a refresh. Requests take longer to return, but cluster performance isn't degraded.
 require_alias | Boolean | Set to `true` to require that all actions target an index alias rather than an index. Default is `false`.
 routing | String | Routes the request to the specified shard.
 timeout | Time | How long to wait for the request to return. Default is `1m`.
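
Reviewer note: the query parameters documented in this series for `_bulk/stream` could be exercised from a client roughly as follows. This is only a sketch, not part of the patch. The helper names `build_stream_url` and `ndjson_chunks`, the `http://localhost:9200` address, and the parameter values are hypothetical and chosen for illustration. The code only builds the URL and the newline-delimited JSON chunks and does not send anything to a cluster.

```python
import json
from urllib.parse import urlencode


def build_stream_url(base_url, **params):
    """Append optional query parameters to the streaming bulk path (hypothetical helper)."""
    url = f"{base_url.rstrip('/')}/_bulk/stream"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


def ndjson_chunks(operations):
    """Yield one newline-terminated chunk per action, followed by its document if it has one."""
    for action, document in operations:
        lines = [json.dumps(action)]
        if document is not None:
            lines.append(json.dumps(document))
        yield ("\n".join(lines) + "\n").encode("utf-8")


# Illustrative operations mirroring the movies example from the documentation.
operations = [
    ({"index": {"_index": "movies", "_id": "tt1979320"}}, {"title": "Rush", "year": 2013}),
    ({"delete": {"_index": "movies", "_id": "tt2229499"}}, None),
]

# Assumed local cluster address; parameter values are examples only.
url = build_stream_url(
    "http://localhost:9200",
    batch_size=5,
    batch_interval="100ms",
    refresh="wait_for",
)
chunks = list(ndjson_chunks(operations))
```

Each yielded chunk could then be handed to an HTTP client that supports chunked transfer encoding or HTTP/2. The choice of client is left open here.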