Add idempotency support for bulk request API #3625

Open
dblock opened this issue Jun 17, 2022 · 1 comment
Labels
Clients Clients within the Core repository such as High level Rest client and low level client distributed framework enhancement Enhancement or improvement to existing feature or request

Comments

@dblock
Member

dblock commented Jun 17, 2022

Is your feature request related to a problem? Please describe.

This follows from #3000, which notes that the current bulk indexing API places a high configuration burden on users who want to avoid RejectedExecutionException caused by TOO_MANY_REQUESTS. Users are forced to "experiment" with bulk block sizes, multi-threading, refresh intervals, etc.

Describe the solution you'd like

Make it easier for clients to handle retries by making bulk requests idempotent. The client would generate a request ID (or the ID could be assumed to be SHA(data)), and the server would track pending bulk write tasks and their progress, making it safe to retry the entire bulk operation. Clients receiving a 429 would have a built-in retry, and it would also be safe to retry from multiple instances of the client, in the hope that one eventually succeeds once the server has capacity.
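To make the proposal concrete, here is a minimal sketch of the idea using only the JDK. Everything in it is an assumption for illustration: `BulkSender`, `FakeServer`, `requestId` and `sendWithRetry` are hypothetical names, not existing OpenSearch APIs, and SHA-256 is picked as one possible reading of SHA(data). The in-memory server only stands in for the proposed server-side tracking.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

final class IdempotentBulkSketch {

    /** Hypothetical transport: returns an HTTP-like status for one bulk attempt. */
    interface BulkSender {
        int send(String requestId, String body);
    }

    /** Derives the request ID from the payload: hex-encoded SHA-256 of the body. */
    static String requestId(String body) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(body.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    /** Resends the whole bulk request under the same ID while the server answers 429. */
    static int sendWithRetry(BulkSender sender, String body, int maxAttempts, long delayMillis) {
        String id = requestId(body);
        int status = 429;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            status = sender.send(id, body);
            if (status != 429) {
                return status;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis); // constant delay between attempts
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", e);
                }
            }
        }
        return status; // still 429 after the last attempt
    }

    /** Stand-in server: rejects the first few attempts, then applies each ID at most once. */
    static final class FakeServer implements BulkSender {
        private final Set<String> appliedIds = new HashSet<>();
        private int rejectionsLeft;
        int timesApplied = 0;

        FakeServer(int rejections) {
            this.rejectionsLeft = rejections;
        }

        @Override
        public synchronized int send(String requestId, String body) {
            if (rejectionsLeft > 0) {
                rejectionsLeft--;
                return 429; // no capacity: nothing was written
            }
            if (appliedIds.add(requestId)) {
                timesApplied++; // first time this ID is seen: perform the write
            }
            return 200; // a repeated ID is acknowledged without writing again
        }
    }

    public static void main(String[] args) {
        FakeServer server = new FakeServer(2);
        String body = "{\"index\":{}}\n{\"field\":\"value\"}\n";
        int first = sendWithRetry(server, body, 5, 10L);
        int second = sendWithRetry(server, body, 5, 10L); // e.g. a second client instance
        System.out.println(first + " " + second + " applied=" + server.timesApplied);
    }
}
```

In this sketch the second call is acknowledged but does not write again, because the ID derived from the payload is already known to the server. A real implementation would also have to decide how long IDs are remembered and how partially completed bulk requests are reported, which this sketch does not attempt.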

Some googling shows that people are at least attempting to implement client-side retry: https://gist.github.com/henrikno/e0ebd6804cb62491343c and https://gitlab.com/gitlab-org/gitlab/-/issues/12372

Describe alternatives you've considered
Streaming API, #3000.

@dblock dblock added enhancement Enhancement or improvement to existing feature or request untriaged labels Jun 17, 2022
@kotwanikunal kotwanikunal added Clients Clients within the Core repository such as High level Rest client and low level client distributed framework and removed untriaged labels Jun 21, 2022
@adnapibar
Contributor

There is a utility class for thread-safe bulk processing, BulkProcessor.java, with controls for when to flush, the number of concurrent requests, and the retry policy.

It can be used with the high-level REST client, for example:

// `listener` is assumed to be a BulkProcessor.Listener defined elsewhere.
try (RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(new HttpHost("localhost", 9200)));
     BulkProcessor bulkProcessor = BulkProcessor.builder(
             (request, bulkListener) -> client.bulkAsync(request, RequestOptions.DEFAULT, bulkListener), listener)
         .setBulkActions(500)                                  // flush after 500 actions
         .setBulkSize(new ByteSizeValue(2L, ByteSizeUnit.MB))  // or after 2 MB of data
         .setFlushInterval(TimeValue.timeValueSeconds(10L))    // or every 10 seconds
         .setConcurrentRequests(0)                             // 0 = execute flushes synchronously
         // retry up to 3 times, waiting 1 second between attempts
         .setBackoffPolicy(BackoffPolicy.constantBackoff(TimeValue.timeValueSeconds(1L), 3))
         .build()) {
    bulkProcessor.add(new IndexRequest(...));
    ...
} catch (IOException e) {
    throw new RuntimeException(e);
}

3 participants