Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug][storage] Make ES-Rollover idempotent by checking if the index or alias already exist #6638

Merged
merged 3 commits into from
Feb 14, 2025

Conversation

Manik2708
Copy link
Contributor

@Manik2708 Manik2708 commented Jan 30, 2025

Which problem is this PR solving?

Fixes: #6203

Description of the changes

  • Currently es-rollover checks for index existence through errors, it is mainly expecting the error:
{"error":{"root_cause":[{"type":"resource_already_exists_exception","reason":"]"}],"type":"resource_already_exists_exception","reason":"request [/jaeger-*] contains unrecognized parameter: [help]"},"status":400}

But it can lead to inconsistent results as found in the issue, where init was failing due to the error:

Error: failed to create index: jaeger-span-000001, request failed, status code: 400, body: {"error":{"root_cause":[{"type":"invalid_index_name_exception","reason":"Invalid index name [jaeger-span-000001], already exists as alias","index_uuid":"_na_","index":"jaeger-span-000001"}],"type":"invalid_index_name_exception","reason":"Invalid index name [jaeger-span-000001], already exists as alias","index_uuid":"_na_","index":"jaeger-span-000001"},"status":400}

Here if we see carefully the error is coming due to existence of index but the reason is different. es-rollover is ready only for resource_already_exists_exception but there are other errors also which can be generated due to this (like the above).

The current way of marshalling error is unsafe, the safe way is: Check if index exists -> Create if not exists. This way the certained error (resource_already_exists is fixed) and the unavoidable error like index_name_exception is ignored.

How was this change tested?

  • Unit and E2E Tests

Checklist

Copy link

codecov bot commented Jan 30, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.03%. Comparing base (4b884bb) to head (a6db745).
Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #6638   +/-   ##
=======================================
  Coverage   96.03%   96.03%           
=======================================
  Files         364      364           
  Lines       20675    20696   +21     
=======================================
+ Hits        19855    19876   +21     
  Misses        626      626           
  Partials      194      194           
Flag Coverage Δ
badger_v1 9.76% <ø> (ø)
badger_v2 1.82% <ø> (ø)
cassandra-4.x-v1-manual 14.81% <ø> (ø)
cassandra-4.x-v2-auto 1.81% <ø> (ø)
cassandra-4.x-v2-manual 1.81% <ø> (ø)
cassandra-5.x-v1-manual 14.81% <ø> (ø)
cassandra-5.x-v2-auto 1.81% <ø> (ø)
cassandra-5.x-v2-manual 1.81% <ø> (ø)
elasticsearch-6.x-v1 19.15% <ø> (ø)
elasticsearch-7.x-v1 19.23% <ø> (ø)
elasticsearch-8.x-v1 19.40% <ø> (ø)
elasticsearch-8.x-v2 1.82% <ø> (ø)
grpc_v1 10.81% <ø> (ø)
grpc_v2 7.80% <ø> (ø)
kafka-3.x-v1 10.13% <ø> (ø)
kafka-3.x-v2 1.82% <ø> (ø)
memory_v2 1.82% <ø> (ø)
opensearch-1.x-v1 19.28% <ø> (ø)
opensearch-2.x-v1 19.28% <ø> (ø)
opensearch-2.x-v2 1.82% <ø> (ø)
tailsampling-processor 0.48% <ø> (ø)
unittests 94.92% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Manik2708
Copy link
Contributor Author

Manik2708 commented Jan 30, 2025

@yurishkuro I think, even this approach can lead to the error described in the issue. An index might not exist but still pointed by an alias. Should we ignore that error in init (as that means rollover has taken place already)? I intially thought that index existence could fix this but I may be wrong. So I want to take your views over ignoring this error in init.

@Manik2708 Manik2708 changed the title [bug][storage] Enhanced the idempotency of ES-Rollover [bug][storage] Fixed the ES-Rollover Idempotency Jan 31, 2025
@Manik2708
Copy link
Contributor Author

@mahadzaryab1 Can you please review this PR and solve this doubt #6638 (comment) ?

pkg/es/client/client.go Outdated Show resolved Hide resolved
cmd/es-rollover/app/init/action.go Outdated Show resolved Hide resolved
cmd/es-rollover/app/init/action.go Show resolved Hide resolved
@Manik2708 Manik2708 requested a review from yurishkuro February 11, 2025 11:20
pkg/es/client/client.go Outdated Show resolved Hide resolved
pkg/es/client/client.go Outdated Show resolved Hide resolved
@Manik2708 Manik2708 requested a review from yurishkuro February 11, 2025 14:57
@Manik2708
Copy link
Contributor Author

Fixing e2e tests

@Manik2708 Manik2708 requested a review from yurishkuro February 13, 2025 06:11
@yurishkuro yurishkuro changed the title [bug][storage] Fixed the ES-Rollover Idempotency [bug][storage] Make ES-Rollover idempotent by checking if the index or alias already exist Feb 13, 2025
@yurishkuro yurishkuro enabled auto-merge February 13, 2025 14:51
@yurishkuro yurishkuro disabled auto-merge February 13, 2025 14:52
@yurishkuro yurishkuro enabled auto-merge February 13, 2025 14:52
@yurishkuro yurishkuro disabled auto-merge February 13, 2025 14:53
Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove the binary cmd/esmapping-generator/esmapping-generator-linux-amd64

code coverage is below threshold (the ES error responses are not covered by the test)

Signed-off-by: Manik2708 <[email protected]>
@Manik2708 Manik2708 requested a review from yurishkuro February 13, 2025 16:14
@yurishkuro yurishkuro added this pull request to the merge queue Feb 13, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 13, 2025
@Manik2708
Copy link
Contributor Author

@yurishkuro Is there something I can do here for failing checks in merge queue?

@yurishkuro
Copy link
Member

you could add a fix to the label check workflow in a new PR #6712 (comment)

@Manik2708
Copy link
Contributor Author

Manik2708 commented Feb 13, 2025

you could add a fix to the label check workflow in a new PR #6712 (comment)

@yurishkuro Will it be ok to disable it? Because I don't have any reason to enable it for the commit. It makes sense in the PR but I am doubtful for the commit! Also, DCO check is failing! But I think for that we need to find a fix rather than removing it!

@yurishkuro yurishkuro added this pull request to the merge queue Feb 13, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 13, 2025
@yurishkuro yurishkuro added this pull request to the merge queue Feb 14, 2025
Merged via the queue into jaegertracing:main with commit d4253f7 Feb 14, 2025
55 checks passed
@Manik2708 Manik2708 deleted the es-roll-bug branch February 14, 2025 08:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: jaeger-es-rollover-init tries to create already existing [jaeger-span-000001] index in Elasticseach
2 participants