After upgrade of Opensearch to 2.14 Graylog starts throwing exceptions - Unable to perform search query: OpenSearch exception [type=concurrent_modification_exception, reason=null]. #19533
Comments
Hello, thanks for raising this. Can I ask how busy the cluster is (how often events typically run)? Is there any possibility you could attach Graylog's server.log file?
@clickbg Looking at the failing code, we should be raising a related system notification event.
Hi, thanks for the fast reply. Sure, I am attaching the server.log and the relevant OpenSearch logs. I have only removed personally identifying domains and IPs; the rest should be as it was logged. I am also attaching a screenshot of the Event Definitions and the errors in System / Alerts & Events.
@patrickmann Yes, there are a lot of them, basically one for each alert definition that was run after the upgrade.
In terms of load, the system (it's a single system) isn't busy at all: log ingestion is below 100MB per day, around 50MB per day on average. There are 8 event definitions in total. 6 of them run every 5 minutes with a backlog search of 6 minutes, 1 runs every 30 minutes with a backlog search of 31 minutes, and the last one runs every 2 days (which was configured by mistake, it should be daily), but I left it as is for now. The misconfigured one hasn't run since the upgrade, which was done on 2024-06-01 14:33 EEST / 11:33 UTC. Thanks again!
@clickbg Great - can you share the (redacted) details view of one of those System Notification Events?
@clickbg I meant details of a System Notification event instance, not the definition itself. I'm hoping it will contain the actual query error.
Just noting that https://go2docs.graylog.org/current/downloading_and_installing_graylog/installing_graylog.html says the max OpenSearch version supported with Graylog |
@coffee-squirrel Yes, unfortunately OpenSearch doesn't support downgrading, and they treat 2.13 -> 2.14 as a minor upgrade, at least from a package management perspective. Unlike Graylog, where you have to purposefully change the repo, OpenSearch just upgrades automatically...
So anyone who does regular Ubuntu/Debian/RH/SLES upgrades will inevitably end up with 2.14, with no way to revert short of deleting everything and starting from scratch. One way to avoid this is to bundle the correct version of OpenSearch in the Graylog repo. That way you control which version we get, but it adds the work of maintaining an extra package. Another way is to update the docs and advise users to put a hold on the OpenSearch package (apt-mark hold), but that risks the reverse problem: people running a version of OpenSearch that is too old and no longer compatible with Graylog. External dependencies are always a pain.
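For anyone who wants to script the package hold suggested in the comment above, here is a minimal sketch for Debian/Ubuntu hosts. The package name `opensearch` and the helper names `hold_command` and `hold_package` are assumptions made for illustration, not something taken from this thread. By default it only prints the command; pass `dry_run=False` (as root) to actually apply the hold.

```python
import subprocess

# Assumed package name; verify it on your system before relying on it.
PACKAGE = "opensearch"


def hold_command(package: str = PACKAGE) -> list[str]:
    """Build the apt-mark invocation that pins a package at its installed version."""
    return ["apt-mark", "hold", package]


def hold_package(package: str = PACKAGE, dry_run: bool = True) -> list[str]:
    """Print the hold command; run it only when dry_run is False (needs root)."""
    cmd = hold_command(package)
    print(" ".join(cmd))
    if not dry_run:
        # check=True raises CalledProcessError if apt-mark exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    hold_package()  # dry run: prints the command without changing anything
```

The hold can be reversed later with `apt-mark unhold`, which matters here because a package held for too long runs into the reverse problem described above.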
Same here on Graylog 5.2.7 / OpenSearch 2.14.0.
Investigation showed that, IMHO, this is a bug in OpenSearch: opensearch-project/OpenSearch#14032
@janheise Thank you for the fast investigation and for the excellent report to the respective project.
Fixed for 2.15, see opensearch-project/opensearch-build#4681
I can confirm that the error is gone after upgrading to 2.15.0.
I can also confirm that after upgrading to 2.15 on the 25th of this month, no new alerts for this bug have been generated.
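Based on the reports in this thread, a quick way to tell whether a given OpenSearch version falls in the affected range might look like the sketch below. It assumes that every 2.14.x release is affected; the thread itself only confirms 2.14.0 as broken and 2.15.0 as fixed. The function names are hypothetical.

```python
import re

# Assumption: the whole 2.14 line is affected. Only 2.14.0 (broken) and
# 2.15.0 (fixed) are actually confirmed in this thread.
AFFECTED_MAJOR_MINOR = (2, 14)


def parse_version(version: str) -> tuple[int, int, int]:
    """Parse 'major.minor[.patch]' into integers; a missing patch counts as 0."""
    match = re.match(r"^(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def is_affected(version: str) -> bool:
    """Return True if the version is in the range reported as affected here."""
    major, minor, _patch = parse_version(version)
    return (major, minor) == AFFECTED_MAJOR_MINOR


if __name__ == "__main__":
    for candidate in ("2.13.0", "2.14.0", "2.15.0"):
        print(candidate, is_affected(candidate))
```

The version string to feed in is the one your cluster reports about itself, for example in the `version.number` field of the response from the cluster's root endpoint.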
Since upgrading to Graylog 6.0 and OpenSearch 2.14, all alerts have started generating OpenSearch-related errors.
Example error:
Expected Behavior
Aggregation search should still function.
Current Behavior
Aggregation search seems to work fine, but we are getting a lot of system errors.
Possible Solution
n/a
Steps to Reproduce (for bugs)
Context
There seems to be a discussion going on here: https://community.graylog.org/t/event-definitions-causing-concurrent-modification-exception/32529 but I can't find a PR about it. I am sorry if a PR already exists.
Your Environment