After upgrade of Opensearch to 2.14 Graylog starts throwing exceptions - Unable to perform search query: OpenSearch exception [type=concurrent_modification_exception, reason=null]. #19533

Closed
clickbg opened this issue Jun 3, 2024 · 15 comments

@clickbg

clickbg commented Jun 3, 2024

Since upgrading to Graylog 6.0 and OpenSearch 2.14, all alerts have started generating OpenSearch-related errors.

Example error:

Event definition SSH Failed Login Attempt Detected (6634e0a3f056cf428db56e00) failed: Unable to perform search query: OpenSearch exception [type=concurrent_modification_exception, reason=null].
2024-06-03T11:33:36.168+03:00 ERROR [PivotAggregationSearch] Aggregation search query <query-1> returned an error: Unable to perform search query:

OpenSearch exception [type=concurrent_modification_exception, reason=null].

2024-06-03T11:33:36.169+03:00 ERROR [EventProcessorExecutionJob] Event processor <aggregation-v1/6638a54549632a07ce5290ac> failed to execute: Unable to perform search query:

OpenSearch exception [type=concurrent_modification_exception, reason=null]. (retry in 5000 ms)
org.graylog.events.processor.EventProcessorException: Unable to perform search query:

OpenSearch exception [type=concurrent_modification_exception, reason=null].
	at org.graylog.events.processor.aggregation.PivotAggregationSearch.doSearch(PivotAggregationSearch.java:186) ~[graylog.jar:?]
	at org.graylog.events.processor.aggregation.AggregationEventProcessor.aggregatedSearch(AggregationEventProcessor.java:298) ~[graylog.jar:?]
	at org.graylog.events.processor.aggregation.AggregationEventProcessor.createEvents(AggregationEventProcessor.java:148) ~[graylog.jar:?]
	at org.graylog.events.processor.EventProcessorEngine.execute(EventProcessorEngine.java:100) ~[graylog.jar:?]
	at org.graylog.events.processor.EventProcessorExecutionJob.execute(EventProcessorExecutionJob.java:116) ~[graylog.jar:?]
	at org.graylog.scheduler.JobExecutionEngine.executeJob(JobExecutionEngine.java:293) ~[graylog.jar:?]
	at org.graylog.scheduler.JobExecutionEngine.lambda$handleTrigger$4(JobExecutionEngine.java:266) ~[graylog.jar:?]
	at com.codahale.metrics.Timer.time(Timer.java:151) ~[graylog.jar:?]
	at org.graylog.scheduler.JobExecutionEngine.handleTrigger(JobExecutionEngine.java:266) ~[graylog.jar:?]
	at org.graylog.scheduler.JobExecutionEngine.handleTriggerWithConcurrencyLimit(JobExecutionEngine.java:238) ~[graylog.jar:?]
	at org.graylog.scheduler.JobExecutionEngine.lambda$execute$2(JobExecutionEngine.java:203) ~[graylog.jar:?]
	at org.graylog.scheduler.worker.JobWorkerPool.lambda$execute$0(JobWorkerPool.java:122) ~[graylog.jar:?]
	at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:212) [graylog.jar:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
	at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66) [graylog.jar:?]
	at java.base/java.lang.Thread.run(Unknown Source) [?:?]
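
To gauge how many event definitions are affected, the failures can be tallied from the notifications or from server.log. The following is a minimal sketch, assuming each failure appears on a single line in the same shape as the example error quoted above. The function name `count_failures` and the regex are illustrative and not part of Graylog.

```python
import re
from collections import Counter

# Pattern modelled on the example error line quoted in this issue (an assumption;
# other Graylog versions may format the message differently).
FAILURE_RE = re.compile(
    r"Event definition (?P<title>.+?) \((?P<id>[0-9a-f]{24})\) failed: "
    r".*?\[type=(?P<type>\w+),"
)

def count_failures(lines):
    """Count failures per (definition id, title, exception type)."""
    counts = Counter()
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            counts[(match["id"], match["title"], match["type"])] += 1
    return counts

sample = [
    "Event definition SSH Failed Login Attempt Detected (6634e0a3f056cf428db56e00) failed: "
    "Unable to perform search query: OpenSearch exception "
    "[type=concurrent_modification_exception, reason=null].",
]
for (def_id, title, exc_type), n in count_failures(sample).items():
    print(f"{n}x {exc_type}: {title} ({def_id})")
```

Multi-line log entries, where the exception type lands on a later line than the definition name, would need to be joined before matching.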

Expected Behavior

Aggregation search should still function.

Current Behavior

Aggregation search seems to work fine but we are getting a lot of system errors.

Possible Solution

n/a

Steps to Reproduce (for bugs)

  1. Upgrade OpenSearch to the latest available version (2.14.0)
  2. Upgrade Graylog to 6.0

Context

There seems to be a discussion going on here - https://community.graylog.org/t/event-definitions-causing-concurrent-modification-exception/32529 but I can't find a PR about it. I am sorry if a PR already exists.

Your Environment

  • Graylog Version: 6.0.2-1
  • Java Version: 17.0.11
  • OpenSearch Version: 2.14.0
  • MongoDB Version: 6.0.15
  • Operating System: Ubuntu 22.04.4 LTS
  • Browser version: Any
@clickbg clickbg added the bug label Jun 3, 2024
@clickbg clickbg changed the title from "After upgrade of Opensearch to 2.14 Graylog starts throwingUnable to perform search query: OpenSearch exception [type=concurrent_modification_exception, reason=null]." to "After upgrade of Opensearch to 2.14 Graylog starts throwing exceptions - Unable to perform search query: OpenSearch exception [type=concurrent_modification_exception, reason=null]." Jun 3, 2024
@tellistone

Hello, thanks for raising this

Can I ask how busy the cluster is (how often events are running, typically)?

Any possibility you could attach graylog's server.log file?

@patrickmann
Contributor

@clickbg Looking at the failing code, we should be raising a related system notification event.
Do you see one on the system / alerts & events tab?

@clickbg
Author

clickbg commented Jun 3, 2024

Hi, thanks for the fast reply.

Sure, I am attaching the server.log and the relevant OpenSearch logs. I have only removed personally identifying domains and IPs; the rest should be as it was logged. I am also attaching screenshots of the Event Definitions and the errors in System / Alerts & Events. @patrickmann Yes, there are a lot of them, basically one for each alert definition that was run after the upgrade.

In terms of busy, the system (it's a single system) isn't busy at all: log ingestion averages around 50MB per day and stays below 100MB per day. There are 8 event definitions in total: 6 run every 5 minutes with a backlog search of 6 minutes, 1 runs every 30 minutes with a backlog search of 31 minutes, and the last one runs every 2 days (configured by mistake, it should be daily), but I left it as is for now. The misconfigured one hasn't run since the upgrade, which was done on 2024-06-01 14:33 EEST / 11:33 UTC.
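
For a rough sense of load, the schedule described above works out to fewer than 1,800 aggregation searches per day. A quick sketch of that arithmetic follows; the `schedules` list is only a transcription of the numbers in this comment, and the total is an upper bound on failing runs only if every execution hits the exception, which this thread does not establish.

```python
# (number of definitions, interval in minutes), transcribed from the comment above
schedules = [(6, 5), (1, 30), (1, 2 * 24 * 60)]

MINUTES_PER_DAY = 24 * 60

def runs_per_day(schedules):
    """Total scheduled executions per day across all event definitions."""
    return sum(count * MINUTES_PER_DAY / interval for count, interval in schedules)

print(runs_per_day(schedules))  # 1776.5
```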

Thanks again!

server.log
opensearch-logs.tar.gz

Event-Definitions
Errors

@patrickmann
Contributor

@clickbg Great - can you share the (redacted) details view of one of those System Notification Events?

@clickbg
Author

clickbg commented Jun 3, 2024

Sure, I am attaching the details of the most recent event that generated an error. Most of them are similar: they search for a simple pattern and group by either source or a Grok-extracted field, IP (%{IP}).

Event-definition-details

@patrickmann
Contributor

@clickbg I meant details of a System Notification event instance, not the definition itself. I'm hoping it will contain the actual query error.
Here's an example (of a different event type):
image

@clickbg
Author

clickbg commented Jun 3, 2024

Ah sorry my mistake, I am attaching the details for the related Events and Alerts

alerts-details
events-details

@coffee-squirrel

coffee-squirrel commented Jun 3, 2024

Just noting that https://go2docs.graylog.org/current/downloading_and_installing_graylog/installing_graylog.html says the max OpenSearch version supported with Graylog 6.0.x is currently 2.13.x (and the integration tests only seem to be testing up to 2.12.x).

@clickbg
Author

clickbg commented Jun 3, 2024

@coffee-squirrel yes, unfortunately OpenSearch doesn't support downgrading, and they treat 2.13 -> 2.14 as a minor upgrade, at least from a package management perspective. Unlike Graylog, where you have to purposefully change the repo, OpenSearch just upgrades automatically...

root@apollo:~# apt policy opensearch
opensearch:
  Installed: 2.14.0
  Candidate: 2.14.0
  Version table:
 *** 2.14.0 500
        500 https://artifacts.opensearch.org/releases/bundle/opensearch/2.x/apt stable/main amd64 Packages
        100 /var/lib/dpkg/status
     2.13.0 500
        500 https://artifacts.opensearch.org/releases/bundle/opensearch/2.x/apt stable/main amd64 Packages
     2.12.0 500
        500 https://artifacts.opensearch.org/releases/bundle/opensearch/2.x/apt stable/main amd64 Packages
     2.11.1 500
        500 https://artifacts.opensearch.org/releases/bundle/opensearch/2.x/apt stable/main amd64 Packages
     2.11.0 500
        500 https://artifacts.opensearch.org/releases/bundle/opensearch/2.x/apt stable/main amd64 Packages
     2.10.0 500
        500 https://artifacts.opensearch.org/releases/bundle/opensearch/2.x/apt stable/main amd64 Packages
     2.9.0 500
        500 https://artifacts.opensearch.org/releases/bundle/opensearch/2.x/apt stable/main amd64 Packages
     2.8.0 500
        500 https://artifacts.opensearch.org/releases/bundle/opensearch/2.x/apt stable/main amd64 Packages
     2.7.0 500
        500 https://artifacts.opensearch.org/releases/bundle/opensearch/2.x/apt stable/main amd64 Packages
     2.6.0 500
        500 https://artifacts.opensearch.org/releases/bundle/opensearch/2.x/apt stable/main amd64 Packages
     2.5.0 500
        500 https://artifacts.opensearch.org/releases/bundle/opensearch/2.x/apt stable/main amd64 Packages

So anyone who does regular Ubuntu/Debian/RH/SLES upgrades will inevitably end up with 2.14 and no way to revert short of deleting everything and starting from scratch. One way to avoid this is to bundle the correct version of OpenSearch in the Graylog repo: that way you control which version we get, but it adds the work of maintaining an extra package. Another way is to update the docs and advise users to put a hold on the OpenSearch package (apt-mark hold), but that risks the reverse problem: people running a version of OpenSearch that is too old to be compatible with Graylog anymore. External dependencies are always a pain.
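
Until one of those options exists, a pre-upgrade check could flag the mismatch. This is a minimal sketch, assuming the `apt policy opensearch` output format shown above and taking 2.13 as the highest supported minor version for Graylog 6.0.x, as quoted from the docs earlier in this thread. The function names and the `max_minor` default are illustrative only and should be checked against the current install docs.

```python
import re

def installed_version(apt_policy_output):
    """Extract the 'Installed:' version from `apt policy <package>` output."""
    match = re.search(
        r"^\s*Installed:\s*(\d+)\.(\d+)\.(\d+)", apt_policy_output, re.MULTILINE
    )
    if match is None:
        return None  # package not installed, or unexpected output
    return tuple(int(part) for part in match.groups())

def exceeds_supported(version, max_minor=(2, 13)):
    """True if major.minor is newer than the highest supported major.minor.

    The default (2, 13) is an assumption taken from the docs quoted in this thread.
    """
    return version[:2] > max_minor

sample = """opensearch:
  Installed: 2.14.0
  Candidate: 2.14.0
"""
version = installed_version(sample)
print(version, exceeds_supported(version))
```

On a machine still running a supported version, `sudo apt-mark hold opensearch` keeps apt from upgrading the package until `sudo apt-mark unhold opensearch` is run.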

@kmerz kmerz added the triaged label Jun 4, 2024
@hydrapolic

Same here on Graylog 5.2.7 / OpenSearch 2.14.0.

@janheise
Contributor

janheise commented Jun 6, 2024

Investigation showed that, IMHO, this is a bug in OpenSearch: opensearch-project/OpenSearch#14032

@clickbg
Author

clickbg commented Jun 7, 2024

@janheise thank you for the fast investigation and for the excellent reporting of this to the respective project.

@janheise
Contributor

Fixed for 2.15, see opensearch-project/opensearch-build#4681

@cocorossello

I can confirm that the error is gone after upgrading to 2.15.0

@clickbg
Author

clickbg commented Jun 27, 2024

I can also confirm that after upgrading to 2.15 on the 25th of this month, no new alerts for this bug have been generated.
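
For anyone wanting to check whether a running node is on an affected release, the version is reported in the JSON document served at the cluster's root URL. This is a minimal sketch, assuming the bug affects all 2.14.x releases (this thread only confirms 2.14.0 as affected and 2.15.0 as fixed) and that the node is reachable without TLS or authentication at `http://localhost:9200/`. The function names and the default URL are hypothetical.

```python
import json
import urllib.request

def parse_version(root_response_body):
    """Return (major, minor, patch) from the JSON served at the cluster root URL."""
    info = json.loads(root_response_body)
    return tuple(int(part) for part in info["version"]["number"].split(".")[:3])

def is_affected(version):
    """True for 2.14.x; treating every 2.14 patch release as affected is an assumption."""
    return (2, 14, 0) <= version < (2, 15, 0)

def fetch_root(url="http://localhost:9200/"):
    """Fetch the root document; the URL and lack of auth/TLS are assumptions."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")

# Trimmed-down example of the root document, keeping only the fields used here.
sample = '{"version": {"distribution": "opensearch", "number": "2.15.0"}}'
print(is_affected(parse_version(sample)))
```

Against a live node, `is_affected(parse_version(fetch_root()))` performs the same check. Secured clusters would need HTTPS and credentials added to `fetch_root`.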

9 participants