Grouped workflows appear to be stuck and maxed out #61
Hey @hardcodet, I took a look at our changelog and at the changes between those versions, but so far I don't see anything unusual: https://docs.bullmq.io/bullmq-pro/changelog. One question: did the upgrade you did include any other changes? Asking in order to isolate this.
Hi Rogger,
It's also noteworthy that those maxed-out groups disappear after a while. I usually have a few when querying for groups, but never too many, so they are not stuck indefinitely (even though I just manually deleted one that seems to have been around for a few hours already). One part may be the automated cleanup routine I run on groups: a maintenance routine of mine kicks in when it discovers group workflows that were due for execution but didn't run. I assume that routine would also clear out those empty groups without any jobs, though. This is the routine that I wrote a while back with your help, which sometimes kicks in due to stuck groups (and kicked in a lot in the last few days).
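Roughly, it looks like this (a sketch rather than the verbatim code; findOverdueWorkflows stands in for our own lookup of workflows that were due but never ran, and I'm assuming the standard BullMQ Pro group options):

```ts
import { QueuePro } from '@taskforcesh/bullmq-pro';

const queue = new QueuePro('workflows', {
  connection: { host: 'localhost', port: 6379 },
});

// Hypothetical lookup in our own datastore: workflows whose scheduled
// time has passed but which never started. Capped at 500 per run.
declare function findOverdueWorkflows(opts: { limit: number }): Promise<
  { id: string; jobId: string; groupId: string }[]
>;

async function rescheduleStuckWorkflows(): Promise<void> {
  const overdue = await findOverdueWorkflows({ limit: 500 });
  for (const wf of overdue) {
    // Drop the stale job if it is still lying around...
    const stale = await queue.getJob(wf.jobId);
    if (stale) {
      await stale.remove();
    }
    // ...and re-add it for immediate execution, keeping the same group
    // so sequential ordering within the group is preserved.
    await queue.add('run-workflow', { workflowId: wf.id }, {
      group: { id: wf.groupId },
    });
  }
}
```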
FYI, after having simply deleted most pending workflows, and thus killed the rescheduling cycle, things have normalized again. There was one last empty maxed group that didn't want to go away, so I deleted that manually, and that was the end of it. I'm also back on 6.6.1 and haven't seen any more issues (it's safe to say that I will wait quite a while before rolling that out to PROD though; we'll have to monitor the situation over the next weeks).
Hi @hardcodet, we refactored how we track maxed groups in v7.9.2, in case you want to give it a try.
Is there an update on this situation? Does upgrading to v7.9.2+ fix the issue, or just decrease its likelihood?
AFAIK there is no issue currently, but if this has happened to you using one of the latest versions, we would like to know.
I upgraded Bull from 6.3.4 to 6.6.1 recently, and noticed that our scheduling went completely haywire :)
Downgrading today didn't fix anything, so while it would be a big coincidence, the upgrade may be a red herring.
We are using Bull to schedule "workflows", in most cases for immediate execution, but sometimes also at a given time. We often use groups with a concurrency of 1 to ensure sequential execution in certain contexts and avoid race conditions; the setup looks roughly like the sketch below.
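A sketch of that setup, assuming the BullMQ Pro groups API as documented (queue, job, and group names here are illustrative):

```ts
import { QueuePro, WorkerPro } from '@taskforcesh/bullmq-pro';

const connection = { host: 'localhost', port: 6379 };
const queue = new QueuePro('workflows', { connection });

async function schedule() {
  // Immediate execution: jobs sharing a group id run one after another.
  await queue.add('run-workflow', { workflowId: 'wf-1' }, {
    group: { id: 'tenant-42' },
  });

  // Scheduled execution: same group, but delayed to a later point in time.
  await queue.add('run-workflow', { workflowId: 'wf-2' }, {
    delay: 60 * 60 * 1000, // run in one hour
    group: { id: 'tenant-42' },
  });
}

// A group concurrency of 1 serializes execution within each group,
// which is what we rely on to avoid race conditions.
const worker = new WorkerPro('workflows', async job => {
  // ... execute the workflow ...
}, {
  connection,
  group: { concurrency: 1 },
});
```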
We also have a routine that detects workflows that were not executed when scheduled. If it detects any, it reschedules them for immediate execution (in batches of max 500 workflows at a time). This typically doesn't happen, but since the package upgrade (which was deployed a bit after 20:00), it has skyrocketed.
I have a testing endpoint that returns all current groups (essentially the sketch below). Fetching groups often returns just an empty array (which is good, as it indicates there are simply no pending groups), but I am now also frequently getting groups like the ones shown in the sample output:
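The endpoint is essentially this (a sketch; getGroups() as described in the BullMQ Pro groups docs, though the exact return shape noted in the comment is an assumption):

```ts
import express from 'express';
import { QueuePro } from '@taskforcesh/bullmq-pro';

const app = express();
const queue = new QueuePro('workflows', {
  connection: { host: 'localhost', port: 6379 },
});

// Debug-only endpoint: dump all current groups with their status.
app.get('/debug/groups', async (_req, res) => {
  const groups = await queue.getGroups();
  // What I keep seeing lately (shape assumed, values as observed):
  // [ { id: '...', status: 'maxed', jobs: [] }, ... ]
  res.json(groups);
});

app.listen(3000);
```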
As you can see, they are all maxed, often without any jobs at all. Now, I saw that we also had a lot of load during that time from an integration team. That might have played into those maxed-out groups, but they look super fishy.
Any idea what I could do to help triangulate those?