Infinite retry loop in RetryableJob because canRetry/attempt limits are not obeyed when the Job/Worker segfaults #354
A lot of jobs are left in the reserved state, which is also where the attempt counter is incremented via hincrby in the redis driver. I believe these are the jobs that have segfaulted, but then get re-run.
It seems that the segfault occurs after the job finishes (at the garbage-collecting stage), in the Zend memory manager, similar to documented bugs like https://bugs.php.net/bug.php?id=71662. Switching off the Zend MM with USE_ZEND_ALLOC=0 stops the segfaults. The question that remains is whether the queue manager can deal with a segfault in the job and still behave as expected in terms of queue/attempt management.
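For reference, the workaround just means setting the variable for the worker process. A sketch, assuming the standard `yii queue/listen` worker command from yii2-queue (the entry-script path is project-specific):

```shell
# Disable the Zend memory manager for the queue worker process only.
# "yii" is the project's console entry script; adjust the path as needed.
USE_ZEND_ALLOC=0 php yii queue/listen --verbose
```

Note this disables the custom allocator (losing its speed and leak reporting) for that process only; other PHP processes are unaffected.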
No, it can't. A segfault can't be caught.
I don't think the segfault needs to be caught. My thoughts are more along the lines of adjusting the attempt increment/retry logic (so there is a safeguard before the job runs, not after).
Do you have an idea about implementation?
I'll have a look
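A minimal sketch of that "safeguard before the job runs" idea. The `Worker`/`RetryableJob` names are illustrative (not yii2-queue's actual classes), and an in-memory array stands in for the Redis attempts hash; the point is that the attempt counter is bumped and checked before `execute()` runs, so a worker that dies mid-job cannot bypass the retry limit on redelivery:

```php
<?php
// Illustrative sketch only. The attempt counter is incremented and the
// retry budget checked BEFORE execute() runs, so a crash (e.g. segfault)
// during execution still counts against the budget on the next delivery.

interface RetryableJob
{
    public function execute(): void;
    public function canRetry(int $attempt, ?Throwable $error): bool;
}

final class Worker
{
    /** @var array<string,int> per-message attempt counters (stands in for the Redis hash) */
    private array $attempts = [];

    /**
     * Returns true when the message is done (succeeded or dropped),
     * false when it should be re-queued for another delivery.
     */
    public function handle(string $id, RetryableJob $job): bool
    {
        // Increment first: if the process segfaults inside execute(),
        // this attempt is already recorded for the next delivery.
        $attempt = $this->attempts[$id] = ($this->attempts[$id] ?? 0) + 1;

        // Safeguard: refuse to run once the retry budget is spent, even
        // if the previous attempt never reached an error handler.
        if ($attempt > 1 && !$job->canRetry($attempt, null)) {
            return true; // drop instead of looping forever
        }

        try {
            $job->execute();
            return true;
        } catch (Throwable $e) {
            // Re-queue only if the job still allows another attempt.
            return !$job->canRetry($attempt + 1, $e);
        }
    }
}
```

With this ordering, a segfaulting job is redelivered at most until `canRetry()` returns false, because the pre-run guard sees the already-incremented counter.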
What steps will reproduce the problem?
I am working on getting this info. It happens on a live system with a few thousand jobs per day where a few hundred segfault and get re-queued indefinitely.
The job implements \yii\queue\RetryableJobInterface and has:
```php
public function canRetry($attempt, $error) {
    return ($attempt < 3) && ($error instanceof TemporaryException);
}
```
What's expected?
Not sure if the segfault itself is a Queue issue, but at least the "Attempts" mechanism should work so we do not end up in an infinite race. A job should really not be retried more than twice, yet I see the attempt counter (in the logs) reach 400+, and then I have to flush the queue to stop it.
What do you get instead?
Infinite re-queuing. The segfault must happen at a very awkward point, between the attempt counter being incremented and the canRetry call...
Additional info
Using Redis queue.
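For anyone debugging this, the stuck state should be visible directly in Redis. The key names below are an assumption based on the driver's `<channel>.attempts` / `<channel>.reserved` naming with the default `queue` channel; verify against your queue configuration:

```shell
# Per-message attempt counters (the hash that hincrby increments on reserve).
# Key names assume the default "queue" channel.
redis-cli HGETALL queue.attempts

# Messages currently sitting in the reserved set.
redis-cli ZRANGE queue.reserved 0 -1 WITHSCORES
```

A message whose attempts value keeps climbing past the `canRetry` limit while it cycles through the reserved set is exhibiting exactly this bug.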