Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix deadlock problem #362

Closed
wants to merge 1 commit into from

Conversation

mathematicalman
Copy link

Fix 'Has not waited the lock' error. Tested at very high loads.

Q A
Is bugfix? yes
New feature? no
Breaks BC? no
Tests pass? yes/no
Fixed issues #234

Fix 'Has not waited the lock' error. Tested at very high loads.
@samdark samdark added the status:code review The pull request needs review. label Nov 29, 2019
@samdark samdark added this to the 2.3.1 milestone Nov 30, 2019
@samdark samdark requested review from zhuravljov and a team November 30, 2019 00:47
@rob006
Copy link

rob006 commented Dec 1, 2019

Can you explain how this is related to #234?

@mathematicalman
Copy link
Author

Can you explain how this is related to #234?

"Has not waited the lock" problem appears there and there - #245 . Wrapping release() in mutex lock and changing sql query of moveExpired() solve this problem.

@rob006
Copy link

rob006 commented Dec 2, 2019

#234 is about reserving lock - adding additional lock on releasing will only make things worse (more occasions to acquire lock failure). And it is unclear what is the point of such lock since you just ignore failure and continue without lock.
#245 was fixed by exact opposite change: eb8c7d4

@mathematicalman
Copy link
Author

#234 is about reserving lock - adding additional lock on releasing will only make things worse (more occasions to acquire lock failure). And it is unclear what is the point of such lock since you just ignore failure and continue without lock.
#245 was fixed by exact opposite change: eb8c7d4

Mutex does not slow down much. #245 was closed but not fully fixed.

@fl0v
Copy link

fl0v commented May 27, 2020

i think the mutex part of your code is wrong,
as it is now it does not block execution in case mutex is not aquired.

@see https://www.yiiframework.com/doc/api/2.0/yii-mutex-mutex

@samdark
Copy link
Member

samdark commented May 27, 2020

@fl0v do you mean MySQL driver of mutex extension? If so, it should be fixed there instead of in queue, right?

@fl0v
Copy link

fl0v commented May 27, 2020

$mutex = $this->mutex->acquire(__CLASS__ . $this->channel, $this->mutexTimeout);
should handle if $mutex is false

@samdark samdark modified the milestones: 2.3.1, 2.3.2 Dec 23, 2020
@samdark samdark removed this from the 2.3.2 milestone Oct 23, 2021
@hunwalk
Copy link

hunwalk commented May 30, 2022

@samdark this is still an issue, will you merge the PR into next release?

@samdark
Copy link
Member

samdark commented May 31, 2022

@hunwalk solution provided isn't finished.

@nadar
Copy link
Contributor

nadar commented Jun 7, 2022

We have that Deadlock found when trying to get lockfile over and over again, it would be nice to have that fixed. Or is there any option to fix that without this PR?

@samdark
Copy link
Member

samdark commented Jun 17, 2022

@nadar does this PR fix it?

@nadar
Copy link
Contributor

nadar commented Jun 17, 2022

I have not tested... but i assume @mathematicalman had that problem over and over again (like we have), so he made that PR and wrote Fix 'Has not waited the lock' error. **Tested at very high loads**. - so i assume this should fix that problem :-)

@samdark i have to admit, i think its not the same error, sorry i was not looking close enough - or at least i think its not the same! This is what we have, its about transaction deadlock

Error: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction
The SQL being executed was: UPDATE `queue` SET `reserved_at`=NULL WHERE `reserved_at` is not null and `reserved_at` < 1655449220 - `ttr` and `done_at` is null

@samdark
Copy link
Member

samdark commented Jun 20, 2022

@nadar no, that shouldn't be the same thing.

@nadar
Copy link
Contributor

nadar commented Jun 30, 2022

@samdark

Well, actually we have both problems :-) Just found out today:

2022-06-30 17:43:35 [pid: 1] - Worker is stopped (1:06:22)
Error: Has not waited the lock.

@samdark
Copy link
Member

samdark commented Jul 1, 2022

So... does the change from this PR fix the problem?

@gustavovendramini
Copy link

#449 (comment)

@nadar
Copy link
Contributor

nadar commented Jul 11, 2022

@samdark its "hard" to test that code from a fork via composer in production i have to admit. If it would be branch of yii2 queue it would be easy via dev-XYZ. Anyway. What @gustavovendramini mentioned points me into a interesting direction, because we have another queue connected to the same database on an "older" infrastructure which us running version 2.3.3 and we don't have any Error: Has not waited the lock. errors. So we will try to downgrade as well for the other workers.

@nadar
Copy link
Contributor

nadar commented Sep 21, 2022

This problem still persists. I accidently switched to latest version and got may errors, now switched back to 2.3.3.

@erickskrauch
Copy link
Contributor

erickskrauch commented Oct 27, 2022

So I just tested this PR on a real queue load. And unfortunately, it gets only worse. The application was unable to push any new jobs because the queue table is always locked. I have no idea, how and why it works, but adding mutexes on the release() method makes it impossible to insert/update the queue table. Some MySQL magic that I haven't studied yet 🙃

Also, I found a commit eb8c7d4 which was fixing deadlocks somehow by introducing non-optimal query, which I fixed back (#449) and returned deadlocks 🙃 Even more magic.

@samdark
Copy link
Member

samdark commented Nov 13, 2022

Closing because of such testing results. @erickskrauch thanks for testing it.

@samdark samdark closed this Nov 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status:code review The pull request needs review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants