"Has not waited the lock" on queue with supervisor numprocs more than one #234
Comments
Which PostgreSQL version?
psql (PostgreSQL) 10.3 (Ubuntu 10.3-1.pgdg16.04+1)
I have the same problem on
the same story with MySQL
How large is your queue?
for me, it's 5 workers
I mean how many jobs you have in the queue table.
Right now I have very little load. Maybe 1 job per minute.
I have ~300k waiting jobs in the queue and the DB driver (MariaDB) becomes unusable - it takes 2-3 seconds to reserve a job (which executes in less than 0.1 second, so the queue spends most of its time reserving jobs).
You are getting lock contention, meaning that all workers are trying to obtain a lock at the same time. |
+1 |
I have the same problem in a PostgreSQL database: "PostgreSQL 10.4 (Debian 10.4-2.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit". It seems to be a matter of database configuration. Any idea how to solve this? What effects can this have?
+1 |
OP's problem is related to mutex settings, see this answer on SO. In general, if you get such errors, you should switch the mutex backend to a more reliable implementation.
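To make the suggestion above concrete, here is a minimal sketch of pointing the db queue driver at a different mutex backend. This is illustrative, not the issue's official fix: `yii\mutex\FileMutex` ships with the framework, and the component/connection names are assumptions.

```php
// console config (sketch): use a file-based mutex instead of MysqlMutex
// so lock acquisition no longer competes with queue traffic on the same DB.
'components' => [
    'queue' => [
        'class' => \yii\queue\db\Queue::class,
        'db' => 'db',
        'channel' => 'default',
        'mutex' => [
            'class' => \yii\mutex\FileMutex::class,
            // directory must be writable by all worker processes
            'mutexPath' => '@runtime/mutex',
        ],
        'mutexTimeout' => 15, // seconds to wait for the lock (default is 3)
    ],
],
```

Note that `FileMutex` only works when all workers run on the same host; for multiple hosts, a shared backend (e.g. Redis) is needed.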
Doesn't seem there's anything to fix in the code but definitely worth documenting it. |
We too have been bitten by the Has not waited the lock exception. Our queue is currently configured as follows:

```php
'queue' => [
    'class' => \yii\queue\db\Queue::class,
    'db' => 'queue-db',
    'tableName' => '{{%queue}}',
    'channel' => 'default',
    'attempts' => 3, // Max number of attempts
    'ttr' => 60, // Maximum duration a job should run before it is considered abandoned
    'mutex' => \yii\mutex\MysqlMutex::class,
    'as jobMonitor' => \zhuravljov\yii\queue\monitor\JobMonitor::class,
    'as workerMonitor' => \zhuravljov\yii\queue\monitor\WorkerMonitor::class,
],
```

We are processing ~15,000-20,000 jobs an hour with 8 concurrent workers. We see that when DB load gets high, the lock takes longer and times out. Our current thinking is to move the mutex to something other than MySQL (Redis in our case) so that high DB load does not impact the workers' ability to take/release locks. In my evaluation of the code it seems fine to have a back end different from the mutex provider (e.g. keep the MySQL back end but move the mutex to Redis). Just wanted to ping the community to get your thoughts on this approach - any red flags?
@darrylkuhn It will probably not change anything, because it is highly unlikely that the mutex is the bottleneck here. The problem is in the process which holds the lock. The DB driver has known performance issues - in big queues, reserving a job may take a while. If you want to change something, I would rather replace the DB queue driver with something else.
Use a better driver like beanstalk for job queues. The database is a bad place for a job queue. |
Fix it by changing two methods - https://github.com/yiisoft/yii2-queue/pull/362/files
Absolutely confirmed fix for my MariaDB instance. |
We had this problem 8 months ago and moved to Amazon SQS however the lack of visibility and the inability to delay more than 15 minutes has us looking for another back end again. Can anybody speak to whether or not the lock issue here is present with the redis driver? |
Have you tried beanstalkd? |
@SamMousa No, we haven't. We already have Redis infrastructure and were hoping to leverage that; however, we're processing ~20,000 jobs an hour. Given that workload, am I correct in assuming that Redis is also not an appropriate back end? Thanks.
I don't think that's much, to be honest.. but Redis is not a job queue. If queue jobs are ephemeral, beanstalkd is very simple to set up though.
@darrylkuhn Redis does not have this performance problem (I'm using it to handle hundreds of thousands of jobs per hour), as long as you configure your mutex correctly.
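For reference, a sketch of a Redis-backed queue like the one described above. This assumes the yiisoft/yii2-redis extension and the yii2-queue Redis driver are installed; the hostname and component names are placeholders:

```php
'components' => [
    'redis' => [
        'class' => \yii\redis\Connection::class,
        'hostname' => 'localhost',
        'port' => 6379,
        'database' => 0,
    ],
    'queue' => [
        'class' => \yii\queue\redis\Queue::class,
        'redis' => 'redis', // the connection component above
        'channel' => 'default',
        // reservation locking happens in Redis, so high DB load
        // no longer affects the workers' ability to reserve jobs
    ],
],
```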
In my case, the problem was a missing index on the columns used by the reservation UPDATE. After creating it, the problem went away.
I think this query needs to add channel as a filter, since this UPDATE takes 8 seconds in my case with 500k jobs in the queue.
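If you are stuck on the db driver, a composite index matching the reservation query's filter can help. A hypothetical migration sketch - the index name is made up, the column names are taken from the default queue table schema, and you should verify the actual WHERE clause in your yii2-queue version before copying this:

```php
use yii\db\Migration;

/**
 * Adds a composite index so the reservation query can filter by
 * channel and reserved_at without scanning the whole queue table.
 */
class m000000_000000_add_queue_channel_index extends Migration
{
    public function safeUp()
    {
        $this->createIndex(
            'idx-queue-channel-reserved_at', // hypothetical name
            '{{%queue}}',
            ['channel', 'reserved_at']
        );
    }

    public function safeDown()
    {
        $this->dropIndex('idx-queue-channel-reserved_at', '{{%queue}}');
    }
}
```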
I think if you have 500k jobs then db driver is not the best choice (especially if you use several channels), use a specialized messaging service instead. |
I had been dealing with this performance problem for a week.
Hope this helps others who have the same problem.
I'm getting this error at the same time as I'm doing a mysqldump. So I'll just increase
Running backups with |
It's been more than a year now waiting for this to be fixed.
@i-internet which way would you fix it? |
To add to the original post: I use numprocs=1 and now also get these errors on my dev server.
I can not (!) confirm that downgrading to 2.3.3 fixes it. Again, my dev server has no jobs at all, but every 15 to 30 minutes it throws the lock error.
We have an idea for the fix. Discussing it internally. |
As a quick note to others on our solution.. Our queue is relatively small (500-1000 jobs at any given time), and we still had this issue. If you can't immediately switch to another driver, increasing your job runners + mutex timeout can assist greatly. Some of our jobs can take up to 15-20 minutes. Mostly, that's because they are working with remote APIs and doing a lot of transformations and so are slow. The benefit of this however is that the jobs actually don't consume a lot of local CPU or disk IO. This allowed us to run a lot of job runners at a time, but this in turn caused us to hit the db driver 'has not waited the lock' issue. We now run 8-16 job runners, depending on the queue channel in question, and we increased our mutex timeout to 60 seconds. Before, we would only have 2-3 active jobs, but with this setting we generally have 80% of our job runners actively processing a job. The default mutex timeout of 3 seconds on the db driver does not work well if you have more than a few jobs because of the lock contention in the db driver. You must increase it, sometimes dramatically. |
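To make the timeout change above concrete: `mutexTimeout` is a property of the db `Queue` component (default 3 seconds, as the comment notes). A minimal sketch of raising it while keeping the MySQL mutex:

```php
'queue' => [
    'class' => \yii\queue\db\Queue::class,
    'mutex' => \yii\mutex\MysqlMutex::class,
    // how long a worker waits for the reservation lock before
    // throwing "Has not waited the lock." (default: 3 seconds)
    'mutexTimeout' => 60,
],
```

This does not remove the lock contention; it only makes workers wait through it instead of failing, which is why it helps most when combined with more runners as described above.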
How to solve? Or maybe I'm doing something wrong? If I set numprocs=1 in the supervisor config - no errors!

What steps will reproduce the problem?
config from common/config/main.php
supervisor config
Error trace
Additional info