
"Has not waited the lock" on queue with supervisor numprocs more than one #234

m1roff opened this issue May 8, 2018 · 41 comments

@m1roff

m1roff commented May 8, 2018

How can I solve this? Or maybe I'm doing something wrong?

If I set numprocs=1 in the supervisor config, there are no errors!

What steps will reproduce the problem?

config from common/config/main.php

'queue' => [
            'class' => \yii\queue\db\Queue::class,
            'as log' => \yii\queue\LogBehavior::class,
            'db' => 'db', // DB connection component or its config
            'tableName' => '{{%queue}}', // Table name
            'channel' => 'default', // Queue channel key
            'mutex' => \yii\mutex\PgsqlMutex::class, // Mutex used to sync queries
            'mutexTimeout' => 0,
            'ttr' => 5 * 60, // Max time for job handling
            'attempts' => 5, // Max number of attempts
        ],

supervisor config

[program:m-prod-yii-queue-worker]
command=/usr/bin/php /www/m/http/yii queue/listen --verbose=1 --color=0
autostart=true
autorestart=true
numprocs=2
process_name = %(program_name)s_%(process_num)02d
redirect_stderr=true
stdout_logfile=/www/m/log/yii-queue-worker.log

Error trace

yii\base\Exception: Has not waited the lock. in /www/m/http/vendor/yiisoft/yii2-queue/src/drivers/db/Queue.php:179
Stack trace:
#0 [internal function]: yii\queue\db\Queue->yii\queue\db\{closure}(Object(yii\db\Connection))
#1 /www/m/http/vendor/yiisoft/yii2/db/Connection.php(1059): call_user_func(Object(Closure), Object(yii\db\Connection))
#2 /www/m/http/vendor/yiisoft/yii2-queue/src/drivers/db/Queue.php(211): yii\db\Connection->useMaster(Object(Closure))
#3 /www/m/http/vendor/yiisoft/yii2-queue/src/drivers/db/Queue.php(78): yii\queue\db\Queue->reserve()
#4 [internal function]: yii\queue\db\Queue->yii\queue\db\{closure}(Object(Closure))
#5 /www/m/http/vendor/yiisoft/yii2-queue/src/cli/Queue.php(117): call_user_func(Object(Closure), Object(Closure))
#6 /www/m/http/vendor/yiisoft/yii2-queue/src/drivers/db/Queue.php(93): yii\queue\cli\Queue->runWorker(Object(Closure))
#7 /www/m/http/vendor/yiisoft/yii2-queue/src/drivers/db/Command.php(76): yii\queue\db\Queue->run(true, 3)
#8 [internal function]: yii\queue\db\Command->actionListen(3)
#9 /www/m/http/vendor/yiisoft/yii2/base/InlineAction.php(57): call_user_func_array(Array, Array)
#10 /www/m/http/vendor/yiisoft/yii2/base/Controller.php(157): yii\base\InlineAction->runWithParams(Array)
#11 /www/m/http/vendor/yiisoft/yii2/console/Controller.php(148): yii\base\Controller->runAction('listen', Array)
#12 /www/m/http/vendor/yiisoft/yii2/base/Module.php(528): yii\console\Controller->runAction('listen', Array)
#13 /www/m/http/vendor/yiisoft/yii2/console/Application.php(180): yii\base\Module->runAction('queue/listen', Array)
#14 /www/m/http/vendor/yiisoft/yii2/console/Application.php(147): yii\console\Application->runAction('queue/listen', Array)
#15 /www/m/http/vendor/yiisoft/yii2/base/Application.php(386): yii\console\Application->handleRequest(Object(yii\console\Request))
#16 /www/m/http/yii(27): yii\base\Application->run()
#17 {main}

Additional info

Q | A
Yii version | 2.0.16-dev
PHP version | 7.1.10
Operating system | Ubuntu 16.04.1
@kaspirovski

Which PostgreSQL version?

@m1roff
Author

m1roff commented Jul 2, 2018

psql (PostgreSQL) 10.3 (Ubuntu 10.3-1.pgdg16.04+1)

@kaspirovski

kaspirovski commented Jul 4, 2018

I have the same problem on psql (PostgreSQL) 10.4 (Ubuntu 10.4-0ubuntu0.18.04), so it seems to be a problem with the PostgreSQL version. It works great on PostgreSQL 9.4.

@akorz

akorz commented Jul 6, 2018

the same story with MySQL

@rob006

rob006 commented Jul 6, 2018

How large is your queue?

@akorz

akorz commented Jul 6, 2018

for me, it's 5 workers

@rob006

rob006 commented Jul 6, 2018

I mean: how many jobs do you have in the queue table?

@akorz

akorz commented Jul 6, 2018

Right now I have very little load, maybe 1 job per minute.

@rob006

rob006 commented Jul 6, 2018

I have ~300k waiting jobs in the queue and the DB driver (MariaDB) becomes unusable - it takes 2-3 seconds to reserve a job (which executes in less than 0.1 second, so the queue spends most of its time reserving jobs).

@SamMousa
Contributor

You are getting lock contention, meaning that all workers are trying to obtain the lock at the same time.
For those kinds of loads it makes more sense to use a real queue like beanstalkd.
(Note that it is really easy, zero-configuration, to set up, and it will make your life easier and your queue faster.)
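
For reference, a minimal sketch of a beanstalkd-backed queue component, following the yii2-queue Beanstalk driver guide (host, port and tube are example values, not specific to this project):

'queue' => [
    'class' => \yii\queue\beanstalk\Queue::class,
    'as log' => \yii\queue\LogBehavior::class,
    'host' => 'localhost', // beanstalkd host (example)
    'port' => 11300,       // default beanstalkd port
    'tube' => 'queue',     // tube (channel) name (example)
],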

@samokspv

samokspv commented Jan 4, 2019

the same story with MySQL

+1

@geopamplona

I have the same problem on a PostgreSQL database: "PostgreSQL 10.4 (Debian 10.4-2.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit".

It seems to be a matter of database configuration.

Is there any idea how to solve this? What effects can it have?
Does this error leave tasks unprocessed?

@JorgePalaciosZaratiegui

+1
Why does the error 'Has not waited the lock' occur? What is the cause?

@rob006

rob006 commented Jan 20, 2019

The OP's problem is related to mutex settings, see this answer on SO.

In general, if you get such errors, you should switch the mutex backend to a more reliable implementation (MysqlMutex works fine for me), and/or increase mutexTimeout (setting it to 0 will throw this exception on every concurrency issue, which is very impractical for a real queue).
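
For illustration, starting from the config in the original post, a sketch along those lines (the 30-second timeout is an arbitrary example value, not a recommendation):

'queue' => [
    'class' => \yii\queue\db\Queue::class,
    'db' => 'db',
    'tableName' => '{{%queue}}',
    'channel' => 'default',
    'mutex' => \yii\mutex\PgsqlMutex::class, // or \yii\mutex\MysqlMutex::class on MySQL/MariaDB
    'mutexTimeout' => 30, // wait up to 30 seconds for the lock instead of failing immediately
    'ttr' => 5 * 60,
    'attempts' => 5,
],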

@samdark samdark added the type:docs label Jan 20, 2019
@samdark
Member

samdark commented Jan 20, 2019

It doesn't seem there's anything to fix in the code, but it's definitely worth documenting.

@samdark samdark added the status:ready for adoption label Jan 20, 2019
@darrylkuhn

We too have been bitten by the "Has not waited the lock" exception. Our queue is currently configured as follows:

'queue' => [
    'class' => \yii\queue\db\Queue::class,
    'db' => 'queue-db',
    'tableName' => '{{%queue}}',
    'channel' => 'default',
    'attempts' => 3, // Max number of attempts
    'ttr' => 60, // Maximum duration a job should run before it is considered abandoned
    'mutex' => \yii\mutex\MysqlMutex::class,
    'as jobMonitor' => \zhuravljov\yii\queue\monitor\JobMonitor::class,
    'as workerMonitor' => \zhuravljov\yii\queue\monitor\WorkerMonitor::class,
]

We are processing ~15,000-20,000 jobs an hour with 8 concurrent workers. We see that when DB load gets high, the lock takes longer and times out. Our current thinking is to move the mutex to something other than MySQL (Redis in our case) so that high DB load does not impact the workers' ability to take/release locks. In my evaluation of the code it seems fine to have a queue back end different from the mutex provider (e.g. keep the MySQL back end, but move the mutex to Redis). Just wanted to ping the community to get your thoughts on this approach - any red flags?
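
A sketch of what that split might look like, assuming the yiisoft/yii2-redis extension is installed and a 'redis' connection component exists (component names and the timeout are examples):

'queue' => [
    'class' => \yii\queue\db\Queue::class,
    'db' => 'queue-db',
    'tableName' => '{{%queue}}',
    'channel' => 'default',
    // lock coordination moved off the queue database
    'mutex' => [
        'class' => \yii\redis\Mutex::class,
        'redis' => 'redis', // Redis connection component (assumed to be configured)
    ],
    'mutexTimeout' => 30, // example value; 0 makes every contention throw the exception
],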

@rob006

rob006 commented Oct 23, 2019

@darrylkuhn It will probably not change anything, because it is highly unlikely that the mutex is the bottleneck here. The problem is in the process which holds the lock. The DB driver has known performance issues - in big queues reserving a job may take a while. If you want to change something, I would rather replace the DB queue driver with something else.

@SamMousa
Contributor

Use a better driver like beanstalk for job queues. The database is a bad place for a job queue.

@darrylkuhn

@rob006 and @SamMousa understood we'll probably end up taking that approach. Thanks for the feedback.

@mathematicalman

Fix it by changing two methods - https://github.com/yiisoft/yii2-queue/pull/362/files

@samdark samdark added this to the 2.3.1 milestone Nov 30, 2019
@samdark samdark removed the status:ready for adoption Feel free to implement this issue. label Nov 30, 2019
@rowansimms

Fix it by changing two methods - https://github.com/yiisoft/yii2-queue/pull/362/files

Absolutely confirmed fix for my MariaDB instance.

@darrylkuhn

We had this problem 8 months ago and moved to Amazon SQS; however, the lack of visibility and the inability to delay more than 15 minutes have us looking for another back end again. Can anybody speak to whether or not the lock issue here is present with the Redis driver?

@SamMousa
Contributor

SamMousa commented Jul 2, 2020

Have you tried beanstalkd?

@darrylkuhn

@SamMousa No, we haven't. We already have Redis infrastructure and were hoping to leverage that; however, we're processing ~20,000 jobs an hour. Given that workload, am I correct in assuming that Redis is also not an appropriate back end? Thanks.

@SamMousa
Contributor

SamMousa commented Jul 2, 2020

I don't think that's much, to be honest... but Redis is not a job queue. If queue jobs are ephemeral, beanstalkd is very simple to set up though.

@rob006

rob006 commented Jul 2, 2020

@darrylkuhn Redis does not have this performance problem (I'm using it to handle hundreds of thousands of jobs per hour), as long as you configure your mutex correctly (mutexTimeout is not 0). But it has some other limitations, like lack of priorities, and it is not really atomic, so in an unstable environment you may end up with an inconsistent queue.
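
For reference, a minimal sketch of the Redis driver configuration along the lines of the yii2-queue guide (component and channel names are examples; whether mutex-related options are available depends on the driver version in use):

'queue' => [
    'class' => \yii\queue\redis\Queue::class,
    'redis' => 'redis',   // Redis connection component or its config
    'channel' => 'queue', // queue channel key (example)
    'as log' => \yii\queue\LogBehavior::class,
],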

@darrylkuhn

@SamMousa and @rob006 thanks both for the input - think I'll probably suck it up and spin up beanstalkd 👍

@ixapek

ixapek commented Oct 30, 2020

In my case, the problem was the absence of an index on the done_at field.

UPDATE queue SET reserved_at=null WHERE reserved_at < :time - ttr and done_at is null;
This query is initiated by the moveExpired() method and runs slowly if deleteReleased=false; the mutex expires, the worker crashes with the exception, is restarted by supervisor, and crashes again.

After creating a done_at index everything works fine.
I think the documentation should be updated to mention this index: https://github.com/yiisoft/yii2-queue/blob/master/docs/guide/driver-db.md
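
For anyone adding that index through a Yii migration, a minimal sketch (the class and index names are examples; adjust them to your schema):

<?php

use yii\db\Migration;

/**
 * Adds an index on queue.done_at to speed up the UPDATE issued by moveExpired().
 */
class m000000_000000_add_queue_done_at_index extends Migration
{
    public function safeUp()
    {
        $this->createIndex('idx-queue-done_at', '{{%queue}}', 'done_at');
    }

    public function safeDown()
    {
        $this->dropIndex('idx-queue-done_at', '{{%queue}}');
    }
}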

@samdark samdark modified the milestones: 2.3.1, 2.3.2 Dec 23, 2020
@samdark samdark removed this from the 2.3.2 milestone Oct 23, 2021
@freddokresna

freddokresna commented Apr 7, 2022

In my case, the problem was the absence of an index on the done_at field.

UPDATE queue SET reserved_at=null WHERE reserved_at < :time - ttr and done_at is null; This query is initiated by the moveExpired() method and runs slowly if deleteReleased=false; the mutex expires, the worker crashes with the exception, is restarted by supervisor, and crashes again.

After creating a done_at index everything works fine. I think the documentation should be updated to mention this index: https://github.com/yiisoft/yii2-queue/blob/master/docs/guide/driver-db.md

I think this query needs channel as an additional filter, since this UPDATE takes 8 seconds in my case with a 500k-row queue.
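
As an illustration of that suggestion (this is not the query the driver actually runs; column names follow the default queue migration, and the index name is an example):

-- hypothetical variant of the clean-up query, restricted to one channel
UPDATE queue
SET reserved_at = NULL
WHERE channel = 'default'
  AND reserved_at < :time - ttr
  AND done_at IS NULL;

-- composite index such a query could use
CREATE INDEX idx_queue_channel_done_at ON queue (channel, done_at);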

@fl0v

fl0v commented Apr 9, 2022

I think if you have 500k jobs then the DB driver is not the best choice (especially if you use several channels); use a specialized messaging service instead.
The DB driver is meant for a small number of concurrent jobs, where an index is irrelevant on select and inserting/updating jobs should be as fast as possible.

@freddokresna

freddokresna commented Apr 9, 2022

I think if you have 500k jobs then the DB driver is not the best choice (especially if you use several channels); use a specialized messaging service instead. The DB driver is meant for a small number of concurrent jobs, where an index is irrelevant on select and inserting/updating jobs should be as fast as possible.

I had been dealing with this performance problem for a week, and today I reached several conclusions worth noting; maybe they will help someone else using this queue:

  1. I'm running on a VM (Proxmox 6.4-4) with PHP 8.1 and a standard CPU allocation; once I increased the CPU allocation, the performance problem disappeared (with the standard allocation I got 1 job per 3 seconds; after increasing the CPU, 1 job per second).
  2. This DB queue performs better when we use a separate table per channel instead of one table with several channels, in the case of huge job counts (see the sketch after this comment).
  3. Always remove the vendor folder and get the updated version (I don't know exactly why reinstalling had such a significant effect; maybe the updated version had bug fixes for the queue).
  4. 3 concurrent jobs per queue table.

Hope this helps others who have the same problem.
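
A sketch of what "one table per channel" could look like as two separate queue components (component names, table names and channels are example values; each component also needs its own worker process):

'components' => [
    'queueEmails' => [
        'class' => \yii\queue\db\Queue::class,
        'db' => 'db',
        'tableName' => '{{%queue_emails}}', // dedicated table for this channel
        'channel' => 'emails',
    ],
    'queueReports' => [
        'class' => \yii\queue\db\Queue::class,
        'db' => 'db',
        'tableName' => '{{%queue_reports}}', // dedicated table for this channel
        'channel' => 'reports',
    ],
],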

@fl0v

fl0v commented Apr 9, 2022

  1. is a very good idea
  2. Not sure it had anything to do with it; a composer update on dev should be enough, and after all new updates are tested, a composer install on production should be enough. As far as I know, Composer itself has no issues installing new versions of your packages. (Also, after composer install on production you should always run migrations as part of the deploy process, just in case new migrations come from required packages.)

@BenasPaulikas

I'm getting this error at the same time as I'm running mysqldump. So I'll just increase mutexTimeout for now.

@BenasPaulikas

Running backups with nice -n 19 mysqldump --single-transaction eliminated my issue. Maybe this will be helpful for someone

@i-internet

It has been more than a year now waiting for this to be fixed.

@samdark
Member

samdark commented Sep 15, 2022

@i-internet which way would you fix it?

@gb5256

gb5256 commented May 5, 2023

To add to the original post: I use numprocs=1 and now get these errors on my dev server as well.
The funny thing is that the dev server has no queue jobs at all most of the time. But every time the "listen" starts, it throws the "has not waited the lock" error. Again, the queue table is empty. Not sure what to do from here. Somebody in another issue posted that downgrading to 2.3.3 worked for them; I will try this now as well.

@gb5256

gb5256 commented May 5, 2023

I cannot (!) confirm that downgrading to 2.3.3 fixes it. Again, my dev server has no jobs at all, but every 15 to 30 minutes it throws the lock error.

@samdark
Member

samdark commented May 6, 2023

We have an idea for the fix. Discussing it internally.

@samdark samdark added the type:bug label and removed the type:docs label May 6, 2023
@optmsp

optmsp commented Nov 13, 2023

As a quick note to others on our solution:

Our queue is relatively small (500-1000 jobs at any given time), and we still had this issue. If you can't immediately switch to another driver, increasing your job runners + mutex timeout can help greatly.

Some of our jobs can take up to 15-20 minutes. Mostly, that's because they are working with remote APIs and doing a lot of transformations, and so are slow. The benefit of this, however, is that the jobs don't actually consume much local CPU or disk IO. This allowed us to run a lot of job runners at a time, but that in turn caused us to hit the DB driver's 'has not waited the lock' issue.

We now run 8-16 job runners, depending on the queue channel in question, and we increased our mutex timeout to 60 seconds. Before, we would only have 2-3 active jobs, but with this setting we generally have 80% of our job runners actively processing a job. The default mutex timeout of 3 seconds on the DB driver does not work well if you have more than a few jobs, because of the lock contention in the DB driver. You must increase it, sometimes dramatically.
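
For illustration, that setup corresponds roughly to something like the following supervisor program (name, path and worker count are examples, not the poster's actual files), combined with 'mutexTimeout' => 60 on the queue component:

[program:app-queue-default]
command=/usr/bin/php /path/to/yii queue/listen --verbose=1
process_name=%(program_name)s_%(process_num)02d
numprocs=8
autostart=true
autorestart=true
redirect_stderr=true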
