
Gunicorn CPU usage increases over time #2387

Closed
onzo-mateuszzakarczemny opened this issue Jul 23, 2020 · 3 comments
Comments

onzo-mateuszzakarczemny commented Jul 23, 2020

I run a Flask app on Gunicorn 20.0.4 on Kubernetes. My Python is v3.7.5. I have a problem with CPU usage increasing over time.
It is not related to the number of requests processed in a given time range.
[screenshot: CPU usage climbing over time, uncorrelated with request rate]

I used https://github.com/benfred/py-spy to investigate what's going on in two Gunicorn pods. One was quite fresh, the other had been running for a couple of days.
Fresh pod:

Total Samples 130300
GIL: 0.00%, Active: 0.00%, Threads: 1

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)                                                                                                                                                                                 
  0.00%   0.00%   29.91s    29.91s   sleep (gunicorn/arbiter.py:357)
  0.00%   0.00%   0.090s    0.100s   manage_workers (gunicorn/arbiter.py:548)
  0.00%   0.00%   0.070s    0.110s   murder_workers (gunicorn/arbiter.py:495)
  0.00%   0.00%   0.040s    0.040s   last_update (gunicorn/workers/workertmp.py:49)
  0.00%   0.00%   0.020s    0.020s   manage_workers (gunicorn/arbiter.py:544)
  0.00%   0.00%   0.010s    0.010s   run (gunicorn/arbiter.py:207)
  0.00%   0.00%   0.010s    0.010s   <lambda> (gunicorn/arbiter.py:548)
  0.00%   0.00%   0.010s    0.010s   murder_workers (gunicorn/arbiter.py:492)
  0.00%   0.00%   0.000s    29.91s   run (gunicorn/arbiter.py:209)
  0.00%   0.00%   0.000s    30.16s   run (gunicorn/app/base.py:72)
  0.00%   0.00%   0.000s    30.16s   run (gunicorn/app/wsgiapp.py:58)
  0.00%   0.00%   0.000s    0.120s   run (gunicorn/arbiter.py:210)
  0.00%   0.00%   0.000s    30.16s   run (gunicorn/app/base.py:228)
  0.00%   0.00%   0.000s    30.16s   <module> (gunicorn:8)
  0.00%   0.00%   0.000s    0.120s   run (gunicorn/arbiter.py:211)

Old pod:

Total Samples 125500
GIL: 0.00%, Active: 47.00%, Threads: 1

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)                                                                                                                                                                                 
 47.00%  47.00%   298.8s    298.8s   sleep (gunicorn/arbiter.py:357)
  0.00%   0.00%   0.510s    0.900s   murder_workers (gunicorn/arbiter.py:495)
  0.00%   0.00%   0.390s    0.390s   last_update (gunicorn/workers/workertmp.py:49)
  0.00%   0.00%   0.290s    0.360s   manage_workers (gunicorn/arbiter.py:548)
  0.00%   0.00%   0.080s    0.080s   maybe_promote_master (gunicorn/arbiter.py:314)
  0.00%   0.00%   0.070s    0.070s   <lambda> (gunicorn/arbiter.py:548)
  0.00%   0.00%   0.060s    0.060s   run (gunicorn/arbiter.py:207)
  0.00%   0.00%   0.050s    0.140s   run (gunicorn/arbiter.py:205)
  0.00%   0.00%   0.050s    0.080s   manage_workers (gunicorn/arbiter.py:544)
  0.00%   0.00%   0.040s    0.040s   manage_workers (gunicorn/arbiter.py:549)
  0.00%   0.00%   0.030s    0.030s   _get_num_workers (gunicorn/arbiter.py:80)
  0.00%   0.00%   0.020s    0.020s   run (gunicorn/arbiter.py:212)
  0.00%   0.00%   0.020s    0.020s   murder_workers (gunicorn/arbiter.py:486)
  0.00%   0.00%   0.020s    0.960s   run (gunicorn/arbiter.py:210)
  0.00%   0.00%   0.010s    0.010s   murder_workers (gunicorn/arbiter.py:493)
  0.00%  47.00%   0.010s    298.8s   run (gunicorn/arbiter.py:209)
  0.00%   0.00%   0.010s    0.010s   run (gunicorn/arbiter.py:208)
  0.00%   0.00%   0.010s    0.010s   maybe_promote_master (gunicorn/arbiter.py:315)
  0.00%   0.00%   0.010s    0.010s   manage_workers (gunicorn/arbiter.py:554)
  0.00%   0.00%   0.010s    0.010s   manage_workers (gunicorn/arbiter.py:547)
  0.00%   0.00%   0.010s    0.010s   manage_workers (gunicorn/arbiter.py:551)

So it seems that the older pod spent much more time in gunicorn/arbiter.py:357 (ready = select.select([self.PIPE[0]], [], [], 1.0)) than the fresh one, within a similar number of py-spy samples. The number of requests was also similar, because both pods are attached to the same load balancer.
I tried to mitigate the issue by setting the max_requests Gunicorn flag (config sketch below), but it didn't help, which is why I think it is a master-process problem.
Have you ever noticed such an issue, or do you have any ideas how to fix it?
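
(For context, a minimal sketch of how max_requests can be set through a gunicorn.conf.py; the values below are illustrative, not the exact production config:)

# gunicorn.conf.py -- illustrative values only
workers = 3                # hypothetical worker count
max_requests = 1000        # recycle each worker after this many requests
max_requests_jitter = 100  # stagger restarts so workers don't all recycle at once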

@jamadden
Collaborator

Does a tool like top show the gunicorn master process taking the CPU time, or is it the gunicorn workers?

The samples you've taken appear to be from the master gunicorn process. It should spend most of its time in sleep. That's where it hands control over to the operating system and waits for something to happen to the actual worker child processes. The call to select.select() should block in the operating system, without using any CPU, until "something happens". Every second, even if nothing has happened, it wakes up and checks just to be sure (that'd be the time spent in murder_workers).

If you enable the --native argument to py-spy, you should find the time being spent in lower-level primitives, notably select.select and down into glibc.
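
Roughly, the idiom looks like this (a simplified sketch of the self-pipe pattern, not the actual Gunicorn source):

import errno
import os
import select

def arbiter_style_sleep(pipe_read_fd):
    # Block for up to one second waiting for a wakeup byte on the self-pipe.
    # A healthy master burns almost no CPU here: select() sleeps in the kernel
    # until the timeout expires or a signal handler writes to the pipe.
    try:
        ready, _, _ = select.select([pipe_read_fd], [], [], 1.0)
        if not ready:
            return  # timed out: go run the periodic checks (e.g. murder_workers)
        # Drain the wakeup bytes; the pipe is assumed non-blocking,
        # so the read raises EAGAIN once it is empty.
        while os.read(pipe_read_fd, 1):
            pass
    except OSError as e:
        if e.errno not in (errno.EAGAIN, errno.EINTR):
            raise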

@onzo-mateuszzakarczemny
Author

py-spy top --pid 1 --native

Total Samples 24200
GIL: 0.00%, Active: 9.00%, Threads: 1

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)                                                                                                                                                                                 
  9.00%   9.00%   63.69s    63.69s   select (libc-2.27.so)
  0.00%   0.00%   0.150s    0.150s   __fxstat (libc-2.27.so)
  0.00%   0.00%   0.110s    0.260s   last_update (gunicorn/workers/workertmp.py:49)
  0.00%   0.00%   0.080s    0.080s   0x7ffce9366b12 ([vdso])
  0.00%   0.00%   0.060s    0.070s   manage_workers (gunicorn/arbiter.py:548)
  0.00%   0.00%   0.040s    0.040s   manage_workers (gunicorn/arbiter.py:544)
  0.00%   9.00%   0.030s    63.75s   sleep (gunicorn/arbiter.py:357)
  0.00%   0.00%   0.030s    0.340s   murder_workers (gunicorn/arbiter.py:495)
  0.00%   0.00%   0.010s    0.010s   manage_workers (gunicorn/arbiter.py:549)
  0.00%   0.00%   0.010s    0.010s   <lambda> (gunicorn/arbiter.py:548)
  0.00%   0.00%   0.010s    0.010s   murder_workers (gunicorn/arbiter.py:493)
  0.00%   0.00%   0.010s    0.140s   run (gunicorn/arbiter.py:211)
  0.00%   0.00%   0.010s    0.010s   manage_workers (gunicorn/arbiter.py:553)
  0.00%   0.00%   0.010s    0.360s   run (gunicorn/arbiter.py:210)
  0.00%   0.00%   0.010s    0.010s   run (gunicorn/arbiter.py:207)
  0.00%   0.00%   0.000s    0.050s   0x7ffce92c4d50 (?)
  0.00%   9.00%   0.000s    63.75s   run (gunicorn/arbiter.py:209)
  0.00%   0.00%   0.000s    0.030s   0x7ffce92b8b00 (?)
  0.00%   9.00%   0.000s    64.26s   <module> (gunicorn:8)
  0.00%   9.00%   0.000s    64.26s   run (gunicorn/app/wsgiapp.py:58)
  0.00%   0.00%   0.000s    0.080s   clock_gettime (libc-2.27.so)
  0.00%   9.00%   0.000s    64.26s   run (gunicorn/app/base.py:72)
  0.00%   9.00%   0.000s    64.26s   run (gunicorn/app/base.py:228)

top

Tasks:   7 total,   3 running,   4 sleeping,   0 stopped,   0 zombie
%Cpu(s): 83.4 us,  5.9 sy,  0.0 ni,  7.5 id,  0.0 wa,  0.0 hi,  3.0 si,  0.1 st
KiB Mem : 16435408 total,  1018392 free, 10144156 used,  5272860 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  5949160 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                                                                   
  329 root      20   0  836760 212956  40040 R  25.6  1.3   0:27.02 gunicorn                                                                                                                                                                  
  333 root      20   0  837952 212956  39924 R  23.9  1.3   0:23.72 gunicorn                                                                                                                                                                  
    1 root      20   0  117900  35012  11064 S   0.0  0.2   1:04.88 gunicorn                                                                                                                                                                  
  276 root      20   0   20696   4196   3340 S   0.0  0.0   0:00.05 bash                                                                                                                                                                      
  308 root      20   0   20696   4252   3392 S   0.0  0.0   0:00.06 bash                                                                                                                                                                      
  341 root      20   0  830360 206116  39408 S   0.0  1.3   0:29.14 gunicorn                                                                                                                                                                  
  347 root      20   0   38420   3540   2992 R   0.0  0.0   0:00.04 top                                                                                                                                                                       

I tried killing all workers and leaving only the master. However, I didn't notice any change in the load characteristics after the new workers were spawned:
[screenshot: CPU usage unchanged after the worker restart]

@onzo-mateuszzakarczemny
Author

I'm closing this issue. After some further investigation I found that it is not directly related to Gunicorn but to the Prometheus client in multiprocess mode (see prometheus/client_python#568).
Thank you for the comments.
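
(For anyone hitting the same symptom: in multiprocess mode prometheus_client keeps per-process metric files in the multiprocess directory, and these can accumulate as workers are recycled. A commonly suggested mitigation, sketched below under the assumption of the standard PROMETHEUS_MULTIPROC_DIR setup, is to notify the client from Gunicorn's child_exit server hook so per-worker files can be cleaned up:)

# gunicorn.conf.py -- sketch of the cleanup hook suggested in the prometheus_client docs
from prometheus_client import multiprocess

def child_exit(server, worker):
    # Tell the client this worker is gone so its per-process metric files
    # are no longer kept around indefinitely.
    multiprocess.mark_process_dead(worker.pid)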
