
video-js webrtc player does not try to play again if the server reports highResourceUsage error #4964

Open
mekya opened this issue Apr 3, 2023 · 4 comments · May be fixed by ant-media/videojs-webrtc-plugin#20

@mekya
Contributor

mekya commented Apr 3, 2023

Please write some test code to confirm that we don't encounter this problem again
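
A minimal sketch of what such a regression test could look like, assuming a mocha/sinon setup. The makePlayer() helper below is only a stand-in for however the plugin's test suite actually builds the player against a mocked adaptor; the tryNextTech name is taken from the plugin's own log message, and the assertion is the behavior we want to guarantee:

```js
// Hypothetical regression sketch: a "highResourceUsage" error from the server
// should lead to another play attempt instead of permanently stopping playback.
const sinon = require('sinon');
const assert = require('assert');

// Stand-in for the plugin instance: only the desired error-dispatch behavior
// is modeled here; in the real test this would be the actual plugin object.
function makePlayer() {
  const player = {
    tryNextTech: () => {},                 // name taken from the plugin's log message
    errorCallback(errors) {
      const retryable = ['highResourceUsage'];
      if (retryable.includes(errors.error)) {
        player.tryNextTech();              // retry instead of giving up
      }
    },
  };
  return player;
}

describe('highResourceUsage handling', () => {
  it('tries to play again when the server reports highResourceUsage', () => {
    const player = makePlayer();
    const retrySpy = sinon.spy(player, 'tryNextTech');

    // Simulate the server-side error callback the adaptor would receive.
    player.errorCallback({ error: 'highResourceUsage' });

    assert.ok(retrySpy.calledOnce, 'expected a retry after highResourceUsage');
  });
});
```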

@mekya mekya moved this to Next Sprint in Ant Media Server Apr 3, 2023
@mekya mekya changed the title video-js webrtc player try to play again if the server reports highResourceUsage error video-js webrtc player does not try to play again if the server reports highResourceUsage error Apr 3, 2023
@mekya mekya moved this from Next Sprint to 🔖 Sprint in Ant Media Server Apr 3, 2023
@mustafaboleken mustafaboleken moved this from 🔖 Sprint to 🏗 In progress in Ant Media Server Apr 10, 2023
@burak-58 burak-58 moved this from 🏗 In progress to 👀 In review in Ant Media Server Apr 17, 2023
@alfred-stokespace

@mekya @mustafaboleken

I'm seeing something possibly related. I have two GPU-powered EC2 instances (both the identical type, g4dn.12xlarge), and both are in a single cluster.

At the moment, there are only 4 active RTSP streams (3 renditions each; the highest rendition is 1080p@2500kbps).

One of the EC2 instances serves WebRTC fine; the other instance returns the "highResourceUsage" error every time an attempt is made.

The "top" on the instance that refuses to serve webrtc looks like it's practically sleeping

top - 21:13:57 up 15 min,  1 user,  load average: 0.72, 0.67, 0.47
Tasks: 531 total,   1 running, 530 sleeping,   0 stopped,   0 zombie
%Cpu0  :  3.0 us,  0.0 sy,  0.0 ni, 97.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  1.0 us,  0.3 sy,  0.0 ni, 98.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  1.7 us,  0.0 sy,  0.0 ni, 98.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  1.7 us,  0.7 sy,  0.0 ni, 97.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  1.0 us,  0.0 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  1.0 us,  0.0 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  1.0 us,  0.3 sy,  0.0 ni, 98.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  1.0 us,  0.0 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu8  :  1.0 us,  0.3 sy,  0.0 ni, 98.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu9  :  0.7 us,  0.3 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu10 :  0.7 us,  0.3 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu11 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu12 :  1.0 us,  0.0 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu13 :  0.7 us,  0.0 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu14 :  0.7 us,  0.3 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu15 :  1.0 us,  0.0 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu16 :  2.0 us,  0.3 sy,  0.0 ni, 97.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu17 :  0.7 us,  0.0 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu18 :  0.7 us,  0.0 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu19 :  0.7 us,  0.0 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu20 :  1.3 us,  0.0 sy,  0.0 ni, 98.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu21 :  0.3 us,  0.3 sy,  0.0 ni, 98.3 id,  1.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu22 :  3.0 us,  0.0 sy,  0.0 ni, 97.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu23 :  3.7 us,  0.0 sy,  0.0 ni, 96.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu24 :  1.0 us,  0.0 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu25 :  0.7 us,  0.0 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu26 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu27 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu28 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu29 :  0.7 us,  0.3 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu30 :  0.3 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu31 :  0.7 us,  0.7 sy,  0.0 ni, 98.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu32 :  0.3 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu33 :  0.3 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu34 :  1.0 us,  0.0 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu35 :  0.3 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu36 :  0.7 us,  0.3 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu37 :  0.3 us,  0.0 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu38 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu39 :  1.0 us,  0.0 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu40 :  0.7 us,  0.0 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu41 :  1.0 us,  0.0 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu42 :  0.3 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu43 :  1.7 us,  0.7 sy,  0.0 ni, 97.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu44 :  0.3 us,  0.0 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu45 :  0.3 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu46 :  0.7 us,  0.3 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu47 :  1.0 us,  0.0 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 191074.5 total, 184805.2 free,   5100.9 used,   1168.4 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used. 184409.3 avail Mem

Same story with nvidia-smi:

nvidia-smi
Tue Dec 10 21:19:54 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02              Driver Version: 555.42.02      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       Off |   00000000:00:1B.0 Off |                    0 |
| N/A   29C    P0             25W /   70W |     674MiB /  15360MiB |      4%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Tesla T4                       Off |   00000000:00:1C.0 Off |                    0 |
| N/A   29C    P0             25W /   70W |     274MiB /  15360MiB |      2%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  Tesla T4                       Off |   00000000:00:1D.0 Off |                    0 |
| N/A   29C    P0             25W /   70W |     156MiB /  15360MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  Tesla T4                       Off |   00000000:00:1E.0 Off |                    0 |
| N/A   28C    P0             26W /   70W |     131MiB /  15360MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A       905      C   .../jvm/java-17-openjdk-amd64/bin/java        670MiB |
|    1   N/A  N/A       905      C   .../jvm/java-17-openjdk-amd64/bin/java        270MiB |
|    2   N/A  N/A       905      C   .../jvm/java-17-openjdk-amd64/bin/java        152MiB |
|    3   N/A  N/A       905      C   .../jvm/java-17-openjdk-amd64/bin/java        127MiB |
+-----------------------------------------------------------------------------------------+

sleeping, ... so what gives?

I tried restarting the process, didn't help.
I tried restarting the ec2 instance, didn't help.

Next, I "in-sourced" the webrtc js assets so I could just play with changing things...

  1. remove hls as an option from the playOrder default
  2. remove the reaction to highResourceUsage
-                if (errors["error"] == "no_stream_exist" || errors["error"] == "WebSocketNotConnected" || errors["error"] == "not_initialized_yet" || errors["error"] == "data_store_not_available" || errors["error"] == "highResourceUsage" || errors["error"] == "unauthorized_access" || errors["error"] == "user_blocked") {
+                if (errors["error"] == "no_stream_exist" || errors["error"] == "WebSocketNotConnected" || errors["error"] == "not_initialized_yet" || errors["error"] == "data_store_not_available" || errors["error"] == "unauthorized_access" || errors["error"] == "user_blocked") {

Basically, I just removed highResourceUsage from the list of fatal errors.

and now I just get ...

Logger_1.warn("Websocket is closed. tryNextTech to replay");
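
For reference, rather than deleting the branch entirely, an alternative patch shape would be to keep highResourceUsage out of the fatal-error list and schedule a bounded retry. This is only a sketch of the idea; the retry limit, the back-off delay, and the startPlayback callback are assumptions for illustration, not the plugin's actual internals:

```js
// Sketch only: treat highResourceUsage as retryable instead of fatal.
const FATAL_ERRORS = [
  'no_stream_exist', 'WebSocketNotConnected', 'not_initialized_yet',
  'data_store_not_available', 'unauthorized_access', 'user_blocked',
];

function createErrorHandler(startPlayback, maxRetries = 3) {
  let retries = 0;
  return function handleError(errors) {
    if (errors.error === 'highResourceUsage' && retries < maxRetries) {
      retries += 1;
      // Back off a little so the busy node (or the load balancer) can recover,
      // then try to play again instead of giving up for good.
      setTimeout(startPlayback, 3000 * retries);
      return;
    }
    if (FATAL_ERRORS.includes(errors.error)) {
      console.warn('Giving up on playback, fatal error:', errors.error);
    }
  };
}

// Usage sketch: const handler = createErrorHandler(() => player.play());
// then pass handler wherever the plugin registers its error callback.
```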

@burak-58
Contributor

Hi @alfred-stokespace,
Your issue is mainly related to unfair load balancing. One of the nodes is highly loaded, so it doesn't let you play a stream there. The other node is not loaded, so you can play there.

To distribute the load fairly we created this issue and implemented a solution:
#6712

The solution has not been merged into the master branch yet. If you want to use a snapshot, please write us an email.

@alfred-stokespace

@burak-58
Both servers were almost idle with respect to CPU and GPU, each handling only two or three streams. Both were the same EC2 type, with 4 GPUs, 48 CPUs, and lots of free memory. So I don't understand how your answer could be correct.

I tried restarting the server that refused to serve, and that made no difference. I moved the stream over to the other server and it worked. But the load was minimal on the first server, so it doesn't make sense.

@burak-58
Contributor

Hi @alfred-stokespace,
Let's set up a call to understand each other better. I will schedule it.
