distributed upgrade from 2022.03.0 to 2024.2.0 has performance issues. #8646
Comments
The distributed.yaml worker-ttl parameter needs to be set to null.
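For reference, a minimal sketch of that change applied programmatically, assuming the key is distributed.scheduler.worker-ttl as in the default Dask configuration (the equivalent distributed.yaml entry is worker-ttl: null under the scheduler section); the local cluster here is only illustrative:

```python
import dask
from dask.distributed import LocalCluster, Client

# The scheduler reads worker-ttl when it starts, so the setting must be in
# effect in the process (or in distributed.yaml on the node) that launches
# the scheduler. None / null disables the time-to-live check, so workers
# that miss heartbeats during long CPU-bound work are not removed.
dask.config.set({"distributed.scheduler.worker-ttl": None})

cluster = LocalCluster(n_workers=4, threads_per_worker=1)
client = Client(cluster)
```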
Just driving by: users should instead use Client.submit to schedule individual functions. You will also notice that with Client.run, the dashboard does not actually work, just as many other features will not work.
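A minimal sketch of the suggested Client.submit pattern; process_file, the file paths, and the scheduler address are illustrative assumptions, not taken from this issue:

```python
from dask.distributed import Client, as_completed
import pandas as pd

def process_file(path):
    # Hypothetical stand-in for the per-file pandas processing described in
    # this issue: read one input file, write one output file.
    df = pd.read_csv(path)
    out_path = path + ".out.csv"
    df.to_csv(out_path, index=False)
    return out_path

client = Client("tls://scheduler-address:8786")        # placeholder address

paths = [f"data/part-{i}.csv" for i in range(4000)]    # illustrative file list
futures = [client.submit(process_file, p) for p in paths]

# Results arrive as tasks finish, and the dashboard tracks them normally.
for future in as_completed(futures):
    print(future.result())
```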
When processing 100 million records of daily data, 180 days in total (about 4,000 files), the problem appears occasionally. Near the end of the job, the Dask scheduler connection is closed for unknown reasons, but the tasks keep running and can be completed.
@yiershanxll: Have you switched from Client.run to Client.submit?
I'm trying it today, but there are no results yet. The job has to run for 10 hours before anything can go wrong.
I recommend updating to the latest version. Dask development moves fast and chances are that your problem has already been fixed. If that doesn't work, please provide more information about the exact problem you're seeing. This helps us reproduce the issue you're having and resolve it more quickly.
Thank you so much, I will try it!
We still use the run mode for submission, but we set client.run(on_error='return') and retry the command once more; this successfully avoids the problem.
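A rough sketch of that workaround under stated assumptions: run_batch, run_with_retry, and the addresses are illustrative names, not the reporter's actual code.

```python
from dask.distributed import Client

def run_batch(paths):
    # Hypothetical per-worker function: process a list of files with pandas,
    # as described elsewhere in this issue.
    for path in paths:
        ...

client = Client("tls://scheduler-address:8786")  # placeholder address

def run_with_retry(paths, worker_address):
    # on_error="return" makes client.run place the exception in the result
    # dict instead of raising, so a CommClosedError on one connection does
    # not abort the call; if an exception comes back, retry once.
    result = client.run(run_batch, paths, workers=[worker_address], on_error="return")
    if isinstance(result.get(worker_address), Exception):
        result = client.run(run_batch, paths, workers=[worker_address], on_error="return")
    return result
```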
Problem:
We tested 5 times, and each time the problem occurred at around 25 minutes.
The error message "distributed.comm.core.CommClosedError: in <TLS (closed) Scheduler Broadcast local=tls://182.10.4.6:58090 remote=tls://182.10.2.6:18715>: Stream is closed" is displayed.
This problem does not exist in the earlier version dask==2022.03.0.
Although the task reports an error, the background workers keep executing it properly until the calculation is complete.
Environment information:
Number of nodes: Four containers, each with 16 vCPUs and 32 GB of memory, are deployed.
Number of workers: Four workers are started using the dask command. Each worker has ten processes and one thread per process, and memory usage is limited to 90%. Ten worker processes therefore run in the background on each node.
Distributed computing: We use the client.run method to submit tasks to each worker for processing. The input for each worker is a file; pandas is used for processing, not dask.dataframe. The output is also a file.
We had been using the earlier version 2022.03.0 for distributed computing. However, in that version the scheduler failed to allocate tasks when CPU usage was saturated while processing big data, and the scheduling process stopped responding. Therefore, we do not use Dask's native scheduling. Instead, we encapsulate our own task-submission mode: if there are four workers and each worker has five processes, all tasks are divided evenly into 20 large task lists, and each worker process is assigned one large task. The run method is used for submission to avoid the scheduling problem. The problem reported here occurs occasionally now that production has been upgraded to 2024.2.0 and the data volume has increased again.
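A rough sketch of this encapsulated submission mode under stated assumptions: run_batch, the file names, and the scheduler address are illustrative, and the even split across worker processes follows the description above.

```python
from dask.distributed import Client

def run_batch(batches, dask_worker):
    # client.run injects the worker instance as `dask_worker`; each worker
    # process picks out its own batch by address and processes each file with
    # pandas, writing one output file per input file.
    for path in batches[dask_worker.address]:
        ...

client = Client("tls://scheduler-address:8786")          # placeholder address

# One address per worker process (e.g. 4 nodes x 5 processes = 20 addresses).
workers = sorted(client.scheduler_info()["workers"])

all_files = [f"data/part-{i}.csv" for i in range(4000)]  # illustrative
batches = {w: all_files[i::len(workers)] for i, w in enumerate(workers)}

# A single client.run call fans the batches out to every worker process at
# once, bypassing Dask's task scheduling as described above; on_error="return"
# keeps one failed connection from aborting the whole call.
results = client.run(run_batch, batches, on_error="return")
```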