Scheduler loses registered executors when restarted in push mode #90

Open
liukun4515 opened this issue Jul 20, 2022 · 3 comments
Labels
bug Something isn't working

Comments

@liukun4515
Contributor

Describe the bug
When I restart the scheduler, it loses all information about the registered executors.

To Reproduce
Start a scheduler with the config below:

scheduler_policy="PushStaged"

Start an executor with the config below:

scheduler_port=50050
scheduler_host="localhost"
# PushStaged or PullStaged
task_scheduling_policy="PushStaged"

Then kill the scheduler and restart it using the same config.

After the restart, the scheduler has lost all the registered executors it held in memory.

Expected behavior
The scheduler should recover this in-memory data after it restarts.

Solution:
Have each executor send its registration information along with its heartbeat, so that a restarted scheduler can rebuild its list of executors.
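
A minimal sketch of the heartbeat idea, to make the proposal concrete. This is not the project's actual code: `ExecutorInfo`, `Heartbeat`, `Scheduler`, and all of their fields are hypothetical names chosen for illustration, and the real heartbeat message may look quite different.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class ExecutorInfo:
    """Hypothetical registration metadata an executor would report."""
    executor_id: str
    host: str
    port: int
    task_slots: int


@dataclass
class Heartbeat:
    """Hypothetical heartbeat that can also carry the registration metadata."""
    executor_id: str
    timestamp: float
    info: Optional[ExecutorInfo] = None


@dataclass
class Scheduler:
    """Toy scheduler whose executor registry lives only in memory."""
    executors: Dict[str, ExecutorInfo] = field(default_factory=dict)
    last_seen: Dict[str, float] = field(default_factory=dict)

    def register(self, info: ExecutorInfo) -> None:
        self.executors[info.executor_id] = info

    def on_heartbeat(self, hb: Heartbeat) -> bool:
        """Record a heartbeat; return True if the executor is known afterwards."""
        if hb.executor_id not in self.executors:
            if hb.info is None:
                # Unknown executor and nothing to rebuild it from.
                return False
            # The registry was lost (e.g. after a restart): re-register
            # the executor from the metadata carried by the heartbeat.
            self.executors[hb.executor_id] = hb.info
        self.last_seen[hb.executor_id] = hb.timestamp
        return True
```

With this shape, a freshly restarted `Scheduler()` starts with an empty registry, rejects a bare heartbeat from an executor it no longer knows, and accepts one that carries `info`. The trade-off is a slightly larger heartbeat in exchange for not needing any persisted state to recover the executor list.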


@liukun4515 liukun4515 added the bug Something isn't working label Jul 20, 2022
@thinkharderdev
Contributor

@liukun4515 Are you running in standalone mode? If you are using etcd as the state backend, the scheduler should initialize any registered executors from the backend. In standalone mode, however, the persistent state is stored in a sled DB on disk (in a temp file). If we wanted standalone mode to persist state across restarts, we would need to make the sled DB location a configurable path.
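
To illustrate why a temp location loses state while a configurable path keeps it, here is a rough sketch. It does not use sled or any scheduler code: `StateStore`, `state_dir`, and the JSON file layout are assumptions made up for this example.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Optional


class StateStore:
    """Toy on-disk store standing in for the scheduler's state backend."""

    def __init__(self, state_dir: Optional[str] = None):
        if state_dir is None:
            # No path configured: every start gets a brand-new temp
            # directory, so state written by an earlier run is never found.
            self.dir = Path(tempfile.mkdtemp(prefix="scheduler-state-"))
        else:
            # Configured path: every start reopens the same directory.
            self.dir = Path(state_dir)
            self.dir.mkdir(parents=True, exist_ok=True)
        self.file = self.dir / "executors.json"

    def load(self) -> Dict[str, dict]:
        """Return all stored executor metadata, or an empty dict if none."""
        if not self.file.exists():
            return {}
        return json.loads(self.file.read_text())

    def save_executor(self, executor_id: str, metadata: dict) -> None:
        """Add or overwrite one executor's metadata and write it to disk."""
        state = self.load()
        state[executor_id] = metadata
        self.file.write_text(json.dumps(state))
```

Creating `StateStore("/some/fixed/dir")` twice in a row models a restart that finds the earlier registrations, whereas two `StateStore()` instances never see each other's data.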

@mingmwang
Contributor

Is it still an issue?

@r4ntix
Contributor

r4ntix commented Sep 15, 2022

@liukun4515 @thinkharderdev @mingmwang This can be fixed by specifying the --sled-dir parameter when starting the scheduler service.
