Reschedule long-pending allocs #24780

EtienneBruines · 2025-01-06T10:50:01Z

Nomad version

Nomad v1.9.4
BuildDate 2024-12-18T15:16:22Z
Revision 5e49fcd+CHANGES

Operating system and Environment details

Ubuntu 22.04.5 LTS on amd64

Issue

Sometimes an alloc stays pending for a long time. Since the alloc has not been started yet, the scheduler should be able to reschedule it to a different client.

This is especially worrisome for a periodic batch job that prevents overlap: while the alloc is stuck in pending, later runs of the job are held up as well, which makes the stuck alloc more severe.

Reproduction steps

Have a client with a lot of GC'able allocs (start many of them and set the GC interval to something long, such as 24h).
Wait for a new job to be scheduled to this client.
Observe that the alloc status is pending, and that it sometimes stays pending for quite a while.
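
To spot the symptom described in the steps above, something like the following could be run against a list of allocation stubs. This is only a sketch: it assumes each stub is a dict with `ID`, `ClientStatus`, and `CreateTime` (Unix time in nanoseconds), which is how I understand the allocation list from the HTTP API to look. The function name `long_pending_allocs` and the threshold are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def long_pending_allocs(alloc_stubs, now, threshold):
    """Return the IDs of allocs that have been pending longer than `threshold`.

    Hypothetical helper: assumes each stub has "ID", "ClientStatus" and
    "CreateTime" (Unix nanoseconds) keys.
    """
    stuck = []
    for stub in alloc_stubs:
        if stub.get("ClientStatus") != "pending":
            continue
        # Convert the assumed nanosecond timestamp to an aware datetime.
        created = datetime.fromtimestamp(stub["CreateTime"] / 1e9, tz=timezone.utc)
        if now - created > threshold:
            stuck.append(stub["ID"])
    return stuck
```

Calling it with `threshold=timedelta(minutes=30)` would list every alloc that has sat in pending for more than half an hour.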

Expected Result

If an alloc is pending for too long, the scheduler should reschedule that alloc onto a different client.

What is too long? Not sure, but when the next periodic batch job should have started, it has definitely been too long.
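
As a rough sketch of that upper bound: an alloc has been pending too long once the next periodic launch is due. The example below assumes a fixed interval between launches purely to keep it short (real periodic jobs are driven by a cron expression), and `pending_too_long` is a hypothetical name, not anything in Nomad.

```python
from datetime import datetime, timedelta, timezone


def pending_too_long(launched_at, interval, now):
    """True once the next periodic launch is due while this one is still pending.

    Assumption: launches happen every `interval`; a cron-based schedule would
    compute `next_launch` from the cron expression instead.
    """
    next_launch = launched_at + interval
    return now >= next_launch
```

For an hourly job launched at 10:00, this returns False at 10:30 and True from 11:00 onwards.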

Actual Result

The alloc stays on the overworked client and is 'stuck' there until the client finally decides to start it.

Job file (if appropriate)

Not applicable.

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)


pkazmierczak · 2025-01-08T08:53:24Z

Hi @EtienneBruines, thanks for raising the issue. For deployments (i.e., service jobs), the deployment will fail if its allocations are stuck in the pending state for too long, and those allocations will be marked as unhealthy. This can be controlled by healthy_deadline. Sadly, this doesn't help for periodic jobs.

I'll add this to our board and we'll have a think about what to do.

EtienneBruines added the type/bug label Jan 6, 2025

EtienneBruines mentioned this issue Jan 6, 2025

Nomad client not reporting pending job during GC #24777

Open

jrasell added this to Nomad - Community Issues Triage Jan 7, 2025

github-project-automation bot moved this to Needs Triage in Nomad - Community Issues Triage Jan 7, 2025

pkazmierczak moved this from Needs Triage to Needs Roadmapping in Nomad - Community Issues Triage Jan 8, 2025

pkazmierczak added theme/scheduling theme/periodic labels Jan 8, 2025
