Disconnected Nomad client nodes never transition to down, are not garbage collected, and generate infinite heartbeat failures #27409

@raunak2004

Nomad version

v1.9.7

• Client OS: Windows
• Client driver: raw_exec
• Environment: Production (multiple clusters)
• Workload type: UEM Control Plane workloads
• Node lifecycle: Externally decommissioned by SaaS automation

Issue

Nomad client nodes that are permanently decommissioned outside of Nomad remain indefinitely in a disconnected and eligible state. These nodes never transition to down, are not eligible for garbage collection, and cause Nomad servers to continuously emit Node heartbeat missed events.

As a result:
• Nomad servers repeatedly poll nodes that no longer exist
• Logs contain continuous heartbeat failure noise
• Log ingestion costs increase significantly in production
• Manual operator intervention is required to clean up stale nodes

Reproduction steps
1. Register a Nomad client node to a cluster.
2. Decommission or terminate the node externally without graceful Nomad deregistration.
3. Wait for the client to become unreachable.
4. Observe the node state via:
   nomad node status
5. Inspect a disconnected node:
   nomad node status <NODE_ID>

Expected Result
• After a reasonable period of missed heartbeats, Nomad should:
  • Transition the node from disconnected to down
  • Mark the node as ineligible
  • Garbage collect the node automatically
• Heartbeat retry attempts and related log entries should stop once the node is clearly unrecoverable.

Actual Result
• Node remains indefinitely in:
  • Status = disconnected
  • Eligibility = eligible
• Node is never marked as down
• Node is never garbage collected
• Nomad servers continuously emit:
  • Node heartbeat missed events
• Manual force drain is required to transition the node to a state where GC can remove it.
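
To make the stuck state easier to spot across clusters, here is a minimal Python sketch that filters a node listing for the combination described above (disconnected and still eligible). It assumes the input is the already-parsed JSON list from Nomad's node-listing endpoint (`/v1/nodes`) and that each entry carries `ID`, `Status`, and `SchedulingEligibility` keys. Those field names are an assumption on my part and should be checked against the API output of your Nomad version. `find_stale_nodes` is a hypothetical helper name, not part of any Nomad tooling.

```python
from typing import Any


def find_stale_nodes(nodes: list[dict[str, Any]]) -> list[str]:
    """Return IDs of nodes that are disconnected but still eligible.

    Assumes each entry has "ID", "Status" and "SchedulingEligibility"
    keys (assumed field names; verify against your Nomad version).
    """
    return [
        node["ID"]
        for node in nodes
        if node.get("Status") == "disconnected"
        and node.get("SchedulingEligibility") == "eligible"
    ]


# Illustrative data only; these are not real node records.
sample_nodes = [
    {"ID": "node-a", "Status": "disconnected", "SchedulingEligibility": "eligible"},
    {"ID": "node-b", "Status": "ready", "SchedulingEligibility": "eligible"},
    {"ID": "node-c", "Status": "down", "SchedulingEligibility": "ineligible"},
]
stale_ids = find_stale_nodes(sample_nodes)
```

The resulting ID list is only a candidate set for the manual cleanup described above. The sketch does not drain or remove anything.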

Nomad Server logs (if appropriate)

2026-01-23T19:05:43.718Z  {
  "@level": "warn",
  "@message": "node TTL expired",
  "@module": "nomad.heartbeat",
  "@timestamp": "2026-01-23T19:05:43.718135Z",
  "node_id": "<REDACTED_NODE_ID>"
}
2026-01-23T19:05:42.507Z  {
  "@level": "warn",
  "@message": "node TTL expired",
  "@module": "nomad.heartbeat",
  "@timestamp": "2026-01-23T19:05:42.507501Z",
  "node_id": "<REDACTED_NODE_ID>"
}
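
To quantify the log noise per node, here is a rough Python sketch that counts "node TTL expired" warnings by `node_id`. It assumes the log text looks like the excerpt above: a timestamp prefix followed by a JSON object that may span several lines. `count_ttl_expired` is a hypothetical helper name, and the sample text in the snippet is made up for illustration.

```python
import json
from collections import Counter


def count_ttl_expired(log_text: str) -> Counter:
    """Count "node TTL expired" heartbeat warnings per node_id.

    Scans for JSON objects embedded in the text, skipping the
    timestamp prefixes between them.
    """
    decoder = json.JSONDecoder()
    counts: Counter = Counter()
    pos = 0
    while True:
        start = log_text.find("{", pos)
        if start == -1:
            break
        try:
            # raw_decode parses one JSON value starting at `start`
            # and returns the index just past it.
            record, end = decoder.raw_decode(log_text, start)
        except json.JSONDecodeError:
            pos = start + 1  # not valid JSON here; keep scanning
            continue
        pos = end
        if (
            record.get("@module") == "nomad.heartbeat"
            and record.get("@message") == "node TTL expired"
        ):
            counts[record.get("node_id", "<unknown>")] += 1
    return counts


# Made-up sample in the same shape as the excerpt above.
sample_log = """2026-01-23T19:05:43.718Z  {
  "@level": "warn",
  "@message": "node TTL expired",
  "@module": "nomad.heartbeat",
  "node_id": "node-a"
}
2026-01-23T19:05:42.507Z  {
  "@level": "warn",
  "@message": "node TTL expired",
  "@module": "nomad.heartbeat",
  "node_id": "node-a"
}
"""
ttl_counts = count_ttl_expired(sample_log)
```

Nodes with a steadily growing count and no matching live host are the likely stale registrations.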

Status

Triaging
