Description
Nomad version
v1.9.7
• Client OS: Windows
• Client driver: raw_exec
• Environment: Production (multiple clusters)
• Workload type: UEM Control Plane workloads
• Node lifecycle: Externally decommissioned by SaaS automation
Issue
Nomad client nodes that are permanently decommissioned outside of Nomad remain indefinitely in a disconnected and eligible state. These nodes never transition to down, are not eligible for garbage collection, and cause Nomad servers to continuously emit Node heartbeat missed events.
As a result:
• Nomad servers repeatedly poll nodes that no longer exist
• Logs contain continuous heartbeat failure noise
• Log ingestion costs increase significantly in production
• Manual operator intervention is required to clean up stale nodes
Reproduction steps
1. Register a Nomad client node to a cluster.
2. Decommission or terminate the node externally without graceful Nomad deregistration.
3. Wait for the client to become unreachable.
4. Observe the node state via:
nomad node status
5. Inspect a disconnected node:
nomad node status <NODE_ID>
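To find every affected node in a cluster rather than inspecting them one at a time, the JSON form of the node list can be filtered. This is a minimal sketch, assuming the list comes from `nomad node status -json` (or `GET /v1/nodes`) and that each entry has `ID`, `Status`, and `SchedulingEligibility` fields. The function name `find_stale_nodes` and the sample node IDs are hypothetical.

```python
import json


def find_stale_nodes(nodes):
    """Return the IDs of nodes that are disconnected but still eligible."""
    return [
        node["ID"]
        for node in nodes
        if node.get("Status") == "disconnected"
        and node.get("SchedulingEligibility") == "eligible"
    ]


# Hypothetical sample shaped like the node list JSON (assumed field names).
sample = json.loads("""
[
  {"ID": "node-a", "Status": "ready", "SchedulingEligibility": "eligible"},
  {"ID": "node-b", "Status": "disconnected", "SchedulingEligibility": "eligible"},
  {"ID": "node-c", "Status": "down", "SchedulingEligibility": "ineligible"}
]
""")

stale = find_stale_nodes(sample)
print(stale)
```

In practice the input would be the parsed output of the node list command instead of the inline sample.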
Expected Result
• After a reasonable period of missed heartbeats, Nomad should:
  • Transition the node from disconnected to down
  • Mark the node as ineligible
  • Garbage collect the node automatically
• Heartbeat retry attempts and related log entries should stop once the node is clearly unrecoverable.
Actual Result
• Node remains indefinitely in:
  • Status = disconnected
  • Eligibility = eligible
• Node is never marked as down
• Node is never garbage collected
• Nomad servers continuously emit:
  • Node heartbeat missed events
• Manual force drain is required to transition the node to a state where GC can remove it.
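The manual workaround described above could be scripted roughly as follows. This is only a sketch under assumptions: it builds the commands without running them, it assumes that a forced drain (`nomad node drain -enable -force -yes`) followed by `nomad system gc` is enough for GC to remove the node, and the helper name `cleanup_commands` is hypothetical. It has not been verified against v1.9.7.

```python
def cleanup_commands(node_ids):
    """Build (but do not run) the CLI commands for the manual workaround."""
    commands = [
        # Force-drain each stale node so it can leave the disconnected state.
        ["nomad", "node", "drain", "-enable", "-force", "-yes", node_id]
        for node_id in node_ids
    ]
    if commands:
        # Request a garbage collection pass once the drains are issued.
        commands.append(["nomad", "system", "gc"])
    return commands


for command in cleanup_commands(["<NODE_ID>"]):
    print(" ".join(command))
```

Each command list could be passed to `subprocess.run` after an operator has reviewed the node IDs, since a forced drain on the wrong node is disruptive.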
Nomad Server logs (if appropriate)
2026-01-23T19:05:43.718Z {
"@level": "warn",
"@message": "node TTL expired",
"@module": "nomad.heartbeat",
"@timestamp": "2026-01-23T19:05:43.718135Z",
"node_id": "<REDACTED_NODE_ID>"
}
2026-01-23T19:05:42.507Z {
"@level": "warn",
"@message": "node TTL expired",
"@module": "nomad.heartbeat",
"@timestamp": "2026-01-23T19:05:42.507501Z",
"node_id": "<REDACTED_NODE_ID>"
}
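To quantify the log noise per node, the warnings can be counted from a raw log capture. This is a minimal sketch, assuming each record is a JSON object preceded by a timestamp prefix as in the excerpt above. The function name `count_ttl_expired` and the node ID `node-a` in the sample are hypothetical.

```python
import json
from collections import Counter


def count_ttl_expired(log_text):
    """Count "node TTL expired" records per node_id in a raw log capture."""
    decoder = json.JSONDecoder()
    counts = Counter()
    pos = log_text.find("{")
    while pos != -1:
        try:
            # Decode one JSON object starting at the brace; `end` is the index just past it.
            record, end = decoder.raw_decode(log_text, pos)
        except json.JSONDecodeError:
            pos = log_text.find("{", pos + 1)
            continue
        if isinstance(record, dict) and record.get("@message") == "node TTL expired":
            counts[record.get("node_id", "unknown")] += 1
        pos = log_text.find("{", end)
    return counts


# Hypothetical capture in the same shape as the server log excerpt.
sample_log = """2026-01-23T19:05:43.718Z {
"@level": "warn",
"@message": "node TTL expired",
"@module": "nomad.heartbeat",
"node_id": "node-a"
}
2026-01-23T19:05:42.507Z {
"@level": "warn",
"@message": "node TTL expired",
"@module": "nomad.heartbeat",
"node_id": "node-a"
}
"""

ttl_counts = count_ttl_expired(sample_log)
print(dict(ttl_counts))
```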