feat(vmm): add VM removing state for reliable lifecycle cleanup #497

Merged
kvinwang merged 2 commits into master from vm-removing-state
Feb 7, 2026

kvinwang commented Feb 7, 2026

Summary

  • Adds an intermediate "removing" state to the VM lifecycle so that deletion is non-blocking and crash-recoverable
  • remove_vm() marks the VM as removing, persists a .removing marker file, and spawns a background cleanup coroutine
  • Background cleanup stops the supervisor process, polls until it exits, removes it from the supervisor, deletes the workdir, and frees the CID
  • On startup and on RPC reload, VMs with a .removing marker are loaded (so they stay visible in the UI) and cleanup resumes automatically
  • spawn_finish_remove deduplicates cleanup tasks via an in-memory removing flag
  • start_vm and auto-restart skip VMs in the removing state
  • UI shows an amber "removing" status badge
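
For readers skimming the description, the removal flow above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the code in this PR: `FakeSupervisor`, the `Vmm` class layout, the dict-based VM records, and the polling interval are all hypothetical. Only the names `remove_vm`, `spawn_finish_remove`, and `.removing` come from the description.

```python
import asyncio
import shutil
import tempfile
from pathlib import Path

MARKER = ".removing"  # marker file name from the PR description


class FakeSupervisor:
    """Hypothetical stand-in for the supervisor; not the real API."""

    def __init__(self):
        self.running = set()
        self.known = set()

    async def stop(self, vm_id):
        self.running.discard(vm_id)

    def is_running(self, vm_id):
        return vm_id in self.running

    def remove(self, vm_id):
        self.known.discard(vm_id)


class Vmm:
    def __init__(self, root, supervisor):
        self.root = root
        self.supervisor = supervisor
        self.vms = {}          # vm_id -> {"removing": bool, "cid": int}
        self.free_cids = set()
        self.tasks = {}

    def remove_vm(self, vm_id):
        """Persist the marker, schedule cleanup, and return immediately."""
        (self.root / vm_id / MARKER).touch()
        self.spawn_finish_remove(vm_id)

    def spawn_finish_remove(self, vm_id):
        """Start cleanup once; the in-memory flag rejects duplicates."""
        vm = self.vms[vm_id]
        if vm["removing"]:
            return False
        vm["removing"] = True
        loop = asyncio.get_running_loop()
        self.tasks[vm_id] = loop.create_task(self._finish_remove(vm_id))
        return True

    async def _finish_remove(self, vm_id):
        await self.supervisor.stop(vm_id)
        while self.supervisor.is_running(vm_id):   # poll until exit
            await asyncio.sleep(0.01)
        self.supervisor.remove(vm_id)
        shutil.rmtree(self.root / vm_id, ignore_errors=True)
        # The CID is freed last, after the workdir is gone.
        self.free_cids.add(self.vms.pop(vm_id)["cid"])


async def demo():
    root = Path(tempfile.mkdtemp())
    (root / "vm1").mkdir()
    sup = FakeSupervisor()
    sup.running.add("vm1")
    sup.known.add("vm1")
    vmm = Vmm(root, sup)
    vmm.vms["vm1"] = {"removing": False, "cid": 7}
    vmm.remove_vm("vm1")
    duplicate_started = vmm.spawn_finish_remove("vm1")
    await vmm.tasks["vm1"]
    return duplicate_started, (root / "vm1").exists(), vmm.free_cids
```

In this sketch the marker is written before the task is scheduled, so a crash between the two steps still leaves evidence on disk that the removal was requested.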

Test plan

  • Chaos test: deploy + stop + remove (basic flow)
  • Chaos test: rapid stop + remove (race condition)
  • Chaos test: deploy x2, stop both, remove both
  • Chaos test: remove without stopping (background handles stop)
  • Chaos test: delete workdir + reload_vms (orphan cleanup)
  • UI displays "removing" status correctly
  • Consistency checks pass: VMM and supervisor state match after all operations

Introduce a "removing" intermediate state so that VM deletion is
non-blocking and crash-recoverable:

- remove_vm() marks the VM as removing, writes a .removing marker
  file, and spawns a background coroutine that stops the supervisor
  process, polls until it exits, removes it from the supervisor,
  deletes the workdir, and finally frees the CID.
- On startup (reload_vms) and on RPC reload (reload_vms_sync), VMs
  with a .removing marker are loaded into memory (visible in UI) and
  cleanup is resumed via spawn_finish_remove.
- spawn_finish_remove checks the in-memory removing flag to avoid
  duplicate cleanup tasks.
- start_vm and try_restart_exited_vms skip VMs in the removing state.
- UI shows an amber "removing" status badge for VMs being cleaned up.
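
The crash-recovery side of the commit message can be illustrated with a small sketch of how a reload might pick up pending removals. It is written in Python purely for illustration and rests on assumptions: the one-directory-per-VM layout, the dict-based VM records, the `resume_cleanup` callback, and the `startable` helper are hypothetical, and the real `reload_vms` may differ in signature and behavior.

```python
import tempfile
from pathlib import Path

MARKER = ".removing"  # marker file name from the commit message


def reload_vms(root, vms, resume_cleanup):
    """Scan workdirs; load every VM and resume cleanup for marked ones.

    `resume_cleanup` stands in for spawn_finish_remove (assumed callback).
    """
    for workdir in sorted(p for p in root.iterdir() if p.is_dir()):
        vm_id = workdir.name
        pending = (workdir / MARKER).exists()
        # Marked VMs are still loaded so they remain visible while cleaned up.
        vm = vms.setdefault(vm_id, {"removing": False})
        if pending and not vm["removing"]:
            vm["removing"] = True   # in-memory flag prevents duplicate tasks
            resume_cleanup(vm_id)


def startable(vms):
    """VMs that start or auto-restart logic may touch (hypothetical helper)."""
    return sorted(vm_id for vm_id, vm in vms.items() if not vm["removing"])


def demo():
    root = Path(tempfile.mkdtemp())
    (root / "vm-a").mkdir()
    (root / "vm-a" / MARKER).touch()   # removal was interrupted by a crash
    (root / "vm-b").mkdir()            # healthy VM
    vms, resumed = {}, []
    reload_vms(root, vms, resumed.append)
    reload_vms(root, vms, resumed.append)  # second reload must not re-spawn
    return vms, resumed
```

Calling `reload_vms` twice in `demo` mirrors the case where an RPC reload arrives while a resumed cleanup is still running: the flag keeps the second scan from scheduling another task.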
kvinwang enabled auto-merge February 7, 2026 16:19
kvinwang merged commit 9cb24e8 into master Feb 7, 2026
11 checks passed