Issue
When deploying moltworker to Cloudflare Workers, rapid deployments (multiple deployments within 5-10 minutes) cause a Durable Object reset loop that prevents the OpenClaw gateway from starting.
Error Messages
Failed to start process: Error: Durable Object reset because its code was updated.
[PROXY] Failed to start Moltbot: Error: Durable Object reset because its code was updated.
Environment
- Platform: Cloudflare Workers with Durable Objects + Container bindings
- OpenClaw Version: 2026.2.3-1
- Moltworker: Based on cloudflare/moltworker architecture
- Container: Docker with `openclaw gateway` running in Cloudflare Sandbox
Steps to Reproduce
- Deploy moltworker to Cloudflare
- Wait for gateway to start successfully
- Deploy again within 5 minutes (e.g., bug fix or feature change)
- Deploy a third time within another 5 minutes
- Observe: Gateway fails to start with "Durable Object reset" errors in a loop
Expected Behavior
Gateway should recover gracefully.
Actual Behavior
- Gateway startup is interrupted by another DO reset
- Gateway never becomes ready on port 18789
- Process times out after 90 seconds
- Only resolves after waiting 5-10+ minutes without any deployments
Impact
- Production downtime during multiple deployments
- Cannot do rapid iteration/bug fixes in production
- Data is safe (R2 backup/restore works correctly), but service is unavailable during reset loop
Workaround
Wait 5-10 minutes between deployments to allow the Durable Object to fully stabilize before deploying again.
Proposed Solutions
- Better error handling: Detect DO reset scenarios and retry with exponential backoff
- Startup s…: … is in progress
- …: … guide (batch changes, avoid rapid deploys)
- Graceful degradation: Return a "deployment in progress" status instead of timing out
- Gradual rollouts: Consider using Workers `deploy_config.version_id` for canary deployments
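The "retry with exponential backoff" idea could look roughly like the sketch below. It is not taken from the moltworker code: the helper names (`isDurableObjectResetError`, `backoffDelayMs`, `startWithBackoff`) and the default delays are hypothetical, and the only detail from this report is the error message text being matched.

```typescript
// Message text copied from the error logs in this issue; everything else is illustrative.
const DO_RESET_MESSAGE = "Durable Object reset because its code was updated";

/** True when an error looks like the Durable Object code-update reset. */
function isDurableObjectResetError(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  return message.includes(DO_RESET_MESSAGE);
}

/** Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs. */
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

/**
 * Runs `start` and retries only when it fails with a DO reset error.
 * Any other error, or a reset on the final attempt, is rethrown to the caller.
 * `sleep` is injectable so the helper can be exercised without real timers.
 */
async function startWithBackoff<T>(
  start: () => Promise<T>,
  maxAttempts = 6,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await start();
    } catch (err) {
      const isLastAttempt = attempt >= maxAttempts - 1;
      if (!isDurableObjectResetError(err) || isLastAttempt) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

With these defaults the waits between six attempts are 1, 2, 4, 8 and 16 seconds (31 seconds in total), which stays inside the 90-second startup timeout mentioned above. Whether retrying inside the same request is enough during a reset loop is untested here.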
Additional Context
- Using R2 for persistent storage (config, skills, conversations)
- R2 restore completes successfully before the reset occurs
- The issue is purely with the Durable Object lifecycle during rapid code updates
- This appears to be a Cloudflare platform limitation, but better handling would improve the deployment experience
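As a rough illustration of what "better handling" could mean on the proxy side, the sketch below maps a DO reset error to an HTTP 503 with a `Retry-After` header instead of letting the request time out. The function name `replyForStartupFailure`, the `deployment_in_progress` label and the 30-second default are assumptions made for this example, not existing moltworker behavior.

```typescript
// Plain description of an HTTP reply; in a Worker it could be passed to
// `new Response(reply.body, { status: reply.status, headers: reply.headers })`.
interface StartupFailureReply {
  status: number;
  headers: Record<string, string>;
  body: string;
}

/**
 * Turns a gateway startup error into a reply for the client.
 * DO code-update resets become 503 + Retry-After; anything else becomes 500.
 */
function replyForStartupFailure(err: unknown, retryAfterSeconds = 30): StartupFailureReply {
  const message = err instanceof Error ? err.message : String(err);
  // Message text copied from the error logs in this issue.
  if (message.includes("Durable Object reset because its code was updated")) {
    return {
      status: 503,
      headers: {
        "Content-Type": "application/json",
        "Retry-After": String(retryAfterSeconds),
      },
      body: JSON.stringify({ status: "deployment_in_progress", retryAfterSeconds }),
    };
  }
  return {
    status: 500,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ status: "error", message }),
  };
}
```

Returning a plain object keeps the mapping easy to test outside the Workers runtime. Clients that honor `Retry-After` would then back off on their own while the Durable Object stabilizes.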
Related
This might be related to how Durable Objects handle alarm() during code updates - our KeepAlive DO pings the Sandbox DO every 30 seconds, which may interact poorly with deployments.