Problem
start.sh runs the Slack bridge in a while true loop with a fixed 5s restart delay. Persistent faults can produce noisy restart storms and ambiguous operational state.
Proposed solution
- Replace the fixed-delay infinite restart with an adaptive policy:
  - exponential backoff (+ jitter)
  - reset the backoff after a stable run window
  - max-consecutive-failure threshold with a clear alert/status signal
- Emit structured restart reason/attempt logs for easier triage.
- Keep current behavior as fallback if policy env vars are unset.
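
The policy above could be sketched roughly as follows. This is a minimal illustration, not the actual start.sh: every BRIDGE_* variable name, the JSON log fields, and the helper function names are assumptions made for the example. With all variables unset, it falls back to a fixed 5s delay and retries forever, matching the current behavior.

```shell
#!/bin/sh
# Sketch of an adaptive restart policy. All BRIDGE_* names are hypothetical.
BASE_DELAY="${BRIDGE_BACKOFF_BASE:-5}"         # first retry delay, in seconds
MAX_DELAY="${BRIDGE_BACKOFF_MAX:-$BASE_DELAY}" # ceiling; equal to base means fixed delay
STABLE_SECS="${BRIDGE_STABLE_SECS:-60}"        # a run this long resets the failure streak
MAX_FAILURES="${BRIDGE_MAX_FAILURES:-0}"       # 0 means retry forever
JITTER_PCT="${BRIDGE_JITTER_PCT:-0}"           # add up to this percent of extra delay

# backoff_delay ATTEMPT: print BASE_DELAY * 2^(ATTEMPT-1), capped at MAX_DELAY.
backoff_delay() {
  delay=$BASE_DELAY
  i=1
  while [ "$i" -lt "$1" ] && [ "$delay" -lt "$MAX_DELAY" ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  if [ "$delay" -gt "$MAX_DELAY" ]; then delay=$MAX_DELAY; fi
  echo "$delay"
}

# add_jitter DELAY: print DELAY plus a random 0..(DELAY * JITTER_PCT / 100) seconds.
add_jitter() {
  span=$(( $1 * JITTER_PCT / 100 ))
  if [ "$span" -gt 0 ]; then
    extra=$(awk -v n="$((span + 1))" 'BEGIN { srand(); print int(rand() * n) }')
    echo $(( $1 + extra ))
  else
    echo "$1"
  fi
}

# log_event EVENT EXIT_CODE ATTEMPT DELAY: emit one JSON line on stderr.
log_event() {
  printf '{"event":"%s","exit_code":%d,"attempt":%d,"delay_s":%d}\n' \
    "$1" "$2" "$3" "$4" >&2
}

# run_with_backoff CMD [ARGS...]: restart CMD until the failure threshold is hit.
run_with_backoff() {
  failures=0
  while true; do
    started=$(date +%s)
    "$@"
    code=$?
    ran=$(( $(date +%s) - started ))
    # A sufficiently long run counts as stable and resets the streak.
    if [ "$ran" -ge "$STABLE_SECS" ]; then failures=0; fi
    failures=$((failures + 1))
    if [ "$MAX_FAILURES" -gt 0 ] && [ "$failures" -ge "$MAX_FAILURES" ]; then
      log_event give_up "$code" "$failures" 0
      return 1
    fi
    wait_s=$(add_jitter "$(backoff_delay "$failures")")
    log_event restart "$code" "$failures" "$wait_s"
    sleep "$wait_s"
  done
}
```

start.sh would then call something like run_with_backoff followed by the bridge command in place of the while true loop. The non-zero return after a give_up event is what an outer supervisor or alert hook could key on.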
Helpful context
- start.sh launches tmux session slack-bridge and restarts the bridge with a hardcoded 5s sleep.
- A broker health file exists (broker-health.json), but the restart policy is still simplistic.