Skip to content

runtime: make Slack bridge restart policy adaptive (backoff/jitter/failure threshold) #121

@benvinegar

Description

@benvinegar

Problem

start.sh runs the Slack bridge in a while true loop with a fixed 5s restart delay. Persistent faults can produce noisy restart storms and ambiguous operational state.

Proposed solution

  • Replace fixed-delay infinite restart with adaptive policy:
    • exponential backoff (+ jitter)
    • reset backoff after stable run window
    • max-consecutive-failure threshold with clear alert/status signal
  • Emit structured restart reason/attempt logs for easier triage.
  • Keep current behavior as fallback if policy env vars are unset.

Helpful context

  • start.sh launches tmux session slack-bridge and restarts bridge with hardcoded 5s sleep.
  • Broker health file exists (broker-health.json) but restart policy is still simplistic.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions