feat: Alert system with webhook notifications #13

@rlaope

Problem

Users currently cannot be notified when critical conditions occur, such as:

  • Pinning events exceeding a threshold
  • A sudden spike in thread creation
  • Thread pool saturation

Applications need real-time alerting on these conditions for production monitoring.

Proposed Solution

Implement an alerting system that:

  1. Monitors metrics against configurable thresholds
  2. Sends notifications via webhooks (Slack, Discord, custom)
  3. Supports multiple alert rules
  4. Provides alert history and acknowledgment

Acceptance Criteria

  • Define alert rule configuration format
  • Implement threshold monitoring
  • Support webhook notifications (HTTP POST)
  • Add Slack-formatted message support
  • Add Discord-formatted message support
  • Implement cooldown period (avoid alert flooding)
  • Add /alerts API for configuration
  • Add alert history endpoint
  • Add a dashboard section for alert management

Alert Rules Configuration

# argus-alerts.yaml
alerts:
  - name: high-pinning-rate
    condition: pinned_events_per_minute > 100
    severity: critical
    cooldown: 5m
    notifications:
      - type: webhook
        url: https://hooks.slack.com/services/xxx
        
  - name: thread-spike
    condition: active_threads > 10000
    severity: warning
    cooldown: 1m
    notifications:
      - type: webhook
        url: https://discord.com/api/webhooks/xxx
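
The cooldown field is what keeps a rule from flooding a channel while its condition stays true. A minimal sketch of that gate in Python follows. The names parse_cooldown and CooldownGate are hypothetical, and support for the "s" and "h" suffixes is an assumption (the sample config only uses minutes).

```python
import re
from dataclasses import dataclass, field

# Seconds per unit. Only "m" appears in the sample config; "s" and "h" are assumed.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_cooldown(text: str) -> int:
    """Convert a duration such as '5m' into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported cooldown: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


@dataclass
class CooldownGate:
    """Remembers when each rule last fired and suppresses repeats inside the cooldown."""

    last_fired: dict = field(default_factory=dict)

    def should_fire(self, rule_name: str, cooldown_s: int, now: float) -> bool:
        previous = self.last_fired.get(rule_name)
        if previous is not None and now - previous < cooldown_s:
            return False  # still cooling down: do not notify again
        self.last_fired[rule_name] = now
        return True
```

Passing the current time in as an argument, rather than reading the clock inside the gate, keeps the logic easy to unit test. Note that a suppressed check does not reset the timer, so the next notification is sent one full cooldown after the last one that actually went out.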

API Design

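GET /alerts/rules                    // List alert rules
POST /alerts/rules                   // Create an alert rule
DELETE /alerts/rules/{id}            // Delete an alert rule

GET /alerts/history?from=...&to=...  // Query alert history for a time range

POST /alerts/test/{ruleId}           // Send a test notification for a rule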

Alert Payload (Webhook)

{
  "alertName": "high-pinning-rate",
  "severity": "critical",
  "triggeredAt": "2024-01-01T12:00:00Z",
  "condition": "pinned_events_per_minute > 100",
  "currentValue": 156,
  "threshold": 100,
  "message": "High pinning rate detected: 156 events/min (threshold: 100)"
}
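
As a rough illustration of how this payload could be assembled and delivered, the Python sketch below builds the dictionary and sends it with an HTTP POST using only the standard library. The field names come from the payload above. The function names build_payload and post_json, the 5-second timeout, and the Content-Type header are assumptions.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_payload(alert_name, severity, condition, current_value,
                  threshold, message, triggered_at):
    """Assemble the webhook payload using the field names shown above."""
    return {
        "alertName": alert_name,
        "severity": severity,
        # Render the trigger time as UTC in ISO 8601 with a trailing "Z".
        "triggeredAt": triggered_at.astimezone(timezone.utc)
                                   .strftime("%Y-%m-%dT%H:%M:%SZ"),
        "condition": condition,
        "currentValue": current_value,
        "threshold": threshold,
        "message": message,
    }


def post_json(url, payload, timeout=5.0):
    """POST the payload as JSON and return the HTTP status code.

    urllib raises an HTTPError for 4xx and 5xx responses, so callers
    should catch it if a failed delivery must not stop monitoring.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

The same post_json helper could back the test-notification endpoint, since a test is just a normal delivery with a synthetic payload.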

Slack Format

{
  "attachments": [
    {
      "color": "#ff0000",
      "title": "🚨 Argus Alert: high-pinning-rate",
      "text": "High pinning rate detected",
      "fields": [
        { "title": "Current Value", "value": "156/min", "short": true },
        { "title": "Threshold", "value": "100/min", "short": true }
      ],
      "ts": 1704110400
    }
  ]
}
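
One way to produce this Slack shape is to derive it from the generic webhook payload. The Python sketch below does that under a few assumptions: the function name to_slack is hypothetical, the "warning" and fallback colors are invented (only the red for "critical" appears above), the unit suffix is passed in by the caller, and the full message is used as the attachment text, whereas the sample above shows a shorter one.

```python
from datetime import datetime, timezone

# Only the "critical" color appears in the sample; the others are assumptions.
SEVERITY_COLORS = {"critical": "#ff0000", "warning": "#ffcc00"}
DEFAULT_COLOR = "#cccccc"


def to_slack(payload, unit="/min"):
    """Convert the generic alert payload into a Slack attachment message."""
    triggered = datetime.strptime(
        payload["triggeredAt"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return {
        "attachments": [
            {
                "color": SEVERITY_COLORS.get(payload["severity"], DEFAULT_COLOR),
                "title": f"🚨 Argus Alert: {payload['alertName']}",
                "text": payload["message"],
                "fields": [
                    {"title": "Current Value",
                     "value": f"{payload['currentValue']}{unit}", "short": True},
                    {"title": "Threshold",
                     "value": f"{payload['threshold']}{unit}", "short": True},
                ],
                # Slack expects "ts" as Unix epoch seconds.
                "ts": int(triggered.timestamp()),
            }
        ]
    }
```

Keeping the formatter as a pure function from payload to message means a Discord formatter could sit next to it and be selected per notification target.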

UI Mockup

┌─────────────────────────────────────────────────────────────┐
│ Alert Rules                                    [+ Add Rule] │
├─────────────────────────────────────────────────────────────┤
│ ● high-pinning-rate          CRITICAL    [Edit] [Delete]   │
│   pinned_events_per_minute > 100                            │
│   Last triggered: 5 min ago                                 │
│                                                             │
│ ● thread-spike               WARNING     [Edit] [Delete]   │
│   active_threads > 10000                                    │
│   Never triggered                                           │
└─────────────────────────────────────────────────────────────┘

Supported Conditions

  • pinned_events_per_minute > N
  • active_threads > N
  • events_per_second > N
  • carrier_utilization > N%
  • submit_failed_count > N
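
Because every supported condition has the form "metric > N", parsing and evaluation can stay very small. The Python sketch below is one possible approach, not the planned implementation. The names Condition, parse_condition and is_triggered are hypothetical, and it assumes a percentage threshold such as "80%" is compared against a metric already expressed on a 0 to 100 scale.

```python
import re
from typing import NamedTuple

# Metric names taken from the list above.
SUPPORTED_METRICS = {
    "pinned_events_per_minute",
    "active_threads",
    "events_per_second",
    "carrier_utilization",
    "submit_failed_count",
}

# metric, ">", a number, and an optional "%" suffix.
_CONDITION = re.compile(r"\s*(\w+)\s*>\s*(\d+(?:\.\d+)?)\s*%?\s*")


class Condition(NamedTuple):
    metric: str
    threshold: float


def parse_condition(text: str) -> Condition:
    """Parse a string such as 'active_threads > 10000' into a Condition."""
    match = _CONDITION.fullmatch(text)
    if match is None:
        raise ValueError(f"cannot parse condition: {text!r}")
    metric, number = match.groups()
    if metric not in SUPPORTED_METRICS:
        raise ValueError(f"unsupported metric: {metric!r}")
    return Condition(metric, float(number))


def is_triggered(condition: Condition, metrics: dict) -> bool:
    """Return True when the current metric value strictly exceeds the threshold."""
    value = metrics.get(condition.metric)
    # A metric with no current sample never triggers an alert.
    return value is not None and value > condition.threshold
```

For example, parse_condition("pinned_events_per_minute > 100") yields a condition that triggers for a snapshot value of 156 but not for exactly 100, matching the strict ">" in the rule syntax.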


Labels: enhancement (New feature or request)
