Observability: Metrics, Traces, Logging, SLO Dashboards, Alerting #103

@IgnacioPro

Description

Production readiness requires Prometheus metrics, OpenTelemetry (OTel) traces, log correlation, trace sampling control, and default Grafana dashboards with SLO-driven alerts (latency, error rate, agent health, AI cost).

Paths to start:

  • Metrics: expose Prometheus endpoints in the API and agent; see any existing metrics.go, or wire the endpoint up in main.go
  • Traces: add distributed tracing and sampling logic to the API, the agent, and key flows (internal/diagnostics/; circuit breaker layer, ref: CLAUDE.md)
  • Logging: add context IDs to log entries and correlate them across components; see the logrus config
  • Dashboards/alerts: publish default Grafana dashboard JSON, plus SLO recipes for latency, errors, agent heartbeat, and AI cost budget
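
As a starting point for the SLO recipes in the last bullet, here is a minimal sketch of a burn-rate check for an error-rate SLO. It uses only the standard library and is not tied to this repo. The function names (`BurnRate`, `ShouldPage`), the 99.9% target, the sample counts, and the 14.4 threshold are all illustrative assumptions. In practice the same arithmetic would probably live in Prometheus alerting rules rather than in Go.

```go
package main

import "fmt"

// BurnRate reports how fast the error budget is being consumed:
// the observed error ratio divided by the allowed error ratio (1 - sloTarget).
// A value of 1.0 means the budget lasts exactly as long as the SLO window.
// sloTarget is assumed to be in (0, 1), e.g. 0.999. Hypothetical helper.
func BurnRate(errors, total, sloTarget float64) float64 {
	if total == 0 {
		return 0 // no traffic, nothing burned
	}
	budget := 1 - sloTarget
	return (errors / total) / budget
}

// ShouldPage fires only when both a long and a short window exceed the
// threshold, so an alert stops soon after the error rate recovers.
// Hypothetical helper; the threshold is chosen by the caller.
func ShouldPage(longWindow, shortWindow, threshold float64) bool {
	return longWindow > threshold && shortWindow > threshold
}

func main() {
	const target = 0.999   // assumed SLO: 99.9% of requests succeed
	const threshold = 14.4 // assumed paging threshold

	long := BurnRate(150, 10000, target) // e.g. counts over the last hour
	short := BurnRate(20, 1000, target)  // e.g. counts over the last 5 minutes

	fmt.Printf("long=%.1f short=%.1f page=%v\n",
		long, short, ShouldPage(long, short, threshold))
}
```

The same shape could cover the latency SLO by counting requests slower than a threshold as "errors". Agent heartbeat and AI cost budget would need their own indicators.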

Integrate all of the above with the key flows (diagnostics, remediation, agent lifecycle).
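
One possible way to carry a context ID through these flows is an HTTP middleware that stores the ID in the request context. This is a standard-library sketch only. The header name `X-Correlation-ID` and the helpers `WithCorrelationID`, `CorrelationID`, and `demo` are assumptions, not existing code in this repo. With logrus, the ID returned by `CorrelationID` could be attached to each entry via `WithField`.

```go
package main

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// headerName is a hypothetical header; use whatever the API and agent agree on.
const headerName = "X-Correlation-ID"

// ctxKey is an unexported key type so other packages cannot collide with it.
type ctxKey struct{}

// newID returns a random 16-character hex string.
func newID() string {
	b := make([]byte, 8)
	if _, err := rand.Read(b); err != nil {
		return "unknown"
	}
	return hex.EncodeToString(b)
}

// WithCorrelationID reuses the caller's ID if present, otherwise generates
// one, echoes it on the response, and stores it in the request context.
func WithCorrelationID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get(headerName)
		if id == "" {
			id = newID()
		}
		w.Header().Set(headerName, id)
		ctx := context.WithValue(r.Context(), ctxKey{}, id)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// CorrelationID returns the ID stored in ctx, or "" if none was set.
func CorrelationID(ctx context.Context) string {
	id, _ := ctx.Value(ctxKey{}).(string)
	return id
}

// demo sends one in-memory request through the middleware and returns the body.
func demo() string {
	h := WithCorrelationID(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// A real handler would pass r.Context() down into the flow it starts.
		fmt.Fprintf(w, "correlation_id=%s", CorrelationID(r.Context()))
	}))
	req := httptest.NewRequest(http.MethodGet, "/diagnostics", nil)
	req.Header.Set(headerName, "abc123")
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Body.String()
}

func main() {
	fmt.Println(demo())
}
```

Passing the ID through `context.Context` means any function that already accepts a context can log or forward it without a signature change.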

References: internal/diagnostics/, CLAUDE.md, Grafana dashboards, main.go.

Metadata

Labels: enhancement (New feature or request)
