@cybaxx (Collaborator) commented Dec 1, 2025

LGTM Stack Improvements & Observability Fixes

Summary

This PR delivers a structured set of fixes and enhancements across the Loki, Grafana, Tempo, and Prometheus stack to stabilize the observability environment and clean up supporting configuration across the repo.


Highlights

Loki

  • Fixed WAL/index/chunk directory issues and storage permission errors.
  • Refactored loki-config.yaml with clearer storage paths and single-node lifecycler settings.
  • Added missing persistent volume mounts and validated directory structure.
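
For reviewers unfamiliar with single-node Loki, the shape of such a config might look roughly like the sketch below. This is not the actual loki-config.yaml from this PR: the `/loki` paths, the port, the schema date, and the schema version are assumptions, and the schema block in particular depends on the Loki release in use.

```yaml
# Illustrative single-node Loki config (assumed values, not the file in this PR)
auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /loki              # assumed mount point; must be writable by the Loki user
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules

ingester:
  wal:
    enabled: true
    dir: /loki/wal                # WAL directory on the persistent volume
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory           # no external KV store for a single instance
      replication_factor: 1       # single node, so no replication
    final_sleep: 0s

schema_config:
  configs:
    - from: 2024-01-01            # placeholder start date
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h
```

Keeping every data directory under one mounted path means a single volume and a single ownership fix cover the WAL, index, and chunk directories.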

Grafana

  • Corrected and validated datasources for Prometheus, Loki, Node Exporter, and Tempo.
  • Resolved issues where Grafana couldn’t query Node Exporter despite Prometheus successfully scraping it.
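
As context for the Node Exporter item: in a typical setup, Node Exporter metrics reach Grafana through the Prometheus datasource, so the Prometheus scrape job is the first thing to check. A minimal scrape job might look like the sketch below. The job name and the `node-exporter:9100` target are assumptions (9100 is Node Exporter's default port) and may not match the names used in this repo.

```yaml
# Illustrative prometheus.yml fragment (assumed job and service names)
scrape_configs:
  - job_name: node-exporter
    static_configs:
      - targets: ["node-exporter:9100"]   # compose service name and default port, assumed
```

With a job like this, Node Exporter series would be queried in Grafana via the Prometheus datasource, filtered by the `job` label.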

Tempo

  • Created a functional tempo-config.yaml for docker-compose.
  • Fixed port conflicts and startup errors across receiver, ingester, and distributor modules.
  • Achieved consistent service initialization.
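
A minimal Tempo config for a docker-compose setup could be sketched as follows. This is an assumption-laden illustration rather than the tempo-config.yaml added here: the `/var/tempo` paths and the OTLP receiver choice are hypothetical, and the ports shown are the conventional defaults.

```yaml
# Illustrative tempo-config.yaml (assumed values, not the file in this PR)
server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317   # OTLP over gRPC
        http:
          endpoint: 0.0.0.0:4318   # OTLP over HTTP

storage:
  trace:
    backend: local                 # local disk, suitable for a single-node stack
    wal:
      path: /var/tempo/wal
    local:
      path: /var/tempo/blocks
```

Listing each listener explicitly makes port clashes with other services in the same compose project easier to spot.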

Docker Compose

  • Removed deprecated version: directive.
  • Improved service dependency ordering and ensured correct mounts for Loki and Tempo.
  • Updated component images to stable, known-working tags.
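
The compose changes described above can be sketched roughly as below. Service names, volume names, mount paths, and the `*_TAG` variables are hypothetical; the variables stand in for whichever pinned tags this PR settled on, which are not reproduced here.

```yaml
# Illustrative docker-compose.yml (no top-level "version:" key; names are assumed)
services:
  loki:
    image: "grafana/loki:${LOKI_TAG:?set LOKI_TAG to a pinned tag}"
    command: ["-config.file=/etc/loki/loki-config.yaml"]
    volumes:
      - ./loki-config.yaml:/etc/loki/loki-config.yaml:ro
      - loki-data:/loki            # persistent data directory
    ports:
      - "3100:3100"

  tempo:
    image: "grafana/tempo:${TEMPO_TAG:?set TEMPO_TAG to a pinned tag}"
    command: ["-config.file=/etc/tempo/tempo-config.yaml"]
    volumes:
      - ./tempo-config.yaml:/etc/tempo/tempo-config.yaml:ro
      - tempo-data:/var/tempo      # persistent WAL and blocks
    ports:
      - "3200:3200"
      - "4317:4317"

  grafana:
    image: "grafana/grafana:${GRAFANA_TAG:?set GRAFANA_TAG to a pinned tag}"
    depends_on:                    # start the backends before Grafana
      - loki
      - tempo
    ports:
      - "3000:3000"

volumes:
  loki-data:
  tempo-data:
```

The `${VAR:?message}` form makes Compose fail fast when a tag is not set, which avoids silently falling back to `latest`.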

Repo & Infrastructure Cleanup

  • Ensured required directories exist for Loki and Tempo persistence.
  • Reduced log noise and improved configuration comments/documentation.
  • Validated end-to-end metrics, logs, and traces flow across the stack.

Next Steps

  • Automate Grafana datasource and dashboard provisioning.
  • Add baseline dashboards for metrics/logs/traces correlation.
  • Explore long-term storage backend options for Loki (e.g., S3/MinIO).
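
As a starting point for the provisioning item, a Grafana datasource provisioning file could look like the sketch below. It is a proposal only and is not part of this PR; the file location and the service hostnames and ports in the URLs are assumptions.

```yaml
# Illustrative file, e.g. mounted under /etc/grafana/provisioning/datasources/ (assumed names)
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true

  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100

  - name: Tempo
    type: tempo
    access: proxy
    url: http://tempo:3200
```

Grafana loads files from its provisioning directory at startup, so the datasources would no longer need to be configured by hand after a rebuild.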
