feat: add observability stack and kubelogstream to coder app#113

Open
sharkymark wants to merge 16 commits into main from mm/coder-observe-kubelogstream

@sharkymark (Contributor)

Adds comprehensive monitoring and Kubernetes event streaming for better visibility into Coder deployments and workspace troubleshooting.

Components:

  • Kubelogstream: streams pod/event logs to workspace startup logs
  • Observability: full stack with Prometheus, Grafana, Loki, Alertmanager

Changes:

  • Add component 5 (kubelogstream) with Helm values
  • Add component 6 (observability) with full monitoring stack
  • Configure Coder to expose Prometheus metrics and agent stats
  • Add coder-observability namespace to sandbox
  • Enhance RDS secrets action to create secrets in both namespaces
  • Use ebs-auto storage class for all persistent volumes

sharkymark and others added 7 commits February 22, 2026 10:52
Adds comprehensive monitoring and Kubernetes event streaming for better visibility into Coder deployments and workspace troubleshooting.

Components:
- Kubelogstream: streams pod/event logs to workspace startup logs
- Observability: full stack with Prometheus, Grafana, Loki, Alertmanager

Changes:
- Add component 5 (kubelogstream) with Helm values
- Add component 6 (observability) with full monitoring stack
- Configure Coder to expose Prometheus metrics and agent stats
- Add coder-observability namespace to sandbox
- Enhance RDS secrets action to create secrets in both namespaces
- Use ebs-auto storage class for all persistent volumes

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Enables public access to Grafana dashboards via subdomain with proper authentication instead of port-forwarding.

Changes:
- Generate random Grafana admin password in RDS secrets action
- Store credentials in AWS Secrets Manager (grafana-admin-{install-id})
- Add grafana_password action to retrieve credentials for admins
- Configure Grafana ingress for subdomain: grafana.{install-id}.nuon.run
- Disable anonymous authentication (require login)
- Update README with step-by-step access instructions

Admin retrieves credentials by running grafana_password action in Nuon UI, then logs in at the subdomain with username 'admin' and the generated password.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Removes manual CNAME creation step by adding wildcard domain to external-dns annotation in ALB ingress.

Changes:
- Add *.{domain} to external-dns hostname annotation in ALB template
- Update README to note wildcard DNS is now automatic
- Enables workspace web apps and port forwarding without manual DNS config

external-dns now creates both the main domain and wildcard CNAME records automatically, pointing to the ALB DNS name.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
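
For reference, a minimal sketch of the annotation value this commit describes, assuming the ALB template joins the main domain and its wildcard into one comma-separated external-dns hostname annotation. The helper name and the example domain are hypothetical; only the `*.{domain}` pattern comes from the commit message.

```python
# Hypothetical helper: builds the value for the external-dns hostname
# annotation so that both the apex record and the wildcard are requested.
def external_dns_hostnames(domain: str) -> str:
    return f"{domain},*.{domain}"


# "coder.example.com" is a placeholder, not a value from this PR.
annotations = {
    "external-dns.alpha.kubernetes.io/hostname": external_dns_hostnames(
        "coder.example.com"
    )
}
```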
Changes Grafana from subdomain (new ALB) to path-based routing on existing Coder ALB, reducing cost and complexity.

Changes:
- Add group.name annotation to Coder ALB for sharing
- Configure Grafana ingress to join same ALB group
- Serve Grafana from /grafana path with serve_from_sub_path
- Update URLs in README and password action
- Set group.order=200 to route after Coder paths

Result: One ALB instead of two, saves ~$20/month. Grafana accessible at https://{domain}/grafana.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Deletes remove-gp2-default action that was only needed for troubleshooting a previous storage class issue.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…lback

Makes the storage class the default from the start rather than relying on an action.

Changes:
- Set is_default_class=true in sandbox.tfvars
- Remove automatic trigger from default-storage-class action
- Keep action as manual-only for troubleshooting if needed

Storage class is now default from sandbox creation, with manual action available if ever needed to re-apply.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…g env var

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sharkymark force-pushed the mm/coder-observe-kubelogstream branch from cc2dca2 to 0560f22 on February 22, 2026 18:19
sharkymark and others added 9 commits February 22, 2026 15:10
Replace endpoint/port with address/db_instance_port to match actual
rds_cluster_coder outputs, fixing template rendering failure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
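
As an illustration of why the old keys broke rendering, here is a small analogy using Python's strict `string.Template` substitution. It is not the templating engine used by this repo. The connection string shape and the sample values are assumptions; only the key names `endpoint`, `port`, `address`, and `db_instance_port` come from the commit message.

```python
from string import Template

# Hypothetical output map: only the key names are taken from the commit.
rds_outputs = {"address": "db.example.internal", "db_instance_port": "5432"}

# Assumed template shapes, before and after the fix.
old_template = Template("postgres://$endpoint:$port/coder")
new_template = Template("postgres://$address:$db_instance_port/coder")


def render(template: Template, outputs: dict) -> str:
    """Render strictly: a placeholder with no matching output raises KeyError."""
    return template.substitute(outputs)


try:
    render(old_template, rds_outputs)
    old_error = None
except KeyError as exc:
    # The first missing placeholder is reported.
    old_error = exc.args[0]

rendered = render(new_template, rds_outputs)
```

With the old placeholders the render fails on the first name that the outputs do not provide, while the renamed placeholders resolve cleanly.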
…edence

Without group.order, the main ALB's catch-all '/' rule (priority 0)
intercepts all traffic including /grafana before Grafana's ingress rule
can match. Setting group.order=1000 ensures more specific paths from
other ingresses (e.g. Grafana at /grafana) are evaluated first.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
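
The ordering problem above can be sketched as a toy model, assuming rules from all ingresses in a group are merged in ascending `group.order` (unset means 0) and the first matching rule wins. The `Ingress` class and the `matches` and `route` helpers are hypothetical and simplify each ingress to a single path prefix; the order values 0, 200, and 1000 are the ones mentioned in this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ingress:
    name: str
    group_order: int  # value of group.order; 0 when the annotation is unset
    path_prefix: str  # one prefix rule per ingress, e.g. "/" or "/grafana"


def matches(prefix: str, path: str) -> bool:
    """Prefix match on whole path segments; "/" matches everything."""
    if prefix == "/":
        return True
    return path == prefix or path.startswith(prefix.rstrip("/") + "/")


def route(ingresses, request_path):
    """Return the name of the ingress whose rule is evaluated first and matches."""
    ordered = sorted(ingresses, key=lambda i: (i.group_order, i.name))
    for ingress in ordered:
        if matches(ingress.path_prefix, request_path):
            return ingress.name
    return None


# Before the fix: Coder's catch-all sits at the default order 0.
before = [Ingress("coder", 0, "/"), Ingress("grafana", 200, "/grafana")]
# After the fix: Coder's catch-all is pushed to the end of the group.
after = [Ingress("coder", 1000, "/"), Ingress("grafana", 200, "/grafana")]

before_target = route(before, "/grafana/login")  # "coder"
after_target = route(after, "/grafana/login")    # "grafana"
```

In this model the catch-all only stops shadowing `/grafana` once its order is higher than Grafana's, which matches the behavior the commit describes.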
target-type is not inherited from the group leader ingress; each
ingress must declare it. Without it the controller defaults to instance
mode. Because Grafana's ClusterIP service has no NodePort, the port
resolves to 0 and CreateTargetGroup fails.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Without this annotation the ALB controller places the /grafana rule on
the HTTP:80 listener only. The main ALB ingress uses HTTPS:443, so
/grafana was never matched on that listener — Coder's /* caught it first.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The upstream coder-observability chart enforces password complexity.
Generated passwords (base64, no special chars) fail the policy,
blocking both initial login and grafana-cli password reset.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
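
A minimal sketch of generating a password that includes special characters. The PR does not state the exact complexity policy, so this assumes it requires at least one lowercase letter, one uppercase letter, one digit, and one special character. The `SPECIALS` set and the function name are hypothetical.

```python
import secrets
import string

# Assumed set of allowed special characters; the real policy may differ.
SPECIALS = "!@#%^*-_=+"


def generate_password(length: int = 24) -> str:
    """Return a random password with one character from each class guaranteed."""
    if length < 4:
        raise ValueError("length must be at least 4")
    classes = [
        string.ascii_lowercase,
        string.ascii_uppercase,
        string.digits,
        SPECIALS,
    ]
    # One guaranteed character per class, then fill from the full alphabet.
    chars = [secrets.choice(cls) for cls in classes]
    alphabet = "".join(classes)
    chars += [secrets.choice(alphabet) for _ in range(length - len(chars))]
    # Shuffle so the guaranteed characters are not always at the front.
    secrets.SystemRandom().shuffle(chars)
    return "".join(chars)
```

Drawing from the `secrets` module rather than `random` keeps the output suitable for credentials.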
…tion

The upstream chart sets GF_SECURITY_DISABLE_INITIAL_ADMIN_CREATION=true
which prevents Grafana from creating the admin user from the existingSecret
env vars. Override to false so admin is created from grafana-admin secret
on first start.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extract Grafana admin credential creation from rds_secrets into its own
grafana_setup action, triggered post-deploy of coder component. This
ensures the grafana-admin secret exists before observability deploys,
and keeps each action focused on a single concern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Trigger grafana_setup directly before observability deploys rather than
post-coder; pre-deploy is the correct lifecycle hook for pre-seeding secrets.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- rds_cluster_coder depends on rds_subnet (uses its subnet group id)
- application_load_balancer depends on certificate (uses its ARN)
- observability depends on application_load_balancer (Grafana joins ALB group)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
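
The declared dependencies imply a deploy order, which can be sketched with Python's standard `graphlib`. This only illustrates the ordering constraint; it is not how the platform resolves dependencies. The component names come from the commit message, and the mapping covers only the three edges listed there.

```python
from graphlib import TopologicalSorter

# component -> components it depends on, as listed in the commit message
dependencies = {
    "rds_cluster_coder": {"rds_subnet"},
    "application_load_balancer": {"certificate"},
    "observability": {"application_load_balancer"},
}

# Any valid order places each dependency before the component that needs it.
deploy_order = list(TopologicalSorter(dependencies).static_order())
```

Several orders satisfy these edges, so the sketch only guarantees the relative positions, for example `certificate` before `application_load_balancer` before `observability`.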