Problem
Production deployments have zero visibility into system health, performance, and usage patterns. There's no way to monitor request latency, cache hit rates, verification success rates, or set up alerting.
flowchart LR
subgraph Current
G[Gateway] --> L[log.Printf]
V[Verifier] --> L2[println!]
L --> N[Nothing]
L2 --> N
end
style N fill:#f66
What We Can't Answer Today
| Question | Current Answer |
|---|---|
| What's the p99 latency? | Unknown |
| How many requests per minute? | Unknown |
| Cache hit rate? | Unknown |
| Signature verification success rate? | Unknown |
| Is the verifier healthy? | Check logs manually |
Solution
Add Prometheus-compatible /metrics endpoints to both Gateway (Go) and Verifier (Rust) services, exposing standard RED metrics (Rate, Errors, Duration).
flowchart LR
subgraph Services
G["Gateway :3000"] --> M1["/metrics"]
V["Verifier :3002"] --> M2["/metrics"]
end
subgraph Monitoring
P["Prometheus"] --> M1
P --> M2
P --> GF["Grafana Dashboard"]
end
style M1 fill:#6f6
style M2 fill:#6f6
style GF fill:#6f6
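For orientation, a scrape of `/metrics` returns plain text in the Prometheus exposition format. The sketch below renders a single counter in that format using only the Go standard library. `sample` and `renderCounter` are hypothetical names used for illustration; the services themselves would rely on the client libraries listed under Implementation rather than hand-written output, and label escaping is simplified here (Go's `%q` is close to, but not identical to, the format's escaping rules).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sample is one labelled counter value (hypothetical type for this sketch).
type sample struct {
	labels map[string]string
	value  float64
}

// renderCounter writes the HELP and TYPE header lines followed by one line
// per sample, with labels sorted by name so the output is deterministic.
func renderCounter(name, help string, samples []sample) string {
	var b strings.Builder
	fmt.Fprintf(&b, "# HELP %s %s\n", name, help)
	fmt.Fprintf(&b, "# TYPE %s counter\n", name)
	for _, s := range samples {
		keys := make([]string, 0, len(s.labels))
		for k := range s.labels {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		pairs := make([]string, 0, len(keys))
		for _, k := range keys {
			pairs = append(pairs, fmt.Sprintf("%s=%q", k, s.labels[k]))
		}
		if len(pairs) == 0 {
			fmt.Fprintf(&b, "%s %g\n", name, s.value)
			continue
		}
		fmt.Fprintf(&b, "%s{%s} %g\n", name, strings.Join(pairs, ","), s.value)
	}
	return b.String()
}
```

Rendering `gateway_requests_total` with one sample labelled `method="GET"`, `path="/health"`, `status="200"` and value 2 produces the two header lines followed by `gateway_requests_total{method="GET",path="/health",status="200"} 2`.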
Metrics to Expose
Gateway (Go)
| Metric | Type | Labels | Description |
|---|---|---|---|
| `gateway_requests_total` | Counter | `method, path, status` | Total HTTP requests |
| `gateway_request_duration_seconds` | Histogram | `method, path` | Request latency |
| `gateway_cache_hits_total` | Counter | `path` | Cache hits |
| `gateway_cache_misses_total` | Counter | `path` | Cache misses |
| `gateway_verification_total` | Counter | `result` (success/failure) | Signature verifications |
| `gateway_rate_limit_hits_total` | Counter | `path` | Rate limit rejections |
| `gateway_active_requests` | Gauge | - | Current in-flight requests |
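The cache counters in the table above would be incremented at the point of lookup. As a rough illustration of where those increments belong, here is a standard-library-only sketch; `instrumentedCache` and its methods are hypothetical and do not reflect the existing `cache.go`. In the gateway the two atomic counters would be replaced by calls on the `gateway_cache_hits_total` and `gateway_cache_misses_total` counters.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// instrumentedCache is a hypothetical in-memory cache that counts lookups.
type instrumentedCache struct {
	mu     sync.RWMutex
	items  map[string]string
	hits   atomic.Uint64
	misses atomic.Uint64
}

func newInstrumentedCache() *instrumentedCache {
	return &instrumentedCache{items: make(map[string]string)}
}

// Get looks up key and records exactly one hit or one miss per call.
func (c *instrumentedCache) Get(key string) (string, bool) {
	c.mu.RLock()
	v, ok := c.items[key]
	c.mu.RUnlock()
	if ok {
		c.hits.Add(1)
	} else {
		c.misses.Add(1)
	}
	return v, ok
}

// Set stores value under key; writes are not counted.
func (c *instrumentedCache) Set(key, value string) {
	c.mu.Lock()
	c.items[key] = value
	c.mu.Unlock()
}

// hitRatio returns hits / (hits + misses), or 0 before any lookup.
func (c *instrumentedCache) hitRatio() float64 {
	h, m := c.hits.Load(), c.misses.Load()
	if h+m == 0 {
		return 0
	}
	return float64(h) / float64(h+m)
}
```

Counting hits and misses separately, rather than storing a ratio, matches how Prometheus expects counters to be exposed: the ratio is derived at query time over whatever window is needed.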
Verifier (Rust)
| Metric | Type | Labels | Description |
|---|---|---|---|
| `verifier_requests_total` | Counter | `status` | Total verification requests |
| `verifier_request_duration_seconds` | Histogram | - | Verification latency |
| `verifier_signature_valid_total` | Counter | - | Valid signatures |
| `verifier_signature_invalid_total` | Counter | `reason` | Invalid signatures by error type |
Implementation
Gateway (Go)
Add dependency:
go get github.com/prometheus/client_golang/prometheus
go get github.com/prometheus/client_golang/prometheus/promhttp
File: gateway/metrics.go
package main

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// promauto registers each metric with the default registry,
// which is the registry promhttp.Handler() serves.
var (
	requestsTotal = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "gateway_requests_total",
			Help: "Total number of HTTP requests",
		},
		[]string{"method", "path", "status"},
	)
	requestDuration = promauto.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "gateway_request_duration_seconds",
			Help:    "Request duration in seconds",
			Buckets: prometheus.DefBuckets,
		},
		[]string{"method", "path"},
	)
	cacheHits = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "gateway_cache_hits_total",
			Help: "Total cache hits",
		},
		[]string{"path"},
	)
	cacheMisses = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "gateway_cache_misses_total",
			Help: "Total cache misses",
		},
		[]string{"path"},
	)
	verificationsTotal = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "gateway_verification_total",
			Help: "Total signature verifications",
		},
		[]string{"result"}, // "success" or "failure"
	)
	rateLimitHits = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "gateway_rate_limit_hits_total",
			Help: "Total rate limit rejections",
		},
		[]string{"path"},
	)
	activeRequests = promauto.NewGauge(
		prometheus.GaugeOpts{
			Name: "gateway_active_requests",
			Help: "Number of active requests",
		},
	)
)
File: gateway/middleware.go (add metrics middleware)
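import (
	"strconv"
	"time"

	"github.com/gin-gonic/gin"
)

// MetricsMiddleware records request count, latency, and in-flight requests.
func MetricsMiddleware() gin.HandlerFunc {
	return func(c *gin.Context) {
		start := time.Now()
		// Label by route template (e.g. /users/:id), not the raw URL,
		// so the path label keeps a bounded set of values.
		path := c.FullPath()
		if path == "" {
			path = "unmatched" // no route matched, e.g. a 404
		}
		activeRequests.Inc()
		defer activeRequests.Dec()
		c.Next()
		duration := time.Since(start).Seconds()
		status := strconv.Itoa(c.Writer.Status())
		requestsTotal.WithLabelValues(c.Request.Method, path, status).Inc()
		requestDuration.WithLabelValues(c.Request.Method, path).Observe(duration)
	}
}
File: gateway/main.go (add endpoint)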
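import "github.com/prometheus/client_golang/prometheus/promhttp"

func main() {
	// ... existing setup

	// Metrics endpoint (no auth required). It is registered before r.Use
	// below, so Prometheus scrapes are not counted in the request metrics.
	r.GET("/metrics", gin.WrapH(promhttp.Handler()))

	// Apply metrics middleware to all routes registered after this point
	r.Use(MetricsMiddleware())

	// ... rest of routes
}
Verifier (Rust)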
Add dependencies to Cargo.toml:
[dependencies]
metrics = "0.21"
metrics-exporter-prometheus = "0.12"
File: verifier/src/metrics.rs
use ::metrics::{counter, histogram};
use metrics_exporter_prometheus::{PrometheusBuilder, PrometheusHandle};

/// Installs the global recorder and returns a handle for rendering /metrics.
/// Uses install_recorder() rather than install() so that no separate
/// exporter is started; the existing router serves the output instead.
/// Call once at startup.
pub fn init_metrics() -> PrometheusHandle {
    PrometheusBuilder::new()
        .install_recorder()
        .expect("Failed to install Prometheus recorder")
}

/// Records one verification attempt; `duration` is in seconds.
pub fn record_verification(valid: bool, duration: f64, error_reason: Option<&str>) {
    counter!("verifier_requests_total", 1);
    histogram!("verifier_request_duration_seconds", duration);
    if valid {
        counter!("verifier_signature_valid_total", 1);
    } else {
        let reason = error_reason.unwrap_or("unknown");
        counter!("verifier_signature_invalid_total", 1, "reason" => reason.to_string());
    }
}
Update verifier/src/main.rs:
mod metrics;

use std::future::ready;

#[tokio::main]
async fn main() {
    // Initialize metrics and keep the handle for the /metrics route
    let handle = metrics::init_metrics();

    let app = Router::new()
        .route("/health", get(health))
        .route("/verify", post(verify_signature))
        .route("/metrics", get(move || ready(handle.render())));
    // ... rest of setup
}
Architecture
flowchart TD
subgraph Gateway
G[Request] --> MW[MetricsMiddleware]
MW --> H[Handler]
H --> MW
MW --> M1[Prometheus Registry]
end
subgraph Verifier
V[Request] --> VH[verify_signature]
VH --> M2[Prometheus Registry]
end
subgraph Scraping
P[Prometheus] -->|/metrics| M1
P -->|/metrics| M2
P --> GF[Grafana]
end
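The Gateway half of the diagram (request enters the middleware, the handler runs, control returns to the middleware, and the result is recorded) can be sketched without Gin or `client_golang` to make the control flow explicit. Everything here is hypothetical scaffolding using only the standard library: `registry` stands in for the Prometheus registry, and the raw URL path is used as a label purely for brevity.

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"strconv"
	"sync"
	"sync/atomic"
)

// registry is a hypothetical stand-in for the Prometheus registry.
type registry struct {
	mu       sync.Mutex
	requests map[string]int // key: "METHOD path status"
	inFlight atomic.Int64
}

func newRegistry() *registry {
	return &registry{requests: make(map[string]int)}
}

// statusRecorder captures the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// instrument wraps next so each request passes through the metrics layer
// both before and after the handler runs.
func (reg *registry) instrument(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		reg.inFlight.Add(1)
		defer reg.inFlight.Add(-1)

		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, req)

		key := req.Method + " " + req.URL.Path + " " + strconv.Itoa(rec.status)
		reg.mu.Lock()
		reg.requests[key]++
		reg.mu.Unlock()
	})
}

// count returns how many requests were recorded for one label combination.
func (reg *registry) count(method, path string, status int) int {
	reg.mu.Lock()
	defer reg.mu.Unlock()
	return reg.requests[method+" "+path+" "+strconv.Itoa(status)]
}

// serveOnce sends one in-memory request through an instrumented handler
// that always replies with the given status.
func serveOnce(reg *registry, method, path string, status int) {
	h := reg.instrument(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(status)
	}))
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(method, path, nil))
}
```

Because the status code is only known after the handler returns, the counter is updated on the way back out, which is why the diagram shows an arrow from the handler back to the middleware.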
Acceptance Criteria
Gateway (Go)
- Add `prometheus/client_golang` dependency
- Create `metrics.go` with metric definitions
- Add `MetricsMiddleware` for automatic request instrumentation
- Instrument cache hits/misses in `cache.go`
- Instrument rate limiting in `ratelimit.go`
- Add `/metrics` endpoint with `promhttp.Handler()`
- Add unit tests for metrics middleware
Verifier (Rust)
- Add `metrics` and `metrics-exporter-prometheus` dependencies
- Create `metrics.rs` module
- Instrument `verify_signature` function
- Add `/metrics` endpoint
- Add tests for metrics recording
Documentation
- Update gateway README with metrics documentation
- Update verifier README with metrics documentation
- Add example Prometheus scrape config
- Add example Grafana dashboard JSON (optional)
Environment Variables
# Enable/disable metrics endpoint (default: true)
METRICS_ENABLED=true
# Metrics endpoint path (default: /metrics)
METRICS_PATH=/metrics
Testing
# Gateway
cd gateway && go test -v -run TestMetrics
# Verifier
cd verifier && cargo test
# Manual verification
bun run stack
# Check Gateway metrics
curl http://localhost:3000/metrics | grep gateway_
# Check Verifier metrics
curl http://localhost:3002/metrics | grep verifier_
# Make some requests and verify counters increase
curl -X POST http://localhost:3000/api/ai/summarize -H 'Content-Type: application/json' -d '{"text":"test"}'
curl http://localhost:3000/metrics | grep gateway_requests_total
Example Prometheus Config
# prometheus.yml
scrape_configs:
  - job_name: 'microai-gateway'
    static_configs:
      - targets: ['localhost:3000']
    metrics_path: /metrics
  - job_name: 'microai-verifier'
    static_configs:
      - targets: ['localhost:3002']
    metrics_path: /metrics
Example Queries
# Request rate in requests per second, averaged over the last 5 min
sum(rate(gateway_requests_total[5m]))
# p99 latency across all methods and paths
histogram_quantile(0.99, sum by (le) (rate(gateway_request_duration_seconds_bucket[5m])))
# Cache hit ratio
sum(rate(gateway_cache_hits_total[5m])) /
(sum(rate(gateway_cache_hits_total[5m])) + sum(rate(gateway_cache_misses_total[5m])))
# Verification success rate
sum(rate(verifier_signature_valid_total[5m])) /
sum(rate(verifier_requests_total[5m]))
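To see why bucket boundaries matter for the p99 query, it helps to look at how a quantile is estimated from cumulative buckets. The sketch below is a simplified version of the linear interpolation that `histogram_quantile` applies to classic histograms; it is not Prometheus's own code and ignores edge cases such as negative bounds. `bucket`, `estimateQuantile`, and `exampleBuckets` are hypothetical names.

```go
package main

import (
	"math"
	"sort"
)

// bucket mirrors one *_bucket series: a cumulative count of observations
// less than or equal to upperBound (the "le" label).
type bucket struct {
	upperBound float64
	count      float64
}

// estimateQuantile interpolates linearly inside the bucket containing the
// q-th observation. It returns NaN when the input cannot be interpreted.
func estimateQuantile(q float64, buckets []bucket) float64 {
	if len(buckets) < 2 || q < 0 || q > 1 {
		return math.NaN()
	}
	sorted := append([]bucket(nil), buckets...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].upperBound < sorted[j].upperBound
	})
	last := len(sorted) - 1
	total := sorted[last].count
	if !math.IsInf(sorted[last].upperBound, 1) || total == 0 {
		return math.NaN()
	}
	rank := q * total
	// First finite bucket whose cumulative count reaches the rank.
	i := sort.Search(last, func(i int) bool { return sorted[i].count >= rank })
	if i == last {
		// The rank falls in the +Inf bucket: report the highest finite bound.
		return sorted[last-1].upperBound
	}
	lower, prevCount := 0.0, 0.0
	if i > 0 {
		lower, prevCount = sorted[i-1].upperBound, sorted[i-1].count
	}
	inBucket := sorted[i].count - prevCount
	if inBucket == 0 {
		return lower
	}
	return lower + (sorted[i].upperBound-lower)*(rank-prevCount)/inBucket
}

// exampleBuckets returns cumulative counts for 100 observations.
func exampleBuckets() []bucket {
	return []bucket{
		{upperBound: 0.25, count: 50},
		{upperBound: 0.5, count: 90},
		{upperBound: 1, count: 99},
		{upperBound: math.Inf(1), count: 100},
	}
}
```

With the example data, the median comes out at 0.25 s and the 75th percentile at 0.40625 s, while any quantile that lands in the `+Inf` bucket is capped at the highest finite bound (1 s). The estimate can never be more precise than the bucket layout allows, which is worth keeping in mind when relying on `prometheus.DefBuckets`.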