@TheiLLeniumStudios commented on Jan 6, 2026

Introduced a load testing framework for A/B performance comparison and added comprehensive Prometheus metrics to Reloader to support observability and performance validation.

New Prometheus Metrics

Added metrics for detailed observability:

| Metric | Description |
| --- | --- |
| reloader_reconcile_total | Reconcile/handler invocations by result |
| reloader_reconcile_duration_seconds | Time spent in reconcile (histogram) |
| reloader_action_total | Reload actions by workload kind and result |
| reloader_action_latency_seconds | End-to-end latency from event to reload (histogram) |
| reloader_workloads_scanned_total | Workloads scanned by kind |
| reloader_workloads_matched_total | Workloads matched for reload by kind |
| reloader_errors_total | Errors by type |
| reloader_skipped_total | Skipped operations by reason |
| reloader_workqueue_depth | Current work queue depth |
| reloader_workqueue_adds_total | Items added to queue |
| reloader_workqueue_latency_seconds | Queue wait time (histogram) |
| reloader_events_received_total | Events received by type |
| reloader_events_processed_total | Events processed by type and result |
| rest_client_requests_total | Kubernetes API requests by code/method/host |
| rest_client_request_duration_seconds | API request latency (histogram) |

Load Test Framework

Added test/loadtest/ with:

  • CLI tool supporting run and report commands
  • Parallel execution on multiple kind clusters
  • Prometheus-based metrics collection and comparison

Test Scenarios (S1-S13)

| ID | Scenario | What It Tests |
| --- | --- | --- |
| S1 | Burst Updates | Rapid ConfigMap/Secret changes |
| S2 | Fan-Out | 1 ConfigMap → 50 workload reloads |
| S3 | High Cardinality | Many resources across namespaces |
| S4 | No-Op Updates | Annotation-only changes (no reload) |
| S5 | Workload Churn | Rapid Deployment create/delete |
| S6 | Controller Restart | Resilience under load |
| S7 | API Pressure | Concurrent update requests |
| S8 | Large Objects | ConfigMaps >100KB |
| S9 | Multi-Workload | Deployment, StatefulSet, DaemonSet |
| S10 | Secrets + Mixed | Secret and mixed workloads |
| S11 | Annotation Strategy | --reload-strategy=annotations |
| S12 | Pause & Resume | Pause-period behavior |
| S13 | Complex References | Init containers, valueFrom, projected volumes |

CI Pipeline

GitHub Actions workflow (.github/workflows/loadtest.yml) triggered by /loadtest PR comment:

  1. Builds images from base branch (old) and PR branch (new)
  2. Creates kind cluster with Prometheus
  3. Runs all scenarios comparing both images
  4. Posts results as PR comment with pass/fail for each scenario
  5. Uploads detailed results as artifacts

Pass/Fail Criteria

  • Throughput: action_total, reload_executed_total within 15% of expected
  • Latency: p50/p95/p99 not regressed beyond thresholds
  • Errors: New implementation ≤ old implementation
  • API Efficiency: rest_client_requests_* not significantly higher

The existing pull_request workflow runs scenarios S1, S4, and S6 after the regular tests. The full load test suite can be run on a PR by commenting /loadtest.

@TheiLLeniumStudios changed the title from "feat: Load tests" to "feat: Add load test framework with observability metrics" on Jan 6, 2026
faizanahmad055 previously approved these changes Jan 8, 2026
@TheiLLeniumStudios merged commit 3dd2741 into stakater:master on Jan 9, 2026
5 checks passed