A bank enterprise microservices demo with 9 Vert.x services profiled by the Pyroscope Java agent — no application code changes required. Integrates with Prometheus, Grafana (6 dashboards), and alert rules.
```bash
git clone <this-repo> && cd pyroscope
bash scripts/run.sh              # deploy + load + validate + data check (quiet mode)

# Wait for the "Ready!" banner, then:
#   Grafana:   http://localhost:3000 (admin/admin) → dashboards are pre-loaded
#   Pyroscope: http://localhost:4040 → select any bank-* service → flame graphs

# Ctrl-C to stop load, then:
bash scripts/run.sh teardown     # clean up
```

```mermaid
graph TB
    subgraph Bank Microservices
        GW[API Gateway :18080]
        OS[Order Service :18081]
        PS[Payment Service :18082]
        FS[Fraud Service :18083]
        AS[Account Service :18084]
        LS[Loan Service :18085]
        NS[Notification Service :18086]
        SS[Stream Service :18087]
        FA[FaaS Server :8088]
    end
    subgraph Observability Stack
        PY[Pyroscope :4040]
        PR[Prometheus :9090]
        GF[Grafana :3000]
    end
    GW & OS & PS & FS & AS & LS & NS & SS & FA -->|JFR profiles| PY
    GW & OS & PS & FS & AS & LS & NS & SS & FA -->|/metrics| PR
    PY -->|flame graphs| GF
    PR -->|metrics + alerts| GF
    LG[generate-load.sh] -->|traffic| GW & OS & PS & FS & AS & LS & NS & SS & FA
    style PY fill:#f59e0b,color:#000
    style GF fill:#3b82f6,color:#fff
    style PR fill:#ef4444,color:#fff
```
All 9 services are built from the same Docker image. The VERTICLE environment variable selects which class runs. Pyroscope profiles each one independently.
| Service | Port | Verticle | Pyroscope Name | Profiling Signature |
|---|---|---|---|---|
| API Gateway | 18080 | MainVerticle | bank-api-gateway | Recursive fibonacci, batch processing, serialization |
| Order Service | 18081 | OrderVerticle | bank-order-service | String concatenation, synchronized blocks (lock contention) |
| Payment Service | 18082 | PaymentVerticle | bank-payment-service | BigDecimal math, SHA-256 hashing, synchronized ledger |
| Fraud Detection | 18083 | FraudDetectionVerticle | bank-fraud-service | Regex rule engine, statistical analysis, sliding window |
| Account Service | 18084 | AccountVerticle | bank-account-service | Stream API filtering, BigDecimal interest calc, ConcurrentHashMap |
| Loan Service | 18085 | LoanVerticle | bank-loan-service | Amortization schedules, Monte Carlo simulation, portfolio aggregation |
| Notification | 18086 | NotificationVerticle | bank-notification-service | Template rendering (String.format), queue drain loops, exponential backoff |
| Stream Service | 18087 | StreamVerticle | bank-stream-service | Reactive streams, backpressure handling, event processing |
| FaaS Server | 8088 | FaasVerticle | bank-faas-server | Function deploy/undeploy lifecycle, cold starts, warm pools |
Each service has deliberately different CPU, memory, and lock characteristics so flame graphs show distinct patterns when compared side by side.
API Gateway (:18080) — 17 endpoints
| Endpoint | Category |
|---|---|
| /cpu | Recursive Fibonacci |
| /alloc | Memory allocation |
| /slow | Blocking I/O |
| /db, /mixed | Combined workloads |
| /redis/set, /redis/get, /redis/scan | Serialization + pattern match |
| /db/select, /db/insert, /db/join | Database simulation |
| /csv/process | Data processing |
| /json/process, /xml/process | Serialization |
| /downstream/call, /downstream/fanout | HTTP client simulation |
| /batch/process | 50K record batch |
Order Service (:18081) — 6 endpoints
| Endpoint | Category |
|---|---|
| /order/create | Build order maps (GC pressure) |
| /order/list | Iterate + serialize |
| /order/validate | Regex validation |
| /order/process | Synchronized batch (lock contention) |
| /order/aggregate | HashMap group-by |
| /order/fulfill | Fan-out orchestration |
Payment Service (:18082) — 6 endpoints
| Endpoint | Category |
|---|---|
| /payment/transfer | BigDecimal + SHA-256 signing |
| /payment/payroll | Synchronized batch payroll (200-500 employees) |
| /payment/fx | Multi-hop currency conversion |
| /payment/orchestrate | Fraud→debit→credit→notify fan-out |
| /payment/history | Ledger scan + sort |
| /payment/reconcile | Re-verify all signatures |
Fraud Service (:18083) — 6 endpoints
| Endpoint | Category |
|---|---|
| /fraud/score | Rule engine (8 regex patterns) |
| /fraud/ingest | Bulk event ingestion |
| /fraud/scan | Scan 10K events against all rules |
| /fraud/anomaly | Mean/stddev/percentiles/anomaly detection |
| /fraud/velocity | Time-window counting |
| /fraud/report | Risk bucket aggregation |
Account Service (:18084) — 8 endpoints
| Endpoint | Category |
|---|---|
| /account/open | Create new account |
| /account/balance | Lookup balance |
| /account/deposit, /account/withdraw | Synchronized balance updates |
| /account/statement | String.format-heavy statement generation |
| /account/interest | 30-day compound interest (BigDecimal loop) |
| /account/search | Stream API filter + sort |
| /account/branch-summary | Group-by aggregation across 20 branches |
Loan Service (:18085) — 6 endpoints
| Endpoint | Category |
|---|---|
| /loan/apply | Weighted credit scoring + decisioning |
| /loan/amortize | Full amortization schedule (BigDecimal power) |
| /loan/risk-sim | 10K Monte Carlo simulations |
| /loan/portfolio | Aggregate 3K loans by type |
| /loan/delinquency | Filter + sort delinquent loans |
| /loan/originate | Credit→appraisal→underwriting→funding orchestration |
Notification Service (:18086) — 6 endpoints
| Endpoint | Category |
|---|---|
| /notify/send | Render template + send |
| /notify/bulk | Queue 500-2000 messages |
| /notify/drain | Process outbox with 8% failure rate |
| /notify/render | Render 200-500 templates (allocation heavy) |
| /notify/status | Delivery status aggregation |
| /notify/retry | Exponential backoff retry of failed messages |
Stream Service (:18087) — 5 endpoints
| Endpoint | Category |
|---|---|
| /stream/publish | Publish events to stream |
| /stream/subscribe | Subscribe to event stream |
| /stream/backpressure | Backpressure handling demo |
| /stream/transform | Stream transformation pipeline |
| /stream/aggregate | Windowed aggregation |
FaaS Server (:8088) — 8 endpoints
| Endpoint | Category |
|---|---|
| /fn/invoke/{name} | Deploy-execute-undeploy single function |
| /fn/burst/{name}?count=N | Concurrent function invocations |
| /fn/list | List available functions |
| /fn/stats | Invocation statistics |
| /fn/chain | Chain multiple functions sequentially |
| /fn/warmpool/{name}?size=N | Pre-deploy warm pool |
| DELETE /fn/warmpool/{name} | Undeploy warm pool |
| /health | Health check |
Built-in functions: fibonacci, transform, hash, sort, sleep, matrix, regex, compress, primes, contention, fanout
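The warm-pool idea behind /fn/warmpool can be seen in miniature: cold invokes pay a deploy cost on the hot path, while warm-pool invokes reuse a pre-deployed instance. This is an illustrative Python toy model, not the actual FaasVerticle implementation; `ToyFaas` and its deploy cost are invented for the example:

```python
import time

class ToyFaas:
    """Toy model of the deploy-execute-undeploy lifecycle: cold invokes pay a
    simulated deploy cost, warm-pool invokes reuse a pre-deployed instance."""
    DEPLOY_COST_S = 0.01  # stand-in for verticle deploy time

    def __init__(self):
        self.warm = {}  # function name -> number of pre-deployed instances

    def warm_pool(self, name, size):
        """Pay the deploy cost up front, like /fn/warmpool/{name}?size=N."""
        time.sleep(self.DEPLOY_COST_S * size)
        self.warm[name] = size

    def invoke(self, name, fn, *args):
        """Like /fn/invoke/{name}: warm start if a pooled instance exists."""
        if self.warm.get(name, 0) > 0:
            self.warm[name] -= 1            # warm start: no deploy on the hot path
        else:
            time.sleep(self.DEPLOY_COST_S)  # cold start: deploy before executing
        return fn(*args)

faas = ToyFaas()
faas.warm_pool("fibonacci", 2)
print(faas.invoke("fibonacci", lambda n: n * 2, 21))  # -> 42, served warm
```

In wall-clock flame graphs, this difference shows up as deploy-time frames on cold invocations that are absent on warm-pool invocations.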
```bash
bash scripts/run.sh                      # full pipeline: deploy → load → validate → data check
                                         # (quiet mode with spinner progress + "Ready" banner)
bash scripts/run.sh --verbose            # full pipeline with all output inline (old behavior)
bash scripts/run.sh --log-dir /tmp/logs  # quiet mode + save full logs to disk
bash scripts/run.sh deploy               # deploy only (always verbose)
bash scripts/run.sh load 60              # 60s of load
bash scripts/run.sh validate             # validate only
bash scripts/run.sh teardown             # clean up
bash scripts/run.sh benchmark            # profiling overhead test
bash scripts/run.sh top                  # top functions by CPU/memory/mutex
bash scripts/run.sh top cpu              # CPU hotspots only
bash scripts/run.sh health               # flag problematic JVMs
bash scripts/run.sh health --json        # JSON output for automation
bash scripts/run.sh diagnose             # full diagnostic report (no browser needed)
bash scripts/run.sh diagnose --json      # machine-readable JSON for scripting
bash scripts/run.sh --load-duration 60   # full pipeline with custom load duration
bash scripts/run.sh --fixed              # deploy with OPTIMIZED=true (optimized-only mode)
bash scripts/run.sh compare              # before/after comparison on running stack
bash scripts/run.sh bottleneck           # automated root-cause analysis per service
bash scripts/run.sh bottleneck --json    # machine-readable for alerting pipelines
```

The default full pipeline runs in quiet mode: each stage shows a single-line spinner with elapsed time, and on completion a "Ready" banner is printed with the Grafana/Pyroscope URLs. Use `--verbose` for full inline output or `--log-dir DIR` to persist stage logs to disk.
See docs/pipeline.md for details on each stage.
```bash
bash scripts/deploy.sh              # build + start 10 containers
bash scripts/generate-load.sh 120   # generate traffic (default 300s)
bash scripts/validate.sh            # automated health check
bash scripts/teardown.sh            # stop + clean
```

Import postman/pyroscope-demo.postman_collection.json into Postman for interactive API exploration. See postman/README.md.
URLs after deploy:
| Service | URL | Credentials |
|---|---|---|
| Grafana | http://localhost:3000 | admin / admin |
| Pyroscope | http://localhost:4040 | |
| Prometheus | http://localhost:9090 | |
| API Gateway | http://localhost:18080 | |
| Order Service | http://localhost:18081 | |
| Payment Service | http://localhost:18082 | |
| Fraud Service | http://localhost:18083 | |
| Account Service | http://localhost:18084 | |
| Loan Service | http://localhost:18085 | |
| Notification Service | http://localhost:18086 | |
| Stream Service | http://localhost:18087 | |
| FaaS Server | http://localhost:8088 | |
Six pre-provisioned dashboards:
| Dashboard | UID | Description |
|---|---|---|
| Pyroscope Java Overview | pyroscope-java-overview | CPU/memory/lock/wall flame graphs with application selector for all 9 services |
| Service Performance | verticle-performance | Per-service CPU, latency, request rate with flame graph correlation |
| JVM Metrics Deep Dive | jvm-metrics-deep-dive | CPU gauge, heap/non-heap, GC pauses, threads, memory pool utilization |
| HTTP Performance | http-performance | Request rate, p50/p95/p99 latency, error rate, slowest endpoints |
| Before vs After Fix | before-after-comparison | Compare flame graphs before/after OPTIMIZED=true performance fixes |
| FaaS Server | faas-server | FaaS-specific metrics: function invocations, deploy/undeploy lifecycle, warm pools |
The default bash scripts/run.sh pipeline generates load in two phases — before and after applying optimizations. Use the Before vs After Fix dashboard to compare flame graphs.
| Verticle | Fix | Flame Graph Impact |
|---|---|---|
| MainVerticle | fibonacci() → iterative loop | fibonacci frame: dominant → near-zero |
| OrderVerticle | processOrders() → lock-free computeIfPresent | Lock contention frames disappear |
| PaymentVerticle | sha256() → ThreadLocal + Character.forDigit | getInstance + String.format frames vanish |
| FraudDetectionVerticle | Percentiles → primitive double[] + Arrays.sort | Double.compareTo boxing gone |
| NotificationVerticle | renderTemplate() → StringBuilder + indexOf | Formatter.format frames disappear |
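The MainVerticle fix, replacing recursive fibonacci with an iterative loop, is easy to see in miniature. A Python sketch of the idea (the real fix lives in the Java source, MainVerticle.java):

```python
def fib_recursive(n):
    """O(2^n): this frame dominates the CPU flame graph before the fix."""
    return n if n < 2 else fib_recursive(n - 1) + fib_recursive(n - 2)

def fib_iterative(n):
    """O(n): after the fix the frame all but disappears from the profile."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# Same answers, wildly different CPU cost at larger n
assert fib_recursive(10) == fib_iterative(10) == 55
```

The before/after flame graphs make this visible without any instrumentation: the recursive variant shows a deep self-similar tower of fibonacci frames, the iterative one a single thin frame.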
| Alert | Trigger | Severity |
|---|---|---|
| HighCpuUsage | CPU > 80% for 1m | warning |
| HighHeapUsage | Heap > 85% for 2m | warning |
| HighGcPauseRate | GC > 50ms/s for 1m | warning |
| HighErrorRate | 5xx > 5% for 1m | critical |
| HighLatency | p99 > 2s for 2m | warning |
| ServiceDown | Scrape fails for 30s | critical |
| Document | Audience | Description |
|---|---|---|
| docs/demo-guide.md | Everyone | Overview of the problem, solution, and what this project demonstrates |
| docs/demo-runbook.md | Presenters | Step-by-step demo agenda with commands and talking points (20-25 min) |
| Document | Audience | Description |
|---|---|---|
| docs/profiling-scenarios.md | Engineers | 6 hands-on scenarios + quick reference table of all bottlenecks by service |
| docs/code-to-profiling-guide.md | Engineers | Source code → flame graph mapping for every service and endpoint |
| docs/dashboard-guide.md | Engineers | Panel-by-panel reference for all 6 Grafana dashboards |
| docs/sample-queries.md | Engineers | Copy-paste queries for Pyroscope, Prometheus, and Grafana |
| Document | Audience | Description |
|---|---|---|
| docs/continuous-profiling-runbook.md | SREs | Deploying Pyroscope, agent configuration, Grafana setup |
| docs/runbook.md | On-call | Incident response playbooks, operational procedures |
| docs/mttr-guide.md | Managers/SREs | MTTR reduction workflow, bottleneck decision matrix |
| Document | Audience | Description |
|---|---|---|
| docs/architecture.md | Architects | Service topology, data flow, JVM agent configuration |
| docs/faas-server.md | Engineers | FaaS runtime with deploy/undeploy lifecycle profiling |
| docs/endpoint-reference.md | Engineers | Complete endpoint list with curl examples |
| docs/pipeline.md | Engineers | Pipeline stages, data flow, and configuration |
| docs/ai-profiling-roadmap.md | Leadership | Roadmap for AI/ML integration with profiling data |
```
pyroscope/
├── app/                              # Bank microservices application (Java/Vert.x)
│   ├── build.gradle                  # Gradle build (shadow plugin)
│   ├── Dockerfile                    # Multi-stage: Gradle → JRE
│   └── src/main/java/com/example/
│       ├── MainVerticle.java         # API Gateway + verticle router
│       ├── OrderVerticle.java        # Order processing, synchronized blocks
│       ├── PaymentVerticle.java      # Transfers, payroll, FX (SHA-256, BigDecimal)
│       ├── FraudDetectionVerticle.java  # Rule engine, anomaly detection
│       ├── AccountVerticle.java      # Core banking, interest calculation
│       ├── LoanVerticle.java         # Amortization, Monte Carlo simulation
│       ├── NotificationVerticle.java # Templates, queue drain, retries
│       ├── StreamVerticle.java       # Reactive streams, backpressure
│       ├── FaasVerticle.java         # FaaS runtime, function lifecycle
│       └── handlers/                 # Additional workload handlers
│
├── config/
│   ├── grafana/
│   │   ├── dashboards/               # 6 Grafana dashboards (JSON)
│   │   └── provisioning/             # Datasources + dashboard provider
│   ├── prometheus/
│   │   ├── prometheus.yaml           # Scrape config for 9 services
│   │   └── alerts.yaml               # 6 alert rules
│   └── pyroscope/pyroscope.yaml      # Pyroscope server config
│
├── deploy/                           # Production deployment configs
│   ├── monolithic/                   # Single-node Pyroscope server
│   └── microservices/                # Distributed Pyroscope deployment
│       ├── vm/                       # Docker Compose + NFS (for VMs/EC2)
│       └── openshift/                # Helm chart (for OpenShift 4.x)
│
├── docs/                             # 14 guides (demos, runbooks, architecture)
│
├── postman/                          # Postman collection + environment
│
├── scripts/
│   ├── run.sh                        # Main entry point — deploy, load, validate, teardown
│   ├── deploy.sh                     # Build + start containers
│   ├── generate-load.sh              # Traffic generation to all 9 services
│   ├── validate.sh                   # Automated health + data verification
│   ├── teardown.sh                   # Stop and clean up
│   ├── diagnose.sh                   # Full diagnostic report (CLI)
│   ├── top-functions.sh              # Top CPU/memory/mutex functions
│   ├── bottleneck.sh                 # Automated root-cause analysis
│   ├── jvm-health.sh                 # JVM health thresholds
│   ├── benchmark.sh                  # Profiling overhead measurement
│   ├── check-dashboards.sh           # Dashboard validation
│   └── lib/                          # Python analysis modules
│
├── docker-compose.yaml               # 12 containers (3 infra + 9 services)
├── docker-compose.optimized.yaml     # Overlay: enable OPTIMIZED=true
├── docker-compose.no-pyroscope.yaml  # Overlay: disable profiling agent
└── README.md
```
The Pyroscope Java agent is configured via a shared properties file baked into the Docker image (/opt/pyroscope/pyroscope.properties). Per-service overrides (application name, labels) are set via environment variables in docker-compose.yaml:
```properties
# config/pyroscope/pyroscope.properties — shared across all services
pyroscope.server.address=http://pyroscope:4040
pyroscope.format=jfr
pyroscope.profiler.event=itimer
pyroscope.profiler.alloc=512k
pyroscope.profiler.lock=10ms
pyroscope.log.level=info
```

```yaml
# docker-compose.yaml — per-service overrides only
environment:
  VERTICLE: payment
  PYROSCOPE_APPLICATION_NAME: bank-payment-service
  PYROSCOPE_LABELS: env=production,service=payment-service
  PYROSCOPE_CONFIGURATION_FILE: /opt/pyroscope/pyroscope.properties
  JAVA_TOOL_OPTIONS: >-
    -javaagent:/opt/pyroscope/pyroscope.jar
    -javaagent:/opt/jmx-exporter/jmx_prometheus_javaagent.jar=9404:/opt/jmx-exporter/config.yaml
```

Configuration precedence: system properties (`-D`) > environment variables (`PYROSCOPE_*`) > properties file.
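As a rough illustration of that precedence chain, here is a hypothetical resolver in Python; `resolve_setting` and its dict arguments are invented for the example and are not part of the agent:

```python
import os

def resolve_setting(key, system_props, file_props):
    """Resolve a Pyroscope setting using the documented precedence:
    system properties (-D) > environment variables (PYROSCOPE_*) > properties file.
    `system_props` and `file_props` are plain dicts standing in for the JVM
    system properties and the parsed pyroscope.properties file."""
    if key in system_props:                  # e.g. -Dpyroscope.server.address=...
        return system_props[key]
    env_key = key.upper().replace(".", "_")  # pyroscope.application.name -> PYROSCOPE_APPLICATION_NAME
    if env_key in os.environ:
        return os.environ[env_key]
    return file_props.get(key)               # fall back to the shared properties file

# Example: the compose env var overrides the baked-in file; a -D flag would win over both
file_props = {"pyroscope.application.name": "bank-api-gateway"}
os.environ["PYROSCOPE_APPLICATION_NAME"] = "bank-payment-service"
print(resolve_setting("pyroscope.application.name", {}, file_props))
# -> bank-payment-service
```

This is why the shared properties file can stay identical across all services while docker-compose.yaml supplies only the per-service name and labels.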
All 9 services use the same Docker image. The VERTICLE env var (order, payment, fraud, account, loan, notification, stream, faas) selects which class to run. Default is MainVerticle.
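A toy sketch of that selection logic (hypothetical Python; the real router lives in the Java source, MainVerticle.java):

```python
import os

# VERTICLE env var value -> verticle class name (mirrors the service table above)
VERTICLES = {
    "order": "OrderVerticle", "payment": "PaymentVerticle",
    "fraud": "FraudDetectionVerticle", "account": "AccountVerticle",
    "loan": "LoanVerticle", "notification": "NotificationVerticle",
    "stream": "StreamVerticle", "faas": "FaasVerticle",
}

def select_verticle(env):
    """Pick the verticle class from the VERTICLE env var, defaulting to MainVerticle."""
    return VERTICLES.get(env.get("VERTICLE", "").lower(), "MainVerticle")

print(select_verticle({"VERTICLE": "payment"}))  # -> PaymentVerticle
print(select_verticle(dict(os.environ)) if "VERTICLE" in os.environ else select_verticle({}))  # -> MainVerticle by default
```

One image plus one env var keeps the build simple while still giving Pyroscope nine distinct application names to profile.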
| Profile | Event | What It Shows |
|---|---|---|
| CPU | cpu | On-CPU time — which methods consume the most processor cycles |
| Allocation | alloc | Heap allocations — objects created, GC pressure sources (threshold: 512KB) |
| Lock | lock | Lock contention — synchronized blocks, ReentrantLock waits (threshold: 10ms) |
| Wall Clock | wall | Real elapsed time — includes I/O waits, sleeps, off-CPU time |
In Pyroscope UI, select any bank-* application, then switch between profile types to see different views of the same service.
Compare application performance with and without the Pyroscope agent:
```bash
bash scripts/benchmark.sh           # 200 requests per endpoint (default)
bash scripts/benchmark.sh 500 100   # 500 requests, 100 warmup
```

This runs each service endpoint twice — once with the agent attached, once without — and reports average latency, p50/p95/p99, throughput, and percentage overhead per service. Results are saved to benchmark-results/.
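The overhead percentage can be computed as in this sketch; `overhead_report` and `percentile` are hypothetical helpers for illustration, not the actual benchmark.sh logic:

```python
import statistics

def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ordered = sorted(samples)
    idx = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[idx]

def overhead_report(with_agent_ms, without_agent_ms):
    """Summarize two latency samples and the relative agent overhead."""
    mean_with = statistics.mean(with_agent_ms)
    mean_without = statistics.mean(without_agent_ms)
    return {
        "p50_with_agent": percentile(with_agent_ms, 50),
        "p99_with_agent": percentile(with_agent_ms, 99),
        "overhead_pct": 100.0 * (mean_with - mean_without) / mean_without,
    }

# Example with synthetic latencies: uniformly 5% slower with the agent attached
report = overhead_report([10.5] * 100, [10.0] * 100)
print(f"{report['overhead_pct']:.1f}% overhead")  # -> 5.0% overhead
```

Mean-based overhead is a blunt measure; comparing the tail percentiles as well shows whether the agent disturbs worst-case latency, which matters more for SLOs.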
To manually run without profiling:

```bash
# Start without Pyroscope agent
docker compose -f docker-compose.yaml -f docker-compose.no-pyroscope.yaml up -d

# Start with Pyroscope agent (default)
docker compose up -d
```

Compare CPU profiles across all 9 services. The Payment Service's BigDecimal math and the Loan Service's Monte Carlo simulation are intentionally expensive — in production, these are the services where optimization yields the most savings.
When the Fraud Service p99 spikes, open the flame graph for bank-fraud-service and see whether the regex rule engine or the anomaly detection is the bottleneck. See docs/runbook.md for full playbooks.
Query the Pyroscope API to compare profiles between builds:

```bash
curl "http://pyroscope:4040/pyroscope/render?query=process_cpu:cpu:nanoseconds:cpu:nanoseconds%7Bservice_name%3D%22bank-payment-service%22%7D&from=now-1h&until=now&format=json"
```

The before/after comparison workflow can be automated in CI pipelines to catch performance regressions per PR:

- Baseline snapshot script (scripts/ci-snapshot.sh) — dump per-service top-N function CPU/alloc totals from the Pyroscope API to baseline/profiles.json. Commit as the known-good baseline.
- Regression gate script (scripts/ci-compare.sh) — fetch the same top-N data after a PR build, diff against the baseline, and fail the pipeline if any function's CPU share increases by more than a configurable threshold (e.g. 20%).
- GitHub Actions / GitLab CI job — spin up the stack, generate load, run the regression gate, and upload flame graph diffs as PR artifacts.
- Pyroscope diff API — use GET /pyroscope/render-diff?leftFrom=...&rightFrom=... for server-side profile diffs.
- Slack/webhook notification — on regression, post a summary (service, function, % increase, Grafana link) to a channel.
| Language | Profiler | Injection Method |
|---|---|---|
| Java (current) | async-profiler via Pyroscope agent | JAVA_TOOL_OPTIONS=-javaagent:... |
| Python | py-spy | Sidecar: py-spy record --pid <PID> |
| Go | pprof | import _ "net/http/pprof" |
| Node.js | perf hooks | NODE_OPTIONS=--require=pyroscope-node |
```bash
bash scripts/validate.sh   # automated check
```

See docs/runbook.md#troubleshooting for detailed steps.