This directory contains load testing tools and scripts to test the Degree Flowchart application with up to 1000 concurrent users and extrapolate results for 1 million concurrent users.
Requirement: The system should accommodate 1 million concurrent active sessions/users.
Testing Strategy:
- Run load tests with 1000 concurrent users (practical test)
- Collect performance metrics and identify bottlenecks
- Calculate resource requirements for 1M users
- Provide architectural recommendations for scaling
### k6
- Best for: High-performance load testing, easy scripting
- Language: JavaScript
- Files:
  - `k6-load-test.js` - Comprehensive test with multiple scenarios
  - `k6-simple-1000-users.js` - Simple 1000-user test
### Locust
- Best for: Python developers, distributed testing
- Language: Python
- Files:
  - `locust-test.py`
### Monitoring Stack
- Prometheus: Metrics collection from services
- Grafana: Metrics visualization
- InfluxDB: k6 results storage
- Files:
  - `docker-compose-monitoring.yml`
```bash
# Install k6 (macOS)
brew install k6
# Or download from: https://k6.io/docs/get-started/installation/

# Install Locust (Python)
pip install locust
```
```bash
# Ensure the main application is running
cd ..
docker-compose up -d
```

```bash
# Navigate to the load-testing directory
cd load-testing

# Run 1000 concurrent users for 5 minutes
k6 run --vus 1000 --duration 5m k6-simple-1000-users.js
```
```bash
# Save raw results as JSON for later analysis or report generation
k6 run --vus 1000 --duration 5m \
  --out json=results.json \
  k6-simple-1000-users.js
```

```bash
# Run the staged test: 100 → 500 → 1000 users
k6 run k6-load-test.js
```
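Once a `results.json` has been produced with `--out json`, the headline numbers can be recomputed offline. A minimal sketch, assuming k6's newline-delimited JSON output format in which each `"Point"` line carries one sample of one metric:

```python
# Sketch: recompute p95 latency and error rate from k6's `--out json` file.
# Assumes k6's newline-delimited JSON format ("Point" lines per sample).
import json

def summarize(path="results.json"):
    durations, failed, total = [], 0, 0
    with open(path) as fh:
        for line in fh:
            if not line.strip():
                continue
            point = json.loads(line)
            if point.get("type") != "Point":
                continue  # skip "Metric" definition lines
            value = point["data"]["value"]
            if point["metric"] == "http_req_duration":
                durations.append(value)     # per-request latency in ms
            elif point["metric"] == "http_req_failed":
                total += 1
                failed += int(value)        # 1 = failed request, 0 = ok
    durations.sort()
    p95 = durations[int(0.95 * (len(durations) - 1))] if durations else 0.0
    error_rate = failed / total if total else 0.0
    return p95, error_rate
```

The returned p95 and error rate can be compared directly against the targets in the success criteria below.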
```bash
# With a custom environment
k6 run --env BASE_URL=http://localhost:9000 k6-load-test.js
```

```bash
# Start the Locust web UI
locust -f locust-test.py --host=http://localhost:9000

# Open browser: http://localhost:8089
# Configure:
# - Number of users: 1000
# - Spawn rate: 50/second
# - Host: http://localhost:9000

# Or run headless
locust -f locust-test.py --host=http://localhost:9000 \
  --users 1000 --spawn-rate 50 --run-time 5m \
  --html=report.html --csv=results
```

```bash
# Start monitoring services
docker-compose -f docker-compose-monitoring.yml up -d

# Wait 30 seconds for services to start
sleep 30

# Run the k6 test with InfluxDB output
k6 run --out influxdb=http://localhost:8086 k6-simple-1000-users.js

# View metrics in Grafana (login: admin / admin)
open http://localhost:3000

# View Prometheus
open http://localhost:9090
```

The tests exercise the following endpoints and scenarios:
- GET /courses
- GET /courses?level=GRADUATE
- GET /courses?department=Computer Science
- GET /courses/code/{code}
- GET /degrees
- GET /degrees/code/MSCS
- GET /degrees/{id}/requirements
- GET /degrees/{id}/constraints
- Check authentication status
- View student profile
- View transcript
- Plan courses
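Before launching a full load test it is worth confirming that the endpoints above respond at all; a load test against a down service only measures connection errors. A small pre-flight check, assuming the base URL used in the commands above (stdlib only, endpoint paths taken from the list):

```python
# Pre-flight smoke test: hit a few read-only endpoints once before load testing.
# BASE and the endpoint list mirror this README; adjust if your setup differs.
import urllib.request

BASE = "http://localhost:9000"
ENDPOINTS = ["/courses", "/courses?level=GRADUATE", "/degrees", "/degrees/code/MSCS"]

def smoke_test(base=BASE):
    """Return a list of (path, problem) pairs; empty means all endpoints answered."""
    failures = []
    for path in ENDPOINTS:
        try:
            with urllib.request.urlopen(base + path, timeout=5) as resp:
                if resp.status >= 300:
                    failures.append((path, resp.status))
        except Exception as exc:  # connection refused, timeout, DNS, etc.
            failures.append((path, str(exc)))
    return failures

if __name__ == "__main__":
    print(smoke_test() or "all endpoints reachable")
```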
- Request Rate
  - Requests per second (RPS)
  - Target: > 1000 RPS for 1000 users
- Response Time
  - p50 (median): < 500ms
  - p95: < 2000ms
  - p99: < 5000ms
- Error Rate
  - Target: < 5%
- JVM Metrics
  - Heap memory usage
  - GC pauses
  - Thread count
- Connection Pool
  - Active connections
  - Waiting connections
  - Pool utilization
- Query Performance
  - Slow queries (> 100ms)
  - Query rate
- Resource Usage
  - CPU usage
  - Memory usage
  - Disk I/O
- CPU Usage
  - Per service
  - System-wide
- Memory Usage
  - Per service
  - System-wide
- Network
  - Bandwidth usage
  - Connection count
```bash
k6 run --vus 10 --duration 1m k6-simple-1000-users.js
```
Expected Results:
- Error rate: 0%
- p95 response time: < 500ms

```bash
k6 run --vus 100 --duration 5m k6-simple-1000-users.js
```
Expected Results:
- Error rate: < 1%
- p95 response time: < 1000ms

```bash
k6 run --vus 500 --duration 5m k6-simple-1000-users.js
```
Expected Results:
- Error rate: < 3%
- p95 response time: < 2000ms

```bash
k6 run --vus 1000 --duration 10m k6-simple-1000-users.js
```
Expected Results:
- Error rate: < 5%
- p95 response time: < 3000ms
- System remains stable

```bash
k6 run --stage 1m:100,30s:1000,1m:100,30s:1000 k6-load-test.js
```
Tests: System recovery after sudden load spikes
```
✓ status is 2xx
✓ response time OK

checks.........................: 95.23% ✓ 95230 ✗ 4770
data_received..................: 245 MB 408 kB/s
data_sent......................: 12 MB 20 kB/s
http_req_blocked...............: avg=1.2ms min=1µs med=3µs max=145ms p(90)=5µs p(95)=7µs
http_req_connecting............: avg=523µs min=0s med=0s max=98ms p(90)=0s p(95)=0s
http_req_duration..............: avg=285ms min=12ms med=245ms max=5.2s p(90)=456ms p(95)=678ms
  { expected_response:true }...: avg=278ms min=12ms med=243ms max=1.8s p(90)=445ms p(95)=654ms
http_req_failed................: 4.76% ✓ 4770 ✗ 95230
http_req_receiving.............: avg=145µs min=11µs med=98µs max=12ms p(90)=234µs p(95)=345µs
http_req_sending...............: avg=45µs min=5µs med=32µs max=8ms p(90)=78µs p(95)=123µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=284ms min=11ms med=244ms max=5.2s p(90)=455ms p(95)=676ms
http_reqs......................: 100000 166.67/s
iteration_duration.............: avg=5.9s min=2.1s med=5.8s max=15.2s p(90)=8.2s p(95)=9.5s
iterations.....................: 16667 27.78/s
vus............................: 1000 min=1000 max=1000
vus_max........................: 1000 min=1000 max=1000
```
- http_reqs: Total requests = 100,000 at 166.67 req/sec
- http_req_duration (p95): 95% of requests completed in < 678ms
- http_req_failed: 4.76% error rate
- vus: 1000 concurrent users maintained
Based on 1000 concurrent user test results:
Observed Metrics (Example):
- Request rate: 166 RPS
- CPU usage: 40% per service
- Memory: 1GB per service
- Database connections: 50 active
Calculation for 1M users:
Scaling Factor = 1,000,000 / 1,000 = 1,000x
Required Resources:
- Request rate: 166 * 1,000 = 166,000 RPS
- Service instances: 1,000 instances (with load balancing)
- Database connections: 50,000 (distributed across replicas)
- Memory: 1TB total (1GB × 1000 instances)
- CPU: ~8,000 cores (linear extrapolation from ~8 cores observed at 1K users)
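The arithmetic above can be kept honest with a tiny script. Note that perfectly linear scaling is an assumption, and the observed inputs are the example metrics listed above; treat the outputs as lower bounds for capacity planning:

```python
# Linear extrapolation from the 1,000-user test to 1M users.
# Assumes perfectly linear scaling, which real systems rarely achieve.
observed_users = 1_000
target_users = 1_000_000
observed_rps = 166            # from the example metrics above
memory_per_instance_gb = 1
db_connections = 50

factor = target_users // observed_users            # 1,000x
required_rps = observed_rps * factor               # 166,000 RPS
required_instances = factor                        # naive: one instance per 1K users
total_memory_gb = memory_per_instance_gb * required_instances
total_db_connections = db_connections * factor

print(required_rps, required_instances, total_memory_gb, total_db_connections)
# → 166000 1000 1000 50000
```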
| Component | 1K Users | 1M Users (1000x) |
|---|---|---|
| API Gateway Instances | 1 | 100-200 |
| Course Service Instances | 1 | 200-300 |
| Student Service Instances | 1 | 300-400 |
| Degree Service Instances | 1 | 200-300 |
| PostgreSQL Replicas | 1 | 20-30 (read replicas) |
| Redis Cluster Nodes | 1 | 10-15 |
| Total Memory | ~8GB | ~8TB |
| Total CPU Cores | ~8 | ~8000 |
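For the cost-estimation deliverable, an order-of-magnitude compute figure can be sketched from the 1M-user column above. The per-vCPU rate below is a placeholder assumption, not real cloud pricing; substitute actual rates before including this in a report:

```python
# Rough, assumption-laden compute cost sketch for the 1M-user column above.
# price_per_vcpu_hour is a placeholder, NOT a real cloud price.
vcpus = 8000                  # ~8000 cores from the table above
price_per_vcpu_hour = 0.04    # assumed blended $/vCPU-hour
hours_per_month = 730

monthly_compute = vcpus * price_per_vcpu_hour * hours_per_month
print(f"${monthly_compute:,.0f}/month compute (order of magnitude only)")
```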
```yaml
# Kubernetes Horizontal Pod Autoscaler (HPA)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: course-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: course-service
  minReplicas: 10
  maxReplicas: 300
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

- Master-Slave Replication: 1 master + 20-30 read replicas
- Connection Pooling: PgBouncer or connection pools
- Sharding: Partition data by degree program or user ID
- Caching: Aggressive Redis caching (95%+ cache hit rate)
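For context on how the HPA above behaves: the Kubernetes autoscaler computes desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to the min/max bounds. A sketch of that decision rule (the min/max defaults mirror the HPA manifest above):

```python
# Sketch of the HPA scaling decision (formula per the Kubernetes docs):
# desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
import math

def desired_replicas(current: int, current_cpu_util: float,
                     target_cpu_util: float = 70.0,
                     min_replicas: int = 10, max_replicas: int = 300) -> int:
    desired = math.ceil(current * current_cpu_util / target_cpu_util)
    return max(min_replicas, min(max_replicas, desired))

# e.g. 50 pods averaging 95% CPU against a 70% target:
print(desired_replicas(50, 95.0))  # → 68
```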
- Multiple Gateway Instances: 100-200 instances
- Load Balancer: AWS ALB, GCP Load Balancer, or Nginx
- Rate Limiting: Per-user rate limits
- Circuit Breakers: Prevent cascade failures
- Redis Cluster: Distributed session storage
- JWT Tokens: Stateless authentication
- Session Timeout: Short-lived sessions (30 min)
- L1 Cache: In-memory (per service instance)
- L2 Cache: Redis (distributed)
- CDN: CloudFront, Cloudflare for static content
- Cache Invalidation: Event-driven updates
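The L1/L2 layering above can be sketched as a cache-aside wrapper. This is a simplified, single-process sketch: the L2 client is a stand-in dict so the example is self-contained; a real service would use a Redis client (e.g. redis-py) and event-driven invalidation:

```python
# Minimal sketch of the two-level (L1 in-memory + L2 distributed) cache-aside
# pattern. `l2` is a stand-in dict; swap in a real Redis client in a service.
import time

class TwoLevelCache:
    def __init__(self, l2, l1_ttl=30.0):
        self.l1 = {}          # per-instance cache: {key: (value, expires_at)}
        self.l2 = l2          # shared/distributed cache (e.g. Redis)
        self.l1_ttl = l1_ttl

    def get(self, key, loader):
        hit = self.l1.get(key)
        if hit and hit[1] > time.monotonic():
            return hit[0]                      # L1 hit: no network round-trip
        value = self.l2.get(key)
        if value is None:
            value = loader(key)                # full miss: load from the database
            self.l2[key] = value               # populate L2 for other instances
        self.l1[key] = (value, time.monotonic() + self.l1_ttl)
        return value

l2 = {}
cache = TwoLevelCache(l2)
print(cache.get("course:CS101", lambda k: {"code": "CS101"}))  # → {'code': 'CS101'}
```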
- Async Processing: Kafka or RabbitMQ
- Event-Driven Architecture: Course updates, degree changes
- CQRS: Separate read/write models
- Distributed Tracing: Jaeger, Zipkin
- Log Aggregation: ELK Stack, Splunk
- Metrics: Prometheus + Grafana
- Alerting: PagerDuty, Opsgenie
- k6 test results for 1000 users
- Locust test results
- Grafana dashboards screenshots
- Performance metrics CSV/JSON
- Bottleneck identification
- Resource utilization analysis
- Scaling calculations
- Cost estimation
- Architecture diagram for 1M users
- Infrastructure requirements
- Implementation roadmap
- Risk assessment
- ✅ 95% of requests complete in < 3 seconds
- ✅ Error rate < 5%
- ✅ System remains stable for 10+ minutes
- ✅ No memory leaks or resource exhaustion
- ✅ Mathematical justification for resource scaling
- ✅ Architecture supports horizontal scaling
- ✅ Database can handle load with replication
- ✅ Cost estimation provided
- ✅ Implementation plan documented
```bash
# Check service logs
docker logs degree-edge-service
docker logs course-service

# Check database connections
docker exec -it postgres psql -U postgres -c "SELECT count(*) FROM pg_stat_activity;"

# Check container memory/CPU
docker stats
```

```bash
# Check database slow queries (requires the pg_stat_statements extension)
docker exec -it postgres psql -U postgres -c "SELECT query, mean_exec_time FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;"

# Check JVM heap
curl http://localhost:9001/actuator/metrics/jvm.memory.used

# Check Redis performance
docker exec -it redis redis-cli INFO stats
```

- Increase the connection pool size in `application.yml`
- Add more database replicas
- Check network latency
- k6 Documentation
- Locust Documentation
- Prometheus Documentation
- Grafana Documentation
- Performance Testing Best Practices
- Run baseline tests (10, 100, 500, 1000 users)
- Collect and analyze metrics
- Identify bottlenecks
- Create scaling plan
- Document findings
- Present results with justification for 1M users