Load Testing Guide - Degree Flowchart Application

📋 Overview

This directory contains load testing tools and scripts for exercising the Degree Flowchart application with up to 1000 concurrent users and extrapolating the results to 1 million concurrent users.

🎯 Non-Functional Requirement

Requirement: The system should accommodate 1 million concurrent active sessions/users.

Testing Strategy:

  1. Run load tests with 1000 concurrent users (practical test)
  2. Collect performance metrics and identify bottlenecks
  3. Calculate resource requirements for 1M users
  4. Provide architectural recommendations for scaling

🛠️ Tools Provided

1. k6 (Recommended)

  • Best for: High-performance load testing, easy scripting
  • Language: JavaScript
  • Files:
    • k6-load-test.js - Comprehensive test with multiple scenarios
    • k6-simple-1000-users.js - Simple 1000 user test

2. Locust

  • Best for: Python developers, distributed testing
  • Language: Python
  • Files: locust-test.py

3. Monitoring Stack

  • Prometheus: Metrics collection from services
  • Grafana: Metrics visualization
  • InfluxDB: k6 results storage
  • Files: docker-compose-monitoring.yml

🚀 Quick Start

Prerequisites

# Install k6 (macOS)
brew install k6

# Or download from: https://k6.io/docs/get-started/installation/

# Install Locust (Python)
pip install locust

# Ensure main application is running
cd ..
docker-compose up -d

Option 1: Run Simple 1000 User Test with k6

# Navigate to load-testing directory
cd load-testing

# Run 1000 concurrent users for 5 minutes
k6 run --vus 1000 --duration 5m k6-simple-1000-users.js

# Generate HTML report
k6 run --vus 1000 --duration 5m \
   --out json=results.json \
   k6-simple-1000-users.js

Option 2: Run Comprehensive k6 Test

# Runs staged test: 100 → 500 → 1000 users
k6 run k6-load-test.js

# With custom environment
k6 run --env BASE_URL=http://localhost:9000 k6-load-test.js

Option 3: Run Locust Test

# Start Locust web UI
locust -f locust-test.py --host=http://localhost:9000

# Open browser: http://localhost:8089
# Configure:
#   - Number of users: 1000
#   - Spawn rate: 50/second
#   - Host: http://localhost:9000

# Or run headless
locust -f locust-test.py --host=http://localhost:9000 \
       --users 1000 --spawn-rate 50 --run-time 5m \
       --html=report.html --csv=results

Option 4: Run with Monitoring Stack

# Start monitoring services
docker-compose -f docker-compose-monitoring.yml up -d

# Wait 30 seconds for services to start
sleep 30

# Run k6 test with InfluxDB output
k6 run --out influxdb=http://localhost:8086 k6-simple-1000-users.js

# View metrics in Grafana
open http://localhost:3000
# Login: admin / admin

# View Prometheus
open http://localhost:9090

📊 Test Scenarios

Scenario 1: Browse Courses (40% of users)

  • GET /courses
  • GET /courses?level=GRADUATE
  • GET /courses?department=Computer%20Science
  • GET /courses/code/{code}

Scenario 2: View Degrees (30% of users)

  • GET /degrees
  • GET /degrees/code/MSCS
  • GET /degrees/{id}/requirements
  • GET /degrees/{id}/constraints

Scenario 3: Authenticated Users (30% of users)

  • Check authentication status
  • View student profile
  • View transcript
  • Plan courses
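The 40/30/30 mix above can be sketched as a weighted scenario picker; a minimal illustration in Python (the scenario names are shorthand for the sections above, not identifiers from the actual test scripts):

```python
import random

# Weighted scenario mix from the lists above: 40% browse, 30% degrees, 30% authenticated.
SCENARIOS = [
    ("browse_courses", 0.40),
    ("view_degrees", 0.30),
    ("authenticated", 0.30),
]

def pick_scenario(r: float) -> str:
    """Map a uniform draw r in [0, 1) onto the weighted scenario mix."""
    cumulative = 0.0
    for name, weight in SCENARIOS:
        cumulative += weight
        if r < cumulative:
            return name
    return SCENARIOS[-1][0]  # guard against floating-point rounding

def random_scenario() -> str:
    return pick_scenario(random.random())
```

In Locust, the same mix is usually expressed with task weights instead, e.g. `@task(4)` for browsing versus `@task(3)` for each of the other two scenarios.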

📈 Metrics to Monitor

Application Metrics (from Spring Boot Actuator)

  1. Request Rate

    • Requests per second (RPS)
    • Target: > 1000 RPS for 1000 users
  2. Response Time

    • p50 (median): < 500ms
    • p95: < 2000ms
    • p99: < 5000ms
  3. Error Rate

    • Target: < 5%
  4. JVM Metrics

    • Heap memory usage
    • GC pauses
    • Thread count
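The response-time targets above can also be checked offline against exported results; a minimal sketch using a nearest-rank percentile (`check_slos` and the millisecond inputs are illustrative helpers, not part of the k6 output format):

```python
def percentile(samples, p):
    """Nearest-rank percentile: p in [0, 100] over a non-empty list of latencies (ms)."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

def check_slos(latencies_ms):
    """Evaluate the p50/p95/p99 targets listed above."""
    return {
        "p50_ok": percentile(latencies_ms, 50) < 500,
        "p95_ok": percentile(latencies_ms, 95) < 2000,
        "p99_ok": percentile(latencies_ms, 99) < 5000,
    }
```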

Database Metrics (from PostgreSQL)

  1. Connection Pool

    • Active connections
    • Waiting connections
    • Pool utilization
  2. Query Performance

    • Slow queries (> 100ms)
    • Query rate
  3. Resource Usage

    • CPU usage
    • Memory usage
    • Disk I/O

System Metrics

  1. CPU Usage

    • Per service
    • System-wide
  2. Memory Usage

    • Per service
    • System-wide
  3. Network

    • Bandwidth usage
    • Connection count

🔬 Running Tests

Step 1: Baseline Test (10 users)

k6 run --vus 10 --duration 1m k6-simple-1000-users.js

Expected Results:

  • Error rate: 0%
  • p95 response time: < 500ms

Step 2: Load Test (100 users)

k6 run --vus 100 --duration 5m k6-simple-1000-users.js

Expected Results:

  • Error rate: < 1%
  • p95 response time: < 1000ms

Step 3: Stress Test (500 users)

k6 run --vus 500 --duration 5m k6-simple-1000-users.js

Expected Results:

  • Error rate: < 3%
  • p95 response time: < 2000ms

Step 4: Peak Test (1000 users)

k6 run --vus 1000 --duration 10m k6-simple-1000-users.js

Expected Results:

  • Error rate: < 5%
  • p95 response time: < 3000ms
  • System remains stable

Step 5: Spike Test

k6 run --stage 1m:100,30s:1000,1m:100,30s:1000 k6-load-test.js

Tests: System recovery after sudden load spikes


📊 Example k6 Output

     ✓ status is 2xx
     ✓ response time OK

     checks.........................: 95.23% ✓ 95230  ✗ 4770
     data_received..................: 245 MB 408 kB/s
     data_sent......................: 12 MB  20 kB/s
     http_req_blocked...............: avg=1.2ms   min=1µs    med=3µs    max=145ms  p(90)=5µs    p(95)=7µs
     http_req_connecting............: avg=523µs   min=0s     med=0s     max=98ms   p(90)=0s     p(95)=0s
     http_req_duration..............: avg=285ms   min=12ms   med=245ms  max=5.2s   p(90)=456ms  p(95)=678ms
       { expected_response:true }...: avg=278ms   min=12ms   med=243ms  max=1.8s   p(90)=445ms  p(95)=654ms
     http_req_failed................: 4.76%  ✓ 4770   ✗ 95230
     http_req_receiving.............: avg=145µs   min=11µs   med=98µs   max=12ms   p(90)=234µs  p(95)=345µs
     http_req_sending...............: avg=45µs    min=5µs    med=32µs   max=8ms    p(90)=78µs   p(95)=123µs
     http_req_tls_handshaking.......: avg=0s      min=0s     med=0s     max=0s     p(90)=0s     p(95)=0s
     http_req_waiting...............: avg=284ms   min=11ms   med=244ms  max=5.2s   p(90)=455ms  p(95)=676ms
     http_reqs......................: 100000 166.67/s
     iteration_duration.............: avg=5.9s    min=2.1s   med=5.8s   max=15.2s  p(90)=8.2s   p(95)=9.5s
     iterations.....................: 16667  27.78/s
     vus............................: 1000   min=1000 max=1000
     vus_max........................: 1000   min=1000 max=1000

Key Metrics Explained:

  • http_reqs: Total requests = 100,000 at 166.67 req/sec
  • http_req_duration (p95): 95% of requests completed in < 678ms
  • http_req_failed: 4.76% error rate
  • vus: 1000 concurrent users maintained

🧮 Extrapolation to 1 Million Users

Methodology

Based on 1000 concurrent user test results:

Observed Metrics (Example):

  • Request rate: 166 RPS
  • CPU usage: 40% per service
  • Memory: 1GB per service
  • Database connections: 50 active

Calculation for 1M users:

Scaling Factor = 1,000,000 / 1,000 = 1,000x

Required Resources:
- Request rate: 166 * 1,000 = 166,000 RPS
- Service instances: 1,000 instances (with load balancing)
- Database connections: 50,000 (distributed across replicas)
- Memory: 1TB total (1GB × 1000 instances)
- CPU: Proportional scaling
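The arithmetic above can be captured in a small script; a sketch of the same linear extrapolation (treat the outputs as first-order estimates, since real systems rarely scale perfectly linearly due to coordination overhead and connection limits):

```python
TARGET_USERS = 1_000_000

def extrapolate(measured: dict, target_users: int = TARGET_USERS) -> dict:
    """Linearly scale measured 1000-user resources up to the target user count."""
    factor = target_users / measured["users"]
    return {
        "scaling_factor": factor,
        "rps": measured["rps"] * factor,
        "service_instances": measured["instances"] * factor,
        "db_connections": measured["db_connections"] * factor,
        "memory_gb": measured["memory_gb_per_instance"] * measured["instances"] * factor,
    }

# Observed metrics from the example above.
baseline = {"users": 1_000, "rps": 166, "instances": 1,
            "db_connections": 50, "memory_gb_per_instance": 1}
# extrapolate(baseline) -> 166,000 RPS, 50,000 DB connections, 1,000 GB (1 TB) memory
```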

Resource Requirements (1M Users)

Component                   1K Users   1M Users (1000x)
------------------------    --------   ----------------
API Gateway Instances       1          100-200
Course Service Instances    1          200-300
Student Service Instances   1          300-400
Degree Service Instances    1          200-300
PostgreSQL Replicas         1          20-30 (read replicas)
Redis Cluster Nodes         1          10-15
Total Memory                ~8GB       ~8TB
Total CPU Cores             ~8         ~8000

🏗️ Architectural Changes Required for 1M Users

1. Horizontal Scaling

# Kubernetes Horizontal Pod Autoscaler (HPA)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: course-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: course-service
  minReplicas: 10
  maxReplicas: 300
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

2. Database Scaling

  • Primary-Replica Replication: 1 primary + 20-30 read replicas
  • Connection Pooling: PgBouncer in front of PostgreSQL, plus tuned application-side pools
  • Sharding: Partition data by degree program or user ID
  • Caching: Aggressive Redis caching (95%+ cache hit rate)
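The value of the 95%+ cache-hit target is easy to quantify: only cache misses reach PostgreSQL. A small sketch using the 166,000 RPS figure from the extrapolation above:

```python
def db_load(total_rps: float, cache_hit_rate: float) -> float:
    """Requests per second that miss the cache and reach the database tier."""
    return total_rps * (1.0 - cache_hit_rate)

# At 166,000 RPS, a 95% hit rate leaves ~8,300 RPS for PostgreSQL --
# roughly 275-415 RPS per replica when spread over 20-30 read replicas.
```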

3. API Gateway Scaling

  • Multiple Gateway Instances: 100-200 instances
  • Load Balancer: AWS ALB, GCP Load Balancer, or Nginx
  • Rate Limiting: Per-user rate limits
  • Circuit Breakers: Prevent cascade failures
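Per-user rate limiting is typically implemented as a token bucket; a minimal in-process sketch (a production gateway would use its built-in limiter, usually backed by Redis, rather than a class like this):

```python
import time

class TokenBucket:
    """Per-user token bucket: allow up to `burst` requests at once,
    refilling at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```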

4. Session Management

  • Redis Cluster: Distributed session storage
  • JWT Tokens: Stateless authentication
  • Session Timeout: Short-lived sessions (30 min)
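Stateless authentication means each request carries a self-validating token, so no instance needs session affinity. A minimal HMAC-signed token sketch (illustrative only, not a full JWT implementation; `SECRET` is a placeholder, and the 30-minute default matches the timeout above):

```python
import base64, hashlib, hmac, json, time

SECRET = b"demo-secret"  # placeholder; load from a secret store in practice

def issue_token(user_id: str, ttl_seconds: int = 1800) -> str:
    """Sign a compact stateless session token: base64(payload).base64(hmac)."""
    payload = json.dumps({"sub": user_id, "exp": time.time() + ttl_seconds}).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode() + "."
            + base64.urlsafe_b64encode(sig).decode())

def verify_token(token: str):
    """Return the claims if the signature is valid and unexpired, else None."""
    body_b64, sig_b64 = token.split(".")
    payload = base64.urlsafe_b64decode(body_b64)
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, base64.urlsafe_b64decode(sig_b64)):
        return None
    claims = json.loads(payload)
    return claims if claims["exp"] > time.time() else None
```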

5. Caching Strategy

  • L1 Cache: In-memory (per service instance)
  • L2 Cache: Redis (distributed)
  • CDN: CloudFront, Cloudflare for static content
  • Cache Invalidation: Event-driven updates

6. Message Queue

  • Async Processing: Kafka or RabbitMQ
  • Event-Driven Architecture: Course updates, degree changes
  • CQRS: Separate read/write models

7. Monitoring & Observability

  • Distributed Tracing: Jaeger, Zipkin
  • Log Aggregation: ELK Stack, Splunk
  • Metrics: Prometheus + Grafana
  • Alerting: PagerDuty, Opsgenie

📄 Deliverables

1. Test Results

  • k6 test results for 1000 users
  • Locust test results
  • Grafana dashboards screenshots
  • Performance metrics CSV/JSON

2. Analysis Document

  • Bottleneck identification
  • Resource utilization analysis
  • Scaling calculations
  • Cost estimation

3. Recommendations

  • Architecture diagram for 1M users
  • Infrastructure requirements
  • Implementation roadmap
  • Risk assessment

🎯 Success Criteria

For 1000 Users (Actual Test)

  • ✅ 95% of requests complete in < 3 seconds
  • ✅ Error rate < 5%
  • ✅ System remains stable for 10+ minutes
  • ✅ No memory leaks or resource exhaustion

For 1M Users (Extrapolation)

  • ✅ Mathematical justification for resource scaling
  • ✅ Architecture supports horizontal scaling
  • ✅ Database can handle load with replication
  • ✅ Cost estimation provided
  • ✅ Implementation plan documented

🐛 Troubleshooting

High Error Rates

# Check service logs
docker logs degree-edge-service
docker logs course-service

# Check database connections
docker exec -it postgres psql -U postgres -c "SELECT count(*) FROM pg_stat_activity;"

# Check memory
docker stats

Slow Response Times

# Check database slow queries
docker exec -it postgres psql -U postgres -c "SELECT query, mean_exec_time FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;"

# Check JVM heap
curl http://localhost:9001/actuator/metrics/jvm.memory.used

# Check Redis performance
docker exec -it redis redis-cli INFO stats

Connection Timeouts

  • Increase connection pool size in application.yml
  • Add more database replicas
  • Check network latency
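For the first remedy, the relevant keys live under Spring Boot's HikariCP namespace in application.yml; the values below are illustrative starting points, not tuned recommendations:

```yaml
spring:
  datasource:
    hikari:
      maximum-pool-size: 50      # Spring Boot default is 10
      minimum-idle: 10
      connection-timeout: 30000  # ms to wait for a connection before failing
```

Size the pool against PostgreSQL's max_connections summed across all service instances, or route through PgBouncer when the totals get large.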

🔗 Next Steps

  1. Run baseline tests (10, 100, 500, 1000 users)
  2. Collect and analyze metrics
  3. Identify bottlenecks
  4. Create scaling plan
  5. Document findings
  6. Present results with justification for 1M users
