
[Memory] Add percentile sampling for high-RPS tests #70

@cbaugus

Problem

At very high RPS (50k+), recording every single request to HDR histograms becomes:

  • Memory intensive: Each recording adds data to the histogram
  • CPU intensive: Mutex contention and a histogram update on every request
  • Unnecessary: Accurate percentile estimates only need a representative sample

Current behavior:

  • 100% of requests tracked in histograms
  • At 50k RPS, that's 50,000 histogram updates/sec
  • Causes both memory and performance issues

Proposed Solution

Add sampling to only track a percentage of requests:

PERCENTILE_SAMPLING_RATE=10  # Track 10% of requests (1 in 10)

Implementation Details

Files to modify:

  • src/config.rs - Add sampling rate config
  • src/percentiles.rs - Implement sampling logic
  • src/worker.rs - Apply sampling before recording
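A minimal sketch of the `src/config.rs` side, assuming the rate is read from the `PERCENTILE_SAMPLING_RATE` environment variable, defaults to 100 (track everything) when unset, and must fall in 1-100. `parse_sampling_rate`, `sampling_rate_from_env`, and `DEFAULT_PERCENTILE_SAMPLING_RATE` are hypothetical names, not existing code:

```rust
use std::env;

/// Assumed default: record every request unless configured otherwise.
pub const DEFAULT_PERCENTILE_SAMPLING_RATE: u64 = 100;

/// Parses a raw PERCENTILE_SAMPLING_RATE value into a percentage in 1..=100.
/// `None` (variable unset) falls back to the default.
pub fn parse_sampling_rate(raw: Option<&str>) -> Result<u64, String> {
    match raw {
        None => Ok(DEFAULT_PERCENTILE_SAMPLING_RATE),
        Some(s) => {
            let rate = s.trim().parse::<u64>().map_err(|e| {
                format!("PERCENTILE_SAMPLING_RATE={s:?} is not an integer: {e}")
            })?;
            // Assumption: 0 is rejected rather than meaning "record nothing".
            if (1..=100).contains(&rate) {
                Ok(rate)
            } else {
                Err(format!("PERCENTILE_SAMPLING_RATE must be 1-100, got {rate}"))
            }
        }
    }
}

/// Reads the rate from the environment. A value that is not valid Unicode
/// is treated the same as an unset variable.
pub fn sampling_rate_from_env() -> Result<u64, String> {
    let raw = env::var("PERCENTILE_SAMPLING_RATE").ok();
    parse_sampling_rate(raw.as_deref())
}
```

Failing loudly on `0` or `150` at startup seems preferable to silently clamping, since a typo in the rate changes how the percentiles should be read.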

Approach 1: Deterministic sampling (Recommended)

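use std::sync::atomic::{AtomicU64, Ordering};

// Shared by all workers. Relaxed ordering is enough: each caller only needs
// a unique counter value, not ordering with other memory operations.
static SAMPLE_COUNTER: AtomicU64 = AtomicU64::new(0);

/// Returns true for exactly `rate` out of every 100 calls (rate in 0..=100).
pub fn should_sample(rate: u64) -> bool {
    if rate >= 100 {
        return true;  // Fast path: sample everything
    }
    if rate == 0 {
        return false; // Sampling disabled; skip the shared counter
    }

    let counter = SAMPLE_COUNTER.fetch_add(1, Ordering::Relaxed);
    // Spread `rate` hits evenly over each window of 100 calls. Unlike
    // `counter % (100 / rate) == 0`, this stays exact when `rate` does not
    // divide 100 (30 samples 30%, not 33%) and cannot divide by zero.
    ((counter % 100) * rate) % 100 < rate
}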

// In src/worker.rs
let latency_ms = request_start_time.elapsed().as_millis() as u64;

if should_sample(config.percentile_sampling_rate) {
    GLOBAL_REQUEST_PERCENTILES.record_ms(latency_ms);
}
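One caveat with Approach 1: every worker calls `fetch_add` on the same `SAMPLE_COUNTER`, so at 50k+ RPS that single cache line can itself become a contention point. A possible variant, sketched here with a hypothetical `should_sample_local`, keeps one counter per thread. If workers run as async tasks, the counter is per runtime thread rather than per task; each thread still samples exactly `rate` of every 100 consecutive calls it makes.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical per-thread counter: each worker thread counts only its own
    // calls, so threads never write to a shared cache line.
    static LOCAL_SAMPLE_COUNTER: Cell<u64> = Cell::new(0);
}

/// Same idea as `should_sample` in Approach 1, but counted per thread:
/// returns true for exactly `rate` out of every 100 consecutive calls made
/// on the current thread (rate in 0..=100).
pub fn should_sample_local(rate: u64) -> bool {
    if rate >= 100 {
        return true; // Fast path: sample everything
    }
    if rate == 0 {
        return false; // Sampling disabled
    }
    LOCAL_SAMPLE_COUNTER.with(|c| {
        let n = c.get();
        c.set(n.wrapping_add(1));
        // Spread `rate` hits evenly over each window of 100 calls.
        ((n % 100) * rate) % 100 < rate
    })
}
```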

Approach 2: Random sampling

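use rand::Rng; // rand 0.8 API (`thread_rng`, `gen_range`)

pub fn should_sample(rate: u64) -> bool {
    if rate >= 100 {
        return true;
    }

    // Each request is sampled independently with probability rate/100.
    let mut rng = rand::thread_rng();
    rng.gen_range(0..100) < rate
}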

Configuration Examples

Based on RPS:

# Low RPS (< 5k) - track everything
PERCENTILE_SAMPLING_RATE=100  # 100%

# Medium RPS (5k-25k) - track half
PERCENTILE_SAMPLING_RATE=50   # 50%

# High RPS (25k-50k) - track 10%
PERCENTILE_SAMPLING_RATE=10   # 10%

# Extreme RPS (50k+) - track 1%
PERCENTILE_SAMPLING_RATE=1    # 1%

Statistical Validity

Sampling 10% of requests is statistically valid for percentile calculation:

  • Sample size: At 50k RPS for 1h, 10% = 18M samples
  • Accuracy: More than sufficient for P99.9 (about 18,000 samples lie above it)
  • Standard: Industry practice (Datadog, New Relic sample heavily)

Literature:

  • Netflix: Uses 1% sampling for tail latencies
  • Google: Samples requests for distributed tracing
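  • Rule of thumb: 1,000+ samples are enough for central percentiles such as P50; tail percentiles also need enough samples beyond them (P99.9 needs about 100k samples to leave about 100 above it)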
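To make the tail-percentile point concrete: with `n` samples, about `n * (1 - p)` land above the p-quantile, and the rank of the empirical p-quantile has an approximate binomial standard error of `sqrt(p * (1 - p) / n)`, expressed as a fraction of all samples. A small sketch with hypothetical helper names:

```rust
/// Expected number of samples above the p-quantile (p = 0.999 for P99.9)
/// when `n` samples are recorded.
pub fn expected_tail_samples(n: u64, p: f64) -> f64 {
    n as f64 * (1.0 - p)
}

/// Approximate standard error of the empirical p-quantile's rank, as a
/// fraction of all samples: sqrt(p * (1 - p) / n). This is the binomial
/// standard error of the sample CDF evaluated at the true p-quantile.
pub fn quantile_rank_std_error(n: u64, p: f64) -> f64 {
    (p * (1.0 - p) / n as f64).sqrt()
}
```

For `n = 18_000_000` and `p = 0.999` this gives about 18,000 tail samples and a standard error of roughly 7.5e-6, small next to the 0.001 tail width. For `n = 1_000` it gives a single tail sample and a standard error of about 0.001, as wide as the tail itself.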

Memory Savings

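Example at 50k RPS for 1 hour. The memory column assumes storage grows with the number of recordings; an HDR histogram's own footprint is fixed by its configured range and precision, so actual savings should be measured: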

Sampling Rate    Recordings       Memory Impact
─────────────────────────────────────────────────
100% (current)   180M samples     ~100% (baseline)
50%              90M samples      ~50% reduction
10%              18M samples      ~90% reduction
1%               1.8M samples     ~99% reduction

Benefits

  • Lower memory: Fewer data points stored
  • Better performance: Less mutex contention
  • Scalability: Support higher RPS
  • Flexibility: Users tune accuracy vs resources

Output Clarity

Mark sampled percentiles clearly in output:

## Single Request Latencies (sampled at 10%)

count=1800000, min=5.23ms, max=1234.56ms, mean=45.67ms
p50=42.31ms, p90=89.23ms, p95=112.45ms, p99=234.56ms, p99.9=567.89ms

Note: Percentiles calculated from 10% sample of 18M requests (1.8M samples)
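A possible helper for these labels, assuming the report already knows the total request count and the number of recorded samples. `sampled_report_labels` and `human_count` are hypothetical; the strings mirror the example output above:

```rust
/// Formats a count compactly, e.g. 18_000_000 -> "18M", 1_800_000 -> "1.8M".
pub fn human_count(n: u64) -> String {
    let (value, suffix) = if n >= 1_000_000 {
        (n as f64 / 1_000_000.0, "M")
    } else if n >= 1_000 {
        (n as f64 / 1_000.0, "K")
    } else {
        return n.to_string();
    };
    let s = format!("{value:.1}");
    // Drop a trailing ".0" so 18.0M prints as "18M".
    let trimmed = s.strip_suffix(".0").unwrap_or(s.as_str());
    format!("{trimmed}{suffix}")
}

/// Builds the section header and optional footnote for a percentile report.
/// `total_requests` is every request the run issued; `samples` is how many
/// of them were recorded in the histogram.
pub fn sampled_report_labels(
    title: &str,
    rate: u64,
    total_requests: u64,
    samples: u64,
) -> (String, Option<String>) {
    if rate >= 100 {
        // Nothing was skipped, so no sampling label is needed.
        return (format!("## {title}"), None);
    }
    let header = format!("## {title} (sampled at {rate}%)");
    let note = format!(
        "Note: Percentiles calculated from {rate}% sample of {} requests ({} samples)",
        human_count(total_requests),
        human_count(samples)
    );
    (header, Some(note))
}
```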

Auto-tuning (Future enhancement)

Could auto-adjust sampling based on RPS:

fn auto_sampling_rate(current_rps: f64) -> u64 {
    match current_rps {
        rps if rps < 5000.0   => 100,  // Track all
        rps if rps < 25000.0  => 50,   // Track half
        rps if rps < 50000.0  => 10,   // Track 10%
        _                     => 1,    // Track 1%
    }
}
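If the rate changes during a run, samples recorded at different rates should not be pooled as equals: a sample taken at 1% stands for 100 requests, one taken at 100% for a single request. Unweighted, the percentiles would lean toward the low-RPS phases. A minimal sketch of a weighted percentile, using a hypothetical `Sample` type and keeping raw samples in memory only for clarity (a histogram-based version would add the weight to the bucket count instead):

```rust
/// A recorded latency plus the sampling rate (1..=100) in force when it was
/// recorded. Hypothetical type for illustration.
pub struct Sample {
    pub latency_ms: u64,
    pub rate: u64,
}

/// Percentile (0..=100) over samples taken at different rates. Each sample
/// stands for 100 / rate requests, so it contributes that much weight.
/// Returns None for an empty input. Rates of 0 are not expected.
pub fn weighted_percentile(samples: &mut [Sample], pct: f64) -> Option<u64> {
    if samples.is_empty() {
        return None;
    }
    samples.sort_by_key(|s| s.latency_ms);
    let weight = |s: &Sample| 100.0 / s.rate as f64;
    let total: f64 = samples.iter().map(weight).sum();
    let target = total * pct / 100.0;
    let mut acc = 0.0;
    for s in samples.iter() {
        acc += weight(s);
        if acc >= target {
            return Some(s.latency_ms);
        }
    }
    // Only reached through floating-point rounding near pct = 100.
    samples.last().map(|s| s.latency_ms)
}
```

For example, 99 samples of 10ms taken at 100% plus one 100ms sample taken at 1% give a weighted P50 of 100ms, while a naive unweighted P50 would report 10ms.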

Testing

  1. Run test with 100% sampling, get baseline percentiles
  2. Run same test with 10% sampling
  3. Compare percentiles; they should be within 5% of baseline
  4. Verify memory usage reduced
  5. Performance test: Compare CPU usage with/without sampling
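Step 3 could be automated with a small comparison helper. The sketch assumes both runs report the same percentiles in the same order and uses 0.05 as the 5% tolerance; `percentiles_outside_tolerance` is a hypothetical name:

```rust
/// Returns the percentiles whose sampled value deviates from the baseline by
/// more than `tolerance` (0.05 = 5%), as (label, baseline, sampled) tuples.
/// Assumes both slices list the same percentiles in the same order.
pub fn percentiles_outside_tolerance<'a>(
    baseline: &[(&'a str, f64)],
    sampled: &[(&'a str, f64)],
    tolerance: f64,
) -> Vec<(&'a str, f64, f64)> {
    baseline
        .iter()
        .zip(sampled.iter())
        // Relative error against the 100%-sampling baseline.
        .filter(|((_, b), (_, s))| ((s - b) / b).abs() > tolerance)
        .map(|((label, b), (_, s))| (*label, *b, *s))
        .collect()
}
```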

Documentation

Update:

  • MEMORY_OPTIMIZATION.md - Add sampling recommendations
  • README.md - Document sampling configuration
  • LOAD_TEST_SCENARIOS.md - Recommend sampling rates by scenario
