[Memory] Add percentile sampling for high-RPS tests #70

## Problem

At very high RPS (50k+), recording **every single request** to HDR histograms becomes:
- **Memory intensive**: Each recording adds data to histogram
- **CPU intensive**: Mutex contention and histogram updates
- **Unnecessary**: Statistical accuracy only needs a sample

**Current behavior:**
- 100% of requests tracked in histograms
- At 50k RPS, that's 50,000 histogram updates/sec
- Causes both memory and performance issues

## Proposed Solution

Add sampling to only track a percentage of requests:

```bash
PERCENTILE_SAMPLING_RATE=10  # Track 10% of requests (1 in 10)
```
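A minimal sketch of how `src/config.rs` might read this setting, assuming the value is an integer percentage from 1 to 100 and that a missing or invalid value falls back to 100 (track everything). The function names `parse_sampling_rate` and `sampling_rate_from_env` are hypothetical, and the fallback policy is an assumption, not something this issue specifies.

```rust
/// Parse a sampling rate (percent) from an optional raw string.
/// Assumption: anything that is not an integer in 1..=100 falls back to 100,
/// so a typo in the environment never silently disables percentile tracking.
pub fn parse_sampling_rate(raw: Option<&str>) -> u64 {
    match raw.map(str::trim).and_then(|s| s.parse::<u64>().ok()) {
        Some(rate) if (1..=100).contains(&rate) => rate,
        _ => 100,
    }
}

/// Read PERCENTILE_SAMPLING_RATE from the process environment.
pub fn sampling_rate_from_env() -> u64 {
    let raw = std::env::var("PERCENTILE_SAMPLING_RATE").ok();
    parse_sampling_rate(raw.as_deref())
}
```

Validating the rate once at startup keeps the per-request sampling check free of range handling.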

### Implementation Details

**Files to modify:**
- `src/config.rs` - Add sampling rate config
- `src/percentiles.rs` - Implement sampling logic
- `src/worker.rs` - Apply sampling before recording

**Approach 1: Deterministic sampling (Recommended)**
```rust
use std::sync::atomic::{AtomicU64, Ordering};

static SAMPLE_COUNTER: AtomicU64 = AtomicU64::new(0);

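/// Returns true for `rate` out of every 100 calls (`rate` is a percentage, 0-100).
pub fn should_sample(rate: u64) -> bool {
    if rate >= 100 {
        return true; // Fast path: sample everything
    }

    let counter = SAMPLE_COUNTER.fetch_add(1, Ordering::Relaxed);
    // Spread the sampled slots evenly across each window of 100 requests.
    // `counter % (100 / rate)` would panic for rate == 0 and is inexact
    // for rates that do not divide 100 (e.g. 30 or 75).
    ((counter % 100) * rate) % 100 < rate
}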

// In src/worker.rs
let latency_ms = request_start_time.elapsed().as_millis() as u64;

if should_sample(config.percentile_sampling_rate) {
    GLOBAL_REQUEST_PERCENTILES.record_ms(latency_ms);
}
```
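If the single global atomic itself becomes a hot spot at 50k+ RPS, one possible variant keeps a counter per thread instead. This is a sketch under that assumption, not a measured improvement; `should_sample_local` and `LOCAL_COUNTER` are hypothetical names, and `rate` is taken to be a percentage from 0 to 100.

```rust
use std::cell::Cell;

thread_local! {
    // One counter per worker thread, so no cache line is shared between cores.
    static LOCAL_COUNTER: Cell<u64> = Cell::new(0);
}

/// Returns true for `rate` out of every 100 calls made on the current thread.
pub fn should_sample_local(rate: u64) -> bool {
    if rate >= 100 {
        return true; // Sample everything
    }
    if rate == 0 {
        return false; // Sample nothing
    }
    LOCAL_COUNTER.with(|c| {
        let n = c.get();
        c.set(n.wrapping_add(1));
        // Evenly spaced slots within each window of 100 calls.
        ((n % 100) * rate) % 100 < rate
    })
}
```

Each thread then samples at the configured rate independently, so the overall rate is preserved even when requests are spread unevenly across threads.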

**Approach 2: Random sampling**
```rust
use rand::Rng;

pub fn should_sample(rate: u64) -> bool {
    if rate == 100 {
        return true;
    }
    
    let mut rng = rand::thread_rng();
    rng.gen_range(0..100) < rate
}
```

## Configuration Examples

Based on RPS:

```bash
# Low RPS (< 5k) - track everything
PERCENTILE_SAMPLING_RATE=100  # 100%

# Medium RPS (5k-25k) - track half
PERCENTILE_SAMPLING_RATE=50   # 50%

# High RPS (25k-50k) - track 10%
PERCENTILE_SAMPLING_RATE=10   # 10%

# Extreme RPS (50k+) - track 1%
PERCENTILE_SAMPLING_RATE=1    # 1%
```

## Statistical Validity

Sampling 10% of requests is statistically valid for percentile calculation:

- **Sample size**: At 50k RPS for 1h, 10% = 18M samples
- **Accuracy**: More than sufficient for P99.9 calculation (about 18,000 samples fall above P99.9)
- **Standard**: Industry practice (Datadog, New Relic sample heavily)

**Literature:**
- Netflix: Uses 1% sampling for tail latencies
- Google: Samples requests for distributed tracing
- Rule of thumb: a sample of 1,000+ is enough for central percentiles such as P50; tail percentiles need more, since only about 1 in 1,000 samples falls above P99.9
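To make the sample-size argument concrete, the sketch below estimates how many recorded samples are expected to fall above a given percentile; the more observations in the tail, the more stable the estimate. `expected_tail_samples` is a hypothetical helper, and the calculation is plain arithmetic, not a formal confidence bound.

```rust
/// Expected number of recorded samples that land above `percentile`
/// (e.g. 99.9), given how many samples were recorded in total.
pub fn expected_tail_samples(recorded_samples: u64, percentile: f64) -> f64 {
    let tail_fraction = (100.0 - percentile) / 100.0;
    recorded_samples as f64 * tail_fraction
}
```

For example, 18,000,000 recorded samples leave roughly 18,000 above P99.9, while 1,000 recorded samples leave roughly one.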

## Memory Savings

Example at 50k RPS for 1 hour:

```
Sampling Rate    Recordings       Memory Impact
─────────────────────────────────────────────────
100% (current)   180M samples     ~100% (baseline)
50%              90M samples      ~50% reduction
10%              18M samples      ~90% reduction
1%               1.8M samples     ~99% reduction
```
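The recording counts in the table follow from simple arithmetic, sketched below with a hypothetical `recordings` helper. It only models the number of histogram recordings; actual memory use depends on how the histograms are configured and is not estimated here.

```rust
/// Number of histogram recordings for a run at a constant request rate.
/// `sampling_rate_pct` is a percentage and is capped at 100.
pub fn recordings(rps: u64, duration_secs: u64, sampling_rate_pct: u64) -> u64 {
    rps * duration_secs * sampling_rate_pct.min(100) / 100
}
```

With `rps = 50_000` and `duration_secs = 3_600`, rates of 100, 50, 10, and 1 give 180M, 90M, 18M, and 1.8M recordings respectively.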

## Benefits

- **Lower memory**: Fewer data points stored
- **Better performance**: Less mutex contention
- **Scalability**: Support higher RPS
- **Flexibility**: Users tune accuracy vs resources

## Output Clarity

Mark sampled percentiles clearly in output:

```
## Single Request Latencies (sampled at 10%)

count=1800000, min=5.23ms, max=1234.56ms, mean=45.67ms
p50=42.31ms, p90=89.23ms, p95=112.45ms, p99=234.56ms, p99.9=567.89ms

Note: Percentiles calculated from 10% sample of 18M requests (1.8M samples)
```
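One way the report code could produce these lines is sketched below. The function names are hypothetical, and the sketch assumes the total request count is tracked separately from the histogram (for example, with a plain counter), since the histogram only sees the sampled requests.

```rust
/// Section header, annotated only when sampling is active.
pub fn latency_section_header(sampling_rate_pct: u64) -> String {
    if sampling_rate_pct >= 100 {
        "## Single Request Latencies".to_string()
    } else {
        format!("## Single Request Latencies (sampled at {}%)", sampling_rate_pct)
    }
}

/// Footnote describing the sample, or None when every request was recorded.
pub fn sampling_note(sampling_rate_pct: u64, total_requests: u64, recorded: u64) -> Option<String> {
    if sampling_rate_pct >= 100 {
        return None;
    }
    Some(format!(
        "Note: Percentiles calculated from {}% sample of {} requests ({} samples)",
        sampling_rate_pct, total_requests, recorded
    ))
}
```

This version prints raw counts (`18000000`) and leaves abbreviation to `18M` as a formatting detail.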

## Auto-tuning (Future enhancement)

Could auto-adjust sampling based on RPS:

```rust
fn auto_sampling_rate(current_rps: f64) -> u64 {
    match current_rps {
        rps if rps < 5000.0   => 100,  // Track all
        rps if rps < 25000.0  => 50,   // Track half
        rps if rps < 50000.0  => 10,   // Track 10%
        _                     => 1,    // Track 1%
    }
}
```

## Testing

1. Run test with 100% sampling, get baseline percentiles
2. Run same test with 10% sampling
3. Compare percentiles - should be within 5% of baseline
4. Verify memory usage reduced
5. Performance test: Compare CPU usage with/without sampling
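The comparison in step 3 could be automated along these lines. This is a sketch with hypothetical helper names; it computes percentiles from raw sorted latencies using the nearest-rank method, which is an assumption made for test purposes and not necessarily how the histogram reports them.

```rust
/// Nearest-rank percentile of an ascending-sorted slice; `p` must be in (0, 100].
pub fn nearest_rank_percentile(sorted: &[u64], p: f64) -> Option<u64> {
    if sorted.is_empty() || !(p > 0.0 && p <= 100.0) {
        return None;
    }
    // Rank is 1-based: the smallest rank covering at least p% of the data.
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}

/// True when `sampled` differs from `baseline` by at most `tolerance`
/// (a fraction, so 0.05 means 5%).
pub fn within_tolerance(baseline: f64, sampled: f64, tolerance: f64) -> bool {
    if baseline == 0.0 {
        return sampled == 0.0;
    }
    ((sampled - baseline) / baseline).abs() <= tolerance
}
```

A test would compute each percentile for the 100% run and the 10% run, then assert `within_tolerance(baseline, sampled, 0.05)` for every pair.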

## Documentation

Update:
- `MEMORY_OPTIMIZATION.md` - Add sampling recommendations
- `README.md` - Document sampling configuration
- `LOAD_TEST_SCENARIOS.md` - Recommend sampling rates by scenario

## Related

- Issue #66 - PERCENTILE_TRACKING_ENABLED flag
- Issue #67 - Histogram rotation
- Issue #68 - Label limits
- Issue #69 - Memory metrics
- See `MEMORY_OPTIMIZATION.md` for full analysis

## References

- [Netflix Sampling Strategy](https://netflixtechblog.com/)
- [HdrHistogram Performance](https://github.com/HdrHistogram/HdrHistogram_rust)
- [Statistical Sampling Theory](https://en.wikipedia.org/wiki/Sampling_(statistics))
