diff --git a/.github/workflows/cla.yml b/.github/workflows/cla.yml
index c0e1544d..a73b30c6 100644
--- a/.github/workflows/cla.yml
+++ b/.github/workflows/cla.yml
@@ -22,7 +22,7 @@ jobs:
        path-to-signatures: 'cla-bot/v1/cla.json'
        # branch should not be protected
        branch: 'main'
-       allowlist: user1,bot*
+       allowlist: user1,claude[bot],claude,bot*
        remote-organization-name: mlcommons
        remote-repository-name: systems
diff --git a/kv_cache_benchmark/MLperf v3 KV cache proposal.md b/kv_cache_benchmark/MLperf v3 KV cache proposal.md
index 345b94f3..37b845f2 100644
--- a/kv_cache_benchmark/MLperf v3 KV cache proposal.md
+++ b/kv_cache_benchmark/MLperf v3 KV cache proposal.md
@@ -1,1787 +1,2679 @@

**Date:** November 5, 2025
**Subject:** A detailed technical explanation of the `kv-cache.py` benchmark for system architects and performance engineers.

**Authorship Note:** The benchmark architecture, scenario planning, and debugging decisions were led by Hazem Awadallah; AI tooling was used selectively to draft code under that direction.

---

## 1. Introduction: Solving the LLM Memory Problem

At the heart of an LLM's ability to understand context is the attention mechanism, which relies on a data structure called the KV Cache. During inference, LLMs generate text one token at a time in a process called autoregressive decoding. To generate the next token accurately, the model must consider all the preceding tokens in the sequence.

Instead of wastefully re-calculating the contextual meaning of the entire sequence for every new token, the model uses the KV Cache. This cache stores the intermediate attention data, specifically the "Key" and "Value" vectors, for every token already processed. When generating a new token, the model reuses these cached values, which dramatically reduces computation and speeds up response generation.

The bottleneck emerges from the cache's memory consumption. 
The size of the KV Cache grows linearly with the length of the token sequence. For applications involving long context windows, such as multi-turn conversations or analyzing large documents, the cache can become enormous, quickly consuming the limited and expensive high-speed memory (VRAM) on a GPU.

This creates a critical system design challenge: **where do you store the KV cache?** Offloading it from expensive GPU VRAM to more abundant CPU RAM or even NVMe storage is a cost-effective solution, but it introduces latency. Moving data is always slower than accessing it locally.

This benchmark was designed to solve this exact problem. It provides a sophisticated, configurable tool that allows system architects to **quantify the performance trade-offs of different storage tiers.** By simulating a realistic multi-tenant inference workload, it helps you answer critical questions:

* How much GPU VRAM and CPU RAM do I need for my target user load?
* Is my NVMe drive fast enough to handle the spillover?
* What is the real-world latency impact of offloading to a specific tier?
* Where is the bottleneck in my system: the GPU, the CPU, or the storage?

**How to Use This Benchmark Properly:**
This is not a simple "pass/fail" test. It's a diagnostic tool.
1. **Start with the `storage-only` workload.** This isolates your storage device and tells you its absolute performance limits. If your drive fails this test, it will be a bottleneck in any multi-tier configuration.
2. **Run the `cpu-storage` and `gpu-cpu-storage` tests.** These represent realistic production scenarios. Compare the latency and throughput to understand the value of each tier.
3. **Use the `autoscale` workload.** This is the most valuable test. It automatically finds the maximum number of concurrent users your specific hardware configuration can support before performance degrades unacceptably. Use this number to configure your production environment.

---

## 2. 
Recommended Benchmark Invocations - -Here are the specific commands to run for a thorough analysis of your system, **validated through extensive discovery testing** (1,411 Fast system tests, 268 Slow system tests). These examples assume you are testing the `llama3.1-8b` model and have a cache directory at `/mnt/nvme`. - -> **Discovery Test Finding:** llama3.1-8b and mistral-7b showed the best storage tier differentiation (2.31x ratio). The 70b model is recommended for maximum per-request storage stress. - -### Step 1: Isolate and Test Storage Performance (Maximum Stress) - -This command uses **zero CPU RAM** to force all I/O to your NVMe drive. Discovery testing showed this configuration achieves **2.62x differentiation** in I/O volume metrics between fast and slow storage. - -```bash -# Test 1: Storage-Only Maximum Stress Workload -python3 kv-cache.py \ - --model llama3.1-8b \ - --num-users 200 \ - --duration 300 \ - --gpu-mem-gb 0 \ - --cpu-mem-gb 0 \ - --max-concurrent-allocs 16 \ - --generation-mode none \ - --cache-dir /mnt/nvme \ - --seed 42 \ - --output results_storage_stress.json -``` -**What to look for:** Check **Decode Bytes Read** and **Wall-Clock Throughput** (100% win rate in discovery testing). **⚠️ Do NOT use Storage Throughput** as your primary metric at cpu_mem=0GB—discovery testing showed it only differentiates storage tiers by 1.1x due to I/O time normalization effects. - -### Step 2: Test Storage Throughput (Traditional Metric) - -To use **Storage Throughput (tok/s)** as your primary metric, set cpu_mem=4GB. Discovery testing showed **2.2x differentiation** at this setting. 
- -```bash -# Test 2: Storage Throughput Test -python3 kv-cache.py \ - --model llama3.1-8b \ - --num-users 100 \ - --duration 300 \ - --gpu-mem-gb 0 \ - --cpu-mem-gb 4 \ - --max-concurrent-allocs 0 \ - --generation-mode none \ - --cache-dir /mnt/nvme \ - --seed 42 \ - --output results_storage_throughput.json -``` -**What to look for:** The **Storage Throughput** metric in the summary. This configuration provides the traditional tok/s benchmark metric with reliable differentiation. - -### Step 3: Realistic Multi-Tier Configuration - -This command simulates a production environment with a full three-tier hierarchy. It uses a larger, more realistic CPU memory budget and enables the GPU if available. - -```bash -# Test 3: Full Three-Tier Realistic Workload -# (Set --gpu-mem-gb to your available VRAM, or 0 if none) -python3 kv-cache.py \ - --model llama3.1-8b \ - --num-users 100 \ - --duration 300 \ - --gpu-mem-gb 16 \ - --cpu-mem-gb 32 \ - --generation-mode realistic \ - --cache-dir /mnt/nvme \ - --seed 42 \ - --output results_realistic_production.json -``` -**What to look for:** Compare the `end_to_end_latency_ms` from this test to the storage-only test. You should see a dramatic improvement. Also, check the `cache_hit_rate` and tier distribution (`gpu_entries`, `cpu_entries`, `nvme_entries`) to see how effectively your system is using the faster tiers. - -### Step 4: Discover Your System's Maximum User Load (QoS Mode) - -This command enables the default **Quality of Service (QoS)** autoscaler. It finds the optimal number of concurrent users your hardware can support *while maintaining acceptable latency*. It starts with a low user count and adds more users until the system's storage latency indicates it is becoming saturated. 
- -```bash -# Test 4: Autoscaling Discovery (QoS Mode) -# (Set --gpu-mem-gb to your available VRAM, or 0 if none) -python3 kv-cache.py \ - --model llama3.1-8b \ - --num-users 20 \ - --duration 300 \ - --gpu-mem-gb 16 \ - --cpu-mem-gb 32 \ - --enable-autoscaling \ - --autoscaler-mode qos \ - --generation-mode realistic \ - --cache-dir /mnt/nvme \ - --seed 42 \ - --output results_autoscaling_qos.json -``` -**What to look for:** The output JSON will contain an `autoscaling_stats` section. The last entry in this list will show the final, stable user count your system settled on. This is your evidence-based maximum user load for a latency-sensitive production environment. - -### Step 5: Discover Your System's Peak Throughput (Capacity Mode) - -This command uses the new **Capacity** autoscaler. Its goal is different: it ignores latency and aggressively adds users to find the absolute maximum I/O throughput (in tokens/sec) your storage hardware can sustain. This is the best way to measure the raw power of your drive. - -```bash -# Test 5: Autoscaling Discovery (Capacity Mode) -python3 kv-cache.py \ - --model llama3.1-70b-instruct \ - --num-users 10 \ - --duration 180 \ - --gpu-mem-gb 0 \ - --cpu-mem-gb 32 \ - --enable-autoscaling \ - --autoscaler-mode capacity \ - --generation-mode none \ - --cache-dir /mnt/nvme \ - --seed 42 \ - --output results_autoscaling_capacity.json -``` -**What to look for:** In the `autoscaling_stats` section, look for the `reason` field. The test finishes when it detects that throughput has stopped increasing. The final log will state `Peak capacity found`. The `peak_throughput` value associated with that step is the maximum performance of your storage device. Note the use of `--generation-mode none` to ensure the storage is the only bottleneck. - -### Trial Recommendations (Discovery-Validated) - -> **Discovery Finding:** Variance is high (CV 50-125% depending on configuration). Single runs cannot reliably differentiate storage tiers. 
- -| User Count | Variance (CV) | Minimum Trials | -|------------|---------------|----------------| -| 10 users | ~52% | 3 | -| 50-100 users | ~115-125% | 3-5 | -| 200 users | ~110-120% | 3-5 | - -For publication-quality results, run **5+ trials** and report the **median** rather than mean. - ---- - -## 3. Hardware Requirements - -To effectively run this benchmark and obtain meaningful results, your system should meet certain hardware specifications. The benchmark is flexible, but the quality of your results will depend on the hardware's capabilities, especially the storage subsystem. This is an enterprise storage test, and the recommendations reflect server-grade hardware. - -### Minimum Requirements -These specifications are sufficient to run the basic `storage-only` workload and validate the functionality of the benchmark with a low user count. - -* **CPU:** 8+ Core Server-Grade CPU (e.g., AMD EPYC, Intel Xeon Bronze/Silver) -* **System RAM:** 32 GB ECC RAM -* **GPU:** Not required. The benchmark can run in CPU-only mode (`--gpu-mem-gb 0`). -* **Storage:** 256 GB+ of free space on a data center-class SATA/SAS SSD. -* **Operating System:** A modern Linux distribution (e.g., Ubuntu 22.04, RHEL 9) is required for best performance and compatibility. - -### Recommended Specifications -These specifications are recommended for running the full suite of tests, including the `realistic` multi-tier and `autoscale` workloads with a high user count. This configuration will provide a robust analysis of your system's ability to handle a production-level inference load. - -* **CPU:** 32+ Core Server-Grade CPU (e.g., AMD EPYC 9354 "Genoa", Intel Xeon Gold/Platinum 4510+) -* **System RAM:** 128 GB ECC RAM or more. This allows for a significant CPU cache tier (`--cpu-mem-gb 64` or higher). -* **GPU:** An NVIDIA Data Center GPU (e.g., A100, H100) with 40GB+ of HBM. This is necessary to test the complete three-tier hierarchy at scale. 
-* **Storage:** 1 TB+ of free space on a high-performance, data center-class NVMe SSD (e.g., PCIe Gen4 or Gen5). The primary goal of this benchmark is to measure the performance of this tier. -* **Operating System:** A modern Linux distribution (e.g., Ubuntu 22.04, RHEL 9). - ---- - -## 4. Automating the Benchmark with `kv-cache-wrapper.sh` - -While you can run each test scenario manually using `kv-cache.py`, the repository includes a powerful wrapper script, `kv-cache-wrapper.sh`, to automate the entire process. This script is the recommended way to get a comprehensive performance profile of your system with minimal effort. - -The wrapper script will: -1. **Automatically detect your hardware:** It checks for available GPU(s), total CPU RAM, and the best path for storage testing. -2. **Calculate optimal parameters:** It determines reasonable user counts and memory budgets based on your hardware to ensure the tests are meaningful but not destructive. -3. **Run a full suite of 9 tests:** It executes a series of pre-configured benchmarks to compare every possible tier configuration and stress test the system. -4. **Generate a summary report:** After all tests are complete, it prints a detailed comparison table, allowing you to easily see the performance trade-offs for your specific hardware. - -### How to Use the Wrapper - -Running the script is simple. From your terminal, execute it directly. It's a good idea to pipe the output to a log file for later review. - -```bash -# Run the full benchmark suite with default settings -./kv-cache-wrapper.sh | tee benchmark_summary.log - -# Run the suite with a different model -./kv-cache-wrapper.sh -m llama3.1-70b-instruct - -# Run only specific workloads, like the production and autoscale tests -./kv-cache-wrapper.sh -w production,autoscale -``` - -The script runs the following nine scenarios automatically: - -* **Test 1: GPU Only:** A baseline for best-case latency, limited by VRAM. 
-* **Test 2: CPU Only:** A typical production setup using only system RAM. -* **Test 3: Storage Only:** Isolates the NVMe drive to measure its raw performance. -* **Test 4: GPU + CPU:** A two-tier configuration without storage spillover. -* **Test 5: CPU + Storage:** Simulates a budget-friendly setup with RAM and NVMe. -* **Test 6: GPU + CPU + Storage:** The full three-tier hierarchy for maximum capacity and performance. -* **Test 7: Storage Saturation:** A stress test to find the breaking point of your NVMe drive. -* **Test 8: Realistic Production:** A balanced, steady-state test mimicking a normal day. -* **Test 9: Autoscaling Discovery:** Automatically finds the maximum number of users your system can handle. - -At the end of the ~30-minute run, the script will output a detailed report comparing the throughput, latency, and cache distribution for each scenario, giving you a clear, evidence-based picture of how your system performs. - ---- - -## 5. A Look Under the Hood: How It Works - - In the KV cache benchmark, a **user request** is an `InferenceRequest` data structure that simulates a single interaction, or "turn," with a Large Language Model. - -* Each `InferenceRequest` object contains several key fields to model this interaction: - * **`context_tokens`**: The number of tokens in the user's prompt. This directly determines the size of the initial KV cache that needs to be written to storage in the "prefill" phase. - * **`generate_tokens`**: The number of tokens the model is asked to generate. In the "decode" phase, this influences how many times the existing KV cache is read from storage - * **`phase`**: The type of I/O operation, which can be `PREFILL` (write-heavy), `DECODE` (read-heavy), or a combination of both (`PREFILL_DECODE`) - * **`cache_key`**, **`conversation_id`**, and **`turn_number`**: These fields link requests to simulate multi-turn conversations, where the cache from a previous turn must be read to generate the next response. 
- -* Cache hit categories (e.g., `'system'`, `'common'`, `'multi_turn'`, `'user'`) are determined by a `cache_type` hint that the `process_requests` function passes to the `access_cache` method. This categorization is provided by the caller at the time of access, so `InferenceRequest` remains agnostic. - -* The benchmark measures two critical types of latency for each request: - * **Storage I/O Latency**: This metric measures the time elapsed from the moment a cache operation (`access_cache` for reads or `allocate_cache` for writes) is invoked until it returns. Critically, this duration includes not only the hardware I/O time but also all user-space software overhead within those functions, such as the CPU-intensive process of serializing or deserializing NumPy arrays. It does *not* include time the request spent waiting in the main application queue. - * **End-to-End Latency**: This is the total time the user experiences, measured from request creation (`submit_time`) to completion (`complete_time`). It is the sum of **Queue Wait Time** + **Storage I/O Latency** + **Token Generation Latency**. - -* A `UserSimulator` generates a mix of these requests based on different user "personas" (e.g., 'chatbot', 'coding') to create a realistic workload with varied prompt sizes and response lengths - -The benchmark uses these requests to simulate the two primary phases of inference, which have distinct I/O patterns: - -1. **Initial Prefill (Turn 1):** For the first request in a conversation, the benchmark generates a NumPy array for the user's `context_tokens` and writes it to a storage tier using the `MultiTierCache.allocate_cache` function. This is a single, write-heavy operation. - -2. **Subsequent Prefills (Turn > 1):** For the next turn in the same conversation, the process simulates loading the existing context before adding new information in a read-then-write pattern: - * **Read Previous Context:** The `process_requests` loop first performs a **read** operation. 
It calls `self.cache.access_cache` on the cache key from the *previous* turn (e.g., `conversation-ID_turn_1`) to simulate loading the conversational history.
    * **Write New Context:** It then generates a new NumPy array for the *new* `context_tokens` of the current turn and performs a **write** operation by calling `self.cache.allocate_cache` with a new key (e.g., `conversation-ID_turn_2`).

How are the KV cache entries stored on the XFS file system?

- A unique `cache_key` is generated for every request.
- The `InferenceRequest` class generates the key from its context; for a multi-turn conversation, the key is tied to the turn number.
- The key is then used to construct a unique file path, and the data is saved to that single file (one file per request).

```python
# From kv-cache.py (excerpt)

# InferenceRequest.__post_init__: derive a unique cache key per request.
def __post_init__(self):
    if self.cache_key is None:
        if self.conversation_id:
            self.cache_key = f"{self.conversation_id}_turn_{self.turn_number}"
        else:
            self.cache_key = f"{self.user_id}_ctx"

class NVMeBackend(StorageBackend):

    def _get_path(self, key: str) -> Path:
        """Constructs the file path for a given cache key."""
        return self.base_path / f"{key}.npy"

    def write(self, key: str, data: np.ndarray) -> StorageBackend.IOTiming:
        path = self._get_path(key)
        with open(path, 'wb') as f:
            np.save(f, data, allow_pickle=False)
```

### A. The Three-Tier Architecture: A Hierarchy of Speed

The benchmark's core is the `MultiTierCache` class, which implements a classic three-tier memory hierarchy. The goal is to keep the "hottest" (most frequently accessed) data in the fastest tier (GPU) and the "coldest" data in the slowest but largest tier (NVMe).

1. **Tier 1: GPU VRAM (`GPUMemoryBackend`)**: The fastest tier. Data is stored as PyTorch or CuPy tensors for near-instant access. Capacity is extremely limited and expensive.
2. 
**Tier 2: CPU RAM (`CPUMemoryBackend`)**: The "warm" tier. Data is stored as NumPy arrays in system memory. It's an order of magnitude slower than VRAM but much larger and cheaper. -3. **Tier 3: NVMe Storage (`NVMeBackend`)**: The "cold" tier. Data is written to `.npy` files on disk. It offers massive capacity at the lowest cost but with the highest latency. - -**How Data Placement is Decided (`allocate_cache`):** -When a new KV cache entry needs to be created (during the "prefill" phase), the benchmark follows a simple, top-down logic: - -```python -# From kv-cache.py, inside MultiTierCache.allocate_cache -with self.memory_lock: - # Tier 1: GPU. Check if there's space in the GPU budget (with a 20% buffer). - if 'gpu' in self.backends and self.gpu_memory_used + size_bytes < self.gpu_memory_limit * 0.8: - self.gpu_memory_used += size_bytes - allocated_tier = 'gpu' - # Tier 2: CPU. Check if there's space in the CPU budget. - elif self.cpu_memory_used + size_bytes < self.cpu_memory_limit * 0.8: - self.cpu_memory_used += size_bytes - allocated_tier = 'cpu' - # Tier 3: NVMe. If no space in RAM, offload to disk. - else: - allocated_tier = 'nvme' -``` - -**Real-World Implication:** This logic simulates how a real inference server would operate. It prioritizes the fastest memory available. If you configure the benchmark with a small GPU and CPU memory budget, you are forcing data to spill over to the NVMe drive, allowing you to measure the performance penalty of that spillover. - - - -### B. Memory Clamps: The 80% Rule - -You'll notice the `* 0.8` in the allocation logic. This is a crucial design choice. The benchmark intentionally leaves a **20% headroom** on both the GPU and CPU memory limits. - -**Why?** -This prevents the system from running completely out of memory, which can cause crashes, operating system swapping (thrashing), or out-of-memory (OOM) errors. It ensures that there is always a small buffer available for system processes and other application needs. 
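The clamp arithmetic is easy to sanity-check. The helpers below are an illustrative sketch, not part of `kv-cache.py`: `fits_in_tier` mirrors the per-tier admission check shown above, and `provisioned_capacity_gb` inverts it to answer the capacity-planning question.

```python
def fits_in_tier(used_bytes: int, size_bytes: int, limit_bytes: int,
                 headroom: float = 0.2) -> bool:
    """Mirror of the allocate_cache check: admit an allocation only if
    usage stays below limit * (1 - headroom), i.e. the 80% clamp."""
    return used_bytes + size_bytes < limit_bytes * (1.0 - headroom)

def provisioned_capacity_gb(working_set_gb: float, headroom: float = 0.2) -> float:
    """Invert the clamp for sizing: how much memory must be provisioned
    so a given working set still fits under the 80% rule?"""
    return working_set_gb / (1.0 - headroom)

# An empty 10 GB tier admits a 7 GB entry (7 < 8) but rejects 9 GB (9 > 8).
assert fits_in_tier(0, 7 * 2**30, 10 * 2**30)
assert not fits_in_tier(0, 9 * 2**30, 10 * 2**30)

# A 64 GB working set therefore needs 80 GB of provisioned capacity.
print(provisioned_capacity_gb(64.0))  # 80.0
```

Under these assumptions, the 64 GB example works out to 80 GB of provisioned RAM, which is the same arithmetic the benchmark applies per tier.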
- -**Real-World Implication:** This is a best practice in production systems. You never want to run your memory at 100% utilization. The 80% rule provides stability and ensures that performance remains predictable. When sizing your own hardware, you should apply a similar rule: if you calculate that you need 64 GB of RAM, you should provision at least 80 GB. - -### C. Latency Calculation: User Experience vs. Hardware Speed - -The benchmark reports two different types of latency, and the distinction is critical. - -``` -+--------------------------------+ -| Application (kv-cache.py) [1] | -| - Request Queue (piles up) --------> "Queue Wait" is the dominant latency component [4] -| - Multiple Worker Threads | -+--------------------------------+ - | - | A single worker thread grabs one request. - v -+--------------------------------+ -| Worker Thread & Sync I/O | -| Issues 1 x LARGE, BLOCKING | -| Write/Read (e.g., 1 GB) | -| | -| [SUBMISSION QD = 1] | --------> The thread BLOCKS and WAITS for completion. -+--------------------------------+ - | - | OS receives the single large request. - v -+--------------------------------+ -| Kernel / OS | -| Performs "I/O Splitting" | --------> Splits 1 large I/O into hundreds of small ones. -+--------------------------------+ - | - | Drive receives hundreds of small requests. - v -+--------------------------------+ -| NVMe Storage Device | -| [DEVICE QD > 700] [3] | --------> The drive is heavily utilized in a short burst. -| Processes many small I/Os | -| in parallel | -+--------------------------------+ - -``` - -1. **Storage I/O Latency:** This is the pure hardware time. It measures the time taken for a read or write operation to complete on a specific tier, **excluding any queue wait time.** It is accumulated within the `process_requests` loop every time `self.cache.access_cache` or `self.cache.allocate_cache` is called. - -2. **End-to-End Latency:** This is the total time the user waits. 
It is measured from the moment a request is created (`submit_time`) to the moment it is finished (`complete_time`). It is the sum of **Queue Wait Time + Storage I/O Latency + Generation Latency.** - -**Real-World Implication:** -* **Storage I/O Latency** tells you how good your hardware is. A low number means your drive is fast. -* **End-to-End Latency** tells you how good your system architecture is. A high number, even with a fast drive, indicates a bottleneck elsewhere—most commonly, in the request queue. As seen in the provided logs, the queue wait time can be orders of magnitude larger than the storage latency, proving that the system is overloaded. - -### D. Validating Latency with Block Tracing: Application vs. Hardware - -As discussed in the previous section, the total **End-to-End Latency** is the sum of *Queue Wait Time* and *Storage I/O Latency*. The analysis below focuses on dissecting the *Storage I/O Latency* component, as this is where a crucial software bottleneck is revealed. - -A common and important question is why the benchmark's "Storage I/O Latency" can be seconds long, even on a high-performance NVMe drive, while low-level tools like `btrace` show the drive is responding in milliseconds. This discrepancy is not an error; it is a key finding that reveals a crucial software bottleneck. - -The two tools are measuring latency at different layers of the system: - -1. **Application-Level I/O Latency (The Benchmark's Metric):** This is the total time spent inside the `NVMeBackend.read()` or `write()` methods in Python. This includes not only the time waiting for the disk, but also all associated software overhead, most notably the CPU-intensive process of serializing (saving) or deserializing (loading) the Python data structures (NumPy arrays) to and from a binary format on disk. - -2. **Hardware-Level I/O Latency (`btrace`'s Metric):** This is the pure hardware time. 
It measures the time from when an I/O request hits the Linux block layer until the physical NVMe drive signals that the operation is complete. This is the true speed of your storage device. - -#### Case Study: Analyzing the Discrepancy with Real Data - -Let's examine the results from a real test run to see this in action. - -* **From the Benchmark Log (`mlperf_log_run4.txt`):** - The benchmark reports a P95 NVMe read latency of **12.39 seconds**. - ``` - ### TIER-SPECIFIC LATENCIES ### - NVME Read P95: 12390.15 ms - ``` - -* **From the Block Trace Log (`btrace_analysis_btrace_read.txt`):** - In contrast, a `btrace` analysis of the same workload shows the P95 hardware read latency was only **9.74 milliseconds**. - ``` - D2C Latency Analysis: ... Latency (ms) 9.74 - ``` - -**The Analysis:** - -The massive difference between these two numbers exposes the software overhead. - -| Metric | Source | Time | -| :---------------------------- | :-------------- | :------------ | -| **Total Application Latency** | Benchmark Log | **12,390 ms** | -| **Actual Hardware Latency** | `btrace` Log | **~10 ms** | -| **Software Overhead (CPU Serialization)** | (Difference) | **~12,380 ms**| - -This clearly shows that for a P95 read operation, **over 99.9% of the time was spent in the CPU-bound `numpy.load()` function**, deserializing the data. The physical drive responded in under 10 milliseconds. - -**Conclusion:** The `btrace` logs confirm the storage hardware is not the problem. The benchmark is correctly revealing a significant software bottleneck in the Python-based I/O path. A real-world, high-performance inference engine written in C++ or using technologies like GPUDirect Storage would aim to minimize or eliminate this CPU serialization step, resulting in application latency much closer to the hardware latency shown in `btrace`. 
This is a key finding of the benchmark: it successfully models not just the storage hardware's performance, but also the overhead of the software stack used to access it. - -### E. So, How Should You Interpret the Latency Numbers? - -Given the different layers of latency, here is a simple guide to interpreting the results: - -* **Use `End-to-End Latency` to judge User Experience.** This is the total time a user has to wait for a response. If this number exceeds your Service Level Agreement (SLA), your system is too slow for its workload, regardless of the reason. - -* **Use `Queue Wait Time` to diagnose Overload.** If this number is high (or makes up a large portion of the End-to-End Latency), it is a clear sign that your system is receiving requests faster than it can process them. The bottleneck is system capacity. - -* **Use `Storage I/O Latency` to evaluate the Application's I/O Path.** This number tells you the performance of your Python storage backend. If this number is high, it indicates a bottleneck in the software layer (like CPU serialization), as demonstrated in the case study above. - -* **Use `btrace` (Hardware Latency) to evaluate the Physical Drive.** This number tells you the true speed of your NVMe device. If this number is low, your storage hardware is performing well. - -In short, `btrace` checks the disk, `Storage I/O Latency` checks the application's I/O efficiency, `Queue Wait Time` checks for system overload, and `End-to-End Latency` checks the final user experience. - -### F. QoS Classes: Prioritizing Users - -Not all inference requests are created equal. A user interacting with a chatbot needs an instant response, while a batch job summarizing a document can wait. The benchmark models this with three Quality of Service (QoS) levels defined in the `QoSLevel` enum. - -### G. 
The MLPerf Storage Submission: Finding the Breaking Point - -The "MLPerf Storage" tests included in the wrapper script are designed to do one thing: find the absolute performance limit of the system by intentionally overloading it. When looking at the results, it's common to see extremely high latency numbers, which might seem alarming. However, in the context of a benchmark, this is not only expected, it is a sign of a successful test. - -This state, often called "thrashing," is when the system is receiving requests so much faster than it can process them that it spends most of its time managing the backlog. This is the most demanding scenario for a storage subsystem. - -#### Case Study: Interpreting a "Thrashing" Result - -Let's analyze the provided results for the 8B model submission: - -``` -End-to-end latency: mean 317.96s, P50 322.10s, P95 635.48s -Approximate mean queue wait: 274.08s -Storage I/O latency: mean 37.04s, P95 138.50s -Potential bottlenecks: - - Queue wait dominates (~274.08s mean). -``` - -**The Analysis:** - -1. **The System is Overloaded:** The most telling metric is the `Approximate mean queue wait` of **274 seconds**. This means that, on average, a request spent over 4.5 minutes waiting in a queue before the system even began to process it. - -2. **The Bottleneck is System Capacity:** The fact that queue wait time accounts for ~86% of the total end-to-end latency (274s out of 318s) is a definitive sign that the system as a whole cannot keep up with the request rate. - -3. **The Storage is Under Extreme Stress:** Even after the long wait, the P95 `Storage I/O latency` is over two minutes (138.5s). As established previously, this is mostly due to application-level overhead, but it demonstrates the immense pressure on the I/O path. The system is desperately reading and writing from the NVMe drive to serve the KV cache for many concurrent users. 
**Why This is a Good Benchmark Result:**

This is a valuable result precisely *because* it pushed the system to failure.

* **It finds the true bottleneck:** The test proves that under heavy load, the primary bottleneck isn't just the disk, but the system's overall capacity to handle concurrent requests, leading to massive queue times.
* **It validates the storage:** Despite the system thrashing, the storage subsystem continued to operate and serve terabytes of I/O without failing. This is the goal of the MLPerf Storage test: to certify that the storage solution is robust enough to handle a worst-case "denial-of-service" style workload.
* **It measures maximum throughput:** The reported `312.2 tok/s` is the throughput the system could sustain while being completely saturated. This represents the performance floor under maximum stress.

In conclusion, the MLPerf submission is not measuring performance under ideal conditions. It is a stress test designed to find the breaking point, and the resulting high latency numbers are a clear and useful indicator of where that breaking point is.

Returning to the QoS classes introduced in Section F, the three levels are defined as follows:

```python
# From kv-cache.py
class QoSLevel(Enum):
    INTERACTIVE = "interactive"  # Highest priority, for real-time applications (e.g., chatbot UI).
    RESPONSIVE = "responsive"    # High priority, for near real-time tasks.
    BATCH = "batch"              # Low priority, for offline processing.
```

Each QoS level has a Service Level Agreement (SLA) with a target P95 latency. The benchmark uses a `PriorityQueue` to ensure that `INTERACTIVE` requests are always processed before `BATCH` requests, simulating how a real production scheduler would work.

**Real-World Implication:** This feature allows you to test whether your hardware can meet the strict latency demands of high-priority users while still processing a background load of low-priority tasks.

### H. 
Autoscaling: Finding Your System's True Limit

The `WorkloadAutoscaler` is perhaps the most powerful feature of the benchmark. Instead of guessing the number of users or throughput your system can handle, it finds it automatically using one of two modes, selectable with the `--autoscaler-mode` flag.

#### Mode 1: `qos` (Quality of Service)

This is the default mode, designed for system architects tuning a **production environment**. Its goal is to find the maximum number of users the system can support while keeping latency low to ensure a good user experience.

**How it works:**
1. The `StorageMonitor` periodically collects key performance indicators (KPIs), primarily P95 read latency from the storage tiers.
2. It uses these KPIs to calculate a `saturation` score from 0.0 (idle) to 1.0 (fully saturated). The key heuristic is rising latency: sustained growth in P95 read latency pushes the score toward 1.0.
3. The `WorkloadAutoscaler` compares this saturation score to a target (defaulting to `0.8`, or 80%).
   * If saturation is too low, it increases the number of simulated users.
   * If saturation is too high, it decreases the number of users.
   * It includes a "cooldown" period after a scale-down to allow the system to stabilize.

**Real-World Implication:** This mode allows you to provision your hardware with confidence. By running this test, you can determine the maximum safe user load for your specific server configuration and use that number to set the limits in your production load balancer, ensuring good performance.

#### Mode 2: `capacity` (Peak Throughput)

This mode is designed for hardware vendors and performance engineers who want to find the **absolute peak throughput** of a storage device, ignoring user-facing latency.

**How it works:**
1. The autoscaler starts with a low user count.
2. It increases the user count in stages, first doubling it and then multiplying it by 1.5x, monitoring the total `tokens/sec` throughput at each stage.
3. 
When it detects that adding more users causes the throughput to *decrease* (meaning the point of diminishing returns has been passed), the test concludes. -4. The result is the highest throughput measured before the drop. - -**Real-World Implication:** This is the purest test of raw hardware performance. By combining it with `--generation-mode none`, you can remove all other bottlenecks and measure the maximum I/O your storage can deliver. This is invaluable for comparing the performance of different SSDs in an "apples-to-apples" test. - -### G. RAG Workflow: Simulating Modern Workloads - -Retrieval-Augmented Generation (RAG) is a popular technique where an LLM's context is "augmented" with relevant documents. This creates a unique I/O pattern that the benchmark simulates with the `RAGDocumentManager`. - -**How it works:** -1. **Ingestion (`ingest_document`):** The benchmark simulates the "ingestion" of large documents by splitting them into chunks and pre-calculating and storing the KV cache for each chunk across the three-tier hierarchy. -2. **Retrieval (`retrieve_chunks`):** When a RAG query is simulated, the benchmark retrieves the `top_k` most relevant chunks. This simulates a vector database lookup. -3. **Inference:** The retrieved chunks are then used as the context for the LLM, which involves reading the pre-calculated KV cache for each chunk from storage. - -**Real-World Implication:** RAG workloads place immense stress on the storage system because they involve loading very large contexts (many document chunks) into memory at the start of a request. This feature allows you to test whether your storage can handle the bursty, high-throughput read demands of a RAG-based application. - -### H. Generation Mode: Simulating GPU Backpressure - -A storage benchmark for LLM inference would be incomplete if it only measured I/O. In a real system, the GPU is constantly performing computations to generate the next token. 
This computation time creates **backpressure** on the I/O subsystem. The benchmark cannot make another I/O request until the GPU is finished with its current work. Without simulating this, the benchmark would flood the storage with requests at an unrealistic rate. - -The `--generation-mode` flag controls this simulation by adding a small `time.sleep()` for each token generated. - -```python -# From kv-cache.py -class GenerationMode(Enum): - NONE = "none" # Pure storage benchmark. No simulated sleep. Latency is 100% I/O. - FAST = "fast" # Simulates a very fast GPU (2ms/token) to model some backpressure. - REALISTIC = "realistic" # Simulates a realistic GPU (30ms/token) for end-to-end latency analysis. - -GENERATION_TIMING = { - GenerationMode.NONE: 0.0, - GenerationMode.FAST: 0.002, - GenerationMode.REALISTIC: 0.030, -} -``` - -**How These Values Were Derived:** - -* **`none` (0 ms/token):** This is for pure storage hardware validation. It removes all simulated GPU processing time to measure the absolute maximum I/O throughput the storage can handle. This mode is useful for finding the raw performance of a drive but does not represent a real-world LLM serving scenario. - -* **`realistic` (30 ms/token):** This is the most important mode for system-level testing and is **required for MLPerf submissions**. The 30ms value was derived from empirical measurements of modern data center GPUs (like the NVIDIA A100 or H100) running medium-sized models (7B-8B parameters). This latency corresponds to a generation speed of approximately **33 tokens per second**, which is a standard and widely accepted performance figure for these models in production. Using this mode ensures the benchmark paces its I/O requests at a rate that a real GPU could sustain. - -* **`fast` (2 ms/token):** This mode simulates a very high-performance or next-generation accelerator, capable of generating **500 tokens per second**. 
It is useful for modeling "what-if" scenarios where the GPU is so fast that it is almost never the bottleneck, thereby placing maximum stress on the memory and storage hierarchy. - -**Real-World Implication:** For any test that aims to measure system-level performance (like the `realistic` or `autoscale` workloads), you must use `--generation-mode realistic`. Failure to do so will result in misleadingly high throughput numbers and will not accurately represent the performance of a balanced, production-ready system. - ---- - -### I. Shared System Prompts and Prefix Reuse - -Most chat products send the same “system prompt” (for example, *“You are a helpful assistant.”*) before every user message. In real deployments the platform tries to reuse that prompt instead of regenerating it every time: - -1. The first conversation runs the full prefill step and stores the prompt’s KV cache in fast memory (GPU or CPU). -2. Later conversations look up that stored block. If it is still around, they read it and skip the extra work. If it has been evicted, they rebuild it and store it again. - -The benchmark copies that pattern with three simple pieces: - -* **Detect:** `PrefixMatcher` pretends ~20 % of requests start with one of three common prompts. It hashes the text so everyone shares the same key (`kv_system_`). -* **Count reuse attempts:** `PrefixCacheManager` records how often the matcher sees the prompt. The `system_prompt_reuse` counter therefore means “we spotted the pattern,” even if the cache entry is missing. -* **Count real hits:** `MultiTierCache.access_cache` tries to read the shared key. If the block exists, `system_prompt_hits` increments. If not, the request falls back to a normal prefill. - -In the summary you will see both numbers. A high reuse count with few hits simply says the prompt was detected but the stored copy had already been evicted, just like what operators watch for in production. - -### J. 
ShareGPT Replay: Realistic Workload Simulation

While synthetic workloads (using random token counts within a range) are excellent for controlled stress testing, they may not fully capture the nuances of human-AI interaction. The **ShareGPT Replay** feature addresses this by loading real conversation trees from the ShareGPT dataset.

**How it works:**
1. **Ingestion:** The `ShareGPTDatasetLoader` parses a JSON dataset of real conversations. It uses a tokenizer to calculate the exact `context_tokens` (user prompt) and `generate_tokens` (model response) for every turn.
2. **Replay:** Instead of generating random requests, the benchmark feeds these real token counts into the `InferenceRequest` queue.
3. **Structure Preservation:** Crucially, it preserves the multi-turn structure of the data. Request 2 is guaranteed to be a follow-up to Request 1, testing the `MultiTierCache`'s ability to handle real conversational locality.

**Case Study: Analyzing ShareGPT Results**
Running a replay with the `llama3.1-70b-instruct` model on a memory-constrained system (2GB CPU RAM) reveals bottlenecks often hidden by uniform random distributions.

* **High Cache Hit Rate (97.2%):** Real conversations exhibit high locality. Users ask follow-up questions, allowing the system to reuse the KV cache effectively.
* **NVMe Read Latency Spikes (291ms P95):** Unlike synthetic tests, which cluster around a configured mean, real user inputs vary wildly. A single request with a 16k token context can saturate the read bandwidth, pushing the P95 latency above the 200ms target, resulting in a "FAIL" assessment for storage even if throughput is high. 
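The ingestion step described in "How it works" can be sketched as follows. This is a minimal, hypothetical parser, not the actual `ShareGPTDatasetLoader`: it assumes the common ShareGPT layout (a list of records, each with a `conversations` list of `{"from": "human"|"gpt", "value": ...}` turns), and it uses a whitespace split as a crude stand-in for the real tokenizer:

```python
# Hypothetical sketch of ShareGPT ingestion. The real loader uses a proper
# tokenizer; a whitespace split stands in here to keep the example self-contained.
import json


def load_turns(path: str) -> list[tuple[int, int]]:
    """Return (context_tokens, generate_tokens) for each human/gpt turn pair."""
    with open(path) as f:
        records = json.load(f)
    turns = []
    for record in records:
        convo = record.get("conversations", [])
        # Pair each human prompt with the model reply that follows it.
        for prompt, reply in zip(convo[::2], convo[1::2]):
            if prompt.get("from") == "human" and reply.get("from") == "gpt":
                turns.append((len(prompt["value"].split()),
                              len(reply["value"].split())))
    return turns
```

Feeding these per-turn pairs into the request queue, in order, is what preserves the multi-turn locality that the cache-hit numbers above depend on.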
- -**Sample Output Summary:** -```text -### STORAGE PERFORMANCE ASSESSMENT: FAIL ✗ ### - Criteria Passed: 3/4 - ✓ NVMe Write P95 < 500ms: 54.50ms - ✗ NVMe Read P95 < 200ms: 291.11ms (Target: 200ms) - ✓ Cache Hit Rate > 30%: 97.2% - -### CACHE TIER DISTRIBUTION ### - GPU Entries: 0 (0.00 GB) - CPU Entries: 156 (1.60 GB) - NVMe Entries: 1772 (92% of cache on slow storage) -``` - -### K. The Importance of Realism: A Comparative Case Study - -To illustrate why workload realism matters, we compared two runs of the benchmark on identical hardware (50 users, 70B model, NVMe-only cache). - -**Run A: Real Workload (ShareGPT)** -This run uses the actual conversation data, reflecting human usage patterns. -```bash -python3 kv-cache_sharegpt_replay.py \ - --model llama3.1-70b-instruct \ - --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \ - --gpu-mem-gb 0 --cpu-mem-gb 2 --cache-dir /mnt/nvme \ - --num-users 50 --duration 300 --generation-mode none -``` - -**Run B: Synthetic Workload (Random)** -This run omits the dataset, causing the benchmark to fall back to generating random, full-length contexts. This represents a "worst-case" scenario (e.g., massive document processing) rather than a chat workload. -```bash -python3 kv-cache_sharegpt_replay.py \ - --model llama3.1-70b-instruct \ - --gpu-mem-gb 0 --cpu-mem-gb 2 --cache-dir /mnt/nvme \ - --num-users 50 --duration 300 --generation-mode none -``` - -The results were dramatically different: - -| Metric | Run A: ShareGPT (Real) | Run B: Synthetic (Random) | Difference | -| :--- | :--- | :--- | :--- | -| **Workload Type** | Human Conversations | Random Large Contexts | | -| **Mean Context Size** | **133 tokens** (~41 MB) | **2,676 tokens** (~836 MB) | **20x Larger Data** | -| **Throughput** | **2,610 tok/sec** | **362 tok/sec** | **7.2x Slower** | -| **NVMe Read P95** | **291 ms** | **6,752 ms** (6.7s) | **23x Slower** | -| **End-to-End P50** | 93 ms | 121,158 ms (2 min) | **System Collapse** | - -**Key Findings:** -1. 
**Context Size Explosion:** Real human queries are concise (avg 133 tokens). The synthetic generator, aiming for coverage, produced contexts averaging 2,676 tokens. This forced the storage system to read/write **20x more data per request** in the synthetic run. -2. **System Collapse:** In the synthetic run, the P50 end-to-end latency ballooned to **2 minutes**, while the storage latency was only ~4 seconds. This indicates the system was in a state of **thrashing**, where requests spent 95% of their time waiting in the queue because the storage was saturated handling massive files. -3. **Cache Efficiency:** Real conversations have high locality (85.9% multi-turn hit rate) because users ask follow-up questions. The synthetic run had a much lower hit rate (60.1%), further stressing the storage. - -**Conclusion:** Run A represents a realistic chatbot application, where the NVMe drive is nearly sufficient. Run B represents a worst-case scenario, proving that for such heavy workloads, the current hardware configuration is inadequate. - ---- - -## 6. Current Work: Validating Simulation Accuracy with vLLM - -The primary goal of `kv-cache.py` is to provide a reliable *simulation* of a multi-tiered KV Cache system. But how do we know the simulation is accurate? We must validate it against a real-world, high-performance inference engine. For this, we use **vLLM**, a state-of-the-art LLM serving library. - -Our validation process is divided into two essential steps: - -1. **Baseline Validation (GPU-Only):** First, we establish a performance baseline by running both `kv-cache.py` and vLLM in a GPU-only configuration. This test ensures that the core token generation logic of the simulator is accurate when no memory offloading occurs. -2. **Offloading Validation (GPU + CPU):** Second, we validate the primary feature of the benchmark: cache offloading. 
We configure both tools with limited GPU memory to force the KV cache to spill into CPU RAM, and then we compare the performance impact. - -The pass/fail criterion for both steps is the same: the **tokens per second** reported by `kv-cache.py` should be within **±5%** of the tokens per second reported by vLLM's benchmark tool. - -### Step 1: Baseline Validation (GPU-Only) - -In this step, we configure both tools to use a small model and a low user count, ensuring all KV cache data remains within the GPU's VRAM. This isolates the performance of the GPU and the core generation loop. - -**A. `kv-cache.py` Command (GPU-Only):** - -We run the benchmark with a high GPU memory budget and zero CPU/NVMe budget. This forces all allocations into the `GPUMemoryBackend`. Using a fixed seed ensures the workload is identical for comparison. - -```bash -# Validation Step 1: Run kv-cache.py in GPU-only mode -python3 kv-cache.py \ - --model llama3.1-8b \ - --num-users 10 \ - --duration 120 \ - --gpu-mem-gb 24 \ - --cpu-mem-gb 4 \ - --generation-mode deterministic \ - --seed 42 \ - --output validation_kv_cache_gpu_only.json -``` - -**B. vLLM Command (GPU-Only):** - -We run vLLM's offline benchmark without providing any swap space. This ensures vLLM does not offload any cache data to the CPU. The `--num-prompts` should match the `--num-users` from the `kv-cache.py` command. If you haven't already, you can install vLLM with pip: -```bash -pip install vllm -``` - -Now, run the vLLM benchmark: -```bash -# Validation Step 1: Run vLLM benchmark in GPU-only mode -python3 -m vllm.entrypoints.cli.main bench throughput \ - --model meta-llama/Llama-3.1-8B \ - --dataset-name random \ - --num-prompts 10 \ - --input-len 1024 \ - --output-len 1024 -``` - -**C. Compare Results:** - -Compare the `total_tokens_per_sec` from `validation_kv_cache_gpu_only.json` with the `total tokens/s` from the vLLM output. They should be within 5% of each other. 
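The comparison can be scripted so that both validation steps use the same pass/fail check. Below is a hypothetical helper; it assumes the `kv-cache.py` JSON report exposes the `total_tokens_per_sec` field named above, and it takes the `total tokens/s` figure from the vLLM output as a plain number:

```python
# Hypothetical validation helper (not part of kv-cache.py or vLLM).
# Assumes the JSON report contains a top-level "total_tokens_per_sec" field.
import json


def within_margin(kv_cache_json_path: str, vllm_tok_per_sec: float,
                  margin: float = 0.05) -> bool:
    """True if the simulator's throughput is within +/-margin of vLLM's."""
    with open(kv_cache_json_path) as f:
        report = json.load(f)
    simulated = report["total_tokens_per_sec"]
    return abs(simulated - vllm_tok_per_sec) / vllm_tok_per_sec <= margin
```

The same check applies to Step 2 by pointing it at `validation_kv_cache_offload.json` and the corresponding vLLM run.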
- -### Step 2: Offloading Validation (GPU + CPU) - -Here, we validate the simulator's main purpose: measuring the performance impact of cache offloading. We reduce the available GPU memory to force both `kv-cache.py` and vLLM to use CPU RAM as a secondary cache tier. - -**A. `kv-cache.py` Command (GPU + CPU):** - -We reduce the GPU memory budget to force allocations to spill over to the `CPUMemoryBackend`. - -```bash -# Validation Step 2: Run kv-cache.py with GPU-to-CPU offloading -python3 kv-cache.py \ - --model llama3.1-8b \ - --num-users 20 \ - --duration 120 \ - --gpu-mem-gb 8 \ - --cpu-mem-gb 32 \ - --generation-mode deterministic \ - --seed 42 \ - --output validation_kv_cache_offload.json -``` - -**B. vLLM Command (GPU + CPU):** - -We use the `--swap-space` argument to tell vLLM to allocate a KV cache in CPU RAM. The user count is increased to ensure this space is utilized. - -```bash -# Validation Step 2: Run vLLM benchmark with GPU-to-CPU offloading -python3 -m vllm.entrypoints.cli.main bench throughput \ - --model meta-llama/Llama-3.1-8B \ - --dataset-name random \ - --num-prompts 20 \ - --input-len 1024 \ - --output-len 1024 \ - --swap-space 16 -``` - -**C. Compare Results:** - -Again, compare the `total_tokens_per_sec` from `validation_kv_cache_offload.json` with the `total tokens/s` from the vLLM output. A successful validation will see the results within the ±5% margin, confirming that `kv-cache.py` accurately models the performance penalty of offloading. - -### Hardware & Software Requirements for Validation - -To run this validation, you will need: -* **Hardware:** An NVIDIA GPU with at least 16 GB of VRAM and Compute Capability 7.0+ (e.g., V100, T4, A100, RTX 30/40 series). -* **Environment:** A Linux environment (or WSL 2 on Windows). -* **Software:** Python 3.10+, PyTorch, and vLLM installed (`pip install vllm`). - ---- - -## 7. 
MLPerf v3.0 Submission Guidelines - -For submitting official results to the MLPerf v3.0 benchmark, it is critical to use a standardized, repeatable methodology that isolates the component being tested. When evaluating a storage device's capability for KV cache offloading, the goal is to measure the performance of the storage subsystem under a consistent and saturating load, even on systems without a high-end GPU. - -### Discovery Test Validation Summary - -*Analysis Date: 2026-01-09 | Datasets: 1,411 Fast system tests, 268 Slow system tests* - -Before finalizing these submission guidelines, extensive discovery testing was performed comparing a Fast bare-metal system (14,000 MB/s NVMe) against a Slow virtualized system (3,000 MB/s storage). Key findings that informed the recommendations below: - -| Finding | Details | Impact on Recommendations | -|---------|---------|---------------------------| -| **Storage tier differentiation** | 2.1x-2.6x ratio achieved across all metrics | Benchmark successfully differentiates storage tiers | -| **Metric selection depends on cpu_mem** | Storage Throughput shows only 1.1x at cpu_mem=0GB but 2.2x at cpu_mem=4GB | Different metrics recommended for different configurations | -| **Best differentiation models** | llama3.1-8b and mistral-7b show 2.31x ratio | Recommended for standard submissions | -| **High variance observed** | CV 50-125% depending on configuration | Multiple trials required (minimum 3-5) | -| **100% win rate metrics** | Decode Bytes Read and Wall-Clock Throughput at cpu_mem=0GB | Most reliable for storage stress testing | - -### Recommended Invocations for Storage Submission - -Based on discovery testing, two complementary approaches are recommended depending on your benchmarking goal: - ---- - -#### Option 1: Maximum Storage Stress (cpu_mem=0GB) - -**Use when:** You want to stress test NVMe and measure I/O volume differentiation. 
- -**Primary Metrics:** Decode Bytes Read (2.62x differentiation, 100% win rate), Wall-Clock Throughput (2.43x differentiation, 100% win rate) - -**⚠️ Important:** Do NOT use Storage Throughput as your primary metric at cpu_mem=0GB—it shows only 1.1x differentiation due to I/O time normalization effects. See "Understanding Metric Behavior" below. - -##### Standard Submission: `llama3.1-8b` (Maximum Storage Stress) - -```bash -# MLPerf v3.0: Maximum Storage Stress Test (8B Model) -# Run 3-5 trials for statistical significance -for trial in 1 2 3 4 5; do - python3 kv-cache.py \ - --model llama3.1-8b \ - --num-users 200 \ - --duration 300 \ - --gpu-mem-gb 0 \ - --cpu-mem-gb 0 \ - --max-concurrent-allocs 16 \ - --generation-mode none \ - --seed 42 \ - --output mlperf_v3_stress_8b_trial${trial}.json -done -``` - -##### Large Model Submission: `llama3.1-70b-instruct` (Maximum Per-Request Stress) - -The 70B model generates ~10x more storage I/O per token, ideal for high-bandwidth storage systems: - -```bash -# MLPerf v3.0: Maximum Storage Stress Test (70B Model) -for trial in 1 2 3; do - python3 kv-cache.py \ - --model llama3.1-70b-instruct \ - --num-users 70 \ - --duration 300 \ - --gpu-mem-gb 0 \ - --cpu-mem-gb 0 \ - --max-concurrent-allocs 4 \ - --generation-mode none \ - --seed 42 \ - --output mlperf_v3_stress_70b_trial${trial}.json -done -``` - ---- - -#### Option 2: Storage Throughput Focus (cpu_mem=4GB) - -**Use when:** You want Storage Throughput (tok/s) as your primary metric—the traditional benchmark metric. 
- -**Primary Metric:** Storage Throughput (2.2x differentiation, 97% win rate at cpu_mem=4GB) - -##### Standard Submission: `llama3.1-8b` (Storage Throughput) - -```bash -# MLPerf v3.0: Storage Throughput Test (8B Model) -# Run 3-5 trials for statistical significance -for trial in 1 2 3 4 5; do - python3 kv-cache.py \ - --model llama3.1-8b \ - --num-users 100 \ - --duration 300 \ - --gpu-mem-gb 0 \ - --cpu-mem-gb 4 \ - --max-concurrent-allocs 0 \ - --generation-mode none \ - --seed 42 \ - --output mlperf_v3_throughput_8b_trial${trial}.json -done -``` - -##### Large Model Submission: `llama3.1-70b-instruct` (Storage Throughput) - -```bash -# MLPerf v3.0: Storage Throughput Test (70B Model) -for trial in 1 2 3; do - python3 kv-cache.py \ - --model llama3.1-70b-instruct \ - --num-users 50 \ - --duration 300 \ - --gpu-mem-gb 0 \ - --cpu-mem-gb 4 \ - --max-concurrent-allocs 4 \ - --generation-mode none \ - --seed 42 \ - --output mlperf_v3_throughput_70b_trial${trial}.json -done -``` - ---- - -#### Option 3: Realistic Production Simulation - -**Use when:** You want to simulate realistic inference timing including GPU backpressure. - -```bash -# MLPerf v3.0: Realistic Production Workload -python3 kv-cache.py \ - --model llama3.1-8b \ - --num-users 50 \ - --duration 300 \ - --gpu-mem-gb 0 \ - --cpu-mem-gb 4 \ - --max-concurrent-allocs 4 \ - --generation-mode realistic \ - --seed 42 \ - --output mlperf_v3_realistic_8b.json -``` - ---- - -### Understanding Metric Behavior by cpu_mem Setting - -Discovery testing revealed a critical insight: **the choice of primary metric depends on your cpu_mem setting**. - -#### Why Storage Throughput is Misleading at cpu_mem=0GB - -At cpu_mem=0GB, both fast and slow systems are 100% I/O-bound—every token requires NVMe access. 
This creates a normalization effect: - -| System | Decode Bytes Read | Total I/O Time | Storage Throughput | -|--------|-------------------|----------------|-------------------| -| Fast | 1,195 GB | ~8,000 s | 9.53 tok/s | -| Slow | 447 GB | ~7,100 s | 8.50 tok/s | -| **Ratio** | **2.62x** | **1.13x** | **1.12x** | - -The Fast system reads **2.62x more bytes** but accumulates **more I/O time** (because more operations). These effects cancel out in Storage Throughput, hiding the true performance difference. - -#### Recommended Metrics by Configuration - -| cpu_mem | Primary Metric | Differentiation | Win Rate | Notes | -|---------|----------------|-----------------|----------|-------| -| **0 GB** | Decode Bytes Read | **2.62x** | **100%** | Measures total storage work done | -| **0 GB** | Wall-Clock Throughput | **2.43x** | **100%** | Measures real-world tokens/sec | -| **0 GB** | Storage Throughput | 1.12x | 62% | **NOT RECOMMENDED** (misleading) | -| **4 GB** | Storage Throughput | **2.23x** | **97%** | Traditional metric works at this setting | -| **4 GB** | Decode Bytes Read | 2.06x | 100% | Also valid secondary metric | - -### Key Parameters Explained - -* `--num-users`: Discovery testing showed differentiation remains stable (~2.1x-2.2x) across 10-200 users. Higher counts (150-200) maximize aggregate throughput. The 70B model uses fewer users due to larger per-user memory footprint. -* `--duration 300`: A 5-minute duration provides stable metrics. For official submissions, 10 minutes (600s) recommended. -* `--gpu-mem-gb 0`: **Critical for storage-focused testing.** Ensures no GPU memory allocation, isolating storage performance. -* `--cpu-mem-gb`: Choose based on your metric goal: - - **0 GB**: Maximum storage stress, use Decode Bytes Read or Wall-Clock Throughput - - **4 GB**: Traditional benchmarking, use Storage Throughput -* `--max-concurrent-allocs`: Controls allocation parallelism. 
Discovery showed optimal values are 0 (unlimited) for throughput metrics, 16 for stress testing. -* `--generation-mode`: - - **none**: Pure I/O benchmark, no token generation delay. Best for storage characterization. - - **realistic**: Adds 30ms/token GPU simulation. Required for production workload simulation. -* `--seed 42`: **Mandatory for valid submission.** Ensures identical pseudo-random workload across test runs and systems. - -### Trial Requirements Due to Variance - -Discovery testing revealed significant variance (CV 50-125% depending on configuration): - -| Concurrency | Typical CV | Minimum Trials | Recommended Trials | -|-------------|------------|----------------|-------------------| -| Low (10 users) | ~52% | 3 | 5 | -| Medium (50-100 users) | ~115-125% | 3 | 5+ | -| High (200 users) | ~110-120% | 3 | 5+ | - -**For publication-quality results:** -- Run minimum **3 trials** per configuration -- Run **5+ trials** for statistical robustness -- Report **median** rather than mean to reduce outlier impact -- Report **P95** and **P99** alongside mean for latency metrics - -### Interpreting Throughput: System vs. Storage (Read Amplification) - -When you run the benchmark, the summary report presents multiple throughput and I/O metrics that can differ significantly. Understanding these differences—validated by discovery testing—is key to correctly interpreting the results. - -#### Key Metrics Explained - -1. **Wall-Clock Throughput (`total_tokens_per_sec`):** The end-to-end throughput from the user's perspective: tokens generated per second across all users. Discovery testing showed **2.1x-2.4x differentiation** between storage tiers. This metric is reliable at all cpu_mem settings. - -2. **Storage Throughput (`nvme_throughput`):** Tokens processed per unit of NVMe I/O time. **⚠️ Warning:** Discovery testing showed this metric is **unreliable at cpu_mem=0GB** (only 1.1x differentiation) but works well at **cpu_mem=4GB** (2.2x differentiation). - -3. 
**Decode Bytes Read (GB):** Total bytes read from NVMe during decode phase. Discovery testing showed this is the **most reliable differentiation metric** at cpu_mem=0GB (**2.62x ratio, 100% win rate**). - -4. **Prefill Bytes Written (GB):** Total bytes written to NVMe during prefill phase. Shows **2.15x differentiation** at cpu_mem=0GB. - -#### Why Are They So Different? The Concept of Read Amplification - -Storage metrics are often an order of magnitude higher than System Throughput due to **Read Amplification**—a fundamental characteristic of LLM inference. - -During the "decode" phase, to generate a single new token, the model must read the *entire KV cache for all preceding tokens in the conversation*. - -* **Example:** A user has a context of 1000 tokens. To generate the 1001st token, the system must read the KV cache for all 1000 previous tokens from storage. - * **System Tokens Generated:** 1 - * **Storage Tokens Read:** 1000 - -This creates a massive amplification effect where a small amount of user-facing work (generating one token) triggers a large amount of backend I/O (reading the entire history). This is precisely the behavior this benchmark is designed to measure, as it is the primary source of stress on the storage subsystem in a real-world KV cache offloading scenario. 
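The amplification in the example above can be put into a short back-of-envelope calculator. This is illustrative arithmetic, not code from `kv-cache.py`:

```python
# Back-of-envelope model of decode-phase read amplification: generating
# token N+1 re-reads the KV cache of all N preceding tokens.

def decode_read_amplification(context_tokens: int, new_tokens: int) -> float:
    """Average cached tokens read per token generated during decode."""
    # The i-th new token (0-based) re-reads context_tokens + i cached tokens.
    total_reads = sum(context_tokens + i for i in range(new_tokens))
    return total_reads / new_tokens

# The example above: 1000-token context, generate 1 token -> 1000x amplification.
print(decode_read_amplification(1000, 1))  # prints 1000.0
```

Multiplying this factor by the per-token KV cache size (see the bytes-per-token formula later in this document) gives a rough estimate of decode-phase read volume, which is why Decode Bytes Read dwarfs the user-facing token count.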
- -#### Discovery Validation: I/O Volume as Primary Metric - -Discovery testing confirmed that **I/O volume metrics (Decode Bytes Read, Prefill Bytes Written)** are more reliable than time-normalized metrics for comparing storage systems: - -| Metric | cpu_mem=0GB Ratio | cpu_mem=4GB Ratio | Win Rate | -|--------|-------------------|-------------------|----------| -| Decode Bytes Read | **2.62x** | 2.06x | **100%** | -| Wall-Clock Throughput | **2.43x** | 1.79x | **100%** | -| Storage Throughput | 1.12x | **2.23x** | 62% / 97% | - -**Recommendation:** When comparing storage systems under maximum stress (cpu_mem=0GB), use **Decode Bytes Read** or **Wall-Clock Throughput** as your primary metric. Reserve Storage Throughput for cpu_mem≥4GB configurations. - -#### Code Snippets - -**1. System Throughput Calculation:** -This metric is calculated in the `_calculate_stats` method and is based on the number of new tokens generated. - -```python -# From IntegratedBenchmark._calculate_stats in kv-cache.py -total_tokens_generated = self.stats['tokens_generated'] -if duration > 0: - self.stats['total_tokens_per_sec'] = total_tokens_generated / duration -``` - -**2. Storage Throughput Calculation:** -This metric is calculated in the `_evaluate_storage_performance` method and is based on the `nvme_tokens_processed` counter, which tracks all I/O to the NVMe tier. - -```python -# From MultiTierCache._evaluate_storage_performance in kv-cache.py -nvme_tokens = self.stats.get('nvme_tokens_processed', 0) -if duration > 0: - nvme_throughput = nvme_tokens / duration -``` - -**3. How Storage Tokens are Counted:** -The `nvme_tokens_processed` counter is incremented during both writes (`allocate_cache`) and reads (`access_cache`) that involve the NVMe tier. 
- -*Writing to NVMe (Prefill):* -```python -# From MultiTierCache.allocate_cache in kv-cache.py -if allocated_tier == 'nvme': - # For throughput calculation, track tokens written to NVMe - if self.performance_profile == 'throughput': - self.stats['nvme_tokens_processed'] += num_tokens -``` - -*Reading from NVMe (Decode):* -```python -# From MultiTierCache.access_cache in kv-cache.py -elif key in self.nvme_entries: - # ... - # For throughput calculation, track tokens read from NVMe - if self.performance_profile == 'throughput': - entry_size = self.nvme_entries[key]['size'] - num_tokens = entry_size // self.model_config.kv_cache_size_per_token - self.stats['nvme_tokens_processed'] += num_tokens -``` - -By understanding read amplification, you can correctly interpret a high Storage Throughput not as an error, but as an accurate measurement of the intense I/O load the storage device is successfully handling. - -### What About RAG Workloads? - -The benchmark includes a Retrieval-Augmented Generation (RAG) simulation mode (`--enable-rag`), which models workloads that inject large documents into the context. This creates a very large, write-heavy prefill phase and is an excellent way to stress-test a storage device's ability to handle bursty I/O. - -However, for an official MLPerf submission, **it is recommended *not* to use the RAG workload.** The standard conversational workload provides a more consistent and repeatable I/O profile that is better suited for "apples-to-apples" comparisons between different storage solutions. - -The RAG workload can be considered an optional, supplementary test. Vendors are encouraged to run it and report the results separately to showcase performance on this specific, demanding use case, but it should not replace the standard Storage Saturation test for the official submission. - -### Why Not Use Autoscaling for Submission? 
- -The autoscaling feature (`--enable-autoscaling`) is an invaluable tool for system architects to discover the maximum user capacity of a *specific, balanced hardware configuration*. It is designed for system tuning and capacity planning, not for standardized component benchmarking. - -For an official MLPerf submission focused on storage, a fixed-load test is superior for two reasons: -1. **Repeatability:** A fixed user count ensures that every test run applies the exact same load, leading to highly repeatable and consistent results. Autoscaling, by its nature, adjusts the load based on system performance, which can introduce variability between runs. -2. **Comparability:** The goal of MLPerf is to compare components on an "apples-to-apples" basis. By using a standardized, high-load command, we can directly compare the performance of different storage devices under the exact same conditions. Autoscaling would result in different final user counts for different systems, making direct comparison of the storage's throughput and latency difficult. - -Therefore, the **Storage Saturation** test with a fixed, high user count is the correct methodology for generating official, comparable MLPerf v3.0 results for KV cache storage offloading. - ---- - -## 8. Known Limitations and Future Work - -This benchmark is a sophisticated tool for simulating KV cache offloading, but like any simulation, it has limitations. Understanding these is key to interpreting the results correctly and identifying areas for future improvement. - -* **NumPy Serialization Overhead:** The `NVMeBackend` uses `numpy.save()` and `numpy.load()` to write and read cache entries to disk. While efficient, this process involves CPU-bound serialization and deserialization steps. A real-world inference engine might use more advanced techniques like GPUDirect Storage to move data directly from the GPU to NVMe, bypassing the CPU and avoiding this overhead. 
Therefore, the measured NVMe latency in this benchmark may be slightly higher than what is achievable with a fully optimized, custom storage pipeline. - -* **Abstracted Storage Backends:** The benchmark currently provides a file-based `NVMeBackend`. It does not include built-in backends for other storage systems like object storage (e.g., S3), network file systems (NFS), or in-memory databases (e.g., Redis). While the `StorageBackend` class is extensible, testing these other systems would require implementing new backend classes. - -* **Single-Node Architecture:** The simulation runs on a single machine, modeling multiple users through threading. It does not account for network latency or bandwidth, which would be a significant factor in a distributed inference environment where the KV cache might be stored on a separate, networked storage server. - -* **Simulated GPU Backpressure:** The `--generation-mode` flag uses `time.sleep()` to emulate the time a GPU would spend on computation. This is a fixed-time approximation. It does not model the complex, dynamic nature of real GPU workloads, including variations in kernel execution times or PCIe bus contention between compute and I/O operations. - -* **Simplified Eviction Policy:** The benchmark employs a straightforward Least Recently Used (LRU) policy for evicting old conversations when memory limits are reached. Production inference servers may use more complex eviction algorithms (e.g., Least Frequently Used, size-based eviction) to optimize cache hit rates. - -### An Invitation to Collaborate - -This benchmark is an open-source effort driven by the MLPerf Storage Working Group. We welcome contributions from the community to help address these limitations and make the tool even more representative of real-world inference workloads. - -If you are an expert in storage systems, GPU programming, or LLM inference and are interested in contributing, please consider getting involved. 
Areas where we would particularly value collaboration include:
-* Developing new storage backends (e.g., for object storage or RDMA).
-* Integrating more sophisticated GPU simulation models.
-* Implementing alternative cache eviction policies.
-* Expanding the benchmark to a distributed, multi-node architecture.
-
-By working together, we can create a world-class, open standard for evaluating storage performance for AI.
-
----
-
-## 9. How to Calculate Memory Requirements
-
-A common point of confusion is the memory consumption of the benchmark, especially when testing large models like `llama3.1-70b-instruct`. It's natural to see a 70B model and expect memory usage to be in the hundreds of gigabytes, yet the benchmark process might only consume 15-20 GB of RAM.
-
-This discrepancy arises because **the benchmark only simulates the I/O for the Key-Value (KV) cache; it does not load the model's actual weights.**
-
-The primary goal of this tool is to measure the performance of your memory and storage subsystems under the specific I/O patterns generated by moving the KV cache between tiers. The 140GB+ of the model's weights are assumed to be static and already loaded in GPU VRAM. The benchmark focuses on the dynamic part: the KV cache, which is generated on-the-fly for each user.
-
-#### The KV Cache Size Formula
-
-The size of the KV cache for a single token can be calculated using the model's architectural parameters. The formula is:
-**Bytes per Token = `num_layers` × 2 × `kv_heads` × (`hidden_dim` / `num_heads`) × `bytes_per_dtype`**
-
-Where:
-* `num_layers`: The number of transformer layers in the model.
-* `2`: Represents the two components of the cache: the Key (K) and the Value (V).
-* `kv_heads`: The number of attention heads for Keys/Values. For models using Grouped-Query Attention (GQA), this is smaller than `num_heads`.
-* `hidden_dim / num_heads`: This calculates the dimension of a single attention head.
-* `bytes_per_dtype`: The number of bytes for the data type (e.g., 2 for `float16`).
-
-#### Calculation for Each Model
-
-Here is the full calculation for each model defined in `kv-cache.py`:
-
-* **`tiny-1b`**:
-    * `12 × 2 × 4 × (1024 / 8) × 2` = **24,576 Bytes/Token** (~0.02 MB/Token)
-
-* **`mistral-7b`**:
-    * `32 × 2 × 8 × (4096 / 32) × 2` = **131,072 Bytes/Token** (~0.13 MB/Token)
-
-* **`llama2-7b`** (Uses Multi-Head Attention, so `kv_heads` = `num_heads`):
-    * `32 × 2 × 32 × (4096 / 32) × 2` = **524,288 Bytes/Token** (~0.50 MB/Token)
-
-* **`llama3.1-8b`**:
-    * `32 × 2 × 8 × (4096 / 32) × 2` = **131,072 Bytes/Token** (~0.13 MB/Token)
-
-* **`llama3.1-70b-instruct`**:
-    * `80 × 2 × 8 × (8192 / 64) × 2` = **327,680 Bytes/Token** (~0.31 MB/Token)
-
-#### Memory per User for an 8K Context
-
-Using these values, we can create a table showing the total KV cache size for a single user with a context of 8,192 tokens. This is crucial for capacity planning.
-
-| Model | Bytes per Token | Cache Size for 8,192 Tokens |
-| :--- | :--- | :--- |
-| `tiny-1b` | 24,576 | ~192 MB |
-| `mistral-7b` | 131,072 | ~1,024 MB (1 GB) |
-| `llama2-7b` | 524,288 | ~4,096 MB (4 GB) |
-| `llama3.1-8b` | 131,072 | ~1,024 MB (1 GB) |
-| `llama3.1-70b-instruct` | 327,680 | ~2,560 MB (2.5 GB) |
-
-This table clearly illustrates the memory pressure. If you are running the `llama3.1-70b-instruct` model with 40 users, the total active KV cache size the benchmark needs to manage is `40 users * 2.5 GB/user = 100 GB`. If you only provide 4 GB of CPU RAM (`--cpu-mem-gb 4`), the benchmark will correctly offload the other ~96 GB to your NVMe drive, allowing you to measure the performance of your storage under that specific, heavy load.
-
----
-
-## 10. Smoke Test: Quick Validation Suite
-
-This section provides a collection of key benchmark invocations that can be used as a "smoke test" to quickly validate different aspects of your system's performance.
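Before running them, the memory-requirement formula above can be sanity-checked with a few lines of Python. This sketch hard-codes the per-model parameters from the worked examples above; it does not read them from `kv-cache.py`:

```python
# KV cache sizing per the formula:
#   bytes/token = num_layers * 2 * kv_heads * (hidden_dim / num_heads) * bytes_per_dtype
# Parameters mirror the worked examples in this document, not kv-cache.py itself.
MODELS = {
    #                        (layers, kv_heads, hidden_dim, num_heads, dtype_bytes)
    "mistral-7b":            (32,  8, 4096, 32, 2),
    "llama2-7b":             (32, 32, 4096, 32, 2),  # MHA: kv_heads == num_heads
    "llama3.1-8b":           (32,  8, 4096, 32, 2),
    "llama3.1-70b-instruct": (80,  8, 8192, 64, 2),
}

def kv_bytes_per_token(layers, kv_heads, hidden_dim, num_heads, dtype_bytes):
    head_dim = hidden_dim // num_heads
    return layers * 2 * kv_heads * head_dim * dtype_bytes

def cache_gib(model, context_tokens=8192):
    """Total KV cache footprint for one user at the given context length."""
    return kv_bytes_per_token(*MODELS[model]) * context_tokens / (1024 ** 3)

# cache_gib("llama3.1-8b") == 1.0 (exactly 1 GiB at an 8K context)
# cache_gib("llama3.1-70b-instruct") == 2.5
```

Multiplying by the user count reproduces the capacity-planning arithmetic above, e.g. 40 users × 2.5 GiB ≈ 100 GiB of active KV cache for the 70B model.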
**These tests have been validated through discovery testing** (1,411 Fast system tests, 268 Slow system tests). For all commands, it is assumed the cache directory is `/mnt/nvme`. - -### Test 1: Maximum Storage Stress (Discovery-Validated) - -**Purpose:** Establishes the baseline performance of your storage device by forcing **all I/O to NVMe** (cpu_mem=0GB). This configuration showed the strongest storage tier differentiation in discovery testing. - -**Primary Metrics:** Decode Bytes Read (2.62x differentiation), Wall-Clock Throughput (2.43x differentiation) - -```bash -python3 kv-cache.py \ - --model llama3.1-8b \ - --num-users 200 \ - --duration 180 \ - --gpu-mem-gb 0 \ - --cpu-mem-gb 0 \ - --max-concurrent-allocs 16 \ - --generation-mode none \ - --cache-dir /mnt/nvme \ - --seed 42 \ - --output results_storage_stress.json -``` - -**⚠️ Important:** At cpu_mem=0GB, do NOT use Storage Throughput as your primary metric—use Decode Bytes Read or Wall-Clock Throughput instead (see Section 7 for details). - -### Test 2: Storage Throughput Benchmark (Traditional Metric) - -**Purpose:** Use this configuration when you want **Storage Throughput (tok/s)** as your primary metric. Discovery testing showed this metric works reliably at cpu_mem=4GB (2.2x differentiation). - -```bash -python3 kv-cache.py \ - --model llama3.1-8b \ - --num-users 100 \ - --duration 180 \ - --gpu-mem-gb 0 \ - --cpu-mem-gb 4 \ - --max-concurrent-allocs 0 \ - --generation-mode none \ - --cache-dir /mnt/nvme \ - --seed 42 \ - --output results_storage_throughput.json -``` - -### Test 3: Realistic Three-Tier Workload - -**Purpose:** Simulates a balanced, production-level environment using GPU, CPU, and NVMe tiers. Use this to measure end-to-end latency in a typical setup. 
-
-```bash
-python3 kv-cache.py \
-  --model llama3.1-8b \
-  --num-users 100 \
-  --duration 300 \
-  --gpu-mem-gb 16 \
-  --cpu-mem-gb 32 \
-  --generation-mode realistic \
-  --cache-dir /mnt/nvme \
-  --seed 42 \
-  --output results_realistic_production.json
-```
-
-### Test 4: Autoscaling for Max Users (QoS Mode)
-
-**Purpose:** **This is the key command for sizing your production environment.** It automatically discovers the maximum number of concurrent users your system can support while maintaining a low-latency user experience (Quality of Service).
-
-```bash
-python3 kv-cache.py \
-  --model llama3.1-8b \
-  --num-users 20 \
-  --duration 300 \
-  --gpu-mem-gb 16 \
-  --cpu-mem-gb 32 \
-  --enable-autoscaling \
-  --autoscaler-mode qos \
-  --generation-mode realistic \
-  --cache-dir /mnt/nvme \
-  --seed 42 \
-  --output results_autoscaling_qos.json
-```
-
-### Test 5: Autoscaling for Peak Throughput (Capacity Mode)
-
-**Purpose:** Ignores latency to find the absolute maximum I/O throughput (tokens/sec) your storage hardware can sustain. This is the ultimate test of your drive's raw power.
-
-```bash
-python3 kv-cache.py \
-  --model llama3.1-70b-instruct \
-  --num-users 10 \
-  --duration 180 \
-  --gpu-mem-gb 0 \
-  --cpu-mem-gb 4 \
-  --enable-autoscaling \
-  --autoscaler-mode capacity \
-  --generation-mode none \
-  --cache-dir /mnt/nvme \
-  --seed 42 \
-  --output results_autoscaling_capacity.json
-```
-
-### Test 6: MLPerf Storage Submission (8B Model)
-
-**Purpose:** A standardized, high-load stress test designed to saturate the storage device and measure its sustained throughput for an official MLPerf submission.
-
-```bash
-python3 kv-cache.py \
-  --model llama3.1-8b \
-  --num-users 150 \
-  --duration 600 \
-  --gpu-mem-gb 0 \
-  --cpu-mem-gb 2 \
-  --generation-mode realistic \
-  --performance-profile throughput \
-  --seed 42 \
-  --output mlperf_v3_storage_submission_8b.json
-```
-
-### Test 7: MLPerf Storage Submission (70B Model)
-
-**Purpose:** A heavier version of the MLPerf stress test using a large model to generate a more intense I/O load, further testing the limits of the storage subsystem.
-
-```bash
-python3 kv-cache.py \
-  --model llama3.1-70b-instruct \
-  --num-users 40 \
-  --duration 600 \
-  --gpu-mem-gb 0 \
-  --cpu-mem-gb 4 \
-  --generation-mode realistic \
-  --performance-profile throughput \
-  --seed 42 \
-  --output mlperf_v3_storage_submission_70b.json
-```
-
-### Test 8: RAG Workload Simulation
-
-**Purpose:** Simulates a Retrieval-Augmented Generation (RAG) workload, which involves a write-heavy ingestion phase followed by bursty, high-throughput reads. This is an excellent stress test for RAG-specific applications.
-
-```bash
-python3 kv-cache.py \
-  --model llama3.1-8b \
-  --num-users 30 \
-  --duration 300 \
-  --gpu-mem-gb 16 \
-  --cpu-mem-gb 32 \
-  --enable-rag \
-  --generation-mode realistic \
-  --cache-dir /mnt/nvme \
-  --seed 42 \
-  --output results_rag_workload.json
-```
-
-### Test 9: Maximum Stress (The "Kitchen Sink")
-
-**Purpose:** This is the ultimate stress test. It combines the largest model (70B), the I/O-intensive RAG workload, and the capacity-seeking autoscaler to find the absolute maximum throughput your system can handle when every demanding feature is enabled.
-
-```bash
-python3 kv-cache.py \
-  --model llama3.1-70b-instruct \
-  --num-users 10 \
-  --duration 300 \
-  --gpu-mem-gb 16 \
-  --cpu-mem-gb 64 \
-  --enable-rag \
-  --enable-autoscaling \
-  --autoscaler-mode capacity \
-  --generation-mode realistic \
-  --cache-dir /mnt/nvme \
-  --seed 42 \
-  --output results_max_stress.json
-```
-
-### Test 10: ShareGPT Workload Replay
-
-**Purpose:** Validates system performance against a trace of real-world human-AI conversations. This is the closest approximation to running a production service. It uses the dedicated replay script [`kv-cache_sharegpt_replay.py`](kv-cache_sharegpt_replay.py).
-
-```bash
-python3 kv-cache_sharegpt_replay.py \
-  --model llama3.1-70b-instruct \
-  --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \
-  --max-conversations 1000 \
-  --gpu-mem-gb 0 \
-  --cpu-mem-gb 2 \
-  --cache-dir /mnt/nvme \
-  --num-users 50 \
-  --duration 300 \
-  --generation-mode none \
-  --output results_sharegpt_replay.json
-```
-
----
-
-# CHANGES-12-05-2025: The "Waterfall" Architecture & Optimization
-
-**Date:** December 5, 2025
-**Subject:** Major architectural upgrade to `kv-cache-waterfall-lru.py`.
-
-This update introduces a fundamental shift in how the benchmark manages memory, moving from a simple "Spillover" model to a sophisticated "Waterfall" eviction strategy. It also addresses a critical CPU bottleneck that was masking true storage performance.
-
-## 1. Architectural Shift: From Spillover to Waterfall
-
-The original benchmark used a **Spillover** strategy. When the GPU was full, new data was forced directly into the CPU (and then NVMe).
-* **The Problem:** New data is often the "hottest" (most likely to be read again soon). By forcing it to the slowest tier, we were penalizing active conversations. Meanwhile, old, cold data sat comfortably in the GPU, wasting valuable VRAM.
-* **The Solution (Waterfall):** The new implementation enforces a strict hierarchy. New data **always** targets the fastest tier (GPU).
- * If the GPU is full, the system identifies the **Least Recently Used (LRU)** item in the GPU and moves it to the CPU to make room. - * If the CPU is full, it moves the CPU's LRU item to NVMe. - * **Result:** The hottest data stays fast. Only truly cold data "falls" down the waterfall to storage. This mimics the behavior of production-grade caching systems like Redis or vLLM. - -### The Waterfall Flow - -```ascii - [ New Data ] - | - v - +-------------+ (Full?) +-------------+ (Full?) +-------------+ - | GPU Tier | --------------> | CPU Tier | --------------> | NVMe Tier | - | (Fastest) | Evict LRU | (Medium) | Evict LRU | (Slowest) | - +-------------+ +-------------+ +-------------+ - ^ ^ ^ - | | | - [ Hot Access ] [ Warm Access ] [ Cold Access ] -``` - -### Implementation: Recursive Eviction - -The core logic resides in `_ensure_space_in_tier`. It recursively clears space in lower tiers to make room for demotions from higher tiers. - -```python -def _ensure_space_in_tier(self, tier: str, required_bytes: int, recursion_depth: int = 0) -> bool: - # ... (recursion limits and checks omitted) ... - - # Find the LRU entry in this tier - lru_entries = self._get_lru_entries_in_tier(tier) - lru_key, lru_entry = lru_entries[0] - lru_size = lru_entry['size'] - - # Recursively ensure the next tier has space for this entry - # This triggers the "Waterfall" effect down the hierarchy - if not self._ensure_space_in_tier(next_tier, lru_size, recursion_depth + 1): - return False - - # Demote the LRU entry to the next tier - success, _ = self._demote_entry(lru_key, tier, next_tier) -``` - -## 2. Removing the CPU Bottleneck: Static Noise Buffers - -**The Issue:** -Profiling the original script revealed that `np.random.uniform`—the function used to generate the dummy KV cache data—was consuming massive amounts of CPU time. -* **Impact:** The CPU was spending so much time generating random numbers that it couldn't issue storage I/O requests fast enough. 
The benchmark was measuring the speed of Python's random number generator, not the speed of the NVMe drive. - -**The Fix:** -We replaced dynamic generation with a **Static Noise Buffer**. -* **Mechanism:** At startup, the benchmark pre-allocates a 256MB block of random noise in memory. -* **Zero-Copy Slicing:** When a request needs 10MB of data, instead of generating 10MB of new numbers, the system simply takes a "slice" (a view) of the pre-existing buffer. -* **Result:** Data generation is now effectively instant (zero CPU cost). This ensures that 100% of the latency measured is due to the storage subsystem, providing a true test of hardware performance. - -```python -class KVCacheGenerator: - def __init__(self, model_config: ModelConfig, global_seed: Optional[int] = None): - # Pre-allocate a large buffer of random noise (e.g., 256MB) - self.buffer_size_elements = 128 * 1024 * 1024 - self.precomputed_buffer = rng.uniform(-1.0, 1.0, size=self.buffer_size_elements).astype(self.dtype) - - def generate(self, sequence_length: int, key: Optional[str] = None) -> np.ndarray: - # ... (shape calculation omitted) ... - - # Zero-Copy Slicing: Take a view of the pre-existing buffer - if total_elements <= self.buffer_size_elements: - flat_view = self.precomputed_buffer[start_idx : start_idx + total_elements] - return flat_view.reshape(kv_shape) -``` - -## 3. Concurrency Hardening - -Implementing the Waterfall strategy introduced complex race conditions, where multiple threads might try to evict the same item or claim the same free space simultaneously. -* **Atomic Reservations:** We implemented a "check-and-reserve" logic inside the memory locks. A thread now claims space *before* it starts writing, preventing over-subscription. -* **Loop Protection:** We added hard caps to the eviction loops. In a pathological case where the system is thrashing, the eviction logic will now abort rather than spinning infinitely, preventing the benchmark from hanging. 
- -```python -# Inside _ensure_space_in_tier -with self.memory_lock: - current_usage = self._get_tier_usage(tier) - # Check if we have space - if current_usage + required_bytes <= target_usage: - # ATOMIC RESERVATION: Claim the space immediately inside the lock. - # This prevents other threads from seeing this space as free. - self._update_tier_usage(tier, required_bytes) - return True -``` - -## 4. Enhanced Metrics: NVMe Token Throughput - -To align with MLPerf requirements, we added a specific counter for `nvme_tokens_processed`. -* **Why:** Previously, we tracked raw bytes. However, MLPerf metrics are often in "Tokens per Second." -* **How:** The system now tracks the exact number of tokens associated with every read, write, and demotion operation that touches the NVMe drive. This allows us to report a precise "Storage Throughput (tok/s)" metric that accounts for the massive read amplification inherent in LLM inference. ---- - -# CHANGES-01-09-2026: ShareGPT Integration, Unit Testing, and Excel Export - -**Date:** January 9, 2026 -**Subject:** Feature enhancements to support realistic workload replay, automated testing, and streamlined results analysis. - -This update consolidates the ShareGPT replay functionality into the main benchmark script, adds a comprehensive unit test suite, and introduces optional Excel export capabilities. These changes improve usability for both development validation and production benchmarking without introducing any regressions to the core simulation logic. - -## 1. ShareGPT Dataset Integration - -The original repository maintained two separate scripts: `kv-cache.py` for synthetic workloads and `kv-cache_sharegpt_replay.py` for real conversation replay. This created maintenance overhead and confused users about which script to use. We merged the ShareGPT functionality directly into `kv-cache.py`. 
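At its core, replay means parsing the public ShareGPT JSON schema (a list of records, each with a `conversations` array of `{"from": ..., "value": ...}` turns) and counting tokens per turn. The sketch below is a simplified stand-in, not the actual `ShareGPTDatasetLoader` implementation; it takes a pluggable tokenizer so the example stays dependency-free, whereas the real loader uses tiktoken for exact counts:

```python
import json
from typing import Callable, Optional

def load_sharegpt(path: str,
                  max_conversations: Optional[int] = None,
                  count_tokens: Callable[[str], int] = lambda s: len(s.split())):
    """Parse a ShareGPT-format JSON file into per-conversation turn lists.

    Returns a list of conversations; each conversation is a list of
    (role, token_count) tuples. `count_tokens` defaults to a whitespace
    approximation; pass a real tokenizer for exact counts.
    """
    with open(path) as f:
        records = json.load(f)
    conversations = []
    for record in records[:max_conversations]:
        turns = [(turn["from"], count_tokens(turn["value"]))
                 for turn in record.get("conversations", [])]
        if turns:  # skip empty conversations
            conversations.append(turns)
    return conversations
```

With tiktoken installed, passing `count_tokens=lambda s: len(encoding.encode(s))` (for a `tiktoken` encoding object) reproduces exact token counts.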
- -**What Changed:** -* **New Class: `ShareGPTDatasetLoader`** (~150 lines) parses ShareGPT JSON files and uses tiktoken to calculate exact token counts for each conversation turn. -* **New Arguments:** The main script now accepts `--dataset-path`, `--max-conversations`, `--request-rate`, and `--max-requests` for controlling replay behavior. -* **Backward Compatibility:** When no dataset path is provided, the benchmark falls back to its original synthetic workload generation. Existing invocations work unchanged. - -**Why This Matters:** -Real human conversations exhibit dramatically different patterns than synthetic workloads. In our validation testing, ShareGPT conversations averaged 133 tokens per context versus 2,676 tokens for synthetic generation—a 20x difference. This affects cache hit rates (85-97% vs 50-70%), throughput measurements, and the validity of capacity planning exercises. - -**Sample Invocation:** -```bash -python3 kv-cache.py \ - --model llama3.1-8b \ - --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \ - --max-conversations 500 \ - --num-users 50 \ - --duration 300 \ - --gpu-mem-gb 0 \ - --cpu-mem-gb 4 \ - --generation-mode realistic \ - --cache-dir /mnt/nvme \ - --seed 42 \ - --output results_sharegpt.json -``` - -## 2. Understanding the Three Throughput Metrics - -The benchmark reports three different "tokens per second" metrics. Each measures something fundamentally different. Understanding these distinctions is critical for interpreting results correctly. - -### Metric 1: Wall-Clock Throughput (`avg_throughput_tokens_per_sec`) - -**What it measures:** The rate at which **output tokens** are generated, as seen by end users. 
- -**Formula:** -``` -wall_clock_throughput = total_tokens_generated / elapsed_wall_time -``` - -**Code Location:** `_calculate_stats()` method -```python -summary = { - 'avg_throughput_tokens_per_sec': self.results['total_tokens_generated'] / duration, -} -``` - -**What `total_tokens_generated` contains:** The sum of `request.generate_tokens` for all completed requests. This is the number of **new output tokens** the LLM produced—NOT the KV cache data. - -```python -# In process_requests(), after each request completes: -with self.results_lock: - self.results['total_tokens_generated'] += request.generate_tokens # Output tokens only -``` - -**Use this metric to answer:** "How many tokens per second is my inference system delivering to users?" - ---- - -### Metric 2: Storage I/O Throughput (`storage_throughput_tokens_per_sec`) - -**What it measures:** An **efficiency ratio**—how many output tokens are produced per second of cumulative storage I/O time. - -**Formula:** -``` -storage_io_throughput = total_tokens_generated / total_storage_io_latency -``` - -**Code Location:** `_calculate_stats()` method -```python -'storage_throughput_tokens_per_sec': self.results['total_tokens_generated'] / self.results['total_storage_io_latency'] -``` - -**What `total_storage_io_latency` contains:** The **cumulative** time spent in storage operations across ALL threads. This can exceed wall-clock time because multiple threads perform I/O in parallel. - -```python -# In process_requests(), storage_latency accumulates ALL cache operations for this request: -storage_latency = 0.0 -_, read_lat = self.cache.access_cache(...) # Read from cache -storage_latency += read_lat -_, _, write_lat = self.cache.allocate_cache(...) # Write to cache -storage_latency += write_lat - -# Then recorded: -with self.results_lock: - self.results['total_storage_io_latency'] += storage_latency -``` - -**Important:** This metric uses the same numerator (output tokens) as wall-clock throughput. 
It does NOT measure storage bandwidth. - -**Use this metric to answer:** "How efficiently does each second of I/O work translate into user-facing output?" - -**Interpretation:** -- `storage_io_throughput < wall_clock_throughput` → Storage is a bottleneck (cumulative I/O time exceeds wall time) -- `storage_io_throughput > wall_clock_throughput` → Other factors (GPU simulation, queueing) dominate latency - ---- - -### Metric 3: Storage Assessment Throughput (`nvme_tokens_processed / duration`) - -**What it measures:** The actual **storage bandwidth**—how much KV cache data flows through the NVMe tier. - -**Formula:** -``` -nvme_throughput = nvme_tokens_processed / elapsed_wall_time -``` - -**Code Location:** `_evaluate_storage_performance()` method -```python -if self.performance_profile == 'throughput': - nvme_tokens = self.stats.get('nvme_tokens_processed', 0) - throughput = nvme_tokens / duration if duration > 0 else 0 -``` - -**What `nvme_tokens_processed` contains:** The number of tokens' worth of KV cache data that was **read from or written to NVMe**. This is incremented in three places: - -```python -# 1. When data is WRITTEN directly to NVMe (in allocate_cache): -if allocated_tier == 'nvme': - self.stats['nvme_tokens_processed'] += num_tokens - -# 2. When data is READ from NVMe (in access_cache): -if location == 'nvme': - num_tokens = entry_size / self.model_config.kv_cache_size_per_token - self.stats['nvme_tokens_processed'] += num_tokens - -# 3. When data is EVICTED/DEMOTED to NVMe (in _demote_entry): -if to_tier == 'nvme': - tokens = int(size / bytes_per_token) - self.stats['nvme_tokens_processed'] += tokens -``` - -**Use this metric to answer:** "How much KV cache data is my NVMe drive handling per second?" - -**Why this can be much higher than wall-clock throughput:** Due to **read amplification**. During decode, generating 1 output token requires reading the entire KV cache (potentially thousands of tokens) from storage. 
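The relationships among the three metrics reduce to a few lines of arithmetic. The numbers below are illustrative only, not taken from any particular run:

```python
# One simulated run, summarized by four counters (illustrative values).
total_tokens_generated = 50_000    # output tokens across all completed requests
wall_time_s = 120.0                # benchmark duration
total_storage_io_s = 600.0         # cumulative I/O time summed across threads
nvme_tokens_processed = 500_000    # KV-cache tokens read from + written to NVMe

wall_clock = total_tokens_generated / wall_time_s         # Metric 1: user-facing rate
storage_io = total_tokens_generated / total_storage_io_s  # Metric 2: efficiency ratio
nvme_bandwidth = nvme_tokens_processed / wall_time_s      # Metric 3: storage bandwidth

# Read amplification: KV-cache traffic per output token.
read_amplification = nvme_tokens_processed / total_tokens_generated  # 10.0

# Metric 2 falls below Metric 1 exactly when cumulative I/O time exceeds
# wall time, i.e. when storage is the bottleneck.
assert (storage_io < wall_clock) == (total_storage_io_s > wall_time_s)
```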
- ---- - -### Summary Comparison - -| Metric | Numerator | Denominator | What It Measures | -|--------|-----------|-------------|------------------| -| **Wall-clock** | Output tokens | Wall time | User-facing generation rate | -| **Storage I/O** | Output tokens | Cumulative I/O time | I/O efficiency ratio | -| **NVMe Assessment** | KV cache tokens (R+W) | Wall time | Storage bandwidth | - -### Real-World Example - -From a benchmark run with 70 users on Llama 3.1 70B: - -``` -Total Tokens Generated: 56,108 (output tokens) -Duration: 120 seconds -Total Storage I/O Latency: 603 seconds (cumulative across threads) -NVMe Tokens Processed: 597,991 (KV cache data tokens) -``` - -| Metric | Calculation | Result | -|--------|-------------|--------| -| **Wall-clock** | 56,108 / 120 | **467 tok/s** | -| **Storage I/O** | 56,108 / 603 | **93 tok/s** | -| **NVMe Assessment** | 597,991 / 120 | **4,983 tok/s** | - -**Interpretation:** -- The system delivers **467 output tokens/second** to users -- Storage is a bottleneck (93 < 467), meaning I/O time dominates -- The NVMe drive is handling **4,983 tokens/second** of KV cache I/O (10.6× read amplification) - ---- - -### The ShareGPT Bug Explained - -The Storage Assessment uses `nvme_tokens_processed`, which is **NVMe-specific**. In ShareGPT replay mode with small context sizes (~300 tokens average), all data fits in GPU+CPU memory. No data reaches NVMe, so: - -``` -nvme_tokens_processed = 0 -nvme_throughput = 0 / 120 = 0.00 tok/s → FAIL -``` - -Meanwhile, wall-clock and storage I/O throughput show healthy values because they use `total_tokens_generated` (output tokens), which is always incremented regardless of which cache tier is used. - - - -## 3. Unit Test Suite - -We added a comprehensive pytest-based test suite (`test_kv_cache.py`) that validates core functionality without running full benchmarks. This enables rapid development iteration and CI/CD integration. 
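To give a flavor of the suite, here is a self-contained check in the same spirit as the `TestConversationManager` LRU coverage. The `ConversationLRU` class below is a hypothetical stand-in, not the class exercised by the real `test_kv_cache.py`:

```python
from collections import OrderedDict

class ConversationLRU:
    """Hypothetical stand-in for the benchmark's conversation tracker."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._conversations = OrderedDict()

    def touch(self, conv_id: str):
        # Move to the most-recently-used position (inserting if new),
        # evicting the least-recently-used conversation when full.
        if conv_id in self._conversations:
            self._conversations.move_to_end(conv_id)
        else:
            if len(self._conversations) >= self.capacity:
                self._conversations.popitem(last=False)  # evict LRU
            self._conversations[conv_id] = True

    def active(self):
        return list(self._conversations)

def test_lru_eviction():
    mgr = ConversationLRU(capacity=2)
    for conv in ("a", "b", "a", "c"):
        mgr.touch(conv)
    # "b" was least recently used when "c" arrived, so it was evicted.
    assert mgr.active() == ["a", "c"]

test_lru_eviction()
```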
- -**Coverage:** -The test suite includes 12 test classes covering: -* `TestModelConfig`: Validates KV cache size calculations for all 5 model configurations -* `TestInferenceRequest`: Tests cache key generation and latency tracking -* `TestQoSProfiles`: Verifies priority levels and SLA targets -* `TestKVCacheGenerator`: Confirms deterministic generation and precomputed buffer optimization -* `TestCPUMemoryBackend`: Tests write/read/delete/clear operations -* `TestNVMeBackend`: Validates file I/O and metadata tracking -* `TestGPUMemoryBackend`: CUDA tensor operations (auto-skipped without GPU) -* `TestConversationManager`: Multi-turn tracking and LRU eviction -* `TestUserSimulator`: Mixed user generation -* `TestMultiTierCache`: CPU-only mode allocation and access -* `TestMultiTierCacheWithGPU`: Full three-tier hierarchy (auto-skipped without GPU) -* `TestXLSXExport`: CSV/Excel export validation - -**Running Tests:** -```bash -# Full test suite with verbose output -pytest test_kv_cache.py -v - -# Run specific test class -pytest test_kv_cache.py -k "TestModelConfig" -v - -# Skip GPU tests explicitly -pytest test_kv_cache.py -v -m "not skipif" -``` - -**Expected Runtime:** 3-5 seconds without GPU, 5-10 seconds with GPU. - -## 4. Excel Export Capability - -For users who analyze results in spreadsheets, we added optional Excel/CSV export via the `--xlsx-output` argument. 
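The Excel-or-CSV decision can be sketched as follows. This is an illustrative pattern, not the actual `export_results_to_xlsx()` implementation, and the `row` fields are hypothetical:

```python
import warnings

def export_results(row: dict, xlsx_path: str):
    """One-row spreadsheet export that degrades gracefully:
    .xlsx (pandas + openpyxl) -> .csv (pandas only) -> skipped with a warning.
    """
    try:
        import pandas as pd
    except ImportError:
        warnings.warn("pandas not installed; skipping spreadsheet export")
        return None
    df = pd.DataFrame([row])
    try:
        import openpyxl  # noqa: F401 -- required by DataFrame.to_excel
        df.to_excel(xlsx_path, index=False)
        return xlsx_path
    except ImportError:
        # openpyxl missing: fall back to CSV alongside the requested path.
        csv_path = xlsx_path.rsplit(".", 1)[0] + ".csv"
        df.to_csv(csv_path, index=False)
        return csv_path
```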
- -**Dependencies:** -* `pandas` (required for export) -* `openpyxl` (optional; enables `.xlsx` format; without it, falls back to `.csv`) - -**Graceful Fallback:** -* If pandas is not installed, the export is skipped with a warning -* If openpyxl is not installed, the benchmark writes CSV instead of XLSX -* The benchmark never fails due to missing optional dependencies - -**Output Columns:** -The export includes all key parameters and metrics in a single row: -| Model | Num Users | Duration | GPU Mem | CPU Mem | Total Requests | Total Tokens | Avg Throughput | Storage Throughput | Cache Hit Rate | E2E P95 | Storage IO P95 | - -**Sample Invocation:** -```bash -python3 kv-cache.py \ - --model llama3.1-8b \ - --num-users 50 \ - --duration 120 \ - --gpu-mem-gb 0 \ - --cpu-mem-gb 4 \ - --seed 42 \ - --output results.json \ - --xlsx-output results.xlsx -``` - -## 5. Regression Analysis: No Core Logic Changes - -A detailed diff analysis between the original `kv-cache.py` and the enhanced version confirms that the core simulation logic remains identical: - -| Component | Status | -|-----------|--------| -| `MultiTierCache` class | Unchanged | -| `allocate_cache()` eviction logic | Unchanged | -| `access_cache()` read logic | Unchanged | -| `KVCacheGenerator` precomputed buffer | Unchanged | -| `GPUMemoryBackend`, `CPUMemoryBackend`, `NVMeBackend` | Unchanged | -| `UserSimulator` | Unchanged | -| QoS handling | Unchanged | -| Autoscaling | Unchanged | -| RAG workload | Unchanged | -| Prefix caching | Unchanged | - -**What Was Added (Not Modified):** -1. `ShareGPTDatasetLoader` class (new code, doesn't affect simulation) -2. `storage_throughput_tokens_per_sec` metric (additional output, no behavior change) -3. `--max-requests` argument (optional early termination, backward compatible) -4. `--request-rate` argument (optional rate limiting, backward compatible) -5. `--xlsx-output` argument (optional export, backward compatible) -6. 
`export_results_to_xlsx()` function (new code, called after benchmark completes) - -Existing benchmark invocations produce identical results. The seed-based reproducibility guarantee is maintained. - -## 6. Updated Requirements - -The `requirements.txt` file now documents all dependencies: - -``` -# Core (required) -numpy>=1.20.0 - -# GPU support (optional) -torch>=2.0.0 - -# ShareGPT replay (optional) -tiktoken>=0.5.0 - -# Excel export (optional) -pandas>=2.0.0 -openpyxl>=3.1.0 - -# Unit testing (optional) -pytest>=7.0.0 -``` - -## 7. Recommended Invocations: ShareGPT Workloads - -### ShareGPT Storage Validation (8B Model) -```bash -python3 kv-cache.py \ - --model llama3.1-8b \ - --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \ - --max-conversations 1000 \ - --num-users 100 \ - --duration 300 \ - --gpu-mem-gb 0 \ - --cpu-mem-gb 4 \ - --generation-mode realistic \ - --cache-dir /mnt/nvme \ - --seed 42 \ - --output results_sharegpt_8b.json \ - --xlsx-output results_sharegpt_8b.xlsx -``` - -### ShareGPT High-Stress (70B Model) -```bash -python3 kv-cache.py \ - --model llama3.1-70b-instruct \ - --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \ - --max-conversations 500 \ - --request-rate 5.0 \ - --num-users 50 \ - --duration 600 \ - --gpu-mem-gb 0 \ - --cpu-mem-gb 8 \ - --generation-mode none \ - --cache-dir /mnt/nvme \ - --seed 42 \ - --output results_sharegpt_70b.json -``` - -### ShareGPT Fixed Request Count -```bash -python3 kv-cache.py \ - --model llama3.1-8b \ - --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \ - --max-requests 10000 \ - --num-users 75 \ - --gpu-mem-gb 0 \ - --cpu-mem-gb 4 \ - --generation-mode realistic \ - --cache-dir /mnt/nvme \ - --seed 42 \ - --output results_sharegpt_fixed.json -``` - ---- - -## Summary of January 9, 2026 Changes - -| Feature | Description | Impact | -|---------|-------------|--------| -| ShareGPT Integration | Merged `kv-cache_sharegpt_replay.py` into main script | Real workload validation | -| 
Storage Throughput Metric | Added `storage_throughput_tokens_per_sec` | Fair tier comparisons |
-| Unit Test Suite | 12 pytest test classes, ~80 tests | Development velocity |
-| Excel Export | `--xlsx-output` with CSV fallback | Easier analysis |
-| Elapsed Time Tracking | Added to summary output | Debugging support |
-
-No regressions were introduced. All existing invocations and seed-based reproducibility remain intact.
\ No newline at end of file
+# MLPerf KV Cache Benchmark v3.0
+## Technical Specification and Implementation Guide
+
+**Date:** January 27, 2026
+**Author:** Hazem Awadallah, Kingston Digital
+**Note:** AI tooling was used to draft code under architectural direction.
+
+---
+
+## Executive Summary
+
+### The Problem
+
+Large Language Models generate text one token at a time, maintaining context through a data structure called the **KV Cache** that stores attention state. This cache eliminates redundant computation but grows linearly with sequence length; a single 8K-token conversation with a 70B model consumes **2.5 GB of memory**.
+
+At scale, this quickly exhausts GPU VRAM, forcing systems to offload data to slower tiers: CPU RAM or NVMe storage. The challenge: **quantifying the performance trade-offs** of multi-tier storage architectures.
+
+### The Solution
+
+This benchmark simulates realistic LLM inference workloads to answer critical capacity planning questions:
+
+- **Tier Performance:** How much faster is GPU vs. CPU vs. NVMe?
+- **Capacity Planning:** How many concurrent users can my storage sustain at a given throughput? (See note below on tier promotion.)
+- **Hardware Validation:** Which NVMe drive delivers optimal throughput for LLM inference?
+- **Bottleneck Identification:** Where is the storage bottleneck in my system? (See note below on tier promotion.)
+
+> **Scope note; no tier promotion:** The benchmark uses a one-way waterfall: data flows from GPU → CPU → NVMe but is never promoted back to a faster tier on read.
This is intentional for isolating storage performance; it ensures NVMe is stressed on every read. However, production inference engines (vLLM, TensorRT-LLM) promote hot entries back to GPU, which reduces NVMe read traffic and increases GPU/CPU memory pressure. As a result, **Capacity Planning** results reflect storage throughput limits, not end-to-end serving capacity (which depends on promotion policy and working set size). **Bottleneck Identification** accurately identifies storage bottlenecks but may not surface GPU/CPU memory pressure caused by promotion traffic in production. See §3.4 for the waterfall design rationale. + +> **Terminology; "NVMe" as shorthand:** Throughout this document, "NVMe" refers to the benchmark's third storage tier (the `--cache-dir` filesystem path). The benchmark is not NVMe-specific; it writes `.npy` files via standard POSIX I/O and works with any block device or filesystem: SATA SSD, HDD, RAM disk, NFS, EBS, etc. "NVMe" is used as shorthand because NVMe SSDs are the primary target for production KV cache offloading. + +### Architecture Overview + +``` +┌─────────────────────────────────────────────────────────────┐ +│ Workload Generator → Multi-Tier Cache → Storage Tiers │ +│ (Requests/Users) (Waterfall LRU) (GPU/CPU/NVMe)│ +│ │ +│ ↓ ↓ ↓ │ +│ Telemetry Priority Queue Device I/O │ +│ (4 Latency Layers) (QoS Classes) (Hardware) │ +└─────────────────────────────────────────────────────────────┘ +``` + +**Key Features:** +- **Waterfall LRU:** Hot data stays in fast tiers; cold data cascades to storage +- **Hardware Validation:** Bypasses OS caching (`posix_fadvise`) for true device measurement +- **Autoscaling:** Automatically discovers maximum sustainable load +- **Production Realism:** Simulates GPU compute, RAG workloads, prefix caching, multi-turn conversations + +--- + +## 1. Quick Start: Four Essential Tests + +All examples use `llama3.1-8b` and assume `/mnt/nvme` as the cache directory. Use `--seed 42` for reproducibility. 
+ +### Test 1: Storage Baseline (Device Isolation) + +**Purpose:** Measure raw NVMe performance by forcing 100% storage utilization. + +```bash +python3 kv-cache.py \ + --config config.yaml \ + --model llama3.1-8b \ + --num-users 200 \ + --duration 300 \ + --gpu-mem-gb 0 \ + --cpu-mem-gb 0 \ + --max-concurrent-allocs 16 \ + --generation-mode none \ + --cache-dir /mnt/nvme \ + --seed 42 \ + --output results_storage_baseline.json +``` + +**Key Metrics:** +- `decode_bytes_read_gb` – I/O volume (2.6× differentiation fast/slow drives) +- `avg_throughput_tokens_per_sec` – Wall-clock throughput (2.4× differentiation) +- `nvme_read_device_p95_ms` – Hardware read latency (P95) +- `nvme_write_device_p95_ms` – Hardware write latency (P95) + +--- + +### Test 2: Production Simulation (Three-Tier) + +**Purpose:** Model realistic workload with GPU/CPU/NVMe hierarchy and simulated inference compute. + +```bash +python3 kv-cache.py \ + --config config.yaml \ + --model llama3.1-8b \ + --num-users 100 \ + --duration 300 \ + --gpu-mem-gb 16 \ + --cpu-mem-gb 32 \ + --generation-mode realistic \ + --cache-dir /mnt/nvme \ + --seed 42 \ + --output results_production.json +``` + +**Key Metrics:** +- `end_to_end_latency_p95_ms` – User-facing latency +- `cache_hit_rate` – % served from fast tiers +- Tier distribution – `gpu_entries`, `cpu_entries`, `nvme_entries` + +--- + +### Test 3: Capacity Planning (QoS Autoscaler) + +**Purpose:** Discover maximum users while maintaining latency SLAs. + +```bash +python3 kv-cache.py \ + --config config.yaml \ + --model llama3.1-8b \ + --num-users 20 \ + --duration 300 \ + --gpu-mem-gb 16 \ + --cpu-mem-gb 32 \ + --enable-autoscaling \ + --autoscaler-mode qos \ + --generation-mode realistic \ + --cache-dir /mnt/nvme \ + --seed 42 \ + --output results_qos.json +``` + +**Key Metrics:** +- `autoscaling_stats[last].users` – Final stabilized count +- `qos_stats` – Per-class latency vs. 
SLA + +--- + +### Test 4: Peak Throughput (Capacity Autoscaler) + +**Purpose:** Find absolute maximum I/O throughput (ignores latency). + +```bash +python3 kv-cache.py \ + --config config.yaml \ + --model llama3.1-70b-instruct \ + --num-users 10 \ + --duration 180 \ + --gpu-mem-gb 0 \ + --cpu-mem-gb 32 \ + --enable-autoscaling \ + --autoscaler-mode capacity \ + --generation-mode none \ + --cache-dir /mnt/nvme \ + --seed 42 \ + --output results_capacity.json +``` + +**Key Metrics:** +- `peak_throughput` – Max tokens/sec +- `reason: "Peak capacity found"` in `autoscaling_stats` + +--- + +## 2. Hardware Requirements + +### Minimum (Basic Validation) +- **CPU:** 8-core server-grade (AMD EPYC/Intel Xeon Bronze) +- **RAM:** 32 GB ECC +- **GPU:** Optional (can run `--gpu-mem-gb 0`) +- **Storage:** 256 GB+ data center SATA/SAS SSD +- **OS:** Linux (Ubuntu 22.04+, RHEL 9+) + +### Recommended (Full Test Suite) +- **CPU:** 32-core server-grade (EPYC 9354/Xeon Gold 4510+) +- **RAM:** 128 GB+ ECC +- **GPU:** NVIDIA Data Center (A100/H100) with 40GB+ HBM +- **Storage:** 1 TB+ PCIe Gen4/Gen5 NVMe +- **OS:** Linux (Ubuntu 22.04+, RHEL 9+) + +### 2.1 Scaling the Benchmark to Different Hardware + +The benchmark is **storage-agnostic**; `--cache-dir` can point to any mounted filesystem. 
The key scaling parameters are: + +| Parameter | What It Controls | Scaling Impact | +|-----------|------------------|----------------| +| `--cache-dir` | Storage target path | Point to any mounted device (NVMe, SATA SSD, SAN, NFS, RAM disk) | +| `--num-users` | Concurrent simulated users | More users = higher I/O parallelism | +| `--max-concurrent-allocs` | Parallel write operations | Limits concurrent I/O to prevent OOM | +| `--precondition-threads` | Preconditioning parallelism | 0 = auto-detect from `os.cpu_count()` | +| `--gpu-mem-gb` / `--cpu-mem-gb` | Tier capacities | 0 disables tier, data goes directly to next tier | + +#### Example 1: Enterprise SATA SSD (Dell PowerEdge with RAID) + +```bash +# Mount the RAID array +sudo mount /dev/sda1 /mnt/sata_raid + +# Run benchmark on SATA RAID (expect ~500-800 MB/s) +python -m kv_cache.cli \ + --model llama3.1-8b \ + --cache-dir /mnt/sata_raid/kv_benchmark \ + --gpu-mem-gb 0 --cpu-mem-gb 0 \ + --num-users 50 \ + --max-concurrent-allocs 8 \ + --duration 300 \ + --performance-profile throughput +``` + +#### Example 2: Network-Attached Storage (NFS/SMB) + +```bash +# Mount NFS share from storage array +sudo mount -t nfs storage.local:/exports/benchmark /mnt/nfs + +# Run benchmark on NFS (expect ~200-1000 MB/s depending on network) +python -m kv_cache.cli \ + --model llama3.1-8b \ + --cache-dir /mnt/nfs/kv_benchmark \ + --gpu-mem-gb 0 --cpu-mem-gb 4 \ + --num-users 25 \ + --max-concurrent-allocs 4 \ + --duration 300 +``` + +#### Example 3: SAN Storage (Fibre Channel / iSCSI) + +```bash +# Mount iSCSI LUN +sudo iscsiadm -m node --login +sudo mount /dev/sdb1 /mnt/iscsi_lun + +# Run benchmark on SAN (expect ~1-4 GB/s for enterprise arrays) +python -m kv_cache.cli \ + --model llama3.1-70b-instruct \ + --cache-dir /mnt/iscsi_lun/kv_benchmark \ + --gpu-mem-gb 0 --cpu-mem-gb 32 \ + --num-users 100 \ + --max-concurrent-allocs 16 \ + --duration 600 +``` + +#### Example 4: RAM Disk (Maximum Speed Baseline) + +```bash +# Create 
RAM disk (requires sufficient RAM) +sudo mkdir -p /mnt/ramdisk +sudo mount -t tmpfs -o size=64G tmpfs /mnt/ramdisk + +# Run benchmark on RAM disk (expect ~10-20 GB/s) +python -m kv_cache.cli \ + --model llama3.1-8b \ + --cache-dir /mnt/ramdisk/kv_benchmark \ + --gpu-mem-gb 0 --cpu-mem-gb 0 \ + --num-users 200 \ + --duration 60 +``` + +#### Example 5: Cloud Block Storage (AWS EBS, Azure Disk, GCP PD) + +```bash +# AWS EBS io2 volume (mounted at /dev/nvme1n1) +sudo mkfs.xfs /dev/nvme1n1 +sudo mount /dev/nvme1n1 /mnt/ebs + +# Run benchmark (expect varies: gp3 ~1GB/s, io2 ~4GB/s) +python -m kv_cache.cli \ + --model llama3.1-8b \ + --cache-dir /mnt/ebs/kv_benchmark \ + --gpu-mem-gb 0 --cpu-mem-gb 8 \ + --num-users 100 \ + --storage-capacity-gb 500 \ + --duration 300 +``` + +#### Scaling Guidelines + +| Storage Type | Expected Bandwidth | Recommended `--num-users` | `--max-concurrent-allocs` | +|--------------|-------------------|---------------------------|---------------------------| +| HDD RAID | 100-300 MB/s | 10-25 | 0 (unlimited) | +| SATA SSD | 400-550 MB/s | 25-50 | 0 (unlimited) | +| SAS SSD | 800-1200 MB/s | 50-100 | 0 (unlimited) | +| NFS (10GbE) | 500-1200 MB/s | 25-50 | 0 (unlimited) | +| SAN (FC/iSCSI) | 1-4 GB/s | 50-150 | 0 (unlimited) | +| PCIe Gen3 NVMe | 2-3.5 GB/s | 100-200 | 0 (unlimited) | +| PCIe Gen4 NVMe | 5-7 GB/s | 150-300 | 0 (unlimited) | +| PCIe Gen5 NVMe | 10-14 GB/s | 200-500 | 0 (unlimited) | +| RAM Disk | 10-25 GB/s | 200-500 | 0 (unlimited) | + +**Note on `--max-concurrent-allocs`:** +- **MLPerf submissions:** Always use `0` (unlimited) to measure true hardware capability +- **Production simulation:** Set non-zero to simulate memory-constrained environments +- **OOM prevention:** Use `4-16` if benchmark exhausts system RAM during parallel writes + +The `--max-concurrent-allocs` flag is a **limiter**, not a performance target. Higher values don't improve throughput; they cap it. 
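
To make the limiter semantics concrete, here is a toy sketch (hypothetical names; `time.sleep` stands in for real I/O) showing that a bounded semaphore caps parallelism without making any single write faster:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_ALLOCS = 4                     # 0 would mean "no semaphore" (unlimited)
alloc_sem = threading.BoundedSemaphore(MAX_CONCURRENT_ALLOCS)

def write_entry(entry_id: int) -> float:
    """Simulated cache write: the semaphore caps in-flight allocations."""
    with alloc_sem:                           # blocks once 4 writes are in flight
        t0 = time.perf_counter()
        time.sleep(0.01)                      # stand-in for np.save() + fsync()
        return time.perf_counter() - t0

with ThreadPoolExecutor(max_workers=16) as pool:
    io_times = list(pool.map(write_entry, range(32)))

# Per-operation I/O time is unchanged (~10 ms); only parallelism is capped,
# so total wall time grows: 32 writes / 4 slots ≈ 8 serialized waves.
print(f"mean I/O: {sum(io_times) / len(io_times) * 1e3:.1f} ms")
```

This is why the symptom table below distinguishes queue-wait overhead (per-request latency far above actual I/O time) from genuine storage slowness.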
+ +| Symptom | Cause | Action | +|---------|-------|--------| +| Per-request latency >> actual I/O time | Semaphore wait overhead | Keep `--max-concurrent-allocs 0` (unlimited) | +| OOM during benchmark | Too many parallel writes in flight | Set `--max-concurrent-allocs 8-16` | + +#### Multi-Client Scaling (Bypassing Python GIL) + +For maximum I/O parallelism, run **multiple benchmark processes** with separate cache directories. This bypasses Python's Global Interpreter Lock (GIL) and better simulates production deployments (multiple vLLM/TensorRT-LLM instances on the same node). + +**Why multi-client?** + +| Approach | GIL Contention | Realistic? | Use Case | +|----------|----------------|------------|----------| +| Single-client, `--num-users 400` | Yes | Less | Quick validation | +| 4 clients × `--num-users 100` | No | More | MLPerf submission, stress test | + +**⚠️ RAM Requirements for Multi-Client** + +Each client process holds KV cache tensors in RAM during I/O operations. With `--max-concurrent-allocs 0` (unlimited), worst-case RAM per client: + +``` +RAM per client ≈ num_users × avg_context_tokens × bytes_per_token +``` + +| Model | Bytes/Token | 100 users × 4K context | 100 users × 8K context | +|-------|-------------|------------------------|------------------------| +| llama3.1-8b | 312 KB | ~122 GB | ~244 GB | +| llama3.1-70b | 1.28 MB | ~500 GB | ~1 TB | + +**To prevent OOM with multi-client setups:** + +| System RAM | Max Clients | Users per Client | `--max-concurrent-allocs` | +|------------|-------------|------------------|---------------------------| +| 64 GB | 2 | 25 | 8 | +| 128 GB | 4 | 25 | 8 | +| 256 GB | 4 | 50 | 16 | +| 512 GB | 8 | 50 | 16 | +| 1 TB+ | 8 | 100 | 0 (unlimited) | + +**Example: 4-client parallel benchmark (memory-aware)** + +```bash +#!/bin/bash +# run_multi_client.sh - Scale to 4 processes with RAM limits + +NUM_CLIENTS=4 +CACHE_BASE="/mnt/nvme/kv_benchmark" +MODEL="llama3.1-8b" +DURATION=300 +USERS_PER_CLIENT=50 # Reduced 
from 100 for RAM safety +MAX_CONCURRENT=16 # Limit in-flight tensors per client + +for i in $(seq 0 $((NUM_CLIENTS-1))); do + python -m kv_cache.cli \ + --cache-dir ${CACHE_BASE}/client_${i} \ + --model ${MODEL} \ + --num-users ${USERS_PER_CLIENT} \ + --max-concurrent-allocs ${MAX_CONCURRENT} \ + --gpu-mem-gb 0 --cpu-mem-gb 0 \ + --duration ${DURATION} \ + --output results_client_${i}.json & + echo "Started client $i (PID: $!)" +done + +echo "Waiting for all clients to complete..." +wait +echo "All clients finished. Aggregate results from results_client_*.json" +``` + +**Result aggregation:** + +```python +import json +import glob + +results = [json.load(open(f)) for f in glob.glob("results_client_*.json")] + +total_write_gb = sum(r['storage_stats']['total_write_bytes'] / 1e9 for r in results) +total_read_gb = sum(r['storage_stats']['total_read_bytes'] / 1e9 for r in results) +total_duration = max(r['duration_seconds'] for r in results) + +print(f"Aggregate Write Bandwidth: {total_write_gb / total_duration:.2f} GB/s") +print(f"Aggregate Read Bandwidth: {total_read_gb / total_duration:.2f} GB/s") +``` + +**Scaling recommendations (RAM-aware):** + +| System RAM | NVMe Type | Recommended Multi-Client Setup | +|------------|-----------|-------------------------------| +| 128 GB | PCIe Gen3 | 2 clients × 50 users × `--max-concurrent-allocs 8` | +| 256 GB | PCIe Gen4 | 4 clients × 50 users × `--max-concurrent-allocs 16` | +| 512 GB | PCIe Gen5 | 4 clients × 100 users × `--max-concurrent-allocs 32` | +| 1 TB+ | PCIe Gen5 | 8 clients × 100 users × `--max-concurrent-allocs 0` | + +**Important:** +- Each client uses a **separate subdirectory** (`client_0/`, `client_1/`, etc.) to avoid file conflicts +- Monitor system RAM with `htop` or `free -h` during runs +- If OOM occurs, reduce `--num-users` or set `--max-concurrent-allocs` lower + +--- + +## 3. 
Architecture Deep Dive + +### 3.1 Request Structure + +Each inference request simulates a user interaction: + +| Field | Description | +|-------|-------------| +| `context_tokens` | Prompt size (determines KV cache write size) | +| `generate_tokens` | Number of tokens to produce (determines read operations) | +| `phase` | `PREFILL` (write-only, ≥10K tokens), `DECODE` (read-only), `PREFILL_DECODE` (typical: 1 write + N reads) | +| `cache_key` | Unique identifier: `{conversation_id}_turn_{n}` or `{user_id}_ctx` | + +**Phase Logic:** +```python +phase = PREFILL if context_tokens >= 10000 else PREFILL_DECODE +``` + +Most requests use `PREFILL_DECODE`: one prefill write followed by batched decode reads. + +--- + +### 3.2 Telemetry: Four-Layer Latency Hierarchy + +Each inference request produces latency measurements at four nested levels. Understanding what each measures is critical for diagnosing bottlenecks. + +#### Visual Overview + +``` +User submits request + │ + ▼ +┌─────────────────────────────────────────────────────────────────────────┐ +│ L1: END-TO-END LATENCY │ +│ Time from request submission to response completion │ +│ = Queue Wait + Storage I/O + Token Generation │ +│ │ +│ ┌────────────────────────────────────────────────────────────────────┐ │ +│ │ L2: PER-REQUEST STORAGE LATENCY │ │ +│ │ Total I/O time for ONE request (may include multiple ops) │ │ +│ │ = 1× Prefill Write + N× Decode Reads │ │ +│ │ │ │ +│ │ ┌──────────────────────────────────────────────────────────────┐ │ │ +│ │ │ L3: PER-TIER TOTAL LATENCY │ │ │ +│ │ │ Time for ONE file I/O operation on ONE storage tier │ │ │ +│ │ │ = Host (CPU) + Device (Disk) │ │ │ +│ │ │ │ │ │ +│ │ │ ┌────────────────────────────────────────────────────────┐ │ │ │ +│ │ │ │ L4: HOST vs DEVICE BREAKDOWN │ │ │ │ +│ │ │ │ Write: Host = np.save() | Device = fsync() │ │ │ │ +│ │ │ │ Read: Host = fadvise+copy | Device = np.load() │ │ │ │ +│ │ │ │ (NOT pure NVMe controller latency - includes OS) │ │ │ │ +│ │ │ 
└────────────────────────────────────────────────────────┘ │ │ │ +│ │ └──────────────────────────────────────────────────────────────┘ │ │ +│ └────────────────────────────────────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────────────────────────┘ +``` + +#### Concrete Example: Llama 3.1 70B Request + +A user sends a 4,096-token prompt and requests 128 generated tokens: + +``` +Request: "Explain quantum computing..." (4,096 context tokens, 128 gen tokens) +Model: Llama 3.1 70B (312 KB per token) +File size: 4,096 × 312 KB = 1.28 GB + +Timeline: +├─ Queue Wait: 500ms (waiting for semaphore slot) +├─ PREFILL: Write 1.28 GB file to NVMe +│ ├─ Host (np.save serialization): 800ms +│ └─ Device (fsync to disk): 200ms +│ └─ Total: 1,000ms +├─ DECODE: Read file 4× (⌈128/32⌉ batched reads) +│ ├─ Read 1: Host 600ms + Device 150ms = 750ms +│ ├─ Read 2: Host 600ms + Device 150ms = 750ms +│ ├─ Read 3: Host 600ms + Device 150ms = 750ms +│ └─ Read 4: Host 600ms + Device 150ms = 750ms +│ └─ Total: 3,000ms +└─ Generation: 128 × 30ms = 3,840ms (simulated GPU time) + +L1 End-to-End: 500 + 1,000 + 3,000 + 3,840 = 8,340ms +L2 Storage I/O: 1,000 + 3,000 = 4,000ms +L3 Write Total: 1,000ms +L3 Read Total: 750ms (per read) +L4 Write Host: 800ms | L4 Write Device: 200ms +L4 Read Host: 600ms | L4 Read Device: 150ms +``` + +#### What Each File Represents + +| Concept | On Disk | Contents | +|---------|---------|----------| +| 1 Request | 1 `.npy` file | KV cache tensor: `(layers, 2, seq_len, kv_heads, head_dim)` | +| File size | `seq_len × bytes_per_token` | e.g., 4,096 tokens × 312 KB = 1.28 GB | +| Location | `--cache-dir/uuid.npy` | e.g., `/mnt/nvme/a1b2c3d4.npy` | + +#### L4 Breakdown: What Host vs Device Actually Measures + +**⚠️ Important:** "Device" latency is NOT pure NVMe controller latency. It includes OS/filesystem overhead. 
+
+| Component | Write Operation | Read Operation |
+|-----------|-----------------|----------------|
+| **Host** | `np.save()`: Serialize numpy array + write to page cache | `posix_fadvise()` prep + `np.array()` copy |
+| **Device** | `f.flush()` + `os.fsync()`: Flush page cache → NVMe | `np.load()`: File read + deserialize (includes disk I/O) |
+
+**What's actually measured (backends.py):**
+
+```python
+# WRITE timing (lines 270-285)
+start = time.perf_counter()
+np.save(f, data)                       # ← host_time: serialize + buffered write
+post_save = time.perf_counter()
+f.flush()                              # ← device_time starts
+os.fsync(f.fileno())                   # Block until NVMe ACKs
+post_fsync = time.perf_counter()
+host_time = post_save - start          # np.save() = serialize + buffered write
+device_time = post_fsync - post_save   # flush + fsync = page cache → NVMe
+
+# READ timing (lines 287-315)
+start = time.perf_counter()
+os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)  # Drop page cache (prep)
+pre_load = time.perf_counter()
+data = np.load(path)                   # ← device_time (disk read + deserialize)
+load_done = time.perf_counter()
+data = np.array(data)                  # ← host_time (copy)
+copy_done = time.perf_counter()
+device_time = load_done - pre_load     # np.load() = file I/O + numpy deserialize
+host_time = (pre_load - start) + (copy_done - load_done)
+```
+
+**Why "Device" includes more than NVMe:**
+- Write: `fsync()` waits for page cache flush + NVMe write completion
+- Read: `np.load()` includes syscall overhead + numpy header parsing + deserialization
+
+**To isolate pure NVMe latency:** Use `iostat -x` alongside the benchmark; it reports `r_await`/`w_await`, which measure actual device queue time.
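
The `posix_fadvise` cold-read trick can be reproduced standalone; a minimal sketch on a temp file (Linux-only because of `os.posix_fadvise`; on tmpfs the cache drop is a no-op, so no warm-vs-cold timing relationship is asserted):

```python
import os
import tempfile
import time

import numpy as np

arr = np.random.rand(512, 1024)              # ~4 MB stand-in for a KV tensor
fd, path = tempfile.mkstemp(suffix=".npy")
os.close(fd)

with open(path, "wb") as f:                  # write path mirrors the benchmark:
    np.save(f, arr)                          # host: serialize + buffered write
    f.flush()
    os.fsync(f.fileno())                     # device: page cache -> media

def timed_load(drop_cache: bool) -> float:
    if drop_cache:
        rfd = os.open(path, os.O_RDONLY)
        os.posix_fadvise(rfd, 0, 0, os.POSIX_FADV_DONTNEED)  # evict clean pages
        os.close(rfd)
    t0 = time.perf_counter()
    np.load(path)
    return time.perf_counter() - t0

warm = timed_load(drop_cache=False)          # likely served from page cache
cold = timed_load(drop_cache=True)           # forced back to the device
print(f"warm: {warm * 1e3:.2f} ms, cold: {cold * 1e3:.2f} ms")
os.unlink(path)
```

The fsync before the fadvise matters: `POSIX_FADV_DONTNEED` only evicts clean pages, so without it the "cold" read could still be served from dirty page cache.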
+ +#### Diagnostic Guide + +| Symptom | Meaning | Cause | Solution | +|---------|---------|-------|----------| +| Write host >> write device | `np.save()` dominates over `fsync()` | CPU serialization bottleneck | Faster CPU, smaller tensors | +| Write device >> write host | `fsync()` dominates over `np.save()` | Storage write bottleneck | Faster NVMe, check write amplification | +| Read device high | `np.load()` slow (includes disk + deserialize) | Storage read or CPU bottleneck | Check `iostat r_await` to isolate | +| Per-request latency >> sum of tier latencies | Time between operations exceeds I/O time | Semaphore contention | Use `--max-concurrent-allocs 0` | + +**Key Insight:** The L4 breakdown helps identify bottlenecks, but for pure NVMe performance, correlate with `iostat` metrics which measure actual device latency. + +--- + +### 3.3 Decode Batch Size + +Decode reads are batched to model realistic KV cache access: + +```python +decode_batch_size = cfg('decode', 'batch_size', default=32) # config.yaml: decode.batch_size +num_reads = max(1, (generate_tokens + decode_batch_size - 1) // decode_batch_size) +``` + +| `generate_tokens` | Batched Reads | +|-------------------|---------------| +| 1-32 | 1 | +| 33-64 | 2 | +| 100 | 4 | +| 500 | 16 | + +**Rationale:** Approximates continuous batching/speculative decoding in production LLM systems. 
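
The ceiling-division above can be packaged as a checkable helper (hypothetical name `num_decode_reads`; it reproduces the table values):

```python
import math

def num_decode_reads(generate_tokens: int, decode_batch_size: int = 32) -> int:
    """One batched KV cache read per decode_batch_size generated tokens (min 1)."""
    return max(1, math.ceil(generate_tokens / decode_batch_size))

# Reproduces the table above
for gen in (1, 32, 33, 64, 100, 500):
    print(f"{gen:4d} tokens -> {num_decode_reads(gen)} reads")
```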
+ +--- + +### 3.4 Three-Tier Waterfall Architecture + +The `MultiTierCache` implements a **Waterfall LRU** strategy where hot data stays in fast tiers: + +``` + ┌─────────────────┐ + │ GPU VRAM │ ← Tier 1 (Fastest): New writes target here first + │ (Hot Data) │ + └────────┬────────┘ + │ LRU eviction when full + ↓ + ┌─────────────────┐ + │ CPU RAM │ ← Tier 2 (Fast): Evicted GPU data lands here + │ (Warm Data) │ + └────────┬────────┘ + │ LRU eviction when full + ↓ + ┌─────────────────┐ + │ NVMe SSD │ ← Tier 3 (Slow): Capacity-bounded + │ (Cold Data) │ LRU entries deleted when full + └─────────────────┘ +``` + +**Waterfall Logic:** + +1. **New allocations target GPU** – Fastest tier receives all fresh data +2. **GPU full → LRU cascades to CPU** – Least recently used entry "waterfalls" down +3. **CPU full → LRU cascades to NVMe** – Continue cascade to cold storage +4. **NVMe full → LRU deleted** – Oldest entries permanently removed + +**Why no promotion (NVMe → GPU)?** + +This is intentional for a **storage benchmark**: +- Promotion would *reduce* NVMe I/O by moving hot data back to fast tiers, undermining storage stress testing +- Streaming workloads are write-once, read-few: each request has unique cache key +- Data accessed during decode phase, then rarely touched again + +**Impact on capacity planning:** Production systems (vLLM, TensorRT-LLM) promote hot entries back to GPU, creating a mixed workload the benchmark does not model. Without promotion, the benchmark (1) overstates NVMe read bandwidth requirements (hot entries would be served from GPU/CPU after promotion), (2) understates GPU/CPU memory pressure (promoted entries compete with new allocations), and (3) cannot predict the steady-state tier distribution that determines end-to-end serving latency. Benchmark results should be interpreted as **storage throughput limits**, not end-to-end capacity under production promotion policies. 
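
The four-step waterfall above can be sketched as a toy in-memory model (illustrative only: the real `MultiTierCache` tracks bytes rather than entry counts, and manages locks and on-disk `.npy` files):

```python
from collections import OrderedDict

class WaterfallLRU:
    """Toy three-tier waterfall: new entries always target the fastest tier,
    LRU entries cascade downward, and the bottom tier deletes its LRU entry.
    Capacities are entry counts here (bytes in the real benchmark)."""

    def __init__(self, capacities):          # e.g. {"gpu": 2, "cpu": 2, "nvme": 3}
        self.tiers = OrderedDict((name, OrderedDict()) for name in capacities)
        self.capacities = dict(capacities)

    def put(self, key, value):
        self._ensure_space("gpu")
        self.tiers["gpu"][key] = value       # fresh data always lands in GPU

    def _ensure_space(self, tier):
        names = list(self.tiers)
        while len(self.tiers[tier]) >= self.capacities[tier]:
            lru_key, lru_val = self.tiers[tier].popitem(last=False)  # evict LRU
            idx = names.index(tier)
            if idx + 1 < len(names):         # demote one tier down (waterfall)
                nxt = names[idx + 1]
                self._ensure_space(nxt)
                self.tiers[nxt][lru_key] = lru_val
            # else: bottom tier -> entry permanently deleted

    def locate(self, key):
        for name, tier in self.tiers.items():
            if key in tier:
                return name
        return None

cache = WaterfallLRU({"gpu": 2, "cpu": 2, "nvme": 3})
for i in range(6):
    cache.put(f"req_{i}", object())
print({i: cache.locate(f"req_{i}") for i in range(6)})
```

With capacities of 2/2/3 entries, six sequential writes leave the two newest entries on GPU, the next two on CPU, and the two oldest on NVMe; further writes eventually delete NVMe's LRU entries, matching step 4 of the waterfall.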
+ +**Temperature-Based Placement:** + +| Data Temperature | Tier | Access Pattern | +|------------------|------|----------------| +| **Hot** (recent) | GPU | Active requests, stays hot until evicted | +| **Warm** (evicted) | CPU | Recently evicted, accessed from CPU | +| **Cold** (LRU) | NVMe | Historical, accessed from NVMe | + +Data flows **downward only** (waterfall). Once evicted to NVMe, it stays there until deleted. + +--- + +### 3.5 Eviction Mechanism: Recursive Waterfall + +The eviction system uses **recursive space reservation** to ensure that demoting data from a full tier succeeds by preparing space in lower tiers first. When the bottom tier (NVMe) is full, entries are **permanently deleted**. + +#### Algorithm Overview + +```python +def _ensure_space_in_tier(tier, required_bytes, recursion_depth=0): + """ + Recursively ensures space in a tier by cascading evictions downward. + When NVMe (bottom tier) is full, LRU entries are DELETED. + """ + # 1. Check if space is already available + if current_usage + required_bytes <= target_usage: + # ATOMICALLY RESERVE SPACE inside lock + update_tier_usage(tier, required_bytes) + return True + + # 2. Identify LRU (Least Recently Used) entry in this tier + lru_entries = get_lru_entries_in_tier(tier) + if not lru_entries: + return False # Tier is empty, can't evict + + lru_key, lru_entry = lru_entries[0] + lru_size = lru_entry['size'] + + # 3. Check if this is the BOTTOM tier (NVMe) + if tier == 'nvme' or next_tier is None: + # NO LOWER TIER - DELETE the LRU entry permanently + _delete_entry(lru_key) # unlink .npy file from disk + # Loop until enough space is freed + return check_space_and_repeat() + + # 4. RECURSIVELY ensure next tier has space for the LRU entry + # This is the "waterfall" effect + if not _ensure_space_in_tier(next_tier, lru_size, recursion_depth + 1): + return False # Can't cascade further + + # 5. 
Demote the LRU entry to next tier + success = _demote_entry(lru_key, from_tier=tier, to_tier=next_tier) + + # 6. Loop until enough space is freed + return check_space_and_repeat() +``` + +#### Step-by-Step Example + +**Scenario:** New 10 MB entry needs to be written to GPU, but GPU is full. + +``` +Step 1: _ensure_space_in_tier('gpu', 10MB, depth=0) + ├─ GPU usage: 15.5/16 GB (97% full) + ├─ LRU entry in GPU: "conv_42_turn_3" (8 MB) + └─ Need to evict to make room + +Step 2: Recursively ensure CPU has space for 8 MB + _ensure_space_in_tier('cpu', 8MB, depth=1) + ├─ CPU usage: 30/32 GB (94% full) + ├─ LRU entry in CPU: "user_19_ctx" (6 MB) + └─ Need to evict to make room + +Step 3: Recursively ensure NVMe has space for 6 MB + _ensure_space_in_tier('nvme', 6MB, depth=2) + ├─ NVMe usage: 50/100 GB (within capacity) + └─ RESERVE 6 MB in NVMe ✓ + +Step 4: Cascade back up - demote CPU → NVMe + _demote_entry("user_19_ctx", from='cpu', to='nvme') + ├─ Read from CPU (fast) + ├─ Write to NVMe (slow but necessary) + ├─ Delete from CPU + └─ CPU now has 8 MB free ✓ + +Step 5: Cascade back up - demote GPU → CPU + _demote_entry("conv_42_turn_3", from='gpu', to='cpu') + ├─ Read from GPU (fastest) + ├─ Write to CPU (fast) + ├─ Delete from GPU + └─ GPU now has 10 MB free ✓ + +Step 6: Write new entry to GPU + allocate_cache(key, 10MB) + └─ Write to GPU ✓ +``` + +#### Eviction Configuration (config.yaml) + +```yaml +eviction: + max_recursion_depth: 10 # Max cascade depth + target_usage_ratio: 0.8 # Keep tier at 80% (20% buffer) + large_entry_limit_ratio: 0.95 # Skip to next tier if entry >95% of tier + max_evictions_hard_cap: 5000 # Safety limit per cycle + max_evictions_min: 1000 # Min evictions before giving up +``` + +**Key Parameters:** +- `target_usage_ratio: 0.8` – Eviction starts when tier reaches 80% capacity, maintaining 20% free space buffer +- `large_entry_limit_ratio: 0.95` – Entries larger than 95% of tier capacity skip directly to next tier (prevents thrashing) +- 
`max_recursion_depth: 10` – Prevents infinite recursion in pathological cases + +#### Concurrency & Thread Safety + +**Race Condition Protection:** +1. **Atomic Reservations:** Space is reserved inside the memory lock *before* writing, preventing over-subscription +2. **Per-Entry Locks:** Each cache key has its own lock to prevent concurrent demotions of the same entry +3. **Metadata Lock:** Global lock protects `cache_entries` dictionary from concurrent modifications + +**Example Race Condition (Prevented):** +``` +Thread A: Needs 5 MB in GPU +Thread B: Needs 5 MB in GPU +GPU has 8 MB free + +WITHOUT atomic reservation: + ├─ A checks: 8 MB free ✓ + ├─ B checks: 8 MB free ✓ + ├─ A writes 5 MB → GPU has 3 MB + └─ B writes 5 MB → GPU OVERFLOWS ✗ + +WITH atomic reservation: + ├─ A acquires lock, reserves 5 MB → GPU has 3 MB free + ├─ A releases lock + ├─ B acquires lock, checks 3 MB free + ├─ B triggers eviction, demotes LRU to CPU + └─ B reserves 5 MB → GPU has sufficient space ✓ +``` + +#### Tier Configuration: What Happens When Tiers Are Disabled + +The eviction waterfall adapts based on which tiers are enabled via `--gpu-mem-gb` and `--cpu-mem-gb`: + +**Configuration 1: `--gpu-mem-gb 0 --cpu-mem-gb 0` (NVMe Only)** + +``` +Tier hierarchy: [NVMe only] +Eviction: LRU DELETION (no lower tier to demote to) + +allocate_cache("user_request", 1.28 GB) +├─ GPU tier: DISABLED (0 GB) → skip +├─ CPU tier: DISABLED (0 GB) → skip +└─ NVMe tier: WRITE DIRECTLY + └─ np.save("/mnt/nvme/uuid.npy", kv_data) +``` + +**How NVMe capacity is determined:** + +| `--storage-capacity-gb` | Behavior | +|-------------------------|----------| +| `> 0` (explicit) | Uses specified value (e.g., `--storage-capacity-gb 100` → 100 GB) | +| `0` (default) | Auto-detects via `shutil.disk_usage(cache_dir).free` | +| Auto-detect fails | `float('inf')` (unlimited, grows until disk full) | + +**What happens when NVMe fills up?** + +Once NVMe reaches `target_usage_ratio` (default 80%), **LRU entries are 
permanently deleted** to make room: + +``` +NVMe capacity: 100 GB (--storage-capacity-gb 100) +Target usage: 80 GB (80%) +Current usage: 82 GB +New entry: 1.28 GB + +Step 1: _ensure_space_in_tier('nvme', 1.28 GB) + ├─ Usage 82 GB > target 80 GB + ├─ Need to free: 82 + 1.28 - 80 = 3.28 GB + └─ Find LRU entries to DELETE + +Step 2: Delete LRU entries until space is available + ├─ DELETE "user_5_turn_1" (0.9 GB) → unlink file + ├─ DELETE "user_12_turn_2" (1.1 GB) → unlink file + ├─ DELETE "user_8_turn_1" (0.8 GB) → unlink file + ├─ DELETE "user_3_turn_3" (0.6 GB) → unlink file + └─ Total freed: 3.4 GB ✓ + +Step 3: Write new entry + └─ np.save("/mnt/nvme/new_entry.npy", kv_data) ✓ + +Result: 4 old cache entries permanently lost, 1 new entry written +``` + +**Key point:** With `--gpu-mem-gb 0 --cpu-mem-gb 0`, the NVMe tier acts as a **fixed-size LRU cache**. Old entries are evicted (deleted) to make room for new ones. + +**Use case:** Pure storage benchmark. Measures sustained NVMe performance under cache pressure with realistic eviction churn. + +#### Two Separate Eviction Mechanisms + +The benchmark has **two independent eviction systems**. Only one of them deletes files from disk: + +| Mechanism | Location | Trigger | What Happens | +|-----------|----------|---------|--------------| +| **ConversationManager** | `conversation.py` | `len(conversations) >= max_conversations` | Removes conversation **metadata** from memory. Cache files (.npy) **remain on disk**. | +| **MultiTierCache** | `cache.py` | `tier_usage >= capacity × target_ratio` | Calls `path.unlink()` on .npy files, **permanently deleting them from the filesystem**. | + +**ConversationManager eviction (default: 1000 conversations):** +```python +# conversation.py line 72-73 +if len(self.conversations) >= self.max_conversations: # default 1000 + self._evict_oldest_conversation() # removes metadata dict entry ONLY +``` + +This removes the conversation tracking record (an in-memory dict entry). 
The **cache .npy files remain on disk** untouched; they are only deleted when MultiTierCache runs out of capacity. + +**MultiTierCache eviction (based on storage capacity):** +```python +# cache.py - when NVMe is the bottom tier and full +if nvme_usage >= nvme_capacity * 0.8: + for lru_key in lru_entries_to_evict: + self.backends['nvme'].delete(lru_key) # calls path.unlink() -> file permanently deleted + +# backends.py - NVMeBackend.delete() +def delete(self, key): + path = self.base_path / f"{key}.npy" + path.unlink() # POSIX unlink: permanently removes the file from the filesystem + del self.metadata[key] +``` + +**Example timeline:** +``` +t=0: Conversation 1 started, cache file written (1.2 GB) +t=10: Conversation 1000 started +t=11: Conversation 1001 started + ├─ ConversationManager evicts conv 1 metadata (dict entry removed) + └─ Cache .npy file for conv 1 STILL ON DISK (untouched) + +t=100: NVMe reaches 80% capacity + ├─ MultiTierCache calls NVMeBackend.delete() on LRU entries + └─ Conv 1's .npy file permanently deleted from filesystem via path.unlink() +``` + +**Config locations:** +```yaml +# config.yaml +conversation: + max_conversations: 1000 # ConversationManager limit + max_turns_per_conv: 50 + +eviction: + target_usage_ratio: 0.8 # MultiTierCache limit (80% of capacity) +``` + +--- + +**Configuration 2: `--gpu-mem-gb 0 --cpu-mem-gb 4` (CPU + NVMe)** + +``` +Tier hierarchy: [CPU (4 GB)] → [NVMe] +Eviction: CPU → NVMe (single-hop) + +allocate_cache("user_request", 1.28 GB) +├─ GPU tier: DISABLED (0 GB) → skip +├─ CPU tier: Check if 1.28 GB fits in 4 GB budget +│ ├─ If fits: Write to CPU RAM (fast) +│ └─ If full: Evict LRU from CPU → NVMe, then write to CPU +└─ If CPU can't fit entry (>4 GB): Write directly to NVMe +``` + +**Example eviction flow:** +``` +CPU usage: 3.5 / 4.0 GB (87.5%) +New entry: 1.28 GB +Required free: 1.28 GB +Available: 0.5 GB +Deficit: 0.78 GB + +Step 1: _ensure_space_in_tier('cpu', 1.28 GB) + ├─ Need to evict 0.78 GB from CPU + ├─ 
LRU entry: "old_ctx" (0.9 GB) + └─ Demote "old_ctx" CPU → NVMe + +Step 2: _demote_entry("old_ctx", from='cpu', to='nvme') + ├─ Read from CPU RAM: 2ms + ├─ Write to NVMe: 100ms + └─ CPU now has 1.4 GB free ✓ + +Step 3: Write new entry to CPU + └─ Write 1.28 GB to CPU RAM: 5ms ✓ +``` + +**Use case:** Hybrid benchmark. Hot data in CPU RAM, cold data spills to NVMe. Measures CPU→NVMe demotion overhead. + +--- + +**Configuration 3: `--gpu-mem-gb 16 --cpu-mem-gb 32` (Full 3-Tier)** + +``` +Tier hierarchy: [GPU (16 GB)] → [CPU (32 GB)] → [NVMe] +Eviction: GPU → CPU → NVMe (multi-hop cascade) +``` + +This is the full recursive waterfall described above. + +--- + +#### Summary: Tier Configurations + +| Config | Active Tiers | Eviction Pattern | I/O Measured | +|--------|--------------|------------------|--------------| +| `--gpu-mem-gb 0 --cpu-mem-gb 0` | NVMe only | None | Pure NVMe read/write | +| `--gpu-mem-gb 0 --cpu-mem-gb 4` | CPU → NVMe | CPU → NVMe | CPU hits + NVMe spill | +| `--gpu-mem-gb 16 --cpu-mem-gb 0` | GPU → NVMe | GPU → NVMe | GPU hits + NVMe spill | +| `--gpu-mem-gb 16 --cpu-mem-gb 32` | GPU → CPU → NVMe | Full cascade | Full tier hierarchy | + +**Key behavior when a tier is set to 0:** +- The tier is **completely bypassed** in allocation decisions +- Entries skip directly to the next enabled tier +- No eviction can occur *from* a disabled tier (nothing stored there) +- The waterfall "shortens" to only include enabled tiers + +#### Eviction vs. 
Spillover

**Old Approach (Spillover):** When GPU full, new data forced to CPU → penalizes hot data

**New Approach (Waterfall):** When GPU full, evict *old cold data* to CPU → new hot data stays fast

| Aspect | Spillover | Waterfall LRU |
|--------|-----------|---------------|
| **New data placement** | Forced to slower tier | Always targets fastest tier |
| **Evicted data** | Random or FIFO | LRU (least recently used) |
| **Hot data performance** | ❌ Degraded | ✅ Optimal |
| **Production use** | Rare | vLLM, TensorRT-LLM, LMCache, Redis |

**Production References:**

1. **vLLM** uses LRU eviction for KV cache blocks:

   > *"When the head block (least recently used block) of the free queue is cached, we have to evict the block... Pop the block from the head of the free queue. This is the LRU block to be evicted."*
   > [vLLM Prefix Caching Documentation](https://docs.vllm.ai/en/latest/design/v1/prefix_caching.html)

2. **TensorRT-LLM** uses LRU eviction with optional offloading:

   > *"When this happens, reusable blocks are evicted based on LRU. System prompts that are frequently used have a better chance of remaining reusable."*
   > [TensorRT-LLM KV Cache Reuse](https://nvidia.github.io/TensorRT-LLM/advanced/kv-cache-reuse.html)

3. **LMCache** supports configurable eviction policies including LRU:

   > *"Currently, LMCache supports 'LRU' (Least Recently Used), 'MRU' (Most Recently Used), 'LFU' (Least Frequently Used) and 'FIFO' (First-In-First-Out) caching policies."*
   > [LMCache Caching Policies](https://docs.lmcache.ai/kv_cache/caching_policies.html)

4. **Redis** provides multiple LRU-based eviction policies:

   > *"Use `allkeys-lru` when you expect that a subset of elements will be accessed far more often than the rest. 
This is a very common case according to the Pareto principle, so `allkeys-lru` is a good default option."*
   > [Redis Eviction Policies](https://redis.io/docs/latest/develop/reference/eviction/)

---

### 3.6 Modular Architecture

The benchmark has been refactored from a monolithic `kv-cache.py` script into a modular Python package (`kv_cache/`) for maintainability, testability, and extensibility.

#### Package Structure

```
kv_cache/                  # Main package directory
├── __init__.py            # Public API exports
├── _compat.py             # Compatibility flags (CUDA/PyTorch/YAML detection)
├── backends.py            # Storage tier implementations (GPU/CPU/NVMe)
├── benchmark.py           # IntegratedBenchmark orchestrator
├── cache.py               # KVCacheGenerator + MultiTierCache (core engine)
├── cli.py                 # Command-line interface + XLSX export
├── config.py              # YAML configuration loader
├── conversation.py        # Multi-turn conversation management
├── models.py              # Data models (ModelConfig, InferenceRequest, QoS)
├── monitoring.py          # StorageMonitor, QoSMonitor, WorkloadAutoscaler
├── prefix_cache.py        # Shared system prompt caching
├── rag.py                 # RAG workload simulation
├── workload.py            # UserSimulator, ShareGPT/BurstGPT loaders
└── test_kv_cache.py       # Pytest unit tests
```

#### Module Responsibilities

| File | Purpose | Key Classes/Functions |
|------|---------|----------------------|
| **`__init__.py`** | Package entry point. Re-exports all public symbols for backward compatibility. | Re-exports: `MultiTierCache`, `IntegratedBenchmark`, `main()`, etc. |
| **`_compat.py`** | Detects optional dependencies (CuPy, PyTorch, YAML, Pandas) and sets feature flags. | `HAS_CUPY`, `HAS_TORCH`, `HAS_YAML`, `HAS_PANDAS`, `cp` (CuPy alias) |
| **`backends.py`** | Implements storage tier backends with `IOTiming` breakdowns (host vs device latency). 
| `StorageBackend` (base), `GPUMemoryBackend`, `CPUMemoryBackend`, `NVMeBackend` | +| **`benchmark.py`** | High-level orchestrator that coordinates cache, workload generator, monitoring, and telemetry. | `IntegratedBenchmark` | +| **`cache.py`** | **Core engine:** KV cache generation with static noise buffers + multi-tier cache with waterfall LRU eviction. | `KVCacheGenerator`, `MultiTierCache` | +| **`cli.py`** | Command-line argument parsing, validation, and Excel export functionality. | `main()`, `export_results_to_xlsx()` | +| **`config.py`** | Loads and validates `config.yaml`. Provides `cfg()` accessor for nested keys. | `ConfigLoader`, `cfg()`, `get_config()`, `set_config()` | +| **`conversation.py`** | Tracks multi-turn conversation state, manages turn history, conversation lifecycle. | `ConversationState`, `ConversationManager` | +| **`models.py`** | **Data models:** Model architectures (layers, heads, dims), inference phases, QoS levels, user profiles, request structures. | `ModelConfig`, `InferencePhase`, `GenerationMode`, `QoSLevel`, `UserProfile`, `InferenceRequest` | +| **`monitoring.py`** | Real-time telemetry collection, saturation detection, QoS tracking, autoscaling logic. | `StorageMetrics`, `StorageMonitor`, `QoSMonitor`, `WorkloadAutoscaler` | +| **`prefix_cache.py`** | Detects common system prompts, manages shared prefix cache entries, tracks reuse stats. | `PrefixType`, `PrefixMatcher`, `PrefixCacheManager` | +| **`rag.py`** | Simulates Retrieval-Augmented Generation: document ingestion, chunking, top-k retrieval. | `RAGChunk`, `RAGDocument`, `RAGDocumentManager` | +| **`workload.py`** | Generates synthetic requests, loads ShareGPT/BurstGPT traces, validates CLI arguments. | `UserSimulator`, `ShareGPTDatasetLoader`, `RealTraceEntry`, `validate_args()` | +| **`test_kv_cache.py`** | Pytest unit tests covering tier logic, eviction, QoS, prefix caching, RAG, autoscaling. 
| 90+ test functions |

---

#### Dependency Graph

```
┌───────────────────────────────────────────────────────────────────────┐
│                            CLI Entry Point                            │
│                            cli.py: main()                             │
└───────────────────────────────────┬───────────────────────────────────┘
                                    │
                                    ↓
┌───────────────────────────────────────────────────────────────────────┐
│                         Benchmark Orchestrator                        │
│                   benchmark.py: IntegratedBenchmark                   │
└───┬──────────┬────────────┬───────────────┬────────────┬──────────┬───┘
    │          │            │               │            │          │
    ↓          ↓            ↓               ↓            ↓          ↓
┌───────┐ ┌─────────┐ ┌────────────┐ ┌──────────────┐ ┌──────┐ ┌─────────┐
│cache  │ │workload │ │monitoring  │ │conversation  │ │rag   │ │prefix_  │
│.py    │ │.py      │ │.py         │ │.py           │ │.py   │ │cache.py │
└───┬───┘ └────┬────┘ └─────┬──────┘ └──────┬───────┘ └──┬───┘ └────┬────┘
    │          │            │               │            │          │
    └──────────┴────────────┴───────┬───────┴────────────┴──────────┘
                                    │
                                    ↓
                        ┌──────────────────────┐
                        │  Foundation Layers   │
                        │  models.py   (data)  │
                        │  backends.py  (I/O)  │
                        │  config.py (settings)│
                        │  _compat.py  (flags) │
                        └──────────────────────┘
```

---

#### Key Design Patterns

**1. Separation of Concerns**
- **Data Models** (`models.py`) define structure
- **Business Logic** (`cache.py`, `monitoring.py`) implements behavior
- **I/O Abstraction** (`backends.py`) isolates storage details
- **Orchestration** (`benchmark.py`) coordinates components

**2. Dependency Injection**
- `IntegratedBenchmark` receives `MultiTierCache`, `UserSimulator`, `StorageMonitor` as constructor arguments
- Enables unit testing with mocks/stubs

**3. Configuration-Driven**
- All internal parameters in `config.yaml`
- CLI arguments override config values
- Enables batch testing without code changes

**4. Thread-Safe Telemetry**
- All stats updates protected by locks
- Atomic counters for concurrent operations
- Safe for multi-threaded workload generation

**5. 
Backward Compatibility** +- `kv-cache.py` wrapper preserves old import path +- `__init__.py` re-exports all public symbols +- Existing test scripts continue to work + +--- + +#### Extensibility Points + +To add new functionality: + +| Feature | Files to Modify | +|---------|----------------| +| **New storage tier** | `backends.py`: Add new `Backend` class implementing `read()`, `write()`, `delete()` | +| **New autoscaler mode** | `monitoring.py`: Add mode to `WorkloadAutoscaler._should_scale()` | +| **New QoS level** | `config.yaml`: Add to `qos_profiles`, `models.py`: Update `QoSLevel` enum | +| **New model** | `config.yaml`: Add to `model_configs` with layer/head/dim values | +| **New workload source** | `workload.py`: Add loader class similar to `ShareGPTDatasetLoader` | +| **New metric** | `cache.py`: Add to `self.stats` dict, `benchmark.py`: Include in output JSON | + +--- + +### 3.7 NVMe Backend Implementation + +**File Mapping:** `{cache_dir}/{cache_key}.npy` + +**I/O Rigor:** Bypasses Linux page cache using `posix_fadvise(DONTNEED)` to ensure measurements reflect actual disk performance. 
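The cache-drop step can be exercised in isolation to verify that reads really hit the device. A minimal sketch, assuming Linux (where CPython exposes `os.posix_fadvise`); the helper name is illustrative, not the benchmark's API:

```python
import os

def drop_from_page_cache(path: str) -> None:
    """Ask the kernel to discard cached pages for this file, so the
    next read is served from the device rather than from RAM."""
    with open(path, "rb") as f:
        # Flush first: POSIX_FADV_DONTNEED only drops clean pages.
        os.fsync(f.fileno())
        os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
```

Without this step, repeated reads of a recently written `.npy` file would measure DRAM bandwidth, not the NVMe device.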
**Write Path:**
```python
def write(self, key: str, data: np.ndarray) -> IOTiming:
    start = time.perf_counter()
    f = open(self.base_path / f"{key}.npy", "wb")  # excerpt: error handling elided

    # HOST LATENCY: Serialization (CPU-bound)
    np.save(f, data, allow_pickle=False)
    post_save = time.perf_counter()

    # DEVICE LATENCY: Blocking disk I/O
    f.flush()
    os.fsync(f.fileno())  # Blocks until persisted
    post_fsync = time.perf_counter()
    f.close()

    return IOTiming(
        host=post_save - start,
        device=post_fsync - post_save,
        total=post_fsync - start
    )
```

**Read Path:**
```python
def read(self, key: str) -> Tuple[np.ndarray, IOTiming]:
    path = self.base_path / f"{key}.npy"
    start = time.perf_counter()

    # Drop from page cache to force real I/O
    with open(path, "rb") as f:
        os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)

    pre_load = time.perf_counter()
    # DEVICE LATENCY: Actual disk read
    data = np.load(path, allow_pickle=False)
    load_done = time.perf_counter()

    # HOST LATENCY: Array materialization
    data = np.array(data)
    copy_done = time.perf_counter()

    return data, IOTiming(
        device=load_done - pre_load,
        host=(pre_load - start) + (copy_done - load_done),
        total=copy_done - start
    )
```

---

### 3.8 Generation Mode: Simulating GPU Backpressure

Real LLM inference has GPU compute time between I/O operations. Without simulating this, the benchmark would unrealistically flood storage with requests.

| Mode | Behavior | Use Case |
|------|----------|----------|
| `none` | No sleep | Pure storage benchmark |
| `realistic` | Sleep proportional to token generation | Production simulation |
| `aggressive` | Minimal sleep | Stress testing |

**Realistic Mode Calculation:**
```python
# Based on NVIDIA A100 inference speed (~50 tok/s)
sleep_time = generate_tokens * 0.02  # 20 ms per token
time.sleep(sleep_time)
```

This models natural pacing where the GPU's compute creates gaps between storage requests, preventing artificial saturation.
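The three modes reduce to a per-request sleep budget. A sketch of the pacing rule: the 20 ms/token figure comes from the text above, while the `aggressive` scale factor shown here is an illustrative assumption, not the benchmark's exact value:

```python
def pacing_delay_s(mode: str, generate_tokens: int, s_per_token: float = 0.02) -> float:
    """Seconds to sleep after a request, modeling GPU decode time."""
    if mode == "none":
        return 0.0                                   # pure storage benchmark
    if mode == "realistic":
        return generate_tokens * s_per_token         # ~50 tok/s, A100-class
    if mode == "aggressive":
        return generate_tokens * s_per_token * 0.1   # assumed 10x faster: stress test
    raise ValueError(f"unknown generation mode: {mode!r}")

pacing_delay_s("realistic", 100)   # ~2 s of simulated decode for a 100-token response
```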
+ +--- + +### 3.9 QoS Classes: Prioritizing Users + +Three Quality of Service levels model real-world priority: + +| QoS Level | Use Case | Target P95 | Target P99 | Priority | +|-----------|----------|------------|------------|----------| +| **INTERACTIVE** | Real-time chatbots | 50 ms | 100 ms | 3 (Highest) | +| **RESPONSIVE** | Near real-time | 100 ms | 200 ms | 2 | +| **BATCH** | Offline jobs | 1,000 ms | 5,000 ms | 1 (Lowest) | + +**Default Distribution:** 60% Interactive, 30% Responsive, 10% Batch + +**Priority Queue:** Higher-priority requests processed first: +``` +[INTERACTIVE] → [INTERACTIVE] → [RESPONSIVE] → [BATCH] + ↓ + Processed First +``` + +**Output Example:** +```json +"qos_stats": { + "interactive": { + "latency_p95_ms": 42.3, + "sla_met": true + }, + "batch": { + "latency_p95_ms": 2847.5, + "sla_met": false // Appropriately deprioritized + } +} +``` + +--- + +### 3.10 Prefix Caching: System Prompt Optimization + +Many requests share common system prompts. Instead of redundantly storing identical prefixes, the benchmark implements shared caching: + +**Three Common Prompts:** +```python +COMMON_SYSTEM_PROMPTS = [ + "You are a helpful, harmless, and honest AI assistant.", + "You are a coding assistant. Provide clear, working code examples.", + "You are a creative writing assistant. Be imaginative and engaging.", +] +``` + +**Cache Key:** `kv_system_{md5_hash[:8]}` + +**Lifecycle:** +``` +t=0 User A: "You are helpful..." + "Hello" + → Miss → Full prefill → Store as kv_system_a1b2c3d4 + +t=1 User B: "You are helpful..." + "Hi" + → HIT → Read cached prefix → Only prefill "Hi" + +t=2 [LRU eviction of kv_system_a1b2c3d4] + +t=3 User C: "You are helpful..." 
+ "Hey" + → Miss → Full prefill → Re-store +``` + +**Metrics:** +- `system_prompt_reuse` – Detection attempts +- `system_prompt_hits` – Successful cache reads +- **Gap = Memory Pressure** – Low hit rate indicates insufficient memory + +--- + +### 3.11 RAG Workflow: Retrieval-Augmented Generation + +RAG creates bursty, front-loaded I/O patterns: + +``` +Standard Conversation RAG Workload +------------------- ------------ +User: "Hello" User: "What does contract say..." + ↓ ↓ +[Small Prefill] [Vector DB Lookup] + ↓ ↓ +[Incremental Decode] [Load 10-50 Document Chunks] ← BURST + ↓ + [Massive Context Prefill] + ↓ + [Generate Response] +``` + +**Three Phases:** +1. **Ingestion** (offline) – Split documents → Compute KV cache → Store +2. **Retrieval** (per query) – Vector similarity search → Return top_k chunks +3. **Inference** (per query) – Load chunk KV caches → Concatenate → Generate + +**Read Amplification:** + +| Metric | Standard Chat | RAG Query | +|--------|---------------|-----------| +| Context at start | ~1 KB | **500 MB - 2 GB** | +| Reads before first token | 1 | **10-50** | +| Storage pressure | Gradual | **Instant burst** | + +**Enable with:** `--enable-rag --rag-top-k 10` + +--- + +### 3.12 Autoscaling Modes + +#### QoS Mode (Production Sizing) +**Goal:** Find max users while maintaining latency SLAs + +**Logic:** +``` +Collect KPIs (P95 latency every 5s) + ↓ +Calculate Saturation (0.0 - 1.0) + ↓ +Compare to Target (default 0.8) + ↓ +Adjust Load: + - Saturation < 0.7 → Add users (+10-20%) + - 0.7 ≤ Saturation ≤ 0.9 → Hold steady + - Saturation > 0.9 → Remove users + cooldown (30s) +``` + +#### Capacity Mode (Hardware Benchmarking) +**Goal:** Find absolute peak throughput (ignores latency) + +**Logic:** +``` +Ramp-up Phase: Double users while throughput increases rapidly + ↓ +Fine-tune Phase: 1.5× scaling when growth slows + ↓ +Terminate: When throughput decreases from previous stage +``` + +**Output:** +```json +"autoscaling_stats": [ + {"users": 20, 
"throughput": 450, "saturation": 0.45, "action": "scale_up"}, + {"users": 50, "throughput": 890, "saturation": 0.82, "action": "hold"}, + {"users": 45, "throughput": 865, "saturation": 0.79, "action": "stabilized"} +] +``` + +--- + +## 4. Memory Requirements & Capacity Planning + +### 4.1 User Profile Context Ranges + +The benchmark simulates three user personas with context ranges justified by recent production workload studies: + +#### Research Citations + +**[1] OpenRouter "State of AI: An Empirical 100T Token Study" (arXiv:2601.10088)** +- Average prompt tokens grew ~4× from ~1,500 to >6,000 (early 2024 → late 2025) +- Programming workloads routinely exceed 20K input tokens +- Non-programming categories remain "relatively flat and low-volume" +- Overall input:output ratio ~15:1 + +**[2] BurstGPT (arXiv:2401.17644); 10.31M traces from Azure OpenAI GPT** +- Request lengths follow a Zipf distribution (many short, long tail) +- ChatGPT response lengths are bimodal with linear request-response correlation +- Average 621 request tokens, 126 response tokens (after filtering failures) + +--- + +### User Profiles + +| Profile | Context Range | Generation Range | Justification | +|---------|---------------|------------------|---------------| +| **chatbot** | 512-4096 | 50-200 | General-purpose conversational use. Non-programming categories stay well below platform average of ~6K [1]. Zipf-shaped request distribution means most chatbot prompts are short [2]. | +| **coding** | 4096-25000 | 100-500 | Programming is the dominant context-length driver, "routinely exceeding 20K input tokens" and averaging 3-4× general-purpose prompts [1]. Claude handles ~60% of coding workloads at >20K avg [1]. Output stays modest relative to input (~15:1 ratio) [1]. | +| **document** | 4096-16384 | 200-800 | Long-context document analysis (summarization, Q&A). Sits between chatbot and coding; context-heavy but below coding peaks. Overall avg sequence length >5,400 tokens by late 2025 [1]. 
| + +**Think Time Ranges:** +- **chatbot:** 0.1-0.5 sec (rapid interaction) +- **coding:** 0.2-1.0 sec (developers pause to review) +- **document:** 0.3-1.5 sec (users read lengthy outputs) + +--- + +### 4.2 KV Cache Size Formula + +**MHA/GQA models:** +``` +Bytes per Token = num_layers × 2 × kv_heads × head_dim × bytes_per_dtype +``` + +**MLA models (DeepSeek-V3):** +``` +Bytes per Token = num_layers × (kv_lora_rank + qk_rope_head_dim) × bytes_per_dtype +``` +MLA jointly compresses K and V into a single latent vector (no ×2 factor), plus a shared RoPE key dimension. + +**head_dim calculation:** `hidden_dim / num_heads` (for MHA/GQA); not applicable for MLA + +| Model | Attention | Layers | kv_heads | head_dim | Bytes/Token | MB/Token | 8K Context | +|-------|-----------|--------|----------|----------|-------------|----------|------------| +| `tiny-1b` | GQA | 12 | 4 | 128 | 24,576 | 0.023 | 192 MB | +| `mistral-7b` | GQA | 32 | 8 | 128 | 131,072 | 0.125 | 1,024 MB | +| `llama2-7b` | MHA | 32 | 32 | 128 | 524,288 | 0.500 | 4,096 MB | +| `llama3.1-8b` | GQA | 32 | 8 | 128 | 131,072 | 0.125 | 1,024 MB | +| `llama3.1-70b-instruct` | GQA | 80 | 8 | 128 | 327,680 | 0.313 | 2,560 MB | +| `deepseek-v3` | **MLA** | 61 | N/A | N/A | 70,272 | 0.067 | 549 MB | +| `qwen3-32b` | GQA | 64 | 8 | 80 | 163,840 | 0.153 | 1,248 MB | +| `gpt-oss-120b` (MoE) | GQA | 36 | 8 | 64 | 73,728 | 0.069 | 563 MB | +| `gpt-oss-20b` (MoE) | GQA | 24 | 8 | 64 | 49,152 | 0.046 | 376 MB | + +**Note:** DeepSeek-V3 uses Multi-head Latent Attention (MLA) which compresses K and V into a single latent of dimension 512 + 64 RoPE = 576, yielding ~25× smaller KV cache than the equivalent MHA configuration. MoE (Mixture of Experts) models like GPT-OSS have smaller KV cache because only a subset of experts is active per request. 
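Both formulas can be checked directly against the table rows. A small sketch assuming FP16 storage (2 bytes per element):

```python
def bytes_per_token_mha_gqa(layers: int, kv_heads: int, head_dim: int, dtype_bytes: int = 2) -> int:
    # K and V each hold kv_heads x head_dim elements per layer -> the factor of 2
    return layers * 2 * kv_heads * head_dim * dtype_bytes

def bytes_per_token_mla(layers: int, kv_lora_rank: int, qk_rope_head_dim: int, dtype_bytes: int = 2) -> int:
    # MLA stores one joint KV latent plus a shared RoPE key dim -> no factor of 2
    return layers * (kv_lora_rank + qk_rope_head_dim) * dtype_bytes

print(bytes_per_token_mha_gqa(32, 8, 128))    # mistral-7b / llama3.1-8b: 131072
print(bytes_per_token_mha_gqa(32, 32, 128))   # llama2-7b (MHA): 524288
print(bytes_per_token_mla(61, 512, 64))       # deepseek-v3: 70272
```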
+ +### 4.3 System RAM Requirements + +**Formula:** +``` +Minimum RAM = cpu_mem_gb + peak_in_flight_RAM + 4 GB overhead +Peak In-Flight RAM = max_concurrent_allocs × avg_context_tokens × bytes_per_token +``` + +**Peak In-Flight RAM:** +- **Default (`--max-concurrent-allocs 0`):** `num_users × avg_context × bytes_per_token`; **DANGEROUS for large models** +- **Bounded (`--max-concurrent-allocs N`):** `N × avg_context × bytes_per_token`; **RECOMMENDED** + +--- + +### 4.4 Peak RAM by Model and Concurrency Limit + +The following table shows peak in-flight RAM consumption assuming **8,192 average context tokens** (midpoint of coding user profile). This excludes `cpu_mem_gb` allocation. + +| Model | Architecture | MB/Token | Per User | 200 users (unlimited) | 16 allocs | 8 allocs | 4 allocs | +|-------|--------------|----------|----------|----------------------|-----------|----------|----------| +| `tiny-1b` | GQA | 0.023 | 0.2 GB | 40 GB | 3.2 GB | 1.6 GB | 0.8 GB | +| `mistral-7b` | GQA | 0.125 | 1.0 GB | 200 GB | 16 GB | 8 GB | 4 GB | +| `llama2-7b` | **MHA** | **0.500** | **4.0 GB** | **800 GB** | **64 GB** | **32 GB** | **16 GB** | +| `llama3.1-8b` | GQA | 0.125 | 1.0 GB | 200 GB | 16 GB | 8 GB | 4 GB | +| `llama3.1-70b-instruct` | GQA | 0.313 | 2.5 GB | 500 GB | 40 GB | 20 GB | 10 GB | +| `deepseek-v3` | **MLA** | 0.067 | 0.54 GB | 107 GB | 9 GB | 4.3 GB | 2.1 GB | +| `qwen3-32b` | GQA | 0.153 | 1.25 GB | 250 GB | 20 GB | 10 GB | 5 GB | +| `gpt-oss-120b` | MoE | 0.069 | 0.56 GB | 112 GB | 9 GB | 4.5 GB | 2.3 GB | +| `gpt-oss-20b` | MoE | 0.046 | 0.38 GB | 76 GB | 6 GB | 3 GB | 1.5 GB | + +> **Why is `llama2-7b` so large?** It uses Multi-Head Attention (MHA) with 32 KV heads (same as attention heads), while newer models like `llama3.1-8b` use Grouped Query Attention (GQA) with only 8 KV heads. This 4× difference makes `llama2-7b` an excellent stress test model. 
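The table values follow directly from the peak in-flight formula in §4.3. A quick calculator for sanity-checking a planned run (MB/token values come from the table above):

```python
def peak_inflight_gb(max_concurrent_allocs: int, avg_context_tokens: int, mb_per_token: float) -> float:
    """Peak RAM (GB) held by in-flight KV allocations.
    Excludes the --cpu-mem-gb budget and the ~4 GB overhead from the formula."""
    return max_concurrent_allocs * avg_context_tokens * mb_per_token / 1024

print(peak_inflight_gb(8, 8192, 0.500))     # llama2-7b, 8 allocs: 32.0
print(peak_inflight_gb(16, 8192, 0.125))    # llama3.1-8b, 16 allocs: 16.0
print(peak_inflight_gb(200, 8192, 0.500))   # llama2-7b, unlimited @ 200 users: 800.0
```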
+ +--- + +### 4.5 Recommended Settings by System RAM + +| System RAM | `--max-concurrent-allocs` | Safe Models (unlimited concurrency) | +|------------|---------------------------|-------------------------------------| +| 32 GB | 4 | `tiny-1b`, `gpt-oss-20b`, `deepseek-v3` | +| 64 GB | 8 | `mistral-7b`, `llama3.1-8b`, `qwen3-32b`, `gpt-oss-120b`, `deepseek-v3` | +| 128 GB | 16 | All GQA/MoE/MLA models | +| 256 GB | 16–32 | All models with bounded concurrency | +| 512 GB+ | 32–64 | All models including `llama2-7b` (MHA) | + +--- + +### 4.6 Impact of `--max-concurrent-allocs` on Benchmark Results + +This parameter controls how many KV cache allocations can be in-flight simultaneously. It has significant effects on benchmark metrics: + +| Setting | Throughput Impact | Latency Impact | I/O Queue Depth | Realism | +|---------|-------------------|----------------|-----------------|---------| +| **0 (unlimited)** | Maximum | Lowest (no queueing) | Very high | Low; no admission control | +| **16** | High | Low-moderate | High | Moderate; stress test | +| **8** | Moderate | Moderate (queueing) | Moderate | High; production-like | +| **4** | Lower | Higher (significant queueing) | Low | Highest; memory-constrained | + +**Why this matters for storage benchmarking:** + +1. **Throughput measurement:** Lower concurrency limits reduce I/O parallelism, which can understate the storage device's peak capability. A PCIe Gen5 NVMe can handle 32+ concurrent operations. + +2. **Latency measurement:** With unlimited concurrency, latency measurements reflect pure device latency. With bounded concurrency, latency includes queueing time; more realistic for production systems with admission control. + +3. **Tail latency (P99):** Lower concurrency values produce more stable P99 latencies because fewer requests compete for I/O resources simultaneously. + +4. **Cache hit rate:** Not directly affected; hit rates depend on working set size and cache tier capacities, not concurrency. 
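Conceptually, `--max-concurrent-allocs` is a counting semaphore around the allocation path; excess requests block at the gate, which is exactly the queueing latency described above. A minimal sketch with illustrative names (the real gate lives inside the benchmark's allocation code):

```python
import threading

MAX_CONCURRENT_ALLOCS = 8                  # e.g. --max-concurrent-allocs 8
_alloc_gate = threading.BoundedSemaphore(MAX_CONCURRENT_ALLOCS)

def allocate_kv_cache(write_fn):
    """Run one KV-cache allocation under the admission-control gate.
    Callers beyond the limit block here until a slot frees up."""
    with _alloc_gate:                      # at most 8 allocations in flight
        return write_fn()
```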
**Recommended settings by test objective:**

| Objective | `--max-concurrent-allocs` | Rationale |
|-----------|---------------------------|-----------|
| Peak storage throughput | 16–32 | Maximize I/O parallelism to saturate device |
| Production simulation | 8 | Realistic admission control |
| Latency-sensitive test | 4–8 | Minimize queueing variability |
| Memory-constrained system | 4 | Prevent OOM while still achieving measurement |

---

### 4.7 Example Configurations

| Config | Model | Users | `--max-concurrent-allocs` | `--cpu-mem-gb` | Minimum RAM |
|--------|-------|-------|---------------------------|----------------|-------------|
| Storage stress | `llama3.1-8b` | 200 | 16 | 0 | 20 GB |
| Storage stress | `llama2-7b` | 200 | 8 | 0 | 36 GB |
| Production sim | `llama3.1-8b` | 100 | 8 | 32 | 44 GB |
| 70B stress | `llama3.1-70b` | 70 | 4 | 0 | 14 GB |
| Large model | `deepseek-v3` | 50 | 4 | 0 | 6 GB |

**⚠️ Critical Warning:** Running `llama2-7b` with `--max-concurrent-allocs 0` (unlimited) on systems with <1 TB RAM **will cause OOM kills**. The allocation semaphore only caps in-flight allocations when set to a nonzero value; at 0, all 200 users can allocate simultaneously (~800 GB of in-flight KV cache, per the table in 4.4). Note: `deepseek-v3` uses MLA, which compresses the KV cache ~25× vs MHA, so it requires far less RAM than its parameter count suggests.

---

### 4.8 Disaggregated Inference Modes

Modern inference systems (vLLM, TensorRT-LLM, Mooncake) often separate **prefill** and **decode** into different node pools for efficiency. 
The benchmark supports testing each workload pattern independently: + +| Mode | CLI Flag | I/O Pattern | Simulates | +|------|----------|-------------|-----------| +| Standard | *(none)* | Mixed R/W | Colocated prefill+decode | +| Prefill-only | `--prefill-only` | **Write-heavy** | Disaggregated prefill node | +| Decode-only | `--decode-only` | **Read-heavy** | Disaggregated decode node | + +#### How It Works + +``` +Standard Mode (default): + Request → PREFILL (write KV) → DECODE (read KV repeatedly) → Response + +--prefill-only (write-heavy): + Request → PREFILL (write KV) → [DECODE skipped] → Response + Use case: SSD endurance testing, prefill node simulation + +--decode-only (read-heavy): + [Pre-populate cache] → Request → DECODE (read from pre-populated cache) → Response + Use case: Read IOPS/latency testing, decode node simulation +``` + +**Decode-only initialization:** Before the benchmark starts, the system pre-populates the cache with `num_users × 10` entries (simulating KV caches written by prefill nodes). The benchmark then measures pure read performance against this existing data. + +#### Example Commands + +```bash +# Test prefill node (write-heavy) - measures SSD write endurance +python3 kv-cache.py --model llama3.1-70b-instruct --prefill-only \ + --gpu-mem-gb 0 --cpu-mem-gb 0 \ + --num-users 100 --duration 300 --cache-dir /mnt/nvme \ + --max-concurrent-allocs 8 --generation-mode none + +# Test decode node (read-heavy) - measures read IOPS +python3 kv-cache.py --model llama3.1-70b-instruct --decode-only \ + --gpu-mem-gb 0 --cpu-mem-gb 0 \ + --num-users 100 --duration 300 --cache-dir /mnt/nvme \ + --max-concurrent-allocs 8 --generation-mode none +``` + +**Note:** These flags are mutually exclusive. The benchmark will error if both are specified. 
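Mutual exclusion of this kind is commonly enforced with argparse's mutually exclusive groups. A sketch of how such validation could look (the benchmark's actual parser lives in `cli.py`/`workload.py` and may differ):

```python
import argparse

parser = argparse.ArgumentParser(description="disaggregated-mode flags (sketch)")
mode = parser.add_mutually_exclusive_group()
mode.add_argument("--prefill-only", action="store_true", help="write-heavy: skip decode")
mode.add_argument("--decode-only", action="store_true", help="read-heavy: pre-populate, then read")

args = parser.parse_args(["--prefill-only"])
print(args.prefill_only, args.decode_only)   # True False
# parser.parse_args(["--prefill-only", "--decode-only"]) exits with a
# "not allowed with argument" error, matching the note above.
```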
+ +#### Preconditioning vs Prefill-Only vs Decode-Only + +| Feature | `--precondition` | `--prefill-only` | `--decode-only` | +|---------|------------------|------------------|-----------------| +| **Purpose** | Reach SSD steady-state | Benchmark write performance | Benchmark read performance | +| **When** | Before benchmark starts | During benchmark | During benchmark | +| **I/O Pattern** | Sequential writes (fixed 2KB) | Write-heavy (+ prefix/multi-turn reads) | Reads from pre-populated cache | +| **Data Volume** | 2× NVMe capacity | Depends on duration/users | N/A (reads only) | +| **Stats Reset** | Yes (writes don't count) | No (writes ARE the metric) | Yes (pre-pop doesn't count) | + +**Note on prefill-only reads:** Even in `--prefill-only` mode, reads occur for prefix cache hits, multi-turn history, and RAG chunks. For **pure write testing**, add: +```bash +--disable-multi-turn --disable-prefix-caching +``` + +**Combined usage:** For rigorous SSD write testing: +```bash +python3 kv-cache.py --precondition --prefill-only \ + --disable-multi-turn --disable-prefix-caching \ + --gpu-mem-gb 0 --cpu-mem-gb 0 \ + --model llama3.1-70b-instruct --num-users 100 --duration 300 --cache-dir /mnt/nvme +``` +This fills the SSD to steady-state first, then measures sustained write throughput with zero reads. + +--- + +## 5. 
Validation Results + +### Test Environment + +| Component | Specification | +|-----------|---------------| +| **Server** | Supermicro SYS-621H-TN12R | +| **CPU** | 2× Intel Xeon Silver 4510 (48T total) | +| **RAM** | 256 GB DDR5-4800 ECC | +| **GPU** | NVIDIA H100 NVL (94 GB HBM3) | +| **NVMe** | 7.0 TB enterprise SSD (~14 GB/s) | +| **OS** | Ubuntu 22.04, Linux 6.5.0 | + +### 5.1 Storage Tier Differentiation + +**Configuration:** Mistral-7B, 500 prompts (ShareGPT), 50 concurrent users, 3 trials each + +| Tier | Storage Throughput | Speedup vs NVMe | +|------|-------------------|-----------------| +| **GPU Only** | 1,691 ± 154 tok/s | **6.4×** | +| **GPU + CPU** | 1,546 ± 257 tok/s | **5.9×** | +| **GPU + CPU + NVMe** | 1,175 ± 178 tok/s | **4.4×** | +| **NVMe Only** | 263 ± 2 tok/s | 1.0× (baseline) | + +**Conclusion:** GPU provides 6.4× improvement over NVMe-only storage. + +--- + +### 5.2 Fast vs Slow System Comparison + +**Systems:** +- **Fast:** Bare metal, 7.0 TB NVMe (14 GB/s theoretical) +- **Slow:** VMware ESXi 8.0.3, VMFS6 volume (3 GB/s theoretical) + +**Global Results (220 matched configurations):** + +| Metric | Fast | Slow | Ratio | +|--------|------|------|-------| +| Storage Throughput | 88.47 tok/s | 41.56 tok/s | **2.13×** | +| Wall-Clock Throughput | 610.36 tok/s | 290.02 tok/s | **2.10×** | +| Storage Latency P95 | 36,504 ms | 45,091 ms | **1.24×** | + +**Critical Finding:** At `cpu_mem=0GB`, use **Decode Bytes Read** or **Wall-Clock Throughput** for differentiation, NOT Storage Throughput (only 1.12× due to both systems being 100% I/O-bound). 
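The ratios in the table are plain quotients of the matched-configuration medians; note the latency ratio is slow/fast, since higher P95 is worse. A quick check using the numbers above:

```python
fast = {"storage_tput": 88.47, "wall_tput": 610.36, "p95_ms": 36504}
slow = {"storage_tput": 41.56, "wall_tput": 290.02, "p95_ms": 45091}

print(f"{fast['storage_tput'] / slow['storage_tput']:.2f}x")   # 2.13x
print(f"{fast['wall_tput'] / slow['wall_tput']:.2f}x")         # 2.10x
print(f"{slow['p95_ms'] / fast['p95_ms']:.2f}x")               # 1.24x
```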
+ +--- + +### 5.3 iostat Validation + +**Maximum Storage Utilization by Memory Tier:** + +| `cpu_mem` | Avg Read MB/s | Avg Total MB/s | Util% | +|-----------|---------------|----------------|-------| +| **0 GB** | **6,825** | **7,680** | **211%** | +| 4 GB | 1,714 | 2,741 | 51% | +| 8 GB | 628 | 1,719 | 38% | +| 16 GB | 47 | 1,188 | 38% | + +**Peak Performance:** `cpu_mem=0GB` with `llama3.1-8b` at 200 users achieved **10.9 GB/s** (78% of 14 GB/s theoretical limit). + +--- + +## 6. MLPerf v3.0 Submission Guidelines + +### Recommended Configurations + +#### Option 1: Maximum Storage Stress (cpu_mem=0GB) + +**Use when:** Measuring I/O volume differentiation and hardware stress. + +**Primary Metrics:** +- `decode_bytes_read_gb` (2.62× differentiation, 100% win rate) +- `avg_throughput_tokens_per_sec` (2.43× differentiation, 100% win rate) +- `nvme_read_device_p95_ms`, `nvme_write_device_p95_ms` + +⚠️ **Do NOT use** `storage_throughput` at `cpu_mem=0GB` (only 1.12× differentiation). + +```bash +for trial in {1..5}; do + python3 kv-cache.py \ + --config config.yaml \ + --model llama3.1-8b \ + --num-users 200 \ + --duration 300 \ + --gpu-mem-gb 0 \ + --cpu-mem-gb 0 \ + --max-concurrent-allocs 16 \ + --generation-mode none \ + --cache-dir /mnt/nvme \ + --seed 42 \ + --output mlperf_stress_8b_trial${trial}.json +done +``` + +--- + +#### Option 2: Storage Throughput Focus (cpu_mem=4GB) + +**Use when:** Storage Throughput is the primary metric. 
+ +**Primary Metrics:** +- `storage_throughput_tokens_per_sec` (2.23× differentiation, 97.2% win rate) +- `decode_bytes_read_gb` +- `nvme_read_device_p95_ms`, `nvme_write_device_p95_ms` + +```bash +for trial in {1..5}; do + python3 kv-cache.py \ + --config config.yaml \ + --model llama3.1-8b \ + --num-users 100 \ + --duration 300 \ + --gpu-mem-gb 0 \ + --cpu-mem-gb 4 \ + --generation-mode none \ + --cache-dir /mnt/nvme \ + --seed 42 \ + --output mlperf_throughput_8b_trial${trial}.json +done +``` + +--- + +#### Option 3: Large Model (70B) + +**Use when:** Maximum per-request storage stress (70B has ~2.5× larger KV cache/token). + +```bash +for trial in {1..3}; do + python3 kv-cache.py \ + --config config.yaml \ + --model llama3.1-70b-instruct \ + --num-users 70 \ + --duration 300 \ + --gpu-mem-gb 0 \ + --cpu-mem-gb 0 \ + --max-concurrent-allocs 4 \ + --generation-mode none \ + --cache-dir /mnt/nvme \ + --seed 42 \ + --output mlperf_stress_70b_trial${trial}.json +done +``` + +--- + +### Critical Parameters + +| Parameter | Value | Rationale | +|-----------|-------|-----------| +| `--seed 42` | **Required** | Reproducibility | +| `--gpu-mem-gb 0` | **Required** | Isolates storage | +| `--generation-mode` | `none` | Pure storage benchmark | +| `--cpu-mem-gb` | 0 or 4 | 0 for max stress; 4 for throughput metric | +| `--max-concurrent-allocs` | 0, 4, or 16 | Controls RAM usage | +| `--duration` | 300-600 | Steady-state requirement | + +--- + +### Trial Requirements + +**High variance observed (CV 50-125%)** requires multiple trials: + +| User Count | Variance (CV) | Min Trials | +|------------|---------------|------------| +| 10 users | ~52% | 3 | +| 50-100 users | ~115-125% | 3-5 | +| 200 users | ~110-120% | 3-5 | + +**Report median, not mean.** + +--- + +### Submission Checklist + +- [ ] `--seed 42` used +- [ ] `--gpu-mem-gb 0` (storage isolation) +- [ ] `--generation-mode none` (pure storage) +- [ ] `--duration ≥ 300` seconds +- [ ] 3-5 trials per configuration +- [ ] 
Median values reported +- [ ] Correct metrics for `cpu_mem` setting: + - `cpu_mem=0GB` → `decode_bytes_read_gb`, `avg_throughput_tokens_per_sec`, device P95 + - `cpu_mem=4GB` → `storage_throughput_tokens_per_sec`, device P95 +- [ ] Both 8B and 70B results included +- [ ] System info documented (CPU, RAM, NVMe model) + +--- + +### Example Submission + +``` +MLPerf Storage v3.0 Submission +============================== +System: Supermicro SYS-621H-TN12R +Storage: Kingston DC600M 7.0TB NVMe (PCIe Gen5) +Model: llama3.1-8b +Config: cpu_mem=0GB, users=200, duration=300s, trials=5 + +Results (median of 5 trials): + Decode Bytes Read: 1,195 GB + Wall-Clock Throughput: 557 tok/s + Storage Read Device P95: 892 ms + Storage Write Device P95: 156 ms + Peak I/O Bandwidth: 10.9 GB/s (78% theoretical) +``` + +--- + +## 7. Interpreting Results + +### Metric Selection by Use Case + +| Use Case | Primary Metric | Configuration | +|----------|----------------|---------------| +| **Compare NVMe drives** | `decode_bytes_read_gb`, `nvme_device_p95_ms` | `cpu_mem=0GB`, `gen_mode=none` | +| **Production planning** | `wall_clock_throughput`, `end_to_end_latency_p95` | `cpu_mem=4GB`, `gen_mode=realistic` | +| **Storage efficiency** | `storage_throughput` | `cpu_mem=4GB` | +| **Capacity discovery** | `autoscaling_stats[last].users` | `--enable-autoscaling --autoscaler-mode qos` | + +--- + +### Understanding Throughput Metrics + +| Metric | Formula | What It Measures | +|--------|---------|------------------| +| **Wall-Clock Throughput** | `tokens / elapsed_time` | System capacity (user-facing) | +| **Storage Throughput** | `tokens / total_storage_io_time` | Storage efficiency (hardware) | + +**Why Storage Throughput fails at `cpu_mem=0GB`:** + +Both fast and slow systems are 100% I/O-bound. Fast system reads **more data** but spends **more time doing I/O** → effects cancel out. 
+ +| System | Decode Bytes | I/O Time | Storage Throughput | +|--------|--------------|----------|-------------------| +| Fast | 1,195 GB | ~8,000 s | 9.53 tok/s | +| Slow | 447 GB | ~7,100 s | 8.50 tok/s | +| **Ratio** | **2.62×** | **1.13×** | **1.12×** ❌ | + +**Use `decode_bytes_read_gb` or `wall_clock_throughput` instead.** + +--- + +### Latency Interpretation Guide + +| Latency Type | What to Check | Diagnosis | +|--------------|---------------|-----------| +| **End-to-End High** | Queue Wait component | Overloaded → reduce users or add capacity | +| **Storage I/O High** | Host vs Device ratio | If Host >> Device → CPU bottleneck, not storage | +| **Device P95 High** | Compare to drive spec | Storage hardware limitation | +| **Queue Wait High** | System saturation | Receiving requests faster than processing | + +**Example Diagnosis:** +``` +Storage Read Total P95: 260.90 ms + ├─ Device P95: 15.23 ms (6%) + └─ Host P95: 245.67 ms (94%) + +Diagnosis: CPU serialization (np.save/load) is bottleneck, not storage. +``` + +--- + +## 8. Advanced Features + +### 8.1 Multi-Turn Conversations + +Simulates chat history by linking requests: + +```python +conversation_id = f"conv_{user_id}" +for turn in range(num_turns): + cache_key = f"{conversation_id}_turn_{turn}" + # Each turn can access previous turn KV caches +``` + +**Benefit:** Models realistic conversational AI workload with growing context. + +--- + +### 8.2 ShareGPT Dataset Replay + +**Source:** The [ShareGPT](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered) dataset contains 90K+ real human-ChatGPT conversations extracted from the ShareGPT browser extension. 
+ +**Why ShareGPT?** +- **Real conversation patterns:** Multi-turn dialogues with natural context accumulation +- **Diverse use cases:** Coding, writing, Q&A, brainstorming +- **Realistic token distributions:** Mean ~133 input tokens, ~150 output tokens (shorter than synthetic) + +**Dataset Structure:** +```json +{ + "id": "conversation_123", + "conversations": [ + {"from": "human", "value": "Explain quantum computing"}, + {"from": "gpt", "value": "Quantum computing uses..."}, + {"from": "human", "value": "How does superposition work?"}, + {"from": "gpt", "value": "Superposition is..."} + ] +} +``` + +**How Replay Works:** + +1. **Load Phase:** `ShareGPTDatasetLoader` parses the JSON and extracts conversation turns +2. **Tokenization:** Each turn is tokenized (tiktoken if available, else char estimate) +3. **Request Generation:** Each conversation turn becomes an `InferenceRequest`: + - Context tokens = cumulative conversation history + - Generation tokens = assistant response length +4. **Timing:** Requests are issued with configurable inter-arrival delays +5. 
**Cycling:** When dataset exhausts, replay restarts (controlled by `--replay-cycles`) + +**Usage:** +```bash +kv-cache \ + --dataset-path /path/to/ShareGPT_V3_filtered.json \ + --max-conversations 1000 \ + --replay-cycles 3 \ + --model llama3.1-8b \ + --num-users 50 \ + --duration 300 \ + --gpu-mem-gb 0 --cpu-mem-gb 0 \ + --cache-dir /mnt/nvme +``` + +**Config Parameters (`config.yaml`):** +```yaml +sharegpt: + max_context_tokens: 8192 # Truncate long contexts + max_generation_tokens: 2048 # Truncate long responses + chars_per_token_estimate: 4 # Fallback if no tokenizer +``` + +**CLI Parameters:** +| Parameter | Default | Description | +|-----------|---------|-------------| +| `--dataset-path` | None | Path to ShareGPT JSON file | +| `--max-conversations` | 500 | Limit conversations loaded | +| `--replay-cycles` | 0 | Times to replay dataset (0 = infinite until duration) | + +--- + +### 8.3 BurstGPT Trace Replay + +**Source:** Wang et al., "BurstGPT: A Real-world Workload Dataset to Optimize LLM Serving Systems" (arXiv:2401.17644, KDD '25) + +The BurstGPT trace provides **10.31M production API calls** from Azure OpenAI over 121 days, capturing: + +- **Zipf-distributed request lengths:** Many short requests with long tail (realistic API usage) +- **Bimodal response patterns:** ChatGPT responses cluster around two modes +- **Realistic token distributions:** Avg 621 request tokens, 126 response tokens +- **Temporal patterns:** Real request arrival times with burstiness + +**Trace File Format (CSV):** +```csv +Timestamp,Model,Request tokens,Response tokens,Total tokens,Log Type +5,ChatGPT,472,18,490,Conversation log +45,ChatGPT,1087,230,1317,Conversation log +118,GPT-4,417,276,693,Conversation log +``` + +| Column | Description | +|--------|-------------| +| `Timestamp` | Relative time in seconds from trace start | +| `Model` | Original model (ChatGPT or GPT-4); ignored by benchmark | +| `Request tokens` | Input/context token count | +| `Response tokens` | 
Output/generation token count |
+| `Total tokens` | Sum of request + response |
+| `Log Type` | Always "Conversation log" |
+
+**How Replay Works:**
+
+1. **Load Phase:** CSV files are loaded from the trace directory
+2. **Timestamp Extraction:** Original request timestamps are parsed
+3. **Replay with Timing:**
+   - `--trace-speedup 1.0`: Real-time replay (honors original inter-arrival times)
+   - `--trace-speedup 10.0`: 10× faster (compress 10 minutes into 1 minute)
+   - `--trace-speedup 0`: No delay (saturate storage as fast as possible)
+4. **Request Mapping:** Each trace row becomes an `InferenceRequest`:
+   - Context tokens from the `Request tokens` column
+   - Generation tokens from the `Response tokens` column
+5. **Cycling:** When the trace is exhausted, replay restarts (controlled by `--replay-cycles`)
+
+**Setup:**
+```bash
+git clone https://github.com/HPMLL/BurstGPT.git
+# Trace files are in BurstGPT/data/BurstGPT_*.csv
+```
+
+**Usage:**
+```bash
+kv-cache \
+  --config config.yaml \
+  --model llama3.1-8b \
+  --use-burst-trace \
+  --burst-trace-path BurstGPT/data/ \
+  --trace-speedup 0 \
+  --replay-cycles 5 \
+  --num-users 50 \
+  --duration 300 \
+  --gpu-mem-gb 0 --cpu-mem-gb 0 \
+  --cache-dir /mnt/nvme \
+  --output results_burst.json
+```
+
+**CLI Parameters:**
+| Parameter | Default | Description |
+|-----------|---------|-------------|
+| `--use-burst-trace` | False | Enable BurstGPT trace replay |
+| `--burst-trace-path` | `BurstGPT/data/BurstGPT_1.csv` | Path to trace file or directory |
+| `--trace-speedup` | 1.0 | Replay speed multiplier (0 = no delay) |
+| `--replay-cycles` | 0 | Times to replay trace (0 = infinite until duration) |
+
+**Speedup Examples:**
+| `--trace-speedup` | Behavior | Use Case |
+|-------------------|----------|----------|
+| `1.0` | Real-time (original timestamps) | Validate temporal patterns |
+| `10.0` | 10× faster | Quick stress test |
+| `0` | No delay (saturate) | **Maximum storage stress** |
+
+**Comparison of Workload Sources:**
+
+| 
Metric | Synthetic | ShareGPT | BurstGPT | +|--------|-----------|----------|----------| +| Source | Random from user templates | Real conversations | Production API traces | +| Mean Context | ~2,676 tokens | ~133 tokens | ~622 tokens | +| Mean Response | ~275 tokens | ~150 tokens | ~126 tokens | +| Distribution | Uniform within ranges | Natural conversation | Zipf (many short, long tail) | +| Reproducibility | High (fixed seed) | High (fixed dataset) | High (fixed trace) | +| Realism | Configurable | Conversational | Production workload | +| Multi-turn | Simulated | Natural | Single-shot API calls | +| Timing | Configurable | Sequential | Real timestamps | + +**Recommendation for MLPerf Submissions:** +- **Storage stress testing:** Use `--use-burst-trace --trace-speedup 0` (maximum I/O) +- **Realistic validation:** Use `--use-burst-trace --trace-speedup 1.0` (real timing) +- **Conversational patterns:** Use `--dataset-path` with ShareGPT + +**Benefit:** BurstGPT provides the most realistic workload patterns from actual production systems, making it ideal for validating hardware against real-world API traffic. + +--- + +### 8.4 Static Noise Buffers (Performance Optimization) + +**Problem:** `np.random.uniform()` consumed massive CPU time, masking storage performance. + +**Solution:** Pre-allocate 256 MB random buffer at startup, use zero-copy slicing: + +```python +# Startup +buffer = rng.uniform(-1.0, 1.0, size=128*1024*1024).astype(dtype) + +# Per-request (zero-cost) +data = buffer[start:start+size].reshape(kv_shape) +``` + +**Impact:** Data generation now effectively instant, ensuring 100% of measured latency reflects storage. + +--- + +## 9. Common Issues & Troubleshooting + +### Issue: High Host Latency + +**Symptom:** `host_latency_p95 >> device_latency_p95` + +**Diagnosis:** CPU serialization (Python/NumPy overhead) is bottleneck, not storage. + +**Solution:** This is expected behavior. Real inference engines (C++/GPUDirect Storage) minimize this overhead. 
+ +--- + +### Issue: OOM Kills + +**Symptom:** Process terminates with "Out of Memory" + +**Diagnosis:** Insufficient RAM for `--max-concurrent-allocs 0` (unlimited). + +**Solution:** Set explicit limit: `--max-concurrent-allocs 16` (8B model) or `--max-concurrent-allocs 4` (70B model). + +--- + +### Issue: Low Differentiation Between Drives + +**Symptom:** Fast/slow drives show similar throughput + +**Diagnosis:** Using wrong metric for `cpu_mem` setting. + +**Solution:** +- At `cpu_mem=0GB` → Use `decode_bytes_read_gb` or `wall_clock_throughput` +- At `cpu_mem=4GB` → Use `storage_throughput` + +--- + +### Issue: High Variance Across Trials + +**Symptom:** CV > 50% + +**Diagnosis:** Normal for high concurrency workloads. + +**Solution:** Run 3-5 trials, report **median** not mean. + +--- + +## 10. Appendix: Architecture Changes (Dec 2025) + +### From Spillover to Waterfall + +**Old (Spillover):** New data forced to CPU when GPU full → penalizes hot data. + +**New (Waterfall):** New data always targets GPU → LRU cascades down tiers → hot data stays fast. + +### Static Noise Buffers + +**Old:** `np.random.uniform()` on every request → CPU bottleneck. + +**New:** Pre-allocated 256 MB buffer → zero-copy slicing → instant data generation. + +### Concurrency Hardening + +- Atomic space reservations inside memory locks +- Loop protection with hard caps on eviction attempts +- Race condition elimination for concurrent allocations + +### Enhanced Metrics + +- `nvme_tokens_processed` – Tracks exact token count through NVMe +- Per-tier device vs host latency breakdowns +- Autoscaling termination reasons + +--- + +## 11. Future Enhancements: Storage Backend Roadmap + +The current `StorageBackend` abstraction in `backends.py` provides a clean interface for adding new storage tiers. This section outlines planned enhancements with feasibility analysis based on the existing codebase. 
+ +### 11.1 Current Architecture (Extensibility Assessment) + +The existing backend interface is minimal and easy to extend: + +```python +class StorageBackend: + def write(self, key: str, data: np.ndarray) -> IOTiming: ... + def read(self, key: str) -> Tuple[np.ndarray, IOTiming]: ... + def delete(self, key: str): ... + def clear(self): ... +``` + +**Extensibility:** ✅ **HIGH** – Any storage system that can serialize/deserialize NumPy arrays can implement this interface. + +--- + +### 11.2 NVIDIA GPUDirect Storage (GDS) + +**What it is:** Direct DMA path between GPU VRAM and NVMe storage, bypassing CPU bounce buffers entirely. + +**Why it matters for KV cache:** In production inference engines (vLLM, TensorRT-LLM, Mooncake), KV cache tensors are computed on the GPU during the attention forward pass; they originate in GPU VRAM, not CPU memory. When GPU VRAM fills up, these tensors must be offloaded to NVMe. Without GDS, this requires a costly CPU round-trip: + +``` +Without GDS: GPU VRAM → cudaMemcpy → CPU RAM → Page Cache → NVMe +With GDS: GPU VRAM → cuFile DMA → NVMe (direct) +``` + +GDS eliminates three overhead sources on the GPU↔NVMe path: +- `cudaMemcpyDeviceToHost` / `cudaMemcpyHostToDevice` (GPU↔CPU transfer) +- Host-side tensor format conversion (e.g., `.numpy()`) +- Kernel page cache staging (data touches CPU DRAM twice without GDS) + +**GPU↔NVMe paths in the benchmark:** + +The benchmark's tier eviction logic (`_demote_entry`, `cache.py:256-273`) moves data between tiers using the backend `read`/`write` interface: + +| Phase | Current Path | Code Reference | +|-------|-------------|----------------| +| **GPU → NVMe eviction** | GPU tensor → `.to('cpu').numpy()` → `np.save()` → `fsync()` → NVMe | `backends.py:165-169` (GPU read), `backends.py:268-285` (NVMe write) | +| **NVMe read** | `posix_fadvise(DONTNEED)` → `np.load()` → NumPy array in CPU RAM | `backends.py:287-315` | + +Note: The benchmark does not promote NVMe data back to GPU on read. 
Once evicted, data is served directly from NVMe on subsequent accesses.
+
+**Configuration to exercise GPU→NVMe eviction:**
+
+```bash
+kv-cache \
+  --gpu-mem-gb 16 \
+  --cpu-mem-gb 0 \
+  --cache-dir /mnt/nvme \
+  --model llama3.1-8b \
+  --num-users 100 \
+  --duration 300
+```
+
+With `--cpu-mem-gb 0`, the GPU tier overflows directly to NVMe, maximizing GPU→NVMe eviction traffic, which is exactly the path GDS accelerates.
+
+**Current benchmark limitation:** The benchmark generates KV cache tensors as NumPy arrays in CPU RAM (`cache.py:427`), then copies them to the GPU tier via `torch.from_numpy().pin_memory().to(cuda)` (`backends.py:144-150`). This CPU-origin flow means the initial write is a CPU→GPU transfer. GDS only accelerates the subsequent GPU→NVMe eviction path, not this initial allocation. A future `--gpu-native` mode that generates tensors directly on GPU (e.g., `torch.randn(..., device='cuda')`) would make the full write path GPU-origin, enabling GDS for both initial NVMe writes and eviction writes. 
+ +**Implementation approach:** + +```python +class GDSBackend(StorageBackend): + """GPUDirect Storage backend using cuFile API.""" + + def __init__(self, base_path: str, gpu_device: int = 0): + import kvikio # NVIDIA's Python bindings for cuFile + self.base_path = Path(base_path) + self.gpu_device = gpu_device + kvikio.defaults.compat_mode(False) # Enable GDS mode + + def write(self, key: str, data) -> IOTiming: + import cupy as cp + # Accept both GPU tensors (direct DMA) and NumPy arrays (copy to GPU first) + gpu_data = data if isinstance(data, cp.ndarray) else cp.asarray(data) + path = self.base_path / f"{key}.bin" + + start = time.perf_counter() + with kvikio.CuFile(path, "w") as f: + f.write(gpu_data) + total = time.perf_counter() - start + + return IOTiming(total=total, device=total, host=0) + + def read(self, key: str) -> Tuple: + import cupy as cp + path = self.base_path / f"{key}.bin" + nbytes = path.stat().st_size + gpu_buf = cp.empty(nbytes // 2, dtype='float16') # Assumes float16 + + start = time.perf_counter() + with kvikio.CuFile(path, "r") as f: + f.read(gpu_buf) + total = time.perf_counter() - start + + # Return NumPy to match StorageBackend interface + return cp.asnumpy(gpu_buf), IOTiming(total=total, device=total, host=0) +``` + +**Feasibility:** ✅ **HIGH** +- Requires: NVIDIA driver 515+, CUDA 11.4+, supported NVMe (most data center drives) +- Python bindings available via `kvikio` package (`pip install kvikio-cu12`) +- Can coexist with existing `NVMeBackend` (fallback when GDS unavailable) + +**References:** +- [GPUDirect Storage Overview](https://docs.nvidia.com/gpudirect-storage/overview-guide/index.html) +- [KvikIO Python API](https://docs.rapids.ai/api/kvikio/stable/) + +--- + +### 11.3 Amazon S3 / Object Storage Backend + +**What it is:** Cloud object storage (S3, Azure Blob, GCS, MinIO) as a cold tier below NVMe. 
+ +**Why it matters for KV cache:** +- Enables virtually unlimited capacity for long-context caching +- Supports disaggregated architectures where prefill and decode run on different nodes +- Cost-effective for infrequently accessed conversation history + +**Implementation approach:** + +```python +class S3Backend(StorageBackend): + """Amazon S3 / S3-compatible object storage backend.""" + + def __init__(self, bucket: str, prefix: str = "kv_cache/", + endpoint_url: str = None): + import boto3 + self.s3 = boto3.client('s3', endpoint_url=endpoint_url) + self.bucket = bucket + self.prefix = prefix + + def write(self, key: str, data: np.ndarray) -> IOTiming: + import io + start = time.perf_counter() + + buffer = io.BytesIO() + np.save(buffer, data, allow_pickle=False) + buffer.seek(0) + + host_time = time.perf_counter() - start + + self.s3.upload_fileobj(buffer, self.bucket, f"{self.prefix}{key}.npy") + total = time.perf_counter() - start + + return IOTiming(total=total, device=total - host_time, host=host_time) + + def read(self, key: str) -> Tuple[np.ndarray, IOTiming]: + import io + start = time.perf_counter() + + buffer = io.BytesIO() + self.s3.download_fileobj(self.bucket, f"{self.prefix}{key}.npy", buffer) + device_time = time.perf_counter() - start + + buffer.seek(0) + data = np.load(buffer, allow_pickle=False) + total = time.perf_counter() - start + + return data, IOTiming(total=total, device=device_time, host=total - device_time) +``` + +**Feasibility:** ✅ **HIGH** +- Requires: `boto3` package, AWS credentials or S3-compatible endpoint +- Latency: 50-200ms (not suitable for hot tier, ideal for archival) +- Throughput: 100-500 MB/s per connection (can parallelize with `TransferConfig`) + +**Use cases:** +- `--s3-bucket my-kv-cache --s3-cold-threshold 3600` (move to S3 after 1 hour idle) +- Cross-region KV cache sharing for global deployments +- Cost optimization: NVMe for recent conversations, S3 for history + +**References:** +- [Boto3 S3 
Transfer](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3.html) +- [S3 Express One Zone](https://aws.amazon.com/s3/storage-classes/express-one-zone/) (single-digit ms latency) + +--- + +### 11.4 NVIDIA NIXL (Distributed KV Transfer) + +**What it is:** NVIDIA Inference Xfer Library – high-performance point-to-point transfers between nodes for distributed inference. + +**Why it matters for KV cache:** +- Enables disaggregated prefill/decode across multiple GPUs/nodes +- Supports RDMA (InfiniBand, RoCE) for sub-millisecond inter-node transfers +- Native integration with GDS for storage-to-GPU-to-network pipelines + +**Implementation approach:** + +```python +class NIXLBackend(StorageBackend): + """Distributed KV cache transfer using NVIDIA NIXL.""" + + def __init__(self, local_rank: int, world_size: int, + backend: str = "ucx"): + import nixl + self.agent = nixl.Agent(nixl.NIXL_INIT_AGENT) + self.local_rank = local_rank + self.world_size = world_size + self.remote_descriptors = {} # Cached remote memory descriptors + + def write_to_remote(self, key: str, data: np.ndarray, + target_rank: int) -> IOTiming: + """Transfer KV cache to a remote node (e.g., prefill → decode).""" + import cupy as cp + + start = time.perf_counter() + gpu_data = cp.asarray(data) + + # Get remote memory descriptor (cached for performance) + remote_desc = self._get_remote_descriptor(target_rank, key) + + # Initiate RDMA transfer + handle = self.agent.transfer( + gpu_data.data.ptr, remote_desc, + data.nbytes, nixl.NIXL_WRITE + ) + handle.wait() + + total = time.perf_counter() - start + return IOTiming(total=total, device=total, host=0) +``` + +**Feasibility:** ⚠️ **MEDIUM** +- Requires: UCX library, InfiniBand/RoCE network, NVIDIA GPU +- Complexity: Requires coordination layer (etcd) for metadata exchange +- Integration: Best combined with existing multi-node frameworks (vLLM, TensorRT-LLM) + +**Use cases:** +- Disaggregated inference: Prefill node writes KV cache → Decode node 
reads via RDMA +- Multi-GPU KV cache sharing within a single server +- Federated KV cache across data center regions + +**References:** +- [NIXL GitHub](https://github.com/ai-dynamo/nixl) +- [LMCache P2P Sharing](https://docs.lmcache.ai/kv_cache/p2p_sharing.html) + +--- + +### 11.5 Distributed KV Cache with Redis / Valkey + +**What it is:** In-memory distributed cache shared across multiple inference servers. + +**Why it matters for KV cache:** +- Enables KV cache sharing across multiple vLLM/TensorRT-LLM instances +- Supports atomic operations for concurrent access +- Built-in LRU eviction and TTL-based expiration + +**Architecture:** + +``` + +---------------------------------------+ + | Redis Cluster | + | +--------+ +--------+ +--------+ | + | |Shard 0 | |Shard 1 | |Shard 2 | | + | |(A-F) | |(G-N) | |(O-Z) | | + | +---+----+ +---+----+ +---+----+ | + +------+----------+----------+---------+ + | | | + +-----------------+----------+----------+-----------------+ + | | | | | + v v v v v ++------------------+ +------------------+ +------------------+ +| Server 1 | | Server 2 | | Server 3 | +| +------------+ | | +------------+ | | +------------+ | +| | vLLM | | | | vLLM | | | | TensorRT | | +| | +--------+ | | | | +--------+ | | | | +--------+ | | +| | |GPU A100| | | | | |GPU A100| | | | | |GPU H100| | | +| | |Local KV| | | | | |Local KV| | | | | |Local KV| | | +| | +--------+ | | | | +--------+ | | | | +--------+ | | +| +------+-----+ | | +------+-----+ | | +------+-----+ | +| | | | | | | | | +| RedisBackend | | RedisBackend | | RedisBackend | ++------------------+ +------------------+ +------------------+ +``` + +**Data Flow Example:** + +``` +1. User "alice" -> Server 1 + Server 1: Compute KV, SET kv:alice_ctx + +2. User "alice" returns -> Server 2 (different server!) + Server 2: GET kv:alice_ctx -> HIT + Result: Skip prefill, 10x faster TTFT + +3. 
System prompt sharing: + Server 1: SET kv:system_prompt_hash (compute once) + Server 2: GET kv:system_prompt_hash -> HIT (reuse) + Server 3: GET kv:system_prompt_hash -> HIT (reuse) +``` + +**Write-through vs Write-back:** + +``` +Write-Through (sync): Write-Back (async): + + Request Request + | | + v v + Compute KV Compute KV + | | + +-> GPU (local) +-> GPU (local) + | | + +-> Redis (blocks) +-> Queue -> Redis + | (non-blocking) + Wait for ACK + + +1-10ms latency ~0ms overhead + Strong durability May lose recent writes +``` + +**Implementation approach:** + +```python +class RedisBackend(StorageBackend): + """Distributed KV cache using Redis/Valkey.""" + + def __init__(self, host: str = "localhost", port: int = 6379, + prefix: str = "kv:", ttl_seconds: int = 3600): + import redis + self.client = redis.Redis(host=host, port=port, decode_responses=False) + self.prefix = prefix + self.ttl = ttl_seconds + + def write(self, key: str, data: np.ndarray) -> IOTiming: + start = time.perf_counter() + + # Serialize with numpy's efficient binary format + buffer = io.BytesIO() + np.save(buffer, data, allow_pickle=False) + serialized = buffer.getvalue() + host_time = time.perf_counter() - start + + # Write to Redis with TTL + self.client.setex(f"{self.prefix}{key}", self.ttl, serialized) + total = time.perf_counter() - start + + return IOTiming(total=total, device=total - host_time, host=host_time) + + def read(self, key: str) -> Tuple[np.ndarray, IOTiming]: + start = time.perf_counter() + + serialized = self.client.get(f"{self.prefix}{key}") + if serialized is None: + raise KeyError(f"Key {key} not found in Redis") + + device_time = time.perf_counter() - start + + buffer = io.BytesIO(serialized) + data = np.load(buffer, allow_pickle=False) + total = time.perf_counter() - start + + return data, IOTiming(total=total, device=device_time, host=total - device_time) +``` + +**Feasibility:** ✅ **HIGH** +- Requires: Redis 6+ or Valkey, `redis-py` package +- Latency: 0.1-1ms local, 
1-10ms cross-rack +- Memory: Limited by Redis cluster size (can scale horizontally) + +**Use cases:** +- Shared prefix cache across multiple inference servers +- Session affinity: Route returning users to servers with cached context +- A/B testing: Share baseline KV cache across experiment groups + +**References:** +- [Redis LRU Eviction](https://redis.io/docs/latest/develop/reference/eviction/) +- [Valkey (Redis fork)](https://valkey.io/) + +--- + +### 11.6 Native Multi-Client Mode (`--num-clients`) + +> **✅ Already Achievable Today:** Multi-client benchmarking works now using separate directories and the bash script in Section 2.1. The native `--num-clients` flag proposed here is a **convenience enhancement** for easier invocation and automatic result aggregation. + +**Current Workaround (Available Now):** +```bash +# Works today - see Section 2.1 "Multi-Client Scaling" +for i in 0 1 2 3; do + python -m kv_cache.cli --cache-dir /mnt/nvme/client_$i ... & +done +wait +# Manually aggregate results_client_*.json +``` + +**Proposed Enhancement:** +```bash +# Future: Single command with automatic aggregation +python -m kv_cache.cli --num-clients 4 --cache-dir /mnt/nvme/kv_benchmark ... 
+``` + +**What Real-World Scenario This Simulates:** + +``` +Production Deployment: 8-GPU Server Running Multiple vLLM Instances ++------------------------------------------------------------------+ +| Single Physical Server | +| +------------+ +------------+ +------------+ +------------+ | +| | vLLM #0 | | vLLM #1 | | vLLM #2 | | vLLM #3 | | +| | GPU 0-1 | | GPU 2-3 | | GPU 4-5 | | GPU 6-7 | | +| +-----+------+ +-----+------+ +-----+------+ +-----+------+ | +| | | | | | +| +-------+-------+-------+-------+-------+-------+ | +| | | +| v | +| +----------------+ | +| | Shared NVMe | <-- All 4 instances write/read here | +| | (PCIe Gen5) | | +| +----------------+ | ++------------------------------------------------------------------+ + +Each vLLM instance = 1 benchmark client +4 clients competing for same NVMe = realistic storage contention +``` + +| Production Scenario | Today (bash script) | Future (`--num-clients`) | +|---------------------|---------------------|--------------------------| +| 4× vLLM on 8-GPU server | 4 terminals or `&` background | `--num-clients 4` | +| 8× TensorRT-LLM on DGX | 8 terminals or `&` background | `--num-clients 8` | +| Kubernetes: 4 pods, shared PV | 4 terminals or `&` background | `--num-clients 4` | + +**Why This Matters:** +- Single-process benchmark underestimates contention +- Real deployments run **multiple inference engines per node** +- Storage must handle concurrent writes from all instances +- Tests filesystem locking, queue depth saturation, and I/O scheduler behavior + +**Why Native `--num-clients` Would Be Better Than Bash Script:** + +| Aspect | Bash Script (Today) | Native `--num-clients` (Future) | +|--------|---------------------|--------------------------------| +| Invocation | Multi-line script | Single command | +| Result aggregation | Manual Python script | Automatic | +| Latency percentiles | Cannot merge correctly | DDSketch-based merge | +| Progress display | 4 separate outputs | Unified aggregate view | +| 
Error handling | One crash, others continue | Coordinated shutdown | + +**Implementation Complexity: HIGH (4-6 weeks)** + +This feature requires changes across multiple modules: + +#### Required Code Changes + +| Module | Change | Complexity | +|--------|--------|------------| +| `cli.py` | Add `--num-clients` argument, spawn child processes | LOW | +| `cli.py` | Signal handling (Ctrl+C propagates to children) | MEDIUM | +| `benchmark.py` | IPC for real-time progress reporting | HIGH | +| `monitoring.py` | Cross-process metric aggregation | HIGH | +| `cache.py` | Shared statistics counters (multiprocessing.Value) | MEDIUM | +| New: `aggregator.py` | Merge latency histograms, compute aggregate percentiles | HIGH | + +#### Challenge 1: Latency Percentile Aggregation + +Each client tracks its own latency distribution. Merging P50/P95/P99 across processes is **not trivial**: + +```python +# WRONG: Can't average percentiles +aggregate_p99 = sum(client_p99) / num_clients # ❌ Mathematically incorrect + +# CORRECT: Must merge raw samples or use t-digest/DDSketch +from ddsketch import DDSketch + +# Each client maintains a sketch +client_sketches = [DDSketch() for _ in range(num_clients)] + +# Parent merges sketches +merged = DDSketch() +for sketch in client_sketches: + merged.merge(sketch) + +aggregate_p99 = merged.get_quantile_value(0.99) # ✓ Correct +``` + +**Options:** +1. **Shared file:** Each client appends latencies to `latencies_client_N.bin`, parent reads all after completion +2. **Streaming IPC:** Clients send samples via `multiprocessing.Queue` (memory overhead) +3. **Sketch algorithms:** DDSketch or T-Digest for approximate percentiles (requires new dependency) + +#### Challenge 2: Real-Time Progress Reporting + +Current `monitor_stats()` prints progress every 5 seconds. 
With multi-client: + +``` +# Current (single client) +Time: 60s, Users: 100, Queue: 5, Write: 3.2 GB/s, Read: 4.1 GB/s + +# Multi-client: Need aggregate view +Time: 60s, Clients: 4, Total Users: 200, Aggregate Write: 12.8 GB/s, Read: 16.4 GB/s + └─ Client 0: 3.2 GB/s W, 4.1 GB/s R + └─ Client 1: 3.1 GB/s W, 4.0 GB/s R + └─ Client 2: 3.3 GB/s W, 4.2 GB/s R + └─ Client 3: 3.2 GB/s W, 4.1 GB/s R +``` + +**Implementation:** Parent process polls children via `multiprocessing.Queue` or shared memory (`multiprocessing.Array`). + +#### Challenge 3: Error Handling + +| Scenario | Current Behavior | Required Behavior | +|----------|------------------|-------------------| +| One client OOMs | N/A | Parent detects, logs, continues or aborts all | +| Ctrl+C pressed | Single process exits | Parent sends SIGTERM to all children | +| One client finishes early | N/A | Wait for slowest, or use first-to-finish time | +| Disk full mid-run | Single process fails | All clients detect, graceful shutdown | + +#### Challenge 4: Output Format + +```json +{ + "aggregate": { + "total_write_bytes": 128000000000, + "total_read_bytes": 164000000000, + "write_bandwidth_gbps": 12.8, + "read_bandwidth_gbps": 16.4, + "latency_p50_ms": 2.1, // Merged from all clients + "latency_p99_ms": 8.3, // Merged from all clients + "num_clients": 4 + }, + "per_client": [ + {"client_id": 0, "write_bandwidth_gbps": 3.2, ...}, + {"client_id": 1, "write_bandwidth_gbps": 3.1, ...}, + ... 
+ ] +} +``` + +#### Implementation Roadmap for `--num-clients` + +| Phase | Task | Effort | +|-------|------|--------| +| 1 | Basic spawning with separate output files (current bash approach, but in Python) | 1 week | +| 2 | Post-run JSON aggregation (bandwidth, bytes) | 3 days | +| 3 | Latency histogram merging (DDSketch or raw samples) | 1 week | +| 4 | Real-time aggregate progress display | 1 week | +| 5 | Graceful error handling and signal propagation | 1 week | +| 6 | XLSX export with per-client and aggregate sheets | 3 days | + +**Total: 4-6 weeks** + +**Recommendation:** For MLPerf v3.0 submission, use the **bash script approach** documented in Section 2.1. Native `--num-clients` is a post-v3.0 enhancement. + +--- + +### 11.7 Implementation Roadmap + +| Phase | Feature | Priority | Effort | Dependencies | +|-------|---------|----------|--------|--------------| +| **Phase 1** | S3Backend | HIGH | 2 weeks | boto3 | +| **Phase 1** | RedisBackend | HIGH | 1 week | redis-py | +| **Phase 2** | GDSBackend | MEDIUM | 3 weeks | kvikio, CUDA 11.4+ | +| **Phase 2** | `--num-clients` (basic) | MEDIUM | 2 weeks | multiprocessing | +| **Phase 3** | `--num-clients` (full) | LOW | 4 weeks | ddsketch | +| **Phase 3** | NIXLBackend | LOW | 6 weeks | UCX, InfiniBand | + +**CLI Integration (proposed):** + +```bash +# S3 as cold tier (auto-migrate after 1 hour idle) +python -m kv_cache.cli \ + --model llama3.1-70b-instruct \ + --cache-dir /mnt/nvme/kv_cache \ + --s3-bucket my-kv-cache \ + --s3-cold-threshold 3600 + +# Redis as shared cache (multi-server deployment) +python -m kv_cache.cli \ + --model llama3.1-8b \ + --redis-host redis.cluster.local \ + --redis-ttl 7200 + +# GDS for maximum NVMe performance +python -m kv_cache.cli \ + --model llama3.1-70b-instruct \ + --storage-backend gds \ + --cache-dir /mnt/nvme/kv_cache + +# Native multi-client (future) +python -m kv_cache.cli \ + --num-clients 4 \ + --cache-dir /mnt/nvme/kv_benchmark \ + --num-users 50 \ + --model 
llama3.1-8b +``` + +--- + +### 11.8 Research References + +| Technology | Documentation | Key Paper/Blog | +|------------|---------------|----------------| +| GPUDirect Storage | [NVIDIA Docs](https://docs.nvidia.com/gpudirect-storage/overview-guide/index.html) | [GTC 2020: Magnum IO](https://developer.nvidia.com/blog/gpudirect-storage/) | +| NIXL | [GitHub](https://github.com/ai-dynamo/nixl) | NVIDIA Dynamo Architecture | +| LMCache | [Docs](https://docs.lmcache.ai/) | [CacheGen (SIGCOMM 2024)](https://dl.acm.org/doi/10.1145/3651890.3672274) | +| KV Cache Compression | [KVPress](https://github.com/NVIDIA/kvpress) | [Scissorhands (NeurIPS 2023)](https://arxiv.org/abs/2305.17118) | +| Disaggregated Inference | [DistServe](https://arxiv.org/abs/2401.09670) | [Splitwise (ISCA 2024)](https://arxiv.org/abs/2311.18677) | + +--- + +## Conclusion + +This benchmark provides a comprehensive framework for evaluating multi-tier KV cache storage systems. Key takeaways: + +1. **Waterfall LRU** keeps hot data in fast tiers (6.4× speedup GPU vs NVMe) +2. **Autoscaling** discovers production capacity automatically +3. **Hardware validation** bypasses OS caching for true device measurement +4. **Metric selection matters:** Use correct metrics for your `cpu_mem` setting +5. 
**Multiple trials required:** Report median to account for variance + +For MLPerf submissions, prioritize: +- `decode_bytes_read_gb` at `cpu_mem=0GB` (2.6× differentiation) +- `nvme_device_p95_ms` for hardware comparison +- 3-5 trials with fixed `--seed 42` + +--- + +**Support:** hazem_awadallah@kingston.com +**Repository:** [Link to repo] +**License:** Apache 2.0 diff --git a/kv_cache_benchmark/MLperf v3 KV cache proposal.pdf b/kv_cache_benchmark/MLperf v3 KV cache proposal.pdf deleted file mode 100644 index a07e72ab..00000000 Binary files a/kv_cache_benchmark/MLperf v3 KV cache proposal.pdf and /dev/null differ diff --git a/kv_cache_benchmark/README.md b/kv_cache_benchmark/README.md index 5f0637c1..b6757782 100644 --- a/kv_cache_benchmark/README.md +++ b/kv_cache_benchmark/README.md @@ -1,766 +1,1718 @@ -# MLPerf Storage KV Cache Benchmark - -A storage benchmarking tool for Large Language Model inference systems. This benchmark measures the performance of your storage subsystem under realistic KV cache offloading workloads, helping you answer critical questions about hardware capacity and configuration. - -**Author:** Hazem Awadallah, Kingston Digital -**License:** Apache 2.0 -**Version:** MLPerf Storage v3.0 (Enhanced) - ---- - -## Table of Contents - -1. [What This Benchmark Does](#what-this-benchmark-does) -2. [Architecture Overview](#architecture-overview) -3. [System Requirements](#system-requirements) -4. [Installation](#installation) -5. [Quick Start](#quick-start) -6. [Running the Benchmark](#running-the-benchmark) -7. [ShareGPT Replay Workloads](#sharegpt-replay-workloads) -8. [Using the Wrapper Script](#using-the-wrapper-script) -9. [Understanding Results](#understanding-results) -10. [Unit Testing](#unit-testing) -11. [Excel Export](#excel-export) -12. [MLPerf Submission Guidelines](#mlperf-submission-guidelines) -13. 
[Troubleshooting](#troubleshooting) - ---- - -## What This Benchmark Does - -During LLM inference, models store intermediate attention data in a structure called the KV (Key-Value) cache. This cache grows with conversation length and can consume enormous amounts of memory. Production systems offload this cache from expensive GPU VRAM to cheaper CPU RAM or NVMe storage. - -This benchmark simulates that offloading behavior. It generates realistic multi-user inference workloads and measures how your storage performs under pressure. It measures these components: - -- How many concurrent users your hardware can support -- Whether your NVMe drive is fast enough to handle cache spillover -- The real latency impact of each storage tier -- Where the bottleneck sits in your system - -This is not a pass/fail test. It is a diagnostic tool for system architects and performance engineers. - ---- - -## Architecture Overview - -The benchmark implements a three-tier memory hierarchy that mirrors production LLM serving systems. 
- -``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ KV Cache Benchmark Architecture │ -└─────────────────────────────────────────────────────────────────────────────┘ - - ┌──────────────────┐ - │ User Requests │ - │ (Multi-tenant) │ - └────────┬─────────┘ - │ - ▼ - ┌──────────────────────────────────────┐ - │ Request Queue │ - │ (Priority-based: QoS levels) │ - │ Interactive > Responsive > Batch │ - └──────────────────┬───────────────────┘ - │ - ▼ - ┌────────────────────────────────────────────────────────┐ - │ IntegratedBenchmark │ - │ ┌─────────────┐ ┌─────────────┐ ┌─────────────────┐ │ - │ │ Prefill │ │ Decode │ │ Conversation │ │ - │ │ (Write) │ │ (Read) │ │ Manager │ │ - │ └──────┬──────┘ └──────┬──────┘ └────────┬────────┘ │ - └─────────┼────────────────┼─────────────────┼───────────┘ - │ │ │ - └────────────────┼─────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────────────────────┐ -│ MultiTierCache │ -│ (Waterfall LRU Eviction) │ -│ │ -│ New Data ─────► Always targets fastest available tier │ -│ If full, LRU entry cascades down │ -│ │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ │ │ -│ │ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │ │ -│ │ │ GPU VRAM │ │ CPU RAM │ │ NVMe │ │ │ -│ │ │ (Tier 1) │─────►│ (Tier 2) │─────►│ (Tier 3) │ │ │ -│ │ │ │ LRU │ │ LRU │ │ │ │ -│ │ │ Sub-ms │evict │ Tens of ms │evict │ Hundreds │ │ │ -│ │ │ latency │ │ latency │ │ of ms │ │ │ -│ │ │ │ │ │ │ │ │ │ -│ │ │ PyTorch/CuPy │ │ NumPy arrays │ │ .npy files │ │ │ -│ │ │ tensors │ │ in memory │ │ on disk │ │ │ -│ │ └───────────────┘ └───────────────┘ └───────────────┘ │ │ -│ │ │ │ -│ │ ◄──── HOT DATA ────────────────────────────── COLD DATA ────► │ │ -│ │ │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -└─────────────────────────────────────────────────────────────────────────────┘ -``` - -### Key Components - 
-**MultiTierCache**: The core engine. It decides where to place data based on available space and access patterns. New data always targets the fastest tier. When that tier fills up, the least recently used entry gets pushed down to the next tier. - -**Inference Phases**: The benchmark models two distinct I/O patterns: -- **Prefill**: Write-heavy. Processing the user prompt generates new KV cache entries. -- **Decode**: Read-heavy. Generating each output token requires reading the existing cache. - -**User Simulation**: Creates realistic traffic from multiple concurrent users with different behaviors (chatbot, coding assistant, document analysis) and priority levels. - -**Autoscaler**: Automatically adjusts user load to find either the maximum users your system can handle (QoS mode) or the peak throughput of your storage (capacity mode). - ---- - -## System Requirements - -### Minimum - -- CPU: 8+ cores (AMD EPYC, Intel Xeon) -- RAM: 32 GB -- Storage: 256 GB free space on SSD -- OS: Linux (Ubuntu 22.04, RHEL 9, or similar) -- Python: 3.8 or higher -- No GPU required (runs in CPU-only mode) - -### Recommended - -- CPU: 32+ cores -- RAM: 128 GB or more -- GPU: NVIDIA A100/H100 with 40+ GB VRAM (optional but enables full three-tier testing) -- Storage: 1 TB+ on NVMe (PCIe Gen4 or Gen5) -- Tools: `bc`, `jq` for the wrapper script - ---- - -## Installation - -1. Clone or download this repository. - -2. Install Python dependencies: - -```bash -pip install -r requirements.txt -``` - -Or install core dependencies manually: - -```bash -pip install numpy -``` - -3. For GPU support (optional): - -```bash -pip install torch # or cupy-cuda12x for CuPy -``` - -4. For ShareGPT replay workloads (optional): - -```bash -pip install tiktoken -``` - -5. For Excel export (optional): - -```bash -pip install pandas openpyxl -``` - -6. 
Verify the installation: - -```bash -python3 kv-cache.py --help -``` - ---- - -## Quick Start - -Run a basic storage test with 50 users for 2 minutes: - -```bash -python3 kv-cache.py \ - --model llama3.1-8b \ - --num-users 50 \ - --duration 120 \ - --gpu-mem-gb 0 \ - --cpu-mem-gb 4 \ - --generation-mode realistic \ - --cache-dir /mnt/nvme \ - --seed 42 \ - --output results.json -``` - -This forces all cache operations to hit your NVMe drive, giving you a baseline measurement of storage performance. - ---- - -## Running the Benchmark - -### Command Line Options - -``` -python3 kv-cache.py [options] - -Required Arguments: - --model MODEL Model configuration to use. Choices: - tiny-1b, mistral-7b, llama2-7b, llama3.1-8b, - llama3.1-70b-instruct - --num-users N Number of concurrent users to simulate - --duration SECONDS Duration of the benchmark in seconds - -Memory Configuration: - --gpu-mem-gb N GPU VRAM budget in GB (0 to disable GPU tier) - --cpu-mem-gb N CPU RAM budget in GB (0 to disable CPU tier) - --cache-dir PATH Directory for NVMe cache files (defaults to temp directory) - -Token Generation: - --generation-mode Token generation speed simulation. Choices: - - none: Pure storage test, no GPU simulation - - fast: 2ms per token (high-end GPU) - - realistic: 30ms per token (typical production) - -Caching Features: - --disable-multi-turn Disable multi-turn conversation caching - --disable-prefix-caching - Disable prefix caching (shared system prompts) - -Autoscaling: - --enable-autoscaling Enable workload autoscaling - --autoscaler-mode Autoscaling strategy. 
Choices: - - qos: Latency-based, finds max users at target saturation - - capacity: Throughput-based, finds peak storage performance - --target-saturation N Target storage saturation for QoS autoscaling (0.0-1.0, - default: 0.8) - -ShareGPT Replay (NEW): - --dataset-path PATH Path to ShareGPT JSON for realistic workload replay - --max-conversations N Max conversations to load from dataset (default: 500) - --request-rate RATE Target request arrival rate (requests/sec) - --max-requests N Stop after N requests (for fixed-length runs) - -RAG Workload: - --enable-rag Enable RAG workload simulation - --rag-num-docs N Number of RAG documents to ingest - -Performance and Output: - --performance-profile Profile for pass/fail criteria. Choices: - - latency: Default, evaluates P95 latency targets - - throughput: For MLPerf submission, evaluates tokens/sec - --output FILE Write results to JSON file - --xlsx-output FILE Export results to Excel/CSV file (NEW) - --seed N Seed for random number generators (required for MLPerf - reproducibility) - -Resource Limits: - --max-concurrent-allocs N - Limit concurrent cache allocations to bound RAM usage. - 0 = unlimited. Recommended: 8-16 for large models to - prevent memory explosion. -``` - -### Test Scenarios - -#### Scenario 1: Storage-Only Baseline - -Isolate your NVMe drive by setting GPU memory to zero. This tells you the raw performance of your storage. - -```bash -python3 kv-cache.py \ - --model llama3.1-8b \ - --num-users 50 \ - --duration 180 \ - --gpu-mem-gb 0 \ - --cpu-mem-gb 4 \ - --generation-mode realistic \ - --cache-dir /mnt/nvme \ - --seed 42 \ - --output results_storage_only.json -``` - -#### Scenario 2: Realistic Production Setup - -Test a balanced three-tier configuration that mirrors production deployment. 
- -```bash -python3 kv-cache.py \ - --model llama3.1-8b \ - --num-users 100 \ - --duration 300 \ - --gpu-mem-gb 16 \ - --cpu-mem-gb 32 \ - --generation-mode realistic \ - --cache-dir /mnt/nvme \ - --seed 42 \ - --output results_production.json -``` - -#### Scenario 3: Find Maximum User Count (QoS Mode) - -Let the autoscaler discover how many users your system can handle while maintaining acceptable latency. - -```bash -python3 kv-cache.py \ - --model llama3.1-8b \ - --num-users 20 \ - --duration 300 \ - --gpu-mem-gb 16 \ - --cpu-mem-gb 32 \ - --enable-autoscaling \ - --autoscaler-mode qos \ - --generation-mode realistic \ - --cache-dir /mnt/nvme \ - --seed 42 \ - --output results_autoscale_qos.json -``` - -#### Scenario 4: Find Peak Storage Throughput (Capacity Mode) - -Discover the absolute maximum I/O your storage can deliver by ignoring latency constraints. - -```bash -python3 kv-cache.py \ - --model llama3.1-70b-instruct \ - --num-users 10 \ - --duration 180 \ - --gpu-mem-gb 0 \ - --cpu-mem-gb 4 \ - --enable-autoscaling \ - --autoscaler-mode capacity \ - --generation-mode none \ - --cache-dir /mnt/nvme \ - --seed 42 \ - --output results_capacity.json -``` - ---- - -## ShareGPT Replay Workloads - -While synthetic workloads are excellent for controlled stress testing, they may not capture the nuances of real human-AI interaction. The **ShareGPT Replay** feature addresses this by loading actual conversation data. - -### Why Use ShareGPT? 
- -Real conversations exhibit different patterns than synthetic workloads: -- **Higher cache locality**: Users ask follow-up questions, reusing context -- **Variable context sizes**: Real queries vary wildly (10-16,000 tokens) -- **Multi-turn structure**: Conversation flows are preserved - -### Downloading the ShareGPT Dataset - -Download the full dataset from Hugging Face (~1.2 GB): - -```bash -wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json -``` - -**Alternative: Smaller subset for quick testing (~40 MB):** - -```bash -wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split_no_imsorry.json -``` - -### Basic ShareGPT Invocation - -```bash -python3 kv-cache.py \ - --model llama3.1-8b \ - --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \ - --max-conversations 500 \ - --num-users 50 \ - --duration 300 \ - --gpu-mem-gb 0 \ - --cpu-mem-gb 4 \ - --generation-mode realistic \ - --cache-dir /mnt/nvme \ - --seed 42 \ - --output results_sharegpt.json -``` - -### ShareGPT with Rate Limiting - -Control the request arrival rate for steady-state testing: - -```bash -python3 kv-cache.py \ - --model llama3.1-70b-instruct \ - --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \ - --max-conversations 1000 \ - --request-rate 10.0 \ - --num-users 100 \ - --duration 600 \ - --gpu-mem-gb 0 \ - --cpu-mem-gb 8 \ - --generation-mode none \ - --cache-dir /mnt/nvme \ - --seed 42 \ - --output results_sharegpt_rate_limited.json -``` - -### ShareGPT with Fixed Request Count - -Run exactly N requests for reproducible benchmarks: - -```bash -python3 kv-cache.py \ - --model llama3.1-8b \ - --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \ - --max-requests 5000 \ - --num-users 50 \ - --gpu-mem-gb 0 \ - --cpu-mem-gb 4 \ - --generation-mode realistic \ - --cache-dir /mnt/nvme \ - --seed 42 \ - --output 
results_sharegpt_fixed.json -``` - -### Comparing Real vs Synthetic Workloads - -| Metric | ShareGPT (Real) | Synthetic (Random) | -| :--- | :--- | :--- | -| Mean Context Size | ~133 tokens | ~2,676 tokens | -| Cache Hit Rate | 85-97% | 50-70% | -| Multi-turn Locality | High | Medium | -| Throughput | Higher | Lower | -| NVMe Stress | Moderate | Extreme | - -**Use ShareGPT** when you want to model real chatbot/assistant usage. -**Use Synthetic** when you want worst-case stress testing or controlled experiments. - ---- - -## Using the Wrapper Script - -The `kv-cache-wrapper.sh` script automates a complete benchmark suite. It detects your hardware, calculates appropriate parameters, and runs multiple test scenarios. - -### Basic Usage - -```bash -./kv-cache-wrapper.sh -``` - -This runs all test scenarios with default settings. Expect roughly 30 minutes for the full suite. - -### Options - -``` -./kv-cache-wrapper.sh [options] - - -m MODEL Model to benchmark (default: llama3.1-8b) - -t SECONDS Duration for tier comparison tests (default: 120) - -s SECONDS Duration for storage saturation test (default: 180) - -r SECONDS Duration for production test (default: 180) - -a SECONDS Duration for autoscaling tests (default: 300) - -w LIST Comma-separated list of workloads to run - -u USERS Override baseline user count - -U USERS Override high-load user count - -R Enable RAG workload - -D DOCS Number of RAG documents (default: 10) - -h Show help -``` - -### Available Workloads - -```bash -# Run only the storage isolation test -./kv-cache-wrapper.sh -w storage-only - -# Run production and autoscaling tests -./kv-cache-wrapper.sh -w production,autoscale - -# Run MLPerf submission tests -./kv-cache-wrapper.sh -w mlperf_submission -``` - ---- - -## Understanding Results - -### Key Metrics - -**Throughput (tokens/sec)**: How many tokens the system processes per second. Higher is better. 
- -**Storage Throughput (tokens/sec)**: Raw I/O performance calculated from storage latency, not wall-clock time. This is the fairer metric for comparing storage tiers. - -**End-to-End Latency**: Total time from request submission to completion. This is what users experience. - -**Storage I/O Latency**: Time spent reading from and writing to storage tiers. This measures your hardware. - -**Queue Wait Time**: Time requests spend waiting before processing begins. If this dominates, your system is overloaded. - -**Cache Hit Rate**: Percentage of reads served from cache. Higher rates mean less storage pressure. - -### Reading the Output - -``` -### STORAGE PERFORMANCE ASSESSMENT: PASS ### - Criteria Passed: 4/4 - [PASS] NVMe Write P95 < 500ms: 45.20ms - [PASS] NVMe Read P95 < 200ms: 123.45ms - [PASS] CPU RAM P95 < 150ms: 12.30ms - [PASS] Cache Hit Rate > 30%: 67.5% - -### OVERALL PERFORMANCE ### - Total Requests: 2847 - Total Tokens Generated: 489,231 - Avg Throughput: 1,630.77 tok/s - Storage Throughput: 2,105.32 tok/s - -### LATENCY BREAKDOWN ### - End-to-End: mean 89.3ms, P50 45.2ms, P95 312.4ms - Storage I/O: mean 23.1ms, P50 12.4ms, P95 89.2ms -``` - ---- - -## Unit Testing - -This package includes a comprehensive pytest-based test suite to verify core functionality without running the full benchmark. 
- -### Running Tests - -```bash -# Run all tests with verbose output -pytest test_kv_cache.py -v - -# Run with shorter traceback -pytest test_kv_cache.py -v --tb=short - -# Run specific test class -pytest test_kv_cache.py -k "TestModelConfig" -v - -# Run only CPU tests (skip GPU tests if no CUDA) -pytest test_kv_cache.py -v -m "not skipif" -``` - -### Test Coverage - -The test suite covers 12 component categories: - -| Test Class | Coverage | -|------------|----------| -| `TestModelConfig` | Model configurations, KV cache size calculations | -| `TestInferenceRequest` | Request dataclass, cache key generation | -| `TestQoSProfiles` | QoS levels, SLA targets, priorities | -| `TestKVCacheGenerator` | Determinism, shapes, dtypes, precomputed buffers | -| `TestCPUMemoryBackend` | Write/read/delete/clear operations | -| `TestNVMeBackend` | File I/O, metadata, temp directories | -| `TestGPUMemoryBackend` | CUDA tensors, device placement (skipped without GPU) | -| `TestConversationManager` | Multi-turn tracking, eviction | -| `TestUserSimulator` | User generation, QoS distribution | -| `TestMultiTierCache` | CPU-only mode, allocation, access | -| `TestMultiTierCacheWithGPU` | GPU tier, waterfall eviction (skipped without GPU) | -| `TestXLSXExport` | CSV/Excel export (skipped without pandas) | - -### Expected Runtime - -- **Without GPU**: ~3-5 seconds -- **With GPU**: ~5-10 seconds - -GPU tests are automatically skipped if CUDA is not available. - ---- - -## Excel Export - -The benchmark can export results directly to Excel or CSV format for analysis. 
- -### Basic Usage - -```bash -python3 kv-cache.py \ - --model llama3.1-8b \ - --num-users 50 \ - --duration 120 \ - --gpu-mem-gb 0 \ - --cpu-mem-gb 4 \ - --seed 42 \ - --output results.json \ - --xlsx-output results.xlsx -``` - -### Output Format - -The Excel file contains a single row with all key metrics: - -| Column | Description | -|--------|-------------| -| Model | Model configuration used | -| Num Users | Concurrent user count | -| Duration (s) | Benchmark duration | -| GPU Mem (GB) | GPU memory budget | -| CPU Mem (GB) | CPU memory budget | -| Total Requests | Requests completed | -| Total Tokens | Tokens processed | -| Avg Throughput (tok/s) | Wall-clock throughput | -| Storage Throughput (tok/s) | Storage I/O throughput | -| Cache Hit Rate | Percentage of cache hits | -| E2E Latency P95 (ms) | End-to-end 95th percentile | -| Storage IO P95 (ms) | Storage I/O 95th percentile | - -### Fallback Behavior - -- **With openpyxl**: Exports to `.xlsx` format -- **Without openpyxl**: Falls back to `.csv` format -- **Without pandas**: Export is skipped with a warning - ---- - -## MLPerf Submission Guidelines - -For official MLPerf v3.0 storage submissions, use these standardized commands. **These invocations have been validated through extensive discovery testing** (1,411 Fast system tests, 268 Slow system tests comparing 14,000 MB/s vs 3,000 MB/s storage). - -### Discovery Test Key Findings - -| Finding | Impact | -|---------|--------| -| **Metric selection depends on cpu_mem** | Storage Throughput shows only 1.1x at cpu_mem=0GB but 2.2x at cpu_mem=4GB | -| **Best models for differentiation** | llama3.1-8b and mistral-7b show 2.31x ratio | -| **High variance observed** | CV 50-125%, requires 3-5 trials minimum | -| **100% win rate metrics** | Decode Bytes Read and Wall-Clock Throughput at cpu_mem=0GB | - -### Option 1: Maximum Storage Stress (cpu_mem=0GB) - -Use when you want to stress test NVMe and measure I/O volume differentiation. 
- -**Primary Metrics:** Decode Bytes Read (2.62x differentiation), Wall-Clock Throughput (2.43x differentiation) - -```bash -# MLPerf v3.0: Maximum Storage Stress Test (8B Model) -# Run 3-5 trials for statistical significance -for trial in 1 2 3 4 5; do - python3 kv-cache.py \ - --model llama3.1-8b \ - --num-users 200 \ - --duration 300 \ - --gpu-mem-gb 0 \ - --cpu-mem-gb 0 \ - --max-concurrent-allocs 16 \ - --generation-mode none \ - --cache-dir /mnt/nvme \ - --seed 42 \ - --output mlperf_v3_stress_8b_trial${trial}.json -done -``` - -**⚠️ Important:** At cpu_mem=0GB, do NOT use Storage Throughput as your primary metric—use Decode Bytes Read or Wall-Clock Throughput instead. - -### Option 2: Storage Throughput Focus (cpu_mem=4GB) - -Use when you want Storage Throughput (tok/s) as your primary metric. - -**Primary Metric:** Storage Throughput (2.2x differentiation, 97% win rate) - -```bash -# MLPerf v3.0: Storage Throughput Test (8B Model) -for trial in 1 2 3 4 5; do - python3 kv-cache.py \ - --model llama3.1-8b \ - --num-users 100 \ - --duration 300 \ - --gpu-mem-gb 0 \ - --cpu-mem-gb 4 \ - --max-concurrent-allocs 0 \ - --generation-mode none \ - --cache-dir /mnt/nvme \ - --seed 42 \ - --output mlperf_v3_throughput_8b_trial${trial}.json -done -``` - -### Option 3: Large Model Submission (70B) - -For maximum per-request storage stress (10x larger KV cache per token): - -```bash -# MLPerf v3.0: Large Model Storage Stress -for trial in 1 2 3; do - python3 kv-cache.py \ - --model llama3.1-70b-instruct \ - --num-users 70 \ - --duration 300 \ - --gpu-mem-gb 0 \ - --cpu-mem-gb 0 \ - --max-concurrent-allocs 4 \ - --generation-mode none \ - --cache-dir /mnt/nvme \ - --seed 42 \ - --output mlperf_v3_stress_70b_trial${trial}.json -done -``` - -### Critical Parameters (Discovery-Validated) - -| Parameter | Value | Rationale | -|-----------|-------|-----------| -| **seed 42** | Required | Reproducibility across systems | -| **gpu-mem-gb 0** | Required | Isolates storage 
performance | -| **cpu-mem-gb** | 0 or 4 | 0GB for max stress (use I/O volume metrics), 4GB for Storage Throughput metric | -| **max-concurrent-allocs** | 0, 4, or 16 | 0 for throughput, 16 for stress testing | -| **generation-mode** | none or realistic | none for pure I/O, realistic for production simulation | -| **num-users** | 100-200 | Differentiation stable across range; higher = more throughput | -| **duration** | 300-600 | 5-10 minutes for stable metrics | - -### Trial Requirements - -| User Count | Variance (CV) | Minimum Trials | -|------------|---------------|----------------| -| 10 users | ~52% | 3 | -| 50-100 users | ~115-125% | 3-5 | -| 200 users | ~110-120% | 3-5 | - -Report **median** rather than mean for publication-quality results. - ---- - -## Troubleshooting - -### Out of Memory Errors - -Reduce the number of concurrent users or limit parallel allocations: - -```bash -python3 kv-cache.py ... --max-concurrent-allocs 50 -``` - -### Benchmark Hangs - -The system may be thrashing. Reduce users or increase memory budgets. - -### Poor Cache Hit Rates - -Low hit rates indicate your working set exceeds available fast memory. Either: -- Increase GPU/CPU memory budgets -- Reduce user count -- Accept that cold data will hit storage - -### Results Vary Between Runs - -Use the `--seed` flag for reproducible results. - ---- - -## Files in This Package - -- `kv-cache.py`: Main benchmark implementation with ShareGPT support -- `test_kv_cache.py`: Pytest unit test suite -- `requirements.txt`: Python dependencies -- `README.md`: This documentation -- `MLperf v3 KV cache proposal.md`: Detailed technical documentation - ---- - -## License - -Apache License 2.0 - ---- - -## Contact - -For questions or feedback, open an issue on the repository or contact the MLPerf Storage Working Group. +# MLPerf Storage KV Cache Benchmark + +A storage benchmarking tool for Large Language Model inference systems. 
This benchmark measures the performance of your storage subsystem under realistic KV cache offloading workloads, helping you answer critical questions about hardware capacity and configuration. + +**Author:** Hazem Awadallah, Kingston Digital +**License:** Apache 2.0 +**Version:** MLPerf Storage v3.0 (Enhanced) +**Updated:** February 4, 2026 + +--- + +## Table of Contents + +1. [What This Benchmark Does](#what-this-benchmark-does) +2. [Architecture Overview](#architecture-overview) +3. [Project Structure](#project-structure) +4. [System Requirements](#system-requirements) +5. [Installation](#installation) +6. [Configuration](#configuration) +7. [Quick Start](#quick-start) +8. [Running the Benchmark](#running-the-benchmark) +9. [ShareGPT Replay Workloads](#sharegpt-replay-workloads) +10. [BurstGPT Trace Replay](#burstgpt-trace-replay) +11. [Using the Wrapper Script](#using-the-wrapper-script) +12. [Understanding Results](#understanding-results) +13. [Unit Testing](#unit-testing) +14. [Excel Export](#excel-export) +15. [MLPerf Submission Guidelines](#mlperf-submission-guidelines) +16. [Troubleshooting](#troubleshooting) + +--- + +## What This Benchmark Does + +During LLM inference, models store intermediate attention data in a structure called the KV (Key-Value) cache. This cache grows with conversation length and can consume enormous amounts of memory. Production systems offload this cache from expensive GPU VRAM to cheaper CPU RAM or NVMe storage. + +This benchmark simulates that offloading behavior. It generates realistic multi-user inference workloads and measures how your storage performs under pressure. It answers: + +- The real latency impact of each storage tier (GPU vs. CPU vs. NVMe) +- Whether your NVMe drive is fast enough to handle cache spillover +- How many concurrent users your storage can sustain at a given throughput +- Where the storage bottleneck sits in your system + +This is not a pass/fail test. 
It is a diagnostic tool for system architects and performance engineers. + +> **Note:** The benchmark uses a one-way waterfall — data flows from GPU → CPU → NVMe but is never promoted back on read. This maximizes storage stress but means capacity planning results reflect storage throughput limits, not end-to-end serving capacity (which depends on promotion policy). See the proposal §3.4 for design rationale. + +> **Terminology:** "NVMe" is used throughout as shorthand for the third storage tier. The benchmark accepts any block device or filesystem via `--cache-dir` (SATA SSD, HDD, RAM disk, NFS, etc.). + +--- + +## Architecture Overview + +The benchmark implements a three-tier memory hierarchy that mirrors production LLM serving systems. + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ KV Cache Benchmark Architecture │ +└─────────────────────────────────────────────────────────────────────────────┘ + + ┌──────────────────┐ + │ User Requests │ + │ (Multi-tenant) │ + └────────┬─────────┘ + │ + ▼ + ┌──────────────────────────────────────┐ + │ Request Queue │ + │ (Priority-based: QoS levels) │ + │ Interactive > Responsive > Batch │ + └──────────────────┬───────────────────┘ + │ + ▼ + ┌────────────────────────────────────────────────────────┐ + │ IntegratedBenchmark │ + │ ┌─────────────┐ ┌─────────────┐ ┌─────────────────┐ │ + │ │ Prefill │ │ Decode │ │ Conversation │ │ + │ │ (Write) │ │ (Read) │ │ Manager │ │ + │ └──────┬──────┘ └──────┬──────┘ └────────┬────────┘ │ + └─────────┼────────────────┼─────────────────┼───────────┘ + │ │ │ + └────────────────┼─────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────────────────┐ +│ MultiTierCache │ +│ (Waterfall LRU Eviction) │ +│ │ +│ New Data ─────► Always targets fastest available tier │ +│ If full, LRU entry cascades down │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ │ │ +│ │ ┌───────────────┐ 
┌───────────────┐ ┌───────────────┐ │ │ +│ │ │ GPU VRAM │ │ CPU RAM │ │ NVMe │ │ │ +│ │ │ (Tier 1) │─────►│ (Tier 2) │─────►│ (Tier 3) │ │ │ +│ │ │ │ LRU │ │ LRU │ │ │ │ +│ │ │ Sub-ms │evict │ Tens of ms │evict │ Hundreds │ │ │ +│ │ │ latency │ │ latency │ │ of ms │ │ │ +│ │ │ │ │ │ │ │ │ │ +│ │ │ PyTorch/CuPy │ │ NumPy arrays │ │ .npy files │ │ │ +│ │ │ tensors │ │ in memory │ │ on disk │ │ │ +│ │ └───────────────┘ └───────────────┘ └───────────────┘ │ │ +│ │ │ │ +│ │ ◄──── HOT DATA ────────────────────────────── COLD DATA ────► │ │ +│ │ │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +### Key Components + +**MultiTierCache**: The core engine. It decides where to place data based on available space and access patterns. New data always targets the fastest tier. When that tier fills up, the least recently used entry gets pushed down to the next tier. + +**Inference Phases**: The benchmark models two distinct I/O patterns: +- **Prefill**: Write-heavy. Processing the user prompt generates new KV cache entries. +- **Decode**: Read-heavy. Generating each output token requires reading the existing cache. + +**User Simulation**: Creates realistic traffic from multiple concurrent users with different behaviors (chatbot, coding assistant, document analysis) and priority levels. + +**Autoscaler**: Automatically adjusts user load to find either the maximum users your system can handle (QoS mode) or the peak throughput of your storage (capacity mode). 
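
The waterfall placement policy described above can be sketched in a few lines of Python. This is an illustrative toy, not the benchmark's actual `MultiTierCache` implementation; the tier names, unit capacities, and byte accounting are simplified assumptions:

```python
from collections import OrderedDict

class Tier:
    """One storage tier with an LRU-ordered index (simplified sketch)."""
    def __init__(self, name, capacity_bytes):
        self.name = name
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()  # key -> size, oldest (LRU) first

    def has_room(self, size):
        return self.used + size <= self.capacity

class WaterfallCache:
    """New data targets the fastest tier; LRU victims cascade downward."""
    def __init__(self, tiers):
        self.tiers = tiers  # ordered fastest -> slowest

    def put(self, key, size):
        self._make_room(0, size)
        tier = self.tiers[0]
        tier.entries[key] = size
        tier.used += size

    def _make_room(self, level, size):
        tier = self.tiers[level]
        while not tier.has_room(size) and tier.entries:
            victim, vsize = tier.entries.popitem(last=False)  # evict LRU entry
            tier.used -= vsize
            if level + 1 < len(self.tiers):
                self._make_room(level + 1, vsize)       # cascade down one tier
                nxt = self.tiers[level + 1]
                nxt.entries[victim] = vsize
                nxt.used += vsize
            # on the last tier the victim is simply dropped

    def get(self, key):
        for tier in self.tiers:
            if key in tier.entries:
                tier.entries.move_to_end(key)  # refresh recency, no promotion
                return tier.name
        return None
```

Note that `get` refreshes recency within a tier but never promotes an entry back up, matching the one-way waterfall design described earlier.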
+ +--- + +## Project Structure + +The benchmark uses a modular architecture for maintainability and extensibility: + +``` +mlperf-kv-cache/ +├── kv-cache.py # CLI entry point (backward-compatible wrapper) +├── config.yaml # YAML configuration file +├── pyproject.toml # Python packaging configuration +├── test_kv_cache.py # Unit tests +├── README.md # This file +│ +└── kv_cache/ # Core package + ├── __init__.py # Package exports + ├── _compat.py # Optional dependency detection + ├── backends.py # Storage tier implementations (GPU/CPU/NVMe) + ├── benchmark.py # IntegratedBenchmark orchestration + ├── cache.py # MultiTierCache with waterfall eviction + ├── cli.py # Argument parsing and main() entry point + ├── config.py # ConfigLoader and cfg() helper + ├── conversation.py # Multi-turn conversation state management + ├── models.py # Model configs, QoS profiles, data classes + ├── monitoring.py # Metrics collection and storage monitoring + ├── prefix_cache.py # System prompt prefix caching + ├── rag.py # RAG workload simulation + └── workload.py # User simulation and request generation +``` + +### Module Responsibilities + +| Module | Purpose | +|--------|---------| +| `cli.py` | Parses CLI arguments, loads config, calls `IntegratedBenchmark` | +| `config.py` | Loads `config.yaml`, provides `cfg()` helper for accessing nested values | +| `models.py` | Defines `ModelConfig`, `QoSLevel`, `InferenceRequest`, and other data classes | +| `cache.py` | Implements `MultiTierCache` with LRU eviction and tier management | +| `backends.py` | `GPUMemoryBackend`, `CPUMemoryBackend`, `NVMeBackend` storage implementations | +| `benchmark.py` | `IntegratedBenchmark` orchestrates the full benchmark run | +| `workload.py` | `UserSimulator` generates realistic request patterns | +| `conversation.py` | `ConversationManager` tracks multi-turn state | +| `prefix_cache.py` | `PrefixMatcher` caches common system prompts | +| `rag.py` | `RAGDocumentManager` simulates document retrieval | +| 
`monitoring.py` | `StorageMonitor`, `QoSMonitor`, `WorkloadAutoscaler` for observability | +| `_compat.py` | Detects optional dependencies (torch, cupy, tiktoken, etc.) | + +--- + +## System Requirements + +### Minimum + +- CPU: 8+ cores (AMD EPYC, Intel Xeon) +- RAM: 32 GB +- Storage: 256 GB free space on SSD +- OS: Linux (Ubuntu 22.04, RHEL 9, or similar) or Windows +- Python: 3.10 or higher +- No GPU required (runs in CPU-only mode) + +### Recommended + +- CPU: 32+ cores +- RAM: 128 GB or more +- GPU: NVIDIA A100/H100 with 40+ GB VRAM (optional but enables full three-tier testing) +- Storage: 1 TB+ on NVMe (PCIe Gen4 or Gen5) +- Tools: `bc`, `jq` for the wrapper script (Linux) + +### Memory Requirements by Model + +The benchmark's RAM usage depends on the model's KV cache size per token and the `--max-concurrent-allocs` setting. Use this table to select appropriate settings for your system. + +#### KV Cache Size Per Token + +| Model | Architecture | kv_heads | Bytes/Token | MB/Token | +|-------|--------------|----------|-------------|----------| +| `tiny-1b` | GQA | 4 | 24,576 | 0.023 | +| `mistral-7b` | GQA | 8 | 131,072 | 0.125 | +| `llama2-7b` | **MHA** | 32 | 524,288 | **0.500** | +| `llama3.1-8b` | GQA | 8 | 131,072 | 0.125 | +| `llama3.1-70b-instruct` | GQA | 8 | 327,680 | 0.313 | +| `deepseek-v3` | **MLA** | N/A | 70,272 | 0.067 | +| `qwen3-32b` | GQA | 8 | 163,840 | 0.153 | +| `gpt-oss-120b` | MoE | 8 | 73,728 | 0.069 | +| `gpt-oss-20b` | MoE | 8 | 49,152 | 0.046 | + +> **Note:** `llama2-7b` uses Multi-Head Attention (MHA) with 32 KV heads, making it **4× larger** than similarly-sized GQA models like `llama3.1-8b`. This is intentional for stress testing. 
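
For the dense GQA/MHA rows, the Bytes/Token column follows the standard KV cache formula `2 × layers × kv_heads × head_dim × dtype_bytes`, where the leading 2 covers the separate Key and Value tensors and FP16 (2 bytes) is assumed. A quick sanity check against the published architecture parameters of three of the models (the MLA and MoE rows use different accounting and are not reproduced by this simple formula):

```python
def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       dtype_bytes: int = 2) -> int:
    # 2 tensors (Key + Value) per layer, each kv_heads * head_dim elements
    return 2 * layers * kv_heads * head_dim * dtype_bytes

# Published architecture parameters, FP16 KV cache assumed:
assert kv_bytes_per_token(layers=32, kv_heads=8,  head_dim=128) == 131_072  # llama3.1-8b, mistral-7b (GQA)
assert kv_bytes_per_token(layers=32, kv_heads=32, head_dim=128) == 524_288  # llama2-7b (MHA)
assert kv_bytes_per_token(layers=80, kv_heads=8,  head_dim=128) == 327_680  # llama3.1-70b-instruct (GQA)
```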
+ +#### Peak In-Flight RAM by `--max-concurrent-allocs` + +Formula: `Peak RAM = max_concurrent_allocs × avg_context_tokens × bytes_per_token` + +Assumes average context of 8,192 tokens (midpoint of coding user profile): + +| Model | Per User | 200 users (unlimited) | 16 allocs | 8 allocs | 4 allocs | +|-------|----------|----------------------|-----------|----------|----------| +| `tiny-1b` | 0.2 GB | 40 GB | 3.2 GB | 1.6 GB | 0.8 GB | +| `mistral-7b` | 1.0 GB | 200 GB | 16 GB | 8 GB | 4 GB | +| `llama2-7b` | **4.0 GB** | **800 GB** | **64 GB** | **32 GB** | **16 GB** | +| `llama3.1-8b` | 1.0 GB | 200 GB | 16 GB | 8 GB | 4 GB | +| `llama3.1-70b-instruct` | 2.5 GB | 500 GB | 40 GB | 20 GB | 10 GB | +| `deepseek-v3` | 0.54 GB | 107 GB | 9 GB | 4.3 GB | 2.1 GB | +| `qwen3-32b` | 1.25 GB | 250 GB | 20 GB | 10 GB | 5 GB | +| `gpt-oss-120b` | 0.56 GB | 112 GB | 9 GB | 4.5 GB | 2.3 GB | +| `gpt-oss-20b` | 0.38 GB | 76 GB | 6 GB | 3 GB | 1.5 GB | + +#### Recommended Settings by System RAM + +| System RAM | Recommended `--max-concurrent-allocs` | Safe Models (unlimited) | +|------------|---------------------------------------|-------------------------| +| 32 GB | 4 | `tiny-1b`, `gpt-oss-20b` | +| 64 GB | 8 | `mistral-7b`, `llama3.1-8b`, `qwen3-32b` | +| 128 GB | 16 | All except `llama2-7b` | +| 256 GB | 16–32 | All models with bounded concurrency | +| 512 GB+ | 32–64 | All models | + +> **⚠️ Critical:** Running `llama2-7b` with `--max-concurrent-allocs 0` (unlimited) requires **800+ GB RAM**. Always set this parameter on memory-constrained systems. Note: `deepseek-v3` uses MLA which compresses KV cache ~25× vs MHA, so it requires far less RAM than its parameter count suggests. 
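
The table values drop straight out of the formula above. A quick sketch (GB here means GiB, matching the table's rounding; the helper name is ours, not the benchmark's):

```python
def peak_inflight_ram_gb(concurrent_allocs: int, bytes_per_token: int,
                         avg_context_tokens: int = 8192) -> float:
    """Peak RAM = concurrent_allocs x avg_context_tokens x bytes_per_token."""
    return concurrent_allocs * avg_context_tokens * bytes_per_token / 2**30

assert peak_inflight_ram_gb(16, 131_072) == 16.0    # mistral-7b @ 16 allocs
assert peak_inflight_ram_gb(200, 524_288) == 800.0  # llama2-7b @ 200 users, unlimited
assert peak_inflight_ram_gb(4, 24_576) == 0.75      # tiny-1b @ 4 allocs (table rounds to 0.8)
```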
+ +#### Impact on Benchmark Results + +The `--max-concurrent-allocs` parameter affects benchmark metrics in important ways: + +| Setting | Throughput | Latency | Realism | Use Case | +|---------|------------|---------|---------|----------| +| **0 (unlimited)** | Highest | Lower (less queueing) | Lower | Max hardware stress | +| **16** | High | Moderate | Moderate | Storage stress testing | +| **8** | Moderate | Higher (more queueing) | Higher | Production simulation | +| **4** | Lower | Highest (significant queueing) | Highest | Memory-constrained systems | + +**Why this matters:** +- **Lower values** (4–8) cause requests to queue, increasing measured latencies but reducing RAM usage. This better simulates production where admission control limits concurrency. +- **Higher values** (16–32) maximize parallel I/O, showing peak hardware throughput but requiring more RAM. +- **Unlimited (0)** removes all queueing delays but can exhaust RAM or cause artificial latency spikes from GC pressure. + +**For MLPerf submissions:** Use `--max-concurrent-allocs 16` for stress tests (Test 1) to balance throughput measurement with memory safety. 
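
The admission-control behavior described above can be pictured as a semaphore gating KV-cache allocations. This is a conceptual sketch under our own naming, not the benchmark's actual implementation:

```python
import threading

class AllocationGate:
    """Sketch of bounded-concurrency admission control: at most
    max_concurrent_allocs allocations proceed at once; the rest block,
    and that blocking is the queueing delay reflected in measured latency."""
    def __init__(self, max_concurrent_allocs: int):
        # 0 models --max-concurrent-allocs 0 (unlimited): no gate at all
        self._sem = (threading.BoundedSemaphore(max_concurrent_allocs)
                     if max_concurrent_allocs > 0 else None)

    def __enter__(self):
        if self._sem is not None:
            self._sem.acquire()  # blocks here when the system is saturated
        return self

    def __exit__(self, *exc):
        if self._sem is not None:
            self._sem.release()

gate = AllocationGate(max_concurrent_allocs=8)
with gate:
    pass  # allocate and fill the KV cache for one request
```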
+ +--- + +## Installation + +### Option 1: Install as Package (Recommended) + +Install the package with pip: + +```bash +# Clone the repository +git clone https://github.com/mlcommons/storage.git +cd storage/kv-cache + +# (Optional) Upgrade pip and setuptools if you have an older version +pip install --upgrade pip setuptools wheel + +# Install with all optional dependencies +pip install ".[full]" + +# Or install with specific features +pip install ".[yaml]" # YAML config support only +pip install ".[gpu]" # GPU support (PyTorch + CuPy) +pip install ".[tokenizer]" # tiktoken for ShareGPT +pip install ".[reporting]" # pandas + openpyxl for Excel output +pip install ".[dev]" # Development tools (pytest, ruff, mypy) +``` + +After installation, run the benchmark from anywhere: + +```bash +kv-cache --help +# or +mlperf-kv-cache --help +``` + +### Option 2: Run Directly (No Install) + +```bash +# Clone and enter the directory +git clone https://github.com/mlcommons/storage.git +cd storage/kv-cache + +# Install dependencies manually +pip install numpy pyyaml + +# Run directly +python kv-cache.py --help +``` + +### Optional Dependencies + +Install based on your needs: + +```bash +# GPU support +pip install torch # PyTorch for GPU tensors +pip install cupy-cuda12x # CuPy for CUDA (adjust cuda version) + +# ShareGPT replay workloads +pip install tiktoken # OpenAI tokenizer + +# Excel/CSV export +pip install pandas openpyxl # DataFrame and Excel support +``` + +### Verify Installation + +```bash +# Check CLI is working +kv-cache --help + +# Or if running directly +python kv-cache.py --help + +# Run unit tests +pytest test_kv_cache.py -v +``` + +--- + +## Configuration + +The benchmark supports a YAML configuration file (`config.yaml`) for tuning internal parameters without modifying the source code. This is the **recommended approach** for MLPerf submissions to ensure reproducibility. 
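
Internally, nested config values are read through a `cfg()`-style helper (see `config.py` in the module table). A sketch of the idea; the signature and fallback behavior shown here are illustrative and may differ from the real `kv_cache.config` helper:

```python
# What yaml.safe_load would return for a small config.yaml fragment:
config = {
    "eviction": {"target_usage_ratio": 0.8, "max_recursion_depth": 10},
    "qos_distribution": {"interactive_probability": 0.15},
}

def cfg(tree: dict, dotted_key: str, default=None):
    """Illustrative dotted-path lookup, e.g. cfg(c, "eviction.target_usage_ratio")."""
    node = tree
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default  # fall back when a key is absent
        node = node[part]
    return node

assert cfg(config, "eviction.target_usage_ratio") == 0.8
assert cfg(config, "qos_distribution.batch_weight", default=0.5) == 0.5
```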
+ +### Using the Configuration File + +```bash +python3 kv-cache.py --config config.yaml [other CLI arguments] +``` + +**Note:** CLI arguments always take precedence over config file values for overlapping settings. + +### Configuration File Parameters (config.yaml) + +The configuration file controls internal benchmark behavior that affects workload realism and cache dynamics. These settings are **not** exposed as CLI arguments to prevent accidental misconfigurations in MLPerf submissions. + +> **Tip:** For most benchmarking scenarios, the defaults are carefully tuned. Only modify these if you understand the impact on your results. + +--- + +#### User Templates + +Controls the three simulated user personas. Each persona has distinct characteristics that model real-world usage patterns. + +| Persona | Behavior | Use Case | +|---------|----------|----------| +| **Chatbot** | Short prompts, quick responses, fast iteration | Customer service bots, casual conversation | +| **Coding** | Medium prompts with code context, moderate responses | IDE assistants, code completion | +| **Document** | Long prompts with full documents, lengthy analysis | Document summarization, legal/medical analysis | + +| Parameter | Type | Default | Impact | +|-----------|------|---------|--------| +| `user_templates.chatbot.context_range` | [min, max] | [512, 4096] | **KV cache write size per request.** Smaller values reduce storage pressure; larger values stress NVMe throughput. | +| `user_templates.chatbot.generation_range` | [min, max] | [50, 200] | **Decode phase duration.** More tokens = more cache reads per request. Affects read/write ratio. | +| `user_templates.chatbot.think_time_range` | [min, max] | [0.1, 0.5] | **Request inter-arrival time.** Shorter = higher request rate, more concurrent cache operations. | +| `user_templates.coding.context_range` | [min, max] | [4096, 25000] | Large contexts typical of code completion scenarios with full file context. 
Based on OpenRouter data showing programming workloads routinely exceed 20K input tokens. | +| `user_templates.coding.generation_range` | [min, max] | [100, 500] | Code generation often produces longer outputs than conversational AI. | +| `user_templates.coding.think_time_range` | [min, max] | [0.2, 1.0] | Developers pause to review generated code before next request. | +| `user_templates.document.context_range` | [min, max] | [4096, 16384] | **Stress test scenarios.** 16K tokens creates ~2 GB of total KV cache data for 8B models (128 KB/token × 16,384 tokens). | +| `user_templates.document.generation_range` | [min, max] | [200, 800] | Long-form analysis outputs (summaries, reports). | +| `user_templates.document.think_time_range` | [min, max] | [0.3, 1.5] | Users read lengthy outputs before continuing. | + +--- + +#### Token Generation Timing + +Simulates GPU compute time per generated token. This controls the backpressure on the storage system. + +| Mode | Default (sec/token) | When to Use | +|------|---------------------|-------------| +| `none` | 0.0 | **Pure storage benchmarking.** 100% of measured latency is I/O. Use for MLPerf storage submissions. | +| `fast` | 0.002 (2ms) | Simulates high-end GPU (H100) with optimized inference. Creates light backpressure. | +| `realistic` | 0.030 (30ms) | Simulates typical production GPU throughput. Balances compute/storage for end-to-end analysis. | + +**Why it matters:** With `generation_mode=none`, the benchmark hammers storage as fast as possible. With `realistic`, storage has time to absorb writes between decode steps, showing how your system performs under sustained (not burst) load. + +--- + +#### QoS Profiles (Quality of Service) + +Defines SLA targets for multi-tenant request prioritization. The benchmark tracks violations against these thresholds. 
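
Violation tracking amounts to comparing observed tail percentiles against the targets tabulated below. A simple nearest-rank-style sketch (helper names and the percentile estimator are ours, not the benchmark's):

```python
def percentile(sorted_ms: list, p: float) -> float:
    """Simple percentile index over a pre-sorted latency list (illustrative)."""
    idx = max(0, int(round(p / 100 * len(sorted_ms))) - 1)
    return sorted_ms[idx]

def qos_violations(latencies_ms: list, targets: dict) -> dict:
    """Flag each percentile whose observed latency exceeds its target,
    e.g. {95: 50, 99: 100} for the interactive profile below."""
    s = sorted(latencies_ms)
    return {p: percentile(s, p) > limit for p, limit in targets.items()}

lat = [20] * 98 + [150, 150]  # two slow requests out of 100
assert qos_violations(lat, {95: 50, 99: 100}) == {95: False, 99: True}
```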
+ +| Profile | Typical Use Case | Priority | +|---------|------------------|----------| +| **Interactive** | Live chat UIs, real-time assistants | Highest (3) | +| **Responsive** | API calls, near-real-time processing | Medium (2) | +| **Batch** | Overnight jobs, bulk processing | Lowest (1) | + +| Parameter | Default | Meaning | +|-----------|---------|---------| +| `qos_profiles.interactive.target_latency_p95_ms` | 50 | 95% of interactive requests must complete within 50ms. Aggressive target for premium users. | +| `qos_profiles.interactive.target_latency_p99_ms` | 100 | 99% within 100ms. Allows some slack for tail latency. | +| `qos_profiles.interactive.target_latency_p999_ms` | 150 | 99.9% (3 nines) within 150ms. Production SLOs often specify this level. | +| `qos_profiles.interactive.target_latency_p9999_ms` | 200 | 99.99% (4 nines) within 200ms. Critical for detecting storage-induced tail latency. | +| `qos_profiles.interactive.priority` | 3 | Highest priority. These requests are dequeued first. | +| `qos_profiles.responsive.target_latency_p95_ms` | 100 | 2× the interactive target. Acceptable for API consumers. | +| `qos_profiles.responsive.target_latency_p99_ms` | 200 | 99% within 200ms. | +| `qos_profiles.responsive.target_latency_p999_ms` | 350 | 99.9% within 350ms. | +| `qos_profiles.responsive.target_latency_p9999_ms` | 500 | 99.99% within 500ms. | +| `qos_profiles.responsive.priority` | 2 | Medium priority. | +| `qos_profiles.batch.target_latency_p95_ms` | 1000 | 1 second. Batch jobs are latency-tolerant. | +| `qos_profiles.batch.target_latency_p99_ms` | 5000 | 5 seconds. Acceptable for offline processing. | +| `qos_profiles.batch.target_latency_p999_ms` | 7500 | 7.5 seconds. | +| `qos_profiles.batch.target_latency_p9999_ms` | 10000 | 10 seconds. Even worst-case should complete eventually. | +| `qos_profiles.batch.priority` | 1 | Lowest priority. Processed when interactive/responsive queues are empty. 
| + +> **Research Basis for QoS Targets** (see [sources.md](sources.md) for full citations): +> - **Interactive (50ms P95, 100ms P99)**: Based on Nielsen Norman Group's 0.1s "instant" threshold, Google RAIL <100ms response target, and observed production LLM APIs (Anthropic Claude TTFT: 50–150ms). +> - **Responsive (100ms P95, 200ms P99)**: Based on Google Core Web Vitals FID <100ms "good" threshold, INP ≤200ms target, and Vercel Edge Functions P99 <200ms. +> - **Batch (1000ms P95, 5000ms P99)**: Based on AWS ALB healthy target <1s, and research showing batch workloads tolerate >1s latency ([Splitwise paper](https://arxiv.org/abs/2401.07935): 80% of production requests need <200ms). +> +> **Note:** MLPerf Inference v4.0–v5.0 defines Server/Offline scenarios but does **not** prescribe specific P95/P99 latency SLAs. These targets represent industry best practices, not MLPerf requirements. + +--- + +#### QoS Distribution + +Controls the probability mix of request priorities in the simulated workload. + +| Parameter | Default | Effect | +|-----------|---------|--------| +| `interactive_probability` | 0.15 | 15% of requests are INTERACTIVE. Increase to stress-test low-latency paths. | +| `responsive_threshold` | 0.50 | Cumulative threshold: requests that are not INTERACTIVE but fall below 0.50 are RESPONSIVE, i.e. 35% of all requests (50% - 15%). The rest are BATCH. | + +**Example distribution with defaults:** 15% Interactive, 35% Responsive, 50% Batch. + +--- + +#### Eviction Settings + +Controls the waterfall LRU eviction algorithm that moves cold data down the tier hierarchy (GPU → CPU → NVMe). + +| Parameter | Default | Purpose | +|-----------|---------|---------| +| `max_recursion_depth` | 10 | **Safety limit.** Prevents infinite cascading evictions. If you hit this limit, your tiers are severely undersized. | +| `target_usage_ratio` | 0.8 | **Tier headroom.** Keeps each tier at 80% capacity, leaving 20% buffer for burst writes. Lower values = more headroom, fewer evictions. 
| +| `large_entry_limit_ratio` | 0.95 | **Skip-tier threshold.** If a single entry exceeds 95% of tier capacity, skip directly to the next tier. Prevents tier thrashing with huge entries. | +| `max_evictions_hard_cap` | 5000 | **Absolute safety limit.** Stops eviction loop after 5000 entries regardless of space needs. Prevents runaway eviction under pathological conditions. | +| `max_evictions_min` | 1000 | **Minimum eviction budget.** Ensures the algorithm tries at least 1000 evictions before giving up. Helps with large-model scenarios where many small entries must be evicted. | + +**Tuning guidance:** If you see "Hit recursion limit" warnings, increase `max_recursion_depth`. If evictions dominate your latency, reduce `target_usage_ratio` to provide more headroom. + +--- + +#### GPU Backend Settings + +Controls GPU VRAM allocation and out-of-memory (OOM) recovery behavior. + +| Parameter | Default | Purpose | +|-----------|---------|---------| +| `memory_fraction` | 0.9 | **VRAM budget.** Uses 90% of GPU memory, reserving 10% for framework overhead and other processes. | +| `max_eviction_attempts` | 100 | **OOM recovery limit.** On CUDA OOM, attempts up to 100 evictions to free space before failing the write. | +| `free_memory_threshold` | 0.1 | **Proactive eviction trigger.** When free GPU memory drops below 10%, begin evicting to CPU before OOM occurs. | + +**Note:** These settings only apply when `--gpu-mem-gb > 0` and PyTorch/CuPy is available. + +--- + +#### Prefix Cache Settings + +Controls hierarchical prefix caching for system prompts (e.g., "You are a helpful assistant"). + +| Parameter | Default | Purpose | +|-----------|---------|---------| +| `min_prefix_length` | 50 | **Minimum tokens for caching.** Prefixes shorter than 50 tokens aren't worth the overhead of caching. | +| `max_prefix_entries` | 1000 | **Prefix cache capacity.** LRU eviction kicks in when this limit is reached. Higher values consume more memory but improve hit rates. 
| +| `system_prompt_hit_probability` | 0.2 | **Simulation realism.** 20% of requests share a common system prompt. Increase to model deployments with standardized prompts (e.g., corporate assistants). | + +**Impact:** Higher `system_prompt_hit_probability` → higher cache hit rates → lower storage throughput (because prefixes are reused). Use 0.0 for pure storage stress testing. + +--- + +#### RAG Settings + +Controls Retrieval-Augmented Generation workload simulation, where external documents are injected into the context. + +| Parameter | Default | Purpose | +|-----------|---------|---------| +| `chunk_size_tokens` | 512 | **Document chunk granularity.** Each document is split into 512-token chunks for independent caching. Smaller chunks = more cache entries, higher metadata overhead. | +| `top_k_chunks` | 5 | **Retrieval depth.** Number of chunks retrieved per RAG query. More chunks = larger context window = more KV cache I/O. | +| `max_chunk_bytes` | 268435456 | **256 MB per chunk.** Safety limit to prevent single chunks from consuming entire tiers. Particularly important for 70B models where 512 tokens ≈ 160 MB of KV cache (320 KB/token). | + +**When to enable RAG:** Use `--enable-rag` when benchmarking systems designed for document-heavy workloads (legal, medical, enterprise search). + +--- + +#### Conversation Settings + +Controls multi-turn conversation simulation, modeling how chatbot context accumulates across turns. + +| Parameter | Default | Purpose | +|-----------|---------|---------| +| `max_conversations` | 1000 | **Concurrent conversation limit.** LRU eviction removes oldest conversations when this limit is hit. Higher values = more memory for conversation metadata. | +| `max_turns_per_conv` | 50 | **Conversation depth limit.** After 50 turns, the conversation resets. Prevents unbounded context growth in long-running benchmarks. | +| `end_conversation_probability` | 0.2 | **Conversation turnover rate.** 20% chance each turn ends the conversation. 
Lower values = longer conversations = more cache reuse. | + +**Impact on metrics:** Higher `max_turns_per_conv` and lower `end_conversation_probability` increase cache hit rates (context reuse). Use low values for stress testing (force cache misses). + +--- + +#### Autoscaler Settings + +Controls the workload autoscaler that discovers system saturation points. + +| Parameter | Default | Purpose | +|-----------|---------|---------| +| `min_users` | 1 | **Lower bound.** Autoscaler won't go below 1 user. | +| `max_users` | 10000 | **Upper bound.** Autoscaler stops scaling up at 10,000 users. Prevents runaway resource consumption. | +| `scale_up_factor` | 1.2 | **Growth rate.** Increases users by 20% each scaling action (e.g., 100 → 120 → 144). | +| `scale_down_factor` | 0.8 | **Decay rate.** Decreases users by 20% when SLAs are violated (e.g., 100 → 80 → 64). | +| `consecutive_samples_required` | 2 | **Stability requirement.** Requires 2 consecutive samples agreeing on direction before scaling. Prevents oscillation from transient spikes. | + +**QoS mode vs Capacity mode:** In QoS mode, the autoscaler maximizes users while maintaining latency SLAs. In Capacity mode, it maximizes throughput regardless of latency. + +--- + +#### Decode Phase Settings + +Controls token generation batching during the decode (read-heavy) phase. + +| Parameter | Default | Purpose | +|-----------|---------|---------| +| `batch_size` | 32 | **Decode batch granularity.** Reads 32 tokens worth of KV cache per decode operation. Larger batches amortize I/O overhead but require more memory. | + +--- + +#### ShareGPT Dataset Settings + +Controls loading and processing of real ShareGPT conversation data. + +| Parameter | Default | Purpose | +|-----------|---------|---------| +| `max_context_tokens` | 8192 | **Context truncation.** Conversations longer than 8192 tokens are truncated. Prevents OOM with very long conversations. 
| +| `max_generation_tokens` | 2048 | **Generation truncation.** Caps simulated generation at 2048 tokens per turn. | +| `chars_per_token_estimate` | 4 | **Tokenization heuristic.** Used when tiktoken is unavailable. 4 chars/token is typical for English text. | + +--- + +#### Saturation Detection Thresholds + +Controls when the StorageMonitor considers the storage subsystem saturated. + +| Parameter | Default | Purpose | +|-----------|---------|---------| +| `read_latency_p95_threshold_ms` | 100 | **Read saturation signal.** If P95 read latency exceeds 100ms, storage is considered stressed. | +| `write_latency_p95_threshold_ms` | 50 | **Write saturation signal.** Writes are more sensitive; 50ms threshold triggers concern earlier. | +| `queue_depth_threshold` | 100 | **Queue pressure signal.** More than 100 pending requests indicates backlog is building. | +| `history_window_size` | 10 | **Trend analysis window.** Uses last 10 samples to detect latency trends (increasing = saturation). | + +**Used by:** The autoscaler uses these thresholds to decide when to scale down (in QoS mode) or when peak throughput is reached (in capacity mode). + +--- + +#### Validation Limits + +Safety limits enforced by `validate_args()` to prevent accidental misconfigurations. + +| Parameter | Default | Rationale | +|-----------|---------|-----------| +| `max_users` | 100000 | Reasonable upper bound for simulated users. Prevents accidental `--num-users 1000000`. | +| `max_duration_seconds` | 86400 | 24 hours maximum. Prevents runaway benchmarks that run forever. | +| `max_gpu_memory_gb` | 1024 | 1 TB. Covers even the largest GPU clusters (8× H100 80GB = 640GB). | +| `max_cpu_memory_gb` | 16384 | 16 TB. Covers high-memory server configurations. 
| + +--- + +## Quick Start + +Run a basic storage test with 50 users for 2 minutes: + +```bash +python3 kv-cache.py \ + --config config.yaml \ + --model llama3.1-8b \ + --num-users 50 \ + --duration 120 \ + --gpu-mem-gb 0 \ + --cpu-mem-gb 4 \ + --generation-mode realistic \ + --cache-dir /mnt/nvme \ + --seed 42 \ + --output results.json +``` + +This forces all cache operations to hit your NVMe drive, giving you a baseline measurement of storage performance. + +--- + +## Running the Benchmark + +### CLI-Only Arguments + +These arguments **must** be passed via command line (not configurable in config.yaml): + +| Argument | Type | Default | Required | Description | +|----------|------|---------|----------|-------------| +| `--config` | str | None | No | Path to YAML configuration file | +| `--log-level` | str | INFO | No | Logging level: DEBUG, INFO, WARNING, ERROR, CRITICAL | +| `--model` | str | llama3.1-8b | Yes | Model config (see [Supported Models](#supported-models) below) | +| `--num-users` | int | 100 | Yes | Number of concurrent users to simulate | +| `--duration` | int | 60 | Yes | Benchmark duration in seconds | +| `--gpu-mem-gb` | float | 16 | Yes | GPU VRAM budget in GB (0 to disable) | +| `--cpu-mem-gb` | float | 32 | Yes | CPU RAM budget in GB | +| `--cache-dir` | str | temp | No | Directory for NVMe cache files | +| `--generation-mode` | str | realistic | No | Token generation: none, fast, realistic | +| `--performance-profile` | str | latency | No | Pass/fail criteria: latency, throughput | +| `--disable-multi-turn` | flag | False | No | Disable multi-turn conversation caching | +| `--disable-prefix-caching` | flag | False | No | Disable prefix caching | +| `--enable-rag` | flag | False | No | Enable RAG workload simulation | +| `--rag-num-docs` | int | 10 | No | Number of RAG documents to ingest | +| `--enable-autoscaling` | flag | False | No | Enable workload autoscaling | +| `--autoscaler-mode` | str | qos | No | Autoscaling strategy: qos, capacity 
| +| `--target-saturation` | float | 0.8 | No | Target storage saturation (0.0-1.0) | +| `--use-burst-trace` | flag | False | No | Use BurstGPT trace for workload | +| `--burst-trace-path` | str | BurstGPT/... | No | Path to BurstGPT trace file | +| `--validation-trace` | str | None | No | Path to validation trace file | +| `--dataset-path` | str | None | No | Path to ShareGPT dataset JSON | +| `--max-conversations` | int | 500 | No | Max conversations from dataset | +| `--output` | str | auto | No | Output JSON file path | +| `--seed` | int | None | **MLPerf** | Random seed for reproducibility | +| `--max-concurrent-allocs` | int | 0 | No | Limit concurrent allocations (0=unlimited) | +| `--request-rate` | float | 0 | No | Target request rate (req/sec, 0=unlimited) | +| `--max-requests` | int | 0 | No | Stop after N requests (0=use duration) | +| `--storage-capacity-gb` | float | 0 | No | NVMe tier capacity in GB (0=auto-detect from disk) | +| `--precondition` | flag | False | No | Write 2× NVMe capacity before benchmark (SSD steady-state) | +| `--precondition-size-gb` | float | 0 | No | Preconditioning volume in GB (0=2x NVMe capacity) | +| `--precondition-threads` | int | 0 | No | Preconditioning writer threads (0=cpu_count) | +| `--xlsx-output` | str | None | No | Excel/CSV output file path | +| `--prefill-only` | flag | False | No | Write-heavy benchmark (skip decode reads) | +| `--decode-only` | flag | False | No | Read-heavy benchmark (pre-populate cache, then read) | + +### Preconditioning vs Prefill-Only vs Decode-Only + +| Feature | `--precondition` | `--prefill-only` | `--decode-only` | +|---------|------------------|------------------|-----------------| +| **Purpose** | Reach SSD steady-state | Benchmark write performance | Benchmark read performance | +| **When** | Before benchmark starts | During benchmark | During benchmark | +| **I/O Pattern** | Sequential writes (fixed 2KB entries) | Write-heavy (+ prefix/multi-turn reads) | Reads from 
pre-populated cache | +| **Data Volume** | 2× NVMe capacity | Depends on duration/users | N/A (reads only) | +| **Stats Reset** | Yes (writes don't count) | No (writes ARE the metric) | Yes (pre-pop doesn't count) | +| **Use Case** | Fair SSD comparison | Prefill node simulation | Decode node simulation | + +**Note on prefill-only reads:** Even in `--prefill-only` mode, reads still occur for: +- Prefix cache hits (shared system prompts) +- Multi-turn conversation history +- RAG document chunks + +For **pure write testing** (no reads), combine flags: +```bash +python3 kv-cache.py --model llama3.1-70b-instruct --prefill-only \ + --disable-multi-turn --disable-prefix-caching \ + --gpu-mem-gb 0 --cpu-mem-gb 0 \ + --num-users 100 --duration 300 --cache-dir /mnt/nvme +``` + +**Example: Full SSD benchmark with preconditioning + pure writes** +```bash +python3 kv-cache.py --model llama3.1-70b-instruct \ + --precondition --prefill-only \ + --disable-multi-turn --disable-prefix-caching \ + --gpu-mem-gb 0 --cpu-mem-gb 0 \ + --num-users 100 --duration 300 --cache-dir /mnt/nvme +``` +This first fills the SSD to steady-state, then measures sustained write throughput with zero reads. + +### Disaggregated Inference Modes + +Modern inference systems often separate prefill and decode into different node pools: + +| Mode | Flag | I/O Pattern | Use Case | +|------|------|-------------|----------| +| Standard | *(none)* | Mixed R/W | Colocated prefill+decode | +| Prefill-only | `--prefill-only` | **Write-heavy** | Prefill nodes, SSD endurance | +| Decode-only | `--decode-only` | **Read-heavy** | Decode nodes, read IOPS/latency | + +**How decode-only works:** Before the benchmark, the cache is pre-populated with `num_users × 10` entries (simulating KV caches from prefill nodes). The benchmark then measures pure read performance. 
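
That warm-up step can be pictured as follows; the write callback, key naming, and per-entry sizing are illustrative placeholders, not the benchmark's real internals:

```python
def prepopulate_for_decode(write_entry, num_users: int,
                           tokens_per_entry: int = 512,
                           bytes_per_token: int = 131_072) -> list:
    """Seed num_users x 10 cache entries so a decode-only run has KV data
    to read, as if prefill nodes had already written it."""
    keys = []
    for i in range(num_users * 10):
        key = f"prepop-{i}"  # hypothetical key scheme
        write_entry(key, tokens_per_entry * bytes_per_token)
        keys.append(key)
    return keys

store = {}
keys = prepopulate_for_decode(lambda k, nbytes: store.__setitem__(k, nbytes),
                              num_users=4)
assert len(keys) == 40  # num_users x 10 entries, as described above
```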
+ +```bash +# Simulate disaggregated prefill node (write-heavy) +python3 kv-cache.py --model llama3.1-70b-instruct --prefill-only \ + --gpu-mem-gb 0 --cpu-mem-gb 0 \ + --num-users 100 --duration 120 --cache-dir /mnt/nvme + +# Simulate disaggregated decode node (read-heavy) +python3 kv-cache.py --model llama3.1-70b-instruct --decode-only \ + --gpu-mem-gb 0 --cpu-mem-gb 0 \ + --num-users 100 --duration 120 --cache-dir /mnt/nvme +``` + +### Supported Models + +The following models are pre-configured. You can add custom models by editing `config.yaml`. + +| Model Key | Name | Layers | Hidden Dim | Heads | KV Heads | KV Cache/Token | +|-----------|------|--------|------------|-------|----------|----------------| +| `tiny-1b` | Tiny 1B | 12 | 1024 | 8 | 4 | ~24 KB | +| `mistral-7b` | Mistral 7B | 32 | 4096 | 32 | 8 | ~128 KB | +| `llama2-7b` | Llama 2 7B | 32 | 4096 | 32 | 32 | ~512 KB | +| `llama3.1-8b` | Llama 3.1 8B | 32 | 4096 | 32 | 8 | ~128 KB | +| `llama3.1-70b-instruct` | Llama 3.1 70B | 80 | 8192 | 64 | 8 | ~320 KB | +| `deepseek-v3` | DeepSeek V3 (MLA) | 61 | 7168 | 128 | N/A | ~69 KB | +| `qwen3-32b` | Qwen 3 32B | 64 | 5120 | 64 | 8 | ~160 KB | +| `gpt-oss-120b` | GPT-OSS 120B (5.1B active) | 36 | 2880 | 64 | 8 | ~72 KB | +| `gpt-oss-20b` | GPT-OSS 20B (3.6B active) | 24 | 2880 | 64 | 8 | ~48 KB | + +#### Adding Custom Models + +Add new models to `config.yaml` under `model_configs`: + +```yaml +model_configs: + my-custom-model: + name: "My Custom Model" + num_layers: 40 + hidden_dim: 5120 + num_heads: 40 + kv_heads: 8 + dtype: "float16" +``` + +Then use it with `--model my-custom-model`. + +### Test Scenarios + +#### Scenario 1: Storage-Only Baseline + +Isolate your NVMe drive by setting GPU memory to zero. This tells you the raw performance of your storage. 
+ +```bash +python3 kv-cache.py \ + --config config.yaml \ + --model llama3.1-8b \ + --num-users 50 \ + --duration 180 \ + --gpu-mem-gb 0 \ + --cpu-mem-gb 4 \ + --generation-mode realistic \ + --cache-dir /mnt/nvme \ + --seed 42 \ + --output results_storage_only.json +``` + +#### Scenario 2: Realistic Production Setup + +Test a balanced three-tier configuration that mirrors production deployment. + +```bash +python3 kv-cache.py \ + --config config.yaml \ + --model llama3.1-8b \ + --num-users 100 \ + --duration 300 \ + --gpu-mem-gb 16 \ + --cpu-mem-gb 32 \ + --generation-mode realistic \ + --cache-dir /mnt/nvme \ + --seed 42 \ + --output results_production.json +``` + +#### Scenario 3: Find Maximum User Count (QoS Mode) + +Let the autoscaler discover how many users your system can handle while maintaining acceptable latency. + +```bash +python3 kv-cache.py \ + --config config.yaml \ + --model llama3.1-8b \ + --num-users 20 \ + --duration 300 \ + --gpu-mem-gb 16 \ + --cpu-mem-gb 32 \ + --enable-autoscaling \ + --autoscaler-mode qos \ + --generation-mode realistic \ + --cache-dir /mnt/nvme \ + --seed 42 \ + --output results_autoscale_qos.json +``` + +#### Scenario 4: Find Peak Storage Throughput (Capacity Mode) + +Discover the absolute maximum I/O your storage can deliver by ignoring latency constraints. + +```bash +python3 kv-cache.py \ + --config config.yaml \ + --model llama3.1-70b-instruct \ + --num-users 10 \ + --duration 180 \ + --gpu-mem-gb 0 \ + --cpu-mem-gb 4 \ + --enable-autoscaling \ + --autoscaler-mode capacity \ + --generation-mode none \ + --cache-dir /mnt/nvme \ + --seed 42 \ + --output results_capacity.json +``` + +#### Scenario 5: Low Cache Hit Rate (Maximum Storage Stress) + +Force cache misses to maximize NVMe I/O pressure. This is useful for stress testing storage subsystems and measuring worst-case performance. 
+ +**Key flags to lower cache hit rate:** +- `--disable-multi-turn`: Each request is independent (no conversation context reuse) +- `--disable-prefix-caching`: No system prompt caching (every request generates fresh KV cache) +- `--cpu-mem-gb 0`: No CPU tier buffer (all evictions go directly to NVMe) +- High user count with synthetic workload: More unique cache entries + +```bash +# Minimal caching - forces nearly all operations to hit NVMe +python3 kv-cache.py \ + --config config.yaml \ + --model llama3.1-8b \ + --num-users 200 \ + --duration 180 \ + --gpu-mem-gb 0 \ + --cpu-mem-gb 0 \ + --disable-multi-turn \ + --disable-prefix-caching \ + --generation-mode none \ + --cache-dir /mnt/nvme \ + --seed 42 \ + --output results_low_hit_rate.json +``` + +**Expected results:** Cache hit rate drops to 10-30% (vs 50-70% with defaults, or 85-97% with ShareGPT). + +For even more aggressive stress testing with the 70B model (2.5× larger KV cache per token): + +```bash +# Maximum NVMe stress - 70B model with no caching +python3 kv-cache.py \ + --config config.yaml \ + --model llama3.1-70b-instruct \ + --num-users 50 \ + --duration 180 \ + --gpu-mem-gb 0 \ + --cpu-mem-gb 0 \ + --disable-multi-turn \ + --disable-prefix-caching \ + --generation-mode none \ + --cache-dir /mnt/nvme \ + --seed 42 \ + --output results_70b_low_hit_rate.json +``` + +| Configuration | Typical Cache Hit Rate | Use Case | +|---------------|------------------------|----------| +| ShareGPT + defaults | 85-97% | Realistic production simulation | +| Synthetic + defaults | 50-70% | Balanced stress testing | +| `--disable-multi-turn` only | 30-50% | Moderate stress | +| `--disable-multi-turn --disable-prefix-caching` | 10-30% | Maximum NVMe stress | +| Above + `--cpu-mem-gb 0` | 5-15% | Worst-case storage scenario | + +--- + +## ShareGPT Replay Workloads + +While synthetic workloads are excellent for controlled stress testing, they may not capture the nuances of real human-AI interaction. 
The **ShareGPT Replay** feature addresses this by loading actual conversation data. + +### Why Use ShareGPT? + +Real conversations exhibit different patterns than synthetic workloads: +- **Higher cache locality**: Users ask follow-up questions, reusing context +- **Variable context sizes**: Real queries vary wildly (10-16,000 tokens) +- **Multi-turn structure**: Conversation flows are preserved + +### Downloading the ShareGPT Dataset + +Download the full dataset from Hugging Face (~1.2 GB): + +```bash +wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json +``` + +**Alternative: Smaller subset for quick testing (~40 MB):** + +```bash +wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split_no_imsorry.json +``` + +### Basic ShareGPT Invocation + +```bash +python3 kv-cache.py \ + --config config.yaml \ + --model llama3.1-8b \ + --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \ + --max-conversations 500 \ + --num-users 50 \ + --duration 300 \ + --gpu-mem-gb 0 \ + --cpu-mem-gb 4 \ + --generation-mode realistic \ + --cache-dir /mnt/nvme \ + --seed 42 \ + --output results_sharegpt.json +``` + +### ShareGPT with Rate Limiting + +Control the request arrival rate for steady-state testing: + +```bash +python3 kv-cache.py \ + --config config.yaml \ + --model llama3.1-70b-instruct \ + --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \ + --max-conversations 1000 \ + --request-rate 10.0 \ + --num-users 100 \ + --duration 600 \ + --gpu-mem-gb 0 \ + --cpu-mem-gb 8 \ + --generation-mode none \ + --cache-dir /mnt/nvme \ + --seed 42 \ + --output results_sharegpt_rate_limited.json +``` + +### ShareGPT with Fixed Request Count + +Run exactly N requests for reproducible benchmarks: + +```bash +python3 kv-cache.py \ + --config config.yaml \ + --model llama3.1-8b \ + --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json 
\ + --max-requests 5000 \ + --num-users 50 \ + --gpu-mem-gb 0 \ + --cpu-mem-gb 4 \ + --generation-mode realistic \ + --cache-dir /mnt/nvme \ + --seed 42 \ + --output results_sharegpt_fixed.json +``` + +### Comparing Real vs Synthetic Workloads + +| Metric | ShareGPT (Real) | Synthetic (Random) | +| :--- | :--- | :--- | +| Mean Context Size | ~133 tokens | ~2,676 tokens | +| Cache Hit Rate | 85-97% | 50-70% | +| Multi-turn Locality | High | Medium | +| Throughput | Higher | Lower | +| NVMe Stress | Moderate | Extreme | + +**Use ShareGPT** when you want to model real chatbot/assistant usage. +**Use Synthetic** when you want worst-case stress testing or controlled experiments. + +--- + +## BurstGPT Trace Replay + +The **BurstGPT Trace Replay** feature drives the benchmark using real production LLM workload traces collected from Azure OpenAI GPT services. Unlike ShareGPT (which provides conversation content), BurstGPT provides request-level token counts and timing from 5.29 million production API calls over 121 days. + +**Paper:** Wang et al., "BurstGPT: A Real-world Workload Dataset to Optimize LLM Serving Systems" (arXiv:2401.17644, KDD '25) + +### Why Use BurstGPT? + +BurstGPT traces capture production workload characteristics that synthetic generation cannot replicate: + +- **Zipf-distributed request lengths**: Many short requests with a long tail of large ones, matching real API usage +- **Bimodal response patterns**: ChatGPT responses cluster around two modes (short and medium) +- **Realistic token distributions**: Average 621 request tokens, 126 response tokens (after filtering failures) +- **Mixed model workloads**: Includes both ChatGPT (GPT-3.5) and GPT-4 request patterns + +### Downloading the BurstGPT Trace + +Clone the official BurstGPT repository from GitHub: + +```bash +git clone https://github.com/HPMLL/BurstGPT.git +``` + +This downloads the trace CSV files into `BurstGPT/data/`. 
The default `--burst-trace-path` points to `BurstGPT/data/BurstGPT_1.csv`, so cloning into your benchmark directory is sufficient. + +| File | Rows | Description | +|------|------|-------------| +| `BurstGPT_1.csv` | 1,429,737 | First 2 months of traces (includes 25K failed requests with 0 response tokens) | + +Each row contains: `Timestamp`, `Model`, `Request tokens`, `Response tokens`, `Total tokens`, `Log Type`. + +The benchmark reads only the `Request tokens` and `Response tokens` columns. Rows with parse errors are silently skipped. + +### Basic BurstGPT Invocation + +```bash +python3 kv-cache.py \ + --config config.yaml \ + --model llama3.1-8b \ + --use-burst-trace \ + --burst-trace-path BurstGPT/data/BurstGPT_1.csv \ + --num-users 50 \ + --duration 300 \ + --gpu-mem-gb 0 \ + --cpu-mem-gb 0 \ + --generation-mode none \ + --cache-dir /mnt/nvme \ + --seed 42 \ + --output results_burstgpt.json +``` + +### BurstGPT with Storage Capacity Tracking + +Track NVMe usage and enable eviction when the drive fills up: + +```bash +python3 kv-cache.py \ + --config config.yaml \ + --model llama3.1-8b \ + --use-burst-trace \ + --burst-trace-path BurstGPT/data/BurstGPT_1.csv \ + --num-users 100 \ + --duration 300 \ + --gpu-mem-gb 0 \ + --cpu-mem-gb 4 \ + --storage-capacity-gb 100 \ + --generation-mode none \ + --cache-dir /mnt/nvme \ + --seed 42 \ + --output results_burstgpt_capped.json +``` + +### BurstGPT with Preconditioning + +Precondition the SSD to steady state before measuring (recommended for consistent results on fresh drives): + +```bash +python3 kv-cache.py \ + --config config.yaml \ + --model llama3.1-8b \ + --use-burst-trace \ + --burst-trace-path BurstGPT/data/BurstGPT_1.csv \ + --num-users 50 \ + --duration 300 \ + --gpu-mem-gb 0 \ + --cpu-mem-gb 0 \ + --storage-capacity-gb 100 \ + --precondition \ + --precondition-size-gb 200 \ + --precondition-threads 16 \ + --generation-mode none \ + --cache-dir /mnt/nvme \ + --seed 42 \ + --output 
results_burstgpt_preconditioned.json +``` + +### BurstGPT Throughput Profile + +Use the throughput performance profile to focus on bandwidth metrics without QoS latency targets: + +```bash +python3 kv-cache.py \ + --config config.yaml \ + --model llama3.1-8b \ + --use-burst-trace \ + --burst-trace-path BurstGPT/data/BurstGPT_1.csv \ + --num-users 100 \ + --duration 300 \ + --gpu-mem-gb 0 \ + --cpu-mem-gb 0 \ + --performance-profile throughput \ + --generation-mode none \ + --cache-dir /mnt/nvme \ + --seed 42 \ + --output results_burstgpt_throughput.json +``` + +### Comparing Workload Sources + +| Metric | Synthetic | ShareGPT | BurstGPT | +|--------|-----------|----------|----------| +| Source | Random from user templates | Real conversations (Hugging Face) | Production API traces (Azure OpenAI) | +| Mean Context Size | ~2,676 tokens | ~133 tokens | ~622 tokens | +| Mean Response Size | ~275 tokens | ~150 tokens | ~126 tokens | +| Request Distribution | Uniform within ranges | Natural conversation | Zipf (many short, long tail) | +| Cache Hit Rate | 50-70% | 85-97% | Varies by trace segment | +| NVMe Stress | Extreme | Moderate | Moderate-High | +| Best For | Worst-case stress testing | Chatbot/assistant simulation | Production workload modeling | + +--- + +## Using the Wrapper Script + +The `kv-cache-wrapper.sh` script automates a complete benchmark suite. It detects your hardware, calculates appropriate parameters, and runs multiple test scenarios. + +### Basic Usage + +```bash +./kv-cache-wrapper.sh +``` + +This runs all test scenarios with default settings. Expect roughly 30 minutes for the full suite. 
+ +### Options + +``` +./kv-cache-wrapper.sh [options] + + -m MODEL Model to benchmark (default: llama3.1-8b) + -t SECONDS Duration for tier comparison tests (default: 120) + -s SECONDS Duration for storage saturation test (default: 180) + -r SECONDS Duration for production test (default: 180) + -a SECONDS Duration for autoscaling tests (default: 300) + -w LIST Comma-separated list of workloads to run + -u USERS Override baseline user count + -U USERS Override high-load user count + -R Enable RAG workload + -D DOCS Number of RAG documents (default: 10) + -h Show help +``` + +### Available Workloads + +```bash +# Run only the storage isolation test +./kv-cache-wrapper.sh -w storage-only + +# Run production and autoscaling tests +./kv-cache-wrapper.sh -w production,autoscale + +# Run MLPerf submission tests +./kv-cache-wrapper.sh -w mlperf_submission +``` + +--- + +## Understanding Results + +### Key Metrics + +**Throughput (tokens/sec)**: How many tokens the system processes per second. Higher is better. + +**Storage Throughput (tokens/sec)**: Raw I/O performance calculated from storage latency, not wall-clock time. This is the fairer metric for comparing storage tiers. + +**End-to-End Latency**: Total time from request submission to completion. This is what users experience. + +**Storage I/O Latency**: Time spent reading from and writing to storage tiers. This measures your hardware. + +**Queue Wait Time**: Time requests spend waiting before processing begins. If this dominates, your system is overloaded. + +**Cache Hit Rate**: Percentage of reads served from cache. Higher rates mean less storage pressure. 
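To make the relationships between these metrics concrete, here is a minimal Python sketch that derives them from per-request records. The record fields (`tokens`, `e2e_ms`, `storage_io_ms`, `cache_hit`) are illustrative names only, not the benchmark's internal data structures:

```python
# Hypothetical per-request records (field names are illustrative).
requests = [
    {"tokens": 180, "e2e_ms": 95.0, "storage_io_ms": 21.0, "cache_hit": True},
    {"tokens": 240, "e2e_ms": 310.0, "storage_io_ms": 88.0, "cache_hit": False},
    {"tokens": 150, "e2e_ms": 44.0, "storage_io_ms": 12.0, "cache_hit": True},
]
elapsed_s = 0.6  # wall-clock duration of this toy window

total_tokens = sum(r["tokens"] for r in requests)

# Wall-clock throughput: includes queue wait and generation time.
avg_throughput = total_tokens / elapsed_s

# Storage throughput: tokens divided by time spent in storage I/O only,
# which is why it is the fairer metric for comparing storage tiers.
storage_time_s = sum(r["storage_io_ms"] for r in requests) / 1000.0
storage_throughput = total_tokens / storage_time_s

# Cache hit rate: fraction of reads served from cache.
hit_rate = sum(r["cache_hit"] for r in requests) / len(requests)

print(f"{avg_throughput:.0f} tok/s wall-clock, "
      f"{storage_throughput:.0f} tok/s storage, hit rate {hit_rate:.2f}")
```

Note that storage throughput exceeds wall-clock throughput whenever requests spend time outside storage I/O (queueing, simulated generation), which is the expected pattern in the real output.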
+ +### Reading the Output + +``` +### STORAGE PERFORMANCE ASSESSMENT: PASS ### + Criteria Passed: 4/4 + [PASS] NVMe Write P95 < 500ms: 45.20ms + [PASS] NVMe Read P95 < 200ms: 123.45ms + [PASS] CPU RAM P95 < 150ms: 12.30ms + [PASS] Cache Hit Rate > 30%: 67.5% + +### OVERALL PERFORMANCE ### + Total Requests: 2847 + Total Tokens Generated: 489,231 + Avg Throughput: 1,630.77 tok/s + Storage Throughput: 2,105.32 tok/s + +### LATENCY BREAKDOWN ### + End-to-End: mean 89.3ms, P50 45.2ms, P95 312.4ms + Storage I/O: mean 23.1ms, P50 12.4ms, P95 89.2ms +``` + +--- + +## Understanding Excel Performance Metrics + +The `--xlsx-output` option exports detailed performance metrics to Excel for analysis. This section provides a comprehensive reference for every metric in the export. + +### Run Parameters (Configuration) + +These columns record the benchmark configuration used for the run: + +| Column | Description | +|--------|-------------| +| **Timestamp** | When the benchmark was executed (YYYY-MM-DD HH:MM:SS) | +| **Model** | Model configuration key (e.g., `llama3.1-8b`, `llama3.1-70b-instruct`) | +| **Num Users** | Number of concurrent simulated users | +| **Duration (s)** | Benchmark duration in seconds | +| **GPU Memory (GB)** | GPU VRAM budget allocated | +| **CPU Memory (GB)** | CPU RAM budget allocated | +| **Generation Mode** | Token generation simulation: `none`, `fast`, or `realistic` | +| **Performance Profile** | Pass/fail criteria: `latency` or `throughput` | +| **Multi-turn** | Whether multi-turn conversation caching was enabled | +| **Prefix Caching** | Whether system prompt prefix caching was enabled | +| **RAG Enabled** | Whether RAG workload simulation was enabled | +| **Autoscaling** | Whether workload autoscaling was enabled | +| **Seed** | Random seed for reproducibility | +| **Max Concurrent Allocs** | Limit on parallel cache allocations (0 = unlimited) | +| **Request Rate** | Target request rate in req/sec (0 = unlimited) | +| **Max Requests** | Stop after 
N requests (0 = use duration) | +| **Dataset Path** | Path to ShareGPT dataset if used | +| **Cache Dir** | Directory used for NVMe cache files | + +--- + +### Throughput Metrics + +| Metric | Unit | What It Measures | Interpretation | +|--------|------|------------------|----------------| +| **Total Requests** | count | Total inference requests completed | Higher = more work done. Compare across runs with same duration. | +| **Total Tokens** | count | Total tokens generated across all requests | Primary workload volume indicator. | +| **Elapsed Time (s)** | seconds | Actual wall-clock benchmark duration | May differ slightly from configured duration. | +| **Avg Throughput (tok/s)** | tokens/sec | `Total Tokens / Elapsed Time` | **Wall-clock throughput.** Includes all overheads (queue wait, generation simulation). **Primary metric when `gpu_mem=0` and `cpu_mem=0`.** | +| **Storage Throughput (tok/s)** | tokens/sec | `Total Tokens / Total Storage I/O Time` | **Pure storage throughput.** Excludes generation simulation time. Useful when `cpu_mem > 0` to isolate storage I/O. | +| **Requests/sec** | req/sec | `Total Requests / Elapsed Time` | Request processing rate. Higher = system handling more concurrent users efficiently. | + +> **Which throughput metric to use?** +> - **When `gpu_mem=0` and `cpu_mem=0`**: Use **Avg Throughput (tok/s)** — all I/O hits the storage tier, so wall-clock throughput directly reflects storage performance. +> - **When `cpu_mem > 0`**: Use **Storage Throughput (tok/s)** to isolate storage I/O from CPU cache hits. +> - **For MLPerf submissions**: Use **Tier Storage Read/Write Bandwidth (GB/s)** as the primary comparison metric (see below). + +--- + +### End-to-End Latency Metrics + +End-to-end (E2E) latency measures the total time from request submission to completion, including queue wait, cache operations, and simulated generation time. 
**This is what users experience.** + +| Metric | What It Measures | +|--------|------------------| +| **E2E Latency Mean (ms)** | Average latency across all requests. Sensitive to outliers. | +| **E2E Latency P50 (ms)** | Median latency. 50% of requests complete within this time. | +| **E2E Latency P95 (ms)** | 95th percentile. 95% of requests complete within this time. **Standard SLA metric.** | +| **E2E Latency P99 (ms)** | 99th percentile. 99% of requests complete within this time. **Tail latency indicator.** | +| **E2E Latency P99.9 (ms)** | 99.9th percentile (3 nines). Captures rare slow requests. | +| **E2E Latency P99.99 (ms)** | 99.99th percentile (4 nines). Extreme tail latency for SLA compliance. | + +> **Interpreting percentiles:** +> - **P50** tells you the typical user experience. +> - **P95** is the standard for SLA definitions ("95% of requests under X ms"). +> - **P99–P99.99** reveal tail latency issues that affect a small but real fraction of users. +> - Large gaps between P95 and P99 indicate inconsistent performance (investigate queue buildup or storage saturation). + +--- + +### Storage I/O Latency Metrics + +Storage latency measures only the time spent on cache read/write operations, excluding queue wait and generation simulation. **This isolates storage subsystem performance.** + +| Metric | What It Measures | +|--------|------------------| +| **Storage Latency Mean (ms)** | Average storage I/O time across all operations. | +| **Storage Latency P50 (ms)** | Median storage I/O time. | +| **Storage Latency P95 (ms)** | 95th percentile storage I/O time. **Key metric for storage evaluation.** | +| **Storage Latency P99 (ms)** | 99th percentile storage I/O time. | +| **Storage Latency P99.9 (ms)** | 99.9th percentile storage I/O time. | +| **Storage Latency P99.99 (ms)** | 99.99th percentile storage I/O time. | + +--- + +### Generation Latency Metrics + +Generation latency measures the simulated GPU token generation time. 
Only meaningful when `--generation-mode` is `fast` or `realistic`. + +| Metric | What It Measures | +|--------|------------------| +| **Gen Latency Mean (ms)** | Average simulated generation time per request. | +| **Gen Latency P50 (ms)** | Median generation time. | +| **Gen Latency P95 (ms)** | 95th percentile generation time. | +| **Gen Latency P99 (ms)** | 99th percentile generation time. | + +> **Note:** With `--generation-mode none`, these values are all 0 (pure storage benchmark). + +--- + +### Storage Tier Latency Breakdown (PRIMARY METRICS) + +These metrics provide granular visibility into storage tier operations. The "storage" tier is device-agnostic—it could be NVMe, SATA SSD, CXL memory, or any block storage device. Each operation is decomposed into: + +- **Total**: Complete operation time (Host + Device) +- **Device**: Actual storage I/O time (`np.save`/`np.load` with fsync) — **PRIMARY LATENCY METRIC** +- **Host**: CPU serialization/deserialization time + +> **⭐ PRIMARY METRICS for MLPerf Storage Comparison:** +> - **Storage Tier Read Device P95 (ms)** — Raw storage read latency +> - **Storage Tier Write Device P95 (ms)** — Raw storage write latency +> - **Tier Storage Read Bandwidth (GB/s)** — Storage read throughput +> - **Tier Storage Write Bandwidth (GB/s)** — Storage write throughput +> +> **What Device Latency Measures:** +> ``` +> Device Latency = [ OS/FS Queue ] + [ Block Layer ] + [ Driver ] + [ Physical I/O ] +> ``` +> The **Storage Tier Read Device P95 (ms)** is the 95th percentile latency of reading one `.npy` file containing the KV cache data for a single cache entry (one request's token sequence). This captures tail latency—95% of reads complete faster than this value, so it reveals worst-case storage behavior under load. 
+ +#### Read Operations (Decode Phase) + +| Metric | Component | What It Measures | +|--------|-----------|------------------| +| **Storage Tier Read Total P50–P99.99 (ms)** | Total | Complete read time including deserialization | +| **Storage Tier Read Device P50–P99.99 (ms)** | Device | **⭐ Raw storage read time (`np.load`) — PRIMARY** | +| **Storage Tier Read Host P50–P99.99 (ms)** | Host | NumPy array deserialization CPU time | + +#### Write Operations (Prefill Phase) + +| Metric | Component | What It Measures | +|--------|-----------|------------------| +| **Storage Tier Write Total P50–P99.99 (ms)** | Total | Complete write time including serialization | +| **Storage Tier Write Device P50–P99.99 (ms)** | Device | **⭐ Raw storage write time (`np.save` + fsync) — PRIMARY** | +| **Storage Tier Write Host P50–P99.99 (ms)** | Host | NumPy array serialization CPU time | + +> **Diagnosing storage bottlenecks:** +> - If **Device >> Host**: Your storage device is the bottleneck. Consider faster storage (NVMe Gen5, CXL). +> - If **Host >> Device**: CPU serialization is the bottleneck. Consider faster CPU or memory bandwidth. +> - Typical ratio: Device should be 60-80% of Total for well-balanced systems. + +--- + +### Cache Statistics + +| Metric | Unit | What It Measures | Good Values | +|--------|------|------------------|-------------| +| **Cache Hit Rate** | ratio (0–1) | Fraction of reads served from cache vs. storage | Higher is better. 0.7+ with multi-turn enabled. | +| **Read/Write Ratio** | ratio | Total reads / Total writes | Higher indicates read-heavy workload (typical for decode phase). | +| **Total Read (GB)** | GB | Total data read from all tiers | Workload volume indicator. | +| **Total Write (GB)** | GB | Total data written to all tiers | Workload volume indicator. 
| + +--- + +### Per-Tier I/O Volume + +These metrics show data movement through each tier of the cache hierarchy: + +| Metric | What It Measures | +|--------|------------------| +| **Tier GPU KV Bytes Written (GB)** | Data written to GPU VRAM tier | +| **Tier GPU KV Bytes Read (GB)** | Data read from GPU VRAM tier | +| **Tier CPU KV Bytes Written (GB)** | Data written to CPU RAM tier | +| **Tier CPU KV Bytes Read (GB)** | Data read from CPU RAM tier | +| **Tier Storage KV Bytes Written (GB)** | Data written to storage tier (NVMe, SATA, CXL, etc.) | +| **Tier Storage KV Bytes Read (GB)** | Data read from storage tier (NVMe, SATA, CXL, etc.) | + +> **Analyzing tier distribution:** +> - High GPU/CPU reads with low storage reads = hot data fits in fast tiers (good!) +> - High storage reads = working set exceeds fast tier capacity (consider adding memory) +> - **Tier Storage KV Bytes Read** is a key MLPerf differentiation metric (100% win rate in discovery testing) + +--- + +### Per-Tier Bandwidth (PRIMARY METRICS) + +These metrics measure the actual throughput achieved on each tier. 
**Tier Storage Bandwidth is the primary metric for comparing storage devices.** + +| Metric | Unit | What It Measures | +|--------|------|------------------| +| **Tier GPU Read Bandwidth (GB/s)** | GB/s | GPU VRAM read throughput | +| **Tier GPU Write Bandwidth (GB/s)** | GB/s | GPU VRAM write throughput | +| **Tier CPU Read Bandwidth (GB/s)** | GB/s | CPU RAM read throughput | +| **Tier CPU Write Bandwidth (GB/s)** | GB/s | CPU RAM write throughput | +| **Tier Storage Read Bandwidth (GB/s)** | GB/s | **⭐ Storage tier read throughput — PRIMARY** | +| **Tier Storage Write Bandwidth (GB/s)** | GB/s | **⭐ Storage tier write throughput — PRIMARY** | + +> **Expected bandwidth ranges:** +> - **GPU**: 500–2000 GB/s (HBM2e/HBM3) +> - **CPU**: 50–200 GB/s (DDR4/DDR5) +> - **Storage (NVMe Gen4)**: 3–7 GB/s +> - **Storage (NVMe Gen5)**: 10–14 GB/s +> - **Storage (SATA SSD)**: 0.4–0.6 GB/s +> - **Storage (CXL Memory)**: 30–50 GB/s + +--- + +### Tier Entry Distribution + +| Metric | What It Measures | +|--------|------------------| +| **GPU Entries** | Number of KV cache entries currently in GPU VRAM | +| **CPU Entries** | Number of KV cache entries currently in CPU RAM | +| **Storage Entries** | Number of KV cache entries currently on storage tier | + +> **Interpreting entry counts:** +> - Most entries should be in the fastest available tier for optimal performance. +> - High **Storage Entries** with low **GPU/CPU Entries** indicates memory pressure. +> - When `gpu_mem=0` and `cpu_mem=0`, all entries will be in **Storage Entries**. + +--- + +### Multi-turn Statistics + +| Metric | What It Measures | +|--------|------------------| +| **Multi-turn Hit Rate** | Fraction of requests that reused context from previous conversation turns | + +> **Interpreting Multi-turn Hit Rate:** +> - **High (0.6+)**: Effective conversation context caching. Most requests are follow-ups that reuse existing KV cache entries, reducing redundant computation. Typical for chatbot/assistant workloads. 
+> - **Low (<0.3)**: Indicates one or more of the following: +> - `--disable-multi-turn` is enabled (expected: 0.0) +> - Workload has high conversation turnover (users start new conversations frequently) +> - Single-shot API usage pattern (each request is independent) +> - Memory pressure causing cache eviction before context reuse +> - Short benchmark duration (not enough time for multi-turn patterns to emerge) +> +> **Note:** A low multi-turn hit rate is **not inherently bad**—it depends on your use case. For storage stress testing, low hit rates force more I/O which is often the goal. + +--- + +### Using Excel Metrics for Analysis + +**⭐ Primary Metrics for MLPerf Storage Comparison:** + +| Metric | When to Use | Why | +|--------|-------------|-----| +| **Tier Storage Read Bandwidth (GB/s)** | Always | Direct measure of storage read throughput | +| **Tier Storage Write Bandwidth (GB/s)** | Always | Direct measure of storage write throughput | +| **Storage Tier Read Device P95 (ms)** | Always | Raw storage read latency (excludes CPU overhead) | +| **Storage Tier Write Device P95 (ms)** | Always | Raw storage write latency (excludes CPU overhead) | +| **Avg Throughput (tok/s)** | When `gpu_mem=0, cpu_mem=0` | Wall-clock throughput equals storage throughput | + +**Comparing storage devices:** +1. Run identical benchmarks on each device with `--gpu-mem-gb 0 --cpu-mem-gb 0` +2. Compare **primary metrics**: Tier Storage Read/Write Bandwidth, Storage Tier Device P95 latencies +3. Use **Avg Throughput (tok/s)** as the overall performance score + +**Diagnosing performance issues:** +1. Check **Storage Tier Device P95** vs **Storage Tier Host P95** +2. If Device >> Host: Storage device is the bottleneck +3. If Host >> Device: CPU serialization is the bottleneck + +**Validating cache configuration:** +1. Check **Cache Hit Rate** and **Multi-turn Hit Rate** +2. Low hit rates with enabled caching: Working set too large for memory budget +3. 
Compare **Tier Storage KV Bytes Read** across configurations + +--- + +## Unit Testing + +This package includes a comprehensive pytest-based test suite to verify core functionality without running the full benchmark. + +### Running Tests + +```bash +# Run all tests with verbose output +pytest test_kv_cache.py -v + +# Run with shorter traceback +pytest test_kv_cache.py -v --tb=short + +# Run specific test class +pytest test_kv_cache.py -k "TestModelConfig" -v + +# Run only CPU tests (skip GPU tests if no CUDA) +pytest test_kv_cache.py -v -m "not skipif" +``` + +### Test Coverage + +The test suite covers 23 component categories with ~170+ individual tests: + +| Test Class | Tests | Coverage | +|------------|-------|----------| +| `TestConfigLoader` | 5 | YAML loading, strict schema validation, error on unknown keys, nested key access | +| `TestCfgHelper` | 4 | Global `cfg()` helper, defaults when config not loaded, list value extraction | +| `TestModelConfig` | 4 | Model configurations, KV cache size per token calculations, dtype handling | +| `TestInferenceRequest` | 5 | Request dataclass, automatic cache key generation, phase handling, QoS assignment | +| `TestQoSProfiles` | 5 | QoS levels (interactive/responsive/batch), SLA targets, priority ordering, p999/p9999 extended metrics | +| `TestKVCacheGenerator` | 4 | Reproducible generation with seeds, correct tensor shapes, dtype consistency, precomputed buffers | +| `TestCPUMemoryBackend` | 4 | Write/read/delete/clear operations, timing metadata, data integrity | +| `TestNVMeBackend` | 5 | File I/O operations, .npy format handling, metadata persistence, temp directory cleanup | +| `TestGPUMemoryBackend` | 4 | CUDA tensor placement, device memory management (skipped without GPU) | +| `TestConversationManager` | 4 | Multi-turn conversation tracking, cache key management, LRU eviction | +| `TestUserSimulator` | 3 | User profile generation from templates, QoS distribution validation | +| `TestMultiTierCache` | 5 | 
CPU-only allocation paths, cache access patterns, tier selection logic | +| `TestMultiTierCacheWithGPU` | 4 | GPU tier allocation, waterfall eviction GPU→CPU→NVMe (skipped without GPU) | +| `TestXLSXExport` | 4 | CSV fallback, Excel export, run parameters embedding (skipped without pandas) | +| `TestEnums` | 3 | InferencePhase, GenerationMode, QoSLevel enum values | +| `TestTierLogic` | 3 | Tier ordering (GPU→CPU→NVMe), usage tracking, limit validation | +| `TestConfigDrivenConversationManager` | 2 | ConversationManager respects config.yaml settings | +| `TestConfigDrivenUserSimulator` | 3 | UserSimulator reads user_templates from config | +| `TestStatsNamingConvention` | 2 | `storage_*` naming convention validation for metrics keys | +| `TestGPUMemoryBackendEvictionCallback` | 2 | GPU eviction callback invocation and data passing (skipped without GPU) | +| `TestValidateArgs` | 24 | CLI argument validation: positive integers, ranges, memory limits, cache directory safety, forbidden prefixes | +| `TestPerTierPhaseMetrics` | 7 | Per-tier (GPU/CPU/Storage) KV bytes read/written tracking during prefill/decode phases | +| `TestPerTierPhaseMetricsWithGPU` | 4 | GPU tier metrics tracking, phase-aware read/write separation (skipped without GPU) | + +### Expected Runtime + +- **Without GPU**: ~5-10 seconds +- **With GPU**: ~10-15 seconds + +GPU tests are automatically skipped if CUDA is not available. + +--- + +## Excel Export + +The benchmark can export results directly to Excel or CSV format for analysis. 
+ +### Basic Usage + +```bash +python3 kv-cache.py \ + --config config.yaml \ + --model llama3.1-8b \ + --num-users 50 \ + --duration 120 \ + --gpu-mem-gb 0 \ + --cpu-mem-gb 4 \ + --seed 42 \ + --output results.json \ + --xlsx-output results.xlsx +``` + +### Output Format + +The Excel file contains a single row with all key metrics: + +| Column | Description | +|--------|-------------| +| Model | Model configuration used | +| Num Users | Concurrent user count | +| Duration (s) | Benchmark duration | +| GPU Mem (GB) | GPU memory budget | +| CPU Mem (GB) | CPU memory budget | +| Total Requests | Requests completed | +| Total Tokens | Tokens processed | +| Avg Throughput (tok/s) | Wall-clock throughput | +| Storage Throughput (tok/s) | Storage I/O throughput | +| Cache Hit Rate | Percentage of cache hits | +| E2E Latency P95 (ms) | End-to-end 95th percentile | +| Storage IO P95 (ms) | Storage I/O 95th percentile | + +### Fallback Behavior + +- **With openpyxl**: Exports to `.xlsx` format +- **Without openpyxl**: Falls back to `.csv` format +- **Without pandas**: Export is skipped with a warning + +--- + +## MLPerf Submission Guidelines + +For official MLPerf v3.0 storage submissions, use these standardized commands. **These invocations have been validated through extensive discovery testing** (1,411 Fast system tests, 268 Slow system tests comparing 14,000 MB/s vs 3,000 MB/s storage). + +### Discovery Test Key Findings + +| Finding | Impact | +|---------|--------| +| **Metric selection depends on cpu_mem** | Storage Throughput shows only 1.1x at cpu_mem=0GB but 2.2x at cpu_mem=4GB | +| **Best models for differentiation** | llama3.1-8b and mistral-7b show 2.31x ratio | +| **High variance observed** | CV 50-125%, requires 3-5 trials minimum | +| **100% win rate metrics** | Decode Bytes Read and Wall-Clock Throughput at cpu_mem=0GB | + +### Option 1: Maximum Storage Stress (cpu_mem=0GB) + +Use when you want to stress test NVMe and measure I/O volume differentiation. 
+ +**Primary Metrics:** Decode Bytes Read (2.62x differentiation), Wall-Clock Throughput (2.43x differentiation) + +```bash +# MLPerf v3.0: Maximum Storage Stress Test (8B Model) +# Run 3-5 trials for statistical significance +for trial in 1 2 3 4 5; do + python3 kv-cache.py \ + --config config.yaml \ + --model llama3.1-8b \ + --num-users 200 \ + --duration 300 \ + --gpu-mem-gb 0 \ + --cpu-mem-gb 0 \ + --max-concurrent-allocs 16 \ + --generation-mode none \ + --cache-dir /mnt/nvme \ + --seed 42 \ + --output mlperf_v3_stress_8b_trial${trial}.json +done +``` + +**⚠️ Important:** At cpu_mem=0GB, do NOT use Storage Throughput as your primary metric—use Decode Bytes Read or Wall-Clock Throughput instead. + +### Option 2: Storage Throughput Focus (cpu_mem=4GB) + +Use when you want Storage Throughput (tok/s) as your primary metric. + +**Primary Metric:** Storage Throughput (2.2x differentiation, 97% win rate) + +```bash +# MLPerf v3.0: Storage Throughput Test (8B Model) +for trial in 1 2 3 4 5; do + python3 kv-cache.py \ + --config config.yaml \ + --model llama3.1-8b \ + --num-users 100 \ + --duration 300 \ + --gpu-mem-gb 0 \ + --cpu-mem-gb 4 \ + --max-concurrent-allocs 0 \ + --generation-mode none \ + --cache-dir /mnt/nvme \ + --seed 42 \ + --output mlperf_v3_throughput_8b_trial${trial}.json +done +``` + +### Option 3: Large Model Submission (70B) + +For maximum per-request storage stress (2.5× larger KV cache per token: 320 KB vs 128 KB): + +```bash +# MLPerf v3.0: Large Model Storage Stress +for trial in 1 2 3; do + python3 kv-cache.py \ + --config config.yaml \ + --model llama3.1-70b-instruct \ + --num-users 70 \ + --duration 300 \ + --gpu-mem-gb 0 \ + --cpu-mem-gb 0 \ + --max-concurrent-allocs 4 \ + --generation-mode none \ + --cache-dir /mnt/nvme \ + --seed 42 \ + --output mlperf_v3_stress_70b_trial${trial}.json +done +``` + +### Critical Parameters (Discovery-Validated) + +| Parameter | Value | Rationale | +|-----------|-------|-----------| +| **--config 
config.yaml** | Required | Ensures consistent internal settings | +| **--seed 42** | Required | Reproducibility across systems | +| **--gpu-mem-gb 0** | Required | Isolates storage performance | +| **--cpu-mem-gb** | 0 or 4 | 0GB for max stress (use I/O volume metrics), 4GB for Storage Throughput metric | +| **--max-concurrent-allocs** | 0, 4, or 16 | 0 for throughput, 16 for stress testing | +| **--generation-mode** | none or realistic | none for pure I/O, realistic for production simulation | +| **--num-users** | 100-200 | Differentiation stable across range; higher = more throughput | +| **--duration** | 300-600 | 5-10 minutes for stable metrics | + +### Trial Requirements + +| User Count | Variance (CV) | Minimum Trials | +|------------|---------------|----------------| +| 10 users | ~52% | 3 | +| 50-100 users | ~115-125% | 3-5 | +| 200 users | ~110-120% | 3-5 | + +Report **median** rather than mean for publication-quality results. + +--- + +## Troubleshooting + +### Out of Memory Errors + +Reduce the number of concurrent users or limit parallel allocations: + +```bash +python3 kv-cache.py --config config.yaml ... --max-concurrent-allocs 50 +``` + +### Benchmark Hangs + +The system may be thrashing. Reduce users or increase memory budgets. + +### Poor Cache Hit Rates + +Low hit rates indicate your working set exceeds available fast memory. Either: +- Increase GPU/CPU memory budgets +- Reduce user count +- Accept that cold data will hit storage + +### Results Vary Between Runs + +Use the `--seed` flag for reproducible results. + +### Configuration Validation Errors + +If you see "Unknown configuration key" errors, check your `config.yaml` for typos. The benchmark uses strict schema validation to prevent silent misconfigurations. 
+ +--- + +## Files in This Package + +- `kv-cache.py`: Main benchmark implementation with ShareGPT and BurstGPT support +- `config.yaml`: YAML configuration file for internal parameters +- `test_kv_cache.py`: Pytest unit test suite +- `requirements.txt`: Python dependencies +- `BurstGPT/`: BurstGPT trace dataset (clone from https://github.com/HPMLL/BurstGPT) +- `README.md`: This documentation +- `MLperf v3 KV cache proposal.md`: Detailed technical documentation + +--- + +## License + +Apache License 2.0 + +--- + +## Contact + +For questions or feedback, open an issue on the repository or contact the MLPerf Storage Working Group. diff --git a/kv_cache_benchmark/config.yaml b/kv_cache_benchmark/config.yaml new file mode 100644 index 00000000..f46f6beb --- /dev/null +++ b/kv_cache_benchmark/config.yaml @@ -0,0 +1,357 @@ +# MLPerf v3.0 KV Cache Benchmark Configuration +# ============================================= +# This file contains all configurable parameters for the benchmark. +# Edit values here instead of modifying the Python source code. +# +# Usage: python kv-cache-01-26-2026.py --config config.yaml [other args] +# +# YAML values are overridden by CLI arguments when both are specified. +# Unknown keys will raise an error to prevent silent misconfigurations. + +# ============================================================================= +# USER TEMPLATES +# Defines behavior patterns for different user personas in the simulation. 
+# context_range: [min, max] tokens in the input prompt +# generation_range: [min, max] tokens to generate in the response +# think_time_range: [min, max] seconds between requests (simulated user delay) +# +# Sources: +# [1] OpenRouter "State of AI: An Empirical 100T Token Study" (arXiv:2601.10088) +# - Avg prompt tokens grew ~4x from ~1,500 to >6,000 (early 2024 → late 2025) +# - Avg completion tokens grew ~3x from ~150 to ~400 +# - Programming workloads routinely exceed 20K input tokens +# - Non-programming categories remain "relatively flat and low-volume" +# - Overall input:output ratio ~15:1 +# [2] BurstGPT (arXiv:2401.17644) — 10.31M traces from Azure OpenAI GPT +# - Request lengths follow a Zipf distribution (many short, long tail) +# - ChatGPT response lengths are bimodal with linear request-response +# correlation +# ============================================================================= +user_templates: + chatbot: + # General-purpose conversational use. Non-programming categories stay + # well below the platform average of ~6K input tokens [1]. Zipf-shaped + # request distribution means most chatbot prompts are short [2]. + context_range: [512, 4096] + # Completions average ~400 tokens across all categories [1]. + generation_range: [50, 200] + think_time_range: [0.1, 0.5] + coding: + # Programming is the dominant context-length driver, "routinely exceeding + # 20K input tokens" and averaging 3-4x general-purpose prompts [1]. + # Claude handles ~60% of coding workloads at >20K avg prompt tokens [1]. + context_range: [4096, 25000] + # Output stays modest relative to input (~15:1 input:output ratio) [1]. + generation_range: [100, 500] + think_time_range: [0.2, 1.0] + document: + # Long-context document analysis (summarization, Q&A over files). + # Sits between chatbot and coding; context-heavy but below coding peaks. + # Overall avg sequence length >5,400 tokens by late 2025 [1]. 
+ context_range: [4096, 16384] + # Longer outputs for summaries/analysis; still within ~400 avg [1]. + generation_range: [200, 800] + think_time_range: [0.3, 1.5] + +# ============================================================================= +# TOKEN GENERATION TIMING +# Simulates GPU processing time per token for different modes. +# Values in seconds per token. +# - none: Pure storage benchmark (0 delay, 100% I/O latency) +# - fast: Fast GPU simulation (2ms/token) +# - realistic: Realistic GPU simulation (30ms/token) +# ============================================================================= +generation_timing: + none: 0.0 + fast: 0.002 + realistic: 0.030 + +# ============================================================================= +# QOS PROFILES (Quality of Service) +# Defines SLA targets for different priority levels. +# All latency values in milliseconds. +# priority: Higher number = higher priority (3 > 2 > 1) +# ============================================================================= +qos_profiles: + interactive: + # Highest priority - real-time applications like chatbots + target_latency_p95_ms: 50 + target_latency_p99_ms: 100 + target_latency_p999_ms: 150 # 3 nines (99.9%) + target_latency_p9999_ms: 200 # 4 nines (99.99%) + priority: 3 + responsive: + # Medium priority - near real-time tasks + target_latency_p95_ms: 100 + target_latency_p99_ms: 200 + target_latency_p999_ms: 350 + target_latency_p9999_ms: 500 + priority: 2 + batch: + # Low priority - offline/background processing + target_latency_p95_ms: 1000 + target_latency_p99_ms: 5000 + target_latency_p999_ms: 7500 + target_latency_p9999_ms: 10000 + priority: 1 + +# ============================================================================= +# QOS DISTRIBUTION +# Controls how requests are distributed across QoS levels. 
+# interactive_probability: Fraction of requests that are INTERACTIVE (default 15%) +# responsive_threshold: Cumulative threshold - if rand < this and not INTERACTIVE, use RESPONSIVE +# Example: 0.15 interactive, 0.50 threshold → 15% INTERACTIVE, 35% RESPONSIVE, 50% BATCH +# ============================================================================= +qos_distribution: + interactive_probability: 0.15 + responsive_threshold: 0.50 + +# ============================================================================= +# EVICTION SETTINGS +# Controls the multi-tier LRU eviction behavior. +# ============================================================================= +eviction: + max_recursion_depth: 10 + target_usage_ratio: 0.8 # Try to keep tier at 80% capacity (20% buffer) + large_entry_limit_ratio: 0.95 # Skip to next tier if entry > 95% of tier capacity + max_evictions_hard_cap: 5000 # Safety limit per eviction cycle + max_evictions_min: 1000 # Minimum evictions before giving up + +# ============================================================================= +# GPU BACKEND SETTINGS +# Controls GPU memory allocation and OOM handling. +# ============================================================================= +gpu_backend: + memory_fraction: 0.9 # Use 90% of GPU memory + max_eviction_attempts: 100 # Max evictions during OOM recovery + free_memory_threshold: 0.1 # Keep 10% GPU memory free + +# ============================================================================= +# PREFIX CACHE SETTINGS +# Controls hierarchical prefix caching for system prompts. 
+# ============================================================================= +prefix_cache: + min_prefix_length: 50 # Minimum tokens for prefix matching + max_prefix_entries: 1000 # Max cached prefix entries + system_prompt_hit_probability: 0.2 # 20% of requests have common system prompt + +# ============================================================================= +# RAG SETTINGS +# Controls Retrieval-Augmented Generation workload simulation. +# +# retrieval_distribution options: +# - "zipfian": Earlier chunks more likely (realistic - document intros are often relevant) +# - "uniform": All chunks equally likely (random access pattern) +# - "random": Alias for uniform +# +# Document token ranges are model-aware: +# - Large models (hidden_dim >= 8192 or layers >= 64) have bigger per-token KV cache, +# so we use fewer tokens per document to avoid memory pressure. +# - Smaller models can handle larger documents. +# ============================================================================= +rag: + chunk_size_tokens: 512 # Tokens per document chunk + top_k_chunks: 5 # Number of chunks retrieved per query + max_chunk_bytes: 268435456 # 256MB max per chunk (256 * 1024 * 1024) + request_probability: 0.1 # Probability of RAG operation per request (0.0-1.0) + retrieval_distribution: "zipfian" # Distribution for chunk selection: zipfian, uniform, random + max_documents: 0 # Max documents before eviction (0 = unlimited) + # Document token ranges (model-aware sizing) + large_model_doc_tokens_min: 1024 # Min tokens for large models (70B+) + large_model_doc_tokens_max: 4096 # Max tokens for large models + small_model_doc_tokens_min: 4000 # Min tokens for smaller models + small_model_doc_tokens_max: 12000 # Max tokens for smaller models + +# ============================================================================= +# CONVERSATION SETTINGS +# Controls multi-turn conversation behavior. 
+# ============================================================================= +conversation: + max_conversations: 1000 # Max active conversations in memory + max_turns_per_conv: 50 # Max turns before conversation reset + end_conversation_probability: 0.2 # 20% chance to end conversation each turn + +# ============================================================================= +# AUTOSCALER SETTINGS +# Controls workload autoscaling to find saturation point. +# ============================================================================= +autoscaler: + min_users: 1 + max_users: 10000 + scale_up_factor: 1.2 # Increase users by 20% when scaling up + scale_down_factor: 0.8 # Decrease users by 20% when scaling down + consecutive_samples_required: 2 # Samples needed before scale action + +# ============================================================================= +# DECODE PHASE SETTINGS +# Controls token generation batching. +# ============================================================================= +decode: + batch_size: 32 # Tokens per decode batch + +# ============================================================================= +# SHAREGPT DATASET SETTINGS +# Controls ShareGPT dataset loading and processing. +# ============================================================================= +sharegpt: + max_context_tokens: 8192 # Truncate context to this length + max_generation_tokens: 2048 # Truncate generation to this length + chars_per_token_estimate: 4 # For tokenization estimation + +# ============================================================================= +# SATURATION DETECTION THRESHOLDS +# Used by StorageMonitor to detect when storage is saturated. 
+# =============================================================================
+saturation_detection:
+  read_latency_p95_threshold_ms: 100
+  write_latency_p95_threshold_ms: 50
+  queue_depth_threshold: 100
+  history_window_size: 10  # Number of samples for trend analysis
+
+# =============================================================================
+# VALIDATION LIMITS
+# Safety limits for CLI argument validation.
+# =============================================================================
+validation_limits:
+  max_users: 100000  # Max simulated users
+  max_duration_seconds: 86400  # 24 hours max benchmark duration
+  max_gpu_memory_gb: 1024  # 1TB max GPU memory
+  max_cpu_memory_gb: 16384  # 16TB max CPU memory
+
+# A dictionary of pre-defined model configurations that can be selected via command line.
+
+model_configs:
+# Formula: 2 × num_layers × 1 × kv_heads × head_dim
+# head_dim = hidden_dim / num_heads
+# Total Bytes = Total Elements × dtype_size (2 for float16)
+
+# Tiny 1B: Synthetic test model (no HuggingFace source — benchmark-internal config)
+# head_dim = 1024 / 8 = 128
+# Total Elements: 2 × 12 × 1 × 4 × 128 = 12288
+# Total Bytes: 12288 × 2 = 24576 bytes
+# KV Cache Size per token: 24576 / (1024³) ≈ 0.000023 GB (0.023 MB)
+  tiny-1b:
+    name: "Tiny 1B"
+    num_layers: 12
+    hidden_dim: 1024
+    num_heads: 8
+    kv_heads: 4
+    dtype: "float16"
+
+# Source: https://huggingface.co/mistralai/Mistral-7B-v0.1/blob/main/config.json
+# head_dim = 4096 / 32 = 128
+# Total Elements: 2 × 32 × 1 × 8 × 128 = 65536
+# Total Bytes: 65536 × 2 = 131072 bytes
+# KV Cache Size per token: 131072 / (1024³) ≈ 0.000122 GB (0.125 MB)
+  mistral-7b:
+    name: "Mistral 7B"
+    num_layers: 32
+    hidden_dim: 4096
+    num_heads: 32
+    kv_heads: 8
+    dtype: "float16"
+
+# Source: https://huggingface.co/meta-llama/Llama-2-7b-hf/blob/main/config.json
+# head_dim = 4096 / 32 = 128
+# Total Elements: 2 × 32 × 1 × 32 × 128 = 262144
+# Total Bytes: 262144 × 2 = 524288 bytes
+# KV Cache Size per token: 524288 / (1024³) ≈ 0.000488 GB (0.500 MB)
+  llama2-7b:
+    name: "Llama 2 7B"
+    num_layers: 32
+    hidden_dim: 4096
+    num_heads: 32
+    kv_heads: 32
+    dtype: "float16"
+
+# Source: https://huggingface.co/meta-llama/Llama-3.1-8B/blob/main/config.json
+# head_dim = 4096 / 32 = 128
+# Total Elements: 2 × 32 × 1 × 8 × 128 = 65536
+# Total Bytes: 65536 × 2 = 131072 bytes
+# KV Cache Size per token: 131072 / (1024³) ≈ 0.000122 GB (0.125 MB)
+  llama3.1-8b:
+    name: "Llama 3.1 8B"
+    num_layers: 32
+    hidden_dim: 4096
+    num_heads: 32
+    kv_heads: 8
+    dtype: "float16"
+
+# Source: https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct/blob/main/config.json
+# head_dim = 8192 / 64 = 128
+# Total Elements: 2 × 80 × 1 × 8 × 128 = 163840
+# Total Bytes: 163840 × 2 = 327680 bytes
+# KV Cache Size per token: 327680 / (1024³) ≈ 0.000305 GB (0.313 MB)
+  llama3.1-70b-instruct:
+    name: "Llama 3.1 70B Instruct"
+    num_layers: 80
+    hidden_dim: 8192
+    num_heads: 64
+    kv_heads: 8
+    dtype: "float16"
+
+# DeepSeek-v3 uses Multi-head Latent Attention (MLA).
+# MLA compresses K and V into a single latent vector (kv_lora_rank=512)
+# plus a decoupled RoPE key (qk_rope_head_dim=64), cached per layer.
+# Formula: num_layers × (kv_lora_rank + qk_rope_head_dim) × dtype_bytes +# Total Elements: 61 × (512 + 64) = 61 × 576 = 35136 +# Total Bytes: 35136 × 2 = 70272 bytes +# KV Cache Size per token: 70272 / (1024³) ≈ 0.000065 GB (0.067 MB) +# Sources: DeepSeek-V3 Technical Report (arXiv:2412.19437), HuggingFace config.json + deepseek-v3: + name: "Deepseek v3" + num_layers: 61 + num_heads: 128 + hidden_dim: 7168 + attention_type: "mla" + kv_lora_rank: 512 + qk_rope_head_dim: 64 + dtype: "float16" + +# Qwen3-32B: head_dim = 128 (explicitly set in HF config, NOT hidden_dim/num_heads=80) +# Source: https://huggingface.co/Qwen/Qwen3-32B/blob/main/config.json +# Total Elements: 2 × 64 × 1 × 8 × 128 = 131072 +# Total Bytes: 131072 × 2 = 262144 bytes +# KV Cache Size per token: 262144 / (1024³) ≈ 0.000244 GB (0.250 MB) + qwen3-32b: + name: "Qwen 3 32B" + num_layers: 64 + num_heads: 64 + hidden_dim: 5120 + kv_heads: 8 + kv_dim_per_head: 128 + dtype: "float16" + +# GPT-OSS 120B: MoE model (117B total, 5.1B active) - fits on single 80GB GPU +# Source: https://huggingface.co/openai/gpt-oss-120b/blob/main/config.json +# Paper: https://arxiv.org/abs/2508.10925 +# head_dim = 64 (explicitly set in config.json, NOT hidden_dim/num_heads=45) +# Total Elements: 2 × 36 × 1 × 8 × 64 = 36864 +# Total Bytes: 36864 × 2 = 73728 bytes +# KV Cache Size per token: 73728 / (1024³) ≈ 0.000069 GB (0.070 MB) + gpt-oss-120b: + name: "GPT-OSS 120B (5.1B active)" + num_layers: 36 + num_heads: 64 + hidden_dim: 2880 + kv_heads: 8 + kv_dim_per_head: 64 + dtype: "float16" + +# GPT-OSS 20B: MoE model (21B total, 3.6B active) - fits in 16GB memory +# Source: https://huggingface.co/openai/gpt-oss-20b/blob/main/config.json +# Paper: https://arxiv.org/abs/2508.10925 +# head_dim = 64 (explicitly set in config.json, NOT hidden_dim/num_heads=45) +# Total Elements: 2 × 24 × 1 × 8 × 64 = 24576 +# Total Bytes: 24576 × 2 = 49152 bytes +# KV Cache Size per token: 49152 / (1024³) ≈ 0.000046 GB (0.047 MB) + gpt-oss-20b: + 
name: "GPT-OSS 20B (3.6B active)" + num_layers: 24 + num_heads: 64 + hidden_dim: 2880 + kv_heads: 8 + kv_dim_per_head: 64 + dtype: "float16" diff --git a/kv_cache_benchmark/discovery_results_and_analysis/Recommended Invocations for MLperf v3.md b/kv_cache_benchmark/discovery_results_and_analysis/Recommended Invocations for MLperf v3.md deleted file mode 100644 index dda0dafa..00000000 --- a/kv_cache_benchmark/discovery_results_and_analysis/Recommended Invocations for MLperf v3.md +++ /dev/null @@ -1,91 +0,0 @@ -## Recommended Invocations by Model - -### Why Two Invocations (cpu_mem=0 vs cpu_mem=4)? - -| cpu_mem | Purpose | Primary Metric | Why | -| -------- | -------------------------------- | ---------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------- | -| **0 GB** | **Maximum Storage Stress** | Decode Bytes Read, Wall-Clock Throughput | All I/O goes through NVMe. 4x more read traffic. True test of storage bandwidth. | -| **4 GB** | **Storage Throughput Benchmark** | Storage Throughput (tok/s) | Some data cached in RAM. Storage Throughput metric works correctly (2.2x ratio). More representative of production inference workloads. 
| - ---- - -### llama2-7b - -| Parameter | cpu_mem=0 (Storage Stress) | cpu_mem=4 (Throughput) | -| ------------------------- | -------------------------- | ---------------------- | -| `--cpu-memory-gb` | **0** | **4** | -| `--max-concurrent-allocs` | **0** | **4** | -| `--users` | **150** | **200** | -| `--duration` | **300** | **300** | -| `--generation-mode` | **none** | **none** | -| **Expected Ratio** | WC Tput: **4.64x** | Stor Tput: **2.34x** | - -```bash -# llama2-7b: Storage Stress (cpu_mem=0) -python kv-cache.py --model llama2-7b --cpu-memory-gb 0 --max-concurrent-allocs 0 --users 150 --duration 300 --generation-mode none --output results/llama2-7b_stress_trial${N}.json - -# llama2-7b: Throughput Benchmark (cpu_mem=4) -python kv-cache.py --model llama2-7b --cpu-memory-gb 4 --max-concurrent-allocs 4 --users 200 --duration 300 --generation-mode none --output results/llama2-7b_tput_trial${N}.json -``` - ---- - -### llama3.1-8b - -| Parameter | cpu_mem=0 (Storage Stress) | cpu_mem=4 (Throughput) | -|-----------|---------------------------|------------------------| -| `--cpu-memory-gb` | **0** | **4** | -| `--max-concurrent-allocs` | **0** | **0** | -| `--users` | **200** | **150** | -| `--duration` | **300** | **300** | -| `--generation-mode` | **none** | **none** | -| **Expected Ratio** | WC Tput: **2.70x** | Stor Tput: **2.87x** | - -```bash -# llama3.1-8b: Storage Stress (cpu_mem=0) -python kv-cache.py --model llama3.1-8b --cpu-memory-gb 0 --max-concurrent-allocs 0 --users 200 --duration 300 --generation-mode none --output results/llama3.1-8b_stress_trial${N}.json - -# llama3.1-8b: Throughput Benchmark (cpu_mem=4) -python kv-cache.py --model llama3.1-8b --cpu-memory-gb 4 --max-concurrent-allocs 0 --users 150 --duration 300 --generation-mode none --output results/llama3.1-8b_tput_trial${N}.json -``` - ---- - -### llama3.1-70b-instruct - -| Parameter | cpu_mem=0 (Storage Stress) | cpu_mem=4 (Throughput) | 
-|-----------|---------------------------|------------------------| -| `--cpu-memory-gb` | **0** | **4** | -| `--max-concurrent-allocs` | **0** | **4** | -| `--users` | **70** | **20** | -| `--duration` | **300** | **300** | -| `--generation-mode` | **none** | **none** | -| **Expected Ratio** | WC Tput: **2.44x** | Stor Tput: **3.25x** | - -```bash -# llama3.1-70b: Storage Stress (cpu_mem=0) -python kv-cache.py --model llama3.1-70b-instruct --cpu-memory-gb 0 --max-concurrent-allocs 0 --users 70 --duration 300 --generation-mode none --output results/llama3.1-70b_stress_trial${N}.json - -# llama3.1-70b: Throughput Benchmark (cpu_mem=4) -python kv-cache.py --model llama3.1-70b-instruct --cpu-memory-gb 4 --max-concurrent-allocs 4 --users 20 --duration 300 --generation-mode none --output results/llama3.1-70b_tput_trial${N}.json -``` - ---- - -## Summary Table - -| Model | Invocation | cpu_mem | mca | users | Primary Metric | Expected Ratio | -|-------|------------|---------|-----|-------|----------------|----------------| -| **llama2-7b** | Stress | 0 | 0 | 150 | WC Throughput | 4.64x | -| **llama2-7b** | Tput | 4 | 4 | 200 | Stor Throughput | 2.34x | -| **llama3.1-8b** | Stress | 0 | 0 | 200 | WC Throughput | 2.70x | -| **llama3.1-8b** | Tput | 4 | 0 | 150 | Stor Throughput | 2.87x | -| **llama3.1-70b** | Stress | 0 | 0 | 70 | WC Throughput | 2.44x | -| **llama3.1-70b** | Tput | 4 | 4 | 20 | Stor Throughput | 3.25x | - -**Notes:** -- **70b model uses fewer users** because larger KV cache = more memory per request -- **mca=0 often best at cpu_mem=0** (no allocation throttling when fully I/O-bound) -- **mca=4 often best at cpu_mem=4** (moderate throttling helps throughput) -- **gen_mode=none** for pure storage benchmark (no simulated token delays) -- **Run 3-5 trials** and report median \ No newline at end of file diff --git a/kv_cache_benchmark/discovery_results_and_analysis/analyze_iostat.py b/kv_cache_benchmark/discovery_results_and_analysis/analyze_iostat.py deleted 
file mode 100644 index e7949799..00000000 --- a/kv_cache_benchmark/discovery_results_and_analysis/analyze_iostat.py +++ /dev/null @@ -1,309 +0,0 @@ -#!/usr/bin/env python3 -""" -Analyze iostat files from kv-cache.py benchmark runs. -Goal: Find configurations that stress storage the most for MLPerf v3 submissions. -""" - -import os -import re -import glob -import pandas as pd -import numpy as np -from collections import defaultdict - -def parse_iostat_file(filepath): - """Parse an iostat file and extract device metrics.""" - metrics = [] - - with open(filepath, 'r') as f: - lines = f.readlines() - - # Find header line and parse subsequent data lines - header_idx = None - for i, line in enumerate(lines): - if line.strip().startswith('Device'): - header_idx = i - # Parse the data line after the header (if it exists and has nvme data) - if i + 1 < len(lines): - data_line = lines[i + 1].strip() - if data_line.startswith('nvme'): - parts = data_line.split() - if len(parts) >= 21: - try: - metrics.append({ - 'device': parts[0], - 'r_s': float(parts[1]), # reads/sec - 'rMB_s': float(parts[2]), # read MB/s - 'r_await': float(parts[5]), # read latency ms - 'rareq_sz': float(parts[6]), # read request size KB - 'w_s': float(parts[7]), # writes/sec - 'wMB_s': float(parts[8]), # write MB/s - 'w_await': float(parts[11]), # write latency ms - 'wareq_sz': float(parts[12]), # write request size KB - 'aqu_sz': float(parts[20]), # average queue size - 'util': float(parts[21]), # utilization % - }) - except (ValueError, IndexError): - pass - - return metrics - -def parse_filename(filename): - """Extract configuration from filename.""" - # iostat_nvme3n1_llama2-7b_cpu0GB_qd32_gennone_users50.txt - basename = os.path.basename(filename) - - m = re.search(r'(llama\d+\.?\d*-\d+b(?:-instruct)?|mistral-\d+b)', basename, re.I) - model = m.group(1).lower().replace('-instruct', '') if m else None - - m = re.search(r'cpu(\d+)GB', basename, re.I) - cpu_mem = int(m.group(1)) if m else None - - m = 
re.search(r'qd(\d+)', basename, re.I) - mca = int(m.group(1)) if m else None - - m = re.search(r'gen(none|realistic)', basename, re.I) - gen_mode = m.group(1).lower() if m else None - - m = re.search(r'users(\d+)', basename, re.I) - users = int(m.group(1)) if m else None - - return { - 'model': model, - 'cpu_mem': cpu_mem, - 'mca': mca, - 'gen_mode': gen_mode, - 'users': users - } - -def analyze_iostat_files(directory): - """Analyze all iostat files in a directory.""" - results = [] - - pattern = os.path.join(directory, 'iostat_*.txt') - files = glob.glob(pattern) - - print(f"Found {len(files)} iostat files") - - for filepath in files: - config = parse_filename(filepath) - metrics = parse_iostat_file(filepath) - - if not metrics: - continue - - # Filter out zero-activity samples (benchmark idle periods) - active_metrics = [m for m in metrics if m['rMB_s'] > 0 or m['wMB_s'] > 0] - - if not active_metrics: - continue - - # Calculate averages - avg = { - 'r_s': np.mean([m['r_s'] for m in active_metrics]), - 'rMB_s': np.mean([m['rMB_s'] for m in active_metrics]), - 'r_await': np.mean([m['r_await'] for m in active_metrics]), - 'w_s': np.mean([m['w_s'] for m in active_metrics]), - 'wMB_s': np.mean([m['wMB_s'] for m in active_metrics]), - 'w_await': np.mean([m['w_await'] for m in active_metrics]), - 'aqu_sz': np.mean([m['aqu_sz'] for m in active_metrics]), - 'util': np.mean([m['util'] for m in active_metrics]), - 'total_MB_s': np.mean([m['rMB_s'] + m['wMB_s'] for m in active_metrics]), - 'total_IOPS': np.mean([m['r_s'] + m['w_s'] for m in active_metrics]), - 'samples': len(active_metrics), - } - - results.append({**config, **avg}) - - return pd.DataFrame(results) - -def main(): - # Analyze fast system iostat files - fast_dir = 'results_fast/results' - - print("=" * 80) - print("IOSTAT ANALYSIS FOR KV-CACHE BENCHMARK") - print("Goal: Find configurations that stress storage the most") - print("=" * 80) - print() - - df = analyze_iostat_files(fast_dir) - - if df.empty: - 
print("No iostat data found!") - return - - print(f"Parsed {len(df)} configurations with iostat data") - print() - - # Sort by total throughput (storage stress indicator) - df_sorted = df.sort_values('total_MB_s', ascending=False) - - print("=" * 80) - print("TOP 20 CONFIGURATIONS BY TOTAL STORAGE THROUGHPUT (MB/s)") - print("=" * 80) - print() - print("| Model | CPU | MCA | Gen | Users | Read MB/s | Write MB/s | Total MB/s | IOPS | Queue | Util% |") - print("|-------|-----|-----|-----|-------|-----------|------------|------------|------|-------|-------|") - - for _, row in df_sorted.head(20).iterrows(): - model_short = str(row['model']).replace('llama', 'L').replace('mistral', 'M') if row['model'] else 'N/A' - print(f"| {model_short} | {int(row['cpu_mem']) if pd.notna(row['cpu_mem']) else 'N/A'} | {int(row['mca']) if pd.notna(row['mca']) else 'N/A'} | {row['gen_mode'] or 'N/A'} | {int(row['users']) if pd.notna(row['users']) else 'N/A'} | {row['rMB_s']:.0f} | {row['wMB_s']:.0f} | {row['total_MB_s']:.0f} | {row['total_IOPS']:.0f} | {row['aqu_sz']:.1f} | {row['util']:.1f} |") - - print() - print("=" * 80) - print("ANALYSIS BY MODEL (Average across all configs)") - print("=" * 80) - print() - - model_agg = df.groupby('model').agg({ - 'rMB_s': 'mean', - 'wMB_s': 'mean', - 'total_MB_s': 'mean', - 'total_IOPS': 'mean', - 'aqu_sz': 'mean', - 'util': 'mean', - 'samples': 'sum' - }).sort_values('total_MB_s', ascending=False) - - print("| Model | Avg Read MB/s | Avg Write MB/s | Avg Total MB/s | Avg IOPS | Avg Queue | Avg Util% | Configs |") - print("|-------|---------------|----------------|----------------|----------|-----------|-----------|---------|") - - for model, row in model_agg.iterrows(): - model_short = str(model).replace('llama', 'L').replace('mistral', 'M') if model else 'N/A' - print(f"| {model_short} | {row['rMB_s']:.0f} | {row['wMB_s']:.0f} | {row['total_MB_s']:.0f} | {row['total_IOPS']:.0f} | {row['aqu_sz']:.1f} | {row['util']:.1f} | {int(row['samples'])} 
|") - - print() - print("=" * 80) - print("ANALYSIS BY CPU MEMORY (Critical for storage stress)") - print("=" * 80) - print() - - cpu_agg = df.groupby('cpu_mem').agg({ - 'rMB_s': 'mean', - 'wMB_s': 'mean', - 'total_MB_s': 'mean', - 'total_IOPS': 'mean', - 'aqu_sz': 'mean', - 'util': 'mean', - 'r_await': 'mean', - 'w_await': 'mean', - 'samples': 'sum' - }).sort_values('total_MB_s', ascending=False) - - print("| CPU Mem | Avg Read MB/s | Avg Write MB/s | Avg Total MB/s | Read Lat ms | Write Lat ms | Queue | Util% |") - print("|---------|---------------|----------------|----------------|-------------|--------------|-------|-------|") - - for cpu_mem, row in cpu_agg.iterrows(): - print(f"| {int(cpu_mem)} GB | {row['rMB_s']:.0f} | {row['wMB_s']:.0f} | {row['total_MB_s']:.0f} | {row['r_await']:.2f} | {row['w_await']:.2f} | {row['aqu_sz']:.1f} | {row['util']:.1f} |") - - print() - print("=" * 80) - print("ANALYSIS BY MAX CONCURRENT ALLOCS (MCA / Queue Depth)") - print("=" * 80) - print() - - mca_agg = df.groupby('mca').agg({ - 'rMB_s': 'mean', - 'wMB_s': 'mean', - 'total_MB_s': 'mean', - 'total_IOPS': 'mean', - 'aqu_sz': 'mean', - 'util': 'mean', - 'samples': 'sum' - }).sort_values('mca') - - print("| MCA | Avg Read MB/s | Avg Write MB/s | Avg Total MB/s | Avg IOPS | Avg Queue | Avg Util% |") - print("|-----|---------------|----------------|----------------|----------|-----------|-----------|") - - for mca, row in mca_agg.iterrows(): - print(f"| {int(mca)} | {row['rMB_s']:.0f} | {row['wMB_s']:.0f} | {row['total_MB_s']:.0f} | {row['total_IOPS']:.0f} | {row['aqu_sz']:.1f} | {row['util']:.1f} |") - - print() - print("=" * 80) - print("ANALYSIS BY USER COUNT") - print("=" * 80) - print() - - user_agg = df.groupby('users').agg({ - 'rMB_s': 'mean', - 'wMB_s': 'mean', - 'total_MB_s': 'mean', - 'total_IOPS': 'mean', - 'aqu_sz': 'mean', - 'util': 'mean', - 'samples': 'sum' - }).sort_values('users') - - print("| Users | Avg Read MB/s | Avg Write MB/s | Avg Total MB/s | Avg IOPS | 
Avg Queue | Avg Util% |") - print("|-------|---------------|----------------|----------------|----------|-----------|-----------|") - - for users, row in user_agg.iterrows(): - print(f"| {int(users)} | {row['rMB_s']:.0f} | {row['wMB_s']:.0f} | {row['total_MB_s']:.0f} | {row['total_IOPS']:.0f} | {row['aqu_sz']:.1f} | {row['util']:.1f} |") - - print() - print("=" * 80) - print("ANALYSIS BY GENERATION MODE") - print("=" * 80) - print() - - gen_agg = df.groupby('gen_mode').agg({ - 'rMB_s': 'mean', - 'wMB_s': 'mean', - 'total_MB_s': 'mean', - 'total_IOPS': 'mean', - 'aqu_sz': 'mean', - 'util': 'mean', - 'samples': 'sum' - }).sort_values('total_MB_s', ascending=False) - - print("| Gen Mode | Avg Read MB/s | Avg Write MB/s | Avg Total MB/s | Avg IOPS | Avg Queue | Avg Util% |") - print("|----------|---------------|----------------|----------------|----------|-----------|-----------|") - - for gen_mode, row in gen_agg.iterrows(): - print(f"| {gen_mode} | {row['rMB_s']:.0f} | {row['wMB_s']:.0f} | {row['total_MB_s']:.0f} | {row['total_IOPS']:.0f} | {row['aqu_sz']:.1f} | {row['util']:.1f} |") - - print() - print("=" * 80) - print("KEY FINDINGS FOR MAXIMUM STORAGE STRESS") - print("=" * 80) - print() - - # Find best config for each dimension - best_throughput = df_sorted.iloc[0] - best_util = df.sort_values('util', ascending=False).iloc[0] - best_queue = df.sort_values('aqu_sz', ascending=False).iloc[0] - - print(f"HIGHEST THROUGHPUT CONFIG:") - print(f" Model: {best_throughput['model']}, cpu_mem: {best_throughput['cpu_mem']}GB, mca: {best_throughput['mca']}, users: {best_throughput['users']}") - print(f" Total: {best_throughput['total_MB_s']:.0f} MB/s (Read: {best_throughput['rMB_s']:.0f}, Write: {best_throughput['wMB_s']:.0f})") - print() - - print(f"HIGHEST UTILIZATION CONFIG:") - print(f" Model: {best_util['model']}, cpu_mem: {best_util['cpu_mem']}GB, mca: {best_util['mca']}, users: {best_util['users']}") - print(f" Utilization: {best_util['util']:.1f}%, Throughput: 
{best_util['total_MB_s']:.0f} MB/s")
-    print()
-
-    print(f"HIGHEST QUEUE DEPTH CONFIG:")
-    print(f" Model: {best_queue['model']}, cpu_mem: {best_queue['cpu_mem']}GB, mca: {best_queue['mca']}, users: {best_queue['users']}")
-    print(f" Queue Depth: {best_queue['aqu_sz']:.1f}, Throughput: {best_queue['total_MB_s']:.0f} MB/s")
-    print()
-
-    # Best by cpu_mem
-    print("BEST CPU_MEM FOR STORAGE STRESS:")
-    best_cpu = cpu_agg['total_MB_s'].idxmax()
-    print(f" cpu_mem={int(best_cpu)}GB: {cpu_agg.loc[best_cpu, 'total_MB_s']:.0f} MB/s average")
-    print()
-
-    # Best by model
-    print("BEST MODEL FOR STORAGE STRESS:")
-    best_model = model_agg['total_MB_s'].idxmax()
-    print(f" {best_model}: {model_agg.loc[best_model, 'total_MB_s']:.0f} MB/s average")
-    print()
-
-    # Save to CSV for further analysis
-    df.to_csv('iostat_analysis.csv', index=False)
-    print("Full data saved to iostat_analysis.csv")
-
-if __name__ == '__main__':
-    main()
diff --git a/kv_cache_benchmark/discovery_results_and_analysis/analyze_iostat_summary.py b/kv_cache_benchmark/discovery_results_and_analysis/analyze_iostat_summary.py
deleted file mode 100644
index 10a28b79..00000000
--- a/kv_cache_benchmark/discovery_results_and_analysis/analyze_iostat_summary.py
+++ /dev/null
@@ -1,94 +0,0 @@
-#!/usr/bin/env python3
-"""Summarize iostat analysis focusing on cpu_mem=0 configurations for maximum storage stress."""
-
-import pandas as pd
-
-df = pd.read_csv('iostat_analysis.csv')
-# Rename columns for convenience
-df = df.rename(columns={'rMB_s': 'read_mbs', 'wMB_s': 'write_mbs', 'total_MB_s': 'total_mbs', 'total_IOPS': 'iops'})
-
-# Sort by read throughput
-top_read = df.nlargest(30, 'read_mbs')
-print('=' * 100)
-print('TOP 30 CONFIGURATIONS BY READ THROUGHPUT (Maximum Storage Stress)')
-print('=' * 100)
-print(f"{'Model':<12} {'CPU':<5} {'MCA':<5} {'Gen':<10} {'Users':<6} {'Read MB/s':>10} {'Write MB/s':>11} {'Util%':>7}")
-print('-' * 100)
-for _, row in top_read.iterrows():
-    print(f"{row['model']:<12} {int(row['cpu_mem']):<5} {int(row['mca']):<5} {row['gen_mode']:<10} {int(row['users']):<6} {row['read_mbs']:>10.0f} {row['write_mbs']:>11.0f} {row['util']:>7.1f}")
-
-print()
-print('=' * 100)
-print('SUMMARY: Optimal Parameters for Maximum Storage Stress (cpu_mem=0 only)')
-print('=' * 100)
-
-# Filter to cpu_mem=0 (maximum storage stress)
-cpu0 = df[df['cpu_mem'] == 0]
-
-print()
-print('BY MODEL (cpu_mem=0):')
-model_avg = cpu0.groupby('model').agg({'read_mbs': 'mean', 'write_mbs': 'mean', 'total_mbs': 'mean', 'util': 'mean', 'model': 'count'})
-model_avg.columns = ['Read MB/s', 'Write MB/s', 'Total MB/s', 'Util%', 'Configs']
-print(model_avg.sort_values('Total MB/s', ascending=False).round(0).to_string())
-
-print()
-print('BY USERS (cpu_mem=0):')
-users_avg = cpu0.groupby('users').agg({'read_mbs': 'mean', 'write_mbs': 'mean', 'total_mbs': 'mean', 'util': 'mean', 'model': 'count'})
-users_avg.columns = ['Read MB/s', 'Write MB/s', 'Total MB/s', 'Util%', 'Configs']
-print(users_avg.sort_values('Total MB/s', ascending=False).round(0).to_string())
-
-print()
-print('BY MCA (cpu_mem=0):')
-mca_avg = cpu0.groupby('mca').agg({'read_mbs': 'mean', 'write_mbs': 'mean', 'total_mbs': 'mean', 'util': 'mean', 'model': 'count'})
-mca_avg.columns = ['Read MB/s', 'Write MB/s', 'Total MB/s', 'Util%', 'Configs']
-print(mca_avg.sort_values('Total MB/s', ascending=False).round(0).to_string())
-
-print()
-print('BY GEN_MODE (cpu_mem=0):')
-gen_avg = cpu0.groupby('gen_mode').agg({'read_mbs': 'mean', 'write_mbs': 'mean', 'total_mbs': 'mean', 'util': 'mean', 'model': 'count'})
-gen_avg.columns = ['Read MB/s', 'Write MB/s', 'Total MB/s', 'Util%', 'Configs']
-print(gen_avg.sort_values('Total MB/s', ascending=False).round(0).to_string())
-
-print()
-print('=' * 100)
-print('OPTIMAL INVOCATION PARAMETERS FOR MAXIMUM STORAGE STRESS')
-print('=' * 100)
-
-# Find best combination
-best = cpu0.nlargest(1, 'total_mbs').iloc[0]
-print(f"""
-RECOMMENDED INVOCATION:
- --model: mistral-7b or llama3.1-8b (both show ~10 GB/s peak throughput)
- --cpu_mem: 0GB (forces all I/O to storage, 6.8x higher read throughput than cpu_mem=4GB)
- --max_concurrent_allocs: 16 or 32 (slight peak at 16)
- --users: 200 (highest throughput) or 150 (good balance)
- --gen_mode: none (slightly higher throughput than realistic)
-
-PEAK CONFIGURATION OBSERVED:
- {best['model']}, cpu_mem={int(best['cpu_mem'])}GB, mca={int(best['mca'])}, gen={best['gen_mode']}, users={int(best['users'])}
- Read: {best['read_mbs']:.0f} MB/s, Write: {best['write_mbs']:.0f} MB/s, Total: {best['total_mbs']:.0f} MB/s
-
-KEY INSIGHT: cpu_mem=0GB is THE critical parameter for storage stress:
- - cpu_mem=0GB: {cpu0['read_mbs'].mean():.0f} MB/s average read throughput
- - cpu_mem=4GB: {df[df['cpu_mem']==4]['read_mbs'].mean():.0f} MB/s average read throughput
- - Ratio: {cpu0['read_mbs'].mean() / df[df['cpu_mem']==4]['read_mbs'].mean():.1f}x more reads with cpu_mem=0
-""")
-
-# Cross-tab analysis: Model x Users for cpu_mem=0
-print('=' * 100)
-print('DETAILED: Model x Users (cpu_mem=0, averaged across MCA and gen_mode)')
-print('=' * 100)
-pivot = cpu0.pivot_table(values='total_mbs', index='model', columns='users', aggfunc='mean').round(0)
-print(pivot.to_string())
-
-print()
-print('=' * 100)
-print('VALIDATION: Comparing cpu_mem settings')
-print('=' * 100)
-cpu_comparison = df.groupby('cpu_mem').agg({
-    'read_mbs': ['mean', 'max'],
-    'write_mbs': ['mean', 'max'],
-    'total_mbs': ['mean', 'max'],
-    'util': 'mean'
-}).round(0)
-print(cpu_comparison.to_string())
diff --git a/kv_cache_benchmark/discovery_results_and_analysis/iostat_analysis.csv b/kv_cache_benchmark/discovery_results_and_analysis/iostat_analysis.csv
deleted file mode 100644
index 9316c9b4..00000000
--- a/kv_cache_benchmark/discovery_results_and_analysis/iostat_analysis.csv
+++ /dev/null
@@ -1,1502 +0,0 @@
-model,cpu_mem,mca,gen_mode,users,r_s,rMB_s,r_await,w_s,wMB_s,w_await,aqu_sz,util,total_MB_s,total_IOPS,samples
-llama2-7b,0,32,none,50,50532.39817258883,6312.818020304568,1.167208121827411,7270.273959390863,903.8692385786803,17.226598984771574,0.0,264.91035532994925,7216.687258883249,57802.6721319797,197
-llama3.1-8b,4,2,realistic,150,99.28274647887324,12.387676056338028,0.011267605633802816,8689.524014084509,1084.3177464788732,3.600985915492958,0.0,31.971549295774647,1096.7054225352113,8788.80676056338,142
-llama3.1-8b,8,8,none,200,111.57461038961037,13.923051948051949,0.012337662337662337,9024.559285714286,1125.9201948051948,3.659935064935065,0.0,33.746623376623376,1139.8432467532466,9136.133896103896,154
-llama3.1-70b,0,64,realistic,50,47980.556627906975,5991.9123255813965,1.070639534883721,5647.586046511628,700.0381976744186,21.03651162790698,0.0,226.55691860465112,6691.950523255812,53628.1426744186,172
-llama2-7b,0,32,realistic,50,48992.74206349206,6120.342645502646,1.0765608465608465,6779.02634920635,840.3319576719576,15.667142857142858,0.0,212.9422751322751,6960.674603174603,55771.76841269842,189
-llama2-7b,4,4,realistic,100,11664.70193877551,1457.6323979591837,0.05413265306122451,8060.580867346939,1006.8423979591837,3.8278061224489797,0.0,36.94433673469388,2464.4747959183674,19725.282806122446,196
-llama2-7b,4,8,none,150,13547.451902439025,1692.8370731707319,0.06473170731707317,8511.724292682928,1063.0841951219513,4.5976585365853655,0.0,46.8110731707317,2755.9212682926827,22059.176195121952,205
-mistral-7b,8,64,realistic,100,80.92756578947369,10.103026315789474,0.008684210526315789,8896.810986842105,1110.38625,3.9217763157894736,0.0,36.91072368421053,1120.4892763157895,8977.73855263158,152
-llama3.1-8b,64,2,realistic,50,104.53343137254902,13.045686274509805,0.014117647058823528,8560.98911764706,1067.7036274509803,3.8093137254901963,0.0,34.26500000000001,1080.74931372549,8665.522549019608,102
-llama3.1-70b,0,4,realistic,30,43448.34561728395,5426.401913580247,0.6878395061728395,5027.767962962963,623.8999382716049,20.506172839506174,0.0,194.58469135802468,6050.301851851853,48476.113580246914,162
-llama3.1-70b,0,16,none,30,44155.827388535035,5514.431783439491,0.8462420382165606,5597.863439490447,694.554458598726,21.548853503184713,0.0,224.043949044586,6208.986242038217,49753.69082802548,157
-llama3.1-70b,4,32,none,60,22982.56105263158,2871.7815789473684,0.24783625730994147,7964.761461988304,992.3827485380116,9.796608187134503,0.0,84.15292397660818,3864.1643274853805,30947.322514619886,171
-llama3.1-70b,16,4,none,20,108.4048717948718,13.533247863247864,0.01,10025.38094017094,1251.904615384615,4.082905982905983,0.0,43.30521367521368,1265.437863247863,10133.785811965812,117
-llama2-7b,4,8,none,100,9136.270687830689,1141.7704761904763,0.04825396825396826,8583.250846560848,1071.8233333333335,4.096931216931217,0.0,39.68677248677248,2213.5938095238093,17719.521534391537,189
-mistral-7b,4,32,none,50,3340.966344827586,417.3002068965517,0.024689655172413793,8878.027379310344,1107.829724137931,4.073172413793103,0.0,37.82227586206897,1525.1299310344828,12218.99372413793,145
-llama3.1-8b,0,4,none,100,64044.993062500005,7986.15375,1.5715,7029.1558749999995,869.3083124999999,9.674687500000001,0.0,202.37025,8855.4620625,71074.1489375,160
-llama3.1-8b,32,16,realistic,100,81.60701492537314,10.184477611940299,0.010746268656716417,8737.090895522388,1090.196567164179,3.9041791044776115,0.0,35.66082089552239,1100.3810447761193,8818.697910447761,134
-llama3.1-8b,64,2,none,50,94.0695575221239,11.739734513274335,0.012743362831858406,8292.367256637168,1034.5261946902656,3.8753097345132748,0.0,34.04628318584071,1046.2659292035398,8386.436814159291,113
-llama3.1-70b,0,16,realistic,70,53317.42304347826,6658.448586956522,1.1995108695652175,6139.104565217392,761.0226630434784,10.09125,0.0,188.68483695652176,7419.47125,59456.52760869565,184
-llama3.1-70b,4,64,none,60,15609.998409090911,1950.3539204545457,0.10846590909090909,7273.432556818181,907.1534090909091,5.624204545454545,0.0,51.210909090909084,2857.507329545455,22883.43096590909,176
-llama3.1-70b,8,2,none,20,7079.776229508197,884.6863114754098,0.04901639344262296,8715.531721311476,1088.2160655737703,4.050655737704918,0.0,39.43590163934427,1972.9023770491806,15795.307950819672,122
-llama2-7b,4,0,none,200,18153.306020408163,2268.197346938776,0.07744897959183673,7407.961071428572,924.5329591836735,3.8831632653061225,0.0,36.37224489795918,3192.730306122449,25561.267091836733,196
-llama2-7b,32,4,realistic,50,99.0407258064516,12.373064516129032,0.008548387096774194,10241.424354838711,1279.3732258064515,3.976290322580645,0.0,41.071209677419354,1291.7462903225808,10340.465080645161,124
-llama2-7b,32,64,none,150,63.268282828282835,7.901363636363636,0.0061111111111111106,8717.671262626263,1088.587575757576,4.303838383838384,0.0,39.33242424242425,1096.4889393939397,8780.939545454545,198
-mistral-7b,8,8,realistic,150,116.01847222222221,14.483958333333334,0.010277777777777778,8918.79125,1112.912986111111,3.670138888888889,0.0,33.741597222222225,1127.3969444444442,9034.809722222222,144
-llama3.1-8b,64,2,none,200,76.3094964028777,9.523309352517986,0.010359712230215827,8314.08690647482,1037.0705035971223,3.5068345323741013,0.0,30.091510791366915,1046.59381294964,8390.396402877699,139
-llama3.1-8b,64,16,realistic,150,76.33985401459853,9.527153284671533,0.010510948905109488,8518.204379562043,1062.7681021897808,3.531824817518248,0.0,31.165036496350364,1072.2952554744527,8594.544233576642,137
-mistral-7b,4,4,none,150,97.99932432432432,12.231013513513513,0.010067567567567567,8990.742094594596,1121.9142567567567,3.7747297297297293,0.0,34.65459459459459,1134.1452702702702,9088.741418918918,148
-llama3.1-8b,16,32,realistic,200,71.85841772151898,8.967848101265822,0.009113924050632912,8792.47417721519,1096.9619620253163,3.6010126582278486,0.0,32.771329113924054,1105.9298101265822,8864.33259493671,158
-llama3.1-70b,8,64,realistic,20,18731.32403100775,2340.755348837209,0.0724031007751938,8046.043410852713,1004.3543410852715,3.6512403100775193,0.0,40.94271317829457,3345.109689922481,26777.367441860464,129
-llama3.1-8b,32,32,none,200,68.01276729559748,8.487924528301887,0.009056603773584906,8644.042012578617,1078.3998113207547,3.564088050314465,0.0,32.00490566037736,1086.8877358490568,8712.054779874214,159
-llama2-7b,16,32,none,200,58.11399141630901,7.2575536480686695,0.0051931330472103,9279.787682403434,1159.0006008583691,4.354334763948498,0.0,40.68866952789699,1166.258154506438,9337.901673819742,233
-mistral-7b,4,16,realistic,100,2127.043401360544,265.65870748299324,0.016666666666666666,8416.29619047619,1050.3581632653063,3.804149659863946,0.0,34.96421768707483,1316.0168707482992,10543.339591836735,147
-mistral-7b,64,8,realistic,100,84.98821138211382,10.61,0.010731707317073172,8750.60292682927,1091.8947967479673,3.8636585365853655,0.0,35.19317073170732,1102.5047967479675,8835.591138211383,123
-llama3.1-8b,16,8,none,50,89.80015625,11.206953125,0.01125,9034.462421875,1127.172578125,3.854609375,0.0,36.09929687499999,1138.3795312500001,9124.262578124999,128
-llama3.1-8b,32,8,realistic,200,73.73362416107382,9.201879194630871,0.009664429530201342,8692.599463087246,1084.5257718120806,3.5615436241610743,0.0,32.101946308724834,1093.7276510067113,8766.333087248322,149
-llama3.1-8b,64,4,realistic,50,102.89815533980583,12.841553398058252,0.013980582524271845,8709.188349514563,1086.4463106796115,3.829708737864078,0.0,35.16106796116505,1099.2878640776698,8812.08650485437,103
-llama3.1-70b,0,32,realistic,60,50544.37754010695,6312.862459893048,1.179946524064171,6012.602032085562,745.1352406417111,15.8479679144385,0.0,217.57919786096255,7057.99770053476,56556.97957219251,187
-llama3.1-70b,8,32,realistic,40,10174.230316455696,1271.3729113924053,0.04367088607594937,8755.397215189874,1093.0053797468354,3.459303797468354,0.0,35.80120253164557,2364.3782911392404,18929.62753164557,158
-llama2-7b,4,64,none,100,24675.954953703702,3083.29587962963,0.35856481481481484,8574.959537037037,1065.4565277777779,10.60712962962963,0.0,99.62189814814813,4148.752407407407,33250.91449074074,216
-llama2-7b,8,64,realistic,150,5495.732636363636,686.7434545454547,0.023000000000000003,8408.968045454545,1050.0777272727273,4.133454545454546,0.0,39.24063636363637,1736.821181818182,13904.700681818182,220
-llama2-7b,32,16,realistic,50,90.659,11.325923076923075,0.008153846153846154,9698.678923076923,1211.4595384615384,4.123307692307692,0.0,40.39353846153847,1222.7854615384615,9789.337923076922,130
-llama2-7b,64,64,none,50,71.72811594202898,8.960942028985507,0.007681159420289856,9232.708260869565,1152.9444927536233,3.9683333333333333,0.0,38.136014492753624,1161.9054347826088,9304.436376811595,138
-mistral-7b,64,16,realistic,100,81.01453125,10.11390625,0.0103125,8418.156015625,1050.5065625000002,3.80703125,0.0,33.865078125000004,1060.62046875,8499.170546875,128
-llama3.1-8b,64,64,realistic,150,72.75950704225352,9.080281690140845,0.010140845070422535,8725.644225352113,1088.5183098591547,3.46887323943662,0.0,31.49598591549296,1097.5985915492959,8798.403732394367,142
-llama3.1-70b,4,16,none,50,21689.15329113924,2710.0866455696205,0.1856962025316455,8145.8010126582285,1016.6701265822785,6.9113924050632916,0.0,68.45354430379747,3726.756772151899,29834.95430379747,158
-llama3.1-70b,8,4,none,30,13619.436315789473,1701.9324812030077,0.06315789473684211,8468.61789473684,1057.3677443609024,3.6929323308270674,0.0,38.348496240601506,2759.30022556391,22088.054210526316,133
-llama3.1-70b,16,0,none,70,80.80773584905661,10.08805031446541,0.007358490566037735,8894.24389937107,1110.489496855346,4.302012578616352,0.0,40.8840251572327,1120.5775471698114,8975.051635220125,159
-llama3.1-8b,4,8,none,100,434.43156028368793,54.246099290780144,0.014822695035460992,8851.211985815604,1104.5366666666669,3.948723404255319,0.0,36.01333333333333,1158.7827659574468,9285.64354609929,141
-llama3.1-70b,8,0,none,30,8447.313288590603,1055.5673154362419,0.040536912751677846,8878.530604026846,1108.5375838926175,3.829731543624161,0.0,40.15879194630873,2164.104899328859,17325.84389261745,149
-llama3.1-70b,16,64,realistic,50,1532.0629487179488,191.44820512820513,0.011538461538461539,9168.209102564104,1144.6476282051283,4.1860897435897435,0.0,41.94628205128205,1336.0958333333333,10700.27205128205,156
-llama2-7b,0,4,none,200,57330.810259259255,7160.573296296297,1.295185185185185,8360.410851851851,1036.871222222222,13.684740740740741,0.0,261.1437037037037,8197.44451851852,65691.22111111111,270
-llama2-7b,8,0,realistic,150,12604.069921259841,1575.0470078740157,0.05811023622047245,9523.344803149606,1189.2688188976379,4.757874015748032,0.0,52.336141732283465,2764.3158267716544,22127.41472440945,127
-llama2-7b,64,32,realistic,150,50.87231155778895,6.355427135678392,0.005326633165829146,8722.105477386935,1089.3592462311558,4.3342211055276385,0.0,40.318743718592955,1095.7146733668344,8772.977788944723,199
-mistral-7b,32,2,none,50,93.8135,11.71175,0.011000000000000001,8769.966583333333,1094.1546666666666,3.8568333333333333,0.0,35.007,1105.866416666667,8863.780083333333,120
-llama3.1-8b,0,0,realistic,150,67451.93000000001,8408.995471698116,1.7383647798742137,8282.086352201257,1024.4915094339622,5.638867924528301,0.0,197.13308176100628,9433.486981132077,75734.01635220126,159
-llama3.1-8b,4,2,none,200,92.21796052631578,11.506184210526316,0.010394736842105264,8931.298289473683,1114.3832894736843,3.640197368421053,0.0,33.260131578947366,1125.8894736842105,9023.51625,152
-llama3.1-70b,16,16,none,60,85.3286301369863,10.652397260273972,0.008013698630136986,9246.76794520548,1154.564520547945,4.222397260273973,0.0,41.28020547945206,1165.216917808219,9332.096575342466,146
-llama2-7b,16,64,none,200,501.6437554585153,62.68842794759825,0.006812227074235808,8537.356681222707,1066.1292576419214,4.410349344978166,0.0,39.66074235807859,1128.8176855895197,9039.000436681223,229
-llama2-7b,64,0,none,100,75.8296551724138,9.469586206896551,0.009586206896551725,10271.108827586208,1282.7317931034481,4.216551724137931,0.0,47.64131034482759,1292.201379310345,10346.938482758622,145
-mistral-7b,32,2,realistic,100,91.8348780487805,11.464715447154472,0.010731707317073172,8608.7,1074.230894308943,3.8705691056910565,0.0,34.118455284552844,1085.6956097560974,8700.53487804878,123
-mistral-7b,64,0,realistic,50,94.98716814159292,11.858230088495576,0.011681415929203541,8770.173362831858,1093.948053097345,3.618053097345133,0.0,33.79938053097345,1105.8062831858408,8865.160530973451,113
-llama2-7b,0,4,none,50,47236.761204188486,5900.5065445026175,1.0826701570680626,7178.875968586388,891.459947643979,22.902565445026173,0.0,306.4117277486911,6791.966492146597,54415.63717277486,191
-llama2-7b,16,64,none,100,404.1495321637427,50.495204678362576,0.008596491228070175,9122.164269005847,1139.3611111111113,4.294619883040936,0.0,40.58777777777777,1189.856315789474,9526.31380116959,171
-llama2-7b,64,8,realistic,200,46.725625,5.837366071428571,0.0047321428571428575,9129.801696428573,1140.2706249999999,4.2395982142857145,0.0,39.16383928571429,1146.1079910714286,9176.527321428572,224
-mistral-7b,32,4,none,200,74.68469798657718,9.323691275167786,0.008859060402684565,8854.363959731543,1104.7404697986576,3.6998657718120804,0.0,33.5593288590604,1114.0641610738255,8929.04865771812,149
-llama3.1-8b,64,2,none,100,86.3630894308943,10.777967479674798,0.011707317073170732,8248.73650406504,1029.1874796747968,3.898617886178862,0.0,33.46902439024391,1039.9654471544716,8335.099593495936,123
-llama3.1-8b,64,4,none,200,74.78070921985815,9.332553191489362,0.010212765957446808,8668.323617021275,1081.4207092198583,3.5829078014184392,0.0,32.18964539007093,1090.7532624113476,8743.104326241135,141
-llama3.1-70b,4,16,realistic,40,29376.50013333333,3670.922133333334,0.2748,8183.232333333334,1019.6297333333334,10.2332,0.0,91.19233333333332,4690.551866666667,37559.73246666667,150
-llama3.1-70b,32,16,none,30,84.39007142857143,10.535285714285715,0.008357142857142856,8947.473285714286,1117.1368571428573,4.035785714285714,0.0,38.76421428571428,1127.672142857143,9031.863357142858,140
-llama2-7b,0,4,none,150,59922.45070175439,7484.628859649122,1.3439035087719298,8705.946666666667,1078.7231140350877,12.968114035087718,0.0,261.0485087719298,8563.35197368421,68628.39736842105,228
-llama2-7b,32,16,none,200,51.263893805309735,6.404336283185841,0.004690265486725664,9222.433053097346,1151.948716814159,4.207522123893805,0.0,40.01137168141593,1158.353053097345,9273.696946902655,226
-mistral-7b,16,0,none,50,84.24222222222221,10.516805555555557,0.009166666666666667,9066.213541666666,1131.1575,3.7508333333333326,0.0,36.065416666666664,1141.6743055555555,9150.455763888887,144
-llama3.1-8b,16,4,realistic,50,93.6091935483871,11.682338709677419,0.01161290322580645,8697.997258064517,1085.1962903225804,3.983145161290322,0.0,36.35379032258064,1096.8786290322578,8791.606451612903,124
-llama3.1-70b,4,0,none,10,19921.657674418602,2489.3871317829457,0.08015503875968992,7961.963798449613,993.9943410852712,3.423720930232558,0.0,35.888294573643414,3483.381472868217,27883.621472868217,129
-llama3.1-70b,32,0,realistic,50,84.23151724137931,10.515448275862068,0.008068965517241379,9415.771862068965,1175.569655172414,4.143586206896551,0.0,41.73172413793104,1186.085103448276,9500.003379310345,145
-llama2-7b,16,8,realistic,100,128.34676829268295,16.029695121951217,0.007621951219512195,9976.07012195122,1246.317682926829,4.3828048780487805,0.0,43.86810975609755,1262.3473780487802,10104.416890243903,164
-llama2-7b,32,0,realistic,150,94.47118421052632,11.795657894736843,0.009144736842105265,10409.387631578948,1296.7926315789475,4.556776315789474,0.0,50.30078947368421,1308.5882894736842,10503.858815789474,152
-llama2-7b,64,16,none,50,84.72454545454545,10.584545454545454,0.008760330578512398,9604.918842975208,1199.7804958677686,3.9718181818181817,0.0,39.69743801652892,1210.365041322314,9689.643388429753,121
-mistral-7b,0,64,realistic,150,69316.34993589744,8641.764487179487,1.7912820512820515,8379.530769230769,1035.6273076923076,5.837692307692308,0.0,204.75974358974358,9677.391794871795,77695.8807051282,156
-mistral-7b,16,32,none,200,74.45387096774193,9.29483870967742,0.008516129032258065,9062.755161290323,1130.805548387097,3.6114193548387097,0.0,33.91529032258065,1140.1003870967743,9137.209032258064,155
-llama3.1-8b,4,64,realistic,200,3625.2150292397664,452.8067251461988,0.02391812865497076,8717.05192982456,1087.6401169590645,3.7844444444444445,0.0,35.396432748538004,1540.4468421052634,12342.266959064327,171
-llama3.1-70b,4,0,realistic,60,13605.855843373492,1699.982168674699,0.08349397590361446,7316.197048192771,913.1773493975903,4.484939759036145,0.0,42.94385542168674,2613.159518072289,20922.052891566265,166
-llama3.1-70b,16,16,none,50,88.40687943262412,11.036666666666667,0.008297872340425531,9420.511914893617,1176.2980851063828,4.375319148936171,0.0,42.33297872340425,1187.3347517730497,9508.918794326242,141
-llama3.1-70b,16,64,realistic,60,73.69808383233533,9.200479041916168,0.007005988023952095,8987.002874251497,1122.0832335329342,4.008023952095808,0.0,39.43682634730539,1131.2837125748504,9060.700958083833,167
-llama2-7b,4,32,none,100,29990.46252631579,3747.6141578947377,0.3185789473684211,7922.705421052632,984.1362631578949,15.226263157894735,0.0,117.47826315789473,4731.750421052632,37913.16794736842,190
-mistral-7b,8,4,realistic,150,90.6227659574468,11.31340425531915,0.009361702127659575,9004.718794326242,1123.5532624113473,3.7238297872340422,0.0,34.64255319148936,1134.8666666666666,9095.341560283689,141
-llama3.1-8b,4,4,none,100,242.90063829787232,30.32673758865248,0.01397163120567376,8792.502978723403,1097.2035460992909,4.058510638297872,0.0,36.19035460992908,1127.5302836879434,9035.403617021278,141
-llama3.1-70b,0,32,none,40,46088.00473684211,5756.121403508772,1.0432748538011694,5982.912456140351,742.1861988304094,22.737192982456136,0.0,257.56099415204676,6498.307602339182,52070.91719298246,171
-llama3.1-70b,4,0,realistic,30,29022.86573333333,3626.4940000000006,0.20106666666666664,6918.000666666667,863.5783999999999,7.359333333333334,0.0,73.70393333333332,4490.072400000001,35940.8664,150
-llama3.1-70b,4,4,realistic,40,32237.74719178082,4028.4395205479454,0.16123287671232875,7842.098767123287,978.5684246575341,6.437876712328767,0.0,66.50397260273972,5007.007945205479,40079.84595890411,146
-llama3.1-70b,8,16,realistic,50,9998.826,1249.4855333333337,0.04040000000000001,8401.791933333334,1048.9982666666665,4.170133333333333,0.0,40.86686666666667,2298.4838000000004,18400.61793333333,150
-mistral-7b,16,2,none,200,80.34221476510066,10.02993288590604,0.008859060402684565,8948.498120805369,1116.5509395973156,3.6856375838926163,0.0,33.819395973154364,1126.5808724832216,9028.84033557047,149
-llama3.1-70b,4,2,realistic,30,24080.112932330827,3009.093984962406,0.08127819548872182,7668.94105263158,956.87007518797,3.734360902255639,0.0,39.46496240601504,3965.9640601503756,31749.053984962404,133
-llama3.1-70b,4,32,realistic,70,10550.523782051281,1318.2557692307694,0.04724358974358974,7495.852884615385,935.4974358974357,3.9751923076923075,0.0,35.744423076923084,2253.753205128205,18046.376666666667,156
-llama2-7b,8,8,realistic,50,14849.880263157893,1855.7880921052633,0.07322368421052632,8661.466644736844,1081.3932894736845,3.7048684210526317,0.0,41.74572368421053,2937.1813815789474,23511.34690789474,152
-llama2-7b,32,0,none,200,552.6995833333334,68.98875,0.050416666666666665,3582.9625,442.9804166666666,1.1991666666666667,0.0,3.6766666666666663,511.9691666666666,4135.662083333334,24
-llama3.1-8b,8,2,realistic,50,98.96040322580646,12.350161290322582,0.01161290322580645,8777.103467741936,1094.8024193548388,3.9383064516129034,0.0,35.7325,1107.1525806451614,8876.06387096774,124
-llama3.1-8b,64,8,realistic,50,100.34095238095237,12.522476190476189,0.013714285714285714,8576.860476190475,1069.8737142857142,3.7579999999999996,0.0,34.583619047619045,1082.3961904761904,8677.201428571429,105
-llama3.1-70b,4,8,realistic,60,25509.192980132448,3187.5323178807953,0.11708609271523177,8185.25701986755,1021.640463576159,5.279403973509934,0.0,53.65933774834438,4209.172781456954,33694.45,151
-llama3.1-70b,8,2,none,30,12048.659545454546,1505.6663636363637,0.06583333333333334,8312.912272727272,1037.935909090909,3.693257575757576,0.0,39.31401515151514,2543.6022727272725,20361.57181818182,132
-llama3.1-70b,16,32,none,20,942.8475,117.81676470588235,0.011470588235294118,9527.327794117647,1189.7327941176468,4.111838235294118,0.0,42.580588235294115,1307.5495588235294,10470.175294117646,136
-llama3.1-70b,32,8,realistic,10,142.39785714285713,17.776904761904763,0.013928571428571427,9299.708571428571,1161.185,3.383809523809524,0.0,35.44571428571429,1178.9619047619046,9442.106428571427,84
-mistral-7b,4,2,realistic,50,1500.2720634920636,187.3830158730159,0.026269841269841273,8637.26880952381,1077.7907936507938,3.964126984126984,0.0,36.216587301587296,1265.1738095238095,10137.540873015872,126
-mistral-7b,32,16,realistic,50,91.08115702479338,11.370661157024793,0.01090909090909091,8784.157520661158,1095.8769421487605,3.8159504132231405,0.0,35.18413223140496,1107.247603305785,8875.23867768595,121
-llama3.1-70b,0,32,none,70,55992.747472527466,6992.601483516483,1.2910989010989011,6766.607747252747,838.2861538461538,8.31258241758242,0.0,177.60395604395603,7830.887637362637,62759.35521978022,182
-llama3.1-70b,8,64,none,60,6380.274104046242,797.2431791907514,0.028381502890173407,8428.156358381502,1052.2260115606935,4.055722543352601,0.0,39.79028901734104,1849.469190751445,14808.430462427745,173
-llama3.1-70b,16,4,realistic,30,107.84703389830509,13.46364406779661,0.009915254237288135,9936.496440677967,1240.7506779661016,4.177033898305085,0.0,42.91059322033898,1254.2143220338983,10044.343474576272,118
-mistral-7b,8,4,none,200,91.76381578947368,11.453092105263158,0.009605263157894737,9071.45730263158,1131.9609868421053,3.665394736842105,0.0,34.279210526315794,1143.4140789473684,9163.221118421052,152
-llama3.1-8b,0,8,none,100,65077.63961538462,8115.101282051282,1.6146153846153846,7228.83826923077,893.5857692307693,8.285897435897436,0.0,198.60839743589744,9008.687051282051,72306.4778846154,156
-llama3.1-70b,0,2,none,10,40064.56888059702,5005.5354477611945,0.25828358208955227,4630.772611940299,575.1664179104478,3.050970149253731,0.0,41.40171641791045,5580.701865671642,44695.341492537314,134
-llama3.1-70b,8,8,none,30,5551.87992063492,693.7399206349207,0.040952380952380955,9080.168253968253,1133.5646825396825,3.9988095238095234,0.0,41.32650793650794,1827.3046031746032,14632.048174603175,126
-llama3.1-70b,32,4,realistic,70,87.04449275362319,10.866666666666665,0.008478260869565216,9134.433840579712,1140.4757246376814,4.1182608695652165,0.0,39.76079710144927,1151.3423913043478,9221.478333333333,138
-llama3.1-70b,32,32,none,50,78.58691275167786,9.810805369127516,0.00785234899328859,8551.963020134228,1067.723489932886,4.163758389261745,0.0,38.80818791946309,1077.5342953020136,8630.549932885906,149
-llama2-7b,0,64,none,200,51023.93116352201,6373.734213836478,1.3188993710691823,6660.43179245283,817.5661320754718,1.872861635220126,0.0,146.70157232704403,7191.300345911949,57684.36295597485,318
-llama2-7b,8,8,none,150,4965.022954545455,620.453,0.028909090909090912,8562.949,1069.4523181818183,3.938409090909091,0.0,36.94636363636364,1689.9053181818178,13527.971954545455,220
-mistral-7b,16,0,realistic,150,75.54142857142857,9.430621118012422,0.008198757763975155,8998.446832298136,1122.9178260869564,3.661614906832298,0.0,35.02018633540373,1132.3484472049688,9073.988260869564,161
-llama3.1-8b,4,64,realistic,50,3141.032463768116,392.29123188405794,0.04021739130434783,8664.107391304347,1080.9571739130436,3.7099999999999995,0.0,34.78021739130435,1473.2484057971017,11805.139855072464,138
-llama3.1-8b,16,64,none,50,79.34408450704225,9.902042253521126,0.010140845070422535,8976.799225352113,1119.9179577464788,3.8083802816901415,0.0,35.98267605633803,1129.82,9056.143309859155,142
-llama3.1-70b,8,0,none,40,12487.234320987654,1560.44987654321,0.057407407407407414,8512.550246913579,1062.5733950617282,4.159876543209877,0.0,42.95469135802469,2623.0232716049386,20999.78456790123,162
-llama2-7b,16,32,none,100,665.301705882353,83.1315294117647,0.009823529411764705,9277.73094117647,1158.8070588235296,3.6570000000000005,0.0,35.782823529411765,1241.938588235294,9943.032647058824,170
-llama3.1-8b,8,32,none,100,80.8708843537415,10.092585034013604,0.009795918367346938,9016.410612244897,1125.2076190476191,3.9411564625850333,0.0,36.565238095238094,1135.3002040816327,9097.281496598638,147
-llama3.1-8b,16,2,none,50,93.15648,11.62584,0.011519999999999999,8850.08368,1104.0875200000003,3.84512,0.0,35.105599999999995,1115.7133600000004,8943.24016,125
-llama3.1-8b,32,0,realistic,150,72.33174193548388,9.026903225806452,0.00929032258064516,8644.414516129033,1078.5003225806452,3.5361290322580654,0.0,32.28954838709677,1087.5272258064517,8716.746258064515,155
-llama3.1-8b,32,32,none,100,77.36507142857143,9.655071428571429,0.010285714285714285,9092.962285714286,1134.6057857142857,3.7965714285714283,0.0,35.59535714285714,1144.2608571428573,9170.327357142858,140
-llama3.1-8b,64,0,realistic,150,75.47922535211266,9.419718309859155,0.010140845070422535,8649.394225352113,1078.888943661972,3.319788732394366,0.0,30.302816901408452,1088.308661971831,8724.873450704226,142
-llama3.1-70b,4,0,realistic,10,14629.87214876033,1828.1196694214875,0.05933884297520662,7209.791818181819,900.176694214876,2.4324793388429753,0.0,26.46297520661157,2728.2963636363634,21839.66396694215,121
-llama3.1-70b,4,0,none,60,16420.301229050277,2051.59061452514,0.11128491620111733,7737.701452513967,965.8279888268156,5.092513966480448,0.0,49.669776536312845,3017.4186033519554,24158.00268156425,179
-llama3.1-70b,4,16,none,10,34182.518947368415,4271.434511278196,0.1718045112781955,6572.921278195489,819.9228571428571,3.815714285714286,0.0,48.074887218045106,5091.357368421053,40755.44022556391,133
-mistral-7b,0,4,realistic,150,67161.14189873417,8371.64430379747,1.6586708860759494,8400.532721518986,1037.6220886075948,8.972151898734177,0.0,223.868417721519,9409.266392405065,75561.67462025316,158
-llama2-7b,32,4,none,200,52.712139737991265,6.583318777292576,0.00462882096069869,9714.558515283843,1213.3273362445414,4.11235807860262,0.0,40.83065502183406,1219.9106550218341,9767.270655021834,229
-llama2-7b,8,32,none,200,3560.1723305084747,444.85398305084755,0.026440677966101694,8399.852288135593,1048.852754237288,4.165805084745763,0.0,37.245805084745754,1493.7067372881359,11960.024618644065,236
-llama3.1-70b,4,8,none,60,5810.16972027972,725.9281818181818,0.03993006993006993,8710.067902097902,1087.382097902098,4.341398601398602,0.0,40.06601398601399,1813.3102797202796,14520.237622377623,143
-llama3.1-70b,8,2,realistic,40,7674.005970149254,958.9968656716419,0.05201492537313432,8586.769402985075,1072.0994029850747,3.933432835820896,0.0,37.9744776119403,2031.0962686567161,16260.775373134327,134
-llama3.1-70b,8,4,realistic,40,8321.865220588235,1039.915,0.05102941176470589,8640.705367647059,1078.8399264705881,3.842132352941176,0.0,37.25507352941177,2118.7549264705885,16962.570588235296,136
-llama3.1-70b,16,2,none,30,107.35756302521008,13.402521008403362,0.009831932773109243,9783.53411764706,1221.625966386555,4.0598319327731085,0.0,40.944873949579836,1235.0284873949583,9890.891680672268,119
-llama3.1-70b,32,16,none,10,111.57028301886793,13.928396226415096,0.011037735849056603,10286.500471698113,1284.7422641509434,4.38877358490566,0.0,47.153867924528306,1298.6706603773584,10398.07075471698,106
-llama2-7b,4,2,realistic,200,15186.138257261411,1897.7062655601662,0.06439834024896265,7797.14601659751,973.8812033195022,4.124813278008299,0.0,36.58730290456432,2871.5874688796684,22983.28427385892,241
-llama2-7b,8,64,realistic,100,8864.96695652174,1107.7879891304349,0.028586956521739128,8435.694782608694,1053.3089673913043,4.034673913043478,0.0,40.31157608695653,2161.096956521739,17300.661739130435,184
-mistral-7b,0,32,none,200,74958.43547058824,9344.739588235294,2.411882352941176,9076.488764705882,1120.6688235294118,5.7084117647058825,0.0,277.75676470588235,10465.408411764705,84034.92423529412,170
-mistral-7b,8,64,none,200,3315.4583636363636,414.1004242424243,0.030060606060606062,9144.333878787878,1140.931393939394,3.9810303030303036,0.0,39.106909090909085,1555.0318181818182,12459.792242424242,165
-mistral-7b,64,32,none,100,76.44305970149254,9.54320895522388,0.009850746268656717,8680.64723880597,1083.1723880597012,3.789626865671642,0.0,34.70425373134329,1092.7155970149252,8757.090298507463,134
-llama3.1-8b,4,0,realistic,100,5256.685031055901,656.486645962733,0.06260869565217392,8613.25751552795,1074.9471428571428,4.029068322981367,0.0,37.56521739130435,1731.4337888198759,13869.94254658385,161
-llama3.1-8b,64,64,none,200,66.39593548387097,8.286129032258064,0.00929032258064516,8410.208451612903,1049.097935483871,3.360322580645162,0.0,30.439225806451613,1057.384064516129,8476.604387096773,155
-llama3.1-70b,0,2,realistic,30,43592.39328947368,5444.40052631579,0.6452631578947369,5412.492105263157,672.1003947368421,20.048223684210527,0.0,197.45677631578948,6116.5009210526305,49004.88539473684,152
-llama3.1-70b,0,8,none,30,44016.88537974683,5497.2850632911395,0.8413291139240506,5380.789493670886,667.3323417721518,23.091202531645568,0.0,221.6575316455696,6164.617405063292,49397.67487341772,158
-mistral-7b,0,2,realistic,150,59115.598980891715,7368.643821656052,1.3788535031847136,7858.628789808917,972.1666878980891,16.675222929936304,0.0,245.57254777070062,8340.81050955414,66974.22777070063,157
-llama3.1-8b,0,16,realistic,150,68807.32702531645,8576.820379746836,1.7495569620253162,8597.194367088607,1061.5874683544305,6.56493670886076,0.0,213.69632911392407,9638.407848101268,77404.52139240506,158
-llama3.1-8b,8,2,realistic,200,102.0991724137931,12.741103448275862,0.012,8890.225379310345,1109.1102068965517,3.77703448275862,0.0,34.25179310344828,1121.8513103448277,8992.324551724138,145
-llama3.1-70b,16,4,realistic,50,96.30916666666667,12.023257575757576,0.008863636363636363,9308.809393939395,1162.3016666666665,4.246060606060606,0.0,41.15575757575757,1174.324924242424,9405.11856060606,132
-llama3.1-70b,32,64,realistic,10,151.8022077922078,18.95103896103896,0.015194805194805193,10667.496103896103,1332.2303896103901,3.478831168831168,0.0,39.38857142857144,1351.1814285714288,10819.298311688312,77
-llama3.1-8b,0,4,none,150,71614.62550898205,8927.273952095808,1.8763473053892217,9014.535808383234,1112.29125748503,8.395269461077845,0.0,248.63532934131734,10039.565209580838,80629.16131736527,167 -llama3.1-8b,4,32,none,150,669.4886503067485,83.60736196319019,0.014355828220858895,8717.103619631902,1087.7687116564416,3.777852760736196,0.0,33.87006134969325,1171.3760736196318,9386.59226993865,163 -llama3.1-8b,8,2,none,50,94.79279069767442,11.83,0.011162790697674419,8657.196899224806,1080.085503875969,4.076046511627907,0.0,36.51984496124031,1091.9155038759689,8751.98968992248,129 -llama3.1-8b,16,16,realistic,100,85.43619402985075,10.662313432835822,0.010746268656716417,8863.77432835821,1106.0741791044775,3.7970895522388055,0.0,35.0736567164179,1116.7364925373133,8949.210522388059,134 -llama3.1-8b,64,0,none,100,73.2227397260274,9.138082191780823,0.009863013698630137,8460.126232876712,1055.4622602739728,3.493013698630137,0.0,31.37321917808219,1064.6003424657536,8533.34897260274,146 -llama3.1-70b,0,64,none,30,44428.23447204969,5548.728944099379,0.853416149068323,5186.821428571428,643.1818633540372,22.192732919254656,0.0,216.19161490683234,6191.910807453415,49615.05590062112,161 -llama3.1-70b,8,64,realistic,10,6869.647387387387,858.4490090090093,0.04486486486486486,7864.673603603604,981.7275675675676,2.901711711711712,0.0,30.46576576576576,1840.1765765765767,14734.32099099099,111 -llama3.1-70b,32,0,none,20,96.69531746031747,12.071428571428571,0.009285714285714284,9284.820634920634,1159.3450793650795,3.760793650793652,0.0,39.97492063492063,1171.4165079365082,9381.515952380953,126 -llama2-7b,32,8,none,50,96.0183064516129,11.995483870967742,0.008548387096774194,9690.099435483871,1210.447016129032,3.897096774193548,0.0,39.57709677419355,1222.4424999999994,9786.117741935484,124 -mistral-7b,16,64,none,150,73.37038461538462,9.159615384615385,0.008461538461538461,9023.48044871795,1126.0609615384617,3.7691666666666666,0.0,35.56442307692308,1135.2205769230768,9096.850833333334,156 
-llama3.1-8b,0,16,realistic,200,72233.89514450867,9004.431445086706,2.0027167630057803,8886.668901734103,1096.0220231213873,5.604682080924856,0.0,245.38219653179192,10100.45346820809,81120.56404624277,173 -llama3.1-8b,32,0,realistic,200,64.74508670520231,8.080115606936415,0.008323699421965317,8680.259537572254,1082.8096531791907,3.533352601156069,0.0,32.575606936416186,1090.8897687861272,8745.004624277457,173 -llama3.1-70b,0,2,realistic,70,47981.933988439305,5991.547283236994,0.9879768786127167,6437.7727167630055,799.2575722543353,24.35121387283237,0.0,266.0164161849711,6790.804855491329,54419.70670520231,173 -llama3.1-70b,4,2,none,50,16937.809492753622,2116.5189855072463,0.08594202898550725,7994.577536231884,997.7504347826086,4.100942028985507,0.0,39.41152173913044,3114.2694202898556,24932.387028985508,138 -llama2-7b,16,32,realistic,50,2777.1444295302013,347.0597315436242,0.018389261744966443,8581.106979865772,1071.7444295302014,3.7291275167785236,0.0,36.409194630872484,1418.8041610738255,11358.251409395973,149 -llama2-7b,4,4,none,50,16989.824533333336,2123.2146666666667,0.07033333333333334,9128.770733333333,1140.197,4.430133333333333,0.0,49.3562,3263.411666666667,26118.595266666667,150 -llama2-7b,8,8,realistic,150,5224.496175115207,652.8284331797236,0.040645161290322585,8898.442304147466,1111.402857142857,4.155990783410138,0.0,40.18516129032258,1764.231290322581,14122.938479262672,217 -llama2-7b,32,0,none,50,87.30848275862068,10.904896551724137,0.007310344827586207,9275.837379310344,1158.2667586206896,4.043310344827585,0.0,41.850482758620686,1169.171655172414,9363.145862068965,145 -mistral-7b,32,4,none,150,76.82944827586208,9.591448275862069,0.00910344827586207,8563.354000000001,1068.5533793103448,3.7339999999999995,0.0,32.806275862068965,1078.144827586207,8640.183448275862,145 
-mistral-7b,32,16,none,50,87.0831746031746,10.871507936507935,0.010476190476190477,8912.944920634922,1111.9907142857144,3.822460317460318,0.0,35.855317460317465,1122.8622222222225,9000.028095238096,126 -llama3.1-8b,0,2,realistic,150,62907.45280254777,7841.54184713376,1.4735668789808918,8009.6256687898085,990.8893630573249,12.910573248407644,0.0,226.34573248407642,8832.431210191084,70917.07847133758,157 -llama3.1-8b,16,2,none,200,76.4316447368421,9.538552631578947,0.009473684210526315,8615.05480263158,1074.8788815789474,3.6550657894736838,0.0,32.12052631578947,1084.4174342105262,8691.486447368421,152 -llama3.1-8b,32,8,none,200,72.05756578947368,8.992697368421053,0.009473684210526315,8846.615789473684,1103.7246710526315,3.424342105263158,0.0,30.872631578947363,1112.7173684210525,8918.673355263158,152 -llama3.1-8b,32,64,realistic,100,72.94966216216216,9.104054054054055,0.00972972972972973,8704.406486486487,1086.0794594594595,3.6912162162162168,0.0,33.84972972972973,1095.1835135135136,8777.356148648649,148 -llama3.1-70b,0,8,realistic,50,47130.28258823529,5885.814176470588,1.0881176470588234,6179.510705882353,766.2292352941176,23.116235294117647,0.0,263.3725882352941,6652.0434117647055,53309.79329411764,170 -llama3.1-70b,4,4,none,70,18590.6952739726,2322.815273972603,0.09171232876712329,8447.718630136986,1054.4935616438356,4.783630136986302,0.0,45.81938356164384,3377.3088356164385,27038.413904109588,146 -llama3.1-70b,4,32,none,10,31733.024503816792,3965.3183969465654,0.1593129770992366,6740.567251908397,841.1186259541985,3.5309160305343514,0.0,46.836717557251916,4806.437022900764,38473.59175572519,131 -llama2-7b,64,0,realistic,50,84.70068702290077,10.581603053435115,0.008091603053435115,9404.29465648855,1174.4578625954198,3.9038931297709922,0.0,39.11641221374046,1185.0394656488547,9488.995343511451,131 
-llama2-7b,32,64,none,50,81.79985401459854,10.21919708029197,0.007737226277372263,9666.796715328466,1207.372700729927,4.019124087591241,0.0,40.39102189781022,1217.591897810219,9748.596569343066,137 -llama2-7b,16,2,none,50,779.4932231404958,97.41280991735537,0.011239669421487604,10604.332975206611,1324.720165289256,3.6871074380165294,0.0,39.04322314049586,1422.1329752066115,11383.826198347107,121 -llama3.1-8b,0,16,none,100,64560.16743749999,8050.8635625,1.6056874999999997,7417.24675,916.9279374999999,7.4953125,0.0,193.39968749999997,8967.7915,71977.41418749999,160 -llama3.1-70b,8,16,realistic,10,8298.665545454545,1037.0065454545456,0.035272727272727275,7656.503727272728,956.0589090909092,2.9939999999999998,0.0,28.01472727272727,1993.0654545454545,15955.169272727273,110 -llama2-7b,4,0,realistic,50,28034.292947368423,3503.113263157895,0.5668947368421053,8255.739210526315,1025.6802631578948,5.755,0.0,98.67210526315787,4528.79352631579,36290.03215789473,190 -llama2-7b,4,2,none,50,27966.75448717949,3495.025,0.12153846153846154,7814.598333333332,975.950641025641,4.641217948717949,0.0,49.69115384615385,4470.975641025641,35781.352820512824,156 -mistral-7b,32,32,none,50,81.17776119402986,10.134253731343284,0.009850746268656717,8840.119328358209,1102.848358208955,3.7623880597014923,0.0,35.04619402985075,1112.9826119402983,8921.29708955224,134 -llama3.1-8b,32,8,none,100,82.47654135338345,10.293007518796992,0.010827067669172932,8843.00932330827,1103.4920300751883,3.9975939849624065,0.0,36.13225563909774,1113.7850375939852,8925.485864661656,133 -llama3.1-70b,8,32,none,70,1948.101633986928,243.41176470588235,0.01457516339869281,9024.4345751634,1126.6605882352942,4.330261437908496,0.0,42.59803921568628,1370.0723529411764,10972.536209150327,153 -llama3.1-70b,16,64,realistic,70,74.99469512195121,9.362317073170733,0.007134146341463414,9167.97798780488,1144.7194512195122,4.31939024390244,0.0,42.309390243902435,1154.081768292683,9242.97268292683,164 
-llama2-7b,4,2,realistic,150,15063.438403755868,1882.3916901408454,0.06441314553990611,7694.50192488263,961.0258685446009,4.023333333333333,0.0,35.385399061032864,2843.4175586854462,22757.9403286385,213 -llama2-7b,4,4,realistic,200,9252.162051282052,1156.1682051282053,0.04901709401709402,8480.752094017094,1059.2118803418805,4.22525641025641,0.0,39.4107264957265,2215.3800854700858,17732.914145299146,234 -llama2-7b,4,16,none,150,30922.67763948498,3864.1702575107292,0.22326180257510728,7659.512703862661,955.8748497854078,8.486652360515022,0.0,71.71948497854078,4820.045107296138,38582.19034334764,233 -mistral-7b,4,16,none,200,305.49483660130716,38.133790849673204,0.013856209150326797,9041.659477124182,1128.3103267973854,3.7937254901960786,0.0,35.27928104575163,1166.4441176470589,9347.154313725488,153 -mistral-7b,16,64,none,100,73.93206451612903,9.22974193548387,0.008516129032258065,8885.769741935484,1108.9605161290322,3.940516129032258,0.0,36.715548387096774,1118.190258064516,8959.701806451612,155 -mistral-7b,64,16,none,200,72.07629370629371,8.998041958041957,0.009230769230769232,8670.674335664335,1081.9243356643356,3.7023076923076927,0.0,33.21202797202797,1090.9223776223773,8742.750629370628,143 -llama3.1-70b,0,8,realistic,10,36487.13408759124,4558.16496350365,0.15948905109489053,4072.2096350364964,505.7739416058394,1.5035036496350365,0.0,21.455985401459852,5063.9389051094895,40559.34372262774,137 -llama3.1-70b,8,32,realistic,10,8361.005950413222,1044.7628099173553,0.04413223140495868,7287.26785123967,909.9391735537191,2.866115702479339,0.0,28.002561983471075,1954.7019834710745,15648.273801652891,121 -mistral-7b,4,16,realistic,150,1456.9201333333333,181.964,0.017866666666666666,8899.314,1110.5299333333335,3.706066666666667,0.0,34.76246666666666,1292.4939333333334,10356.234133333333,150 
-llama2-7b,16,2,none,100,88.14146341463415,11.011402439024389,0.006463414634146342,10419.648353658537,1301.7871951219513,4.2627439024390235,0.0,45.60030487804878,1312.7985975609758,10507.789817073171,164 -llama2-7b,8,2,realistic,50,17350.65414473684,2168.3357894736846,0.06217105263157894,7578.048289473683,946.3103289473685,3.7244078947368413,0.0,36.47065789473684,3114.6461184210525,24928.702434210525,152 -llama3.1-70b,4,8,realistic,20,32443.850000000002,4054.003059701493,0.158955223880597,7918.017985074628,987.8494776119404,5.4040298507462685,0.0,63.410522388059704,5041.852537313433,40361.867985074634,134 -llama3.1-70b,8,0,realistic,50,9264.231728395062,1157.6475925925927,0.0537037037037037,8547.971111111112,1066.7469135802469,4.2078395061728395,0.0,44.42895061728395,2224.3945061728396,17812.20283950617,162 -mistral-7b,32,16,realistic,200,75.75931034482758,9.457862068965518,0.00910344827586207,8948.628137931035,1116.5499310344826,3.569241379310345,0.0,32.79420689655172,1126.0077931034484,9024.387448275862,145 -llama3.1-8b,8,32,realistic,50,94.06272727272727,11.734924242424244,0.01196969696969697,9048.155984848485,1128.8335606060607,3.8701515151515147,0.0,36.55469696969696,1140.568484848485,9142.218712121214,132 -llama3.1-8b,8,64,realistic,200,548.9706432748537,68.5672514619883,0.012631578947368423,8710.079005847952,1086.8659649122806,3.8249707602339176,0.0,34.95374269005848,1155.433216374269,9259.049649122808,171 -llama3.1-8b,8,64,none,100,76.64292207792208,9.564935064935066,0.00935064935064935,8950.931493506494,1117.013116883117,3.8672727272727276,0.0,36.02227272727273,1126.578051948052,9027.574415584415,154 -llama3.1-8b,32,2,none,50,91.85148760330578,11.46297520661157,0.011900826446280991,8756.649834710743,1092.3852066115703,3.7639669421487607,0.0,33.95429752066116,1103.848181818182,8848.50132231405,121 
-mistral-7b,8,0,none,50,87.84668918918919,10.966824324324325,0.00891891891891892,9017.822837837837,1125.155945945946,3.8398648648648646,0.0,36.381351351351356,1136.1227702702704,9105.669527027027,148 -mistral-7b,4,8,realistic,200,600.3901333333333,74.97226666666667,0.01646666666666667,8916.055133333333,1112.4996666666666,3.414,0.0,31.65633333333334,1187.4719333333335,9516.445266666668,150 -llama2-7b,8,16,realistic,150,5140.060657894736,642.3251315789474,0.03149122807017544,8433.47350877193,1053.2008771929825,3.8958771929824563,0.0,37.053333333333335,1695.5260087719298,13573.534166666666,228 -llama3.1-8b,0,8,none,200,75629.66744186047,9426.840988372096,2.243895348837209,9898.548023255815,1220.1924999999999,6.189999999999999,0.0,281.5088372093023,10647.033488372093,85528.21546511629,172 -llama3.1-8b,8,2,none,100,87.27728571428571,10.892071428571429,0.010285714285714285,8544.46692857143,1066.2426428571428,4.138285714285715,0.0,36.30821428571429,1077.1347142857142,8631.744214285714,140 -llama3.1-8b,8,32,realistic,150,82.20572413793103,10.259172413793102,0.00993103448275862,9118.180551724137,1137.7333103448277,3.729448275862069,0.0,34.704068965517244,1147.9924827586206,9200.386275862069,145 -llama3.1-70b,32,16,realistic,10,139.6691764705882,17.436235294117647,0.013764705882352941,9658.64388235294,1206.0883529411765,3.507647058823529,0.0,35.58282352941176,1223.524588235294,9798.313058823529,85 -llama3.1-70b,32,32,realistic,20,110.05168224299067,13.738878504672897,0.010934579439252336,10284.365794392525,1284.1486915887851,3.9951401869158882,0.0,43.49700934579439,1297.887570093458,10394.417476635514,107 -mistral-7b,8,16,realistic,150,84.6727027027027,10.57060810810811,0.00891891891891892,8888.208783783783,1109.1272972972972,3.873716216216216,0.0,35.00655405405405,1119.6979054054052,8972.881486486487,148 
-mistral-7b,8,32,realistic,150,84.9932191780822,10.610616438356166,0.00904109589041096,9151.052534246575,1142.0127397260273,3.8657534246575347,0.0,36.56267123287671,1152.6233561643837,9236.045753424658,146 -llama3.1-8b,64,16,realistic,50,96.97851851851853,12.102777777777776,0.013333333333333332,8771.573240740741,1094.0853703703704,3.612407407407408,0.0,33.27990740740741,1106.1881481481482,8868.551759259259,108 -llama3.1-70b,4,8,none,10,19892.38527131783,2485.7371317829457,0.07930232558139536,8171.239302325583,1020.0194573643412,4.27875968992248,0.0,43.123333333333335,3505.7565891472873,28063.624573643414,129 -llama3.1-70b,8,8,realistic,40,10340.108581560284,1292.1453191489363,0.03914893617021276,8433.268936170212,1053.0306382978722,3.799787234042553,0.0,37.841843971631214,2345.1759574468083,18773.377517730496,141 -llama3.1-70b,16,32,realistic,70,291.2727672955975,36.38955974842768,0.008050314465408805,8905.965157232704,1112.0438364779877,4.3633962264150945,0.0,40.715534591194974,1148.4333962264152,9197.237924528303,159 -llama2-7b,16,4,realistic,100,86.62660606060605,10.822181818181818,0.006424242424242424,10201.889696969698,1274.5666666666666,4.28169696969697,0.0,43.75490909090909,1285.3888484848485,10288.516303030303,165 -llama2-7b,4,64,none,200,14071.299956896552,1758.1937500000001,0.07681034482758621,7539.331551724137,939.3844396551723,3.8189224137931035,0.0,34.75267241379311,2697.578189655172,21610.63150862069,232 -llama2-7b,32,0,realistic,50,96.22421052631579,12.021203007518796,0.007969924812030075,10103.444962406014,1262.0193984962407,4.020375939849624,0.0,41.44796992481202,1274.0406015037595,10199.66917293233,133 -llama3.1-8b,4,4,realistic,200,111.19818791946308,13.869932885906039,0.012818791946308724,9071.560939597315,1131.8728859060402,3.6506711409395978,0.0,33.8458389261745,1145.7428187919463,9182.759127516778,149 
-llama3.1-8b,8,4,none,150,84.20965277777778,10.509236111111111,0.01,8882.13173611111,1108.193402777778,3.704513888888889,0.0,33.49659722222222,1118.702638888889,8966.34138888889,144 -llama3.1-70b,32,32,none,60,78.54167785234898,9.8051677852349,0.00785234899328859,8695.943087248323,1085.6040939597317,4.09503355704698,0.0,37.55187919463088,1095.4092617449664,8774.484765100671,149 -llama2-7b,64,0,realistic,150,93.40842465753424,11.6663698630137,0.00815068493150685,10116.205342465753,1259.5726027397259,4.184041095890411,0.0,45.15739726027397,1271.2389726027398,10209.613767123286,146 -mistral-7b,4,0,realistic,150,3585.6894047619044,447.609880952381,0.07202380952380953,8702.90744047619,1086.150595238095,3.9232142857142853,0.0,36.590535714285714,1533.7604761904763,12288.596845238095,168 -mistral-7b,0,0,realistic,200,73828.66091463415,9203.117012195122,2.1029268292682928,9206.074329268293,1137.8334146341463,4.783536585365853,0.0,240.73554878048785,10340.95042682927,83034.73524390244,164 -llama3.1-70b,0,4,realistic,40,45380.16892215569,5667.8178443113775,0.8989221556886228,5627.600239520958,697.8132934131737,21.45497005988024,0.0,237.268502994012,6365.63113772455,51007.76916167664,167 -llama3.1-70b,4,2,realistic,70,16701.624452054795,2086.8136301369864,0.07863013698630138,7965.5257534246575,994.3434931506849,4.740068493150686,0.0,42.782876712328765,3081.1571232876713,24667.15020547945,146 -llama3.1-70b,64,0,realistic,10,175.6848484848485,21.932424242424243,0.017727272727272727,9247.463030303032,1154.2892424242425,3.4056060606060603,0.0,35.367424242424235,1176.2216666666668,9423.147878787879,66 -mistral-7b,8,4,none,100,91.58798561151079,11.433884892086331,0.009496402877697842,8945.811223021583,1116.4833812949641,4.022230215827339,0.0,36.92064748201438,1127.9172661870505,9037.399208633093,139 
-mistral-7b,16,8,realistic,50,95.55096774193548,11.928629032258065,0.01064516129032258,8834.38806451613,1102.249677419355,3.9018548387096774,0.0,36.11346774193549,1114.178306451613,8929.939032258064,124 -llama3.1-8b,32,4,realistic,100,87.18102362204725,10.88007874015748,0.011338582677165353,8874.588582677165,1107.352992125984,3.8936220472440946,0.0,35.373228346456706,1118.2330708661416,8961.769606299213,127 -llama3.1-70b,8,0,none,60,6905.7702366863905,862.8946153846156,0.02934911242603551,8672.656923076924,1082.8471597633136,4.243254437869822,0.0,42.79189349112426,1945.7417751479293,15578.427159763312,169 -llama2-7b,4,8,none,200,6110.905611814345,763.6060337552744,0.033291139240506334,8257.26877637131,1031.110928270042,3.981645569620253,0.0,36.066793248945146,1794.7169620253162,14368.174388185655,237 -llama2-7b,32,64,none,200,48.81026315789474,6.097807017543859,0.004649122807017544,8868.438684210527,1107.5378070175436,4.474736842105262,0.0,41.53679824561403,1113.6356140350874,8917.248947368422,228 -llama2-7b,64,0,none,50,80.05036231884058,9.971884057971012,0.017318840579710146,9455.516376811594,1180.8876086956523,3.8013768115942033,0.0,40.50333333333333,1190.859492753623,9535.566739130434,138 -mistral-7b,4,4,realistic,150,118.5429931972789,14.794557823129253,0.01197278911564626,8669.223469387756,1081.748163265306,3.660340136054421,0.0,32.42972789115646,1096.5427210884357,8787.766462585034,147 -llama3.1-8b,4,16,realistic,200,816.2690506329113,101.93892405063292,0.015506329113924052,8790.25417721519,1096.7446835443036,3.540822784810127,0.0,32.56329113924051,1198.6836075949368,9606.523227848102,158 -llama3.1-8b,4,32,none,200,1757.1364814814815,219.47395061728395,0.020061728395061727,8979.307407407408,1120.4380864197533,3.7397530864197535,0.0,34.1820987654321,1339.912037037037,10736.443888888889,162 
-llama3.1-70b,0,2,realistic,20,41749.522808219175,5214.4309589041095,0.4180821917808219,5026.130205479452,624.100890410959,12.100684931506848,0.0,114.81869863013698,5838.531849315069,46775.653013698626,146 -llama3.1-70b,8,4,none,70,92.03328671328671,11.489160839160839,0.008321678321678322,9518.53986013986,1188.5923776223779,4.530909090909089,0.0,44.85132867132867,1200.0815384615382,9610.573146853147,143 -llama3.1-70b,16,8,realistic,20,104.41677685950414,13.035371900826446,0.009669421487603306,9504.92561983471,1186.7853719008265,4.172231404958678,0.0,41.42314049586777,1199.820743801653,9609.342396694214,121 -mistral-7b,0,2,none,50,49943.83867549669,6228.77298013245,1.111523178807947,5861.916423841059,726.6200662251656,27.00569536423841,0.0,256.5815894039735,6955.393046357616,55805.755099337744,151 -mistral-7b,4,4,none,100,321.0271942446043,40.09474820143885,0.014532374100719428,8961.39762589928,1118.3525179856115,3.9927338129496404,0.0,36.50151079136691,1158.4472661870504,9282.424820143886,139 -llama2-7b,16,32,realistic,150,87.16427135678393,10.884522613065327,0.006884422110552764,8852.399949748742,1105.7223618090454,4.517688442211054,0.0,40.894924623115585,1116.6068844221106,8939.564221105527,199 -mistral-7b,0,2,none,100,60644.971006289314,7562.341949685535,1.4169811320754717,6929.080188679245,857.6999999999999,13.44314465408805,0.0,212.3478616352201,8420.041949685536,67574.05119496856,159 -mistral-7b,4,0,none,200,10298.47947368421,1285.7026842105267,0.1859473684210526,8770.449210526316,1093.618894736842,8.689894736842104,0.0,78.73421052631579,2379.3215789473684,19068.92868421053,190 -mistral-7b,0,4,realistic,100,59152.356242038215,7376.426305732483,1.3289808917197452,6558.902993630573,811.5640127388534,11.317515923566878,0.0,193.7464968152866,8187.990318471338,65711.25923566878,157 
-mistral-7b,64,32,realistic,50,91.92723214285715,11.476249999999999,0.011785714285714287,8850.952946428572,1104.1017857142856,3.6898214285714284,0.0,34.18901785714286,1115.5780357142855,8942.88017857143,112 -llama3.1-8b,0,8,realistic,200,73021.28005813953,9101.16034883721,2.0395930232558137,9072.109302325582,1118.155581395349,6.361453488372093,0.0,255.41796511627908,10219.315930232558,82093.38936046511,172 -mistral-7b,4,0,realistic,100,5012.648670886076,626.0006329113926,0.06544303797468354,8424.108101265823,1051.4993037974684,4.173101265822785,0.0,37.975443037974685,1677.4999367088608,13436.756772151897,158 -llama3.1-70b,0,16,realistic,20,40852.49781690141,5101.7058450704235,0.40021126760563386,5070.692183098592,629.3672535211267,12.720140845070421,0.0,114.99457746478873,5731.07309859155,45923.19,142 -llama3.1-70b,4,16,none,40,5024.863758865248,627.8267375886525,0.03226950354609929,8564.413971631206,1069.1748936170213,3.890141843971631,0.0,37.91595744680851,1697.0016312056737,13589.277730496455,141 -llama3.1-70b,8,64,none,10,16349.611328125,2043.0640625,0.06968750000000001,8573.886875,1070.5524999999998,4.2459375,0.0,42.597890625,3113.6165625000003,24923.498203125,128 -llama3.1-70b,16,4,realistic,70,92.01195652173912,11.486739130434783,0.008478260869565216,9189.667391304349,1147.418695652174,4.3765217391304345,0.0,41.77311594202899,1158.9054347826088,9281.679347826086,138 -llama2-7b,32,2,realistic,50,100.68744000000001,12.537919999999998,0.02504,10328.88784,1290.3033599999999,3.9560800000000005,0.0,41.51656,1302.84128,10429.575280000001,125 -mistral-7b,4,16,none,50,1585.757985074627,198.0562686567164,0.02104477611940299,8764.743880597014,1093.5699253731343,3.811940298507462,0.0,35.77731343283582,1291.6261940298507,10350.501865671642,134 -llama3.1-8b,32,4,realistic,50,94.70119658119658,11.818632478632479,0.012307692307692308,8906.925128205128,1111.155641025641,3.801538461538461,0.0,35.46384615384615,1122.9742735042737,9001.626324786324,117 
-llama3.1-70b,8,32,realistic,60,6956.670591715977,869.3140828402368,0.02514792899408284,8591.62449704142,1072.6509467455621,4.163017751479289,0.0,39.91189349112426,1941.9650295857987,15548.295088757397,169 -llama2-7b,64,0,none,150,75.82374999999999,9.445065789473682,0.01986842105263158,9710.13552631579,1206.8061842105265,3.844210526315789,0.0,39.54006578947369,1216.25125,9785.95927631579,152 -mistral-7b,64,64,realistic,100,76.13283582089552,9.504477611940297,0.009850746268656717,8540.371492537313,1065.6064179104478,3.5835074626865673,0.0,32.578582089552235,1075.1108955223879,8616.504328358209,134 -llama3.1-8b,8,16,realistic,200,107.5856,13.423866666666667,0.010933333333333333,8862.4664,1105.7511999999997,3.671933333333333,0.0,33.34653333333333,1119.1750666666665,8970.052,150 -llama3.1-70b,4,16,realistic,70,20629.68812903226,2577.773290322581,0.10051612903225805,8038.144580645162,1003.2679999999999,5.177290322580645,0.0,50.906580645161284,3581.04129032258,28667.832709677423,155 -llama2-7b,4,8,realistic,100,27638.52454054054,3453.8243783783782,0.22978378378378378,8319.032486486487,1038.3448108108107,10.976810810810811,0.0,98.63340540540541,4492.169189189189,35957.557027027025,185 -mistral-7b,8,0,realistic,150,690.2770370370371,86.19506172839506,0.015493827160493827,9117.47648148148,1137.728086419753,3.7174074074074075,0.0,35.96623456790124,1223.9231481481481,9807.753518518519,162 -llama2-7b,16,4,realistic,150,176.2119801980198,22.018415841584158,0.005693069306930694,9657.593861386138,1206.4133663366338,4.399950495049505,0.0,42.97321782178218,1228.4317821782179,9833.805841584159,202 -llama2-7b,32,2,realistic,100,75.43066265060241,9.420963855421686,0.006385542168674699,10081.835180722892,1259.119638554217,4.235602409638555,0.0,42.78210843373494,1268.5406024096387,10157.265843373494,166 
-llama3.1-8b,64,64,none,100,75.23715328467154,9.38948905109489,0.010510948905109488,8719.4901459854,1087.9636496350365,3.7345255474452554,0.0,34.3729197080292,1097.3531386861314,8794.727299270075,137 -llama3.1-70b,0,16,none,50,50449.019364161846,6300.2501734104035,1.1363005780346822,6330.910289017341,784.9939306358382,19.240867052023123,0.0,248.72578034682084,7085.244104046243,56779.92965317919,173 -llama3.1-70b,4,4,realistic,10,20157.005846153843,2518.8106153846156,0.05953846153846154,6133.030384615384,765.1176923076923,1.899153846153846,0.0,22.22869230769231,3283.928307692308,26290.03623076923,130 -llama2-7b,4,8,none,50,26173.94493670886,3270.9082278481014,0.10367088607594939,8158.6377848101265,1018.5793037974681,5.023354430379747,0.0,50.424240506329106,4289.48753164557,34332.582721518986,158 -mistral-7b,16,16,none,150,78.79797297297297,9.837162162162162,0.00891891891891892,9010.987094594593,1124.4099324324322,3.704189189189189,0.0,34.58925675675676,1134.2470945945945,9089.785067567567,148 -mistral-7b,64,16,realistic,150,76.736,9.579777777777778,0.009777777777777778,8477.06874074074,1057.6799999999998,3.6336296296296298,0.0,32.371111111111105,1067.2597777777778,8553.804740740741,135 -llama3.1-8b,0,0,none,200,75550.22602272728,9418.303693181819,2.3492045454545454,9349.801420454545,1153.9602272727273,4.7428409090909085,0.0,270.7598863636364,10572.263920454545,84900.02744318183,176 -llama3.1-70b,8,8,realistic,60,7142.148027210885,892.4780952380954,0.04510204081632653,8664.371496598638,1081.8317006802722,4.025374149659864,0.0,40.04360544217687,1974.3097959183676,15806.519523809524,147 -llama3.1-70b,16,2,none,20,112.12429824561404,13.997543859649124,0.010263157894736842,9415.146666666666,1175.6020175438598,4.1582456140350885,0.0,40.66938596491229,1189.599561403509,9527.270964912283,114 
-llama3.1-70b,32,16,realistic,50,85.8504347826087,10.717536231884058,0.008478260869565216,9223.851014492753,1151.6225362318842,4.0578260869565215,0.0,39.766304347826086,1162.3400724637681,9309.701449275362,138 -llama3.1-8b,64,0,none,150,70.73894039735099,8.828145695364238,0.009536423841059603,8414.178476821191,1049.3964900662252,3.2213245033112585,0.0,29.256225165562917,1058.2246357615893,8484.917417218543,151 -llama3.1-70b,0,8,none,10,40338.05098484849,5039.79946969697,0.27007575757575764,4407.515530303031,547.1736363636363,2.6271212121212124,0.0,44.84969696969697,5586.973106060606,44745.56651515152,132 -llama3.1-70b,4,4,none,40,33183.89564285715,4146.632142857143,0.18299999999999997,8405.294142857145,1048.8916428571429,7.171285714285713,0.0,75.47864285714286,5195.5237857142865,41589.18978571429,140 -llama3.1-70b,4,64,none,40,17371.52824675325,2170.471233766234,0.08233766233766233,7168.865454545455,894.7851948051949,4.595000000000001,0.0,41.98344155844156,3065.2564285714284,24540.393701298704,154 -llama3.1-70b,32,4,none,20,102.55991452991454,12.803589743589743,0.01,9616.462051282051,1200.8123931623932,4.131538461538462,0.0,40.51521367521367,1213.6159829059827,9719.021965811966,117 -llama3.1-8b,0,2,none,100,60238.101428571434,7511.354740259741,1.4226623376623375,6997.960649350649,866.3088311688311,14.395064935064935,0.0,217.74214285714285,8377.663571428573,67236.06207792208,154 -llama3.1-8b,0,64,none,150,74643.77969512196,9305.284512195123,2.045,9143.287621951218,1128.9526219512195,5.192743902439024,0.0,231.52042682926833,10434.237134146342,83787.06731707316,164 -mistral-7b,16,16,realistic,50,93.93552000000001,11.726959999999998,0.01056,8866.09232,1106.2059199999999,3.8166400000000005,0.0,35.981120000000004,1117.9328799999998,8960.02784,125 -mistral-7b,8,0,realistic,200,1321.967932960894,165.06782122905028,0.020279329608938548,8653.231452513966,1079.7540782122903,3.7152513966480445,0.0,34.8490502793296,1244.8218994413407,9975.19938547486,179 
-mistral-7b,4,32,none,200,1531.4311042944785,191.28331288343557,0.014478527607361963,9098.899693251533,1135.3146012269938,3.6804294478527604,0.0,34.783312883435585,1326.5979141104294,10630.330797546012,163 -llama2-7b,0,4,none,100,53259.55233766234,6652.751515151515,1.2632900432900436,7479.602943722944,929.1064935064935,12.553030303030303,0.0,242.94367965367965,7581.858008658009,60739.15528138528,231 -llama3.1-8b,32,4,none,150,79.92362318840578,9.974347826086957,0.010434782608695651,9040.08115942029,1127.9265942028985,3.7586956521739134,0.0,34.5613768115942,1137.9009420289856,9120.004782608697,138 -llama3.1-70b,4,8,none,20,33423.93815068493,4176.708424657535,0.2532191780821918,8119.22294520548,1013.3447945205479,8.227054794520548,0.0,88.87123287671233,5190.053219178082,41543.161095890406,146 -llama2-7b,64,0,realistic,200,1144.267,142.893,0.121,1477.119,181.719,0.9400000000000002,0.0,0.8479999999999999,324.61199999999997,2621.386,10 -mistral-7b,8,32,none,150,80.69921568627451,10.074509803921568,0.008627450980392158,9096.70843137255,1135.14045751634,3.814771241830065,0.0,36.02372549019608,1145.2149673202616,9177.407647058824,153 -mistral-7b,8,64,none,150,77.34360759493671,9.655632911392404,0.008354430379746836,9015.074683544304,1125.0615822784812,3.6948734177215194,0.0,34.783607594936704,1134.7172151898735,9092.41829113924,158 -mistral-7b,16,4,none,150,84.22829787234043,10.515106382978724,0.009361702127659575,9013.81269503546,1124.8379432624115,3.85822695035461,0.0,35.82503546099291,1135.3530496453902,9098.0409929078,141 -llama3.1-70b,0,2,none,70,51573.9344,6440.180514285714,1.080114285714286,6618.0356,820.7914285714286,21.98594285714286,0.0,257.3305142857143,7260.971942857142,58191.97,175 -llama3.1-70b,4,16,realistic,50,27944.85427745665,3491.94260115607,0.3501734104046243,7604.217572254336,948.8106358381502,11.818612716763004,0.0,106.41774566473988,4440.753236994218,35549.07184971098,173 
-llama3.1-70b,16,16,realistic,20,107.14153846153846,13.375555555555556,0.01,9850.346837606838,1230.0486324786325,4.26974358974359,0.0,44.93435897435898,1243.424188034188,9957.488376068377,117 -llama2-7b,8,8,none,100,18237.28050761421,2279.12152284264,0.07309644670050762,8506.518121827412,1062.1516751269037,4.025736040609137,0.0,43.82695431472081,3341.2731979695436,26743.798629441626,197 -llama2-7b,32,8,realistic,50,93.9625,11.738671875,0.00828125,9894.778593750001,1236.018515625,3.8433593749999995,0.0,39.64554687500001,1247.7571875,9988.741093749999,128 -mistral-7b,64,8,none,50,90.61634782608695,11.312608695652175,0.011478260869565217,8753.957217391306,1092.1730434782608,3.8339130434782605,0.0,35.592,1103.4856521739127,8844.573565217392,115 -llama3.1-8b,64,16,none,50,87.7463025210084,10.950588235294116,0.012100840336134453,8891.89756302521,1109.37731092437,3.7482352941176473,0.0,35.52369747899159,1120.327899159664,8979.643865546219,119 -llama3.1-70b,0,2,none,30,44961.030828025476,5615.393248407643,0.747579617834395,5465.585095541401,677.958280254777,24.317070063694267,0.0,219.0856050955414,6293.35152866242,50426.615923566875,157 -llama3.1-70b,8,4,realistic,20,9756.54265625,1219.2190624999998,0.05742187500000001,8535.1990625,1065.630390625,3.72453125,0.0,38.398046875000006,2284.849453125,18291.74171875,128 -mistral-7b,0,64,realistic,200,73433.22418181818,9154.37709090909,2.2136969696969695,8927.52,1102.5229696969698,4.923272727272726,0.0,246.0086666666667,10256.90006060606,82360.74418181818,165 -llama2-7b,4,4,none,100,24440.086649746194,3054.0798984771573,0.0818274111675127,7962.732182741117,994.4291878172588,4.126446700507614,0.0,43.04131979695431,4048.5090862944166,32402.818832487308,197 -llama3.1-8b,8,8,realistic,150,83.92486111111111,10.47375,0.01,9116.054236111111,1137.4284722222224,3.8077777777777775,0.0,35.24652777777778,1147.9022222222222,9199.979097222222,144 
-mistral-7b,64,4,realistic,200,74.06267605633802,9.24605633802817,0.009295774647887325,8352.047323943661,1041.905704225352,3.6326760563380276,0.0,31.84218309859155,1051.15176056338,8426.11,142 -llama3.1-8b,64,16,realistic,200,74.1150354609929,9.249503546099291,0.010212765957446808,8607.563191489362,1073.8356028368794,3.538368794326241,0.0,31.911985815602836,1083.0851063829787,8681.678226950355,141 -llama3.1-70b,4,8,realistic,70,18987.49156862745,2372.479019607843,0.11026143790849671,8820.886274509805,1101.2777777777778,5.048300653594771,0.0,53.910522875817,3473.756797385621,27808.37784313726,153 -llama3.1-70b,4,16,none,70,20331.671320754718,2540.3879245283024,0.10226415094339622,7939.643647798743,991.1481132075471,5.3058490566037735,0.0,48.74025157232705,3531.536037735849,28271.31496855346,159 -llama3.1-70b,16,4,realistic,20,105.22859504132231,13.136694214876032,0.009669421487603306,9198.316611570248,1148.4768595041323,4.3386776859504135,0.0,42.15851239669422,1161.6135537190082,9303.54520661157,121 -llama2-7b,0,0,realistic,200,34731.65365591398,4336.0343010752695,1.2383870967741937,18693.30021505376,2291.5723655913976,2.4943010752688175,0.0,163.25172043010753,6627.606666666666,53424.95387096774,93 -llama2-7b,0,64,realistic,150,60551.39522821577,7564.111618257261,1.5356016597510376,8184.2456431535265,1013.780622406639,2.5478423236514516,0.0,171.24572614107882,8577.8922406639,68735.6408713693,241 -llama2-7b,4,64,realistic,50,29017.126736842107,3625.913368421053,0.5489473684210526,8533.75747368421,1058.2084736842105,4.973894736842105,0.0,93.2437894736842,4684.121842105264,37550.88421052632,190 -llama2-7b,64,4,none,200,48.23634703196347,6.026118721461187,0.004840182648401826,9258.609497716894,1156.359680365297,4.191415525114155,0.0,39.301187214611865,1162.385799086758,9306.84584474886,219 
-mistral-7b,0,64,none,200,75327.62862857143,9391.628285714285,2.3498857142857146,8991.281485714286,1110.2535428571427,5.093257142857143,0.0,270.97097142857143,10501.88182857143,84318.91011428571,175 -mistral-7b,4,32,realistic,150,1685.4484967320261,210.4967973856209,0.017254901960784316,8885.956078431373,1108.8484967320262,3.823202614379085,0.0,35.62084967320261,1319.345294117647,10571.404575163398,153 -mistral-7b,16,8,realistic,100,88.99022556390977,11.109624060150376,0.009924812030075189,8824.57007518797,1101.2936842105262,3.989323308270678,0.0,36.40932330827068,1112.4033082706765,8913.56030075188,133 -mistral-7b,32,8,none,100,81.89777777777779,10.224148148148148,0.009777777777777778,8794.805851851852,1097.5867407407409,3.9311111111111114,0.0,36.24577777777778,1107.8108888888892,8876.70362962963,135 -llama3.1-70b,4,2,none,10,24734.20053846154,3090.8038461538463,0.08076923076923077,7363.549692307692,919.1707692307692,3.176230769230769,0.0,32.96392307692307,4009.974615384616,32097.750230769234,130 -llama3.1-70b,8,8,realistic,30,9404.396737588651,1175.2058865248227,0.04808510638297873,8292.242624113474,1035.3968794326242,3.8721276595744683,0.0,37.148723404255314,2210.6027659574465,17696.639361702128,141 -llama2-7b,8,2,none,50,23611.58006802721,2950.8216326530614,0.075578231292517,7963.943877551021,994.4951700680275,3.695170068027211,0.0,37.82,3945.316802721089,31575.52394557823,147 -llama2-7b,64,4,realistic,100,65.31073170731707,8.15920731707317,0.006463414634146342,9880.859451219512,1234.2754878048784,4.1225000000000005,0.0,40.8719512195122,1242.4346951219513,9946.17018292683,164 -mistral-7b,4,16,realistic,50,4002.356737588653,499.9534042553192,0.03212765957446809,8224.002553191489,1026.1040425531914,3.7415602836879427,0.0,34.29914893617021,1526.057446808511,12226.359290780143,141 
-mistral-7b,8,64,none,100,78.43628205128205,9.79205128205128,0.008461538461538461,9083.177115384615,1133.6891025641025,3.9943589743589736,0.0,37.90987179487179,1143.4811538461538,9161.613397435896,156 -llama3.1-70b,16,64,realistic,30,1600.6506474820144,200.02071942446042,0.013093525179856116,9167.504604316548,1144.5410071942447,3.8925899280575535,0.0,37.554604316546765,1344.561726618705,10768.155251798562,139 -llama2-7b,32,8,none,100,70.73821428571429,8.837261904761906,0.00630952380952381,9799.279345238096,1224.1213095238095,4.012321428571429,0.0,39.986250000000005,1232.9585714285715,9870.017559523809,168 -mistral-7b,32,32,realistic,50,87.41167999999999,10.91256,0.01056,8832.36816,1101.7523999999999,3.6673599999999995,0.0,33.984480000000005,1112.66496,8919.77984,125 -mistral-7b,64,0,none,100,74.11430555555555,9.2525,0.009166666666666667,8722.302152777778,1088.2479166666667,3.4744444444444444,0.0,32.433194444444446,1097.5004166666668,8796.416458333333,144 -mistral-7b,0,2,none,150,63600.577530864204,7927.949197530865,1.5342592592592594,8222.10413580247,1016.7130246913581,13.752777777777776,0.0,241.65092592592592,8944.662222222223,71822.68166666667,162 -llama2-7b,32,8,none,150,58.97960199004975,7.368258706467661,0.0052736318407960205,8969.580348258705,1120.4205472636818,4.360298507462687,0.0,40.19019900497512,1127.7888059701493,9028.559950248757,201 -llama3.1-70b,8,4,realistic,60,4112.228405797102,513.8678985507247,0.04217391304347826,9042.373768115942,1128.9560869565219,4.314057971014493,0.0,41.243550724637686,1642.8239855072463,13154.602173913045,138 -mistral-7b,8,2,none,50,100.5471875,12.55234375,0.0103125,8894.71859375,1109.7017968750001,3.93171875,0.0,35.7884375,1122.2541406250002,8995.26578125,128 -mistral-7b,16,64,realistic,200,2826.3417575757576,352.9878787878788,0.02690909090909091,8993.881575757576,1121.9556363636364,3.822242424242425,0.0,37.11624242424243,1474.943515151515,11820.223333333333,165 
-mistral-7b,32,64,none,150,70.77842105263157,8.835986842105262,0.008684210526315789,8911.334078947368,1111.8354605263157,3.630986842105263,0.0,33.41592105263158,1120.671447368421,8982.112500000001,152 -llama3.1-8b,8,32,realistic,100,83.42972027972029,10.411888111888112,0.01006993006993007,8926.686293706294,1113.9631468531468,3.8652447552447553,0.0,36.185524475524474,1124.375034965035,9010.116013986013,143 -llama3.1-70b,0,4,none,50,49976.59443181819,6241.212102272726,1.0923863636363635,6667.836647727273,826.4745454545456,21.427897727272725,0.0,274.7632954545454,7067.686647727273,56644.43107954546,176 -llama3.1-70b,8,64,none,70,4840.661151515152,604.8426666666668,0.024303030303030302,8381.478242424244,1046.4090303030305,4.332848484848485,0.0,39.894727272727266,1651.2516969696972,13222.139393939395,165 -llama2-7b,16,8,none,50,353.41348484848487,44.15977272727273,0.008939393939393941,10241.970227272728,1279.4097727272729,4.04469696969697,0.0,42.61219696969697,1323.5695454545457,10595.383712121211,132 -llama2-7b,4,32,realistic,200,25569.569098360655,3195.3661065573774,0.21479508196721311,8543.725409836066,1065.9381967213114,11.514262295081966,0.0,98.96696721311477,4261.304303278689,34113.29450819672,244 -llama3.1-8b,64,32,realistic,100,85.97495867768595,10.729586776859504,0.011900826446280991,8867.083223140497,1106.2030578512397,3.7280991735537192,0.0,34.98256198347108,1116.9326446280995,8953.058181818182,121 -llama3.1-70b,4,4,none,60,24922.307046979866,3114.1526174496653,0.11959731543624161,8232.558187919463,1027.5883892617449,5.4193288590604025,0.0,53.4548322147651,4141.74100671141,33154.86523489933,149 -llama3.1-70b,8,0,realistic,40,9035.117483870967,1129.0325806451615,0.04335483870967742,8939.92464516129,1116.0294838709679,3.881935483870968,0.0,41.825741935483876,2245.062064516129,17975.04212903226,155 
-llama2-7b,16,2,realistic,100,231.824156626506,28.96807228915663,0.0066867469879518075,10108.362590361445,1262.6497590361446,4.126927710843373,0.0,42.93144578313253,1291.6178313253013,10340.186746987953,166 -llama2-7b,4,32,realistic,150,11041.22891891892,1379.5829729729733,0.051036036036036035,8235.960585585586,1028.307117117117,3.983333333333333,0.0,37.61009009009008,2407.8900900900903,19277.189504504506,222 -llama2-7b,0,16,realistic,100,60572.91798283262,7567.030515021459,1.401931330472103,7075.925922746781,879.087339055794,5.0996566523605145,0.0,188.63523605150212,8446.117854077253,67648.8439055794,233 -llama3.1-8b,0,8,realistic,50,44917.77682432432,5601.626216216217,0.9524324324324323,5000.902027027027,619.1391891891892,28.238648648648645,0.0,224.01763513513515,6220.7654054054055,49918.67885135135,148 -llama3.1-8b,64,4,none,150,81.16884615384616,10.129769230769229,0.011076923076923076,9019.468384615386,1125.425076923077,3.7958461538461536,0.0,35.40784615384616,1135.5548461538463,9100.637230769229,130 -mistral-7b,0,2,none,200,63003.09585798816,7852.609230769231,1.445266272189349,8492.988165680474,1049.164023668639,13.12491124260355,0.0,229.65857988165683,8901.773254437869,71496.08402366863,169 -llama3.1-8b,0,0,none,150,73434.86423312884,9154.738834355829,1.9593865030674846,8871.616257668711,1096.1233128834353,5.568220858895705,0.0,228.66147239263805,10250.862147239264,82306.48049079755,163 -llama3.1-8b,32,32,realistic,50,86.99728,10.85712,0.011519999999999999,8921.79544,1113.0012,3.74656,0.0,34.76784,1123.85832,9008.79272,125 -mistral-7b,16,2,none,150,85.6035,10.686785714285715,0.009428571428571429,8952.782785714284,1117.1672857142855,3.6919285714285714,0.0,33.963499999999996,1127.8540714285714,9038.386285714287,140 -llama3.1-70b,32,32,realistic,60,75.31666666666666,9.4025,0.0075,8977.197115384615,1120.8573717948718,4.2025,0.0,40.99480769230769,1130.2598717948717,9052.513782051281,156 
-llama2-7b,16,16,none,100,439.85731843575417,54.96,0.010670391061452515,9379.168715083799,1171.5039106145252,4.381564245810056,0.0,41.863798882681564,1226.4639106145253,9819.026033519554,179 -llama2-7b,32,32,none,150,68.16107317073171,8.513121951219512,0.005463414634146342,8594.18507317073,1073.3742439024393,4.252487804878049,0.0,38.21609756097561,1081.8873658536586,8662.346146341462,205 -llama2-7b,64,8,realistic,150,52.46795,6.55475,0.0053,9359.924500000001,1169.03955,4.1324000000000005,0.0,39.98645,1175.5943,9412.39245,200 -mistral-7b,0,32,none,50,50308.96904761905,6272.866258503401,1.2233333333333334,5833.698231292517,722.6561904761904,26.2669387755102,0.0,252.09394557823128,6995.522448979592,56142.66727891156,147 -llama3.1-8b,0,8,none,150,72536.86077380952,9043.11494047619,1.9229166666666668,8820.885,1088.395,6.565595238095238,0.0,240.35553571428574,10131.50994047619,81357.74577380953,168 -llama3.1-8b,4,4,none,200,93.20797385620915,11.626143790849675,0.011437908496732025,9175.939477124182,1144.8878431372548,3.6941176470588237,0.0,34.54398692810457,1156.5139869281047,9269.147450980392,153 -llama3.1-70b,0,0,none,10,39599.682116788324,4947.172773722629,0.27518248175182486,4614.717518248175,573.324306569343,3.1924817518248174,0.0,44.51883211678832,5520.497080291971,44214.3996350365,137 -llama3.1-70b,0,8,none,60,53557.819278350515,6688.558298969073,1.243298969072165,6281.1385051546395,778.1084536082473,14.268608247422678,0.0,228.2546907216495,7466.666752577319,59838.957783505146,194 -llama3.1-70b,0,32,realistic,50,47969.25586206896,5990.607758620689,1.0568965517241378,5852.414195402299,725.737183908046,20.75810344827586,0.0,230.6157471264368,6716.344942528735,53821.67005747127,174 -llama3.1-70b,0,64,realistic,60,49970.7805524862,6240.600552486188,1.1347513812154695,6203.420165745857,768.4365745856353,18.752154696132596,0.0,230.45602209944752,7009.037127071823,56174.200718232045,181 
-llama3.1-70b,8,0,none,20,9753.741785714285,1218.8067857142858,0.044071428571428574,8814.079785714284,1100.3872142857144,3.7278571428571428,0.0,40.67507142857143,2319.1940000000004,18567.821571428572,140 -llama3.1-70b,32,8,none,40,89.45984962406015,11.168195488721803,0.008796992481203006,9201.458796992481,1148.9015037593986,3.887894736842105,0.0,38.46939849624059,1160.0696992481203,9290.918646616541,133 -llama2-7b,16,64,none,150,92.41724137931035,11.541477832512316,0.006354679802955665,8747.478866995074,1092.4876354679802,4.3837931034482756,0.0,40.76748768472907,1104.0291133004926,8839.896108374385,203 -mistral-7b,8,32,none,100,84.09421768707483,10.498367346938775,0.008979591836734694,9024.410068027211,1126.2876870748298,3.8989115646258496,0.0,36.77591836734694,1136.7860544217685,9108.504285714285,147 -mistral-7b,64,4,realistic,150,82.249921875,10.268125,0.0103125,8527.144375,1064.015390625,3.7497656249999998,0.0,33.019062500000004,1074.283515625,8609.394296875002,128 -llama3.1-8b,8,64,realistic,100,77.43137254901961,9.663333333333334,0.009411764705882352,8871.644183006536,1107.0979738562091,3.9540522875816997,0.0,36.340196078431376,1116.7613071895425,8949.075555555555,153 -llama3.1-8b,16,16,realistic,200,76.19666666666667,9.509266666666667,0.0096,8889.026266666666,1109.0272666666665,3.7078666666666664,0.0,33.99766666666667,1118.5365333333332,8965.222933333333,150 -llama3.1-70b,8,32,none,30,13087.263680555556,1635.3994444444443,0.05395833333333333,8689.3075,1084.7714583333334,3.516944444444444,0.0,38.73111111111112,2720.170902777778,21776.571180555555,144 -llama3.1-70b,32,2,none,60,86.16928571428572,10.757357142857142,0.008357142857142856,8806.124785714286,1099.4375,3.7955714285714293,0.0,35.90242857142857,1110.1948571428572,8892.29407142857,140 -llama3.1-70b,4,32,realistic,50,24973.59398876405,3120.5911797752806,0.3730337078651686,7428.520898876404,926.9526404494383,12.253146067415727,0.0,103.5473595505618,4047.5438202247187,32402.11488764045,178 
-mistral-7b,16,16,realistic,150,78.63073825503355,9.816308724832215,0.008859060402684565,8415.93389261745,1050.144697986577,3.6709395973154355,0.0,32.221409395973154,1059.9610067114095,8494.564630872483,149 -mistral-7b,16,32,realistic,150,77.38986666666666,9.6614,0.0088,8871.0342,1106.934733333333,3.6986000000000003,0.0,34.023199999999996,1116.5961333333332,8948.424066666666,150 -mistral-7b,8,2,realistic,200,87.07743243243243,10.870810810810811,0.00891891891891892,8892.812635135135,1109.5647972972972,3.3963513513513512,0.0,30.730878378378378,1120.435608108108,8979.890067567567,148 -mistral-7b,64,2,realistic,50,103.24368932038836,12.889029126213591,0.012815533980582525,8573.34796116505,1069.5585436893205,3.7417475728155347,0.0,33.910485436893204,1082.447572815534,8676.591650485436,103 -llama2-7b,8,64,realistic,50,15716.699011627907,1964.1078488372095,0.07813953488372095,8223.360872093022,1026.6327325581397,4.123662790697674,0.0,44.7368023255814,2990.740581395349,23940.05988372093,172 -llama3.1-8b,32,16,none,50,85.8927559055118,10.719291338582677,0.011338582677165353,8878.719606299213,1107.652362204724,3.73740157480315,0.0,34.64031496062992,1118.3716535433073,8964.612362204725,127 -mistral-7b,4,64,none,100,2107.0858974358976,263.1500641025641,0.02435897435897436,8811.591666666667,1099.6180769230768,4.091987179487179,0.0,37.76403846153846,1362.768141025641,10918.677564102563,156 -llama3.1-70b,32,0,realistic,40,91.88736842105263,11.471203007518797,0.008796992481203006,9797.362556390977,1223.306842105263,3.9460150375939844,0.0,41.471804511278194,1234.778045112782,9889.24992481203,133 -llama2-7b,16,16,none,50,1580.1664885496184,197.47541984732823,0.014809160305343513,10430.601832061067,1302.898320610687,4.134809160305344,0.0,44.571832061068704,1500.3737404580154,12010.768320610687,131 
-llama2-7b,0,16,none,100,59549.60855371901,7438.674008264463,1.405495867768595,7565.058925619835,936.393347107438,5.551239669421487,0.0,197.80528925619834,8375.0673553719,67114.66747933884,242 -llama3.1-8b,32,16,realistic,200,72.30337748344371,9.023377483443708,0.009536423841059603,8738.448874172183,1090.1313907284766,3.5596688741721856,0.0,32.18298013245033,1099.1547682119206,8810.752251655627,151 -llama3.1-70b,4,64,realistic,20,33837.43823529412,4228.268897058824,0.17705882352941177,7333.059044117647,914.7063970588235,5.725588235294118,0.0,65.15860294117647,5142.9752941176475,41170.497279411764,136 -llama3.1-70b,4,64,none,20,7968.914379562044,995.6360583941605,0.04416058394160584,8737.321751824818,1090.7241605839417,3.987664233576642,0.0,40.52591240875912,2086.3602189781022,16706.23613138686,137 -llama3.1-70b,8,64,realistic,60,10824.3704,1352.601942857143,0.03417142857142856,8655.683828571428,1080.6346857142858,4.1901142857142855,0.0,42.750171428571434,2433.236628571429,19480.054228571425,175 -llama3.1-70b,32,2,realistic,60,91.0345112781955,11.364736842105263,0.008796992481203006,8912.627218045112,1112.727142857143,4.1469172932330824,0.0,39.02105263157894,1124.0918796992482,9003.661729323308,133 -llama2-7b,32,16,realistic,100,69.19111764705882,8.644,0.006235294117647059,9022.845764705882,1126.9303529411766,4.083823529411765,0.0,38.25352941176471,1135.5743529411766,9092.03688235294,170 -llama3.1-8b,32,64,none,100,70.33444444444444,8.77764705882353,0.009411764705882352,8808.371111111112,1099.141503267974,3.7358169934640517,0.0,34.352679738562095,1107.9191503267973,8878.705555555554,153 -llama3.1-70b,0,32,none,30,44719.45525974026,5585.136688311689,0.8260389610389611,5629.384935064934,698.1929220779222,23.540454545454544,0.0,228.28948051948052,6283.32961038961,50348.8401948052,154 
-llama3.1-70b,0,64,none,50,49090.52805555556,6130.846388888889,1.2108333333333332,5895.112944444445,731.0899444444444,17.555722222222222,0.0,228.0403888888889,6861.936333333333,54985.640999999996,180 -llama3.1-70b,8,8,realistic,20,11992.661127819549,1498.6482706766917,0.06045112781954887,8587.751353383459,1072.351127819549,3.6842105263157894,0.0,39.91323308270676,2570.9993984962407,20580.412481203006,133 -llama3.1-70b,8,8,realistic,70,826.5971333333334,103.27900000000001,0.0096,8907.3294,1112.2420666666667,4.416533333333334,0.0,41.763666666666666,1215.5210666666665,9733.926533333333,150 -llama3.1-70b,16,0,none,30,1401.0137333333332,175.07146666666665,0.010333333333333332,9064.071866666667,1131.707666666667,3.8308666666666666,0.0,38.53059999999999,1306.7791333333332,10465.085599999999,150 -llama3.1-70b,32,2,none,10,117.41504854368932,14.658058252427184,0.011359223300970873,10112.226407766992,1262.8763106796118,4.058543689320388,0.0,41.663689320388336,1277.534368932039,10229.641456310681,103 -llama3.1-8b,8,4,realistic,100,90.87805970149253,11.341492537313433,0.010746268656716417,8796.365373134327,1097.7494776119402,3.8514179104477613,0.0,34.616492537313434,1109.0909701492537,8887.243432835821,134 -llama3.1-8b,32,0,none,200,64.51138728323699,8.05092485549133,0.008323699421965317,8692.089248554914,1084.211098265896,3.3115606936416184,0.0,30.86670520231214,1092.2620231213873,8756.600635838151,173 -llama2-7b,16,2,none,150,69.63888888888889,8.699903381642512,0.005120772946859904,9150.385555555556,1143.0166666666667,4.407826086956521,0.0,42.267632850241554,1151.7165700483092,9220.024444444445,207 -llama3.1-8b,8,32,realistic,200,74.90006289308177,9.347421383647799,0.009056603773584906,9058.466352201258,1130.287610062893,3.7687421383647792,0.0,35.047106918238995,1139.6350314465408,9133.36641509434,159 
-mistral-7b,4,0,none,100,4139.718658536585,516.9520121951219,0.04871951219512195,8531.840731707316,1064.8138414634147,4.232073170731707,0.0,38.25591463414635,1581.7658536585363,12671.559390243903,164 -llama3.1-70b,4,16,realistic,30,21173.478671328674,2645.8856643356644,0.1786013986013986,8722.52937062937,1088.239230769231,8.702097902097902,0.0,80.8941958041958,3734.124895104896,29896.008041958044,143 -llama3.1-70b,8,32,none,40,12617.599937106917,1576.7668553459123,0.051635220125786155,8139.768427672955,1015.668176100629,3.667232704402516,0.0,37.838238993710696,2592.435031446541,20757.368364779875,159 -llama2-7b,0,8,realistic,150,58643.50824267782,7325.217949790795,1.3371129707112968,8111.982845188284,1004.1004184100418,7.521171548117154,0.0,223.6866108786611,8329.318368200837,66755.49108786612,239 -llama2-7b,16,8,none,200,131.03377777777777,16.369333333333334,0.006,9412.384844444443,1175.5797777777777,4.267022222222222,0.0,41.2128,1191.949111111111,9543.418622222222,225 -mistral-7b,8,32,none,200,76.59310559006211,9.56192546583851,0.008198757763975155,9006.63155279503,1123.8380124223602,3.772608695652174,0.0,34.741242236024846,1133.3999378881988,9083.224658385092,161 -mistral-7b,64,32,realistic,100,82.94854838709678,10.355322580645161,0.01064516129032258,8719.918709677419,1088.0621774193548,3.7658870967741933,0.0,34.90346774193548,1098.4175,8802.867258064516,124 -llama3.1-8b,8,8,realistic,50,95.31629921259842,11.895354330708662,0.011338582677165353,8776.027244094488,1094.8107874015745,3.882283464566929,0.0,35.39826771653542,1106.7061417322832,8871.343543307086,127 -llama3.1-70b,4,0,none,40,25124.60320224719,3139.4467415730337,0.5256741573033709,7025.615842696629,876.3939325842697,5.461516853932584,0.0,73.57129213483147,4015.8406741573026,32150.21904494382,178 -llama3.1-70b,32,4,realistic,30,104.67973913043478,13.068173913043477,0.01017391304347826,9537.980173913043,1190.916,4.30608695652174,0.0,42.14878260869565,1203.9841739130436,9642.659913043479,115 
-llama2-7b,16,8,none,150,68.84661904761906,8.598190476190476,0.005714285714285715,9159.181571428571,1144.0794285714285,4.335761904761905,0.0,40.135666666666665,1152.677619047619,9228.02819047619,210 -llama2-7b,4,0,realistic,150,20177.94828125,2521.3152343750003,0.08851562500000001,7705.194453125,962.21453125,3.929921875,0.0,44.811953125,3483.5297656249995,27883.142734375,128 -llama2-7b,4,2,none,200,16022.547708333334,2002.1349999999998,0.06341666666666666,7944.574083333334,992.2143333333333,4.147125,0.0,37.479875,2994.349333333333,23967.12179166667,240 -llama2-7b,4,8,realistic,200,26816.120839416057,3350.8597445255477,0.13664233576642335,7499.837335766423,936.4036131386861,5.717043795620437,0.0,55.527043795620436,4287.263357664234,34315.95817518248,274 -llama2-7b,16,64,realistic,150,73.75544554455446,9.20871287128713,0.0059405940594059415,8647.96,1079.933069306931,4.349158415841584,0.0,39.15579207920792,1089.141782178218,8721.715445544554,202 -mistral-7b,16,0,realistic,200,2680.638612716763,334.8028901734104,0.02260115606936416,9015.524624277457,1124.835606936416,3.8656069364161842,0.0,38.27086705202313,1459.6384971098264,11696.16323699422,173 -mistral-7b,16,32,realistic,50,89.49869230769231,11.173076923076923,0.010153846153846154,9007.209538461539,1123.7416153846154,3.753461538461538,0.0,35.03923076923077,1134.9146923076921,9096.70823076923,130 -llama3.1-8b,4,16,none,200,1361.6333529411766,170.07305882352944,0.013529411764705882,8444.595,1053.7546470588236,3.640176470588235,0.0,32.078705882352935,1223.827705882353,9806.228352941176,170 -llama3.1-70b,0,2,none,40,46366.0196969697,5790.812484848485,0.9194545454545453,5790.652303030303,718.7318181818182,23.604484848484848,0.0,242.34799999999998,6509.544303030302,52156.67199999999,165 -llama3.1-70b,4,2,realistic,50,23224.54429577465,2902.055563380282,0.09288732394366196,7553.344507042253,942.6807746478872,4.472816901408451,0.0,41.11732394366197,3844.7363380281695,30777.888802816902,142 
-mistral-7b,0,32,realistic,50,44093.34664473684,5499.061973684211,0.9417105263157893,4927.383552631579,610.626447368421,28.30006578947368,0.0,221.93072368421053,6109.688421052631,49020.73019736842,152 -llama2-7b,16,32,none,150,69.11148514851486,8.628168316831683,0.00599009900990099,8633.169455445544,1078.215495049505,4.286287128712871,0.0,38.402029702970296,1086.8436633663366,8702.280940594059,202 -llama3.1-70b,8,8,none,40,9487.668723404255,1185.5820567375888,0.0549645390070922,8812.73134751773,1100.3745390070922,3.9568085106382975,0.0,39.64475177304965,2285.9565957446807,18300.400070921987,141 -mistral-7b,4,32,realistic,50,3331.526402877698,416.06330935251793,0.04719424460431655,8567.271654676259,1068.9520863309351,3.933597122302158,0.0,36.942158273381295,1485.015395683453,11898.798057553957,139 -llama3.1-8b,8,16,none,150,75.70841772151898,9.448291139240506,0.009113924050632912,8725.153987341773,1088.7184810126585,3.7041772151898735,0.0,33.17050632911393,1098.1667721518988,8800.862405063292,158 -mistral-7b,4,4,realistic,100,358.7128148148148,44.78148148148148,0.018740740740740742,8805.825111111111,1098.9713333333334,3.887111111111111,0.0,35.734370370370364,1143.7528148148149,9164.537925925926,135 -mistral-7b,32,32,none,100,75.99153846153847,9.486853146853147,0.009230769230769232,8665.439020979022,1081.3012587412588,3.7394405594405593,0.0,34.30825174825175,1090.7881118881119,8741.43055944056,143 -llama3.1-8b,16,0,realistic,100,75.39346153846154,9.409038461538461,0.00923076923076923,8618.064102564103,1075.4309615384614,3.820705128205128,0.0,34.41442307692307,1084.84,8693.457564102564,156 -llama3.1-70b,16,0,realistic,50,837.8650326797386,104.69777777777777,0.010588235294117647,9483.672875816994,1184.1182352941175,4.064117647058825,0.0,41.790849673202615,1288.8160130718954,10321.537908496732,153 
-llama3.1-70b,16,32,realistic,30,97.87456692913386,12.218661417322835,0.00921259842519685,9657.300157480317,1205.922204724409,4.052755905511812,0.0,41.68944881889764,1218.140866141732,9755.174724409448,127 -llama2-7b,0,16,realistic,50,45831.208999999995,5725.067,1.1395555555555557,6417.683333333333,797.7923333333333,21.582444444444445,0.0,265.1565,6522.859333333333,52248.89233333333,180 -llama2-7b,0,64,none,100,60871.651877729266,7604.1703056768565,1.3804366812227071,8074.751048034935,1000.3097379912664,3.836768558951965,0.0,167.05344978165942,8604.480043668123,68946.4029257642,229 -mistral-7b,16,0,none,100,75.728375,9.4539375,0.00825,8848.560125,1104.3573125,3.6684375000000005,0.0,33.9681875,1113.81125,8924.288499999999,160 -mistral-7b,16,4,none,200,76.04384615384616,9.493333333333334,0.008461538461538461,8799.681217948719,1097.9779487179487,3.5624358974358974,0.0,32.17237179487179,1107.471282051282,8875.725064102564,156 -mistral-7b,32,2,none,150,83.22348148148149,10.38962962962963,0.009777777777777778,8853.429851851852,1104.6695555555557,3.703777777777778,0.0,33.208,1115.0591851851855,8936.653333333334,135 -llama3.1-8b,0,64,none,100,64734.400370370364,8072.721543209877,1.6242592592592593,7110.051728395061,879.5403086419753,6.796419753086419,0.0,176.54086419753088,8952.261851851852,71844.45209876544,162 -llama3.1-8b,4,0,realistic,200,7462.5884745762705,931.6682485875705,0.13621468926553673,8979.597966101694,1120.1520903954802,6.411129943502825,0.0,60.12401129943503,2051.8203389830505,16442.186440677968,177 -llama3.1-70b,4,8,none,70,16796.297337662338,2098.642142857143,0.0848051948051948,8172.029805194804,1020.1555194805194,4.657272727272727,0.0,44.17181818181819,3118.797662337662,24968.327142857142,154 -llama3.1-70b,8,8,none,60,4041.760689655173,505.0310344827586,0.036000000000000004,8761.222620689656,1093.913379310345,4.162965517241379,0.0,38.584758620689655,1598.9444137931034,12802.983310344827,145 
-llama3.1-70b,16,16,none,20,95.3132824427481,11.898854961832061,0.008931297709923663,9284.597786259543,1159.3485496183205,3.9606870229007645,0.0,40.45564885496183,1171.2474045801525,9379.911068702291,131 -mistral-7b,64,32,none,150,74.69423357664235,9.324890510948904,0.009635036496350365,8765.99189781022,1093.7673722627737,3.602189781021898,0.0,33.044744525547436,1103.0922627737227,8840.686131386861,137 -mistral-7b,8,2,realistic,100,94.30613138686132,11.773211678832117,0.009635036496350365,8710.84,1086.9437956204379,3.8575912408759123,0.0,34.512189781021895,1098.7170072992699,8805.146131386862,137 -llama3.1-70b,0,32,none,60,53415.90265536723,6670.390734463278,1.2447457627118643,6730.063220338982,833.7984180790961,15.299830508474574,0.0,226.0579661016949,7504.1891525423725,60145.96587570622,177 -llama3.1-8b,16,16,none,50,87.84076923076923,10.962384615384615,0.011076923076923076,9016.58546153846,1124.9406153846153,3.8595384615384614,0.0,36.39230769230769,1135.903,9104.426230769232,130 -llama3.1-70b,8,8,none,10,15979.63893129771,1996.8306106870232,0.0669465648854962,8746.355114503816,1092.3912977099237,4.043129770992366,0.0,42.27145038167939,3089.221908396947,24725.994045801526,131 -llama3.1-70b,32,64,none,30,82.48971631205673,10.298014184397163,0.008297872340425531,9007.847872340426,1124.619290780142,3.9230496453900714,0.0,38.91702127659575,1134.9173049645392,9090.337588652483,141 -llama3.1-70b,32,64,realistic,30,91.2290625,11.3890625,0.009140625,9593.454296875,1197.902734375,4.03515625,0.0,40.721093749999994,1209.2917968750003,9684.683359375,128 -llama3.1-70b,32,64,none,40,79.62020547945205,9.939794520547945,0.008013698630136986,9334.22020547945,1165.4953424657533,3.8635616438356157,0.0,39.910821917808214,1175.4351369863014,9413.840410958905,146 -llama2-7b,8,16,none,50,17702.052594936707,2212.175949367089,0.06474683544303798,8041.884303797468,1004.105253164557,3.3184810126582276,0.0,36.718227848101264,3216.2812025316457,25743.93689873418,158 
-llama2-7b,16,0,none,100,121.25852760736196,15.131779141104298,0.00803680981595092,8948.532453987731,1117.6726993865032,4.204907975460123,0.0,41.34257668711657,1132.8044785276077,9069.790981595092,163 -llama2-7b,32,64,realistic,50,84.47611940298506,10.553507462686568,0.00791044776119403,9658.496865671643,1206.2797014925377,5.146567164179105,0.0,49.031716417910445,1216.833208955224,9742.972985074628,134 -mistral-7b,8,16,realistic,100,92.25558823529411,11.51720588235294,0.009705882352941177,8987.787279411765,1121.665294117647,4.079632352941176,0.0,37.738897058823525,1133.1825000000001,9080.04286764706,136 -mistral-7b,4,32,realistic,200,2806.5995209580838,350.5519161676647,0.022095808383233537,8592.517904191616,1072.0694610778442,3.674550898203592,0.0,34.33167664670658,1422.6213772455092,11399.1174251497,167 -mistral-7b,4,8,none,150,103.42643835616438,12.900753424657534,0.011506849315068493,9109.86780821918,1136.7824657534245,3.789109589041096,0.0,35.46301369863014,1149.6832191780823,9213.294246575342,146 -llama3.1-70b,16,8,none,40,97.44906976744187,12.16550387596899,0.009069767441860464,9638.151860465116,1203.4978294573643,4.085968992248063,0.0,41.10023255813954,1215.6633333333334,9735.600930232558,129 -llama3.1-8b,32,2,realistic,200,77.24624999999999,9.640208333333334,0.01,8471.666458333333,1056.8031944444444,3.457708333333333,0.0,30.098888888888887,1066.443402777778,8548.912708333333,144 -llama2-7b,64,8,none,100,67.226,8.398451612903227,0.006838709677419355,10111.256903225807,1262.9161935483871,3.9950967741935486,0.0,40.79696774193549,1271.3146451612904,10178.482903225808,155 -llama3.1-70b,0,16,realistic,60,51059.27481865285,6376.622953367875,1.1705699481865286,6196.697305699482,768.0882901554403,18.4560621761658,0.0,248.2079274611399,7144.711243523316,57255.97212435234,193 -llama2-7b,32,8,none,200,51.40504347826087,6.422,0.004608695652173913,9548.94008695652,1192.6147391304348,4.375434782608696,0.0,41.66304347826087,1199.0367391304349,9600.345130434782,230 
-llama2-7b,64,4,none,100,68.03192307692308,8.499166666666666,0.006794871794871795,9888.055769230768,1235.1834615384614,4.161153846153847,0.0,41.63544871794871,1243.6826282051281,9956.087692307692,156 -llama3.1-70b,0,0,none,40,46643.02764705882,5825.363764705882,1.0802941176470588,5802.132411764706,719.7691176470588,23.02723529411765,0.0,253.19411764705882,6545.132882352942,52445.16005882354,170 -llama3.1-70b,0,4,none,10,40986.66414814815,5120.584740740741,0.2602962962962963,4429.628814814815,550.0098518518517,3.020518518518519,0.0,43.13437037037037,5670.5945925925935,45416.29296296296,135 -llama3.1-70b,0,16,none,60,53819.04540983606,6721.291639344263,1.2440983606557379,6716.058360655738,832.4913114754097,12.919180327868853,0.0,218.00360655737705,7553.782950819674,60535.10377049179,183 -llama2-7b,16,0,none,200,1015.0256737588652,126.83468085106385,0.012907801418439717,9478.303404255319,1182.9343262411346,4.292978723404255,0.0,44.89553191489362,1309.7690070921985,10493.329078014183,141 -mistral-7b,64,64,none,200,66.25908496732026,8.271830065359477,0.008627450980392158,8527.584313725489,1063.8791503267973,3.486013071895425,0.0,31.29588235294118,1072.150980392157,8593.84339869281,153 -llama3.1-8b,64,4,realistic,200,78.91768656716417,9.848805970149254,0.010746268656716417,8768.640597014924,1093.93223880597,3.6535820895522386,0.0,33.02179104477612,1103.7810447761192,8847.55828358209,134 -llama3.1-8b,64,4,none,100,85.84869918699187,10.713821138211381,0.011707317073170732,8911.002032520326,1111.8541463414638,3.759430894308944,0.0,34.72325203252032,1122.5679674796752,8996.850731707318,123 -llama3.1-70b,16,8,realistic,70,86.88731034482758,10.84696551724138,0.008068965517241379,9341.047793103447,1166.4152413793104,4.528689655172413,0.0,43.14,1177.2622068965516,9427.935103448275,145 
-mistral-7b,16,64,none,50,78.57753424657534,9.809657534246575,0.00904109589041096,8847.739726027397,1103.900616438356,3.766849315068493,0.0,35.50356164383562,1113.7102739726029,8926.317260273972,146 -mistral-7b,8,4,realistic,50,104.98803278688526,13.10672131147541,0.010819672131147541,9092.837540983606,1134.5035245901638,3.750819672131147,0.0,35.27073770491804,1147.610245901639,9197.825573770491,122 -mistral-7b,32,64,realistic,100,72.6158389261745,9.065369127516778,0.008859060402684565,8508.314161073826,1061.6987919463088,3.7846308724832207,0.0,34.064026845637585,1070.7641610738256,8580.93,149 -mistral-7b,8,8,realistic,100,90.49578571428572,11.297571428571429,0.009428571428571429,8594.184285714286,1072.4488571428574,3.789142857142857,0.0,33.865071428571426,1083.7464285714286,8684.680071428571,140 -llama3.1-8b,0,64,none,200,76528.19712643679,9540.789367816093,2.384655172413793,9228.223563218391,1138.9272988505745,4.7892528735632185,0.0,273.71183908045975,10679.716666666667,85756.42068965516,174 -mistral-7b,32,2,realistic,50,97.47577586206897,12.168965517241379,0.011379310344827587,8423.332155172415,1050.7491379310345,3.724137931034482,0.0,33.24629310344829,1062.9181034482758,8520.807931034482,116 -llama3.1-8b,4,32,realistic,50,3246.907625899281,405.4787050359713,0.037697841726618705,8567.294316546764,1068.9028057553955,3.7705035971223015,0.0,35.26402877697842,1474.3815107913667,11814.201942446043,139 -llama3.1-70b,0,2,realistic,10,38851.0381294964,4854.047553956834,0.18949640287769784,3933.077553956835,488.1383453237411,1.118201438848921,0.0,20.527338129496403,5342.185899280576,42784.11568345324,139 -llama3.1-70b,4,32,none,70,9641.792165605097,1204.6904458598729,0.04426751592356688,7839.756815286624,978.44949044586,4.087579617834395,0.0,36.767452229299366,2183.1399363057326,17481.54898089172,157 
-llama3.1-70b,16,8,none,20,97.55465116279069,12.178682170542634,0.009069767441860464,9214.501860465116,1150.5827906976745,4.441007751937985,0.0,42.91488372093023,1162.761472868217,9312.056511627907,129 -llama3.1-70b,32,0,none,10,116.09285714285714,14.49304761904762,0.011142857142857142,10275.160857142859,1283.2545714285716,4.177809523809524,0.0,45.29057142857143,1297.747619047619,10391.253714285716,105 -llama2-7b,32,2,none,150,60.28209756097561,7.530975609756097,0.005170731707317073,9519.874731707318,1189.13156097561,4.303756097560975,0.0,40.96102439024391,1196.6625365853658,9580.156829268291,205 -mistral-7b,0,16,none,50,50383.42053333333,6283.510800000001,1.1676000000000002,5671.976933333333,702.7136666666665,24.12786666666667,0.0,240.38986666666668,6986.224466666667,56055.39746666666,150 -llama3.1-8b,0,32,none,200,78293.735748503,9760.336766467068,2.399101796407186,9649.139281437127,1190.2271257485029,5.7856287425149695,0.0,292.6054491017964,10950.56389221557,87942.87502994011,167 -llama3.1-8b,32,64,realistic,50,87.13516129032259,10.874354838709678,0.01161290322580645,8990.81685483871,1121.5287096774193,3.7120161290322584,0.0,34.83564516129032,1132.4030645161288,9077.952016129033,124 -llama3.1-70b,32,64,realistic,20,110.2179245283019,13.759622641509434,0.011037735849056603,10174.327924528301,1270.3523584905658,3.892075471698113,0.0,42.33943396226415,1284.1119811320752,10284.545849056603,106 -mistral-7b,4,64,none,50,6146.828516129032,767.7678064516128,0.03767741935483871,8638.710387096773,1077.8600645161291,4.002258064516129,0.0,37.931225806451614,1845.627870967742,14785.538903225808,155 -llama3.1-70b,16,16,realistic,40,1049.4055714285714,131.13028571428572,0.013142857142857144,9414.121857142858,1175.5351428571428,4.144142857142857,0.0,42.876357142857145,1306.6654285714287,10463.527428571428,140 
-llama3.1-8b,8,64,realistic,150,74.91253164556962,9.348987341772153,0.009113924050632912,8689.209873417722,1084.236075949367,3.9788607594936716,0.0,35.83867088607595,1093.5850632911392,8764.12240506329,158 -llama2-7b,16,64,realistic,200,71.30502164502164,8.90108225108225,0.0058874458874458874,8875.398051948052,1108.377489177489,4.5604329004329,0.0,42.57251082251082,1117.2785714285712,8946.703073593075,231 -llama3.1-8b,4,0,realistic,50,3491.1180985915494,435.9861971830986,0.04464788732394366,8331.837535211267,1039.550704225352,3.9967605633802816,0.0,36.61619718309859,1475.5369014084508,11822.955633802816,142 -mistral-7b,4,0,realistic,50,3690.010704225352,460.8445070422535,0.038873239436619716,8595.94767605634,1072.719647887324,3.7449295774647893,0.0,35.68852112676057,1533.5641549295776,12285.958380281689,142 -mistral-7b,32,64,realistic,150,73.02047297297298,9.115945945945947,0.00891891891891892,8901.523310810811,1110.6658783783782,3.6525000000000003,0.0,33.71270270270271,1119.7818243243244,8974.543783783784,148 -llama3.1-8b,16,0,none,150,69.67470238095238,8.695297619047619,0.008571428571428572,8658.724702380952,1080.4333928571427,3.5201190476190467,0.0,31.94595238095238,1089.1286904761905,8728.399404761905,168 -llama3.1-8b,32,4,none,100,82.98947368421052,10.356992481203008,0.010827067669172932,8852.820225563908,1104.7387969924812,3.8334586466165415,0.0,35.26729323308272,1115.0957894736841,8935.80969924812,133 -llama3.1-70b,4,32,none,30,26259.023416149066,3281.285093167702,0.37254658385093165,7408.865217391304,924.2518633540371,7.516770186335404,0.0,77.20888198757764,4205.536956521739,33667.88863354037,161 -llama3.1-70b,32,0,realistic,60,76.767106918239,9.583584905660377,0.007358490566037735,8729.558113207548,1089.840314465409,4.0979245283018875,0.0,38.79974842767295,1099.4238993710694,8806.325220125786,159 
-llama3.1-70b,32,8,none,20,102.675,12.81793103448276,0.010086206896551724,9984.092155172413,1246.6309482758622,4.057672413793103,0.0,43.35129310344828,1259.448879310345,10086.767155172414,116 -llama2-7b,0,2,none,50,44951.82797752808,5615.223932584269,0.8982022471910112,7035.406179775281,874.458988764045,24.7026404494382,0.0,279.3351123595506,6489.682921348315,51987.23415730337,178 -llama2-7b,32,0,none,150,90.49714285714286,11.2964,0.007257142857142857,9529.143142857143,1190.1524571428572,4.4544,0.0,46.21148571428571,1201.4488571428572,9619.640285714286,175 -mistral-7b,4,2,none,50,1140.1731782945735,142.41790697674418,0.01984496124031008,8542.796899224806,1065.906046511628,3.988837209302325,0.0,35.688682170542634,1208.323953488372,9682.970077519381,129 -mistral-7b,16,8,realistic,150,84.44442857142856,10.542071428571429,0.009428571428571429,9033.43742857143,1127.2342142857142,3.7149285714285707,0.0,34.385285714285715,1137.7762857142857,9117.881857142856,140 -mistral-7b,0,64,realistic,50,45345.73326666667,5655.111133333334,0.9659333333333334,4972.575666666667,616.0672666666667,27.153866666666666,0.0,216.95113333333336,6271.1784,50318.308933333334,150 -llama3.1-8b,0,32,none,100,64520.638757396446,8046.774437869822,1.602958579881657,6875.461124260355,850.2439644970415,6.933964497041421,0.0,180.71029585798817,8897.018402366863,71396.0998816568,169 -llama2-7b,4,2,realistic,100,22352.113264248703,2793.2186528497405,0.07248704663212435,7627.909274611399,952.6459585492229,4.051295336787565,0.0,38.008860103626944,3745.864611398964,29980.022538860103,193 -llama2-7b,4,32,none,150,22479.497142857144,2808.940504201681,0.2919327731092437,8256.738403361345,1028.2931932773108,11.316008403361344,0.0,97.74962184873948,3837.2336974789914,30736.23554621849,238 -llama2-7b,8,16,none,100,8293.240793650793,1036.327195767196,0.035449735449735446,8717.282063492064,1088.6625396825398,4.247142857142857,0.0,41.76111111111112,2124.9897354497357,17010.52285714286,189 
-mistral-7b,4,0,none,150,3199.7862427745667,399.42768786127164,0.07410404624277457,8907.752485549134,1111.7018497109825,3.9147976878612716,0.0,37.387109826589594,1511.1295375722543,12107.5387283237,173 -llama3.1-8b,0,2,realistic,200,62304.70832335329,7766.421976047905,1.453173652694611,8212.308502994012,1014.981017964072,12.792155688622755,0.0,226.21485029940123,8781.402994011976,70517.0168263473,167 -llama3.1-70b,4,0,none,30,29262.324150943397,3656.564465408805,0.4216352201257862,7107.817232704402,886.8027044025157,8.77547169811321,0.0,90.62893081761007,4543.367169811321,36370.1413836478,159 -llama3.1-70b,16,8,realistic,40,98.600390625,12.30921875,0.009140625,9633.292578125,1202.931953125,4.252968750000001,0.0,43.33039062499999,1215.241171875,9731.89296875,128 -llama3.1-8b,4,2,none,50,1610.59296875,201.172890625,0.035625000000000004,8817.707890625,1100.204921875,4.00359375,0.0,36.232109375,1301.3778125,10428.300859375,128 -llama3.1-8b,64,32,none,50,84.3809756097561,10.530650406504066,0.011707317073170732,8489.4081300813,1058.6944715447155,3.6361788617886184,0.0,32.365528455284554,1069.2251219512195,8573.789105691058,123 -llama3.1-70b,4,0,realistic,20,33426.48810218978,4176.887883211679,0.1908029197080292,7139.167080291971,890.6505109489051,6.984671532846716,0.0,70.52810218978101,5067.538394160584,40565.65518248175,137 -llama3.1-70b,8,64,realistic,30,10240.874577464789,1279.6528169014089,0.04978873239436621,8970.97246478873,1119.989647887324,3.5891549295774645,0.0,39.78471830985916,2399.6424647887325,19211.847042253525,142 -llama2-7b,4,0,none,100,32938.115296803655,4115.728584474887,0.4655707762557078,8538.606712328768,1058.2560730593607,11.848721461187214,0.0,117.58205479452056,5173.984657534247,41476.72200913243,219 -llama3.1-70b,4,0,realistic,40,31512.11532051282,3937.735,0.45134615384615384,6838.449871794872,853.464358974359,16.58871794871795,0.0,133.05442307692306,4791.19935897436,38350.565192307695,156 
-mistral-7b,32,2,realistic,200,79.37394366197184,9.909084507042254,0.009295774647887325,8696.34929577465,1085.0288028169014,3.6595070422535216,0.0,32.3975352112676,1094.9378873239436,8775.72323943662,142 -llama2-7b,16,32,realistic,200,377.9873684210526,47.230570175438594,0.007675438596491228,9254.150526315789,1155.797763157895,4.350833333333333,0.0,42.56956140350877,1203.0283333333334,9632.137894736841,228 -llama2-7b,8,4,realistic,50,24219.084370860928,3026.640066225166,0.08735099337748345,8726.40238410596,1089.734105960265,4.034370860927153,0.0,48.008013245033105,4116.374172185431,32945.486754966885,151 -mistral-7b,32,0,realistic,50,93.60467213114754,11.685655737704918,0.010819672131147541,9069.602704918034,1131.340163934426,3.871885245901639,0.0,36.32655737704918,1143.025819672131,9163.20737704918,122 -llama3.1-8b,16,64,realistic,50,85.64628787878787,10.688560606060607,0.010909090909090908,9114.497954545455,1137.0448484848484,3.8452272727272727,0.0,36.53287878787878,1147.7334090909092,9200.144242424243,132 -llama2-7b,0,0,none,100,56772.6076,7092.43031111111,1.3805777777777777,7149.239333333334,883.7617777777776,2.1449333333333334,0.0,138.2779111111111,7976.192088888889,63921.846933333334,225 -llama2-7b,64,16,none,150,60.229470899470904,7.5215343915343915,0.006455026455026455,9507.580158730158,1187.4734920634921,4.057777777777778,0.0,38.56195767195767,1194.9950264550266,9567.80962962963,189 -mistral-7b,4,2,none,150,97.54575342465753,12.173904109589042,0.010136986301369864,8856.392534246575,1105.1826027397262,3.718972602739726,0.0,33.77493150684931,1117.356506849315,8953.938287671233,146 -llama3.1-70b,32,8,realistic,60,84.0156338028169,10.488521126760563,0.008239436619718309,8966.869436619718,1119.59823943662,4.071197183098591,0.0,39.257746478873244,1130.0867605633803,9050.885070422535,142 
-llama3.1-8b,0,8,realistic,150,67246.71391304348,8381.826273291925,1.692111801242236,8313.454285714286,1026.4172049689441,7.494223602484473,0.0,218.334099378882,9408.24347826087,75560.16819875776,161 -mistral-7b,0,4,none,100,64249.99748427674,8013.19427672956,1.590754716981132,6926.856100628932,856.8511320754718,9.48874213836478,0.0,205.6380503144654,8870.045408805032,71176.85358490565,159 -llama3.1-70b,0,0,none,50,49197.832588235295,6143.863941176471,1.1972352941176472,6155.474058823529,763.6895294117647,18.787235294117647,0.0,235.789,6907.553470588235,55353.30664705882,170 -llama3.1-8b,8,2,realistic,100,90.82266666666666,11.334592592592593,0.010666666666666666,8620.159851851853,1075.7168148148148,4.040962962962964,0.0,35.55607407407407,1087.0514074074074,8710.98251851852,135 -llama3.1-70b,0,4,realistic,10,37510.48879699248,4686.351654135338,0.1705263157894737,4316.18977443609,535.9300000000001,1.199097744360902,0.0,21.272180451127817,5222.281654135338,41826.67857142857,133 -llama2-7b,8,32,none,50,9874.549308176101,1233.9800000000002,0.05371069182389938,9542.42716981132,1191.1866037735847,4.3025157232704405,0.0,50.08352201257861,2425.1666037735854,19416.97647798742,159 -llama2-7b,64,2,realistic,50,95.30208695652173,11.874869565217393,0.019391304347826085,10242.338782608695,1279.4080869565214,3.9989565217391303,0.0,43.20626086956521,1291.282956521739,10337.640869565217,115 -llama3.1-70b,8,4,realistic,70,3238.6844680851063,404.70900709219853,0.023617021276595745,9098.854326241135,1136.0968085106383,4.203191489361702,0.0,40.92780141843972,1540.805815602837,12337.538794326241,141 -mistral-7b,8,0,none,100,82.7040127388535,10.32484076433121,0.008407643312101911,8793.784267515923,1097.4193630573247,3.8687261146496814,0.0,35.91057324840765,1107.744203821656,8876.488280254778,157 -llama3.1-8b,4,2,realistic,50,953.4571875,119.055078125,0.020468749999999997,8668.646171875,1081.43484375,3.6810937499999996,0.0,33.781875,1200.489921875,9622.103359375,128 
-llama3.1-70b,8,0,realistic,60,8831.1,1103.517705882353,0.037941176470588235,8629.50711764706,1077.3192941176471,4.22035294117647,0.0,42.9024705882353,2180.8370000000004,17460.607117647058,170 -llama3.1-70b,8,16,realistic,30,8273.714154929577,1033.9012676056338,0.04112676056338028,8900.971197183098,1111.4609859154932,3.9676056338028167,0.0,41.143239436619716,2145.3622535211266,17174.68535211268,142 -llama3.1-70b,0,16,realistic,50,47647.01737142857,5950.592514285713,1.1109714285714287,5390.317257142858,667.9481142857143,20.347942857142858,0.0,229.1604571428571,6618.540628571428,53037.33462857143,175 -llama2-7b,64,8,none,200,47.58105504587156,5.944266055045871,0.004862385321100918,9427.442064220184,1177.4074770642203,4.0763761467889905,0.0,39.55825688073394,1183.351743119266,9475.023119266056,218 -llama3.1-70b,8,0,none,70,4888.868323353294,610.8937724550899,0.02976047904191617,8855.940419161678,1105.6729341317366,4.392934131736526,0.0,43.8202994011976,1716.5667065868265,13744.80874251497,167 -mistral-7b,8,2,none,150,86.76216216216216,10.831418918918919,0.00891891891891892,8619.261418918919,1075.468581081081,3.6932432432432436,0.0,32.77006756756757,1086.3000000000002,8706.023581081081,148 -llama3.1-70b,32,2,none,50,97.33838709677418,12.151693548387096,0.009435483870967742,9320.40185483871,1163.6651612903222,4.135645161290323,0.0,40.189112903225805,1175.8168548387093,9417.740241935484,124 -llama3.1-70b,16,2,realistic,60,92.78550724637681,11.583333333333334,0.008478260869565216,9025.040507246376,1126.8014492753623,4.324782608695652,0.0,40.31072463768116,1138.3847826086958,9117.826014492754,138 -mistral-7b,0,4,realistic,200,73086.21579268292,9109.692500000001,2.0195121951219517,9464.956890243902,1167.9661585365855,8.101158536585366,0.0,267.97646341463417,10277.658658536586,82551.17268292683,164 
-llama2-7b,4,16,realistic,50,33864.364213483146,4232.0249438202245,0.4975842696629213,7832.221348314606,975.3345505617979,13.808707865168538,0.0,126.78044943820224,5207.359494382023,41696.58556179776,178 -llama3.1-70b,16,8,none,70,84.79385135135135,10.585675675675676,0.007905405405405404,9207.671756756756,1149.855135135135,4.319662162162162,0.0,41.922027027027035,1160.4408108108107,9292.465608108108,148 -mistral-7b,0,64,none,50,50052.08025806451,6242.514064516129,1.195806451612903,5581.272903225807,691.0357419354839,22.485612903225807,0.0,233.4231612903226,6933.549806451614,55633.353161290324,155 -llama2-7b,4,4,none,150,21781.341022222223,2721.833644444445,0.08084444444444445,8107.473333333333,1012.3388,4.1052,0.0,42.312977777777775,3734.172444444445,29888.814355555554,225 -llama3.1-8b,0,4,realistic,200,70305.86953488372,8763.552558139536,1.9151744186046515,9068.508139534883,1118.7497674418605,8.75639534883721,0.0,257.81418604651157,9882.302325581395,79374.3776744186,172 -mistral-7b,16,4,realistic,50,98.00581967213114,12.235081967213116,0.010819672131147541,8977.360901639344,1120.1113114754098,3.846803278688525,0.0,35.97352459016393,1132.3463934426227,9075.366721311475,122 -llama3.1-8b,16,4,none,100,83.77297101449275,10.454782608695652,0.010434782608695651,8758.592463768116,1092.943623188406,3.933405797101449,0.0,35.06304347826087,1103.3984057971015,8842.36543478261,138 -llama2-7b,32,8,realistic,150,104.40201058201058,13.042433862433862,0.00671957671957672,9503.735396825397,1187.1681481481482,4.458201058201058,0.0,42.93238095238096,1200.210582010582,9608.137407407406,189 -llama2-7b,32,4,none,100,73.08572289156626,9.1305421686747,0.006385542168674699,9858.373855421687,1231.6774698795182,4.0959638554216875,0.0,41.005,1240.8080120481927,9931.459578313254,166 -llama2-7b,64,64,realistic,150,56.02112299465241,6.995508021390375,0.006470588235294118,8911.594331550803,1112.9677005347594,4.334224598930481,0.0,40.16374331550803,1119.9632085561498,8967.615454545454,187 
-mistral-7b,4,4,realistic,200,98.19486301369864,12.250890410958904,0.011917808219178084,9032.83404109589,1127.0452054794519,3.6804794520547945,0.0,33.99308219178082,1139.296095890411,9131.02890410959,146 -mistral-7b,8,8,realistic,50,100.66626984126985,12.567222222222222,0.010476190476190477,8961.72619047619,1118.1540476190478,3.896111111111111,0.0,36.219920634920626,1130.7212698412702,9062.39246031746,126 -llama3.1-70b,4,64,none,50,22864.211005291003,2856.881904761905,0.45492063492063495,7880.615714285715,982.2780423280423,10.785132275132277,0.0,102.870582010582,3839.1599470899478,30744.82671957672,189 -llama3.1-70b,16,4,none,50,95.20954887218046,11.88593984962406,0.008796992481203006,9335.482706766918,1165.6851879699248,4.172030075187969,0.0,41.184586466165406,1177.571127819549,9430.692255639098,133 -llama2-7b,32,2,none,100,75.54707317073171,9.438048780487804,0.006463414634146342,10294.147134146342,1285.9495731707318,3.909756097560976,0.0,40.1904268292683,1295.3876219512194,10369.694207317074,164 -mistral-7b,0,0,none,100,63662.19219512195,7939.383292682927,1.5896951219512194,7050.971219512196,873.1636585365853,7.028963414634147,0.0,176.85091463414636,8812.546951219514,70713.16341463415,164 -llama3.1-8b,4,2,none,100,92.15539568345324,11.500791366906475,0.010431654676258992,8725.185683453237,1088.9184172661871,4.0018705035971225,0.0,35.399640287769785,1100.4192086330934,8817.341079136691,139 -llama3.1-8b,16,8,realistic,50,91.52738095238095,11.422539682539682,0.011428571428571429,8854.199285714285,1104.6709523809525,3.8486507936507928,0.0,35.797619047619065,1116.093492063492,8945.726666666667,126 -llama3.1-70b,4,32,realistic,20,32653.10392857143,4080.1649285714293,0.2072857142857143,7134.792714285714,889.7718571428571,6.182785714285714,0.0,66.44699999999999,4969.936785714287,39787.896642857144,140 
-llama3.1-70b,16,8,none,50,93.76037313432835,11.705,0.00873134328358209,9514.123059701493,1187.9682089552239,4.137985074626866,0.0,40.77970149253731,1199.673208955224,9607.88343283582,134 -llama2-7b,16,16,realistic,50,1338.3650724637682,167.2542028985507,0.010144927536231883,9728.418333333333,1215.2613768115943,3.858695652173913,0.0,40.53166666666667,1382.515579710145,11066.783405797101,138 -mistral-7b,16,0,none,200,66.76364640883978,8.334806629834254,0.0072928176795580115,8776.907624309393,1095.273591160221,3.5520994475138123,0.0,33.81237569060774,1103.6083977900553,8843.671270718232,181 -llama2-7b,64,2,realistic,200,48.990225225225224,6.111666666666666,0.00927927927927928,8958.275495495496,1118.8802702702703,4.092612612612613,0.0,38.44472972972973,1124.991936936937,9007.265720720721,222 -llama3.1-70b,16,2,none,50,96.68022727272728,12.069545454545455,0.008863636363636363,9318.89393939394,1163.5079545454544,4.0479545454545445,0.0,39.7244696969697,1175.5774999999999,9415.574166666667,132 -llama2-7b,8,16,realistic,100,9281.991421319797,1159.9790862944162,0.03253807106598985,8501.06076142132,1061.557918781726,3.704314720812183,0.0,37.05786802030457,2221.537005076142,17783.052182741118,197 -mistral-7b,4,8,realistic,50,936.0124615384615,116.89415384615384,0.018384615384615385,8860.980923076922,1105.5891538461537,3.7685384615384616,0.0,35.52207692307692,1222.4833076923076,9796.993384615385,130 -llama3.1-8b,0,0,none,50,50404.34697368421,6285.730460526316,1.1948684210526317,5585.062763157895,692.4330921052632,23.851447368421052,0.0,229.9478289473684,6978.163552631578,55989.40973684211,152 -llama3.1-8b,16,32,none,150,74.50690789473684,9.298355263157895,0.009473684210526315,8656.406842105262,1080.0166447368422,3.6417105263157894,0.0,32.241710526315785,1089.315,8730.913750000002,152 
-llama2-7b,32,32,realistic,50,79.65531034482758,9.951241379310344,0.007310344827586207,9592.811793103449,1198.1200689655172,3.8214482758620685,0.0,40.27724137931034,1208.0713103448275,9672.467103448276,145 -llama2-7b,64,64,realistic,50,73.42389705882353,9.172794117647058,0.007794117647058824,9624.153676470587,1201.9701470588236,3.8861029411764707,0.0,40.10955882352941,1211.1429411764707,9697.577573529412,136 -mistral-7b,8,64,realistic,50,94.73276923076924,11.826461538461539,0.010153846153846154,9050.085384615386,1129.249076923077,3.9363076923076927,0.0,37.08592307692308,1141.0755384615386,9144.818153846152,130 -llama3.1-70b,8,2,none,10,12607.205039999999,1575.4234400000003,0.05152,8164.76304,1019.53264,3.29576,0.0,35.063199999999995,2594.95608,20771.96808,125 -llama3.1-70b,32,0,none,40,82.23047297297298,10.265675675675675,0.007905405405405404,9191.998783783783,1147.6856081081082,4.28081081081081,0.0,42.26,1157.9512837837838,9274.229256756757,148 -llama2-7b,16,4,none,200,93.07604545454545,11.627136363636366,0.0054090909090909085,9727.159454545455,1214.9885,4.2896363636363635,0.0,42.2505,1226.6156363636362,9820.2355,220 -llama2-7b,8,64,none,150,4727.0950717703345,590.6716746411483,0.02023923444976077,8418.087464114833,1051.2722488038278,4.145980861244019,0.0,37.55354066985646,1641.9439234449762,13145.182535885167,209 -mistral-7b,32,2,realistic,150,83.5822962962963,10.434444444444445,0.009777777777777778,8644.704814814813,1078.6176296296294,3.6695555555555552,0.0,32.44140740740742,1089.0520740740737,8728.28711111111,135 -mistral-7b,32,2,none,100,85.20280303030303,10.636742424242424,0.01,8464.378030303029,1056.0845454545452,3.9950757575757576,0.0,34.66492424242424,1066.7212878787877,8549.580833333333,132 -mistral-7b,64,32,none,50,81.37373015873015,10.158730158730158,0.010476190476190477,8610.346349206351,1074.1531746031744,3.7221428571428574,0.0,34.01198412698413,1084.3119047619048,8691.720079365079,126 
-llama3.1-8b,0,32,realistic,150,67316.83462962964,8392.20061728395,1.7202469135802474,8188.45024691358,1011.2359876543209,6.1651851851851855,0.0,200.61104938271603,9403.436604938272,75505.28487654321,162 -llama3.1-8b,4,64,none,50,5814.886533333333,726.2718000000001,0.03853333333333333,8891.498533333333,1109.2730666666666,4.163,0.0,38.89266666666666,1835.5448666666668,14706.385066666666,150 -llama3.1-8b,64,32,none,200,69.50228187919463,8.673825503355705,0.009664429530201342,8540.824966442953,1065.5480536912753,3.626308724832215,0.0,32.42181208053691,1074.221879194631,8610.327248322148,149 -llama3.1-8b,64,64,realistic,200,557.714358974359,69.66115384615385,0.010128205128205128,8325.506089743589,1038.5121153846155,3.2933974358974356,0.0,29.84570512820513,1108.173269230769,8883.22044871795,156 -llama3.1-70b,0,16,none,10,39916.41905797101,4986.899855072465,0.25956521739130434,4400.261739130435,546.3215942028986,3.0412318840579706,0.0,42.61601449275363,5533.221449275363,44316.68079710145,138 -llama3.1-70b,8,4,realistic,50,9218.829583333332,1152.0029166666666,0.05701388888888889,8488.234583333333,1059.8206249999998,3.7691666666666666,0.0,39.56236111111111,2211.8235416666666,17707.064166666667,144 -llama3.1-70b,32,4,realistic,60,89.69388059701492,11.197388059701494,0.00873134328358209,9243.28276119403,1154.1087313432834,3.977537313432836,0.0,38.60149253731343,1165.306119402985,9332.976641791045,134 -llama2-7b,64,2,realistic,150,55.30649746192893,6.904568527918782,0.010456852791878173,9471.350456852791,1183.0275634517766,4.1971573604060906,0.0,40.15172588832488,1189.9321319796954,9526.65695431472,197 -llama2-7b,0,4,realistic,100,54354.178515283835,6789.7096506550215,1.4941484716157203,7018.002620087336,872.4444104803495,14.24873362445415,0.0,258.44449781659387,7662.154061135371,61372.18113537118,229 
-llama2-7b,0,0,none,150,59529.45367256638,7436.309778761061,1.464911504424779,8167.288053097345,1000.6673451327433,2.2253097345132744,0.0,154.86274336283188,8436.977123893806,67696.74172566371,226 -llama3.1-8b,8,64,none,150,74.16616352201258,9.255849056603774,0.009056603773584906,9010.52314465409,1124.3589937106915,3.7070440251572325,0.0,34.58446540880503,1133.6148427672954,9084.6893081761,159 -llama3.1-70b,32,16,realistic,30,89.85068181818183,11.216969696969699,0.008863636363636363,9176.771136363637,1145.7661363636364,3.8425757575757578,0.0,39.30636363636364,1156.9831060606061,9266.621818181819,132 -llama3.1-8b,16,4,none,200,76.9482,9.603066666666667,0.0096,8936.8176,1115.0192666666665,3.6752666666666665,0.0,33.5816,1124.6223333333332,9013.765800000001,150 -llama3.1-70b,32,2,realistic,40,101.00533333333333,12.6095,0.00975,9249.821583333332,1154.91925,3.8899166666666667,0.0,38.91875,1167.5287500000002,9350.826916666667,120 -llama3.1-70b,8,16,none,10,9303.11542635659,1162.4842635658915,0.04930232558139535,8933.973798449613,1115.6399999999999,4.148217054263566,0.0,43.21341085271318,2278.124263565892,18237.0892248062,129 -mistral-7b,4,64,none,200,1937.871488095238,242.03672619047617,0.023154761904761907,9025.75857142857,1126.1405357142855,3.7513690476190478,0.0,35.601488095238096,1368.177261904762,10963.63005952381,168 -mistral-7b,8,8,none,50,96.34473282442748,12.027709923664123,0.010076335877862596,8956.82,1117.4815267175575,3.890381679389313,0.0,36.23854961832061,1129.5092366412216,9053.164732824429,131 -mistral-7b,16,64,realistic,150,74.66857142857143,9.32168831168831,0.008571428571428572,8757.522532467532,1092.7935064935066,3.6356493506493504,0.0,33.01922077922078,1102.1151948051947,8832.191103896104,154 -llama3.1-8b,0,2,realistic,50,46224.51776223776,5764.159090909091,0.8884615384615385,5221.660349650349,647.3009790209791,27.56335664335664,0.0,214.46510489510487,6411.46006993007,51446.17811188811,143 
-llama3.1-8b,4,32,none,100,1367.7323333333334,170.81313333333333,0.016933333333333335,8931.026,1114.535133333333,3.908133333333333,0.0,36.35013333333334,1285.3482666666666,10298.758333333333,150 -llama3.1-8b,32,16,none,100,80.15117647058823,10.00279411764706,0.010588235294117647,8817.905735294116,1100.3822794117646,3.8555882352941175,0.0,35.79191176470588,1110.3850735294118,8898.056911764706,136 -llama3.1-70b,4,8,none,50,28691.562620689656,3585.155724137931,0.1422068965517241,8814.292206896553,1100.048827586207,5.699931034482758,0.0,61.30248275862069,4685.204551724138,37505.854827586205,145 -llama3.1-70b,16,2,realistic,70,89.48776223776224,11.171608391608391,0.00818181818181818,8879.156223776223,1108.641048951049,4.057622377622378,0.0,37.767762237762234,1119.8126573426575,8968.643986013985,143 -llama2-7b,0,64,realistic,100,63234.813551401865,7899.585280373832,1.5157009345794392,8166.123224299066,1014.9410747663551,3.8919158878504674,0.0,171.50481308411216,8914.526355140188,71400.93677570093,214 -llama2-7b,4,0,realistic,100,20288.12798816568,2534.92100591716,0.13402366863905324,6649.457633136094,829.0103550295858,3.952603550295858,0.0,42.090118343195265,3363.931360946746,26937.585621301772,169 -llama2-7b,8,8,realistic,200,5996.871375,749.3638333333333,0.03333333333333333,8676.88075,1083.63175,4.045625,0.0,38.317125000000004,1832.9955833333333,14673.752124999999,240 -llama2-7b,32,0,realistic,200,80.60760736196319,10.051165644171776,0.014294478527607362,10152.92582822086,1262.2306134969324,4.23877300613497,0.0,45.84815950920246,1272.2817791411042,10233.533435582822,163 -llama2-7b,64,0,realistic,100,82.0308888888889,10.248,0.007851851851851853,10415.050296296296,1300.9113333333332,4.096962962962963,0.0,45.5402962962963,1311.1593333333335,10497.081185185185,135 -mistral-7b,8,2,none,200,85.49833333333333,10.673666666666666,0.0088,8980.7166,1120.5198,3.6576666666666666,0.0,33.7856,1131.1934666666666,9066.214933333333,150 
-mistral-7b,32,2,none,200,77.40179310344828,9.662896551724137,0.00910344827586207,8541.130551724138,1065.767379310345,3.9957931034482748,0.0,34.30324137931035,1075.430275862069,8618.532344827587,145 -mistral-7b,32,64,realistic,200,69.1950641025641,8.638333333333334,0.008461538461538461,8508.212435897436,1061.5536538461538,3.5144871794871793,0.0,31.274487179487178,1070.1919871794871,8577.407500000001,156 -llama3.1-70b,0,0,none,20,43075.07641791045,5379.779925373135,0.5494029850746269,5460.53380597015,677.3888059701493,15.252985074626867,0.0,151.06701492537314,6057.168731343284,48535.61022388059,134 -llama3.1-70b,4,8,none,40,32196.911849315067,4023.303082191781,0.2552739726027397,8558.755547945206,1068.2017123287671,10.36335616438356,0.0,103.38061643835616,5091.504794520548,40755.66739726027,146 -llama3.1-70b,4,16,none,60,19912.917857142857,2488.1966883116884,0.12272727272727271,8217.96987012987,1025.654935064935,6.008051948051949,0.0,56.55116883116884,3513.8516233766236,28130.887727272726,154 -llama3.1-70b,4,64,none,70,12530.793055555556,1565.6014444444445,0.11511111111111112,8823.76838888889,1101.0074444444442,5.873388888888889,0.0,66.40183333333334,2666.608888888889,21354.561444444444,180 -llama3.1-70b,16,8,none,30,97.50263565891473,12.172248062015504,0.009069767441860464,9488.291395348839,1184.8390697674417,4.169302325581396,0.0,40.90077519379845,1197.011317829457,9585.794031007752,129 -llama3.1-70b,16,32,realistic,10,132.37148936170212,16.525212765957445,0.012446808510638297,9615.298936170213,1200.7770212765959,3.4213829787234045,0.0,36.6186170212766,1217.3022340425532,9747.670425531915,94 -llama3.1-70b,32,32,none,30,82.55584507042254,10.306267605633803,0.008239436619718309,9116.606408450703,1138.3796478873237,4.055422535211267,0.0,39.82852112676056,1148.6859154929577,9199.162253521126,142 
-llama3.1-70b,16,16,realistic,70,86.2076551724138,10.762137931034482,0.008068965517241379,9173.011103448274,1145.371655172414,4.3148275862068965,0.0,41.51586206896551,1156.1337931034484,9259.21875862069,145 -llama3.1-70b,0,0,realistic,70,53871.39303867404,6727.74226519337,1.1807734806629835,6549.681933701657,810.400276243094,10.37817679558011,0.0,177.5414917127072,7538.142541436463,60421.07497237569,181 -llama3.1-70b,0,0,none,70,55535.58417142857,6935.363714285714,1.2896571428571428,7258.554857142858,896.8404571428572,7.306914285714286,0.0,167.5084,7832.20417142857,62794.13902857143,175 -llama3.1-70b,4,2,realistic,20,17315.944848484847,2163.7949242424247,0.07446969696969696,7395.020984848485,922.9145454545454,3.8199242424242428,0.0,35.41477272727273,3086.70946969697,24710.965833333335,132 -llama2-7b,0,32,none,200,54091.30520689655,6756.86451724138,1.3832413793103449,7375.968206896553,909.4397586206895,3.7504827586206892,0.0,187.22358620689653,7666.3042758620695,61467.273413793104,290 -mistral-7b,16,4,realistic,150,87.72367647058823,10.951470588235296,0.009705882352941177,8987.733529411766,1121.4629411764708,3.707941176470588,0.0,33.91382352941177,1132.414411764706,9075.457205882352,136 -mistral-7b,0,0,realistic,50,44804.392377622375,5586.560909090909,0.9639160839160837,5179.862937062937,641.3036363636364,29.10083916083916,0.0,230.19524475524474,6227.864545454547,49984.25531468531,143 -mistral-7b,4,32,none,100,2601.2941721854304,324.92158940397354,0.022450331125827817,8991.286953642384,1122.1982119205297,4.080397350993378,0.0,38.28317880794702,1447.1198013245034,11592.581125827815,151 -mistral-7b,32,8,none,50,89.97382113821139,11.232357723577236,0.010731707317073172,8950.25243902439,1116.6960162601627,3.7782113821138203,0.0,35.53130081300813,1127.9283739837401,9040.226260162603,123 
-llama3.1-8b,4,4,realistic,100,93.85904411764706,11.71345588235294,0.010661764705882353,8758.593676470588,1093.055,3.9325735294117643,0.0,35.3177205882353,1104.768455882353,8852.452720588235,136 -llama3.1-8b,32,2,none,150,80.41971014492754,10.036304347826087,0.010434782608695651,8699.85731884058,1085.438768115942,3.8131884057971015,0.0,33.79036231884058,1095.4750724637681,8780.277028985507,138 -llama3.1-70b,4,2,none,30,24793.139779411762,3098.1462500000002,0.11661764705882352,8024.959852941177,1001.7722794117647,5.011691176470588,0.0,50.50764705882352,4099.918529411765,32818.09963235294,136 -llama3.1-70b,8,16,none,70,1861.375316455696,232.59898734177216,0.014810126582278482,8468.231518987342,1057.1748101265823,4.3123417721519,0.0,38.916772151898726,1289.7737974683546,10329.606835443037,158 -llama3.1-70b,32,2,none,30,103.26615384615384,12.8917094017094,0.01,9596.542564102563,1198.2016239316235,4.128119658119658,0.0,41.48452991452992,1211.0933333333332,9699.80871794872,117 -mistral-7b,64,8,none,100,83.96314516129033,10.482016129032258,0.01064516129032258,8784.387096774193,1096.179435483871,3.892016129032258,0.0,35.34564516129032,1106.6614516129032,8868.350241935483,124 -llama3.1-8b,32,64,realistic,200,1543.5477987421384,192.7932075471698,0.0210062893081761,8807.977672955974,1098.7800628930818,3.475911949685535,0.0,32.55641509433962,1291.5732704402515,10351.525471698113,159 -llama3.1-8b,64,32,none,150,73.50234042553191,9.172978723404256,0.010212765957446808,8532.759787234041,1064.5151063829787,3.6521276595744685,0.0,32.61723404255319,1073.688085106383,8606.262127659575,141 -llama3.1-70b,8,32,none,60,2748.8715384615384,343.49365384615385,0.020512820512820513,8680.287115384615,1083.597115384615,4.141025641025641,0.0,38.62897435897436,1427.090769230769,11429.158653846154,156 
-llama2-7b,4,0,realistic,200,28712.019834710743,3587.573140495868,0.14801652892561984,9186.565454545454,1145.1644628099173,6.068181818181818,0.0,68.8603305785124,4732.737603305785,37898.5852892562,121 -llama3.1-8b,16,2,realistic,50,96.55694214876033,12.050165289256197,0.011900826446280991,8774.99958677686,1094.7407438016528,3.959256198347107,0.0,35.843223140495866,1106.790909090909,8871.55652892562,121 -llama2-7b,32,2,realistic,150,66.83016042780748,8.342513368983958,0.007005347593582888,9949.314812834225,1242.557807486631,4.508716577540108,0.0,44.86684491978609,1250.900320855615,10016.144973262031,187 -llama3.1-8b,64,2,none,150,81.03068702290076,10.112519083969465,0.01099236641221374,8275.502213740458,1032.4801526717558,3.6738931297709927,0.0,31.29312977099237,1042.5926717557254,8356.532900763359,131 -llama2-7b,0,2,none,150,52919.983303964764,6609.395814977974,0.9976651982378856,7944.122555066078,985.7045814977973,20.764977973568286,0.0,257.3192511013216,7595.100396475771,60864.10585903083,227 -llama2-7b,4,16,none,100,27495.743270142182,3435.8921327014223,0.2224170616113744,7679.225592417062,958.4590047393365,11.387630331753554,0.0,81.07729857819906,4394.351137440759,35174.96886255924,211 -llama3.1-70b,16,2,none,10,116.26145454545454,14.514090909090909,0.010636363636363637,9972.745454545455,1245.5994545454541,4.086636363636364,0.0,41.07381818181819,1260.1135454545451,10089.00690909091,110 -llama3.1-8b,0,16,realistic,50,45043.96053333333,5617.23,0.9455333333333332,4927.313866666667,610.3254666666667,28.25786666666667,0.0,223.25886666666668,6227.555466666667,49971.2744,150 -llama3.1-8b,8,0,realistic,200,6072.451609195403,758.3247126436781,0.072183908045977,9162.35103448276,1142.9781609195402,5.61741379310345,0.0,56.72913793103448,1901.302873563218,15234.802643678162,174 -llama3.1-8b,16,4,none,50,91.096062992126,11.368661417322834,0.011338582677165353,8994.40527559055,1122.1127559055114,3.8795275590551186,0.0,36.1163779527559,1133.4814173228342,9085.501338582677,127 
-llama3.1-8b,16,8,realistic,200,77.21187919463087,9.635973154362416,0.009664429530201342,8912.677449664428,1111.9737583892618,3.6464429530201348,0.0,33.655838926174496,1121.609731543624,8989.88932885906,149 -llama3.1-8b,64,16,none,150,73.95120567375888,9.22900709219858,0.010212765957446808,8619.372907801418,1075.3878723404255,3.5562411347517733,0.0,31.729290780141845,1084.616879432624,8693.324113475179,141 -llama3.1-70b,0,0,none,60,53064.3359375,6626.824895833332,1.2570833333333333,6559.91390625,813.3908333333334,13.685104166666667,0.0,212.24515625,7440.215729166666,59624.249843749996,192 -llama3.1-70b,0,8,realistic,20,40796.97819444445,5095.435069444444,0.4404166666666667,4413.1397916666665,547.8460416666667,12.623819444444443,0.0,113.90791666666665,5643.281111111111,45210.117986111116,144 -llama3.1-70b,8,4,none,60,3603.0124113475176,450.24248226950357,0.029148936170212768,9292.397163120568,1160.2606382978724,4.237943262411348,0.0,41.73950354609929,1610.5031205673758,12895.409574468085,141 -llama3.1-70b,32,8,none,50,90.08818181818181,11.246590909090909,0.008863636363636363,9281.906363636364,1158.9453787878788,4.110227272727272,0.0,40.980606060606064,1170.1919696969699,9371.994545454545,132 -llama2-7b,8,2,none,150,10609.489166666668,1325.872916666667,0.04685185185185186,8505.261296296296,1062.2907870370373,4.161574074074074,0.0,38.69782407407407,2388.163703703704,19114.750462962962,216 -llama2-7b,32,64,none,100,77.361375,9.651375,0.0075625,8952.279875,1118.11275,4.0734375,0.0,39.006,1127.7641250000001,9029.64125,160 -mistral-7b,0,32,none,100,64782.96352201257,8078.929119496858,1.6449685534591192,7227.666226415094,894.3261635220125,6.802389937106918,0.0,179.82232704402514,8973.25528301887,72010.62974842767,159 -mistral-7b,16,32,none,50,84.52145985401461,10.551678832116787,0.009635036496350365,8943.406496350364,1115.7669343065693,3.8437956204379558,0.0,36.17948905109489,1126.318613138686,9027.92795620438,137 
-mistral-7b,32,32,none,150,72.36806666666666,9.034466666666667,0.0088,8916.406133333332,1112.5244666666665,3.6587999999999994,0.0,33.60726666666667,1121.5589333333332,8988.7742,150 -mistral-7b,64,16,none,100,81.3184251968504,10.151811023622047,0.010393700787401575,8864.572440944881,1106.1974803149608,3.8770866141732285,0.0,35.848740157480314,1116.3492913385828,8945.890866141734,127 -llama3.1-8b,16,8,none,200,72.57905063291139,9.057784810126583,0.009113924050632912,8731.179556962026,1089.3382278481013,3.5788607594936703,0.0,32.16335443037975,1098.3960126582278,8803.758607594937,158 -llama3.1-8b,64,8,realistic,100,87.73641666666667,10.949416666666668,0.012,8955.82525,1117.5069999999998,3.836,0.0,35.557166666666674,1128.4564166666667,9043.561666666666,120 -llama3.1-70b,8,16,none,50,3727.315170068027,465.75442176870746,0.02673469387755102,9037.301360544217,1128.3753061224488,4.101972789115647,0.0,41.036734693877555,1594.1297278911566,12764.616530612244,147 -llama3.1-70b,8,64,realistic,50,9557.401024096385,1194.2819277108435,0.04168674698795181,8582.565421686746,1071.4619879518073,4.085180722891567,0.0,41.872228915662646,2265.7439156626506,18139.96644578313,166 -llama2-7b,32,0,realistic,100,95.24448275862069,11.890758620689656,0.008344827586206896,9938.054,1240.9628275862071,4.135931034482759,0.0,42.2831724137931,1252.8535862068966,10033.29848275862,145 -llama3.1-70b,4,2,realistic,60,26007.467399999998,3249.719066666667,0.09999999999999999,7419.082266666665,925.9552666666667,4.7512,0.0,44.876666666666665,4175.674333333334,33426.549666666666,150 -llama3.1-70b,4,4,none,20,8290.979457364341,1035.9926356589147,0.04922480620155039,8956.87534883721,1118.297364341085,4.205116279069768,0.0,41.85790697674419,2154.29,17247.85480620155,129 -mistral-7b,0,8,realistic,200,72957.90698224853,9094.431183431954,2.0393491124260357,9213.189112426035,1136.9590532544378,6.592130177514792,0.0,257.3475739644971,10231.390236686391,82171.09609467456,169 
-mistral-7b,64,8,realistic,150,77.93492537313433,9.729402985074627,0.009850746268656717,8437.642462686566,1052.7865671641791,3.6830597014925375,0.0,32.11082089552239,1062.515970149254,8515.577388059703,134 -llama3.1-8b,8,0,none,150,78.39082352941176,9.78064705882353,0.009352941176470588,8557.106,1067.5404705882354,3.5684117647058824,0.0,32.25017647058824,1077.3211176470588,8635.49682352941,170 -llama3.1-70b,8,2,realistic,30,10067.355488721805,1258.0645864661656,0.05030075187969924,8262.583007518797,1031.5582706766918,3.6075187969924816,0.0,35.87593984962406,2289.6228571428574,18329.938496240604,133 -llama3.1-70b,4,4,realistic,50,20471.489420289854,2557.9192028985503,0.09565217391304347,8167.244565217391,1019.1741304347828,4.5213768115942035,0.0,43.42949275362319,3577.0933333333332,28638.733985507246,138 -mistral-7b,16,2,realistic,150,87.87912408759125,10.97087591240876,0.009635036496350365,8820.000291970804,1100.6151094890513,3.6986861313868613,0.0,33.58693430656934,1111.5859854014598,8907.879416058395,137 -mistral-7b,4,4,none,200,457.878940397351,57.183377483443714,0.016821192052980133,9082.288543046358,1133.3403973509933,3.7736423841059605,0.0,35.192317880794704,1190.523774834437,9540.167483443709,151 -mistral-7b,8,0,none,150,105.0945508982036,13.117005988023951,0.010598802395209581,9049.224131736526,1129.1650898203593,3.6759880239520966,0.0,35.22988023952096,1142.2820958083832,9154.31868263473,167 -llama3.1-8b,0,32,realistic,50,45103.632108843536,5625.1004081632655,0.9771428571428571,5238.649931972789,648.8455102040817,28.262925170068026,0.0,235.53571428571428,6273.945918367346,50342.282040816324,147 -llama3.1-8b,4,4,none,150,353.78472972972975,44.18054054054054,0.014527027027027026,9011.443986486487,1124.3685135135136,3.6922972972972974,0.0,33.975608108108105,1168.549054054054,9365.228716216217,148 
-llama3.1-8b,4,64,none,150,1494.7915625,186.6990625,0.014687500000000001,9129.8201875,1139.1481250000002,3.7816874999999994,0.0,35.6190625,1325.8471875,10624.61175,160 -llama3.1-8b,4,64,none,200,584.7778362573099,73.03391812865497,0.012690058479532163,8902.85374269006,1110.8144444444442,3.6788304093567246,0.0,34.238713450292394,1183.8483625730994,9487.631578947368,171 -llama3.1-70b,16,0,none,50,82.47673076923077,10.296410256410256,0.0075,9089.794871794871,1134.939807692308,4.102884615384616,0.0,41.34826923076923,1145.2362179487181,9172.271602564104,156 -llama2-7b,32,64,realistic,150,60.19724489795918,7.517397959183674,0.006020408163265307,9149.194744897959,1142.6557142857143,4.422295918367347,0.0,42.81464285714286,1150.173112244898,9209.391989795919,196 -mistral-7b,16,32,none,150,73.12917721518987,9.129493670886076,0.008354430379746836,8641.079620253166,1078.2749367088609,3.7086708860759496,0.0,32.84443037974684,1087.404430379747,8714.208797468355,158 -llama3.1-8b,32,32,realistic,100,79.31788321167883,9.898759124087592,0.010510948905109488,8854.051678832117,1104.8578102189783,3.8484671532846715,0.0,34.989416058394156,1114.7565693430656,8933.369562043796,137 -llama3.1-70b,16,16,realistic,50,1435.3710810810812,179.3702027027027,0.012702702702702701,8943.925878378379,1116.773783783784,3.7708108108108105,0.0,37.71628378378379,1296.1439864864867,10379.29695945946,148 -llama2-7b,16,8,realistic,200,59.582060085836915,7.443519313304721,0.004549356223175966,9020.013261802575,1126.599356223176,4.33519313304721,0.0,40.08918454935622,1134.0428755364806,9079.595321888411,233 -llama3.1-70b,4,64,realistic,50,19766.90144578313,2469.7654216867472,0.11192771084337348,6901.414879518073,861.0027710843373,4.682650602409638,0.0,43.92813253012047,3330.7681927710846,26668.316325301203,166 
-mistral-7b,16,32,realistic,200,72.92044025157233,9.103396226415095,0.00830188679245283,8749.007924528301,1091.6693710691823,3.5684276729559743,0.0,32.46660377358491,1100.7727672955975,8821.928364779873,159 -mistral-7b,32,8,none,150,78.33234042553192,9.779078014184396,0.009361702127659575,8769.799361702127,1094.3091489361705,3.741063829787234,0.0,34.07191489361702,1104.0882269503547,8848.13170212766,141 -llama3.1-8b,4,16,realistic,100,1966.137847222222,245.5813888888889,0.023194444444444448,8498.627916666666,1060.5752777777777,3.7512499999999998,0.0,34.35236111111111,1306.1566666666668,10464.765763888889,144 -llama3.1-70b,16,2,none,40,102.97427419354838,12.855322580645161,0.009435483870967742,9428.17935483871,1177.2801612903224,4.387419354838709,0.0,43.39016129032258,1190.1354838709676,9531.153629032258,124 -llama2-7b,0,2,none,100,48513.51110132159,6060.233127753304,1.0841409691629957,6840.922907488987,851.8574889867842,26.264713656387663,0.0,299.99449339207047,6912.090616740088,55354.43400881057,227 -llama2-7b,4,8,realistic,50,37227.73798742138,4652.223144654088,0.24081761006289307,8142.960817610063,1016.3463522012579,9.327735849056605,0.0,95.23603773584907,5668.569496855346,45370.69880503144,159 -mistral-7b,4,4,none,50,714.5601550387597,89.24527131782945,0.020310077519379847,8916.57465116279,1112.5052713178295,3.985736434108527,0.0,36.80813953488372,1201.750542635659,9631.134806201551,129 -mistral-7b,8,16,none,50,93.27761194029851,11.644850746268657,0.009850746268656717,8937.013582089552,1115.1841791044776,3.999626865671642,0.0,37.444850746268656,1126.8290298507463,9030.29119402985,134 -llama3.1-8b,4,0,none,50,6273.280961538461,783.5626923076924,0.044807692307692305,8645.904423076921,1078.3761538461538,4.3206410256410255,0.0,39.263076923076916,1861.9388461538463,14919.185384615384,156 
-llama3.1-8b,16,0,none,100,72.77186335403727,9.081863354037267,0.008944099378881987,8734.107080745342,1089.9237267080746,3.780683229813665,0.0,34.999689440993784,1099.0055900621119,8806.878944099379,161 -llama3.1-70b,4,4,realistic,60,21837.960816326533,2728.765306122449,0.10537414965986394,8162.596190476192,1018.5670068027212,5.120204081632654,0.0,50.35095238095238,3747.33231292517,30000.55700680272,147 -llama3.1-70b,16,2,realistic,50,97.80076335877862,12.209465648854962,0.008931297709923663,8958.790534351145,1118.496488549618,4.272595419847328,0.0,41.110534351145034,1130.7059541984731,9056.591297709923,131 -llama3.1-70b,16,32,realistic,50,79.58679487179488,9.935576923076923,0.0075,9042.456794871794,1129.1014102564102,4.126474358974359,0.0,41.528653846153844,1139.0369871794874,9122.043589743591,156 -mistral-7b,4,0,realistic,200,7407.82650273224,924.8357923497268,0.11775956284153004,8558.529289617485,1067.8042076502734,5.2481967213114755,0.0,51.02704918032787,1992.6399999999999,15966.355792349728,183 -llama3.1-70b,0,2,none,20,41680.666153846156,5205.404055944056,0.567902097902098,5386.757202797203,669.1606993006993,21.01811188811189,0.0,183.78265734265733,5874.564755244756,47067.423356643354,143 -llama3.1-70b,32,4,realistic,40,101.96618644067797,12.729491525423729,0.009915254237288135,10014.34906779661,1250.4304237288134,3.9813559322033902,0.0,43.0420338983051,1263.159915254237,10116.315254237288,118 -mistral-7b,0,16,realistic,200,73644.93666666668,9180.047575757577,2.036060606060606,9327.12587878788,1150.965090909091,6.232242424242424,0.0,255.4044242424243,10331.012666666667,82972.06254545455,165 -mistral-7b,0,8,none,100,64208.57783950617,8006.278641975309,1.592037037037037,7233.243888888889,894.5464197530864,8.10962962962963,0.0,196.158024691358,8900.825061728396,71441.82172839507,162 
-llama2-7b,16,4,none,50,2866.742932330827,358.2603759398496,0.02661654135338346,10146.62939849624,1267.5385714285715,3.5487969924812033,0.0,40.23293233082706,1625.7989473684208,13013.372330827067,133 -mistral-7b,4,4,realistic,50,1730.2484,216.02976,0.03544,8998.24064,1122.67376,3.9212,0.0,36.9092,1338.70352,10728.489039999999,125 -llama3.1-8b,32,4,realistic,200,78.9605,9.854142857142856,0.010285714285714285,9096.771928571428,1134.9740714285715,3.680642857142857,0.0,34.25721428571429,1144.8282142857145,9175.73242857143,140 -llama3.1-70b,4,2,none,60,23623.538333333334,2951.864266666667,0.09146666666666667,7636.022733333334,953.0268000000001,4.6916,0.0,42.018600000000006,3904.8910666666675,31259.56106666667,150 -llama3.1-70b,16,16,realistic,10,132.0203157894737,16.481368421052633,0.01231578947368421,9524.278631578947,1189.4413684210526,3.6442105263157907,0.0,37.294315789473686,1205.9227368421052,9656.298947368421,95 -llama2-7b,16,16,realistic,200,98.91843478260868,12.352478260869564,0.005391304347826088,9122.678478260868,1139.4422608695652,4.369217391304348,0.0,41.92386956521739,1151.7947391304347,9221.596913043479,230 -llama2-7b,16,16,none,200,71.00058295964125,8.86798206278027,0.005246636771300449,9194.300582959642,1148.350941704036,4.211345291479821,0.0,40.38421524663677,1157.2189237668163,9265.301165919283,223 -llama2-7b,64,8,realistic,50,91.6,11.443478260869565,0.009217391304347827,10390.434521739131,1297.9715652173911,4.073391304347825,0.0,42.525478260869576,1309.4150434782607,10482.03452173913,115 -mistral-7b,0,64,none,100,65543.04632258065,8173.238387096775,1.6543870967741932,7153.397225806452,885.5534838709677,6.974967741935483,0.0,182.77812903225808,9058.791870967743,72696.44354838709,155 -mistral-7b,64,0,realistic,150,72.86149659863945,9.096054421768708,0.008979591836734694,8437.319115646258,1052.4825850340137,3.284761904761905,0.0,29.352448979591845,1061.5786394557822,8510.180612244896,147 
-llama3.1-8b,32,0,realistic,50,89.84192,11.212159999999999,0.011519999999999999,9076.212160000001,1132.2340800000002,3.9044,0.0,37.01239999999999,1143.4462399999998,9166.05408,125 -llama3.1-70b,4,0,none,20,28693.31463768116,3585.3558695652173,0.24398550724637683,7726.093768115941,964.5241304347825,9.467173913043478,0.0,95.68804347826085,4549.88,36419.4084057971,138 -llama3.1-70b,4,16,realistic,10,24596.886,3073.5688000000005,0.08624000000000002,5791.155519999999,722.01784,2.29152,0.0,26.322479999999995,3795.58664,30388.04152,125 -llama3.1-70b,8,16,realistic,20,16822.49705882353,2102.157205882353,0.08272058823529412,8140.375808823528,1016.3522794117647,3.819117647058824,0.0,39.927867647058825,3118.509485294118,24962.87286764706,136 -llama3.1-70b,16,64,none,20,99.6930303030303,12.441969696969696,0.00946969696969697,9842.772803030302,1228.9883333333335,4.022045454545455,0.0,42.22280303030303,1241.430303030303,9942.465833333334,132 -llama3.1-8b,16,2,none,100,86.19244444444445,10.75674074074074,0.010666666666666666,8706.167185185186,1086.4902962962963,3.660962962962963,0.0,32.4731111111111,1097.2470370370368,8792.35962962963,135 -llama3.1-70b,16,2,realistic,40,103.37983870967743,12.905887096774192,0.009435483870967742,9432.841290322582,1177.8356451612906,4.009193548387097,0.0,39.367983870967734,1190.7415322580646,9536.221129032258,124 -llama3.1-8b,0,0,realistic,50,45270.64345070422,5644.510211267606,0.9585211267605634,5287.166267605635,654.7373943661971,29.82908450704225,0.0,231.97985915492958,6299.247605633803,50557.809718309865,142 -llama3.1-70b,32,8,realistic,70,83.966338028169,10.482323943661973,0.008239436619718309,9181.63147887324,1146.4688028169014,4.290774647887323,0.0,40.60669014084507,1156.9511267605633,9265.597816901409,142 -llama3.1-70b,16,32,none,60,82.37853333333334,10.284133333333333,0.0078,8886.9194,1109.5547333333334,4.1478,0.0,39.624266666666664,1119.8388666666667,8969.297933333333,150 
-llama3.1-8b,0,4,realistic,150,66616.58893081761,8303.981069182391,1.6366666666666665,8356.066792452832,1031.791761006289,8.860125786163524,0.0,222.50389937106917,9335.772830188678,74972.65572327044,159 -llama3.1-70b,32,16,none,50,86.77514705882353,10.833014705882352,0.008602941176470588,9242.657647058822,1153.9358088235294,4.068529411764706,0.0,40.83279411764706,1164.7688235294117,9329.432794117645,136 -llama2-7b,16,8,none,100,699.963705882353,87.44582352941177,0.013176470588235295,9783.404588235295,1222.1491764705884,4.5385294117647055,0.0,44.37717647058823,1309.595,10483.368294117647,170 -llama2-7b,16,0,realistic,50,3103.797635135135,387.88445945945944,0.02081081081081081,9246.231486486486,1154.833783783784,3.6989864864864868,0.0,39.122567567567565,1542.7182432432433,12350.029121621623,148 -llama3.1-70b,8,2,realistic,60,3775.7072916666666,471.8311805555556,0.03673611111111111,8742.739027777778,1091.4027777777778,3.992638888888889,0.0,38.35597222222222,1563.2339583333332,12518.446319444445,144 -llama2-7b,64,8,none,50,91.55543859649123,11.437894736842106,0.009298245614035089,10325.075789473685,1289.6892982456143,3.8219298245614035,0.0,41.90464912280701,1301.1271929824563,10416.631228070175,114 -mistral-7b,32,0,none,150,71.30270440251572,8.901446540880503,0.00830188679245283,8798.678616352201,1097.7740880503145,3.388867924528302,0.0,32.18421383647799,1106.6755345911947,8869.981320754718,159 -llama3.1-70b,8,2,none,40,7507.720909090909,938.1648484848484,0.043560606060606064,8687.87303030303,1084.6530303030304,3.911818181818183,0.0,37.785681818181814,2022.8178787878785,16195.593939393939,132 -llama2-7b,8,2,none,100,9458.75015873016,1182.0422751322753,0.0455026455026455,8701.601851851852,1086.6742328042328,3.6984126984126977,0.0,37.0563492063492,2268.7165079365077,18160.352010582013,189 
-mistral-7b,64,0,realistic,100,78.84161764705881,9.842647058823529,0.009705882352941177,8466.028235294116,1056.4637500000001,3.7419117647058826,0.0,33.338088235294116,1066.3063970588237,8544.869852941178,136 -llama3.1-70b,0,0,realistic,40,45003.78278787879,5620.6393333333335,0.9622424242424241,5374.095696969697,666.1855757575757,21.145575757575756,0.0,226.80472727272726,6286.824909090909,50377.87848484849,165 -llama3.1-8b,8,0,none,50,84.3609589041096,10.528150684931505,0.009863013698630137,8952.413219178083,1116.7863698630138,3.8343150684931504,0.0,35.99801369863014,1127.3145205479452,9036.774178082193,146 -llama3.1-8b,16,2,realistic,100,88.44287878787878,11.037575757575757,0.010909090909090908,8640.383787878789,1078.2177272727272,3.9846212121212123,0.0,35.027575757575754,1089.255303030303,8728.826666666668,132 -llama3.1-70b,8,32,none,20,9184.230211267606,1147.601971830986,0.04126760563380283,8941.410492957746,1116.2694366197184,4.46830985915493,0.0,43.467957746478874,2263.871408450704,18125.640704225352,142 -llama2-7b,64,8,realistic,100,66.55892405063291,8.31512658227848,0.006708860759493672,9920.41335443038,1239.1768987341775,3.9839240506329117,0.0,40.59424050632911,1247.492025316456,9986.972278481013,158 -llama2-7b,64,4,realistic,200,52.767376237623765,6.592178217821782,0.005247524752475248,9556.144158415842,1193.6381188118812,4.155247524752475,0.0,40.18613861386139,1200.230297029703,9608.911534653465,202 -llama3.1-8b,32,0,realistic,100,73.82578947368421,9.213355263157895,0.009473684210526315,8702.69907894737,1085.8597368421051,3.7583552631578945,0.0,34.46203947368421,1095.073092105263,8776.524868421053,152 -mistral-7b,0,16,realistic,150,67408.88060606061,8404.156606060607,1.7104242424242426,8010.481515151515,989.2126060606062,6.334363636363637,0.0,209.3092727272727,9393.369212121212,75419.36212121212,165 
-mistral-7b,64,64,realistic,200,69.72794520547946,8.704863013698631,0.00904109589041096,8713.07897260274,1086.9725342465754,3.4595890410958905,0.0,31.587945205479453,1095.6773972602741,8782.806917808219,146 -llama3.1-8b,0,64,none,50,50079.457266666665,6245.4056,1.1925333333333332,5694.479533333333,704.9586,25.051466666666666,0.0,243.7518666666667,6950.364199999999,55773.936799999996,150 -llama3.1-70b,16,4,realistic,40,2047.29,255.84038167938928,0.023969465648854958,9145.616564885497,1141.7210687022903,4.181679389312977,0.0,41.15190839694656,1397.5614503816794,11192.906564885496,131 -llama3.1-70b,32,64,none,50,74.95135483870968,9.356903225806452,0.007548387096774193,9076.520903225806,1133.2777419354838,4.023225806451612,0.0,40.22212903225807,1142.6346451612903,9151.472258064516,155 -mistral-7b,8,8,none,100,90.05214285714285,11.242142857142857,0.009428571428571429,9034.637214285714,1127.4650714285713,3.9725714285714293,0.0,36.902714285714296,1138.7072142857141,9124.689357142857,140 -llama3.1-8b,4,0,none,200,7474.9868681318685,933.191318681319,0.12104395604395606,9107.781703296703,1136.103076923077,5.851868131868132,0.0,58.59357142857142,2069.294395604396,16582.768571428573,182 -llama2-7b,8,8,none,50,16171.690805369126,2021.0243624161074,0.058255033557046976,8118.3806040268455,1013.7214765100671,3.4702013422818787,0.0,36.223489932885904,3034.7458389261747,24290.071409395972,149 -llama2-7b,8,32,realistic,100,9274.852769230769,1159.0241025641028,0.047999999999999994,8506.09482051282,1062.0163076923077,3.805641025641026,0.0,38.158923076923074,2221.0404102564103,17780.94758974359,195 -llama2-7b,0,2,realistic,200,48933.68220588235,6111.602683823529,1.0043014705882354,7063.556838235294,876.1715073529413,22.78389705882353,0.0,264.82283088235295,6987.774191176471,55997.23904411765,272 
-mistral-7b,64,2,none,150,80.69175572519084,10.073587786259543,0.010076335877862596,8530.990763358779,1064.3974809160306,3.585190839694657,0.0,31.62198473282443,1074.4710687022903,8611.68251908397,131 -llama3.1-70b,4,8,realistic,30,32359.81940740741,4043.805925925926,0.23170370370370372,8089.444740740741,1009.3177037037038,9.363185185185186,0.0,95.0785925925926,5053.12362962963,40449.26414814815,135 -llama3.1-70b,8,32,realistic,70,4969.400843373494,620.9753012048193,0.025542168674698797,8552.7921686747,1067.7738554216867,4.369518072289157,0.0,42.197891566265056,1688.749156626506,13522.193012048194,166 -llama2-7b,0,8,realistic,200,59732.33377862595,7460.76148854962,1.3767557251908398,8607.006526717558,1064.244885496183,6.904809160305343,0.0,223.70007633587787,8525.006374045803,68339.34030534352,262 -llama2-7b,64,4,realistic,50,93.2931304347826,11.655043478260868,0.009217391304347827,10333.532347826087,1290.8310434782609,4.066782608695653,0.0,42.906695652173916,1302.4860869565216,10426.82547826087,115 -mistral-7b,0,8,realistic,50,46079.77587412587,5746.27881118881,0.9722377622377625,5280.802307692307,654.2435664335663,29.211748251748258,0.0,232.91776223776216,6400.5223776223775,51360.57818181818,143 -llama3.1-70b,0,32,realistic,40,44845.681736526945,5601.041257485031,0.9406586826347305,5213.186107784431,646.7746107784432,19.429640718562876,0.0,216.42580838323354,6247.815868263473,50058.867844311375,167 -llama3.1-70b,8,16,realistic,60,8482.734615384616,1060.020448717949,0.040705128205128206,8901.008782051284,1111.3821153846154,4.242564102564102,0.0,40.71698717948718,2171.4025641025646,17383.743397435897,156 -llama3.1-70b,32,32,realistic,10,140.25333333333333,17.509166666666665,0.013928571428571427,9331.706071428573,1165.3616666666665,3.6047619047619044,0.0,35.07119047619047,1182.8708333333332,9471.959404761905,84 
-llama3.1-8b,0,8,realistic,100,61056.08419354839,7613.425419354839,1.395290322580645,6860.066645161291,848.2839999999999,9.863225806451613,0.0,184.86864516129032,8461.709419354838,67916.15083870967,155 -mistral-7b,0,8,none,50,49877.778387096776,6220.416903225807,1.17,5697.317419354838,705.5795483870969,24.93296774193548,0.0,247.1570322580645,6925.996451612903,55575.09580645161,155 -llama3.1-8b,64,8,none,200,75.95427536231884,9.478985507246376,0.010434782608695651,8905.065724637681,1111.0504347826086,3.7326811594202898,0.0,34.186884057971014,1120.5294202898551,8981.02,138 -llama2-7b,32,16,realistic,200,57.20234234234234,7.143738738738739,0.005405405405405407,8942.088693693693,1116.7563063063064,4.279234234234234,0.0,39.34144144144144,1123.900045045045,8999.291036036037,222 -llama2-7b,16,32,realistic,100,83.8359393939394,10.469939393939393,0.007575757575757576,9347.723454545456,1167.5422424242427,4.25539393939394,0.0,40.20854545454545,1178.012181818182,9431.559393939395,165 -llama3.1-8b,64,64,none,50,79.34676923076923,9.902384615384616,0.011076923076923076,8929.465461538462,1113.8893076923075,3.6505384615384617,0.0,34.49407692307692,1123.791692307692,9008.81223076923,130 -llama3.1-70b,16,0,none,60,80.35875,10.032,0.0073124999999999996,8984.77225,1121.829125,4.289875,0.0,41.2920625,1131.861125,9065.131,160 -llama3.1-70b,16,16,realistic,30,237.45485074626868,29.662238805970148,0.00917910447761194,9517.852164179105,1188.5397761194029,4.240820895522388,0.0,43.664402985074624,1218.202014925373,9755.307014925374,134 -llama3.1-70b,32,8,none,30,94.47809523809524,11.79468253968254,0.009285714285714284,9259.363888888889,1156.1057936507939,4.189523809523809,0.0,41.69166666666667,1167.9004761904762,9353.841984126982,126 -llama2-7b,16,4,none,100,88.30431250000001,11.031749999999999,0.006625000000000001,10004.44875,1249.883375,4.033875,0.0,39.936249999999994,1260.915125,10092.7530625,160 
-llama2-7b,0,4,realistic,200,54957.32529824562,6863.675543859648,1.5505263157894735,7868.444666666666,974.1742105263158,11.834315789473685,0.0,250.66761403508772,7837.849754385966,62825.76996491227,285 -mistral-7b,32,64,none,50,76.4645390070922,9.545886524822695,0.009361702127659575,8911.583687943263,1111.7468794326242,3.6631205673758864,0.0,34.83234042553192,1121.2927659574466,8988.048226950355,141 -llama3.1-8b,16,8,none,100,82.04242857142857,10.238785714285715,0.010285714285714285,8852.303357142857,1104.7237142857143,3.8492142857142855,0.0,35.17778571428571,1114.9625,8934.345785714288,140 -llama3.1-70b,0,64,realistic,70,53548.45634831461,6687.363651685393,1.2306741573033708,6709.605449438202,831.2360674157304,10.198876404494383,0.0,187.12342696629213,7518.599719101124,60258.06179775281,178 -llama3.1-70b,0,64,none,10,40043.01130434783,5002.685144927536,0.267463768115942,4229.766594202899,524.6597101449275,2.258550724637681,0.0,36.384420289855065,5527.344855072464,44272.777898550725,138 -llama3.1-70b,8,32,none,50,5010.364387096774,626.0954838709677,0.02561290322580645,8738.889161290323,1090.5314838709678,4.323870967741936,0.0,42.65767741935484,1716.6269677419355,13749.253548387096,155 -llama3.1-70b,8,4,realistic,30,11299.666940298506,1412.0392537313433,0.06014925373134328,8424.250149253732,1051.8456716417911,3.782985074626865,0.0,38.9529104477612,2463.884925373134,19723.91708955224,134 -llama2-7b,16,8,realistic,150,69.62525,8.6982,0.0053,9302.153100000001,1161.9826999999998,4.4389,0.0,42.12350000000001,1170.6809,9371.77835,200 -mistral-7b,0,0,none,150,74465.49651898733,9282.68265822785,2.0127215189873415,9183.498481012659,1135.6179113924052,5.390822784810126,0.0,234.974746835443,10418.300569620254,83648.99500000001,158 -llama3.1-8b,0,4,none,50,49751.695555555554,6203.456732026144,1.1553594771241829,5811.783790849673,719.9037908496732,25.087320261437906,0.0,251.12503267973855,6923.360522875816,55563.47934640523,153 
-llama3.1-8b,4,64,realistic,150,2894.4509937888197,361.51695652173913,0.023354037267080744,8807.944658385093,1098.8585714285714,3.741677018633541,0.0,35.04832298136646,1460.3755279503105,11702.395652173913,161 -llama3.1-70b,16,64,none,30,2399.8906666666667,299.90199999999993,0.013933333333333332,8814.8732,1100.6246,3.9525333333333332,0.0,38.562000000000005,1400.5266,11214.763866666668,150 -llama3.1-8b,64,32,realistic,50,94.64154545454545,11.811181818181819,0.01309090909090909,8970.310818181817,1118.9803636363636,3.683272727272727,0.0,34.813909090909085,1130.7915454545455,9064.952363636363,110 -llama2-7b,8,16,none,150,5010.441915887851,626.1458411214953,0.024719626168224304,8537.546588785046,1066.2292990654205,4.006588785046728,0.0,37.532476635514016,1692.375140186916,13547.988504672898,214 -llama2-7b,8,0,none,100,11428.992471264368,1428.1690804597702,0.04701149425287356,8369.940172413793,1045.0756896551727,3.843390804597701,0.0,40.85816091954023,2473.2447701149435,19798.932643678163,174 -llama3.1-70b,32,16,none,60,79.69608108108109,9.949256756756757,0.007905405405405404,8924.521689189189,1114.2168918918917,4.214932432432432,0.0,39.81310810810811,1124.1661486486485,9004.21777027027,148 -llama3.1-8b,32,32,none,50,82.74190839694657,10.326106870229008,0.01099236641221374,9095.570381679388,1134.6463358778624,3.7916793893129768,0.0,36.250916030534356,1144.9724427480917,9178.312290076337,131 -llama3.1-70b,4,32,realistic,30,35795.333841059604,4472.954569536425,0.3479470198675497,7318.669933774835,912.4366887417219,12.560927152317879,0.0,119.12529801324507,5385.391258278146,43114.00377483444,151 -llama3.1-70b,16,64,realistic,20,104.53330508474576,13.04991525423729,0.009915254237288135,10012.236694915257,1250.2313559322033,4.091864406779661,0.0,42.707033898305085,1263.2812711864408,10116.77,118 
-llama3.1-70b,16,64,none,40,1611.434183006536,201.3694117647059,0.014248366013071894,9270.775947712418,1157.6574509803922,3.937843137254902,0.0,39.961895424836605,1359.026862745098,10882.210130718953,153 -llama3.1-70b,32,32,realistic,30,89.16439393939395,11.131287878787878,0.008863636363636363,9375.072954545454,1170.5840151515151,3.8741666666666665,0.0,39.008484848484855,1181.715303030303,9464.237348484849,132 -llama2-7b,16,64,realistic,50,5806.747236842105,725.6827631578948,0.020986842105263158,8942.641447368422,1116.920065789474,3.4105921052631576,0.0,37.447763157894734,1842.6028289473684,14749.388684210528,152 -mistral-7b,16,16,realistic,100,86.24558823529412,10.766911764705881,0.009705882352941177,8873.11044117647,1107.3952205882354,3.9623529411764706,0.0,36.691838235294114,1118.1621323529412,8959.356029411763,136 -mistral-7b,32,64,realistic,50,88.05520325203253,10.992845528455284,0.010731707317073172,8956.455203252033,1117.260731707317,3.692032520325203,0.0,34.2430081300813,1128.2535772357724,9044.510406504065,123 -mistral-7b,64,32,realistic,150,76.11303703703705,9.502,0.009777777777777778,8560.322962962964,1068.0660740740739,3.56837037037037,0.0,31.919999999999998,1077.568074074074,8636.436,135 -llama3.1-8b,16,64,none,100,73.58111111111111,9.18281045751634,0.009411764705882352,8885.241176470588,1108.8303921568627,3.80202614379085,0.0,35.39699346405229,1118.013202614379,8958.8222875817,153 -llama3.1-70b,8,0,none,10,16658.58023255814,2081.7288372093026,0.0668217054263566,8889.338449612402,1109.921472868217,4.17875968992248,0.0,44.145581395348835,3191.6503100775194,25547.91868217054,129 -llama3.1-70b,16,0,realistic,40,597.3254794520549,74.63438356164383,0.010205479452054795,9300.027671232878,1161.164589041096,3.9678767123287666,0.0,39.37,1235.7989726027397,9897.353150684932,146 
-llama3.1-70b,16,64,none,50,663.7631012658228,82.93031645569621,0.011075949367088608,8746.851898734178,1091.872594936709,3.867088607594937,0.0,38.76044303797469,1174.8029113924051,9410.615,158 -llama3.1-70b,32,16,realistic,60,79.46671140939598,9.920604026845638,0.00785234899328859,9138.917248322146,1141.0660402684564,4.1151677852349,0.0,39.90906040268456,1150.986644295302,9218.383959731544,149 -llama3.1-70b,0,16,none,70,56176.35124293785,7015.121581920904,1.28,7010.63836158192,868.7807344632769,9.396610169491524,0.0,185.3535028248588,7883.90231638418,63186.989604519775,177 -llama2-7b,8,2,realistic,100,13517.34875,1689.2707291666668,0.060416666666666674,7899.159635416666,986.4459895833334,3.5077604166666667,0.0,33.52098958333334,2675.71671875,21416.508385416666,192 -llama2-7b,8,64,none,100,4859.083722222223,607.1494444444445,0.02561111111111112,8191.3285,1022.7678333333333,3.8097222222222222,0.0,35.67905555555555,1629.9172777777776,13050.412222222223,180 -llama3.1-70b,32,2,none,70,87.36826086956522,10.907028985507248,0.008478260869565216,9208.377101449276,1149.7639855072464,4.331086956521739,0.0,41.43659420289855,1160.6710144927536,9295.74536231884,138 -llama3.1-8b,32,8,none,150,77.19204225352112,9.633450704225352,0.010140845070422535,8873.263309859156,1107.0609859154927,3.70781690140845,0.0,33.996901408450704,1116.6944366197183,8950.455352112676,142 -llama2-7b,64,32,realistic,50,74.72308823529411,9.335073529411764,0.007794117647058824,8820.557647058822,1101.6094852941176,3.725735294117647,0.0,36.69573529411765,1110.9445588235294,8895.280735294116,136 -llama3.1-70b,0,64,none,60,53820.3028,6721.518742857142,1.2934285714285714,6710.6612,832.1594857142858,12.822971428571428,0.0,212.96988571428568,7553.678228571428,60530.96399999999,175 -mistral-7b,0,8,none,150,72410.77153374233,9026.7745398773,1.9086503067484664,9059.130245398774,1118.480736196319,7.197177914110429,0.0,248.1523926380368,10145.25527607362,81469.9017791411,163 
-mistral-7b,64,8,realistic,200,75.59557971014493,9.437391304347825,0.009565217391304347,8828.086014492754,1101.4713768115942,3.758043478260869,0.0,34.4081884057971,1110.908768115942,8903.681594202899,138 -llama3.1-70b,4,8,realistic,50,30817.775460992907,3850.8673049645395,0.1722695035460993,7892.373262411346,984.935744680851,7.124397163120568,0.0,69.35340425531916,4835.80304964539,38710.14872340425,141 -llama3.1-8b,64,8,none,100,83.97672,10.48016,0.011519999999999999,8815.725440000002,1099.9964799999998,3.7728800000000007,0.0,34.51016,1110.4766399999996,8899.70216,125 -llama2-7b,16,4,realistic,50,4937.85890625,617.12234375,0.02875,10026.81859375,1252.5778906250002,3.590625,0.0,40.717578125,1869.700234375,14964.6775,128 -llama3.1-8b,0,32,none,150,75155.6645,9369.4375,2.0246874999999998,9216.9969375,1137.6740624999998,5.81425,0.0,242.74875000000003,10507.1115625,84372.6614375,160 -llama3.1-70b,8,4,none,50,6227.456884057971,778.1801449275363,0.03949275362318841,8817.956449275362,1100.9321739130435,4.090869565217391,0.0,40.81644927536232,1879.1123188405795,15045.413333333334,138 -llama3.1-70b,16,8,none,10,113.43153153153153,14.16081081081081,0.01054054054054054,10226.458378378378,1277.3153153153153,4.337117117117118,0.0,47.0781981981982,1291.4761261261262,10339.88990990991,111 -llama2-7b,0,32,realistic,200,56856.70741935484,7100.222831541219,1.4184946236559142,7614.871899641576,940.7936200716846,4.764659498207885,0.0,202.92655913978496,8041.0164516129025,64471.57931899641,279 -llama3.1-8b,8,4,none,100,86.68685714285714,10.818428571428571,0.010285714285714285,8817.8785,1100.3997142857145,4.126785714285714,0.0,37.107000000000006,1111.218142857143,8904.565357142856,140 -llama2-7b,0,16,realistic,150,58043.967124999996,7250.865791666667,1.3525416666666665,7700.508916666668,951.144125,5.492083333333332,0.0,208.8135416666667,8202.009916666668,65744.47604166667,240 
-llama2-7b,64,64,realistic,100,62.69150943396227,7.832012578616352,0.006666666666666667,9285.54817610063,1159.403647798742,4.175345911949685,0.0,41.17672955974843,1167.2356603773587,9348.23968553459,159
-mistral-7b,64,64,none,100,73.08381294964029,9.123812949640287,0.009496402877697842,8727.790287769785,1088.9927338129496,3.6755395683453234,0.0,33.50892086330935,1098.11654676259,8800.874100719424,139
-llama3.1-8b,4,8,realistic,50,1118.429140625,139.69125,0.023125,8767.515625,1093.8185156249997,4.040546875,0.0,37.1709375,1233.509765625,9885.944765625,128
-llama3.1-8b,32,4,none,200,72.03241830065359,8.98954248366013,0.009411764705882352,8614.695620915032,1074.7993464052288,3.6786928104575165,0.0,32.45470588235294,1083.788888888889,8686.728039215686,153
-llama3.1-8b,64,0,none,50,80.44218045112781,10.039097744360902,0.010827067669172932,8856.456917293233,1104.6671428571426,3.6983458646616545,0.0,34.56127819548872,1114.7062406015036,8936.899097744361,133
-llama3.1-8b,64,0,none,200,63.526130952380946,7.927976190476191,0.008571428571428572,7996.597916666667,997.287619047619,2.9857738095238093,0.0,26.95345238095238,1005.2155952380953,8060.124047619047,168
-llama3.1-70b,8,64,none,40,10982.727662337662,1372.3649350649353,0.04305194805194806,8583.410974025974,1071.516103896104,3.6690259740259736,0.0,37.856103896103896,2443.881038961039,19566.138636363637,154
-llama3.1-70b,32,4,none,40,94.38976377952756,11.783622047244094,0.00921259842519685,9691.836929133859,1210.1908661417324,4.02992125984252,0.0,40.71960629921259,1221.9744881889765,9786.226692913386,127
-llama3.1-70b,16,16,none,40,91.71014705882352,11.449117647058824,0.008602941176470588,9537.22044117647,1190.914338235294,4.167426470588235,0.0,42.521617647058825,1202.3634558823528,9628.930588235295,136
-llama3.1-70b,0,8,realistic,60,50819.0837704918,6346.726994535518,1.1518032786885244,6065.254153005464,751.6861202185793,17.412459016393445,0.0,240.2097267759563,7098.4131147540975,56884.33792349727,183
-llama3.1-70b,0,2,none,50,47049.53269005848,5875.604853801169,0.9731578947368419,6279.555614035088,778.9134502923978,25.55356725146199,0.0,263.3043859649123,6654.518304093566,53329.088304093566,171
-llama3.1-8b,8,32,none,50,84.9835,10.605857142857142,0.010285714285714285,8852.994714285715,1104.5241428571426,3.885142857142857,0.0,36.13914285714285,1115.1299999999999,8937.978214285715,140
-llama3.1-8b,64,8,none,150,78.28097014925373,9.769402985074626,0.010746268656716417,9040.40567164179,1127.9037313432834,3.741716417910448,0.0,34.704626865671635,1137.6731343283584,9118.686641791044,134
-mistral-7b,16,2,realistic,200,81.25060810810811,10.143378378378378,0.00891891891891892,8793.337770270271,1097.1533108108108,3.6616891891891896,0.0,32.8577027027027,1107.2966891891895,8874.58837837838,148
-llama3.1-70b,8,4,none,40,7344.5142187500005,917.79375,0.04625,9304.82015625,1161.7481249999998,4.16890625,0.0,41.987421874999995,2079.541875,16649.334375,128
-mistral-7b,16,0,realistic,50,92.36416666666666,11.530757575757576,0.01,9067.147575757575,1131.1888636363635,3.7673484848484846,0.0,35.54037878787879,1142.7196212121212,9159.511742424243,132
-llama3.1-8b,16,16,none,200,73.02820512820513,9.113846153846154,0.00923076923076923,8916.982628205129,1112.503205128205,3.7146153846153847,0.0,34.15397435897436,1121.6170512820513,8990.010833333334,156
-llama2-7b,8,32,none,150,5446.700714285714,680.6109821428572,0.038214285714285715,8441.648660714285,1054.2922767857142,3.9374107142857144,0.0,38.05464285714286,1734.9032589285719,13888.349375,224
-llama3.1-70b,8,8,realistic,50,3004.4607299270074,375.41124087591237,0.027007299270072987,8973.96087591241,1120.5800729927007,3.8432846715328464,0.0,39.008905109489056,1495.9913138686131,11978.421605839416,137
-llama2-7b,16,32,none,50,2794.189315068493,349.18212328767123,0.018013698630136986,9065.094657534248,1132.1426712328769,4.0,0.0,40.335821917808225,1481.324794520548,11859.28397260274,146
-mistral-7b,8,32,realistic,200,75.56908536585365,9.434085365853658,0.008048780487804878,8711.398841463415,1087.0282926829268,3.745426829268293,0.0,33.857499999999995,1096.4623780487807,8786.967926829268,164
-llama3.1-70b,4,2,realistic,40,20933.10546099291,2615.841134751773,0.08496453900709221,7407.294822695036,924.5970921985814,3.65531914893617,0.0,35.69304964539007,3540.438226950355,28340.400283687944,141
-llama3.1-70b,4,2,none,70,15933.193092105264,1990.8401973684208,0.07546052631578949,7998.085328947368,998.5682236842105,4.382894736842106,0.0,39.220789473684206,2989.4084210526316,23931.278421052633,152
-llama3.1-70b,8,2,none,60,3572.305035460993,446.41432624113475,0.031631205673758864,9040.106737588652,1128.6487943262414,3.8572340425531917,0.0,37.523900709219866,1575.063120567376,12612.411773049645,141
-llama3.1-70b,32,8,realistic,50,87.77132352941176,10.95735294117647,0.008602941176470588,9217.507647058825,1150.8969117647057,3.8346323529411763,0.0,39.111176470588234,1161.8542647058823,9305.278970588235,136
-llama3.1-70b,32,64,none,70,70.74786585365854,8.832134146341463,0.007134146341463414,9010.846524390245,1125.0646341463414,4.463231707317074,0.0,43.120914634146345,1133.896768292683,9081.594390243903,164
-llama2-7b,8,0,none,50,14940.659714285715,1867.0539428571435,0.08091428571428572,7979.972742857143,996.4810857142858,4.534114285714286,0.0,47.7029142857143,2863.5350285714294,22920.632457142856,175
-llama2-7b,32,8,realistic,200,59.91216216216216,7.482387387387387,0.005675675675675676,9728.85981981982,1215.129054054054,4.326171171171171,0.0,42.690090090090095,1222.6114414414415,9788.771981981981,222
-llama2-7b,64,16,realistic,150,53.38740932642487,6.6696373056994815,0.005492227979274612,8933.897409326424,1115.896787564767,4.284663212435233,0.0,38.97647668393782,1122.5664248704663,8987.28481865285,193
-mistral-7b,64,4,none,50,91.34704347826087,11.403826086956522,0.011478260869565217,8754.142956521739,1092.0889565217392,4.015565217391304,0.0,36.299913043478256,1103.4927826086957,8845.49,115
-mistral-7b,64,4,none,200,74.81935714285714,9.3405,0.009428571428571429,8585.220000000001,1071.1720714285716,3.7249285714285714,0.0,32.90135714285714,1080.5125714285714,8660.039357142858,140
-llama3.1-8b,4,8,none,200,211.53233766233765,26.40077922077922,0.01422077922077922,9095.90896103896,1134.920194805195,3.7280519480519487,0.0,34.756493506493506,1161.320974025974,9307.441298701298,154
-llama3.1-70b,8,2,none,50,2771.805693430657,346.3702189781022,0.02401459854014599,8830.992773722628,1102.5536496350364,3.945912408759124,0.0,37.59350364963503,1448.9238686131387,11602.798467153285,137
-llama3.1-70b,32,8,realistic,20,110.70037037037036,13.819814814814814,0.010833333333333332,9617.690833333334,1200.8409259259258,3.859351851851852,0.0,40.252407407407404,1214.660740740741,9728.391203703704,108
-llama3.1-70b,32,32,realistic,50,76.84006535947712,9.592679738562092,0.007647058823529411,9094.450980392157,1135.5046405228757,4.071960784313726,0.0,40.74346405228758,1145.097320261438,9171.291045751634,153
-mistral-7b,8,2,realistic,150,90.23755244755245,11.265314685314685,0.009230769230769232,8859.947622377622,1105.5116083916082,3.7040559440559435,0.0,34.22132867132867,1116.776923076923,8950.185174825174,143
-llama3.1-70b,64,0,realistic,20,115.89569999999999,14.468399999999999,0.011699999999999999,9139.5637,1140.876,4.0988,0.0,40.177,1155.3444,9255.4594,100
-llama3.1-8b,16,64,realistic,200,1810.4503086419754,226.12728395061734,0.02388888888888889,9105.339382716049,1136.0035185185186,3.6922839506172838,0.0,35.38111111111111,1362.130802469136,10915.789691358024,162
-llama3.1-8b,4,2,none,150,92.3003448275862,11.511172413793103,0.012,8875.396551724138,1107.467724137931,3.680344827586207,0.0,33.234758620689654,1118.9788965517241,8967.696896551724,145
-mistral-7b,8,8,none,150,85.08533783783783,10.622094594594595,0.00891891891891892,8914.461621621622,1112.421081081081,3.5922972972972973,0.0,33.120000000000005,1123.0431756756757,8999.54695945946,148
-llama2-7b,32,16,none,100,65.05335195530726,8.12703910614525,0.005921787709497207,9096.600279329608,1136.2943016759777,3.9923463687150837,0.0,37.880391061452514,1144.421340782123,9161.653631284917,179
-mistral-7b,16,4,realistic,100,91.87746153846153,11.469999999999999,0.010153846153846154,8867.668153846153,1106.6419230769231,4.066307692307691,0.0,36.96330769230769,1118.1119230769234,8959.545615384615,130
-llama3.1-70b,0,8,realistic,40,45229.92449704142,5649.208461538461,0.9746153846153844,5263.910295857988,652.824674556213,22.906745562130176,0.0,231.33408284023668,6302.033136094675,50493.83479289941,169
-mistral-7b,4,8,realistic,150,290.38041666666663,36.24798611111111,0.014861111111111111,8733.903263888888,1089.7985416666666,3.5900694444444445,0.0,32.46666666666667,1126.0465277777776,9024.283680555556,144
-llama3.1-70b,0,0,realistic,30,43012.68417721519,5372.1161392405065,0.7087341772151898,4965.727974683545,616.0249367088608,20.23025316455696,0.0,188.93531645569618,5988.141075949367,47978.41215189874,158
-llama3.1-8b,16,64,realistic,100,75.81322147651007,9.461409395973154,0.009664429530201342,8843.56154362416,1103.559798657718,3.9099999999999993,0.0,35.851543624161074,1113.021208053691,8919.37476510067,149
-llama3.1-70b,8,2,realistic,10,5310.956571428571,663.6480952380952,0.035333333333333335,8060.678095238095,1006.3677142857141,3.149142857142857,0.0,30.445619047619047,1670.0158095238096,13371.634666666669,105
-llama3.1-70b,4,64,none,10,32721.167936507936,4088.82,0.17444444444444449,6977.326031746033,869.8845238095238,3.6221428571428573,0.0,50.030317460317455,4958.704523809524,39698.49396825397,126
-llama3.1-70b,8,8,none,70,1929.1498013245032,241.06284768211924,0.01695364238410596,9089.154238410596,1134.942582781457,4.379536423841059,0.0,42.159006622516564,1376.0054304635762,11018.3040397351,151
-llama3.1-70b,16,2,realistic,30,112.50938596491228,14.04561403508772,0.010263157894736842,9270.56947368421,1157.4747368421051,4.254298245614034,0.0,42.25245614035087,1171.5203508771926,9383.078859649122,114
-llama3.1-70b,32,4,realistic,20,109.49154545454545,13.668909090909091,0.010636363636363637,9383.021272727272,1171.5701818181815,3.932000000000001,0.0,39.778,1185.2390909090905,9492.512818181818,110
-llama2-7b,64,32,none,100,68.50006172839507,8.550432098765432,0.008333333333333333,8779.171481481482,1096.3598148148149,4.07358024691358,0.0,37.828950617283944,1104.9102469135803,8847.671543209877,162
-mistral-7b,0,4,none,200,72203.53655555556,9000.829555555558,2.101277777777778,9188.540444444445,1134.0529444444442,7.7394444444444455,0.0,271.4197222222222,10134.8825,81392.07699999999,180
-mistral-7b,4,32,realistic,100,2258.7403401360543,282.1167346938775,0.018435374149659865,8683.668979591836,1083.7542176870747,3.9842176870748305,0.0,37.14326530612245,1365.870952380952,10942.40931972789,147
-mistral-7b,64,2,none,50,97.15788990825688,12.129266055045871,0.012110091743119267,8821.102110091742,1100.5976146788992,3.810183486238533,0.0,34.94155963302753,1112.726880733945,8918.26,109
-llama3.1-8b,0,32,none,50,50245.12986666667,6264.3872,1.2104666666666668,6037.640466666667,747.5822666666667,26.02873333333333,0.0,261.61653333333334,7011.969466666667,56282.77033333334,150
-llama3.1-8b,64,32,realistic,200,72.6381118881119,9.065174825174825,0.01006993006993007,8636.80888111888,1077.5162937062937,3.590839160839161,0.0,32.49594405594406,1086.5814685314688,8709.446993006992,143
-llama3.1-70b,0,4,none,30,44742.71556291391,5588.241390728478,0.8082119205298013,5503.3074172185425,682.9055629139073,22.784569536423838,0.0,220.6368874172186,6271.146953642385,50246.02298013245,151
-llama3.1-70b,8,4,none,10,11418.375396825397,1426.9044444444446,0.051746031746031755,8611.971984126983,1075.5396825396826,4.028412698412699,0.0,38.66936507936508,2502.4441269841273,20030.347380952382,126
-llama3.1-70b,8,8,none,50,6290.688633093525,786.0763309352517,0.050863309352517996,9211.44,1150.1855395683451,4.0743165467625895,0.0,41.42086330935252,1936.2618705035973,15502.128633093524,139
-llama3.1-70b,32,2,realistic,30,107.3183185840708,13.397610619469027,0.010353982300884955,9571.44407079646,1195.0507079646015,4.15212389380531,0.0,42.33973451327433,1208.4483185840706,9678.76238938053,113
-llama3.1-70b,32,64,none,10,107.80268518518518,13.458055555555555,0.010833333333333332,10085.160925925926,1259.483611111111,4.402777777777778,0.0,45.27814814814815,1272.9416666666666,10192.963611111112,108
-llama3.1-8b,4,16,realistic,150,762.2510204081632,95.1795238095238,0.017346938775510204,8822.798775510204,1100.9670068027212,4.0095238095238095,0.0,36.43605442176871,1196.1465306122452,9585.049795918367,147
-mistral-7b,0,0,none,50,49592.700980392154,6184.7422875817,1.1831372549019608,5428.178366013072,672.7273856209151,24.000849673202612,0.0,231.00143790849674,6857.469673202615,55020.87934640522,153
-llama3.1-8b,8,4,realistic,50,97.49848,12.16768,0.011519999999999999,8929.790560000001,1113.93832,3.8136000000000005,0.0,35.261120000000005,1126.106,9027.28904,125
-llama3.1-70b,4,16,realistic,60,27021.189529411764,3376.5331764705884,0.2096470588235294,7635.05905882353,952.9510588235294,8.020470588235295,0.0,77.93823529411765,4329.484235294118,34656.24858823529,170
-mistral-7b,4,0,none,50,5471.266866666667,683.3477999999999,0.043933333333333324,8785.608666666667,1096.1644000000001,4.200466666666666,0.0,38.59759999999999,1779.5122000000006,14256.875533333334,150
-llama2-7b,32,2,realistic,200,60.192077294685994,7.515700483091788,0.00995169082125604,9726.026376811595,1214.5585024154589,4.389806763285025,0.0,43.23526570048308,1222.0742028985508,9786.21845410628,207
-llama3.1-8b,64,8,realistic,200,77.87118518518518,9.718222222222222,0.010666666666666666,8854.153703703703,1104.6205925925926,3.902222222222222,0.0,35.694740740740734,1114.3388148148147,8932.024888888887,135
-llama3.1-8b,0,64,realistic,200,72014.9015882353,8979.225058823527,2.0658235294117646,8501.958117647058,1050.4067647058823,4.932352941176471,0.0,237.65594117647055,10029.631823529413,80516.85970588235,170
-mistral-7b,32,4,realistic,150,82.87192592592592,10.345777777777778,0.009777777777777778,8985.245555555555,1121.1594814814816,3.726666666666667,0.0,34.50407407407408,1131.5052592592594,9068.11748148148,135
-llama2-7b,4,32,none,200,8031.2933620689655,1003.5768103448277,0.039655172413793106,7970.211681034482,995.0944396551724,3.8801293103448278,0.0,36.004181034482755,1998.67125,16001.505043103449,232
-mistral-7b,16,16,none,50,88.53810606060607,11.05310606060606,0.01,8774.749772727273,1094.8231818181816,3.870075757575757,0.0,35.86984848484848,1105.8762878787877,8863.287878787878,132
-llama3.1-70b,0,8,realistic,30,43708.34374193549,5459.059225806451,0.691483870967742,5333.798193548387,661.6416774193549,17.91535483870968,0.0,191.75503225806452,6120.7009032258065,49042.14193548387,155
-llama3.1-70b,8,8,none,20,10669.371145038169,1333.286259541985,0.0533587786259542,8757.338320610688,1093.4881679389314,3.956259541984733,0.0,39.226183206106874,2426.774427480916,19426.709465648855,131
-llama3.1-70b,16,0,none,20,369.6068992248062,46.17527131782946,0.010310077519379844,9774.803798449611,1220.4744961240312,4.0993023255813945,0.0,42.24573643410852,1266.6497674418606,10144.410697674419,129
-llama3.1-70b,16,4,none,30,100.60857142857144,12.559920634920635,0.009285714285714284,9384.424285714285,1171.8157936507935,4.286111111111111,0.0,42.47666666666666,1184.3757142857141,9485.032857142858,126
-llama3.1-70b,16,32,none,40,83.59472972972974,10.435945945945946,0.007905405405405404,8775.333581081082,1095.7483783783784,4.398513513513514,0.0,41.719662162162166,1106.1843243243243,8858.928310810812,148
-llama2-7b,0,0,realistic,150,55962.939957264956,6990.753504273505,1.3769658119658121,7618.866324786325,938.1992735042736,1.9146153846153848,0.0,145.3142307692308,7928.9527777777785,63581.80628205128,234
-llama2-7b,0,4,realistic,50,46224.84691489361,5774.162606382979,1.1074468085106381,6875.975904255319,854.7219148936172,21.13590425531915,0.0,276.79281914893613,6628.884521276596,53100.82281914894,188
-llama2-7b,0,0,realistic,50,46532.923241758246,5812.817087912088,1.0919230769230768,6895.14587912088,854.5676373626374,18.313186813186814,0.0,246.08104395604394,6667.3847252747255,53428.06912087912,182
-mistral-7b,16,16,realistic,200,75.0124358974359,9.364615384615385,0.008461538461538461,8401.417243589744,1048.3582692307693,3.6884615384615382,0.0,32.183461538461536,1057.7228846153846,8476.429679487179,156
-llama3.1-70b,4,16,realistic,20,32027.18954887218,4002.078646616541,0.25887218045112786,6663.238947368422,830.7245112781955,8.106466165413533,0.0,81.81443609022556,4832.8031578947375,38690.4284962406,133
-llama3.1-70b,8,4,realistic,10,10979.89041322314,1372.0680165289255,0.047520661157024795,7891.356776859504,985.4284297520662,2.635702479338843,0.0,31.0001652892562,2357.4964462809917,18871.247190082646,121
-llama3.1-70b,32,2,realistic,70,90.3034328358209,11.273507462686569,0.00873134328358209,9044.747462686566,1129.3277611940298,4.2791044776119405,0.0,40.68365671641791,1140.6012686567165,9135.050895522387,134
-llama3.1-70b,32,0,realistic,10,147.47759036144578,18.411084337349397,0.014096385542168674,9307.502530120482,1162.0956626506022,3.518795180722892,0.0,34.57674698795181,1180.5067469879518,9454.980120481927,83
-llama3.1-8b,16,8,realistic,100,87.96618320610686,10.978091603053436,0.01099236641221374,8889.93396946565,1109.29213740458,3.8976335877862596,0.0,35.44229007633588,1120.2702290076336,8977.900152671755,131
-llama2-7b,0,2,realistic,50,43463.45240223464,5429.075418994414,0.8358659217877096,6484.982905027932,803.4159217877095,26.570558659217877,0.0,276.6991061452514,6232.491340782122,49948.43530726257,179
-mistral-7b,32,16,none,100,80.6020588235294,10.062426470588235,0.009705882352941177,8966.262205882354,1118.9025735294117,3.8436029411764707,0.0,35.63154411764705,1128.965,9046.864264705882,136
-mistral-7b,64,2,none,200,77.64433823529411,9.693161764705883,0.009705882352941177,8310.542058823528,1036.6069117647057,3.5601470588235298,0.0,30.970661764705884,1046.3000735294117,8388.186397058824,136
-llama3.1-8b,32,64,realistic,150,71.4398013245033,8.915629139072848,0.009536423841059603,8779.917748344371,1095.29821192053,3.484503311258279,0.0,32.22271523178809,1104.2138410596026,8851.357549668874,151
-mistral-7b,64,8,none,150,78.79992424242424,9.837424242424243,0.01,8962.698484848484,1118.210303030303,3.4947727272727276,0.0,32.16719696969697,1128.0477272727271,9041.49840909091,132
-llama3.1-8b,8,4,none,50,93.42730769230769,11.659615384615385,0.011076923076923076,8996.143384615385,1122.3124615384618,3.804384615384616,0.0,35.023153846153846,1133.972076923077,9089.570692307692,130
-llama3.1-70b,4,4,realistic,20,10383.27806451613,1297.395241935484,0.052499999999999984,8378.990403225807,1045.8923387096775,3.663951612903225,0.0,38.50524193548388,2343.287580645161,18762.268467741935,124
-llama3.1-8b,8,0,realistic,50,94.17939849624061,11.749022556390978,0.011804511278195488,8922.296015037595,1113.1183458646615,3.975112781954887,0.0,36.91037593984962,1124.8673684210526,9016.475413533835,133
-llama3.1-70b,4,2,none,20,20854.238702290077,2605.9712977099234,0.0968702290076336,7948.761526717557,991.6709923664124,4.630229007633588,0.0,44.660992366412216,3597.642290076336,28803.000229007637,131
-llama3.1-8b,4,64,realistic,100,4400.67198757764,549.6365838509317,0.03515527950310559,8352.07248447205,1042.2552795031054,4.045093167701864,0.0,35.78366459627329,1591.891863354037,12752.744472049688,161
-llama3.1-8b,8,8,realistic,100,88.28839416058395,11.018248175182482,0.010510948905109488,8952.567372262774,1117.1069343065694,3.9416788321167884,0.0,36.143503649635036,1128.125182481752,9040.855766423358,137
-llama3.1-8b,8,16,realistic,150,92.71173611111111,11.567916666666669,0.010972222222222223,9079.326180555556,1132.9028472222224,3.6679166666666663,0.0,34.060763888888886,1144.4707638888888,9172.037916666666,144
-llama2-7b,16,4,none,150,88.95730769230771,11.10889423076923,0.0071634615384615396,9158.015913461539,1144.0711538461537,4.383413461538462,0.0,41.31932692307692,1155.1800480769232,9246.973221153847,208
-llama2-7b,4,16,realistic,100,19819.77384180791,2476.6394915254236,0.11807909604519773,8437.77988700565,1053.281186440678,6.559830508474577,0.0,61.58887005649717,3529.9206779661013,28257.553728813557,177
-mistral-7b,16,2,none,100,86.93985507246377,10.853623188405797,0.009565217391304347,8656.632246376812,1080.296231884058,3.9618840579710146,0.0,35.28086956521739,1091.1498550724637,8743.572101449276,138
-mistral-7b,32,16,none,200,283.22238410596026,35.37476821192053,0.012847682119205298,8995.54761589404,1122.2962913907286,3.5568211920529804,0.0,33.229933774834436,1157.6710596026492,9278.77,151
-llama3.1-8b,8,16,none,100,84.90574468085106,10.596099290780142,0.010212765957446808,8740.689361702127,1090.739574468085,3.874397163120567,0.0,35.11971631205673,1101.3356737588654,8825.59510638298,141
-llama3.1-70b,0,0,realistic,20,41402.34084507042,5170.813591549296,0.3982394366197183,4690.037535211268,582.3206338028169,11.637042253521127,0.0,107.60232394366197,5753.134225352113,46092.37838028169,142
-llama3.1-70b,0,4,none,60,52030.9156185567,6497.9261340206185,1.1628350515463917,6471.058402061856,801.8412371134021,16.553556701030928,0.0,250.97355670103093,7299.76737113402,58501.97402061856,194
-llama3.1-70b,16,64,none,60,77.08761006289308,9.623584905660378,0.007358490566037735,9105.501132075473,1136.8672327044026,4.030880503144654,0.0,39.47616352201258,1146.4908176100628,9182.588742138363,159
-llama3.1-70b,16,2,realistic,20,113.56513274336284,14.177433628318584,0.010353982300884955,9712.66610619469,1212.7156637168139,4.313008849557522,0.0,43.2061946902655,1226.8930973451327,9826.231238938051,113
-llama3.1-8b,8,2,none,150,85.37608391608391,10.654825174825175,0.01006993006993007,8764.665524475524,1093.5358741258742,3.7373426573426576,0.0,33.33062937062937,1104.190699300699,8850.04160839161,143
-llama3.1-8b,8,32,none,150,76.1402564102564,9.502179487179486,0.00923076923076923,8900.095064102565,1110.5510256410255,3.8549358974358974,0.0,35.3099358974359,1120.053205128205,8976.23532051282,156
-mistral-7b,16,2,realistic,100,91.31128787878788,11.399318181818183,0.01,8657.978939393939,1080.5330303030303,3.9363636363636365,0.0,35.425000000000004,1091.9323484848485,8749.290227272728,132
-mistral-7b,64,4,none,150,78.83646616541354,9.841954887218046,0.009924812030075189,8661.687894736842,1080.6512030075187,3.6293233082706764,0.0,32.51263157894737,1090.493157894737,8740.524360902256,133
-llama2-7b,8,2,realistic,200,4572.678049792532,571.4124066390042,0.03348547717842324,8295.424439834025,1036.000746887967,3.8410373443983405,0.0,34.611327800829876,1607.413153526971,12868.102489626557,241
-llama2-7b,64,2,none,200,48.46617117117117,6.05481981981982,0.0047747747747747754,9198.697117117117,1148.902882882883,4.224459459459459,0.0,39.76873873873875,1154.9577027027026,9247.163288288288,222
-llama3.1-8b,32,64,none,200,1576.726,196.9298125,0.0226875,8916.0988125,1112.33275,3.727375,0.0,35.407062499999995,1309.2625625,10492.8248125,160
-llama3.1-70b,32,0,realistic,70,76.71635220125786,9.577232704402515,0.007358490566037735,8806.987987421382,1099.6015094339623,4.35125786163522,0.0,41.26930817610063,1109.1787421383651,8883.704339622642,159
-mistral-7b,0,2,realistic,50,44526.965103448274,5552.123931034483,0.8917931034482759,5163.12,639.6947586206896,30.02006896551724,0.0,226.85379310344828,6191.818689655172,49690.08510344828,145
-llama2-7b,8,0,realistic,100,11447.708238636364,1430.5303977272727,0.05034090909090909,8605.907386363637,1074.566534090909,4.184545454545455,0.0,44.43960227272728,2505.096931818182,20053.615625000002,176
-llama2-7b,32,32,none,50,82.8527536231884,10.35072463768116,0.007681159420289856,9321.59188405797,1164.333623188406,4.055797101449276,0.0,40.19789855072463,1174.6843478260869,9404.44463768116,138
-mistral-7b,16,0,none,150,73.78512195121951,9.21140243902439,0.008048780487804878,8959.711158536586,1118.0801219512196,3.6865853658536585,0.0,34.96634146341463,1127.291524390244,9033.496280487805,164
-mistral-7b,16,32,none,100,82.03581560283688,10.241418439716313,0.009361702127659575,8944.359148936172,1116.1429787234042,3.8996453900709223,0.0,36.37581560283688,1126.3843971631206,9026.394964539008,141
-llama3.1-8b,16,64,none,150,73.04071428571429,9.11538961038961,0.00935064935064935,9030.863506493506,1126.7826623376623,3.520454545454546,0.0,32.96032467532467,1135.898051948052,9103.90422077922,154
-llama3.1-8b,32,16,realistic,50,89.70155737704918,11.194672131147541,0.01180327868852459,8362.539672131148,1042.9680327868853,3.6341803278688527,0.0,31.596147540983612,1054.162704918033,8452.241229508196,122
-llama3.1-70b,0,4,realistic,60,50451.31835978836,6300.8613227513215,1.152063492063492,6618.686455026455,820.4587830687831,21.261375661375663,0.0,281.9142328042328,7121.320105820106,57070.00481481481,189
-llama3.1-70b,0,32,none,10,40663.836969696975,5080.486060606061,0.2672727272727273,4541.483106060607,564.2996969696969,2.828181818181818,0.0,45.229469696969694,5644.785757575758,45205.320075757576,132
-llama3.1-70b,4,64,realistic,70,18580.54762886598,2321.5378865979383,0.2302577319587629,7994.491185567011,997.6018041237113,7.315412371134021,0.0,72.01154639175257,3319.1396907216485,26575.03881443299,194
-llama3.1-70b,32,64,none,20,88.1593181818182,11.005833333333333,0.008863636363636363,9199.816969696969,1148.5891666666669,4.086363636363636,0.0,40.444848484848485,1159.595,9287.976287878788,132
-mistral-7b,0,8,realistic,150,67649.6650625,8433.2723125,1.7190625000000002,8346.948999999999,1030.9765,7.644749999999999,0.0,218.9586875,9464.2488125,75996.6140625,160
-mistral-7b,16,8,none,150,82.30258741258741,10.274685314685314,0.009230769230769232,8973.95097902098,1119.865944055944,3.8229370629370636,0.0,35.190489510489506,1130.1406293706293,9056.253566433566,143
-llama3.1-8b,32,32,realistic,150,74.88124137931035,9.345103448275863,0.00993103448275862,8845.626344827586,1103.6141379310343,3.5944137931034477,0.0,33.05489655172414,1112.95924137931,8920.507586206897,145
-llama3.1-70b,16,0,none,10,1787.5245217391305,223.34991304347827,0.01947826086956522,9751.338260869565,1217.7925217391303,4.150347826086957,0.0,43.396260869565225,1441.1424347826085,11538.862782608696,115
-llama3.1-70b,0,64,realistic,10,38480.28488888889,4807.689407407408,0.19570370370370366,3648.7294814814813,453.3334074074074,1.4305185185185185,0.0,24.462666666666667,5261.022814814815,42129.014370370365,135
-llama2-7b,4,32,none,50,34488.85860335196,4309.9408938547485,0.5400558659217878,8279.704860335196,1031.6577653631284,10.066368715083799,0.0,104.93798882681563,5341.598659217877,42768.56346368715,179
-mistral-7b,64,0,none,150,69.22285714285714,8.64181818181818,0.008571428571428572,8570.03331168831,1069.133831168831,3.395649350649351,0.0,30.65727272727273,1077.7756493506495,8639.256168831169,154
-llama3.1-8b,8,16,none,200,100.35057324840764,12.520573248407644,0.012547770700636942,9023.556496815287,1125.8253503184715,3.634840764331211,0.0,33.86796178343949,1138.345923566879,9123.907070063695,157
-llama3.1-8b,8,4,realistic,200,79.97769736842106,9.981118421052633,0.009473684210526315,8664.050460526316,1080.9071710526316,3.4998684210526316,0.0,31.23065789473684,1090.8882894736844,8744.028157894738,152
-mistral-7b,32,8,realistic,200,73.8666,9.221533333333333,0.0088,8607.117333333332,1073.8225333333335,3.4083333333333328,0.0,30.546666666666667,1083.0440666666666,8680.983933333333,150
-mistral-7b,8,16,none,100,87.30986013986013,10.89979020979021,0.009230769230769232,8857.968181818183,1105.4767132867132,3.9477622377622374,0.0,36.39517482517483,1116.3765034965036,8945.278041958041,143
-mistral-7b,64,0,realistic,200,64.83503030303031,8.094060606060605,0.008,8368.076363636364,1043.8716363636363,3.205818181818182,0.0,28.508363636363633,1051.965696969697,8432.911393939394,165
-llama3.1-70b,32,32,none,10,106.68018181818182,13.317909090909092,0.010636363636363637,10175.317818181818,1270.847909090909,4.308909090909091,0.0,45.72336363636364,1284.1658181818182,10281.998,110
-llama2-7b,8,0,none,200,3319.665443786982,414.78810650887567,0.021479289940828403,8888.492189349112,1109.636627218935,4.206153846153846,0.0,43.773254437869824,1524.424733727811,12208.157633136096,169
-mistral-7b,0,0,realistic,150,67438.98640243903,8408.07012195122,1.7435365853658535,8151.47256097561,1008.2051219512194,5.80719512195122,0.0,198.78634146341466,9416.27524390244,75590.45896341463,164
-mistral-7b,0,0,none,200,77303.49827380953,9636.456547619047,2.4375595238095236,9447.316309523809,1167.5807142857143,5.206785714285714,0.0,283.31607142857143,10804.037261904763,86750.81458333333,168
-llama3.1-8b,32,8,realistic,50,94.92741379310345,11.846810344827587,0.012413793103448275,8870.426637931034,1106.5380172413795,3.9284482758620682,0.0,36.1401724137931,1118.3848275862072,8965.354051724138,116
-llama3.1-70b,4,4,realistic,30,24938.62372093023,3116.3956589147288,0.12790697674418605,8168.495891472868,1019.3710077519379,5.258992248062015,0.0,52.81953488372093,4135.7666666666655,33107.1196124031,129
-llama3.1-70b,4,16,none,30,25792.98507246377,3223.056666666667,0.16681159420289854,8590.722173913044,1071.4457246376812,7.121666666666666,0.0,69.43130434782607,4294.502391304349,34383.707246376805,138
-llama3.1-70b,8,0,realistic,10,7754.6985087719295,968.9999122807019,0.04078947368421052,7720.592192982456,963.2443859649125,2.6633333333333336,0.0,27.832894736842103,1932.2442982456141,15475.290701754388,114
-llama3.1-8b,0,0,realistic,200,71668.1653254438,8934.838284023668,2.098639053254438,8744.757869822486,1079.720473372781,4.727633136094675,0.0,233.5313609467456,10014.558757396451,80412.92319526627,169
-llama3.1-70b,0,4,none,70,56200.18664772727,7017.723863636363,1.2431818181818182,7252.774034090909,898.8688068181818,13.772556818181819,0.0,233.14039772727273,7916.592670454545,63452.960681818186,176
-llama3.1-70b,4,64,realistic,30,25369.70926380368,3170.0965644171783,0.2769325153374233,7333.811288343558,914.6588957055216,8.807055214723926,0.0,83.62042944785276,4084.7554601226993,32703.520552147238,163
-mistral-7b,64,16,realistic,200,74.44877697841726,9.294244604316548,0.009496402877697842,8832.960287769783,1102.0229496402878,3.558057553956834,0.0,32.60323741007195,1111.3171942446043,8907.4090647482,139
-mistral-7b,64,64,realistic,150,73.84992753623189,9.219492753623188,0.009565217391304347,8664.410507246375,1080.9828985507245,3.468985507246377,0.0,31.95231884057971,1090.2023913043479,8738.260434782609,138
-mistral-7b,0,32,realistic,100,61013.36546666667,7608.681333333335,1.4347333333333332,6730.621,832.9000666666667,9.0442,0.0,170.56473333333332,8441.581400000001,67743.98646666665,150
-llama3.1-70b,0,8,none,50,49679.10541899442,6204.313296089386,1.1609497206703911,6213.481452513966,770.3643575418995,19.97608938547486,0.0,252.88385474860334,6974.677653631284,55892.586871508385,179
-llama3.1-70b,16,0,realistic,30,502.54014598540147,62.79021897810219,0.009927007299270072,9000.658321167883,1123.6408759124088,3.9360583941605842,0.0,38.60094890510949,1186.4310948905108,9503.198467153285,137
-llama2-7b,8,32,realistic,150,3910.8483251231523,488.68492610837444,0.027733990147783254,8448.53339901478,1054.9976354679804,4.104827586206897,0.0,37.93679802955665,1543.6825615763546,12359.381724137933,203
-llama3.1-70b,4,32,none,40,24439.405000000002,3053.9160465116274,0.35156976744186047,7631.1431395348845,951.8558139534885,9.028023255813954,0.0,84.72354651162792,4005.7718604651163,32070.54813953488,172
-llama3.1-70b,16,16,realistic,60,83.94530201342282,10.479731543624162,0.00785234899328859,8891.003624161074,1110.143355704698,4.27013422818792,0.0,39.86040268456376,1120.6230872483222,8974.948926174498,149
-mistral-7b,0,4,none,150,71278.00648148148,8885.403518518518,1.866543209876543,8941.022037037037,1104.5146913580247,9.40395061728395,0.0,255.25870370370373,9989.918209876543,80219.02851851852,162
-llama3.1-8b,16,64,realistic,150,72.34782051282052,9.028910256410256,0.00923076923076923,8923.318974358974,1113.3949358974357,3.6630769230769227,0.0,34.0500641025641,1122.4238461538462,8995.666794871795,156
-llama3.1-70b,16,64,none,10,104.15881355932203,13.003135593220337,0.009915254237288135,9627.248220338983,1202.2231355932201,4.154322033898305,0.0,40.41305084745763,1215.2262711864405,9731.407033898306,118
-llama3.1-70b,32,16,realistic,70,78.36774834437087,9.783443708609271,0.007748344370860927,8638.162317880795,1078.5792715231787,4.347086092715232,0.0,39.92311258278146,1088.3627152317881,8716.530066225167,151
-llama2-7b,4,16,realistic,150,25097.095579399138,3136.0270815450644,0.15021459227467812,7878.8575536480685,980.7982403433477,7.263433476394848,0.0,58.65304721030043,4116.825321888412,32975.95313304721,233
-llama2-7b,8,32,realistic,200,4762.0689130434785,595.0786086956523,0.024043478260869566,8457.476913043478,1056.2040869565217,3.95604347826087,0.0,37.20247826086956,1651.282695652174,13219.545826086956,230
-llama2-7b,64,16,realistic,200,47.59027777777778,5.945416666666667,0.004907407407407408,9016.717037037037,1126.026111111111,4.0495833333333335,0.0,36.950879629629625,1131.9715277777777,9064.307314814814,216
-mistral-7b,8,4,none,150,86.50197278911565,10.79891156462585,0.008979591836734694,8988.841904761905,1121.704693877551,3.630204081632653,0.0,33.82190476190476,1132.5036054421769,9075.34387755102,147
-llama3.1-8b,4,0,none,150,2068.5445142857143,258.13188571428566,0.08017142857142857,8758.003142857144,1092.7705142857142,3.825657142857143,0.0,35.56377142857142,1350.9024000000002,10826.547657142857,175
-llama3.1-8b,8,16,realistic,100,86.42352517985613,10.785539568345325,0.010359712230215827,8868.163669064748,1106.6333812949638,3.849856115107913,0.0,35.22971223021583,1117.4189208633093,8954.587194244605,139
-llama3.1-70b,8,2,none,70,2258.590612244898,282.24632653061224,0.023197278911564628,9187.688707482994,1147.1555102040816,4.473741496598639,0.0,43.30795918367347,1429.4018367346937,11446.279319727892,147
-llama3.1-70b,16,0,none,40,349.4075,43.65291666666667,0.008819444444444444,9664.595625,1206.7467361111112,4.067152777777778,0.0,42.31458333333333,1250.3996527777779,10014.003125,144
-mistral-7b,0,4,realistic,50,44793.49330985915,5585.177746478873,0.9406338028169011,5121.913169014084,634.5774647887324,29.45105633802817,0.0,227.38528169014083,6219.755211267606,49915.406478873236,142
-llama3.1-8b,64,4,none,50,93.50814159292035,11.669734513274337,0.012743362831858406,8873.272566371681,1107.0080530973448,3.9552212389380537,0.0,36.53672566371681,1118.6777876106194,8966.7807079646,113
-llama3.1-70b,0,4,realistic,20,40818.41842465753,5098.344657534247,0.4527397260273972,4460.012397260274,553.9234931506849,14.456301369863013,0.0,122.00794520547944,5652.268150684932,45278.43082191781,146
-llama3.1-8b,0,2,none,150,63941.40695121951,7970.975487804878,1.5334756097560975,8211.345548780488,1016.0999999999999,13.34170731707317,0.0,235.84518292682924,8987.07548780488,72152.7525,164
-llama2-7b,16,2,none,200,66.52069444444444,8.309861111111111,0.004907407407407408,9441.869398148148,1179.4180092592594,4.341527777777777,0.0,41.78268518518518,1187.7278703703703,9508.390092592592,216
-llama2-7b,32,2,none,200,55.51378378378378,6.934324324324325,0.0047747747747747754,9860.687747747746,1231.6233783783782,4.186981981981982,0.0,41.040135135135124,1238.5577027027027,9916.201531531533,222
-llama2-7b,0,64,none,150,57462.80048192771,7178.022208835342,1.41574297188755,7750.259236947792,957.4920080321285,2.5887148594377507,0.0,163.58080321285138,8135.51421686747,65213.059718875505,249
-llama3.1-8b,64,2,realistic,150,82.53589147286822,10.300387596899224,0.011162790697674419,8265.996124031008,1031.0574418604651,3.6264341085271314,0.0,31.2053488372093,1041.3578294573645,8348.532015503875,129
-mistral-7b,0,2,realistic,100,58226.14619354839,7260.508580645162,1.2776129032258066,6599.9910322580645,817.3449677419355,14.319612903225806,0.0,200.38806451612905,8077.8535483870955,64826.13722580645,155
-llama2-7b,4,32,realistic,100,29906.162227488152,3736.96336492891,0.31236966824644546,8146.642132701421,1016.7701421800948,10.728578199052134,0.0,98.03298578199052,4753.733507109005,38052.80436018957,211
-llama3.1-8b,64,64,realistic,100,76.59274074074074,9.558666666666667,0.010666666666666666,8572.453481481482,1069.5260740740741,3.590962962962963,0.0,32.7077037037037,1079.0847407407407,8649.046222222221,135
-llama3.1-8b,8,2,none,200,78.69967741935484,9.821612903225805,0.00929032258064516,8417.453161290323,1050.1963870967743,3.789870967741935,0.0,32.556387096774195,1060.0179999999998,8496.152838709677,155
-llama3.1-8b,16,0,realistic,150,72.08687116564417,8.99638036809816,0.008834355828220859,8804.932944785276,1098.6228220858895,3.6549079754601226,0.0,33.99276073619632,1107.6192024539878,8877.01981595092,163
-llama3.1-70b,16,32,realistic,40,509.70492957746484,63.68598591549296,0.009436619718309858,9373.912323943663,1170.5445774647887,3.8083098591549294,0.0,39.76112676056338,1234.2305633802819,9883.617253521126,142
-mistral-7b,32,16,realistic,100,84.03999999999999,10.491603053435115,0.010076335877862596,8821.686335877865,1100.8741984732826,3.914656488549618,0.0,36.04251908396947,1111.3658015267174,8905.726335877862,131
-llama3.1-70b,0,16,realistic,10,37697.38622222222,4709.810814814816,0.18133333333333335,4010.3843703703706,497.96451851851845,1.7592592592592593,0.0,25.549629629629628,5207.775333333334,41707.77059259259,135
-llama3.1-8b,16,32,none,50,81.60597122302158,10.18431654676259,0.010359712230215827,8854.692302158273,1104.7220863309353,3.858705035971223,0.0,35.6889928057554,1114.9064028776977,8936.298273381295,139
-llama3.1-70b,32,4,none,60,86.77246376811594,10.832681159420291,0.008478260869565216,9310.303768115942,1162.4797101449276,4.062463768115943,0.0,40.46550724637681,1173.3123913043478,9397.076231884059,138
-llama3.1-70b,8,16,realistic,70,2264.5363870967744,282.97490322580643,0.013870967741935483,9049.40935483871,1129.9650322580644,4.2372258064516135,0.0,40.9641935483871,1412.939935483871,11313.945741935482,155
-llama2-7b,8,64,realistic,200,4512.677242798354,563.9020164609053,0.017695473251028805,8457.634320987654,1056.0146913580247,4.250329218106995,0.0,39.846378600823044,1619.91670781893,12970.311563786008,243
-mistral-7b,32,16,realistic,150,75.84696551724137,9.468758620689655,0.00910344827586207,8747.830896551724,1091.5026206896553,3.6731724137931034,0.0,33.44744827586207,1100.971379310345,8823.677862068966,145
-llama3.1-8b,0,16,realistic,100,60748.63324675324,7575.691168831168,1.4324025974025973,6656.3548051948055,823.4934415584415,8.680909090909092,0.0,173.99201298701297,8399.184610389611,67404.98805194805,154
-llama3.1-8b,0,32,realistic,100,60609.55732026143,7558.449738562092,1.4264052287581699,6716.439411764706,830.9049019607843,8.859934640522876,0.0,169.72437908496732,8389.354640522875,67325.99673202615,153
-llama3.1-8b,8,8,none,100,84.88528169014084,10.593591549295775,0.010140845070422535,9013.425000000001,1124.777323943662,3.9356338028169007,0.0,36.455000000000005,1135.3709154929577,9098.310281690141,142
-llama3.1-8b,16,16,none,100,80.35802816901409,10.028591549295774,0.010140845070422535,8819.27746478873,1100.5838028169017,3.9734507042253506,0.0,36.4318309859155,1110.6123943661973,8899.635492957746,142
-llama3.1-8b,32,2,realistic,150,81.85830882352941,10.21580882352941,0.010588235294117647,8634.878235294118,1077.3616911764707,3.770808823529411,0.0,33.05544117647059,1087.5774999999999,8716.736544117646,136
-llama3.1-8b,64,64,realistic,50,93.21981981981982,11.633693693693694,0.012972972972972972,8675.298918918917,1082.0506306306306,3.6709909909909917,0.0,33.693603603603606,1093.6843243243243,8768.518738738738,111
-llama3.1-70b,0,32,none,20,42627.88631944444,5323.686736111111,0.5582638888888888,4993.095069444444,619.6055555555555,19.472986111111112,0.0,168.9701388888889,5943.292291666667,47620.98138888889,144
-llama3.1-70b,32,0,none,60,75.967625,9.4838125,0.0073124999999999996,8965.6578125,1119.4244375,4.156375,0.0,40.696000000000005,1128.90825,9041.6254375,160
-llama3.1-8b,32,32,none,150,72.63577181208053,9.0648322147651,0.009664429530201342,8802.496174496644,1098.245167785235,3.7020134228187915,0.0,33.910134228187914,1107.31,8875.131946308726,149
-llama3.1-8b,0,4,none,200,73918.4510982659,9213.989942196533,2.1597109826589596,9557.078901734105,1178.6306936416183,8.462947976878613,0.0,286.3071098265896,10392.620635838151,83475.53000000001,173
-llama3.1-70b,0,8,none,70,55251.300107526884,6899.932634408603,1.258763440860215,6904.124139784945,855.152311827957,9.663763440860215,0.0,200.57946236559138,7755.0849462365595,62155.42424731183,186
-mistral-7b,32,0,realistic,150,73.03839743589744,9.118141025641027,0.008461538461538461,8706.247564102565,1086.2417307692308,3.54025641025641,0.0,32.873333333333335,1095.3598717948717,8779.28596153846,156
-mistral-7b,16,16,none,200,76.13496732026144,9.504705882352942,0.008627450980392158,9024.577647058823,1126.0586274509803,3.5489542483660133,0.0,33.23529411764706,1135.5633333333333,9100.712614379085,153
-mistral-7b,64,2,realistic,200,77.38722627737226,9.661094890510949,0.009635036496350365,8267.682846715328,1031.5396350364963,3.572408759124088,0.0,30.305036496350365,1041.2007299270074,8345.070072992701,137
-llama3.1-8b,0,8,none,50,49928.00703947368,6225.993881578947,1.1819078947368422,5549.949934210526,687.605197368421,25.414605263157895,0.0,241.82203947368419,6913.599078947369,55477.95697368422,152
-llama3.1-70b,0,32,realistic,10,38969.336766917295,4868.776015037595,0.18481203007518796,3756.3963909774434,466.67458646616547,1.183233082706767,0.0,23.088796992481203,5335.45060150376,42725.733157894734,133
-mistral-7b,4,2,realistic,100,139.69133333333335,17.426592592592595,0.013259259259259259,8497.919777777777,1060.4795555555554,3.974148148148148,0.0,34.67896296296296,1077.9061481481483,8637.611111111111,135
-llama3.1-70b,32,4,realistic,50,93.948984375,11.72859375,0.009140625,9171.10578125,1145.049453125,4.035468750000001,0.0,40.0428125,1156.7780468749997,9265.054765625,128
-llama2-7b,32,32,realistic,100,72.84078787878788,9.096363636363638,0.007333333333333333,9348.442181818182,1167.7035151515151,4.053575757575758,0.0,39.95284848484848,1176.7998787878787,9421.28296969697,165
-llama2-7b,16,0,none,150,113.08385057471264,14.11867816091954,0.00867816091954023,9156.51724137931,1143.309942528736,4.322011494252873,0.0,43.305747126436785,1157.4286206896552,9269.601091954022,174
-llama3.1-8b,64,8,none,50,90.5526724137931,11.300862068965518,0.012413793103448275,8793.673879310345,1097.020948275862,3.9193965517241383,0.0,35.43629310344827,1108.3218103448276,8884.226551724138,116
-llama2-7b,32,16,none,150,66.85340000000001,8.34635,0.00605,9105.8259,1137.4142000000002,4.350599999999999,0.0,41.656099999999995,1145.76055,9172.6793,200
-llama3.1-70b,32,16,none,70,78.5882,9.810933333333335,0.0078,9001.834066666666,1124.0237333333332,4.382733333333334,0.0,41.31086666666666,1133.8346666666666,9080.422266666666,150
-llama2-7b,32,0,none,100,92.1823417721519,11.513164556962025,0.0068987341772151906,9300.525316455696,1161.3216455696204,4.09373417721519,0.0,42.220506329113924,1172.8348101265824,9392.707658227848,158
-llama2-7b,64,4,realistic,150,54.246700507614214,6.777005076142132,0.005380710659898477,9262.610812182742,1156.9855837563452,4.1718781725888325,0.0,39.930456852791885,1163.7625888324874,9316.857512690356,197
-mistral-7b,8,8,none,200,81.66941558441559,10.19564935064935,0.008571428571428572,9080.665194805195,1133.0344155844155,3.7134415584415583,0.0,34.57837662337662,1143.230064935065,9162.334610389611,154
-mistral-7b,32,4,realistic,200,79.25602836879433,9.894397163120567,0.009361702127659575,9005.118581560284,1123.5505673758867,3.6285815602836875,0.0,33.725744680851065,1133.444964539007,9084.374609929078,141
-mistral-7b,64,4,none,100,83.29761904761905,10.398888888888889,0.010476190476190477,8557.781666666666,1067.9488095238096,3.876349206349206,0.0,34.22420634920635,1078.3476984126987,8641.079285714286,126
-llama3.1-70b,4,64,realistic,40,28490.883875,3560.0955625,0.36956250000000007,7320.9913750000005,910.5305000000001,12.991812500000004,0.0,103.94424999999998,4470.6260624999995,35811.87525,160
-llama3.1-70b,8,64,none,30,7519.154421768707,939.5757142857143,0.04013605442176871,8848.090340136054,1104.701632653061,4.0941496598639455,0.0,40.15387755102041,2044.2773469387757,16367.244761904762,147
-mistral-7b,4,2,none,100,98.35149999999999,12.278214285714286,0.0095,8552.411714285716,1067.3015,3.9425714285714286,0.0,34.660714285714285,1079.579714285714,8650.763214285715,140
-mistral-7b,64,0,none,200,65.32269938650307,8.154907975460123,0.008098159509202455,8469.09509202454,1056.2225766871168,3.1774846625766866,0.0,28.993128834355822,1064.3774846625768,8534.417791411044,163
-llama2-7b,8,2,none,200,8079.014579831933,1009.6155462184873,0.04184873949579832,8226.657100840337,1027.3668487394957,3.745462184873949,0.0,34.76302521008403,2036.9823949579834,16305.671680672269,238
-llama3.1-70b,4,0,realistic,50,15712.81406666667,1963.1256,0.08026666666666667,6542.3044666666665,816.4063333333332,3.6602666666666663,0.0,34.875733333333336,2779.531933333334,22255.11853333333,150
-llama3.1-70b,32,32,none,20,90.22292307692307,11.263461538461538,0.009,9094.816307692308,1135.540692307692,3.8826923076923077,0.0,38.2453076923077,1146.8041538461537,9185.039230769231,130
-llama3.1-70b,32,2,none,40,102.33949152542372,12.776016949152542,0.009915254237288135,9714.958559322033,1213.0608474576268,4.014491525423729,0.0,40.953220338983044,1225.8368644067793,9817.298050847457,118
-llama3.1-70b,32,16,realistic,20,105.01035398230088,13.109469026548672,0.010353982300884955,9471.852831858409,1182.635663716814,3.886725663716814,0.0,40.16787610619468,1195.7451327433628,9576.863185840708,113
-llama3.1-70b,16,8,none,60,90.33690647482014,11.277625899280576,0.00841726618705036,9377.739136690647,1170.9262589928057,4.270359712230215,0.0,42.02971223021582,1182.2038848920865,9468.076043165469,139
-mistral-7b,8,4,realistic,100,94.07220588235293,11.744044117647059,0.009705882352941177,8842.890073529412,1103.6264705882354,3.859705882352941,0.0,35.116470588235295,1115.3705147058824,8936.962279411766,136
-llama3.1-8b,4,2,realistic,100,722.0974264705883,90.17448529411764,0.02176470588235294,8626.954632352941,1076.6416911764704,3.9724264705882355,0.0,35.16948529411765,1166.8161764705883,9349.05205882353,136
-llama2-7b,8,4,realistic,200,10345.999834710743,1292.9040495867769,0.05185950413223141,8813.311983471074,1100.5218595041324,4.1903305785123965,0.0,41.66979338842975,2393.4259090909095,19159.311818181817,242
-llama2-7b,8,8,realistic,100,10294.271010638298,1286.4151063829788,0.04425531914893617,8462.966595744681,1057.0385638297873,4.023351063829788,0.0,39.521702127659566,2343.453670212766,18757.23760638298,188
-llama2-7b,4,64,realistic,150,18492.280633484163,2310.6542986425343,0.07529411764705883,8322.208099547512,1038.0432126696833,4.497285067873303,0.0,44.26361990950225,3348.697511312218,26814.488733031674,221
-mistral-7b,32,32,realistic,100,79.09442028985508,9.874202898550726,0.009565217391304347,8679.763913043478,1083.0746376811594,3.7503623188405792,0.0,33.65746376811594,1092.9488405797101,8758.858333333334,138
-llama3.1-8b,32,2,realistic,50,96.11637931034483,11.995172413793103,0.012413793103448275,8428.264568965516,1051.4409482758622,3.940862068965517,0.0,35.13758620689654,1063.4361206896554,8524.380948275862,116
-llama3.1-70b,8,16,none,20,5160.973235294117,644.9201470588235,0.033382352941176474,9127.886029411764,1139.779338235294,4.351323529411765,0.0,43.74014705882353,1784.6994852941177,14288.859264705881,136
-llama3.1-70b,16,2,realistic,10,131.0158163265306,16.356020408163268,0.01193877551020408,9396.518265306122,1173.363469387755,3.6440816326530614,0.0,36.64091836734694,1189.7194897959182,9527.534081632653,98
-mistral-7b,64,2,none,100,86.02325203252033,10.73918699186992,0.010731707317073172,8337.544796747967,1040.3634959349595,3.7897560975609754,0.0,32.933902439024386,1051.1026829268294,8423.568048780488,123
-llama3.1-70b,0,32,none,50,50309.847528089886,6283.134887640448,1.1941573033707866,6143.575674157303,761.9328651685393,17.461067415730337,0.0,234.61016853932585,7045.067752808989,56453.42320224719,178
-llama2-7b,8,16,realistic,200,5369.977792207793,671.0327705627706,0.032034632034632034,8734.685584415585,1090.9263636363637,4.287792207792207,0.0,40.69796536796537,1761.959134199134,14104.663376623377,231
-llama2-7b,8,4,none,50,13052.981449275361,1631.2579710144928,0.05630434782608695,9010.004855072464,1125.199420289855,3.590797101449275,0.0,37.57971014492754,2756.4573913043473,22062.986304347825,138
-mistral-7b,32,0,none,200,64.33534090909092,8.031647727272727,0.007500000000000001,8680.598068181818,1083.0464204545456,3.496761363636364,0.0,32.68863636363636,1091.0780681818183,8744.933409090909,176
-mistral-7b,64,32,realistic,200,71.77783216783216,8.960769230769232,0.009230769230769232,8788.56881118881,1096.5051748251747,3.553706293706294,0.0,32.861118881118884,1105.4659440559442,8860.346643356645,143
-llama3.1-8b,16,32,realistic,150,77.3025850340136,9.647278911564626,0.009795918367346938,8934.48850340136,1114.6711564625848,3.613401360544218,0.0,33.520816326530614,1124.3184353741497,9011.791088435375,147
-mistral-7b,8,0,none,200,5224.861657458563,652.4036464088398,0.06591160220994474,8923.669779005524,1113.054640883978,5.692541436464088,0.0,56.6479005524862,1765.4582872928174,14148.531436464089,181
-mistral-7b,16,4,realistic,200,81.61910958904109,10.189383561643837,0.00904109589041096,8902.79212328767,1110.8700684931507,3.7109589041095887,0.0,34.335,1121.0594520547945,8984.411232876713,146
-llama3.1-70b,4,8,realistic,10,25039.51643410853,3128.832093023256,0.06945736434108528,5799.364108527132,723.532015503876,2.1108527131782946,0.0,22.074496124031008,3852.3641085271315,30838.88054263566,129
-llama3.1-8b,16,8,none,150,78.0747619047619,9.74360544217687,0.009795918367346938,8877.597959183673,1107.672448979592,3.702517006802721,0.0,33.79442176870749,1117.4160544217689,8955.672721088436,147
-mistral-7b,16,4,none,100,86.15391304347825,10.755507246376812,0.009565217391304347,8826.504855072464,1101.569347826087,4.02768115942029,0.0,36.73166666666667,1112.3248550724636,8912.658768115944,138
-llama3.1-70b,32,2,realistic,50,96.91392,12.09872,0.00936,9124.77616,1139.2788799999996,3.8296000000000006,0.0,37.53679999999999,1151.3775999999996,9221.69008,125
-llama2-7b,16,0,realistic,200,782.8557894736842,97.80105263157895,0.05578947368421053,4397.772105263158,547.3057894736843,1.756842105263158,0.0,12.318947368421053,645.1068421052632,5180.627894736842,19
-mistral-7b,16,4,none,50,93.71322834645669,11.699212598425197,0.010393700787401575,9052.355196850393,1129.494173228346,3.848661417322835,0.0,36.18377952755906,1141.1933858267714,9146.06842519685,127
-llama2-7b,64,64,realistic,200,47.94304347826087,5.989468599033816,0.005120772946859904,9357.775797101449,1168.6551207729467,4.274299516908212,0.0,42.21497584541063,1174.6445893719806,9405.71884057971,207
-llama2-7b,0,32,none,100,56263.660762331834,7028.787085201794,1.3682959641255605,7629.939192825112,946.6150672645741,3.459103139013453,0.0,167.6918385650224,7975.4021524663685,63893.59995515695,223
-llama2-7b,64,0,none,200,1127.286818181818,140.66272727272727,0.06272727272727273,2070.9363636363637,256.6422727272727,0.6772727272727272,0.0,1.3177272727272726,397.30500000000006,3198.223181818182,22
-llama2-7b,64,2,none,150,54.75609137055838,6.8406091370558375,0.005380710659898477,9303.740203045685,1162.1155329949238,4.034771573604061,0.0,38.871827411167516,1168.9561421319797,9358.496294416243,197
-llama3.1-8b,16,0,none,50,80.86979310344829,10.092413793103448,0.00993103448275862,8947.809172413794,1116.4004137931033,3.7180689655172414,0.0,35.46806896551724,1126.492827586207,9028.678965517242,145
-llama3.1-8b,32,64,none,50,75.84204225352113,9.465,0.010140845070422535,8995.828169014085,1122.2607042253524,3.733732394366197,0.0,35.36359154929577,1131.7257042253525,9071.670211267605,142
-llama3.1-70b,4,64,realistic,10,13039.82691588785,1629.477757009346,0.04719626168224299,7734.55476635514,965.2539252336448,2.833831775700935,0.0,29.01943925233645,2594.731682242991,20774.38168224299,107
-llama3.1-70b,8,8,realistic,10,6344.596181818182,792.7467272727274,0.04345454545454545,8050.332,1005.0857272727272,3.0206363636363647,0.0,31.52154545454545,1797.832454545454,14394.928181818183,110
-mistral-7b,32,0,none,100,72.76346153846154,9.083846153846153,0.008461538461538461,8761.728653846154,1093.3651923076925,3.7616025641025637,0.0,35.03371794871794,1102.4490384615383,8834.492115384615,156
-llama3.1-8b,64,0,realistic,100,78.87448529411765,9.843455882352941,0.010588235294117647,8726.81455882353,1088.8473529411765,3.5641176470588234,0.0,32.75625,1098.6908088235296,8805.689044117647,136
-llama3.1-70b,16,32,none,50,81.34572368421053,10.155197368421051,0.007697368421052631,8894.36677631579,1110.5575,4.135921052631579,0.0,40.05355263157895,1120.712697368421,8975.7125,152
-llama3.1-70b,16,16,none,70,302.3762666666667,37.776666666666664,0.008466666666666667,9210.147533333333,1150.0985999999998,4.431666666666667,0.0,42.88586666666666,1187.8752666666667,9512.5238,150
-mistral-7b,8,2,realistic,50,105.16357723577237,13.12869918699187,0.010731707317073172,8821.0743902439,1100.4934959349594,3.8534959349593496,0.0,35.07666666666666,1113.622195121951,8926.237967479676,123
-mistral-7b,32,0,realistic,100,78.67193103448277,9.821448275862068,0.00910344827586207,8679.764344827587,1083.045103448276,3.677724137931034,0.0,33.72558620689655,1092.8665517241382,8758.436275862068,145
-mistral-7b,4,2,realistic,200,150.9262666666667,18.84286666666667,0.013133333333333334,8598.891,1072.9521333333334,3.761133333333334,0.0,33.587133333333334,1091.795,8749.817266666665,150
-llama3.1-70b,0,0,realistic,50,47446.7468361582,5925.650112994349,1.10180790960452,5583.869096045199,692.188813559322,20.612372881355935,0.0,232.13542372881358,6617.838926553673,53030.615932203385,177
-llama3.1-70b,8,2,realistic,20,5268.9611904761905,658.4263492063494,0.03420634920634919,8547.237222222224,1067.0662698412696,3.6899206349206355,0.0,35.69587301587302,1725.4926190476194,13816.198412698413,126
-mistral-7b,0,16,none,150,74463.0588607595,9282.092594936708,1.990316455696202,9492.578037974685,1171.937911392405,6.012911392405063,0.0,243.87158227848101,10454.030506329116,83955.63689873418,158
-llama2-7b,0,64,none,50,47852.288858695654,5977.297228260869,1.1979347826086957,6404.133097826087,792.3197826086956,16.21663043478261,0.0,216.5940760869565,6769.617010869564,54256.42195652174,184
-llama2-7b,4,4,realistic,50,22269.796708074533,2783.05099378882,0.08937888198757764,7656.568881987577,956.0875776397514,4.099565217391304,0.0,42.69826086956522,3739.1385714285716,29926.36559006211,161
-llama3.1-8b,8,16,realistic,50,92.48276923076924,11.541692307692308,0.011076923076923076,8817.16123076923,1100.0501538461535,3.8579230769230772,0.0,35.8573076923077,1111.591846153846,8909.644,130
-llama3.1-70b,0,16,none,40,46103.438363636364,5757.8820000000005,1.0374545454545454,6056.708303030303,751.2830303030304,23.234363636363636,0.0,251.87763636363636,6509.16503030303,52160.14666666666,165
-llama3.1-70b,16,8,realistic,30,786.2882089552238,98.25014925373137,0.009925373134328357,8984.363507462685,1121.8283582089553,3.975298507462687,0.0,39.776119402985074,1220.0785074626867,9770.651716417911,134
-llama3.1-70b,32,0,none,30,90.20037037037036,11.260592592592593,0.008666666666666666,9484.852074074075,1184.2331851851852,3.989777777777777,0.0,41.43133333333333,1195.493777777778,9575.052444444445,135
-llama3.1-70b,32,64,realistic,70,73.73430379746836,9.205,0.00740506329113924,9034.080379746834,1127.9895569620253,4.237848101265824,0.0,40.2190506329114,1137.1945569620252,9107.814683544304,158
-mistral-7b,64,16,none,50,86.8653781512605,10.844285714285714,0.011092436974789916,8955.290504201681,1117.2705042016808,3.7815966386554623,0.0,35.76638655462185,1128.1147899159662,9042.155882352941,119
-llama3.1-70b,16,8,realistic,60,92.69220588235294,11.571691176470589,0.008602941176470588,9465.125588235294,1181.8961764705882,4.126985294117647,0.0,41.507205882352935,1193.467867647059,9557.817794117647,136
-llama2-7b,32,32,realistic,150,57.48785,7.181900000000001,0.0053,8730.95185,1090.55275,4.4079,0.0,39.876450000000006,1097.7346499999999,8788.439699999999,200
-llama2-7b,0,16,none,150,60654.30809128631,7576.393651452283,1.4227385892116182,8255.20784232365,1020.2048547717842,6.206804979253112,0.0,222.42933609958507,8596.598506224065,68909.51593360996,241
-mistral-7b,64,32,none,200,70.50131034482759,8.80144827586207,0.00910344827586207,8787.725655172413,1096.4973793103447,3.6204137931034484,0.0,33.52820689655172,1105.298827586207,8858.226965517242,145
-llama2-7b,4,2,realistic,50,25582.41935897436,3196.982948717949,0.09653846153846155,7263.654679487178,907.1764743589744,3.49474358974359,0.0,36.51275641025641,4104.159423076924,32846.07403846154,156
-llama2-7b,0,32,none,150,58402.31784313726,7295.472196078432,1.3966274509803922,7651.538470588235,945.1213333333334,3.474274509803921,0.0,177.78160784313724,8240.593529411764,66053.85631372548,255
-llama2-7b,0,4,realistic,150,55588.24518518519,6943.436049382716,1.7299176954732511,7628.327160493827,946.262633744856,13.011316872427983,0.0,265.9727160493827,7889.698683127572,63216.572345679015,243
-llama3.1-70b,0,4,realistic,70,52601.775245901634,6568.375737704917,1.1439890710382512,6769.709016393443,838.8084153005465,16.820437158469947,0.0,244.8962841530055,7407.184153005464,59371.484262295075,183
-llama3.1-8b,64,64,none,150,71.03186206896552,8.864689655172414,0.00993103448275862,8589.48124137931,1071.5041379310344,3.4026206896551723,0.0,30.814896551724136,1080.3688275862069,8660.513103448275,145
-llama2-7b,4,2,none,150,13885.418883720931,1735.2146046511627,0.06255813953488372,7709.341441860466,962.9857674418605,4.10506976744186,0.0,35.494837209302325,2698.200372093024,21594.760325581396,215
-llama2-7b,8,0,realistic,50,14810.959532163743,1850.8789473684212,0.0871345029239766,8328.637894736843,1040.0187719298247,3.870994152046784,0.0,45.16789473684211,2890.8977192982456,23139.597426900586,171
-llama2-7b,32,2,none,50,106.1025641025641,13.255299145299144,0.00905982905982906,10866.158803418804,1357.1895726495725,3.9364957264957265,0.0,42.33974358974359,1370.4448717948717,10972.261367521369,117
-llama2-7b,64,4,none,150,54.64912371134021,6.819793814432989,0.005876288659793815,9019.299845360825,1126.4942268041236,4.122422680412371,0.0,37.80546391752578,1133.3140206185567,9073.948969072164,194
-mistral-7b,0,32,realistic,150,68211.93089743589,8503.931217948719,1.8210256410256414,8248.706538461538,1019.2104487179488,6.265576923076923,0.0,210.73769230769233,9523.141666666666,76460.63743589743,156
-mistral-7b,32,0,realistic,200,68.13874251497006,8.506467065868263,0.007904191616766467,8795.247664670658,1097.1807185628743,3.3544910179640715,0.0,31.282514970059882,1105.6871856287423,8863.38640718563,167
-llama3.1-70b,0,32,realistic,20,41059.73582733813,5128.301223021583,0.4084892086330935,5219.230215827338,647.4712949640289,12.141726618705034,0.0,120.06021582733813,5775.772517985612,46278.96604316547,139
-mistral-7b,0,32,realistic,200,73033.86650602409,9104.792590361447,2.2315662650602412,8781.689096385542,1084.0171686746987,5.3062048192771085,0.0,252.70542168674703,10188.809759036145,81815.55560240964,166
-llama3.1-70b,32,64,realistic,50,78.81378378378379,9.839121621621622,0.007905405405405404,9380.69277027027,1171.0409459459459,4.129324324324324,0.0,41.23898648648649,1180.8800675675675,9459.506554054055,148
-llama2-7b,0,2,realistic,150,47921.306163265304,5985.4852244897975,1.0427755102040814,7215.007918367347,895.6751020408165,30.44702040816326,0.0,327.9756326530612,6881.160326530613,55136.31408163265,245
-mistral-7b,64,4,realistic,100,87.09570247933884,10.87305785123967,0.01090909090909091,8724.854214876032,1088.7057024793387,3.85900826446281,0.0,34.86181818181818,1099.5787603305785,8811.949917355372,121
-llama2-7b,4,16,realistic,200,27447.70391472868,3429.888449612403,0.18112403100775196,7912.514961240309,987.5865891472868,9.430542635658915,0.0,77.7994573643411,4417.4750387596905,35360.21887596899,258
-llama2-7b,4,0,none,150,21856.868657407405,2730.873657407408,0.12203703703703705,7463.19875,931.8737500000002,5.6662037037037045,0.0,54.99106481481481,3662.7474074074075,29320.06740740741,216
-llama2-7b,0,0,none,200,35838.9084137931,4474.5977241379305,1.031793103448276,12450.581034482759,1527.5555862068966,2.1033793103448275,0.0,133.96993103448276,6002.153310344826,48289.48944827586,145
-llama3.1-8b,0,0,none,100,63951.67118012423,7975.115776397515,1.6129813664596273,6997.991925465838,866.0727329192548,6.98360248447205,0.0,179.6204347826087,8841.188509316771,70949.66310559007,161
-llama3.1-70b,32,4,realistic,10,135.39213483146068,16.902359550561798,0.013146067415730336,9071.041797752809,1132.682584269663,3.7243820224719104,0.0,36.30213483146068,1149.584943820225,9206.43393258427,89
-mistral-7b,0,2,realistic,200,65800.6199408284,8202.056745562131,1.6131952662721893,8561.797751479291,1057.5837869822485,10.657928994082841,0.0,228.33840236686393,9259.64053254438,74362.41769230769,169
-llama3.1-8b,0,2,none,200,62300.90303571429,7765.88113095238,1.459345238095238,8448.229583333334,1043.8469642857142,13.746130952380952,0.0,234.71029761904762,8809.728095238093,70749.13261904762,168
-llama3.1-8b,4,16,none,100,1577.6290476190477,197.03850340136057,0.018027210884353745,8650.499455782312,1079.5791836734695,3.7899319727891156,0.0,34.851360544217684,1276.6176870748297,10228.128503401362,147
-llama3.1-8b,16,2,none,150,82.46085106382978,10.290992907801419,0.010212765957446808,8932.24219858156,1114.4345390070923,3.6103546099290775,0.0,32.924468085106376,1124.7255319148937,9014.70304964539,141
-llama3.1-8b,64,8,realistic,150,82.83937007874016,10.338267716535434,0.011338582677165353,8883.002677165354,1108.2524409448818,3.6536220472440943,0.0,33.45566929133859,1118.5907086614172,8965.842047244094,127
-llama3.1-70b,0,0,realistic,60,50851.24679558011,6350.591988950276,1.2057458563535912,6381.210662983425,791.0420441988952,17.910552486187846,0.0,241.55596685082875,7141.63403314917,57232.45745856354,181
-llama3.1-70b,0,16,realistic,40,44410.89773006135,5546.716748466258,0.9168098159509204,5731.479386503068,710.7405521472392,23.02699386503068,0.0,240.89159509202455,6257.457300613497,50142.377116564414,163
-llama3.1-70b,8,64,none,50,7647.739464285714,955.6253571428573,0.03386904761904762,8511.613928571429,1062.6285714285714,3.9145833333333333,0.0,38.94797619047619,2018.2539285714288,16159.353392857143,168
-llama3.1-70b,16,8,realistic,10,183.68666666666667,22.933010752688173,0.013225806451612903,9777.731612903226,1221.065053763441,3.3797849462365597,0.0,36.70172043010752,1243.998064516129,9961.41827956989,93
-llama3.1-70b,32,8,none,10,106.394375,13.282232142857142,0.01044642857142857,9692.573124999999,1210.4544642857143,4.324821428571428,0.0,41.27955357142857,1223.7366964285716,9798.967499999999,112
-llama3.1-70b,64,0,realistic,30,131.63431818181817,16.43318181818182,0.013295454545454544,8579.54715909091,1070.779318181818,3.6337499999999996,0.0,32.72261363636363,1087.2125,8711.181477272727,88
-llama3.1-8b,0,16,none,150,73885.28341614906,9210.891801242236,1.9798136645962734,8978.183664596274,1108.065403726708,6.346645962732919,0.0,242.81347826086957,10318.957204968943,82863.46708074534,161
-llama3.1-8b,4,2,realistic,200,97.26846666666667,12.124733333333332,0.014933333333333335,8793.9574,1097.2202666666665,3.5914,0.0,32.53266666666667,1109.345,8891.225866666666,150
-llama3.1-8b,32,32,realistic,200,70.44642857142857,8.791623376623377,0.00935064935064935,8914.004545454545,1112.0966233766235,3.435519480519481,0.0,31.611233766233767,1120.8882467532467,8984.450974025975,154
-llama3.1-8b,32,8,realistic,100,84.643,10.563307692307692,0.011076923076923076,8846.83776923077,1103.9756923076925,3.8970000000000002,0.0,35.09953846153846,1114.539,8931.48076923077,130
-llama3.1-8b,0,0,realistic,100,60353.74509803922,7526.695620915032,1.4088235294117648,6592.245163398693,816.5866013071895,8.24718954248366,0.0,163.22633986928105,8343.282222222224,66945.99026143791,153
-mistral-7b,4,64,none,150,2849.367012195122,355.8929268292683,0.01951219512195122,8826.565487804877,1101.2303048780489,3.8222560975609765,0.0,35.923048780487804,1457.123231707317,11675.932499999999,164
-mistral-7b,64,64,realistic,50,92.83372727272726,11.589454545454545,0.012,8782.024181818182,1095.454090909091,3.693090909090909,0.0,34.23990909090909,1107.0435454545454,8874.85790909091,110
-llama3.1-8b,4,8,realistic,150,120.87535211267605,15.07387323943662,0.014577464788732394,9084.32718309859,1133.5673943661973,3.6968309859154926,0.0,34.35845070422535,1148.6412676056339,9205.202535211267,142 -mistral-7b,8,4,none,50,98.79643410852714,12.333798449612402,0.010232558139534885,8955.673643410852,1117.3578294573647,3.8634883720930233,0.0,36.43883720930233,1129.691627906977,9054.470077519381,129 -llama3.1-70b,16,16,none,10,315.8337606837607,39.453076923076914,0.010427350427350428,10082.294786324786,1259.299145299145,4.16025641025641,0.0,42.702820512820516,1298.7522222222224,10398.128547008548,117 -llama3.1-70b,32,4,none,70,88.65162962962962,11.06725925925926,0.008666666666666666,9276.753703703704,1158.3693333333333,4.368666666666667,0.0,41.94444444444444,1169.4365925925927,9365.405333333332,135 -llama3.1-70b,32,16,realistic,40,89.80265151515152,11.210984848484848,0.008863636363636363,9494.374545454544,1185.4275757575758,4.24030303030303,0.0,42.32924242424242,1196.6385606060608,9584.177196969698,132 -llama2-7b,4,64,realistic,100,29992.6774,3747.83525,0.59395,8420.38105,1049.8558000000003,21.42865,0.0,162.94425,4797.691049999999,38413.05845,200 -llama3.1-8b,4,64,none,100,4190.222307692308,523.3780128205128,0.024551282051282052,8678.086987179488,1082.8650641025642,4.157179487179487,0.0,37.8326282051282,1606.2430769230768,12868.309294871795,156 -llama3.1-70b,0,64,realistic,20,41513.90692307692,5184.464895104896,0.39146853146853144,5105.878811188811,633.7972727272728,12.305734265734266,0.0,116.01181818181814,5818.262167832168,46619.78573426573,143 -llama3.1-70b,8,0,none,50,9759.320975609755,1219.4800000000002,0.04878048780487805,8868.945426829268,1107.1818292682926,4.333475609756098,0.0,45.59280487804878,2326.6618292682924,18628.266402439025,164 -llama3.1-70b,8,32,none,10,12161.80157480315,1519.745354330709,0.051102362204724416,8920.027559055117,1113.826220472441,3.81259842519685,0.0,40.46818897637796,2633.5715748031503,21081.829133858268,127 
-llama3.1-70b,16,16,none,30,796.7327659574469,99.55758865248228,0.01049645390070922,9311.292269503545,1162.723829787234,4.193049645390071,0.0,41.84411347517731,1262.2814184397162,10108.025035460993,141 -llama3.1-70b,16,4,none,10,426.41301724137935,53.27456896551724,0.011206896551724138,9768.949482758622,1220.164568965517,4.4350862068965515,0.0,45.19517241379311,1273.4391379310346,10195.362500000001,116 -llama3.1-70b,0,0,none,30,44981.266624203825,5617.72439490446,0.8450318471337579,5570.602929936305,691.5762420382165,24.188343949044583,0.0,235.31636942675158,6309.300636942677,50551.869554140125,157 -mistral-7b,8,4,realistic,200,86.22716216216217,10.764662162162162,0.00891891891891892,9043.230202702704,1128.3600675675675,3.581216216216216,0.0,33.56716216216216,1139.1247297297298,9129.457364864864,148 -mistral-7b,32,16,none,150,75.52055172413793,9.427999999999999,0.00910344827586207,8853.011034482759,1104.6562758620687,3.7208965517241377,0.0,34.11737931034483,1114.0842758620688,8928.531586206896,145 -mistral-7b,4,2,realistic,150,103.17035460992908,12.876241134751774,0.010567375886524823,8728.47340425532,1089.1905673758865,3.7612056737588646,0.0,33.41156028368794,1102.0668085106383,8831.643758865248,141 -llama2-7b,32,4,realistic,150,61.1382,7.63795,0.0053,9303.7862,1162.21455,4.2726,0.0,40.95570000000001,1169.8525,9364.9244,200 -llama3.1-70b,16,2,none,70,88.52111111111111,11.05097222222222,0.008125,9051.35236111111,1130.2475694444447,4.292847222222223,0.0,40.825833333333335,1141.2985416666666,9139.873472222222,144 -llama2-7b,16,2,realistic,200,467.0450909090909,58.36409090909092,0.006727272727272728,9357.43668181818,1168.7933636363634,4.17540909090909,0.0,39.90018181818182,1227.1574545454546,9824.481772727273,220 -llama2-7b,0,64,realistic,50,47152.71394444444,5890.256722222222,1.1446111111111112,6677.582833333334,828.9108333333334,17.45077777777778,0.0,244.57255555555554,6719.167555555557,53830.296777777774,180 
-llama2-7b,4,64,realistic,200,17046.189838709677,2129.8401612903226,0.07129032258064516,7951.792580645161,992.518306451613,4.48866935483871,0.0,43.163387096774194,3122.358467741936,24997.98241935484,248 -llama2-7b,64,64,none,150,49.30265,6.15935,0.0053,8908.600050000001,1112.47355,4.1354500000000005,0.0,40.5537,1118.6328999999998,8957.9027,200 -mistral-7b,8,2,none,100,93.83751824817519,11.714744525547445,0.009635036496350365,8814.478321167884,1100.0095620437955,3.8256204379562035,0.0,34.67401459854015,1111.7243065693428,8908.315839416058,137 -mistral-7b,16,64,none,200,2061.5797530864197,257.51,0.022098765432098766,9056.418209876543,1129.90524691358,3.7430246913580247,0.0,36.14024691358025,1387.4152469135804,11117.997962962963,162 -mistral-7b,32,8,realistic,50,93.40638655462185,11.6609243697479,0.011092436974789916,8745.709411764705,1091.1607563025211,3.7923529411764703,0.0,35.146554621848736,1102.821680672269,8839.115798319328,119 -llama3.1-8b,0,4,realistic,100,59490.97258064516,7418.637290322582,1.3916129032258067,6458.746709677419,798.7755483870967,11.59058064516129,0.0,199.33774193548385,8217.412838709679,65949.71929032258,155 -llama3.1-8b,0,64,realistic,50,45127.44897260274,5627.6434246575345,0.9550684931506849,5037.27794520548,623.6510958904109,29.114041095890407,0.0,225.4231506849315,6251.2945205479455,50164.72691780822,146 -llama3.1-8b,4,32,none,50,1481.5386231884056,185.0400724637681,0.01884057971014493,9092.907608695652,1134.538768115942,3.96072463768116,0.0,37.294710144927535,1319.5788405797102,10574.446231884058,138 -llama3.1-70b,4,4,realistic,70,10404.838716216216,1299.9977027027028,0.06141891891891893,8587.60831081081,1072.1583108108107,4.632972972972974,0.0,42.192162162162155,2372.1560135135132,18992.44702702703,148 -llama3.1-70b,8,0,realistic,70,4164.54497005988,520.3662275449102,0.025508982035928142,8732.240538922155,1090.186526946108,4.299700598802395,0.0,41.71898203592814,1610.5527544910183,12896.785508982035,167 
-llama3.1-70b,16,64,realistic,40,1028.108843537415,128.47068027210884,0.012108843537414964,9260.822517006804,1156.2678231292516,4.082585034013606,0.0,41.98741496598639,1284.7385034013605,10288.931360544218,147 -llama3.1-70b,32,4,none,10,110.14137614678899,13.75,0.01073394495412844,9790.478532110094,1222.678348623853,4.215137614678899,0.0,43.017522935779816,1236.4283486238533,9900.619908256882,109 -llama3.1-70b,32,8,none,70,81.92137931034483,10.22703448275862,0.008068965517241379,9034.530413793105,1128.0851724137929,4.234275862068966,0.0,40.01110344827587,1138.3122068965517,9116.451793103448,145 -llama3.1-70b,16,4,realistic,10,139.99362637362637,17.47681318681319,0.012857142857142857,9214.84934065934,1150.6780219780221,3.5738461538461537,0.0,36.20384615384616,1168.1548351648353,9354.842967032966,91 -llama3.1-8b,0,16,none,50,50958.723881578946,6354.746118421052,1.2207236842105265,5844.874868421052,724.0682236842105,25.68644736842105,0.0,256.0686184210526,7078.814342105264,56803.59875,152 -mistral-7b,64,0,none,50,80.32796992481204,10.028195488721805,0.009924812030075189,8565.839398496242,1068.5154887218046,3.575187969924813,0.0,32.93187969924812,1078.5436842105264,8646.167368421053,133 -llama2-7b,0,8,none,50,48629.48679775281,6074.478202247192,1.0526966292134832,7268.5832584269665,903.1192696629213,19.032640449438205,0.0,274.0064606741573,6977.597471910111,55898.07005617977,178 -llama3.1-70b,4,8,realistic,40,4534.104402985075,566.5316417910448,0.02567164179104478,8795.310223880595,1098.2328358208956,4.037910447761194,0.0,39.18246268656717,1664.7644776119403,13329.41462686567,134 -llama3.1-70b,32,16,none,40,86.82397058823528,10.839117647058822,0.008602941176470588,9114.756617647057,1138.0017647058824,4.040661764705882,0.0,39.824411764705886,1148.840882352941,9201.580588235294,136 
-llama2-7b,16,16,none,150,72.95683417085426,9.111658291457287,0.006080402010050251,9157.466130653267,1143.7904020100502,4.303517587939698,0.0,40.09743718592964,1152.9020603015076,9230.42296482412,199 -llama3.1-8b,16,16,realistic,50,90.21606299212598,11.258818897637795,0.011338582677165353,8927.928582677165,1113.8222047244096,3.73755905511811,0.0,35.4367716535433,1125.0810236220473,9018.144645669292,127 -mistral-7b,32,64,none,200,398.9627950310559,49.832608695652176,0.015900621118012426,8478.924223602486,1057.8211801242237,3.5033540372670813,0.0,31.149813664596273,1107.653788819876,8877.88701863354,161 -mistral-7b,4,64,realistic,150,2272.3549032258065,283.8104516129032,0.01961290322580645,8922.516129032258,1113.2933548387098,3.480451612903226,0.0,32.994193548387095,1397.1038064516129,11194.871032258065,155 -llama3.1-70b,32,64,none,60,72.11260869565217,9.002546583850933,0.007267080745341614,8703.393602484473,1086.703850931677,4.131490683229814,0.0,39.55826086956522,1095.706397515528,8775.506211180124,161 -llama3.1-70b,16,2,none,60,93.78316176470588,11.707867647058823,0.008602941176470588,9301.409852941177,1161.3220588235292,4.137132352941176,0.0,40.49066176470588,1173.0299264705882,9395.193014705883,136 -mistral-7b,8,16,none,200,81.40405228758169,10.162549019607843,0.008627450980392158,9108.102745098038,1136.5178431372549,3.732156862745098,0.0,34.64385620915033,1146.6803921568628,9189.50679738562,153 -llama3.1-70b,4,32,none,50,23702.848773006135,2961.7856441717795,0.19392638036809814,8241.565337423312,1027.9641717791412,9.798650306748469,0.0,83.04319018404907,3989.749815950921,31944.414110429447,163 -llama3.1-70b,16,32,none,10,112.64972727272726,14.06318181818182,0.010636363636363637,10225.853727272728,1277.1972727272725,4.287454545454546,0.0,45.415545454545445,1291.2604545454544,10338.503454545453,110 
-llama3.1-70b,4,8,none,30,20821.685147058823,2601.7920588235293,0.12419117647058824,8616.561323529411,1075.4774264705882,5.701544117647059,0.0,57.721985294117644,3677.2694852941177,29438.246470588234,136 -llama2-7b,16,64,realistic,100,3429.2648876404496,428.54331460674155,0.014719101123595505,9088.547977528091,1134.8832584269662,3.951741573033708,0.0,40.00365168539326,1563.4265730337079,12517.812865168538,178 -mistral-7b,4,16,none,150,722.2027814569536,90.20523178807949,0.011655629139072848,9031.417880794703,1127.102847682119,3.8127152317880797,0.0,35.87046357615895,1217.308079470199,9753.620662251655,151 -llama3.1-8b,0,64,realistic,150,68881.50583850931,8587.870621118012,1.7673291925465837,8080.7198136645975,998.7431677018633,5.717701863354037,0.0,200.8718633540373,9586.613788819875,76962.22565217392,161 -llama3.1-8b,4,16,none,150,622.9718954248366,77.811045751634,0.010849673202614379,8889.41934640523,1109.2832679738565,3.7430065359477127,0.0,34.34104575163399,1187.09431372549,9512.391241830064,153 -llama3.1-8b,8,16,none,50,91.45770992366411,11.41381679389313,0.01099236641221374,8776.563893129773,1094.952900763359,3.91618320610687,0.0,35.5736641221374,1106.366717557252,8868.021603053436,131 -llama3.1-8b,16,32,none,200,70.28217391304348,8.771118012422361,0.008944099378881987,8951.93857142857,1116.9506211180124,3.637018633540373,0.0,33.39714285714286,1125.7217391304348,9022.220745341616,161 -llama2-7b,4,4,realistic,150,6302.072500000001,787.5889814814815,0.039768518518518516,8748.597453703704,1092.6816203703702,4.005462962962963,0.0,37.680092592592594,1880.2706018518516,15050.669953703704,216 -llama3.1-70b,0,0,realistic,10,38485.85426470588,4808.315955882353,0.17786764705882355,4419.538529411765,548.2456617647059,1.4761764705882352,0.0,24.809926470588234,5356.5616176470585,42905.392794117644,136 
-llama3.1-8b,4,4,none,50,565.8025735294117,70.66625,0.014411764705882353,8584.362794117646,1071.0366911764704,3.808529411764706,0.0,34.61102941176471,1141.7029411764706,9150.16536764706,136 -llama2-7b,32,32,none,100,69.58091463414634,8.692682926829267,0.006463414634146342,9466.49243902439,1182.3985975609758,4.125731707317073,0.0,40.22670731707317,1191.091280487805,9536.073353658538,164 -llama3.1-8b,0,32,realistic,200,73150.26273809525,9119.61630952381,2.07922619047619,8901.05857142857,1098.3641666666667,5.515357142857143,0.0,251.88422619047617,10217.980476190476,82051.32130952382,168 -llama3.1-8b,16,4,realistic,200,79.31068493150686,9.897876712328767,0.009863013698630137,8897.981438356164,1110.185205479452,3.5965753424657527,0.0,32.84356164383561,1120.083082191781,8977.29212328767,146 -llama2-7b,32,64,realistic,100,81.68864197530864,10.202283950617284,0.0074074074074074086,9377.65086419753,1171.0190123456791,4.349814814814815,0.0,41.720185185185194,1181.2212962962965,9459.339506172839,162 -llama3.1-8b,0,4,realistic,50,45590.131538461545,5684.470139860141,0.9260139860139861,5463.671538461539,676.9863636363635,29.32818181818182,0.0,232.38111888111894,6361.4565034965035,51053.803076923075,143 -llama3.1-70b,16,4,realistic,60,91.40647482014388,11.411151079136692,0.00841726618705036,9318.142805755397,1163.4514388489208,4.306187050359713,0.0,42.72964028776978,1174.8625899280578,9409.54928057554,139 -llama3.1-8b,64,0,realistic,50,95.849375,11.961875000000001,0.012857142857142857,8436.805089285714,1052.3396428571427,4.787946428571429,0.0,38.934999999999995,1064.3015178571427,8532.654464285715,112 -llama3.1-70b,32,32,none,70,76.44307189542484,9.54313725490196,0.007647058823529411,8866.389281045753,1107.06045751634,4.291895424836602,0.0,40.317908496732024,1116.6035947712417,8942.832352941177,153 
-llama2-7b,8,4,none,200,6991.796681222707,873.735807860262,0.040524017467248916,8504.308689956331,1062.068253275109,4.139781659388646,0.0,38.06122270742358,1935.8040611353713,15496.10537117904,229 -llama2-7b,8,4,realistic,100,6971.616390532544,871.266923076923,0.030710059171597637,9771.004674556214,1220.4824260355028,4.214260355029586,0.0,45.498224852071004,2091.749349112426,16742.62106508876,169 -llama3.1-8b,32,2,realistic,100,86.36782945736435,10.77860465116279,0.011162790697674419,8559.470697674418,1068.0538759689923,3.869457364341085,0.0,33.762015503875965,1078.8324806201551,8645.838527131782,129 -llama2-7b,32,32,none,200,50.23809734513274,6.276194690265487,0.004690265486725664,8536.893053097345,1066.1631415929203,4.109646017699115,0.0,37.28659292035398,1072.4393362831859,8587.13115044248,226 -mistral-7b,8,64,realistic,150,78.74826923076922,9.830961538461539,0.008461538461538461,8847.030897435898,1104.0451923076923,3.8062179487179484,0.0,35.06365384615385,1113.8761538461538,8925.779166666667,156 -llama3.1-8b,32,0,none,100,70.31654088050314,8.775408805031446,0.009056603773584906,8689.772830188678,1084.2359119496855,3.7086792452830193,0.0,34.61100628930817,1093.011320754717,8760.089371069182,159 -llama3.1-8b,64,2,realistic,200,77.09202898550726,9.621014492753623,0.010434782608695651,8190.452753623188,1021.7479710144928,3.988405797101449,0.0,33.47079710144927,1031.3689855072464,8267.544782608695,138 -llama3.1-8b,64,16,realistic,100,85.09105691056911,10.619268292682927,0.011707317073170732,8636.835284552846,1077.7259349593494,3.903577235772357,0.0,35.207723577235775,1088.3452032520324,8721.926341463413,123 -llama3.1-70b,4,16,none,20,31030.09977443609,3877.497819548873,0.3001503759398496,7452.664285714285,929.7730827067669,11.523383458646617,0.0,103.80022556390978,4807.270902255639,38482.76406015037,133 
-mistral-7b,32,0,none,50,80.02711267605635,9.990633802816902,0.009295774647887325,8861.2788028169,1105.471408450704,3.6976760563380275,0.0,34.5062676056338,1115.462042253521,8941.305915492958,142 -llama3.1-70b,16,0,realistic,10,145.5179775280899,18.16640449438202,0.013146067415730336,9899.500786516855,1235.9113483146068,3.4535955056179777,0.0,36.41730337078652,1254.0777528089886,10045.018764044942,89 -mistral-7b,16,8,none,100,83.56212765957447,10.431914893617021,0.009361702127659575,8868.848652482271,1106.885673758865,3.8538297872340435,0.0,35.24808510638298,1117.317588652482,8952.410780141843,141 -llama3.1-70b,16,32,realistic,60,79.53570512820512,9.92923076923077,0.0075,8920.45608974359,1113.8722435897437,3.891538461538461,0.0,37.93621794871795,1123.8014743589742,8999.991794871794,156 -llama3.1-8b,64,4,realistic,150,80.78847328244275,10.082290076335878,0.01099236641221374,8521.75786259542,1063.186106870229,3.561297709923664,0.0,31.394427480916033,1073.2683969465647,8602.546335877863,131 -llama3.1-70b,8,4,none,20,8812.921692307693,1101.3106923076923,0.04015384615384616,8784.779384615385,1096.853769230769,3.816307692307692,0.0,39.19707692307692,2198.1644615384616,17597.701076923076,130 -llama3.1-8b,32,2,none,200,74.9272972972973,9.350810810810811,0.00972972972972973,8524.312635135135,1063.4580405405404,3.6706756756756764,0.0,31.81054054054054,1072.8088513513512,8599.239932432432,148 -mistral-7b,8,64,realistic,200,73.4616766467066,9.171017964071856,0.007904191616766467,8598.142275449101,1072.945748502994,3.7413772455089815,0.0,33.56287425149701,1082.1167664670659,8671.603952095807,167 -llama2-7b,0,64,realistic,200,37741.0509,4712.634,1.0172,17147.034,2123.2698,5.7613,0.0,209.60270000000003,6835.9038,54888.0849,100 -llama3.1-8b,16,16,none,150,77.0395945945946,9.614459459459459,0.00972972972972973,8933.154054054054,1114.6412837837838,3.5733783783783784,0.0,33.06256756756757,1124.2557432432434,9010.19364864865,148 
-llama3.1-8b,16,4,none,150,80.21979166666667,10.011319444444446,0.01,8935.055833333334,1114.8972916666667,3.744305555555556,0.0,34.32006944444444,1124.908611111111,9015.275625,144 -llama3.1-8b,16,32,realistic,100,77.89986301369862,9.72178082191781,0.009863013698630137,8814.356232876713,1099.9496575342466,3.824726027397261,0.0,34.95082191780822,1109.6714383561641,8892.25609589041,146 -llama3.1-8b,32,4,none,50,90.53483606557377,11.298606557377049,0.01180327868852459,8933.74950819672,1114.555,3.688852459016393,0.0,34.33606557377049,1125.8536065573771,9024.284344262294,122 -llama2-7b,8,2,realistic,150,8441.371054852321,1054.8863291139241,0.045316455696202525,8176.4408860759495,1021.207341772152,3.784219409282701,0.0,36.95248945147679,2076.0936708860763,16617.81194092827,237 -llama2-7b,0,16,realistic,200,44326.32503816794,5535.089007633588,1.2648854961832061,13813.051603053436,1711.2195419847326,10.857022900763358,0.0,280.0006106870229,7246.30854961832,58139.37664122137,131 -llama2-7b,0,8,realistic,50,47704.72864130435,5959.146847826087,1.1130434782608696,6804.975543478261,846.2315217391304,19.175815217391307,0.0,260.9508695652174,6805.378369565218,54509.70418478261,184 -llama3.1-8b,8,4,realistic,150,105.11717391304347,13.117391304347827,0.011666666666666665,8979.089347826088,1120.391304347826,3.794710144927537,0.0,34.8236231884058,1133.508695652174,9084.20652173913,138 -llama2-7b,64,32,realistic,100,62.61969135802469,7.823024691358024,0.00654320987654321,8972.97049382716,1120.7937037037038,4.015493827160494,0.0,38.49462962962963,1128.616728395062,9035.590185185185,162 -mistral-7b,0,16,none,100,64881.230299401206,8091.671497005988,1.6044311377245506,7137.9732335329345,882.7468263473054,7.284191616766467,0.0,190.01185628742516,8974.418323353295,72019.20353293413,167 -mistral-7b,0,32,none,150,74066.68704402515,9233.880754716982,2.0479245283018868,9033.158867924529,1115.5445911949685,5.856477987421384,0.0,239.04194968553458,10349.42534591195,83099.84591194968,159 
-mistral-7b,8,16,none,150,81.50666666666667,10.175359477124182,0.008627450980392158,8781.050326797385,1095.7025490196079,3.792483660130719,0.0,34.47222222222222,1105.877908496732,8862.556993464052,153 -llama3.1-8b,8,0,none,200,3874.6743406593405,483.94390109890116,0.029175824175824178,8960.942802197802,1117.8247802197802,3.692692307692308,0.0,37.70417582417582,1601.7686813186815,12835.617142857145,182 -llama3.1-8b,32,8,none,50,89.97500000000001,11.22877049180328,0.01180327868852459,8963.138770491805,1118.175983606557,3.8688524590163933,0.0,35.90581967213116,1129.4047540983604,9053.113770491802,122 -llama3.1-70b,0,2,none,60,48489.24288770054,6055.617967914439,1.0379144385026737,5937.032727272728,736.672834224599,22.23524064171123,0.0,252.27935828877006,6792.290802139037,54426.27561497326,187 -llama2-7b,0,8,none,200,58201.98293233083,7269.62552631579,1.3525563909774436,8483.30296992481,1048.4684586466165,8.407406015037594,0.0,234.9702255639098,8318.093984962406,66685.28590225564,266 -mistral-7b,0,8,none,200,76161.41525423729,9493.271129943503,2.2986440677966105,9527.999548022599,1175.6563841807908,6.047740112994349,0.0,282.31813559322035,10668.927514124294,85689.41480225988,177 -llama3.1-8b,32,16,none,200,72.07688741721854,8.995099337748345,0.009536423841059603,8879.736225165563,1107.8764238410595,3.680198675496689,0.0,33.623576158940395,1116.8715231788078,8951.813112582782,151 -mistral-7b,0,8,realistic,100,60268.53077922078,7515.050649350648,1.3836363636363636,6844.388246753247,846.5786363636363,10.325649350649352,0.0,186.7151948051948,8361.629285714285,67112.91902597403,154 -llama2-7b,64,16,realistic,100,62.58024242424242,7.818060606060606,0.006424242424242424,9348.793757575757,1167.828,3.9233939393939403,0.0,38.42642424242425,1175.6460606060607,9411.374,165 -llama3.1-8b,8,8,realistic,200,83.11283870967742,10.369419354838712,0.010193548387096775,8838.017806451613,1102.7084516129032,3.760064516129032,0.0,34.14206451612903,1113.077870967742,8921.13064516129,155 
-mistral-7b,0,16,realistic,50,46241.02246575343,5766.722534246575,0.9723287671232878,4987.785273972603,617.9808219178082,28.47184931506849,0.0,224.81897260273973,6384.703356164383,51228.80773972603,146 -mistral-7b,16,16,none,100,83.39214285714286,10.410714285714286,0.009428571428571429,8942.751857142857,1116.0105714285712,3.8381428571428575,0.0,35.53142857142857,1126.4212857142857,9026.144,140 -mistral-7b,32,4,none,50,90.75186991869919,11.329512195121952,0.010731707317073172,8985.241219512194,1121.042276422764,3.7885365853658546,0.0,35.436097560975604,1132.3717886178858,9075.993089430893,123 -llama3.1-70b,8,0,realistic,20,9802.576811594203,1224.9066666666665,0.06355072463768116,8507.004057971015,1061.9632608695651,3.552971014492753,0.0,38.07463768115942,2286.8699275362324,18309.58086956522,138 -llama3.1-8b,16,16,realistic,150,78.8911724137931,9.84551724137931,0.00993103448275862,8889.849862068966,1109.2593103448278,3.6637241379310344,0.0,33.42806896551724,1119.104827586207,8968.741034482759,145 -mistral-7b,4,16,realistic,200,226.63986577181205,28.290536912751676,0.014697986577181207,9063.332147651006,1130.8588590604027,3.7542953020134235,0.0,35.11456375838926,1159.1493959731545,9289.97201342282,149 -mistral-7b,32,8,realistic,100,86.750546875,10.83,0.0103125,8859.628203125001,1105.619921875,3.91921875,0.0,36.129609375,1116.449921875,8946.37875,128 -mistral-7b,64,8,realistic,50,95.9962385321101,11.984220183486238,0.012110091743119267,8489.459357798165,1059.0370642201838,3.8138532110091745,0.0,35.01256880733945,1071.0212844036698,8585.455596330276,109 -llama3.1-8b,64,4,realistic,100,91.3023275862069,11.394396551724139,0.012413793103448275,8805.379396551723,1098.8013793103446,3.9051724137931036,0.0,35.7203448275862,1110.1957758620688,8896.681724137932,116 -llama3.1-70b,4,0,none,70,12412.867247191009,1550.7419662921352,0.08634831460674158,7387.724213483147,921.979775280899,4.651966292134832,0.0,41.15544943820225,2472.721741573034,19800.591460674157,178 
-llama3.1-70b,8,2,realistic,70,91.32158620689654,11.400275862068966,0.008275862068965517,9035.381655172412,1128.1264827586206,4.384413793103448,0.0,40.64296551724138,1139.5267586206896,9126.70324137931,145 -llama3.1-70b,16,8,realistic,50,90.74611510791367,11.328705035971224,0.00841726618705036,9245.038057553957,1154.3012230215827,4.171798561151079,0.0,40.994820143884894,1165.629928057554,9335.78417266187,139 -llama3.1-8b,16,32,realistic,50,86.88931297709924,10.843664122137405,0.01099236641221374,9002.300763358779,1123.065496183206,3.8482442748091605,0.0,35.81022900763358,1133.9091603053432,9089.190076335877,131 -mistral-7b,4,64,realistic,200,2731.116046511628,341.08494186046516,0.02790697674418605,8691.806511627907,1084.4837790697675,3.7437790697674416,0.0,34.8996511627907,1425.5687209302325,11422.922558139535,172 -llama2-7b,32,16,realistic,150,61.118489583333336,7.63546875,0.005520833333333333,9636.273020833332,1203.6634375,4.2124999999999995,0.0,40.762135416666666,1211.29890625,9697.391510416666,192 -llama3.1-8b,4,32,realistic,100,2197.262162162162,274.4602702702703,0.018513513513513515,8692.24472972973,1084.7769594594595,3.9288513513513506,0.0,36.138851351351356,1359.2372297297297,10889.506891891891,148 -mistral-7b,64,16,realistic,50,97.00691588785047,12.1103738317757,0.012336448598130842,8631.120560747664,1076.6105607476634,3.8035514018691585,0.0,34.31728971962617,1088.720934579439,8728.127476635515,107 -llama3.1-8b,32,16,none,150,73.10389261744966,9.123288590604027,0.009664429530201342,8355.800805369128,1042.5248993288592,3.570872483221476,0.0,30.83657718120805,1051.6481879194632,8428.904697986578,149 -llama3.1-70b,8,32,realistic,30,9305.990547945205,1162.888287671233,0.05589041095890411,8741.3198630137,1091.1200684931507,4.162328767123288,0.0,45.169863013698624,2254.0083561643837,18047.310410958904,146 
-llama2-7b,8,4,none,100,11208.107555555554,1400.6406111111112,0.05394444444444445,9110.549777777778,1137.8953888888889,4.2285555555555545,0.0,42.20838888888889,2538.536,20318.657333333336,180 -llama3.1-70b,16,64,none,70,76.07894409937887,9.497701863354038,0.007267080745341614,8584.503602484472,1071.8150931677017,4.317639751552795,0.0,39.78888198757764,1081.3127950310559,8660.582546583852,161 -llama3.1-8b,16,64,none,200,2674.319298245614,334.0222807017544,0.024385964912280702,8921.250701754387,1113.0416374269005,3.7705263157894735,0.0,35.95625730994152,1447.063918128655,11595.57,171 -llama2-7b,4,2,none,100,27421.734809523812,3426.8280476190484,0.08395238095238096,7594.126952380953,948.4158095238096,3.9286190476190477,0.0,40.01042857142858,4375.243857142857,35015.86176190476,210 -llama3.1-8b,8,2,realistic,150,85.66783216783217,10.691258741258741,0.01006993006993007,8612.633496503495,1074.566783216783,3.763006993006993,0.0,33.45237762237762,1085.258041958042,8698.301328671329,143 -llama3.1-8b,8,4,none,200,102.23794701986755,12.759470198675496,0.010264900662251657,9043.003178807947,1128.1903311258277,3.6644370860927147,0.0,33.80364238410596,1140.9498013245034,9145.241125827815,151 -llama2-7b,4,32,realistic,50,28514.783404255322,3563.3921808510636,0.46617021276595744,8367.388776595744,1041.4840425531916,6.000478723404256,0.0,92.12143617021277,4604.876223404255,36882.17218085106,188 -mistral-7b,8,8,realistic,200,126.74033783783784,15.820540540540541,0.010135135135135137,8949.368986486486,1116.6139864864865,3.6751351351351347,0.0,33.780743243243236,1132.4345270270271,9076.109324324323,148 -mistral-7b,32,32,none,200,71.3358552631579,8.905592105263159,0.008684210526315789,8948.958947368421,1116.5221052631578,3.548881578947369,0.0,33.03381578947368,1125.4276973684211,9020.294802631579,152 
-llama3.1-8b,8,64,none,200,1172.323372781065,146.40875739644972,0.01698224852071006,9044.10621301775,1128.354852071006,3.7231952662721897,0.0,35.40467455621302,1274.7636094674556,10216.429585798816,169 -llama3.1-8b,16,2,realistic,200,77.18529801324503,9.632649006622517,0.009536423841059603,8447.063112582782,1053.864966887417,3.8264238410596025,0.0,32.26046357615894,1063.4976158940397,8524.248410596027,151 -llama3.1-70b,32,0,none,70,87.82202614379085,10.960980392156864,0.008627450980392156,8964.440065359478,1119.2122875816995,4.0973202614379085,0.0,39.4837908496732,1130.1732679738564,9052.26209150327,153 -llama2-7b,4,16,none,50,32909.641358024695,4112.532654320988,0.3251851851851852,7434.35512345679,925.9023456790123,14.792716049382717,0.0,104.35555555555557,5038.435,40343.99648148148,162 -mistral-7b,64,64,none,50,75.32525925925925,9.403629629629629,0.009777777777777778,8722.212296296297,1088.150962962963,3.6851111111111106,0.0,34.330740740740744,1097.5545925925926,8797.537555555557,135 -llama2-7b,8,16,none,200,4158.049826086956,519.562043478261,0.02652173913043478,8392.325260869566,1048.0417826086957,4.011130434782609,0.0,37.174,1567.6038260869566,12550.375086956521,230 -llama3.1-8b,16,2,realistic,150,83.32078571428572,10.398357142857142,0.010285714285714285,8733.903214285714,1089.7647857142858,4.0035,0.0,35.38364285714287,1100.163142857143,8817.223999999998,140 -llama3.1-8b,8,0,none,100,80.2289696969697,10.009818181818181,0.009636363636363636,9043.672909090908,1128.5687878787878,3.881878787878788,0.0,36.98169696969697,1138.5786060606063,9123.90187878788,165 -llama2-7b,8,0,realistic,200,3325.770718562874,415.54880239520963,0.02958083832335329,9664.810419161675,1206.0544910179642,4.446886227544909,0.0,46.993532934131736,1621.6032934131736,12990.58113772455,167 -llama3.1-8b,64,32,realistic,150,78.75212121212121,9.828181818181818,0.010909090909090908,8725.987121212122,1088.6152272727275,3.6063636363636364,0.0,32.65704545454546,1098.4434090909092,8804.739242424243,132 
-llama2-7b,0,0,none,50,47493.33569832402,5932.741675977653,1.17463687150838,6874.045418994413,849.9645251396647,15.205530726256983,0.0,226.2123463687151,6782.706201117319,54367.38111731843,179 -llama2-7b,4,8,realistic,150,30422.507885462554,3801.6320704845816,0.15083700440528636,8300.971101321586,1036.0549339207048,6.526431718061674,0.0,67.49881057268722,4837.687004405288,38723.47898678415,227 -llama3.1-70b,16,0,realistic,60,1227.000975609756,153.32847560975608,0.012621951219512198,9072.531097560975,1132.7907926829268,4.064573170731708,0.0,39.80469512195121,1286.1192682926828,10299.532073170732,164 -llama3.1-8b,4,32,realistic,200,1482.2284375,185.134625,0.017625,9030.039625,1126.804625,3.7495000000000003,0.0,35.1806875,1311.93925,10512.268062500001,160 -llama3.1-8b,4,0,realistic,150,2651.18,330.9176506024096,0.06487951807228916,8798.031506024096,1097.8571084337352,3.7250602409638556,0.0,34.99939759036145,1428.7747590361446,11449.211506024096,166 -llama2-7b,16,64,none,50,1692.0036538461538,211.44416666666666,0.013269230769230771,9047.661987179486,1129.9751282051282,3.9460897435897437,0.0,38.67801282051282,1341.4192948717948,10739.66564102564,156 -llama3.1-70b,8,16,none,40,8176.832739726028,1021.7823287671235,0.038767123287671235,8923.50219178082,1114.2231506849316,4.088835616438356,0.0,41.62643835616438,2136.005479452055,17100.334931506848,146 -llama3.1-8b,4,8,realistic,100,1131.027676056338,141.2598591549296,0.01887323943661972,8498.578661971831,1060.5299295774648,3.730140845070422,0.0,33.96119718309859,1201.7897887323945,9629.60633802817,142 -llama2-7b,32,32,realistic,200,52.40447368421052,6.544254385964912,0.005482456140350877,9048.212719298246,1130.0504385964912,4.302280701754387,0.0,40.49390350877193,1136.594692982456,9100.617192982456,228 -llama2-7b,16,2,realistic,150,74.46556122448979,9.299132653061225,0.010510204081632654,9803.103112244899,1224.513418367347,4.495357142857142,0.0,45.016479591836735,1233.8125510204081,9877.568673469388,196 
-llama3.1-8b,0,2,realistic,100,57450.360392156865,7162.887843137254,1.2686928104575164,6657.3932679738555,824.3792156862745,15.950130718954247,0.0,214.4750980392157,7987.267058823529,64107.75366013072,153 -llama3.1-8b,4,0,none,100,5324.927041420118,665.0300591715977,0.05301775147928994,8851.439053254438,1104.7601183431955,4.257928994082841,0.0,40.26408284023669,1769.7901775147932,14176.366094674557,169 -llama2-7b,32,4,none,50,102.14512605042017,12.7609243697479,0.008907563025210084,10842.573109243696,1354.4358823529412,4.092100840336135,0.0,45.21957983193278,1367.196806722689,10944.718235294118,119 -llama3.1-8b,64,0,realistic,200,69.08909677419355,8.62225806451613,0.00929032258064516,8472.21606451613,1056.7185806451614,3.2881935483870963,0.0,29.85025806451613,1065.3408387096774,8541.305161290324,155 -mistral-7b,32,32,realistic,150,75.71659722222222,9.4525,0.009166666666666667,8742.295694444445,1090.7561805555556,3.609583333333333,0.0,32.863125000000004,1100.2086805555555,8818.012291666666,144 -llama2-7b,0,0,realistic,100,61300.164563106795,7657.592330097087,1.459466019417476,7757.336067961165,962.0028640776699,2.923640776699029,0.0,151.40655339805826,8619.595194174757,69057.50063106796,206 -llama2-7b,64,2,realistic,100,68.22425,8.51925,0.006625000000000001,9452.4636875,1180.5683125,4.1780625,0.0,39.5640625,1189.0875625,9520.6879375,160 -llama2-7b,64,64,none,200,921.2210096153847,115.11028846153847,0.010961538461538464,8769.035865384616,1095.0935576923077,3.991346153846154,0.0,38.29086538461538,1210.2038461538461,9690.256875,208 -mistral-7b,4,16,none,100,164.411338028169,20.51887323943662,0.014014084507042255,8870.779507042253,1107.0571830985916,3.95556338028169,0.0,36.03795774647887,1127.5760563380281,9035.190845070423,142 -llama2-7b,16,0,realistic,100,1819.8309756097563,227.40481707317073,0.011402439024390245,9373.567439024391,1170.767012195122,4.171646341463415,0.0,43.09817073170731,1398.1718292682929,11193.398414634146,164 
-mistral-7b,16,32,realistic,100,81.27391608391608,10.146293706293706,0.009230769230769232,8770.266223776223,1094.389020979021,3.7406993006993012,0.0,34.58468531468531,1104.5353146853147,8851.54013986014,143 -llama3.1-70b,4,4,none,30,34321.63376811594,4288.846014492754,0.18768115942028987,8189.027536231884,1021.612608695652,7.6805797101449285,0.0,78.3023188405797,5310.458623188406,42510.66130434782,138 -llama3.1-70b,32,2,realistic,20,116.66230769230768,14.564134615384615,0.01125,9457.040961538461,1180.7264423076924,3.910769230769231,0.0,38.350769230769224,1195.290576923077,9573.703269230768,104 -llama2-7b,64,8,none,150,52.2502512562814,6.527587939698493,0.005326633165829146,9265.463618090453,1157.2891959798997,4.411708542713568,0.0,42.223266331658294,1163.8167839195983,9317.713869346733,199 -llama2-7b,32,4,realistic,100,73.83771084337349,9.224457831325301,0.006385542168674699,10097.820240963856,1261.5844578313254,4.1698192771084335,0.0,43.61144578313253,1270.8089156626509,10171.657951807229,166 -mistral-7b,16,8,realistic,200,79.25194630872484,9.893825503355705,0.008859060402684565,8921.275234899329,1113.1304026845637,3.5759731543624165,0.0,33.48953020134228,1123.0242281879193,9000.527181208054,149 -llama3.1-70b,4,32,none,20,23709.624375,2962.732361111111,0.1938888888888889,8655.307083333333,1080.309236111111,9.553819444444445,0.0,87.2754861111111,4043.041597222222,32364.931458333333,144 -llama3.1-70b,4,32,realistic,10,27594.05841269841,3448.1859523809526,0.08119047619047617,5400.689444444444,673.086111111111,1.8002380952380952,0.0,21.419444444444448,4121.272063492063,32994.74785714286,126 -llama2-7b,0,16,none,200,55441.041960784314,6925.138071895425,1.3506209150326798,7402.999836601308,914.0062091503269,5.158986928104575,0.0,198.6045098039216,7839.144281045753,62844.04179738562,306 
-mistral-7b,16,64,realistic,50,86.01843283582089,10.738582089552239,0.009850746268656717,8815.587835820896,1099.827313432836,3.7930597014925374,0.0,34.9368656716418,1110.5658955223882,8901.606268656717,134 -llama3.1-70b,32,8,realistic,30,99.58116666666666,12.43175,0.00975,9568.261499999999,1194.7114166666665,3.9060833333333336,0.0,39.28133333333333,1207.1431666666665,9667.842666666667,120 -llama3.1-70b,0,2,realistic,40,44449.839640718565,5551.630598802396,0.8603592814371258,5622.160538922156,697.8958682634732,24.925209580838324,0.0,245.82245508982038,6249.526467065868,50072.00017964072,167 -llama3.1-8b,32,0,none,50,79.92321428571428,9.974357142857144,0.010285714285714285,8809.212857142857,1098.9088571428572,3.6547142857142862,0.0,33.79185714285715,1108.8832142857143,8889.136071428571,140 -llama3.1-70b,8,32,realistic,20,8148.187295081967,1018.1551639344262,0.04581967213114753,8919.910327868853,1113.6432786885248,3.8535245901639343,0.0,40.72040983606558,2131.798442622951,17068.09762295082,122 -llama3.1-70b,16,4,none,40,104.70983471074379,13.07198347107438,0.009669421487603306,10067.035785123968,1257.0831404958678,4.087520661157024,0.0,43.65892561983471,1270.1551239669423,10171.74561983471,121 -llama2-7b,64,32,none,150,59.14568306010929,7.385901639344262,0.006721311475409836,8813.734262295082,1100.726174863388,4.203825136612021,0.0,38.364754098360656,1108.1120765027322,8872.879945355191,183 -llama3.1-8b,8,8,none,50,92.79238461538462,11.580384615384617,0.011076923076923076,9018.467999999999,1125.137307692308,3.9045384615384617,0.0,36.523153846153846,1136.7176923076922,9111.260384615385,130 -mistral-7b,8,0,realistic,100,86.4694701986755,10.794900662251655,0.008741721854304637,8898.450728476822,1110.47059602649,4.032913907284768,0.0,37.52033112582781,1121.2654966887417,8984.920198675496,151 
-llama3.1-70b,4,64,none,30,18027.090647482015,2252.4034532374103,0.07273381294964028,7260.496762589928,905.6194244604317,4.075755395683453,0.0,38.28928057553957,3158.022877697841,25287.587410071937,139 -llama2-7b,4,0,none,50,28646.05067357513,3579.6248186528496,0.5201554404145079,8086.5708290155435,1005.9598963730571,6.239222797927462,0.0,99.81580310880827,4585.584715025907,36732.62150259067,193 -llama2-7b,64,16,none,200,45.08508849557522,5.6324336283185845,0.004690265486725664,8704.855884955752,1087.1652654867257,4.146991150442478,0.0,38.96690265486726,1092.7976991150445,8749.940973451328,226 -mistral-7b,0,64,realistic,100,58837.269871794866,7337.147243589745,1.4375641025641026,6443.929743589744,797.793205128205,7.967820512820513,0.0,158.83942307692308,8134.940448717948,65281.19961538462,156 -mistral-7b,32,8,none,200,73.55406666666667,9.182533333333334,0.0088,8913.651133333333,1112.1824666666666,3.634466666666667,0.0,33.50073333333333,1121.365,8987.2052,150 -mistral-7b,64,2,realistic,150,84.23103174603175,10.51547619047619,0.010476190476190477,8202.913888888888,1023.3871428571429,3.6135714285714284,0.0,30.378730158730153,1033.902619047619,8287.14492063492,126 -llama3.1-8b,32,0,none,150,70.25496855345912,8.767735849056603,0.009056603773584906,8710.624339622642,1086.6140251572328,3.4518867924528305,0.0,32.47893081761006,1095.3817610062895,8780.879308176101,159 -llama3.1-8b,32,4,realistic,150,81.9508888888889,10.227333333333334,0.010666666666666666,8905.929185185185,1111.196962962963,3.7573333333333334,0.0,34.03881481481481,1121.4242962962962,8987.880074074075,135 -llama3.1-70b,4,64,realistic,60,11449.079589041097,1430.2754794520547,0.05876712328767123,6797.368356164384,848.1369863013698,3.652260273972603,0.0,32.61952054794521,2278.4124657534244,18246.44794520548,146 -llama3.1-70b,32,8,none,60,85.50489208633094,10.674388489208633,0.00841726618705036,8893.059712230217,1110.2805035971223,4.093525179856115,0.0,37.87776978417266,1120.9548920863308,8978.564604316547,139 
-llama2-7b,64,32,none,200,46.78495327102804,5.8448130841121495,0.004953271028037384,8614.061074766356,1075.5339252336448,3.922803738317757,0.0,34.37775700934579,1081.378738317757,8660.846028037384,214 -llama2-7b,0,32,realistic,100,57373.309871794874,7167.374188034189,1.3328205128205128,7165.813547008547,888.3007264957264,3.949017094017094,0.0,161.67606837606837,8055.674914529915,64539.123418803414,234 -mistral-7b,32,4,realistic,50,96.6426724137931,12.064913793103448,0.011379310344827587,8899.617931034481,1110.3060344827586,3.7799137931034474,0.0,35.71224137931034,1122.370948275862,8996.260603448276,116 -llama3.1-8b,64,16,none,200,70.39898648648648,8.785675675675675,0.00972972972972973,8435.416283783783,1052.3939864864865,3.658243243243244,0.0,31.965810810810808,1061.1796621621622,8505.815270270272,148 -llama3.1-70b,0,64,none,70,56147.842365591394,7011.983225806451,1.2833870967741934,6568.617580645162,814.3649462365591,7.306182795698925,0.0,165.33139784946238,7826.348172043011,62716.459946236566,186 -llama2-7b,32,64,realistic,200,49.297105263157896,6.158640350877193,0.004649122807017544,8545.667368421053,1067.173201754386,4.343377192982456,0.0,39.155964912280695,1073.3318421052631,8594.96447368421,228 -llama3.1-8b,16,0,realistic,50,88.50511278195489,11.04533834586466,0.010827067669172932,9009.315338345865,1124.0118045112783,3.844285714285714,0.0,36.13691729323308,1135.0571428571432,9097.82045112782,133 -llama2-7b,16,8,realistic,50,236.91456692913388,29.60251968503937,0.008582677165354331,10609.913543307086,1325.3399212598424,4.176141732283464,0.0,45.40614173228347,1354.9424409448816,10846.82811023622,127 -llama3.1-70b,4,0,realistic,70,3646.4732608695654,455.44579710144933,0.05043478260869566,7549.12804347826,942.3541304347826,3.793768115942029,0.0,33.731086956521736,1397.799927536232,11195.601304347825,138 
-llama3.1-8b,4,4,realistic,50,496.59264,62.01376,0.018160000000000003,8794.72368,1097.2772000000002,3.9978399999999996,0.0,36.70120000000001,1159.29096,9291.31632,125 -mistral-7b,8,32,realistic,100,88.7455,11.078999999999999,0.009428571428571429,8931.266285714286,1114.620714285714,4.0451428571428565,0.0,37.18978571428571,1125.6997142857142,9020.011785714285,140 -llama2-7b,16,0,realistic,150,376.9745294117647,47.100941176470585,0.00823529411764706,9345.257647058825,1167.1091764705882,4.411352941176471,0.0,45.00858823529412,1214.210117647059,9722.232176470588,170 -llama2-7b,64,64,none,100,64.16097402597403,8.015584415584415,0.006883116883116883,9303.08142857143,1161.9831168831167,4.234025974025974,0.0,41.20214285714287,1169.9987012987015,9367.242402597403,154 -llama3.1-70b,32,64,realistic,60,76.19143790849674,9.511764705882353,0.007647058823529411,8847.030261437907,1104.538888888889,4.078562091503268,0.0,39.21947712418301,1114.0506535947713,8923.221699346404,153 -llama3.1-8b,8,0,realistic,150,162.14115853658538,20.222256097560976,0.02048780487804878,8759.623597560976,1092.8810365853658,3.6335975609756104,0.0,33.81969512195122,1113.1032926829268,8921.76475609756,164 -llama3.1-70b,4,4,none,10,22511.411640625,2812.980859375,0.10078125000000002,8171.1740625,1019.9961718750001,3.9360156249999996,0.0,43.88421875,3832.97703125,30682.585703124998,128 -llama3.1-70b,16,32,none,70,78.15949367088608,9.75740506329114,0.00740506329113924,8929.238481012659,1114.9681012658225,4.408734177215189,0.0,41.87867088607595,1124.7255063291138,9007.397974683543,158 -llama3.1-70b,32,64,realistic,40,82.77404255319149,10.333475177304964,0.008297872340425531,9429.466737588653,1177.3543971631204,4.167872340425532,0.0,41.885886524822695,1187.6878723404254,9512.240780141843,141 -mistral-7b,16,8,none,50,92.143203125,11.503203125,0.0103125,8777.379375,1095.0528125,3.8949218749999996,0.0,35.520078125,1106.5560156249999,8869.522578125001,128 
-llama3.1-8b,8,0,realistic,100,94.57066225165563,11.800728476821192,0.010198675496688741,8844.144437086094,1103.5923178807948,3.8997350993377475,0.0,36.04225165562914,1115.3930463576157,8938.715099337749,151 -llama3.1-8b,8,64,realistic,50,93.19775193798449,11.626434108527134,0.012248062015503876,9038.299069767443,1127.593875968992,3.8218604651162793,0.0,36.124806201550385,1139.2203100775196,9131.496821705425,129 -llama3.1-70b,0,4,realistic,50,47589.468857142856,5943.231542857142,1.0178285714285713,5675.711314285714,703.3100571428571,20.312742857142858,0.0,246.6804,6646.541599999999,53265.180171428576,175 -mistral-7b,4,2,none,200,94.67038461538462,11.815705128205128,0.009166666666666668,8611.575064102564,1074.5223717948718,3.809038461538462,0.0,33.46506410256411,1086.3380769230769,8706.24544871795,156 -llama3.1-70b,0,2,realistic,50,45740.761452513965,5712.31782122905,0.9400558659217876,5451.627597765363,676.5441899441341,24.049776536312848,0.0,237.9035195530726,6388.862011173183,51192.38905027934,179 -llama2-7b,32,16,none,50,86.42274074074074,10.796666666666667,0.007851851851851853,9307.512962962963,1162.588888888889,4.189925925925925,0.0,40.24703703703704,1173.3855555555554,9393.935703703704,135 -llama3.1-8b,4,8,none,150,232.07529801324506,28.98019867549669,0.009933774834437087,8634.208079470198,1077.406225165563,3.6733112582781455,0.0,32.50682119205298,1106.3864238410597,8866.283377483443,151 -llama3.1-8b,8,8,none,150,82.49150684931507,10.29486301369863,0.009863013698630137,8990.219794520546,1121.7628767123288,3.6552054794520545,0.0,33.51541095890411,1132.0577397260274,9072.711301369864,146 -llama2-7b,16,16,realistic,100,103.55215116279071,12.927558139534883,0.007848837209302326,9136.549244186046,1141.3692441860467,4.037383720930232,0.0,38.97087209302325,1154.2968023255814,9240.101395348836,172 
-llama2-7b,16,16,realistic,150,204.56830687830688,25.549365079365078,0.008042328042328042,9465.91857142857,1182.3633862433865,4.161693121693121,0.0,40.47777777777778,1207.9127513227513,9670.486878306878,189 -llama3.1-70b,16,0,realistic,20,107.86858333333333,13.46625,0.00975,9714.39525,1212.71775,3.982833333333333,0.0,41.254749999999994,1226.1840000000002,9822.263833333334,120 -llama2-7b,0,32,realistic,150,60395.29048192771,7544.727710843374,1.4083132530120483,7734.75297188755,955.8607630522088,3.528995983935743,0.0,175.34425702811245,8500.588473895583,68130.04345381526,249 -llama3.1-8b,4,8,none,50,1777.532781954887,222.02849624060147,0.029248120300751884,8820.983082706767,1100.5278195488722,3.863007518796993,0.0,35.5090977443609,1322.5563157894735,10598.515864661653,133 -mistral-7b,8,0,realistic,50,99.79381679389313,12.458320610687023,0.010076335877862596,9121.911679389312,1138.0681679389313,3.8722900763358776,0.0,37.15259541984733,1150.526488549618,9221.705496183205,131 -llama2-7b,16,2,realistic,50,610.148203125,76.24781250000001,0.010859375000000001,10303.579296875,1287.129765625,4.041328125,0.0,43.512578125000005,1363.377578125,10913.7275,128 -mistral-7b,4,64,realistic,50,3795.474714285714,474.0250714285714,0.04221428571428572,8701.648071428572,1085.6819999999998,3.7695714285714286,0.0,36.00592857142857,1559.7070714285715,12497.122785714286,140 -llama3.1-70b,32,32,none,40,80.80234482758621,10.087379310344827,0.008068965517241379,9155.833517241379,1143.2365517241378,3.968,0.0,39.484344827586206,1153.3239310344827,9236.635862068964,145 -llama3.1-70b,0,16,realistic,30,42464.85907407408,5303.83438271605,0.7330864197530864,4930.656419753086,611.5719135802469,21.03141975308642,0.0,195.64703703703702,5915.406296296297,47395.51549382716,162 -llama3.1-70b,8,2,realistic,50,3812.3837681159416,476.3883333333333,0.030362318840579713,8391.013115942029,1047.5412318840579,3.824275362318841,0.0,36.830724637681165,1523.9295652173912,12203.396884057971,138 
-llama3.1-70b,8,16,none,30,7219.8256296296295,902.1967407407408,0.03799999999999999,8978.71488888889,1121.0697037037037,3.881777777777778,0.0,40.44170370370371,2023.2664444444447,16198.540518518517,135 -llama3.1-70b,32,4,none,50,91.45786259541984,11.417557251908397,0.008931297709923663,9304.3906870229,1161.7043511450383,3.9729770992366413,0.0,40.69076335877863,1173.1219083969468,9395.848549618322,131 -llama3.1-70b,32,32,realistic,70,76.74568627450981,9.580915032679739,0.007647058823529411,8775.38751633987,1095.773660130719,4.315751633986928,0.0,40.02771241830065,1105.3545751633987,8852.13320261438,153 -mistral-7b,64,64,none,150,73.0084892086331,9.114388489208634,0.009496402877697842,8779.503956834533,1095.3453237410072,4.211942446043166,0.0,37.74661870503598,1104.459712230216,8852.512446043165,139 -llama2-7b,32,8,realistic,100,73.64576687116565,9.200490797546014,0.006503067484662577,9940.245153374233,1241.854846625767,4.095582822085889,0.0,41.31386503067485,1251.055337423313,10013.890920245398,163 -mistral-7b,16,64,realistic,100,73.80461538461539,9.21378205128205,0.008461538461538461,8551.855833333333,1067.2115384615386,3.717435897435897,0.0,33.614423076923075,1076.4253205128205,8625.660448717948,156 -llama3.1-70b,8,0,realistic,30,11336.717631578947,1416.635460526316,0.059802631578947364,8497.43032894737,1060.8477631578946,3.8175657894736843,0.0,40.853815789473686,2477.4832236842108,19834.147960526316,152 -mistral-7b,4,32,none,150,857.5120779220779,107.09948051948052,0.011948051948051949,8947.707142857142,1116.619025974026,3.823961038961039,0.0,35.395779220779225,1223.7185064935065,9805.219220779221,154 -llama2-7b,64,4,none,50,89.33747899159664,11.160840336134454,0.008907563025210084,10208.15680672269,1275.1808403361345,4.086386554621848,0.0,42.30714285714285,1286.3416806722687,10297.494285714287,119 
-llama3.1-8b,16,0,realistic,200,67.46304597701149,8.419310344827586,0.008275862068965517,8729.387356321839,1089.2371264367814,3.603218390804598,0.0,33.52,1097.656436781609,8796.85040229885,174 -llama2-7b,64,2,none,100,74.03369863013698,9.248972602739725,0.00726027397260274,10107.800753424659,1262.7832191780822,4.316027397260273,0.0,44.47321917808219,1272.032191780822,10181.834452054794,146 -llama2-7b,8,0,none,150,5747.998144329897,718.2061855670103,0.02922680412371134,8382.77087628866,1046.433195876289,4.257938144329897,0.0,41.740979381443296,1764.6393814432988,14130.769020618556,194 -mistral-7b,8,64,none,50,85.07354166666667,10.620624999999999,0.009166666666666667,8835.339583333332,1102.4423611111113,3.846458333333333,0.0,35.99701388888889,1113.062986111111,8920.413125,144 -llama2-7b,64,16,none,100,66.45863636363636,8.302597402597401,0.006883116883116883,9351.150000000001,1168.1026623376624,4.088051948051948,0.0,40.516948051948056,1176.40525974026,9417.608636363637,154 -llama3.1-8b,16,32,none,100,76.07000000000001,9.493422818791947,0.009664429530201342,8661.85322147651,1080.817852348993,3.856107382550335,0.0,35.041812080536914,1090.311275167785,8737.92322147651,149 -llama3.1-70b,4,0,none,50,20432.56813186813,2553.032307692308,0.2625824175824176,7830.910824175824,977.035989010989,6.558461538461539,0.0,77.92203296703296,3530.0682967032963,28263.478956043957,182 -llama3.1-70b,16,32,none,30,1711.5676388888887,213.8860416666667,0.012152777777777778,9273.33875,1157.933611111111,3.8537500000000002,0.0,39.03423611111111,1371.819652777778,10984.906388888889,144 -mistral-7b,4,8,realistic,100,400.7621481481481,50.037703703703706,0.016444444444444446,8964.061925925926,1118.6857037037037,3.8655555555555567,0.0,35.95148148148149,1168.7234074074076,9364.824074074075,135 -mistral-7b,32,32,realistic,200,69.36828025477706,8.66,0.008407643312101911,8603.268535031848,1073.335031847134,3.4509554140127388,0.0,31.0012101910828,1081.995031847134,8672.636815286623,157 
-llama2-7b,8,4,realistic,150,8214.655225225226,1026.599774774775,0.038513513513513516,8589.837477477477,1072.8266216216216,4.255765765765766,0.0,40.01004504504504,2099.426396396397,16804.4927027027,222 -mistral-7b,0,64,none,150,73731.38530864197,9191.391111111112,1.9911111111111108,9076.588148148148,1120.6335802469137,5.3745061728395065,0.0,232.0972839506173,10312.024691358025,82807.97345679012,162 -llama3.1-8b,0,16,none,200,75703.62657142857,9437.659714285715,2.3006285714285717,9416.029485714287,1161.35,5.698799999999999,0.0,280.68982857142856,10599.009714285716,85119.65605714286,175 -mistral-7b,0,16,realistic,100,60844.804832214766,7587.826241610739,1.4356375838926172,6726.79744966443,832.6869127516778,9.306241610738255,0.0,176.53651006711408,8420.513154362416,67571.6022818792,149 -llama3.1-70b,32,2,realistic,10,141.1493023255814,17.62104651162791,0.013604651162790696,8802.772906976745,1099.141511627907,3.020232558139535,0.0,28.502790697674428,1116.7625581395348,8943.922209302325,86 -llama3.1-8b,4,16,realistic,50,1772.365185185185,221.3705185185185,0.01740740740740741,8552.198666666667,1067.0277777777778,3.7699259259259255,0.0,34.92385185185185,1288.3982962962962,10324.563851851854,135 -llama2-7b,32,4,none,150,57.90617224880383,7.234162679425838,0.00507177033492823,9444.110861244018,1179.6885167464116,4.346124401913875,0.0,41.55535885167464,1186.9226794258373,9502.017033492823,209 -llama3.1-8b,16,8,realistic,150,83.43659420289855,10.412753623188406,0.010434782608695651,8991.209202898552,1121.8590579710144,3.8015942028985505,0.0,34.91043478260869,1132.271811594203,9074.64579710145,138 -mistral-7b,32,64,none,100,71.8018,8.963799999999999,0.0088,8744.239133333334,1091.1945333333335,3.7630000000000003,0.0,34.83986666666667,1100.1583333333333,8816.040933333334,150 -mistral-7b,8,32,none,50,88.40078571428572,11.036,0.009428571428571429,8979.55292857143,1120.388714285714,3.8597857142857137,0.0,36.701071428571424,1131.4247142857141,9067.953714285715,140 
-llama2-7b,8,64,none,50,13981.160588235292,1747.2634705882356,0.16135294117647062,9204.838294117648,1144.446294117647,7.677176470588234,0.0,78.65982352941175,2891.7097647058827,23185.998882352942,170 -mistral-7b,8,16,realistic,200,84.56222972972972,10.556756756756757,0.00891891891891892,9027.041351351352,1126.3428378378378,3.640675675675676,0.0,34.089729729729726,1136.8995945945946,9111.60358108108,148 -llama3.1-70b,8,64,realistic,40,9988.000891719747,1248.0726114649683,0.05057324840764331,8506.31923566879,1061.9124840764332,3.9847770700636937,0.0,42.182675159235664,2309.9850955414013,18494.320127388535,157 -llama3.1-70b,32,0,realistic,20,110.2210810810811,13.76,0.01054054054054054,9787.280720720722,1221.9714414414416,4.045405405405405,0.0,41.43360360360359,1235.7314414414416,9897.501801801802,111 -llama2-7b,8,64,none,200,4374.908510638298,546.6929361702128,0.026808510638297877,8488.727021276596,1059.8689787234043,4.163659574468085,0.0,38.632000000000005,1606.5619148936169,12863.635531914893,235 -llama3.1-8b,16,4,realistic,150,83.37604316546764,10.405251798561151,0.010359712230215827,8942.980215827338,1115.8438848920862,3.6825899280575545,0.0,33.42410071942445,1126.2491366906474,9026.356258992806,139 -llama3.1-70b,0,64,realistic,30,42243.8908974359,5275.949551282052,0.6987820512820513,4745.075384615385,588.2168589743588,17.944615384615382,0.0,175.62756410256412,5864.166410256411,46988.96628205128,156 -llama3.1-8b,16,0,none,200,2153.512786885246,268.9731693989071,0.020437158469945354,8867.379398907104,1106.2024043715846,3.6249180327868853,0.0,34.73049180327869,1375.1755737704918,11020.89218579235,183 -mistral-7b,32,4,realistic,100,88.88444444444445,11.096349206349208,0.010476190476190477,8763.895555555555,1093.6590476190474,3.8262698412698404,0.0,35.066825396825394,1104.7553968253967,8852.78,126 
-mistral-7b,4,64,realistic,100,3510.3856209150326,438.3866666666666,0.041045751633986924,8791.651111111112,1097.1752941176471,4.065032679738562,0.0,38.512875816993464,1535.5619607843137,12302.036732026145,153 -llama3.1-8b,64,2,realistic,100,91.06794871794872,11.365128205128205,0.012307692307692308,8429.175384615384,1051.7928205128205,3.748119658119658,0.0,32.55820512820513,1063.1579487179488,8520.243333333334,117 -mistral-7b,64,2,realistic,100,91.58508620689656,11.43353448275862,0.011379310344827587,8676.239137931034,1082.7528448275862,3.9049137931034483,0.0,34.831637931034486,1094.1863793103446,8767.82422413793,116 -llama3.1-70b,16,32,realistic,20,101.93926229508197,12.72606557377049,0.00959016393442623,9514.123196721312,1187.9745081967214,4.1314754098360655,0.0,41.88688524590164,1200.7005737704922,9616.062459016395,122 -llama3.1-8b,4,16,none,50,3027.544748201439,378.1654676258993,0.025611510791366903,8618.207769784172,1075.3059712230215,3.7058273381294966,0.0,34.98194244604316,1453.4714388489208,11645.752517985613,139 -llama3.1-70b,0,4,none,40,47060.06333333334,5877.720059523809,1.022857142857143,5821.687916666667,721.8201190476191,21.562142857142856,0.0,246.2522619047619,6599.540178571428,52881.75125000001,168 -llama3.1-70b,0,32,realistic,30,43531.378553459115,5437.081823899372,0.7059119496855346,5082.793459119497,630.201823899371,21.25440251572327,0.0,197.5305031446541,6067.283647798743,48614.172012578616,159 -llama3.1-70b,16,0,realistic,70,79.16822085889571,9.883312883435583,0.007177914110429447,8982.263374233129,1121.47736196319,4.294478527607362,0.0,41.63380368098159,1131.360674846626,9061.431595092025,163 -llama3.1-70b,32,8,realistic,40,94.04228346456694,11.740236220472442,0.00921259842519685,9256.136141732284,1155.7260629921261,3.9807086614173226,0.0,38.62913385826772,1167.4662992125986,9350.17842519685,127 
-llama3.1-70b,8,32,realistic,50,6863.0958709677425,857.632064516129,0.03283870967741936,8713.281806451612,1087.8425806451612,4.050193548387097,0.0,39.74754838709677,1945.4746451612903,15576.377677419356,155 -llama2-7b,0,2,none,200,49529.625947955385,6186.087992565055,1.1566914498141263,7377.516617100372,915.7860966542752,23.260074349442377,0.0,279.3759479553903,7101.874089219331,56907.14256505576,269 -llama3.1-70b,16,4,none,70,86.0444217687075,10.741768707482994,0.007959183673469388,8894.585850340136,1110.625306122449,4.23952380952381,0.0,39.86183673469388,1121.367074829932,8980.630272108843,147 -llama3.1-70b,32,2,none,20,106.97327433628318,13.354513274336282,0.010353982300884955,9579.10477876106,1196.0393805309734,4.029292035398231,0.0,39.9453982300885,1209.3938938053097,9686.078053097346,113 -llama3.1-8b,64,16,none,100,80.26692307692308,10.01723076923077,0.011076923076923076,8644.972076923077,1078.5745384615384,3.6895384615384614,0.0,33.547230769230765,1088.5917692307694,8725.239000000001,130 -llama3.1-8b,32,8,realistic,150,80.84573529411765,10.089485294117647,0.010588235294117647,8892.95786764706,1109.5783823529412,3.6319852941176474,0.0,33.088088235294116,1119.667867647059,8973.803602941176,136 -mistral-7b,4,8,none,200,179.1918831168831,22.37694805194805,0.009805194805194805,9089.521428571428,1134.173116883117,3.6660389610389608,0.0,34.36331168831169,1156.5500649350652,9268.713311688312,154 -llama3.1-8b,64,32,none,100,76.8257037037037,9.587777777777777,0.010666666666666666,8646.714518518518,1078.8713333333333,3.642148148148148,0.0,33.47325925925926,1088.4591111111113,8723.540222222222,135 -llama3.1-70b,4,2,realistic,10,28731.39580882353,3590.2956617647055,0.07551470588235296,5690.6649264705875,709.975,2.301029411764706,0.0,24.13433823529412,4300.270661764706,34422.060735294115,136 
-llama3.1-70b,0,8,none,40,47221.08196531792,5897.793699421965,1.0449710982658957,5710.103294797688,708.1271676300578,20.644682080924856,0.0,240.38167630057802,6605.920867052024,52931.18526011561,173 -mistral-7b,0,4,none,50,51347.318,6404.1022,1.1710666666666665,5623.929333333333,696.6184,23.076,0.0,234.48426666666666,7100.720599999999,56971.24733333334,150 -mistral-7b,32,8,realistic,150,80.37884057971014,10.034565217391304,0.009565217391304347,8864.904492753623,1106.143115942029,3.656449275362318,0.0,33.15775362318841,1116.1776811594204,8945.283333333335,138 -mistral-7b,8,16,realistic,50,98.1340625,12.25109375,0.0103125,8830.6559375,1101.723515625,3.956171875,0.0,36.119765625,1113.974609375,8928.79,128 -llama3.1-70b,4,32,realistic,60,12998.099072847683,1624.0201324503316,0.06463576158940398,7149.728543046358,892.0488079470198,3.8276821192052983,0.0,34.44105960264901,2516.068940397351,20147.82761589404,151 -llama3.1-70b,32,4,none,30,100.78638655462184,12.58218487394958,0.009831932773109243,9811.552352941177,1225.0573109243696,4.1571428571428575,0.0,41.53176470588235,1237.6394957983193,9912.338739495799,119 -llama2-7b,4,4,none,200,22484.562962962962,2809.5348148148146,0.08621399176954733,8116.6101234567905,1013.4908230452673,4.839341563786008,0.0,46.33012345679012,3823.025637860082,30601.173086419756,243 -llama3.1-70b,0,8,none,20,42487.36821917808,5306.201164383562,0.5511643835616439,5134.977328767123,637.5432876712329,17.997534246575345,0.0,159.69609589041093,5943.744452054795,47622.34554794521,146 -llama3.1-8b,4,8,realistic,200,871.2613548387097,108.80296774193548,0.02064516129032258,8745.206838709677,1091.080451612903,3.5450322580645164,0.0,32.808451612903234,1199.8834193548387,9616.468193548388,155 -llama3.1-70b,0,8,realistic,70,53620.2684916201,6695.629832402235,1.1710055865921787,6563.781955307262,813.2788826815644,14.273575418994415,0.0,217.59217877094972,7508.908715083799,60184.05044692737,179 
-llama3.1-8b,8,64,none,50,78.75493333333333,9.828533333333333,0.0096,8806.306133333333,1098.7655999999997,3.8506,0.0,35.5438,1108.5941333333333,8885.061066666667,150 -llama3.1-70b,32,16,none,20,92.349609375,11.52890625,0.009140625,9195.895703125,1148.130859375,4.103828125000001,0.0,40.293124999999996,1159.659765625,9288.2453125,128 -llama2-7b,0,16,none,50,48426.21427835052,6049.449175257732,1.234175257731959,6773.226082474226,832.7779896907216,17.809072164948454,0.0,263.1474226804123,6882.227164948453,55199.44036082474,194 -mistral-7b,8,32,realistic,50,93.52691729323308,11.675939849624061,0.009924812030075189,8876.06834586466,1107.4736842105262,3.817368421052632,0.0,35.69609022556391,1119.14962406015,8969.595263157895,133 -llama2-7b,0,8,realistic,100,54989.38342519685,6869.493031496063,1.2390551181102363,6983.290748031496,866.638346456693,7.079251968503938,0.0,200.4833464566929,7736.131377952756,61972.67417322834,254 -llama2-7b,32,4,realistic,200,55.932981651376146,6.987660550458715,0.004862385321100918,9265.415596330276,1157.3991743119266,4.108853211009174,0.0,39.07922018348624,1164.3868348623853,9321.348577981651,218 -llama2-7b,8,32,none,100,5504.085775401069,687.7926737967914,0.034064171122994646,8661.58101604278,1081.5624598930483,3.9353475935828883,0.0,38.57112299465241,1769.3551336898397,14165.66679144385,187 -llama3.1-8b,0,64,realistic,100,59815.768500000006,7459.618062500001,1.4,6449.675125,797.591,8.446250000000001,0.0,163.34093750000002,8257.2090625,66265.443625,160 -mistral-7b,64,16,none,150,74.22705035971222,9.266546762589927,0.009496402877697842,8751.251870503596,1091.8272661870503,3.593597122302158,0.0,32.417266187050366,1101.0938129496403,8825.47892086331,139 -llama2-7b,8,8,none,200,5927.902291666667,740.8172500000001,0.029500000000000005,8688.703458333333,1085.0909166666668,4.094375,0.0,39.116625000000006,1825.9081666666666,14616.605749999999,240 
-llama3.1-8b,32,64,none,150,70.73789473684211,8.828026315789472,0.009473684210526315,8815.346710526315,1099.7584868421054,3.537894736842105,0.0,32.094210526315784,1108.5865131578946,8886.084605263157,152 -llama3.1-70b,8,64,realistic,70,3635.179012345679,454.1985802469136,0.025246913580246912,8925.342777777778,1114.2614197530866,4.263271604938271,0.0,41.78203703703704,1568.4599999999998,12560.521790123457,162 -llama3.1-8b,32,16,realistic,150,76.4113986013986,9.536083916083916,0.01006993006993007,8446.907272727272,1053.9225174825174,3.653216783216783,0.0,31.93468531468532,1063.458601398601,8523.318671328672,143 -llama2-7b,16,4,realistic,200,167.36361990950226,20.904570135746606,0.006515837104072399,9577.118597285067,1196.3774208144796,4.272941176470588,0.0,42.43936651583711,1217.281990950226,9744.48221719457,221 -mistral-7b,0,16,none,200,78175.01806060606,9744.15793939394,2.3495757575757574,9913.168848484847,1222.9064242424245,5.8957575757575755,0.0,290.4637575757576,10967.064363636364,88088.1869090909,165 -mistral-7b,0,0,realistic,100,60482.957987013,7543.244025974026,1.4464285714285714,6558.931038961039,812.4050000000001,8.490064935064934,0.0,165.06655844155844,8355.649025974028,67041.88902597403,154 -llama2-7b,64,16,realistic,50,86.91655462184873,10.85840336134454,0.008907563025210084,9996.023781512606,1248.6399159663863,4.0162184873949585,0.0,41.89210084033613,1259.4983193277308,10082.940336134454,119 -llama3.1-70b,4,4,none,50,12426.2465942029,1552.7619565217392,0.06884057971014493,8924.961884057971,1114.2374637681162,4.460072463768116,0.0,43.39666666666667,2666.9994202898547,21351.20847826087,138 -llama3.1-8b,32,2,none,100,83.50609022556391,10.42142857142857,0.010827067669172932,8589.53827067669,1071.8422556390979,3.9700751879699245,0.0,34.88556390977444,1082.2636842105264,8673.044360902255,133 
-llama3.1-70b,32,0,none,50,80.01677631578947,9.989276315789473,0.007697368421052631,8935.227368421052,1115.3862500000002,3.9843421052631585,0.0,39.35026315789474,1125.3755263157896,9015.244144736842,152 -llama2-7b,16,0,none,50,1276.9089032258064,159.52870967741933,0.024129032258064516,8669.482967741935,1082.495806451613,4.057612903225807,0.0,39.867870967741936,1242.0245161290325,9946.391870967742,155 -llama3.1-70b,16,64,realistic,10,127.22917525773195,15.883298969072166,0.012061855670103093,9184.25144329897,1147.0235051546392,3.6678350515463922,0.0,36.492680412371136,1162.9068041237115,9311.480618556701,97 -llama3.1-70b,0,64,none,40,46002.7213372093,5745.600406976743,1.041279069767442,5318.2225,659.7042441860466,20.613546511627906,0.0,218.17017441860463,6405.30465116279,51320.9438372093,172 -llama3.1-70b,0,16,none,20,42388.892,5293.962785714286,0.5645714285714285,5002.3285,621.2718571428571,18.223857142857145,0.0,155.01035714285712,5915.234642857144,47391.2205,140 -llama2-7b,0,8,none,100,58297.84451754386,7282.219780701755,1.3320614035087721,7870.246228070176,977.8802631578947,8.869912280701755,0.0,226.8299122807018,8260.10004385965,66168.09074561404,228 -llama2-7b,4,64,none,150,28900.400126050423,3611.374537815126,0.33029411764705874,7731.612899159664,964.3588655462185,16.432226890756304,0.0,120.29802521008403,4575.733403361345,36632.01302521008,238 -llama3.1-8b,4,32,realistic,150,2184.8567741935485,272.8864516129032,0.023354838709677417,8871.64270967742,1107.082064516129,3.7846451612903227,0.0,35.042258064516126,1379.9685161290324,11056.499483870968,155 -mistral-7b,16,0,realistic,100,81.18613333333333,10.135333333333334,0.0088,8732.724533333332,1089.8239333333331,3.8279999999999994,0.0,35.15506666666667,1099.9592666666663,8813.910666666667,150 -llama3.1-70b,4,32,realistic,40,26275.567108433734,3283.325120481928,0.3660240963855422,7388.948072289156,921.9171686746988,8.839819277108433,0.0,75.67319277108433,4205.242289156627,33664.51518072289,166 
-llama3.1-70b,16,4,none,60,92.37817518248175,11.532481751824818,0.008540145985401459,9255.504890510949,1155.7159124087593,4.257956204379561,0.0,41.94824817518248,1167.248394160584,9347.88306569343,137 -llama2-7b,4,16,none,200,14507.599221311475,1812.9207377049179,0.09221311475409837,8055.145778688525,1005.5531967213117,5.733114754098361,0.0,51.38098360655738,2818.473934426229,22562.745000000003,244 -llama3.1-8b,0,2,none,50,50268.67048275862,6268.493793103449,1.0844137931034483,5946.381172413793,736.428,25.614137931034485,0.0,245.428,7004.921793103448,56215.051655172414,145 -llama2-7b,8,4,none,150,10009.378883248732,1250.8472081218274,0.05472081218274112,9003.312791878174,1124.6057360406091,4.167360406091371,0.0,42.69071065989848,2375.4529441624363,19012.691675126902,197 -llama2-7b,8,32,realistic,50,12966.977751479291,1620.4949704142014,0.0870414201183432,7993.051656804733,998.233550295858,3.6874556213017757,0.0,42.64715976331362,2618.7285207100595,20960.029408284026,169 -llama2-7b,64,32,realistic,200,45.9065,5.735045454545455,0.004818181818181819,9122.614181818182,1139.2506818181819,4.062954545454545,0.0,40.1015,1144.9857272727274,9168.520681818181,220 -llama3.1-70b,0,64,realistic,40,45323.67256097561,5660.492987804879,0.938048780487805,5571.113048780488,690.7245121951219,22.787439024390242,0.0,236.63585365853658,6351.217500000001,50894.785609756094,164 -mistral-7b,16,2,none,50,96.08032,11.99472,0.01056,8667.37,1081.3943199999999,3.782080000000001,0.0,33.6996,1093.38904,8763.45032,125 -llama2-7b,64,2,none,50,94.15391304347827,11.762521739130435,0.009217391304347827,10112.32443478261,1263.187652173913,4.117478260869565,0.0,41.512086956521735,1274.9501739130435,10206.478347826087,115 -llama3.1-70b,0,32,realistic,70,53618.062375690606,6696.395690607734,1.2058563535911602,6592.745856353591,817.5167403314917,11.50524861878453,0.0,184.78375690607734,7513.912430939226,60210.8082320442,181 
-llama3.1-70b,8,16,realistic,40,10554.987337662338,1318.983051948052,0.05694805194805194,8478.368246753247,1058.6299350649351,3.8169480519480516,0.0,38.803441558441556,2377.612987012987,19033.35558441558,154 -mistral-7b,16,8,none,200,77.85370860927152,9.719271523178808,0.008741721854304637,8913.170794701988,1112.1839735099336,3.6095364238410594,0.0,32.951192052980126,1121.9032450331124,8991.024503311259,151 -llama3.1-8b,8,32,none,200,449.47335403726703,56.134534161490684,0.015341614906832297,8853.936708074534,1104.6452173913044,3.677453416149069,0.0,33.41987577639752,1160.779751552795,9303.410062111801,161 -llama3.1-8b,16,4,realistic,100,86.55768656716418,10.80231343283582,0.010746268656716417,8561.436268656716,1068.4059701492538,4.133283582089551,0.0,36.42917910447761,1079.2082835820895,8647.99395522388,134 -llama3.1-70b,0,2,realistic,60,46798.55300546448,5844.312240437158,0.9893989071038252,5785.745245901639,717.8872677595629,23.518360655737702,0.0,253.60628415300548,6562.199508196722,52584.29825136612,183 -llama3.1-70b,32,32,realistic,40,86.49507352941177,10.798014705882352,0.008602941176470588,9725.509485294118,1214.365,4.148897058823529,0.0,42.73338235294118,1225.1630147058822,9812.00455882353,136 -mistral-7b,32,4,none,100,84.48234848484849,10.546818181818182,0.01,8890.844166666666,1109.5604545454544,3.9704545454545457,0.0,36.321666666666665,1120.1072727272729,8975.326515151515,132 -llama2-7b,4,64,none,50,26145.01211640212,3267.09544973545,0.4830158730158729,8555.67052910053,1062.7272486772488,5.7047089947089935,0.0,88.48566137566138,4329.822698412699,34700.682645502646,189 -mistral-7b,16,2,realistic,50,99.72000000000001,12.44909090909091,0.01090909090909091,8778.823801652892,1095.4334710743803,3.9620661157024797,0.0,35.96867768595041,1107.882561983471,8878.543801652893,121 -llama3.1-70b,8,64,none,20,10768.5245,1345.5985714285716,0.05621428571428572,9041.1515,1128.755571428571,3.960428571428572,0.0,41.63214285714286,2474.3541428571425,19809.676,140 
-llama2-7b,0,8,none,150,59566.740595744675,7440.475489361702,1.362510638297872,8576.283744680852,1059.8208085106382,7.704510638297873,0.0,230.72489361702128,8500.29629787234,68143.02434042553,235 -llama3.1-70b,8,16,none,60,1782.3858904109588,222.70513698630137,0.023356164383561646,8936.503767123288,1115.7351369863015,4.153287671232878,0.0,40.11102739726028,1338.4402739726024,10718.889657534248,146 -llama2-7b,64,32,none,50,75.73082706766918,9.460977443609021,0.007969924812030075,9195.822556390976,1148.6141353383457,4.2118796992481204,0.0,40.27654135338346,1158.0751127819549,9271.553383458648,133 -llama2-7b,8,16,realistic,50,14014.435499999998,1751.3260000000002,0.06568750000000001,8339.681125,1041.1983125000002,3.6595625,0.0,38.097125,2792.5243125,22354.116625000002,160 -llama3.1-8b,4,4,realistic,150,128.0524647887324,15.975281690140843,0.012816901408450704,8951.683450704226,1117.007957746479,3.8128873239436625,0.0,34.914577464788735,1132.9832394366197,9079.735915492958,142 -mistral-7b,4,8,none,100,1166.7678873239438,145.7387323943662,0.025704225352112673,8941.77161971831,1115.9526760563379,4.064647887323944,0.0,37.52035211267605,1261.691408450704,10108.539507042253,142 -llama3.1-70b,0,4,none,20,42674.81713286713,5329.786363636364,0.5530769230769231,5260.206363636364,652.6985314685315,18.195174825174824,0.0,167.13426573426577,5982.484895104895,47935.02349650349,143 -mistral-7b,4,8,none,50,1780.8896268656715,222.40485074626864,0.032089552238805975,8572.248208955223,1069.5314179104478,3.8426865671641797,0.0,34.59574626865672,1291.9362686567165,10353.137835820895,134 -llama3.1-70b,4,2,none,40,18283.632720588233,2284.6702205882357,0.07639705882352942,7844.258382352942,979.0704411764707,4.492205882352941,0.0,39.996691176470584,3263.7406617647057,26127.891102941176,136 -mistral-7b,64,8,none,200,70.68789115646258,8.82469387755102,0.008979591836734694,8462.140340136053,1055.813129251701,3.627414965986394,0.0,31.71755102040816,1064.6378231292517,8532.828231292517,147 
-llama2-7b,0,2,realistic,100,48692.62486363636,6082.464863636364,1.1010454545454547,6516.759363636364,809.276409090909,24.620636363636365,0.0,285.968,6891.741272727273,55209.38422727273,220 -llama3.1-70b,0,64,none,20,42341.53827586207,5288.435103448276,0.5749655172413793,4938.022896551724,613.0992413793103,19.452896551724137,0.0,164.8271724137931,5901.534344827587,47279.56117241379,145 -mistral-7b,64,4,realistic,50,103.41519607843138,12.910392156862745,0.012941176470588235,8247.583823529412,1028.6232352941176,3.8659803921568625,0.0,33.468823529411765,1041.5336274509805,8350.999019607843,102 -llama3.1-70b,32,0,realistic,30,90.57903703703704,11.30785185185185,0.008666666666666666,9314.960000000001,1163.0511111111114,4.046444444444444,0.0,39.568000000000005,1174.358962962963,9405.539037037037,135 diff --git a/kv_cache_benchmark/discovery_results_and_analysis/iostat_cpu0_comparison.py b/kv_cache_benchmark/discovery_results_and_analysis/iostat_cpu0_comparison.py deleted file mode 100644 index 2865d8c4..00000000 --- a/kv_cache_benchmark/discovery_results_and_analysis/iostat_cpu0_comparison.py +++ /dev/null @@ -1,98 +0,0 @@ -#!/usr/bin/env python3 -"""Compare all models at cpu_mem=0GB for apples-to-apples iostat comparison.""" - -import pandas as pd - -df = pd.read_csv('iostat_analysis.csv') -df = df.rename(columns={'rMB_s': 'read_mbs', 'wMB_s': 'write_mbs', 'total_MB_s': 'total_mbs', 'total_IOPS': 'iops'}) - -# Filter to cpu_mem=0 only -cpu0 = df[df['cpu_mem'] == 0] - -print('=' * 100) -print('ALL MODELS @ cpu_mem=0GB (Apples-to-Apples Comparison)') -print(f'Total configs: {len(cpu0)}') -print('=' * 100) - -# Summary by model -print('\nSUMMARY BY MODEL:') -model_stats = cpu0.groupby('model').agg({ - 'read_mbs': ['mean', 'std', 'max'], - 'write_mbs': ['mean', 'max'], - 'total_mbs': ['mean', 'max'], - 'iops': ['mean', 'max'], - 'util': 'mean', - 'model': 'count' -}).round(0) -model_stats.columns = ['Read Avg', 'Read Std', 'Read Max', 'Write Avg', 'Write Max', 'Total Avg', 
'Total Max', 'IOPS Avg', 'IOPS Max', 'Util%', 'Configs'] -print(model_stats.sort_values('Total Avg', ascending=False).to_string()) - -print('\n' + '=' * 100) -print('DETAILED: All Models x Users @ cpu_mem=0GB') -print('=' * 100) - -# Pivot by model and users -pivot = cpu0.pivot_table( - values=['read_mbs', 'write_mbs', 'total_mbs'], - index='model', - columns='users', - aggfunc='mean' -).round(0) - -print('\nRead MB/s by Model x Users:') -print(pivot['read_mbs'].to_string()) - -print('\nWrite MB/s by Model x Users:') -print(pivot['write_mbs'].to_string()) - -print('\nTotal MB/s by Model x Users:') -print(pivot['total_mbs'].to_string()) - -print('\n' + '=' * 100) -print('TOP 5 CONFIGS PER MODEL @ cpu_mem=0GB') -print('=' * 100) - -for model in ['mistral-7b', 'llama3.1-8b', 'llama2-7b', 'llama3.1-70b']: - model_df = cpu0[cpu0['model'] == model].nlargest(5, 'total_mbs') - print(f'\n{model}:') - for _, row in model_df.iterrows(): - mca = int(row['mca']) - users = int(row['users']) - gen = row['gen_mode'] - read = row['read_mbs'] - write = row['write_mbs'] - total = row['total_mbs'] - print(f" mca={mca:2d}, users={users:3d}, gen={gen:9s} => Read: {read:,.0f} MB/s, Write: {write:,.0f} MB/s, Total: {total:,.0f} MB/s") - -print('\n' + '=' * 100) -print('MODEL COMPARISON @ SAME USER COUNT (cpu_mem=0GB)') -print('=' * 100) - -# For each user count that all models have -common_users = [50] # All models have 50 users -for users in common_users: - print(f'\nUsers={users}:') - user_df = cpu0[cpu0['users'] == users].groupby('model').agg({ - 'read_mbs': 'mean', - 'write_mbs': 'mean', - 'total_mbs': 'mean', - 'iops': 'mean', - 'util': 'mean' - }).round(0) - print(user_df.sort_values('total_mbs', ascending=False).to_string()) - -print('\n' + '=' * 100) -print('KEY INSIGHT: Which model stresses storage the most @ cpu_mem=0GB?') -print('=' * 100) - -# Get best config per model -best_per_model = cpu0.loc[cpu0.groupby('model')['total_mbs'].idxmax()] -print('\nBest config per model:') 
-for _, row in best_per_model.sort_values('total_mbs', ascending=False).iterrows(): - print(f" {row['model']:14s}: {row['total_mbs']:,.0f} MB/s (mca={int(row['mca'])}, users={int(row['users'])}, gen={row['gen_mode']})") - -# Average across all configs -avg_per_model = cpu0.groupby('model')['total_mbs'].mean().sort_values(ascending=False) -print('\nAverage throughput per model (all configs):') -for model, avg in avg_per_model.items(): - print(f" {model:14s}: {avg:,.0f} MB/s") diff --git a/kv_cache_benchmark/discovery_results_and_analysis/mlperf_kvcache_slowsystem.xlsx b/kv_cache_benchmark/discovery_results_and_analysis/mlperf_kvcache_slowsystem.xlsx deleted file mode 100644 index 37f30729..00000000 Binary files a/kv_cache_benchmark/discovery_results_and_analysis/mlperf_kvcache_slowsystem.xlsx and /dev/null differ diff --git a/kv_cache_benchmark/discovery_results_and_analysis/mlperf_storage_summary_fast.xlsx b/kv_cache_benchmark/discovery_results_and_analysis/mlperf_storage_summary_fast.xlsx deleted file mode 100644 index 91f2fd2a..00000000 Binary files a/kv_cache_benchmark/discovery_results_and_analysis/mlperf_storage_summary_fast.xlsx and /dev/null differ diff --git a/kv_cache_benchmark/discovery_results_and_analysis/mlperfv3_results_and_metrics_discovery.md b/kv_cache_benchmark/discovery_results_and_analysis/mlperfv3_results_and_metrics_discovery.md deleted file mode 100644 index 27d6c897..00000000 --- a/kv_cache_benchmark/discovery_results_and_analysis/mlperfv3_results_and_metrics_discovery.md +++ /dev/null @@ -1,1649 +0,0 @@ -# MLPerf v3 KV Cache Benchmark: Results and Metrics Discovery - -*Analysis performed on 2026-01-09* -*Datasets: mlperf_storage_summary_fast.xlsx (1411 tests), mlperf_kvcache_slowsystem.xlsx (268 tests)* - ---- - -## Executive Summary - -This document analyzes benchmark results from two storage systems - "Fast" and "Slow" - to validate that the kv-cache.py benchmark can differentiate storage performance tiers, identify which metrics to 
report for MLPerf v3 submissions, and determine optimal invocation parameters for reproducible results. - -**Key Findings:** - -1. **Decode Bytes Read** (I/O Volume) differentiates storage tiers at **2.6x** at cpu_mem=0GB, **100% Fast win rate** -2. **Wall-Clock Throughput** shows **2.4x** differentiation at cpu_mem=0GB, **100% Fast win rate** -3. **Storage Throughput** shows **2.2x** at cpu_mem=4GB but **only 1.1x** at cpu_mem=0GB (misleading metric when I/O-saturated) -4. **cpu_mem=0GB** maximizes storage stress; **cpu_mem=4GB** works better for Storage Throughput metric -5. **llama3.1-70b** generates most I/O per request; **llama3.1-8b/mistral-7b** achieve highest aggregate throughput -6. **Variance is high** (CV 50-125% depending on configuration), requiring multiple trials - ---- - -## 1. Test Systems - -### 1.1 Fast System (Bare Metal) - -| Component | Specification | -|-----------|---------------| -| Server | Supermicro SYS-621H-TN12R | -| CPU | 2x Intel Xeon Silver 4510 (24C/48T total) | -| CPU Frequency | 2.4 GHz base, 4.2 GHz turbo | -| System RAM | 256 GB DDR5-4800 ECC (16x 16GB DIMMs) | -| Memory Config | 8 channels per CPU, 1 DIMM per channel | -| L3 Cache | 60 MB (30 MB per socket) | -| NVMe Device | /dev/nvme4n1, 7.0 TB | -| **NVMe Bandwidth** | **14,000 MB/s read (theoretical)** | -| OS | Ubuntu 22.04, Linux 6.5.0-15-generic | -| Python | 3.10.12 | - -*GPU (NVIDIA H100 NVL, 94GB HBM3) present but not used during discovery tests.* - -### 1.2 Slow System (Virtualized) - -| Component | Specification | -|-----------|---------------| -| Hypervisor | VMware ESXi 8.0.3U3 | -| Guest OS | Ubuntu 22.04.5, Linux 6.8.0-90 | -| System RAM | 128 GB DDR4-2400 | -| Storage | VMFS6 volume at /mnt/kv-cache | -| **Storage Bandwidth** | **~3,000 MB/s (theoretical)** | - -### 1.3 Expected Differentiation - -Based on theoretical storage bandwidth alone: -- Fast: 14,000 MB/s -- Slow: 3,000 MB/s -- **Expected ratio: 4.7x** - -Observed ratio (2.1x-2.3x) is lower due to: -1. 
Benchmark overhead (Python, threading, memory copies) -2. NVMe not saturated at all queue depths -3. CPU/memory bottlenecks in virtualized environment - ---- - -## 2. Dataset Overview - -### 2.1 Concurrency Model - -The kv-cache.py benchmark implements a **multi-user, producer-consumer** concurrency model with three distinct layers of concurrency control: - -``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ CONCURRENCY ARCHITECTURE │ -├─────────────────────────────────────────────────────────────────────────────┤ -│ │ -│ LAYER 1: Request Generation (--num-users) │ -│ ┌─────────────┐ │ -│ │ User 1 │──┐ │ -│ ├─────────────┤ │ ┌──────────────┐ │ -│ │ User 2 │──┼────▶│ Request │ LAYER 2: Request Processing │ -│ ├─────────────┤ │ │ Queue │ ┌──────────────────────────┐ │ -│ │ ... │──┤ │ (Priority) │────▶│ Worker Pool │ │ -│ ├─────────────┤ │ └──────────────┘ │ min(users, 500) │ │ -│ │ User N │──┘ │ threads │ │ -│ └─────────────┘ └───────────┬──────────────┘ │ -│ │ │ -│ ▼ │ -│ ┌──────────────────────────┐ │ -│ │ LAYER 3: Allocation │ │ -│ │ Semaphore │ │ -│ │ (--max-concurrent- │ │ -│ │ allocs) │ │ -│ │ Bounds RAM usage │ │ -│ └──────────────────────────┘ │ -└─────────────────────────────────────────────────────────────────────────────┘ -``` - -#### Layer 1: Request Generation (`--num-users`) - -Each simulated user runs in its own thread, generating requests and pushing them to a priority queue: - -```python -# From IntegratedBenchmark.__init__ (line 2635) -self.request_queue = queue.PriorityQueue() -``` - -The `--num-users` flag controls how many user simulation threads generate requests concurrently. 
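As a toy illustration of Layer 1 (not the benchmark's actual code), the sketch below spins up several user threads that each push `(priority_tuple, request)` entries onto a shared `PriorityQueue`, mirroring the shape the worker loop later unpacks. The dict payload and the monotonic-clock priority are assumptions for illustration only.

```python
import queue
import threading
import time

# Toy sketch of Layer 1: each simulated user runs in its own thread and
# pushes (priority_tuple, request) entries onto a shared PriorityQueue.
# The dict payload and clock-based priority are illustrative, not the
# benchmark's real request type.
request_queue: queue.PriorityQueue = queue.PriorityQueue()

def user_thread(user_id: int, num_requests: int) -> None:
    for seq in range(num_requests):
        # Unique priority tuple: earlier submissions drain first, and the
        # (user_id, seq) tail avoids ever comparing the dict payloads.
        priority = (time.monotonic(), user_id, seq)
        request_queue.put((priority, {"user": user_id, "seq": seq}))

threads = [threading.Thread(target=user_thread, args=(uid, 3)) for uid in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(request_queue.qsize())  # 4 users x 3 requests = 12
```

Because `PriorityQueue` orders by comparing the whole queued tuple, the unique `(timestamp, user_id, seq)` prefix guarantees the dict payloads are never compared, which is why the real benchmark also queues `(priority_tuple, request)` pairs rather than bare requests.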
- -#### Layer 2: Worker Pool (min(users, 500) threads) - -Worker threads pull requests from the queue and process them: - -```python -# From IntegratedBenchmark.run() (lines 3149-3153) -num_workers = min(self.num_users, 500) -for _ in range(num_workers): - proc_thread = threading.Thread(target=self.process_requests, args=(stop_event,), daemon=True) - threads.append(proc_thread) - proc_thread.start() -``` - -Each worker runs this loop: - -```python -# From IntegratedBenchmark.process_requests() (lines 2923-2926) -def process_requests(self, stop_event: threading.Event): - """The main worker loop that processes requests from the queue.""" - while not stop_event.is_set(): - priority_tuple, request = self.request_queue.get(timeout=0.5) - # ... process request ... -``` - -#### Layer 3: Allocation Semaphore (`--max-concurrent-allocs`) - -This is the critical throttle for RAM usage. When a worker needs to allocate KV cache data, it must acquire a semaphore permit: - -```python -# From MultiTierCache.__init__ (lines 1188-1192) -# Semaphore to limit concurrent allocations (bounds RAM usage). -# If max_concurrent_allocs is 0 or None, no limit is applied. -if self.max_concurrent_allocs and self.max_concurrent_allocs > 0: - self.allocation_semaphore = threading.Semaphore(self.max_concurrent_allocs) -else: - self.allocation_semaphore = None -``` - -```python -# From MultiTierCache.allocate_cache() (lines 1539-1548) -# Use semaphore to limit concurrent allocations if configured. -# This bounds RAM usage by limiting how many threads can hold large -# data arrays simultaneously. -if self.allocation_semaphore: - self.allocation_semaphore.acquire() - -try: - return self._allocate_cache_inner(key, num_tokens, phase) -finally: - if self.allocation_semaphore: - self.allocation_semaphore.release() -``` - -**Why this matters:** The `_allocate_cache_inner()` function generates large numpy arrays (the KV cache data). 
Without the semaphore, all 500 workers could simultaneously allocate multi-GB arrays, causing memory exhaustion. The semaphore limits how many threads can hold these arrays at once. - -#### Summary Table - -| Parameter | CLI Flag | Code Location | What It Controls | -|-----------|----------|---------------|------------------| -| **Users** | `--num-users N` | Line 3144 | Number of user simulation threads generating requests | -| **Workers** | *(derived)* | Line 3149 | `min(users, 500)` threads processing requests | -| **Max Concurrent Allocs** | `--max-concurrent-allocs N` | Line 1191 | Semaphore permits for simultaneous cache allocations | -| **Queue Depth** | *(observed)* | `request_queue.qsize()` | Backlog of requests waiting to be processed | - -#### Clarification on "qd" in filenames - -The `qdN` in filenames like `mlperf_v3_storage_llama2-7b_cpu0GB_qd16_gennone_users100.json` refers to `--max-concurrent-allocs`, NOT the observed queue depth. - -| Filename Value | Meaning | Effect | -|----------------|---------|--------| -| `qd0` | `--max-concurrent-allocs 0` | No semaphore, unlimited concurrent allocations | -| `qd16` | `--max-concurrent-allocs 16` | Max 16 threads can allocate cache simultaneously | - -The observed `queue_depth` metric in logs (`request_queue.qsize()`) is different - it's the instantaneous backlog that fluctuates during the benchmark. 
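To see Layers 2 and 3 interact, here is a self-contained toy (again, not the benchmark's code): a 16-thread worker pool drains a queue while a semaphore caps how many workers hold an "allocation" at once. The 1 MB `bytearray` stands in for the multi-GB numpy arrays, and the peak counter confirms the bound holds.

```python
import queue
import threading

# Toy model of Layers 2-3: workers drain a queue, and a semaphore caps how
# many of them can hold an "allocated" buffer simultaneously. All names and
# sizes here are invented for the sketch.
MAX_CONCURRENT_ALLOCS = 4
alloc_semaphore = threading.Semaphore(MAX_CONCURRENT_ALLOCS)
work_queue: queue.Queue = queue.Queue()
state_lock = threading.Lock()
in_flight = 0
peak_in_flight = 0

def allocate_and_process(item: int) -> None:
    global in_flight, peak_in_flight
    with alloc_semaphore:          # mirrors allocation_semaphore.acquire()/release()
        with state_lock:
            in_flight += 1
            peak_in_flight = max(peak_in_flight, in_flight)
        buf = bytearray(1 << 20)   # the "large allocation" the semaphore bounds
        buf[0] = item % 256
        with state_lock:
            in_flight -= 1

def worker() -> None:
    while True:
        try:
            item = work_queue.get(timeout=0.2)
        except queue.Empty:
            return                 # queue drained: worker exits
        allocate_and_process(item)
        work_queue.task_done()

for i in range(200):
    work_queue.put(i)
workers = [threading.Thread(target=worker) for _ in range(16)]
for t in workers:
    t.start()
for t in workers:
    t.join()

print(f"peak concurrent allocations: {peak_in_flight} (cap: {MAX_CONCURRENT_ALLOCS})")
```

Even with 16 workers running, at most 4 ever hold a buffer at once, which is exactly how `--max-concurrent-allocs` bounds RAM without limiting the worker count.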
- -### 1.2 Test Configuration Space - -| Parameter | Fast System | Slow System | Notes | -|-----------|-------------|-------------|-------| -| Total tests | 1411 | 268 | Fast has 5x more coverage | -| Models | llama2-7b, llama3.1-8b, llama3.1-70b, mistral-7b | llama2-7b, llama3.1-8b, llama3.1-70b, mistral-7b | Same models for comparison | -| CPU Memory | 0, 4, 8, 16, 32, 64 GB | 0, 4 GB | Fast tested higher tiers | -| Max Concurrent Allocs | 0, 2, 4, 8, 16, 32, 64 | 0, 2, 4 | Fast tested higher limits | -| Users | 10-200 | 10-500 | Slow tested higher concurrency | -| Gen Mode | none, realistic | none, realistic | Both tested simulation modes | - -### 1.3 Matched Configuration Analysis - -For apples-to-apples comparison, we filtered to **220 matched configurations** where both systems ran identical (model, cpu_mem, max_concurrent_allocs, gen_mode, users) combinations. - ---- - -## 3. Can kv-cache.py Differentiate Storage Tiers? - -**Yes.** Across all matched configurations, the benchmark consistently identifies the Fast system as faster. - -### 2.1 Global Differentiation (All 220 Matched Configs) - -| Metric | Fast Mean | Slow Mean | Ratio | Differentiation | -|--------|-----------|-----------|-------|-----------------| -| Storage Throughput (tok/s) | 88.47 | 41.56 | **2.13x** | CLEAR | -| Wall-Clock Throughput (tok/s) | 610.36 | 290.02 | **2.10x** | CLEAR | -| Storage Latency Mean (ms) | 8,598 | 12,917 | **1.50x** | CLEAR | -| Storage Latency P95 (ms) | 36,504 | 45,091 | **1.24x** | YES | -| Storage Latency P99 (ms) | 57,372 | 71,821 | **1.25x** | YES | -| E2E Latency P95 (ms) | 126,042 | 168,911 | **1.34x** | YES | - -The benchmark shows a **clear 2x differentiation** in throughput metrics, with latency metrics showing more modest but still measurable differences. - -### 2.2 Differentiation by CPU Memory Limit - -This is a critical finding. 
The `cpu_mem_gb` parameter dramatically affects which metrics show differentiation: - -#### Storage Throughput (Misleading at cpu_mem=0GB) - -| cpu_mem | Fast Storage Throughput | Slow Storage Throughput | Ratio | Fast Win Rate | -|---------|-------------------------|-------------------------|-------|---------------| -| 0 GB | 9.53 tok/s | 8.50 tok/s | **1.12x** | 62.2% | -| 4 GB | 167.94 tok/s | 75.15 tok/s | **2.23x** | 97.2% | - -#### I/O Volume Metrics (True Differentiation at cpu_mem=0GB) - -| cpu_mem | Metric | Fast Mean | Slow Mean | Ratio | Fast Win Rate | -|---------|--------|-----------|-----------|-------|---------------| -| **0 GB** | Decode Bytes Read | 1,195 GB | 447 GB | **2.62x** | **100%** | -| **0 GB** | Wall-Clock Throughput | 557 tok/s | 224 tok/s | **2.43x** | **100%** | -| **0 GB** | Prefill Bytes Written | 146 GB | 68 GB | **2.15x** | **100%** | -| 4 GB | Decode Bytes Read | 557 GB | 271 GB | **2.06x** | 100% | -| 4 GB | Wall-Clock Throughput | 692 tok/s | 387 tok/s | **1.79x** | 100% | - -**Why Storage Throughput is misleading at cpu_mem=0GB:** - -Storage Throughput = Total Tokens / Total I/O Time - -At cpu_mem=0GB, both systems are **100% I/O-bound** - every token requires NVMe access. - -| System | Decode Bytes Read | Total I/O Time | Storage Throughput | -|--------|-------------------|----------------|-------------------| -| Fast | 1,195 GB | ~8,000 s | 9.53 tok/s | -| Slow | 447 GB | ~7,100 s | 8.50 tok/s | -| **Ratio** | **2.62x** | **1.13x** | **1.12x** | - -The Fast system: -- Reads **2.62x more bytes** from NVMe (more work done) -- Accumulates **~1.13x more I/O time** (because more I/O operations) -- These effects **cancel out** in Storage Throughput! 
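The compression of the ratio can be checked directly from the table's (rounded) figures; nothing below comes from outside this document, and the small deviation from 2.62x reflects that rounding:

```python
# Ratios recomputed from the rounded figures in the table above.
fast_bytes_gb, fast_io_s, fast_tput = 1195, 8000, 9.53
slow_bytes_gb, slow_io_s, slow_tput = 447, 7100, 8.50

bytes_ratio = fast_bytes_gb / slow_bytes_gb   # total storage work done
io_time_ratio = fast_io_s / slow_io_s         # accumulated I/O time
tput_ratio = fast_tput / slow_tput            # Storage Throughput ratio

print(f"Decode Bytes Read ratio:  {bytes_ratio:.2f}x")    # ~2.67x (rounded inputs)
print(f"Total I/O Time ratio:     {io_time_ratio:.2f}x")  # ~1.13x
print(f"Storage Throughput ratio: {tput_ratio:.2f}x")     # ~1.12x
```

The point survives the rounding: the byte-volume ratio stays large while the Storage Throughput ratio is nearly flat, because I/O time scales up alongside the work done.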
- -**What each metric measures:** - -| Metric | What It Measures | cpu_mem=0 Ratio | cpu_mem=4 Ratio | -|--------|------------------|-----------------|-----------------| -| **Decode Bytes Read** | Total storage work completed | **2.62x** | 2.06x | -| **Wall-Clock Throughput** | Real-world tokens/sec | **2.43x** | 1.79x | -| **Storage Throughput** | Tokens per unit of I/O time | 1.12x | **2.23x** | - -**Key Insight:** Storage Throughput measures **efficiency per I/O operation**, not **total work done**. At cpu_mem=0GB where both systems are saturated, efficiency converges. The Fast system's advantage is that it **completes more I/O operations** in the same wall time - captured by Decode Bytes Read and Wall-Clock Throughput. - -**Recommendations by Use Case:** - -| Use Case | cpu_mem | Primary Metric | Expected Ratio | Why | -|----------|---------|----------------|----------------|-----| -| **Max storage stress** | **0 GB** | **Decode Bytes Read** | **2.6x** | Measures total storage work | -| **Max storage stress** | **0 GB** | **Wall-Clock Throughput** | **2.4x** | Measures real throughput | -| **Traditional benchmark** | 4 GB | Storage Throughput | 2.2x | Works when I/O is bursty | - -### 2.3 Differentiation by Model - -| Model | Fast (tok/s) | Slow (tok/s) | Ratio | Notes | -|-------|--------------|--------------|-------|-------| -| llama3.1-8b | 308.50 | 133.37 | **2.31x** | Best differentiation | -| mistral-7b | 306.56 | 132.98 | **2.31x** | Best differentiation | -| llama2-7b | 42.59 | 23.35 | **1.82x** | Good | -| llama3.1-70b | 57.54 | 32.28 | **1.78x** | Moderate | - -Smaller models (7b-8b) show stronger differentiation because their KV cache blocks fit more granularly into storage tiers, exposing I/O patterns more directly. The 70b model's larger cache blocks amortize some storage overhead, reducing visible differentiation. 
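The per-model ratios above can be re-derived directly from the table's Fast/Slow means; the values below are copied from that table, so this is purely a consistency check:

```python
# Storage Throughput means (tok/s) copied from the per-model table above.
fast = {"llama3.1-8b": 308.50, "mistral-7b": 306.56,
        "llama2-7b": 42.59, "llama3.1-70b": 57.54}
slow = {"llama3.1-8b": 133.37, "mistral-7b": 132.98,
        "llama2-7b": 23.35, "llama3.1-70b": 32.28}

ratios = {model: fast[model] / slow[model] for model in fast}
for model, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
    print(f"{model:14s} {ratio:.2f}x")
```

Both 7b/8b models land at ~2.31x while llama2-7b and llama3.1-70b sit below 2x, matching the table's ordering.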
- -### 2.4 Differentiation by User Count - -| Users | Matched Configs | Ratio (Fast/Slow) | Fast CV | Slow CV | -|-------|-----------------|-------------------|---------|---------| -| 10 | 12 | 2.20x | 52.44% | 51.77% | -| 20 | 12 | 2.13x | 81.07% | 63.08% | -| 50 | 48 | 2.20x | 125.27% | 113.27% | -| 100 | 35 | 2.23x | 120.62% | 116.08% | -| 150 | 33 | 2.21x | 117.47% | 110.26% | -| 200 | 32 | 2.12x | 120.03% | 111.25% | - -Differentiation remains stable (~2.1x to 2.2x) across user counts. However, **variance increases with concurrency**. At 10 users, CV is ~52%. At 100+ users, CV exceeds 100%. This matters for repeatability. - ---- - -## 4. Which Metrics Should MLPerf Report? - -### 3.1 Metric Evaluation Matrix - -The choice of metric depends on the `cpu_mem` setting: - -**At cpu_mem=0GB (Maximum Storage Stress):** - -| Metric | Mean Ratio | Fast Win Rate | Recommendation | -|--------|------------|---------------|----------------| -| **Decode Bytes Read (GB)** | **2.62x** | **100%** | **PRIMARY** | -| **Wall-Clock Throughput** | **2.43x** | **100%** | **PRIMARY** | -| Prefill Bytes Written (GB) | 2.15x | 100% | SECONDARY | -| Storage Throughput | 1.12x | 62.2% | **NOT RECOMMENDED** (misleading) | - -**At cpu_mem=4GB (Mixed Workload):** - -| Metric | Mean Ratio | Fast Win Rate | Recommendation | -|--------|------------|---------------|----------------| -| **Storage Throughput** | **2.23x** | **97.2%** | **PRIMARY** | -| Decode Bytes Read (GB) | 2.06x | 100% | SECONDARY | -| Wall-Clock Throughput | 1.79x | 100% | SECONDARY | -| Storage Latency P95 | 2.22x | ~85% | SUPPORTING | - -### 3.2 Recommended Metrics for Submission - -**Critical:** The choice of primary metric depends on your `cpu_mem` setting. - -#### For cpu_mem=0GB: Primary Metric is Decode Bytes Read (GB) - -``` -Decode Read Bandwidth = Decode Bytes Read (GB) / benchmark_duration (s) -``` - -At cpu_mem=0GB, all I/O goes through NVMe. 
The Fast system reads **2.62x more bytes** in the same benchmark duration, proving superior storage performance. - -**Pros:** -- **100% Fast win rate** - no edge cases -- **2.62x differentiation** - strongest of all metrics -- Measures actual storage work done -- Hardware-agnostic: bytes transferred is bytes transferred - -**Cons:** -- Requires standardized benchmark duration across submitters -- Raw GB less intuitive than tok/s - -#### For cpu_mem=4GB: Primary Metric is Storage Throughput (tokens/sec) - -``` -Storage Throughput = tokens_with_nvme_io / total_nvme_io_time -``` - -At cpu_mem=4GB, some tokens hit CPU cache, creating bursty I/O patterns where Storage Throughput differentiates well. - -**Pros:** -- 2.2x differentiation between tiers -- 97% win rate -- Familiar tok/s units - -**Cons:** -- **MISLEADING at cpu_mem=0GB** (shows only 1.1x due to I/O time normalization) -- Requires cpu_mem ≥ 4GB to work correctly - -#### Secondary Metric: Wall-Clock Throughput (tokens/sec) - -``` -Wall-Clock Throughput = total_tokens_generated / total_benchmark_duration -``` - -This is the user-facing metric. It answers: "How many tokens per second does my inference system deliver?" 
- -**Pros:** -- 100% Fast win rate -- 2.1x differentiation -- Relatable to production workloads - -**Cons:** -- Includes generation delay (when gen_mode ≠ none) -- Not purely a storage metric - -#### Tertiary Metric: I/O Volume (Decode Bytes Read / Prefill Bytes Written) - -When all submitters run **identical invocations for identical durations**, I/O volume becomes a valid Unit of Work measurement: - -``` -Decode Read Bandwidth = Decode Bytes Read (GB) / benchmark_duration (s) -Prefill Write Bandwidth = Prefill Bytes Written (GB) / benchmark_duration (s) -``` - -**Pros:** -- **100% Fast win rate** for both metrics across all 220 configurations -- 2.30x differentiation for Decode Read, 1.98x for Prefill Write -- Hardware-agnostic: measures actual bytes transferred -- Directly comparable across submissions with standardized duration - -**Cons:** -- Requires standardized benchmark duration across all submitters -- Raw GB values less intuitive than tok/s or latency - -**Note:** Decode Bytes Read shows stronger differentiation (2.30x) than Storage Throughput (2.13x), making it a robust validation metric. - -#### Supporting Metrics: Storage Latency P95/P99 - -These tail latency metrics matter for SLA-sensitive deployments. A 1.24x difference in P95 latency (36.5s vs 45.1s) can be the difference between acceptable and unacceptable user experience. - -### 3.3 Correlation Analysis - -The correlation matrix of Fast/Slow ratios reveals an important insight: - -``` - ratio_storage_tput ratio_wallclock ratio_latency_p95 ratio_io_time -ratio_storage_tput 1.000 -0.077 0.837 0.887 -ratio_wallclock -0.077 1.000 -0.315 -0.473 -ratio_latency_p95 0.837 -0.315 1.000 0.879 -ratio_io_time 0.887 -0.473 0.879 1.000 -``` - -**Observation:** Storage Throughput and Wall-Clock Throughput are **nearly uncorrelated** (r = -0.077). This means they measure fundamentally different aspects of system performance. Both should be reported. - ---- - -## 5. 
Optimal Invocation Parameters for MLPerf Submission? - -### 4.1 Recommended Configuration - -Based on this analysis, the optimal kv-cache.py invocation depends on your benchmarking goal: - -#### Option 1: Maximum Storage Stress (cpu_mem=0GB) - -Use when you want to **stress test NVMe** and measure **I/O volume differentiation**: - -| Parameter | Recommended Value | Rationale | -|-----------|-------------------|-----------| -| `cpu_mem_gb` | **0** | Forces ALL I/O through NVMe - 4x more read I/O than cpu_mem=4 | -| `model` | **llama3.1-8b** or **mistral-7b** | Highest aggregate throughput (~11 GB/s peak) | -| `users` | **200** | Maximum sustained throughput | -| `max_concurrent_allocs` | **16** | Slight peak at this value | -| `gen_mode` | **none** | Pure I/O benchmark | -| **Primary Metric** | **Decode Bytes Read** | 2.62x differentiation, 100% win rate | - -#### Option 2: Storage Throughput Focus (cpu_mem=4GB) - -Use when you want **Storage Throughput (tok/s)** as your primary metric: - -| Parameter | Recommended Value | Rationale | -|-----------|-------------------|-----------| -| `cpu_mem_gb` | **4** | Storage Throughput metric works correctly at this setting | -| `model` | **llama3.1-8b** or **mistral-7b** | Best differentiation (2.31x) | -| `users` | **100-150** | Good balance of load and variance | -| `max_concurrent_allocs` | **0 or 2** | Minimal allocation throttling | -| `gen_mode` | **none** | Pure I/O benchmark | -| **Primary Metric** | **Storage Throughput** | 2.2x differentiation, 97% win rate | - -### 4.2 Alternative: Focus on Latency SLAs - -If the submission targets latency-sensitive workloads, use: - -| Parameter | Recommended Value | Rationale | -|-----------|-------------------|-----------| -| `gen_mode` | **realistic** | Simulates real inference timing | -| `cpu_mem_gb` | **4-8** | Realistic caching behavior | -| `max_concurrent_allocs` | **4** | Moderate allocation throttling | -| `users` | **50-100** | Realistic concurrency | -| `model` | 
**llama3.1-70b** | Larger model = larger KV cache = more storage pressure | - -### 4.3 Top 10 Configurations by Differentiation - -These configurations (gen_mode=none) showed the strongest Fast/Slow differentiation: - -| Model | cpu_mem | MCA | Users | Fast (tok/s) | Slow (tok/s) | Ratio | -|-------|---------|-----|-------|--------------|--------------|-------| -| mistral-7b | 0 | 0 | 200 | 7.5 | 2.0 | **3.80x** | -| llama3.1-8b | 0 | 0 | 200 | 7.5 | 2.1 | **3.57x** | -| mistral-7b | 0 | 0 | 150 | 9.2 | 2.7 | **3.42x** | -| llama3.1-8b | 0 | 0 | 150 | 9.0 | 2.6 | **3.39x** | -| llama3.1-70b | 4 | 4 | 20 | 94.1 | 29.0 | **3.25x** | -| llama2-7b | 0 | 0 | 150 | 2.2 | 0.7 | **3.16x** | -| llama3.1-70b | 4 | 4 | 50 | 92.1 | 30.7 | **3.01x** | -| llama2-7b | 4 | 2 | 200 | 68.1 | 23.2 | **2.93x** | -| llama2-7b | 0 | 0 | 100 | 2.8 | 1.0 | **2.89x** | -| mistral-7b | 0 | 0 | 100 | 10.1 | 3.5 | **2.88x** | - -*MCA = max_concurrent_allocs (--max-concurrent-allocs)* - -**Note:** These are **Storage Throughput** ratios. The highest ratios (3.5x-3.8x) occur at cpu_mem=0GB with very low absolute throughput (7-10 tok/s). However, these ratios may be misleading - see Section 2.2 for why Storage Throughput can be unreliable at cpu_mem=0GB. - -**Better metric for cpu_mem=0GB:** Decode Bytes Read shows 2.62x differentiation with 100% win rate. - ---- - -## 6. Variance and Repeatability - -### 5.1 Coefficient of Variation by Configuration - -Variance (measured as CV = std/mean) is substantial: - -| Config Type | Typical CV | Implication | -|-------------|------------|-------------| -| Low concurrency (10 users) | ~52% | Moderate variance | -| Medium concurrency (50-100 users) | ~115-125% | High variance | -| High concurrency (200 users) | ~110-120% | High variance | - -This high variance means **multiple trials are essential**. A single run cannot reliably differentiate storage tiers. - -### 5.2 Trial Recommendations - -Based on the variance analysis: - -1. 
**Minimum 3 trials per configuration** for basic differentiation -2. **5+ trials recommended** for publication-quality results -3. Report **median** rather than mean to reduce outlier impact -4. Report **P95** and **P99** alongside mean for latency metrics - ---- - -## 7. Anomalies and Edge Cases - -### 7.1 Total I/O Time Paradox - -Total I/O Time shows a **0.71x** Fast/Slow ratio - meaning Fast appears *slower*. This is NOT a sampling artifact - it's expected behavior: - -**At cpu_mem=0GB:** -- Fast system reads **2.62x more bytes** from NVMe -- Therefore Fast accumulates **more Total I/O Time** (more operations × time per operation) -- This is why Storage Throughput (tokens / I/O time) shows only 1.1x - the numerator and denominator both scale up - -**The insight:** Total I/O Time is NOT a performance metric. A system that does **more work** in the same benchmark duration will have **higher** Total I/O Time. Use Decode Bytes Read or Wall-Clock Throughput instead. - -### 7.2 Cache Hit Rate Neutrality - -Cache Hit Rate shows minimal differentiation (Fast: 90%, Slow: 88%). This is expected - cache hit rate is primarily driven by workload access patterns, not storage speed. It's a configuration validation metric, not a performance differentiator. - ---- - -## 8. Conclusion - -The kv-cache.py benchmark **successfully differentiates storage performance tiers**. Key recommendations: - -**For Maximum Storage Stress (cpu_mem=0GB):** -1. **Primary metric: Decode Bytes Read** (2.62x differentiation, 100% win rate) -2. **Secondary metric: Wall-Clock Throughput** (2.43x differentiation, 100% win rate) -3. **DO NOT use Storage Throughput** at cpu_mem=0GB (shows only 1.1x - misleading) -4. **Use llama3.1-8b or mistral-7b** with 200 users for maximum aggregate throughput -5. **Use llama3.1-70b** for maximum per-request storage stress - -**For Storage Throughput Metric (cpu_mem=4GB):** -1. **Primary metric: Storage Throughput** (2.2x differentiation, 97% win rate) -2. 
**Use cpu_mem=4GB** - the Storage Throughput metric fails at cpu_mem=0GB
-3. **Use llama3.1-8b or mistral-7b** for best throughput differentiation
-
-**General:**
-- **Run 3-5 trials** per configuration to account for variance
-- **Use gen_mode=none** for pure I/O benchmarking
-- **Report median and P95** for latency metrics
-
-The benchmark is ready for MLPerf v3 submission with these configurations.
-
----
-
-## Appendix A: Statistical Summary
-
-### A.1 Storage Throughput (tok/s)
-
-| System | Min | Max | Mean | Std |
-|--------|-----|-----|------|-----|
-| Fast | 2.23 | 394.66 | 88.47 | 120.81 |
-| Slow | 0.71 | 182.87 | 41.56 | 51.59 |
-
-### A.2 Wall-Clock Throughput (tok/s)
-
-| System | Min | Max | Mean | Std |
-|--------|-----|-----|------|-----|
-| Fast | 88.72 | 1415.96 | 610.36 | 405.28 |
-| Slow | 37.09 | 785.52 | 290.02 | 199.18 |
-
-### A.3 Storage Latency P95
-
-| System | Min | Max | Mean | Std |
-|--------|-----|-----|------|-----|
-| Fast | 1,257 ms | 171,523 ms | 36,504 ms | 34,191 ms |
-| Slow | 2,669 ms | 255,381 ms | 45,091 ms | 43,469 ms |
-
----
-
-## Appendix B: Recommended Invocations
-
-### B.1 Comprehensive Sweep (Full Configuration Space)
-
-Run a full parameter sweep to characterize storage performance across configurations:
-
-```bash
-#!/bin/bash
-# Full benchmark sweep: 4 models x 4 CPU-memory sizes x 4 MCA values x
-# 4 user counts x 2 generation modes x 3 trials = 1,536 result files
-
-MODELS="llama3.1-8b mistral-7b llama3.1-70b llama2-7b"
-CPU_MEM="0 4 8 16"
-MCA="0 2 4 8"
-USERS="50 100 150 200"
-GEN_MODES="none realistic"
-DURATION=300
-TRIALS=3
-
-mkdir -p results
-
-for model in $MODELS; do
-  for cpu in $CPU_MEM; do
-    for mca in $MCA; do
-      for users in $USERS; do
-        for gen in $GEN_MODES; do
-          for trial in $(seq 1 $TRIALS); do
-            outfile="results/mlperf_${model}_cpu${cpu}GB_mca${mca}_gen${gen}_users${users}_trial${trial}.json"
-            echo "Running: $outfile"
-            python kv-cache.py \
-              --model $model \
-              --cpu-memory-gb $cpu \
-              --gpu-memory-gb 0 \
-              --max-concurrent-allocs $mca \
-              --users $users \
-              --duration $DURATION \
-              --generation-mode 
$gen \
-              --output $outfile
-          done
-        done
-      done
-    done
-  done
-done
-
-# Convert all results to XLSX
-python utils/json_to_xlsx.py results/ --output mlperf_storage_summary.xlsx
-```
-
-### B.2 Storage Tier Differentiation (Primary Use Case)
-
-For MLPerf v3 submissions comparing storage systems:
-
-```bash
-# Recommended: maximum storage differentiation.
-# Run 3-5 trials for statistical significance.
-for trial in 1 2 3 4 5; do
-  python kv-cache.py \
-    --model llama3.1-8b \
-    --cpu-memory-gb 4 \
-    --gpu-memory-gb 0 \
-    --max-concurrent-allocs 0 \
-    --users 100 \
-    --duration 300 \
-    --generation-mode none \
-    --output results/mlperf_storage_$(hostname)_trial${trial}.json
-done
-```
-
-**Why these parameters:**
-
-| Parameter | Value | Rationale |
-|-----------|-------|-----------|
-| `--model` | llama3.1-8b | Best differentiation (2.31x ratio) |
-| `--cpu-memory-gb` | 4 | Forces NVMe usage while maintaining differentiation |
-| `--gpu-memory-gb` | 0 | Excludes GPU from cache hierarchy |
-| `--max-concurrent-allocs` | 0 | Unlimited parallelism for maximum throughput |
-| `--users` | 100 | Balance between load and variance |
-| `--duration` | 300 | 5 minutes for stable metrics |
-| `--generation-mode` | none | Pure I/O benchmark, no token generation delay |
-
-### B.3 Large Model for Maximum Storage Pressure
-
-Larger models have larger KV cache blocks, which stress storage bandwidth more effectively:
-
-```bash
-# Llama3.1-70b: ~2.5x larger KV cache per token than 8b models (320KB vs 128KB)
-# Better for systems with high-bandwidth storage (NVMe, CXL)
-for trial in 1 2 3; do
-  python kv-cache.py \
-    --model llama3.1-70b \
-    --cpu-memory-gb 4 \
-    --gpu-memory-gb 0 \
-    --max-concurrent-allocs 4 \
-    --users 50 \
-    --duration 300 \
-    
--generation-mode none \ - --output results/mlperf_70b_$(hostname)_trial${trial}.json -done -``` - -**Why llama3.1-70b matters:** -| Model | KV Cache per Token | Storage I/O per Request | Use Case | -|-------|-------------------|------------------------|----------| -| llama3.1-8b | 128 KB | Lower | Best differentiation ratio | -| llama3.1-70b | 320 KB | Higher | Maximum storage bandwidth stress | -| mistral-7b | 128 KB | Lower | Alternative to 8b | -| llama2-7b | 512 KB | Highest | MHA architecture (4x more than GQA) | - -The 70b model generates ~2.5x more storage I/O per token than 8b (due to 80 vs 32 layers), making it ideal for: -- High-bandwidth NVMe arrays (PCIe 5.0, multiple drives) -- CXL memory expanders -- Enterprise storage systems where small I/Os don't saturate bandwidth - -**Recommended: Run both 8b and 70b models** to characterize storage across different I/O sizes. - -### B.4 Alternative Models - -```bash -# Mistral-7b: Similar differentiation to llama3.1-8b -python kv-cache.py --model mistral-7b --cpu-memory-gb 4 --users 100 --duration 300 --generation-mode none - -# Llama2-7b: Older model, good differentiation -python kv-cache.py --model llama2-7b --cpu-memory-gb 4 --users 100 --duration 300 --generation-mode none -``` - -### B.5 Realistic Workload Simulation - -For benchmarks that include token generation timing: - -```bash -python kv-cache.py \ - --model llama3.1-8b \ - --cpu-memory-gb 4 \ - --gpu-memory-gb 0 \ - --max-concurrent-allocs 4 \ - --users 50 \ - --duration 300 \ - --generation-mode realistic \ - --output results/mlperf_realistic_$(hostname).json -``` - -### B.6 Stress Test (Maximum I/O Load) - -```bash -python kv-cache.py \ - --model llama3.1-8b \ - --cpu-memory-gb 0 \ - --gpu-memory-gb 0 \ - --max-concurrent-allocs 16 \ - --users 200 \ - --duration 600 \ - --generation-mode none \ - --output results/mlperf_stress_$(hostname).json -``` - -**Note:** cpu_mem=0GB forces all I/O through NVMe, achieving: -- **Peak throughput: ~11 GB/s** 
(78% of theoretical 14 GB/s) -- **Decode Bytes Read differentiation: 2.62x** (strongest of all metrics) -- **100% Fast win rate** for I/O volume metrics - -**Important:** At cpu_mem=0GB, use **Decode Bytes Read** or **Wall-Clock Throughput** as your metric, NOT Storage Throughput (which shows only 1.1x due to I/O time normalization). - -### B.7 Quick Validation Run - -For rapid system validation before full benchmark: - -```bash -python kv-cache.py \ - --model mistral-7b \ - --cpu-memory-gb 4 \ - --users 50 \ - --duration 60 \ - --generation-mode none \ - --output results/quick_validation.json -``` - -### B.8 Post-Processing Results - -Convert JSON results to XLSX for analysis: - -```bash -python utils/json_to_xlsx.py results/ --output mlperf_storage_summary.xlsx -``` - ---- - -## Appendix C: Side-by-Side Comparison (All 220 Matched Configurations) - -This appendix provides the complete side-by-side comparison of all 220 matched configurations -between the Fast and Slow systems. The tables are organized by metric category. - -**Legend:** -- **Model:** L2-7b = llama2-7b, L3.1-8b = llama3.1-8b, L3.1-70b = llama3.1-70b, M-7b = mistral-7b -- **CPU:** CPU memory limit in GB (--cpu-memory-gb) -- **MCA:** Max concurrent allocations (--max-concurrent-allocs) -- **Gen:** Generation mode (none/real = realistic) -- **Ratio:** For throughput, Fast/Slow (higher = Fast wins). For latency, Slow/Fast (higher = Fast wins). 
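The ratio conventions in the legend can be reproduced from any pair of matched result files. A minimal Python sketch (the JSON field names here are hypothetical - adapt them to the actual kv-cache.py output schema):

```python
def tier_ratios(fast: dict, slow: dict) -> dict:
    """Fast-vs-Slow comparison using the legend's conventions:
    throughput ratio = Fast / Slow, latency ratio = Slow / Fast,
    so in both cases a higher ratio means the Fast system wins."""
    return {
        "stor_tput_ratio": fast["storage_throughput_tok_s"] / slow["storage_throughput_tok_s"],
        "p95_latency_ratio": slow["storage_latency_p95_ms"] / fast["storage_latency_p95_ms"],
    }

# Toy check against the L2-7b / CPU=4 / MCA=0 / none / 50-user row of C.5/C.6:
fast = {"storage_throughput_tok_s": 25.2, "storage_latency_p95_ms": 24705}
slow = {"storage_throughput_tok_s": 16.1, "storage_latency_p95_ms": 25892}
print(tier_ratios(fast, slow))  # ratios round to 1.57x and 1.05x, as tabulated
```

In practice, load the two dicts with `json.load` from the `--output` files of matched Fast-tier and Slow-tier runs.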
- -### C.1 Summary by Model - -| Model | Configs | Avg Stor Tput Ratio | Avg WC Tput Ratio | Avg P95 Lat Ratio | Avg P99 Lat Ratio | -|-------|---------|---------------------|-------------------|-------------------|-------------------| -| L2-7b | 40 | 1.80x | 2.10x | 1.70x | 1.72x | -| L3.1-8b | 48 | 2.02x | 2.23x | 1.94x | 1.76x | -| L3.1-70b | 84 | 1.74x | 2.19x | 1.49x | 1.50x | -| M-7b | 48 | 1.98x | 2.18x | 1.89x | 1.78x | - -### C.2 Summary by CPU Memory - -| CPU Mem | Configs | Avg Stor Tput Ratio | Avg WC Tput Ratio | Avg P95 Lat Ratio | -|---------|---------|---------------------|-------------------|-------------------| -| 0 GB | 111 | 1.55x | 2.43x | 1.22x | -| 4 GB | 109 | 2.19x | 1.92x | 2.22x | - -### C.3 Summary by Generation Mode - -| Gen Mode | Configs | Avg Stor Tput Ratio | Avg WC Tput Ratio | -|----------|---------|---------------------|-------------------| -| none | 110 | 1.84x | 2.24x | -| realistic | 110 | 1.89x | 2.13x | - -### C.4 Summary by I/O Volume (Prefill/Decode) - -I/O Volume metrics show **100% Fast win rate** across all configurations, making them robust differentiation metrics when benchmark duration is standardized. 
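Per-group average ratios like those in this appendix's summary tables can be computed with the standard library alone; a sketch with hypothetical row and field names:

```python
from collections import defaultdict
from statistics import mean

def avg_ratio_by(rows, key):
    """Average a per-configuration Fast/Slow ratio within each group
    (e.g. by model or by cpu_mem), as in the summary tables."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["ratio"])
    return {group: mean(vals) for group, vals in groups.items()}

# Made-up rows for illustration (not benchmark data):
rows = [
    {"cpu_mem_gb": 0, "ratio": 2.6},
    {"cpu_mem_gb": 0, "ratio": 2.2},
    {"cpu_mem_gb": 4, "ratio": 1.9},
]
print(avg_ratio_by(rows, "cpu_mem_gb"))
```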
- -**By Model:** - -| Model | Configs | Avg Prefill Ratio | Avg Decode Ratio | -|-------|---------|-------------------|------------------| -| L2-7b | 40 | 1.82x | 2.29x | -| L3.1-8b | 48 | 2.12x | 2.27x | -| L3.1-70b | 84 | 1.90x | 2.37x | -| M-7b | 48 | 2.09x | 2.23x | - -**By CPU Memory:** - -| CPU Mem | Configs | Avg Prefill Ratio | Avg Decode Ratio | -|---------|---------|-------------------|------------------| -| 0 GB | 111 | 2.14x | 2.62x | -| 4 GB | 109 | 1.80x | 1.98x | - -**By Generation Mode:** - -| Gen Mode | Configs | Avg Prefill Ratio | Avg Decode Ratio | -|----------|---------|-------------------|------------------| -| none | 110 | 2.01x | 2.33x | -| realistic | 110 | 1.94x | 2.27x | - -**Key Finding:** Unlike Storage Throughput (which shows stronger differentiation at cpu_mem=4GB), I/O Volume shows **stronger differentiation at cpu_mem=0GB** (2.62x Decode vs 1.98x). This is because cpu_mem=0GB forces all tokens through NVMe, maximizing storage I/O volume differentiation. - -### C.5 Full Throughput Comparison - -Storage Throughput (tok/s) and Wall-Clock Throughput (tok/s) for all 220 matched configurations. 
- -| Model | CPU | MCA | Gen | Users | Stor Fast | Stor Slow | Ratio | WC Fast | WC Slow | Ratio | -|-------|-----|-----|-----|-------|-----------|-----------|-------|---------|---------|-------| -| L2-7b | 0 | 0 | none | 50 | 4.6 | 2.5 | 1.85x | 179 | 66 | 2.70x | -| L2-7b | 0 | 0 | none | 100 | 2.8 | 1.0 | 2.89x | 243 | 57 | 4.27x | -| L2-7b | 0 | 0 | none | 150 | 2.2 | 0.7 | 3.16x | 297 | 64 | 4.64x | -| L2-7b | 0 | 0 | real | 50 | 4.9 | 2.6 | 1.86x | 163 | 81 | 2.02x | -| L2-7b | 0 | 0 | real | 100 | 3.3 | 1.3 | 2.60x | 257 | 63 | 4.05x | -| L2-7b | 0 | 2 | none | 50 | 9.4 | 7.2 | 1.30x | 158 | 82 | 1.92x | -| L2-7b | 0 | 2 | none | 100 | 6.4 | 6.0 | 1.07x | 240 | 130 | 1.85x | -| L2-7b | 0 | 2 | none | 150 | 7.4 | 6.1 | 1.21x | 355 | 179 | 1.98x | -| L2-7b | 0 | 2 | none | 200 | 5.5 | 5.6 | 0.98x | 400 | 194 | 2.07x | -| L2-7b | 0 | 2 | real | 50 | 10.4 | 8.7 | 1.19x | 163 | 79 | 2.06x | -| L2-7b | 0 | 2 | real | 100 | 6.8 | 6.6 | 1.02x | 229 | 131 | 1.74x | -| L2-7b | 0 | 2 | real | 150 | 6.9 | 7.3 | 0.95x | 324 | 158 | 2.05x | -| L2-7b | 0 | 2 | real | 200 | 6.4 | 6.5 | 0.99x | 374 | 172 | 2.18x | -| L2-7b | 0 | 4 | none | 50 | 7.2 | 7.4 | 0.96x | 179 | 83 | 2.14x | -| L2-7b | 0 | 4 | none | 100 | 4.5 | 5.0 | 0.89x | 286 | 120 | 2.38x | -| L2-7b | 0 | 4 | none | 150 | 4.5 | 5.6 | 0.80x | 370 | 190 | 1.95x | -| L2-7b | 0 | 4 | none | 200 | 4.6 | 5.1 | 0.92x | 444 | 174 | 2.56x | -| L2-7b | 0 | 4 | real | 50 | 7.6 | 7.7 | 0.99x | 169 | 80 | 2.13x | -| L2-7b | 0 | 4 | real | 100 | 4.7 | 6.3 | 0.74x | 271 | 139 | 1.94x | -| L2-7b | 0 | 4 | real | 150 | 4.0 | 5.6 | 0.71x | 329 | 168 | 1.96x | -| L2-7b | 0 | 4 | real | 200 | 3.2 | 5.8 | 0.55x | 427 | 191 | 2.24x | -| L2-7b | 4 | 0 | none | 50 | 25.2 | 16.1 | 1.57x | 212 | 110 | 1.93x | -| L2-7b | 4 | 0 | real | 50 | 39.3 | 28.7 | 1.37x | 203 | 109 | 1.87x | -| L2-7b | 4 | 0 | real | 100 | 26.4 | 12.0 | 2.20x | 378 | 105 | 3.60x | -| L2-7b | 4 | 2 | none | 50 | 37.5 | 22.1 | 1.69x | 222 | 120 | 1.85x | -| L2-7b | 4 
| 2 | none | 100 | 43.2 | 23.1 | 1.87x | 294 | 190 | 1.55x | -| L2-7b | 4 | 2 | none | 150 | 68.8 | 25.4 | 2.71x | 369 | 268 | 1.38x | -| L2-7b | 4 | 2 | none | 200 | 68.1 | 23.2 | 2.93x | 445 | 290 | 1.54x | -| L2-7b | 4 | 2 | real | 50 | 44.7 | 23.7 | 1.89x | 227 | 135 | 1.68x | -| L2-7b | 4 | 2 | real | 100 | 55.9 | 19.3 | 2.90x | 300 | 183 | 1.64x | -| L2-7b | 4 | 2 | real | 150 | 68.1 | 34.6 | 1.97x | 347 | 277 | 1.25x | -| L2-7b | 4 | 2 | real | 200 | 69.8 | 23.0 | 3.03x | 415 | 276 | 1.50x | -| L2-7b | 4 | 4 | none | 50 | 50.2 | 22.7 | 2.21x | 245 | 109 | 2.25x | -| L2-7b | 4 | 4 | none | 100 | 48.2 | 21.9 | 2.20x | 342 | 223 | 1.54x | -| L2-7b | 4 | 4 | none | 150 | 49.4 | 22.1 | 2.23x | 361 | 238 | 1.52x | -| L2-7b | 4 | 4 | none | 200 | 53.0 | 22.7 | 2.34x | 433 | 282 | 1.53x | -| L2-7b | 4 | 4 | real | 50 | 53.1 | 28.0 | 1.90x | 233 | 139 | 1.68x | -| L2-7b | 4 | 4 | real | 100 | 68.1 | 20.4 | 3.33x | 359 | 191 | 1.88x | -| L2-7b | 4 | 4 | real | 150 | 79.1 | 28.2 | 2.80x | 396 | 244 | 1.62x | -| L2-7b | 4 | 4 | real | 200 | 85.8 | 26.3 | 3.26x | 427 | 326 | 1.31x | -| L3.1-70b | 0 | 0 | none | 10 | 14.3 | 7.4 | 1.94x | 116 | 49 | 2.37x | -| L3.1-70b | 0 | 0 | none | 20 | 11.3 | 5.0 | 2.28x | 178 | 55 | 3.21x | -| L3.1-70b | 0 | 0 | none | 30 | 9.1 | 5.4 | 1.69x | 212 | 85 | 2.50x | -| L3.1-70b | 0 | 0 | none | 40 | 8.1 | 4.5 | 1.78x | 227 | 117 | 1.93x | -| L3.1-70b | 0 | 0 | none | 50 | 6.4 | 3.7 | 1.76x | 284 | 129 | 2.21x | -| L3.1-70b | 0 | 0 | none | 60 | 6.4 | 2.9 | 2.24x | 328 | 114 | 2.88x | -| L3.1-70b | 0 | 0 | none | 70 | 5.5 | 2.8 | 1.96x | 345 | 141 | 2.44x | -| L3.1-70b | 0 | 0 | real | 10 | 17.3 | 9.5 | 1.83x | 91 | 37 | 2.46x | -| L3.1-70b | 0 | 0 | real | 20 | 15.2 | 5.9 | 2.59x | 179 | 72 | 2.51x | -| L3.1-70b | 0 | 0 | real | 30 | 10.1 | 4.9 | 2.05x | 185 | 93 | 1.97x | -| L3.1-70b | 0 | 0 | real | 40 | 8.6 | 4.5 | 1.91x | 207 | 109 | 1.91x | -| L3.1-70b | 0 | 0 | real | 50 | 7.2 | 3.8 | 1.86x | 239 | 118 | 2.03x | -| L3.1-70b | 0 | 0 
| real | 60 | 7.2 | 3.1 | 2.34x | 269 | 139 | 1.94x | -| L3.1-70b | 0 | 0 | real | 70 | 6.2 | 2.9 | 2.16x | 316 | 141 | 2.25x | -| L3.1-70b | 0 | 2 | none | 10 | 13.5 | 6.8 | 1.99x | 101 | 37 | 2.72x | -| L3.1-70b | 0 | 2 | none | 20 | 12.2 | 8.9 | 1.36x | 196 | 75 | 2.60x | -| L3.1-70b | 0 | 2 | none | 30 | 9.9 | 10.0 | 0.99x | 214 | 78 | 2.75x | -| L3.1-70b | 0 | 2 | none | 40 | 9.3 | 11.5 | 0.81x | 222 | 99 | 2.26x | -| L3.1-70b | 0 | 2 | none | 50 | 8.7 | 10.7 | 0.81x | 267 | 116 | 2.31x | -| L3.1-70b | 0 | 2 | none | 60 | 8.2 | 9.7 | 0.84x | 297 | 121 | 2.45x | -| L3.1-70b | 0 | 2 | none | 70 | 8.8 | 10.2 | 0.86x | 352 | 181 | 1.95x | -| L3.1-70b | 0 | 2 | real | 10 | 16.7 | 7.7 | 2.17x | 89 | 39 | 2.26x | -| L3.1-70b | 0 | 2 | real | 20 | 15.7 | 9.7 | 1.62x | 164 | 72 | 2.26x | -| L3.1-70b | 0 | 2 | real | 30 | 11.2 | 9.2 | 1.22x | 195 | 85 | 2.30x | -| L3.1-70b | 0 | 2 | real | 40 | 10.2 | 10.1 | 1.01x | 205 | 104 | 1.97x | -| L3.1-70b | 0 | 2 | real | 50 | 9.7 | 10.5 | 0.93x | 250 | 110 | 2.28x | -| L3.1-70b | 0 | 2 | real | 60 | 9.5 | 8.9 | 1.07x | 274 | 135 | 2.03x | -| L3.1-70b | 0 | 2 | real | 70 | 9.5 | 8.5 | 1.12x | 313 | 145 | 2.16x | -| L3.1-70b | 0 | 4 | none | 10 | 14.0 | 6.8 | 2.06x | 112 | 49 | 2.31x | -| L3.1-70b | 0 | 4 | none | 20 | 12.0 | 8.6 | 1.39x | 182 | 65 | 2.79x | -| L3.1-70b | 0 | 4 | none | 30 | 8.6 | 8.5 | 1.01x | 193 | 93 | 2.08x | -| L3.1-70b | 0 | 4 | none | 40 | 7.4 | 8.5 | 0.87x | 227 | 101 | 2.24x | -| L3.1-70b | 0 | 4 | none | 50 | 8.1 | 9.0 | 0.90x | 271 | 123 | 2.21x | -| L3.1-70b | 0 | 4 | none | 60 | 7.0 | 8.1 | 0.87x | 328 | 123 | 2.66x | -| L3.1-70b | 0 | 4 | none | 70 | 7.3 | 7.0 | 1.04x | 380 | 156 | 2.44x | -| L3.1-70b | 0 | 4 | real | 10 | 18.1 | 6.7 | 2.70x | 97 | 38 | 2.56x | -| L3.1-70b | 0 | 4 | real | 20 | 13.4 | 7.0 | 1.91x | 150 | 70 | 2.13x | -| L3.1-70b | 0 | 4 | real | 30 | 10.8 | 7.8 | 1.39x | 198 | 83 | 2.39x | -| L3.1-70b | 0 | 4 | real | 40 | 8.2 | 8.3 | 0.98x | 207 | 95 | 2.17x | -| L3.1-70b | 0 | 4 
| real | 50 | 8.4 | 8.8 | 0.95x | 245 | 118 | 2.09x | -| L3.1-70b | 0 | 4 | real | 60 | 7.5 | 7.5 | 1.00x | 263 | 122 | 2.15x | -| L3.1-70b | 0 | 4 | real | 70 | 7.7 | 7.1 | 1.07x | 350 | 154 | 2.27x | -| L3.1-70b | 4 | 0 | none | 10 | 45.9 | 26.3 | 1.75x | 195 | 100 | 1.96x | -| L3.1-70b | 4 | 0 | none | 20 | 36.2 | 29.9 | 1.21x | 291 | 113 | 2.57x | -| L3.1-70b | 4 | 0 | none | 30 | 26.5 | 34.4 | 0.77x | 274 | 190 | 1.44x | -| L3.1-70b | 4 | 0 | none | 40 | 38.9 | 26.2 | 1.48x | 301 | 215 | 1.40x | -| L3.1-70b | 4 | 0 | none | 50 | 59.3 | 29.4 | 2.01x | 395 | 225 | 1.75x | -| L3.1-70b | 4 | 0 | none | 60 | 62.5 | 33.5 | 1.86x | 422 | 192 | 2.20x | -| L3.1-70b | 4 | 0 | none | 70 | 79.9 | 36.8 | 2.17x | 497 | 232 | 2.14x | -| L3.1-70b | 4 | 0 | real | 10 | 56.3 | 21.4 | 2.63x | 158 | 64 | 2.47x | -| L3.1-70b | 4 | 0 | real | 20 | 36.1 | 26.6 | 1.36x | 266 | 115 | 2.31x | -| L3.1-70b | 4 | 0 | real | 30 | 38.8 | 39.0 | 0.99x | 351 | 137 | 2.56x | -| L3.1-70b | 4 | 0 | real | 40 | 23.8 | 41.8 | 0.57x | 275 | 176 | 1.57x | -| L3.1-70b | 4 | 0 | real | 50 | 58.3 | 40.1 | 1.46x | 403 | 183 | 2.21x | -| L3.1-70b | 4 | 0 | real | 60 | 67.3 | 28.9 | 2.33x | 405 | 172 | 2.36x | -| L3.1-70b | 4 | 0 | real | 70 | 76.4 | 33.5 | 2.28x | 471 | 199 | 2.37x | -| L3.1-70b | 4 | 2 | none | 10 | 42.7 | 17.2 | 2.48x | 183 | 70 | 2.60x | -| L3.1-70b | 4 | 2 | none | 20 | 61.0 | 25.5 | 2.39x | 299 | 136 | 2.20x | -| L3.1-70b | 4 | 2 | none | 30 | 54.6 | 33.7 | 1.62x | 306 | 168 | 1.82x | -| L3.1-70b | 4 | 2 | none | 40 | 78.9 | 40.5 | 1.95x | 337 | 178 | 1.89x | -| L3.1-70b | 4 | 2 | none | 50 | 83.0 | 32.8 | 2.53x | 346 | 181 | 1.91x | -| L3.1-70b | 4 | 2 | none | 60 | 73.7 | 38.8 | 1.90x | 357 | 174 | 2.05x | -| L3.1-70b | 4 | 2 | none | 70 | 95.3 | 43.4 | 2.19x | 407 | 221 | 1.84x | -| L3.1-70b | 4 | 2 | real | 10 | 40.1 | 22.3 | 1.80x | 141 | 81 | 1.74x | -| L3.1-70b | 4 | 2 | real | 20 | 76.4 | 34.8 | 2.20x | 272 | 141 | 1.93x | -| L3.1-70b | 4 | 2 | real | 30 | 69.9 | 34.7 | 
2.02x | 290 | 152 | 1.90x | -| L3.1-70b | 4 | 2 | real | 40 | 67.6 | 35.1 | 1.93x | 285 | 167 | 1.71x | -| L3.1-70b | 4 | 2 | real | 50 | 74.9 | 32.5 | 2.31x | 321 | 175 | 1.84x | -| L3.1-70b | 4 | 2 | real | 60 | 66.0 | 44.0 | 1.50x | 353 | 197 | 1.79x | -| L3.1-70b | 4 | 2 | real | 70 | 91.7 | 37.5 | 2.44x | 389 | 198 | 1.96x | -| L3.1-70b | 4 | 4 | none | 10 | 40.1 | 16.9 | 2.37x | 212 | 75 | 2.84x | -| L3.1-70b | 4 | 4 | none | 20 | 94.1 | 29.0 | 3.25x | 331 | 127 | 2.60x | -| L3.1-70b | 4 | 4 | none | 30 | 41.5 | 31.3 | 1.33x | 335 | 151 | 2.22x | -| L3.1-70b | 4 | 4 | none | 40 | 40.5 | 26.9 | 1.51x | 327 | 180 | 1.82x | -| L3.1-70b | 4 | 4 | none | 50 | 92.1 | 30.7 | 3.01x | 399 | 193 | 2.07x | -| L3.1-70b | 4 | 4 | none | 60 | 61.6 | 28.3 | 2.17x | 384 | 191 | 2.01x | -| L3.1-70b | 4 | 4 | none | 70 | 87.3 | 38.9 | 2.25x | 433 | 211 | 2.06x | -| L3.1-70b | 4 | 4 | real | 10 | 44.7 | 16.5 | 2.72x | 152 | 63 | 2.40x | -| L3.1-70b | 4 | 4 | real | 20 | 84.8 | 28.7 | 2.95x | 294 | 127 | 2.31x | -| L3.1-70b | 4 | 4 | real | 30 | 54.5 | 25.5 | 2.13x | 311 | 144 | 2.16x | -| L3.1-70b | 4 | 4 | real | 40 | 46.3 | 34.3 | 1.35x | 318 | 181 | 1.76x | -| L3.1-70b | 4 | 4 | real | 50 | 73.7 | 34.1 | 2.16x | 355 | 167 | 2.12x | -| L3.1-70b | 4 | 4 | real | 60 | 72.6 | 49.8 | 1.46x | 366 | 187 | 1.96x | -| L3.1-70b | 4 | 4 | real | 70 | 101.8 | 44.3 | 2.30x | 441 | 224 | 1.97x | -| L3.1-8b | 0 | 0 | none | 50 | 14.4 | 5.6 | 2.57x | 650 | 251 | 2.58x | -| L3.1-8b | 0 | 0 | none | 100 | 10.2 | 3.6 | 2.82x | 958 | 388 | 2.47x | -| L3.1-8b | 0 | 0 | none | 150 | 9.0 | 2.6 | 3.39x | 1222 | 458 | 2.67x | -| L3.1-8b | 0 | 0 | none | 200 | 7.5 | 2.1 | 3.57x | 1367 | 506 | 2.70x | -| L3.1-8b | 0 | 0 | real | 50 | 18.9 | 6.8 | 2.79x | 553 | 248 | 2.23x | -| L3.1-8b | 0 | 0 | real | 100 | 11.5 | 3.9 | 2.93x | 817 | 372 | 2.19x | -| L3.1-8b | 0 | 0 | real | 150 | 9.9 | 2.8 | 3.56x | 1076 | 430 | 2.50x | -| L3.1-8b | 0 | 0 | real | 200 | 8.0 | 2.2 | 3.68x | 1204 | 483 | 2.49x | -| 
L3.1-8b | 0 | 2 | none | 50 | 16.4 | 13.9 | 1.18x | 633 | 259 | 2.44x | -| L3.1-8b | 0 | 2 | none | 100 | 12.8 | 13.9 | 0.92x | 889 | 347 | 2.56x | -| L3.1-8b | 0 | 2 | none | 150 | 13.3 | 18.6 | 0.72x | 1120 | 438 | 2.55x | -| L3.1-8b | 0 | 2 | none | 200 | 14.2 | 21.3 | 0.67x | 1156 | 488 | 2.37x | -| L3.1-8b | 0 | 2 | real | 50 | 20.0 | 17.0 | 1.18x | 562 | 217 | 2.59x | -| L3.1-8b | 0 | 2 | real | 100 | 15.5 | 13.8 | 1.12x | 880 | 315 | 2.80x | -| L3.1-8b | 0 | 2 | real | 150 | 13.6 | 20.9 | 0.65x | 1072 | 429 | 2.50x | -| L3.1-8b | 0 | 2 | real | 200 | 14.0 | 18.7 | 0.75x | 1131 | 484 | 2.34x | -| L3.1-8b | 0 | 4 | none | 50 | 15.8 | 11.3 | 1.40x | 689 | 264 | 2.61x | -| L3.1-8b | 0 | 4 | none | 100 | 11.5 | 10.9 | 1.05x | 980 | 365 | 2.68x | -| L3.1-8b | 0 | 4 | none | 150 | 10.6 | 14.9 | 0.71x | 1246 | 441 | 2.82x | -| L3.1-8b | 0 | 4 | none | 200 | 9.5 | 14.8 | 0.64x | 1376 | 484 | 2.84x | -| L3.1-8b | 0 | 4 | real | 50 | 19.4 | 11.7 | 1.66x | 573 | 222 | 2.58x | -| L3.1-8b | 0 | 4 | real | 100 | 12.7 | 10.8 | 1.18x | 844 | 330 | 2.56x | -| L3.1-8b | 0 | 4 | real | 150 | 11.7 | 13.6 | 0.86x | 1099 | 405 | 2.71x | -| L3.1-8b | 0 | 4 | real | 200 | 10.4 | 14.1 | 0.73x | 1275 | 474 | 2.69x | -| L3.1-8b | 4 | 0 | none | 50 | 236.4 | 111.0 | 2.13x | 1037 | 521 | 1.99x | -| L3.1-8b | 4 | 0 | none | 100 | 246.8 | 98.1 | 2.52x | 1269 | 620 | 2.05x | -| L3.1-8b | 4 | 0 | none | 150 | 257.7 | 89.7 | 2.87x | 1267 | 670 | 1.89x | -| L3.1-8b | 4 | 0 | none | 200 | 177.4 | 91.3 | 1.94x | 1402 | 763 | 1.84x | -| L3.1-8b | 4 | 0 | real | 50 | 261.3 | 107.4 | 2.43x | 905 | 472 | 1.92x | -| L3.1-8b | 4 | 0 | real | 100 | 257.5 | 94.3 | 2.73x | 1190 | 580 | 2.05x | -| L3.1-8b | 4 | 0 | real | 150 | 262.4 | 95.0 | 2.76x | 1232 | 628 | 1.96x | -| L3.1-8b | 4 | 0 | real | 200 | 188.8 | 88.2 | 2.14x | 1340 | 786 | 1.71x | -| L3.1-8b | 4 | 2 | none | 50 | 285.6 | 122.8 | 2.33x | 880 | 433 | 2.03x | -| L3.1-8b | 4 | 2 | none | 100 | 341.6 | 147.3 | 2.32x | 1060 | 575 | 1.84x | -| 
L3.1-8b | 4 | 2 | none | 150 | 394.7 | 182.9 | 2.16x | 1155 | 613 | 1.88x | -| L3.1-8b | 4 | 2 | none | 200 | 388.5 | 174.9 | 2.22x | 1198 | 663 | 1.81x | -| L3.1-8b | 4 | 2 | real | 50 | 314.8 | 132.0 | 2.39x | 892 | 443 | 2.01x | -| L3.1-8b | 4 | 2 | real | 100 | 315.3 | 156.8 | 2.01x | 995 | 556 | 1.79x | -| L3.1-8b | 4 | 2 | real | 150 | 367.9 | 162.4 | 2.27x | 1047 | 595 | 1.76x | -| L3.1-8b | 4 | 2 | real | 200 | 382.5 | 182.5 | 2.10x | 1121 | 640 | 1.75x | -| L3.1-8b | 4 | 4 | none | 50 | 301.9 | 119.8 | 2.52x | 904 | 446 | 2.03x | -| L3.1-8b | 4 | 4 | none | 100 | 311.8 | 142.4 | 2.19x | 1048 | 538 | 1.95x | -| L3.1-8b | 4 | 4 | none | 150 | 372.2 | 144.9 | 2.57x | 1160 | 603 | 1.92x | -| L3.1-8b | 4 | 4 | none | 200 | 382.4 | 161.4 | 2.37x | 1240 | 671 | 1.85x | -| L3.1-8b | 4 | 4 | real | 50 | 302.9 | 121.3 | 2.50x | 832 | 412 | 2.02x | -| L3.1-8b | 4 | 4 | real | 100 | 323.4 | 143.3 | 2.26x | 1027 | 554 | 1.86x | -| L3.1-8b | 4 | 4 | real | 150 | 347.3 | 171.6 | 2.02x | 1083 | 633 | 1.71x | -| L3.1-8b | 4 | 4 | real | 200 | 379.3 | 159.7 | 2.37x | 1191 | 653 | 1.82x | -| M-7b | 0 | 0 | none | 50 | 14.2 | 6.2 | 2.30x | 632 | 300 | 2.11x | -| M-7b | 0 | 0 | none | 100 | 10.1 | 3.5 | 2.88x | 942 | 366 | 2.57x | -| M-7b | 0 | 0 | none | 150 | 9.2 | 2.7 | 3.42x | 1229 | 470 | 2.61x | -| M-7b | 0 | 0 | none | 200 | 7.5 | 2.0 | 3.80x | 1357 | 474 | 2.86x | -| M-7b | 0 | 0 | real | 50 | 18.3 | 6.5 | 2.81x | 553 | 246 | 2.25x | -| M-7b | 0 | 0 | real | 100 | 10.9 | 4.0 | 2.73x | 813 | 352 | 2.31x | -| M-7b | 0 | 0 | real | 150 | 9.7 | 2.8 | 3.50x | 1072 | 418 | 2.56x | -| M-7b | 0 | 0 | real | 200 | 8.3 | 2.3 | 3.56x | 1250 | 530 | 2.36x | -| M-7b | 0 | 2 | none | 50 | 15.7 | 13.1 | 1.20x | 629 | 261 | 2.41x | -| M-7b | 0 | 2 | none | 100 | 12.8 | 13.2 | 0.97x | 922 | 318 | 2.90x | -| M-7b | 0 | 2 | none | 150 | 13.4 | 18.3 | 0.73x | 1129 | 435 | 2.60x | -| M-7b | 0 | 2 | none | 200 | 15.0 | 15.1 | 0.99x | 1215 | 499 | 2.43x | -| M-7b | 0 | 2 | real | 50 | 20.6 | 
15.0 | 1.37x | 558 | 248 | 2.25x | -| M-7b | 0 | 2 | real | 100 | 14.3 | 13.6 | 1.06x | 864 | 372 | 2.33x | -| M-7b | 0 | 2 | real | 150 | 14.6 | 21.1 | 0.69x | 1014 | 413 | 2.45x | -| M-7b | 0 | 2 | real | 200 | 13.0 | 20.6 | 0.63x | 1225 | 463 | 2.64x | -| M-7b | 0 | 4 | none | 50 | 14.0 | 11.0 | 1.28x | 619 | 267 | 2.32x | -| M-7b | 0 | 4 | none | 100 | 10.4 | 11.5 | 0.90x | 911 | 387 | 2.35x | -| M-7b | 0 | 4 | none | 150 | 10.6 | 14.8 | 0.71x | 1210 | 420 | 2.88x | -| M-7b | 0 | 4 | none | 200 | 9.3 | 13.6 | 0.68x | 1348 | 494 | 2.73x | -| M-7b | 0 | 4 | real | 50 | 19.0 | 12.8 | 1.48x | 552 | 224 | 2.46x | -| M-7b | 0 | 4 | real | 100 | 13.2 | 11.9 | 1.11x | 863 | 323 | 2.67x | -| M-7b | 0 | 4 | real | 150 | 11.7 | 16.0 | 0.73x | 1111 | 444 | 2.50x | -| M-7b | 0 | 4 | real | 200 | 10.1 | 12.0 | 0.84x | 1263 | 461 | 2.74x | -| M-7b | 4 | 0 | none | 50 | 241.3 | 105.0 | 2.30x | 973 | 499 | 1.95x | -| M-7b | 4 | 0 | none | 100 | 244.3 | 98.5 | 2.48x | 1176 | 625 | 1.88x | -| M-7b | 4 | 0 | none | 150 | 246.2 | 95.6 | 2.57x | 1264 | 693 | 1.82x | -| M-7b | 4 | 0 | none | 200 | 142.8 | 96.2 | 1.48x | 1416 | 763 | 1.86x | -| M-7b | 4 | 0 | real | 50 | 262.5 | 98.2 | 2.67x | 937 | 480 | 1.95x | -| M-7b | 4 | 0 | real | 100 | 225.1 | 94.2 | 2.39x | 1076 | 564 | 1.91x | -| M-7b | 4 | 0 | real | 150 | 243.2 | 101.1 | 2.41x | 1206 | 689 | 1.75x | -| M-7b | 4 | 0 | real | 200 | 197.7 | 79.9 | 2.47x | 1323 | 735 | 1.80x | -| M-7b | 4 | 2 | none | 50 | 299.7 | 130.8 | 2.29x | 822 | 432 | 1.90x | -| M-7b | 4 | 2 | none | 100 | 339.4 | 148.3 | 2.29x | 1040 | 542 | 1.92x | -| M-7b | 4 | 2 | none | 150 | 376.9 | 164.4 | 2.29x | 1144 | 622 | 1.84x | -| M-7b | 4 | 2 | none | 200 | 383.0 | 152.7 | 2.51x | 1177 | 652 | 1.80x | -| M-7b | 4 | 2 | real | 50 | 290.4 | 128.1 | 2.27x | 820 | 436 | 1.88x | -| M-7b | 4 | 2 | real | 100 | 318.1 | 157.3 | 2.02x | 995 | 562 | 1.77x | -| M-7b | 4 | 2 | real | 150 | 359.9 | 162.9 | 2.21x | 1059 | 593 | 1.79x | -| M-7b | 4 | 2 | real | 200 | 
375.3 | 177.8 | 2.11x | 1091 | 631 | 1.73x | -| M-7b | 4 | 4 | none | 50 | 300.3 | 128.2 | 2.34x | 901 | 447 | 2.02x | -| M-7b | 4 | 4 | none | 100 | 326.1 | 139.5 | 2.34x | 1081 | 544 | 1.99x | -| M-7b | 4 | 4 | none | 150 | 368.6 | 155.1 | 2.38x | 1120 | 624 | 1.79x | -| M-7b | 4 | 4 | none | 200 | 366.3 | 164.7 | 2.22x | 1169 | 676 | 1.73x | -| M-7b | 4 | 4 | real | 50 | 296.7 | 137.5 | 2.16x | 880 | 453 | 1.94x | -| M-7b | 4 | 4 | real | 100 | 321.0 | 133.9 | 2.40x | 1028 | 514 | 2.00x | -| M-7b | 4 | 4 | real | 150 | 358.9 | 169.3 | 2.12x | 1094 | 622 | 1.76x | -| M-7b | 4 | 4 | real | 200 | 370.3 | 172.3 | 2.15x | 1130 | 680 | 1.66x | - -### C.6 Full Latency Comparison (P95/P99) - -Storage Latency P95 and P99 in milliseconds. Ratio is Slow/Fast (higher = Fast is better). - -| Model | CPU | MCA | Gen | Users | P95 Fast | P95 Slow | Ratio | P99 Fast | P99 Slow | Ratio | -|-------|-----|-----|-----|-------|----------|----------|-------|----------|----------|-------| -| L2-7b | 0 | 0 | none | 50 | 126,053 | 107,567 | 0.85x | 146,671 | 141,271 | 0.96x | -| L2-7b | 0 | 0 | none | 100 | 171,523 | 217,813 | 1.27x | 194,160 | 225,534 | 1.16x | -| L2-7b | 0 | 0 | none | 150 | 163,169 | 255,382 | 1.57x | 191,553 | 301,531 | 1.57x | -| L2-7b | 0 | 0 | real | 50 | 100,274 | 101,189 | 1.01x | 136,628 | 137,796 | 1.01x | -| L2-7b | 0 | 0 | real | 100 | 127,340 | 176,743 | 1.39x | 151,488 | 201,622 | 1.33x | -| L2-7b | 0 | 2 | none | 50 | 51,092 | 84,519 | 1.65x | 107,691 | 108,434 | 1.01x | -| L2-7b | 0 | 2 | none | 100 | 83,556 | 82,809 | 0.99x | 119,084 | 116,474 | 0.98x | -| L2-7b | 0 | 2 | none | 150 | 60,461 | 74,926 | 1.24x | 96,887 | 132,093 | 1.36x | -| L2-7b | 0 | 2 | none | 200 | 94,552 | 92,269 | 0.98x | 142,965 | 183,066 | 1.28x | -| L2-7b | 0 | 2 | real | 50 | 53,065 | 41,156 | 0.78x | 72,089 | 104,230 | 1.45x | -| L2-7b | 0 | 2 | real | 100 | 86,404 | 73,585 | 0.85x | 117,802 | 159,720 | 1.36x | -| L2-7b | 0 | 2 | real | 150 | 72,543 | 68,463 | 0.94x | 111,722 
| 109,247 | 0.98x | -| L2-7b | 0 | 2 | real | 200 | 81,298 | 70,129 | 0.86x | 113,189 | 112,098 | 0.99x | -| L2-7b | 0 | 4 | none | 50 | 77,034 | 51,108 | 0.66x | 128,468 | 116,349 | 0.91x | -| L2-7b | 0 | 4 | none | 100 | 110,298 | 110,670 | 1.00x | 148,669 | 156,568 | 1.05x | -| L2-7b | 0 | 4 | none | 150 | 105,661 | 78,928 | 0.75x | 156,188 | 140,823 | 0.90x | -| L2-7b | 0 | 4 | none | 200 | 101,258 | 74,503 | 0.74x | 166,598 | 130,704 | 0.78x | -| L2-7b | 0 | 4 | real | 50 | 73,110 | 41,690 | 0.57x | 111,707 | 104,098 | 0.93x | -| L2-7b | 0 | 4 | real | 100 | 106,789 | 76,710 | 0.72x | 157,919 | 122,094 | 0.77x | -| L2-7b | 0 | 4 | real | 150 | 115,919 | 74,937 | 0.65x | 154,103 | 129,595 | 0.84x | -| L2-7b | 0 | 4 | real | 200 | 147,896 | 70,939 | 0.48x | 181,924 | 136,735 | 0.75x | -| L2-7b | 4 | 0 | none | 50 | 24,705 | 25,892 | 1.05x | 70,246 | 72,500 | 1.03x | -| L2-7b | 4 | 0 | real | 50 | 11,045 | 12,964 | 1.17x | 46,978 | 23,637 | 0.50x | -| L2-7b | 4 | 0 | real | 100 | 22,431 | 70,990 | 3.16x | 30,335 | 89,519 | 2.95x | -| L2-7b | 4 | 2 | none | 50 | 19,792 | 24,864 | 1.26x | 42,842 | 81,774 | 1.91x | -| L2-7b | 4 | 2 | none | 100 | 15,705 | 31,814 | 2.03x | 38,190 | 47,962 | 1.26x | -| L2-7b | 4 | 2 | none | 150 | 6,899 | 19,727 | 2.86x | 25,619 | 80,225 | 3.13x | -| L2-7b | 4 | 2 | none | 200 | 7,553 | 23,851 | 3.16x | 21,802 | 72,543 | 3.33x | -| L2-7b | 4 | 2 | real | 50 | 18,268 | 38,139 | 2.09x | 32,195 | 63,093 | 1.96x | -| L2-7b | 4 | 2 | real | 100 | 16,177 | 47,790 | 2.95x | 27,660 | 67,792 | 2.45x | -| L2-7b | 4 | 2 | real | 150 | 6,948 | 16,007 | 2.30x | 24,224 | 30,689 | 1.27x | -| L2-7b | 4 | 2 | real | 200 | 6,426 | 26,240 | 4.08x | 26,270 | 79,034 | 3.01x | -| L2-7b | 4 | 4 | none | 50 | 9,741 | 19,592 | 2.01x | 35,441 | 64,972 | 1.83x | -| L2-7b | 4 | 4 | none | 100 | 16,744 | 34,869 | 2.08x | 30,762 | 69,112 | 2.25x | -| L2-7b | 4 | 4 | none | 150 | 12,761 | 34,214 | 2.68x | 34,752 | 64,567 | 1.86x | -| L2-7b | 4 | 4 | none | 200 | 
10,973 | 26,780 | 2.44x | 24,914 | 76,998 | 3.09x | -| L2-7b | 4 | 4 | real | 50 | 12,098 | 23,410 | 1.94x | 25,564 | 53,124 | 2.08x | -| L2-7b | 4 | 4 | real | 100 | 8,134 | 32,044 | 3.94x | 17,832 | 76,658 | 4.30x | -| L2-7b | 4 | 4 | real | 150 | 5,552 | 18,191 | 3.28x | 10,997 | 58,577 | 5.33x | -| L2-7b | 4 | 4 | real | 200 | 5,499 | 20,543 | 3.74x | 12,920 | 39,176 | 3.03x | -| L3.1-70b | 0 | 0 | none | 10 | 39,772 | 72,593 | 1.83x | 50,033 | 99,303 | 1.98x | -| L3.1-70b | 0 | 0 | none | 20 | 56,456 | 77,525 | 1.37x | 73,927 | 105,140 | 1.42x | -| L3.1-70b | 0 | 0 | none | 30 | 69,930 | 52,775 | 0.75x | 101,387 | 95,203 | 0.94x | -| L3.1-70b | 0 | 0 | none | 40 | 78,120 | 71,301 | 0.91x | 109,868 | 131,851 | 1.20x | -| L3.1-70b | 0 | 0 | none | 50 | 92,720 | 90,681 | 0.98x | 134,924 | 130,691 | 0.97x | -| L3.1-70b | 0 | 0 | none | 60 | 92,570 | 141,969 | 1.53x | 145,001 | 189,636 | 1.31x | -| L3.1-70b | 0 | 0 | none | 70 | 85,310 | 119,439 | 1.40x | 141,675 | 161,085 | 1.14x | -| L3.1-70b | 0 | 0 | real | 10 | 26,094 | 53,350 | 2.04x | 40,257 | 66,861 | 1.66x | -| L3.1-70b | 0 | 0 | real | 20 | 39,775 | 88,050 | 2.21x | 70,204 | 114,882 | 1.64x | -| L3.1-70b | 0 | 0 | real | 30 | 66,575 | 85,528 | 1.28x | 83,587 | 125,131 | 1.50x | -| L3.1-70b | 0 | 0 | real | 40 | 74,307 | 83,138 | 1.12x | 116,252 | 141,274 | 1.22x | -| L3.1-70b | 0 | 0 | real | 50 | 83,497 | 113,614 | 1.36x | 118,504 | 135,372 | 1.14x | -| L3.1-70b | 0 | 0 | real | 60 | 66,405 | 127,597 | 1.92x | 109,696 | 138,739 | 1.26x | -| L3.1-70b | 0 | 0 | real | 70 | 78,909 | 114,198 | 1.45x | 124,244 | 142,043 | 1.14x | -| L3.1-70b | 0 | 2 | none | 10 | 41,742 | 44,906 | 1.08x | 52,360 | 87,509 | 1.67x | -| L3.1-70b | 0 | 2 | none | 20 | 42,875 | 53,085 | 1.24x | 81,905 | 85,139 | 1.04x | -| L3.1-70b | 0 | 2 | none | 30 | 69,150 | 50,662 | 0.73x | 89,793 | 101,417 | 1.13x | -| L3.1-70b | 0 | 2 | none | 40 | 72,229 | 47,511 | 0.66x | 110,848 | 71,921 | 0.65x | -| L3.1-70b | 0 | 2 | none | 50 | 78,623 
| 43,628 | 0.55x | 115,761 | 99,622 | 0.86x | -| L3.1-70b | 0 | 2 | none | 60 | 74,936 | 50,867 | 0.68x | 138,365 | 78,721 | 0.57x | -| L3.1-70b | 0 | 2 | none | 70 | 58,808 | 46,789 | 0.80x | 95,596 | 88,732 | 0.93x | -| L3.1-70b | 0 | 2 | real | 10 | 39,577 | 75,812 | 1.92x | 47,236 | 87,958 | 1.86x | -| L3.1-70b | 0 | 2 | real | 20 | 41,846 | 56,234 | 1.34x | 60,235 | 79,475 | 1.32x | -| L3.1-70b | 0 | 2 | real | 30 | 67,221 | 69,492 | 1.03x | 89,867 | 108,025 | 1.20x | -| L3.1-70b | 0 | 2 | real | 40 | 62,435 | 45,986 | 0.74x | 86,996 | 91,180 | 1.05x | -| L3.1-70b | 0 | 2 | real | 50 | 61,753 | 35,693 | 0.58x | 107,411 | 107,379 | 1.00x | -| L3.1-70b | 0 | 2 | real | 60 | 61,300 | 66,016 | 1.08x | 108,627 | 90,452 | 0.83x | -| L3.1-70b | 0 | 2 | real | 70 | 52,953 | 62,881 | 1.19x | 76,847 | 105,067 | 1.37x | -| L3.1-70b | 0 | 4 | none | 10 | 39,588 | 58,097 | 1.47x | 58,556 | 90,261 | 1.54x | -| L3.1-70b | 0 | 4 | none | 20 | 59,061 | 61,760 | 1.05x | 81,936 | 74,527 | 0.91x | -| L3.1-70b | 0 | 4 | none | 30 | 77,929 | 59,715 | 0.77x | 110,445 | 108,223 | 0.98x | -| L3.1-70b | 0 | 4 | none | 40 | 95,181 | 57,939 | 0.61x | 129,170 | 93,851 | 0.73x | -| L3.1-70b | 0 | 4 | none | 50 | 72,227 | 63,542 | 0.88x | 106,786 | 82,604 | 0.77x | -| L3.1-70b | 0 | 4 | none | 60 | 83,257 | 63,972 | 0.77x | 140,604 | 101,628 | 0.72x | -| L3.1-70b | 0 | 4 | none | 70 | 77,517 | 61,332 | 0.79x | 117,202 | 118,432 | 1.01x | -| L3.1-70b | 0 | 4 | real | 10 | 27,971 | 65,329 | 2.34x | 34,230 | 85,975 | 2.51x | -| L3.1-70b | 0 | 4 | real | 20 | 49,476 | 73,378 | 1.48x | 75,114 | 114,205 | 1.52x | -| L3.1-70b | 0 | 4 | real | 30 | 65,692 | 74,991 | 1.14x | 105,259 | 111,015 | 1.05x | -| L3.1-70b | 0 | 4 | real | 40 | 69,803 | 64,313 | 0.92x | 115,247 | 112,385 | 0.98x | -| L3.1-70b | 0 | 4 | real | 50 | 66,619 | 46,173 | 0.69x | 113,501 | 110,547 | 0.97x | -| L3.1-70b | 0 | 4 | real | 60 | 68,568 | 75,260 | 1.10x | 108,814 | 89,690 | 0.82x | -| L3.1-70b | 0 | 4 | real | 70 | 
68,670 | 68,753 | 1.00x | 95,000 | 96,418 | 1.01x | -| L3.1-70b | 4 | 0 | none | 10 | 18,621 | 26,616 | 1.43x | 27,295 | 38,894 | 1.42x | -| L3.1-70b | 4 | 0 | none | 20 | 25,363 | 20,933 | 0.83x | 49,911 | 56,987 | 1.14x | -| L3.1-70b | 4 | 0 | none | 30 | 42,006 | 15,059 | 0.36x | 70,099 | 24,190 | 0.35x | -| L3.1-70b | 4 | 0 | none | 40 | 8,481 | 23,043 | 2.72x | 69,948 | 34,144 | 0.49x | -| L3.1-70b | 4 | 0 | none | 50 | 9,432 | 19,211 | 2.04x | 31,195 | 23,953 | 0.77x | -| L3.1-70b | 4 | 0 | none | 60 | 10,776 | 15,768 | 1.46x | 17,880 | 20,447 | 1.14x | -| L3.1-70b | 4 | 0 | none | 70 | 7,808 | 13,890 | 1.78x | 15,211 | 30,051 | 1.98x | -| L3.1-70b | 4 | 0 | real | 10 | 10,068 | 23,480 | 2.33x | 19,939 | 66,801 | 3.35x | -| L3.1-70b | 4 | 0 | real | 20 | 25,212 | 16,001 | 0.63x | 42,928 | 63,989 | 1.49x | -| L3.1-70b | 4 | 0 | real | 30 | 20,896 | 10,056 | 0.48x | 46,604 | 40,472 | 0.87x | -| L3.1-70b | 4 | 0 | real | 40 | 40,551 | 11,708 | 0.29x | 86,182 | 22,854 | 0.27x | -| L3.1-70b | 4 | 0 | real | 50 | 11,220 | 11,096 | 0.99x | 20,322 | 25,646 | 1.26x | -| L3.1-70b | 4 | 0 | real | 60 | 8,802 | 17,552 | 1.99x | 21,477 | 25,438 | 1.18x | -| L3.1-70b | 4 | 0 | real | 70 | 10,313 | 19,842 | 1.92x | 12,531 | 23,699 | 1.89x | -| L3.1-70b | 4 | 2 | none | 10 | 14,824 | 44,308 | 2.99x | 27,852 | 54,118 | 1.94x | -| L3.1-70b | 4 | 2 | none | 20 | 10,953 | 22,262 | 2.03x | 32,714 | 71,876 | 2.20x | -| L3.1-70b | 4 | 2 | none | 30 | 17,700 | 28,244 | 1.60x | 28,577 | 47,737 | 1.67x | -| L3.1-70b | 4 | 2 | none | 40 | 9,794 | 16,332 | 1.67x | 19,272 | 32,296 | 1.68x | -| L3.1-70b | 4 | 2 | none | 50 | 6,815 | 24,079 | 3.53x | 21,447 | 50,942 | 2.38x | -| L3.1-70b | 4 | 2 | none | 60 | 10,416 | 18,093 | 1.74x | 22,892 | 38,041 | 1.66x | -| L3.1-70b | 4 | 2 | none | 70 | 5,788 | 13,611 | 2.35x | 19,857 | 34,116 | 1.72x | -| L3.1-70b | 4 | 2 | real | 10 | 22,558 | 35,933 | 1.59x | 26,227 | 45,033 | 1.72x | -| L3.1-70b | 4 | 2 | real | 20 | 9,875 | 19,222 | 1.95x | 
18,513 | 52,059 | 2.81x | -| L3.1-70b | 4 | 2 | real | 30 | 11,102 | 21,327 | 1.92x | 22,466 | 49,177 | 2.19x | -| L3.1-70b | 4 | 2 | real | 40 | 12,398 | 17,146 | 1.38x | 21,547 | 56,299 | 2.61x | -| L3.1-70b | 4 | 2 | real | 50 | 10,912 | 24,257 | 2.22x | 19,920 | 44,632 | 2.24x | -| L3.1-70b | 4 | 2 | real | 60 | 12,599 | 12,074 | 0.96x | 26,744 | 33,301 | 1.25x | -| L3.1-70b | 4 | 2 | real | 70 | 6,993 | 22,334 | 3.19x | 17,392 | 36,452 | 2.10x | -| L3.1-70b | 4 | 4 | none | 10 | 26,246 | 38,984 | 1.49x | 33,074 | 79,629 | 2.41x | -| L3.1-70b | 4 | 4 | none | 20 | 6,399 | 19,532 | 3.05x | 15,402 | 49,841 | 3.24x | -| L3.1-70b | 4 | 4 | none | 30 | 24,833 | 11,285 | 0.45x | 41,311 | 62,691 | 1.52x | -| L3.1-70b | 4 | 4 | none | 40 | 22,829 | 25,556 | 1.12x | 37,162 | 69,396 | 1.87x | -| L3.1-70b | 4 | 4 | none | 50 | 4,762 | 22,910 | 4.81x | 22,666 | 54,038 | 2.38x | -| L3.1-70b | 4 | 4 | none | 60 | 13,052 | 23,268 | 1.78x | 25,103 | 69,245 | 2.76x | -| L3.1-70b | 4 | 4 | none | 70 | 7,340 | 12,482 | 1.70x | 18,293 | 36,462 | 1.99x | -| L3.1-70b | 4 | 4 | real | 10 | 16,989 | 42,741 | 2.52x | 26,694 | 64,059 | 2.40x | -| L3.1-70b | 4 | 4 | real | 20 | 7,771 | 20,400 | 2.63x | 15,628 | 54,857 | 3.51x | -| L3.1-70b | 4 | 4 | real | 30 | 14,200 | 33,029 | 2.33x | 33,010 | 65,123 | 1.97x | -| L3.1-70b | 4 | 4 | real | 40 | 17,018 | 19,716 | 1.16x | 41,566 | 37,455 | 0.90x | -| L3.1-70b | 4 | 4 | real | 50 | 9,634 | 19,760 | 2.05x | 20,394 | 41,191 | 2.02x | -| L3.1-70b | 4 | 4 | real | 60 | 9,849 | 7,001 | 0.71x | 22,128 | 33,429 | 1.51x | -| L3.1-70b | 4 | 4 | real | 70 | 5,711 | 11,101 | 1.94x | 14,660 | 37,216 | 2.54x | -| L3.1-8b | 0 | 0 | none | 50 | 48,787 | 80,674 | 1.65x | 71,953 | 131,497 | 1.83x | -| L3.1-8b | 0 | 0 | none | 100 | 54,746 | 122,893 | 2.24x | 88,999 | 168,486 | 1.89x | -| L3.1-8b | 0 | 0 | none | 150 | 56,329 | 133,335 | 2.37x | 88,926 | 164,240 | 1.85x | -| L3.1-8b | 0 | 0 | none | 200 | 65,862 | 183,175 | 2.78x | 102,496 | 206,836 | 2.02x 
| -| L3.1-8b | 0 | 0 | real | 50 | 32,055 | 70,552 | 2.20x | 52,164 | 98,343 | 1.89x | -| L3.1-8b | 0 | 0 | real | 100 | 52,169 | 101,818 | 1.95x | 83,336 | 166,879 | 2.00x | -| L3.1-8b | 0 | 0 | real | 150 | 52,031 | 133,832 | 2.57x | 87,860 | 166,331 | 1.89x | -| L3.1-8b | 0 | 0 | real | 200 | 63,407 | 169,624 | 2.68x | 90,766 | 205,095 | 2.26x | -| L3.1-8b | 0 | 2 | none | 50 | 40,924 | 46,491 | 1.14x | 67,987 | 69,594 | 1.02x | -| L3.1-8b | 0 | 2 | none | 100 | 51,356 | 44,369 | 0.86x | 75,951 | 69,047 | 0.91x | -| L3.1-8b | 0 | 2 | none | 150 | 41,426 | 31,423 | 0.76x | 73,576 | 55,072 | 0.75x | -| L3.1-8b | 0 | 2 | none | 200 | 37,076 | 23,380 | 0.63x | 75,335 | 41,899 | 0.56x | -| L3.1-8b | 0 | 2 | real | 50 | 34,768 | 34,644 | 1.00x | 53,463 | 54,501 | 1.02x | -| L3.1-8b | 0 | 2 | real | 100 | 37,932 | 41,320 | 1.09x | 64,883 | 59,873 | 0.92x | -| L3.1-8b | 0 | 2 | real | 150 | 38,782 | 20,183 | 0.52x | 72,169 | 41,225 | 0.57x | -| L3.1-8b | 0 | 2 | real | 200 | 39,876 | 28,396 | 0.71x | 66,173 | 47,857 | 0.72x | -| L3.1-8b | 0 | 4 | none | 50 | 42,958 | 58,219 | 1.36x | 64,904 | 86,216 | 1.33x | -| L3.1-8b | 0 | 4 | none | 100 | 51,760 | 54,284 | 1.05x | 92,535 | 79,855 | 0.86x | -| L3.1-8b | 0 | 4 | none | 150 | 50,914 | 35,169 | 0.69x | 76,514 | 62,035 | 0.81x | -| L3.1-8b | 0 | 4 | none | 200 | 56,922 | 29,161 | 0.51x | 93,638 | 68,390 | 0.73x | -| L3.1-8b | 0 | 4 | real | 50 | 32,718 | 52,232 | 1.60x | 48,763 | 82,305 | 1.69x | -| L3.1-8b | 0 | 4 | real | 100 | 51,295 | 52,832 | 1.03x | 79,244 | 77,447 | 0.98x | -| L3.1-8b | 0 | 4 | real | 150 | 44,350 | 37,530 | 0.85x | 66,535 | 70,188 | 1.05x | -| L3.1-8b | 0 | 4 | real | 200 | 50,855 | 32,524 | 0.64x | 77,654 | 57,823 | 0.74x | -| L3.1-8b | 4 | 0 | none | 50 | 2,361 | 4,970 | 2.11x | 5,657 | 8,064 | 1.43x | -| L3.1-8b | 4 | 0 | none | 100 | 2,585 | 6,807 | 2.63x | 5,189 | 12,264 | 2.36x | -| L3.1-8b | 4 | 0 | none | 150 | 2,077 | 8,572 | 4.13x | 5,201 | 15,422 | 2.96x | -| L3.1-8b | 4 | 0 | none | 
200 | 2,622 | 8,418 | 3.21x | 13,280 | 17,088 | 1.29x | -| L3.1-8b | 4 | 0 | real | 50 | 2,203 | 4,700 | 2.13x | 4,022 | 11,216 | 2.79x | -| L3.1-8b | 4 | 0 | real | 100 | 2,157 | 6,946 | 3.22x | 3,837 | 11,543 | 3.01x | -| L3.1-8b | 4 | 0 | real | 150 | 1,860 | 7,057 | 3.79x | 5,114 | 13,572 | 2.65x | -| L3.1-8b | 4 | 0 | real | 200 | 2,401 | 8,829 | 3.68x | 9,875 | 17,155 | 1.74x | -| L3.1-8b | 4 | 2 | none | 50 | 2,354 | 5,171 | 2.20x | 3,235 | 7,929 | 2.45x | -| L3.1-8b | 4 | 2 | none | 100 | 1,707 | 4,175 | 2.45x | 2,674 | 5,779 | 2.16x | -| L3.1-8b | 4 | 2 | none | 150 | 1,407 | 2,752 | 1.96x | 2,385 | 4,492 | 1.88x | -| L3.1-8b | 4 | 2 | none | 200 | 1,345 | 2,927 | 2.18x | 2,225 | 4,845 | 2.18x | -| L3.1-8b | 4 | 2 | real | 50 | 2,066 | 4,443 | 2.15x | 3,357 | 7,711 | 2.30x | -| L3.1-8b | 4 | 2 | real | 100 | 1,828 | 3,715 | 2.03x | 3,053 | 5,083 | 1.67x | -| L3.1-8b | 4 | 2 | real | 150 | 1,490 | 3,017 | 2.02x | 2,189 | 6,699 | 3.06x | -| L3.1-8b | 4 | 2 | real | 200 | 1,275 | 2,669 | 2.09x | 2,559 | 4,177 | 1.63x | -| L3.1-8b | 4 | 4 | none | 50 | 2,052 | 4,985 | 2.43x | 2,957 | 7,800 | 2.64x | -| L3.1-8b | 4 | 4 | none | 100 | 2,132 | 3,633 | 1.70x | 2,948 | 6,235 | 2.11x | -| L3.1-8b | 4 | 4 | none | 150 | 1,486 | 3,303 | 2.22x | 2,286 | 5,416 | 2.37x | -| L3.1-8b | 4 | 4 | none | 200 | 1,361 | 3,142 | 2.31x | 2,330 | 5,019 | 2.15x | -| L3.1-8b | 4 | 4 | real | 50 | 1,888 | 5,213 | 2.76x | 3,226 | 7,546 | 2.34x | -| L3.1-8b | 4 | 4 | real | 100 | 1,920 | 4,422 | 2.30x | 2,871 | 6,973 | 2.43x | -| L3.1-8b | 4 | 4 | real | 150 | 1,595 | 2,799 | 1.76x | 2,472 | 4,917 | 1.99x | -| L3.1-8b | 4 | 4 | real | 200 | 1,347 | 3,569 | 2.65x | 2,183 | 5,477 | 2.51x | -| M-7b | 0 | 0 | none | 50 | 47,410 | 85,121 | 1.80x | 72,722 | 137,091 | 1.89x | -| M-7b | 0 | 0 | none | 100 | 54,781 | 135,918 | 2.48x | 91,447 | 157,453 | 1.72x | -| M-7b | 0 | 0 | none | 150 | 57,942 | 145,555 | 2.51x | 75,685 | 170,968 | 2.26x | -| M-7b | 0 | 0 | none | 200 | 69,186 | 191,746 | 
2.77x | 104,360 | 222,366 | 2.13x | -| M-7b | 0 | 0 | real | 50 | 35,408 | 84,558 | 2.39x | 55,319 | 132,854 | 2.40x | -| M-7b | 0 | 0 | real | 100 | 59,104 | 112,751 | 1.91x | 88,705 | 154,687 | 1.74x | -| M-7b | 0 | 0 | real | 150 | 56,641 | 133,954 | 2.36x | 79,667 | 162,942 | 2.05x | -| M-7b | 0 | 0 | real | 200 | 57,825 | 160,630 | 2.78x | 87,954 | 200,221 | 2.28x | -| M-7b | 0 | 2 | none | 50 | 41,019 | 48,176 | 1.17x | 63,207 | 79,394 | 1.26x | -| M-7b | 0 | 2 | none | 100 | 49,323 | 46,737 | 0.95x | 80,856 | 77,975 | 0.96x | -| M-7b | 0 | 2 | none | 150 | 42,841 | 27,839 | 0.65x | 68,897 | 55,034 | 0.80x | -| M-7b | 0 | 2 | none | 200 | 38,132 | 35,851 | 0.94x | 58,221 | 71,779 | 1.23x | -| M-7b | 0 | 2 | real | 50 | 31,411 | 40,145 | 1.28x | 51,318 | 63,814 | 1.24x | -| M-7b | 0 | 2 | real | 100 | 43,935 | 48,649 | 1.11x | 68,702 | 79,659 | 1.16x | -| M-7b | 0 | 2 | real | 150 | 37,855 | 23,097 | 0.61x | 62,082 | 39,106 | 0.63x | -| M-7b | 0 | 2 | real | 200 | 40,901 | 24,651 | 0.60x | 71,630 | 42,160 | 0.59x | -| M-7b | 0 | 4 | none | 50 | 50,972 | 58,260 | 1.14x | 68,846 | 96,147 | 1.40x | -| M-7b | 0 | 4 | none | 100 | 66,218 | 54,560 | 0.82x | 97,189 | 83,899 | 0.86x | -| M-7b | 0 | 4 | none | 150 | 51,248 | 34,498 | 0.67x | 76,578 | 55,645 | 0.73x | -| M-7b | 0 | 4 | none | 200 | 59,124 | 36,211 | 0.61x | 97,468 | 68,419 | 0.70x | -| M-7b | 0 | 4 | real | 50 | 34,668 | 47,960 | 1.38x | 52,885 | 78,325 | 1.48x | -| M-7b | 0 | 4 | real | 100 | 46,447 | 44,628 | 0.96x | 83,298 | 74,528 | 0.89x | -| M-7b | 0 | 4 | real | 150 | 43,389 | 29,019 | 0.67x | 75,270 | 46,922 | 0.62x | -| M-7b | 0 | 4 | real | 200 | 50,644 | 41,697 | 0.82x | 89,026 | 82,240 | 0.92x | -| M-7b | 4 | 0 | none | 50 | 2,338 | 5,117 | 2.19x | 4,634 | 7,524 | 1.62x | -| M-7b | 4 | 0 | none | 100 | 2,604 | 7,141 | 2.74x | 4,451 | 11,977 | 2.69x | -| M-7b | 4 | 0 | none | 150 | 2,268 | 7,089 | 3.13x | 5,626 | 16,019 | 2.85x | -| M-7b | 4 | 0 | none | 200 | 3,389 | 7,352 | 2.17x | 23,246 | 
14,644 | 0.63x | -| M-7b | 4 | 0 | real | 50 | 2,170 | 5,386 | 2.48x | 3,351 | 14,116 | 4.21x | -| M-7b | 4 | 0 | real | 100 | 2,423 | 6,942 | 2.87x | 5,076 | 10,573 | 2.08x | -| M-7b | 4 | 0 | real | 150 | 2,227 | 7,483 | 3.36x | 5,804 | 12,193 | 2.10x | -| M-7b | 4 | 0 | real | 200 | 2,345 | 5,950 | 2.54x | 8,588 | 20,395 | 2.37x | -| M-7b | 4 | 2 | none | 50 | 2,153 | 4,666 | 2.17x | 3,110 | 7,508 | 2.41x | -| M-7b | 4 | 2 | none | 100 | 1,812 | 4,045 | 2.23x | 2,860 | 6,089 | 2.13x | -| M-7b | 4 | 2 | none | 150 | 1,478 | 3,232 | 2.19x | 2,245 | 6,248 | 2.78x | -| M-7b | 4 | 2 | none | 200 | 1,258 | 3,239 | 2.57x | 2,457 | 5,859 | 2.38x | -| M-7b | 4 | 2 | real | 50 | 2,062 | 5,126 | 2.49x | 3,307 | 7,418 | 2.24x | -| M-7b | 4 | 2 | real | 100 | 1,904 | 3,802 | 2.00x | 2,796 | 5,486 | 1.96x | -| M-7b | 4 | 2 | real | 150 | 1,515 | 3,079 | 2.03x | 2,897 | 5,491 | 1.90x | -| M-7b | 4 | 2 | real | 200 | 1,349 | 3,205 | 2.38x | 2,502 | 4,655 | 1.86x | -| M-7b | 4 | 4 | none | 50 | 1,982 | 4,685 | 2.36x | 3,283 | 7,892 | 2.40x | -| M-7b | 4 | 4 | none | 100 | 1,736 | 4,217 | 2.43x | 2,732 | 6,225 | 2.28x | -| M-7b | 4 | 4 | none | 150 | 1,342 | 3,474 | 2.59x | 2,383 | 5,058 | 2.12x | -| M-7b | 4 | 4 | none | 200 | 1,468 | 2,890 | 1.97x | 2,441 | 5,179 | 2.12x | -| M-7b | 4 | 4 | real | 50 | 2,196 | 4,265 | 1.94x | 3,786 | 6,788 | 1.79x | -| M-7b | 4 | 4 | real | 100 | 1,841 | 4,175 | 2.27x | 3,187 | 6,634 | 2.08x | -| M-7b | 4 | 4 | real | 150 | 1,498 | 3,033 | 2.03x | 2,414 | 4,908 | 2.03x | -| M-7b | 4 | 4 | real | 200 | 1,368 | 3,019 | 2.21x | 2,146 | 4,959 | 2.31x | - -### C.7 Full I/O Volume Comparison (Prefill/Decode) - -Prefill Bytes Written and Decode Bytes Read in GB. 
- -| Model | CPU | MCA | Gen | Users | Prefill Fast | Prefill Slow | Decode Fast | Decode Slow | -|-------|-----|-----|-----|-------|--------------|--------------|-------------|-------------| -| L2-7b | 0 | 0 | none | 50 | 148.5 | 94.8 | 1055.7 | 328.5 | -| L2-7b | 0 | 0 | none | 100 | 194.1 | 112.4 | 1590.8 | 498.6 | -| L2-7b | 0 | 0 | none | 150 | 220.8 | 115.0 | 1665.0 | 434.2 | -| L2-7b | 0 | 0 | real | 50 | 151.8 | 94.8 | 1050.6 | 271.7 | -| L2-7b | 0 | 0 | real | 100 | 193.5 | 113.2 | 1568.7 | 349.1 | -| L2-7b | 0 | 2 | none | 50 | 151.9 | 73.4 | 1007.4 | 439.7 | -| L2-7b | 0 | 2 | none | 100 | 188.7 | 87.5 | 1361.8 | 606.5 | -| L2-7b | 0 | 2 | none | 150 | 218.4 | 111.5 | 1487.2 | 710.0 | -| L2-7b | 0 | 2 | none | 200 | 240.4 | 117.9 | 1637.4 | 885.9 | -| L2-7b | 0 | 2 | real | 50 | 140.3 | 70.0 | 969.6 | 437.5 | -| L2-7b | 0 | 2 | real | 100 | 173.8 | 93.3 | 1328.8 | 623.7 | -| L2-7b | 0 | 2 | real | 150 | 214.2 | 98.8 | 1445.8 | 656.2 | -| L2-7b | 0 | 2 | real | 200 | 232.6 | 111.7 | 1635.6 | 723.6 | -| L2-7b | 0 | 4 | none | 50 | 166.1 | 66.5 | 1132.2 | 378.0 | -| L2-7b | 0 | 4 | none | 100 | 209.4 | 89.5 | 1528.0 | 578.2 | -| L2-7b | 0 | 4 | none | 150 | 240.0 | 113.1 | 1684.5 | 722.2 | -| L2-7b | 0 | 4 | none | 200 | 273.2 | 131.2 | 1912.7 | 762.5 | -| L2-7b | 0 | 4 | real | 50 | 156.8 | 74.1 | 1088.1 | 410.7 | -| L2-7b | 0 | 4 | real | 100 | 195.0 | 97.2 | 1544.4 | 605.7 | -| L2-7b | 0 | 4 | real | 150 | 224.4 | 110.8 | 1663.6 | 683.7 | -| L2-7b | 0 | 4 | real | 200 | 271.0 | 118.0 | 1922.4 | 740.7 | -| L2-7b | 4 | 0 | none | 50 | 191.5 | 121.3 | 1181.8 | 495.6 | -| L2-7b | 4 | 0 | real | 50 | 192.8 | 115.1 | 1152.3 | 378.6 | -| L2-7b | 4 | 0 | real | 100 | 228.7 | 114.0 | 2071.1 | 639.0 | -| L2-7b | 4 | 2 | none | 50 | 154.7 | 83.6 | 1161.3 | 604.1 | -| L2-7b | 4 | 2 | none | 100 | 198.7 | 115.0 | 1592.6 | 893.6 | -| L2-7b | 4 | 2 | none | 150 | 209.0 | 164.8 | 1589.8 | 1157.2 | -| L2-7b | 4 | 2 | none | 200 | 241.7 | 177.1 | 1768.4 | 1211.6 | -| 
L2-7b | 4 | 2 | real | 50 | 141.7 | 82.4 | 1220.5 | 701.6 | -| L2-7b | 4 | 2 | real | 100 | 185.6 | 119.6 | 1499.4 | 960.6 | -| L2-7b | 4 | 2 | real | 150 | 206.6 | 163.2 | 1613.2 | 1196.9 | -| L2-7b | 4 | 2 | real | 200 | 236.4 | 158.7 | 1753.2 | 1143.4 | -| L2-7b | 4 | 4 | none | 50 | 175.1 | 86.6 | 1245.0 | 622.7 | -| L2-7b | 4 | 4 | none | 100 | 204.7 | 124.8 | 1705.8 | 1004.2 | -| L2-7b | 4 | 4 | none | 150 | 234.8 | 149.1 | 1730.4 | 1149.5 | -| L2-7b | 4 | 4 | none | 200 | 249.4 | 174.2 | 1797.4 | 1208.6 | -| L2-7b | 4 | 4 | real | 50 | 158.1 | 97.1 | 1392.9 | 687.7 | -| L2-7b | 4 | 4 | real | 100 | 202.3 | 120.6 | 1674.2 | 857.0 | -| L2-7b | 4 | 4 | real | 150 | 235.8 | 155.1 | 1760.7 | 1143.3 | -| L2-7b | 4 | 4 | real | 200 | 250.3 | 178.1 | 1841.1 | 1276.0 | -| L3.1-70b | 0 | 0 | none | 10 | 75.9 | 34.6 | 670.0 | 298.7 | -| L3.1-70b | 0 | 0 | none | 20 | 87.8 | 45.2 | 710.9 | 280.6 | -| L3.1-70b | 0 | 0 | none | 30 | 105.2 | 62.9 | 876.8 | 331.2 | -| L3.1-70b | 0 | 0 | none | 40 | 118.7 | 71.5 | 982.0 | 342.2 | -| L3.1-70b | 0 | 0 | none | 50 | 126.0 | 81.2 | 1031.8 | 394.1 | -| L3.1-70b | 0 | 0 | none | 60 | 151.7 | 84.5 | 1255.1 | 365.5 | -| L3.1-70b | 0 | 0 | none | 70 | 152.5 | 86.4 | 1193.4 | 418.8 | -| L3.1-70b | 0 | 0 | real | 10 | 72.0 | 33.3 | 640.2 | 299.3 | -| L3.1-70b | 0 | 0 | real | 20 | 80.0 | 45.6 | 718.5 | 310.1 | -| L3.1-70b | 0 | 0 | real | 30 | 94.3 | 58.5 | 831.2 | 350.4 | -| L3.1-70b | 0 | 0 | real | 40 | 106.5 | 69.8 | 916.8 | 378.7 | -| L3.1-70b | 0 | 0 | real | 50 | 118.8 | 75.8 | 1035.7 | 365.0 | -| L3.1-70b | 0 | 0 | real | 60 | 139.0 | 80.7 | 1142.2 | 391.6 | -| L3.1-70b | 0 | 0 | real | 70 | 142.5 | 73.9 | 1199.3 | 369.2 | -| L3.1-70b | 0 | 2 | none | 10 | 74.5 | 39.2 | 662.6 | 295.1 | -| L3.1-70b | 0 | 2 | none | 20 | 92.6 | 46.7 | 731.9 | 301.6 | -| L3.1-70b | 0 | 2 | none | 30 | 103.1 | 54.7 | 873.1 | 357.8 | -| L3.1-70b | 0 | 2 | none | 40 | 115.0 | 57.2 | 950.2 | 344.3 | -| L3.1-70b | 0 | 2 | none | 50 | 129.3 | 59.5 | 
985.1 | 385.4 | -| L3.1-70b | 0 | 2 | none | 60 | 133.7 | 60.1 | 1113.8 | 417.6 | -| L3.1-70b | 0 | 2 | none | 70 | 139.5 | 68.5 | 1108.7 | 459.7 | -| L3.1-70b | 0 | 2 | real | 10 | 65.5 | 33.4 | 661.5 | 301.4 | -| L3.1-70b | 0 | 2 | real | 20 | 88.2 | 47.7 | 747.8 | 328.9 | -| L3.1-70b | 0 | 2 | real | 30 | 99.0 | 52.8 | 814.4 | 352.2 | -| L3.1-70b | 0 | 2 | real | 40 | 113.0 | 54.5 | 914.8 | 349.1 | -| L3.1-70b | 0 | 2 | real | 50 | 117.5 | 56.9 | 1007.2 | 406.2 | -| L3.1-70b | 0 | 2 | real | 60 | 127.5 | 63.8 | 1050.6 | 412.3 | -| L3.1-70b | 0 | 2 | real | 70 | 134.2 | 62.0 | 1017.0 | 431.0 | -| L3.1-70b | 0 | 4 | none | 10 | 71.7 | 36.2 | 679.1 | 291.1 | -| L3.1-70b | 0 | 4 | none | 20 | 90.4 | 48.2 | 751.1 | 295.9 | -| L3.1-70b | 0 | 4 | none | 30 | 99.9 | 53.3 | 828.8 | 327.4 | -| L3.1-70b | 0 | 4 | none | 40 | 117.6 | 61.6 | 979.6 | 362.2 | -| L3.1-70b | 0 | 4 | none | 50 | 141.3 | 61.2 | 1094.1 | 393.4 | -| L3.1-70b | 0 | 4 | none | 60 | 151.1 | 60.3 | 1236.2 | 378.4 | -| L3.1-70b | 0 | 4 | none | 70 | 153.7 | 70.9 | 1220.8 | 429.6 | -| L3.1-70b | 0 | 4 | real | 10 | 68.8 | 37.3 | 609.4 | 309.9 | -| L3.1-70b | 0 | 4 | real | 20 | 78.2 | 46.1 | 727.8 | 304.2 | -| L3.1-70b | 0 | 4 | real | 30 | 97.9 | 48.1 | 864.5 | 339.1 | -| L3.1-70b | 0 | 4 | real | 40 | 113.0 | 60.4 | 932.9 | 376.5 | -| L3.1-70b | 0 | 4 | real | 50 | 119.4 | 59.7 | 1025.6 | 416.7 | -| L3.1-70b | 0 | 4 | real | 60 | 150.6 | 66.1 | 1179.1 | 401.9 | -| L3.1-70b | 0 | 4 | real | 70 | 149.1 | 66.9 | 1178.4 | 417.3 | -| L3.1-70b | 4 | 0 | none | 10 | 128.4 | 70.0 | 1111.5 | 544.3 | -| L3.1-70b | 4 | 0 | none | 20 | 134.1 | 70.0 | 1127.3 | 393.8 | -| L3.1-70b | 4 | 0 | none | 30 | 140.2 | 95.6 | 1150.4 | 773.3 | -| L3.1-70b | 4 | 0 | none | 40 | 154.2 | 104.2 | 1173.3 | 951.1 | -| L3.1-70b | 4 | 0 | none | 50 | 185.9 | 103.8 | 1361.5 | 862.7 | -| L3.1-70b | 4 | 0 | none | 60 | 193.0 | 104.6 | 1390.6 | 506.7 | -| L3.1-70b | 4 | 0 | none | 70 | 193.9 | 108.8 | 1748.5 | 631.8 | -| L3.1-70b | 4 | 0 
| real | 10 | 110.1 | 47.8 | 1003.3 | 435.8 | -| L3.1-70b | 4 | 0 | real | 20 | 120.6 | 60.2 | 1111.5 | 516.8 | -| L3.1-70b | 4 | 0 | real | 30 | 145.4 | 81.0 | 1335.4 | 458.6 | -| L3.1-70b | 4 | 0 | real | 40 | 140.9 | 101.2 | 1241.1 | 522.9 | -| L3.1-70b | 4 | 0 | real | 50 | 169.5 | 108.5 | 1537.5 | 643.9 | -| L3.1-70b | 4 | 0 | real | 60 | 182.3 | 109.3 | 1467.9 | 539.9 | -| L3.1-70b | 4 | 0 | real | 70 | 187.3 | 110.0 | 1596.5 | 603.8 | -| L3.1-70b | 4 | 2 | none | 10 | 119.0 | 58.0 | 1087.7 | 434.5 | -| L3.1-70b | 4 | 2 | none | 20 | 130.8 | 65.1 | 1123.1 | 539.4 | -| L3.1-70b | 4 | 2 | none | 30 | 137.3 | 66.9 | 1162.5 | 526.3 | -| L3.1-70b | 4 | 2 | none | 40 | 134.1 | 75.0 | 1172.7 | 609.2 | -| L3.1-70b | 4 | 2 | none | 50 | 137.3 | 69.9 | 1137.9 | 580.8 | -| L3.1-70b | 4 | 2 | none | 60 | 142.0 | 79.0 | 1158.7 | 605.2 | -| L3.1-70b | 4 | 2 | none | 70 | 150.6 | 86.7 | 1229.8 | 651.4 | -| L3.1-70b | 4 | 2 | real | 10 | 95.6 | 53.1 | 958.9 | 409.5 | -| L3.1-70b | 4 | 2 | real | 20 | 122.7 | 62.6 | 1055.6 | 506.9 | -| L3.1-70b | 4 | 2 | real | 30 | 127.2 | 65.2 | 1082.3 | 551.6 | -| L3.1-70b | 4 | 2 | real | 40 | 131.2 | 73.7 | 1110.9 | 543.7 | -| L3.1-70b | 4 | 2 | real | 50 | 133.0 | 75.1 | 1090.7 | 615.0 | -| L3.1-70b | 4 | 2 | real | 60 | 139.9 | 80.3 | 1214.9 | 661.1 | -| L3.1-70b | 4 | 2 | real | 70 | 143.3 | 85.1 | 1186.4 | 673.0 | -| L3.1-70b | 4 | 4 | none | 10 | 133.7 | 56.2 | 1208.8 | 451.6 | -| L3.1-70b | 4 | 4 | none | 20 | 147.3 | 63.0 | 1181.5 | 515.0 | -| L3.1-70b | 4 | 4 | none | 30 | 142.7 | 71.9 | 1234.0 | 533.9 | -| L3.1-70b | 4 | 4 | none | 40 | 147.0 | 74.9 | 1236.1 | 606.5 | -| L3.1-70b | 4 | 4 | none | 50 | 157.6 | 77.8 | 1214.1 | 594.9 | -| L3.1-70b | 4 | 4 | none | 60 | 153.0 | 88.2 | 1282.8 | 652.8 | -| L3.1-70b | 4 | 4 | none | 70 | 157.3 | 89.3 | 1240.8 | 633.1 | -| L3.1-70b | 4 | 4 | real | 10 | 100.2 | 47.8 | 1038.4 | 454.0 | -| L3.1-70b | 4 | 4 | real | 20 | 131.7 | 62.8 | 1191.6 | 495.4 | -| L3.1-70b | 4 | 4 | real | 30 | 
132.6 | 71.4 | 1176.5 | 532.3 | -| L3.1-70b | 4 | 4 | real | 40 | 141.8 | 74.3 | 1216.5 | 596.0 | -| L3.1-70b | 4 | 4 | real | 50 | 142.1 | 73.5 | 1180.8 | 676.6 | -| L3.1-70b | 4 | 4 | real | 60 | 148.5 | 89.0 | 1193.2 | 618.2 | -| L3.1-70b | 4 | 4 | real | 70 | 163.4 | 86.3 | 1413.4 | 658.4 | -| L3.1-8b | 0 | 0 | none | 50 | 102.0 | 47.6 | 935.4 | 363.6 | -| L3.1-8b | 0 | 0 | none | 100 | 135.4 | 61.3 | 1252.9 | 471.7 | -| L3.1-8b | 0 | 0 | none | 150 | 173.7 | 72.5 | 1456.0 | 462.8 | -| L3.1-8b | 0 | 0 | none | 200 | 197.6 | 84.2 | 1617.5 | 535.6 | -| L3.1-8b | 0 | 0 | real | 50 | 90.0 | 45.7 | 781.5 | 372.8 | -| L3.1-8b | 0 | 0 | real | 100 | 121.2 | 59.8 | 1123.3 | 463.2 | -| L3.1-8b | 0 | 0 | real | 150 | 158.3 | 70.6 | 1304.5 | 489.4 | -| L3.1-8b | 0 | 0 | real | 200 | 177.4 | 84.9 | 1473.4 | 534.5 | -| L3.1-8b | 0 | 2 | none | 50 | 103.5 | 43.7 | 888.0 | 363.9 | -| L3.1-8b | 0 | 2 | none | 100 | 129.5 | 53.9 | 1133.6 | 435.8 | -| L3.1-8b | 0 | 2 | none | 150 | 162.0 | 63.9 | 1275.8 | 503.3 | -| L3.1-8b | 0 | 2 | none | 200 | 170.5 | 68.7 | 1272.7 | 504.7 | -| L3.1-8b | 0 | 2 | real | 50 | 89.6 | 41.1 | 803.7 | 347.0 | -| L3.1-8b | 0 | 2 | real | 100 | 122.4 | 48.7 | 1068.9 | 427.4 | -| L3.1-8b | 0 | 2 | real | 150 | 151.1 | 62.3 | 1201.0 | 452.6 | -| L3.1-8b | 0 | 2 | real | 200 | 164.7 | 68.2 | 1265.3 | 520.4 | -| L3.1-8b | 0 | 4 | none | 50 | 106.8 | 42.5 | 925.6 | 366.1 | -| L3.1-8b | 0 | 4 | none | 100 | 135.1 | 52.6 | 1247.2 | 432.6 | -| L3.1-8b | 0 | 4 | none | 150 | 180.6 | 63.3 | 1457.5 | 482.8 | -| L3.1-8b | 0 | 4 | none | 200 | 198.4 | 69.8 | 1557.2 | 507.9 | -| L3.1-8b | 0 | 4 | real | 50 | 93.8 | 41.7 | 792.5 | 342.8 | -| L3.1-8b | 0 | 4 | real | 100 | 120.1 | 51.9 | 1121.7 | 446.0 | -| L3.1-8b | 0 | 4 | real | 150 | 159.4 | 60.2 | 1288.0 | 457.0 | -| L3.1-8b | 0 | 4 | real | 200 | 187.1 | 70.2 | 1470.6 | 521.9 | -| L3.1-8b | 4 | 0 | none | 50 | 166.5 | 89.8 | 1441.1 | 659.8 | -| L3.1-8b | 4 | 0 | none | 100 | 184.3 | 98.3 | 1658.9 | 806.8 | -| 
L3.1-8b | 4 | 0 | none | 150 | 188.5 | 104.6 | 1521.1 | 769.5 | -| L3.1-8b | 4 | 0 | none | 200 | 204.9 | 112.5 | 1622.8 | 818.0 | -| L3.1-8b | 4 | 0 | real | 50 | 145.9 | 82.5 | 1313.5 | 718.3 | -| L3.1-8b | 4 | 0 | real | 100 | 170.6 | 92.1 | 1557.6 | 795.2 | -| L3.1-8b | 4 | 0 | real | 150 | 180.1 | 101.1 | 1421.9 | 735.7 | -| L3.1-8b | 4 | 0 | real | 200 | 195.7 | 114.3 | 1560.9 | 875.9 | -| L3.1-8b | 4 | 2 | none | 50 | 139.9 | 68.2 | 1222.1 | 611.4 | -| L3.1-8b | 4 | 2 | none | 100 | 150.2 | 83.8 | 1281.2 | 716.2 | -| L3.1-8b | 4 | 2 | none | 150 | 159.2 | 85.1 | 1234.6 | 628.5 | -| L3.1-8b | 4 | 2 | none | 200 | 167.8 | 93.8 | 1292.6 | 692.1 | -| L3.1-8b | 4 | 2 | real | 50 | 137.6 | 68.3 | 1196.6 | 609.6 | -| L3.1-8b | 4 | 2 | real | 100 | 145.4 | 78.4 | 1286.1 | 673.3 | -| L3.1-8b | 4 | 2 | real | 150 | 152.6 | 85.5 | 1196.6 | 689.7 | -| L3.1-8b | 4 | 2 | real | 200 | 163.1 | 95.3 | 1245.2 | 698.1 | -| L3.1-8b | 4 | 4 | none | 50 | 144.5 | 69.8 | 1203.0 | 610.6 | -| L3.1-8b | 4 | 4 | none | 100 | 152.7 | 79.1 | 1343.0 | 657.8 | -| L3.1-8b | 4 | 4 | none | 150 | 164.8 | 89.9 | 1271.6 | 672.3 | -| L3.1-8b | 4 | 4 | none | 200 | 173.3 | 99.9 | 1323.8 | 740.0 | -| L3.1-8b | 4 | 4 | real | 50 | 136.2 | 69.4 | 1125.9 | 595.6 | -| L3.1-8b | 4 | 4 | real | 100 | 147.5 | 80.2 | 1291.9 | 712.2 | -| L3.1-8b | 4 | 4 | real | 150 | 157.3 | 89.5 | 1239.5 | 677.7 | -| L3.1-8b | 4 | 4 | real | 200 | 166.8 | 95.8 | 1276.5 | 753.2 | -| M-7b | 0 | 0 | none | 50 | 99.8 | 53.1 | 924.8 | 425.3 | -| M-7b | 0 | 0 | none | 100 | 139.2 | 58.7 | 1270.7 | 444.6 | -| M-7b | 0 | 0 | none | 150 | 174.5 | 76.9 | 1432.4 | 509.4 | -| M-7b | 0 | 0 | none | 200 | 190.9 | 85.1 | 1580.6 | 550.6 | -| M-7b | 0 | 0 | real | 50 | 88.9 | 49.8 | 778.9 | 410.3 | -| M-7b | 0 | 0 | real | 100 | 121.5 | 61.1 | 1133.2 | 463.9 | -| M-7b | 0 | 0 | real | 150 | 160.8 | 69.5 | 1345.4 | 441.4 | -| M-7b | 0 | 0 | real | 200 | 181.6 | 86.5 | 1472.7 | 556.1 | -| M-7b | 0 | 2 | none | 50 | 106.5 | 43.2 | 919.3 | 
361.2 | -| M-7b | 0 | 2 | none | 100 | 132.5 | 51.3 | 1176.3 | 439.0 | -| M-7b | 0 | 2 | none | 150 | 160.2 | 63.3 | 1253.7 | 480.6 | -| M-7b | 0 | 2 | none | 200 | 172.5 | 71.9 | 1296.7 | 544.3 | -| M-7b | 0 | 2 | real | 50 | 89.9 | 42.9 | 784.9 | 342.6 | -| M-7b | 0 | 2 | real | 100 | 123.0 | 50.8 | 1097.8 | 446.7 | -| M-7b | 0 | 2 | real | 150 | 148.4 | 60.1 | 1128.5 | 464.7 | -| M-7b | 0 | 2 | real | 200 | 173.9 | 68.4 | 1352.4 | 505.3 | -| M-7b | 0 | 4 | none | 50 | 101.4 | 44.4 | 937.9 | 380.1 | -| M-7b | 0 | 4 | none | 100 | 132.4 | 53.3 | 1244.6 | 447.8 | -| M-7b | 0 | 4 | none | 150 | 174.1 | 62.4 | 1405.2 | 469.3 | -| M-7b | 0 | 4 | none | 200 | 198.7 | 69.7 | 1585.2 | 513.7 | -| M-7b | 0 | 4 | real | 50 | 87.3 | 39.9 | 773.2 | 345.1 | -| M-7b | 0 | 4 | real | 100 | 123.8 | 51.8 | 1129.7 | 416.0 | -| M-7b | 0 | 4 | real | 150 | 159.4 | 63.6 | 1290.3 | 490.6 | -| M-7b | 0 | 4 | real | 200 | 186.4 | 68.6 | 1457.6 | 530.7 | -| M-7b | 4 | 0 | none | 50 | 162.5 | 84.8 | 1375.6 | 671.3 | -| M-7b | 4 | 0 | none | 100 | 173.7 | 97.7 | 1576.9 | 758.0 | -| M-7b | 4 | 0 | none | 150 | 190.0 | 105.9 | 1522.7 | 769.9 | -| M-7b | 4 | 0 | none | 200 | 205.2 | 114.5 | 1595.7 | 838.0 | -| M-7b | 4 | 0 | real | 50 | 151.2 | 84.7 | 1340.9 | 740.5 | -| M-7b | 4 | 0 | real | 100 | 164.4 | 91.2 | 1464.0 | 751.3 | -| M-7b | 4 | 0 | real | 150 | 180.5 | 99.5 | 1473.1 | 812.1 | -| M-7b | 4 | 0 | real | 200 | 192.7 | 113.1 | 1578.1 | 881.6 | -| M-7b | 4 | 2 | none | 50 | 136.3 | 70.4 | 1134.0 | 598.9 | -| M-7b | 4 | 2 | none | 100 | 148.4 | 80.5 | 1252.8 | 683.5 | -| M-7b | 4 | 2 | none | 150 | 160.0 | 88.4 | 1256.1 | 692.7 | -| M-7b | 4 | 2 | none | 200 | 166.2 | 95.7 | 1243.1 | 710.5 | -| M-7b | 4 | 2 | real | 50 | 135.1 | 71.5 | 1121.6 | 622.0 | -| M-7b | 4 | 2 | real | 100 | 142.3 | 79.9 | 1269.4 | 677.1 | -| M-7b | 4 | 2 | real | 150 | 152.4 | 86.0 | 1227.3 | 667.8 | -| M-7b | 4 | 2 | real | 200 | 159.6 | 90.1 | 1219.5 | 694.9 | -| M-7b | 4 | 4 | none | 50 | 142.4 | 73.5 | 
1204.8 | 603.6 |
-| M-7b | 4 | 4 | none | 100 | 154.1 | 82.8 | 1341.3 | 691.3 |
-| M-7b | 4 | 4 | none | 150 | 164.6 | 88.5 | 1253.1 | 683.4 |
-| M-7b | 4 | 4 | none | 200 | 169.6 | 94.8 | 1284.2 | 719.2 |
-| M-7b | 4 | 4 | real | 50 | 139.4 | 71.5 | 1186.1 | 602.2 |
-| M-7b | 4 | 4 | real | 100 | 147.1 | 81.2 | 1280.2 | 719.1 |
-| M-7b | 4 | 4 | real | 150 | 157.8 | 87.9 | 1242.9 | 677.7 |
-| M-7b | 4 | 4 | real | 200 | 162.9 | 94.7 | 1229.9 | 745.1 |
-
----
-
-## Appendix D: iostat Analysis - Maximum Storage Stress Configurations
-
-This appendix analyzes iostat data from the Fast system to identify the configurations that stress NVMe storage the most. The Slow system iostat files contained no actual I/O data (device nvme3n1 showed zeros), so only Fast system data is available.
-
-### D.1 Top 10 Configurations by Total Throughput
-
-| Model | CPU | MCA | Gen | Users | Read MB/s | Write MB/s | Total MB/s | Util% |
-|-------|-----|-----|-----|-------|-----------|------------|------------|-------|
-| M-7b | 0 | 16 | none | 200 | 9,744 | 1,223 | **10,967** | 290.5 |
-| L3.1-8b | 0 | 32 | none | 200 | 9,760 | 1,190 | **10,951** | 292.6 |
-| M-7b | 0 | 0 | none | 200 | 9,636 | 1,168 | **10,804** | 283.3 |
-| L3.1-8b | 0 | 64 | none | 200 | 9,541 | 1,139 | **10,680** | 273.7 |
-| M-7b | 0 | 8 | none | 200 | 9,493 | 1,176 | **10,669** | 282.3 |
-| L3.1-8b | 0 | 8 | none | 200 | 9,427 | 1,220 | **10,647** | 281.5 |
-| L3.1-8b | 0 | 16 | none | 200 | 9,438 | 1,161 | **10,599** | 280.7 |
-| L3.1-8b | 0 | 0 | none | 200 | 9,418 | 1,154 | **10,572** | 270.8 |
-| L3.1-8b | 0 | 32 | none | 150 | 9,369 | 1,138 | **10,507** | 242.7 |
-| M-7b | 0 | 64 | none | 200 | 9,392 | 1,110 | **10,502** | 271.0 |
-
-**Key Finding:** Peak throughput exceeds **10.9 GB/s** (78% of the theoretical 14 GB/s NVMe limit).
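-
-The figures above come from iostat's extended device statistics. As a rough illustration of the extraction (the actual analyze_iostat.py implementation may differ, `parse_iostat_line` is a hypothetical helper, and the header is abbreviated — a real `iostat -x -m` report has more columns, and column names vary by sysstat version):

```python
# Hedged sketch: pull read/write MB/s and %util for one device out of an
# abbreviated `iostat -x -m` sample. The sample values are illustrative,
# echoing the M-7b / MCA=16 / 200-user row from the table above.
def parse_iostat_line(header: str, line: str) -> tuple[float, float, float]:
    # Map each header column name to the corresponding value token
    cols = dict(zip(header.split(), line.split()))
    return float(cols["rMB/s"]), float(cols["wMB/s"]), float(cols["%util"])

header = "Device r/s rMB/s r_await w/s wMB/s w_await aqu-sz %util"
line = "nvme4n1 88000.0 9744.0 1.26 9600.0 1223.0 0.11 112.0 290.5"

read_mbps, write_mbps, util = parse_iostat_line(header, line)
print(f"read {read_mbps:.0f} MB/s, write {write_mbps:.0f} MB/s, util {util:.1f}%")
```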
-
-### D.2 Storage Stress by cpu_mem Setting
-
-| cpu_mem | Avg Read MB/s | Avg Write MB/s | Avg Total MB/s | Read Latency | Util% |
-|---------|---------------|----------------|----------------|--------------|-------|
-| **0 GB** | **6,825** | 855 | **7,680** | 1.26 ms | 211% |
-| 4 GB | 1,714 | 1,027 | 2,741 | 0.11 ms | 51% |
-| 8 GB | 628 | 1,091 | 1,719 | 0.03 ms | 38% |
-| 16 GB | 47 | 1,141 | 1,188 | 0.01 ms | 38% |
-| 32 GB | 12 | 1,139 | 1,151 | 0.01 ms | 38% |
-| 64 GB | 12 | 1,100 | 1,112 | 0.01 ms | 35% |
-
-**Critical Finding:** `cpu_mem=0GB` generates **4.0x more read I/O** than `cpu_mem=4GB`:
-- Forces **all decode reads** to come from NVMe storage (no CPU memory cache)
-- Read throughput: 6,825 MB/s vs 1,714 MB/s
-- This is **THE most important parameter** for storage stress testing
-
-### D.3 Storage Stress by Model (cpu_mem=0 only)
-
-**Summary Statistics (all user counts):**
-
-| Model | Avg Read MB/s | Avg Write MB/s | Avg Total MB/s | Configs |
-|-------|---------------|----------------|----------------|---------|
-| **mistral-7b** | 7,853 | 927 | **8,781** | 56 |
-| **llama3.1-8b** | 7,843 | 926 | **8,769** | 56 |
-| llama2-7b | 6,601 | 993 | 7,594 | 56 |
-| llama3.1-70b | 5,785 | 694 | 6,479 | 98 |
-
-**Apples-to-Apples Comparison @ users=50 (all models tested):**
-
-| Model | Read MB/s | Write MB/s | Total MB/s |
-|-------|-----------|------------|------------|
-| **llama3.1-70b** | 6,041 | 739 | **6,781** |
-| llama2-7b | 5,898 | 848 | 6,746 |
-| llama3.1-8b | 5,958 | 678 | 6,636 |
-| mistral-7b | 5,945 | 667 | 6,611 |
-
-**Key Insight:** At the same user count, **llama3.1-70b generates the most storage I/O** because:
-- **Larger KV cache per request** - the 70B model has more layers and larger hidden dimensions
-- Each prefill/decode operation transfers more bytes
-- The 7B/8B models only appear to generate more total throughput because they were tested with higher user counts (100-200), where they complete more requests per second
-
-**Recommendation:** For **per-request storage stress**, use `llama3.1-70b`. For **maximum aggregate throughput**, use `mistral-7b` or `llama3.1-8b` with 200 users.
-
-### D.4 Storage Stress by Users (cpu_mem=0 only)
-
-| Users | Avg Read MB/s | Avg Total MB/s | Util% |
-|-------|---------------|----------------|-------|
-| **200** | 8,119 | 9,277 | 246% |
-| **150** | 8,168 | 9,203 | 222% |
-| 100 | 7,509 | 8,380 | 192% |
-| 50 | 5,961 | 6,694 | 243% |
-
-**Finding:** Higher user counts (150-200) sustain **maximum storage throughput**.
-
-### D.5 Optimal Invocation for Maximum Storage Stress
-
-Based on the iostat analysis, the **recommended configurations** for maximum NVMe stress are:
-
-**Option A: Maximum Aggregate Throughput (~11 GB/s)**
-```
-# CRITICAL: --cpu_mem 0 disables the CPU memory cache.
-# mistral-7b and llama3.1-8b are equivalent; 32 allocs or 150 users also
-# work; gen_mode none gives slightly higher throughput.
-python kv-cache.py \
-    --model mistral-7b \
-    --cpu_mem 0 \
-    --max_concurrent_allocs 16 \
-    --users 200 \
-    --gen_mode none
-```
-
-**Option B: Maximum Per-Request Storage Stress (for KV cache size testing)**
-```
-# CRITICAL: --cpu_mem 0 forces all decode reads to NVMe.
-# llama3.1-70b has the largest KV cache per request; 4 concurrent allocs
-# and 70 users are the best-performing settings for the 70B model.
-python kv-cache.py \
-    --model llama3.1-70b \
-    --cpu_mem 0 \
-    --max_concurrent_allocs 4 \
-    --users 70 \
-    --gen_mode none
-```
-
-**Expected Performance:**
-
-| Option | Model | Read MB/s | Total MB/s | IOPS |
-|--------|-------|-----------|------------|------|
-| A (max throughput) | mistral-7b | ~9,700 | ~10,900 | ~88,000 |
-| B (max per-request) | llama3.1-70b | ~7,000 | ~7,900 | ~63,000 |
-
-### D.6 Summary: Why cpu_mem=0 is Essential for Storage Benchmarking
-
-| Metric | cpu_mem=0GB | cpu_mem=4GB | Ratio |
-|--------|-------------|-------------|-------|
-| Read MB/s | 6,825 | 1,714 | **4.0x** |
-| Max Read MB/s | 9,760 | 4,652 | **2.1x** |
-| Utilization | 211% | 51% | **4.1x** |
-
-The `cpu_mem=0GB` setting:
-1. **Eliminates CPU memory caching** - all decode reads must come from NVMe
-2. **Maximizes storage throughput differentiation** between Fast and Slow systems
-3. **Represents worst-case storage requirements** for KV cache workloads
-4. **Achieves 78% of theoretical NVMe bandwidth** (10.9 GB/s of 14 GB/s)
-
----
-
-*Document generated by analysis scripts: analyze_results.py, analyze_variance.py, investigate_cpu_mem.py, investigate_anomaly.py, generate_sidebyside_v2.py, analyze_iostat.py*
diff --git a/kv_cache_benchmark/discovery_results_and_analysis/mlperfv3_results_summary_1page.md b/kv_cache_benchmark/discovery_results_and_analysis/mlperfv3_results_summary_1page.md
deleted file mode 100644
index 072aa5fb..00000000
--- a/kv_cache_benchmark/discovery_results_and_analysis/mlperfv3_results_summary_1page.md
+++ /dev/null
@@ -1,154 +0,0 @@
-# MLPerf v3 KV Cache Benchmark: Results Summary
-
-**Analysis Date:** 2026-01-09 | **Datasets:** Fast (1411 tests), Slow (268 tests) | **Matched Configs:** 220
-
----
-
-## Test Systems
-
-| System | Type | Storage | RAM | Theoretical BW |
-|--------|------|---------|-----|----------------|
-| **Fast** | Supermicro SYS-621H-TN12R (bare metal) | NVMe /dev/nvme4n1 | 256 GB DDR5-4800 | **14,000 MB/s** |
-| **Slow** | VMware ESXi 8.0.3U3 (VM) | VMFS6 volume | 128 GB DDR4-2400 | **~3,000 MB/s** |
-
-**Expected ratio:** 4.7x | **Observed ratio:** 2.1-2.6x (benchmark overhead, Python threading, memory copies)
-
----
-
-## Recommended Metrics for MLPerf v3 Submission
-
-**Critical:** Metric choice depends on the `cpu_mem` setting.
- -### At cpu_mem=0GB (Maximum Storage Stress) - -| Metric | Mean Ratio | Fast Win Rate | Recommendation | -|--------|------------|---------------|----------------| -| **Decode Bytes Read (GB)** | **2.62x** | **100%** | **PRIMARY** | -| **Wall-Clock Throughput (tok/s)** | **2.43x** | **100%** | **PRIMARY** | -| Storage Throughput (tok/s) | 1.12x | 62% | ❌ NOT RECOMMENDED | - -### At cpu_mem=4GB (Mixed Workload) - -| Metric | Mean Ratio | Fast Win Rate | Recommendation | -|--------|------------|---------------|----------------| -| **Storage Throughput (tok/s)** | **2.23x** | **97%** | **PRIMARY** | -| Decode Bytes Read (GB) | 2.06x | 100% | SECONDARY | -| Wall-Clock Throughput (tok/s) | 1.79x | 100% | SECONDARY | - ---- - -## Key Findings - -### Differentiation by cpu_mem_gb (Critical Parameter) - -| cpu_mem | Storage Tput Ratio | Decode Bytes Ratio | Primary Metric | -|---------|--------------------|--------------------|----------------| -| **0 GB** | 1.12x ❌ | **2.62x** ✓ | **Decode Bytes Read** | -| **4 GB** | **2.23x** ✓ | 2.06x | **Storage Throughput** | - -**Why Storage Throughput fails at cpu_mem=0:** Both systems are I/O-saturated. Fast does 2.62x more I/O but accumulates proportionally more I/O time → ratio cancels out. 
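That cancellation can be made concrete with a toy calculation (all numbers below are illustrative, not measurements):

```python
# Toy illustration of why tokens / summed-I/O-time stops differentiating
# once BOTH systems are I/O-saturated, while wall-clock tokens/sec does not.
# The token and I/O-time figures are made up for illustration only.

def throughput(tokens: float, seconds: float) -> float:
    return tokens / seconds

DURATION = 300.0  # identical wall-clock window on both systems

# The Fast system completes ~2.6x the tokens, but on a saturated device the
# SUM of per-request I/O waits grows almost proportionally with the token
# count, so the storage metric's numerator and denominator largely cancel.
fast_tokens, fast_io_s = 2.62e6, 2.34e4
slow_tokens, slow_io_s = 1.00e6, 1.00e4

wall_ratio = throughput(fast_tokens, DURATION) / throughput(slow_tokens, DURATION)
storage_ratio = throughput(fast_tokens, fast_io_s) / throughput(slow_tokens, slow_io_s)

print(f"wall-clock ratio:     {wall_ratio:.2f}x")    # ~2.62x, tracks the hardware gap
print(f"storage-metric ratio: {storage_ratio:.2f}x") # ~1.12x, gap mostly cancels
```

This is the arithmetic behind preferring Decode Bytes Read or Wall-Clock Throughput at cpu_mem=0.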
-
### Differentiation by Model

| Model | Stor Tput Ratio | Decode Ratio | Notes |
|-------|-----------------|--------------|-------|
| llama3.1-8b | **2.02x** | 2.27x | Best overall differentiation |
| mistral-7b | **1.98x** | 2.23x | Good alternative |
| llama3.1-70b | 1.74x | **2.37x** | Best I/O volume, max storage stress |
| llama2-7b | 1.80x | 2.29x | Legacy model |

### Variance (CV = std/mean)

| Users | CV (Fast) | CV (Slow) | Implication |
|-------|-----------|-----------|-------------|
| 10-20 | 52-81% | 52-63% | Lower variance |
| 50-200 | 117-125% | 110-116% | **Run 3-5 trials minimum** |

---

## Optimal Invocations for MLPerf v3 Submission

### Option 1: Maximum Storage Stress (cpu_mem=0GB)

```bash
python kv-cache.py \
  --model llama3.1-8b \
  --cpu-mem-gb 0 \
  --max-concurrent-allocs 16 \
  --num-users 200 \
  --duration 300 \
  --generation-mode none \
  --output results/mlperf_stress_$(hostname)_trial${N}.json
```

| Metric | Expected | Notes |
|--------|----------|-------|
| **Decode Bytes Read** | **2.62x** | PRIMARY metric at cpu_mem=0 |
| **Wall-Clock Throughput** | **2.43x** | 100% win rate |
| Storage Throughput | 1.12x | ❌ Do NOT use |
| Peak iostat throughput | ~11 GB/s | 78% of theoretical |

### Option 2: Storage Throughput Focus (cpu_mem=4GB)

```bash
python kv-cache.py \
  --model llama3.1-8b \
  --cpu-mem-gb 4 \
  --max-concurrent-allocs 0 \
  --num-users 100 \
  --duration 300 \
  --generation-mode none \
  --output results/mlperf_storage_$(hostname)_trial${N}.json
```

| Metric | Expected | Notes |
|--------|----------|-------|
| **Storage Throughput** | **2.23x** | PRIMARY metric at cpu_mem=4 |
| Decode Bytes Read | 2.06x | SECONDARY |

**Run 3-5 trials per configuration.
Report median and P95.** - ---- - -## Concurrency Model (kv-cache.py) - -``` -Users (--num-users) --> Request Queue --> Worker Pool (min(users,500)) --> Semaphore (--max-concurrent-allocs) -``` - -- `--num-users`: Simulated user threads generating requests -- `--max-concurrent-allocs`: Bounds simultaneous cache allocations (RAM usage) -- Filename `qdN` = `--max-concurrent-allocs N`, NOT observed queue depth - ---- - -## Conclusion - -**kv-cache.py successfully differentiates storage tiers:** - -| cpu_mem | Primary Metric | Differentiation | Win Rate | -|---------|----------------|-----------------|----------| -| **0 GB** | Decode Bytes Read | **2.62x** | **100%** | -| **0 GB** | Wall-Clock Throughput | **2.43x** | **100%** | -| **4 GB** | Storage Throughput | **2.23x** | 97% | - -**Critical:** Storage Throughput (tok/s) **fails at cpu_mem=0GB** (shows only 1.12x). Use Decode Bytes Read instead. - ---- - -## iostat Validation (Maximum Storage Stress) - -For maximum NVMe stress testing (e.g., validating hardware capabilities): - -| Setting | Value | Read MB/s | Total MB/s | Rationale | -|---------|-------|-----------|------------|-----------| -| cpu_mem | **0 GB** | 6,825 | 7,680 | **4x more reads** than cpu_mem=4GB | -| model | mistral-7b | 7,853 | 8,781 | Highest throughput | -| users | 200 | 8,119 | 9,277 | Peak sustained load | -| Peak config | M-7b/cpu0/mca16/200users | **9,744** | **10,967** | 78% of 14 GB/s theoretical | - -**Key Insight:** `cpu_mem=0GB` is critical for storage stress - forces all decode reads from NVMe. 
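The queue/worker/semaphore pipeline diagrammed in the concurrency-model section above can be sketched in a few lines. This is an illustrative skeleton, not the benchmark's actual implementation:

```python
# Minimal sketch of the kv-cache.py concurrency model: user threads feed a
# shared queue, a bounded worker pool drains it, and a semaphore plays the
# role of --max-concurrent-allocs by capping simultaneous allocations.
import queue
import threading

NUM_USERS = 8
REQUESTS_PER_USER = 3
MAX_CONCURRENT_ALLOCS = 2          # the "qdN" knob in result filenames

request_queue = queue.Queue()
alloc_sem = threading.Semaphore(MAX_CONCURRENT_ALLOCS)
completed, completed_lock = [], threading.Lock()

def user(uid):
    for r in range(REQUESTS_PER_USER):
        request_queue.put((uid, r))    # a stand-in for an InferenceRequest

def worker():
    while (item := request_queue.get()) is not None:  # None = shutdown pill
        with alloc_sem:                # bound simultaneous cache allocations
            with completed_lock:
                completed.append(item)

pool_size = min(NUM_USERS, 500)        # kv-cache.py caps the pool at 500
users = [threading.Thread(target=user, args=(u,)) for u in range(NUM_USERS)]
workers = [threading.Thread(target=worker) for _ in range(pool_size)]
for t in users + workers:
    t.start()
for t in users:
    t.join()
for _ in workers:                      # one poison pill per worker
    request_queue.put(None)
for t in workers:
    t.join()

print(f"completed {len(completed)} requests")  # 8 users x 3 requests = 24
```

Because the semaphore only bounds in-flight allocations (RAM), not queueing, raising `--num-users` increases offered load without increasing peak memory, which is why the two knobs are reported separately.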
- ---- - -*Full analysis: [mlperfv3_results_and_metrics_discovery.md](mlperfv3_results_and_metrics_discovery.md)* diff --git a/kv_cache_benchmark/kv-cache.py b/kv_cache_benchmark/kv-cache.py index 106418a5..2cd76289 100644 --- a/kv_cache_benchmark/kv-cache.py +++ b/kv_cache_benchmark/kv-cache.py @@ -3,86 +3,25 @@ KV Cache Benchmark - Multi-Tier Performance Comparison Kingston Digital, 2025 Licensed under the Apache License, Version 2.0 (the "License") -MLPerf Storage Working Group -This script provides a comprehensive, configurable benchmark for testing storage system -performance for Large Language Model (LLM) Key-Value (KV) cache offloading. It simulates -a realistic multi-tenant inference environment with a sophisticated multi-tier cache. - ---- Key Features --- -1. Phase-Aware Processing: Differentiates between the write-heavy 'prefill' phase - and the read-heavy 'decode' phase. -2. Stateful Multi-turn Conversations: Models cache reuse in conversational AI. -3. Hierarchical Prefix Caching: Simulates the caching of common prompts (e.g., system prompts) - for high-efficiency reuse across users. -4. RAG Workload Modeling: Simulates Retrieval-Augmented Generation workloads, which involve - large context sizes and unique I/O patterns. -5. Adaptive Autoscaling: Automatically adjusts the user load to find the saturation point - of the storage system. -6. Trace-Driven Validation: Can validate its own simulation against real-world traces. -7. QoS Support: Implements different priority levels (Interactive, Responsive, Batch) to - mimic real-world request scheduling. -8. Enhanced Metrics and Reporting: Provides detailed statistics on latency, throughput, IOPS, - and cache performance across all tiers. - -Target Accuracy: ±5% representation of real LLM inference clusters +MLPerf Storage Working Group + +Thin wrapper around the kv_cache package (modular_architecture/kv_cache/). 
+All implementation has been refactored into the package while this file +preserves backward compatibility for existing scripts and test imports. """ -import os import sys -import time -import json -import tempfile -import numpy as np -import hashlib -import shutil -from pathlib import Path -from dataclasses import dataclass, asdict, field, is_dataclass -from typing import Dict, List, Tuple, Optional, Set -from enum import Enum -import threading -import queue -import random -from datetime import datetime -from concurrent.futures import ThreadPoolExecutor -from collections import defaultdict -import argparse -import csv - -# Attempt to import optional GPU libraries (torch, cupy) -# The benchmark can run in a CPU-only environment if these are not found. -try: - import torch - TORCH_AVAILABLE = True -except ImportError: - TORCH_AVAILABLE = False - -try: - import cupy as cp - CUPY_AVAILABLE = True -except ImportError: - CUPY_AVAILABLE = False - -try: - import tiktoken - TIKTOKEN_AVAILABLE = True -except ImportError: - TIKTOKEN_AVAILABLE = False - -# Optional pandas/openpyxl for XLSX output -try: - import pandas as pd - PANDAS_AVAILABLE = True -except ImportError: - PANDAS_AVAILABLE = False - -try: - import openpyxl - OPENPYXL_AVAILABLE = True -except ImportError: - OPENPYXL_AVAILABLE = False - - -# ============================================================================ +import os + +# Add the script's directory to sys.path so `import kv_cache` resolves. +sys.path.insert(0, os.path.dirname(os.path.abspath(__file__))) + +# Re-export all public symbols for backward compatibility with test imports +from kv_cache import * # noqa: F401,F403 +from kv_cache.cli import main + + +# ============================================================================ # CORE DATA MODELS # Defines the basic data structures used throughout the benchmark. 
# ============================================================================ @@ -1868,7 +1807,6 @@ def get_stats(self, duration: float) -> Dict: 'read_write_ratio': self.stats['total_read_bytes'] / max(self.stats['total_write_bytes'], 1), 'read_iops': self.stats['read_operations'], 'write_iops': self.stats['write_operations'], - 'nvme_tokens_processed': self.stats['nvme_tokens_processed'], } # Add latency percentiles for each tier. @@ -2379,169 +2317,10 @@ def generate_mixed_users(cls, num_users: int) -> List[UserProfile]: return users -# ============================================================================ -# SHAREGPT DATASET LOADER -# Loads ShareGPT conversation data for realistic workload generation. -# ============================================================================ - -class ShareGPTDatasetLoader: - """ - Loads ShareGPT conversation data and provides realistic request patterns. - ShareGPT format has conversations with 'from' (human/gpt) and 'value' (text content). - """ - - def __init__(self, dataset_path: str, max_conversations: int = 1000, seed: Optional[int] = None): - """ - Initialize the ShareGPT dataset loader. 
- - Args: - dataset_path: Path to the ShareGPT JSON file - max_conversations: Maximum number of conversations to load - seed: Random seed for reproducibility - """ - self.dataset_path = dataset_path - self.max_conversations = max_conversations - self.conversations = [] - self.token_stats = {} - - if seed: - random.seed(seed) - np.random.seed(seed) - - self._load_dataset() - - def _load_dataset(self): - """Load and process the ShareGPT dataset.""" - if not os.path.exists(self.dataset_path): - print(f"[ShareGPT] Warning: Dataset not found at {self.dataset_path}") - return - - try: - # Try to initialize tokenizer for accurate token counting - tokenizer = None - if TIKTOKEN_AVAILABLE: - try: - tokenizer = tiktoken.get_encoding("cl100k_base") # GPT-4 tokenizer - except Exception: - pass - - if tokenizer is None: - print("[ShareGPT] Tiktoken not available, using approximate token counting") - - with open(self.dataset_path, 'r', encoding='utf-8') as f: - data = json.load(f) - - # Process conversations - for conv_idx, conversation in enumerate(data[:self.max_conversations]): - if 'conversations' not in conversation: - continue - - conv_data = [] - turns = conversation['conversations'] - - for i in range(0, len(turns) - 1, 2): # Process pairs of human-gpt turns - if i + 1 >= len(turns): - break - - human_turn = turns[i] - gpt_turn = turns[i + 1] - - if human_turn.get('from') != 'human' or gpt_turn.get('from') != 'gpt': - continue - - # Calculate tokens - context_text = human_turn.get('value', '') - generation_text = gpt_turn.get('value', '') - - if tokenizer: - context_tokens = len(tokenizer.encode(context_text)) - generation_tokens = len(tokenizer.encode(generation_text)) - else: - # Approximate: 4 characters per token on average - context_tokens = max(1, len(context_text) // 4) - generation_tokens = max(1, len(generation_text) // 4) - - # Limit extreme values for stability - context_tokens = min(context_tokens, 16384) # Cap at 16K context - generation_tokens = 
min(generation_tokens, 2048) # Cap at 2K generation - - conv_data.append({ - 'context_tokens': context_tokens, - 'generation_tokens': generation_tokens, - 'turn_number': i // 2 + 1 - }) - - if conv_data: - self.conversations.append({ - 'id': conversation.get('id', f'conv_{conv_idx}'), - 'turns': conv_data - }) - - # Calculate statistics - if self.conversations: - all_context_tokens = [] - all_generation_tokens = [] - - for conv in self.conversations: - for turn in conv['turns']: - all_context_tokens.append(turn['context_tokens']) - all_generation_tokens.append(turn['generation_tokens']) - - self.token_stats = { - 'context_mean': np.mean(all_context_tokens), - 'context_std': np.std(all_context_tokens), - 'context_min': np.min(all_context_tokens), - 'context_max': np.max(all_context_tokens), - 'context_p50': np.percentile(all_context_tokens, 50), - 'context_p95': np.percentile(all_context_tokens, 95), - 'generation_mean': np.mean(all_generation_tokens), - 'generation_std': np.std(all_generation_tokens), - 'generation_min': np.min(all_generation_tokens), - 'generation_max': np.max(all_generation_tokens), - 'generation_p50': np.percentile(all_generation_tokens, 50), - 'generation_p95': np.percentile(all_generation_tokens, 95), - 'total_conversations': len(self.conversations), - 'total_turns': sum(len(c['turns']) for c in self.conversations) - } - - print(f"[ShareGPT] Loaded {len(self.conversations)} conversations with {self.token_stats['total_turns']} turns") - print(f"[ShareGPT] Context tokens: mean={self.token_stats['context_mean']:.1f}, p50={self.token_stats['context_p50']:.1f}, p95={self.token_stats['context_p95']:.1f}") - print(f"[ShareGPT] Generation tokens: mean={self.token_stats['generation_mean']:.1f}, p50={self.token_stats['generation_p50']:.1f}, p95={self.token_stats['generation_p95']:.1f}") - - except Exception as e: - print(f"[ShareGPT] Error loading dataset: {e}") - self.conversations = [] - - def get_random_conversation(self) -> Optional[Dict]: - """Get 
a random conversation from the dataset.""" - if not self.conversations: - return None - return random.choice(self.conversations) - - def get_random_turn(self) -> Optional[Tuple[int, int]]: - """Get random context and generation token counts from the dataset.""" - if not self.conversations: - return None - - conv = self.get_random_conversation() - if conv and conv['turns']: - turn = random.choice(conv['turns']) - return turn['context_tokens'], turn['generation_tokens'] - return None - - def iterate_conversations(self, shuffle: bool = True): - """Iterate through all conversations, optionally shuffled.""" - conversations = self.conversations.copy() - if shuffle: - random.shuffle(conversations) - for conv in conversations: - yield conv - - -# ============================================================================ +# ============================================================================ # INTEGRATED BENCHMARK ORCHESTRATOR # This class wires all the components together and runs the main benchmark loop. 
-# ============================================================================ +# ============================================================================ class IntegratedBenchmark: """The main orchestrator for the entire benchmark.""" @@ -2565,12 +2344,8 @@ def __init__(self, performance_profile: str = 'latency', use_burst_trace: bool = False, burst_trace_path: Optional[str] = None, - dataset_path: Optional[str] = None, - max_conversations: int = 500, seed: Optional[int] = None, - max_concurrent_allocs: int = 0, - request_rate: float = 0, - max_requests: int = 0): + max_concurrent_allocs: int = 0): self.model_config = model_config self.num_users = num_users @@ -2586,28 +2361,11 @@ def __init__(self, self.performance_profile = performance_profile self.use_burst_trace = use_burst_trace self.burst_trace_path = burst_trace_path - self.dataset_path = dataset_path - self.max_conversations = max_conversations self.seed = seed self.max_concurrent_allocs = max_concurrent_allocs - self.request_rate = request_rate - self.max_requests = max_requests self.burst_requests: List[Tuple[int, int]] = [] - self.sharegpt_loader: Optional[ShareGPTDatasetLoader] = None - - # Load dataset if provided (takes priority over burst trace) - if self.dataset_path: - self.sharegpt_loader = ShareGPTDatasetLoader( - dataset_path=self.dataset_path, - max_conversations=self.max_conversations, - seed=self.seed - ) - self.use_dataset = True - elif self.use_burst_trace: + if self.use_burst_trace: self._load_burst_trace() - self.use_dataset = False - else: - self.use_dataset = False # Initialize components self.cache = MultiTierCache( @@ -2651,7 +2409,6 @@ def __init__(self, 'seed': self.seed, } self.results_lock = threading.Lock() - self.stop_event: Optional[threading.Event] = None # Set during run() self.rag_ingest_done = threading.Event() if self.enable_rag else None def _ingest_rag_documents(self, num_docs: int, stop_event: Optional[threading.Event] = None): @@ -2741,80 +2498,10 @@ def 
_generate_requests_from_trace(self, stop_event: threading.Event): priority_tuple = (-QOS_PROFILES[request.qos_level].priority, time.time()) self.request_queue.put((priority_tuple, request)) - + request_index += 1 time.sleep(0.01) # Simulate request arrival rate - def _generate_requests_from_dataset(self, stop_event: threading.Event): - """Generates InferenceRequest objects from the loaded ShareGPT dataset.""" - if not self.sharegpt_loader or not self.sharegpt_loader.conversations: - print("Warning: ShareGPT dataset is empty or not loaded. Falling back to synthetic workload.") - # Fall back to synthetic generation - users = UserSimulator.generate_mixed_users(self.num_users) - self.generate_requests(users, stop_event) - return - - conversation_iterator = iter(self.sharegpt_loader.iterate_conversations(shuffle=True)) - current_conversation = None - turn_index = 0 - - while not stop_event.is_set(): - # Get next conversation turn - if current_conversation is None or turn_index >= len(current_conversation['turns']): - try: - current_conversation = next(conversation_iterator) - turn_index = 0 - except StopIteration: - # Restart iteration when we run out of conversations - conversation_iterator = iter(self.sharegpt_loader.iterate_conversations(shuffle=True)) - continue - - turn = current_conversation['turns'][turn_index] - context_tokens = turn['context_tokens'] - generate_tokens = turn['generation_tokens'] - - with self.counter_lock: - req_id = self.request_counter - self.request_counter += 1 - - # Assign QoS level based on request characteristics - rand = random.random() - if rand < 0.15: - qos_level, priority = QoSLevel.INTERACTIVE, 3 - elif rand < 0.50: - qos_level, priority = QoSLevel.RESPONSIVE, 2 - else: - qos_level, priority = QoSLevel.BATCH, 1 - - user_id = f"dataset_user_{req_id % self.num_users}" - conv_id = current_conversation['id'] - - # Determine inference phase - phase = InferencePhase.PREFILL if context_tokens >= 10000 else InferencePhase.PREFILL_DECODE - 
- request = InferenceRequest( - user_id=user_id, - request_id=f"{user_id}_req_{req_id:04d}", - timestamp=datetime.now(), - context_tokens=context_tokens, - generate_tokens=generate_tokens, - priority=priority, - phase=phase, - qos_level=qos_level, - cache_key=f"{conv_id}_turn_{turn['turn_number']}", - conversation_id=conv_id if self.enable_multi_turn else None, - turn_number=turn['turn_number'] if self.enable_multi_turn else None - ) - - priority_tuple = (-QOS_PROFILES[request.qos_level].priority, time.time()) - self.request_queue.put((priority_tuple, request)) - - turn_index += 1 - - # Control request arrival rate (0 = unlimited for storage saturation) - if self.request_rate > 0: - time.sleep(1.0 / self.request_rate) - def generate_requests(self, users: List[UserProfile], stop_event: threading.Event): """Generate requests concurrently for each simulated user.""" @@ -3006,11 +2693,6 @@ def process_requests(self, stop_event: threading.Event): self.results['storage_latencies'].append(storage_latency) self.results['generation_latencies'].append(generation_latency) - # Check if we've hit max_requests limit - if self.max_requests > 0 and self.results['requests_completed'] >= self.max_requests: - if self.stop_event: - self.stop_event.set() - self.qos_monitor.record_request(request) def monitor_stats(self, stop_event: threading.Event): @@ -3106,13 +2788,12 @@ def run(self) -> Dict: print(f" - Mode: {self.autoscaler.mode}") print(f" - QoS Support: Enabled (Interactive/Responsive/Batch)") print(f" - Trace-Driven (BurstGPT): {'Enabled' if self.use_burst_trace else 'Disabled'}") - print(f" - ShareGPT Dataset: {'Enabled' if self.use_dataset else 'Disabled'}") if self.max_concurrent_allocs > 0: print(f" - Max Concurrent Allocations: {self.max_concurrent_allocs} (bounds RAM usage)") print("=" * 80) users = [] - if not self.use_burst_trace and not self.use_dataset: + if not self.use_burst_trace: users = UserSimulator.generate_mixed_users(self.num_users) context_lengths = 
[u.context_length for u in users] print(f"\nUser Context Length Distribution:") @@ -3124,21 +2805,14 @@ def run(self) -> Dict: print(f"\nQoS Distribution:") for level, count in qos_dist.items(): print(f" {level.value}: {count} users") - elif self.use_dataset and self.sharegpt_loader: - print(f"\nShareGPT Dataset Statistics:") - print(f" Conversations: {self.sharegpt_loader.token_stats.get('total_conversations', 0)}") - print(f" Total Turns: {self.sharegpt_loader.token_stats.get('total_turns', 0)}") print(f"\nStarting benchmark...") print("-" * 80) stop_event = threading.Event() - self.stop_event = stop_event # Store for max_requests check threads = [] - if self.use_dataset: - gen_thread = threading.Thread(target=self._generate_requests_from_dataset, args=(stop_event,), daemon=True) - elif self.use_burst_trace: + if self.use_burst_trace: gen_thread = threading.Thread(target=self._generate_requests_from_trace, args=(stop_event,), daemon=True) else: gen_thread = threading.Thread(target=self.generate_requests, args=(users, stop_event), daemon=True) @@ -3158,36 +2832,31 @@ def run(self) -> Dict: threads.append(mon_thread) mon_thread.start() - # Wait for either the configured duration or an earlier stop signal (from max_requests or monitor). - benchmark_start = time.time() + # Wait for either the configured duration or an earlier stop signal from the monitor. 
stop_event.wait(timeout=self.duration) - actual_duration = time.time() - benchmark_start stop_event.set() for thread in threads: thread.join(timeout=2.0) - self._calculate_stats(actual_duration) + self._calculate_stats() if self.validator: self.results['validation'] = self.validator.validate_benchmark(self.results) return self.results - def _calculate_stats(self, actual_duration: float = None): + def _calculate_stats(self): """Calculate final statistics with all feature breakdowns""" if not self.results['end_to_end_latencies']: print("\nNo requests completed during benchmark!") return - # Use actual duration if provided (for max_requests mode), else configured duration - duration = actual_duration if actual_duration else self.duration - e2e = np.array(self.results['end_to_end_latencies']) storage = np.array(self.results['storage_latencies']) generation = np.array(self.results['generation_latencies']) - cache_stats = self.cache.get_stats(duration) + cache_stats = self.cache.get_stats(self.duration) qos_metrics = self.qos_monitor.get_all_qos_metrics() prefix_stats = self.prefix_cache_manager.stats if self.prefix_cache_manager else {} autoscaling_stats = self.autoscaler.scaling_history if self.autoscaler else [] @@ -3208,11 +2877,8 @@ def _calculate_stats(self, actual_duration: float = None): summary = { 'total_requests': self.results['requests_completed'], 'total_tokens': self.results['total_tokens_generated'], - 'elapsed_time': duration, - 'avg_throughput_tokens_per_sec': self.results['total_tokens_generated'] / duration, - 'total_storage_io_time': self.results['total_storage_io_latency'], - 'storage_throughput_tokens_per_sec': self.results['total_tokens_generated'] / self.results['total_storage_io_latency'] if self.results['total_storage_io_latency'] > 0 else 0, - 'requests_per_second': self.results['requests_completed'] / duration, + 'avg_throughput_tokens_per_sec': self.results['total_tokens_generated'] / self.duration, + 'requests_per_second': 
self.results['requests_completed'] / self.duration, 'end_to_end_latency_ms': { 'mean': np.mean(e2e) * 1000, 'p50': np.percentile(e2e, 50) * 1000, @@ -3319,8 +2985,7 @@ def _print_summary(self, summary: Dict): print(f"\n### OVERALL PERFORMANCE ###") print(f"Requests Completed: {summary['total_requests']}") print(f"Total Tokens Generated: {summary['total_tokens']}") - print(f"Throughput (wall-clock): {summary['avg_throughput_tokens_per_sec']:.2f} tokens/sec") - print(f"Throughput (storage I/O): {summary['storage_throughput_tokens_per_sec']:.2f} tokens/sec") + print(f"Throughput: {summary['avg_throughput_tokens_per_sec']:.2f} tokens/sec") print(f"Requests/sec: {summary['requests_per_second']:.2f}") print(f"\n### END-TO-END LATENCY (Storage I/O + Token Generation) ###") @@ -3461,26 +3126,12 @@ def main(): help='Path to the BurstGPT trace file.') parser.add_argument('--validation-trace', type=str, default=None, help='Path to a real-world trace file for validation.') - parser.add_argument('--dataset-path', type=str, default=None, - help='Path to ShareGPT dataset JSON file for realistic workload generation.') - parser.add_argument('--max-conversations', type=int, default=500, - help='Maximum number of conversations to load from the ShareGPT dataset.') parser.add_argument('--output', type=str, default=f"benchmark_results_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json", help='Output file for results') parser.add_argument('--seed', type=int, default=None, help='Seed for random number generators to ensure reproducibility.') parser.add_argument('--max-concurrent-allocs', type=int, default=0, help='Limit concurrent allocations to bound RAM usage. 0 = unlimited. ' 'Recommended: 8-16 for large models to prevent memory explosion.') - parser.add_argument('--request-rate', type=float, default=0, - help='Target request arrival rate for ShareGPT replay (requests/sec). ' - '0 = unlimited (storage-saturating mode for MLPerf). ' - 'Default: 0. 
Use 10 for realistic user arrival patterns.') - parser.add_argument('--max-requests', type=int, default=0, - help='Stop after completing N requests (0 = use duration instead). ' - 'Useful for fixed-workload comparisons with vLLM benchmarks.') - parser.add_argument('--xlsx-output', type=str, default=None, - help='Optional: Output Excel file path for summary results with run parameters. ' - 'Requires pandas and openpyxl. Falls back to CSV if openpyxl not available.') args = parser.parse_args() @@ -3515,12 +3166,8 @@ def main(): performance_profile=args.performance_profile, use_burst_trace=args.use_burst_trace, burst_trace_path=args.burst_trace_path, - dataset_path=args.dataset_path, - max_conversations=args.max_conversations, seed=args.seed, - max_concurrent_allocs=args.max_concurrent_allocs, - request_rate=args.request_rate, - max_requests=args.max_requests + max_concurrent_allocs=args.max_concurrent_allocs ) results = benchmark.run() @@ -3542,166 +3189,5 @@ def convert_numpy(obj): print(f"\nResults saved to {args.output}") - # Export to XLSX if requested - if args.xlsx_output: - export_results_to_xlsx(results, args, args.xlsx_output) - - -def export_results_to_xlsx(results: Dict, args, output_path: str): - """ - Export benchmark results to an Excel file with run parameters embedded. - Falls back to CSV if openpyxl is not available. - - Args: - results: The benchmark results dictionary - args: The argparse namespace with all CLI parameters - output_path: Path for the output Excel/CSV file - """ - if not PANDAS_AVAILABLE: - print(f"Warning: pandas not available, skipping XLSX export. 
Install with: pip install pandas")
-        return
-
-    summary = results.get('summary', {})
-    if not summary:
-        print("Warning: No summary data available for XLSX export")
-        return
-
-    # Helper to safely get nested keys
-    def get_nested(d, keys, default=None):
-        for key in keys:
-            if isinstance(d, dict):
-                d = d.get(key, default)
-            else:
-                return default
-        return d
-
-    # Build run parameters row
-    run_params = {
-        'Timestamp': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
-        'Model': args.model,
-        'Num Users': args.num_users,
-        'Duration (s)': args.duration,
-        'GPU Memory (GB)': args.gpu_mem_gb,
-        'CPU Memory (GB)': args.cpu_mem_gb,
-        'Generation Mode': args.generation_mode,
-        'Performance Profile': args.performance_profile,
-        'Multi-turn': not args.disable_multi_turn,
-        'Prefix Caching': not args.disable_prefix_caching,
-        'RAG Enabled': args.enable_rag,
-        'Autoscaling': args.enable_autoscaling,
-        'Seed': args.seed,
-        'Max Concurrent Allocs': args.max_concurrent_allocs,
-        'Request Rate': args.request_rate,
-        'Max Requests': args.max_requests,
-        'Dataset Path': args.dataset_path or 'N/A',
-        'Cache Dir': args.cache_dir or 'temp',
-    }
-
-    # Build metrics row
-    metrics = {
-        'Total Requests': summary.get('total_requests'),
-        'Total Tokens': summary.get('total_tokens'),
-        'Elapsed Time (s)': summary.get('elapsed_time'),
-        'Avg Throughput (tok/s)': summary.get('avg_throughput_tokens_per_sec'),
-        'Storage Throughput (tok/s)': summary.get('storage_throughput_tokens_per_sec'),
-        'Requests/sec': summary.get('requests_per_second'),
-
-        # End to End Latency
-        'E2E Latency Mean (ms)': get_nested(summary, ['end_to_end_latency_ms', 'mean']),
-        'E2E Latency P50 (ms)': get_nested(summary, ['end_to_end_latency_ms', 'p50']),
-        'E2E Latency P95 (ms)': get_nested(summary, ['end_to_end_latency_ms', 'p95']),
-        'E2E Latency P99 (ms)': get_nested(summary, ['end_to_end_latency_ms', 'p99']),
-
-        # Storage IO Latency
-        'Storage Latency Mean (ms)': get_nested(summary, ['storage_io_latency_ms', 'mean']),
-        'Storage Latency P50 (ms)': get_nested(summary, ['storage_io_latency_ms', 'p50']),
-        'Storage Latency P95 (ms)': get_nested(summary, ['storage_io_latency_ms', 'p95']),
-        'Storage Latency P99 (ms)': get_nested(summary, ['storage_io_latency_ms', 'p99']),
-
-        # Generation Latency
-        'Gen Latency Mean (ms)': get_nested(summary, ['generation_latency_ms', 'mean']),
-        'Gen Latency P50 (ms)': get_nested(summary, ['generation_latency_ms', 'p50']),
-        'Gen Latency P95 (ms)': get_nested(summary, ['generation_latency_ms', 'p95']),
-        'Gen Latency P99 (ms)': get_nested(summary, ['generation_latency_ms', 'p99']),
-
-        # Cache Stats
-        'Cache Hit Rate': get_nested(summary, ['cache_stats', 'cache_hit_rate']),
-        'Read/Write Ratio': get_nested(summary, ['cache_stats', 'read_write_ratio']),
-        'Total Read (GB)': get_nested(summary, ['cache_stats', 'total_read_gb']),
-        'Total Write (GB)': get_nested(summary, ['cache_stats', 'total_write_gb']),
-        'Prefill Bytes Written (GB)': get_nested(summary, ['cache_stats', 'prefill_bytes_written_gb']),
-        'Decode Bytes Read (GB)': get_nested(summary, ['cache_stats', 'decode_bytes_read_gb']),
-
-        # Tier distribution
-        'GPU Entries': get_nested(summary, ['cache_stats', 'gpu_entries']),
-        'CPU Entries': get_nested(summary, ['cache_stats', 'cpu_entries']),
-        'NVMe Entries': get_nested(summary, ['cache_stats', 'nvme_entries']),
-
-        # Multi-turn stats
-        'Multi-turn Hit Rate': get_nested(summary, ['multi_turn_stats', 'hit_rate']),
-    }
-
-    # Combine into single row with all data
-    combined_row = {**run_params, **metrics}
-
-    # Create DataFrame
-    df = pd.DataFrame([combined_row])
-
-    # Determine output format
-    use_excel = OPENPYXL_AVAILABLE and output_path.endswith('.xlsx')
-
-    try:
-        if use_excel:
-            # Create Excel with multiple sheets for better organization
-            with pd.ExcelWriter(output_path, engine='openpyxl') as writer:
-                # Sheet 1: Combined summary (single row for easy aggregation)
-                df.to_excel(writer, sheet_name='Summary', index=False)
-
-                # Sheet 2: Run Parameters (vertical format for readability)
-                params_df = pd.DataFrame(list(run_params.items()), columns=['Parameter', 'Value'])
-                params_df.to_excel(writer, sheet_name='Run Parameters', index=False)
-
-                # Sheet 3: Performance Metrics (vertical format)
-                metrics_df = pd.DataFrame(list(metrics.items()), columns=['Metric', 'Value'])
-                metrics_df.to_excel(writer, sheet_name='Performance Metrics', index=False)
-
-                # Sheet 4: QoS Metrics if available
-                qos_metrics = summary.get('qos_metrics', {})
-                if qos_metrics:
-                    qos_rows = []
-                    for level, data in qos_metrics.items():
-                        if isinstance(data, dict) and not data.get('no_data'):
-                            qos_rows.append({
-                                'QoS Level': level,
-                                'Total Requests': data.get('total_requests'),
-                                'Latency P95 (ms)': get_nested(data, ['latency_ms', 'p95']),
-                                'Latency P99 (ms)': get_nested(data, ['latency_ms', 'p99']),
-                                'SLA Met': get_nested(data, ['sla', 'met']),
-                                'SLA Compliance': get_nested(data, ['sla', 'compliance']),
-                            })
-                    if qos_rows:
-                        qos_df = pd.DataFrame(qos_rows)
-                        qos_df.to_excel(writer, sheet_name='QoS Metrics', index=False)
-
-            print(f"XLSX results saved to {output_path}")
-        else:
-            # Fall back to CSV
-            csv_path = output_path.replace('.xlsx', '.csv') if output_path.endswith('.xlsx') else output_path
-            if not csv_path.endswith('.csv'):
-                csv_path += '.csv'
-            df.to_csv(csv_path, index=False)
-            print(f"CSV results saved to {csv_path} (openpyxl not available for XLSX)")
-
-    except Exception as e:
-        print(f"Error saving XLSX/CSV: {e}")
-        # Last resort: try CSV
-        try:
-            csv_path = output_path.replace('.xlsx', '.csv')
-            df.to_csv(csv_path, index=False)
-            print(f"Fallback CSV saved to {csv_path}")
-        except Exception as e2:
-            print(f"Failed to save results: {e2}")
-
-
-if __name__ == "__main__": main()
diff --git a/kv_cache_benchmark/kv-cache_sharegpt_replay.py b/kv_cache_benchmark/kv-cache_sharegpt_replay.py
deleted file mode 100644
index 34df6abe..00000000
--- a/kv_cache_benchmark/kv-cache_sharegpt_replay.py
+++ /dev/null
@@ -1,3151 +0,0 @@
-#!/usr/bin/env python3
-"""
-Integrated Multi-User KV Cache Benchmark - Enhanced Version
-Hazem Awadallah, Kingston Digital, 2025
-Assisted by GitHub Copilot
-
-MLPerf Storage Working Group - Benchmark Implementation
-
-This script provides a comprehensive, configurable benchmark for testing storage system
-performance for Large Language Model (LLM) Key-Value (KV) cache offloading. It simulates
-a realistic multi-tenant inference environment with a sophisticated multi-tier cache.
-
---- Key Features ---
-1. Phase-Aware Processing: Differentiates between the write-heavy 'prefill' phase
-   and the read-heavy 'decode' phase.
-2. Stateful Multi-turn Conversations: Models cache reuse in conversational AI.
-3. Hierarchical Prefix Caching: Simulates the caching of common prompts (e.g., system prompts)
-   for high-efficiency reuse across users.
-4. RAG Workload Modeling: Simulates Retrieval-Augmented Generation workloads, which involve
-   large context sizes and unique I/O patterns.
-5. Adaptive Autoscaling: Automatically adjusts the user load to find the saturation point
-   of the storage system.
-6. Trace-Driven Validation: Can validate its own simulation against real-world traces.
-7. QoS Support: Implements different priority levels (Interactive, Responsive, Batch) to
-   mimic real-world request scheduling.
-8. Enhanced Metrics and Reporting: Provides detailed statistics on latency, throughput, IOPS,
-   and cache performance across all tiers.
-
-Target Accuracy: ±5% representation of real LLM inference clusters
-"""
-
-import os
-import sys
-import time
-import json
-import tempfile
-import numpy as np
-import hashlib
-import shutil
-from pathlib import Path
-from dataclasses import dataclass, asdict, field, is_dataclass
-from typing import Dict, List, Tuple, Optional, Set
-from enum import Enum
-import threading
-import queue
-import random
-from datetime import datetime
-from concurrent.futures import ThreadPoolExecutor
-from collections import defaultdict
-import argparse
-import csv
-
-# tiktoken is optional: the ShareGPT loader falls back to approximate
-# token counting when it is not installed.
-try:
-    import tiktoken  # For tokenization if needed
-except ImportError:
-    tiktoken = None
-
-# Attempt to import optional GPU libraries (torch, cupy).
-# The benchmark can run in a CPU-only environment if these are not found.
-try:
-    import torch
-    TORCH_AVAILABLE = True
-except ImportError:
-    TORCH_AVAILABLE = False
-
-try:
-    import cupy as cp
-    CUPY_AVAILABLE = True
-except ImportError:
-    CUPY_AVAILABLE = False
-
-
-# ============================================================================
-# CORE DATA MODELS
-# Defines the basic data structures used throughout the benchmark.
-# ============================================================================
-
-@dataclass
-class ModelConfig:
-    """
-    Configuration for a model's KV cache requirements.
-
-    This dataclass holds the architectural parameters of an LLM that are essential
-    for calculating the size of its KV cache.
-    """
-    name: str
-    num_layers: int   # Number of transformer layers in the model.
-    hidden_dim: int   # The size of the main hidden state vector.
-    num_heads: int    # Number of attention heads for queries (Q).
-    kv_heads: int     # Number of attention heads for keys/values (K/V). For GQA, kv_heads < num_heads.
-    dtype: str = 'float16'  # Data type used for cache tensors (e.g., float16, bfloat16).
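The `ModelConfig` parameters above drive every memory calculation in the benchmark via the per-token formula `num_layers * kv_heads * head_dim * 2 * bytes_per_element`, where `head_dim = hidden_dim / num_heads`. As an illustrative sanity check of that arithmetic (a standalone sketch, not part of the script; the helper name is hypothetical):

```python
# Illustrative check of the per-token KV cache sizing formula used by
# ModelConfig.kv_cache_size_per_token (standalone sketch, plain Python).

def kv_bytes_per_token(num_layers, hidden_dim, num_heads, kv_heads, bytes_per_element=2):
    head_dim = hidden_dim // num_heads  # dimension of each attention head
    # K and V are both stored per layer, per KV head: hence the factor of 2.
    return num_layers * kv_heads * head_dim * 2 * bytes_per_element

# Llama 3.1 8B (GQA: 8 KV heads), float16 (2 bytes/element).
per_token = kv_bytes_per_token(num_layers=32, hidden_dim=4096, num_heads=32, kv_heads=8)
print(per_token)                    # 131072 bytes = 128 KiB per token
print(per_token * 16384 / 1024**3)  # a full 16K-token context -> 2.0 GiB
```

At 128 KiB per token, even a single 16K-token context occupies 2 GiB, which is why spillover from VRAM to CPU RAM and NVMe is the benchmark's central concern.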
-
-    @property
-    def bytes_per_element(self) -> int:
-        """Returns the size in bytes of a single element based on the data type."""
-        dtype_map = {'float32': 4, 'float16': 2, 'bfloat16': 2, 'int8': 1}
-        return dtype_map.get(self.dtype, 2)  # Default to 2 bytes for float16/bfloat16
-
-    @property
-    def kv_dim_per_head(self) -> int:
-        """Calculates the dimension of each Key/Value attention head."""
-        return self.hidden_dim // self.num_heads
-
-    @property
-    def kv_cache_size_per_token(self) -> int:
-        """
-        Calculates the total memory in bytes required to store the KV cache for a single token.
-        This is the fundamental unit for all memory calculations in the benchmark.
-        Formula: num_layers * num_kv_heads * head_dimension * 2 (for K and V) * bytes_per_element
-        """
-        return self.num_layers * self.kv_heads * self.kv_dim_per_head * 2 * self.bytes_per_element
-
-
-# A dictionary of pre-defined model configurations that can be selected via command line.
-MODEL_CONFIGS = {
-    'tiny-1b': ModelConfig(
-        name='Tiny 1B',
-        num_layers=12,
-        hidden_dim=1024,
-        num_heads=8,
-        kv_heads=4,
-        dtype='float16'
-    ),
-    'tinyllama-1.1b': ModelConfig(
-        name='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
-        num_layers=12,
-        hidden_dim=1024,
-        num_heads=8,
-        kv_heads=4,
-        dtype='float16'
-    ),
-    'mistral-7b': ModelConfig(
-        name='Mistral 7B',
-        num_layers=32,
-        hidden_dim=4096,
-        num_heads=32,
-        kv_heads=8,
-        dtype='float16'
-    ),
-    'llama2-7b': ModelConfig(
-        name='Llama 2 7B',
-        num_layers=32,
-        hidden_dim=4096,
-        num_heads=32,
-        kv_heads=32,  # Llama 2 uses Multi-Head Attention (MHA), so kv_heads == num_heads
-        dtype='float16'
-    ),
-    'llama3.1-8b': ModelConfig(
-        name='Llama 3.1 8B',
-        num_layers=32,
-        hidden_dim=4096,
-        num_heads=32,
-        kv_heads=8,
-        dtype='float16'
-    ),
-    'llama3.1-70b-instruct': ModelConfig(
-        name='Llama 3.1 70B Instruct',
-        num_layers=80,
-        hidden_dim=8192,
-        num_heads=64,
-        kv_heads=8,
-        dtype='float16'
-    ),
-}
-
-
-# ============================================================================
-# FEATURE 1: PHASE-AWARE PROCESSING
-# Models the two distinct phases of LLM inference, which have different I/O patterns.
-# ============================================================================
-
-class InferencePhase(Enum):
-    """Enumeration for the two main phases of LLM inference."""
-    PREFILL = "prefill"      # Write-heavy phase: processing the input prompt.
-    DECODE = "decode"        # Read-heavy phase: generating output tokens one by one.
-    PREFILL_DECODE = "both"  # A combined phase for very short requests.
-
-
-class GenerationMode(Enum):
-    """Enumeration for token generation simulation modes."""
-    NONE = "none"            # Pure storage benchmark. No simulated sleep. Latency is 100% I/O.
-    FAST = "fast"            # Simulates a very fast GPU (2ms/token) to model some backpressure.
-    REALISTIC = "realistic"  # Simulates a realistic GPU (30ms/token) for end-to-end latency analysis.
-
-
-# Defines the sleep time per token to simulate GPU work for each mode.
-GENERATION_TIMING = {
-    GenerationMode.NONE: 0.0,
-    GenerationMode.FAST: 0.002,
-    GenerationMode.REALISTIC: 0.030,
-}
-
-
-# ============================================================================
-# FEATURE 7: QOS SUPPORT
-# Models a multi-tenant environment where requests have different priorities.
-# ============================================================================
-
-class QoSLevel(Enum):
-    """Enumeration for Quality of Service (QoS) levels, defining user priority."""
-    INTERACTIVE = "interactive"  # Highest priority, for real-time applications (e.g., chatbot UI).
-    RESPONSIVE = "responsive"    # High priority, for near real-time tasks.
-    BATCH = "batch"              # Low priority, for offline processing.
-
-
-@dataclass
-class QoSSLA:
-    """
-    Represents a Service Level Agreement (SLA) for a given QoS level.
-    Defines the performance targets and tracks violations.
-    """
-    qos_level: QoSLevel
-    target_latency_p95_ms: float  # The 95th percentile latency target.
-    target_latency_p99_ms: float  # The 99th percentile latency target.
-    priority: int                 # An integer priority level (higher is more important).
-
-    # SLA violation tracking
-    violations: int = 0
-    total_requests: int = 0
-
-    @property
-    def sla_compliance(self) -> float:
-        """Calculates the percentage of requests that met the SLA target."""
-        if self.total_requests == 0:
-            return 1.0
-        return 1.0 - (self.violations / self.total_requests)
-
-
-# Pre-defined QoS profiles mapping each level to a specific SLA.
-QOS_PROFILES = {
-    QoSLevel.INTERACTIVE: QoSSLA(
-        qos_level=QoSLevel.INTERACTIVE,
-        target_latency_p95_ms=50,
-        target_latency_p99_ms=100,
-        priority=3
-    ),
-    QoSLevel.RESPONSIVE: QoSSLA(
-        qos_level=QoSLevel.RESPONSIVE,
-        target_latency_p95_ms=100,
-        target_latency_p99_ms=200,
-        priority=2
-    ),
-    QoSLevel.BATCH: QoSSLA(
-        qos_level=QoSLevel.BATCH,
-        target_latency_p95_ms=1000,
-        target_latency_p99_ms=5000,
-        priority=1
-    )
-}
-
-
-@dataclass
-class UserProfile:
-    """Represents a simulated user with specific behavior patterns."""
-    user_id: str
-    context_length: int     # The number of tokens in the user's prompts.
-    generation_length: int  # The number of tokens the user requests to be generated.
-    think_time: float       # The simulated time the user "thinks" between requests.
-    priority: int
-    qos_level: QoSLevel
-    session_start: datetime = field(default_factory=datetime.now)
-    total_latency: float = 0.0
-    request_count: int = 0
-
-
-@dataclass
-class InferenceRequest:
-    """Represents a single, atomic inference request sent to the benchmark."""
-    user_id: str
-    request_id: str
-    timestamp: datetime
-    context_tokens: int
-    generate_tokens: int
-    priority: int
-    phase: InferencePhase = InferencePhase.PREFILL_DECODE
-    qos_level: QoSLevel = QoSLevel.BATCH
-    cache_key: Optional[str] = None  # The unique identifier for this request's KV cache.
-
-    # Timing fields to track latency at different stages.
-    submit_time: float = field(default_factory=time.perf_counter)  # When the request was created.
-    start_time: float = 0     # When processing began.
-    complete_time: float = 0  # When processing finished.
-
-    # Conversation tracking for stateful workloads.
-    conversation_id: Optional[str] = None
-    turn_number: int = 0
-
-    def __post_init__(self):
-        """Post-initialization hook to automatically generate a cache key.
-
-        If a `cache_key` is not explicitly provided during the object's
-        creation, this method constructs one based on the available context.
-
-        The generation logic is as follows:
-        - If a `conversation_id` is present, the key is formatted as
-          `f"{conversation_id}_turn_{turn_number}"` to uniquely identify a
-          specific turn within a conversation.
-        - Otherwise, it defaults to a user-specific context key formatted as
-          `f"{user_id}_ctx"`.
-
-        This ensures that every instance has a non-null `cache_key` for
-        cache management.
-        """
-        if self.cache_key is None:
-            if self.conversation_id:
-                self.cache_key = f"{self.conversation_id}_turn_{self.turn_number}"
-            else:
-                self.cache_key = f"{self.user_id}_ctx"
-
-    @property
-    def total_latency_ms(self) -> float:
-        """Calculates the total end-to-end latency for the request in milliseconds."""
-        if self.complete_time == 0:
-            return 0
-        return (self.complete_time - self.submit_time) * 1000
-
-
-# ============================================================================
-# FEATURE 2: STATEFUL MULTI-TURN CONVERSATIONS
-# Models how conversational context is managed and reused over time.
-# ============================================================================
-
-@dataclass
-class ConversationState:
-    """Tracks the state of a single multi-turn conversation for a user."""
-    conversation_id: str
-    user_id: str
-    turn_number: int
-    created_at: datetime
-    last_access: datetime
-
-    # KV cache management for this conversation.
-    cache_keys: List[str] = field(default_factory=list)  # List of cache keys for each turn.
-    cumulative_tokens: int = 0
-    cache_locations: Dict[str, str] = field(default_factory=dict)
-
-    # Metadata for advanced caching strategies.
-    system_prompt_key: Optional[str] = None
-    common_prefix_keys: List[str] = field(default_factory=list)
-
-    # Performance tracking for this conversation.
-    turns_completed: int = 0
-    total_latency: float = 0.0
-    cache_hits: int = 0
-    cache_misses: int = 0
-
-
-class ConversationManager:
-    """Manages the lifecycle of all multi-turn conversations and enables cache reuse."""
-
-    def __init__(self, max_conversations: int = 1000, max_turns_per_conv: int = 50):
-        self.conversations: Dict[str, ConversationState] = {}
-        self.max_conversations = max_conversations
-        self.max_turns_per_conv = max_turns_per_conv
-        self.lock = threading.Lock()  # Protects access to the shared conversations dictionary.
-
-    def start_conversation(self, user_id: str, system_prompt: Optional[str] = None) -> str:
-        """Initializes a new conversation for a given user.
-
-        This method creates a unique conversation ID and a corresponding
-        `ConversationState` object to track the conversation's progress.
-        It handles an optional system prompt by creating a reusable, hashed key for it.
-        If the total number of active conversations reaches the configured
-        maximum (`self.max_conversations`), the least recently accessed
-        conversation is evicted to make room for the new one.
-
-        Args:
-            user_id (str): The unique identifier for the user starting the conversation.
-            system_prompt (Optional[str]): An optional initial prompt to set the
-                conversation's context. Defaults to None.
-
-        Returns:
-            str: The unique identifier generated for the new conversation.
-        """
-        conv_id = f"conv_{user_id}_{int(time.time()*1000)}"
-
-        state = ConversationState(
-            conversation_id=conv_id,
-            user_id=user_id,
-            turn_number=0,
-            created_at=datetime.now(),
-            last_access=datetime.now(),
-            cache_keys=[],
-            cumulative_tokens=0,
-            cache_locations={}
-        )
-
-        # If a system prompt is provided, create a deterministic, reusable key for it.
-        # Hashing the prompt text ensures that identical system prompts across different
-        # conversations map to the same cache key, enabling high-efficiency reuse.
-        if system_prompt:
-            state.system_prompt_key = f"system_prompt_{hashlib.sha256(system_prompt.encode()).hexdigest()[:16]}"
-
-        with self.lock:
-            # If the number of conversations exceeds the max, evict the oldest one.
-            # Otherwise, add the new conversation.
-            if len(self.conversations) >= self.max_conversations:
-                self._evict_oldest_conversation()
-
-            self.conversations[conv_id] = state
-
-        return conv_id
-
-    def add_turn(self, conversation_id: str, user_message_tokens: int,
-                 assistant_response_tokens: int) -> Tuple[int, str]:
-        """
-        Adds a new turn to an existing conversation, updating its state.
-
-        This method is thread-safe. It locates a conversation by its ID,
-        increments the turn counter, updates the total token count, and generates
-        a unique cache key for the new turn. The conversation's last access
-        time is also updated.
-
-        Args:
-            conversation_id (str): The unique identifier for the conversation.
-            user_message_tokens (int): The number of tokens in the user's message for this turn.
-            assistant_response_tokens (int): The number of tokens in the assistant's response for this turn.
-
-        Returns:
-            Tuple[int, str]: A tuple containing the new turn number and the unique
-                cache key generated for this turn.
-
-        Raises:
-            ValueError: If no conversation with the given `conversation_id` is found.
-        """
-        with self.lock:
-            if conversation_id not in self.conversations:
-                raise ValueError(f"Conversation {conversation_id} not found")
-
-            state = self.conversations[conversation_id]
-            state.turn_number += 1
-            state.last_access = datetime.now()
-
-            turn_cache_key = f"{conversation_id}_turn_{state.turn_number}"
-
-            # Update conversation state with new tokens and cache key.
-            state.cache_keys.append(turn_cache_key)
-            state.cumulative_tokens += user_message_tokens + assistant_response_tokens
-            state.turns_completed += 1
-
-            return state.turn_number, turn_cache_key
-
-    def get_conversation_context_size(self, conversation_id: str) -> int:
-        """Gets the total number of tokens accumulated in a conversation."""
-        with self.lock:
-            if conversation_id not in self.conversations:
-                return 0
-            return self.conversations[conversation_id].cumulative_tokens
-
-    def get_all_previous_turn_keys(self, conversation_id: str, current_turn: int) -> List[str]:
-        """
-        Retrieves all cache keys from previous turns in a conversation.
-
-        This method is used to assemble the full context for a new turn by fetching
-        the cache keys for all preceding turns in a given conversation. It allows
-        the inference engine to load the entire conversational history from the
-        KV cache before processing the new user input.
-
-        Args:
-            conversation_id (str): The unique identifier for the conversation.
-            current_turn (int): The current turn number. The cache key for this
-                turn will be excluded from the result.
-
-        Returns:
-            List[str]: A list of cache keys corresponding to all turns before
-                the current one. Returns an empty list if the conversation
-                is not found.
-        """
-        with self.lock:
-            if conversation_id not in self.conversations:
-                return []
-            state = self.conversations[conversation_id]
-            # Return all turns up to (but not including) the current turn
-            return [key for key in state.cache_keys if key != f"{conversation_id}_turn_{current_turn}"]
-
-    def _evict_oldest_conversation(self):
-        """Evicts the least recently used (LRU) conversation to make space."""
-        if not self.conversations:
-            return
-        # Find the conversation with the oldest `last_access` timestamp (Least Recently Used).
-        # The min() function scans all conversations to find the one with the smallest
-        # (oldest) `last_access` time. This is the LRU entry.
-        #
-        # Time -->
-        # +------------------------------------------------+
-        # |  Conv_B  |  Conv_D  |  Conv_A  |  Conv_C  |
-        # +------------------------------------------------+
-        #     ^
-        #     |
-        #     Oldest Access Time (min). This one is evicted.
-        #
-        oldest_conv_id = min(
-            self.conversations,
-            key=lambda k: (self.conversations[k].last_access, self.conversations[k].created_at)
-        )
-        del self.conversations[oldest_conv_id]
-
-
-# ============================================================================
-# FEATURE 3: HIERARCHICAL PREFIX CACHING
-# Models the reuse of common prompts (e.g., "You are a helpful assistant").
-# ============================================================================
-
-class PrefixType(Enum):
-    """Enumeration for the different tiers of prefix caching."""
-    SYSTEM_PROMPT = "system_prompt"  # Highest reuse, almost never evicted.
-    COMMON_PHRASE = "common_phrase"  # High reuse, rarely evicted.
-    USER_SPECIFIC = "user_specific"  # Low reuse, normal eviction policy.
-
-
-@dataclass
-class PrefixCacheEntry:
-    """Represents a cached prefix."""
-    prefix_key: str
-    prefix_type: PrefixType
-    text_hash: str
-    token_count: int
-    kv_cache_key: str  # The key pointing to the actual KV cache data in the multi-tier cache.
-
-    # Usage statistics to track popularity and reuse.
-    use_count: int = 0
-    first_seen: datetime = field(default_factory=datetime.now)
-    last_used: datetime = field(default_factory=datetime.now)
-    users_using: Set[str] = field(default_factory=set)
-
-    # Storage information.
-    storage_tier: str = ""
-    size_bytes: int = 0
-
-
-class PrefixMatcher:
-    """Detects and matches common prefixes in requests to enable reuse."""
-
-    # A list of common system prompts to simulate prefix matching.
-    COMMON_SYSTEM_PROMPTS = [
-        "You are a helpful assistant.",
-        "You are an AI assistant helping with coding tasks.",
-        "You are a professional writing assistant.",
-    ]
-
-    def __init__(self, min_prefix_length: int = 50):
-        self.min_prefix_length = min_prefix_length
-        self.prefix_index: Dict[str, PrefixCacheEntry] = {}
-        self.prefix_frequency: Dict[str, int] = {}
-        self.lock = threading.Lock()
-
-    def hash_prefix(self, text: str, token_count: int) -> str:
-        """Creates a deterministic hash for a given text prefix."""
-        content = f"{text[:500]}_{token_count}"
-        return hashlib.sha256(content.encode()).hexdigest()[:16]
-
-    def detect_system_prompt(self, context_tokens: int) -> Optional[PrefixCacheEntry]:
-        """Simulates the detection of a common system prompt at the start of a request."""
-        # In this simulation, 20% of requests are assumed to start with a common system prompt.
-        if random.random() < 0.2:
-            system_prompt = random.choice(self.COMMON_SYSTEM_PROMPTS)
-            prefix_hash = self.hash_prefix(system_prompt, len(system_prompt.split()))
-
-            with self.lock:
-                if prefix_hash in self.prefix_index:
-                    # If this prompt has been seen before, increment its use count.
-                    entry = self.prefix_index[prefix_hash]
-                    entry.use_count += 1
-                    entry.last_used = datetime.now()
-                    return entry
-                else:
-                    # If it's a new prompt, create a new entry for it.
-                    entry = PrefixCacheEntry(
-                        prefix_key=f"system_{prefix_hash}",
-                        prefix_type=PrefixType.SYSTEM_PROMPT,
-                        text_hash=prefix_hash,
-                        token_count=len(system_prompt.split()),
-                        kv_cache_key=f"kv_system_{prefix_hash}",
-                        use_count=1
-                    )
-                    self.prefix_index[prefix_hash] = entry
-                    return entry
-        return None
-
-
-class PrefixCacheManager:
-    """Orchestrates the prefix matching and caching logic."""
-
-    def __init__(self, cache, max_prefix_entries: int = 1000):
-        self.cache = cache  # A reference to the main MultiTierCache.
-        self.max_prefix_entries = max_prefix_entries
-        self.prefix_matcher = PrefixMatcher()
-        self.lock = threading.Lock()
-
-        # Statistics for reporting prefix cache effectiveness.
-        self.stats = {
-            'prefix_hits': 0,
-            'prefix_misses': 0,
-            'system_prompt_reuse': 0,
-            'common_phrase_reuse': 0,
-            'bytes_saved': 0
-        }
-
-    def check_prefix_cache(self, request: InferenceRequest, model_config: ModelConfig) -> Tuple[Optional[PrefixCacheEntry], int]:
-        """
-        Checks if the beginning of a request matches a known, cached prefix.
-
-        Returns:
-            A tuple containing the PrefixCacheEntry if a hit occurs (or None),
-            and the number of remaining (non-prefixed) tokens in the request.
-        """
-        prefix_entry = self.prefix_matcher.detect_system_prompt(request.context_tokens)
-
-        if prefix_entry:
-            # On a hit, update stats and calculate how many tokens were saved.
-            with self.lock:
-                self.stats['prefix_hits'] += 1
-                if prefix_entry.prefix_type == PrefixType.SYSTEM_PROMPT:
-                    self.stats['system_prompt_reuse'] += 1
-                self.stats['bytes_saved'] += prefix_entry.token_count * model_config.kv_cache_size_per_token
-
-            # Return the prefix entry and the number of remaining tokens to process.
-            remaining_tokens = max(0, request.context_tokens - prefix_entry.token_count)
-            return prefix_entry, remaining_tokens
-        else:
-            # On a miss, update stats and return.
-            with self.lock:
-                self.stats['prefix_misses'] += 1
-            return None, request.context_tokens
-
-
-# ============================================================================
-# FEATURE 4: RAG WORKLOAD MODELING
-# Simulates a Retrieval-Augmented Generation workload, where large document
-# chunks are loaded into the context window, stressing the cache.
-# ============================================================================
-
-@dataclass
-class RAGChunk:
-    """Represents a single chunk of a document in a RAG system."""
-    chunk_id: str
-    doc_id: str
-    chunk_index: int
-    token_count: int
-    kv_cache_key: str  # The key for this chunk's KV cache.
-
-    access_count: int = 0
-    last_accessed: datetime = field(default_factory=datetime.now)
-    storage_tier: str = ""
-    size_bytes: int = 0
-
-
-@dataclass
-class RAGDocument:
-    """Represents a document that has been chunked for RAG."""
-    doc_id: str
-    total_tokens: int
-    chunk_size: int
-    chunks: List[RAGChunk] = field(default_factory=list)
-
-    @property
-    def num_chunks(self) -> int:
-        return len(self.chunks)
-
-
-@dataclass
-class RAGQuery:
-    """Represents a RAG query that retrieves document chunks."""
-    query_id: str
-    query_tokens: int
-    retrieved_chunks: List[RAGChunk]
-    generation_tokens: int
-
-    @property
-    def total_context_tokens(self) -> int:
-        """The total context is the user's query plus all retrieved document chunks."""
-        return self.query_tokens + sum(c.token_count for c in self.retrieved_chunks)
-
-
-class RAGDocumentManager:
-    """Manages the ingestion and retrieval of RAG document chunks."""
-
-    def __init__(self, cache, chunk_size: int = 512, top_k_chunks: int = 5):
-        self.cache = cache  # A reference to the main MultiTierCache.
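Document ingestion caps each chunk's KV footprint at roughly 256 MB and splits the document with ceiling division. That arithmetic can be sketched in isolation (a hypothetical standalone helper, not the benchmark's own API; the 256 MB budget and the Llama 3.1 8B figure of 128 KiB/token are taken from the surrounding code):

```python
# Standalone sketch of the RAG chunk-sizing arithmetic used during ingestion.
# plan_chunks is an illustrative helper name, not part of the script.

def plan_chunks(total_tokens, chunk_size, kv_bytes_per_token, max_chunk_bytes=256 * 1024**2):
    # Shrink the chunk if its KV cache would exceed the ~256 MB budget.
    max_tokens = max(1, min(chunk_size, max_chunk_bytes // kv_bytes_per_token))
    num_chunks = (total_tokens + max_tokens - 1) // max_tokens  # ceiling division
    # Token count of each chunk; the last one holds the remainder.
    sizes = [min(max_tokens, total_tokens - i * max_tokens) for i in range(num_chunks)]
    return max_tokens, sizes

# 10,000-token document, 512-token chunks, Llama 3.1 8B (131072 bytes/token).
max_tokens, sizes = plan_chunks(10_000, 512, 131_072)
print(max_tokens, len(sizes), sizes[-1])  # 512 tokens/chunk, 20 chunks, last chunk 272 tokens
```

For this model the 256 MB budget allows up to 2048 tokens per chunk, so the requested 512-token chunk size is left unchanged; a much larger per-token footprint would force the chunk size down.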
-        self.chunk_size = chunk_size
-        self.top_k_chunks = top_k_chunks
-        self.documents: Dict[str, RAGDocument] = {}
-        self.chunk_index: Dict[str, RAGChunk] = {}
-
-    def ingest_document(self, doc_id: str, total_tokens: int, model_config: ModelConfig):
-        """
-        Simulates the ingestion of a document.
-        This involves splitting it into chunks and pre-calculating and storing the
-        KV cache for each chunk in the multi-tier cache.
-        """
-        max_chunk_bytes = 256 * 1024**2  # Target ~256MB per chunk to limit memory pressure.
-        bytes_per_token = max(model_config.kv_cache_size_per_token, 1)
-        max_tokens_per_chunk = max(1, min(self.chunk_size, max_chunk_bytes // bytes_per_token))
-
-        if max_tokens_per_chunk < self.chunk_size:
-            print(f"[RAG] Adjusting chunk size for {doc_id} to {max_tokens_per_chunk} tokens "
-                  f"to stay under {max_chunk_bytes / 1024**2:.0f} MB per chunk.")
-
-        num_chunks = (total_tokens + max_tokens_per_chunk - 1) // max_tokens_per_chunk
-
-        doc = RAGDocument(
-            doc_id=doc_id,
-            total_tokens=total_tokens,
-            chunk_size=max_tokens_per_chunk,
-            chunks=[]
-        )
-
-        for chunk_idx in range(num_chunks):
-            remaining_tokens = total_tokens - chunk_idx * max_tokens_per_chunk
-            chunk_tokens = min(max_tokens_per_chunk, remaining_tokens)
-
-            chunk = RAGChunk(
-                chunk_id=f"{doc_id}_chunk_{chunk_idx}",
-                doc_id=doc_id,
-                chunk_index=chunk_idx,
-                token_count=chunk_tokens,
-                kv_cache_key=f"rag_{doc_id}_chunk_{chunk_idx}"
-            )
-
-            # Allocate and store the KV cache for this new chunk.
-            try:
-                success, location, write_latency = self.cache.allocate_cache(
-                    key=chunk.kv_cache_key,
-                    num_tokens=chunk_tokens
-                )
-            except MemoryError:
-                print(f"[RAG] MemoryError while ingesting chunk {chunk.chunk_id}; skipping remaining chunks.")
-                break
-            except Exception as exc:
-                print(f"[RAG] Error ingesting chunk {chunk.chunk_id}: {exc}")
-                continue
-
-            if not success:
-                print(f"[RAG] Warning: Failed to allocate cache for chunk {chunk.chunk_id}.")
-                continue
-
-            chunk.storage_tier = location
-            chunk.size_bytes = chunk_tokens * model_config.kv_cache_size_per_token
-
-            doc.chunks.append(chunk)
-            self.chunk_index[chunk.chunk_id] = chunk
-
-        self.documents[doc_id] = doc
-        return doc
-
-    def retrieve_chunks(self, doc_id: str) -> List[RAGChunk]:
-        """Simulates the retrieval of the top-k most relevant chunks for a query."""
-        if doc_id not in self.documents:
-            return []
-
-        doc = self.documents[doc_id]
-
-        # Simulate a realistic retrieval access pattern, where earlier chunks in a
-        # document are more likely to be retrieved.
-        chunk_probabilities = [1.0 / (i + 1) for i in range(len(doc.chunks))]
-        total_prob = sum(chunk_probabilities)
-        chunk_probabilities = [p / total_prob for p in chunk_probabilities]
-
-        retrieved_indices = np.random.choice(
-            len(doc.chunks),
-            size=min(self.top_k_chunks, len(doc.chunks)),
-            replace=False,
-            p=chunk_probabilities
-        )
-
-        retrieved_chunks = [doc.chunks[i] for i in retrieved_indices]
-
-        # Update access stats for the retrieved chunks.
- for chunk in retrieved_chunks: - chunk.access_count += 1 - chunk.last_accessed = datetime.now() - - return retrieved_chunks - - -# ============================================================================ -# FEATURE 5: SHAREGPT DATASET REPLAY -# Loads and replays real conversation data from ShareGPT dataset for realistic workload generation -# ============================================================================ - -class ShareGPTDatasetLoader: - """ - Loads ShareGPT conversation data and provides realistic request patterns. - ShareGPT format has conversations with 'from' (human/gpt) and 'value' (text content). - """ - - def __init__(self, dataset_path: str, max_conversations: int = 1000, seed: Optional[int] = None): - """ - Initialize the ShareGPT dataset loader. - - Args: - dataset_path: Path to the ShareGPT JSON file - max_conversations: Maximum number of conversations to load - seed: Random seed for reproducibility - """ - self.dataset_path = dataset_path - self.max_conversations = max_conversations - self.conversations = [] - self.token_stats = {} - - if seed: - random.seed(seed) - np.random.seed(seed) - - self._load_dataset() - - def _load_dataset(self): - """Load and process the ShareGPT dataset.""" - if not os.path.exists(self.dataset_path): - print(f"[ShareGPT] Warning: Dataset not found at {self.dataset_path}") - return - - try: - # Try to initialize tokenizer for accurate token counting - try: - self.tokenizer = tiktoken.get_encoding("cl100k_base") # GPT-4 tokenizer - except: - self.tokenizer = None - print("[ShareGPT] Tiktoken not available, using approximate token counting") - - with open(self.dataset_path, 'r', encoding='utf-8') as f: - data = json.load(f) - - # Process conversations - for conv_idx, conversation in enumerate(data[:self.max_conversations]): - if 'conversations' not in conversation: - continue - - conv_data = [] - turns = conversation['conversations'] - - for i in range(0, len(turns) - 1, 2): # Process pairs of human-gpt 
turns - if i + 1 >= len(turns): - break - - human_turn = turns[i] - gpt_turn = turns[i + 1] - - if human_turn.get('from') != 'human' or gpt_turn.get('from') != 'gpt': - continue - - # Calculate tokens - context_text = human_turn.get('value', '') - generation_text = gpt_turn.get('value', '') - - if self.tokenizer: - context_tokens = len(self.tokenizer.encode(context_text)) - generation_tokens = len(self.tokenizer.encode(generation_text)) - else: - # Approximate: 4 characters per token on average - context_tokens = max(1, len(context_text) // 4) - generation_tokens = max(1, len(generation_text) // 4) - - # Limit extreme values for stability - context_tokens = min(context_tokens, 16384) # Cap at 16K context - generation_tokens = min(generation_tokens, 2048) # Cap at 2K generation - - conv_data.append({ - 'context_tokens': context_tokens, - 'generation_tokens': generation_tokens, - 'turn_number': i // 2 + 1 - }) - - if conv_data: - self.conversations.append({ - 'id': conversation.get('id', f'conv_{conv_idx}'), - 'turns': conv_data - }) - - # Calculate statistics - if self.conversations: - all_context_tokens = [] - all_generation_tokens = [] - - for conv in self.conversations: - for turn in conv['turns']: - all_context_tokens.append(turn['context_tokens']) - all_generation_tokens.append(turn['generation_tokens']) - - self.token_stats = { - 'context_mean': np.mean(all_context_tokens), - 'context_std': np.std(all_context_tokens), - 'context_min': np.min(all_context_tokens), - 'context_max': np.max(all_context_tokens), - 'context_p50': np.percentile(all_context_tokens, 50), - 'context_p95': np.percentile(all_context_tokens, 95), - 'generation_mean': np.mean(all_generation_tokens), - 'generation_std': np.std(all_generation_tokens), - 'generation_min': np.min(all_generation_tokens), - 'generation_max': np.max(all_generation_tokens), - 'generation_p50': np.percentile(all_generation_tokens, 50), - 'generation_p95': np.percentile(all_generation_tokens, 95), - 
'total_conversations': len(self.conversations), - 'total_turns': sum(len(c['turns']) for c in self.conversations) - } - - print(f"[ShareGPT] Loaded {len(self.conversations)} conversations with {self.token_stats['total_turns']} turns") - print(f"[ShareGPT] Context tokens: mean={self.token_stats['context_mean']:.1f}, p50={self.token_stats['context_p50']:.1f}, p95={self.token_stats['context_p95']:.1f}") - print(f"[ShareGPT] Generation tokens: mean={self.token_stats['generation_mean']:.1f}, p50={self.token_stats['generation_p50']:.1f}, p95={self.token_stats['generation_p95']:.1f}") - - except Exception as e: - print(f"[ShareGPT] Error loading dataset: {e}") - self.conversations = [] - - def get_random_conversation(self) -> Optional[Dict]: - """Get a random conversation from the dataset.""" - if not self.conversations: - return None - return random.choice(self.conversations) - - def get_random_turn(self) -> Optional[Tuple[int, int]]: - """Get random context and generation token counts from the dataset.""" - if not self.conversations: - return None - - conv = self.get_random_conversation() - if conv and conv['turns']: - turn = random.choice(conv['turns']) - return turn['context_tokens'], turn['generation_tokens'] - return None - - def iterate_conversations(self, shuffle: bool = True): - """Iterate through all conversations, optionally shuffled.""" - conversations = self.conversations.copy() - if shuffle: - random.shuffle(conversations) - for conv in conversations: - yield conv - - -# ============================================================================ -# STORAGE BACKEND CLASSES -# These classes abstract the I/O operations for each tier of the memory hierarchy. 
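When tiktoken is unavailable, the loader falls back to a rough four-characters-per-token estimate and caps extreme values (16K context, 2K generation). That heuristic can be exercised on its own; a standalone sketch, where `approx_tokens` and `turn_to_token_counts` are illustrative helper names, not benchmark APIs:

```python
# Standalone sketch of the loader's fallback token estimate and caps.
# Assumes no tokenizer is available; mirrors the ~4 chars/token heuristic.

def approx_tokens(text: str, cap: int) -> int:
    """Approximate token count: ~4 characters per token, at least 1, at most cap."""
    return min(max(1, len(text) // 4), cap)

def turn_to_token_counts(human_text: str, gpt_text: str):
    """Map one human/gpt turn pair to (context_tokens, generation_tokens)."""
    return (approx_tokens(human_text, 16384),   # context capped at 16K
            approx_tokens(gpt_text, 2048))      # generation capped at 2K

ctx, gen = turn_to_token_counts("What is a KV cache?" * 10,
                                "It stores keys and values." * 5)
```

The caps matter for stability: a single pathological conversation would otherwise dominate the token statistics and the generated cache sizes.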
-# ============================================================================ - -class StorageBackend: - """Abstract base class for all storage backends (GPU, CPU, NVMe).""" - - @dataclass - class IOTiming: - """Captures total latency along with host and device components.""" - total: float - device: float - host: float - - def write(self, key: str, data: np.ndarray) -> 'StorageBackend.IOTiming': - """Writes data to the backend and returns latency breakdown.""" - raise NotImplementedError - - def read(self, key: str) -> Tuple[np.ndarray, 'StorageBackend.IOTiming']: - """Reads data from the backend and returns the data and latency.""" - raise NotImplementedError - - def delete(self, key: str): - """Deletes data from the backend.""" - raise NotImplementedError - - def clear(self): - """Clears all data from the backend.""" - raise NotImplementedError - - -class GPUMemoryBackend(StorageBackend): - """ - GPU VRAM storage backend. - Uses PyTorch or CuPy for GPU operations. This is the fastest tier. - """ - - def __init__(self, use_torch=True): - if use_torch and TORCH_AVAILABLE: - self.backend = 'torch' - self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') - if self.device.type == 'cpu': - raise RuntimeError("No GPU available for PyTorch backend") - # Pre-allocate a large chunk of GPU memory to simulate a real server environment. - torch.cuda.set_per_process_memory_fraction(0.8, 0) - torch.cuda.empty_cache() - elif CUPY_AVAILABLE: - self.backend = 'cupy' - mempool = cp.get_default_memory_pool() - mempool.free_all_blocks() - else: - raise RuntimeError("No GPU backend (PyTorch or CuPy) available.") - - self.cache = {} # Holds tensors on the GPU. - self.pinned_memory = {} # Holds CPU memory pinned for fast async GPU transfers. - - def write(self, key: str, data: np.ndarray) -> StorageBackend.IOTiming: - """ - Writes a NumPy array from CPU to GPU VRAM. - Uses pinned memory and non-blocking transfers for maximum performance. 
- """ - # Simple eviction mechanism if GPU runs out of memory. - if self.backend == 'torch' and torch.cuda.is_available(): - free_memory = torch.cuda.mem_get_info()[0] - if data.nbytes > free_memory * 0.9: - torch.cuda.empty_cache() - if data.nbytes > torch.cuda.mem_get_info()[0] * 0.9: - if len(self.cache) > 0: - oldest_key = list(self.cache.keys())[0] - del self.cache[oldest_key] - torch.cuda.empty_cache() - - start = time.perf_counter() - - if self.backend == 'torch': - # Pin the CPU memory for this tensor to enable fast asynchronous transfer. - if key not in self.pinned_memory: - self.pinned_memory[key] = torch.from_numpy(data).pin_memory() - # Asynchronously copy the pinned memory to the GPU. - gpu_tensor = self.pinned_memory[key].to(self.device, non_blocking=True) - # Wait for the transfer to complete to accurately measure latency. - torch.cuda.synchronize() - self.cache[key] = gpu_tensor - del self.pinned_memory[key] # Release the pinned memory. - else: # CuPy backend - self.cache[key] = cp.asarray(data) - cp.cuda.Stream.null.synchronize() - - total = time.perf_counter() - start - # GPU transfers are all host-managed; device component equals total for now. - return StorageBackend.IOTiming(total=total, device=total, host=total) - - def read(self, key: str) -> Tuple[np.ndarray, StorageBackend.IOTiming]: - """Reads a tensor from GPU VRAM back to a NumPy array on the CPU.""" - if key not in self.cache: - raise KeyError(f"Key {key} not found in GPU cache") - - start = time.perf_counter() - - if self.backend == 'torch': - gpu_tensor = self.cache[key] - # Asynchronously copy the tensor from GPU to CPU. - cpu_tensor = gpu_tensor.to('cpu', non_blocking=True) - # Wait for the transfer to complete to measure latency. 
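All backends share the same measurement pattern: start `perf_counter`, perform the copy, synchronize, and return an `IOTiming` breakdown. A CPU-only stand-in can sketch that harness without a GPU; `np.copy` here substitutes for the pinned-memory transfer and this is not the benchmark's actual GPU path:

```python
# Sketch of the latency-measurement pattern used by the storage backends.
# The real GPU path adds pin_memory()/non_blocking transfers plus
# torch.cuda.synchronize() before stopping the clock.
import time
from dataclasses import dataclass
import numpy as np

@dataclass
class IOTiming:
    total: float
    device: float
    host: float

def timed_copy(data: np.ndarray):
    start = time.perf_counter()
    copy = np.copy(data)          # stand-in for the host-to-device transfer
    total = time.perf_counter() - start
    # With no separate device queue, host and device components collapse to total,
    # matching how the GPU and CPU backends report their timings.
    return copy, IOTiming(total=total, device=total, host=total)

out, t = timed_copy(np.zeros((4, 4), dtype=np.float16))
```

The synchronize step is the important part of the real backend: without it, the non-blocking copy would return immediately and the measured latency would exclude the transfer itself.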
- torch.cuda.synchronize() - data = cpu_tensor.numpy() - else: # CuPy backend - data = cp.asnumpy(self.cache[key]) - cp.cuda.Stream.null.synchronize() - - total = time.perf_counter() - start - return data, StorageBackend.IOTiming(total=total, device=total, host=total) - - def delete(self, key: str): - if key in self.cache: - del self.cache[key] - if key in self.pinned_memory: - del self.pinned_memory[key] - - def clear(self): - """Clears all tensors from the GPU cache and frees memory.""" - for key in list(self.cache.keys()): - del self.cache[key] - self.cache.clear() - for key in list(self.pinned_memory.keys()): - del self.pinned_memory[key] - self.pinned_memory.clear() - - if self.backend == 'torch' and torch.cuda.is_available(): - torch.cuda.empty_cache() - torch.cuda.synchronize() - elif self.backend == 'cupy': - mempool = cp.get_default_memory_pool() - pinned_mempool = cp.get_default_pinned_memory_pool() - mempool.free_all_blocks() - pinned_mempool.free_all_blocks() - - -class CPUMemoryBackend(StorageBackend): - """CPU RAM storage backend. 
This is the second tier in the cache hierarchy.""" - - def __init__(self): - self.cache = {} - - def write(self, key: str, data: np.ndarray) -> StorageBackend.IOTiming: - """Writes data by copying it into the cache dictionary.""" - start = time.perf_counter() - self.cache[key] = np.copy(data) - total = time.perf_counter() - start - return StorageBackend.IOTiming(total=total, device=total, host=total) - - def read(self, key: str) -> Tuple[np.ndarray, StorageBackend.IOTiming]: - """Reads data by copying it from the cache dictionary.""" - if key not in self.cache: - raise KeyError(f"Key {key} not found in CPU cache") - start = time.perf_counter() - data = np.copy(self.cache[key]) - total = time.perf_counter() - start - return data, StorageBackend.IOTiming(total=total, device=total, host=total) - - def delete(self, key: str): - if key in self.cache: - del self.cache[key] - - def clear(self): - for key in list(self.cache.keys()): - del self.cache[key] - self.cache.clear() - import gc - gc.collect() # Force garbage collection. - - -class NVMeBackend(StorageBackend): - """ - NVMe/SSD storage backend using memory-mapped files. - This is the third and slowest tier, used for offloading from CPU RAM. - """ - - def __init__(self, base_path: str = None): - self.temp_dir = None - if base_path is None: - self.temp_dir = tempfile.TemporaryDirectory(prefix="kv_cache_") - self.base_path = Path(self.temp_dir.name) - else: - self.base_path = Path(base_path) - # Ensure the cache directory exists but do not remove the mount point itself. - if self.base_path.exists(): - if not self.base_path.is_dir(): - raise NotADirectoryError(f"Cache path {self.base_path} exists but is not a directory.") - # Remove only the files the benchmark generated (.npy shards). - for entry in self.base_path.glob("*.npy"): - try: - entry.unlink() - except OSError: - pass - else: - self.base_path.mkdir(parents=True, exist_ok=True) - - # Final sanity check. 
- if not self.base_path.exists(): - raise OSError(f"Cache directory {self.base_path} does not exist and could not be created.") - - self.metadata = {} - - def _get_path(self, key: str) -> Path: - """Constructs the file path for a given cache key.""" - return self.base_path / f"{key}.npy" - - def write(self, key: str, data: np.ndarray) -> StorageBackend.IOTiming: - """Writes a NumPy array to a binary .npy file on disk.""" - start = time.perf_counter() - path = self._get_path(key) - - with open(path, 'wb') as f: - np.save(f, data, allow_pickle=False) - # Host serialization (NumPy header + buffer copy) completes here. - post_save = time.perf_counter() - f.flush() - # fsync blocks until the kernel persists data to the device. - os.fsync(f.fileno()) - post_fsync = time.perf_counter() - - self.metadata[key] = {'shape': data.shape, 'dtype': str(data.dtype), 'size': data.nbytes} - - host_time = post_save - start - device_time = post_fsync - post_save - total = post_fsync - start - return StorageBackend.IOTiming(total=total, device=device_time, host=host_time) - - def read(self, key: str) -> Tuple[np.ndarray, StorageBackend.IOTiming]: - """ - Reads a .npy file from disk. - - IMPORTANT: This method is designed to force actual disk I/O for accurate storage - benchmarking. It uses posix_fadvise() to drop the file from the Linux page cache - before reading, ensuring that: - 1. Every read operation hits the physical storage device (NVMe/SSD) - 2. iostat and other system monitoring tools accurately reflect storage I/O - 3. Latency measurements represent real-world storage performance - - Without this, Linux would serve reads from the page cache, making it appear as if - no disk I/O is occurring (iostat shows 0 r/s), which defeats the purpose of a - storage benchmark. 
- """ - start = time.perf_counter() - path = self._get_path(key) - - if not path.exists(): - raise KeyError(f"Key {key} not found in NVMe cache") - - # CRITICAL FIX: Drop this file from the Linux page cache before reading. - # This ensures that the subsequent read operation will be served from the actual - # storage device rather than from cached memory. - try: - fd = os.open(path, os.O_RDONLY) - try: - os.posix_fadvise(fd, 0, 0, 4) # POSIX_FADV_DONTNEED - except AttributeError: - pass - finally: - os.close(fd) - except Exception: - pass - - pre_load = time.perf_counter() - data = np.load(path, allow_pickle=False) - load_done = time.perf_counter() - # Convert to a standard numpy array to ensure the full data is loaded into memory. - data = np.array(data) - copy_done = time.perf_counter() - - device_time = load_done - pre_load - host_time = (pre_load - start) + (copy_done - load_done) - total = copy_done - start - return data, StorageBackend.IOTiming(total=total, device=device_time, host=host_time) - - def delete(self, key: str): - path = self._get_path(key) - if path.exists(): - path.unlink() - if key in self.metadata: - del self.metadata[key] - - def clear(self): - """Deletes all .npy files from the cache directory.""" - for file in self.base_path.glob("*.npy"): - file.unlink() - self.metadata.clear() - - def __del__(self): - """Cleans up the temporary directory when the object is destroyed.""" - if self.temp_dir: - import shutil - shutil.rmtree(self.temp_dir, ignore_errors=True) - - -class KVCacheGenerator: - """Generates realistic-looking KV cache data for testing.""" - - def __init__(self, model_config: ModelConfig, global_seed: Optional[int] = None): - self.model_config = model_config - self.global_seed = 0 if global_seed is None else int(global_seed) - - def _seed_from_key(self, key: str) -> int: - # Use stable cryptographic hash to get deterministic 64-bit seed - h = hashlib.sha256(key.encode('utf-8')).digest() - key_hash64 = int.from_bytes(h[:8], 'little') 
-         return (key_hash64 ^ self.global_seed) & 0xFFFFFFFFFFFFFFFF
-
-     def generate(self, sequence_length: int, key: Optional[str] = None) -> np.ndarray:
-         """
-         Generates a NumPy array with the correct shape and dtype for a KV cache.
-         The data itself is random noise, but is generated deterministically if a key is provided.
-         """
-         # The shape of a KV cache tensor is typically:
-         # (num_layers, 2 (for K/V), sequence_length, num_kv_heads, head_dimension)
-         kv_shape = (
-             self.model_config.num_layers,
-             2,  # K and V
-             sequence_length,
-             self.model_config.kv_heads,
-             self.model_config.kv_dim_per_head
-         )
-
-         dtype = np.float16 if 'float16' in self.model_config.dtype else np.float32
-
-         if key is None:
-             # No key: fall back to the global seed, so every keyless call
-             # reproduces the same data regardless of thread interleaving.
-             rng = np.random.default_rng(self.global_seed)
-         else:
-             # Derive the seed deterministically from the key and global seed,
-             # using the full 64-bit value produced by _seed_from_key().
-             seed = self._seed_from_key(key)
-             rng = np.random.default_rng(seed)
-
-         data = rng.uniform(-1.0, 1.0, size=kv_shape).astype(dtype)
-         return data
-
-
- # ============================================================================
- # ENHANCED MULTI-TIER CACHE
- # This is the core logic of the benchmark, managing the three-tier hierarchy.
- # ============================================================================
-
- class MultiTierCache:
-     """
-     Manages KV cache data across GPU, CPU, and NVMe tiers.
-
-     This class is the heart of the benchmark. It orchestrates where cache data is
-     written to and read from based on available space and access patterns.
-     It is heavily instrumented to collect detailed performance metrics.
- """ - - def __init__(self, - model_config: ModelConfig, - gpu_memory_gb: float, - cpu_memory_gb: float, - cache_dir: str = None, - eviction_policy: str = 'lru', - performance_profile: str = 'latency', - seed: Optional[int] = None): - - self.model_config = model_config - self.gpu_memory_limit = gpu_memory_gb * 1024**3 - self.cpu_memory_limit = cpu_memory_gb * 1024**3 - self.eviction_policy = eviction_policy - self.performance_profile = performance_profile - self.seed = seed - - # Initialize storage backends for each tier. - self.backends = {} - try: - if TORCH_AVAILABLE or CUPY_AVAILABLE: - self.backends['gpu'] = GPUMemoryBackend(use_torch=TORCH_AVAILABLE) - except Exception as e: - print(f"Warning: Could not initialize GPU backend: {e}") - - self.backends['cpu'] = CPUMemoryBackend() - self.backends['nvme'] = NVMeBackend(base_path=cache_dir) - - self.generator = KVCacheGenerator(model_config, global_seed=self.seed) - - # Metadata tracking for all cache entries across all tiers. - self.cache_entries = {} # Main dictionary mapping a key to its metadata. - self.entry_locks: Dict[str, threading.Lock] = {} # Fine-grained locks per cache key. - self.gpu_memory_used = 0 - self.cpu_memory_used = 0 - - # Global locks for managing shared state. - self.metadata_lock = threading.Lock() # For coarse-grained operations on the cache_entries dict itself. - self.memory_lock = threading.Lock() # For updating the gpu_memory_used and cpu_memory_used counters. - self.stats_lock = threading.Lock() # For updating the performance statistics dictionary. - - # Dictionary for collecting a wide range of performance metrics. - self.stats = { - 'cache_hits': 0, - 'cache_misses': 0, - 'evictions': 0, - 'offloads_cpu': 0, # Prefills that went directly to CPU. - 'offloads_nvme': 0, # Prefills that went directly to NVMe. - - # Latency lists for each tier and operation. 
- 'gpu_read_latencies': [], 'cpu_read_latencies': [], 'nvme_read_latencies': [], - 'gpu_write_latencies': [], 'cpu_write_latencies': [], 'nvme_write_latencies': [], - 'nvme_read_device_latencies': [], 'nvme_read_host_latencies': [], - 'nvme_write_device_latencies': [], 'nvme_write_host_latencies': [], - - # Phase-specific I/O metrics. - 'prefill_writes': 0, 'decode_reads': 0, - 'prefill_bytes_written': 0, 'decode_bytes_read': 0, - - # Cache type metrics for analyzing hit sources. - 'system_prompt_hits': 0, 'common_phrase_hits': 0, - 'user_cache_hits': 0, 'multi_turn_hits': 0, - - # Aggregate I/O metrics. - 'total_read_bytes': 0, 'total_write_bytes': 0, - 'read_operations': 0, 'write_operations': 0, - - # New counter for NVMe tokens processed (for throughput assessment) - 'nvme_tokens_processed': 0, - } - - def _get_entry_lock(self, key: str) -> threading.Lock: - """Get or create a lock for a specific cache entry to ensure thread safety.""" - with self.metadata_lock: - if key not in self.entry_locks: - self.entry_locks[key] = threading.Lock() - return self.entry_locks[key] - - def allocate_cache(self, key: str, num_tokens: int, phase: InferencePhase = InferencePhase.PREFILL) -> Tuple[bool, str, float]: - """ - Allocates and writes a new KV cache entry to the most appropriate tier. - This simulates the 'prefill' phase. - - Args: - key: The unique key for the cache entry. - num_tokens: The number of tokens to generate cache for. - phase: The current inference phase (should be PREFILL). - - Returns: - A tuple of (success_boolean, location_string, write_latency_seconds). - """ - # Quick check to see if the key already exists to avoid redundant work. - with self.metadata_lock: - if key in self.cache_entries: - return True, self.cache_entries[key]['location'], 0.0 - - # Generate the KV cache data. This is computationally expensive and done outside locks. 
- try: - data = self.generator.generate(sequence_length=num_tokens, key=key) - except MemoryError: - print(f"[KVCache] MemoryError generating cache for key {key} ({num_tokens} tokens)") - return False, 'none', 0.0 - except Exception as exc: - print(f"[KVCache] Failed to generate cache for key {key}: {exc}") - return False, 'none', 0.0 - - size_bytes = data.nbytes - - # Update write statistics. - with self.stats_lock: - if phase == InferencePhase.PREFILL: - self.stats['prefill_writes'] += 1 - self.stats['prefill_bytes_written'] += size_bytes - self.stats['write_operations'] += 1 - self.stats['total_write_bytes'] += size_bytes - - # --- Tiering Logic --- - # Decide which tier to write to based on available memory. - with self.memory_lock: - # Tier 1: GPU. Check if there's space in the GPU budget (with a 20% buffer). - if 'gpu' in self.backends and self.gpu_memory_used + size_bytes < self.gpu_memory_limit * 0.8: - self.gpu_memory_used += size_bytes - allocated_tier = 'gpu' - # Tier 2: CPU. Check if there's space in the CPU budget. - elif self.cpu_memory_used + size_bytes < self.cpu_memory_limit * 0.8: - self.cpu_memory_used += size_bytes - allocated_tier = 'cpu' - # Tier 3: NVMe. If no space in RAM, offload to disk. - else: - allocated_tier = 'nvme' - - # Perform the actual write operation to the chosen backend. - try: - if allocated_tier == 'gpu': - timing = self.backends['gpu'].write(key, data) - elif allocated_tier == 'cpu': - timing = self.backends['cpu'].write(key, data) - else: - timing = self.backends['nvme'].write(key, data) - - # After a successful write, update the central metadata dictionary. - with self.metadata_lock: - self.cache_entries[key] = { - 'location': allocated_tier, - 'size': size_bytes, - 'last_access': time.time(), - 'access_count': 1 - } - - # Record latency and offload stats. 
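The tier-selection waterfall (GPU first, then CPU, each with a 20% headroom buffer, else spill to NVMe) can be sketched in isolation. `choose_tier` and the capacity figures below are illustrative, not part of the benchmark:

```python
# Standalone sketch of the tiering decision in allocate_cache():
# prefer GPU, then CPU, each gated at 80% of its budget; otherwise NVMe.
# Limits and usage figures are hypothetical examples.

def choose_tier(size_bytes: int,
                gpu_used: int, gpu_limit: int,
                cpu_used: int, cpu_limit: int,
                gpu_available: bool = True) -> str:
    if gpu_available and gpu_used + size_bytes < gpu_limit * 0.8:
        return 'gpu'
    if cpu_used + size_bytes < cpu_limit * 0.8:
        return 'cpu'
    return 'nvme'

GiB = 1024**3
# 80 GiB GPU at 70 GiB used: a 2 GiB entry exceeds the 64 GiB (80%) gate,
# so it falls through to CPU RAM.
tier = choose_tier(2 * GiB, gpu_used=70 * GiB, gpu_limit=80 * GiB,
                   cpu_used=10 * GiB, cpu_limit=256 * GiB)
```

The 20% buffer is the design choice worth noting: reserving headroom in each tier avoids thrashing right at the capacity boundary, at the cost of spilling to the slower tier slightly earlier.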
- with self.stats_lock: - if allocated_tier == 'cpu': - self.stats['offloads_cpu'] += 1 - self.stats['cpu_write_latencies'].append(timing.total) - elif allocated_tier == 'nvme': - self.stats['offloads_nvme'] += 1 - self.stats['nvme_write_latencies'].append(timing.total) - self.stats['nvme_write_device_latencies'].append(timing.device) - self.stats['nvme_write_host_latencies'].append(timing.host) - self.stats['nvme_tokens_processed'] += num_tokens - elif allocated_tier == 'gpu': - self.stats['gpu_write_latencies'].append(timing.total) - - del data # Free the memory for the generated data. - return True, allocated_tier, timing.total - - except Exception as e: - # If the write fails, roll back the memory reservation. - with self.memory_lock: - if allocated_tier == 'gpu': - self.gpu_memory_used -= size_bytes - elif allocated_tier == 'cpu': - self.cpu_memory_used -= size_bytes - del data - return False, 'none', 0.0 - - def access_cache(self, key: str, phase: InferencePhase = InferencePhase.DECODE, - cache_type: str = 'user') -> Tuple[Optional[str], float]: - """ - Accesses an existing cached entry and records the read performance. - This simulates the 'decode' phase. - - Args: - key: The unique key for the cache entry to access. - phase: The current inference phase (should be DECODE). - cache_type: The type of cache being accessed (for detailed stats). - - Returns: - A tuple of (location_string, read_latency_seconds). - """ - # First, check if the metadata for the key exists. - with self.metadata_lock: - if key not in self.cache_entries: - with self.stats_lock: - self.stats['cache_misses'] += 1 - return None, 0.0 - - entry = self.cache_entries[key] - location = entry['location'] - entry_size = entry['size'] - - # Get the specific lock for this key to handle concurrent access. - entry_lock = self._get_entry_lock(key) - - with entry_lock: - # Update metadata (access time, count) and performance stats. 
-             with self.metadata_lock:
-                 entry = self.cache_entries[key]
-                 entry['last_access'] = time.time()
-                 entry['access_count'] += 1
-
-             with self.stats_lock:
-                 self.stats['cache_hits'] += 1
-
-                 # Track hits by cache type for deeper analysis.
-                 if cache_type == 'system': self.stats['system_prompt_hits'] += 1
-                 elif cache_type == 'common': self.stats['common_phrase_hits'] += 1
-                 elif cache_type == 'multi_turn': self.stats['multi_turn_hits'] += 1
-                 else: self.stats['user_cache_hits'] += 1
-
-                 # Track phase-specific I/O.
-                 if phase == InferencePhase.DECODE:
-                     self.stats['decode_reads'] += 1
-                     self.stats['decode_bytes_read'] += entry_size
-
-                 self.stats['read_operations'] += 1
-                 self.stats['total_read_bytes'] += entry_size
-
-             # Perform the actual read from the correct backend (GPU, CPU, or NVMe).
-             try:
-                 _, timing = self.backends[location].read(key)
-
-                 # Record the latency for the specific tier that was read from.
-                 with self.stats_lock:
-                     if location == 'gpu':
-                         self.stats['gpu_read_latencies'].append(timing.total)
-                     elif location == 'cpu':
-                         self.stats['cpu_read_latencies'].append(timing.total)
-                     else:
-                         self.stats['nvme_read_latencies'].append(timing.total)
-                         self.stats['nvme_read_device_latencies'].append(timing.device)
-                         self.stats['nvme_read_host_latencies'].append(timing.host)
-                         # Credit NVMe token throughput for this read: derive the token
-                         # count from the entry size and the per-token KV cache footprint.
-                         # This accounting must only run for reads served from the NVMe tier.
-                         if self.model_config.kv_cache_size_per_token > 0:
-                             num_tokens = entry_size / self.model_config.kv_cache_size_per_token
-                             self.stats['nvme_tokens_processed'] += num_tokens
-
-                 return location, timing.total
-             except Exception as e:
-                 # In case of a read error, return the location but with zero latency.
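The bytes-to-tokens conversion used for NVMe throughput accounting follows the per-token footprint implied by the generator's `kv_shape`: `num_layers * 2 (K and V) * kv_heads * kv_dim_per_head * dtype_bytes`. A standalone sketch, with illustrative (roughly Llama-7B-like) model figures rather than values from the benchmark:

```python
# Sketch of the bytes->tokens conversion behind nvme_tokens_processed.
# Per-token KV footprint follows the generator's tensor shape:
# (num_layers, 2, seq_len, kv_heads, kv_dim_per_head) in float16 (2 bytes).

def kv_bytes_per_token(num_layers: int, kv_heads: int, head_dim: int,
                       dtype_bytes: int = 2) -> int:
    """Bytes of KV cache produced per token of context."""
    return num_layers * 2 * kv_heads * head_dim * dtype_bytes

def tokens_from_entry(entry_size_bytes: int, per_token: int) -> float:
    """Recover the token count of a cache entry from its byte size."""
    return entry_size_bytes / per_token if per_token > 0 else 0.0

# 32 layers x 2 (K/V) x 32 heads x 128 dims x 2 bytes = 512 KiB per token.
per_tok = kv_bytes_per_token(num_layers=32, kv_heads=32, head_dim=128)
```

This is why NVMe "token throughput" can be derived purely from byte counters: the entry size and the model config together determine how many tokens each read represents.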
- return location, 0.0 - - def _evaluate_storage_performance(self, duration: float) -> Dict: - """ - Evaluates storage performance against pre-defined MLPerf Storage WG criteria. - This provides a clear PASS/FAIL assessment of the storage system. - """ - criteria = [] - all_passed = True - - # Throughput-focused profile for MLPerf submission - if self.performance_profile == 'throughput': - # Criterion: Throughput should be based on tokens processed by the NVMe tier. - nvme_tokens = self.stats.get('nvme_tokens_processed', 0) - # Correctly use the benchmark's full duration for an accurate tok/s calculation. - throughput = nvme_tokens / duration if duration > 0 else 0 - - passed = throughput > 0 # Simple check to ensure it ran - criteria.append({ - 'name': 'Throughput (tok/s)', - 'target': '>0', 'actual': f"{throughput:.2f}", 'unit': 'tok/s', 'passed': passed - }) - all_passed = all_passed and passed - - return { - 'overall_status': 'PASS' if all_passed else 'FAIL', - 'criteria': criteria, - 'passed_count': sum(1 for c in criteria if c['passed']), - 'total_count': len(criteria) - } - - # Latency-focused profile (default) - # Criterion 1: NVMe Write P95 latency should be less than 500ms. - nvme_write_device = self.stats.get('nvme_write_device_latencies', []) - nvme_write_total = self.stats.get('nvme_write_latencies', []) - nvme_write_basis = nvme_write_device if nvme_write_device else nvme_write_total - if nvme_write_basis: - nvme_write_p95 = np.percentile(nvme_write_basis, 95) * 1000 - passed = nvme_write_p95 < 500 - criteria.append({ - 'name': 'NVMe Write P95 < 500ms', - 'target': 500, 'actual': nvme_write_p95, 'unit': 'ms', 'passed': passed - }) - all_passed = all_passed and passed - - # Criterion 2: NVMe Read P95 latency should be less than 200ms. 
- nvme_read_device = self.stats.get('nvme_read_device_latencies', []) - nvme_read_total = self.stats.get('nvme_read_latencies', []) - nvme_read_basis = nvme_read_device if nvme_read_device else nvme_read_total - if nvme_read_basis: - nvme_read_p95 = np.percentile(nvme_read_basis, 95) * 1000 - passed = nvme_read_p95 < 200 - criteria.append({ - 'name': 'NVMe Read P95 < 200ms', - 'target': 200, 'actual': nvme_read_p95, 'unit': 'ms', 'passed': passed - }) - all_passed = all_passed and passed - - # Criterion 3: CPU RAM P95 latency should be less than 150ms. - # This accounts for large memory copies within RAM. - cpu_read_lats = self.stats.get('cpu_read_latencies', []) - cpu_write_lats = self.stats.get('cpu_write_latencies', []) - if cpu_read_lats or cpu_write_lats: - all_cpu_lats = cpu_read_lats + cpu_write_lats - cpu_p95 = np.percentile(all_cpu_lats, 95) * 1000 - passed = cpu_p95 < 150 - criteria.append({ - 'name': 'CPU RAM P95 < 150ms', - 'target': 150, 'actual': cpu_p95, 'unit': 'ms', 'passed': passed - }) - all_passed = all_passed and passed - - # Criterion 4: Overall cache hit rate should be above 30% for a realistic workload. - total_accesses = self.stats['cache_hits'] + self.stats['cache_misses'] - if total_accesses > 0: - hit_rate = self.stats['cache_hits'] / total_accesses - passed = hit_rate > 0.3 - criteria.append({ - 'name': 'Cache Hit Rate > 30%', - 'target': 0.3, 'actual': hit_rate, 'unit': 'ratio', 'passed': passed - }) - all_passed = all_passed and passed - - return { - 'overall_status': 'PASS' if all_passed else 'FAIL', - 'criteria': criteria, - 'passed_count': sum(1 for c in criteria if c['passed']), - 'total_count': len(criteria) - } - - def get_stats(self, duration: float) -> Dict: - """Gathers and returns a comprehensive dictionary of all performance statistics.""" - # Snapshot stats and metadata under locks to ensure consistency. 
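Each latency criterion reduces to the same check: take a list of latencies in seconds, compute the 95th percentile in milliseconds, and compare it to a target. A minimal standalone sketch of that check, where `p95_criterion` is an illustrative helper and the sample latencies are made up:

```python
# Sketch of one pass/fail criterion from _evaluate_storage_performance():
# p95 of a latency list (seconds), converted to ms, compared to a target.
import numpy as np

def p95_criterion(name: str, latencies_s: list, target_ms: float) -> dict:
    p95_ms = float(np.percentile(np.array(latencies_s), 95) * 1000)
    return {'name': name, 'target': target_ms, 'actual': p95_ms,
            'unit': 'ms', 'passed': p95_ms < target_ms}

# Illustrative sample: mostly 5 ms reads with a few 300 ms outliers.
crit = p95_criterion('NVMe Read P95 < 200ms', [0.005] * 95 + [0.3] * 5, 200)
```

Using p95 rather than the mean is deliberate: a handful of slow outliers barely moves an average but is exactly what tail-latency-sensitive inference workloads feel.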
- with self.stats_lock: - total_accesses = self.stats['cache_hits'] + self.stats['cache_misses'] - hit_rate = self.stats['cache_hits'] / total_accesses if total_accesses > 0 else 0 - stats_snapshot = self.stats.copy() - - with self.metadata_lock: - gpu_entries = sum(1 for e in self.cache_entries.values() if e['location'] == 'gpu') - cpu_entries = sum(1 for e in self.cache_entries.values() if e['location'] == 'cpu') - nvme_entries = sum(1 for e in self.cache_entries.values() if e['location'] == 'nvme') - - with self.memory_lock: - gpu_mem_used = self.gpu_memory_used - cpu_mem_used = self.cpu_memory_used - - # Get the pass/fail assessment. - storage_health = self._evaluate_storage_performance(duration) - - stats = { - 'cache_hit_rate': hit_rate, - 'cache_hits': stats_snapshot['cache_hits'], - 'cache_misses': stats_snapshot['cache_misses'], - 'gpu_entries': gpu_entries, - 'cpu_entries': cpu_entries, - 'nvme_entries': nvme_entries, - 'gpu_memory_used_gb': gpu_mem_used / 1024**3, - 'cpu_memory_used_gb': cpu_mem_used / 1024**3, - 'offloads_cpu': stats_snapshot['offloads_cpu'], - 'offloads_nvme': stats_snapshot['offloads_nvme'], - 'storage_health': storage_health, - 'prefill_writes': self.stats['prefill_writes'], - 'decode_reads': self.stats['decode_reads'], - 'prefill_bytes_written_gb': self.stats['prefill_bytes_written'] / 1024**3, - 'decode_bytes_read_gb': self.stats['decode_bytes_read'] / 1024**3, - 'system_prompt_hits': self.stats['system_prompt_hits'], - 'common_phrase_hits': self.stats['common_phrase_hits'], - 'user_cache_hits': self.stats['user_cache_hits'], - 'multi_turn_hits': self.stats['multi_turn_hits'], - 'total_read_bytes': self.stats['total_read_bytes'], - 'total_write_bytes': self.stats['total_write_bytes'], - 'total_read_gb': self.stats['total_read_bytes'] / 1024**3, - 'total_write_gb': self.stats['total_write_bytes'] / 1024**3, - 'read_write_ratio': self.stats['total_read_bytes'] / max(self.stats['total_write_bytes'], 1), - 'read_iops': 
self.stats['read_operations'], - 'write_iops': self.stats['write_operations'], - } - - # Add latency percentiles for each tier. - for tier in ['gpu', 'cpu', 'nvme']: - for op in ['read', 'write']: - latencies = self.stats[f'{tier}_{op}_latencies'] - if latencies: - lat_array = np.array(latencies) - stats[f'{tier}_{op}_p50_ms'] = np.percentile(lat_array, 50) * 1000 - stats[f'{tier}_{op}_p95_ms'] = np.percentile(lat_array, 95) * 1000 - stats[f'{tier}_{op}_p99_ms'] = np.percentile(lat_array, 99) * 1000 - - # Expose NVMe latency component breakdowns when present. - for op in ['read', 'write']: - device_latencies = self.stats[f'nvme_{op}_device_latencies'] - host_latencies = self.stats[f'nvme_{op}_host_latencies'] - if device_latencies: - device_array = np.array(device_latencies) - stats[f'nvme_{op}_device_p50_ms'] = np.percentile(device_array, 50) * 1000 - stats[f'nvme_{op}_device_p95_ms'] = np.percentile(device_array, 95) * 1000 - stats[f'nvme_{op}_device_p99_ms'] = np.percentile(device_array, 99) * 1000 - if host_latencies: - host_array = np.array(host_latencies) - stats[f'nvme_{op}_host_p50_ms'] = np.percentile(host_array, 50) * 1000 - stats[f'nvme_{op}_host_p95_ms'] = np.percentile(host_array, 95) * 1000 - stats[f'nvme_{op}_host_p99_ms'] = np.percentile(host_array, 99) * 1000 - - return stats - - -# ============================================================================ -# FEATURE 5: ADAPTIVE AUTOSCALING -# Automatically adjusts the user load to find a performance limit. 
-# ============================================================================ - -@dataclass -class StorageMetrics: - """A snapshot of storage performance metrics at a point in time.""" - timestamp: float - read_throughput_gbps: float - write_throughput_gbps: float - read_iops: int - write_iops: int - read_latency_p95_ms: float - write_latency_p95_ms: float - queue_depth: int - is_saturated: bool = False - saturation_level: float = 0.0 - - - # @property - # def is_saturated(self) -> bool: - # """Determines if storage is saturated based on latency and queue depth thresholds.""" - # return ( - # self.read_latency_p95_ms > 100 or - # self.write_latency_p95_ms > 50 or - # self.queue_depth > 100 - # ) - - -class StorageMonitor: - """Monitors storage performance in real-time to feed the autoscaler.""" - - def __init__(self, benchmark_instance, sampling_interval_ms: float = 100): - self.benchmark_instance = benchmark_instance - self.sampling_interval = sampling_interval_ms / 1000.0 - self.last_collection_time = None - self.last_total_read = 0 - self.last_total_write = 0 - self.metrics_history = [] - self.lock = threading.Lock() - - def collect_metrics(self, cache, queue_size): - """Collects all relevant performance metrics.""" - now = time.time() - if self.last_collection_time is None: - self.last_collection_time = now - self.last_total_read = cache.stats.get('total_read_bytes', 0) - self.last_total_write = cache.stats.get('total_write_bytes', 0) - return {} - - elapsed = now - self.last_collection_time - if elapsed == 0: - return {} - - # The duration for get_stats should be the total benchmark duration, not the interval - stats = cache.get_stats(duration=self.benchmark_instance.duration) - current_total_read = stats.get('total_read_bytes', 0) - current_total_write = stats.get('total_write_bytes', 0) - - # Calculate deltas since the last sample - read_delta = max(current_total_read - self.last_total_read, 0) - write_delta = max(current_total_write - 
self.last_total_write, 0)
-
-        # Calculate read and write throughput in GB/s
-        read_throughput = (read_delta / 1024**3) / elapsed
-        write_throughput = (write_delta / 1024**3) / elapsed
-
-        # Calculate queue depth as the number of requests in the queue
-        queue_depth = queue_size
-
-        # Estimate read and write IOPS based on common block sizes (4KB for reads, 16KB for writes)
-        read_iops = int((read_delta / 4096) / elapsed) if elapsed > 0 else 0
-        write_iops = int((write_delta / (16 * 1024)) / elapsed) if elapsed > 0 else 0
-
-        # Default to 0.0 if the keys don't exist (e.g., at the start of the run).
-        read_latency_p95_ms = stats.get('nvme_read_p95_ms', 0.0)
-        write_latency_p95_ms = stats.get('nvme_write_p95_ms', 0.0)
-
-        # --- Saturation Detection Logic ---
-        is_saturated = False
-        if len(self.metrics_history) >= 2:
-            # Compare with the previous metric
-            prev_metric = self.metrics_history[-2]
-            if (prev_metric.read_latency_p95_ms < 100 and prev_metric.write_latency_p95_ms < 50 and prev_metric.queue_depth < 100):
-                # The previous metric was not saturated: check for a sudden increase
-                # (signed deltas, so a drop in latency is not mistaken for saturation).
-                if (read_latency_p95_ms - prev_metric.read_latency_p95_ms > 20 or
-                    write_latency_p95_ms - prev_metric.write_latency_p95_ms > 10 or
-                    queue_depth - prev_metric.queue_depth > 10):
-                    is_saturated = True
-            else:
-                # If the previous metric was saturated, check if it's still above the thresholds
-                if (read_latency_p95_ms > 120 or write_latency_p95_ms > 60 or queue_depth > 120):
-                    is_saturated = True
-
-        # Create a new StorageMetrics object for this sample
-        metrics = StorageMetrics(
-            timestamp=now,
-            read_throughput_gbps=read_throughput,
-            write_throughput_gbps=write_throughput,
-            read_iops=read_iops,
-            write_iops=write_iops,
-            read_latency_p95_ms=read_latency_p95_ms,
-            write_latency_p95_ms=write_latency_p95_ms,
-            queue_depth=queue_depth,
-            is_saturated=is_saturated
-        )
-
-        # Add to the history and calculate saturation using a snapshot for thread 
safety. - with self.lock: - self.metrics_history.append(metrics) - saturation_level = self._compute_saturation_from_history(self.metrics_history) - - metrics.saturation_level = saturation_level - - # Update baselines for the next interval. - self.last_collection_time = now - self.last_total_read = current_total_read - self.last_total_write = current_total_write - return metrics - - def get_saturation_level(self) -> float: - """ - Calculates the storage saturation level (0.0 = idle, 1.0 = saturated). - Uses heuristics like increasing latency and plateauing throughput. - """ - with self.lock: - history_snapshot = list(self.metrics_history) - - return self._compute_saturation_from_history(history_snapshot) - - def _compute_saturation_from_history(self, history: List[StorageMetrics]) -> float: - if len(history) < 10: - return 0.0 - - recent_metrics = history[-10:] - - # Check if latency is trending upwards. - latencies = [m.read_latency_p95_ms for m in recent_metrics] - if len(latencies) > 1: - latency_trend = np.polyfit(range(len(latencies)), latencies, 1)[0] - else: - latency_trend = 0 - - # Check if throughput is plateauing (low variance). - throughputs = [m.read_throughput_gbps + m.write_throughput_gbps for m in recent_metrics] - throughput_variance = np.std(throughputs) / (np.mean(throughputs) + 0.01) - - # Combine indicators to get a single saturation score. 
- latency_factor = min(max(latencies) / 100, 1.0) - plateau_factor = 1.0 if throughput_variance < 0.1 and latency_trend > 0 else 0.5 - - saturation = latency_factor * plateau_factor - return min(saturation, 1.0) - - -class WorkloadAutoscaler: - """Automatically scales the number of simulated users to find a performance limit.""" - - def __init__(self, - mode: str = 'qos', - initial_users: int = 10, - target_saturation: float = 0.8, - scale_interval_seconds: int = 10): - self.mode = mode - self.current_users = initial_users - self.target_saturation = target_saturation - self.scale_interval = scale_interval_seconds - self.min_users = 1 - self.max_users = 10000 - self.scaling_history = [] - self.lock = threading.Lock() - - # State for 'qos' mode (latency-driven) - self.cooldown_counter = 0 - self.cooldown_period = 3 # Wait for 3 cycles after a scale-down action - self.downward_trend_count = 0 - - # State for 'capacity' mode (throughput-driven) - self.capacity_stage = 0 - self.last_throughput = 0.0 - self.peak_throughput = 0.0 - self.peak_user_count = 0 - self.capacity_test_finished = False - self.throughput_history: List[float] = [] - # Clip capacity-mode step ramps so we do not overwhelm the system in a single jump. 
-        self.capacity_initial_fraction = 0.4
-        self.capacity_scale_fraction = 0.2
-        self.capacity_min_step = 5
-        self.capacity_max_step = 100
-
-    def calculate_scale_action(
-        self,
-        metrics: Optional[StorageMetrics],
-        current_throughput: float,
-        saturation_level: Optional[float] = None
-    ) -> Tuple[str, int]:
-        """Decides the next scaling action based on the selected mode."""
-        if self.mode == 'qos':
-            if not metrics: return 'stable', self.current_users
-            return self._calculate_qos_action(metrics, saturation_level)
-        elif self.mode == 'capacity':
-            return self._calculate_capacity_action(current_throughput)
-        return 'stable', self.current_users
-
-    def _calculate_qos_action(self, metrics: StorageMetrics, saturation_level: Optional[float]) -> Tuple[str, int]:
-        """Determines the scaling action for 'qos' mode based on latency and saturation."""
-        with self.lock:
-            if self.cooldown_counter > 0:
-                self.cooldown_counter -= 1
-                return 'hold', self.current_users  # In cooldown from a recent scale-down
-
-            saturation = saturation_level
-            if saturation is None:
-                saturation = 1.0 if metrics.is_saturated else 0.0
-
-            action = 'hold'
-            target_users = self.current_users
-
-            if saturation > self.target_saturation * 1.1:  # Significantly over target
-                self.downward_trend_count += 1
-                if self.downward_trend_count >= 2:  # Consistently over target
-                    target_users = max(int(self.current_users * 0.8), self.min_users)
-                    if target_users < self.current_users:
-                        self.current_users = target_users
-                        self.cooldown_counter = self.cooldown_period
-                        action = 'scale_down'
-            elif saturation < self.target_saturation * 0.9:  # Significantly under target
-                self.downward_trend_count = 0
-                target_users = min(int(self.current_users * 1.2), self.max_users)
-                if target_users > self.current_users:
-                    self.current_users = target_users
-                    action = 'scale_up'
-            else:  # Within target range
-                self.downward_trend_count = 0
-
-            return action, self.current_users
-
-    def 
_calculate_capacity_action(self, current_throughput: float) -> Tuple[str, int]:
-        """
-        Determines the scaling action for 'capacity' mode.
-        Aggressively adds users until throughput stops increasing.
-        """
-        with self.lock:
-            self.throughput_history.append(current_throughput)
-
-            if len(self.throughput_history) == 1:
-                # First datapoint: kick off with a moderate scale-up to start discovery
-                self.peak_throughput = current_throughput
-                self.peak_user_count = self.current_users
-                step = self._compute_capacity_step(self.capacity_initial_fraction)
-                new_users = min(self.current_users + step, self.max_users)
-                if new_users > self.current_users:
-                    self.current_users = new_users
-                    return 'scale_up', self.current_users
-                return 'hold', self.current_users
-
-            if current_throughput > self.peak_throughput * 1.01:  # Require >1% increase
-                self.peak_throughput = current_throughput
-                self.peak_user_count = self.current_users
-                self.downward_trend_count = 0
-                step = self._compute_capacity_step(self.capacity_scale_fraction)
-                new_users = min(self.current_users + step, self.max_users)
-                if new_users > self.current_users:
-                    self.current_users = new_users
-                    return 'scale_up', self.current_users
-                return 'hold', self.current_users
-
-            self.downward_trend_count += 1
-            if self.downward_trend_count >= 2:
-                self.capacity_test_finished = True
-                print(f"INFO: Peak capacity found at {self.peak_throughput:.2f} tok/s. Stopping test.")
-                return 'stop', self.current_users
-
-            return 'hold', self.current_users
-
-    def _compute_capacity_step(self, fraction: float) -> int:
-        """Calculate a bounded capacity-mode step for smoother scaling."""
-        raw_step = max(int(self.current_users * fraction), self.capacity_min_step)
-        return min(raw_step, self.capacity_max_step)
-
-
-# ============================================================================
-# FEATURE 7: QOS MONITORING
-# Tracks QoS compliance for different user priority levels.
-# ============================================================================ - -class QoSMonitor: - """Monitors and reports on QoS compliance in real-time.""" - - def __init__(self): - self.requests_by_qos: Dict[QoSLevel, List[InferenceRequest]] = {level: [] for level in QoSLevel} - self.lock = threading.Lock() - self.violations_by_qos: Dict[QoSLevel, int] = {level: 0 for level in QoSLevel} - - def record_request(self, request: InferenceRequest): - """Records a completed request and checks if it violated its SLA.""" - with self.lock: - self.requests_by_qos[request.qos_level].append(request) - - # Check for SLA violation. - sla = QOS_PROFILES[request.qos_level] - if request.total_latency_ms > sla.target_latency_p95_ms: - self.violations_by_qos[request.qos_level] += 1 - sla.violations += 1 - sla.total_requests += 1 - - def get_qos_metrics(self, qos_level: QoSLevel) -> Dict: - """Gets performance metrics for a specific QoS level.""" - with self.lock: - requests = self.requests_by_qos[qos_level] - if not requests: return {'no_data': True} - - latencies = [r.total_latency_ms for r in requests] - sla = QOS_PROFILES[qos_level] - - return { - 'total_requests': len(requests), - 'latency_ms': { - 'mean': np.mean(latencies), 'p50': np.percentile(latencies, 50), - 'p95': np.percentile(latencies, 95), 'p99': np.percentile(latencies, 99), - 'max': np.max(latencies), - }, - 'sla': { - 'target_p95_ms': sla.target_latency_p95_ms, - 'actual_p95_ms': np.percentile(latencies, 95), - 'compliance': sla.sla_compliance, - 'met': sla.sla_compliance >= 0.95 - - } - } - - def get_all_qos_metrics(self) -> Dict: - """Gets metrics for all QoS levels.""" - return {level.value: self.get_qos_metrics(level) for level in QoSLevel} - - -# ============================================================================ -# FEATURE 6: TRACE-DRIVEN VALIDATION -# Validates the benchmark's accuracy by comparing its results to a real trace. 
-# ============================================================================ - -@dataclass -class RealTraceEntry: - """Represents a single entry from a real-world LLM inference trace file.""" - timestamp: float - request_id: str - user_id: str - context_tokens: int - generation_tokens: int - phase: str - cache_hit: bool - cache_tier: str - read_bytes: int - write_bytes: int - read_latency_ms: float - write_latency_ms: float - model_name: str - conversation_id: Optional[str] = None - turn_number: Optional[int] = None - prefix_cached: bool = False - - -class ValidationEngine: - """Validates benchmark accuracy against real-world traces.""" - - def __init__(self, trace_path: Optional[str] = None): - self.trace_path = trace_path - self.trace_stats = None - - def load_trace(self) -> Dict: - """Loads and analyzes a trace file, or returns synthetic stats if none provided.""" - if not self.trace_path or not os.path.exists(self.trace_path): - # Return synthetic trace stats for testing purposes. - return { - 'total_requests': 1000, 'duration_seconds': 100, 'cache_hit_rate': 0.65, - 'read_write_ratio': 10.0, 'context_tokens_mean': 1024, 'generation_tokens_mean': 200, - } - - with open(self.trace_path, 'r') as f: - data = json.load(f) - entries = [RealTraceEntry(**entry) for entry in data] - - # Calculate key statistics from the real trace. 
-        self.trace_stats = {
-            'total_requests': len(entries),
-            'cache_hit_rate': sum(1 for e in entries if e.cache_hit) / max(len(entries), 1),
-            'read_write_ratio': sum(e.read_bytes for e in entries) / max(sum(e.write_bytes for e in entries), 1),
-            'context_tokens_mean': np.mean([e.context_tokens for e in entries]) if entries else 0.0,
-            'generation_tokens_mean': np.mean([e.generation_tokens for e in entries]) if entries else 0.0,
-        }
-        return self.trace_stats
-
-    def validate_benchmark(self, benchmark_results: Dict) -> Dict:
-        """Compares key benchmark results against the trace to calculate an error percentage."""
-        if self.trace_stats is None:
-            self.trace_stats = self.load_trace()
-
-        summary = benchmark_results.get('summary', {})
-        cache_stats = summary.get('cache_stats', {})
-        comparison = {}
-
-        # Compare cache hit rate; guard against a zero trace hit rate.
-        bench_hit_rate = cache_stats.get('cache_hit_rate', 0)
-        trace_hit_rate = self.trace_stats['cache_hit_rate']
-        hit_rate_error = abs(bench_hit_rate - trace_hit_rate) / max(trace_hit_rate, 1e-9) * 100
-
-        comparison['cache_hit_rate'] = {
-            'benchmark': bench_hit_rate, 'trace': trace_hit_rate,
-            'error_pct': hit_rate_error, 'within_5pct': hit_rate_error <= 5.0
-        }
-
-        errors = [comp['error_pct'] for comp in comparison.values() if 'error_pct' in comp]
-        avg_error = np.mean(errors) if errors else 0
-        passed = avg_error <= 5.0
-
-        return {
-            'passed': passed, 'avg_error_pct': avg_error,
-            'comparison': comparison, 'trace_stats': self.trace_stats
-        }
-
-
-# ============================================================================
-# USER SIMULATION AND WORKLOAD GENERATION
-# Creates a realistic mix of user behaviors and request patterns.
-# ============================================================================
-
-class UserSimulator:
-    """Generates realistic user workloads based on pre-defined templates."""
-
-    # Templates for different user personas (chatbot, coding, document analysis).
- USER_TEMPLATES = { - 'chatbot': { - 'context_range': (256, 1024), 'generation_range': (50, 150), 'think_time_range': (0.1, 0.5), - }, - 'coding': { - 'context_range': (1024, 4096), 'generation_range': (100, 500), 'think_time_range': (0.2, 1.0), - }, - 'document': { - 'context_range': (2048, 8192), 'generation_range': (200, 800), 'think_time_range': (0.3, 1.5), - }, - } - - @classmethod - def generate_user(cls, user_id: str, user_type: str = 'chatbot', priority: int = 1, - qos_level: QoSLevel = QoSLevel.BATCH) -> UserProfile: - """Generates a single user profile based on a template.""" - template = cls.USER_TEMPLATES.get(user_type, cls.USER_TEMPLATES['chatbot']) - return UserProfile( - user_id=user_id, - context_length=random.randint(*template['context_range']), - generation_length=random.randint(*template['generation_range']), - think_time=random.uniform(*template['think_time_range']), - priority=priority, - qos_level=qos_level - ) - - @classmethod - def generate_mixed_users(cls, num_users: int) -> List[UserProfile]: - """Generates a list of users with a realistic distribution of types and QoS levels.""" - users = [] - for i in range(num_users): - user_type = random.choice(['chatbot', 'coding', 'document']) - - # Simulate a realistic QoS distribution. - # 15% Interactive, 35% Responsive, 50% Batch. - rand = random.random() - if rand < 0.15: - qos_level, priority = QoSLevel.INTERACTIVE, 3 - elif rand < 0.50: - qos_level, priority = QoSLevel.RESPONSIVE, 2 - else: - qos_level, priority = QoSLevel.BATCH, 1 - - users.append(cls.generate_user(f"user_{i:04d}", user_type, priority, qos_level)) - return users - - -# ============================================================================ -# INTEGRATED BENCHMARK ORCHESTRATOR -# This class wires all the components together and runs the main benchmark loop. 
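-# Usage sketch (illustrative only; the argument values below are assumptions,
-# while the class name and parameters come from this file):
-#
-#     bench = IntegratedBenchmark(model_config=config, num_users=50,
-#                                 gpu_memory_gb=16, cpu_memory_gb=64,
-#                                 duration_seconds=60)
-#     results = bench.run()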
-# ============================================================================ - -class IntegratedBenchmark: - """The main orchestrator for the entire benchmark.""" - - def __init__(self, - model_config: ModelConfig, - num_users: int, - gpu_memory_gb: float, - cpu_memory_gb: float, - duration_seconds: int, - cache_dir: str = None, - enable_autoscaling: bool = False, - autoscaler_mode: str = 'qos', - target_saturation: float = 0.8, - enable_multi_turn: bool = True, - enable_prefix_caching: bool = True, - enable_rag: bool = False, - rag_num_docs: int = 10, - validation_trace: Optional[str] = None, - generation_mode: GenerationMode = GenerationMode.NONE, - performance_profile: str = 'latency', - use_burst_trace: bool = False, - burst_trace_path: Optional[str] = None, - dataset_path: Optional[str] = None, - max_conversations: int = 500, - seed: Optional[int] = None): - - self.model_config = model_config - self.num_users = num_users - self.initial_users = num_users - self.duration = duration_seconds - self.enable_autoscaling = enable_autoscaling - self.enable_multi_turn = enable_multi_turn - self.generation_mode = generation_mode - self.ms_per_token = GENERATION_TIMING[generation_mode] * 1000 - self.enable_prefix_caching = enable_prefix_caching - self.enable_rag = enable_rag - self.rag_num_docs = rag_num_docs - self.performance_profile = performance_profile - self.use_burst_trace = use_burst_trace - self.burst_trace_path = burst_trace_path - self.dataset_path = dataset_path - self.max_conversations = max_conversations - self.seed = seed - self.burst_requests: List[Tuple[int, int]] = [] - self.sharegpt_loader: Optional[ShareGPTDatasetLoader] = None - - # Load dataset if provided (takes priority over burst trace) - if self.dataset_path: - self.sharegpt_loader = ShareGPTDatasetLoader( - dataset_path=self.dataset_path, - max_conversations=self.max_conversations, - seed=self.seed - ) - self.use_dataset = True - elif self.use_burst_trace: - self._load_burst_trace() - 
self.use_dataset = False - else: - self.use_dataset = False - - # Initialize components - self.cache = MultiTierCache( - model_config=model_config, - gpu_memory_gb=gpu_memory_gb, - cpu_memory_gb=cpu_memory_gb, - cache_dir=cache_dir, - performance_profile=performance_profile, - seed=seed - ) - self.conversation_manager = ConversationManager() - self.prefix_cache_manager = PrefixCacheManager(self.cache) if enable_prefix_caching else None - self.rag_manager = RAGDocumentManager(self.cache) if enable_rag else None - self.qos_monitor = QoSMonitor() - self.storage_monitor = StorageMonitor(self) if enable_autoscaling else None - self.autoscaler = WorkloadAutoscaler( - mode=autoscaler_mode, - initial_users=self.num_users, - target_saturation=target_saturation - ) if enable_autoscaling else None - self.scale_interval = self.autoscaler.scale_interval if self.autoscaler else 1.0 - self.validator = ValidationEngine(validation_trace) if validation_trace else None - - self.request_queue = queue.PriorityQueue() - self.request_counter = 0 - self.counter_lock = threading.Lock() - - self.active_users = [] - self.user_generators = {} - self.user_conversations: Dict[str, str] = {} - self.user_conversations_lock = threading.Lock() - - # Dictionary to store all results. 
- self.results = { - 'requests_completed': 0, 'total_tokens_generated': 0, - 'total_storage_io_latency': 0.0, 'total_generation_latency': 0.0, - 'end_to_end_latencies': [], 'storage_latencies': [], 'generation_latencies': [], - 'throughput_timeline': [], 'prefill_latencies': [], 'decode_latencies': [], - 'multi_turn_cache_hits': 0, 'multi_turn_cache_misses': 0, - 'seed': self.seed, - } - self.results_lock = threading.Lock() - self.rag_ingest_done = threading.Event() if self.enable_rag else None - - def _ingest_rag_documents(self, num_docs: int, stop_event: Optional[threading.Event] = None): - """Ingests RAG documents for the workload.""" - print(f"Ingesting {num_docs} RAG documents...") - for i in range(num_docs): - if stop_event and stop_event.is_set(): - break - # Scale document size based on model footprint so ingestion doesn't monopolize memory. - if self.model_config.hidden_dim >= 8192 or self.model_config.num_layers >= 64: - token_range = (1024, 4096) - else: - token_range = (4000, 12000) - - doc_tokens = random.randint(*token_range) - self.rag_manager.ingest_document(f"doc_{i:04d}", doc_tokens, self.model_config) - - if self.rag_ingest_done: - self.rag_ingest_done.set() - - def _load_burst_trace(self): - """Loads requests from the BurstGPT CSV trace file.""" - if not self.burst_trace_path: - print("Error: --use-burst-trace flag requires --burst-trace-path to be set.") - sys.exit(1) - try: - with open(self.burst_trace_path, 'r', encoding='utf-8') as f: - reader = csv.DictReader(f) - for row in reader: - try: - context_tokens = int(row['Request tokens']) - generate_tokens = int(row['Response tokens']) - self.burst_requests.append((context_tokens, generate_tokens)) - except (ValueError, KeyError): - continue - print(f"Loaded {len(self.burst_requests)} requests from BurstGPT trace.") - except FileNotFoundError: - print(f"Error: Trace file not found at {self.burst_trace_path}") - sys.exit(1) - except Exception as e: - print(f"Error reading trace file: {e}") - 
sys.exit(1) - - def _generate_requests_from_dataset(self, stop_event: threading.Event): - """Generates InferenceRequest objects from the loaded ShareGPT dataset.""" - if not self.sharegpt_loader or not self.sharegpt_loader.conversations: - print("Warning: ShareGPT dataset is empty or not loaded. Falling back to synthetic workload.") - # Fall back to synthetic generation - users = UserSimulator.generate_mixed_users(self.num_users) - self.generate_requests(users, stop_event) - return - - conversation_iterator = iter(self.sharegpt_loader.iterate_conversations(shuffle=True)) - current_conversation = None - turn_index = 0 - - while not stop_event.is_set(): - # Get next conversation turn - if current_conversation is None or turn_index >= len(current_conversation['turns']): - try: - current_conversation = next(conversation_iterator) - turn_index = 0 - except StopIteration: - # Restart iteration when we run out of conversations - conversation_iterator = iter(self.sharegpt_loader.iterate_conversations(shuffle=True)) - continue - - turn = current_conversation['turns'][turn_index] - context_tokens = turn['context_tokens'] - generate_tokens = turn['generation_tokens'] - - with self.counter_lock: - req_id = self.request_counter - self.request_counter += 1 - - # Assign QoS level based on request characteristics - rand = random.random() - if rand < 0.15: - qos_level, priority = QoSLevel.INTERACTIVE, 3 - elif rand < 0.50: - qos_level, priority = QoSLevel.RESPONSIVE, 2 - else: - qos_level, priority = QoSLevel.BATCH, 1 - - user_id = f"dataset_user_{req_id % self.num_users}" - conv_id = current_conversation['id'] - - # Determine inference phase - phase = InferencePhase.PREFILL if context_tokens >= 10000 else InferencePhase.PREFILL_DECODE - - request = InferenceRequest( - user_id=user_id, - request_id=f"{user_id}_req_{req_id:04d}", - timestamp=datetime.now(), - context_tokens=context_tokens, - generate_tokens=generate_tokens, - priority=priority, - phase=phase, - qos_level=qos_level, 
- cache_key=f"{conv_id}_turn_{turn['turn_number']}", - conversation_id=conv_id if self.enable_multi_turn else None, - turn_number=turn['turn_number'] if self.enable_multi_turn else None - ) - - priority_tuple = (-QOS_PROFILES[request.qos_level].priority, time.time()) - self.request_queue.put((priority_tuple, request)) - - turn_index += 1 - - # Control request arrival rate to match target throughput - # For comparison with vllm benchmark at ~10 requests/second - time.sleep(1.0 / 10.0) # 10 requests per second - - def _generate_requests_from_trace(self, stop_event: threading.Event): - """Generates InferenceRequest objects from the loaded trace.""" - request_index = 0 - while not stop_event.is_set(): - if not self.burst_requests: - print("Warning: BurstGPT trace is empty. No requests to generate.") - time.sleep(1) - continue - - if request_index >= len(self.burst_requests): - request_index = 0 # Loop - - context_tokens, generate_tokens = self.burst_requests[request_index] - - with self.counter_lock: - req_id = self.request_counter - self.request_counter += 1 - - rand = random.random() - if rand < 0.15: - qos_level, priority = QoSLevel.INTERACTIVE, 3 - elif rand < 0.50: - qos_level, priority = QoSLevel.RESPONSIVE, 2 - else: - qos_level, priority = QoSLevel.BATCH, 1 - - user_id = f"trace_user_{request_index % 1000}" - - # Determine inference phase for trace-driven requests. - # CRITICAL FIX: Using the same 10000-token threshold as synthetic workloads - # to ensure consistent behavior and comprehensive storage I/O testing. - # See the detailed explanation in generate_requests() for why this threshold matters. 
- request = InferenceRequest( - user_id=user_id, - request_id=f"{user_id}_req_{req_id:04d}", - timestamp=datetime.now(), - context_tokens=context_tokens, - generate_tokens=generate_tokens, - priority=priority, - phase=InferencePhase.PREFILL if context_tokens >= 10000 else InferencePhase.PREFILL_DECODE, - qos_level=qos_level, - cache_key=f"{user_id}_req_{req_id:04d}" - ) - - priority_tuple = (-QOS_PROFILES[request.qos_level].priority, time.time()) - self.request_queue.put((priority_tuple, request)) - - request_index += 1 - time.sleep(0.01) # Simulate request arrival rate - - def generate_requests(self, users: List[UserProfile], stop_event: threading.Event): - """Generate requests concurrently for each simulated user.""" - - # Kick off RAG ingestion so document threads can run in parallel with user traffic. - if self.enable_rag and self.rag_manager and self.rag_ingest_done: - threading.Thread( - target=self._ingest_rag_documents, - args=(self.rag_num_docs, stop_event), - daemon=True - ).start() - - def enqueue_request(request: InferenceRequest): - priority_tuple = (-QOS_PROFILES[request.qos_level].priority, time.time()) - self.request_queue.put((priority_tuple, request)) - - def user_worker(user: UserProfile): - """Simulates an individual user generating traffic.""" - local_conv_id = None - - while not stop_event.is_set(): - # Randomize think time slightly to avoid global synchronization. - time.sleep(user.think_time * random.uniform(0.8, 1.2)) - if stop_event.is_set(): - break - - # Handle conversation lifecycle when multi-turn is enabled. 
- if self.enable_multi_turn and self.conversation_manager: - if local_conv_id and random.random() >= 0.8: - with self.user_conversations_lock: - self.user_conversations.pop(user.user_id, None) - local_conv_id = None - - if local_conv_id is None: - local_conv_id = self.conversation_manager.start_conversation(user.user_id) - with self.user_conversations_lock: - self.user_conversations[user.user_id] = local_conv_id - else: - local_conv_id = None - - new_context = random.randint(max(1, user.context_length // 4), user.context_length) - new_gen = random.randint(max(1, user.generation_length // 4), user.generation_length) - - with self.counter_lock: - req_id = self.request_counter - self.request_counter += 1 - - if self.enable_multi_turn and self.conversation_manager and local_conv_id: - turn_number, cache_key = self.conversation_manager.add_turn(local_conv_id, new_context, new_gen) - else: - turn_number = 1 - cache_key = f"{user.user_id}_req_{req_id:06d}" - - phase = InferencePhase.PREFILL if new_context >= 10000 else InferencePhase.PREFILL_DECODE - - request = InferenceRequest( - user_id=user.user_id, - request_id=f"req_{user.user_id}_{req_id:06d}", - timestamp=datetime.now(), - context_tokens=new_context, - generate_tokens=new_gen, - priority=user.priority, - phase=phase, - qos_level=user.qos_level, - cache_key=cache_key, - conversation_id=local_conv_id, - turn_number=turn_number - ) - - enqueue_request(request) - - # Occasionally inject RAG queries on behalf of this user. 
- if (self.enable_rag and self.rag_manager and self.rag_ingest_done and - self.rag_ingest_done.is_set() and self.rag_manager.documents and - random.random() < 0.1): - doc_id = random.choice(list(self.rag_manager.documents.keys())) - retrieved_chunks = self.rag_manager.retrieve_chunks(doc_id) - rag_context_tokens = sum(chunk.token_count for chunk in retrieved_chunks) - - with self.counter_lock: - rag_req_id = self.request_counter - self.request_counter += 1 - - rag_request = InferenceRequest( - user_id=user.user_id, - request_id=f"rag_{user.user_id}_{rag_req_id:06d}", - timestamp=datetime.now(), - context_tokens=rag_context_tokens, - generate_tokens=random.randint(50, 200), - priority=user.priority, - phase=InferencePhase.DECODE, - qos_level=user.qos_level, - cache_key=f"rag_{doc_id}" - ) - enqueue_request(rag_request) - - # Launch a worker thread per user to maintain high request concurrency. - for user in users: - threading.Thread(target=user_worker, args=(user,), daemon=True).start() - - self.active_users = users - - # Keep this generator alive until the benchmark signals shutdown. - stop_event.wait() - - def process_requests(self, stop_event: threading.Event): - """The main worker loop that processes requests from the queue.""" - while not stop_event.is_set(): - try: - priority_tuple, request = self.request_queue.get(timeout=0.5) - except queue.Empty: - continue # If the queue is empty, loop again. - - request.start_time = time.perf_counter() - storage_latency = 0.0 - cache_type = 'user' - - # --- REQUEST LIFECYCLE --- # - - # 1. Check for a prefix cache hit. 
-            if self.prefix_cache_manager:
-                prefix_entry, remaining_tokens = self.prefix_cache_manager.check_prefix_cache(request, self.model_config)
-                if prefix_entry:
-                    cache_type = 'system' if prefix_entry.prefix_type == PrefixType.SYSTEM_PROMPT else 'common'
-                    _, read_lat = self.cache.access_cache(prefix_entry.kv_cache_key, request.phase, cache_type)
-                    storage_latency += read_lat
-                    request.context_tokens = remaining_tokens
-
-            # 2. For multi-turn conversations, access the cache from the previous turn.
-            # turn_number may be None for single-shot requests, so check it first.
-            if self.conversation_manager and request.turn_number and request.turn_number > 1:
-                prev_turn_key = f"{request.conversation_id}_turn_{request.turn_number - 1}"
-                location, read_latency = self.cache.access_cache(prev_turn_key, InferencePhase.DECODE, 'multi_turn')
-                if location is not None:
-                    storage_latency += read_latency
-                    with self.results_lock: self.results['multi_turn_cache_hits'] += 1
-                else:
-                    with self.results_lock: self.results['multi_turn_cache_misses'] += 1
-
-            # 3. Perform the main PREFILL operation (a cache WRITE).
-            if request.phase == InferencePhase.PREFILL or request.phase == InferencePhase.PREFILL_DECODE:
-                success, location, write_latency = self.cache.allocate_cache(
-                    request.cache_key, request.context_tokens, InferencePhase.PREFILL
-                )
-                storage_latency += write_latency
-                with self.results_lock: self.results['prefill_latencies'].append(write_latency)
-
-            # 4. Simulate a RAG operation by reading random chunk caches.
-            # Guard against an empty document store: ingestion runs asynchronously.
-            if self.rag_manager and self.rag_manager.documents and random.random() < 0.1:  # 10% of requests are RAG queries
-                doc_id = random.choice(list(self.rag_manager.documents.keys()))
-                chunks = self.rag_manager.retrieve_chunks(doc_id)
-                for chunk in chunks:  # Read the KV cache for each retrieved chunk.
-                    _, read_lat = self.cache.access_cache(chunk.kv_cache_key, InferencePhase.DECODE)
-                    storage_latency += read_lat
-
-            # 5. Perform the DECODE operation (a cache READ).
- if request.phase == InferencePhase.DECODE or request.phase == InferencePhase.PREFILL_DECODE: - location, read_latency = self.cache.access_cache(request.cache_key, InferencePhase.DECODE, cache_type) - - if location is None: # This would be a cache miss. - _, _, write_latency = self.cache.allocate_cache( - request.cache_key, - request.context_tokens, - InferencePhase.PREFILL - ) - storage_latency += write_latency - else: - # Simulate realistic decode I/O: reads are batched, not per-token. - decode_batch_size = 32 - num_batched_reads = max(1, (request.generate_tokens + decode_batch_size - 1) // decode_batch_size) - for _ in range(num_batched_reads): - _, batch_read_latency = self.cache.access_cache(request.cache_key, InferencePhase.DECODE, cache_type) - storage_latency += batch_read_latency - - with self.results_lock: self.results['decode_latencies'].append(read_latency) - - # 6. Simulate token generation time if not in pure storage mode. - generation_latency = request.generate_tokens * GENERATION_TIMING[self.generation_mode] - if generation_latency > 0: time.sleep(generation_latency) - - request.complete_time = time.perf_counter() - - # 7. Record all results for this request. 
- with self.results_lock: - self.results['requests_completed'] += 1 - self.results['total_tokens_generated'] += request.generate_tokens - self.results['total_storage_io_latency'] += storage_latency - self.results['total_generation_latency'] += generation_latency - self.results['end_to_end_latencies'].append(request.total_latency_ms / 1000) - self.results['storage_latencies'].append(storage_latency) - self.results['generation_latencies'].append(generation_latency) - - self.qos_monitor.record_request(request) - - def monitor_stats(self, stop_event: threading.Event): - """Periodically collects and logs stats, and triggers autoscaling.""" - start_time = time.time() - last_log_time = start_time - - while not stop_event.is_set(): - time.sleep(self.scale_interval) - now = time.time() - - elapsed = now - start_time - if elapsed > self.duration: - break - - # Track throughput timeline for reporting - with self.results_lock: - total_tokens = self.results['total_tokens_generated'] - throughput = total_tokens / max(elapsed, 1e-6) - with self.results_lock: - self.results['throughput_timeline'].append({ - 'timestamp': elapsed, - 'throughput_tokens_per_sec': throughput - }) - - if self.enable_autoscaling and self.storage_monitor and self.autoscaler: - metrics = self.storage_monitor.collect_metrics(self.cache, self.request_queue.qsize()) - saturation_level = self.storage_monitor.get_saturation_level() - if metrics: - metrics.saturation_level = saturation_level - - action, target_users = self.autoscaler.calculate_scale_action( - metrics if metrics else None, - throughput, - saturation_level - ) - - if action in ('scale_up', 'scale_down') and target_users != self.num_users: - self.num_users = max(1, min(target_users, 500)) - self.autoscaler.current_users = self.num_users - log_entry = { - 'timestamp': datetime.now().isoformat(), - 'mode': self.autoscaler.mode, - 'action': action, - 'users': self.num_users, - 'saturation_level': saturation_level, - 'read_latency_p95_ms': 
metrics.read_latency_p95_ms if metrics else None, - 'write_latency_p95_ms': metrics.write_latency_p95_ms if metrics else None, - 'throughput_tokens_per_sec': throughput - } - self.autoscaler.scaling_history.append(log_entry) - print(f"Autoscaler {action} -> {self.num_users} users (saturation: {saturation_level:.2f})") - elif action == 'stop': - print("Autoscaler requested stop after reaching capacity peak.") - stop_event.set() - log_entry = { - 'timestamp': datetime.now().isoformat(), - 'mode': self.autoscaler.mode, - 'action': 'stop', - 'users': self.num_users, - 'saturation_level': saturation_level, - 'peak_throughput_tokens_per_sec': self.autoscaler.peak_throughput - } - self.autoscaler.scaling_history.append(log_entry) - else: - # Keep autoscaler internal state aligned with the active user count. - self.autoscaler.current_users = self.num_users - - # Log stats periodically - if now - last_log_time >= 10: - self._calculate_stats() - queue_depth = self.request_queue.qsize() - print(f"Time: {int(elapsed)}s, Users: {self.num_users}, Queue: {queue_depth}, " - f"Throughput: {throughput:.2f} tok/s") - last_log_time = now - - def run(self) -> Dict: - """The main entry point to start the benchmark execution.""" - print(f"\nIntegrated Multi-User KV Cache Benchmark - MLPerf Edition") - print(f"Model: {self.model_config.name}") - print(f"Users: {self.num_users}") - print(f"Duration: {self.duration}s") - if self.seed is not None: - print(f"Seed: {self.seed}") - print(f"Generation Mode: {self.generation_mode.value} ({self.ms_per_token:.1f}ms/token)") - print(f"Features:") - print(f" - Phase-Aware Processing: Enabled") - print(f" - Multi-turn Conversations: {'Enabled' if self.enable_multi_turn else 'Disabled'}") - print(f" - Prefix Caching: {'Enabled' if self.enable_prefix_caching else 'Disabled'}") - print(f" - RAG Workload: {'Enabled' if self.enable_rag else 'Disabled'}") - print(f" - Autoscaling: {'Enabled' if self.enable_autoscaling else 'Disabled'}") - if 
self.enable_autoscaling: - print(f" - Mode: {self.autoscaler.mode}") - print(f" - QoS Support: Enabled (Interactive/Responsive/Batch)") - print(f" - Trace-Driven (BurstGPT): {'Enabled' if self.use_burst_trace else 'Disabled'}") - print(f" - ShareGPT Dataset: {'Enabled' if self.use_dataset else 'Disabled'}") - print("=" * 80) - - users = [] - if self.use_dataset and self.sharegpt_loader: - # Display ShareGPT dataset statistics - stats = self.sharegpt_loader.token_stats - if stats: - print(f"\nShareGPT Dataset Statistics:") - print(f" Conversations: {stats['total_conversations']}") - print(f" Total turns: {stats['total_turns']}") - print(f"\nContext Token Distribution:") - print(f" Min: {stats['context_min']:.0f} tokens ({stats['context_min'] * self.model_config.kv_cache_size_per_token / 1024**2:.2f} MB)") - print(f" Max: {stats['context_max']:.0f} tokens ({stats['context_max'] * self.model_config.kv_cache_size_per_token / 1024**2:.2f} MB)") - print(f" Mean: {stats['context_mean']:.0f} tokens ({stats['context_mean'] * self.model_config.kv_cache_size_per_token / 1024**2:.2f} MB)") - print(f" P50: {stats['context_p50']:.0f} tokens") - print(f" P95: {stats['context_p95']:.0f} tokens") - print(f"\nGeneration Token Distribution:") - print(f" Min: {stats['generation_min']:.0f} tokens") - print(f" Max: {stats['generation_max']:.0f} tokens") - print(f" Mean: {stats['generation_mean']:.0f} tokens") - print(f" P50: {stats['generation_p50']:.0f} tokens") - print(f" P95: {stats['generation_p95']:.0f} tokens") - elif not self.use_burst_trace: - users = UserSimulator.generate_mixed_users(self.num_users) - context_lengths = [u.context_length for u in users] - print(f"\nUser Context Length Distribution:") - print(f" Min: {min(context_lengths)} tokens ({min(context_lengths) * self.model_config.kv_cache_size_per_token / 1024**2:.2f} MB)") - print(f" Max: {max(context_lengths)} tokens ({max(context_lengths) * self.model_config.kv_cache_size_per_token / 1024**2:.2f} MB)") - print(f" 
Mean: {np.mean(context_lengths):.0f} tokens ({np.mean(context_lengths) * self.model_config.kv_cache_size_per_token / 1024**2:.2f} MB)") - - qos_dist = {level: sum(1 for u in users if u.qos_level == level) for level in QoSLevel} - print(f"\nQoS Distribution:") - for level, count in qos_dist.items(): - print(f" {level.value}: {count} users") - - print(f"\nStarting benchmark...") - print("-" * 80) - - stop_event = threading.Event() - - threads = [] - if self.use_dataset: - gen_thread = threading.Thread(target=self._generate_requests_from_dataset, args=(stop_event,), daemon=True) - elif self.use_burst_trace: - gen_thread = threading.Thread(target=self._generate_requests_from_trace, args=(stop_event,), daemon=True) - else: - gen_thread = threading.Thread(target=self.generate_requests, args=(users, stop_event), daemon=True) - - threads.append(gen_thread) - gen_thread.start() - - num_workers = min(self.num_users, 500) - for _ in range(num_workers): - proc_thread = threading.Thread(target=self.process_requests, args=(stop_event,), daemon=True) - threads.append(proc_thread) - proc_thread.start() - - # Only start the monitor thread if autoscaling is enabled. - if self.enable_autoscaling: - mon_thread = threading.Thread(target=self.monitor_stats, args=(stop_event,), daemon=True) - threads.append(mon_thread) - mon_thread.start() - - # Wait for either the configured duration or an earlier stop signal from the monitor. 
- stop_event.wait(timeout=self.duration) - - stop_event.set() - for thread in threads: - thread.join(timeout=2.0) - - self._calculate_stats() - - if self.validator: - self.results['validation'] = self.validator.validate_benchmark(self.results) - - return self.results - - def _calculate_stats(self): - """Calculate final statistics with all feature breakdowns""" - if not self.results['end_to_end_latencies']: - print("\nNo requests completed during benchmark!") - return - - e2e = np.array(self.results['end_to_end_latencies']) - storage = np.array(self.results['storage_latencies']) - generation = np.array(self.results['generation_latencies']) - - cache_stats = self.cache.get_stats(self.duration) - qos_metrics = self.qos_monitor.get_all_qos_metrics() - prefix_stats = self.prefix_cache_manager.stats if self.prefix_cache_manager else {} - autoscaling_stats = self.autoscaler.scaling_history if self.autoscaler else [] - - autoscaling_summary = None - if self.autoscaler: - autoscaling_summary = { - 'initial_users': getattr(self, 'initial_users', self.num_users), - 'final_users': self.autoscaler.current_users, - 'total_scale_events': len(autoscaling_stats) - } - if self.autoscaler.mode == 'capacity': - autoscaling_summary.update({ - 'peak_user_count': self.autoscaler.peak_user_count, - 'peak_throughput_tokens_per_sec': self.autoscaler.peak_throughput - }) - - summary = { - 'total_requests': self.results['requests_completed'], - 'total_tokens': self.results['total_tokens_generated'], - 'avg_throughput_tokens_per_sec': self.results['total_tokens_generated'] / self.duration, - 'requests_per_second': self.results['requests_completed'] / self.duration, - 'end_to_end_latency_ms': { - 'mean': np.mean(e2e) * 1000, - 'p50': np.percentile(e2e, 50) * 1000, - 'p95': np.percentile(e2e, 95) * 1000, - 'p99': np.percentile(e2e, 99) * 1000, - }, - 'storage_io_latency_ms': { - 'mean': np.mean(storage) * 1000, - 'p50': np.percentile(storage, 50) * 1000, - 'p95': np.percentile(storage, 95) * 
1000, - 'p99': np.percentile(storage, 99) * 1000, - }, - 'generation_latency_ms': { - 'mean': np.mean(generation) * 1000, - 'p50': np.percentile(generation, 50) * 1000, - 'p95': np.percentile(generation, 95) * 1000, - 'p99': np.percentile(generation, 99) * 1000, - }, - 'cache_stats': cache_stats, - 'qos_metrics': qos_metrics, - 'prefix_cache_stats': prefix_stats, - 'autoscaling_stats': autoscaling_stats, - 'autoscaling_summary': autoscaling_summary, - 'multi_turn_stats': { - 'cache_hits': self.results['multi_turn_cache_hits'], - 'cache_misses': self.results['multi_turn_cache_misses'], - 'hit_rate': self.results['multi_turn_cache_hits'] / - max(self.results['multi_turn_cache_hits'] + self.results['multi_turn_cache_misses'], 1) - } - } - self.results['summary'] = summary - self._print_summary(summary) - - def _print_summary(self, summary: Dict): - """ - Print a comprehensive benchmark results summary to console. - Displays detailed performance metrics including storage I/O latency, throughput, - cache statistics, tier-specific performance, and QoS metrics in a formatted - report suitable for analysis and comparison. 
- Args: - summary (Dict): Benchmark results dictionary containing: - - cache_stats: Storage performance and cache hit statistics - - total_requests: Number of completed requests - - total_tokens: Total tokens processed - - avg_throughput_tokens_per_sec: Average token throughput - - requests_per_second: Request rate - - end_to_end_latency_ms: Complete request latency percentiles - - storage_io_latency_ms: Storage-only latency percentiles - - generation_latency_ms: Token generation latency percentiles - - qos_metrics: Quality of service metrics by tier - - prefix_cache_stats: Prefix caching performance (optional) - - multi_turn_stats: Multi-turn conversation metrics (optional) - - autoscaling_stats: Autoscaling events (optional) - The report includes: - - Storage performance assessment with pass/fail criteria - - Overall throughput and latency metrics - - Cache hit rates and I/O statistics - - Memory tier distribution (GPU/CPU/NVMe) - - Phase-specific metrics (prefill/decode) - - QoS compliance by service tier - - Validation results if available - Note: - The symbols ✓ and ✗ are intended to be checkmark (✓) and cross (✗) - characters for pass/fail indicators but may display incorrectly due to - encoding issues. 
- """ - """Print comprehensive results summary""" - print("\n" + "=" * 80) - print("BENCHMARK RESULTS - MLPerf KV Cache Storage Benchmark") - print(f"Generation Mode: {self.generation_mode.value} ({self.ms_per_token:.1f}ms/token)") - print("=" * 80) - - cache_stats = summary['cache_stats'] - if 'storage_health' in cache_stats: - storage_health = cache_stats['storage_health'] - status = storage_health['overall_status'] - status_symbol = '✓' if status == 'PASS' else '✗' - print(f"\n### STORAGE PERFORMANCE ASSESSMENT: {status} {status_symbol} ###") - print(f" Criteria Passed: {storage_health['passed_count']}/{storage_health['total_count']}") - for criterion in storage_health['criteria']: - symbol = '✓' if criterion['passed'] else '✗' - unit = criterion.get('unit', '') - if unit == 'ratio': - print(f" {symbol} {criterion['name']}: {criterion['actual']:.1%} (target: {criterion['target']:.1%})") - continue - - actual = criterion.get('actual') - target = criterion.get('target') - try: - # Attempt to format if it's a number - actual_str = f"{actual:.2f}" - except (ValueError, TypeError): - # If it's already a string or can't be formatted, use it directly - actual_str = str(actual) - - try: - target_str = f"{target:.2f}" - except (ValueError, TypeError): - target_str = str(target) - - unit_suffix = unit if unit else '' - print(f" {symbol} {criterion['name']}: {actual_str}{unit_suffix} (target: {target_str}{unit_suffix})") - - print(f"\n### OVERALL PERFORMANCE ###") - print(f"Requests Completed: {summary['total_requests']}") - print(f"Total Tokens Generated: {summary['total_tokens']}") - print(f"Throughput: {summary['avg_throughput_tokens_per_sec']:.2f} tokens/sec") - print(f"Requests/sec: {summary['requests_per_second']:.2f}") - - print(f"\n### END-TO-END LATENCY (Storage I/O + Token Generation) ###") - print(f" Mean: {summary['end_to_end_latency_ms']['mean']:.2f} ms") - print(f" P50: {summary['end_to_end_latency_ms']['p50']:.2f} ms") - print(f" P95: 
{summary['end_to_end_latency_ms']['p95']:.2f} ms") - print(f" P99: {summary['end_to_end_latency_ms']['p99']:.2f} ms") - - print(f"\n### STORAGE I/O LATENCY (Primary Metric) ###") - print(f" Mean: {summary['storage_io_latency_ms']['mean']:.2f} ms") - print(f" P50: {summary['storage_io_latency_ms']['p50']:.2f} ms") - print(f" P95: {summary['storage_io_latency_ms']['p95']:.2f} ms") - print(f" P99: {summary['storage_io_latency_ms']['p99']:.2f} ms") - - if self.generation_mode != GenerationMode.NONE: - print(f"\n### TOKEN GENERATION LATENCY (Simulated @ {self.ms_per_token:.1f}ms/token) ###") - print(f" Mean: {summary['generation_latency_ms']['mean']:.2f} ms") - print(f" P50: {summary['generation_latency_ms']['p50']:.2f} ms") - print(f" P95: {summary['generation_latency_ms']['p95']:.2f} ms") - - print(f"\n### STORAGE PERFORMANCE ###") - print(f" Cache Hit Rate: {cache_stats['cache_hit_rate']*100:.1f}%") - print(f" Total Read: {cache_stats['total_read_gb']:.2f} GB") - print(f" Total Write: {cache_stats['total_write_gb']:.2f} GB") - print(f" Read/Write Ratio: {cache_stats['read_write_ratio']:.2f}") - print(f" Read IOPS: {cache_stats['read_iops'] / self.duration:.2f}") - print(f" Write IOPS: {cache_stats['write_iops'] / self.duration:.2f}") - - print(f"\n### CACHE TIER DISTRIBUTION ###") - print(f" GPU Entries: {cache_stats['gpu_entries']} ({cache_stats['gpu_memory_used_gb']:.2f} GB)") - print(f" CPU Entries: {cache_stats['cpu_entries']} ({cache_stats['cpu_memory_used_gb']:.2f} GB)") - print(f" NVMe Entries: {cache_stats['nvme_entries']}") - - print(f"\n### PHASE-SPECIFIC METRICS ###") - print(f" Prefill Writes: {cache_stats['prefill_writes']}") - print(f" Prefill Bytes Written: {cache_stats['prefill_bytes_written_gb']:.2f} GB") - print(f" Decode Reads: {cache_stats['decode_reads']}") - print(f" Decode Bytes Read: {cache_stats['decode_bytes_read_gb']:.2f} GB") - - print(f"\n### TIER-SPECIFIC LATENCIES ###") - for tier in ['gpu', 'cpu', 'nvme']: - for op in ['read', 
'write']: - p95_key = f'{tier}_{op}_p95_ms' - if p95_key in cache_stats: - print(f" {tier.upper()} {op.title()} P95: {cache_stats[p95_key]:.2f} ms") - - print(f"\n### CACHE TYPE BREAKDOWNS ###") - print(f" System Prompt Hits: {cache_stats['system_prompt_hits']}") - print(f" Common Phrase Hits: {cache_stats['common_phrase_hits']}") - print(f" User Cache Hits: {cache_stats['user_cache_hits']}") - print(f" Multi-turn Hits: {cache_stats['multi_turn_hits']}") - - if summary.get('prefix_cache_stats') and summary['prefix_cache_stats']['prefix_hits'] > 0: - print(f"\n### PREFIX CACHING ###") - prefix_stats = summary['prefix_cache_stats'] - print(f" Prefix Hits: {prefix_stats['prefix_hits']}") - print(f" Prefix Misses: {prefix_stats['prefix_misses']}") - print(f" System Prompt Reuse: {prefix_stats['system_prompt_reuse']}") - print(f" Bytes Saved: {prefix_stats['bytes_saved'] / 1024**3:.2f} GB") - - if summary.get('multi_turn_stats') and summary['multi_turn_stats']['cache_hits'] > 0: - print(f"\n### MULTI-TURN CONVERSATIONS ###") - mt_stats = summary['multi_turn_stats'] - print(f" Multi-turn Cache Hits: {mt_stats['cache_hits']}") - print(f" Multi-turn Cache Misses: {mt_stats['cache_misses']}") - print(f" Multi-turn Hit Rate: {mt_stats['hit_rate']*100:.1f}%") - - print(f"\n### QOS LATENCY METRICS (Informational - includes simulated generation) ###") - qos_metrics = summary['qos_metrics'] - for qos_level, metrics in qos_metrics.items(): - if metrics.get('no_data'): continue - print(f"\n {qos_level.upper()}:") - print(f" Requests: {metrics['total_requests']}") - print(f" Latency P95: {metrics['latency_ms']['p95']:.2f} ms") - print(f" Latency P99: {metrics['latency_ms']['p99']:.2f} ms") - if 'sla' in metrics: - sla_met = '✓' if metrics['sla']['met'] else '✗' - print(f" SLA Met: {sla_met} (compliance: {metrics['sla']['compliance']:.1%})") - - if summary.get('autoscaling_stats'): - auto_stats = summary['autoscaling_stats'] - if auto_stats: - print(f"\n### AUTOSCALING 
({self.autoscaler.mode} mode) ###") - print(f" Scaling Events: {len(auto_stats)}") - print(f" Final User Count: {self.autoscaler.current_users}") - if self.autoscaler.mode == 'capacity': - print(f" Peak Capacity Found: {self.autoscaler.peak_throughput:.2f} tok/s at {self.autoscaler.peak_user_count} users") - - if 'validation' in self.results: - print(f"\n### VALIDATION ###") - validation = self.results['validation'] - print(f" Validation: {'PASSED ✓' if validation['passed'] else 'FAILED ✗'}") - print(f" Average Error: {validation['avg_error_pct']:.2f}%") - - print("\n" + "=" * 80) - print("NOTES:") - if self.generation_mode == GenerationMode.NONE: - print(" - Pure storage I/O benchmark (no generation simulation)") - else: - print(" - End-to-end latency includes simulated GPU inference") - print("=" * 80) - - -def main(): - """Main entry point for running the benchmark from the command line.""" - parser = argparse.ArgumentParser(description="Integrated Multi-User KV Cache Benchmark") - parser.add_argument('--model', type=str, default='llama3.1-8b', choices=MODEL_CONFIGS.keys(), - help='The model configuration to use.') - parser.add_argument('--num-users', type=int, default=100, - help='The number of concurrent users to simulate.') - parser.add_argument('--duration', type=int, default=60, - help='The duration of the benchmark in seconds.') - parser.add_argument('--gpu-mem-gb', type=float, default=16, - help='The amount of GPU memory (VRAM) to allocate for the cache in GB.') - parser.add_argument('--cpu-mem-gb', type=float, default=32, - help='The amount of CPU memory (RAM) to allocate for the cache in GB.') - parser.add_argument('--cache-dir', type=str, default=None, - help='The directory to use for the NVMe cache tier. 
Defaults to a temporary directory.') - parser.add_argument('--generation-mode', type=str, default='realistic', choices=[g.value for g in GenerationMode], - help='The token generation speed simulation mode.') - parser.add_argument('--performance-profile', type=str, default='latency', choices=['latency', 'throughput'], - help='The performance profile to use for pass/fail criteria (latency or throughput).') - parser.add_argument('--disable-multi-turn', action='store_true', - help='Disable multi-turn conversation caching.') - parser.add_argument('--disable-prefix-caching', action='store_true', - help='Disable prefix caching.') - parser.add_argument('--enable-rag', action='store_true', - help='Enable the RAG workload simulation.') - parser.add_argument('--rag-num-docs', type=int, default=10, help='Number of RAG documents to ingest') - parser.add_argument('--enable-autoscaling', action='store_true', - help='Enable workload autoscaling.') - parser.add_argument('--autoscaler-mode', type=str, default='qos', choices=['qos', 'capacity'], - help='The autoscaling strategy: "qos" (latency-based) or "capacity" (throughput-based).') - parser.add_argument('--target-saturation', type=float, default=0.8, help='Target storage saturation for autoscaling (0.0-1.0)') - parser.add_argument('--use-burst-trace', action='store_true', - help='Use BurstGPT trace for workload generation.') - parser.add_argument('--burst-trace-path', type=str, default='BurstGPT/data/BurstGPT_1.csv', - help='Path to the BurstGPT trace file.') - parser.add_argument('--validation-trace', type=str, default=None, - help='Path to a real-world trace file for validation.') - parser.add_argument('--dataset-path', type=str, default=None, - help='Path to ShareGPT dataset JSON file for realistic workload generation.') - parser.add_argument('--max-conversations', type=int, default=500, - help='Maximum number of conversations to load from the ShareGPT dataset.') - parser.add_argument('--output', type=str, 
default=f"benchmark_results_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json", help='Output file for results') - parser.add_argument('--seed', type=int, default=None, - help='Seed for random number generators to ensure reproducibility.') - - args = parser.parse_args() - - if args.seed is not None: - print(f"Using random seed: {args.seed}") - random.seed(args.seed) - np.random.seed(args.seed) - if TORCH_AVAILABLE: - torch.manual_seed(args.seed) - if CUPY_AVAILABLE: - cp.random.seed(args.seed) - - model_config = MODEL_CONFIGS[args.model] - gen_mode = GenerationMode(args.generation_mode) - - benchmark = IntegratedBenchmark( - model_config=model_config, - num_users=args.num_users, - gpu_memory_gb=args.gpu_mem_gb, - cpu_memory_gb=args.cpu_mem_gb, - duration_seconds=args.duration, - cache_dir=args.cache_dir, - enable_autoscaling=args.enable_autoscaling, - autoscaler_mode=args.autoscaler_mode, - target_saturation=args.target_saturation, - enable_multi_turn=not args.disable_multi_turn, - enable_prefix_caching=not args.disable_prefix_caching, - enable_rag=args.enable_rag, - rag_num_docs=args.rag_num_docs, - validation_trace=args.validation_trace, - generation_mode=gen_mode, - performance_profile=args.performance_profile, - use_burst_trace=args.use_burst_trace, - burst_trace_path=args.burst_trace_path, - dataset_path=args.dataset_path, - max_conversations=args.max_conversations, - seed=args.seed - ) - - results = benchmark.run() - - # Save results to a JSON file - def convert_numpy(obj): - if isinstance(obj, np.ndarray): - return obj.tolist() - if isinstance(obj, np.generic): - return obj.item() - if isinstance(obj, datetime): - return obj.isoformat() - if is_dataclass(obj): - return asdict(obj) - raise TypeError(f"Object of type {type(obj)} is not JSON serializable") - - with open(args.output, 'w') as f: - json.dump(results, f, indent=4, default=convert_numpy) - - print(f"\nResults saved to {args.output}") - -if __name__ == "__main__": - main() \ No newline at end of file 
diff --git a/kv_cache_benchmark/kv_cache/__init__.py b/kv_cache_benchmark/kv_cache/__init__.py new file mode 100755 index 00000000..4ae90211 --- /dev/null +++ b/kv_cache_benchmark/kv_cache/__init__.py @@ -0,0 +1,145 @@ +""" +KV Cache Benchmark v3.0 — modular package. + +Re-exports all public symbols so existing code can do: + from kv_cache import MultiTierCache, IntegratedBenchmark, ... +""" + +# Compatibility flags +from kv_cache._compat import ( + HAS_CUPY, HAS_YAML, HAS_TORCH, HAS_TIKTOKEN, + CUPY_AVAILABLE, YAML_AVAILABLE, TORCH_AVAILABLE, TIKTOKEN_AVAILABLE, + HAS_PANDAS, PANDAS_AVAILABLE, + HAS_OPENPYXL, OPENPYXL_AVAILABLE, + cp, +) + +# Configuration +from kv_cache.config import ( + ConfigLoader, + cfg, + get_config, + set_config, +) + +# Core data models +from kv_cache.models import ( + ModelConfig, + MODEL_CONFIGS, + InferencePhase, + GenerationMode, + GENERATION_TIMING, + QoSLevel, + QoSSLA, + QOS_PROFILES, + get_qos_profiles, + UserProfile, + InferenceRequest, +) + +# Conversation management +from kv_cache.conversation import ( + ConversationState, + ConversationManager, +) + +# Prefix caching +from kv_cache.prefix_cache import ( + PrefixType, + PrefixCacheEntry, + PrefixMatcher, + PrefixCacheManager, +) + +# RAG workload +from kv_cache.rag import ( + RAGChunk, + RAGDocument, + RAGQuery, + RAGDocumentManager, +) + +# Storage backends +from kv_cache.backends import ( + StorageBackend, + CPUMemoryBackend, + NVMeBackend, +) + +# GPU backend is optional (requires CUDA) +try: + from kv_cache.backends import GPUMemoryBackend +except Exception: + pass + +# Core cache engine +from kv_cache.cache import ( + KVCacheGenerator, + MultiTierCache, +) + +# Monitoring and autoscaling +from kv_cache.monitoring import ( + StorageMetrics, + StorageMonitor, + WorkloadAutoscaler, + QoSMonitor, +) + +# Workload generation and validation +from kv_cache.workload import ( + RealTraceEntry, + ValidationEngine, + UserSimulator, + ShareGPTDatasetLoader, + validate_args, + 
MAX_USERS, + MAX_DURATION_SECONDS, + MAX_GPU_MEMORY_GB, + MAX_CPU_MEMORY_GB, + FORBIDDEN_CACHE_PREFIXES, +) + +# Benchmark orchestrator +from kv_cache.benchmark import IntegratedBenchmark + +# CLI +from kv_cache.cli import ( + export_results_to_xlsx, + main, +) + +__all__ = [ + # Compat flags + 'HAS_CUPY', 'HAS_YAML', 'HAS_TORCH', 'HAS_TIKTOKEN', + 'CUPY_AVAILABLE', 'YAML_AVAILABLE', 'TORCH_AVAILABLE', 'TIKTOKEN_AVAILABLE', + 'HAS_PANDAS', 'PANDAS_AVAILABLE', 'HAS_OPENPYXL', 'OPENPYXL_AVAILABLE', + 'cp', + # Config + 'ConfigLoader', 'cfg', 'get_config', 'set_config', + # Models + 'ModelConfig', 'MODEL_CONFIGS', + 'InferencePhase', 'GenerationMode', 'GENERATION_TIMING', + 'QoSLevel', 'QoSSLA', 'QOS_PROFILES', 'get_qos_profiles', + 'UserProfile', 'InferenceRequest', + # Conversation + 'ConversationState', 'ConversationManager', + # Prefix cache + 'PrefixType', 'PrefixCacheEntry', 'PrefixMatcher', 'PrefixCacheManager', + # RAG + 'RAGChunk', 'RAGDocument', 'RAGQuery', 'RAGDocumentManager', + # Backends + 'StorageBackend', 'GPUMemoryBackend', 'CPUMemoryBackend', 'NVMeBackend', + # Cache engine + 'KVCacheGenerator', 'MultiTierCache', + # Monitoring + 'StorageMetrics', 'StorageMonitor', 'WorkloadAutoscaler', 'QoSMonitor', + # Workload + 'RealTraceEntry', 'ValidationEngine', 'UserSimulator', 'ShareGPTDatasetLoader', + 'validate_args', 'MAX_USERS', 'MAX_DURATION_SECONDS', + 'MAX_GPU_MEMORY_GB', 'MAX_CPU_MEMORY_GB', 'FORBIDDEN_CACHE_PREFIXES', + # Benchmark + 'IntegratedBenchmark', + # CLI + 'export_results_to_xlsx', 'main', +] diff --git a/kv_cache_benchmark/kv_cache/_compat.py b/kv_cache_benchmark/kv_cache/_compat.py new file mode 100755 index 00000000..8ce129ba --- /dev/null +++ b/kv_cache_benchmark/kv_cache/_compat.py @@ -0,0 +1,64 @@ +""" +Optional dependency detection for KV Cache Benchmark. + +Centralizes try-import guards so other modules can check availability +without scattered try/except blocks. 
+""" + +# Optional YAML support for config file loading +try: + import yaml + HAS_YAML = True +except ImportError: + yaml = None + HAS_YAML = False + +# Alias for backward compatibility +YAML_AVAILABLE = HAS_YAML + +# Optional GPU libraries +try: + import torch + HAS_TORCH = True +except ImportError: + torch = None + HAS_TORCH = False + +TORCH_AVAILABLE = HAS_TORCH + +try: + import cupy as cp + HAS_CUPY = True +except ImportError: + cp = None + HAS_CUPY = False + +CUPY_AVAILABLE = HAS_CUPY + +try: + import tiktoken + HAS_TIKTOKEN = True +except ImportError: + tiktoken = None + HAS_TIKTOKEN = False + +TIKTOKEN_AVAILABLE = HAS_TIKTOKEN + +# Optional pandas/openpyxl for XLSX output +try: + import pandas as pd + HAS_PANDAS = True +except ImportError: + pd = None + HAS_PANDAS = False + +PANDAS_AVAILABLE = HAS_PANDAS + +try: + import openpyxl + HAS_OPENPYXL = True +except ImportError: + openpyxl = None + HAS_OPENPYXL = False + +OPENPYXL_AVAILABLE = HAS_OPENPYXL diff --git a/kv_cache_benchmark/kv_cache/backends.py b/kv_cache_benchmark/kv_cache/backends.py new file mode 100755 index 00000000..06f660cf --- /dev/null +++ b/kv_cache_benchmark/kv_cache/backends.py @@ -0,0 +1,333 @@ +""" +Storage backend classes for KV Cache Benchmark. + +Provides the abstract StorageBackend interface and concrete implementations +for GPU VRAM, CPU RAM, and NVMe/SSD storage tiers. 
+""" + +import os +import gc +import time +import logging +import tempfile +from pathlib import Path +from typing import Dict, Tuple + +import numpy as np + +from kv_cache._compat import ( + HAS_TORCH, TORCH_AVAILABLE, + HAS_CUPY, CUPY_AVAILABLE, +) +from kv_cache.config import cfg + +if HAS_TORCH: + import torch +if HAS_CUPY: + import cupy as cp + +logger = logging.getLogger(__name__) + + +# ============================================================================ +# STORAGE BACKEND CLASSES +# ============================================================================ + +class StorageBackend: + """Abstract base class for all storage backends (GPU, CPU, NVMe).""" + + from dataclasses import dataclass + + @dataclass + class IOTiming: + """Captures total latency along with host and device components.""" + total: float + device: float + host: float + + def write(self, key: str, data: np.ndarray) -> 'StorageBackend.IOTiming': + """Writes data to the backend and returns latency breakdown.""" + raise NotImplementedError + + def read(self, key: str) -> Tuple[np.ndarray, 'StorageBackend.IOTiming']: + """Reads data from the backend and returns the data and latency.""" + raise NotImplementedError + + def delete(self, key: str): + """Deletes data from the backend.""" + raise NotImplementedError + + def clear(self): + """Clears all data from the backend.""" + raise NotImplementedError + + +class GPUMemoryBackend(StorageBackend): + """ + GPU VRAM storage backend. + Uses PyTorch or CuPy for GPU operations. This is the fastest tier. 
+ """ + + def __init__(self, use_torch=True, on_eviction_callback=None): + self.on_eviction_callback = on_eviction_callback + + if use_torch and TORCH_AVAILABLE: + self.backend = 'torch' + self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') + if self.device.type == 'cpu': + raise RuntimeError("No GPU available for PyTorch backend") + memory_fraction = cfg('gpu_backend', 'memory_fraction', default=0.8) + torch.cuda.set_per_process_memory_fraction(memory_fraction, 0) + torch.cuda.empty_cache() + elif CUPY_AVAILABLE: + self.backend = 'cupy' + mempool = cp.get_default_memory_pool() + mempool.free_all_blocks() + else: + raise RuntimeError("No GPU backend (PyTorch or CuPy) available.") + + self.cache = {} + self.pinned_memory = {} + + def write(self, key: str, data: np.ndarray) -> StorageBackend.IOTiming: + """Writes a NumPy array from CPU to GPU VRAM.""" + if self.backend == 'torch' and torch.cuda.is_available(): + required_bytes = data.nbytes + max_eviction_attempts = cfg('gpu_backend', 'max_eviction_attempts', default=100) + eviction_count = 0 + free_memory_threshold = cfg('gpu_backend', 'free_memory_threshold', default=0.1) + usable_fraction = 1.0 - free_memory_threshold + + while eviction_count < max_eviction_attempts: + free_memory = torch.cuda.mem_get_info()[0] + if required_bytes <= free_memory * usable_fraction: + break + + torch.cuda.empty_cache() + free_memory = torch.cuda.mem_get_info()[0] + if required_bytes <= free_memory * usable_fraction: + break + + if len(self.cache) == 0: + logger.warning( + f"GPU OOM: Need {required_bytes / 1024**2:.1f}MB, " + f"have {free_memory / 1024**2:.1f}MB, no entries to evict" + ) + break + + oldest_key = next(iter(self.cache)) + evicted_tensor = self.cache.pop(oldest_key) + evicted_size = evicted_tensor.element_size() * evicted_tensor.nelement() + del evicted_tensor + + if oldest_key in self.pinned_memory: + del self.pinned_memory[oldest_key] + + if self.on_eviction_callback: + try: + 
self.on_eviction_callback(oldest_key, 'gpu', evicted_size) + except Exception as e: + logger.warning(f"GPU eviction callback failed for {oldest_key}: {e}") + + eviction_count += 1 + logger.debug( + f"GPU eviction #{eviction_count}: evicted {oldest_key} " + f"({evicted_size / 1024**2:.1f}MB)" + ) + + if eviction_count > 0: + torch.cuda.empty_cache() + logger.debug(f"GPU: evicted {eviction_count} entries to make room for {key}") + + start = time.perf_counter() + + if self.backend == 'torch': + if key not in self.pinned_memory: + self.pinned_memory[key] = torch.from_numpy(data).pin_memory() + gpu_tensor = self.pinned_memory[key].to(self.device, non_blocking=True) + torch.cuda.synchronize() + self.cache[key] = gpu_tensor + del self.pinned_memory[key] + else: + self.cache[key] = cp.asarray(data) + cp.cuda.Stream.null.synchronize() + + total = time.perf_counter() - start + return StorageBackend.IOTiming(total=total, device=total, host=total) + + def read(self, key: str) -> Tuple[np.ndarray, StorageBackend.IOTiming]: + """Reads a tensor from GPU VRAM back to a NumPy array on the CPU.""" + if key not in self.cache: + raise KeyError(f"Key {key} not found in GPU cache") + + start = time.perf_counter() + + if self.backend == 'torch': + gpu_tensor = self.cache[key] + cpu_tensor = gpu_tensor.to('cpu', non_blocking=True) + torch.cuda.synchronize() + data = cpu_tensor.numpy() + else: + data = cp.asnumpy(self.cache[key]) + cp.cuda.Stream.null.synchronize() + + total = time.perf_counter() - start + return data, StorageBackend.IOTiming(total=total, device=total, host=total) + + def delete(self, key: str): + if key in self.cache: + del self.cache[key] + if key in self.pinned_memory: + del self.pinned_memory[key] + + def clear(self): + """Clears all tensors from the GPU cache and frees memory.""" + for key in list(self.cache.keys()): + del self.cache[key] + self.cache.clear() + for key in list(self.pinned_memory.keys()): + del self.pinned_memory[key] + self.pinned_memory.clear() + + 
if self.backend == 'torch' and torch.cuda.is_available(): + torch.cuda.empty_cache() + torch.cuda.synchronize() + elif self.backend == 'cupy': + mempool = cp.get_default_memory_pool() + pinned_mempool = cp.get_default_pinned_memory_pool() + mempool.free_all_blocks() + pinned_mempool.free_all_blocks() + + +class CPUMemoryBackend(StorageBackend): + """CPU RAM storage backend. This is the second tier in the cache hierarchy.""" + + def __init__(self): + self.cache = {} + + def write(self, key: str, data: np.ndarray) -> StorageBackend.IOTiming: + """Writes data by copying it into the cache dictionary.""" + start = time.perf_counter() + self.cache[key] = np.copy(data) + total = time.perf_counter() - start + return StorageBackend.IOTiming(total=total, device=total, host=total) + + def read(self, key: str) -> Tuple[np.ndarray, StorageBackend.IOTiming]: + """Reads data by copying it from the cache dictionary.""" + if key not in self.cache: + raise KeyError(f"Key {key} not found in CPU cache") + start = time.perf_counter() + data = np.copy(self.cache[key]) + total = time.perf_counter() - start + return data, StorageBackend.IOTiming(total=total, device=total, host=total) + + def delete(self, key: str): + if key in self.cache: + del self.cache[key] + + def clear(self): + for key in list(self.cache.keys()): + del self.cache[key] + self.cache.clear() + gc.collect() + + +class NVMeBackend(StorageBackend): + """ + NVMe/SSD storage backend using memory-mapped files. + This is the third and slowest tier, used for offloading from CPU RAM. 
+    """
+
+    def __init__(self, base_path: str = None):
+        self.temp_dir = None
+        if base_path is None:
+            self.temp_dir = tempfile.TemporaryDirectory(prefix="kv_cache_")
+            self.base_path = Path(self.temp_dir.name)
+        else:
+            self.base_path = Path(base_path)
+            if self.base_path.exists():
+                if not self.base_path.is_dir():
+                    raise NotADirectoryError(f"Cache path {self.base_path} exists but is not a directory.")
+                for entry in self.base_path.glob("*.npy"):
+                    try:
+                        entry.unlink()
+                    except OSError:
+                        pass
+            else:
+                self.base_path.mkdir(parents=True, exist_ok=True)
+
+        if not self.base_path.exists():
+            raise OSError(f"Cache directory {self.base_path} does not exist and could not be created.")
+
+        self.metadata = {}
+
+    def _get_path(self, key: str) -> Path:
+        """Constructs the file path for a given cache key."""
+        return self.base_path / f"{key}.npy"
+
+    def write(self, key: str, data: np.ndarray) -> StorageBackend.IOTiming:
+        """Writes a NumPy array to a binary .npy file on disk."""
+        start = time.perf_counter()
+        path = self._get_path(key)
+
+        with open(path, 'wb') as f:
+            np.save(f, data, allow_pickle=False)
+            post_save = time.perf_counter()
+            f.flush()
+            os.fsync(f.fileno())
+            post_fsync = time.perf_counter()
+
+        self.metadata[key] = {'shape': data.shape, 'dtype': str(data.dtype), 'size': data.nbytes}
+
+        host_time = post_save - start
+        device_time = post_fsync - post_save
+        total = post_fsync - start
+        return StorageBackend.IOTiming(total=total, device=device_time, host=host_time)
+
+    def read(self, key: str) -> Tuple[np.ndarray, StorageBackend.IOTiming]:
+        """Reads a .npy file from disk, dropping the page cache first for accurate benchmarking."""
+        start = time.perf_counter()
+        path = self._get_path(key)
+
+        if not path.exists():
+            raise KeyError(f"Key {key} not found in NVMe cache")
+
+        try:
+            fd = os.open(path, os.O_RDONLY)
+            try:
+                # Use the named constant rather than a hardcoded 4: the numeric
+                # value of POSIX_FADV_DONTNEED is platform-specific. The
+                # AttributeError guard covers platforms where os.posix_fadvise
+                # (or the constant) is unavailable, e.g. Windows and macOS.
+                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
+            except AttributeError:
+                pass
+            finally:
+                os.close(fd)
+        except Exception:
+            pass
+
+        pre_load = time.perf_counter()
+        data = np.load(path, allow_pickle=False)
+        load_done = time.perf_counter()
+        data = np.array(data)
+        copy_done = time.perf_counter()
+
+        device_time = load_done - pre_load
+        host_time = (pre_load - start) + (copy_done - load_done)
+        total = copy_done - start
+        return data, StorageBackend.IOTiming(total=total, device=device_time, host=host_time)
+
+    def delete(self, key: str):
+        path = self._get_path(key)
+        if path.exists():
+            path.unlink()
+        if key in self.metadata:
+            del self.metadata[key]
+
+    def clear(self):
+        """Deletes all .npy files from the cache directory."""
+        for file in self.base_path.glob("*.npy"):
+            file.unlink()
+        self.metadata.clear()
+
+    def __del__(self):
+        """Cleans up the temporary directory when the object is destroyed."""
+        if self.temp_dir:
+            self.temp_dir.cleanup()
diff --git a/kv_cache_benchmark/kv_cache/benchmark.py b/kv_cache_benchmark/kv_cache/benchmark.py
new file mode 100755
index 00000000..f80ad070
--- /dev/null
+++ b/kv_cache_benchmark/kv_cache/benchmark.py
@@ -0,0 +1,1140 @@
+"""
+Integrated benchmark orchestrator for KV Cache Benchmark.
+
+Contains IntegratedBenchmark which wires all components together
+and runs the main benchmark loop with thread management, trace replay,
+preconditioning, and summary printing.
+""" + +import os +import sys +import csv +import glob +import time +import queue +import random +import logging +import threading +from typing import Dict, List, Optional, Tuple +from datetime import datetime +from concurrent.futures import ThreadPoolExecutor + +import numpy as np + +from kv_cache.config import cfg +from kv_cache.models import ( + ModelConfig, InferencePhase, GenerationMode, GENERATION_TIMING, + QoSLevel, QOS_PROFILES, UserProfile, InferenceRequest, +) +from kv_cache.cache import MultiTierCache +from kv_cache.conversation import ConversationManager +from kv_cache.prefix_cache import PrefixType, PrefixCacheManager +from kv_cache.rag import RAGDocumentManager +from kv_cache.monitoring import StorageMonitor, WorkloadAutoscaler, QoSMonitor +from kv_cache.workload import ( + ValidationEngine, UserSimulator, ShareGPTDatasetLoader, +) + +logger = logging.getLogger(__name__) + + +class IntegratedBenchmark: + """The main orchestrator for the entire benchmark.""" + + def __init__(self, + model_config: ModelConfig, + num_users: int, + gpu_memory_gb: float, + cpu_memory_gb: float, + duration_seconds: int, + cache_dir: str = None, + enable_autoscaling: bool = False, + autoscaler_mode: str = 'qos', + target_saturation: float = 0.8, + enable_multi_turn: bool = True, + enable_prefix_caching: bool = True, + enable_rag: bool = False, + rag_num_docs: int = 10, + validation_trace: Optional[str] = None, + generation_mode: GenerationMode = GenerationMode.NONE, + performance_profile: str = 'latency', + use_burst_trace: bool = False, + burst_trace_path: Optional[str] = None, + dataset_path: Optional[str] = None, + max_conversations: int = 500, + seed: Optional[int] = None, + max_concurrent_allocs: int = 0, + request_rate: float = 0, + max_requests: int = 0, + storage_capacity_gb: float = 0, + precondition: bool = False, + precondition_size_gb: float = 0, + precondition_threads: int = 0, + trace_speedup: float = 1.0, + replay_cycles: int = 0, + prefill_only: bool = False, 
+ decode_only: bool = False): + + self.model_config = model_config + self.num_users = num_users + self.initial_users = num_users + self.duration = duration_seconds + self.enable_autoscaling = enable_autoscaling + self.enable_multi_turn = enable_multi_turn + self.generation_mode = generation_mode + self.ms_per_token = GENERATION_TIMING[generation_mode] * 1000 + self.enable_prefix_caching = enable_prefix_caching + self.enable_rag = enable_rag + self.rag_num_docs = rag_num_docs + self.performance_profile = performance_profile + self.use_burst_trace = use_burst_trace + self.burst_trace_path = burst_trace_path + self.dataset_path = dataset_path + self.max_conversations = max_conversations + self.seed = seed + self.max_concurrent_allocs = max_concurrent_allocs + self.request_rate = request_rate + self.max_requests = max_requests + self.storage_capacity_gb = storage_capacity_gb + self.precondition = precondition + self.precondition_size_gb = precondition_size_gb + self.precondition_threads = precondition_threads if precondition_threads > 0 else (os.cpu_count() or 4) + self.trace_speedup = trace_speedup + self.replay_cycles = replay_cycles + self.prefill_only = prefill_only + self.decode_only = decode_only + self.burst_trace_files: List[str] = [] + self.sharegpt_loader: Optional[ShareGPTDatasetLoader] = None + + if self.dataset_path: + self.sharegpt_loader = ShareGPTDatasetLoader( + dataset_path=self.dataset_path, + max_conversations=self.max_conversations, + seed=self.seed + ) + self.use_dataset = True + elif self.use_burst_trace: + self.burst_trace_files = self._resolve_burst_trace_files() + self.use_dataset = False + else: + self.use_dataset = False + + # Initialize components + self.cache = MultiTierCache( + model_config=model_config, + gpu_memory_gb=gpu_memory_gb, + cpu_memory_gb=cpu_memory_gb, + cache_dir=cache_dir, + performance_profile=performance_profile, + seed=seed, + max_concurrent_allocs=max_concurrent_allocs, + storage_capacity_gb=storage_capacity_gb + ) + 
self.conversation_manager = ConversationManager() + self.prefix_cache_manager = PrefixCacheManager(self.cache) if enable_prefix_caching else None + self.rag_manager = RAGDocumentManager(self.cache) if enable_rag else None + self.qos_monitor = QoSMonitor() + self.storage_monitor = StorageMonitor(self) if enable_autoscaling else None + self.autoscaler = WorkloadAutoscaler( + mode=autoscaler_mode, + initial_users=self.num_users, + target_saturation=target_saturation + ) if enable_autoscaling else None + self.scale_interval = self.autoscaler.scale_interval if self.autoscaler else 1.0 + self.validator = ValidationEngine(validation_trace) if validation_trace else None + + self.request_queue = queue.PriorityQueue() + self.request_counter = 0 + self.counter_lock = threading.Lock() + + self.active_users = [] + self.user_generators = {} + self.user_conversations: Dict[str, str] = {} + self.user_conversations_lock = threading.Lock() + + self.results = { + 'requests_completed': 0, 'total_tokens_generated': 0, + 'total_storage_io_latency': 0.0, 'total_generation_latency': 0.0, + 'end_to_end_latencies': [], 'storage_latencies': [], 'generation_latencies': [], + 'throughput_timeline': [], 'prefill_latencies': [], 'decode_latencies': [], + 'multi_turn_cache_hits': 0, 'multi_turn_cache_misses': 0, + 'seed': self.seed, + } + self.results_lock = threading.Lock() + self.stop_event: Optional[threading.Event] = None + self.rag_ingest_done = threading.Event() if self.enable_rag else None + + def _ingest_rag_documents(self, num_docs: int, stop_event: Optional[threading.Event] = None): + """Ingests RAG documents for the workload.""" + logger.info(f"Ingesting {num_docs} RAG documents...") + + # Determine token range based on model size + # Large models (70B+) have bigger per-token KV cache, so use fewer tokens per doc + is_large_model = self.model_config.hidden_dim >= 8192 or self.model_config.num_layers >= 64 + if is_large_model: + token_min = cfg('rag', 'large_model_doc_tokens_min', 
default=1024) + token_max = cfg('rag', 'large_model_doc_tokens_max', default=4096) + else: + token_min = cfg('rag', 'small_model_doc_tokens_min', default=4000) + token_max = cfg('rag', 'small_model_doc_tokens_max', default=12000) + + logger.info(f"RAG document token range: [{token_min}, {token_max}] " + f"({'large' if is_large_model else 'small'} model profile)") + + for i in range(num_docs): + if stop_event and stop_event.is_set(): + break + doc_tokens = random.randint(token_min, token_max) + self.rag_manager.ingest_document(f"doc_{i:04d}", doc_tokens, self.model_config) + + if self.rag_ingest_done: + self.rag_ingest_done.set() + + def _resolve_burst_trace_files(self) -> List[str]: + """Resolve --burst-trace-path to a sorted list of CSV file paths.""" + p = self.burst_trace_path + if not p: + logger.error("--use-burst-trace flag requires --burst-trace-path to be set.") + sys.exit(1) + + if os.path.isdir(p): + files = sorted(glob.glob(os.path.join(p, '*.csv'))) + elif '*' in p or '?' in p: + files = sorted(glob.glob(p)) + elif os.path.isfile(p): + files = [p] + else: + logger.error(f"Trace path not found: {p}") + sys.exit(1) + + if not files: + logger.error(f"No CSV files matched: {p}") + sys.exit(1) + + logger.info(f"Resolved {len(files)} BurstGPT trace file(s): {[os.path.basename(f) for f in files]}") + return files + + def _burst_trace_iterator(self): + """Streaming iterator that yields trace rows from each CSV file.""" + for filepath in self.burst_trace_files: + try: + with open(filepath, 'r', encoding='utf-8') as f: + reader = csv.DictReader(f) + for row in reader: + try: + timestamp = float(row.get('Timestamp', 0)) + context_tokens = int(row['Request tokens']) + generate_tokens = int(row['Response tokens']) + total_tokens = int(row.get('Total tokens', context_tokens + generate_tokens)) + yield (timestamp, context_tokens, generate_tokens, total_tokens) + except (ValueError, KeyError): + continue + except FileNotFoundError: + logger.error(f"Trace file not 
found: {filepath}") + sys.exit(1) + except Exception as e: + logger.error(f"Error reading trace file {filepath}: {e}") + sys.exit(1) + + def _generate_requests_from_trace(self, stop_event: threading.Event): + """Generates InferenceRequest objects from the streaming trace iterator.""" + speedup = self.trace_speedup + cycles_remaining = self.replay_cycles + request_index = 0 + prev_timestamp = None + trace_total_tokens_sum = 0 + + interactive_prob = cfg('qos_distribution', 'interactive_probability', default=0.15) + responsive_threshold = cfg('qos_distribution', 'responsive_threshold', default=0.50) + + while not stop_event.is_set(): + rows_in_cycle = 0 + for timestamp, context_tokens, generate_tokens, total_tokens in self._burst_trace_iterator(): + if stop_event.is_set(): + break + + if prev_timestamp is not None and speedup > 0: + delta = timestamp - prev_timestamp + if delta > 0: + sleep_time = delta / speedup + remaining = sleep_time + while remaining > 0 and not stop_event.is_set(): + chunk = min(remaining, 5.0) + time.sleep(chunk) + remaining -= chunk + if stop_event.is_set(): + break + prev_timestamp = timestamp + + trace_total_tokens_sum += total_tokens + + with self.counter_lock: + req_id = self.request_counter + self.request_counter += 1 + + rand = random.random() + if rand < interactive_prob: + qos_level, priority = QoSLevel.INTERACTIVE, 3 + elif rand < responsive_threshold: + qos_level, priority = QoSLevel.RESPONSIVE, 2 + else: + qos_level, priority = QoSLevel.BATCH, 1 + + user_id = f"trace_user_{request_index % 1000}" + + request = InferenceRequest( + user_id=user_id, + request_id=f"{user_id}_req_{req_id:04d}", + timestamp=datetime.now(), + context_tokens=context_tokens, + generate_tokens=generate_tokens, + priority=priority, + phase=InferencePhase.PREFILL if context_tokens >= 10000 else InferencePhase.PREFILL_DECODE, + qos_level=qos_level, + cache_key=f"{user_id}_req_{req_id:04d}" + ) + + priority_tuple = (-QOS_PROFILES[request.qos_level].priority, 
time.time()) + self.request_queue.put((priority_tuple, request)) + + request_index += 1 + rows_in_cycle += 1 + + if rows_in_cycle == 0: + logger.warning("BurstGPT trace yielded 0 rows.") + break + + if cycles_remaining > 0: + cycles_remaining -= 1 + if cycles_remaining == 0: + logger.info(f"Completed {self.replay_cycles} replay cycle(s). " + f"Trace total_tokens sum: {trace_total_tokens_sum:,}") + if self.stop_event: + self.stop_event.set() + break + + prev_timestamp = None + + def _generate_requests_from_dataset(self, stop_event: threading.Event): + """Generates InferenceRequest objects from the loaded ShareGPT dataset.""" + if not self.sharegpt_loader or not self.sharegpt_loader.conversations: + logger.warning("ShareGPT dataset is empty or not loaded. Falling back to synthetic workload.") + users = UserSimulator.generate_mixed_users(self.num_users) + self.generate_requests(users, stop_event) + return + + conversation_iterator = iter(self.sharegpt_loader.iterate_conversations(shuffle=True)) + current_conversation = None + turn_index = 0 + cycles_remaining = self.replay_cycles + + while not stop_event.is_set(): + if current_conversation is None or turn_index >= len(current_conversation['turns']): + try: + current_conversation = next(conversation_iterator) + turn_index = 0 + except StopIteration: + if cycles_remaining > 0: + cycles_remaining -= 1 + if cycles_remaining == 0: + logger.info(f"Completed {self.replay_cycles} ShareGPT replay cycle(s).") + if self.stop_event: + self.stop_event.set() + return + conversation_iterator = iter(self.sharegpt_loader.iterate_conversations(shuffle=True)) + continue + + turn = current_conversation['turns'][turn_index] + context_tokens = turn['context_tokens'] + generate_tokens = turn['generation_tokens'] + + with self.counter_lock: + req_id = self.request_counter + self.request_counter += 1 + + interactive_prob = cfg('qos_distribution', 'interactive_probability', default=0.15) + responsive_threshold = cfg('qos_distribution', 
'responsive_threshold', default=0.50) + + rand = random.random() + if rand < interactive_prob: + qos_level, priority = QoSLevel.INTERACTIVE, 3 + elif rand < responsive_threshold: + qos_level, priority = QoSLevel.RESPONSIVE, 2 + else: + qos_level, priority = QoSLevel.BATCH, 1 + + user_id = f"dataset_user_{req_id % self.num_users}" + conv_id = current_conversation['id'] + + phase = InferencePhase.PREFILL if context_tokens >= 10000 else InferencePhase.PREFILL_DECODE + + request = InferenceRequest( + user_id=user_id, + request_id=f"{user_id}_req_{req_id:04d}", + timestamp=datetime.now(), + context_tokens=context_tokens, + generate_tokens=generate_tokens, + priority=priority, + phase=phase, + qos_level=qos_level, + cache_key=f"{conv_id}_turn_{turn['turn_number']}", + conversation_id=conv_id if self.enable_multi_turn else None, + turn_number=turn['turn_number'] if self.enable_multi_turn else None + ) + + priority_tuple = (-QOS_PROFILES[request.qos_level].priority, time.time()) + self.request_queue.put((priority_tuple, request)) + + turn_index += 1 + + if self.request_rate > 0: + time.sleep(1.0 / self.request_rate) + + def generate_requests(self, users: List[UserProfile], stop_event: threading.Event): + """Generate requests concurrently for each simulated user.""" + + if self.enable_rag and self.rag_manager and self.rag_ingest_done: + threading.Thread( + target=self._ingest_rag_documents, + args=(self.rag_num_docs, stop_event), + daemon=True + ).start() + + def enqueue_request(request: InferenceRequest): + priority_tuple = (-QOS_PROFILES[request.qos_level].priority, time.time()) + self.request_queue.put((priority_tuple, request)) + + def user_worker(user: UserProfile): + """Simulates an individual user generating traffic.""" + local_conv_id = None + + while not stop_event.is_set(): + time.sleep(user.think_time * random.uniform(0.8, 1.2)) + if stop_event.is_set(): + break + + if self.enable_multi_turn and self.conversation_manager: + if local_conv_id and random.random() >= 
0.8: + with self.user_conversations_lock: + self.user_conversations.pop(user.user_id, None) + local_conv_id = None + + if local_conv_id is None: + local_conv_id = self.conversation_manager.start_conversation(user.user_id) + with self.user_conversations_lock: + self.user_conversations[user.user_id] = local_conv_id + else: + local_conv_id = None + + new_context = random.randint(max(1, user.context_length // 4), user.context_length) + new_gen = random.randint(max(1, user.generation_length // 4), user.generation_length) + + with self.counter_lock: + req_id = self.request_counter + self.request_counter += 1 + + if self.enable_multi_turn and self.conversation_manager and local_conv_id: + turn_number, cache_key = self.conversation_manager.add_turn(local_conv_id, new_context, new_gen) + else: + turn_number = 1 + cache_key = f"{user.user_id}_req_{req_id:06d}" + + phase = InferencePhase.PREFILL if new_context >= 10000 else InferencePhase.PREFILL_DECODE + + request = InferenceRequest( + user_id=user.user_id, + request_id=f"req_{user.user_id}_{req_id:06d}", + timestamp=datetime.now(), + context_tokens=new_context, + generate_tokens=new_gen, + priority=user.priority, + phase=phase, + qos_level=user.qos_level, + cache_key=cache_key, + conversation_id=local_conv_id, + turn_number=turn_number + ) + + enqueue_request(request) + + if self.rag_manager and random.random() < cfg('rag', 'request_probability', default=0.1): + doc_keys = list(self.rag_manager.documents.keys()) + if not doc_keys: + continue # RAG documents not yet ingested + doc_id = random.choice(doc_keys) + retrieved_chunks = self.rag_manager.retrieve_chunks(doc_id) + rag_context_tokens = sum(chunk.token_count for chunk in retrieved_chunks) + + with self.counter_lock: + rag_req_id = self.request_counter + self.request_counter += 1 + + rag_request = InferenceRequest( + user_id=user.user_id, + request_id=f"rag_{user.user_id}_{rag_req_id:06d}", + timestamp=datetime.now(), + context_tokens=rag_context_tokens, + 
generate_tokens=random.randint(50, 200), + priority=user.priority, + phase=InferencePhase.DECODE, + qos_level=user.qos_level, + cache_key=f"rag_{doc_id}" + ) + enqueue_request(rag_request) + + for user in users: + threading.Thread(target=user_worker, args=(user,), daemon=True).start() + + self.active_users = users + + stop_event.wait() + + def process_requests(self, stop_event: threading.Event): + """The main worker loop that processes requests from the queue.""" + while not stop_event.is_set(): + try: + priority_tuple, request = self.request_queue.get(timeout=0.5) + except queue.Empty: + continue + + # Check again after dequeue — don't start expensive I/O after stop + if stop_event.is_set(): + break + + request.start_time = time.perf_counter() + storage_latency = 0.0 + cache_type = 'user' + + # 1. Check for a prefix cache hit. + if self.prefix_cache_manager: + prefix_entry, remaining_tokens = self.prefix_cache_manager.check_prefix_cache(request, self.model_config) + if prefix_entry: + cache_type = 'system' if prefix_entry.prefix_type == PrefixType.SYSTEM_PROMPT else 'common' + _, read_lat = self.cache.access_cache(prefix_entry.kv_cache_key, request.phase, cache_type) + storage_latency += read_lat + request.context_tokens = remaining_tokens + + # 2. For multi-turn conversations, access cache from previous turn. + if self.conversation_manager and request.turn_number > 1: + prev_turn_key = f"{request.conversation_id}_turn_{request.turn_number - 1}" + location, read_latency = self.cache.access_cache(prev_turn_key, InferencePhase.DECODE, 'multi_turn') + if location is not None: + storage_latency += read_latency + with self.results_lock: self.results['multi_turn_cache_hits'] += 1 + else: + with self.results_lock: self.results['multi_turn_cache_misses'] += 1 + + # 3. Perform the main PREFILL operation (a cache WRITE). 
+ # Skip if decode_only mode (disaggregated decode node) + if not self.decode_only: + if request.phase == InferencePhase.PREFILL or request.phase == InferencePhase.PREFILL_DECODE: + success, location, write_latency = self.cache.allocate_cache( + request.cache_key, request.context_tokens, InferencePhase.PREFILL + ) + storage_latency += write_latency + with self.results_lock: self.results['prefill_latencies'].append(write_latency) + + # 4. Simulate a RAG operation. + if self.rag_manager and random.random() < cfg('rag', 'request_probability', default=0.1): + doc_keys = list(self.rag_manager.documents.keys()) if self.rag_manager.documents else [] + if doc_keys: + doc_id = random.choice(doc_keys) + chunks = self.rag_manager.retrieve_chunks(doc_id) + for chunk in chunks: + _, read_lat = self.cache.access_cache(chunk.kv_cache_key, InferencePhase.DECODE) + storage_latency += read_lat + + # 5. Perform the DECODE operation (a cache READ). + # Skip if prefill_only mode (disaggregated prefill node) + if not self.prefill_only: + if request.phase == InferencePhase.DECODE or request.phase == InferencePhase.PREFILL_DECODE: + # For decode-only mode, read from pre-populated cache entries + if self.decode_only and hasattr(self, '_prepopulated_keys') and self._prepopulated_keys: + # Pick a random pre-populated key to read from + decode_key = random.choice(self._prepopulated_keys) + else: + decode_key = request.cache_key + + location, read_latency = self.cache.access_cache(decode_key, InferencePhase.DECODE, cache_type) + + if location is None: + # Cache miss during decode - need to allocate (unless decode_only) + if not self.decode_only: + _, _, write_latency = self.cache.allocate_cache( + request.cache_key, + request.context_tokens, + InferencePhase.PREFILL + ) + storage_latency += write_latency + else: + decode_batch_size = cfg('decode', 'batch_size', default=32) + num_batched_reads = max(1, (request.generate_tokens + decode_batch_size - 1) // decode_batch_size) + for _ in 
range(num_batched_reads): + _, batch_read_latency = self.cache.access_cache(decode_key, InferencePhase.DECODE, cache_type) + storage_latency += batch_read_latency + + with self.results_lock: self.results['decode_latencies'].append(read_latency) + + # 6. Simulate token generation time. + generation_latency = request.generate_tokens * GENERATION_TIMING[self.generation_mode] + if generation_latency > 0: time.sleep(generation_latency) + + request.complete_time = time.perf_counter() + + # 7. Record all results. + with self.results_lock: + self.results['requests_completed'] += 1 + self.results['total_tokens_generated'] += request.generate_tokens + self.results['total_storage_io_latency'] += storage_latency + self.results['total_generation_latency'] += generation_latency + self.results['end_to_end_latencies'].append(request.total_latency_ms / 1000) + self.results['storage_latencies'].append(storage_latency) + self.results['generation_latencies'].append(generation_latency) + + if self.max_requests > 0 and self.results['requests_completed'] >= self.max_requests: + if self.stop_event: + self.stop_event.set() + + self.qos_monitor.record_request(request) + + def monitor_stats(self, stop_event: threading.Event): + """Periodically collects and logs stats, and triggers autoscaling.""" + start_time = time.time() + last_log_time = start_time + + while not stop_event.is_set(): + time.sleep(self.scale_interval) + now = time.time() + + elapsed = now - start_time + if elapsed > self.duration: + break + + with self.results_lock: + total_tokens = self.results['total_tokens_generated'] + throughput = total_tokens / max(elapsed, 1e-6) + with self.results_lock: + self.results['throughput_timeline'].append({ + 'timestamp': elapsed, + 'throughput_tokens_per_sec': throughput + }) + + if self.enable_autoscaling and self.storage_monitor and self.autoscaler: + metrics = self.storage_monitor.collect_metrics(self.cache, self.request_queue.qsize()) + saturation_level = 
self.storage_monitor.get_saturation_level() + if metrics: + metrics.saturation_level = saturation_level + + action, target_users = self.autoscaler.calculate_scale_action( + metrics if metrics else None, + throughput, + saturation_level + ) + + if action in ('scale_up', 'scale_down') and target_users != self.num_users: + self.num_users = max(1, min(target_users, 500)) + self.autoscaler.current_users = self.num_users + log_entry = { + 'timestamp': datetime.now().isoformat(), + 'mode': self.autoscaler.mode, + 'action': action, + 'users': self.num_users, + 'saturation_level': saturation_level, + 'read_latency_p95_ms': metrics.read_latency_p95_ms if metrics else None, + 'write_latency_p95_ms': metrics.write_latency_p95_ms if metrics else None, + 'throughput_tokens_per_sec': throughput + } + self.autoscaler.scaling_history.append(log_entry) + logger.info(f"Autoscaler {action} -> {self.num_users} users (saturation: {saturation_level:.2f})") + elif action == 'stop': + logger.info("Autoscaler requested stop after reaching capacity peak.") + stop_event.set() + log_entry = { + 'timestamp': datetime.now().isoformat(), + 'mode': self.autoscaler.mode, + 'action': 'stop', + 'users': self.num_users, + 'saturation_level': saturation_level, + 'peak_throughput_tokens_per_sec': self.autoscaler.peak_throughput + } + self.autoscaler.scaling_history.append(log_entry) + else: + self.autoscaler.current_users = self.num_users + + if now - last_log_time >= 10: + self._calculate_stats() + queue_depth = self.request_queue.qsize() + logger.info(f"Time: {int(elapsed)}s, Users: {self.num_users}, Queue: {queue_depth}, " + f"Throughput: {throughput:.2f} tok/s") + last_log_time = now + + def run(self) -> Dict: + """The main entry point to start the benchmark execution.""" + print(f"\nIntegrated Multi-User KV Cache Benchmark - MLPerf Edition") + print(f"Model: {self.model_config.name}") + print(f"Users: {self.num_users}") + print(f"Duration: {self.duration}s") + if self.seed is not None: + 
print(f"Seed: {self.seed}") + print(f"Generation Mode: {self.generation_mode.value} ({self.ms_per_token:.1f}ms/token)") + print(f"Features:") + print(f" - Phase-Aware Processing: Enabled") + print(f" - Multi-turn Conversations: {'Enabled' if self.enable_multi_turn else 'Disabled'}") + print(f" - Prefix Caching: {'Enabled' if self.enable_prefix_caching else 'Disabled'}") + print(f" - RAG Workload: {'Enabled' if self.enable_rag else 'Disabled'}") + print(f" - Autoscaling: {'Enabled' if self.enable_autoscaling else 'Disabled'}") + if self.enable_autoscaling: + print(f" - Mode: {self.autoscaler.mode}") + print(f" - QoS Support: Enabled (Interactive/Responsive/Batch)") + print(f" - Trace-Driven (BurstGPT): {'Enabled' if self.use_burst_trace else 'Disabled'}") + if self.use_burst_trace: + print(f" Trace files: {len(self.burst_trace_files)}") + print(f" Trace speedup: {self.trace_speedup}x ({'no delay' if self.trace_speedup == 0 else 'real-time' if self.trace_speedup == 1.0 else f'{self.trace_speedup}x faster'})") + print(f" Replay cycles: {'infinite' if self.replay_cycles == 0 else self.replay_cycles}") + print(f" - ShareGPT Dataset: {'Enabled' if self.use_dataset else 'Disabled'}") + if self.max_concurrent_allocs > 0: + print(f" - Max Concurrent Allocations: {self.max_concurrent_allocs} (bounds RAM usage)") + print("=" * 80) + + users = [] + if not self.use_burst_trace and not self.use_dataset: + users = UserSimulator.generate_mixed_users(self.num_users) + context_lengths = [u.context_length for u in users] + print(f"\nUser Context Length Distribution:") + print(f" Min: {min(context_lengths)} tokens ({min(context_lengths) * self.model_config.kv_cache_size_per_token / 1024**2:.2f} MB)") + print(f" Max: {max(context_lengths)} tokens ({max(context_lengths) * self.model_config.kv_cache_size_per_token / 1024**2:.2f} MB)") + print(f" Mean: {np.mean(context_lengths):.0f} tokens ({np.mean(context_lengths) * self.model_config.kv_cache_size_per_token / 1024**2:.2f} MB)") + + 
qos_dist = {level: sum(1 for u in users if u.qos_level == level) for level in QoSLevel} + print(f"\nQoS Distribution:") + for level, count in qos_dist.items(): + print(f" {level.value}: {count} users") + elif self.use_dataset and self.sharegpt_loader: + print(f"\nShareGPT Dataset Statistics:") + print(f" Conversations: {self.sharegpt_loader.token_stats.get('total_conversations', 0)}") + print(f" Total Turns: {self.sharegpt_loader.token_stats.get('total_turns', 0)}") + + if self.precondition: + self._run_preconditioning() + + # Pre-populate cache for decode-only mode + if self.decode_only: + self._prepopulate_cache_for_decode() + + # Log disaggregated mode + mode_str = "standard (prefill+decode)" + if self.prefill_only: + mode_str = "PREFILL-ONLY (write-heavy, disaggregated prefill node)" + elif self.decode_only: + mode_str = "DECODE-ONLY (read-heavy, assumes KV cache pre-populated)" + print(f"\nStarting benchmark... Mode: {mode_str}") + print("-" * 80) + + stop_event = threading.Event() + self.stop_event = stop_event + + threads = [] + if self.use_dataset: + gen_thread = threading.Thread(target=self._generate_requests_from_dataset, args=(stop_event,), daemon=True) + elif self.use_burst_trace: + gen_thread = threading.Thread(target=self._generate_requests_from_trace, args=(stop_event,), daemon=True) + else: + gen_thread = threading.Thread(target=self.generate_requests, args=(users, stop_event), daemon=True) + + threads.append(gen_thread) + gen_thread.start() + + num_workers = min(self.num_users, 500) + for _ in range(num_workers): + proc_thread = threading.Thread(target=self.process_requests, args=(stop_event,), daemon=True) + threads.append(proc_thread) + proc_thread.start() + + if self.enable_autoscaling: + mon_thread = threading.Thread(target=self.monitor_stats, args=(stop_event,), daemon=True) + threads.append(mon_thread) + mon_thread.start() + + benchmark_start = time.time() + stop_event.wait(timeout=self.duration) + actual_duration = time.time() - 
benchmark_start + + stop_event.set() + for thread in threads: + thread.join(timeout=2.0) + + self._calculate_stats(actual_duration) + + if self.validator: + self.results['validation'] = self.validator.validate_benchmark(self.results) + + return self.results + + def _run_preconditioning(self): + """Run multi-threaded SSD preconditioning phase.""" + nvme_limit = self.cache.nvme_memory_limit + if self.precondition_size_gb > 0: + target_bytes = self.precondition_size_gb * 1024**3 + elif nvme_limit != float('inf'): + target_bytes = 2 * nvme_limit + else: + print("WARNING: Cannot precondition — NVMe capacity unknown and --precondition-size-gb not set. Skipping.") + return + + target_gb = target_bytes / 1024**3 + num_threads = self.precondition_threads + print(f"\n### PRECONDITIONING PHASE ###") + print(f" Target: {target_gb:.1f} GB") + print(f" Threads: {num_threads}") + + tokens_per_entry = 2048 + lock = threading.Lock() + state = {'written_bytes': 0, 'seq': 0, 'last_report': 0} + + def worker(): + while True: + with lock: + if state['written_bytes'] >= target_bytes: + return + my_seq = state['seq'] + state['seq'] += 1 + + key = f"precond_{my_seq}" + success, tier, latency = self.cache.allocate_cache(key, tokens_per_entry) + + if success: + entry = self.cache.cache_entries.get(key) + if entry: + with lock: + state['written_bytes'] += entry['size'] + gb_written = state['written_bytes'] / 1024**3 + if gb_written - state['last_report'] >= 10: + print(f" Preconditioning progress: {gb_written:.1f} / {target_gb:.1f} GB") + state['last_report'] = gb_written + + with ThreadPoolExecutor(max_workers=num_threads) as executor: + futures = [executor.submit(worker) for _ in range(num_threads)] + for f in futures: + f.result() + + print(f" Preconditioning complete: {state['written_bytes'] / 1024**3:.1f} GB written") + print(f" Resetting stats for steady-state measurement...") + self.cache.reset_stats() + + def _prepopulate_cache_for_decode(self): + """Pre-populate cache entries for 
decode-only mode. + + In disaggregated inference, the decode node assumes KV cache already exists + (written by prefill nodes). This simulates that by writing entries upfront. + """ + print(f"\n### PRE-POPULATING CACHE FOR DECODE-ONLY MODE ###") + + # Determine how many entries to pre-populate based on num_users and typical context + num_entries = self.num_users * 10 # 10 entries per user (multi-turn) + tokens_per_entry = 2048 # Average context length + num_threads = os.cpu_count() or 16 + + print(f" Creating {num_entries} cache entries ({tokens_per_entry} tokens each)...") + print(f" Threads: {num_threads}") + + # Temporarily disable semaphore for fast pre-population + # (pre-population is not part of measured benchmark) + original_semaphore = self.cache.allocation_semaphore + self.cache.allocation_semaphore = None + + # Track pre-populated keys so decode requests can use them + self._prepopulated_keys = [] + lock = threading.Lock() + state = {'completed': 0, 'seq': 0} + + def worker(): + while True: + with lock: + if state['seq'] >= num_entries: + return + my_seq = state['seq'] + state['seq'] += 1 + + key = f"prepop_{my_seq}" + success, tier, latency = self.cache.allocate_cache(key, tokens_per_entry) + + with lock: + if success: + self._prepopulated_keys.append(key) + state['completed'] += 1 + if state['completed'] % 100 == 0: + print(f" Progress: {state['completed']}/{num_entries} entries created") + + with ThreadPoolExecutor(max_workers=num_threads) as executor: + futures = [executor.submit(worker) for _ in range(num_threads)] + for f in futures: + f.result() + + # Restore semaphore for actual benchmark + self.cache.allocation_semaphore = original_semaphore + + print(f" Pre-population complete: {len(self._prepopulated_keys)} entries in cache") + print(f" Resetting stats for decode-only measurement...") + self.cache.reset_stats() + + def _calculate_stats(self, actual_duration: float = None): + """Calculate final statistics with all feature breakdowns.""" + if 
not self.results['end_to_end_latencies']: + logger.warning("No requests completed during benchmark!") + return + + duration = actual_duration if actual_duration else self.duration + + e2e = np.array(self.results['end_to_end_latencies']) + storage = np.array(self.results['storage_latencies']) + generation = np.array(self.results['generation_latencies']) + + cache_stats = self.cache.get_stats(duration) + qos_metrics = self.qos_monitor.get_all_qos_metrics() + prefix_stats = self.prefix_cache_manager.stats if self.prefix_cache_manager else {} + autoscaling_stats = self.autoscaler.scaling_history if self.autoscaler else [] + + autoscaling_summary = None + if self.autoscaler: + autoscaling_summary = { + 'initial_users': getattr(self, 'initial_users', self.num_users), + 'final_users': self.autoscaler.current_users, + 'total_scale_events': len(autoscaling_stats) + } + if self.autoscaler.mode == 'capacity': + autoscaling_summary.update({ + 'peak_user_count': self.autoscaler.peak_user_count, + 'peak_throughput_tokens_per_sec': self.autoscaler.peak_throughput + }) + + summary = { + 'total_requests': self.results['requests_completed'], + 'total_tokens': self.results['total_tokens_generated'], + 'elapsed_time': duration, + 'avg_throughput_tokens_per_sec': self.results['total_tokens_generated'] / duration, + 'total_storage_io_time': self.results['total_storage_io_latency'], + 'storage_throughput_tokens_per_sec': self.results['total_tokens_generated'] / self.results['total_storage_io_latency'] if self.results['total_storage_io_latency'] > 0 else 0, + 'requests_per_second': self.results['requests_completed'] / duration, + 'end_to_end_latency_ms': { + 'mean': np.mean(e2e) * 1000, + 'p50': np.percentile(e2e, 50) * 1000, + 'p95': np.percentile(e2e, 95) * 1000, + 'p99': np.percentile(e2e, 99) * 1000, + 'p999': np.percentile(e2e, 99.9) * 1000, + 'p9999': np.percentile(e2e, 99.99) * 1000, + }, + 'storage_io_latency_ms': { + 'mean': np.mean(storage) * 1000, + 'p50': 
np.percentile(storage, 50) * 1000, + 'p95': np.percentile(storage, 95) * 1000, + 'p99': np.percentile(storage, 99) * 1000, + 'p999': np.percentile(storage, 99.9) * 1000, + 'p9999': np.percentile(storage, 99.99) * 1000, + }, + 'generation_latency_ms': { + 'mean': np.mean(generation) * 1000, + 'p50': np.percentile(generation, 50) * 1000, + 'p95': np.percentile(generation, 95) * 1000, + 'p99': np.percentile(generation, 99) * 1000, + 'p999': np.percentile(generation, 99.9) * 1000, + 'p9999': np.percentile(generation, 99.99) * 1000, + }, + 'cache_stats': cache_stats, + 'qos_metrics': qos_metrics, + 'prefix_cache_stats': prefix_stats, + 'autoscaling_stats': autoscaling_stats, + 'autoscaling_summary': autoscaling_summary, + 'multi_turn_stats': { + 'cache_hits': self.results['multi_turn_cache_hits'], + 'cache_misses': self.results['multi_turn_cache_misses'], + 'hit_rate': self.results['multi_turn_cache_hits'] / + max(self.results['multi_turn_cache_hits'] + self.results['multi_turn_cache_misses'], 1) + } + } + self.results['summary'] = summary + self._print_summary(summary) + + def _print_summary(self, summary: Dict): + """Print comprehensive results summary.""" + print("\n" + "=" * 80) + print("BENCHMARK RESULTS - MLPerf KV Cache Storage Benchmark") + print(f"Generation Mode: {self.generation_mode.value} ({self.ms_per_token:.1f}ms/token)") + print("=" * 80) + + PASS_SYMBOL = "[OK]" + FAIL_SYMBOL = "[X]" + + cache_stats = summary['cache_stats'] + if 'storage_health' in cache_stats: + storage_health = cache_stats['storage_health'] + status = storage_health['overall_status'] + status_symbol = PASS_SYMBOL if status == 'PASS' else FAIL_SYMBOL + print(f"\n### STORAGE PERFORMANCE ASSESSMENT: {status} {status_symbol} ###") + print(f" Criteria Passed: {storage_health['passed_count']}/{storage_health['total_count']}") + for criterion in storage_health['criteria']: + symbol = PASS_SYMBOL if criterion['passed'] else FAIL_SYMBOL + unit = criterion.get('unit', '') + if unit == 'ratio': 
+ print(f" {symbol} {criterion['name']}: {criterion['actual']:.1%} (target: {criterion['target']:.1%})") + continue + + actual = criterion.get('actual') + target = criterion.get('target') + try: + actual_str = f"{actual:.2f}" + except (ValueError, TypeError): + actual_str = str(actual) + + try: + target_str = f"{target:.2f}" + except (ValueError, TypeError): + target_str = str(target) + + unit_suffix = unit if unit else '' + print(f" {symbol} {criterion['name']}: {actual_str}{unit_suffix} (target: {target_str}{unit_suffix})") + + print(f"\n### OVERALL PERFORMANCE ###") + print(f"Requests Completed: {summary['total_requests']}") + print(f"Total Tokens Generated: {summary['total_tokens']}") + print(f"Throughput (wall-clock): {summary['avg_throughput_tokens_per_sec']:.2f} tokens/sec") + print(f"Throughput (storage I/O): {summary['storage_throughput_tokens_per_sec']:.2f} tokens/sec") + print(f"Requests/sec: {summary['requests_per_second']:.2f}") + + print(f"\n### END-TO-END LATENCY (Queue Wait + Storage I/O + Generation) ###") + print(f" Mean: {summary['end_to_end_latency_ms']['mean']:.2f} ms") + print(f" P50: {summary['end_to_end_latency_ms']['p50']:.2f} ms") + print(f" P95: {summary['end_to_end_latency_ms']['p95']:.2f} ms") + print(f" P99: {summary['end_to_end_latency_ms']['p99']:.2f} ms") + + print(f"\n### PER-REQUEST STORAGE LATENCY (All I/O ops for one request) ###") + print(f" Mean: {summary['storage_io_latency_ms']['mean']:.2f} ms") + print(f" P50: {summary['storage_io_latency_ms']['p50']:.2f} ms") + print(f" P95: {summary['storage_io_latency_ms']['p95']:.2f} ms") + print(f" P99: {summary['storage_io_latency_ms']['p99']:.2f} ms") + print(f" (= 1 prefill write + N decode reads per request)") + + if self.generation_mode != GenerationMode.NONE: + print(f"\n### TOKEN GENERATION LATENCY (Simulated @ {self.ms_per_token:.1f}ms/token) ###") + print(f" Mean: {summary['generation_latency_ms']['mean']:.2f} ms") + print(f" P50: {summary['generation_latency_ms']['p50']:.2f} 
ms") + print(f" P95: {summary['generation_latency_ms']['p95']:.2f} ms") + + print(f"\n### STORAGE PERFORMANCE ###") + print(f" Cache Hit Rate: {cache_stats['cache_hit_rate']*100:.1f}%") + print(f" Total Read: {cache_stats['total_read_gb']:.2f} GB") + print(f" Total Write: {cache_stats['total_write_gb']:.2f} GB") + rw_ratio = cache_stats['read_write_ratio'] + if rw_ratio > 1e9: + print(f" Read/Write Ratio: ∞ (read-only)") + elif rw_ratio < 1e-9: + print(f" Read/Write Ratio: 0 (write-only)") + else: + print(f" Read/Write Ratio: {rw_ratio:.2f}") + print(f" Storage KV Read Operations/sec: {cache_stats['read_iops'] / self.duration:.2f}") + print(f" Storage KV Write Operations/sec: {cache_stats['write_iops'] / self.duration:.2f}") + + print(f"\n### CACHE TIER DISTRIBUTION ###") + print(f" GPU Entries: {cache_stats['gpu_entries']} ({cache_stats['gpu_memory_used_gb']:.2f} GB)") + print(f" CPU Entries: {cache_stats['cpu_entries']} ({cache_stats['cpu_memory_used_gb']:.2f} GB)") + print(f" Storage Entries: {cache_stats['storage_entries']}") + + print(f"\n### TIER-SPECIFIC KV BYTES ###") + if cache_stats.get('tier_gpu_kv_bytes_written_gb', 0) > 0: + print(f" GPU KV Bytes Written: {cache_stats['tier_gpu_kv_bytes_written_gb']:.2f} GB") + if cache_stats.get('tier_gpu_kv_bytes_read_gb', 0) > 0: + print(f" GPU KV Bytes Read: {cache_stats['tier_gpu_kv_bytes_read_gb']:.2f} GB") + if cache_stats.get('tier_cpu_kv_bytes_written_gb', 0) > 0: + print(f" CPU KV Bytes Written: {cache_stats['tier_cpu_kv_bytes_written_gb']:.2f} GB") + if cache_stats.get('tier_cpu_kv_bytes_read_gb', 0) > 0: + print(f" CPU KV Bytes Read: {cache_stats['tier_cpu_kv_bytes_read_gb']:.2f} GB") + if cache_stats.get('tier_storage_kv_bytes_written_gb', 0) > 0: + print(f" Storage KV Bytes Written: {cache_stats['tier_storage_kv_bytes_written_gb']:.2f} GB") + if cache_stats.get('tier_storage_kv_bytes_read_gb', 0) > 0: + print(f" Storage KV Bytes Read: {cache_stats['tier_storage_kv_bytes_read_gb']:.2f} GB") + + 
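The latency sections printed above (end-to-end, storage I/O, generation) all follow the same reduction: collect per-request latencies in seconds, then report mean and p50/p95/p99 in milliseconds via `np.percentile`. A minimal sketch of that summary step (function name here is illustrative):

```python
import numpy as np

def latency_summary_ms(latencies_s):
    """Summarize a list of per-request latencies (seconds) in milliseconds,
    matching the percentile reduction used in _calculate_stats()."""
    arr = np.asarray(latencies_s, dtype=float)
    return {
        'mean': float(np.mean(arr) * 1000),
        'p50': float(np.percentile(arr, 50) * 1000),
        'p95': float(np.percentile(arr, 95) * 1000),
        'p99': float(np.percentile(arr, 99) * 1000),
    }

# 90% fast requests (10 ms) with a 10% slow tail (100 ms):
stats = latency_summary_ms([0.010] * 900 + [0.100] * 100)
# p50 stays at 10 ms while p95/p99 land on the 100 ms tail
```

This is also why the report surfaces tail percentiles up to p99.99: the mean hides exactly the storage-induced stalls the benchmark is designed to expose.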
print(f"\n### STORAGE KV BANDWIDTH ###") + for tier_label, tier_key in [('GPU', 'gpu'), ('CPU', 'cpu'), ('Storage', 'storage')]: + read_bw = cache_stats.get(f'tier_{tier_key}_read_bandwidth_gbps', 0) + write_bw = cache_stats.get(f'tier_{tier_key}_write_bandwidth_gbps', 0) + if read_bw > 0: + print(f" {tier_label} KV Read Bandwidth: {read_bw:.2f} GB/s") + if write_bw > 0: + print(f" {tier_label} KV Write Bandwidth: {write_bw:.2f} GB/s") + + print(f"\n### TIER-SPECIFIC LATENCIES (Total = Host + Device) ###") + for tier in ['gpu', 'cpu', 'storage']: + for op in ['read', 'write']: + p95_key = f'{tier}_{op}_p95_ms' + if p95_key in cache_stats: + tier_label = 'Storage' if tier == 'storage' else tier.upper() + print(f" {tier_label} {op.title()} P95 (Total): {cache_stats[p95_key]:.2f} ms") + + print(f"\n### STORAGE TIER LATENCY BREAKDOWN (Device = Disk I/O, Host = Serialization) ###") + for op in ['read', 'write']: + device_key = f'storage_{op}_device_p95_ms' + host_key = f'storage_{op}_host_p95_ms' + total_key = f'storage_{op}_p95_ms' + if device_key in cache_stats: + print(f" Storage {op.title()}:") + print(f" - Device P95 (Disk I/O): {cache_stats[device_key]:.2f} ms") + if host_key in cache_stats: + print(f" - Host P95 (Serialization): {cache_stats[host_key]:.2f} ms") + if total_key in cache_stats: + print(f" - Total P95: {cache_stats[total_key]:.2f} ms") + + print(f"\n### CACHE TYPE BREAKDOWNS ###") + print(f" System Prompt Hits: {cache_stats['system_prompt_hits']}") + print(f" Common Phrase Hits: {cache_stats['common_phrase_hits']}") + print(f" User Cache Hits: {cache_stats['user_cache_hits']}") + print(f" Multi-turn Hits: {cache_stats['multi_turn_hits']}") + + if summary.get('prefix_cache_stats') and summary['prefix_cache_stats']['prefix_hits'] > 0: + print(f"\n### PREFIX CACHING ###") + prefix_stats = summary['prefix_cache_stats'] + print(f" Prefix Hits: {prefix_stats['prefix_hits']}") + print(f" Prefix Misses: {prefix_stats['prefix_misses']}") + print(f" System 
Prompt Reuse: {prefix_stats['system_prompt_reuse']}") + print(f" Bytes Saved: {prefix_stats['bytes_saved'] / 1024**3:.2f} GB") + + if summary.get('multi_turn_stats') and summary['multi_turn_stats']['cache_hits'] > 0: + print(f"\n### MULTI-TURN CONVERSATIONS ###") + mt_stats = summary['multi_turn_stats'] + print(f" Multi-turn Cache Hits: {mt_stats['cache_hits']}") + print(f" Multi-turn Cache Misses: {mt_stats['cache_misses']}") + print(f" Multi-turn Hit Rate: {mt_stats['hit_rate']*100:.1f}%") + + if self.performance_profile != 'throughput': + print(f"\n### QOS LATENCY METRICS (Informational - includes simulated generation) ###") + qos_metrics = summary['qos_metrics'] + for qos_level, metrics in qos_metrics.items(): + if metrics.get('no_data'): continue + print(f"\n {qos_level.upper()}:") + print(f" Requests: {metrics['total_requests']}") + print(f" Latency P95: {metrics['latency_ms']['p95']:.2f} ms") + print(f" Latency P99: {metrics['latency_ms']['p99']:.2f} ms") + if 'sla' in metrics: + sla_met = '[OK]' if metrics['sla']['met'] else '[X]' + print(f" SLA Met: {sla_met} (compliance: {metrics['sla']['compliance']:.1%})") + + if summary.get('autoscaling_stats'): + auto_stats = summary['autoscaling_stats'] + if auto_stats: + print(f"\n### AUTOSCALING ({self.autoscaler.mode} mode) ###") + print(f" Scaling Events: {len(auto_stats)}") + print(f" Final User Count: {self.autoscaler.current_users}") + if self.autoscaler.mode == 'capacity': + print(f" Peak Capacity Found: {self.autoscaler.peak_throughput:.2f} tok/s at {self.autoscaler.peak_user_count} users") + + if 'validation' in self.results: + print(f"\n### VALIDATION ###") + validation = self.results['validation'] + print(f" Validation: {'PASSED [OK]' if validation['passed'] else 'FAILED [X]'}") + print(f" Average Error: {validation['avg_error_pct']:.2f}%") + + print("\n" + "=" * 80) + print("NOTES:") + if self.generation_mode == GenerationMode.NONE: + print(" - Pure storage I/O benchmark (no generation simulation)") + 
else: + print(" - End-to-end latency includes simulated GPU inference") + print("=" * 80) diff --git a/kv_cache_benchmark/kv_cache/cache.py b/kv_cache_benchmark/kv_cache/cache.py new file mode 100755 index 00000000..94ab686a --- /dev/null +++ b/kv_cache_benchmark/kv_cache/cache.py @@ -0,0 +1,766 @@ +""" +Core multi-tier cache engine for KV Cache Benchmark. + +Contains KVCacheGenerator (data generation with pre-allocated buffers) +and MultiTierCache (3-tier LRU cache with waterfall eviction). +""" + +import time +import hashlib +import shutil +import logging +import threading +from typing import Dict, List, Optional, Tuple + +import numpy as np + +from kv_cache._compat import TORCH_AVAILABLE, CUPY_AVAILABLE +from kv_cache.config import cfg +from kv_cache.models import ModelConfig, InferencePhase +from kv_cache.backends import ( + StorageBackend, GPUMemoryBackend, CPUMemoryBackend, NVMeBackend, +) + +logger = logging.getLogger(__name__) + + +class KVCacheGenerator: + """Generates realistic-looking KV cache data for testing.""" + + def __init__(self, model_config: ModelConfig, global_seed: Optional[int] = None): + self.model_config = model_config + self.global_seed = 0 if global_seed is None else int(global_seed) + + self.buffer_size_elements = 128 * 1024 * 1024 # 128 million elements (~256MB for float16) + self.dtype = np.float16 if 'float16' in self.model_config.dtype else np.float32 + + logger.info(f"Pre-generating {self.buffer_size_elements * 2 / 1024**2:.0f} MB noise buffer...") + rng = np.random.default_rng(self.global_seed) + self.precomputed_buffer = rng.uniform(-1.0, 1.0, size=self.buffer_size_elements).astype(self.dtype) + + def _seed_from_key(self, key: str) -> int: + h = hashlib.sha256(key.encode('utf-8')).digest() + key_hash64 = int.from_bytes(h[:8], 'little') + return (key_hash64 ^ self.global_seed) & 0xFFFFFFFFFFFFFFFF + + def generate(self, sequence_length: int, key: Optional[str] = None) -> np.ndarray: + """ + Generates a NumPy array with the correct 
shape and dtype for a KV cache. + Uses a pre-computed buffer to avoid CPU bottlenecks during benchmarking. + """ + if self.model_config.attention_type == 'mla': + # MLA: compressed latent (kv_lora_rank) + decoupled RoPE key (qk_rope_head_dim) + # No separate K and V — jointly compressed into single latent vector per layer + kv_shape = ( + self.model_config.num_layers, + int(sequence_length), + self.model_config.kv_lora_rank + self.model_config.qk_rope_head_dim, + ) + else: + kv_shape = ( + self.model_config.num_layers, + 2, + int(sequence_length), + self.model_config.kv_heads, + self.model_config.kv_dim_per_head, + ) + + total_elements = int(np.prod(kv_shape)) + + if total_elements <= self.buffer_size_elements: + if key: + seed = self._seed_from_key(key) + divisor = self.buffer_size_elements - total_elements + start_idx = int(seed % divisor) if divisor > 0 else 0 + else: + start_idx = 0 + + flat_view = self.precomputed_buffer[start_idx : start_idx + total_elements] + return flat_view.reshape(kv_shape) + else: + repeats = int((total_elements + self.buffer_size_elements - 1) // self.buffer_size_elements) + large_data = np.tile(self.precomputed_buffer, repeats)[:total_elements] + return large_data.reshape(kv_shape) + + +# ============================================================================ +# ENHANCED MULTI-TIER CACHE +# ============================================================================ + +class MultiTierCache: + """ + Manages KV cache data across GPU, CPU, and NVMe tiers. + + This class is the heart of the benchmark. It orchestrates where cache data is + written to and read from based on available space and access patterns. 
+ """ + + def __init__(self, + model_config: ModelConfig, + gpu_memory_gb: float, + cpu_memory_gb: float, + cache_dir: str = None, + eviction_policy: str = 'lru', + performance_profile: str = 'latency', + seed: Optional[int] = None, + max_concurrent_allocs: int = 0, + storage_capacity_gb: float = 0): + + self.model_config = model_config + self.gpu_memory_limit = gpu_memory_gb * 1024**3 + self.cpu_memory_limit = cpu_memory_gb * 1024**3 + self.eviction_policy = eviction_policy + self.performance_profile = performance_profile + self.seed = seed + self.max_concurrent_allocs = max_concurrent_allocs + + # Initialize storage backends for each tier. + self.backends = {} + try: + if TORCH_AVAILABLE or CUPY_AVAILABLE: + self.backends['gpu'] = GPUMemoryBackend( + use_torch=TORCH_AVAILABLE, + on_eviction_callback=self._handle_gpu_eviction + ) + except Exception as e: + logger.warning(f"Could not initialize GPU backend: {e}") + + self.backends['cpu'] = CPUMemoryBackend() + self.backends['nvme'] = NVMeBackend(base_path=cache_dir) + + self.generator = KVCacheGenerator(model_config, global_seed=self.seed) + + self.cache_entries = {} + self.entry_locks: Dict[str, threading.Lock] = {} + if storage_capacity_gb > 0: + self.nvme_memory_limit = storage_capacity_gb * 1024**3 + else: + try: + nvme_base = self.backends['nvme'].base_path + self.nvme_memory_limit = float(shutil.disk_usage(nvme_base).free) + except Exception: + self.nvme_memory_limit = float('inf') + + self.gpu_memory_used = 0 + self.cpu_memory_used = 0 + self.nvme_memory_used = 0 + + self.metadata_lock = threading.Lock() + self.memory_lock = threading.Lock() + self.stats_lock = threading.Lock() + + if self.max_concurrent_allocs and self.max_concurrent_allocs > 0: + self.allocation_semaphore = threading.Semaphore(self.max_concurrent_allocs) + else: + self.allocation_semaphore = None + + self.stats = { + 'cache_hits': 0, + 'cache_misses': 0, + 'evictions': 0, + 'offloads_cpu': 0, + 'offloads_storage': 0, + + 
'gpu_read_latencies': [], 'cpu_read_latencies': [], 'storage_read_latencies': [], + 'gpu_write_latencies': [], 'cpu_write_latencies': [], 'storage_write_latencies': [], + 'storage_read_device_latencies': [], 'storage_read_host_latencies': [], + 'storage_write_device_latencies': [], 'storage_write_host_latencies': [], + + 'prefill_writes': 0, 'decode_reads': 0, + + 'tier_gpu_kv_bytes_written': 0, 'tier_cpu_kv_bytes_written': 0, 'tier_storage_kv_bytes_written': 0, + 'tier_gpu_kv_bytes_read': 0, 'tier_cpu_kv_bytes_read': 0, 'tier_storage_kv_bytes_read': 0, + + 'system_prompt_hits': 0, 'common_phrase_hits': 0, + 'user_cache_hits': 0, 'multi_turn_hits': 0, + + 'total_read_bytes': 0, 'total_write_bytes': 0, + 'read_operations': 0, 'write_operations': 0, + + 'storage_tokens_processed': 0, + } + + def _get_entry_lock(self, key: str) -> threading.Lock: + """Get or create a lock for a specific cache entry.""" + with self.metadata_lock: + if key not in self.entry_locks: + self.entry_locks[key] = threading.Lock() + return self.entry_locks[key] + + def _handle_gpu_eviction(self, key: str, tier: str, evicted_size: int) -> None: + """Callback invoked by GPUMemoryBackend when it evicts entries during OOM handling.""" + with self.metadata_lock: + if key in self.cache_entries: + del self.cache_entries[key] + if key in self.entry_locks: + del self.entry_locks[key] + + with self.memory_lock: + self.gpu_memory_used = max(0, self.gpu_memory_used - evicted_size) + + with self.stats_lock: + self.stats['evictions'] += 1 + + logger.debug(f"GPU eviction synced: removed {key} from cache metadata") + + # ======================================================================== + # WATERFALL LRU EVICTION METHODS + # ======================================================================== + + def _get_tier_order(self) -> List[str]: + """Returns the tier hierarchy from fastest to slowest.""" + tiers = [] + if 'gpu' in self.backends: + tiers.append('gpu') + tiers.extend(['cpu', 'nvme']) + return 
tiers + + def _get_tier_limit(self, tier: str) -> float: + """Get the memory limit for a tier in bytes.""" + if tier == 'gpu': + return self.gpu_memory_limit + elif tier == 'cpu': + return self.cpu_memory_limit + else: + return self.nvme_memory_limit + + def _get_tier_usage(self, tier: str) -> float: + """Get the current memory usage for a tier in bytes.""" + if tier == 'gpu': + return self.gpu_memory_used + elif tier == 'cpu': + return self.cpu_memory_used + else: + return self.nvme_memory_used + + def _update_tier_usage(self, tier: str, delta: int): + """Update the memory usage tracking for a tier.""" + if tier == 'gpu': + self.gpu_memory_used = max(0, self.gpu_memory_used + delta) + elif tier == 'cpu': + self.cpu_memory_used = max(0, self.cpu_memory_used + delta) + elif tier == 'nvme': + self.nvme_memory_used = max(0, self.nvme_memory_used + delta) + + def _get_lru_entries_in_tier(self, tier: str) -> List[Tuple[str, dict]]: + """Get all cache entries in a specific tier, sorted by LRU order.""" + with self.metadata_lock: + entries = [ + (k, dict(v)) + for k, v in self.cache_entries.items() + if v['location'] == tier + ] + entries.sort(key=lambda x: (x[1]['last_access'], x[1].get('access_count', 0))) + return entries + + def _demote_entry(self, key: str, from_tier: str, to_tier: str) -> Tuple[bool, float]: + """Move a cache entry from one tier to a lower (slower) tier.""" + entry_lock = self._get_entry_lock(key) + + with entry_lock: + with self.metadata_lock: + if key not in self.cache_entries: + return False, 0.0 + entry = self.cache_entries[key] + current_location = entry['location'] + if current_location != from_tier: + return True, 0.0 + size = entry['size'] + + try: + data, read_timing = self.backends[from_tier].read(key) + write_timing = self.backends[to_tier].write(key, data) + self.backends[from_tier].delete(key) + + with self.metadata_lock: + if key in self.cache_entries: + self.cache_entries[key]['location'] = to_tier + + with self.memory_lock: + 
self._update_tier_usage(from_tier, -size) + + with self.stats_lock: + self.stats['evictions'] += 1 + if to_tier == 'cpu': + self.stats['offloads_cpu'] += 1 + elif to_tier == 'nvme': + self.stats['offloads_storage'] += 1 + bytes_per_token = self.model_config.kv_cache_size_per_token + if bytes_per_token > 0: + tokens = size // bytes_per_token + self.stats['storage_tokens_processed'] += tokens + else: + logger.warning("bytes_per_token is 0, skipping token count update") + + total_latency = read_timing.total + write_timing.total + return True, total_latency + + except Exception as e: + logger.error(f"Failed to demote {key} from {from_tier} to {to_tier}: {e}") + return False, 0.0 + + def _ensure_space_in_tier(self, tier: str, required_bytes: int, recursion_depth: int = 0) -> bool: + """Ensure there's enough space in a tier by evicting LRU entries.""" + if tier == 'nvme' and self.nvme_memory_limit == float('inf'): + # Still track usage even when unlimited, for accurate metrics + with self.memory_lock: + self._update_tier_usage('nvme', required_bytes) + return True + + max_recursion = cfg('eviction', 'max_recursion_depth', default=10) + if recursion_depth > max_recursion: + logger.warning("Hit recursion limit in _ensure_space_in_tier") + return False + + tier_order = self._get_tier_order() + try: + tier_idx = tier_order.index(tier) + except ValueError: + return False + + next_tier = tier_order[tier_idx + 1] if tier_idx + 1 < len(tier_order) else None + if next_tier is None and tier != 'nvme': + return False + + limit = self._get_tier_limit(tier) + target_usage_ratio = cfg('eviction', 'target_usage_ratio', default=0.8) + target_usage = limit * target_usage_ratio + + large_entry_limit_ratio = cfg('eviction', 'large_entry_limit_ratio', default=0.95) + if required_bytes > limit * large_entry_limit_ratio: + return False + + entries_in_tier = len(self._get_lru_entries_in_tier(tier)) + max_evictions_hard_cap = cfg('eviction', 'max_evictions_hard_cap', default=5000) + 
max_evictions_min = cfg('eviction', 'max_evictions_min', default=1000) + max_evictions_per_call = min(max_evictions_hard_cap, max(max_evictions_min, entries_in_tier + 100)) + eviction_count = 0 + + while eviction_count < max_evictions_per_call: + with self.memory_lock: + current_usage = self._get_tier_usage(tier) + if current_usage + required_bytes <= target_usage: + self._update_tier_usage(tier, required_bytes) + return True + + if current_usage < limit * 0.05 and required_bytes <= limit * large_entry_limit_ratio: + self._update_tier_usage(tier, required_bytes) + return True + + lru_entries = self._get_lru_entries_in_tier(tier) + + if not lru_entries: + with self.metadata_lock: + actual_usage = sum( + entry['size'] for entry in self.cache_entries.values() + if entry['location'] == tier + ) + with self.memory_lock: + if tier == 'gpu': + self.gpu_memory_used = actual_usage + elif tier == 'cpu': + self.cpu_memory_used = actual_usage + elif tier == 'nvme': + self.nvme_memory_used = actual_usage + + with self.memory_lock: + current_usage = self._get_tier_usage(tier) + if current_usage + required_bytes <= target_usage: + self._update_tier_usage(tier, required_bytes) + return True + + return False + + total_size_in_tier = sum(e['size'] for _, e in lru_entries) + if total_size_in_tier < limit * 0.2 and required_bytes > target_usage * 0.5: + return False + + lru_key, lru_entry = lru_entries[0] + lru_size = lru_entry['size'] + + if next_tier is None and tier == 'nvme': + entry_lock = self._get_entry_lock(lru_key) + with entry_lock: + try: + self.backends['nvme'].delete(lru_key) + except Exception as e: + logger.warning(f"Failed to delete NVMe entry {lru_key}: {e}") + with self.metadata_lock: + self.cache_entries.pop(lru_key, None) + with self.memory_lock: + self.nvme_memory_used = max(0, self.nvme_memory_used - lru_size) + with self.stats_lock: + self.stats['evictions'] += 1 + else: + if not self._ensure_space_in_tier(next_tier, lru_size, recursion_depth + 1): + 
logger.warning(f"Could not make space in {next_tier} for demotion") + return False + + success, _ = self._demote_entry(lru_key, tier, next_tier) + if not success: + # Entry may have been deleted/moved by another thread; skip to next + eviction_count += 1 + continue + + eviction_count += 1 + + return False + + def allocate_cache(self, key: str, num_tokens: int, phase: InferencePhase = InferencePhase.PREFILL) -> Tuple[bool, str, float]: + """Allocates and writes a new KV cache entry to the most appropriate tier.""" + with self.metadata_lock: + if key in self.cache_entries: + return True, self.cache_entries[key]['location'], 0.0 + + if self.allocation_semaphore: + self.allocation_semaphore.acquire() + + try: + return self._allocate_cache_inner(key, num_tokens, phase) + finally: + if self.allocation_semaphore: + self.allocation_semaphore.release() + + def _allocate_cache_inner(self, key: str, num_tokens: int, phase: InferencePhase) -> Tuple[bool, str, float]: + """Inner implementation of allocate_cache, called within semaphore.""" + try: + data = self.generator.generate(sequence_length=num_tokens, key=key) + except MemoryError: + logger.error(f"MemoryError generating cache for key {key} ({num_tokens} tokens)") + return False, 'none', 0.0 + except Exception as exc: + logger.error(f"Failed to generate cache for key {key}: {exc}") + return False, 'none', 0.0 + + size_bytes = data.nbytes + + with self.stats_lock: + if phase == InferencePhase.PREFILL: + self.stats['prefill_writes'] += 1 + self.stats['write_operations'] += 1 + self.stats['total_write_bytes'] += size_bytes + + tier_order = self._get_tier_order() + allocated_tier = None + + for tier in tier_order: + if self._ensure_space_in_tier(tier, size_bytes): + allocated_tier = tier + break + + if allocated_tier is None: + logger.warning("All tiers full — eviction could not free space, forcing write to NVMe") + allocated_tier = 'nvme' + + try: + if allocated_tier == 'gpu': + timing = self.backends['gpu'].write(key, data) 
+            elif allocated_tier == 'cpu':
+                timing = self.backends['cpu'].write(key, data)
+            else:
+                timing = self.backends['nvme'].write(key, data)
+
+            with self.metadata_lock:
+                self.cache_entries[key] = {
+                    'location': allocated_tier,
+                    'size': size_bytes,
+                    'last_access': time.time(),
+                    'access_count': 1
+                }
+
+            with self.stats_lock:
+                tier_stats_name = 'storage' if allocated_tier == 'nvme' else allocated_tier
+
+                self.stats[f'tier_{tier_stats_name}_kv_bytes_written'] += size_bytes
+
+                if allocated_tier == 'cpu':
+                    self.stats['offloads_cpu'] += 1
+                    self.stats['cpu_write_latencies'].append(timing.total)
+                elif allocated_tier == 'nvme':
+                    self.stats['offloads_storage'] += 1
+                    self.stats['storage_write_latencies'].append(timing.total)
+                    self.stats['storage_write_device_latencies'].append(timing.device)
+                    self.stats['storage_write_host_latencies'].append(timing.host)
+                    self.stats['storage_tokens_processed'] += num_tokens
+                elif allocated_tier == 'gpu':
+                    self.stats['gpu_write_latencies'].append(timing.total)
+
+            del data
+            return True, allocated_tier, timing.total
+
+        except Exception as e:
+            # Roll back the reserved capacity and surface the failure instead of swallowing it
+            logger.error(f"Failed to write cache entry {key} to tier {allocated_tier}: {e}")
+            with self.memory_lock:
+                self._update_tier_usage(allocated_tier, -size_bytes)
+            del data
+            return False, 'none', 0.0
+
+    def access_cache(self, key: str, phase: InferencePhase = InferencePhase.DECODE,
+                     cache_type: str = 'user') -> Tuple[Optional[str], float]:
+        """Accesses an existing cached entry and records the read performance."""
+        with self.metadata_lock:
+            if key not in self.cache_entries:
+                with self.stats_lock:
+                    self.stats['cache_misses'] += 1
+                return None, 0.0
+
+            entry = self.cache_entries[key]
+            location = entry['location']
+            entry_size = entry['size']
+
+        entry_lock = self._get_entry_lock(key)
+
+        with entry_lock:
+            with self.metadata_lock:
+                if key not in self.cache_entries:
+                    with self.stats_lock:
+                        self.stats['cache_misses'] += 1
+                    return None, 0.0
+
+                entry = self.cache_entries[key]
+                entry['last_access'] = time.time()
+                entry['access_count'] += 1
+
+                with self.stats_lock:
+                    self.stats['cache_hits'] += 1
+
+                    if cache_type == 'system':
+                        self.stats['system_prompt_hits'] += 1
+                    elif cache_type == 'common':
+                        self.stats['common_phrase_hits'] += 1
+                    elif cache_type == 'multi_turn':
+                        self.stats['multi_turn_hits'] += 1
+                    else:
+                        self.stats['user_cache_hits'] += 1
+
+                    tier_stats_name = 'storage' if location == 'nvme' else location
+
+                    self.stats[f'tier_{tier_stats_name}_kv_bytes_read'] += entry_size
+
+                    if phase == InferencePhase.DECODE:
+                        self.stats['decode_reads'] += 1
+
+                    self.stats['read_operations'] += 1
+                    self.stats['total_read_bytes'] += entry_size
+
+            try:
+                _, timing = self.backends[location].read(key)
+
+                with self.stats_lock:
+                    if location == 'gpu':
+                        self.stats['gpu_read_latencies'].append(timing.total)
+                    elif location == 'cpu':
+                        self.stats['cpu_read_latencies'].append(timing.total)
+                    else:
+                        self.stats['storage_read_latencies'].append(timing.total)
+                        self.stats['storage_read_device_latencies'].append(timing.device)
+                        self.stats['storage_read_host_latencies'].append(timing.host)
+
+                        if self.model_config.kv_cache_size_per_token > 0:
+                            num_tokens = entry_size / self.model_config.kv_cache_size_per_token
+                            self.stats['storage_tokens_processed'] += num_tokens
+
+                return location, timing.total
+            except Exception as e:
+                # The entry exists but the backend read failed; log it rather than fail silently
+                logger.error(f"Failed to read cache entry {key} from tier {location}: {e}")
+                return location, 0.0
+
+    def _evaluate_storage_performance(self, duration: float) -> Dict:
+        """Evaluates storage performance against MLPerf Storage WG criteria."""
+        criteria = []
+        all_passed = True
+
+        if self.performance_profile == 'throughput':
+            read_bytes = self.stats.get('tier_storage_kv_bytes_read', 0)
+            write_bytes = self.stats.get('tier_storage_kv_bytes_written', 0)
+            read_bw_gbps = (read_bytes / 1024**3) / duration if duration > 0 else 0
+            write_bw_gbps = (write_bytes / 1024**3) / duration if duration > 0 else 0
+
+            # Only check read bandwidth if there were reads (skip for prefill-only mode)
+            if
read_bytes > 0 or write_bytes == 0: + read_passed = read_bw_gbps > 0 + criteria.append({ + 'name': 'Storage KV Read Bandwidth', + 'target': '>0', 'actual': f"{read_bw_gbps:.2f}", 'unit': 'GB/s', 'passed': read_passed + }) + all_passed = all_passed and read_passed + + # Only check write bandwidth if there were writes (skip for decode-only mode) + if write_bytes > 0 or read_bytes == 0: + write_passed = write_bw_gbps > 0 + criteria.append({ + 'name': 'Storage KV Write Bandwidth', + 'target': '>0', 'actual': f"{write_bw_gbps:.2f}", 'unit': 'GB/s', 'passed': write_passed + }) + all_passed = all_passed and write_passed + + return { + 'overall_status': 'PASS' if all_passed else 'FAIL', + 'criteria': criteria, + 'passed_count': sum(1 for c in criteria if c['passed']), + 'total_count': len(criteria) + } + + # Latency-focused profile (default) + storage_write_device = self.stats.get('storage_write_device_latencies', []) + storage_write_total = self.stats.get('storage_write_latencies', []) + storage_write_basis = storage_write_device if storage_write_device else storage_write_total + latency_type = 'Device' if storage_write_device else 'Total' + if storage_write_basis: + storage_write_p95 = np.percentile(storage_write_basis, 95) * 1000 + passed = storage_write_p95 < 500 + criteria.append({ + 'name': f'Storage Tier Write {latency_type} P95 < 500ms', + 'target': 500, 'actual': storage_write_p95, 'unit': 'ms', 'passed': passed + }) + all_passed = all_passed and passed + + storage_read_device = self.stats.get('storage_read_device_latencies', []) + storage_read_total = self.stats.get('storage_read_latencies', []) + storage_read_basis = storage_read_device if storage_read_device else storage_read_total + latency_type = 'Device' if storage_read_device else 'Total' + if storage_read_basis: + storage_read_p95 = np.percentile(storage_read_basis, 95) * 1000 + passed = storage_read_p95 < 200 + criteria.append({ + 'name': f'Storage Tier Read {latency_type} P95 < 200ms', + 'target': 200, 
'actual': storage_read_p95, 'unit': 'ms', 'passed': passed + }) + all_passed = all_passed and passed + + cpu_read_lats = self.stats.get('cpu_read_latencies', []) + cpu_write_lats = self.stats.get('cpu_write_latencies', []) + if cpu_read_lats or cpu_write_lats: + all_cpu_lats = cpu_read_lats + cpu_write_lats + cpu_p95 = np.percentile(all_cpu_lats, 95) * 1000 + passed = cpu_p95 < 150 + criteria.append({ + 'name': 'CPU RAM P95 < 150ms', + 'target': 150, 'actual': cpu_p95, 'unit': 'ms', 'passed': passed + }) + all_passed = all_passed and passed + + total_accesses = self.stats['cache_hits'] + self.stats['cache_misses'] + if total_accesses > 0: + hit_rate = self.stats['cache_hits'] / total_accesses + passed = hit_rate > 0.3 + criteria.append({ + 'name': 'Cache Hit Rate > 30%', + 'target': 0.3, 'actual': hit_rate, 'unit': 'ratio', 'passed': passed + }) + all_passed = all_passed and passed + + return { + 'overall_status': 'PASS' if all_passed else 'FAIL', + 'criteria': criteria, + 'passed_count': sum(1 for c in criteria if c['passed']), + 'total_count': len(criteria) + } + + def get_stats(self, duration: float) -> Dict: + """Gathers and returns a comprehensive dictionary of all performance statistics.""" + with self.stats_lock: + total_accesses = self.stats['cache_hits'] + self.stats['cache_misses'] + hit_rate = self.stats['cache_hits'] / total_accesses if total_accesses > 0 else 0 + stats_snapshot = self.stats.copy() + + with self.metadata_lock: + gpu_entries = sum(1 for e in self.cache_entries.values() if e['location'] == 'gpu') + cpu_entries = sum(1 for e in self.cache_entries.values() if e['location'] == 'cpu') + nvme_entries = sum(1 for e in self.cache_entries.values() if e['location'] == 'nvme') + + with self.memory_lock: + gpu_mem_used = self.gpu_memory_used + cpu_mem_used = self.cpu_memory_used + + storage_health = self._evaluate_storage_performance(duration) + + tier_gpu_read_bytes = self.stats['tier_gpu_kv_bytes_read'] + tier_gpu_write_bytes = 
self.stats['tier_gpu_kv_bytes_written'] + tier_cpu_read_bytes = self.stats['tier_cpu_kv_bytes_read'] + tier_cpu_write_bytes = self.stats['tier_cpu_kv_bytes_written'] + tier_storage_read_bytes = self.stats['tier_storage_kv_bytes_read'] + tier_storage_write_bytes = self.stats['tier_storage_kv_bytes_written'] + + stats = { + 'cache_hit_rate': hit_rate, + 'cache_hits': stats_snapshot['cache_hits'], + 'cache_misses': stats_snapshot['cache_misses'], + 'gpu_entries': gpu_entries, + 'cpu_entries': cpu_entries, + 'storage_entries': nvme_entries, + 'gpu_memory_used_gb': gpu_mem_used / 1024**3, + 'cpu_memory_used_gb': cpu_mem_used / 1024**3, + 'offloads_cpu': stats_snapshot['offloads_cpu'], + 'offloads_storage': stats_snapshot['offloads_storage'], + 'storage_health': storage_health, + 'prefill_writes': self.stats['prefill_writes'], + 'decode_reads': self.stats['decode_reads'], + + 'tier_gpu_kv_bytes_written_gb': tier_gpu_write_bytes / 1024**3, + 'tier_cpu_kv_bytes_written_gb': tier_cpu_write_bytes / 1024**3, + 'tier_storage_kv_bytes_written_gb': tier_storage_write_bytes / 1024**3, + 'tier_gpu_kv_bytes_read_gb': tier_gpu_read_bytes / 1024**3, + 'tier_cpu_kv_bytes_read_gb': tier_cpu_read_bytes / 1024**3, + 'tier_storage_kv_bytes_read_gb': tier_storage_read_bytes / 1024**3, + + 'tier_gpu_read_bandwidth_gbps': (tier_gpu_read_bytes / 1024**3) / duration if duration > 0 else 0, + 'tier_gpu_write_bandwidth_gbps': (tier_gpu_write_bytes / 1024**3) / duration if duration > 0 else 0, + 'tier_cpu_read_bandwidth_gbps': (tier_cpu_read_bytes / 1024**3) / duration if duration > 0 else 0, + 'tier_cpu_write_bandwidth_gbps': (tier_cpu_write_bytes / 1024**3) / duration if duration > 0 else 0, + 'tier_storage_read_bandwidth_gbps': (tier_storage_read_bytes / 1024**3) / duration if duration > 0 else 0, + 'tier_storage_write_bandwidth_gbps': (tier_storage_write_bytes / 1024**3) / duration if duration > 0 else 0, + + 'system_prompt_hits': self.stats['system_prompt_hits'], + 'common_phrase_hits': 
self.stats['common_phrase_hits'],
+            'user_cache_hits': self.stats['user_cache_hits'],
+            'multi_turn_hits': self.stats['multi_turn_hits'],
+            'total_read_bytes': self.stats['total_read_bytes'],
+            'total_write_bytes': self.stats['total_write_bytes'],
+            'total_read_gb': self.stats['total_read_bytes'] / 1024**3,
+            'total_write_gb': self.stats['total_write_bytes'] / 1024**3,
+            'read_write_ratio': self.stats['total_read_bytes'] / max(self.stats['total_write_bytes'], 1),
+            'read_iops': self.stats['read_operations'],
+            'write_iops': self.stats['write_operations'],
+            'storage_tokens_processed': self.stats['storage_tokens_processed'],
+        }
+
+        # Internal stat keys already use 'storage' for the NVMe tier, so the mapping is 1:1
+        for internal_tier, output_tier in [('gpu', 'gpu'), ('cpu', 'cpu'), ('storage', 'storage')]:
+            for op in ['read', 'write']:
+                latencies = self.stats.get(f'{internal_tier}_{op}_latencies', [])
+                if latencies:
+                    lat_array = np.array(latencies)
+                    stats[f'{output_tier}_{op}_p50_ms'] = np.percentile(lat_array, 50) * 1000
+                    stats[f'{output_tier}_{op}_p95_ms'] = np.percentile(lat_array, 95) * 1000
+                    stats[f'{output_tier}_{op}_p99_ms'] = np.percentile(lat_array, 99) * 1000
+                    stats[f'{output_tier}_{op}_p999_ms'] = np.percentile(lat_array, 99.9) * 1000
+                    stats[f'{output_tier}_{op}_p9999_ms'] = np.percentile(lat_array, 99.99) * 1000
+
+        for op in ['read', 'write']:
+            device_latencies = self.stats.get(f'storage_{op}_device_latencies', [])
+            host_latencies = self.stats.get(f'storage_{op}_host_latencies', [])
+            if device_latencies:
+                device_array = np.array(device_latencies)
+                stats[f'storage_{op}_device_p50_ms'] = np.percentile(device_array, 50) * 1000
+                stats[f'storage_{op}_device_p95_ms'] = np.percentile(device_array, 95) * 1000
+                stats[f'storage_{op}_device_p99_ms'] = np.percentile(device_array, 99) * 1000
+                stats[f'storage_{op}_device_p999_ms'] = np.percentile(device_array, 99.9) * 1000
+                stats[f'storage_{op}_device_p9999_ms'] = np.percentile(device_array, 99.99) * 1000
+            if
host_latencies: + host_array = np.array(host_latencies) + stats[f'storage_{op}_host_p50_ms'] = np.percentile(host_array, 50) * 1000 + stats[f'storage_{op}_host_p95_ms'] = np.percentile(host_array, 95) * 1000 + stats[f'storage_{op}_host_p99_ms'] = np.percentile(host_array, 99) * 1000 + stats[f'storage_{op}_host_p999_ms'] = np.percentile(host_array, 99.9) * 1000 + stats[f'storage_{op}_host_p9999_ms'] = np.percentile(host_array, 99.99) * 1000 + + return stats + + def reset_stats(self): + """Reset all performance counters (used after preconditioning).""" + with self.stats_lock: + for key, value in self.stats.items(): + if isinstance(value, list): + self.stats[key] = [] + elif isinstance(value, (int, float)): + self.stats[key] = 0 diff --git a/kv_cache_benchmark/kv_cache/cli.py b/kv_cache_benchmark/kv_cache/cli.py new file mode 100755 index 00000000..03864c3b --- /dev/null +++ b/kv_cache_benchmark/kv_cache/cli.py @@ -0,0 +1,406 @@ +""" +Command-line interface for KV Cache Benchmark. + +Contains validate_args(), main(), and export_results_to_xlsx(). +""" + +import os +import sys +import json +import random +import logging +import argparse +from datetime import datetime +from dataclasses import is_dataclass, asdict +from typing import Dict + +import numpy as np + +from kv_cache._compat import ( + TORCH_AVAILABLE, CUPY_AVAILABLE, PANDAS_AVAILABLE, OPENPYXL_AVAILABLE, +) +from kv_cache.config import ConfigLoader, set_config, cfg +from kv_cache.models import ( + MODEL_CONFIGS, ModelConfig, GenerationMode, QoSLevel, + QOS_PROFILES, get_qos_profiles, +) +from kv_cache.workload import validate_args +from kv_cache.benchmark import IntegratedBenchmark + +if TORCH_AVAILABLE: + import torch +if CUPY_AVAILABLE: + import cupy as cp +if PANDAS_AVAILABLE: + import pandas as pd + +logger = logging.getLogger(__name__) + + +def export_results_to_xlsx(results: Dict, args, output_path: str): + """ + Export benchmark results to an Excel file with run parameters embedded. 
+ Falls back to CSV if openpyxl is not available. + """ + if not PANDAS_AVAILABLE: + logger.warning("pandas not available, skipping XLSX export. Install with: pip install pandas") + return + + summary = results.get('summary', {}) + if not summary: + logger.warning("No summary data available for XLSX export") + return + + def get_nested(d, keys, default=None): + for key in keys: + if isinstance(d, dict): + d = d.get(key, default) + else: + return default + return d + + run_params = { + 'Timestamp': datetime.now().strftime('%Y-%m-%d %H:%M:%S'), + 'Model': args.model, + 'Num Users': args.num_users, + 'Duration (s)': args.duration, + 'GPU Memory (GB)': args.gpu_mem_gb, + 'CPU Memory (GB)': args.cpu_mem_gb, + 'Generation Mode': args.generation_mode, + 'Performance Profile': args.performance_profile, + 'Multi-turn': not args.disable_multi_turn, + 'Prefix Caching': not args.disable_prefix_caching, + 'RAG Enabled': args.enable_rag, + 'Autoscaling': args.enable_autoscaling, + 'Seed': args.seed, + 'Max Concurrent Allocs': args.max_concurrent_allocs, + 'Request Rate': args.request_rate, + 'Max Requests': args.max_requests, + 'Dataset Path': args.dataset_path or 'N/A', + 'Cache Dir': args.cache_dir or 'temp', + 'Storage Capacity (GB)': args.storage_capacity_gb, + 'Precondition': args.precondition, + 'Precondition Size (GB)': args.precondition_size_gb, + 'Precondition Threads': args.precondition_threads if args.precondition_threads > 0 else (os.cpu_count() or 4), + 'Trace Speedup': args.trace_speedup, + 'Replay Cycles': args.replay_cycles, + } + + metrics = { + 'Total Requests': summary.get('total_requests'), + 'Total Tokens': summary.get('total_tokens'), + 'Elapsed Time (s)': summary.get('elapsed_time'), + 'Avg Throughput (tok/s)': summary.get('avg_throughput_tokens_per_sec'), + 'Storage Throughput (tok/s)': summary.get('storage_throughput_tokens_per_sec'), + 'Requests/sec': summary.get('requests_per_second'), + + 'E2E Latency Mean (ms)': get_nested(summary, 
['end_to_end_latency_ms', 'mean']), + 'E2E Latency P50 (ms)': get_nested(summary, ['end_to_end_latency_ms', 'p50']), + 'E2E Latency P95 (ms)': get_nested(summary, ['end_to_end_latency_ms', 'p95']), + 'E2E Latency P99 (ms)': get_nested(summary, ['end_to_end_latency_ms', 'p99']), + 'E2E Latency P99.9 (ms)': get_nested(summary, ['end_to_end_latency_ms', 'p999']), + 'E2E Latency P99.99 (ms)': get_nested(summary, ['end_to_end_latency_ms', 'p9999']), + + 'Storage Latency Mean (ms)': get_nested(summary, ['storage_io_latency_ms', 'mean']), + 'Storage Latency P50 (ms)': get_nested(summary, ['storage_io_latency_ms', 'p50']), + 'Storage Latency P95 (ms)': get_nested(summary, ['storage_io_latency_ms', 'p95']), + 'Storage Latency P99 (ms)': get_nested(summary, ['storage_io_latency_ms', 'p99']), + 'Storage Latency P99.9 (ms)': get_nested(summary, ['storage_io_latency_ms', 'p999']), + 'Storage Latency P99.99 (ms)': get_nested(summary, ['storage_io_latency_ms', 'p9999']), + + 'Gen Latency Mean (ms)': get_nested(summary, ['generation_latency_ms', 'mean']), + 'Gen Latency P50 (ms)': get_nested(summary, ['generation_latency_ms', 'p50']), + 'Gen Latency P95 (ms)': get_nested(summary, ['generation_latency_ms', 'p95']), + 'Gen Latency P99 (ms)': get_nested(summary, ['generation_latency_ms', 'p99']), + + 'Storage Tier Read Total P50 (ms)': get_nested(summary, ['cache_stats', 'storage_read_p50_ms']), + 'Storage Tier Read Total P95 (ms)': get_nested(summary, ['cache_stats', 'storage_read_p95_ms']), + 'Storage Tier Read Total P99 (ms)': get_nested(summary, ['cache_stats', 'storage_read_p99_ms']), + 'Storage Tier Read Total P99.9 (ms)': get_nested(summary, ['cache_stats', 'storage_read_p999_ms']), + 'Storage Tier Read Total P99.99 (ms)': get_nested(summary, ['cache_stats', 'storage_read_p9999_ms']), + 'Storage Tier Write Total P50 (ms)': get_nested(summary, ['cache_stats', 'storage_write_p50_ms']), + 'Storage Tier Write Total P95 (ms)': get_nested(summary, ['cache_stats', 
'storage_write_p95_ms']), + 'Storage Tier Write Total P99 (ms)': get_nested(summary, ['cache_stats', 'storage_write_p99_ms']), + 'Storage Tier Write Total P99.9 (ms)': get_nested(summary, ['cache_stats', 'storage_write_p999_ms']), + 'Storage Tier Write Total P99.99 (ms)': get_nested(summary, ['cache_stats', 'storage_write_p9999_ms']), + + 'Storage Tier Read Device P50 (ms)': get_nested(summary, ['cache_stats', 'storage_read_device_p50_ms']), + 'Storage Tier Read Device P95 (ms)': get_nested(summary, ['cache_stats', 'storage_read_device_p95_ms']), + 'Storage Tier Read Device P99 (ms)': get_nested(summary, ['cache_stats', 'storage_read_device_p99_ms']), + 'Storage Tier Read Device P99.9 (ms)': get_nested(summary, ['cache_stats', 'storage_read_device_p999_ms']), + 'Storage Tier Read Device P99.99 (ms)': get_nested(summary, ['cache_stats', 'storage_read_device_p9999_ms']), + 'Storage Tier Write Device P50 (ms)': get_nested(summary, ['cache_stats', 'storage_write_device_p50_ms']), + 'Storage Tier Write Device P95 (ms)': get_nested(summary, ['cache_stats', 'storage_write_device_p95_ms']), + 'Storage Tier Write Device P99 (ms)': get_nested(summary, ['cache_stats', 'storage_write_device_p99_ms']), + 'Storage Tier Write Device P99.9 (ms)': get_nested(summary, ['cache_stats', 'storage_write_device_p999_ms']), + 'Storage Tier Write Device P99.99 (ms)': get_nested(summary, ['cache_stats', 'storage_write_device_p9999_ms']), + + 'Storage Tier Read Host P50 (ms)': get_nested(summary, ['cache_stats', 'storage_read_host_p50_ms']), + 'Storage Tier Read Host P95 (ms)': get_nested(summary, ['cache_stats', 'storage_read_host_p95_ms']), + 'Storage Tier Read Host P99 (ms)': get_nested(summary, ['cache_stats', 'storage_read_host_p99_ms']), + 'Storage Tier Read Host P99.9 (ms)': get_nested(summary, ['cache_stats', 'storage_read_host_p999_ms']), + 'Storage Tier Read Host P99.99 (ms)': get_nested(summary, ['cache_stats', 'storage_read_host_p9999_ms']), + 'Storage Tier Write Host P50 (ms)': 
get_nested(summary, ['cache_stats', 'storage_write_host_p50_ms']), + 'Storage Tier Write Host P95 (ms)': get_nested(summary, ['cache_stats', 'storage_write_host_p95_ms']), + 'Storage Tier Write Host P99 (ms)': get_nested(summary, ['cache_stats', 'storage_write_host_p99_ms']), + 'Storage Tier Write Host P99.9 (ms)': get_nested(summary, ['cache_stats', 'storage_write_host_p999_ms']), + 'Storage Tier Write Host P99.99 (ms)': get_nested(summary, ['cache_stats', 'storage_write_host_p9999_ms']), + + 'Cache Hit Rate': get_nested(summary, ['cache_stats', 'cache_hit_rate']), + 'Read/Write Ratio': get_nested(summary, ['cache_stats', 'read_write_ratio']), + 'Total Read (GB)': get_nested(summary, ['cache_stats', 'total_read_gb']), + 'Total Write (GB)': get_nested(summary, ['cache_stats', 'total_write_gb']), + + 'Tier GPU KV Bytes Written (GB)': get_nested(summary, ['cache_stats', 'tier_gpu_kv_bytes_written_gb']), + 'Tier CPU KV Bytes Written (GB)': get_nested(summary, ['cache_stats', 'tier_cpu_kv_bytes_written_gb']), + 'Tier Storage KV Bytes Written (GB)': get_nested(summary, ['cache_stats', 'tier_storage_kv_bytes_written_gb']), + + 'Tier GPU KV Bytes Read (GB)': get_nested(summary, ['cache_stats', 'tier_gpu_kv_bytes_read_gb']), + 'Tier CPU KV Bytes Read (GB)': get_nested(summary, ['cache_stats', 'tier_cpu_kv_bytes_read_gb']), + 'Tier Storage KV Bytes Read (GB)': get_nested(summary, ['cache_stats', 'tier_storage_kv_bytes_read_gb']), + + 'Tier GPU Read Bandwidth (GB/s)': get_nested(summary, ['cache_stats', 'tier_gpu_read_bandwidth_gbps']), + 'Tier GPU Write Bandwidth (GB/s)': get_nested(summary, ['cache_stats', 'tier_gpu_write_bandwidth_gbps']), + 'Tier CPU Read Bandwidth (GB/s)': get_nested(summary, ['cache_stats', 'tier_cpu_read_bandwidth_gbps']), + 'Tier CPU Write Bandwidth (GB/s)': get_nested(summary, ['cache_stats', 'tier_cpu_write_bandwidth_gbps']), + 'Tier Storage Read Bandwidth (GB/s)': get_nested(summary, ['cache_stats', 'tier_storage_read_bandwidth_gbps']), + 'Tier 
Storage Write Bandwidth (GB/s)': get_nested(summary, ['cache_stats', 'tier_storage_write_bandwidth_gbps']), + + 'GPU Entries': get_nested(summary, ['cache_stats', 'gpu_entries']), + 'CPU Entries': get_nested(summary, ['cache_stats', 'cpu_entries']), + 'Storage Entries': get_nested(summary, ['cache_stats', 'storage_entries']), + + 'Multi-turn Hit Rate': get_nested(summary, ['multi_turn_stats', 'hit_rate']), + } + + combined_row = {**run_params, **metrics} + + df = pd.DataFrame([combined_row]) + + use_excel = OPENPYXL_AVAILABLE and output_path.endswith('.xlsx') + + try: + if use_excel: + with pd.ExcelWriter(output_path, engine='openpyxl') as writer: + df.to_excel(writer, sheet_name='Summary', index=False) + + params_df = pd.DataFrame(list(run_params.items()), columns=['Parameter', 'Value']) + params_df.to_excel(writer, sheet_name='Run Parameters', index=False) + + metrics_df = pd.DataFrame(list(metrics.items()), columns=['Metric', 'Value']) + metrics_df.to_excel(writer, sheet_name='Performance Metrics', index=False) + + qos_metrics = summary.get('qos_metrics', {}) + if qos_metrics: + is_throughput = args.performance_profile == 'throughput' + qos_rows = [] + for level, data in qos_metrics.items(): + if isinstance(data, dict) and not data.get('no_data'): + qos_rows.append({ + 'QoS Level': level, + 'Total Requests': data.get('total_requests'), + 'Latency P95 (ms)': get_nested(data, ['latency_ms', 'p95']), + 'Latency P99 (ms)': get_nested(data, ['latency_ms', 'p99']), + 'SLA Met': 'N/A (throughput mode)' if is_throughput else get_nested(data, ['sla', 'met']), + 'SLA Compliance': 'N/A (throughput mode)' if is_throughput else get_nested(data, ['sla', 'compliance']), + }) + if qos_rows: + qos_df = pd.DataFrame(qos_rows) + qos_df.to_excel(writer, sheet_name='QoS Metrics', index=False) + + logger.info(f"XLSX results saved to {output_path}") + else: + csv_path = output_path.replace('.xlsx', '.csv') if output_path.endswith('.xlsx') else output_path + if not 
csv_path.endswith('.csv'): + csv_path += '.csv' + df.to_csv(csv_path, index=False) + logger.info(f"CSV results saved to {csv_path} (openpyxl not available for XLSX)") + + except Exception as e: + logger.error(f"Error saving XLSX/CSV: {e}") + try: + csv_path = output_path.replace('.xlsx', '.csv') + df.to_csv(csv_path, index=False) + logger.info(f"Fallback CSV saved to {csv_path}") + except Exception as e2: + logger.error(f"Failed to save results: {e2}") + + +def main(): + """Main entry point for running the benchmark from the command line.""" + parser = argparse.ArgumentParser(description="Integrated Multi-User KV Cache Benchmark") + parser.add_argument('--log-level', type=str, default='INFO', + choices=['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'], + help='Set the logging level (default: INFO)') + parser.add_argument('--model', type=str, default='llama3.1-8b', + help='The model configuration to use. Models are loaded from config.yaml.') + parser.add_argument('--num-users', type=int, default=100, + help='The number of concurrent users to simulate.') + parser.add_argument('--duration', type=int, default=60, + help='The duration of the benchmark in seconds.') + parser.add_argument('--gpu-mem-gb', type=float, default=16, + help='The amount of GPU memory (VRAM) to allocate for the cache in GB.') + parser.add_argument('--cpu-mem-gb', type=float, default=32, + help='The amount of CPU memory (RAM) to allocate for the cache in GB.') + parser.add_argument('--cache-dir', type=str, default=None, + help='The directory to use for the NVMe cache tier.') + parser.add_argument('--generation-mode', type=str, default='realistic', choices=[g.value for g in GenerationMode], + help='The token generation speed simulation mode.') + parser.add_argument('--performance-profile', type=str, default='latency', choices=['latency', 'throughput'], + help='The performance profile to use for pass/fail criteria.') + parser.add_argument('--disable-multi-turn', action='store_true', + help='Disable 
multi-turn conversation caching.') + parser.add_argument('--disable-prefix-caching', action='store_true', + help='Disable prefix caching.') + parser.add_argument('--enable-rag', action='store_true', + help='Enable the RAG workload simulation.') + parser.add_argument('--rag-num-docs', type=int, default=10, help='Number of RAG documents to ingest') + parser.add_argument('--enable-autoscaling', action='store_true', + help='Enable workload autoscaling.') + parser.add_argument('--autoscaler-mode', type=str, default='qos', choices=['qos', 'capacity'], + help='The autoscaling strategy.') + parser.add_argument('--target-saturation', type=float, default=0.8, help='Target storage saturation (0.0-1.0)') + parser.add_argument('--use-burst-trace', action='store_true', + help='Use BurstGPT trace for workload generation.') + parser.add_argument('--burst-trace-path', type=str, default='BurstGPT/data/BurstGPT_1.csv', + help='Path to the BurstGPT trace file.') + parser.add_argument('--validation-trace', type=str, default=None, + help='Path to a real-world trace file for validation.') + parser.add_argument('--dataset-path', type=str, default=None, + help='Path to ShareGPT dataset JSON file.') + parser.add_argument('--max-conversations', type=int, default=500, + help='Maximum number of conversations from ShareGPT dataset.') + parser.add_argument('--output', type=str, default=f"benchmark_results_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json", help='Output file for results') + parser.add_argument('--seed', type=int, default=None, + help='Seed for random number generators.') + parser.add_argument('--max-concurrent-allocs', type=int, default=0, + help='Limit concurrent allocations. 0 = unlimited.') + parser.add_argument('--request-rate', type=float, default=0, + help='Target request arrival rate (requests/sec). 
0 = unlimited.') + parser.add_argument('--max-requests', type=int, default=0, + help='Stop after completing N requests (0 = use duration instead).') + parser.add_argument('--xlsx-output', type=str, default=None, + help='Optional: Output Excel file path.') + parser.add_argument('--config', type=str, default=None, + help='Path to YAML configuration file.') + parser.add_argument('--storage-capacity-gb', type=float, default=0, + help='NVMe/storage tier capacity in GB. 0 = auto-detect.') + parser.add_argument('--precondition', action='store_true', + help='Enable SSD preconditioning phase before benchmark.') + parser.add_argument('--precondition-size-gb', type=float, default=0, + help='Preconditioning data volume in GB. 0 = 2x NVMe capacity.') + parser.add_argument('--precondition-threads', type=int, default=0, + help='Number of threads for preconditioning writes. 0 = os.cpu_count().') + parser.add_argument('--trace-speedup', type=float, default=1.0, + help='Speedup factor for BurstGPT trace replay timestamps.') + parser.add_argument('--replay-cycles', type=int, default=0, + help='Number of complete passes through the trace dataset. 
0 = infinite.') + parser.add_argument('--prefill-only', action='store_true', + help='Simulate disaggregated prefill node (write-heavy, no decode reads).') + parser.add_argument('--decode-only', action='store_true', + help='Simulate disaggregated decode node (read-heavy, assumes KV cache exists).') + + args = parser.parse_args() + + # Validate mutually exclusive flags + if args.prefill_only and args.decode_only: + parser.error("--prefill-only and --decode-only are mutually exclusive") + + logging.basicConfig( + level=getattr(logging, args.log_level), + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s', + datefmt='%Y-%m-%d %H:%M:%S' + ) + + args = validate_args(args) + + if args.config: + config = ConfigLoader(args.config) + set_config(config) + logger.info(f"Loaded configuration from {args.config}") + + # Refresh MODEL_CONFIGS and QOS_PROFILES with config values + import kv_cache.models as _models + _models.MODEL_CONFIGS = _models.get_model_configs() + _models.QOS_PROFILES = get_qos_profiles() + + # Re-import MODEL_CONFIGS after potential config reload + from kv_cache.models import MODEL_CONFIGS as CURRENT_MODEL_CONFIGS + + # Validate model choice + if args.model not in CURRENT_MODEL_CONFIGS: + available = ', '.join(sorted(CURRENT_MODEL_CONFIGS.keys())) + logger.error(f"Unknown model '{args.model}'. 
Available models: {available}") + sys.exit(1) + + if args.seed is not None: + logger.info(f"Using random seed: {args.seed}") + random.seed(args.seed) + np.random.seed(args.seed) + if TORCH_AVAILABLE: + torch.manual_seed(args.seed) + if CUPY_AVAILABLE: + cp.random.seed(args.seed) + + model_config = CURRENT_MODEL_CONFIGS[args.model] + gen_mode = GenerationMode(args.generation_mode) + + benchmark = IntegratedBenchmark( + model_config=model_config, + num_users=args.num_users, + gpu_memory_gb=args.gpu_mem_gb, + cpu_memory_gb=args.cpu_mem_gb, + duration_seconds=args.duration, + cache_dir=args.cache_dir, + enable_autoscaling=args.enable_autoscaling, + autoscaler_mode=args.autoscaler_mode, + target_saturation=args.target_saturation, + enable_multi_turn=not args.disable_multi_turn, + enable_prefix_caching=not args.disable_prefix_caching, + enable_rag=args.enable_rag, + rag_num_docs=args.rag_num_docs, + validation_trace=args.validation_trace, + generation_mode=gen_mode, + performance_profile=args.performance_profile, + use_burst_trace=args.use_burst_trace, + burst_trace_path=args.burst_trace_path, + dataset_path=args.dataset_path, + max_conversations=args.max_conversations, + seed=args.seed, + max_concurrent_allocs=args.max_concurrent_allocs, + request_rate=args.request_rate, + max_requests=args.max_requests, + storage_capacity_gb=args.storage_capacity_gb, + precondition=args.precondition, + precondition_size_gb=args.precondition_size_gb, + precondition_threads=args.precondition_threads, + trace_speedup=args.trace_speedup, + replay_cycles=args.replay_cycles, + prefill_only=args.prefill_only, + decode_only=args.decode_only + ) + + results = benchmark.run() + + def convert_numpy(obj): + if isinstance(obj, np.ndarray): + return obj.tolist() + if isinstance(obj, np.generic): + return obj.item() + if isinstance(obj, datetime): + return obj.isoformat() + if is_dataclass(obj): + return asdict(obj) + raise TypeError(f"Object of type {type(obj)} is not JSON serializable") + + with 
open(args.output, 'w') as f: + json.dump(results, f, indent=4, default=convert_numpy) + + logger.info(f"Results saved to {args.output}") + + if args.xlsx_output: + export_results_to_xlsx(results, args, args.xlsx_output) + + +if __name__ == "__main__": + main() diff --git a/kv_cache_benchmark/kv_cache/config.py b/kv_cache_benchmark/kv_cache/config.py new file mode 100755 index 00000000..24f6183e --- /dev/null +++ b/kv_cache_benchmark/kv_cache/config.py @@ -0,0 +1,225 @@ +""" +Configuration loader and global config accessors for KV Cache Benchmark. + +Provides YAML-based config loading with strict schema validation, +plus module-level cfg()/get_config()/set_config() accessors. +""" + +import logging +from pathlib import Path +from typing import Optional + +from kv_cache._compat import HAS_YAML + +if HAS_YAML: + import yaml + +logger = logging.getLogger(__name__) + + +class ConfigLoader: + """ + Loads and validates benchmark configuration from YAML files. + + Raises errors on invalid/unknown keys to prevent silent misconfigurations + in MLPerf competition submissions. 
+ """ + + # Define the valid configuration schema with expected types + VALID_SCHEMA = { + 'model_configs': ..., # Dynamic keys (model names) with nested model properties + 'user_templates': { + 'chatbot': {'context_range': list, 'generation_range': list, 'think_time_range': list}, + 'coding': {'context_range': list, 'generation_range': list, 'think_time_range': list}, + 'document': {'context_range': list, 'generation_range': list, 'think_time_range': list}, + }, + 'generation_timing': { + 'none': (int, float), + 'fast': (int, float), + 'realistic': (int, float), + }, + 'qos_profiles': { + 'interactive': {'target_latency_p95_ms': (int, float), 'target_latency_p99_ms': (int, float), + 'target_latency_p999_ms': (int, float), 'target_latency_p9999_ms': (int, float), 'priority': int}, + 'responsive': {'target_latency_p95_ms': (int, float), 'target_latency_p99_ms': (int, float), + 'target_latency_p999_ms': (int, float), 'target_latency_p9999_ms': (int, float), 'priority': int}, + 'batch': {'target_latency_p95_ms': (int, float), 'target_latency_p99_ms': (int, float), + 'target_latency_p999_ms': (int, float), 'target_latency_p9999_ms': (int, float), 'priority': int}, + }, + 'qos_distribution': { + 'interactive_probability': (int, float), + 'responsive_threshold': (int, float), + }, + 'eviction': { + 'max_recursion_depth': int, + 'target_usage_ratio': (int, float), + 'large_entry_limit_ratio': (int, float), + 'max_evictions_hard_cap': int, + 'max_evictions_min': int, + }, + 'gpu_backend': { + 'memory_fraction': (int, float), + 'max_eviction_attempts': int, + 'free_memory_threshold': (int, float), + }, + 'prefix_cache': { + 'min_prefix_length': int, + 'max_prefix_entries': int, + 'system_prompt_hit_probability': (int, float), + }, + 'rag': { + 'chunk_size_tokens': int, + 'top_k_chunks': int, + 'max_chunk_bytes': int, + 'request_probability': (int, float), + 'retrieval_distribution': str, + 'max_documents': int, + 'large_model_doc_tokens_min': int, + 
'large_model_doc_tokens_max': int, + 'small_model_doc_tokens_min': int, + 'small_model_doc_tokens_max': int, + }, + 'conversation': { + 'max_conversations': int, + 'max_turns_per_conv': int, + 'end_conversation_probability': (int, float), + }, + 'autoscaler': { + 'min_users': int, + 'max_users': int, + 'scale_up_factor': (int, float), + 'scale_down_factor': (int, float), + 'consecutive_samples_required': int, + }, + 'decode': { + 'batch_size': int, + }, + 'sharegpt': { + 'max_context_tokens': int, + 'max_generation_tokens': int, + 'chars_per_token_estimate': int, + }, + 'saturation_detection': { + 'read_latency_p95_threshold_ms': (int, float), + 'write_latency_p95_threshold_ms': (int, float), + 'queue_depth_threshold': int, + 'history_window_size': int, + }, + 'validation_limits': { + 'max_users': int, + 'max_duration_seconds': int, + 'max_gpu_memory_gb': int, + 'max_cpu_memory_gb': int, + }, + } + + def __init__(self, config_path: Optional[str] = None): + """ + Initialize the ConfigLoader. + + Args: + config_path: Path to YAML config file. If None, uses built-in defaults. + """ + self.config_path = config_path + self.config = {} + + if config_path: + self._load_and_validate(config_path) + + def _load_and_validate(self, config_path: str) -> None: + """Load YAML config and validate strictly against schema.""" + if not HAS_YAML: + raise RuntimeError("pyyaml is required for config file support. Install with: pip install pyyaml") + + path = Path(config_path) + if not path.exists(): + raise FileNotFoundError(f"Config file not found: {config_path}") + + with open(path, 'r') as f: + self.config = yaml.safe_load(f) or {} + + # Validate all keys against schema + self._validate_keys(self.config, self.VALID_SCHEMA, path_prefix='') + + logger.info(f"Loaded configuration from {config_path}") + + def _validate_keys(self, config: dict, schema: dict, path_prefix: str) -> None: + """Recursively validate config keys against schema. 
Raises on unknown keys.""" + for key, value in config.items(): + full_path = f"{path_prefix}.{key}" if path_prefix else key + + if key not in schema: + raise ValueError(f"Unknown configuration key: '{full_path}'. " + f"Valid keys at this level: {list(schema.keys())}") + + expected_type = schema[key] + + # Ellipsis (...) means "allow any structure" - skip validation + if expected_type is ...: + continue + + # If schema expects a dict, recurse + if isinstance(expected_type, dict): + if not isinstance(value, dict): + raise ValueError(f"Config key '{full_path}' must be a dict, got {type(value).__name__}") + self._validate_keys(value, expected_type, full_path) + else: + # Validate type + if isinstance(expected_type, tuple): + if not isinstance(value, expected_type): + raise ValueError(f"Config key '{full_path}' must be one of {expected_type}, " + f"got {type(value).__name__}") + elif not isinstance(value, expected_type): + raise ValueError(f"Config key '{full_path}' must be {expected_type.__name__}, " + f"got {type(value).__name__}") + + def get(self, *keys, default=None): + """ + Get a nested configuration value. + + Args: + *keys: Path to the config value (e.g., 'qos_profiles', 'interactive', 'priority') + default: Default value if key not found + + Returns: + The config value or default + """ + value = self.config + for key in keys: + if isinstance(value, dict) and key in value: + value = value[key] + else: + return default + return value + + +# Global config instance (set from main() when --config is provided) +_global_config: Optional[ConfigLoader] = None + + +def get_config() -> Optional[ConfigLoader]: + """Get the global configuration loader instance.""" + return _global_config + + +def set_config(config: ConfigLoader) -> None: + """Set the global configuration loader instance.""" + global _global_config + _global_config = config + + +def cfg(*keys, default=None): + """ + Get a configuration value from the global config, with fallback to default. 
+ + Args: + *keys: Path to the config value (e.g., 'qos_profiles', 'interactive', 'priority') + default: Default value if config not loaded or key not found + + Returns: + The config value or default + """ + config = get_config() + if config is None: + return default + return config.get(*keys, default=default) diff --git a/kv_cache_benchmark/kv_cache/conversation.py b/kv_cache_benchmark/kv_cache/conversation.py new file mode 100755 index 00000000..7cdab358 --- /dev/null +++ b/kv_cache_benchmark/kv_cache/conversation.py @@ -0,0 +1,121 @@ +""" +Stateful multi-turn conversation management for KV Cache Benchmark. + +Tracks conversation state and cache key history across turns, +enabling cache reuse in conversational AI workloads. +""" + +import time +import hashlib +import threading +from dataclasses import dataclass, field +from typing import Dict, List, Optional, Tuple +from datetime import datetime + +from kv_cache.config import cfg +from kv_cache.models import InferenceRequest + + +@dataclass +class ConversationState: + """Tracks the state of a single multi-turn conversation for a user.""" + conversation_id: str + user_id: str + turn_number: int + created_at: datetime + last_access: datetime + + # KV cache management for this conversation. + cache_keys: List[str] = field(default_factory=list) + cumulative_tokens: int = 0 + cache_locations: Dict[str, str] = field(default_factory=dict) + + # Metadata for advanced caching strategies. + system_prompt_key: Optional[str] = None + common_prefix_keys: List[str] = field(default_factory=list) + + # Performance tracking for this conversation. 
+ turns_completed: int = 0 + total_latency: float = 0.0 + cache_hits: int = 0 + cache_misses: int = 0 + + +class ConversationManager: + """Manages the lifecycle of all multi-turn conversations and enables cache reuse.""" + + def __init__(self, max_conversations: int = None, max_turns_per_conv: int = None): + self.conversations: Dict[str, ConversationState] = {} + self.max_conversations = max_conversations if max_conversations is not None else cfg('conversation', 'max_conversations', default=1000) + self.max_turns_per_conv = max_turns_per_conv if max_turns_per_conv is not None else cfg('conversation', 'max_turns_per_conv', default=50) + self.lock = threading.Lock() + + def start_conversation(self, user_id: str, system_prompt: Optional[str] = None) -> str: + """Initializes a new conversation for a given user.""" + conv_id = f"conv_{user_id}_{int(time.time()*1000)}" + + state = ConversationState( + conversation_id=conv_id, + user_id=user_id, + turn_number=0, + created_at=datetime.now(), + last_access=datetime.now(), + cache_keys=[], + cumulative_tokens=0, + cache_locations={} + ) + + if system_prompt: + state.system_prompt_key = f"system_prompt_{hashlib.sha256(system_prompt.encode()).hexdigest()[:16]}" + + with self.lock: + if len(self.conversations) >= self.max_conversations: + self._evict_oldest_conversation() + + self.conversations[conv_id] = state + + return conv_id + + def add_turn(self, conversation_id: str, user_message_tokens: int, + assistant_response_tokens: int) -> Tuple[int, str]: + """Adds a new turn to an existing conversation, updating its state.""" + with self.lock: + if conversation_id not in self.conversations: + raise ValueError(f"Conversation {conversation_id} not found") + + state = self.conversations[conversation_id] + state.turn_number += 1 + state.last_access = datetime.now() + + turn_cache_key = f"{conversation_id}_turn_{state.turn_number}" + + state.cache_keys.append(turn_cache_key) + state.cumulative_tokens += user_message_tokens + 
assistant_response_tokens + state.turns_completed += 1 + + return state.turn_number, turn_cache_key + + def get_conversation_context_size(self, conversation_id: str) -> int: + """Gets the total number of tokens accumulated in a conversation.""" + with self.lock: + if conversation_id not in self.conversations: + return 0 + return self.conversations[conversation_id].cumulative_tokens + + def get_all_previous_turn_keys(self, conversation_id: str, current_turn: int) -> List[str]: + """Retrieves all cache keys from previous turns in a conversation.""" + with self.lock: + if conversation_id not in self.conversations: + return [] + state = self.conversations[conversation_id] + return [key for key in state.cache_keys if key != f"{conversation_id}_turn_{current_turn}"] + + def _evict_oldest_conversation(self): + """Evicts the least recently used (LRU) conversation to make space.""" + if not self.conversations: + return + oldest_conv_id = min( + self.conversations, + key=lambda k: (self.conversations[k].last_access, self.conversations[k].created_at) + ) + del self.conversations[oldest_conv_id] diff --git a/kv_cache_benchmark/kv_cache/models.py b/kv_cache_benchmark/kv_cache/models.py new file mode 100755 index 00000000..0b32981c --- /dev/null +++ b/kv_cache_benchmark/kv_cache/models.py @@ -0,0 +1,273 @@ +""" +Core data models for KV Cache Benchmark. + +Defines enums, dataclasses, and model configurations used throughout +the benchmark: ModelConfig, InferencePhase, GenerationMode, QoSLevel, +QoSSLA, UserProfile, InferenceRequest, etc. 
+""" + +import time +import random +from dataclasses import dataclass, field +from typing import Dict, List, Optional, Set +from enum import Enum +from datetime import datetime + +from kv_cache.config import cfg + + +# ============================================================================ +# CORE DATA MODELS +# ============================================================================ + +@dataclass +class ModelConfig: + """ + Configuration for a model's KV cache requirements. + + This dataclass holds the architectural parameters of an LLM that are essential + for calculating the size of its KV cache. + """ + name: str + num_layers: int # Number of transformer layers in the model. + hidden_dim: int # The size of the main hidden state vector. + num_heads: int # Number of attention heads for queries (Q). + kv_heads: int # Number of attention heads for keys/values (K/V). For GQA, kv_heads < num_heads. + dtype: str = 'float16' # Data type used for cache tensors (e.g., float16, bfloat16). + _kv_dim_override: int = 0 # Optional override for kv_dim_per_head (e.g., DeepSeek MLA uses 56) + attention_type: str = 'mha' # 'mha', 'gqa', or 'mla' + kv_lora_rank: int = 0 # MLA: compressed KV latent dimension (d_c) + qk_rope_head_dim: int = 0 # MLA: decoupled RoPE key dimension (d_R^h) + + @property + def bytes_per_element(self) -> int: + """Returns the size in bytes of a single element based on the data type.""" + dtype_map = {'float32': 4, 'float16': 2, 'bfloat16': 2, 'int8': 1} + return dtype_map.get(self.dtype, 2) + + @property + def kv_dim_per_head(self) -> int: + """Calculates the dimension of each Key/Value attention head.""" + if self._kv_dim_override > 0: + return self._kv_dim_override + return self.hidden_dim // self.num_heads + + @property + def kv_cache_size_per_token(self) -> int: + """ + Calculates the total memory in bytes required to store the KV cache for a single token. 
+ + For MHA/GQA: num_layers * kv_heads * head_dim * 2 (K+V) * dtype_bytes + For MLA: num_layers * (kv_lora_rank + qk_rope_head_dim) * dtype_bytes + MLA jointly compresses K and V into a single latent vector (no ×2), + plus a shared RoPE key that must also be cached. + """ + if self.attention_type == 'mla': + return self.num_layers * (self.kv_lora_rank + self.qk_rope_head_dim) * self.bytes_per_element + return self.num_layers * self.kv_heads * self.kv_dim_per_head * 2 * self.bytes_per_element + + +_DEFAULT_MODEL_CONFIGS = { + 'tiny-1b': {'name': 'Tiny 1B', 'num_layers': 12, 'hidden_dim': 1024, 'num_heads': 8, 'kv_heads': 4, 'dtype': 'float16'}, + 'mistral-7b': {'name': 'Mistral 7B', 'num_layers': 32, 'hidden_dim': 4096, 'num_heads': 32, 'kv_heads': 8, 'dtype': 'float16'}, + 'llama2-7b': {'name': 'Llama 2 7B', 'num_layers': 32, 'hidden_dim': 4096, 'num_heads': 32, 'kv_heads': 32, 'dtype': 'float16'}, + 'llama3.1-8b': {'name': 'Llama 3.1 8B', 'num_layers': 32, 'hidden_dim': 4096, 'num_heads': 32, 'kv_heads': 8, 'dtype': 'float16'}, + 'llama3.1-70b-instruct': {'name': 'Llama 3.1 70B Instruct', 'num_layers': 80, 'hidden_dim': 8192, 'num_heads': 64, 'kv_heads': 8, 'dtype': 'float16'}, + 'deepseek-v3': {'name': 'DeepSeek V3', 'num_layers': 61, 'hidden_dim': 7168, 'num_heads': 128, 'kv_heads': 128, 'dtype': 'float16', + 'attention_type': 'mla', 'kv_lora_rank': 512, 'qk_rope_head_dim': 64}, + 'qwen3-32b': {'name': 'Qwen3 32B', 'num_layers': 64, 'hidden_dim': 5120, 'num_heads': 64, 'kv_heads': 8, 'kv_dim_per_head': 128, 'dtype': 'float16'}, + 'gpt-oss-120b': {'name': 'GPT OSS 120B (MoE)', 'num_layers': 36, 'hidden_dim': 2880, 'num_heads': 64, 'kv_heads': 8, 'kv_dim_per_head': 64, 'dtype': 'float16'}, + 'gpt-oss-20b': {'name': 'GPT OSS 20B (MoE)', 'num_layers': 24, 'hidden_dim': 2880, 'num_heads': 64, 'kv_heads': 8, 'kv_dim_per_head': 64, 'dtype': 'float16'}, +} + + +def get_model_configs() -> Dict[str, ModelConfig]: + """ + Returns model configurations, merging config.yaml 
values with defaults. + Models defined in YAML are added to/override the defaults. + """ + configs = {} + + # Get models from config.yaml (empty dict if not defined) + yaml_models = cfg('model_configs', default={}) + + # Merge: defaults + yaml (yaml overrides defaults) + all_model_keys = set(_DEFAULT_MODEL_CONFIGS.keys()) | set(yaml_models.keys()) + + for model_key in all_model_keys: + defaults = _DEFAULT_MODEL_CONFIGS.get(model_key, {}) + + configs[model_key] = ModelConfig( + name=cfg('model_configs', model_key, 'name', default=defaults.get('name', model_key)), + num_layers=cfg('model_configs', model_key, 'num_layers', default=defaults.get('num_layers', 32)), + hidden_dim=cfg('model_configs', model_key, 'hidden_dim', default=defaults.get('hidden_dim', 4096)), + num_heads=cfg('model_configs', model_key, 'num_heads', default=defaults.get('num_heads', 32)), + kv_heads=cfg('model_configs', model_key, 'kv_heads', default=defaults.get('kv_heads', 8)), + dtype=cfg('model_configs', model_key, 'dtype', default=defaults.get('dtype', 'float16')), + _kv_dim_override=cfg('model_configs', model_key, 'kv_dim_per_head', default=defaults.get('kv_dim_per_head', 0)), + attention_type=cfg('model_configs', model_key, 'attention_type', default=defaults.get('attention_type', 'mha')), + kv_lora_rank=cfg('model_configs', model_key, 'kv_lora_rank', default=defaults.get('kv_lora_rank', 0)), + qk_rope_head_dim=cfg('model_configs', model_key, 'qk_rope_head_dim', default=defaults.get('qk_rope_head_dim', 0)), + ) + + return configs + + +# For backward compatibility +MODEL_CONFIGS = get_model_configs() + + +# ============================================================================ +# PHASE-AWARE PROCESSING +# ============================================================================ + +class InferencePhase(Enum): + """Enumeration for the two main phases of LLM inference.""" + PREFILL = "prefill" + DECODE = "decode" + PREFILL_DECODE = "both" + + +class GenerationMode(Enum): + """Enumeration 
for token generation simulation modes.""" + NONE = "none" + FAST = "fast" + REALISTIC = "realistic" + +# Defines the sleep time per token to simulate GPU work for each mode. +GENERATION_TIMING = { + GenerationMode.NONE: 0.0, + GenerationMode.FAST: 0.002, + GenerationMode.REALISTIC: 0.030, +} + + +# ============================================================================ +# QOS SUPPORT +# ============================================================================ + +class QoSLevel(Enum): + """Enumeration for Quality of Service (QoS) levels, defining user priority.""" + INTERACTIVE = "interactive" + RESPONSIVE = "responsive" + BATCH = "batch" + + +@dataclass +class QoSSLA: + """ + Represents a Service Level Agreement (SLA) for a given QoS level. + Defines the performance targets and tracks violations. + """ + qos_level: QoSLevel + target_latency_p95_ms: float + target_latency_p99_ms: float + target_latency_p999_ms: float + target_latency_p9999_ms: float + priority: int + + # SLA violation tracking + violations: int = 0 + total_requests: int = 0 + + @property + def sla_compliance(self) -> float: + """Calculates the percentage of requests that met the SLA target.""" + if self.total_requests == 0: + return 1.0 + return 1.0 - (self.violations / self.total_requests) + + +# Default QoS profile values (overridden by config.yaml when loaded) +_DEFAULT_QOS_PROFILES = { + 'interactive': {'target_latency_p95_ms': 50, 'target_latency_p99_ms': 100, + 'target_latency_p999_ms': 150, 'target_latency_p9999_ms': 200, 'priority': 3}, + 'responsive': {'target_latency_p95_ms': 100, 'target_latency_p99_ms': 200, + 'target_latency_p999_ms': 350, 'target_latency_p9999_ms': 500, 'priority': 2}, + 'batch': {'target_latency_p95_ms': 1000, 'target_latency_p99_ms': 5000, + 'target_latency_p999_ms': 7500, 'target_latency_p9999_ms': 10000, 'priority': 1}, +} + + +def get_qos_profiles() -> Dict[QoSLevel, QoSSLA]: + """ + Returns QoS profiles, using config.yaml values if loaded, otherwise 
defaults. + """ + profiles = {} + for level in QoSLevel: + level_key = level.value + defaults = _DEFAULT_QOS_PROFILES[level_key] + + profiles[level] = QoSSLA( + qos_level=level, + target_latency_p95_ms=cfg('qos_profiles', level_key, 'target_latency_p95_ms', + default=defaults['target_latency_p95_ms']), + target_latency_p99_ms=cfg('qos_profiles', level_key, 'target_latency_p99_ms', + default=defaults['target_latency_p99_ms']), + target_latency_p999_ms=cfg('qos_profiles', level_key, 'target_latency_p999_ms', + default=defaults['target_latency_p999_ms']), + target_latency_p9999_ms=cfg('qos_profiles', level_key, 'target_latency_p9999_ms', + default=defaults['target_latency_p9999_ms']), + priority=cfg('qos_profiles', level_key, 'priority', default=defaults['priority']), + ) + return profiles + + +# For backward compatibility, QOS_PROFILES can still be used as a dict +# but code should prefer get_qos_profiles() to pick up config changes +QOS_PROFILES = get_qos_profiles() + + +# ============================================================================ +# USER AND REQUEST MODELS +# ============================================================================ + +@dataclass +class UserProfile: + """Represents a simulated user with specific behavior patterns.""" + user_id: str + context_length: int + generation_length: int + think_time: float + priority: int + qos_level: QoSLevel + session_start: datetime = field(default_factory=datetime.now) + total_latency: float = 0.0 + request_count: int = 0 + + +@dataclass +class InferenceRequest: + """Represents a single, atomic inference request sent to the benchmark.""" + user_id: str + request_id: str + timestamp: datetime + context_tokens: int + generate_tokens: int + priority: int + phase: InferencePhase = InferencePhase.PREFILL_DECODE + qos_level: QoSLevel = QoSLevel.BATCH + cache_key: Optional[str] = None + + # Timing fields to track latency at different stages. 
+ submit_time: float = field(default_factory=time.perf_counter) + start_time: float = 0 + complete_time: float = 0 + + # Conversation tracking for stateful workloads. + conversation_id: Optional[str] = None + turn_number: int = 0 + + def __post_init__(self): + if self.cache_key is None: + if self.conversation_id: + self.cache_key = f"{self.conversation_id}_turn_{self.turn_number}" + else: + self.cache_key = f"{self.user_id}_ctx" + + @property + def total_latency_ms(self) -> float: + """Calculates the total end-to-end latency for the request in milliseconds.""" + if self.complete_time == 0: + return 0 + return (self.complete_time - self.submit_time) * 1000 diff --git a/kv_cache_benchmark/kv_cache/monitoring.py b/kv_cache_benchmark/kv_cache/monitoring.py new file mode 100755 index 00000000..0bbf8240 --- /dev/null +++ b/kv_cache_benchmark/kv_cache/monitoring.py @@ -0,0 +1,329 @@ +""" +Monitoring, autoscaling, and QoS tracking for KV Cache Benchmark. + +Contains StorageMetrics, StorageMonitor, WorkloadAutoscaler, and QoSMonitor. 
+""" + +import time +import logging +import threading +from dataclasses import dataclass +from typing import Dict, List, Optional, Tuple + +import numpy as np + +from kv_cache.config import cfg +from kv_cache.models import QoSLevel, QoSSLA, QOS_PROFILES, InferenceRequest + +logger = logging.getLogger(__name__) + + +# ============================================================================ +# ADAPTIVE AUTOSCALING +# ============================================================================ + +@dataclass +class StorageMetrics: + """A snapshot of storage performance metrics at a point in time.""" + timestamp: float + read_throughput_gbps: float + write_throughput_gbps: float + read_iops: int + write_iops: int + read_latency_p95_ms: float + write_latency_p95_ms: float + queue_depth: int + is_saturated: bool = False + saturation_level: float = 0.0 + + +class StorageMonitor: + """Monitors storage performance in real-time to feed the autoscaler.""" + + def __init__(self, benchmark_instance, sampling_interval_ms: float = 100): + self.benchmark_instance = benchmark_instance + self.sampling_interval = sampling_interval_ms / 1000.0 + self.last_collection_time = None + self.last_total_read = 0 + self.last_total_write = 0 + self.metrics_history = [] + self.lock = threading.Lock() + + def collect_metrics(self, cache, queue_size): + """Collects all relevant performance metrics.""" + now = time.time() + if self.last_collection_time is None: + self.last_collection_time = now + self.last_total_read = cache.stats.get('total_read_bytes', 0) + self.last_total_write = cache.stats.get('total_write_bytes', 0) + return {} + + elapsed = now - self.last_collection_time + if elapsed == 0: + return {} + + stats = cache.get_stats(duration=self.benchmark_instance.duration) + current_total_read = stats.get('total_read_bytes', 0) + current_total_write = stats.get('total_write_bytes', 0) + + read_delta = max(current_total_read - self.last_total_read, 0) + write_delta = max(current_total_write 
- self.last_total_write, 0) + + read_throughput = (read_delta / 1024**3) / elapsed + write_throughput = (write_delta / 1024**3) / elapsed + + queue_depth = queue_size + + read_iops = int((read_delta / 4096) / elapsed) if elapsed > 0 else 0 + write_iops = int((write_delta / (16 * 1024)) / elapsed) if elapsed > 0 else 0 + + read_latency_p95_ms = stats.get('storage_read_p95_ms', 0.0) + write_latency_p95_ms = stats.get('storage_write_p95_ms', 0.0) + + # Saturation Detection Logic + read_lat_threshold = cfg('saturation_detection', 'read_latency_p95_threshold_ms', default=100) + write_lat_threshold = cfg('saturation_detection', 'write_latency_p95_threshold_ms', default=50) + queue_depth_threshold = cfg('saturation_detection', 'queue_depth_threshold', default=100) + + is_saturated = False + if len(self.metrics_history) >= 2: + prev_metric = self.metrics_history[-2] + if (prev_metric.read_latency_p95_ms < read_lat_threshold and + prev_metric.write_latency_p95_ms < write_lat_threshold and + prev_metric.queue_depth < queue_depth_threshold): + if (abs(prev_metric.read_latency_p95_ms - read_latency_p95_ms) > 20 or + abs(prev_metric.write_latency_p95_ms - write_latency_p95_ms) > 10 or + abs(prev_metric.queue_depth - queue_depth) > 10): + is_saturated = True + else: + if (read_latency_p95_ms > read_lat_threshold * 1.2 or + write_latency_p95_ms > write_lat_threshold * 1.2 or + queue_depth > queue_depth_threshold * 1.2): + is_saturated = True + + metrics = StorageMetrics( + timestamp=now, + read_throughput_gbps=read_throughput, + write_throughput_gbps=write_throughput, + read_iops=read_iops, + write_iops=write_iops, + read_latency_p95_ms=read_latency_p95_ms, + write_latency_p95_ms=write_latency_p95_ms, + queue_depth=queue_depth, + is_saturated=is_saturated + ) + + with self.lock: + self.metrics_history.append(metrics) + saturation_level = self._compute_saturation_from_history(self.metrics_history) + + metrics.saturation_level = saturation_level + + self.last_collection_time = now 
+ self.last_total_read = current_total_read + self.last_total_write = current_total_write + return metrics + + def get_saturation_level(self) -> float: + """Calculates the storage saturation level (0.0 = idle, 1.0 = saturated).""" + with self.lock: + history_snapshot = list(self.metrics_history) + + return self._compute_saturation_from_history(history_snapshot) + + def _compute_saturation_from_history(self, history: List[StorageMetrics]) -> float: + if len(history) < 10: + return 0.0 + + recent_metrics = history[-10:] + + latencies = [m.read_latency_p95_ms for m in recent_metrics] + if len(latencies) > 1: + latency_trend = np.polyfit(range(len(latencies)), latencies, 1)[0] + else: + latency_trend = 0 + + throughputs = [m.read_throughput_gbps + m.write_throughput_gbps for m in recent_metrics] + throughput_variance = np.std(throughputs) / (np.mean(throughputs) + 0.01) + + latency_factor = min(max(latencies) / 100, 1.0) + plateau_factor = 1.0 if throughput_variance < 0.1 and latency_trend > 0 else 0.5 + + saturation = latency_factor * plateau_factor + return min(saturation, 1.0) + + +class WorkloadAutoscaler: + """Automatically scales the number of simulated users to find a performance limit.""" + + def __init__(self, + mode: str = 'qos', + initial_users: int = 10, + target_saturation: float = 0.8, + scale_interval_seconds: int = 10): + self.mode = mode + self.current_users = initial_users + self.target_saturation = target_saturation + self.scale_interval = scale_interval_seconds + self.min_users = cfg('autoscaler', 'min_users', default=1) + self.max_users = cfg('autoscaler', 'max_users', default=10000) + self.scale_up_factor = cfg('autoscaler', 'scale_up_factor', default=1.2) + self.scale_down_factor = cfg('autoscaler', 'scale_down_factor', default=0.8) + self.consecutive_samples_required = cfg('autoscaler', 'consecutive_samples_required', default=2) + self.scaling_history = [] + self.lock = threading.Lock() + + self.cooldown_counter = 0 + self.cooldown_period = 3 + 
self.downward_trend_count = 0 + + self.capacity_stage = 0 + self.last_throughput = 0.0 + self.peak_throughput = 0.0 + self.peak_user_count = 0 + self.capacity_test_finished = False + self.throughput_history: List[float] = [] + self.capacity_initial_fraction = 0.4 + self.capacity_scale_fraction = 0.2 + self.capacity_min_step = 5 + self.capacity_max_step = 100 + + def calculate_scale_action( + self, + metrics: Optional[StorageMetrics], + current_throughput: float, + saturation_level: Optional[float] = None + ) -> Tuple[str, int]: + """Decides the next scaling action based on the selected mode.""" + if self.mode == 'qos': + if not metrics: return 'stable', self.current_users + return self._calculate_qos_action(metrics, saturation_level) + elif self.mode == 'capacity': + return self._calculate_capacity_action(current_throughput) + return 'stable', self.current_users + + def _calculate_qos_action(self, metrics: StorageMetrics, saturation_level: Optional[float]) -> Tuple[str, int]: + """Determines the scaling action for 'qos' mode.""" + with self.lock: + if self.cooldown_counter > 0: + self.cooldown_counter -= 1 + return 'hold', self.current_users + + saturation = saturation_level + if saturation is None: + saturation = 1.0 if metrics.is_saturated else 0.0 + + action = 'hold' + target_users = self.current_users + + if saturation > self.target_saturation * 1.1: + self.downward_trend_count += 1 + # Use the config-driven thresholds loaded in __init__ rather than hardcoded literals. + if self.downward_trend_count >= self.consecutive_samples_required: + target_users = max(int(self.current_users * self.scale_down_factor), self.min_users) + if target_users < self.current_users: + self.current_users = target_users + self.cooldown_counter = self.cooldown_period + action = 'scale_down' + elif saturation < self.target_saturation * 0.9: + self.downward_trend_count = 0 + target_users = min(int(self.current_users * self.scale_up_factor), self.max_users) + if target_users > self.current_users: + self.current_users = target_users + action = 'scale_up' + else: + self.downward_trend_count = 0 + + return action, self.current_users + return 
'hold', self.current_users + + def _calculate_capacity_action(self, current_throughput: float) -> Tuple[str, int]: + """Determines the scaling action for 'capacity' mode.""" + with self.lock: + self.throughput_history.append(current_throughput) + + if not self.throughput_history or len(self.throughput_history) == 1: + self.peak_throughput = current_throughput + self.peak_user_count = self.current_users + step = self._compute_capacity_step(self.capacity_initial_fraction) + new_users = min(self.current_users + step, self.max_users) + if new_users > self.current_users: + self.current_users = new_users + return 'scale_up', self.current_users + return 'hold', self.current_users + + if current_throughput > self.peak_throughput * 1.01: + self.peak_throughput = current_throughput + self.peak_user_count = self.current_users + self.downward_trend_count = 0 + step = self._compute_capacity_step(self.capacity_scale_fraction) + new_users = min(self.current_users + step, self.max_users) + if new_users > self.current_users: + self.current_users = new_users + return 'scale_up', self.current_users + return 'hold', self.current_users + + self.downward_trend_count += 1 + if self.downward_trend_count >= 2: + self.capacity_test_finished = True + logger.info(f"Peak capacity found at {self.peak_throughput:.2f} tok/s. 
Stopping test.")
+                return 'stop', self.current_users
+
+            return 'hold', self.current_users
+
+    def _compute_capacity_step(self, fraction: float) -> int:
+        """Calculate a bounded capacity-mode step for smoother scaling."""
+        raw_step = max(int(self.current_users * fraction), self.capacity_min_step)
+        return min(raw_step, self.capacity_max_step)
+
+
+# ============================================================================
+# QOS MONITORING
+# ============================================================================
+
+class QoSMonitor:
+    """Monitors and reports on QoS compliance in real-time."""
+
+    def __init__(self):
+        self.requests_by_qos: Dict[QoSLevel, List[InferenceRequest]] = {level: [] for level in QoSLevel}
+        self.lock = threading.Lock()
+        self.violations_by_qos: Dict[QoSLevel, int] = {level: 0 for level in QoSLevel}
+
+    def record_request(self, request: InferenceRequest):
+        """Records a completed request and checks if it violated its SLA."""
+        with self.lock:
+            self.requests_by_qos[request.qos_level].append(request)
+
+            sla = QOS_PROFILES[request.qos_level]
+            if request.total_latency_ms > sla.target_latency_p95_ms:
+                self.violations_by_qos[request.qos_level] += 1
+                sla.violations += 1
+            sla.total_requests += 1
+
+    def get_qos_metrics(self, qos_level: QoSLevel) -> Dict:
+        """Gets performance metrics for a specific QoS level."""
+        with self.lock:
+            requests = self.requests_by_qos[qos_level]
+            if not requests: return {'no_data': True}
+
+            latencies = [r.total_latency_ms for r in requests]
+            sla = QOS_PROFILES[qos_level]
+
+            return {
+                'total_requests': len(requests),
+                'latency_ms': {
+                    'mean': np.mean(latencies), 'p50': np.percentile(latencies, 50),
+                    'p95': np.percentile(latencies, 95), 'p99': np.percentile(latencies, 99),
+                    'max': np.max(latencies),
+                },
+                'sla': {
+                    'target_p95_ms': sla.target_latency_p95_ms,
+                    'actual_p95_ms': np.percentile(latencies, 95),
+                    'compliance': sla.sla_compliance,
+                    'met': 
sla.sla_compliance >= 0.95 + } + } + + def get_all_qos_metrics(self) -> Dict: + """Gets metrics for all QoS levels.""" + return {level.value: self.get_qos_metrics(level) for level in QoSLevel} diff --git a/kv_cache_benchmark/kv_cache/prefix_cache.py b/kv_cache_benchmark/kv_cache/prefix_cache.py new file mode 100755 index 00000000..24a2792a --- /dev/null +++ b/kv_cache_benchmark/kv_cache/prefix_cache.py @@ -0,0 +1,133 @@ +""" +Hierarchical prefix caching for KV Cache Benchmark. + +Models the reuse of common prompts (e.g., system prompts) across +users to reduce redundant cache allocations. +""" + +import hashlib +import random +import threading +from dataclasses import dataclass, field +from typing import Dict, Optional, Set, Tuple +from datetime import datetime +from enum import Enum + +from kv_cache.config import cfg +from kv_cache.models import ModelConfig, InferenceRequest + + +class PrefixType(Enum): + """Enumeration for the different tiers of prefix caching.""" + SYSTEM_PROMPT = "system_prompt" + COMMON_PHRASE = "common_phrase" + USER_SPECIFIC = "user_specific" + + +@dataclass +class PrefixCacheEntry: + """Represents a cached prefix.""" + prefix_key: str + prefix_type: PrefixType + text_hash: str + token_count: int + kv_cache_key: str + + # Usage statistics to track popularity and reuse. + use_count: int = 0 + first_seen: datetime = field(default_factory=datetime.now) + last_used: datetime = field(default_factory=datetime.now) + users_using: Set[str] = field(default_factory=set) + + # Storage information. 
+ storage_tier: str = "" + size_bytes: int = 0 + + +class PrefixMatcher: + """Detects and matches common prefixes in requests to enable reuse.""" + + COMMON_SYSTEM_PROMPTS = [ + "You are a helpful assistant.", + "You are an AI assistant helping with coding tasks.", + "You are a professional writing assistant.", + ] + + def __init__(self, min_prefix_length: int = None): + self.min_prefix_length = min_prefix_length if min_prefix_length is not None else cfg('prefix_cache', 'min_prefix_length', default=50) + self.prefix_index: Dict[str, PrefixCacheEntry] = {} + self.prefix_frequency: Dict[str, int] = {} + self.lock = threading.Lock() + + def hash_prefix(self, text: str, token_count: int) -> str: + """Creates a deterministic hash for a given text prefix.""" + content = f"{text[:500]}_{token_count}" + return hashlib.sha256(content.encode()).hexdigest()[:16] + + def detect_system_prompt(self, context_tokens: int) -> Optional[PrefixCacheEntry]: + """Simulates the detection of a common system prompt at the start of a request.""" + system_prompt_hit_probability = cfg('prefix_cache', 'system_prompt_hit_probability', default=0.2) + if random.random() < system_prompt_hit_probability: + system_prompt = random.choice(self.COMMON_SYSTEM_PROMPTS) + prefix_hash = self.hash_prefix(system_prompt, len(system_prompt.split())) + + with self.lock: + if prefix_hash in self.prefix_index: + entry = self.prefix_index[prefix_hash] + entry.use_count += 1 + entry.last_used = datetime.now() + return entry + else: + entry = PrefixCacheEntry( + prefix_key=f"system_{prefix_hash}", + prefix_type=PrefixType.SYSTEM_PROMPT, + text_hash=prefix_hash, + token_count=len(system_prompt.split()), + kv_cache_key=f"kv_system_{prefix_hash}", + use_count=1 + ) + self.prefix_index[prefix_hash] = entry + return entry + return None + + +class PrefixCacheManager: + """Orchestrates the prefix matching and caching logic.""" + + def __init__(self, cache, max_prefix_entries: int = None): + self.cache = cache + 
self.max_prefix_entries = max_prefix_entries if max_prefix_entries is not None else cfg('prefix_cache', 'max_prefix_entries', default=1000) + self.prefix_matcher = PrefixMatcher() + self.lock = threading.Lock() + + self.stats = { + 'prefix_hits': 0, + 'prefix_misses': 0, + 'system_prompt_reuse': 0, + 'common_phrase_reuse': 0, + 'bytes_saved': 0 + } + + def check_prefix_cache(self, request: InferenceRequest, model_config: ModelConfig) -> Tuple[Optional[PrefixCacheEntry], int]: + """ + Checks if the beginning of a request matches a known, cached prefix. + + Returns: + A tuple containing the PrefixCacheEntry if a hit occurs (or None), + and the number of remaining (non-prefixed) tokens in the request. + """ + prefix_entry = self.prefix_matcher.detect_system_prompt(request.context_tokens) + + if prefix_entry: + with self.lock: + self.stats['prefix_hits'] += 1 + if prefix_entry.prefix_type == PrefixType.SYSTEM_PROMPT: + self.stats['system_prompt_reuse'] += 1 + self.stats['bytes_saved'] += prefix_entry.token_count * model_config.kv_cache_size_per_token + + remaining_tokens = max(0, request.context_tokens - prefix_entry.token_count) + return prefix_entry, remaining_tokens + else: + with self.lock: + self.stats['prefix_misses'] += 1 + return None, request.context_tokens diff --git a/kv_cache_benchmark/kv_cache/rag.py b/kv_cache_benchmark/kv_cache/rag.py new file mode 100755 index 00000000..2deb9d99 --- /dev/null +++ b/kv_cache_benchmark/kv_cache/rag.py @@ -0,0 +1,246 @@ +""" +RAG (Retrieval-Augmented Generation) workload modeling for KV Cache Benchmark. + +Simulates document ingestion, chunking, and retrieval patterns that +stress the cache with large context sizes and unique I/O patterns. 
+""" + +import random +import logging +import threading +from dataclasses import dataclass, field +from typing import Dict, List, Optional, Tuple +from datetime import datetime + +import numpy as np + +from kv_cache.config import cfg +from kv_cache.models import ModelConfig, InferenceRequest + +logger = logging.getLogger(__name__) + + +@dataclass +class RAGChunk: + """Represents a single chunk of a document in a RAG system.""" + chunk_id: str + doc_id: str + chunk_index: int + token_count: int + kv_cache_key: str + + access_count: int = 0 + last_accessed: datetime = field(default_factory=datetime.now) + storage_tier: str = "" + size_bytes: int = 0 + + +@dataclass +class RAGDocument: + """Represents a document that has been chunked for RAG.""" + doc_id: str + total_tokens: int + chunk_size: int + chunks: List[RAGChunk] = field(default_factory=list) + + @property + def num_chunks(self) -> int: + return len(self.chunks) + + +@dataclass +class RAGQuery: + """Represents a RAG query that retrieves document chunks.""" + query_id: str + query_tokens: int + retrieved_chunks: List[RAGChunk] + generation_tokens: int + + @property + def total_context_tokens(self) -> int: + """The total context is the user's query plus all retrieved document chunks.""" + return self.query_tokens + sum(c.token_count for c in self.retrieved_chunks) + + +class RAGDocumentManager: + """Manages the ingestion and retrieval of RAG document chunks.""" + + # Supported retrieval distributions + DISTRIBUTIONS = ('zipfian', 'uniform', 'random') + + def __init__(self, cache, chunk_size: int = None, top_k_chunks: int = None): + self.cache = cache + self.chunk_size = chunk_size if chunk_size is not None else cfg('rag', 'chunk_size_tokens', default=512) + self.top_k_chunks = top_k_chunks if top_k_chunks is not None else cfg('rag', 'top_k_chunks', default=5) + self.max_documents = cfg('rag', 'max_documents', default=0) # 0 = unlimited + self.retrieval_distribution = cfg('rag', 'retrieval_distribution', 
default='zipfian') + if self.retrieval_distribution not in self.DISTRIBUTIONS: + logger.warning(f"Unknown retrieval distribution '{self.retrieval_distribution}', defaulting to 'zipfian'") + self.retrieval_distribution = 'zipfian' + self.documents: Dict[str, RAGDocument] = {} + self.chunk_index: Dict[str, RAGChunk] = {} + self.lock = threading.Lock() + self.ingestion_order: List[str] = [] # Track order for LRU eviction + + # Statistics + self.stats = { + 'documents_ingested': 0, + 'documents_evicted': 0, + 'chunks_created': 0, + 'retrieval_requests': 0, + 'chunks_retrieved': 0, + } + + def ingest_document(self, doc_id: str, total_tokens: int, model_config: ModelConfig): + """ + Simulates the ingestion of a document. + Splits it into chunks and stores the KV cache for each chunk. + """ + max_chunk_bytes = cfg('rag', 'max_chunk_bytes', default=256 * 1024**2) + bytes_per_token = max(model_config.kv_cache_size_per_token, 1) + max_tokens_per_chunk = max(1, min(self.chunk_size, max_chunk_bytes // bytes_per_token)) + + if max_tokens_per_chunk < self.chunk_size: + logger.debug(f"Adjusting chunk size for {doc_id} to {max_tokens_per_chunk} tokens " + f"to stay under {max_chunk_bytes / 1024**2:.0f} MB per chunk.") + + num_chunks = (total_tokens + max_tokens_per_chunk - 1) // max_tokens_per_chunk + + doc = RAGDocument( + doc_id=doc_id, + total_tokens=total_tokens, + chunk_size=max_tokens_per_chunk, + chunks=[] + ) + + for chunk_idx in range(num_chunks): + remaining_tokens = total_tokens - chunk_idx * max_tokens_per_chunk + chunk_tokens = min(max_tokens_per_chunk, remaining_tokens) + + chunk = RAGChunk( + chunk_id=f"{doc_id}_chunk_{chunk_idx}", + doc_id=doc_id, + chunk_index=chunk_idx, + token_count=chunk_tokens, + kv_cache_key=f"rag_{doc_id}_chunk_{chunk_idx}" + ) + + try: + success, location, write_latency = self.cache.allocate_cache( + key=chunk.kv_cache_key, + num_tokens=chunk_tokens + ) + except MemoryError: + logger.error(f"MemoryError while ingesting chunk 
{chunk.chunk_id}; skipping remaining chunks.") + break + except Exception as exc: + logger.error(f"Error ingesting chunk {chunk.chunk_id}: {exc}") + continue + + if not success: + logger.warning(f"Failed to allocate cache for chunk {chunk.chunk_id}.") + continue + + chunk.storage_tier = location + chunk.size_bytes = chunk_tokens * model_config.kv_cache_size_per_token + + doc.chunks.append(chunk) + self.chunk_index[chunk.chunk_id] = chunk + + with self.lock: + # Evict oldest documents if we've hit the limit + if self.max_documents > 0: + while len(self.documents) >= self.max_documents: + self._evict_oldest_document_unlocked() + + self.documents[doc_id] = doc + self.ingestion_order.append(doc_id) + self.stats['documents_ingested'] += 1 + self.stats['chunks_created'] += len(doc.chunks) + return doc + + def _evict_oldest_document_unlocked(self): + """Evict the oldest document to free cache space. Must be called with lock held.""" + if not self.ingestion_order: + return + + oldest_doc_id = self.ingestion_order.pop(0) + if oldest_doc_id not in self.documents: + return + + doc = self.documents[oldest_doc_id] + for chunk in doc.chunks: + try: + self.cache.delete(chunk.kv_cache_key) + except Exception as e: + logger.debug(f"Could not delete cache for chunk {chunk.chunk_id}: {e}") + if chunk.chunk_id in self.chunk_index: + del self.chunk_index[chunk.chunk_id] + + del self.documents[oldest_doc_id] + self.stats['documents_evicted'] += 1 + logger.debug(f"Evicted RAG document {oldest_doc_id} ({doc.num_chunks} chunks)") + + def evict_oldest_document(self): + """Evict the oldest document to free cache space (thread-safe).""" + with self.lock: + self._evict_oldest_document_unlocked() + + def _compute_chunk_probabilities(self, num_chunks: int) -> Optional[List[float]]: + """ + Compute selection probabilities based on configured distribution. + + Returns: + List of probabilities, or None for uniform random selection. 
+ """ + if self.retrieval_distribution in ('uniform', 'random'): + # Uniform: all chunks equally likely (None tells np.random.choice to use uniform) + return None + elif self.retrieval_distribution == 'zipfian': + # Zipfian: earlier chunks are more likely (1/1, 1/2, 1/3, ...) + # This models real RAG where document intros/summaries are often most relevant + probs = [1.0 / (i + 1) for i in range(num_chunks)] + total = sum(probs) + return [p / total for p in probs] + else: + # Fallback to uniform + return None + + def retrieve_chunks(self, doc_id: str) -> List[RAGChunk]: + """ + Simulates the retrieval of the top-k most relevant chunks for a query. + + The chunk selection distribution is configurable via 'rag.retrieval_distribution': + - 'zipfian': Earlier chunks more likely (realistic) + - 'uniform'/'random': All chunks equally likely + """ + with self.lock: + if doc_id not in self.documents: + return [] + doc = self.documents[doc_id] + self.stats['retrieval_requests'] += 1 + + chunk_probabilities = self._compute_chunk_probabilities(len(doc.chunks)) + + retrieved_indices = np.random.choice( + len(doc.chunks), + size=min(self.top_k_chunks, len(doc.chunks)), + replace=False, + p=chunk_probabilities + ) + + retrieved_chunks = [doc.chunks[i] for i in retrieved_indices] + + for chunk in retrieved_chunks: + chunk.access_count += 1 + chunk.last_accessed = datetime.now() + + with self.lock: + self.stats['chunks_retrieved'] += len(retrieved_chunks) + + return retrieved_chunks + + def get_stats(self) -> Dict: + """Returns a copy of the current statistics.""" + with self.lock: + return dict(self.stats) \ No newline at end of file diff --git a/kv_cache_benchmark/kv_cache/workload.py b/kv_cache_benchmark/kv_cache/workload.py new file mode 100755 index 00000000..d1538998 --- /dev/null +++ b/kv_cache_benchmark/kv_cache/workload.py @@ -0,0 +1,426 @@ +""" +Workload generation and validation for KV Cache Benchmark. 
+ +Contains ValidationEngine, UserSimulator, ShareGPTDatasetLoader, +and RealTraceEntry for trace-driven validation. +""" + +import os +import json +import random +import logging +import argparse +from dataclasses import dataclass +from pathlib import Path +from typing import Dict, List, Optional, Tuple + +import numpy as np + +from kv_cache._compat import TIKTOKEN_AVAILABLE +from kv_cache.config import cfg +from kv_cache.models import ( + QoSLevel, UserProfile, InferenceRequest, +) + +if TIKTOKEN_AVAILABLE: + import tiktoken + +logger = logging.getLogger(__name__) + + +# ============================================================================ +# TRACE-DRIVEN VALIDATION +# ============================================================================ + +@dataclass +class RealTraceEntry: + """Represents a single entry from a real-world LLM inference trace file.""" + timestamp: float + request_id: str + user_id: str + context_tokens: int + generation_tokens: int + phase: str + cache_hit: bool + cache_tier: str + read_bytes: int + write_bytes: int + read_latency_ms: float + write_latency_ms: float + model_name: str + conversation_id: Optional[str] = None + turn_number: Optional[int] = None + prefix_cached: bool = False + + +class ValidationEngine: + """Validates benchmark accuracy against real-world traces.""" + + def __init__(self, trace_path: Optional[str] = None): + self.trace_path = trace_path + self.trace_stats = None + + def load_trace(self) -> Dict: + """Loads and analyzes a trace file, or returns synthetic stats if none provided.""" + if not self.trace_path or not os.path.exists(self.trace_path): + return { + 'total_requests': 1000, 'duration_seconds': 100, 'cache_hit_rate': 0.65, + 'read_write_ratio': 10.0, 'context_tokens_mean': 1024, 'generation_tokens_mean': 200, + } + + with open(self.trace_path, 'r') as f: + data = json.load(f) + entries = [RealTraceEntry(**entry) for entry in data] + + self.trace_stats = { + 'total_requests': len(entries), + 
'cache_hit_rate': sum(1 for e in entries if e.cache_hit) / max(len(entries), 1),
+            'read_write_ratio': sum(e.read_bytes for e in entries) / max(sum(e.write_bytes for e in entries), 1),
+            'context_tokens_mean': np.mean([e.context_tokens for e in entries]),
+            'generation_tokens_mean': np.mean([e.generation_tokens for e in entries]),
+        }
+        return self.trace_stats
+
+    def validate_benchmark(self, benchmark_results: Dict) -> Dict:
+        """Compares key benchmark results against the trace to calculate an error percentage."""
+        if self.trace_stats is None:
+            self.trace_stats = self.load_trace()
+
+        summary = benchmark_results.get('summary', {})
+        cache_stats = summary.get('cache_stats', {})
+        comparison = {}
+
+        bench_hit_rate = cache_stats.get('cache_hit_rate', 0)
+        trace_hit_rate = self.trace_stats['cache_hit_rate']
+        # Guard against a zero-hit trace to avoid a ZeroDivisionError.
+        hit_rate_error = abs(bench_hit_rate - trace_hit_rate) / max(trace_hit_rate, 1e-9) * 100
+
+        comparison['cache_hit_rate'] = {
+            'benchmark': bench_hit_rate, 'trace': trace_hit_rate,
+            'error_pct': hit_rate_error, 'within_5pct': hit_rate_error <= 5.0
+        }
+
+        errors = [comp['error_pct'] for comp in comparison.values() if 'error_pct' in comp]
+        avg_error = np.mean(errors) if errors else 0
+        passed = avg_error <= 5.0
+
+        return {
+            'passed': passed, 'avg_error_pct': avg_error,
+            'comparison': comparison, 'trace_stats': self.trace_stats
+        }
+
+
+# ============================================================================
+# INPUT VALIDATION
+# ============================================================================
+
+# Validation constants with documented rationale
+MAX_USERS = 100000
+MAX_DURATION_SECONDS = 86400
+MAX_GPU_MEMORY_GB = 1024
+MAX_CPU_MEMORY_GB = 16384
+
+FORBIDDEN_CACHE_PREFIXES = frozenset([
+    '/etc', '/bin', '/sbin', '/usr/bin', '/usr/sbin',
+    '/boot', '/sys', '/proc', '/dev', '/root'
+])
+
+
+def validate_args(args: argparse.Namespace) -> argparse.Namespace:
+    """
+    Validate command-line arguments to catch invalid values early. 
+ + Args: + args: Parsed argparse namespace + + Returns: + The validated args namespace + + Raises: + ValueError: If any validation check fails + """ + errors = [] + + if args.num_users <= 0: + errors.append(f"--num-users must be positive, got {args.num_users}") + if args.num_users > MAX_USERS: + errors.append(f"--num-users exceeds limit ({MAX_USERS}), got {args.num_users}") + + if args.duration <= 0: + errors.append(f"--duration must be positive, got {args.duration}") + if args.duration > MAX_DURATION_SECONDS: + errors.append(f"--duration exceeds 24 hours ({MAX_DURATION_SECONDS}s), got {args.duration}") + + if args.gpu_mem_gb < 0: + errors.append(f"--gpu-mem-gb cannot be negative, got {args.gpu_mem_gb}") + if args.gpu_mem_gb > MAX_GPU_MEMORY_GB: + errors.append(f"--gpu-mem-gb exceeds limit ({MAX_GPU_MEMORY_GB}GB), got {args.gpu_mem_gb}") + + if args.cpu_mem_gb < 0: + errors.append(f"--cpu-mem-gb cannot be negative, got {args.cpu_mem_gb}") + if args.cpu_mem_gb > MAX_CPU_MEMORY_GB: + errors.append(f"--cpu-mem-gb exceeds limit ({MAX_CPU_MEMORY_GB}GB), got {args.cpu_mem_gb}") + + if args.rag_num_docs < 0: + errors.append(f"--rag-num-docs cannot be negative, got {args.rag_num_docs}") + + if args.max_conversations <= 0: + errors.append(f"--max-conversations must be positive, got {args.max_conversations}") + + if args.max_concurrent_allocs < 0: + errors.append(f"--max-concurrent-allocs cannot be negative, got {args.max_concurrent_allocs}") + + if args.request_rate < 0: + errors.append(f"--request-rate cannot be negative, got {args.request_rate}") + + if args.max_requests < 0: + errors.append(f"--max-requests cannot be negative, got {args.max_requests}") + + if args.storage_capacity_gb < 0: + errors.append(f"--storage-capacity-gb cannot be negative, got {args.storage_capacity_gb}") + + if args.precondition_size_gb < 0: + errors.append(f"--precondition-size-gb cannot be negative, got {args.precondition_size_gb}") + + if args.precondition_threads < 0: + 
errors.append(f"--precondition-threads cannot be negative, got {args.precondition_threads}") + + if args.trace_speedup < 0: + errors.append(f"--trace-speedup cannot be negative, got {args.trace_speedup}") + + if args.replay_cycles < 0: + errors.append(f"--replay-cycles cannot be negative, got {args.replay_cycles}") + + if not (0.0 <= args.target_saturation <= 1.0): + errors.append(f"--target-saturation must be between 0.0 and 1.0, got {args.target_saturation}") + + if args.cache_dir: + cache_path = Path(args.cache_dir).resolve() + cache_path_str = str(cache_path) + + for prefix in FORBIDDEN_CACHE_PREFIXES: + if cache_path_str.startswith(prefix): + errors.append(f"--cache-dir cannot be a system directory: {cache_path}") + break + + parent = cache_path.parent + if parent.exists() and not os.access(parent, os.W_OK): + errors.append(f"--cache-dir parent is not writable: {parent}") + + if errors: + for error in errors: + logger.error(f"Validation error: {error}") + raise ValueError(f"Invalid arguments:\n " + "\n ".join(errors)) + + return args + + +# ============================================================================ +# USER SIMULATION AND WORKLOAD GENERATION +# ============================================================================ + +class UserSimulator: + """Generates realistic user workloads based on pre-defined templates.""" + + DEFAULT_USER_TEMPLATES = { + 'chatbot': { + 'context_range': (512, 4096), 'generation_range': (50, 200), 'think_time_range': (0.1, 0.5), + }, + 'coding': { + 'context_range': (4096, 25000), 'generation_range': (100, 500), 'think_time_range': (0.2, 1.0), + }, + 'document': { + 'context_range': (4096, 16384), 'generation_range': (200, 800), 'think_time_range': (0.3, 1.5), + }, + } + + @classmethod + def _get_user_templates(cls) -> Dict: + """Get user templates from config, falling back to defaults.""" + templates = {} + for user_type in ['chatbot', 'coding', 'document']: + default = cls.DEFAULT_USER_TEMPLATES[user_type] + 
templates[user_type] = { + 'context_range': tuple(cfg('user_templates', user_type, 'context_range', default=list(default['context_range']))), + 'generation_range': tuple(cfg('user_templates', user_type, 'generation_range', default=list(default['generation_range']))), + 'think_time_range': tuple(cfg('user_templates', user_type, 'think_time_range', default=list(default['think_time_range']))), + } + return templates + + @classmethod + def generate_user(cls, user_id: str, user_type: str = 'chatbot', priority: int = 1, + qos_level: QoSLevel = QoSLevel.BATCH) -> UserProfile: + """Generates a single user profile based on a template.""" + templates = cls._get_user_templates() + template = templates.get(user_type, templates['chatbot']) + return UserProfile( + user_id=user_id, + context_length=random.randint(*template['context_range']), + generation_length=random.randint(*template['generation_range']), + think_time=random.uniform(*template['think_time_range']), + priority=priority, + qos_level=qos_level + ) + + @classmethod + def generate_mixed_users(cls, num_users: int) -> List[UserProfile]: + """Generates a list of users with a realistic distribution of types and QoS levels.""" + interactive_prob = cfg('qos_distribution', 'interactive_probability', default=0.15) + responsive_threshold = cfg('qos_distribution', 'responsive_threshold', default=0.50) + + users = [] + for i in range(num_users): + user_type = random.choice(['chatbot', 'coding', 'document']) + + rand = random.random() + if rand < interactive_prob: + qos_level, priority = QoSLevel.INTERACTIVE, 3 + elif rand < responsive_threshold: + qos_level, priority = QoSLevel.RESPONSIVE, 2 + else: + qos_level, priority = QoSLevel.BATCH, 1 + + users.append(cls.generate_user(f"user_{i:04d}", user_type, priority, qos_level)) + return users + + +# ============================================================================ +# SHAREGPT DATASET LOADER +# ============================================================================ + 
+class ShareGPTDatasetLoader:
+    """
+    Loads ShareGPT conversation data and provides realistic request patterns.
+    """
+
+    def __init__(self, dataset_path: str, max_conversations: int = 1000, seed: Optional[int] = None):
+        self.dataset_path = dataset_path
+        self.max_conversations = max_conversations
+        self.conversations = []
+        self.token_stats = {}
+
+        # Check against None so that seed=0 is honored for reproducibility.
+        if seed is not None:
+            random.seed(seed)
+            np.random.seed(seed)
+
+        self._load_dataset()
+
+    def _load_dataset(self):
+        """Load and process the ShareGPT dataset."""
+        if not os.path.exists(self.dataset_path):
+            logger.warning(f"Dataset not found at {self.dataset_path}")
+            return
+
+        try:
+            tokenizer = None
+            if TIKTOKEN_AVAILABLE:
+                try:
+                    tokenizer = tiktoken.get_encoding("cl100k_base")
+                except Exception:
+                    pass
+
+            if tokenizer is None:
+                logger.info("Tiktoken not available, using approximate token counting")
+
+            with open(self.dataset_path, 'r', encoding='utf-8') as f:
+                data = json.load(f)
+
+            for conv_idx, conversation in enumerate(data[:self.max_conversations]):
+                if 'conversations' not in conversation:
+                    continue
+
+                conv_data = []
+                turns = conversation['conversations']
+
+                for i in range(0, len(turns) - 1, 2):
+                    if i + 1 >= len(turns):
+                        break
+
+                    human_turn = turns[i]
+                    gpt_turn = turns[i + 1]
+
+                    if human_turn.get('from') != 'human' or gpt_turn.get('from') != 'gpt':
+                        continue
+
+                    context_text = human_turn.get('value', '')
+                    generation_text = gpt_turn.get('value', '')
+
+                    if tokenizer:
+                        context_tokens = len(tokenizer.encode(context_text))
+                        generation_tokens = len(tokenizer.encode(generation_text))
+                    else:
+                        context_tokens = max(1, len(context_text) // 4)
+                        generation_tokens = max(1, len(generation_text) // 4)
+
+                    context_tokens = min(context_tokens, 16384)
+                    generation_tokens = min(generation_tokens, 2048)
+
+                    conv_data.append({
+                        'context_tokens': context_tokens,
+                        'generation_tokens': generation_tokens,
+                        'turn_number': i // 2 + 1
+                    })
+
+                if conv_data:
+                    self.conversations.append({
+                        'id': conversation.get('id', 
f'conv_{conv_idx}'), + 'turns': conv_data + }) + + if self.conversations: + all_context_tokens = [] + all_generation_tokens = [] + + for conv in self.conversations: + for turn in conv['turns']: + all_context_tokens.append(turn['context_tokens']) + all_generation_tokens.append(turn['generation_tokens']) + + self.token_stats = { + 'context_mean': np.mean(all_context_tokens), + 'context_std': np.std(all_context_tokens), + 'context_min': np.min(all_context_tokens), + 'context_max': np.max(all_context_tokens), + 'context_p50': np.percentile(all_context_tokens, 50), + 'context_p95': np.percentile(all_context_tokens, 95), + 'generation_mean': np.mean(all_generation_tokens), + 'generation_std': np.std(all_generation_tokens), + 'generation_min': np.min(all_generation_tokens), + 'generation_max': np.max(all_generation_tokens), + 'generation_p50': np.percentile(all_generation_tokens, 50), + 'generation_p95': np.percentile(all_generation_tokens, 95), + 'total_conversations': len(self.conversations), + 'total_turns': sum(len(c['turns']) for c in self.conversations) + } + + logger.info(f"Loaded {len(self.conversations)} conversations with {self.token_stats['total_turns']} turns") + logger.info(f"Context tokens: mean={self.token_stats['context_mean']:.1f}, p50={self.token_stats['context_p50']:.1f}, p95={self.token_stats['context_p95']:.1f}") + logger.info(f"Generation tokens: mean={self.token_stats['generation_mean']:.1f}, p50={self.token_stats['generation_p50']:.1f}, p95={self.token_stats['generation_p95']:.1f}") + + except Exception as e: + logger.error(f"Error loading dataset: {e}") + self.conversations = [] + + def get_random_conversation(self) -> Optional[Dict]: + """Get a random conversation from the dataset.""" + if not self.conversations: + return None + return random.choice(self.conversations) + + def get_random_turn(self) -> Optional[Tuple[int, int]]: + """Get random context and generation token counts from the dataset.""" + if not self.conversations: + return None + + 
conv = self.get_random_conversation() + if conv and conv['turns']: + turn = random.choice(conv['turns']) + return turn['context_tokens'], turn['generation_tokens'] + return None + + def iterate_conversations(self, shuffle: bool = True): + """Iterate through all conversations, optionally shuffled.""" + conversations = self.conversations.copy() + if shuffle: + random.shuffle(conversations) + for conv in conversations: + yield conv diff --git a/kv_cache_benchmark/pyproject.toml b/kv_cache_benchmark/pyproject.toml new file mode 100755 index 00000000..3eaf156c --- /dev/null +++ b/kv_cache_benchmark/pyproject.toml @@ -0,0 +1,113 @@ +[build-system] +requires = ["setuptools>=61.0", "wheel"] +build-backend = "setuptools.build_meta" + +[project] +name = "mlperf-kv-cache" +version = "3.0.0" +description = "MLPerf KV Cache Benchmark - Multi-Tier Performance Comparison for LLM Inference" +readme = "README.md" +license = {text = "Apache-2.0"} +authors = [ + {name = "Hazem Awadallah", email = "hazem_awadallah@kingston.com"}, + {name = "Kingston Digital"}, + {name = "MLPerf Storage Working Group"}, +] +keywords = [ + "mlperf", + "benchmark", + "kv-cache", + "llm", + "inference", + "gpu", + "storage", + "multi-tier", +] +classifiers = [ + "Development Status :: 4 - Beta", + "Environment :: Console", + "Environment :: GPU", + "Intended Audience :: Developers", + "Intended Audience :: Science/Research", + "License :: OSI Approved :: Apache Software License", + "Operating System :: OS Independent", + "Programming Language :: Python :: 3", + "Programming Language :: Python :: 3.10", + "Programming Language :: Python :: 3.11", + "Programming Language :: Python :: 3.12", + "Topic :: Scientific/Engineering :: Artificial Intelligence", + "Topic :: System :: Benchmark", +] +requires-python = ">=3.10" + +# Core dependencies (minimal set for basic functionality) +dependencies = [] + +[project.optional-dependencies] +# YAML config file support +yaml = ["pyyaml>=6.0"] + +# GPU support +gpu = [ 
+ "torch>=2.0", + "cupy-cuda12x>=12.0", # Adjust cuda version as needed +] + +# Tokenization +tokenizer = ["tiktoken>=0.5"] + +# Excel/DataFrame output +reporting = [ + "pandas>=2.0", + "openpyxl>=3.1", +] + +# Full installation with all optional dependencies +full = [ + "pyyaml>=6.0", + "torch>=2.0", + "tiktoken>=0.5", + "pandas>=2.0", + "openpyxl>=3.1", +] + +# Development dependencies +dev = [ + "pytest>=7.0", + "pytest-cov>=4.0", + "ruff>=0.1", + "mypy>=1.0", +] + +[project.scripts] +kv-cache = "kv_cache.cli:main" +mlperf-kv-cache = "kv_cache.cli:main" + +[project.urls] +Homepage = "https://github.com/mlcommons/storage" +Documentation = "https://mlcommons.org/en/groups/research-storage/" +Repository = "https://github.com/mlcommons/storage" +Issues = "https://github.com/mlcommons/storage/issues" + +[tool.setuptools] +packages = ["kv_cache"] + +[tool.ruff] +line-length = 120 +target-version = "py310" + +[tool.ruff.lint] +select = ["E", "F", "W", "I", "N", "UP", "B", "C4"] +ignore = ["E501"] # Line too long (handled by formatter) + +[tool.mypy] +python_version = "3.10" +warn_return_any = true +warn_unused_ignores = true +ignore_missing_imports = true + +[tool.pytest.ini_options] +testpaths = ["tests", "."] +python_files = ["test_*.py"] +python_functions = ["test_*"] +addopts = "-v --tb=short" diff --git a/kv_cache_benchmark/requirements.txt b/kv_cache_benchmark/requirements.txt index 6570c74b..d0d3f213 100644 --- a/kv_cache_benchmark/requirements.txt +++ b/kv_cache_benchmark/requirements.txt @@ -3,6 +3,7 @@ # Core dependencies (required) numpy>=1.20.0 +pyyaml>=6.0.0 # For config.yaml support # GPU support (optional - enables GPU tier testing) torch>=2.0.0 # For CUDA tensor support @@ -19,6 +20,10 @@ openpyxl>=3.1.0 # Required for .xlsx output; without this, falls back to .csv pytest>=7.0.0 pytest-html>=4.0.0 # Required for HTML test reports +# High-performance storage backends (optional - for --storage-backend mmap/parallel) +# aiofiles>=23.0.0 # Async file I/O 
(uncomment for potential future async backend) +# Note: io_uring bindings (liburing) are not available via pip; requires system install + # Wrapper script utilities (system packages, not pip) # bc - arbitrary precision calculator (apt install bc) # jq - JSON processor (apt install jq) diff --git a/kv_cache_benchmark/tests/test_kv_cache.py b/kv_cache_benchmark/tests/test_kv_cache.py index cfa42f56..f5d44759 100644 --- a/kv_cache_benchmark/tests/test_kv_cache.py +++ b/kv_cache_benchmark/tests/test_kv_cache.py @@ -11,19 +11,46 @@ These tests verify core functionality without running the full benchmark. Typical execution time: < 5 seconds + +This version tests kv-cache.py which includes: +- ConfigLoader with YAML support and strict validation +- Extended QoS SLA with p999 and p9999 percentiles +- Config-driven parameters via cfg() helper +- Renamed nvme_* to storage_* in stats """ import os import sys +import time +import argparse import tempfile +import threading import pytest import numpy as np from datetime import datetime from pathlib import Path # Import from kv-cache.py (handle the hyphen in filename) +# Try multiple locations: same directory, parent directory import importlib.util -spec = importlib.util.spec_from_file_location("kv_cache", os.path.join(os.path.dirname(__file__), "kv-cache.py")) + +_kv_cache_path = None +_possible_paths = [ + os.path.join(os.path.dirname(__file__), "kv-cache.py"), # Same directory + os.path.join(os.path.dirname(__file__), "..", "kv-cache.py"), # Parent directory +] +for _path in _possible_paths: + if os.path.exists(_path): + _kv_cache_path = _path + break + +if _kv_cache_path is None: + raise FileNotFoundError( + f"Could not find kv-cache.py. 
Searched in:\n" + + "\n".join(f" - {os.path.abspath(p)}" for p in _possible_paths) + ) + +spec = importlib.util.spec_from_file_location("kv_cache", _kv_cache_path) kv_cache = importlib.util.module_from_spec(spec) spec.loader.exec_module(kv_cache) @@ -44,6 +71,25 @@ MultiTierCache = kv_cache.MultiTierCache export_results_to_xlsx = kv_cache.export_results_to_xlsx PANDAS_AVAILABLE = kv_cache.PANDAS_AVAILABLE + +# New imports for 01-26-2026 version +ConfigLoader = kv_cache.ConfigLoader +cfg = kv_cache.cfg +get_config = kv_cache.get_config +set_config = kv_cache.set_config +get_qos_profiles = kv_cache.get_qos_profiles +QoSSLA = kv_cache.QoSSLA +YAML_AVAILABLE = kv_cache.YAML_AVAILABLE +IntegratedBenchmark = kv_cache.IntegratedBenchmark + +# Input validation imports +validate_args = kv_cache.validate_args +MAX_USERS = kv_cache.MAX_USERS +MAX_DURATION_SECONDS = kv_cache.MAX_DURATION_SECONDS +MAX_GPU_MEMORY_GB = kv_cache.MAX_GPU_MEMORY_GB +MAX_CPU_MEMORY_GB = kv_cache.MAX_CPU_MEMORY_GB +FORBIDDEN_CACHE_PREFIXES = kv_cache.FORBIDDEN_CACHE_PREFIXES + if PANDAS_AVAILABLE: import pandas as pd @@ -187,9 +233,180 @@ class MockArgs: max_requests = 0 dataset_path = None cache_dir = None + storage_capacity_gb = 0 + precondition = False + precondition_size_gb = 0 + precondition_threads = 0 + trace_speedup = 1.0 + replay_cycles = 0 return MockArgs() +@pytest.fixture +def sample_config_yaml(tmp_path): + """Create a sample config.yaml for testing.""" + config_content = ''' +user_templates: + chatbot: + context_range: [256, 1024] + generation_range: [50, 150] + think_time_range: [0.1, 0.5] + coding: + context_range: [1024, 4096] + generation_range: [100, 500] + think_time_range: [0.2, 1.0] + document: + context_range: [2048, 8192] + generation_range: [200, 800] + think_time_range: [0.3, 1.5] + +qos_profiles: + interactive: + target_latency_p95_ms: 50 + target_latency_p99_ms: 100 + target_latency_p999_ms: 150 + target_latency_p9999_ms: 200 + priority: 3 + responsive: + 
target_latency_p95_ms: 100 + target_latency_p99_ms: 200 + target_latency_p999_ms: 350 + target_latency_p9999_ms: 500 + priority: 2 + batch: + target_latency_p95_ms: 1000 + target_latency_p99_ms: 5000 + target_latency_p999_ms: 7500 + target_latency_p9999_ms: 10000 + priority: 1 + +qos_distribution: + interactive_probability: 0.15 + responsive_threshold: 0.50 + +eviction: + max_recursion_depth: 10 + target_usage_ratio: 0.8 + large_entry_limit_ratio: 0.95 + max_evictions_hard_cap: 5000 + max_evictions_min: 1000 + +decode: + batch_size: 32 + +conversation: + max_conversations: 1000 + max_turns_per_conv: 50 + end_conversation_probability: 0.2 +''' + config_file = tmp_path / "test_config.yaml" + config_file.write_text(config_content) + return str(config_file) + + +# ============================================================================= +# Test 0: ConfigLoader (New in 01-26-2026) +# ============================================================================= + +@pytest.mark.skipif(not YAML_AVAILABLE, reason="PyYAML not installed") +class TestConfigLoader: + """Tests for ConfigLoader and cfg() helper function.""" + + def test_config_loader_without_file(self): + """ConfigLoader should work without a config file.""" + loader = ConfigLoader(config_path=None) + assert loader is not None + assert loader.config == {} + + def test_config_loader_loads_yaml(self, sample_config_yaml): + """ConfigLoader should load and parse YAML file.""" + loader = ConfigLoader(config_path=sample_config_yaml) + assert loader.config is not None + assert 'qos_profiles' in loader.config + + def test_config_loader_get_nested_value(self, sample_config_yaml): + """ConfigLoader.get() should retrieve nested values.""" + loader = ConfigLoader(config_path=sample_config_yaml) + priority = loader.get('qos_profiles', 'interactive', 'priority') + assert priority == 3 + + def test_config_loader_get_with_default(self, sample_config_yaml): + """ConfigLoader.get() should return default for missing keys.""" + 
loader = ConfigLoader(config_path=sample_config_yaml) + value = loader.get('nonexistent', 'key', default=42) + assert value == 42 + + def test_cfg_without_global_config(self): + """cfg() should return default when no global config is set.""" + # Ensure no global config + set_config(None) + value = cfg('qos_profiles', 'interactive', 'priority', default=99) + assert value == 99 + + def test_cfg_with_global_config(self, sample_config_yaml): + """cfg() should retrieve values from global config.""" + loader = ConfigLoader(config_path=sample_config_yaml) + set_config(loader) + try: + value = cfg('qos_profiles', 'interactive', 'priority', default=99) + assert value == 3 + finally: + set_config(None) # Clean up + + def test_config_loader_validates_schema(self, tmp_path): + """ConfigLoader should reject unknown keys.""" + bad_config = tmp_path / "bad_config.yaml" + bad_config.write_text(''' +unknown_section: + bad_key: true +''') + with pytest.raises(ValueError, match="Unknown configuration key"): + ConfigLoader(config_path=str(bad_config)) + + def test_get_config_returns_none_initially(self): + """get_config() should return None before set_config() is called.""" + set_config(None) + assert get_config() is None + + def test_set_config_stores_loader(self, sample_config_yaml): + """set_config() should store the ConfigLoader globally.""" + loader = ConfigLoader(config_path=sample_config_yaml) + set_config(loader) + try: + assert get_config() is loader + finally: + set_config(None) + + +class TestCfgHelper: + """Tests for cfg() helper function in various contexts.""" + + def test_cfg_returns_default_for_none_config(self): + """cfg() returns default when config is None.""" + set_config(None) + assert cfg('any', 'path', default='fallback') == 'fallback' + + def test_cfg_returns_default_for_missing_key(self, sample_config_yaml): + """cfg() returns default for missing nested keys.""" + loader = ConfigLoader(config_path=sample_config_yaml) + set_config(loader) + try: + result = 
cfg('nonexistent', 'nested', 'key', default=123) + assert result == 123 + finally: + set_config(None) + + def test_cfg_retrieves_list_values(self, sample_config_yaml): + """cfg() can retrieve list values from config.""" + loader = ConfigLoader(config_path=sample_config_yaml) + set_config(loader) + try: + context_range = cfg('user_templates', 'chatbot', 'context_range') + assert context_range == [256, 1024] + finally: + set_config(None) + + # ============================================================================= # Test 1: ModelConfig # ============================================================================= @@ -318,6 +535,39 @@ def test_sla_compliance_starts_at_one(self): def test_interactive_target_latency(self): sla = QOS_PROFILES[QoSLevel.INTERACTIVE] assert sla.target_latency_p95_ms == 50 + + # New tests for extended QoS percentiles (01-26-2026 feature) + def test_interactive_has_p999_latency(self): + """Test that p999 percentile is defined for INTERACTIVE.""" + sla = QOS_PROFILES[QoSLevel.INTERACTIVE] + assert hasattr(sla, 'target_latency_p999_ms') + assert sla.target_latency_p999_ms > sla.target_latency_p99_ms + + def test_interactive_has_p9999_latency(self): + """Test that p9999 percentile is defined for INTERACTIVE.""" + sla = QOS_PROFILES[QoSLevel.INTERACTIVE] + assert hasattr(sla, 'target_latency_p9999_ms') + assert sla.target_latency_p9999_ms > sla.target_latency_p999_ms + + def test_all_qos_levels_have_extended_percentiles(self): + """Verify all QoS levels have p999 and p9999 defined.""" + for level in QoSLevel: + sla = QOS_PROFILES[level] + assert hasattr(sla, 'target_latency_p999_ms') + assert hasattr(sla, 'target_latency_p9999_ms') + + def test_get_qos_profiles_returns_dict(self): + """Test that get_qos_profiles() returns profiles dict.""" + profiles = get_qos_profiles() + assert isinstance(profiles, dict) + assert len(profiles) == 3 + + def test_get_qos_profiles_levels(self): + """Test that get_qos_profiles() has all QoS levels.""" + 
profiles = get_qos_profiles() + assert QoSLevel.INTERACTIVE in profiles + assert QoSLevel.RESPONSIVE in profiles + assert QoSLevel.BATCH in profiles # ============================================================================= @@ -603,7 +853,8 @@ def test_generate_mixed_users(self): def test_users_have_valid_context_lengths(self): users = UserSimulator.generate_mixed_users(10) for user in users: - assert 256 <= user.context_length <= 8192 + # Range covers all user templates: chatbot [512,4096], coding [4096,25000], document [4096,16384] + assert 512 <= user.context_length <= 25000 def test_qos_levels_assigned(self): users = UserSimulator.generate_mixed_users(10) @@ -868,15 +1119,1276 @@ def test_cpu_limit(self, multi_tier_cache): cpu_limit = multi_tier_cache._get_tier_limit('cpu') assert cpu_limit == 0.1 * 1024**3 # 100MB - def test_nvme_limit_infinite(self, multi_tier_cache): + def test_nvme_limit_auto_detected(self, multi_tier_cache): + """NVMe limit should be auto-detected from disk free space (not inf).""" nvme_limit = multi_tier_cache._get_tier_limit('nvme') - assert nvme_limit == float('inf') + assert nvme_limit > 0 def test_initial_cpu_usage_zero(self, multi_tier_cache): cpu_usage = multi_tier_cache._get_tier_usage('cpu') assert cpu_usage == 0 +# ============================================================================= +# Test 13: Config-Driven Parameters (New in 01-26-2026) +# ============================================================================= + +class TestConfigDrivenConversationManager: + """Tests for ConversationManager with config-driven parameters.""" + + def test_default_max_conversations(self): + """Without config, should use hardcoded default of 1000.""" + set_config(None) + manager = ConversationManager() + assert manager.max_conversations == 1000 + + def test_default_max_turns(self): + """Without config, should use hardcoded default of 50.""" + set_config(None) + manager = ConversationManager() + assert manager.max_turns_per_conv 
== 50 + + def test_explicit_params_override_config(self, sample_config_yaml): + """Explicit constructor params should override config values.""" + loader = ConfigLoader(config_path=sample_config_yaml) + set_config(loader) + try: + manager = ConversationManager(max_conversations=42, max_turns_per_conv=7) + assert manager.max_conversations == 42 + assert manager.max_turns_per_conv == 7 + finally: + set_config(None) + + +@pytest.mark.skipif(not YAML_AVAILABLE, reason="PyYAML not installed") +class TestConfigDrivenUserSimulator: + """Tests for UserSimulator with config-driven parameters.""" + + def test_user_templates_from_config(self, sample_config_yaml): + """UserSimulator should read templates from config.""" + loader = ConfigLoader(config_path=sample_config_yaml) + set_config(loader) + try: + templates = UserSimulator._get_user_templates() + assert 'chatbot' in templates + assert 'coding' in templates + assert 'document' in templates + assert templates['chatbot']['context_range'] == (256, 1024) + finally: + set_config(None) + + def test_qos_distribution_from_config(self, sample_config_yaml): + """UserSimulator.generate_mixed_users should use config QoS distribution.""" + loader = ConfigLoader(config_path=sample_config_yaml) + set_config(loader) + try: + # Generate many users to test distribution + users = UserSimulator.generate_mixed_users(1000) + # With 15% interactive probability, expect ~150 interactive users + interactive_count = sum(1 for u in users if u.qos_level == QoSLevel.INTERACTIVE) + # Allow 50% variance for randomness + assert 75 <= interactive_count <= 225, f"Expected ~150 interactive, got {interactive_count}" + finally: + set_config(None) + + +# ============================================================================= +# Test 14: Stats Naming Convention (storage_* vs nvme_*) +# ============================================================================= + +class TestStatsNamingConvention: + """Tests that stats use 'storage_*' naming (not 
'nvme_*') in 01-26-2026.""" + + def test_stats_use_storage_prefix(self, multi_tier_cache): + """Stats should use 'storage_' prefix instead of 'nvme_'.""" + multi_tier_cache.allocate_cache("test_entry", num_tokens=100) + multi_tier_cache.access_cache("test_entry", InferencePhase.DECODE) + stats = multi_tier_cache.get_stats(duration=1.0) + + # Check for storage_* naming + storage_keys = [k for k in stats.keys() if 'storage_' in k.lower()] + nvme_keys = [k for k in stats.keys() if 'nvme_' in k.lower()] + # storage_* keys should exist; no nvme_* keys should remain after the rename + assert len(storage_keys) > 0, "Expected storage_* keys in stats" + assert len(nvme_keys) == 0, "Expected no nvme_* keys in stats" + + def test_tier_stats_key_format(self, multi_tier_cache): + """tier_storage_* keys should exist (renamed from tier_nvme_*).""" + multi_tier_cache.allocate_cache("test_entry", num_tokens=100) + stats = multi_tier_cache.get_stats(duration=1.0) + + # Check for tier_storage_* keys + tier_storage_keys = [k for k in stats.keys() if k.startswith('tier_storage_')] + assert len(tier_storage_keys) > 0, "Expected tier_storage_* keys in stats" + + +# ============================================================================= +# Test 15: GPUMemoryBackend Eviction Callback (New in 01-26-2026) +# ============================================================================= + +@pytest.mark.skipif(not CUDA_AVAILABLE, reason="CUDA not available") +class TestGPUMemoryBackendEvictionCallback: + """Tests for GPUMemoryBackend's on_eviction_callback feature.""" + + def test_gpu_backend_accepts_callback(self): + """GPUMemoryBackend should accept on_eviction_callback parameter.""" + evicted_keys = [] + def callback(key, tier, size): + evicted_keys.append((key, tier, size)) + + backend = GPUMemoryBackend(on_eviction_callback=callback) + assert backend.on_eviction_callback is callback + backend.clear() + + def test_gpu_backend_works_without_callback(self): + """GPUMemoryBackend should work without a callback (None).""" + backend = GPUMemoryBackend(on_eviction_callback=None) + assert
backend.on_eviction_callback is None + backend.clear() + + +# ============================================================================= +# Test 16: Input Validation (validate_args) +# ============================================================================= + +class TestValidateArgs: + """Tests for the validate_args() input validation function.""" + + @pytest.fixture + def valid_args(self): + """Create a valid args namespace with all required attributes.""" + import argparse + args = argparse.Namespace( + num_users=100, + duration=60, + gpu_mem_gb=16, + cpu_mem_gb=32, + rag_num_docs=10, + max_conversations=500, + max_concurrent_allocs=0, + request_rate=0, + max_requests=0, + target_saturation=0.8, + cache_dir=None, + storage_capacity_gb=0, + precondition_size_gb=0, + precondition_threads=0, + trace_speedup=1.0, + replay_cycles=0 + ) + return args + + def test_valid_args_pass_through(self, valid_args): + """Valid arguments should pass validation and return unchanged.""" + result = validate_args(valid_args) + assert result is valid_args + assert result.num_users == 100 + assert result.duration == 60 + + def test_num_users_zero_rejected(self, valid_args): + """num_users=0 should raise ValueError.""" + valid_args.num_users = 0 + with pytest.raises(ValueError, match="num-users must be positive"): + validate_args(valid_args) + + def test_num_users_negative_rejected(self, valid_args): + """Negative num_users should raise ValueError.""" + valid_args.num_users = -5 + with pytest.raises(ValueError, match="num-users must be positive"): + validate_args(valid_args) + + def test_num_users_exceeds_limit(self, valid_args): + """num_users exceeding MAX_USERS should raise ValueError.""" + valid_args.num_users = MAX_USERS + 1 + with pytest.raises(ValueError, match="num-users exceeds limit"): + validate_args(valid_args) + + def test_duration_zero_rejected(self, valid_args): + """duration=0 should raise ValueError.""" + valid_args.duration = 0 + with pytest.raises(ValueError, 
match="duration must be positive"): + validate_args(valid_args) + + def test_duration_negative_rejected(self, valid_args): + """Negative duration should raise ValueError.""" + valid_args.duration = -10 + with pytest.raises(ValueError, match="duration must be positive"): + validate_args(valid_args) + + def test_duration_exceeds_limit(self, valid_args): + """duration exceeding 24 hours should raise ValueError.""" + valid_args.duration = MAX_DURATION_SECONDS + 1 + with pytest.raises(ValueError, match="duration exceeds 24 hours"): + validate_args(valid_args) + + def test_gpu_mem_negative_rejected(self, valid_args): + """Negative gpu_mem_gb should raise ValueError.""" + valid_args.gpu_mem_gb = -1 + with pytest.raises(ValueError, match="gpu-mem-gb cannot be negative"): + validate_args(valid_args) + + def test_gpu_mem_zero_allowed(self, valid_args): + """gpu_mem_gb=0 should be valid (disables GPU tier).""" + valid_args.gpu_mem_gb = 0 + result = validate_args(valid_args) + assert result.gpu_mem_gb == 0 + + def test_gpu_mem_exceeds_limit(self, valid_args): + """gpu_mem_gb exceeding limit should raise ValueError.""" + valid_args.gpu_mem_gb = MAX_GPU_MEMORY_GB + 1 + with pytest.raises(ValueError, match="gpu-mem-gb exceeds limit"): + validate_args(valid_args) + + def test_cpu_mem_negative_rejected(self, valid_args): + """Negative cpu_mem_gb should raise ValueError.""" + valid_args.cpu_mem_gb = -1 + with pytest.raises(ValueError, match="cpu-mem-gb cannot be negative"): + validate_args(valid_args) + + def test_cpu_mem_zero_allowed(self, valid_args): + """cpu_mem_gb=0 should be valid.""" + valid_args.cpu_mem_gb = 0 + result = validate_args(valid_args) + assert result.cpu_mem_gb == 0 + + def test_cpu_mem_exceeds_limit(self, valid_args): + """cpu_mem_gb exceeding limit should raise ValueError.""" + valid_args.cpu_mem_gb = MAX_CPU_MEMORY_GB + 1 + with pytest.raises(ValueError, match="cpu-mem-gb exceeds limit"): + validate_args(valid_args) + + def 
test_target_saturation_below_zero_rejected(self, valid_args): + """target_saturation < 0 should raise ValueError.""" + valid_args.target_saturation = -0.1 + with pytest.raises(ValueError, match="target-saturation must be between 0.0 and 1.0"): + validate_args(valid_args) + + def test_target_saturation_above_one_rejected(self, valid_args): + """target_saturation > 1 should raise ValueError.""" + valid_args.target_saturation = 1.5 + with pytest.raises(ValueError, match="target-saturation must be between 0.0 and 1.0"): + validate_args(valid_args) + + def test_target_saturation_boundaries_valid(self, valid_args): + """target_saturation at 0.0 and 1.0 should be valid.""" + valid_args.target_saturation = 0.0 + result = validate_args(valid_args) + assert result.target_saturation == 0.0 + + valid_args.target_saturation = 1.0 + result = validate_args(valid_args) + assert result.target_saturation == 1.0 + + def test_rag_num_docs_negative_rejected(self, valid_args): + """Negative rag_num_docs should raise ValueError.""" + valid_args.rag_num_docs = -1 + with pytest.raises(ValueError, match="rag-num-docs cannot be negative"): + validate_args(valid_args) + + def test_max_conversations_zero_rejected(self, valid_args): + """max_conversations=0 should raise ValueError.""" + valid_args.max_conversations = 0 + with pytest.raises(ValueError, match="max-conversations must be positive"): + validate_args(valid_args) + + def test_max_concurrent_allocs_negative_rejected(self, valid_args): + """Negative max_concurrent_allocs should raise ValueError.""" + valid_args.max_concurrent_allocs = -1 + with pytest.raises(ValueError, match="max-concurrent-allocs cannot be negative"): + validate_args(valid_args) + + def test_request_rate_negative_rejected(self, valid_args): + """Negative request_rate should raise ValueError.""" + valid_args.request_rate = -1 + with pytest.raises(ValueError, match="request-rate cannot be negative"): + validate_args(valid_args) + + def 
test_max_requests_negative_rejected(self, valid_args): + """Negative max_requests should raise ValueError.""" + valid_args.max_requests = -1 + with pytest.raises(ValueError, match="max-requests cannot be negative"): + validate_args(valid_args) + + @pytest.mark.skipif(sys.platform == 'win32', reason="Unix paths not valid on Windows") + def test_forbidden_cache_dir_rejected(self, valid_args): + """Cache directories in system paths should be rejected.""" + valid_args.cache_dir = '/etc/kv_cache' + with pytest.raises(ValueError, match="cannot be a system directory"): + validate_args(valid_args) + + def test_valid_cache_dir_allowed(self, valid_args, tmp_path): + """Valid cache directory should be accepted.""" + valid_args.cache_dir = str(tmp_path / "kv_cache_test") + result = validate_args(valid_args) + assert result.cache_dir == str(tmp_path / "kv_cache_test") + + def test_multiple_errors_collected(self, valid_args): + """Multiple validation errors should all be reported.""" + valid_args.num_users = -1 + valid_args.duration = -1 + valid_args.gpu_mem_gb = -1 + with pytest.raises(ValueError) as exc_info: + validate_args(valid_args) + # All three errors should be in the message + error_msg = str(exc_info.value) + assert "num-users" in error_msg + assert "duration" in error_msg + assert "gpu-mem-gb" in error_msg + + # --- New validation tests for v3.0 Changes 1-3 --- + + def test_storage_capacity_gb_negative_rejected(self, valid_args): + """Negative storage_capacity_gb should raise ValueError.""" + valid_args.storage_capacity_gb = -1 + with pytest.raises(ValueError, match="storage-capacity-gb cannot be negative"): + validate_args(valid_args) + + def test_storage_capacity_gb_zero_allowed(self, valid_args): + """storage_capacity_gb=0 should be valid (auto-detect).""" + valid_args.storage_capacity_gb = 0 + result = validate_args(valid_args) + assert result.storage_capacity_gb == 0 + + def test_storage_capacity_gb_positive_allowed(self, valid_args): + """Positive 
storage_capacity_gb should be valid.""" + valid_args.storage_capacity_gb = 100 + result = validate_args(valid_args) + assert result.storage_capacity_gb == 100 + + def test_precondition_size_gb_negative_rejected(self, valid_args): + """Negative precondition_size_gb should raise ValueError.""" + valid_args.precondition_size_gb = -1 + with pytest.raises(ValueError, match="precondition-size-gb cannot be negative"): + validate_args(valid_args) + + def test_precondition_size_gb_zero_allowed(self, valid_args): + """precondition_size_gb=0 should be valid (default to 2x NVMe capacity).""" + valid_args.precondition_size_gb = 0 + result = validate_args(valid_args) + assert result.precondition_size_gb == 0 + + def test_precondition_threads_negative_rejected(self, valid_args): + """Negative precondition_threads should raise ValueError.""" + valid_args.precondition_threads = -1 + with pytest.raises(ValueError, match="precondition-threads cannot be negative"): + validate_args(valid_args) + + def test_precondition_threads_zero_allowed(self, valid_args): + """precondition_threads=0 should be valid (auto-detect from cpu_count).""" + valid_args.precondition_threads = 0 + result = validate_args(valid_args) + assert result.precondition_threads == 0 + + +# ============================================================================= +# Test 16b: NVMe Capacity Tracking (Change 1) +# ============================================================================= + +class TestNVMeCapacityTracking: + """Tests for NVMe/storage tier capacity tracking.""" + + @pytest.fixture + def tiny_model_config(self): + return MODEL_CONFIGS['tiny-1b'] + + def test_explicit_storage_capacity(self, tiny_model_config): + """Explicit storage_capacity_gb should set nvme_memory_limit.""" + cache = MultiTierCache( + model_config=tiny_model_config, + gpu_memory_gb=0, + cpu_memory_gb=0.1, + seed=42, + storage_capacity_gb=10.0 + ) + assert cache.nvme_memory_limit == 10.0 * 1024**3 + + def 
test_auto_detect_storage_capacity(self, tiny_model_config): + """storage_capacity_gb=0 should auto-detect from disk free space.""" + cache = MultiTierCache( + model_config=tiny_model_config, + gpu_memory_gb=0, + cpu_memory_gb=0.1, + seed=42, + storage_capacity_gb=0 + ) + # Auto-detect should return a finite positive value (disk free space) + assert cache.nvme_memory_limit > 0 + assert cache.nvme_memory_limit != float('inf') + + def test_nvme_usage_starts_at_zero(self, tiny_model_config): + """NVMe usage should start at 0.""" + cache = MultiTierCache( + model_config=tiny_model_config, + gpu_memory_gb=0, + cpu_memory_gb=0.1, + seed=42, + storage_capacity_gb=10.0 + ) + assert cache.nvme_memory_used == 0 + + def test_nvme_usage_tracked_after_write(self, tiny_model_config): + """NVMe usage should increase after writing to NVMe tier.""" + cache = MultiTierCache( + model_config=tiny_model_config, + gpu_memory_gb=0, + cpu_memory_gb=0.001, # 1MB — force overflow to NVMe + seed=42, + storage_capacity_gb=10.0 + ) + # Write enough to overflow CPU to NVMe + for i in range(10): + cache.allocate_cache(f"entry_{i}", num_tokens=1000) + assert cache.nvme_memory_used > 0 + + def test_get_tier_limit_returns_set_value(self, tiny_model_config): + """_get_tier_limit('nvme') should return the configured limit.""" + cache = MultiTierCache( + model_config=tiny_model_config, + gpu_memory_gb=0, + cpu_memory_gb=0.1, + seed=42, + storage_capacity_gb=5.0 + ) + assert cache._get_tier_limit('nvme') == 5.0 * 1024**3 + + def test_get_tier_usage_reflects_writes(self, tiny_model_config): + """_get_tier_usage('nvme') should reflect bytes written.""" + cache = MultiTierCache( + model_config=tiny_model_config, + gpu_memory_gb=0, + cpu_memory_gb=0.001, + seed=42, + storage_capacity_gb=10.0 + ) + assert cache._get_tier_usage('nvme') == 0 + for i in range(10): + cache.allocate_cache(f"entry_{i}", num_tokens=1000) + assert cache._get_tier_usage('nvme') > 0 + + +# 
============================================================================= +# Test 16c: NVMe Eviction (Change 2) +# ============================================================================= + +class TestNVMeEviction: + """Tests for NVMe eviction when storage tier is full.""" + + @pytest.fixture + def tiny_model_config(self): + return MODEL_CONFIGS['tiny-1b'] + + def test_nvme_eviction_triggers_when_full(self, tiny_model_config): + """When NVMe is full, LRU entries should be evicted (deleted).""" + # tiny-1b: ~24KB per token. 10 tokens = ~240KB per entry. + # 10MB NVMe fits ~42 entries before eviction triggers. + cache = MultiTierCache( + model_config=tiny_model_config, + gpu_memory_gb=0, + cpu_memory_gb=0.001, # 1MB CPU + seed=42, + storage_capacity_gb=0.01 # 10MB NVMe + ) + # Write more data than fits in NVMe (200 >> 42) + keys = [] + for i in range(200): + key = f"entry_{i}" + success, location, _ = cache.allocate_cache(key, num_tokens=10) + if success: + keys.append(key) + + # evictions counter is in cache.stats, not in get_stats() output + assert cache.stats['evictions'] > 0, "Evictions should have occurred when NVMe is full" + + def test_evicted_entry_removed_from_cache_entries(self, tiny_model_config): + """Evicted NVMe entries should be removed from cache_entries dict.""" + cache = MultiTierCache( + model_config=tiny_model_config, + gpu_memory_gb=0, + cpu_memory_gb=0.001, + seed=42, + storage_capacity_gb=0.01 # 10MB NVMe + ) + # Fill and overflow (200 entries >> ~42 capacity) + for i in range(200): + cache.allocate_cache(f"entry_{i}", num_tokens=10) + + # Some early entries should have been evicted + total_entries = len(cache.cache_entries) + assert total_entries < 200, f"Expected evictions to reduce entries, got {total_entries}" + + def test_allocation_still_succeeds_after_eviction(self, tiny_model_config): + """New allocations should succeed even after NVMe evictions.""" + cache = MultiTierCache( + model_config=tiny_model_config, + gpu_memory_gb=0, 
+ cpu_memory_gb=0.001, + seed=42, + storage_capacity_gb=0.01 + ) + # Fill NVMe + for i in range(100): + cache.allocate_cache(f"fill_{i}", num_tokens=10) + + # New allocation should still work (eviction frees space) + success, location, _ = cache.allocate_cache("after_eviction", num_tokens=10) + assert success is True + + def test_unlimited_nvme_skips_eviction(self, tiny_model_config): + """When nvme_memory_limit is inf (auto-detect fails), no eviction should occur.""" + cache = MultiTierCache( + model_config=tiny_model_config, + gpu_memory_gb=0, + cpu_memory_gb=0.001, + seed=42, + storage_capacity_gb=0 # auto-detect + ) + # Force nvme_memory_limit to inf for this test + cache.nvme_memory_limit = float('inf') + + for i in range(20): + cache.allocate_cache(f"entry_{i}", num_tokens=500) + + stats = cache.get_stats(duration=1.0) + # With unlimited NVMe, no NVMe-tier evictions should occur + # (CPU evictions/demotions to NVMe are expected) + nvme_entries = sum(1 for e in cache.cache_entries.values() if e['location'] == 'nvme') + assert nvme_entries > 0, "Entries should exist on NVMe tier" + + +# ============================================================================= +# Test 16d: reset_stats (Change 3) +# ============================================================================= + +class TestResetStats: + """Tests for MultiTierCache.reset_stats() method.""" + + @pytest.fixture + def tiny_model_config(self): + return MODEL_CONFIGS['tiny-1b'] + + def test_reset_stats_zeroes_counters(self, tiny_model_config): + """reset_stats() should zero all numeric counters.""" + cache = MultiTierCache( + model_config=tiny_model_config, + gpu_memory_gb=0, + cpu_memory_gb=0.1, + seed=42 + ) + # Generate some stats + for i in range(5): + cache.allocate_cache(f"entry_{i}", num_tokens=100) + cache.access_cache(f"entry_{i}", InferencePhase.DECODE) + + # Verify stats are non-zero before reset + assert cache.stats['cache_hits'] > 0 + assert cache.stats['write_operations'] > 0 + + 
cache.reset_stats() + + assert cache.stats['cache_hits'] == 0 + assert cache.stats['cache_misses'] == 0 + assert cache.stats['write_operations'] == 0 + assert cache.stats['read_operations'] == 0 + assert cache.stats['evictions'] == 0 + + def test_reset_stats_clears_lists(self, tiny_model_config): + """reset_stats() should clear all list stats.""" + cache = MultiTierCache( + model_config=tiny_model_config, + gpu_memory_gb=0, + cpu_memory_gb=0.1, + seed=42 + ) + for i in range(5): + cache.allocate_cache(f"entry_{i}", num_tokens=100) + + cache.reset_stats() + + for key, value in cache.stats.items(): + if isinstance(value, list): + assert len(value) == 0, f"List stat '{key}' should be empty after reset" + + def test_reset_stats_preserves_cache_entries(self, tiny_model_config): + """reset_stats() should NOT remove cached data, only counters.""" + cache = MultiTierCache( + model_config=tiny_model_config, + gpu_memory_gb=0, + cpu_memory_gb=0.1, + seed=42 + ) + for i in range(5): + cache.allocate_cache(f"entry_{i}", num_tokens=100) + + entries_before = len(cache.cache_entries) + cache.reset_stats() + entries_after = len(cache.cache_entries) + + assert entries_after == entries_before, "Cache entries should survive reset_stats()" + + +# ============================================================================= +# Test 16e: Race Condition Safety in read_cache (Change 2 fix) +# ============================================================================= + +class TestReadCacheRaceConditionSafety: + """Tests that read_cache handles evicted entries gracefully.""" + + @pytest.fixture + def tiny_model_config(self): + return MODEL_CONFIGS['tiny-1b'] + + def test_access_evicted_key_returns_none(self, tiny_model_config): + """Accessing a key that was evicted should return None, not crash.""" + cache = MultiTierCache( + model_config=tiny_model_config, + gpu_memory_gb=0, + cpu_memory_gb=0.001, + seed=42, + storage_capacity_gb=0.005 + ) + # Allocate an entry + 
cache.allocate_cache("victim", num_tokens=500) + + # Force eviction by filling the cache + for i in range(50): + cache.allocate_cache(f"fill_{i}", num_tokens=500) + + # Try to read the likely-evicted entry — should not crash + loc, latency = cache.access_cache("victim", InferencePhase.DECODE) + # loc is None if evicted, or a tier name if still present + if loc is None: + assert latency == 0.0 + else: + assert loc in ['cpu', 'nvme'] + + def test_access_nonexistent_key_records_miss(self, tiny_model_config): + """Accessing a key that doesn't exist should record a cache miss.""" + cache = MultiTierCache( + model_config=tiny_model_config, + gpu_memory_gb=0, + cpu_memory_gb=0.1, + seed=42 + ) + loc, latency = cache.access_cache("does_not_exist", InferencePhase.DECODE) + assert loc is None + stats = cache.get_stats(duration=1.0) + assert stats['cache_misses'] >= 1 + + +# ============================================================================= +# Test 17: Per-Tier Phase Metrics +# ============================================================================= + +class TestPerTierPhaseMetrics: + """Tests for per-tier KV bytes tracking (prefill/decode per tier).""" + + @pytest.fixture + def tiny_model_config(self): + """Return the tiny-1b model config for fast tests.""" + return MODEL_CONFIGS['tiny-1b'] + + @pytest.fixture + def multi_tier_cache_cpu_only(self, tiny_model_config): + """Return a MultiTierCache in CPU-only mode (GPU disabled).""" + return MultiTierCache( + model_config=tiny_model_config, + gpu_memory_gb=0, + cpu_memory_gb=0.1, # 100MB + seed=42 + ) + + def test_stats_have_tier_kv_bytes_written_keys(self, multi_tier_cache_cpu_only): + """Stats should include tier_*_kv_bytes_written keys.""" + multi_tier_cache_cpu_only.allocate_cache("test_entry", num_tokens=100) + stats = multi_tier_cache_cpu_only.get_stats(duration=1.0) + + # Check for per-tier write tracking + assert 'tier_gpu_kv_bytes_written_gb' in stats + assert 'tier_cpu_kv_bytes_written_gb' in stats + 
assert 'tier_storage_kv_bytes_written_gb' in stats + + def test_stats_have_tier_kv_bytes_read_keys(self, multi_tier_cache_cpu_only): + """Stats should include tier_*_kv_bytes_read keys.""" + multi_tier_cache_cpu_only.allocate_cache("test_entry", num_tokens=100) + multi_tier_cache_cpu_only.access_cache("test_entry", InferencePhase.DECODE) + stats = multi_tier_cache_cpu_only.get_stats(duration=1.0) + + # Check for per-tier read tracking + assert 'tier_gpu_kv_bytes_read_gb' in stats + assert 'tier_cpu_kv_bytes_read_gb' in stats + assert 'tier_storage_kv_bytes_read_gb' in stats + + def test_cpu_write_bytes_increment_on_allocate(self, multi_tier_cache_cpu_only): + """Allocating to CPU tier should increment tier_cpu_kv_bytes_written.""" + # Get initial stats + stats_before = multi_tier_cache_cpu_only.get_stats(duration=1.0) + cpu_written_before = stats_before.get('tier_cpu_kv_bytes_written_gb', 0) + + # Allocate cache entry (goes to CPU since GPU is disabled) + success, location, _ = multi_tier_cache_cpu_only.allocate_cache("test_entry", num_tokens=100) + assert success + assert location == 'cpu' + + # Check that CPU write bytes increased + stats_after = multi_tier_cache_cpu_only.get_stats(duration=1.0) + cpu_written_after = stats_after.get('tier_cpu_kv_bytes_written_gb', 0) + + assert cpu_written_after > cpu_written_before, \ + f"CPU write bytes should increase: {cpu_written_before} -> {cpu_written_after}" + + def test_cpu_read_bytes_increment_on_access(self, multi_tier_cache_cpu_only): + """Accessing from CPU tier should increment tier_cpu_kv_bytes_read.""" + # Allocate first + multi_tier_cache_cpu_only.allocate_cache("test_entry", num_tokens=100) + + # Get stats before access + stats_before = multi_tier_cache_cpu_only.get_stats(duration=1.0) + cpu_read_before = stats_before.get('tier_cpu_kv_bytes_read_gb', 0) + + # Access the cache entry + location, _ = multi_tier_cache_cpu_only.access_cache("test_entry", InferencePhase.DECODE) + assert location == 'cpu' + + # Check 
that CPU read bytes increased + stats_after = multi_tier_cache_cpu_only.get_stats(duration=1.0) + cpu_read_after = stats_after.get('tier_cpu_kv_bytes_read_gb', 0) + + assert cpu_read_after > cpu_read_before, \ + f"CPU read bytes should increase: {cpu_read_before} -> {cpu_read_after}" + + def test_gpu_bytes_zero_when_gpu_disabled(self, multi_tier_cache_cpu_only): + """With GPU disabled (0 GB), GPU tier bytes should remain zero.""" + # Do some allocations and accesses + for i in range(5): + multi_tier_cache_cpu_only.allocate_cache(f"entry_{i}", num_tokens=100) + for i in range(5): + multi_tier_cache_cpu_only.access_cache(f"entry_{i}", InferencePhase.DECODE) + + stats = multi_tier_cache_cpu_only.get_stats(duration=1.0) + + # GPU bytes should be zero since GPU tier is disabled + assert stats.get('tier_gpu_kv_bytes_written_gb', 0) == 0, \ + "GPU write bytes should be 0 when GPU disabled" + assert stats.get('tier_gpu_kv_bytes_read_gb', 0) == 0, \ + "GPU read bytes should be 0 when GPU disabled" + + def test_storage_tier_overflow(self, tiny_model_config): + """When CPU is full, allocations should overflow to storage tier.""" + # Create cache with very small CPU limit + cache = MultiTierCache( + model_config=tiny_model_config, + gpu_memory_gb=0, + cpu_memory_gb=0.001, # 1MB - very small + seed=42 + ) + + # Allocate enough to overflow CPU + for i in range(20): + cache.allocate_cache(f"entry_{i}", num_tokens=1000) + + stats = cache.get_stats(duration=1.0) + + # Storage tier should have received some data + storage_written = stats.get('tier_storage_kv_bytes_written_gb', 0) + assert storage_written > 0, \ + f"Storage tier should have data when CPU overflows: {storage_written}" + + def test_per_tier_bandwidth_calculated(self, multi_tier_cache_cpu_only): + """Per-tier bandwidth stats should be calculated.""" + # Do some I/O + for i in range(10): + multi_tier_cache_cpu_only.allocate_cache(f"entry_{i}", num_tokens=100) + for i in range(10): + 
multi_tier_cache_cpu_only.access_cache(f"entry_{i}", InferencePhase.DECODE) + + stats = multi_tier_cache_cpu_only.get_stats(duration=1.0) + + # Bandwidth stats should exist + assert 'tier_cpu_read_bandwidth_gbps' in stats + assert 'tier_cpu_write_bandwidth_gbps' in stats + assert 'tier_storage_read_bandwidth_gbps' in stats + assert 'tier_storage_write_bandwidth_gbps' in stats + + +@pytest.mark.skipif(not CUDA_AVAILABLE, reason="CUDA not available") +class TestPerTierPhaseMetricsWithGPU: + """Tests for per-tier metrics when GPU is enabled.""" + + @pytest.fixture + def tiny_model_config(self): + """Return the tiny-1b model config for fast tests.""" + return MODEL_CONFIGS['tiny-1b'] + + @pytest.fixture + def multi_tier_cache_with_gpu(self, tiny_model_config): + """Return a MultiTierCache with GPU enabled.""" + return MultiTierCache( + model_config=tiny_model_config, + gpu_memory_gb=1.0, # 1GB GPU + cpu_memory_gb=0.1, # 100MB CPU + seed=42 + ) + + def test_gpu_write_bytes_increment_on_allocate(self, multi_tier_cache_with_gpu): + """Allocating to GPU tier should increment tier_gpu_kv_bytes_written.""" + # Get initial stats + stats_before = multi_tier_cache_with_gpu.get_stats(duration=1.0) + gpu_written_before = stats_before.get('tier_gpu_kv_bytes_written_gb', 0) + + # Allocate cache entry (should go to GPU first) + success, location, _ = multi_tier_cache_with_gpu.allocate_cache("test_entry", num_tokens=100) + assert success + assert location == 'gpu' + + # Check that GPU write bytes increased + stats_after = multi_tier_cache_with_gpu.get_stats(duration=1.0) + gpu_written_after = stats_after.get('tier_gpu_kv_bytes_written_gb', 0) + + assert gpu_written_after > gpu_written_before, \ + f"GPU write bytes should increase: {gpu_written_before} -> {gpu_written_after}" + + def test_gpu_read_bytes_increment_on_access(self, multi_tier_cache_with_gpu): + """Accessing from GPU tier should increment tier_gpu_kv_bytes_read.""" + # Allocate first + 
multi_tier_cache_with_gpu.allocate_cache("test_entry", num_tokens=100) + + # Get stats before access + stats_before = multi_tier_cache_with_gpu.get_stats(duration=1.0) + gpu_read_before = stats_before.get('tier_gpu_kv_bytes_read_gb', 0) + + # Access the cache entry + location, _ = multi_tier_cache_with_gpu.access_cache("test_entry", InferencePhase.DECODE) + assert location == 'gpu' + + # Check that GPU read bytes increased + stats_after = multi_tier_cache_with_gpu.get_stats(duration=1.0) + gpu_read_after = stats_after.get('tier_gpu_kv_bytes_read_gb', 0) + + assert gpu_read_after > gpu_read_before, \ + f"GPU read bytes should increase: {gpu_read_before} -> {gpu_read_after}" + + def test_gpu_bandwidth_calculated(self, multi_tier_cache_with_gpu): + """GPU tier bandwidth stats should be calculated.""" + # Do some I/O + for i in range(5): + multi_tier_cache_with_gpu.allocate_cache(f"entry_{i}", num_tokens=100) + for i in range(5): + multi_tier_cache_with_gpu.access_cache(f"entry_{i}", InferencePhase.DECODE) + + stats = multi_tier_cache_with_gpu.get_stats(duration=1.0) + + # GPU bandwidth stats should exist + assert 'tier_gpu_read_bandwidth_gbps' in stats + assert 'tier_gpu_write_bandwidth_gbps' in stats + + +# ============================================================================= +# Test: Trace Replay (Streaming Iterator, Timestamp Pacing, Replay Cycles) +# ============================================================================= + +class TestTraceReplay: + """Tests for BurstGPT trace streaming iterator and replay logic.""" + + @pytest.fixture + def trace_dir(self, tmp_path): + """Create a temporary directory with small BurstGPT CSV trace files.""" + # File 1: 5 rows + csv1 = tmp_path / "BurstGPT_1.csv" + csv1.write_text( + "Timestamp,Model,Request tokens,Response tokens,Total tokens,Log Type\n" + "0,ChatGPT,100,20,120,Conversation log\n" + "10,ChatGPT,200,40,240,Conversation log\n" + "20,GPT-4,300,60,360,Conversation log\n" + 
"30,ChatGPT,400,80,480,Conversation log\n" + "40,ChatGPT,500,100,600,Conversation log\n" + ) + # File 2: 3 rows with timestamps continuing from file 1 + csv2 = tmp_path / "BurstGPT_2.csv" + csv2.write_text( + "Timestamp,Model,Request tokens,Response tokens,Total tokens,Log Type\n" + "50,GPT-4,150,30,180,Conversation log\n" + "60,ChatGPT,250,50,300,Conversation log\n" + "70,GPT-4,350,70,420,Conversation log\n" + ) + return tmp_path + + @pytest.fixture + def benchmark_with_trace(self, trace_dir): + """Return an IntegratedBenchmark configured for trace replay testing.""" + model_config = MODEL_CONFIGS['tiny-1b'] + bench = IntegratedBenchmark( + model_config=model_config, + num_users=5, + gpu_memory_gb=0, + cpu_memory_gb=0.01, + duration_seconds=30, + use_burst_trace=True, + burst_trace_path=str(trace_dir), + generation_mode=GenerationMode.NONE, + trace_speedup=0, # no delay for testing + replay_cycles=1, # single pass + ) + return bench + + def test_resolve_trace_files_from_directory(self, trace_dir): + """Passing a directory should resolve all CSVs sorted by name.""" + model_config = MODEL_CONFIGS['tiny-1b'] + bench = IntegratedBenchmark( + model_config=model_config, + num_users=1, + gpu_memory_gb=0, + cpu_memory_gb=0.01, + duration_seconds=5, + use_burst_trace=True, + burst_trace_path=str(trace_dir), + generation_mode=GenerationMode.NONE, + trace_speedup=0, + replay_cycles=1, + ) + assert len(bench.burst_trace_files) == 2 + assert 'BurstGPT_1.csv' in bench.burst_trace_files[0] + assert 'BurstGPT_2.csv' in bench.burst_trace_files[1] + + def test_resolve_single_file(self, trace_dir): + """Passing a single CSV file should resolve to a list of one.""" + csv_path = str(trace_dir / "BurstGPT_1.csv") + model_config = MODEL_CONFIGS['tiny-1b'] + bench = IntegratedBenchmark( + model_config=model_config, + num_users=1, + gpu_memory_gb=0, + cpu_memory_gb=0.01, + duration_seconds=5, + use_burst_trace=True, + burst_trace_path=csv_path, + generation_mode=GenerationMode.NONE, + 
trace_speedup=0, + replay_cycles=1, + ) + assert len(bench.burst_trace_files) == 1 + + def test_streaming_iterator_yields_all_rows(self, benchmark_with_trace): + """Streaming iterator should yield all rows across all files.""" + rows = list(benchmark_with_trace._burst_trace_iterator()) + assert len(rows) == 8 # 5 from file 1 + 3 from file 2 + + def test_streaming_iterator_tuple_format(self, benchmark_with_trace): + """Each yielded row should be (timestamp, context, generate, total).""" + row = next(iter(benchmark_with_trace._burst_trace_iterator())) + timestamp, context, generate, total = row + assert timestamp == 0.0 + assert context == 100 + assert generate == 20 + assert total == 120 + + def test_streaming_iterator_preserves_order(self, benchmark_with_trace): + """Rows should come in file order: all of file 1 then all of file 2.""" + rows = list(benchmark_with_trace._burst_trace_iterator()) + timestamps = [r[0] for r in rows] + # Timestamps should be monotonically increasing across both files + for i in range(1, len(timestamps)): + assert timestamps[i] > timestamps[i-1], \ + f"Timestamp at index {i} ({timestamps[i]}) should be > {timestamps[i-1]}" + + def test_replay_cycles_one_pass(self, trace_dir): + """With replay_cycles=1, generator should process all rows once then stop.""" + import threading + model_config = MODEL_CONFIGS['tiny-1b'] + bench = IntegratedBenchmark( + model_config=model_config, + num_users=5, + gpu_memory_gb=0, + cpu_memory_gb=0.01, + duration_seconds=60, + use_burst_trace=True, + burst_trace_path=str(trace_dir), + generation_mode=GenerationMode.NONE, + trace_speedup=0, + replay_cycles=1, + ) + + stop_event = threading.Event() + bench.stop_event = stop_event + + # Run generator in a thread + gen_thread = threading.Thread( + target=bench._generate_requests_from_trace, + args=(stop_event,), + daemon=True + ) + gen_thread.start() + gen_thread.join(timeout=10) + + # stop_event should have been set by the generator after 1 cycle + assert 
stop_event.is_set(), "stop_event should be set after replay_cycles=1 completes" + + # Queue should have exactly 8 requests (5 + 3) + count = 0 + while not bench.request_queue.empty(): + bench.request_queue.get_nowait() + count += 1 + assert count == 8, f"Expected 8 requests from 1 cycle, got {count}" + + def test_replay_cycles_two_passes(self, trace_dir): + """With replay_cycles=2, generator should process all rows twice.""" + import threading + model_config = MODEL_CONFIGS['tiny-1b'] + bench = IntegratedBenchmark( + model_config=model_config, + num_users=5, + gpu_memory_gb=0, + cpu_memory_gb=0.01, + duration_seconds=60, + use_burst_trace=True, + burst_trace_path=str(trace_dir), + generation_mode=GenerationMode.NONE, + trace_speedup=0, + replay_cycles=2, + ) + + stop_event = threading.Event() + bench.stop_event = stop_event + + gen_thread = threading.Thread( + target=bench._generate_requests_from_trace, + args=(stop_event,), + daemon=True + ) + gen_thread.start() + gen_thread.join(timeout=10) + + assert stop_event.is_set() + count = 0 + while not bench.request_queue.empty(): + bench.request_queue.get_nowait() + count += 1 + assert count == 16, f"Expected 16 requests from 2 cycles, got {count}" + + def test_total_tokens_tracked(self, benchmark_with_trace): + """Total tokens from trace should be summed correctly.""" + rows = list(benchmark_with_trace._burst_trace_iterator()) + expected_total = sum(r[3] for r in rows) + # 120+240+360+480+600 + 180+300+420 = 2700 + assert expected_total == 2700 + + def test_trace_speedup_zero_no_sleep(self, trace_dir): + """trace_speedup=0 should skip all timestamp delays (fast).""" + import threading + model_config = MODEL_CONFIGS['tiny-1b'] + bench = IntegratedBenchmark( + model_config=model_config, + num_users=5, + gpu_memory_gb=0, + cpu_memory_gb=0.01, + duration_seconds=60, + use_burst_trace=True, + burst_trace_path=str(trace_dir), + generation_mode=GenerationMode.NONE, + trace_speedup=0, + replay_cycles=1, + ) + + stop_event = 
threading.Event() + bench.stop_event = stop_event + + start = time.time() + gen_thread = threading.Thread( + target=bench._generate_requests_from_trace, + args=(stop_event,), + daemon=True + ) + gen_thread.start() + gen_thread.join(timeout=10) + elapsed = time.time() - start + + # With speedup=0, should finish almost instantly (< 2s) + assert elapsed < 2.0, f"speedup=0 should be near-instant, took {elapsed:.2f}s" + + +# ============================================================================= +# Test: Eviction Tracing +# ============================================================================= + +class TestEvictionTracing: + """Test that traces eviction behavior in the multi-tier cache.""" + + def test_eviction_lifecycle(self): + """Trace the full eviction lifecycle: fill tier, trigger eviction, verify entries removed.""" + model_config = MODEL_CONFIGS['tiny-1b'] + # tiny-1b: ~24KB per token of KV cache. + # 10 tokens per entry = ~240KB per entry. + # storage_capacity_gb=0.01 (~10MB) fits ~42 entries before eviction. 
+ cache = MultiTierCache( + model_config=model_config, + gpu_memory_gb=0, + cpu_memory_gb=0.001, # ~1MB CPU to trigger overflow quickly + seed=42, + storage_capacity_gb=0.01 # ~10MB storage to trigger NVMe eviction + ) + + eviction_log = [] + allocated_keys = [] + allocated_tiers = {} + + # Phase 1: Fill both CPU and storage tiers (200 entries >> ~42 capacity) + for i in range(200): + key = f"evict_test_{i}" + success, tier, latency = cache.allocate_cache(key, num_tokens=10) + if success: + allocated_keys.append(key) + allocated_tiers[key] = tier + eviction_log.append(('allocate', key, tier)) + + # Phase 2: Check that evictions occurred + # Note: evictions counter is in cache.stats directly, not in get_stats() output + evictions = cache.stats['evictions'] + eviction_log.append(('stats', 'evictions', evictions)) + + # Phase 3: Verify some early keys were evicted (no longer in cache) + evicted_count = 0 + surviving_count = 0 + for key in allocated_keys[:50]: # Check first 50 keys + if key in cache.cache_entries: + surviving_count += 1 + else: + evicted_count += 1 + eviction_log.append(('evicted', key, None)) + + # Assertions + assert evictions > 0, \ + f"Evictions should have occurred with tiny capacity. Log: {eviction_log[:20]}" + assert evicted_count > 0, \ + f"Some early entries should have been evicted. 
" \ + f"Evicted: {evicted_count}, Surviving: {surviving_count}" + + # Phase 4: Verify later keys are still accessible + late_key = allocated_keys[-1] + assert late_key in cache.cache_entries, \ + f"Most recent key '{late_key}' should still be in cache" + + +# ============================================================================= +# Test: Bottleneck Profiling +# ============================================================================= + +class TestBottleneckProfiling: + """Profile bottleneck detection in the KV cache benchmark.""" + + def test_profile_allocate_vs_access_overhead(self): + """Profile allocate vs access operations to identify bottleneck ratios.""" + import time as time_mod + + model_config = MODEL_CONFIGS['tiny-1b'] + cache = MultiTierCache( + model_config=model_config, + gpu_memory_gb=0, + cpu_memory_gb=0.1, # 100MB + seed=42 + ) + + num_ops = 500 + keys = [f"profile_key_{i}" for i in range(num_ops)] + + # Profile allocations (write path) + alloc_start = time_mod.perf_counter() + for key in keys: + cache.allocate_cache(key, num_tokens=100) + alloc_elapsed = time_mod.perf_counter() - alloc_start + + # Profile accesses (read path) + access_start = time_mod.perf_counter() + for key in keys: + cache.access_cache(key, InferencePhase.DECODE) + access_elapsed = time_mod.perf_counter() - access_start + + alloc_per_op_us = (alloc_elapsed / num_ops) * 1e6 + access_per_op_us = (access_elapsed / num_ops) * 1e6 + + # Profile lock contention: metadata_lock acquire time + lock_times = [] + for _ in range(100): + t0 = time_mod.perf_counter() + with cache.metadata_lock: + pass + lock_times.append((time_mod.perf_counter() - t0) * 1e6) + avg_lock_us = sum(lock_times) / len(lock_times) + + # Profile stats collection overhead + stats_start = time_mod.perf_counter() + for _ in range(100): + cache.get_stats(duration=1.0) + stats_elapsed = time_mod.perf_counter() - stats_start + stats_per_call_us = (stats_elapsed / 100) * 1e6 + + # Assertions: ensure no single 
operation is unreasonably slow + # These thresholds are generous — the point is detecting regressions + assert alloc_per_op_us < 50000, \ + f"Allocation too slow: {alloc_per_op_us:.0f} us/op (threshold: 50ms)" + assert access_per_op_us < 50000, \ + f"Access too slow: {access_per_op_us:.0f} us/op (threshold: 50ms)" + assert avg_lock_us < 1000, \ + f"Lock contention too high: {avg_lock_us:.0f} us/acquire (threshold: 1ms)" + assert stats_per_call_us < 100000, \ + f"get_stats() too slow: {stats_per_call_us:.0f} us/call (threshold: 100ms)" + + # Report profiling results for visibility in test output + print(f"\n --- Bottleneck Profile ({num_ops} ops) ---") + print(f" Allocate: {alloc_per_op_us:>8.1f} us/op ({num_ops / alloc_elapsed:>8.0f} ops/s)") + print(f" Access: {access_per_op_us:>8.1f} us/op ({num_ops / access_elapsed:>8.0f} ops/s)") + print(f" Lock: {avg_lock_us:>8.1f} us/acquire") + print(f" get_stats(): {stats_per_call_us:>8.1f} us/call") + print(f" Write:Read ratio: {alloc_per_op_us / max(access_per_op_us, 0.01):.2f}x") + + +# ============================================================================= +# Test: Validation for new CLI args (trace_speedup, replay_cycles) +# ============================================================================= + +class TestValidateNewTraceArgs: + """Validation tests for --trace-speedup and --replay-cycles.""" + + @pytest.fixture + def valid_args(self): + import argparse + return argparse.Namespace( + num_users=100, duration=60, gpu_mem_gb=16, cpu_mem_gb=32, + rag_num_docs=10, max_conversations=500, max_concurrent_allocs=0, + request_rate=0, max_requests=0, target_saturation=0.8, + cache_dir=None, storage_capacity_gb=0, precondition_size_gb=0, + precondition_threads=0, trace_speedup=1.0, replay_cycles=0 + ) + + def test_trace_speedup_negative_rejected(self, valid_args): + valid_args.trace_speedup = -1.0 + with pytest.raises(ValueError, match="trace-speedup cannot be negative"): + validate_args(valid_args) + + def 
test_trace_speedup_zero_accepted(self, valid_args): + valid_args.trace_speedup = 0 + result = validate_args(valid_args) + assert result.trace_speedup == 0 + + def test_trace_speedup_positive_accepted(self, valid_args): + valid_args.trace_speedup = 100.0 + result = validate_args(valid_args) + assert result.trace_speedup == 100.0 + + def test_replay_cycles_negative_rejected(self, valid_args): + valid_args.replay_cycles = -1 + with pytest.raises(ValueError, match="replay-cycles cannot be negative"): + validate_args(valid_args) + + def test_replay_cycles_zero_accepted(self, valid_args): + valid_args.replay_cycles = 0 + result = validate_args(valid_args) + assert result.replay_cycles == 0 + + def test_replay_cycles_positive_accepted(self, valid_args): + valid_args.replay_cycles = 5 + result = validate_args(valid_args) + assert result.replay_cycles == 5 + + # ============================================================================= # Main entry point for running without pytest # ============================================================================= @@ -885,8 +2397,10 @@ def pytest_configure(config): """Add metadata to pytest-html report.""" if hasattr(config, '_metadata'): config._metadata['Project'] = 'MLPerf v3 KV Cache Benchmark' + config._metadata['Source File'] = 'kv-cache.py' config._metadata['Models'] = 'tiny-1b, mistral-7b, llama2-7b, llama3.1-8b, llama3.1-70b-instruct' config._metadata['Test File'] = 'test_kv_cache.py' + config._metadata['New Features Tested'] = 'ConfigLoader, Extended QoS (p999/p9999), cfg() helper, storage_* naming, NVMe capacity tracking, NVMe eviction, reset_stats, preconditioning validation, trace streaming iterator, timestamp pacing, replay cycles, eviction tracing, bottleneck profiling' def pytest_html_report_title(report): diff --git a/kv_cache_benchmark/tests/unit_test_results/kv-cache-test-report.html b/kv_cache_benchmark/tests/unit_test_results/kv-cache-test-report.html deleted file mode 100644 index 1f4a7fa3..00000000 --- 
a/kv_cache_benchmark/tests/unit_test_results/kv-cache-test-report.html
+++ /dev/null
@@ -1,1091 +0,0 @@
-[deleted: generated pytest-html report — "Report generated on 12-Jan-2026 at 16:00:59 by pytest-html v4.1.1"; "112 tests took 00:01:19"; 0 Failed, 112 Passed, 0 Skipped, 0 Expected failures, 0 Unexpected passes, 0 Errors, 0 Reruns]
- \ No newline at end of file diff --git a/kv_cache_benchmark/utils/json_to_xlsx.py b/kv_cache_benchmark/utils/json_to_xlsx.py index 79a044d3..b0dcb0e9 100644 --- a/kv_cache_benchmark/utils/json_to_xlsx.py +++ b/kv_cache_benchmark/utils/json_to_xlsx.py @@ -1,128 +1,193 @@ -import os -import json -import pandas as pd -import glob -import argparse - -def process_json_files(input_dir='.', output_file='mlperf_storage_summary.xlsx'): - # Find all json files in the specified directory - json_pattern = os.path.join(input_dir, '*.json') - json_files = glob.glob(json_pattern) - - if not json_files: - print(f"No JSON files found in {input_dir}") - return - - data_list = [] - - for json_file in json_files: - try: - with open(json_file, 'r') as f: - data = json.load(f) - - # Extract summary data - summary = data.get('summary', {}) - if not summary: - print(f"Warning: No 'summary' key found in {json_file}") - continue - - # Helper to safely get nested keys - def get_nested(d, keys, default=None): - for key in keys: - if isinstance(d, dict): - d = d.get(key, default) - else: - return default - return d - - # Calculate storage throughput from root-level fields - # This is the correct metric: tokens / total_storage_io_latency - total_tokens = data.get('total_tokens_generated', 0) - total_io_latency = data.get('total_storage_io_latency', 0) - storage_throughput = total_tokens / total_io_latency if total_io_latency > 0 else None - - # Also get requests completed for storage requests/sec - requests_completed = data.get('requests_completed', 0) - storage_requests_per_sec = requests_completed / total_io_latency if total_io_latency > 0 else None - - # Build the row for this file - row = { - 'Filename': json_file, - # Storage throughput is the PRIMARY metric for MLPerf Storage benchmark - 'Storage Throughput (tokens/sec)': storage_throughput, - 'Storage Requests/sec': storage_requests_per_sec, - 'Total I/O Time (s)': total_io_latency, - # Wall-clock throughput (for reference only - NOT 
for tier comparison) - 'Wall-Clock Throughput (tokens/sec)': summary.get('avg_throughput_tokens_per_sec'), - 'Wall-Clock Requests/sec': summary.get('requests_per_second'), - 'Total Tokens': summary.get('total_tokens') or total_tokens, - 'Total Requests': summary.get('total_requests') or requests_completed, - - # End to End Latency - 'E2E Latency Mean (ms)': get_nested(summary, ['end_to_end_latency_ms', 'mean']), - 'E2E Latency P50 (ms)': get_nested(summary, ['end_to_end_latency_ms', 'p50']), - 'E2E Latency P95 (ms)': get_nested(summary, ['end_to_end_latency_ms', 'p95']), - 'E2E Latency P99 (ms)': get_nested(summary, ['end_to_end_latency_ms', 'p99']), - - # Generation Latency - 'Gen Latency Mean (ms)': get_nested(summary, ['generation_latency_ms', 'mean']), - 'Gen Latency P50 (ms)': get_nested(summary, ['generation_latency_ms', 'p50']), - 'Gen Latency P95 (ms)': get_nested(summary, ['generation_latency_ms', 'p95']), - 'Gen Latency P99 (ms)': get_nested(summary, ['generation_latency_ms', 'p99']), - - # Storage IO Latency - 'Storage Latency Mean (ms)': get_nested(summary, ['storage_io_latency_ms', 'mean']), - 'Storage Latency P50 (ms)': get_nested(summary, ['storage_io_latency_ms', 'p50']), - 'Storage Latency P95 (ms)': get_nested(summary, ['storage_io_latency_ms', 'p95']), - 'Storage Latency P99 (ms)': get_nested(summary, ['storage_io_latency_ms', 'p99']), - - # Cache Stats - 'Cache Hit Rate': get_nested(summary, ['cache_stats', 'cache_hit_rate']), - 'Read/Write Ratio': get_nested(summary, ['cache_stats', 'read_write_ratio']), - 'Total Read (GB)': get_nested(summary, ['cache_stats', 'total_read_gb']), - 'Total Write (GB)': get_nested(summary, ['cache_stats', 'total_write_gb']), - 'Prefill Bytes Written (GB)': get_nested(summary, ['cache_stats', 'prefill_bytes_written_gb']), - 'Decode Bytes Read (GB)': get_nested(summary, ['cache_stats', 'decode_bytes_read_gb']), - } - - data_list.append(row) - print(f"Processed {json_file}") - - except Exception as e: - print(f"Error 
processing {json_file}: {e}") - - if not data_list: - print("No valid data extracted.") - return - - # Create DataFrame - df = pd.DataFrame(data_list) - - # Sort by Filename - df = df.sort_values('Filename') - - # Save to Excel - try: - df.to_excel(output_file, index=False) - print(f"\nSuccessfully created {output_file} with {len(df)} records.") - print("\nColumns included:") - print(df.columns.tolist()) - print("\nPreview of data (Storage Throughput is the correct metric for tier comparison):") - preview_cols = ['Filename', 'Storage Throughput (tokens/sec)', 'Total I/O Time (s)', 'Total Tokens'] - available_cols = [c for c in preview_cols if c in df.columns] - print(df[available_cols].to_string()) - except Exception as e: - print(f"Error saving Excel file: {e}") - # Fallback to CSV if Excel fails (e.g. missing openpyxl) - csv_file = output_file.replace('.xlsx', '.csv') - print(f"Attempting to save as CSV to {csv_file}...") - df.to_csv(csv_file, index=False) - print(f"Successfully created {csv_file}") - -if __name__ == "__main__": - parser = argparse.ArgumentParser(description='Convert JSON benchmark results to Excel') - parser.add_argument('--input-dir', '-i', default='.', help='Directory containing JSON files') - parser.add_argument('--output', '-o', default='mlperf_storage_summary.xlsx', help='Output Excel filename') - args = parser.parse_args() - - process_json_files(input_dir=args.input_dir, output_file=args.output) +import os +import json +import pandas as pd +import glob +import argparse + +def process_json_files(input_dir='.', output_file='mlperf_storage_summary.xlsx'): + # Find all json files in the specified directory + json_pattern = os.path.join(input_dir, '*.json') + json_files = glob.glob(json_pattern) + + if not json_files: + print(f"No JSON files found in {input_dir}") + return + + data_list = [] + + for json_file in json_files: + try: + with open(json_file, 'r') as f: + data = json.load(f) + + # Extract summary data + summary = data.get('summary', 
{}) + if not summary: + print(f"Warning: No 'summary' key found in {json_file}") + continue + + # Helper to safely get nested keys + def get_nested(d, keys, default=None): + for key in keys: + if isinstance(d, dict): + d = d.get(key, default) + else: + return default + return d + + # Calculate storage throughput from root-level fields + # This is the correct metric: tokens / total_storage_io_latency + total_tokens = data.get('total_tokens_generated', 0) + total_io_latency = data.get('total_storage_io_latency', 0) + storage_throughput = total_tokens / total_io_latency if total_io_latency > 0 else None + + # Also get requests completed for storage requests/sec + requests_completed = data.get('requests_completed', 0) + storage_requests_per_sec = requests_completed / total_io_latency if total_io_latency > 0 else None + + # Build the row for this file + row = { + 'Filename': json_file, + + # === THROUGHPUT METRICS === + # Storage throughput is the PRIMARY metric for MLPerf Storage benchmark + 'Storage Throughput (tok/s)': storage_throughput, + 'Storage Requests/sec': storage_requests_per_sec, + 'Total I/O Time (s)': total_io_latency, + # Wall-clock throughput (for reference only - NOT for tier comparison) + 'Avg Throughput (tok/s)': summary.get('avg_throughput_tokens_per_sec'), + 'Requests/sec': summary.get('requests_per_second'), + 'Total Tokens': summary.get('total_tokens') or total_tokens, + 'Total Requests': summary.get('total_requests') or requests_completed, + 'Elapsed Time (s)': summary.get('elapsed_time'), + + # === END-TO-END LATENCY === + 'E2E Latency Mean (ms)': get_nested(summary, ['end_to_end_latency_ms', 'mean']), + 'E2E Latency P50 (ms)': get_nested(summary, ['end_to_end_latency_ms', 'p50']), + 'E2E Latency P95 (ms)': get_nested(summary, ['end_to_end_latency_ms', 'p95']), + 'E2E Latency P99 (ms)': get_nested(summary, ['end_to_end_latency_ms', 'p99']), + 'E2E Latency P99.9 (ms)': get_nested(summary, ['end_to_end_latency_ms', 'p999']), + 'E2E Latency P99.99 
(ms)': get_nested(summary, ['end_to_end_latency_ms', 'p9999']), + + # === STORAGE I/O LATENCY (aggregate) === + 'Storage Latency Mean (ms)': get_nested(summary, ['storage_io_latency_ms', 'mean']), + 'Storage Latency P50 (ms)': get_nested(summary, ['storage_io_latency_ms', 'p50']), + 'Storage Latency P95 (ms)': get_nested(summary, ['storage_io_latency_ms', 'p95']), + 'Storage Latency P99 (ms)': get_nested(summary, ['storage_io_latency_ms', 'p99']), + 'Storage Latency P99.9 (ms)': get_nested(summary, ['storage_io_latency_ms', 'p999']), + 'Storage Latency P99.99 (ms)': get_nested(summary, ['storage_io_latency_ms', 'p9999']), + + # === GENERATION LATENCY (simulated GPU work) === + 'Gen Latency Mean (ms)': get_nested(summary, ['generation_latency_ms', 'mean']), + 'Gen Latency P50 (ms)': get_nested(summary, ['generation_latency_ms', 'p50']), + 'Gen Latency P95 (ms)': get_nested(summary, ['generation_latency_ms', 'p95']), + 'Gen Latency P99 (ms)': get_nested(summary, ['generation_latency_ms', 'p99']), + + # === STORAGE TIER TOTAL LATENCY (Host + Device) === + 'Storage Tier Read Total P50 (ms)': get_nested(summary, ['cache_stats', 'storage_read_p50_ms']), + 'Storage Tier Read Total P95 (ms)': get_nested(summary, ['cache_stats', 'storage_read_p95_ms']), + 'Storage Tier Read Total P99 (ms)': get_nested(summary, ['cache_stats', 'storage_read_p99_ms']), + 'Storage Tier Read Total P99.9 (ms)': get_nested(summary, ['cache_stats', 'storage_read_p999_ms']), + 'Storage Tier Read Total P99.99 (ms)': get_nested(summary, ['cache_stats', 'storage_read_p9999_ms']), + 'Storage Tier Write Total P50 (ms)': get_nested(summary, ['cache_stats', 'storage_write_p50_ms']), + 'Storage Tier Write Total P95 (ms)': get_nested(summary, ['cache_stats', 'storage_write_p95_ms']), + 'Storage Tier Write Total P99 (ms)': get_nested(summary, ['cache_stats', 'storage_write_p99_ms']), + 'Storage Tier Write Total P99.9 (ms)': get_nested(summary, ['cache_stats', 'storage_write_p999_ms']), + 'Storage Tier Write 
Total P99.99 (ms)': get_nested(summary, ['cache_stats', 'storage_write_p9999_ms']), + + # === STORAGE TIER DEVICE LATENCY (actual disk I/O - PRIMARY METRIC) === + 'Storage Tier Read Device P50 (ms)': get_nested(summary, ['cache_stats', 'storage_read_device_p50_ms']), + 'Storage Tier Read Device P95 (ms)': get_nested(summary, ['cache_stats', 'storage_read_device_p95_ms']), + 'Storage Tier Read Device P99 (ms)': get_nested(summary, ['cache_stats', 'storage_read_device_p99_ms']), + 'Storage Tier Read Device P99.9 (ms)': get_nested(summary, ['cache_stats', 'storage_read_device_p999_ms']), + 'Storage Tier Read Device P99.99 (ms)': get_nested(summary, ['cache_stats', 'storage_read_device_p9999_ms']), + 'Storage Tier Write Device P50 (ms)': get_nested(summary, ['cache_stats', 'storage_write_device_p50_ms']), + 'Storage Tier Write Device P95 (ms)': get_nested(summary, ['cache_stats', 'storage_write_device_p95_ms']), + 'Storage Tier Write Device P99 (ms)': get_nested(summary, ['cache_stats', 'storage_write_device_p99_ms']), + 'Storage Tier Write Device P99.9 (ms)': get_nested(summary, ['cache_stats', 'storage_write_device_p999_ms']), + 'Storage Tier Write Device P99.99 (ms)': get_nested(summary, ['cache_stats', 'storage_write_device_p9999_ms']), + + # === STORAGE TIER HOST LATENCY (CPU serialization/deserialization) === + 'Storage Tier Read Host P50 (ms)': get_nested(summary, ['cache_stats', 'storage_read_host_p50_ms']), + 'Storage Tier Read Host P95 (ms)': get_nested(summary, ['cache_stats', 'storage_read_host_p95_ms']), + 'Storage Tier Read Host P99 (ms)': get_nested(summary, ['cache_stats', 'storage_read_host_p99_ms']), + 'Storage Tier Read Host P99.9 (ms)': get_nested(summary, ['cache_stats', 'storage_read_host_p999_ms']), + 'Storage Tier Read Host P99.99 (ms)': get_nested(summary, ['cache_stats', 'storage_read_host_p9999_ms']), + 'Storage Tier Write Host P50 (ms)': get_nested(summary, ['cache_stats', 'storage_write_host_p50_ms']), + 'Storage Tier Write Host P95 (ms)': 
get_nested(summary, ['cache_stats', 'storage_write_host_p95_ms']), + 'Storage Tier Write Host P99 (ms)': get_nested(summary, ['cache_stats', 'storage_write_host_p99_ms']), + 'Storage Tier Write Host P99.9 (ms)': get_nested(summary, ['cache_stats', 'storage_write_host_p999_ms']), + 'Storage Tier Write Host P99.99 (ms)': get_nested(summary, ['cache_stats', 'storage_write_host_p9999_ms']), + + # === CACHE STATS === + 'Cache Hit Rate': get_nested(summary, ['cache_stats', 'cache_hit_rate']), + 'Read/Write Ratio': get_nested(summary, ['cache_stats', 'read_write_ratio']), + 'Total Read (GB)': get_nested(summary, ['cache_stats', 'total_read_gb']), + 'Total Write (GB)': get_nested(summary, ['cache_stats', 'total_write_gb']), + + # === PER-TIER KV BYTES (MLPerf v3.0) === + 'Tier GPU KV Bytes Written (GB)': get_nested(summary, ['cache_stats', 'tier_gpu_kv_bytes_written_gb']), + 'Tier GPU KV Bytes Read (GB)': get_nested(summary, ['cache_stats', 'tier_gpu_kv_bytes_read_gb']), + 'Tier CPU KV Bytes Written (GB)': get_nested(summary, ['cache_stats', 'tier_cpu_kv_bytes_written_gb']), + 'Tier CPU KV Bytes Read (GB)': get_nested(summary, ['cache_stats', 'tier_cpu_kv_bytes_read_gb']), + 'Tier Storage KV Bytes Written (GB)': get_nested(summary, ['cache_stats', 'tier_storage_kv_bytes_written_gb']), + 'Tier Storage KV Bytes Read (GB)': get_nested(summary, ['cache_stats', 'tier_storage_kv_bytes_read_gb']), + + # === PER-TIER BANDWIDTH (GB/s) - PRIMARY METRICS === + 'Tier GPU Read Bandwidth (GB/s)': get_nested(summary, ['cache_stats', 'tier_gpu_read_bandwidth_gbps']), + 'Tier GPU Write Bandwidth (GB/s)': get_nested(summary, ['cache_stats', 'tier_gpu_write_bandwidth_gbps']), + 'Tier CPU Read Bandwidth (GB/s)': get_nested(summary, ['cache_stats', 'tier_cpu_read_bandwidth_gbps']), + 'Tier CPU Write Bandwidth (GB/s)': get_nested(summary, ['cache_stats', 'tier_cpu_write_bandwidth_gbps']), + 'Tier Storage Read Bandwidth (GB/s)': get_nested(summary, ['cache_stats', 
'tier_storage_read_bandwidth_gbps']), + 'Tier Storage Write Bandwidth (GB/s)': get_nested(summary, ['cache_stats', 'tier_storage_write_bandwidth_gbps']), + + # === TIER ENTRY DISTRIBUTION === + 'GPU Entries': get_nested(summary, ['cache_stats', 'gpu_entries']), + 'CPU Entries': get_nested(summary, ['cache_stats', 'cpu_entries']), + 'Storage Entries': get_nested(summary, ['cache_stats', 'storage_entries']), + + # === MULTI-TURN STATS === + 'Multi-turn Hit Rate': get_nested(summary, ['multi_turn_stats', 'hit_rate']), + } + + data_list.append(row) + print(f"Processed {json_file}") + + except Exception as e: + print(f"Error processing {json_file}: {e}") + + if not data_list: + print("No valid data extracted.") + return + + # Create DataFrame + df = pd.DataFrame(data_list) + + # Sort by Filename + df = df.sort_values('Filename') + + # Save to Excel + try: + df.to_excel(output_file, index=False) + print(f"\nSuccessfully created {output_file} with {len(df)} records.") + print("\nColumns included:") + print(df.columns.tolist()) + print(f"\nPreview of data (Storage Throughput is the correct metric for tier comparison):") + preview_cols = ['Filename', 'Storage Throughput (tok/s)', 'Tier Storage Read Bandwidth (GB/s)', 'Total Tokens'] + available_cols = [c for c in preview_cols if c in df.columns] + print(df[available_cols].to_string()) + except Exception as e: + print(f"Error saving Excel file: {e}") + # Fallback to CSV if Excel fails (e.g. 
missing openpyxl) + csv_file = output_file.replace('.xlsx', '.csv') + print(f"Attempting to save as CSV to {csv_file}...") + df.to_csv(csv_file, index=False) + print(f"Successfully created {csv_file}") + +if __name__ == "__main__": + parser = argparse.ArgumentParser(description='Convert JSON benchmark results to Excel') + parser.add_argument('--input-dir', '-i', default='.', help='Directory containing JSON files') + parser.add_argument('--output', '-o', default='mlperf_storage_summary.xlsx', help='Output Excel filename') + args = parser.parse_args() + + process_json_files(input_dir=args.input_dir, output_file=args.output) diff --git a/kv_cache_benchmark/kv-cache-wrapper.sh b/kv_cache_benchmark/utils/kv-cache-wrapper.sh similarity index 90% rename from kv_cache_benchmark/kv-cache-wrapper.sh rename to kv_cache_benchmark/utils/kv-cache-wrapper.sh index 2b648d6a..62714190 100644 --- a/kv_cache_benchmark/kv-cache-wrapper.sh +++ b/kv_cache_benchmark/utils/kv-cache-wrapper.sh @@ -1,7 +1,7 @@ #!/bin/bash # KV Cache Storage Benchmark - Multi-Tier Performance Comparison -# Kingston Digital, 2025 -# Apache 2.0 license +# Hazem Awadallah, Kingston Digital, 2025 +# Assisted by Github Copilot # This script runs a comprehensive comparison of cache tier configurations for LLM inference workloads. # It automatically detects your hardware (GPU, RAM, storage) and runs 9 different test scenarios to show # you exactly where your data ends up and how fast it moves between tiers. 
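The JSON-to-Excel converter above reads every metric through its `get_nested` helper, so a results file missing a whole section (e.g. no `cache_stats`) produces blank cells rather than a crash. A minimal standalone sketch of that pattern, using a hypothetical summary fragment shaped like the benchmark's JSON output:

```python
def get_nested(d, keys, default=None):
    # Walk the key chain; bail out with `default` as soon as a level
    # is missing or is not a dict.
    for key in keys:
        if isinstance(d, dict):
            d = d.get(key, default)
        else:
            return default
    return d

# Hypothetical fragment mirroring the benchmark's summary layout.
summary = {"storage_io_latency_ms": {"mean": 4.2, "p99": 12.5}}

print(get_nested(summary, ["storage_io_latency_ms", "p99"]))   # 12.5
print(get_nested(summary, ["cache_stats", "cache_hit_rate"]))  # None
```

Because every column in the spreadsheet row goes through this helper, one malformed results file degrades to empty cells instead of aborting the whole conversion run.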
@@ -40,6 +40,7 @@ Usage: ./kv-cache-wrapper.sh [options] [model] Options: -m MODEL Model key to benchmark (tiny-1b, mistral-7b, llama3.1-8b, llama2-7b, llama3.1-70b-instruct) + -c DIR Cache directory path (default: auto-detect /mnt/nvme, /mnt/ssd, or /tmp) -t SECONDS Duration for tier comparison tests (default: 120) -s SECONDS Duration for storage saturation test (default: 180) -r SECONDS Duration for realistic production test (default: 180) @@ -57,6 +58,7 @@ EOF # Default configuration (can be overridden via getopts) model="" +cache_dir_override="" tier_duration=120 saturation_duration=180 realistic_duration=180 @@ -67,9 +69,10 @@ users_high_override="" rag_enabled=0 rag_docs_override="" -while getopts ":m:t:s:r:a:w:u:U:RD:h" opt; do +while getopts ":m:c:t:s:r:a:w:u:U:RD:h" opt; do case "$opt" in m) model="$OPTARG" ;; + c) cache_dir_override="$OPTARG" ;; t) tier_duration="$OPTARG" ;; s) saturation_duration="$OPTARG" ;; r) realistic_duration="$OPTARG" ;; @@ -275,15 +278,18 @@ else fi # System detection - Storage path -# Priority: /mnt/nvme > /mnt/ssd > /tmp -cache_dir="/tmp/kvcache_benchmark" -if [ -d "/mnt/nvme" ] && [ -w "/mnt/nvme" ]; then +# Priority: user override > /mnt/nvme > /mnt/ssd > /tmp +if [ -n "$cache_dir_override" ]; then + cache_dir="$cache_dir_override" + echo "Cache directory (user override): $cache_dir" +elif [ -d "/mnt/nvme" ] && [ -w "/mnt/nvme" ]; then cache_dir="/mnt/nvme" echo "NVMe storage path: $cache_dir" elif [ -d "/mnt/ssd" ] && [ -w "/mnt/ssd" ]; then cache_dir="/mnt/ssd" echo "SSD storage path: $cache_dir" else + cache_dir="/tmp/kvcache_benchmark" echo "Warning: using temp storage at $cache_dir (consider mounting NVMe to /mnt/nvme)" fi @@ -367,17 +373,19 @@ if should_run 'capacity-autoscale'; then capacity_model="llama3.1-70b-instruct" python3 kv-cache.py \ + --config config.yaml \ --model "$capacity_model" \ --num-users "$capacity_start_users" \ --duration "$autoscale_duration" \ --gpu-mem-gb 0 \ - --cpu-mem-gb 4 \ + --cpu-mem-gb 0 \ 
         --enable-autoscaling \
         --autoscaler-mode capacity \
         --generation-mode none \
         --cache-dir "$cache_dir" \
         --seed 42 \
-        --output results_autoscaling_capacity.json
+        --output results_autoscaling_capacity.json \
+        --xlsx-output results_autoscaling_capacity.xlsx

     echo ""
     echo "Capacity discovery complete. Check results_autoscaling_capacity.json for peak throughput."
@@ -388,51 +396,39 @@ else
 fi

 # ==============================================================================
-# OFFICIAL MLPERF SUBMISSION WORKLOAD (DISCOVERY-VALIDATED)
+# OFFICIAL MLPERF SUBMISSION WORKLOAD
 # ==============================================================================
-# These invocations have been validated through extensive discovery testing:
-#   - 1,411 Fast system tests (14,000 MB/s NVMe)
-#   - 268 Slow system tests (3,000 MB/s storage)
-#
-# KEY FINDINGS FROM DISCOVERY TESTING:
-#   - Storage Throughput metric is UNRELIABLE at cpu_mem=0GB (only 1.1x differentiation)
-#   - Decode Bytes Read shows 2.62x differentiation at cpu_mem=0GB (100% win rate)
-#   - Wall-Clock Throughput shows 2.43x differentiation at cpu_mem=0GB (100% win rate)
-#   - Storage Throughput works at cpu_mem=4GB (2.2x differentiation, 97% win rate)
-#   - High variance (CV 50-125%) requires multiple trials
+# This is a special workload that runs only the scenarios required for an
+# official MLPerf v3.0 storage submission. It uses fixed, long durations and
+# specific user counts to ensure results are standardized and comparable.
 #
-# This workload runs TWO configurations:
-#   1. Maximum Storage Stress (cpu_mem=0GB) - Use Decode Bytes Read as primary metric
-#   2. Storage Throughput Test (cpu_mem=4GB) - Use Storage Throughput as primary metric
+# NOTE: These parameters are intentionally stressful. They use a high user count
+# with a small CPU memory budget to force near-constant NVMe access. The goal is
+# to saturate the storage device and measure its performance under extreme load.
+# Expect very high latencies; this is not a test of user experience, but a
+# benchmark of the underlying storage hardware's breaking point. See the
+# analysis in `report_analysis.md` for context on why this occurs.
 # ==============================================================================

 if should_run 'mlperf_submission'; then
     echo "============================================================================"
-    echo "RUNNING OFFICIAL MLPERF SUBMISSION WORKLOAD (DISCOVERY-VALIDATED)"
+    echo "RUNNING OFFICIAL MLPERF SUBMISSION WORKLOAD"
     echo "============================================================================"
     echo ""
-    echo "NOTE: Discovery testing validated these configurations across 1,679 tests."
-    echo "      See mlperfv3_results_and_metrics_discovery.md for full analysis."
-    echo ""
-    # -------------------------------------------------------------------------
-    # Test 1: Maximum Storage Stress (cpu_mem=0GB)
-    # Primary Metrics: Decode Bytes Read (2.62x), Wall-Clock Throughput (2.43x)
-    # WARNING: Do NOT use Storage Throughput at cpu_mem=0GB (only 1.1x differentiation)
-    # -------------------------------------------------------------------------
-    echo "[MLPerf 1/4] Maximum Storage Stress: llama3.1-8b, cpu_mem=0GB, 200 users..."
-    echo "  PRIMARY METRICS: Decode Bytes Read, Wall-Clock Throughput"
-    echo "  WARNING: Storage Throughput unreliable at cpu_mem=0GB"
+    echo "[MLPerf 1/4] Standard Submission: llama3.1-8b with 150 users..."
     python3 kv-cache.py \
+        --config config.yaml \
         --model llama3.1-8b \
-        --num-users 200 \
-        --duration 300 \
+        --num-users 150 \
+        --duration 600 \
         --gpu-mem-gb 0 \
         --cpu-mem-gb 0 \
-        --max-concurrent-allocs 16 \
-        --generation-mode none \
+        --generation-mode realistic \
+        --performance-profile throughput \
        --cache-dir "$cache_dir" \
         --seed 42 \
-        --output mlperf_v3_stress_8b.json
+        --output mlperf_v3_stress_8b.json \
+        --xlsx-output mlperf_v3_stress_8b.xlsx

    echo "Maximum storage stress test (8B) complete."
echo "" @@ -443,6 +439,7 @@ if should_run 'mlperf_submission'; then echo "[MLPerf 2/4] Storage Throughput Test: llama3.1-8b, cpu_mem=4GB, 100 users..." echo " PRIMARY METRIC: Storage Throughput (tok/s)" python3 kv-cache.py \ + --config config.yaml \ --model llama3.1-8b \ --num-users 100 \ --duration 300 \ @@ -452,7 +449,8 @@ if should_run 'mlperf_submission'; then --generation-mode none \ --cache-dir "$cache_dir" \ --seed 42 \ - --output mlperf_v3_throughput_8b.json + --output mlperf_v3_throughput_8b.json \ + --xlsx-output mlperf_v3_throughput_8b.xlsx echo "Storage throughput test (8B) complete." echo "" @@ -463,16 +461,18 @@ if should_run 'mlperf_submission'; then echo "[MLPerf 3/4] Large Model Stress: llama3.1-70b-instruct, cpu_mem=0GB, 70 users..." echo " PRIMARY METRICS: Decode Bytes Read, Wall-Clock Throughput" python3 kv-cache.py \ + --config config.yaml \ --model llama3.1-70b-instruct \ - --num-users 70 \ - --duration 300 \ + --num-users 40 \ + --duration 600 \ --gpu-mem-gb 0 \ --cpu-mem-gb 0 \ - --max-concurrent-allocs 4 \ - --generation-mode none \ + --generation-mode realistic \ + --performance-profile throughput \ --cache-dir "$cache_dir" \ --seed 42 \ - --output mlperf_v3_stress_70b.json + --output mlperf_v3_stress_70b.json \ + --xlsx-output mlperf_v3_stress_70b.xlsx echo "Large model storage stress test (70B) complete." echo "" @@ -482,6 +482,7 @@ if should_run 'mlperf_submission'; then echo "[MLPerf 4/4] Large Model Throughput: llama3.1-70b-instruct, cpu_mem=4GB, 50 users..." echo " PRIMARY METRIC: Storage Throughput (tok/s)" python3 kv-cache.py \ + --config config.yaml \ --model llama3.1-70b-instruct \ --num-users 50 \ --duration 300 \ @@ -491,7 +492,8 @@ if should_run 'mlperf_submission'; then --generation-mode none \ --cache-dir "$cache_dir" \ --seed 42 \ - --output mlperf_v3_throughput_70b.json + --output mlperf_v3_throughput_70b.json \ + --xlsx-output mlperf_v3_throughput_70b.xlsx echo "Large model throughput test (70B) complete." 
echo "" @@ -523,15 +525,17 @@ if should_run 'gpu-only'; then if [ "$gpu_available" -eq 1 ]; then echo "[1/10] GPU Only - All cache in VRAM..." python3 kv-cache.py \ + --config config.yaml \ --model $model \ --num-users $users_baseline \ --duration "$tier_duration" \ --gpu-mem-gb $gpu_mem_gb \ - --cpu-mem-gb 4 \ + --cpu-mem-gb 0 \ --generation-mode realistic \ "${rag_args[@]}" \ --seed 42 \ - --output results_tier_gpu_only.json + --output results_tier_gpu_only.json \ + --xlsx-output results_tier_gpu_only.xlsx echo "" echo "GPU test complete. Expect lowest latency but limited capacity." @@ -552,6 +556,7 @@ fi if should_run 'cpu-only'; then echo "[2/10] CPU Only - All cache in RAM..." python3 kv-cache.py \ + --config config.yaml \ --model $model \ --num-users $users_baseline \ --duration "$tier_duration" \ @@ -560,7 +565,8 @@ if should_run 'cpu-only'; then --generation-mode realistic \ "${rag_args[@]}" \ --seed 42 \ - --output results_tier_cpu_only.json + --output results_tier_cpu_only.json \ + --xlsx-output results_tier_cpu_only.xlsx echo "" echo "CPU test complete. This is the typical production configuration." @@ -589,16 +595,18 @@ fi if should_run 'storage-only'; then echo "[3/10] TIER TEST: Storage Only - Pure NVMe/SSD caching..." python3 kv-cache.py \ + --config config.yaml \ --model $model \ --num-users $users_baseline \ --duration "$tier_duration" \ --gpu-mem-gb 0 \ - --cpu-mem-gb 4 \ + --cpu-mem-gb 0 \ --generation-mode realistic \ --cache-dir $cache_dir \ "${rag_args[@]}" \ --seed 42 \ - --output results_tier_storage_only.json + --output results_tier_storage_only.json \ + --xlsx-output results_tier_storage_only.xlsx echo "" echo "Expected: Highest latency, validates NVMe P95 < 200ms for reads" @@ -628,6 +636,7 @@ if should_run 'gpu-cpu'; then if [ "$gpu_available" -eq 1 ]; then echo "[4/10] TIER TEST: GPU + CPU - Two-tier hot/warm caching..." 
python3 kv-cache.py \ + --config config.yaml \ --model $model \ --num-users $users_baseline \ --duration "$tier_duration" \ @@ -636,7 +645,8 @@ if should_run 'gpu-cpu'; then --generation-mode realistic \ "${rag_args[@]}" \ --seed 42 \ - --output results_tier_gpu_cpu.json + --output results_tier_gpu_cpu.json \ + --xlsx-output results_tier_gpu_cpu.xlsx echo "" echo "Expected: Low latency with large capacity" @@ -670,6 +680,7 @@ fi if should_run 'cpu-storage'; then echo "[5/10] TIER TEST: CPU + Storage - RAM with NVMe spillover..." python3 kv-cache.py \ + --config config.yaml \ --model $model \ --num-users $users_high \ --duration "$tier_duration" \ @@ -679,7 +690,8 @@ if should_run 'cpu-storage'; then --cache-dir $cache_dir \ "${rag_args[@]}" \ --seed 42 \ - --output results_tier_cpu_storage.json + --output results_tier_cpu_storage.json \ + --xlsx-output results_tier_cpu_storage.xlsx echo "" echo "Expected: Moderate latency, forces storage spillover with ${users_high} users" @@ -710,6 +722,7 @@ if should_run 'gpu-cpu-storage'; then if [ "$gpu_available" -eq 1 ]; then echo "[6/10] TIER TEST: GPU + CPU + Storage - Full three-tier hierarchy..." python3 kv-cache.py \ + --config config.yaml \ --model $model \ --num-users $users_high \ --duration "$tier_duration" \ @@ -719,7 +732,8 @@ if should_run 'gpu-cpu-storage'; then --cache-dir $cache_dir \ "${rag_args[@]}" \ --seed 42 \ - --output results_tier_gpu_cpu_storage.json + --output results_tier_gpu_cpu_storage.json \ + --xlsx-output results_tier_gpu_cpu_storage.xlsx echo "" echo "Expected: Best overall - hot in GPU, warm in CPU, cold in storage" @@ -752,16 +766,18 @@ fi if should_run 'storage-saturation'; then echo "[7/10] STRESS TEST: Storage Saturation - Maximum NVMe load..." 
python3 kv-cache.py \ + --config config.yaml \ --model $model \ --num-users $users_high \ --duration "$saturation_duration" \ --gpu-mem-gb 0 \ - --cpu-mem-gb 4 \ + --cpu-mem-gb 0 \ --generation-mode realistic \ --cache-dir $cache_dir \ "${rag_args[@]}" \ --seed 42 \ - --output results_stress_storage_saturation.json + --output results_stress_storage_saturation.json \ + --xlsx-output results_stress_storage_saturation.xlsx echo "" echo "Expected: High storage load, validates NVMe can handle ${users_high} users" @@ -796,6 +812,7 @@ fi if should_run 'production'; then echo "[8/10] REALISTIC TEST: Production Workload - Multi-tier with realistic load..." python3 kv-cache.py \ + --config config.yaml \ --model $model \ --num-users $users_baseline \ --duration "$realistic_duration" \ @@ -805,7 +822,8 @@ if should_run 'production'; then --cache-dir $cache_dir \ "${rag_args[@]}" \ --seed 42 \ - --output results_realistic_production.json + --output results_realistic_production.json \ + --xlsx-output results_realistic_production.xlsx echo "" echo "Expected: Balanced performance, realistic production scenario" @@ -839,6 +857,7 @@ fi if should_run 'autoscale'; then echo "[9/10] DISCOVERY TEST: Autoscaling - Find optimal user count..." python3 kv-cache.py \ + --config config.yaml \ --model $model \ --num-users 20 \ --duration "$autoscale_duration" \ @@ -850,7 +869,8 @@ if should_run 'autoscale'; then --cache-dir $cache_dir \ "${rag_args[@]}" \ --seed 42 \ - --output results_autoscaling_discovery.json + --output results_autoscaling_discovery.json \ + --xlsx-output results_autoscaling_discovery.xlsx echo "" echo "Expected: Progressive scaling to find hardware limits" @@ -908,25 +928,15 @@ print("COMPREHENSIVE BENCHMARK ANALYSIS") print("="*100) # Scenario catalog ties each results JSON to a friendly description. 
-# Updated to reflect discovery-validated MLPerf invocations (Jan 2026)
 scenarios = [
-    # MLPerf Stress Tests (cpu_mem=0GB) - Use Decode Bytes Read / Wall-Clock Throughput
-    ("mlperf_stress_8b", "mlperf_v3_stress_8b.json", "MLPerf: Storage Stress (8B, cpu_mem=0GB)", "Maximum storage stress test. PRIMARY METRICS: Decode Bytes Read (2.62x), Wall-Clock Throughput (2.43x). WARNING: Storage Throughput unreliable at cpu_mem=0GB."),
-    ("mlperf_stress_70b", "mlperf_v3_stress_70b.json", "MLPerf: Storage Stress (70B, cpu_mem=0GB)", "Large model storage stress (~10x I/O per token). PRIMARY METRICS: Decode Bytes Read, Wall-Clock Throughput."),
-    # MLPerf Throughput Tests (cpu_mem=4GB) - Use Storage Throughput
-    ("mlperf_throughput_8b", "mlperf_v3_throughput_8b.json", "MLPerf: Storage Throughput (8B, cpu_mem=4GB)", "Storage throughput benchmark. PRIMARY METRIC: Storage Throughput (2.2x differentiation, 97% win rate)."),
-    ("mlperf_throughput_70b", "mlperf_v3_throughput_70b.json", "MLPerf: Storage Throughput (70B, cpu_mem=4GB)", "Large model throughput test. PRIMARY METRIC: Storage Throughput."),
-    # Legacy MLPerf filenames (for backwards compatibility)
-    ("mlperf_submission_8b", "mlperf_v3_storage_submission_8b.json", "MLPerf: Legacy Submission (8B)", "Legacy format. Consider using new discovery-validated invocations."),
-    ("mlperf_submission_70b", "mlperf_v3_storage_submission_70b.json", "MLPerf: Legacy Submission (70B)", "Legacy format. Consider using new discovery-validated invocations."),
-    # Tier tests
+    ("mlperf_submission_8b", "mlperf_v3_stress_8b.json", "MLPerf: Standard Submission (8B)", "Official MLPerf v3.0 storage submission with llama3.1-8b."),
+    ("mlperf_submission_70b", "mlperf_v3_stress_70b.json", "MLPerf: Large Model Submission (70B)", "Official MLPerf v3.0 storage submission with llama3.1-70b."),
     ("gpu-only", "results_tier_gpu_only.json", "Tier: GPU Only", "All KV cache pinned in GPU VRAM for a latency baseline."),
     ("cpu-only", "results_tier_cpu_only.json", "Tier: CPU Only", "Cache entirely in system RAM (typical production baseline)."),
     ("storage-only", "results_tier_storage_only.json", "Tier: Storage Only", "Forces every lookup to NVMe/SSD to expose disk behaviour."),
     ("gpu-cpu", "results_tier_gpu_cpu.json", "Tier: GPU + CPU", "Two-tier hot/warm cache without backing storage."),
     ("cpu-storage", "results_tier_cpu_storage.json", "Tier: CPU + Storage", "RAM backed by NVMe spillover for larger working sets."),
     ("gpu-cpu-storage", "results_tier_gpu_cpu_storage.json", "Tier: GPU + CPU + Storage", "Full three-tier hierarchy (VRAM + RAM + NVMe)."),
-    # Stress tests
     ("storage-saturation", "results_stress_storage_saturation.json", "Stress: Storage Saturation", "High-concurrency workload with constrained RAM to find NVMe limits."),
     ("production", "results_realistic_production.json", "Stress: Realistic Production", "Balanced configuration intended to mimic steady-state inference load."),
     ("autoscale", "results_autoscaling_discovery.json", "Stress: Autoscaling Discovery", "Adaptive user ramp designed to discover sustainable concurrency."),
@@ -935,14 +945,8 @@ scenarios = [

 selected_env = os.getenv("KVCACHE_SELECTED_WORKLOADS", "")
 selected_keys = {item.strip() for item in selected_env.split(",") if item.strip()} if selected_env else set()

-# If mlperf_submission is selected, add all MLPerf sub-scenarios to the list to be processed.
+# If mlperf_submission is selected, add its sub-scenarios to the list to be processed. if "mlperf_submission" in selected_keys: - # New discovery-validated scenarios - selected_keys.add("mlperf_stress_8b") - selected_keys.add("mlperf_stress_70b") - selected_keys.add("mlperf_throughput_8b") - selected_keys.add("mlperf_throughput_70b") - # Legacy scenarios (for backwards compatibility) selected_keys.add("mlperf_submission_8b") selected_keys.add("mlperf_submission_70b") diff --git a/kv_cache_benchmark/utils/run_benchmarks_256gb.sh b/kv_cache_benchmark/utils/run_benchmarks_256gb.sh new file mode 100755 index 00000000..fc790490 --- /dev/null +++ b/kv_cache_benchmark/utils/run_benchmarks_256gb.sh @@ -0,0 +1,403 @@ +#!/usr/bin/env bash +# ============================================================================= +# MLPerf v3.0 KV Cache Benchmark Runner (256GB RAM Safe) +# Kingston Digital, 2025 — Licensed under Apache 2.0 +# +# Memory-safe version for systems with 256GB RAM. +# Optimized for STORAGE BENCHMARKING: cpu_mem=0, gpu_mem=0 (NVMe-only) +# +# Includes: stress, throughput, prefill-only, decode-only, and RAG suites. 
+#
+# Usage:
+#   ./run_benchmarks_256gb.sh                            # defaults: 3 trials, /mnt/nvme
+#   ./run_benchmarks_256gb.sh --trials 1 --cache-dir /mnt/ssd
+#   ./run_benchmarks_256gb.sh --suites "prefill decode"  # only run prefill and decode suites
+#   ./run_benchmarks_256gb.sh --suites rag               # only run RAG suite
+#   ./run_benchmarks_256gb.sh --models "llama3.1-8b"     # single model
+#
+# Available suites: stress, throughput, prefill, decode, rag
+# =============================================================================
+set -euo pipefail
+
+# ─── Defaults (tuned for 256GB RAM, NVMe-only storage testing) ───────────────
+TRIALS=3
+CACHE_DIR="/mnt/nvme"
+DURATION=300
+SEED=42
+SUITES="stress throughput prefill decode rag"
+MODELS=""                 # empty = all models
+KV_CACHE_CMD="kv-cache"
+RESULTS_DIR="results_256gb"
+
+# =============================================================================
+# MEMORY BUDGET CALCULATION (256GB system, ~200GB usable for benchmark)
+# =============================================================================
+# KV cache bytes per token (from config.yaml, verified against HuggingFace):
+#   - llama2-7b:     524,288 bytes (512 KB)  ← MHA, largest per-token cache
+#   - llama3.1-70b:  327,680 bytes (320 KB)  ← GQA, efficient
+#   - qwen3-32b:     262,144 bytes (256 KB)  ← GQA (head_dim=128 explicit)
+#   - llama3.1-8b:   131,072 bytes (128 KB)  ← GQA, efficient
+#   - mistral-7b:    131,072 bytes (128 KB)  ← GQA
+#   - gpt-oss-120b:   73,728 bytes (72 KB)   ← MoE (head_dim=64 explicit)
+#   - deepseek-v3:    70,272 bytes (~69 KB)  ← MLA compressed (kv_lora_rank=512 + rope=64)
+#   - gpt-oss-20b:    49,152 bytes (48 KB)   ← MoE (head_dim=64 explicit)
+#
+# Peak RAM ≈ num_users × avg_context_tokens × bytes_per_token, plus transient
+# overhead from in-flight allocations (bounded by max_concurrent_allocs).
+#
+# Safe configurations for 256GB (each well under the ~200GB usable budget):
+#   - llama2-7b:     30 users × 4K × 512KB ≈ 60 GB peak ✓ (allocs=8)
+#   - llama3.1-70b:  40 users × 4K × 320KB ≈ 50 GB peak ✓ (allocs=8)
+#   - qwen3-32b:     50 users × 4K × 256KB ≈ 50 GB peak ✓ (allocs=8)
+#   - llama3.1-8b:  100 users × 4K × 128KB ≈ 50 GB peak ✓ (allocs=16)
+#   - mistral-7b:   100 users × 4K × 128KB ≈ 50 GB peak ✓ (allocs=16)
+#   - deepseek-v3:  150 users × 4K × ~69KB ≈ 40 GB peak ✓ (allocs=16, MLA compressed)
+# =============================================================================
+
+# ─── Parse arguments ─────────────────────────────────────────────────────────
+while [[ $# -gt 0 ]]; do
+    case "$1" in
+        --trials)      TRIALS="$2"; shift 2 ;;
+        --cache-dir)   CACHE_DIR="$2"; shift 2 ;;
+        --duration)    DURATION="$2"; shift 2 ;;
+        --seed)        SEED="$2"; shift 2 ;;
+        --suites)      SUITES="$2"; shift 2 ;;
+        --models)      MODELS="$2"; shift 2 ;;
+        --results-dir) RESULTS_DIR="$2"; shift 2 ;;
+        --help|-h)
+            head -16 "$0" | tail -10
+            exit 0
+            ;;
+        *)
+            echo "Unknown option: $1" >&2; exit 1 ;;
+    esac
+done
+
+# ─── All models from config.yaml (storage benchmark selection) ───────────────
+# Ordered by KV cache size (largest first) for progressive storage stress
+ALL_MODELS=(
+    llama2-7b              # 512 KB/token - MHA baseline (no GQA), largest per-token
+    llama3.1-70b-instruct  # 320 KB/token - Large GQA model
+    qwen3-32b              # 256 KB/token - Medium GQA model (head_dim=128 explicit)
+    llama3.1-8b            # 128 KB/token - Standard GQA model
+    mistral-7b             # 128 KB/token - Standard GQA model
+    gpt-oss-120b           # 72 KB/token  - MoE (head_dim=64 explicit)
+    deepseek-v3            # ~69 KB/token - MLA compressed (kv_lora_rank=512+rope=64)
+    gpt-oss-20b            # 48 KB/token  - MoE (head_dim=64 explicit)
+)
+
+# Use user-specified models or full suite
+if [[ -n "$MODELS" ]]; then
+    read -ra MODEL_LIST <<< "$MODELS"
+else
+    MODEL_LIST=("${ALL_MODELS[@]}")
+fi
+
+# ─── Model classification and RAM-safe parameters ────────────────────────────
+# Returns: users max_allocs cpu_mem gpu_mem
+# ALL configurations use cpu_mem=0 gpu_mem=0 for pure storage benchmarking
+get_model_params() {
+    local model="$1"
+    local suite="$2"
+
+    # Model-specific safe parameters for 256GB RAM
+    # Format: users max_allocs cpu_mem gpu_mem
+    case "$model" in
+        deepseek-v3)
+            # ~69 KB/token (MLA compressed: kv_lora_rank=512 + qk_rope_head_dim=64)
+            # 150 users × 4K × ~69KB ≈ 40GB (with allocs=16)
+            case "$suite" in
+                stress)     echo "150 16 0 0" ;;
+                throughput) echo "120 16 0 0" ;;
+                prefill)    echo "180 16 0 0" ;;
+                decode)     echo "120 16 0 0" ;;
+                rag)        echo "100 8 0 0" ;;
+            esac
+            ;;
+        llama2-7b)
+            # 512 KB/token - MHA (no GQA), larger than 8B GQA models
+            # 30 users × 4K × 512KB ≈ 60GB (with allocs=8)
+            case "$suite" in
+                stress)     echo "30 8 0 0" ;;
+                throughput) echo "25 8 0 0" ;;
+                prefill)    echo "35 8 0 0" ;;
+                decode)     echo "25 8 0 0" ;;
+                rag)        echo "20 4 0 0" ;;
+            esac
+            ;;
+        llama3.1-70b-instruct)
+            # 320 KB/token - Large but GQA-efficient
+            # 40 users × 4K × 320KB ≈ 50GB (with allocs=8)
+            case "$suite" in
+                stress)     echo "40 8 0 0" ;;
+                throughput) echo "35 8 0 0" ;;
+                prefill)    echo "50 8 0 0" ;;
+                decode)     echo "35 8 0 0" ;;
+                rag)        echo "25 4 0 0" ;;
+            esac
+            ;;
+        qwen3-32b)
+            # 256 KB/token - Medium GQA model (head_dim=128 explicit in HF config)
+            # 50 users × 4K × 256KB ≈ 50GB (with allocs=8)
+            case "$suite" in
+                stress)     echo "50 8 0 0" ;;
+                throughput) echo "40 8 0 0" ;;
+                prefill)    echo "60 8 0 0" ;;
+                decode)     echo "40 8 0 0" ;;
+                rag)        echo "30 4 0 0" ;;
+            esac
+            ;;
+        llama3.1-8b|mistral-7b)
+            # 128 KB/token - Efficient GQA models
+            # 100 users × 4K × 128KB ≈ 50GB (with allocs=16)
+            case "$suite" in
+                stress)     echo "100 16 0 0" ;;
+                throughput) echo "80 16 0 0" ;;
+                prefill)    echo "120 16 0 0" ;;
+                decode)     echo "80 16 0 0" ;;
+                rag)        echo "60 8 0 0" ;;
+            esac
+            ;;
+        gpt-oss-120b|gpt-oss-20b|tiny-1b)
+            # 48-72 KB/token - MoE models, very efficient KV cache
+            # 150 users × 4K × 72KB ≈ 42GB (with allocs=16)
+            case "$suite" in
+                stress)     echo "150 16 0 0" ;;
+                throughput) echo "120 16 0 0" ;;
+                prefill)    echo "180 16 0 0" ;;
+                decode)     echo "120 16 0 0" ;;
+                rag)        echo "100 8 0 0" ;;
+            esac
+            ;;
+        *)
+            # Unknown model - use conservative defaults
+            echo "30 8 0 0"
+            ;;
+ esac +} + +mkdir -p "${RESULTS_DIR}" + +# ─── Detect block device under cache dir ────────────────────────────────────── +# Returns the whole-disk block device path (e.g., /dev/nvme0n1) for iostat. +# Handles both partitioned (nvme0n1p1 → nvme0n1) and whole-device mounts. +detect_block_device() { + local dir="$1" + local dev + + # Method 1: df-based detection + dev=$(df "$dir" 2>/dev/null | tail -1 | awk '{print $1}') + + # Method 2: fallback to findmnt (more reliable for NVMe) + if [[ -z "$dev" ]] || [[ ! -b "$dev" ]]; then + dev=$(findmnt -no SOURCE "$dir" 2>/dev/null | head -1) + fi + + if [[ -n "$dev" ]] && [[ -b "$dev" ]]; then + # Try to resolve to parent (partition → whole disk) + local base + base=$(lsblk -no PKNAME "$dev" 2>/dev/null | head -1) + if [[ -n "$base" ]]; then + echo "/dev/${base}" + else + # No parent = already a whole-disk device (common for NVMe) + echo "$dev" + fi + else + echo "" + fi +} + +BLOCK_DEV=$(detect_block_device "${CACHE_DIR}") +# iostat needs just the device name (e.g., "nvme0n1"), not the full path +IOSTAT_DEV="" +if [[ -n "$BLOCK_DEV" ]]; then + IOSTAT_DEV=$(basename "$BLOCK_DEV") +fi + +TIMESTAMP=$(date +%Y%m%d_%H%M%S) +LOG_FILE="${RESULTS_DIR}/benchmark_run_${TIMESTAMP}.log" + +echo "================================================================" | tee "$LOG_FILE" +echo "MLPerf v3.0 KV Cache Benchmark (256GB RAM Safe)" | tee -a "$LOG_FILE" +echo "$(date)" | tee -a "$LOG_FILE" +echo "================================================================" | tee -a "$LOG_FILE" +echo "Trials: ${TRIALS} Cache Dir: ${CACHE_DIR} Duration: ${DURATION}s" | tee -a "$LOG_FILE" +echo "Models: ${MODEL_LIST[*]}" | tee -a "$LOG_FILE" +echo "Suites: ${SUITES}" | tee -a "$LOG_FILE" +echo "System RAM: 256GB (parameters tuned for memory safety)" | tee -a "$LOG_FILE" +if [[ -n "$BLOCK_DEV" ]]; then + echo "Block Device: ${BLOCK_DEV} (iostat target: ${IOSTAT_DEV})" | tee -a "$LOG_FILE" +else + echo "Block Device: (not detected — iostat monitoring 
disabled)" | tee -a "$LOG_FILE" + echo " Tip: verify mount with 'findmnt ${CACHE_DIR}' or 'df ${CACHE_DIR}'" | tee -a "$LOG_FILE" +fi +echo "================================================================" | tee -a "$LOG_FILE" + +run_trial() { + local suite="$1" model="$2" trial="$3" + local users="$4" max_allocs="$5" cpu_mem="$6" gpu_mem="$7" + local extra_args="${8:-}" + + local tag="${suite}_${model}_trial${trial}" + local json_out="${RESULTS_DIR}/mlperf_v3_${tag}.json" + local xlsx_out="${RESULTS_DIR}/mlperf_v3_${tag}.xlsx" + local iostat_out="${RESULTS_DIR}/mlperf_v3_${tag}_iostat.log" + local iostat_pid="" + + echo "" | tee -a "$LOG_FILE" + echo ">>> [${suite}] ${model} — trial ${trial}/${TRIALS}" | tee -a "$LOG_FILE" + echo " users=${users} cpu_mem=${cpu_mem}GB gpu_mem=${gpu_mem}GB max_allocs=${max_allocs}" | tee -a "$LOG_FILE" + if [[ -n "$extra_args" ]]; then + echo " extra: ${extra_args}" | tee -a "$LOG_FILE" + fi + + # Start iostat background monitor (use short device name for compatibility) + if [[ -n "$IOSTAT_DEV" ]] && command -v iostat &>/dev/null; then + iostat -mx "$IOSTAT_DEV" 1 > "$iostat_out" 2>&1 & + iostat_pid=$! 
+ echo " iostat PID ${iostat_pid} monitoring ${IOSTAT_DEV} -> ${iostat_out}" | tee -a "$LOG_FILE" + elif [[ -z "$IOSTAT_DEV" ]]; then + echo " WARNING: No block device detected for ${CACHE_DIR} — iostat disabled" | tee -a "$LOG_FILE" + fi + + # shellcheck disable=SC2086 + ${KV_CACHE_CMD} \ + --config config.yaml \ + --model "${model}" \ + --num-users "${users}" \ + --duration "${DURATION}" \ + --gpu-mem-gb "${gpu_mem}" \ + --cpu-mem-gb "${cpu_mem}" \ + --max-concurrent-allocs "${max_allocs}" \ + --generation-mode none \ + --cache-dir "${CACHE_DIR}" \ + --seed "${SEED}" \ + --output "${json_out}" \ + --xlsx-output "${xlsx_out}" \ + ${extra_args} \ + 2>&1 | tee -a "$LOG_FILE" + + # Stop iostat + if [[ -n "$iostat_pid" ]]; then + kill "$iostat_pid" 2>/dev/null || true + wait "$iostat_pid" 2>/dev/null || true + echo " ✓ iostat: ${iostat_out}" | tee -a "$LOG_FILE" + fi + + echo " ✓ JSON: ${json_out}" | tee -a "$LOG_FILE" + echo " ✓ XLSX: ${xlsx_out}" | tee -a "$LOG_FILE" +} + +# ─── Suite 1: Storage Stress (cpu_mem=0, gpu_mem=0, NVMe-only) ─────────────── +if [[ "$SUITES" == *"stress"* ]]; then + echo "" | tee -a "$LOG_FILE" + echo "============================================================" | tee -a "$LOG_FILE" + echo "SUITE: STORAGE STRESS (cpu=0GB, gpu=0GB, NVMe-only)" | tee -a "$LOG_FILE" + echo " Scenario: ALL KV cache I/O goes directly to NVMe" | tee -a "$LOG_FILE" + echo " Primary metrics: Read/Write Bandwidth, Device Latency" | tee -a "$LOG_FILE" + echo "============================================================" | tee -a "$LOG_FILE" + + for model in "${MODEL_LIST[@]}"; do + read -r users max_allocs cpu_mem gpu_mem <<< "$(get_model_params "$model" stress)" + for trial in $(seq 1 "$TRIALS"); do + run_trial "stress" "$model" "$trial" "$users" "$max_allocs" "$cpu_mem" "$gpu_mem" + done + done +fi + +# ─── Suite 2: Storage Throughput (cpu_mem=0, gpu_mem=0 for pure storage) ────── +if [[ "$SUITES" == *"throughput"* ]]; then + echo "" | tee -a "$LOG_FILE" + echo 
"============================================================" | tee -a "$LOG_FILE" + echo "SUITE: STORAGE THROUGHPUT (cpu=0GB, gpu=0GB)" | tee -a "$LOG_FILE" + echo " Scenario: Sustained storage throughput measurement" | tee -a "$LOG_FILE" + echo " Primary metric: Storage Throughput (GB/s)" | tee -a "$LOG_FILE" + echo "============================================================" | tee -a "$LOG_FILE" + + for model in "${MODEL_LIST[@]}"; do + read -r users max_allocs cpu_mem gpu_mem <<< "$(get_model_params "$model" throughput)" + for trial in $(seq 1 "$TRIALS"); do + run_trial "throughput" "$model" "$trial" "$users" "$max_allocs" "$cpu_mem" "$gpu_mem" + done + done +fi + +# ─── Suite 3: Prefill-Only (write-heavy, simulates prefill workers) ─────────── +if [[ "$SUITES" == *"prefill"* ]]; then + echo "" | tee -a "$LOG_FILE" + echo "============================================================" | tee -a "$LOG_FILE" + echo "SUITE: PREFILL-ONLY (write-heavy, cpu=0GB, gpu=0GB)" | tee -a "$LOG_FILE" + echo " Scenario: Disaggregated inference — prefill worker" | tee -a "$LOG_FILE" + echo " Real-world: Prefill server computes KV, writes to storage" | tee -a "$LOG_FILE" + echo " I/O pattern: ~95% writes, minimal reads" | tee -a "$LOG_FILE" + echo " Primary metric: Write Bandwidth (GB/s)" | tee -a "$LOG_FILE" + echo "============================================================" | tee -a "$LOG_FILE" + + for model in "${MODEL_LIST[@]}"; do + read -r users max_allocs cpu_mem gpu_mem <<< "$(get_model_params "$model" prefill)" + for trial in $(seq 1 "$TRIALS"); do + run_trial "prefill" "$model" "$trial" "$users" "$max_allocs" "$cpu_mem" "$gpu_mem" \ + "--prefill-only --disable-multi-turn --disable-prefix-caching" + done + done +fi + +# ─── Suite 4: Decode-Only (read-heavy, simulates decode workers) ────────────── +if [[ "$SUITES" == *"decode"* ]]; then + echo "" | tee -a "$LOG_FILE" + echo "============================================================" | tee -a "$LOG_FILE" + echo 
"SUITE: DECODE-ONLY (read-heavy, cpu=0GB, gpu=0GB)" | tee -a "$LOG_FILE" + echo " Scenario: Disaggregated inference — decode worker" | tee -a "$LOG_FILE" + echo " Real-world: Decode server reads pre-computed KV from storage" | tee -a "$LOG_FILE" + echo " I/O pattern: ~100% reads from pre-populated cache" | tee -a "$LOG_FILE" + echo " Primary metric: Read Bandwidth (GB/s), Read Latency P99" | tee -a "$LOG_FILE" + echo "============================================================" | tee -a "$LOG_FILE" + + for model in "${MODEL_LIST[@]}"; do + read -r users max_allocs cpu_mem gpu_mem <<< "$(get_model_params "$model" decode)" + for trial in $(seq 1 "$TRIALS"); do + run_trial "decode" "$model" "$trial" "$users" "$max_allocs" "$cpu_mem" "$gpu_mem" \ + "--decode-only" + done + done +fi + +# ─── Suite 5: RAG Workload (mixed reads from document cache) ────────────────── +if [[ "$SUITES" == *"rag"* ]]; then + echo "" | tee -a "$LOG_FILE" + echo "============================================================" | tee -a "$LOG_FILE" + echo "SUITE: RAG WORKLOAD (cpu=0GB, gpu=0GB)" | tee -a "$LOG_FILE" + echo " Scenario: Retrieval-Augmented Generation" | tee -a "$LOG_FILE" + echo " Real-world: Each request retrieves 3-5 document chunks" | tee -a "$LOG_FILE" + echo " I/O pattern: Write doc embeddings once, read many times" | tee -a "$LOG_FILE" + echo " Primary metric: Read Bandwidth, Cache Hit Rate" | tee -a "$LOG_FILE" + echo "============================================================" | tee -a "$LOG_FILE" + + for model in "${MODEL_LIST[@]}"; do + read -r users max_allocs cpu_mem gpu_mem <<< "$(get_model_params "$model" rag)" + for trial in $(seq 1 "$TRIALS"); do + run_trial "rag" "$model" "$trial" "$users" "$max_allocs" "$cpu_mem" "$gpu_mem" \ + "--enable-rag --rag-num-docs 50" + done + done +fi + +echo "" | tee -a "$LOG_FILE" +echo "================================================================" | tee -a "$LOG_FILE" +echo "All benchmarks complete — $(date)" | tee -a 
"$LOG_FILE" +echo "Results in: ${RESULTS_DIR}/" | tee -a "$LOG_FILE" +echo "Log: ${LOG_FILE}" | tee -a "$LOG_FILE" +echo "" | tee -a "$LOG_FILE" +echo "Memory usage summary (256GB safe, cpu=0 gpu=0 storage-only):" | tee -a "$LOG_FILE" +echo " Model | KB/tok | Users | max_allocs | Peak RAM (est)" | tee -a "$LOG_FILE" +echo " -------------------|--------|-------|------------|----------------" | tee -a "$LOG_FILE" +echo " llama2-7b | 500 | 30 | 8 | ~49 GB" | tee -a "$LOG_FILE" +echo " llama3.1-70b | 313 | 40 | 8 | ~41 GB" | tee -a "$LOG_FILE" +echo " qwen3-32b | 250 | 50 | 8 | ~50 GB" | tee -a "$LOG_FILE" +echo " llama3.1-8b | 125 | 100 | 16 | ~82 GB" | tee -a "$LOG_FILE" +echo " mistral-7b | 125 | 100 | 16 | ~82 GB" | tee -a "$LOG_FILE" +echo " deepseek-v3 (MLA) | 67 | 150 | 16 | ~66 GB" | tee -a "$LOG_FILE" +echo "" | tee -a "$LOG_FILE" +echo "All tests use cpu_mem=0 gpu_mem=0 for pure NVMe storage benchmarking" | tee -a "$LOG_FILE" +echo "================================================================" | tee -a "$LOG_FILE" diff --git a/kv_cache_benchmark/validate.sh b/kv_cache_benchmark/utils/validate.sh similarity index 100% rename from kv_cache_benchmark/validate.sh rename to kv_cache_benchmark/utils/validate.sh diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/comparison_report.txt b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/comparison_report.txt deleted file mode 100644 index df9ceba6..00000000 --- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/comparison_report.txt +++ /dev/null @@ -1,78 +0,0 @@ -================================================================================ -LMCACHE vs KV-CACHE COMPARISON RESULTS -================================================================================ - -vLLM Baseline (no LMCache) --------------------------------------------------- - Trials: 3 - Tokens/sec: 13730.11 +/- 8.84 - Requests/sec: 28.62 +/- 0.02 - Elapsed time: 
17.47s +/- 0.01s - -LMCache GPU-only --------------------------------------------------- - Trials: 3 - Tokens/sec: 9508.42 +/- 32.22 - Requests/sec: 75.88 +/- 0.18 - Elapsed time: 6.48s +/- 0.02s - -LMCache CPU Offload --------------------------------------------------- - Trials: 3 - Tokens/sec: 9410.65 +/- 90.61 - Requests/sec: 75.15 +/- 0.72 - Elapsed time: 6.55s +/- 0.06s - -kv-cache.py GPU-only (equal capacity) --------------------------------------------------- - Trials: 3 - Storage Throughput: 1691.13 +/- 154.59 tok/s - Storage Requests/sec: 6.30 +/- 0.59 - Total I/O Time: 87.98s +/- 8.36s - -kv-cache.py GPU+CPU (equal capacity) --------------------------------------------------- - Trials: 3 - Storage Throughput: 1545.70 +/- 258.91 tok/s - Storage Requests/sec: 5.74 +/- 0.97 - Total I/O Time: 98.78s +/- 18.96s - -kv-cache.py GPU+CPU+NVMe (equal capacity) --------------------------------------------------- - Trials: 3 - Storage Throughput: 1175.43 +/- 181.07 tok/s - Storage Requests/sec: 4.36 +/- 0.66 - Total I/O Time: 121.51s +/- 27.68s - -kv-cache.py NVMe-only (MLPerf Storage) --------------------------------------------------- - Trials: 3 - Storage Throughput: 262.90 +/- 2.40 tok/s - Storage Requests/sec: 0.98 +/- 0.01 - Total I/O Time: 558.70s +/- 3.91s - -================================================================================ -COMPARATIVE ANALYSIS -================================================================================ - -Note: kv-cache.py tests use EQUAL total cache capacity for fair comparison. 
- Storage Throughput = tokens / total_storage_io_latency (correct metric) - -kv-cache.py Storage Tier Comparison (Storage Throughput): - GPU ONLY : 1691.13 tok/s - GPU CPU : 1545.70 tok/s - GPU CPU NVME : 1175.43 tok/s - NVME ONLY : 262.90 tok/s - - Speedup vs NVMe-only: - gpu only : 6.43x - gpu cpu : 5.88x - gpu cpu nvme : 4.47x - -LMCache vs kv-cache.py (NOTE: different tools, different purposes): - - LMCache: Real GPU inference with KV cache optimization - - kv-cache.py: Storage I/O simulator for MLPerf Storage benchmark - - LMCache CPU offload: 9410.65 tok/s (real inference) - kv-cache.py GPU+CPU: 1545.70 tok/s (storage I/O sim) - Ratio: 6.09x (expected: LMCache faster due to GPU compute) \ No newline at end of file diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_gpu_cpu_nvme_trial1.json b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_gpu_cpu_nvme_trial1.json deleted file mode 100644 index 83cb42b4..00000000 --- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_gpu_cpu_nvme_trial1.json +++ /dev/null @@ -1,2330 +0,0 @@ -{ - "requests_completed": 438, - "total_tokens_generated": 118293, - "total_storage_io_latency": 82.95608637391706, - "total_generation_latency": 0.0, - "end_to_end_latencies": [ - 0.1923613800026942, - 0.21002943499479443, - 0.2684829039935721, - 0.28013048799766693, - 0.28030198100896087, - 0.2818663439975353, - 0.347250778999296, - 0.5218408340006135, - 0.5522496080084238, - 0.553977207004209, - 0.5724959309882252, - 0.594748803996481, - 0.6068099889962468, - 0.6245523100078572, - 0.6244986830133712, - 0.6312699380068807, - 0.6317707470007008, - 0.6323704420065042, - 0.6333047899970552, - 0.6326515109976754, - 0.6325186939939158, - 0.641659224012983, - 0.654468178996467, - 0.6601225030026399, - 0.6616168070031563, - 0.6614686350076227, - 0.6819239519973053, - 0.6872391150100157, - 0.6880177499988349, - 0.7011276890116278, - 
0.7085818949999521, - 0.708583704996272, - 0.7084484039951349, - 0.7091028400027426, - 0.7195216200052528, - 0.7195380930061219, - 0.8020095270039747, - 0.8035877710062778, - 0.8043156630010344, - 0.8101278130052378, - 0.8106198579916963, - 0.8104726909950841, - 0.8172544210101478, - 0.8196473319985671, - 0.8196301949938061, - 0.8219834699993953, - 0.8215866640093736, - 0.8278899099968839, - 0.8348692339932313, - 0.8520948479999788, - 0.8522654260013951, - 0.8539028279919876, - 0.8601034599996638, - 0.8654053600039333, - 0.9350625059887534, - 0.945725722995121, - 0.9509017700038385, - 0.9595645180088468, - 0.9675972219993128, - 0.9654580899950815, - 0.9704397860041354, - 0.9683052740001585, - 0.9692888999998104, - 0.977888626002823, - 0.9769635249977, - 0.9834449579939246, - 0.9958240130072227, - 0.9960832759970799, - 1.0014334709994728, - 1.01067466600216, - 1.083502886001952, - 1.0866853699990315, - 1.0999155950121349, - 1.1081219449988566, - 1.1126800459896913, - 1.1166769099945668, - 1.1239468190033222, - 1.1284154029999627, - 1.139868906000629, - 1.145779375001439, - 1.1613374400039902, - 1.1619183400034672, - 1.2636408739926992, - 1.2664362450013869, - 1.2715369219949935, - 1.2752637160010636, - 1.27526142699935, - 1.2878467099944828, - 1.469373115003691, - 1.4731352270027855, - 1.474558422996779, - 1.4720732330024475, - 1.7412434549914906, - 1.7532500329980394, - 1.765197539003566, - 1.7994966380065307, - 1.8045961739990162, - 1.8059672219969798, - 1.806512061986723, - 1.8284299469960388, - 1.8599006180011202, - 1.8766652750055073, - 2.1204554729920346, - 2.2145419090084033, - 2.2244327869993867, - 2.313478174008196, - 2.5268448989954777, - 2.6026234589953674, - 2.7116311400022823, - 2.746679813004448, - 2.911341931001516, - 2.9366284730058396, - 3.020557956988341, - 3.185656404006295, - 3.233060115991975, - 3.5387906830001157, - 3.6177575390029233, - 3.6295645070058526, - 3.837230648001423, - 3.8865896359930048, - 3.935650943996734, - 4.162847380008316, - 
4.197428333995049, - 4.434835685999133, - 4.483599757004413, - 4.521467444996233, - 4.569185982996714, - 4.59940378300962, - 4.659570984003949, - 4.729797777996282, - 5.094161079992773, - 5.155856562996632, - 5.1865768669958925, - 5.4263532890036, - 5.49573610900552, - 5.651162832000409, - 5.693880559003446, - 5.76800981600536, - 5.8615157840104075, - 6.3331519559869776, - 6.380286284009344, - 6.6460909790039295, - 6.72157760799746, - 6.823196001001634, - 6.892363591003232, - 6.963596721994691, - 7.110387506996631, - 7.5661101279983995, - 7.638511733006453, - 7.762742935010465, - 7.968465964004281, - 8.078398244993878, - 8.11912976000167, - 8.172568813999533, - 8.182447228988167, - 8.202634589004447, - 8.230006783996942, - 8.335464179006522, - 8.440857424007845, - 8.506223737000255, - 8.557516943998053, - 8.572605109991855, - 8.613448902004166, - 8.68754005599476, - 9.238270567002473, - 9.373342263002996, - 9.578906319002272, - 9.602268027010723, - 9.631486864003818, - 10.107479672995396, - 10.426940530000138, - 10.611756559999776, - 10.678088058994035, - 10.679438290011603, - 11.279411426992738, - 11.323441883010673, - 11.345124396000756, - 11.43796993100841, - 11.47441179100133, - 11.718796553002903, - 11.722186705999775, - 11.77867538499413, - 12.017972198009375, - 12.058226399007253, - 12.178101846002392, - 12.243953696000972, - 12.251219345009304, - 12.260702794999816, - 12.261041058998671, - 12.544555729007698, - 12.544594040999073, - 12.588179127997137, - 12.65853739499289, - 12.780235624988563, - 12.831775992002804, - 12.869113808003021, - 13.851763516999199, - 14.053922734005027, - 14.312951935993624, - 14.312610206005047, - 14.318876329998602, - 14.359937458008062, - 14.417886583003565, - 14.48721386800753, - 14.697981446006452, - 14.709224257007008, - 14.852007210007287, - 14.862140372002614, - 14.909557742008474, - 14.925418578000972, - 15.002592531003756, - 15.02669177899952, - 15.296361001994228, - 15.613229533002595, - 15.644823135997285, - 
15.789299683994614, - 15.788998160991468, - 15.801426929989248, - 15.877306912007043, - 16.066959581992705, - 17.182200878000003, - 17.218054176002624, - 17.374973465994117, - 17.39070557100058, - 17.554110772005515, - 17.568968847001088, - 17.963365301999147, - 17.96187980400282, - 18.345936986006564, - 18.371431614999892, - 18.6996270029922, - 18.816595809999853, - 18.918647398997564, - 19.052077654007007, - 19.0516710410011, - 19.06350173401006, - 19.074706867002533, - 19.13518916699104, - 19.366777156988974, - 19.387728810994304, - 19.46661101700738, - 19.592928425990976, - 19.660456155994325, - 21.19550670598983, - 21.274938467002357, - 21.332820558003732, - 21.33782662000158, - 21.575323636992835, - 21.60845646201051, - 21.869712283005356, - 22.029657207996934, - 22.049188460005098, - 22.182495496002957, - 22.209185543004423, - 22.417544017007458, - 22.492321149999043, - 22.574138063995633, - 22.576182161996257, - 22.616557465007645, - 23.043255331998807, - 23.155477159001748, - 23.207817211994552, - 23.240033594993292, - 23.525360259998706, - 23.624173787000473, - 23.64431026400416, - 23.7976009289996, - 23.961532555011217, - 24.03440814599162, - 24.285642436007038, - 24.331329223001376, - 24.432111570000416, - 24.533101683991845, - 26.201084768996225, - 26.27262355601124, - 26.28337242199632, - 26.289579549003975, - 26.39736232299765, - 26.654388045004453, - 26.760431971997605, - 26.784753472005832, - 26.823318500988535, - 26.869481347006513, - 27.215969611992477, - 27.632621140990523, - 27.65954657799739, - 27.869745802003308, - 28.048406618006993, - 28.060148333999678, - 28.24510566200479, - 28.271147015999304, - 28.513675715992576, - 28.538945514999796, - 28.616341550994548, - 28.745345381990774, - 28.80468600599852, - 28.937122016999638, - 29.055190640996443, - 29.08599386899732, - 29.25120342800801, - 29.359413465004764, - 30.010950987998513, - 30.078151301000617, - 30.106007167996722, - 30.180591147989617, - 30.21614707900153, - 32.3103712079901, - 
32.38919858599547, - 32.49504894099664, - 32.7850246020098, - 33.05303129401, - 33.44216062199848, - 33.59365134399559, - 33.630535055999644, - 33.640107380007976, - 33.91705598800036, - 34.14522813400254, - 34.161449015999096, - 34.64713632200437, - 35.24150339100743, - 35.28307368498645, - 35.44082973799959, - 35.508847755001625, - 35.757194613004685, - 35.80262945999857, - 36.06390774200554, - 36.15394493000349, - 36.472868594995816, - 37.64112110399583, - 37.692286010991666, - 37.76979971601395, - 39.84194671199657, - 39.94180870600394, - 40.41030624700943, - 40.66196980199311, - 40.72093433099508, - 40.88640097499592, - 40.926312616007635, - 41.445960663986625, - 41.486710224999115, - 41.52923532499699, - 41.89079268199566, - 42.025711497000884, - 42.08604792400729, - 42.673284748991136, - 42.74041636000038, - 42.91340975899948, - 43.36095639500127, - 43.44333131199528, - 43.494598393997876, - 43.710633566006436, - 44.184220933995675, - 44.64945485899807, - 44.81567210798676, - 44.84664799500024, - 45.27336649800418, - 45.37763894199452, - 45.72380971700477, - 46.070966469997074, - 46.21998862699547, - 46.49928090299363, - 46.61048620700603, - 46.630982068003505, - 46.64721728800214, - 46.758437234006124, - 46.815806950005936, - 46.81596550501126, - 49.69330024700321, - 49.824039707003976, - 50.10200952801097, - 50.67490084300516, - 51.457757969008526, - 51.92383089300711, - 51.98023691501294, - 52.00546727899928, - 52.504609970987076, - 52.59523017999891, - 52.625921427010326, - 53.22545795601036, - 53.50951228800113, - 54.19563728400681, - 54.31840053200722, - 54.994218398001976, - 55.045038519994705, - 55.18861213400669, - 56.20158334900043, - 56.874866219004616, - 57.69438712899864, - 61.14790675599943, - 61.175323649003985, - 61.186063515997375, - 61.23364794199006, - 61.24815951001074, - 61.25046820699936, - 61.26247760601109, - 61.27802414300095, - 61.350588646993856, - 61.35126514500007, - 61.35154833900742, - 61.39491612800339, - 61.39787496801, - 
61.4111119559966, - 61.41535419100546, - 61.4169203779893, - 61.42459191400849, - 61.4299074330047, - 61.43086649400357, - 61.43191205900803, - 61.43561225599842, - 61.43792119300633, - 61.44461846999184, - 61.4494390119944, - 61.59495368099306, - 61.608374235001975, - 61.638140117996954, - 61.65177527400374, - 61.65145175099315, - 61.665927855996415, - 61.676219476998085, - 61.715734231998795, - 62.19154511600209, - 62.36093542200979, - 62.36464802900446, - 62.375151258995174, - 62.37767653001356, - 62.38165806100005, - 62.39870344600058, - 62.39937302400358, - 62.4087756700028, - 62.422988730002544, - 62.64057097300247, - 63.29517850000411, - 63.31779746701068, - 63.40137365000555, - 63.60650566199911, - 63.97567374900973, - 64.04213299399999, - 64.39380086799792 - ], - "storage_latencies": [ - 0.06576044998655561, - 0.13227312602975871, - 0.14833014299802016, - 0.12940345599781722, - 0.08079318501404487, - 0.03851915801351424, - 0.11124105298949871, - 0.03997600503498688, - 0.18412183101463597, - 0.15637688101560343, - 0.20964958900003694, - 0.23935689101926982, - 0.28159209998557344, - 0.1972191500099143, - 0.23608006301219575, - 0.08733629599737469, - 0.08480164600769058, - 0.22056906700890977, - 0.2708284169930266, - 0.09373642501304857, - 0.06428962999780197, - 0.21161344202118926, - 0.28449158601870295, - 0.1666449629847193, - 0.3446800149831688, - 0.14493131599738263, - 0.16948098798457067, - 0.10573152799042873, - 0.04210825200425461, - 0.23975943902041763, - 0.1226795100083109, - 0.11444678998668678, - 0.032771705999039114, - 0.10711553400324192, - 0.14688998003839515, - 0.08103180199395865, - 0.13682693301234394, - 0.15093147198786028, - 0.22583320800913498, - 0.4063988420239184, - 0.18610853698919527, - 0.13000962999649346, - 0.030561211999156512, - 0.138661471020896, - 0.11224866600241512, - 0.38258418304030783, - 0.12659311498282477, - 0.1371955940121552, - 0.28548553498694673, - 0.1606422710174229, - 0.06403717199282255, - 0.19345145601255354, - 
0.17863497498910874, - 0.017726034013321623, - 0.1675329830031842, - 0.0864473999972688, - 0.09660499400342815, - 0.29293386501376517, - 0.35827926895581186, - 0.20665711599576753, - 0.3845737090159673, - 0.11915449898515362, - 0.02417231300205458, - 0.19567270504194312, - 0.047625261009670794, - 0.1776426259893924, - 0.17350132700812537, - 0.22028869400674012, - 0.118560447008349, - 0.4138425630371785, - 0.15000940702157095, - 0.3230201029946329, - 0.04564290899725165, - 0.3408860020135762, - 0.10445033501309808, - 0.6263300780410646, - 0.25191158802772406, - 0.30379336401529144, - 0.33120687697373796, - 0.23171292696497403, - 0.5853674439858878, - 0.29946053902676795, - 0.17379574402002618, - 0.5133802399941487, - 0.040400521000265144, - 0.6940544150129426, - 0.5264280299888924, - 0.23800051400030497, - 0.19071017400710844, - 0.7205946709727868, - 0.7385843790252693, - 0.27367339200282004, - 0.25129778699192684, - 0.5410279589996208, - 0.3915728000138188, - 0.4956009610177716, - 0.4726059270033147, - 0.7351234600209864, - 0.6131827740027802, - 0.21650276698346715, - 0.2025227379926946, - 0.6144821619673166, - 0.30199442501179874, - 0.6689792400138685, - 0.7840347070014104, - 0.45720247300050687, - 0.8206987389858114, - 0.7927983730187407, - 0.5497817599825794, - 0.283377861007466, - 0.4070295139972586, - 0.16185636798036285, - 0.5128098069835687, - 0.051563022992922924, - 0.3332059399690479, - 0.35584961698623374, - 0.09484067001903895, - 0.5551759369991487, - 0.7149687789788004, - 0.04003256499709096, - 1.1049414700100897, - 0.8512745349726174, - 0.26009871902351733, - 0.05880806401546579, - 0.24743968198890798, - 0.0424551179894479, - 0.5143879030365497, - 0.093981347992667, - 0.8496191829908639, - 0.8660496999800671, - 0.4465681429574033, - 0.27199913701042533, - 0.10907531301199924, - 0.1680209320038557, - 0.9661046219698619, - 0.48005090397782624, - 0.058986437012208626, - 0.035881577015970834, - 0.05715254398819525, - 0.674437659981777, - 0.467504665008164, 
- 0.9174954579793848, - 0.02052381199609954, - 0.04801385798782576, - 0.42863893101457506, - 0.041271208989201114, - 0.8433196480036713, - 0.6604326340166153, - 0.10096597400843166, - 0.34534043598978315, - 0.3817532790126279, - 0.0674393379886169, - 0.06367615499766544, - 0.40254115998686757, - 0.06368129501061048, - 0.0931783929845551, - 0.650089276037761, - 0.42190641796332784, - 0.07941390501218848, - 0.6080576619569911, - 0.08936212997650728, - 0.07913996998104267, - 0.11930710998422, - 0.036137966002570465, - 0.07490761599910911, - 0.0795958359958604, - 0.07892401301069185, - 0.49567125203611795, - 0.12024337198818102, - 0.07928184200136457, - 0.12992552600917406, - 0.07971953698142897, - 0.11049159603135195, - 0.09872442399500869, - 0.4641484600142576, - 0.03647157100203913, - 0.14212802701513283, - 0.07368297704670113, - 0.08808102901093662, - 0.6297939329961082, - 0.13076594000449404, - 0.10944680598913692, - 0.10436702100560069, - 0.06750037100573536, - 0.1355146020068787, - 0.0948289819934871, - 0.16134010098176077, - 0.09908211301080883, - 0.051683890022104606, - 0.11412177901365794, - 0.041767596019781195, - 0.19530780096829403, - 0.0422328419808764, - 0.05308560798584949, - 0.1307921089755837, - 0.2736974930012366, - 0.036607073998311535, - 0.08373901901359204, - 0.041897786999470554, - 0.005285015009576455, - 0.10473586899752263, - 0.6658943159854971, - 0.10379863000707701, - 0.2566024239931721, - 0.03635341499466449, - 0.12448221698286943, - 0.09359707799740136, - 0.11361703198053874, - 0.06367949899868108, - 0.12051916698692366, - 0.08538509599748068, - 0.06188697701145429, - 0.13618722799583338, - 0.15104664201498963, - 0.11531516200921033, - 0.15755683899624273, - 0.12894454602792393, - 0.20953284799179528, - 0.11415791400941089, - 0.057191817002603784, - 0.06311396801902447, - 0.036343041007057764, - 1.074208936013747, - 0.31097010699159, - 0.09949515998596326, - 0.04136223302339204, - 0.1773884219728643, - 0.09928632499941159, - 
-    ... (remaining latency samples elided for brevity) ...
-  ],
-  "generation_latencies": [
-    ... (all sampled values are 0.0 in this run; elided for brevity) ...
-  ],
-  "throughput_timeline": [],
-  "prefill_latencies": [
-    ... (per-request prefill latency samples elided for brevity) ...
-  ],
-  "decode_latencies": [
-    ... (per-request decode latency samples elided for brevity) ...
-  ],
-  "multi_turn_cache_hits": 44,
-  "multi_turn_cache_misses": 257,
-  "seed": 42,
-  "summary": {
-    "total_requests": 438,
-    "total_tokens": 118293,
-    "elapsed_time": 60.63476014137268,
-    "avg_throughput_tokens_per_sec": 1950.9106612146982,
-    "requests_per_second": 7.223579329394282,
-    "end_to_end_latency_ms": {
-      "mean": 22496.613017578275,
-      "p50": 15972.133246999874,
-      "p95": 61651.500279444735,
-      "p99": 63370.45046229745
-    },
-    "storage_io_latency_ms": {
-      "mean": 189.3974574746965,
-      "p50": 109.65832751389826,
-      "p95": 669.7980030090547,
-      "p99": 1119.2008761352915
-    },
-    "generation_latency_ms": {
-      "mean": 0.0,
-      "p50": 0.0,
-      "p95": 0.0,
-      "p99": 0.0
-    },
-    "cache_stats": {
-      "cache_hit_rate": 0.9259102455546148,
-      "cache_hits": 4374,
-      "cache_misses": 350,
-      "gpu_entries": 61,
-      "cpu_entries": 156,
-      "nvme_entries": 158,
-      "gpu_memory_used_gb": 3.1993408203125,
-      "cpu_memory_used_gb": 1.7122802734375,
-      "offloads_cpu": 314,
-      "offloads_nvme": 158,
-      "storage_health": {
-        "overall_status": "PASS",
-        "criteria": [
-          {
-            "name": "CPU RAM P95 < 150ms",
-            "target": 150,
-            "actual": 15.70464664910105,
-            "unit": "ms",
-            "passed": true
-          },
-          {
-            "name": "Cache Hit Rate > 30%",
-            "target": 0.3,
-            "actual": 0.9259102455546148,
-            "unit": "ratio",
-            "passed": true
-          }
-        ],
-        "passed_count": 2,
-        "total_count": 2
-      },
-      "prefill_writes": 380,
-      "decode_reads": 4374,
-      "prefill_bytes_written_gb": 6.681640625,
-      "decode_bytes_read_gb": 75.67919921875,
-      "system_prompt_hits": 859,
-      "common_phrase_hits": 0,
-      "user_cache_hits": 3471,
-      "multi_turn_hits": 44,
-      "total_read_bytes": 81259921408,
-      "total_write_bytes": 7174356992,
-      "total_read_gb": 75.67919921875,
-      "total_write_gb": 6.681640625,
-      "read_write_ratio": 11.326439637532886,
-      "read_iops": 4374,
-      "write_iops": 380,
-      "gpu_read_p50_ms": 8.292541991977487,
-      "gpu_read_p95_ms": 46.69102755069613,
-      "gpu_read_p99_ms": 197.14496817949077,
-      "gpu_write_p50_ms": 25.927483999112155,
-      "gpu_write_p95_ms": 136.18502745230228,
-      "gpu_write_p99_ms": 376.04617290475045,
-      "cpu_read_p50_ms": 5.403606999607291,
-      "cpu_read_p95_ms": 15.70464664910105,
-      "cpu_read_p99_ms": 20.526593357644742
-    },
-    "qos_metrics": {
-      "interactive": {
-        "total_requests": 438,
-        "latency_ms": {
-          "mean": 22496.613017578275,
-          "p50": 15972.133246999874,
-          "p95": 61651.500279444735,
-          "p99": 63370.45046229745,
-          "max": 64393.800867997925
-        },
-        "sla": {
-          "target_p95_ms": 50,
-          "actual_p95_ms": 61651.500279444735,
-          "compliance": 0.0,
-          "met": false
-        }
-      },
-      "responsive": {
-        "no_data": true
-      },
-      "batch": {
-        "no_data": true
-      }
-    },
-    "prefix_cache_stats": {
-      "prefix_hits": 93,
-      "prefix_misses": 345,
-      "system_prompt_reuse": 93,
-      "common_phrase_reuse": 0,
-      "bytes_saved": 83755008
-    },
-    "autoscaling_stats": [],
-    "autoscaling_summary": null,
-    "multi_turn_stats": {
-      "cache_hits": 44,
-      "cache_misses": 257,
-      "hit_rate": 0.1461794019933555
-    }
-  }
-}
\ No newline at end of file
diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_gpu_cpu_nvme_trial2.json b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_gpu_cpu_nvme_trial2.json
deleted file mode 100644
index 32865d16..00000000
--- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_gpu_cpu_nvme_trial2.json
+++ /dev/null
@@ -1,2901 +0,0 @@
-{
-  "requests_completed": 549,
-  "total_tokens_generated": 147313,
-  "total_storage_io_latency": 146.6765308654285,
-  "total_generation_latency": 0.0,
-  "end_to_end_latencies": [
-    0.1872955740109319,
-    0.18761493399506435,
-    0.19107846800761763,
-    0.19369288500456605,
-    0.24685060798947234,
-    0.2525458739983151,
-    0.3111986660078401,
-    0.4564885030122241,
-    0.45759857700613793,
-    0.5250525730079971,
-    0.5565691279916791,
-    0.574119236000115,
-    0.5748368650092743,
-    0.5941671600012342,
-    0.5942018630012171,
-    0.6014892060047714,
-    0.6020638189947931,
-    0.6154026980075287,
-    0.6211578810034553,
-    0.6205733230017358,
-    0.6295913469948573,
-    0.6302506649954012,
-    ... (remaining end-to-end latency samples elided for brevity) ...
38.97674624800857, - 39.43939419600065, - 39.80589537999185, - 39.85302864199912, - 40.64622304799559, - 40.661224334995495, - 40.696954632003326, - 41.290461463999236, - 41.373218078006175, - 41.5222454140021, - 41.97458146199642, - 42.08883602800779, - 42.15528186800657, - 42.18633142799081, - 42.43728088699572, - 42.44321962299, - 42.445764542004326, - 42.44679950700083, - 42.44833134600776, - 42.46417743799975, - 42.46433295099996, - 42.46474183800456, - 42.46545829900424, - 42.46967051598767, - 42.47178244400129, - 42.473577837998164, - 42.49921956600156, - 42.54521446301078, - 42.55115235799167, - 42.5555158900097, - 42.640892217998044, - 42.63800666099996, - 42.642965102000744, - 42.65056743599416, - 43.1339626070112, - 43.17303142200399, - 43.190421357998275, - 43.1918922830082, - 43.19000567200419, - 43.20527043999755, - 43.212918778997846, - 43.212460939001176, - 43.233969633001834, - 43.2422206409974, - 43.24548975098878, - 43.27088516300137, - 43.28404440000304, - 43.79381841700524, - 43.852608408997185, - 43.89778324900544, - 44.15833079900767, - 44.16502504098753, - 44.22203227800492, - 44.403366635990096, - 44.41394009299984, - 44.48063934300444, - 44.53668952800217, - 44.652499954987434, - 45.118630739001674, - 45.283984626992606, - 45.47525045499788, - 46.30247065299773, - 46.94673252898792, - 47.034731070001726 - ], - "storage_latencies": [ - 0.050538696013973095, - 0.12640770098369103, - 0.10712415000307374, - 0.12832203299331013, - 0.07404386399139185, - 0.12836056000378449, - 0.09138932000496425, - 0.12497554099536501, - 0.15763417902053334, - 0.10716614301782101, - 0.11323413098580204, - 0.10589277000690345, - 0.11007837200304493, - 0.2662829679902643, - 0.12935557901801076, - 0.2384498579922365, - 0.2797555679862853, - 0.2619162679766305, - 0.35208098802831955, - 0.24436759699892718, - 0.3161286420072429, - 0.232269861997338, - 0.25078086799476296, - 0.2099702970299404, - 0.12948265901650302, - 0.14708386700658593, - 0.3103693330194801, - 
0.2554424160043709, - 0.2781019149988424, - 0.2097128039895324, - 0.13467127301555593, - 0.15433833097631577, - 0.07275107700843364, - 0.16819331800797954, - 0.13980427800561301, - 0.16841407697938848, - 0.24392830400029197, - 0.029585511991172098, - 0.21844599602627568, - 0.20840228199085686, - 0.291498660997604, - 0.1386141900002258, - 0.32580062400666066, - 0.29603386101371143, - 0.1662063949916046, - 0.15180919201520737, - 0.4476953539560782, - 0.2147573079855647, - 0.154912614976638, - 0.12325278601201717, - 0.034465336007997394, - 0.12760969299415592, - 0.12023124902043492, - 0.34579346200916916, - 0.4079981360118836, - 0.20450198402977549, - 0.14772653402178548, - 0.16094068200618494, - 0.21583799796644598, - 0.46476559605798684, - 0.1802119339845376, - 0.5371176599728642, - 0.19912297299015336, - 0.04045116498309653, - 0.27716349200636614, - 0.11110277898842469, - 0.5081587369786575, - 0.06637615100771654, - 0.5909454360371456, - 0.2514455350319622, - 0.09920729899022263, - 0.21515252796234563, - 0.6140858259896049, - 0.42543840399594046, - 0.04575126100098714, - 0.3938029679848114, - 0.11215578200062737, - 0.4569650899793487, - 0.520175513040158, - 0.5969849430111935, - 0.13158761398517527, - 0.7192482799582649, - 0.41308925600606017, - 0.5392241700028535, - 0.5431768929847749, - 0.5972534239554079, - 0.12874512901180424, - 0.20454676500230562, - 0.2768060900416458, - 0.4725660860276548, - 0.6175260849850019, - 0.5495546490128618, - 0.24813134300347883, - 0.3682450940250419, - 0.7486668859637575, - 0.708044017010252, - 0.8537747970258351, - 1.0675501329824328, - 1.0759920040582074, - 0.9117635849979706, - 0.9200173349963734, - 0.3897921360330656, - 1.0367814200290013, - 1.1602165829972364, - 0.5609768470021663, - 0.7961356999876443, - 1.184453849986312, - 0.5027533249958651, - 0.5588296490022913, - 0.7781920050183544, - 1.012453650982934, - 0.6689918000047328, - 0.3781559240014758, - 1.1614725190302124, - 0.42350314700161107, - 1.5548519889562158, - 
0.7269507689925376, - 0.14658266898186412, - 1.3247764499828918, - 1.0330059969855938, - 1.0793450500350446, - 0.353865170996869, - 1.0791752209624974, - 0.07264935100101866, - 0.08766890301194508, - 1.1073954239982413, - 0.09857881099742372, - 0.44956525998713914, - 1.2488315289810998, - 1.4620224040118046, - 0.06616646799375303, - 0.2428444999968633, - 0.34108671099238563, - 1.2931673669809243, - 0.8477085659687873, - 0.9763890660105972, - 0.8287017209950136, - 0.2344636510097189, - 0.6571880819974467, - 1.124255283983075, - 0.32430490700062364, - 1.0549618729855865, - 0.28097126800275873, - 0.26910749898524955, - 1.0972097900084918, - 0.765805848990567, - 0.6512841040239437, - 1.6310736459854525, - 1.0701260170026217, - 0.22370279901952017, - 1.4641491779766511, - 1.2661875510093523, - 1.5530506139766658, - 0.748663367019617, - 0.45161052301409654, - 0.3370151290000649, - 0.9032811370270792, - 1.2439132570289075, - 0.35298860499460716, - 0.06699558701075148, - 0.4072134299931349, - 0.49012404498353135, - 1.1627010939992033, - 0.14105741401726846, - 0.24649750998651143, - 0.7463972680270672, - 0.35658453198266216, - 0.33659542397072073, - 1.211258154027746, - 0.5215555299655534, - 0.47285343101248145, - 0.3247350279852981, - 0.07509194299927913, - 0.3782450250146212, - 0.050497415009886026, - 0.09300678300496656, - 0.10457102398504503, - 0.1517583969834959, - 0.13760428597743157, - 0.08091987798979972, - 0.15794889199605677, - 0.30954727402422577, - 0.11513969601946883, - 0.1709161159960786, - 0.1645335169887403, - 0.20989750200533308, - 0.04653094199602492, - 0.20763742800045293, - 0.12453340699721593, - 0.10954158002277836, - 0.4399399680114584, - 0.1370500259945402, - 0.34611593301815446, - 0.30073663001530804, - 0.26060457798303105, - 0.0924884879932506, - 0.04736990199307911, - 0.33075298901530914, - 0.19273463601712137, - 0.2444558879797114, - 0.9994896179850912, - 0.27561364801658783, - 0.08320658499724232, - 0.18844840898236725, - 0.5050423670181772, - 
0.0868013540020911, - 0.245939286032808, - 0.7675375479884678, - 0.20789098799286876, - 0.20606445704470389, - 0.1637384439818561, - 0.04504460300086066, - 0.12099563599622343, - 0.1370320189744234, - 0.2128305500227725, - 0.39408010900660884, - 0.08584164398780558, - 0.6591311299998779, - 0.3131724320264766, - 1.0713569119834574, - 0.21979563402419444, - 0.0694228290085448, - 0.07229243399342522, - 0.11456182702386286, - 0.28482359398913104, - 0.39620646805269644, - 0.17467450301046483, - 0.02601409899943974, - 0.16558267902291846, - 0.066859207014204, - 0.32116344798123464, - 0.2740557580109453, - 0.04518641000322532, - 0.07166945999779273, - 0.11528432500199415, - 0.37264583505748305, - 0.2892809069890063, - 0.10845845896983519, - 0.708447109995177, - 0.28553684998769313, - 0.14487610998912714, - 0.23166010896966327, - 0.2146400019992143, - 0.3275705169799039, - 0.9412106589734321, - 0.13396983900747728, - 0.5404184520011768, - 0.11014021300070453, - 0.2892124480276834, - 0.2307554720027838, - 0.24314322401187383, - 0.23914720701577608, - 0.3209017490153201, - 0.22987069300143048, - 0.3897902570461156, - 0.05440581700531766, - 0.23771756699716207, - 0.05477409699233249, - 0.053333568983362056, - 0.19353289001446683, - 0.3196665769792162, - 0.6677362259652, - 0.4779735519841779, - 0.4494828329916345, - 0.5838035309716361, - 0.6772472789598396, - 0.24509161499736365, - 0.048849836020963266, - 0.8220187290135073, - 0.5210690360545414, - 0.5158051180042094, - 0.24713047401746735, - 0.4635276890185196, - 0.2852339469973231, - 0.5742862220213283, - 0.24662937599350698, - 0.3060006540035829, - 0.35855854299734347, - 0.5553124680009205, - 0.2766216979944147, - 0.47729773398896214, - 0.26623028703033924, - 0.07318530899647158, - 0.24480492499424145, - 0.29000912900664844, - 0.1009096369962208, - 0.053629002984962426, - 0.24683196494879667, - 0.49895724303496536, - 0.25129011103126686, - 0.2989052010088926, - 0.35623596400546376, - 0.01696241900208406, - 
0.05269589000090491, - 0.03798486701271031, - 0.09676635898358654, - 0.01284547800605651, - 0.283900150025147, - 0.030399695999221876, - 0.11739050898177084, - 0.030334766997839324, - 0.033356151994667016, - 0.36619333997077774, - 0.0015914859977783635, - 0.12798720599676017, - 0.13063139200676233, - 0.03369228199881036, - 0.03761407599085942, - 0.10062762007873971, - 0.14295130199752748, - 0.1626224200008437, - 0.1309313070087228, - 0.011909298002137803, - 0.012012963998131454, - 0.17165121002472006, - 0.14080334901518654, - 0.02721777599072084, - 0.5587559219711693, - 0.049485782961710356, - 0.14907598101126496, - 0.03356603298743721, - 0.16599429902271368, - 0.06544192600995302, - 0.06348178398911841, - 0.17512926501512993, - 0.04726671001117211, - 0.06587363602011465, - 0.34619393799221143, - 0.08722698198107537, - 0.24911382401478477, - 0.07641371998761315, - 0.19830689002992585, - 0.03318525300710462, - 0.05536852398654446, - 0.14846779401705135, - 0.11675790400477126, - 0.09992092398169916, - 0.19513357401592657, - 0.10728440701495856, - 0.2575145270093344, - 0.058623551987693645, - 0.3997869729792001, - 0.393113331971108, - 0.09661871398566291, - 0.4304790280002635, - 0.13742685099714436, - 0.5214138649898814, - 0.1260941730142804, - 0.06529674396733753, - 0.10131481099233497, - 0.44626070599770173, - 0.35972367599606514, - 0.03397372001199983, - 0.03958891500951722, - 0.012042911010212265, - 0.09768420302134473, - 0.3856618829886429, - 0.07421717602119315, - 0.01855904898548033, - 0.05420845397748053, - 0.09562344099686015, - 0.08424966201710049, - 0.06087223798385821, - 0.011703987998771481, - 0.06470735401671845, - 0.1839480350317899, - 0.05204772000433877, - 0.054105574963614345, - 0.07603898602246772, - 0.07968512501975056, - 0.10273971396964043, - 0.08624635101296008, - 0.14026100996125024, - 0.12186230703082401, - 0.06403906899504364, - 0.10539708401483949, - 0.10495088201423641, - 0.07316089200321585, - 0.06220174401823897, - 0.1394415859831497, - 
0.06830290102516301, - 0.464802984992275, - 0.011039892007829621, - 0.31292515799577814, - 0.10184725097496994, - 0.1106551260309061, - 0.20678777602734044, - 0.09308863698970526, - 0.07906661799643189, - 0.1480930569668999, - 0.11113928398117423, - 0.14947423599369358, - 0.0792679630103521, - 0.1722426939959405, - 0.05223294500319753, - 0.1090870649786666, - 0.08347723100450821, - 0.03720021001936402, - 0.120128943992313, - 0.10874098297790624, - 0.06012683396693319, - 0.0683589819673216, - 0.8618825819867197, - 0.06207692601310555, - 0.08904916598112322, - 0.141401882036007, - 0.010197295996476896, - 0.010318943997845054, - 0.14741593597864266, - 0.08310788197559305, - 0.15459540503798053, - 0.08295678099966608, - 0.15127916999335866, - 0.062458289990900084, - 0.026820903018233366, - 0.06810544998734258, - 0.07226802098739427, - 0.1013172830134863, - 0.04467606601247098, - 0.047141324015683495, - 0.031228642008500174, - 0.13920108700403944, - 0.13990114103944506, - 0.07919696500175633, - 0.06810403898998629, - 0.09399280798970722, - 0.17168274796858896, - 0.1809486469865078, - 0.3887528249906609, - 0.03073132700228598, - 0.10319882399926428, - 0.06303443700016942, - 1.0693050300178584, - 0.0469460439926479, - 0.08885010601079557, - 0.08195062000595499, - 0.06766044899995904, - 0.08800857501046266, - 0.07845331302087288, - 0.10966058001213241, - 0.13554691898752935, - 0.0938700440019602, - 0.09861373601597734, - 0.1785246129729785, - 0.16787566697166767, - 0.0626076420157915, - 0.10251262999372557, - 0.11952477000886574, - 0.015505500006838702, - 0.16804490497452207, - 0.11564816400641575, - 0.08005972103273962, - 0.1654324390401598, - 0.2220081370032858, - 0.105361591980909, - 0.397378875000868, - 0.07431169203482568, - 0.11248856996826362, - 0.06112181898788549, - 0.23955393800861202, - 0.053332961004343815, - 0.139471319023869, - 0.07787311001447961, - 0.10898212699976284, - 0.048904021008638665, - 0.07347866898635402, - 0.2502691510162549, - 0.685333626010106, 
- 0.09546834898355883, - 0.09076555501087569, - 0.056122052017599344, - 0.12901346500439104, - 0.06216541900357697, - 0.098033869988285, - 0.056761151005048305, - 0.07336811198911164, - 0.020866774997557513, - 0.07800446198962163, - 0.09376576900831424, - 0.23991482298879419, - 0.13443449803162366, - 0.11509222800668795, - 0.1393893509521149, - 0.0982402569934493, - 0.08010045098490082, - 0.0785733989905566, - 0.16851980800856836, - 0.19897476299956907, - 2.348874579955009, - 0.1030887749948306, - 0.08593910402851179, - 0.005262993989163078, - 0.10503920698829461, - 0.10469147200637963, - 0.155530326985172, - 0.08277860899397638, - 0.11015964197576977, - 0.12547369301319122, - 0.03149315100745298, - 0.05776390300889034, - 0.05873828400217462, - 0.03709551299107261, - 0.04243317099462729, - 0.1991773620247841, - 0.0036062319995835423, - 0.008336551007232629, - 0.08541885996237397, - 0.0032773510174592957, - 0.04995516003691591, - 0.002441688993712887, - 0.0032846350222826004, - 0.002598197999759577, - 0.030613570008426905, - 0.015442307994817384, - 0.022893268993357196, - 0.02438306200201623, - 0.03389662400877569, - 0.04030108798178844, - 0.02073556400137022, - 0.09393270999134984, - 0.09827631199732423, - 0.13828637600818183, - 0.14327697500993963, - 0.13184049101255368, - 0.15932092098228168, - 0.16321964500821196, - 0.17442924896022305, - 0.1808289890177548, - 0.19650307498523034, - 0.06266586600395385, - 0.03231307702662889, - 0.045494162040995434, - 0.03132582000398543, - 0.08888382496661507, - 0.004730473010567948, - 0.026878835007664748, - 0.03474290401209146, - 0.16174187204160262, - 0.040151745022740215, - 0.008415818985668011, - 0.013418945003650151, - 0.05394085000443738, - 0.0770285760081606, - 0.018303937002201565, - 0.06054634600877762, - 0.07400071699521504, - 0.1379486029909458, - 0.3513541969732614, - 0.06368277697765734 - ], - "generation_latencies": [ - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 
0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, 
- 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0 - ], - "throughput_timeline": [], - "prefill_latencies": [ - 0.02811412599112373, - 0.01745213000685908, - 0.0214271570002893, - 0.08595944900298491, - 0.08591743900615256, - 0.02718320200801827, - 0.01699369501147885, - 0.02453418700315524, - 
0.013344803999643773, - 0.012650358999962918, - 0.04741405599634163, - 0.05846475101134274, - 0.05971233799937181, - 0.07466157799353823, - 0.0782404859928647, - 0.09296963000087999, - 0.07118051699944772, - 0.06375274999300018, - 0.06516094399557915, - 0.06552038299560081, - 0.06863483499910217, - 0.07696993899298832, - 0.04709868899954017, - 0.07879510300699621, - 0.04758143601065967, - 0.10218201500538271, - 0.0743444990075659, - 0.11439409299055114, - 0.11475805100053549, - 0.04691292000643443, - 0.057513961000950076, - 0.0587495190120535, - 0.05078125299769454, - 0.05109392599842977, - 0.06320706100086682, - 0.1053726209938759, - 0.07806083299510647, - 0.08566020699799992, - 0.0930769219994545, - 0.043371086998376995, - 0.03933939700073097, - 0.08591821299341973, - 0.08657565999601502, - 0.04420665500219911, - 0.08630175801226869, - 0.015282077001756988, - 0.08658831499633379, - 0.08738372000516392, - 0.045030291003058665, - 0.045428379002260044, - 0.0458701210009167, - 0.11155226800474338, - 0.10016616999928374, - 0.0983979000011459, - 0.12381233900669031, - 0.12155693900422193, - 0.12985662200662773, - 0.03306886399514042, - 0.011536819001776166, - 0.045121507995645516, - 0.03179507900495082, - 0.039722176996292546, - 0.011475018007331528, - 0.021195976005401462, - 0.045769886011839844, - 0.02704482300032396, - 0.008661663989187218, - 0.024167117997421883, - 0.016189183996175416, - 0.02544831899285782, - 0.01620817999355495, - 0.019083232997218147, - 0.024174790989491157, - 0.035904549004044384, - 0.042684395011747256, - 0.048546965001150966, - 0.04862577399762813, - 0.0340551429981133, - 0.02004250900063198, - 0.012951021999469958, - 0.026894706999883056, - 0.06006245799653698, - 0.09420852299081162, - 0.10194829099054914, - 0.1226482659985777, - 0.11034153299988247, - 0.019911354000214487, - 0.11345309400348924, - 0.033688593000988476, - 0.02724742700229399, - 0.027300510002532974, - 0.020051831990713254, - 0.03030233600293286, - 0.031330646990682, - 
0.023821458002203144, - 0.013372301997151226, - 0.025923288994817995, - 0.006637361002503894, - 0.032491468999069184, - 0.08240982799907215, - 0.08949174800363835, - 0.012191308007459156, - 0.08813499798998237, - 0.012633656995603815, - 0.09674214301048778, - 0.02168779299245216, - 0.03619194700149819, - 0.1111015679925913, - 0.027447475993540138, - 0.04298642301000655, - 0.01575769198825583, - 0.016759923004428856, - 0.015730209997855127, - 0.09658565699646715, - 0.0966465500096092, - 0.08842675299092662, - 0.09700824299943633, - 0.009124942007474601, - 0.07558261499798391, - 0.1804582579934504, - 0.02433869800006505, - 0.03006099500635173, - 0.11477006299537607, - 0.014506282008369453, - 0.019464619996142574, - 0.04421708200243302, - 0.03626949900353793, - 0.023785343000781722, - 0.11768215100164525, - 0.11746665400278289, - 0.12287201700382866, - 0.14682507200632244, - 0.3019797290035058, - 0.4720540410053218, - 0.2994579530059127, - 0.5061604770016856, - 0.5171280570066301, - 0.34458653000183403, - 0.6627970340050524, - 0.05002938999678008, - 0.050090302000171505, - 0.36478338699089363, - 0.0, - 0.2947798109962605, - 0.339434366003843, - 0.3145181859872537, - 0.30876177598838694, - 0.2963030829996569, - 0.3235831390047679, - 0.32521843000722583, - 0.31938562300638296, - 0.0, - 0.3170696519955527, - 0.021426252991659567, - 0.05436134000774473, - 0.02856806399358902, - 0.04981221399793867, - 0.0539909500075737, - 0.059017232997575775, - 0.02188266601297073, - 0.028487155999755487, - 0.0, - 0.03856424200057518, - 0.06047881199629046, - 0.0456428670004243, - 0.03972885799885262, - 0.039513045994681306, - 0.03977952600689605, - 0.04928458799258806, - 0.04282788799901027, - 0.026921281998511404, - 0.041657472000224516, - 0.042604522008332424, - 0.0407507030031411, - 0.05118104700522963, - 0.045881153011578135, - 0.17494977999012917, - 0.24544467400119174, - 0.262316753010964, - 0.013426227000309154, - 0.2290209899947513, - 0.03615285600244533, - 0.025559673013049178, 
- 0.03943572999560274, - 0.0253102190035861, - 0.04536022900720127, - 0.04501092700229492, - 0.019833479993394576, - 0.03562236799916718, - 0.06788860299275257, - 0.02408302400726825, - 0.02362355099467095, - 0.03211481499602087, - 0.032775140993180685, - 0.034156474997871555, - 0.0, - 0.017687187995761633, - 0.04018194999662228, - 0.03895932999148499, - 0.03007085200806614, - 0.02895652000734117, - 0.07675803601159714, - 0.054172354997717775, - 0.0, - 0.035702943001524545, - 0.04271738399984315, - 0.06663504699827172, - 0.019981292003649287, - 0.01935917101218365, - 0.0, - 0.03617033301270567, - 0.021759124007076025, - 0.022151217999635264, - 0.02439592601149343, - 0.019673122005769983, - 0.021855507002328523, - 0.0, - 0.0, - 0.019948688001022674, - 0.025335639002150856, - 0.045335404996876605, - 0.05451036999875214, - 0.01888799900189042, - 0.019696532996022142, - 0.019800016001681797, - 0.0, - 0.002537161999498494, - 0.02804926699900534, - 0.009565772998030297, - 0.005718227999750525, - 0.14157098100986332, - 0.0, - 0.1629706800013082, - 0.14812947699101642, - 0.15246441699855495, - 0.16273097800149117, - 0.15281968700583093, - 0.005932297004619613, - 0.0, - 0.011345777005772106, - 0.0, - 0.01308540599711705, - 0.01196138298837468, - 0.0, - 0.014280332005000673, - 0.021261417990899645, - 0.021376908000092953, - 0.024325544000021182, - 0.0, - 0.01435049700376112, - 0.02985438499308657, - 0.008159093995345756, - 0.005711938996682875, - 0.011142959003336728, - 0.0070962850004434586, - 0.00985714299895335, - 0.0, - 0.02591231299447827, - 0.0, - 0.0, - 0.0, - 0.018659453999134712, - 0.01121310199960135, - 0.027455848001409322, - 0.017272828990826383, - 0.028889849985716864, - 0.0, - 0.0052042070019524544, - 0.18007051199674606, - 0.0, - 0.1810287599946605, - 0.18414228800975252, - 0.18474308200529777, - 0.19453695599804632, - 0.1911619970051106, - 0.19005767699854914, - 0.017526978990645148, - 0.19812171599187423, - 0.014274474000558257, - 0.0, - 0.0, - 0.0, - 0.0, - 
0.0, - 0.026386385012301616, - 0.0, - 0.013178951005102135, - 0.0, - 0.03480647499964107, - 0.02170364600897301, - 0.014637489002780057, - 0.028722120987367816, - 0.04749504399660509, - 0.21971407798992004, - 0.0, - 0.22767876299622003, - 0.22080158299650066, - 0.22880928000085987, - 0.22696733100747224, - 0.007307890002266504, - 0.0036110700020799413, - 0.0, - 0.0035880089999409392, - 0.004911911993985996, - 0.0064703350071795285, - 0.002909674003603868, - 0.0, - 0.009502079992671497, - 0.009068530998774804, - 0.004321845990489237, - 0.00455390899151098, - 0.007409847996314056, - 0.005309135012794286, - 0.0, - 0.009359859002870508, - 0.0, - 0.009321260004071519, - 0.008247803008998744, - 0.01232069099205546, - 0.12692940099805128, - 0.12109681899892166, - 0.12465708900708705, - 0.0, - 0.008901491004507989, - 0.009438964989385568, - 0.12862248299643397, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.012001933995634317, - 0.0, - 0.0, - 0.035806952000712045, - 0.027628784999251366, - 0.0, - 0.03331241199339274, - 0.028043554004398175, - 0.027757738003856502, - 0.0, - 0.0, - 0.0, - 0.011463818998890929, - 0.012268934995518066, - 0.03302612299739849, - 0.027005992000340484, - 0.0, - 0.026617499999701977, - 0.03129579400410876, - 0.017308460010099225, - 0.29443177398934495, - 0.027659265004331246, - 0.027292887010844424, - 0.0, - 0.0, - 0.0379211330000544, - 0.023450998007319868, - 0.0, - 0.012656517996219918, - 0.0121950500033563, - 0.026855561998672783, - 0.03166891999717336, - 0.0, - 0.0, - 0.011464028008049354, - 0.0, - 0.02684567400137894, - 0.007293928996659815, - 0.027179660988622345, - 0.0, - 0.020164069006568752, - 0.01989728500484489, - 0.034650474990485236, - 0.02180442500684876, - 0.04133225300756749, - 0.032517168001504615, - 0.0326511989987921, - 0.0, - 0.010589771991362795, - 0.03599419399688486, - 0.0, - 0.035902494011679664, - 0.041725476999999955, - 0.036682031000964344, - 0.021444450001581572, - 0.01565645400842186, - 0.015949039996485226, - 0.0, - 
0.0, - 0.0, - 0.04671302399947308, - 0.045917640993138775, - 0.0, - 0.021024157002102584, - 0.046262343996204436, - 0.036545416005537845, - 0.030843260989058763, - 0.015392738991067745, - 0.02640423299453687, - 0.04618797100556549, - 0.0, - 0.015419344999827445, - 0.024131525002303533, - 0.021049950999440625, - 0.0, - 0.026214909012196586, - 0.041224720000172965, - 0.025534430009429343, - 0.03587635500298347, - 0.0, - 0.05127660800644662, - 0.030660777003504336, - 0.0, - 0.015777913999045268, - 0.030961852986365557, - 0.0, - 0.0, - 0.027399216007324867, - 0.030805888993199915, - 0.036255448998417705, - 0.0, - 0.0, - 0.0, - 0.020930719008902088, - 0.036130378997768275, - 0.025785934005398303, - 0.03626160600106232, - 0.04126601800089702, - 0.03108419400814455, - 0.01725625900144223, - 0.020956816006219015, - 0.015894883006694727, - 0.021206470992183313, - 0.025719380006194115, - 0.025921409993316047, - 1.0264746970060514, - 0.0268099910026649, - 0.0, - 0.0, - 0.0, - 0.0, - 0.02235398600168992, - 0.025567691001924686, - 0.04708665500220377, - 0.0358072459930554, - 0.01567951300239656, - 0.015545768008450978, - 0.02550469800189603, - 0.03247856999223586, - 0.015371507004601881, - 0.01565664800000377, - 0.0, - 0.01212937300442718, - 0.0, - 0.0, - 0.0, - 0.0, - 0.030624897000961937, - 0.0, - 0.02578090099268593, - 0.0, - 0.0, - 0.0, - 0.026250248003634624, - 0.0, - 0.015940507990308106, - 0.0, - 0.0, - 0.028243929002201185, - 0.016055211002822034, - 0.0, - 0.010609367993311025, - 0.021236934000626206, - 0.034181826005806215, - 0.010579862995655276, - 0.0, - 0.02614560499205254, - 0.030962050004745834, - 0.0, - 0.0, - 0.026619166994350962, - 0.017741390998708084, - 0.021087408007588238, - 0.0, - 0.0, - 0.0, - 0.021349823000491597, - 0.0, - 0.0, - 0.03385218800394796, - 0.03704490099335089, - 0.03664959399611689, - 0.030789685988565907, - 0.025775472007808276, - 0.020955640997271985, - 0.0, - 0.011155116008012556, - 0.02271154199843295, - 0.02658670699747745, - 0.0, - 
0.01586142400628887, - 0.0, - 0.03609469399088994, - 0.047791049000807106, - 0.0, - 0.0, - 0.002056780009297654, - 0.0006352039927151054, - 0.0009893770038615912, - 0.0018647649994818494, - 0.008662759995786473, - 0.006723001002683304, - 0.007303879989194684, - 0.010939191008219495, - 0.016775757001596503, - 0.0074548379925545305, - 0.0072678389988141134, - 0.01057562899950426, - 0.014803562007728033, - 0.017231429999810643, - 0.016382918009185232, - 0.013791212986689061, - 0.024133222992531955, - 0.03873489699617494, - 0.04090888099744916, - 0.04158518598705996, - 0.0403067420120351, - 0.018372003993135877, - 0.007129894001991488, - 0.010420011996757239, - 0.006561493995832279, - 0.01167305899434723, - 0.0037831209920113906, - 0.007598915995913558, - 0.007474569996702485, - 0.008600870001828298, - 0.008422052997048013, - 0.005050905005191453, - 0.01061011099955067, - 0.008173221998731606, - 0.009849946000031196, - 0.013770499004749581, - 0.013208425996708684, - 0.013177412998629734, - 0.03732756599492859, - 0.15360581799177453, - 0.011610753004788421 - ], - "decode_latencies": [ - 0.005380546004744247, - 0.011162932001752779, - 0.016286677986499853, - 0.00540681098937057, - 0.0005002890102332458, - 0.005443396992632188, - 7.222199928946793e-05, - 0.006051946998923086, - 0.005569601009483449, - 0.005587915002251975, - 0.06746193500293884, - 0.02651048800908029, - 0.005637265989207663, - 0.009410623999428935, - 0.06793762699817307, - 0.0049522189947310835, - 0.0006548929959535599, - 0.013466928998241201, - 0.017712507004034705, - 0.029978787002619356, - 0.010931547993095592, - 0.04303217300912365, - 0.0007538229983765632, - 0.0356046489905566, - 0.08669294699211605, - 0.02468812500592321, - 0.012251951993675902, - 0.0048951819917419925, - 0.0064189020049525425, - 0.0492427540011704, - 0.06844646800891496, - 0.06973575199663173, - 0.0897860159893753, - 0.06892594999226276, - 0.06775972699688282, - 0.06956005199754145, - 0.005831602000398561, - 0.000825852999696508, - 
0.012703328000498004, - 0.06836409999232274, - 0.04782937500567641, - 0.0156097520084586, - 0.0022505749948322773, - 0.06749515800038353, - 0.006894240999827161, - 0.013931000008597039, - 0.05129074900469277, - 0.007752299003186636, - 0.007000213008723222, - 0.005733429003157653, - 0.011355380003806204, - 0.02372486201056745, - 0.025694883996038698, - 0.03069463600695599, - 0.008928808994824067, - 0.008439400000497699, - 0.012514642003225163, - 0.00826848299766425, - 0.014431610004976392, - 0.03565164300380275, - 0.00954278300923761, - 0.0005135950050316751, - 0.00715174300421495, - 0.007994464001967572, - 0.02030508199823089, - 0.00716263399226591, - 0.05161353500443511, - 0.010431970003992319, - 0.04711258799943607, - 0.01370056500309147, - 0.013264277004054748, - 0.09506632499687839, - 0.00548250500287395, - 0.06785479599784594, - 0.01563236999209039, - 0.01824938399659004, - 0.09899376699468121, - 0.017396667011780664, - 0.06835263999528252, - 0.06836739799473435, - 0.08111247800115962, - 0.05228550599713344, - 0.00619440599984955, - 0.019156880996888503, - 0.007772573997499421, - 0.06866494299902115, - 0.014237807990866713, - 0.024287092994200066, - 0.015107005994650535, - 0.027692664007190615, - 0.013304912004969083, - 0.021953631992801093, - 0.013034293006057851, - 0.008761235993006267, - 0.006635223995544948, - 0.015568066999549046, - 0.020201406005071476, - 0.0072327939997194335, - 0.013635349998367019, - 0.01589213999977801, - 0.007617853989358991, - 0.08476850199804176, - 0.01123564199951943, - 0.07010349600750487, - 0.12356759699468967, - 0.19168605799495708, - 0.001905743993120268, - 0.18604322499595582, - 0.11461931299709249, - 0.017900264007039368, - 0.0893897600035416, - 0.2808861950034043, - 0.03092309201019816, - 0.008154597991961055, - 0.02485939000325743, - 0.03200016099435743, - 0.01944646899937652, - 0.27509815400117077, - 0.006207833008375019, - 0.017640032005147077, - 0.018138282001018524, - 0.02936415000294801, - 0.011210113996639848, - 
0.30252940898935776, - 0.01976248700520955, - 0.002007768998737447, - 0.009180875000311062, - 0.031202792990370654, - 0.00776478300394956, - 0.007481544991605915, - 0.03411545300332364, - 0.0331939650059212, - 0.2690888270008145, - 0.008113501011393964, - 0.3033053309918614, - 0.27526905700506177, - 0.008117113000480458, - 0.021009953998145647, - 0.032069936001789756, - 0.013397432994679548, - 0.00820250999822747, - 0.07236477200058289, - 0.07675585300603416, - 0.04161081799247768, - 0.07310317800147459, - 0.01645860700227786, - 0.03046493099827785, - 0.06945303900283761, - 0.014066755000385456, - 0.0392459940048866, - 0.0161848059942713, - 0.18594115000450984, - 0.012832378997700289, - 0.016906616001506336, - 0.026768982002977282, - 0.015350763002061285, - 0.028437268003472127, - 0.0152231749962084, - 0.038907719994313084, - 0.03306801700091455, - 0.0461705810012063, - 0.0244642009929521, - 0.007667591999052092, - 0.1315691399940988, - 0.03636370399908628, - 0.030305071006296203, - 0.0770425760129001, - 0.008184186997823417, - 0.024979870999231935, - 0.030655167996883392, - 0.038381410005968064, - 0.04445046700129751, - 0.024291937006637454, - 0.03317547700135037, - 0.021674957999493927, - 0.00728003800031729, - 2.7542002499103546e-05, - 0.013097470000502653, - 0.006341258995234966, - 0.0313988639973104, - 0.015225523995468393, - 0.1319600300048478, - 0.02369903400540352, - 0.006692092996672727, - 0.031548541999654844, - 0.007144064991734922, - 0.01869800899294205, - 0.013209106007707305, - 0.024278006996610202, - 0.005124036004417576, - 0.01718448101019021, - 0.0032104620040627196, - 0.010005349002312869, - 0.006719740995322354, - 0.02266016899375245, - 0.002525404008338228, - 0.020988839998608455, - 0.035856301998137496, - 0.016574545996263623, - 0.014654560000053607, - 0.03812438200111501, - 0.015784906994667836, - 0.021294375008437783, - 0.022890959997312166, - 0.13615131098777056, - 0.02352982600859832, - 0.03453501299372874, - 0.021245519994408824, - 
0.010473620000993833, - 0.018273174995556474, - 0.021996016992488876, - 0.007579081007861532, - 0.14244859099562746, - 0.007262818995513953, - 0.009898869990138337, - 0.010774954003863968, - 0.010789895997731946, - 0.04040564599563368, - 0.03454997499648016, - 0.11527119600214064, - 0.01118667799164541, - 0.016953452010056935, - 0.015943282007356174, - 0.012863914991612546, - 0.006787989987060428, - 0.015237819010508247, - 0.0026041990058729425, - 0.013917315009166487, - 0.0014098879910307005, - 0.009740233013872057, - 0.028819409999414347, - 0.022071749990573153, - 0.008087700000032783, - 0.012831301006372087, - 0.021857282001292333, - 0.008292760991025716, - 0.024528059002477676, - 0.01879066500987392, - 0.01724139200814534, - 0.009996919994591735, - 0.0076579720043810084, - 0.00810900199576281, - 0.02418828700319864, - 0.011500137989060022, - 0.016949273005593568, - 0.007483681009034626, - 0.010058353989734314, - 0.013816756996675394, - 0.008025069007999264, - 0.026906499988399446, - 0.013185181000153534, - 0.024519575003068894, - 0.010430723006720655, - 0.00827557299635373, - 0.007020090997684747, - 0.17787256800511386, - 0.008186726001440547, - 0.006304290000116453, - 3.6052006180398166e-05, - 0.012464586005080491, - 0.007701698996243067, - 0.14428460199269466, - 0.0303299649967812, - 0.012315241998294368, - 0.009133628002018668, - 0.0061716019990853965, - 0.03468832699581981, - 0.014416451987926848, - 0.01608240499626845, - 0.009016961004817858, - 0.004981816993677057, - 0.014499985991278663, - 0.016262159013422206, - 0.008374088996788487, - 0.004291056000511162, - 0.0070514800027012825, - 0.002997153002070263, - 0.016018579990486614, - 0.004831956000998616, - 0.022682565002469346, - 0.02158159100508783, - 0.03466538499924354, - 0.007139748995541595, - 0.0027598760061664507, - 0.019101036014035344, - 0.028011978007270955, - 0.24753918600617908, - 0.03220933700504247, - 0.02786108599684667, - 0.008652499003801495, - 0.014620014000684023, - 
0.012041811001836322, - 0.004655527998693287, - 0.002280177010106854, - 0.00230131100397557, - 0.02008891799778212, - 0.0025665329885669053, - 0.0030538889986928552, - 0.0026924440026050434, - 0.006793941996875219, - 0.00478975499572698, - 0.0023673399991821498, - 0.007015625000349246, - 0.008594541999627836, - 0.007854687006329186, - 0.0019162600074196234, - 0.0021798289963044226, - 0.003230998001527041, - 0.00745385600021109, - 0.0056314560060855, - 0.001199874997837469, - 0.007096255998476408, - 0.0036322759988252074, - 0.00023281500034499913, - 0.0039321790100075305, - 0.006250455990084447, - 0.0014010830054758117, - 0.009769498006789945, - 0.004445439990377054, - 0.005103258998133242, - 0.0015717649948783219, - 0.0017676370043773204, - 0.11741138900106307, - 0.0004474479937925935, - 0.0013267189933685586, - 0.0021203540090937167, - 0.01150701601000037, - 0.02676258300198242, - 0.017427981001674198, - 0.0008085979934548959, - 0.009379997005453333, - 0.0020513250055955723, - 0.005532209004741162, - 0.01170361100230366, - 0.0003805299929808825, - 0.005835623000166379, - 0.027677093996317126, - 0.0031684309942647815, - 0.010416784993140027, - 0.010626158997183666, - 0.0011766499956138432, - 0.03143015499517787, - 0.011720326001523063, - 0.005424948991276324, - 0.009306089996243827, - 0.010652482000296004, - 0.015414198001963086, - 0.010534194007050246, - 0.005687240001861937, - 0.010126652996405028, - 0.0053425770020112395, - 0.006251419996260665, - 0.021752398999524303, - 0.005979291992844082, - 0.005724123009713367, - 0.01607486700231675, - 0.006748285988578573, - 0.0014496079966193065, - 0.005679164998582564, - 0.0063060800021048635, - 0.006228958009160124, - 0.005384155985666439, - 0.010149991998332553, - 0.017803745999117382, - 0.011348133004503325, - 0.000980597993475385, - 0.005408514000009745, - 0.00026914500631392, - 0.0010494019952602684, - 0.014489580993540585, - 0.007634341993252747, - 0.0023498019872931764, - 0.008209748004446737, - 
0.008659814993734471, - 0.005166639995877631, - 0.015445764001924545, - 0.0002194620028603822, - 0.010278929999913089, - 0.02145528698747512, - 0.030824803994619288, - 0.015339566001784988, - 0.00605315800930839, - 0.010153844006708823, - 0.04708836200006772, - 0.015288616996258497, - 0.015243133995682001, - 0.02807793699321337, - 0.005439512999146245, - 0.010383996996097267, - 0.0001957569911610335, - 0.00515623101091478, - 0.030574778997106478, - 0.005222587002208456, - 0.025628598988987505, - 0.010377832004451193, - 0.019869333002134226, - 0.010664799003279768, - 0.015271747994120233, - 0.0051689199899556115, - 0.005443208006909117, - 0.00515509900287725, - 0.006177681992994621, - 0.010353549994761124, - 0.01030328799970448, - 0.015315204000216909, - 0.020440826992853545, - 0.010397247999208048, - 0.010261731993523426, - 0.005199046005145647, - 0.010463500002515502, - 0.02045115499640815, - 0.005164379006600939, - 0.010114838994923048, - 0.010503430006792769, - 0.005440772991278209, - 0.021048782000434585, - 0.010420903010526672, - 0.005655506989569403, - 0.005264945997623727, - 0.010282923001796007, - 0.015537920000497252, - 0.01064911599678453, - 0.00518490000104066, - 0.010539000999415293, - 0.005149189004441723, - 0.005275186995277181, - 0.00522386199736502, - 0.005176039005164057, - 0.04973523199441843, - 0.015789362994837575, - 0.005332996996003203, - 0.005130094999913126, - 0.020462707994738594, - 0.015367190993856639, - 0.010345222995965742, - 0.025903528003254905, - 0.01041305200487841, - 0.026955687993904576, - 0.005201983003644273, - 0.010697384001105092, - 0.010148761997697875, - 0.010162777005461976, - 0.0052317139925435185, - 0.010374335004598834, - 0.00547694499255158, - 0.005272896989481524, - 0.005187419010326266, - 0.005112514001666568, - 0.020529243993223645, - 0.005510538001544774, - 0.010371586002293043, - 0.010313922000932507, - 0.005201476000365801, - 0.00583657699462492, - 0.005139839995536022, - 0.06322474399348721, - 
0.0051671109977178276, - 0.005374199012294412, - 0.010290702994097956, - 0.00517449299513828, - 0.026177166000707075, - 0.015463015995919704, - 0.010178944998187944, - 0.01063549899845384, - 0.02043671799765434, - 0.005491695992532186, - 0.010357072998885997, - 0.0367605170031311, - 0.015351920999819413, - 0.010256010995362885, - 0.010360880012740381, - 0.02041357700363733, - 0.0203815470013069, - 0.010412019997602329, - 0.01038535000407137, - 0.010182641999563202, - 0.010285338998073712, - 0.010423419007565826, - 0.005175376005354337, - 0.005186409005546011, - 4.1382998460903764e-05, - 0.0157523560046684, - 0.01529920200118795, - 8.476999937556684e-05, - 0.005180071995710023, - 0.015401983007905073, - 0.015616133998264559, - 0.030832702992483974, - 0.04182674600451719, - 0.01061117798963096, - 0.005631816005916335, - 0.0306234590098029, - 0.01625608500035014, - 0.005166068993275985, - 0.01858949099550955, - 0.07305770000675693, - 0.005174420002731495, - 0.015446841003722511, - 0.025964471991756, - 0.010404045999166556, - 0.01537961300346069, - 0.030673931003548205, - 0.005391395010519773, - 0.024170361997676082, - 0.00035332900006324053, - 0.03301037699566223, - 0.005529863992705941, - 0.0002648679947014898, - 0.0001331750099780038, - 5.0125992856919765e-05, - 0.00013591200695373118, - 0.0003811610076809302, - 0.0016342010057996958, - 0.0018123859917977825, - 0.004667974993935786, - 0.0048074069927679375, - 0.0053510340076172724, - 0.006489228995633312, - 0.004987976004485972, - 0.016427309994469397, - 0.013780310007859953, - 0.014907907010638155, - 0.006798501999583095, - 0.015262067987350747, - 0.014333485989482142, - 0.017200795002281666, - 0.022140531000331976, - 0.01568260000203736, - 0.015057386000989936, - 0.016340531001333147, - 0.006355528996209614, - 0.006682563005597331, - 0.0073041650030063465, - 0.00520968998898752, - 0.0008871850004652515, - 0.0026582609862089157, - 0.001568994004628621, - 0.0018253940070280805, - 0.0016769460053183138, - 
0.0011820740037364885, - 0.003669155004899949, - 0.003656469998531975, - 0.003502511011902243, - 0.005199849998462014, - 0.005751980002969503, - 0.004671464994316921, - 0.014623105991631746, - 0.00514161899627652, - 0.00491866200172808 - ], - "multi_turn_cache_hits": 71, - "multi_turn_cache_misses": 301, - "seed": 42, - "summary": { - "total_requests": 549, - "total_tokens": 147313, - "elapsed_time": 42.03712797164917, - "avg_throughput_tokens_per_sec": 3504.3545339099132, - "requests_per_second": 13.059883643103747, - "end_to_end_latency_ms": { - "mean": 11741.778062626385, - "p50": 3959.794675989542, - "p95": 43183.21597200411, - "p99": 44894.88796267483 - }, - "storage_io_latency_ms": { - "mean": 267.17036587509745, - "p50": 146.58266898186412, - "p95": 1035.2712508116383, - "p99": 1396.144346077924 - }, - "generation_latency_ms": { - "mean": 0.0, - "p50": 0.0, - "p95": 0.0, - "p99": 0.0 - }, - "cache_stats": { - "cache_hit_rate": 0.9332992152848857, - "cache_hits": 5471, - "cache_misses": 391, - "gpu_entries": 22, - "cpu_entries": 8, - "nvme_entries": 419, - "gpu_memory_used_gb": 3.036865234375, - "cpu_memory_used_gb": 0.9547119140625, - "offloads_cpu": 427, - "offloads_nvme": 419, - "storage_health": { - "overall_status": "PASS", - "criteria": [ - { - "name": "NVMe Read P95 < 200ms", - "target": 200, - "actual": 39.03934770060004, - "unit": "ms", - "passed": true - }, - { - "name": "CPU RAM P95 < 150ms", - "target": 150, - "actual": 26.890885252214503, - "unit": "ms", - "passed": true - }, - { - "name": "Cache Hit Rate > 30%", - "target": 0.3, - "actual": 0.9332992152848857, - "unit": "ratio", - "passed": true - } - ], - "passed_count": 3, - "total_count": 3 - }, - "prefill_writes": 451, - "decode_reads": 5471, - "prefill_bytes_written_gb": 7.75634765625, - "decode_bytes_read_gb": 91.873779296875, - "system_prompt_hits": 852, - "common_phrase_hits": 0, - "user_cache_hits": 4548, - "multi_turn_hits": 71, - "total_read_bytes": 98648719360, - "total_write_bytes": 
8328314880, - "total_read_gb": 91.873779296875, - "total_write_gb": 7.75634765625, - "read_write_ratio": 11.844979540446962, - "read_iops": 5471, - "write_iops": 451, - "gpu_read_p50_ms": 10.459969998919405, - "gpu_read_p95_ms": 114.4120625045616, - "gpu_read_p99_ms": 276.06978995288944, - "gpu_write_p50_ms": 29.85438499308657, - "gpu_write_p95_ms": 228.24402149853995, - "gpu_write_p99_ms": 418.4187139981077, - "cpu_read_p50_ms": 5.564635997870937, - "cpu_read_p95_ms": 26.890885252214503, - "cpu_read_p99_ms": 50.135158539342, - "nvme_read_p50_ms": 41.98183499102015, - "nvme_read_p95_ms": 81.55867819441481, - "nvme_read_p99_ms": 759.3386758567129, - "nvme_read_device_p50_ms": 18.51838899892755, - "nvme_read_device_p95_ms": 39.03934770060004, - "nvme_read_device_p99_ms": 72.37369697802941, - "nvme_read_host_p50_ms": 21.193786000367254, - "nvme_read_host_p95_ms": 40.916514190030284, - "nvme_read_host_p99_ms": 688.0033975589386 - }, - "qos_metrics": { - "interactive": { - "total_requests": 549, - "latency_ms": { - "mean": 11741.778062626383, - "p50": 3959.794675989542, - "p95": 43183.21597200411, - "p99": 44894.88796267483, - "max": 47034.731070001726 - }, - "sla": { - "target_p95_ms": 50, - "actual_p95_ms": 43183.21597200411, - "compliance": 0.0, - "met": false - } - }, - "responsive": { - "no_data": true - }, - "batch": { - "no_data": true - } - }, - "prefix_cache_stats": { - "prefix_hits": 90, - "prefix_misses": 459, - "system_prompt_reuse": 90, - "common_phrase_reuse": 0, - "bytes_saved": 78643200 - }, - "autoscaling_stats": [], - "autoscaling_summary": null, - "multi_turn_stats": { - "cache_hits": 71, - "cache_misses": 301, - "hit_rate": 0.19086021505376344 - } - } -} \ No newline at end of file diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_gpu_cpu_nvme_trial3.json b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_gpu_cpu_nvme_trial3.json deleted file mode 100644 index 
12f606f7..00000000 --- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_gpu_cpu_nvme_trial3.json +++ /dev/null @@ -1,2901 +0,0 @@ -{ - "requests_completed": 549, - "total_tokens_generated": 147832, - "total_storage_io_latency": 134.88631002581678, - "total_generation_latency": 0.0, - "end_to_end_latencies": [ - 0.13726037299784366, - 0.1473485170135973, - 0.15617771199322306, - 0.23652365400630515, - 0.3046579899964854, - 0.328320339001948, - 0.35165926499757916, - 0.47354063599777874, - 0.47955940999963786, - 0.5199209049897036, - 0.5252203030104283, - 0.5332953140023164, - 0.5394713980058441, - 0.547410088009201, - 0.5613570849964162, - 0.5632163900008891, - 0.5631961720064282, - 0.591455264002434, - 0.5927284859935753, - 0.5919805939920479, - 0.5925803050049581, - 0.5937257679906907, - 0.6035363130067708, - 0.6123845580004854, - 0.612672677001683, - 0.6136969780054642, - 0.6154692109994357, - 0.6154171700036386, - 0.6158350540063111, - 0.6174587239947869, - 0.6269361479935469, - 0.6283345710107824, - 0.776717134998762, - 0.7782282970001688, - 0.7854412899905583, - 0.7854064499988453, - 0.8050480039964896, - 0.8058237289951649, - 0.8524785040062852, - 0.879407924003317, - 0.8807789590064203, - 0.8867304879968287, - 0.8874170360068092, - 0.8874715790007031, - 0.8955466490006074, - 0.8991795869951602, - 0.9059452680085087, - 0.9127573850128101, - 0.9136890730005689, - 0.9137259300041478, - 0.9148967369983438, - 0.9151435980020324, - 0.9226222509896616, - 0.9296752210066188, - 0.9287924539967207, - 0.9289658259949647, - 1.0046553789870813, - 1.0072125389997382, - 1.0060483150009532, - 1.0066190740035381, - 1.0082896070089191, - 1.021946544002276, - 1.0250180900038686, - 1.0365892910049297, - 1.0379627579968655, - 1.0444171089911833, - 1.0447691139997914, - 1.0503514540032484, - 1.052489819994662, - 1.056412518999423, - 1.055018459999701, - 1.0552314990054583, - 1.0567966329981573, - 1.0590405920083867, - 1.1372427419992164, - 
1.1422664569981862, - 1.2566966689919354, - 1.3373396140086697, - 1.3416042710014153, - 1.3537924440024653, - 1.35907413699897, - 1.365452936006477, - 1.3662328540085582, - 1.368546536003123, - 1.3746873600030085, - 1.3832490490021883, - 1.3815363909961889, - 1.3824222050025128, - 1.555628660004004, - 1.5581142480077688, - 1.5600865659944247, - 1.8338820579956518, - 1.8400365859997692, - 1.854662012992776, - 1.85502624399669, - 1.8550364239927148, - 1.8677166819979902, - 1.8739746600040235, - 1.8780960959993536, - 1.8824679250101326, - 1.8879379340069136, - 1.8894684289989527, - 1.8991168710053898, - 1.8986471270036418, - 1.9203558940062067, - 1.92002022601082, - 1.9224645259964745, - 1.9201884429930942, - 1.9215957109990995, - 1.9240023620077409, - 1.9241624889982631, - 1.9252490830112947, - 1.950081251008669, - 2.2155404969962547, - 2.236007760002394, - 2.2366147699940484, - 2.246688790997723, - 2.2535668140044436, - 2.269094369999948, - 2.292508379992796, - 2.2993676049954956, - 2.2999687009869376, - 2.3006693400093354, - 2.3230722560110735, - 2.3347648210037732, - 2.334993198994198, - 2.3404225520062027, - 2.340502020000713, - 2.3474035179970087, - 2.34866719400452, - 2.5063627990020905, - 2.599279863992706, - 2.718889896001201, - 2.718748154002242, - 2.72079116301029, - 2.72037202300271, - 2.7284534679929493, - 2.7330010879959445, - 2.7385028209973825, - 2.8144724659941858, - 2.8160004660021514, - 2.821402411995223, - 2.8218446920072893, - 2.8220088809903245, - 2.8235158720053732, - 2.8280179320136085, - 2.8354557220009156, - 2.83471410900529, - 2.8360389239969663, - 2.8427843480021693, - 2.850328428001376, - 2.865198301995406, - 2.871998098999029, - 2.8732566819962813, - 2.873951556990505, - 2.873998921000748, - 2.8802332249906613, - 2.896459876006702, - 2.904154871997889, - 2.904202120989794, - 2.9038007780036423, - 2.9067688129871385, - 2.9049744039948564, - 2.904823953998857, - 2.907990143998177, - 2.9132739790074993, - 2.9205017979984405, - 
2.940360430002329, - 2.941759687004378, - 2.9428098969947314, - 2.954253926000092, - 2.9554570420004893, - 2.9616010320023634, - 2.961139759994694, - 2.9632938690047013, - 2.976789310996537, - 2.9780828049988486, - 2.9896496110013686, - 2.9952952050080057, - 2.9969726910057943, - 3.023476499001845, - 3.0232601259922376, - 3.02980821399251, - 3.0304879600007553, - 3.0381408279936295, - 3.038133560999995, - 3.0439351750101196, - 3.051336906995857, - 3.0582699369988404, - 3.0696404089976568, - 3.0834247390012024, - 3.085693147004349, - 3.0836266260012053, - 3.0840214689960703, - 3.084273476997623, - 3.091958279008395, - 3.092472985998029, - 3.0928067049972015, - 3.1164895049878396, - 3.2754116889991565, - 3.2781757730117533, - 3.2902177200012375, - 3.2985891610005638, - 3.296321960995556, - 3.2970059359940933, - 3.29660749399045, - 3.303615015989635, - 3.3246709619998, - 3.3262238379975315, - 3.3251599699869985, - 3.331048724008724, - 3.3441438430018025, - 3.343337648009765, - 3.3448119499953464, - 3.352859977996559, - 3.3687256160046672, - 3.3698748019960476, - 3.3806492869916838, - 3.388076523988275, - 3.389157540994347, - 3.3921259750059107, - 3.392692748006084, - 3.3967537849966902, - 3.4038926959910896, - 3.416512429001159, - 3.4173218170035398, - 3.4270885869918857, - 3.428130874002818, - 3.4348782039887737, - 3.440181047990336, - 3.4405287859990494, - 3.4403980799979763, - 3.440405051005655, - 3.4478390979929827, - 3.46155387199542, - 3.463135029989644, - 3.6302007100021, - 3.6318304979940876, - 3.6329578080039937, - 3.6393946030002553, - 3.6398218980029924, - 3.64022829500027, - 3.641593965003267, - 3.6445766760007245, - 3.64599979299237, - 3.6558130259945756, - 3.675928233002196, - 3.704147536001983, - 3.7075709689961514, - 3.7050947309908224, - 3.7044654740020633, - 3.704767912000534, - 3.7060896519978996, - 3.7087536149920197, - 3.7074208009871654, - 3.710643231999711, - 3.7110487180034397, - 3.710347788000945, - 3.7113640979951015, - 3.7139204670093022, - 
3.7135887799959164, - 3.7141741679952247, - 3.7166628029954154, - 3.7203023440088145, - 3.721553612005664, - 3.7238467640127055, - 3.723050760003389, - 3.7235781159979524, - 3.723898119991645, - 3.724901330002467, - 3.7286234600032913, - 3.7290853279992007, - 3.7299943850084674, - 3.732726198999444, - 3.736381352005992, - 3.73595871499856, - 3.7363512919982895, - 3.737333954006317, - 3.738080233000801, - 3.74054957300541, - 3.7411279270017985, - 3.739193683999474, - 3.739642978995107, - 3.741470692009898, - 3.742847507004626, - 3.74301328198635, - 3.744020003010519, - 3.748109703999944, - 3.7477105779980775, - 3.749960243992973, - 3.7508104509906843, - 3.7516892830026336, - 3.7545062030112604, - 3.7585559500003, - 3.7603238689916907, - 3.7605792219983414, - 3.7645231490023434, - 3.7671532050007954, - 3.768837026989786, - 3.771168047998799, - 3.7739483439945616, - 3.7766019010014134, - 3.778715434993501, - 3.7792457769974135, - 3.782968384999549, - 3.787255937990267, - 3.7885621359891957, - 3.791273850001744, - 3.7895319100061897, - 3.791361348994542, - 3.793129315992701, - 3.793990093996399, - 3.795540823994088, - 3.798876979999477, - 3.800272072010557, - 3.8006036989972927, - 3.8010759970056824, - 3.807450225998764, - 3.8116867469943827, - 3.8086271539941663, - 3.8114568759920076, - 3.8133495829970343, - 3.813279045993113, - 3.813612108002417, - 3.8136923819984077, - 3.815628062991891, - 3.81581992500287, - 3.8164131399971666, - 3.8176446729921736, - 3.817800571996486, - 3.8177193320007063, - 3.8193575559998862, - 3.820123344004969, - 3.822094235001714, - 3.824703136997414, - 3.825240070989821, - 3.836326351010939, - 3.836235038994346, - 3.8382716850028373, - 3.8397030380001524, - 3.8454227650072426, - 3.8519034839991946, - 3.8547755210020114, - 3.857206907996442, - 3.860389961002511, - 3.8603832200024044, - 3.861014045003685, - 3.8651952699874528, - 3.86699203500757, - 3.868240537995007, - 3.867434915009653, - 3.868543374002911, - 3.868815778012504, - 
3.8692025409982307, - 3.8719966479984578, - 3.874506931999349, - 3.873659631004557, - 3.87637306698889, - 3.8769042460044147, - 3.877177299000323, - 3.8796412499941653, - 3.881002706999425, - 3.88265804600087, - 3.882940294002765, - 3.883528189995559, - 3.8841803959949175, - 3.8842024649929954, - 3.8839279060048284, - 3.8843242510047276, - 3.884286897999118, - 3.884470049990341, - 3.8872703469969565, - 3.887318545996095, - 3.8887179949961137, - 3.9008851289981976, - 3.92301704599231, - 4.161058605997823, - 4.209605680007371, - 4.306456581995008, - 4.318711065003299, - 4.3280791439901805, - 4.371890669004642, - 4.382254850992467, - 4.4422867669927655, - 4.477980758994818, - 4.651174302998697, - 4.652022413996747, - 4.862492385000223, - 5.010143491002964, - 5.085327703010989, - 5.229485826988821, - 5.241403438005364, - 5.328808176986058, - 5.675140528997872, - 5.679291202002787, - 5.848741091002012, - 5.901469414995518, - 5.978275487999781, - 5.9893883539916715, - 6.074444665995543, - 6.100792687997455, - 6.104438519992982, - 6.127821640999173, - 6.298168605993851, - 6.308691586993518, - 7.034928942011902, - 7.221279939010856, - 7.854309905000264, - 7.899934819011833, - 7.906420601007994, - 7.908124704990769, - 7.925156630997662, - 7.929519456010894, - 7.9458828670030925, - 7.945327156005078, - 7.952578850003192, - 7.956930646003457, - 7.961247477011057, - 7.959937574996729, - 7.962679994991049, - 7.967734328005463, - 7.967871703003766, - 7.971265516011044, - 7.97436690601171, - 7.976898623994202, - 7.982308894992457, - 7.995263071992667, - 8.008645724999951, - 8.014559572999133, - 8.019632679002825, - 8.020886173006147, - 8.026063254001201, - 8.03356805400108, - 8.03679777700745, - 8.036582338012522, - 8.038780157003202, - 8.039328707993263, - 8.042962925988832, - 8.04451836500084, - 8.04628149700875, - 8.066335909999907, - 8.078191781998612, - 8.078326485992875, - 8.097190617991146, - 8.097498700008146, - 8.097337631988921, - 8.135132335999515, - 8.147166214999743, 
- 8.187913529996877, - 8.187635384994792, - 8.204324837002787, - 8.220513799999026, - 8.23217962999479, - 8.231943058999605, - 8.239220687988563, - 8.244764429007773, - 8.245235373004107, - 8.245695926001645, - 8.257419311004924, - 8.38201297300111, - 8.398497414993471, - 8.399429818993667, - 8.432478164002532, - 8.43253842300328, - 8.439612114001648, - 8.446957577994908, - 8.447018127000774, - 8.466797598011908, - 8.466047560999868, - 8.61740953399567, - 8.641289389997837, - 8.646918794998783, - 8.680917253004736, - 8.680352277006023, - 8.68133590198704, - 8.683005847007735, - 8.692613856997923, - 8.732870462001301, - 8.734641635994194, - 8.73496692700428, - 8.751671150996117, - 8.751872342007118, - 8.752583768000477, - 8.753682851995109, - 8.754445096012205, - 8.765989475999959, - 8.79055871500168, - 8.802183815991157, - 8.812898777992814, - 8.8193883660133, - 8.825402608010336, - 8.874113844009116, - 8.896114659000887, - 8.896740312004113, - 8.930904722001287, - 8.937931514999946, - 8.937687310011825, - 8.97352959800628, - 8.984613705004449, - 9.018754339005682, - 9.02121791600075, - 9.024815655997372, - 9.025293612998212, - 9.026523923006607, - 9.027120478000143, - 9.030037243006518, - 9.084996111996588, - 9.09861948499747, - 9.100958841998363, - 9.105852519001928, - 9.104968906001886, - 9.109166641006595, - 9.108132276989636, - 9.113645085992175, - 9.149340702002519, - 9.179016209003748, - 9.181733595993137, - 9.187704238996957, - 9.191180549009005, - 9.211073350001243, - 9.222248270001728, - 9.329252978001023, - 9.354530509997858, - 9.370190767993336, - 9.398530694001238, - 9.430727668004693, - 9.425775310999597, - 9.601370709002367, - 9.613674591004383, - 10.150759497002582, - 10.163099243000033, - 10.170351959997788, - 10.18771638200269, - 10.375716947004548, - 10.38551967300009, - 10.487986281994381, - 10.500392595989979, - 11.020426807008334, - 11.40261157299392, - 11.406018330002553, - 11.449344791006297, - 11.499409215000924, - 12.271344770008, - 
12.4588232649985, - 12.643655598003534, - 13.102237380007864, - 13.580015063998871, - 13.806769105998683, - 13.962206897995202 - ], - "storage_latencies": [ - 0.07754423900041729, - 0.07604568000533618, - 0.11326006198942196, - 0.06598225400375668, - 0.07708668299892452, - 0.04066159301146399, - 0.12903961099800654, - 0.223863964973134, - 0.1714860029896954, - 0.1636837610276416, - 0.051652683003339916, - 0.2698893949855119, - 0.20636400401417632, - 0.2483065329870442, - 0.06422232298064046, - 0.24877339202794246, - 0.17004395100229885, - 0.23275622398068663, - 0.1795028529886622, - 0.045523460998083465, - 0.07880327900056727, - 0.03564579498197418, - 0.2454688610159792, - 0.09206424601143226, - 0.2051997149537783, - 0.028637818002607673, - 0.15706447498814669, - 0.21224149099725764, - 0.05392088697408326, - 0.10011052600748371, - 0.026006878004409373, - 0.07335425398196094, - 0.05405755899846554, - 0.12551357501070015, - 0.34385641901462805, - 0.14813523499469738, - 0.20014599098067265, - 0.18858813699625898, - 0.11508726997999474, - 0.1739862179965712, - 0.22700052797154058, - 0.21389432999421842, - 0.24631160401622765, - 0.21168098998896312, - 0.43035822798265144, - 0.07294606098730583, - 0.19318529601150658, - 0.20820953501970507, - 0.23037400301836897, - 0.15535768003610428, - 0.29735634800454136, - 0.21892339197802357, - 0.548124518012628, - 0.24556934500287753, - 0.1635526239988394, - 0.09627497699693777, - 0.018298135983059183, - 0.1806394760205876, - 0.07959589999518357, - 0.21297176300140563, - 0.1769063690007897, - 0.02844980199006386, - 0.42213072699087206, - 0.1888161379902158, - 0.1368828680133447, - 0.35421157800010405, - 0.19703696403303184, - 0.3040208879538113, - 0.3347521000105189, - 0.6418062730372185, - 0.33555416703165974, - 0.27944499503064435, - 0.27439460303867236, - 0.4837433289794717, - 0.2988577430078294, - 0.166062280011829, - 0.3182367510307813, - 0.20920320600271225, - 0.6610026409907732, - 0.4820290169882355, - 0.16405636600393336, - 
0.10728260800533462, - 0.24529717698169407, - 0.5929605250275927, - 0.40436855297593866, - 0.4589222520007752, - 0.11560754601669032, - 0.35893806199601386, - 0.3415510669874493, - 0.47845871100435033, - 0.5796431740309345, - 0.3158690169948386, - 0.41735677102406044, - 0.7864415960066253, - 0.7778381870157318, - 0.664936894987477, - 0.8901956640038406, - 0.48987484400277026, - 0.8579801599844359, - 0.7852099250303581, - 0.5876012509834254, - 0.8447884170163888, - 0.784385051971185, - 0.6908227489766432, - 0.8912665209936677, - 0.7754687249980634, - 1.063625290960772, - 0.8417302319867304, - 1.0669743020262104, - 0.2233556759892963, - 0.06507055500696879, - 0.7790519820264308, - 0.6832891340309288, - 0.390806067007361, - 0.8702746900089551, - 0.7854303750063991, - 1.2821539389842656, - 0.0550431340088835, - 0.7108346860331949, - 0.05444388999603689, - 0.3829494310193695, - 0.5903648710227571, - 0.672764023009222, - 0.335715835011797, - 1.3099828289996367, - 1.2071378540131263, - 0.42288694498711266, - 0.3437184849899495, - 0.945488256009412, - 1.1048496440198505, - 0.44700743399153, - 0.5891868740000064, - 1.0680182299984153, - 0.05068708998442162, - 0.29159430498839356, - 0.3584237940085586, - 0.2861678799963556, - 1.2023191799671622, - 0.8744724619900808, - 1.1548261080024531, - 0.9346543720166665, - 0.813436260985327, - 0.7944258010102203, - 0.5194319279835327, - 1.6651925089827273, - 0.5213700490130577, - 1.1725704619893804, - 0.4956187259958824, - 0.9461075529980008, - 1.645128489981289, - 0.9073543429985875, - 0.7621084259881172, - 0.3022060190269258, - 0.6625518470391398, - 0.882332258974202, - 0.6243415400094818, - 0.5617075069894781, - 1.4389227609353838, - 0.5781836660171393, - 0.20841582099092193, - 0.07447566199698485, - 1.2991732520313235, - 0.4825533910043305, - 0.12984882599266712, - 1.5330570760415867, - 1.4281522799865343, - 1.131157377953059, - 0.9070633600204019, - 0.8763378029834712, - 0.7063858620094834, - 0.33602003200212494, - 
0.5436554409970995, - 0.2177457210054854, - 0.04086713999276981, - 0.8808997670130339, - 0.1197528389893705, - 0.3735237560031237, - 0.0930005260015605, - 0.08596159997978248, - 0.09182784899894614, - 0.14516043799812905, - 0.09327418000611942, - 0.24052520698751323, - 0.0489998779958114, - 0.13935716899868567, - 0.03668480200576596, - 0.023997064010472968, - 0.10520254397124518, - 0.42336763998901006, - 0.1505556039919611, - 0.47973153401107993, - 0.6065840129886055, - 0.06597180200333241, - 0.11027011602709536, - 0.01301688700914383, - 0.24629046804329846, - 0.19245692300319206, - 0.2639093089965172, - 0.2792921079817461, - 0.26786777198140044, - 0.6240067839971744, - 0.34341656604374293, - 1.6868577689892845, - 0.3043496369791683, - 0.36446899504517205, - 0.19330424902727827, - 0.2090990989963757, - 0.26563803701719735, - 0.6863626800040947, - 0.2691007859975798, - 0.265892619965598, - 0.6858279149601003, - 0.30358403200807516, - 0.3694009609753266, - 1.2381239559617825, - 0.3107484410284087, - 0.2148195449844934, - 0.024899846001062542, - 0.36872660902736243, - 0.21457037900108844, - 0.3901871079724515, - 0.35358905501198024, - 0.24114905099850148, - 0.11328341301123146, - 0.06144075500196777, - 0.07353222100937273, - 0.29934782498457935, - 0.2960464210336795, - 0.4266610619961284, - 0.12926098401658237, - 0.3304455840116134, - 0.08849705300235655, - 0.11011842801235616, - 0.44271114499133546, - 0.2703026949602645, - 0.44225573402945884, - 0.10662459601007868, - 0.16958199901273474, - 0.2948099880013615, - 0.4481179139984306, - 0.06055168800230604, - 0.06593534702551551, - 0.5482347360084532, - 0.468591609998839, - 0.4598484040470794, - 0.07772338599897921, - 0.20739802700700238, - 0.23845346800226253, - 0.704195709025953, - 0.2930274489626754, - 0.23117324800114147, - 0.027701768005499616, - 0.10696582801756449, - 0.34165960604150314, - 0.03877992500201799, - 0.5484691250312608, - 0.42151565599488094, - 0.05056465599045623, - 0.06365850399015471, - 
0.1530979690287495, - 0.1198922829789808, - 0.18983983993530273, - 0.062352066015591845, - 0.3552985329879448, - 0.2347124499938218, - 0.5165671389986528, - 0.05987180101510603, - 0.092455436984892, - 0.23187916800088715, - 0.04431501500948798, - 0.16291215796081815, - 0.06354136799927801, - 0.03958747300202958, - 0.059883919995627366, - 0.4739326910057571, - 0.5432527990196832, - 0.5872207450156566, - 0.01312197199149523, - 0.010517167989746667, - 0.2828872030513594, - 0.05878574002417736, - 0.007800464998581447, - 7.253300282172859e-05, - 0.013801310007693246, - 0.07203407496854197, - 0.05061345698777586, - 0.01870502198289614, - 0.06664000998716801, - 0.024335801004781388, - 0.0782403719931608, - 0.3242654310015496, - 0.08008306499687023, - 0.06450327803031541, - 0.07456957695831079, - 0.030463989023701288, - 0.034588902999530546, - 0.01349802799813915, - 0.006121431026258506, - 0.2835944269463653, - 0.29877321897947695, - 0.13918555002601352, - 0.04502312102704309, - 0.028894874994875863, - 0.028158021974377334, - 0.04770529898814857, - 0.030435999025939964, - 0.04041957297886256, - 0.08436677399731707, - 0.01535108000098262, - 0.007441592984832823, - 0.03620973003853578, - 0.011488505988381803, - 0.04609814997820649, - 0.002257809988805093, - 0.02619870801572688, - 0.013897488985094242, - 0.011326283012749627, - 0.023371179995592684, - 0.17417764599667862, - 0.025835282998741604, - 0.03421194698603358, - 0.035092605990939774, - 0.03000548695854377, - 0.015537454004515894, - 0.0075618120026774704, - 0.051148238038877025, - 0.03413442503369879, - 0.003920577990356833, - 0.016865932993823662, - 0.01632620400050655, - 0.006780405004974455, - 0.036928546003764495, - 0.03607475898752455, - 0.03433194004173856, - 0.035169551003491506, - 0.0037813899689354002, - 0.039302340024732985, - 0.01492022699676454, - 0.034735572000499815, - 0.017656263982644305, - 0.022208525988389738, - 0.017800949994125403, - 0.05869575202814303, - 0.01486955099971965, - 0.02643462501873728, 
- 0.029311832986422814, - 0.01946632898761891, - 0.04273030097829178, - 0.014383458998054266, - 0.03668319803546183, - 0.00965720099338796, - 0.018425406989990734, - 0.008798242997727357, - 0.00543576201016549, - 0.02634649800893385, - 0.03428009297931567, - 0.0305657020653598, - 0.0348754650040064, - 0.029568052996182814, - 0.030727734992979094, - 0.0007678149850107729, - 0.014760293008293957, - 0.007397141002002172, - 0.012804355006664991, - 0.0076806149882031605, - 0.02627586299786344, - 0.024665903983986937, - 0.0037707820010837168, - 0.015861663006944582, - 0.00016009999671950936, - 0.00252895598532632, - 0.007530436007073149, - 0.0026568880130071193, - 0.03847491399210412, - 0.007376641005976126, - 0.018325967976124957, - 0.27177119995758403, - 0.2354245250025997, - 0.14378465099434834, - 0.11496480499044992, - 0.33487734098162036, - 0.28471207001712173, - 0.03871121699921787, - 0.05455556299421005, - 0.06832443398889154, - 0.16445948797627352, - 0.1998798600252485, - 0.35225127001467627, - 0.28573168902948964, - 0.30278920399723575, - 0.0640299340011552, - 0.1569095299928449, - 0.2973311840032693, - 0.32274944601522293, - 0.28895283200836275, - 0.3342900879943045, - 0.31942854898807127, - 0.060683696006890386, - 0.07274797199352179, - 0.10539275499468204, - 0.07151016901480034, - 0.35480504496081267, - 0.07307573698926717, - 0.05971371600753628, - 0.10226112802047282, - 0.8917187740298687, - 0.08477229297568556, - 0.397903402990778, - 0.0589129599975422, - 0.39791436899395194, - 0.42970022300141864, - 0.0821675489860354, - 0.4505270669760648, - 0.42180719596217386, - 0.05485737200069707, - 0.43224015198939014, - 0.1410439120081719, - 0.4366174420429161, - 0.19130330099142157, - 0.02256764902267605, - 0.08149787598813418, - 0.09382306598126888, - 0.4334113510121824, - 0.0216282549808966, - 0.11928595798963215, - 0.01111435200436972, - 0.050982720000320114, - 0.020009000989375636, - 0.12623706899466924, - 0.05277165195730049, - 0.08713929395889863, - 
0.01424328399298247, - 0.0451514930173289, - 0.1654805609723553, - 0.08566585402877536, - 0.07790490702609532, - 0.036077834010939114, - 7.185997674241662e-06, - 0.08125725905119907, - 0.057807361969025806, - 0.0958064849692164, - 0.050989429975743406, - 0.07118913800513837, - 0.10266866099846084, - 0.052725469024153426, - 0.0227829590003239, - 0.12849979402380995, - 0.13863805601431523, - 0.14977809695119504, - 0.08597892598481849, - 0.12592975801089779, - 0.07510987298155669, - 0.14599578698107507, - 0.07374289901053999, - 0.11257297801785171, - 0.12750755502202082, - 0.104022964995238, - 0.11534572903474327, - 0.12323538998316508, - 0.2637105810135836, - 0.15074610304145608, - 0.09024631799547933, - 0.15472023699840065, - 0.17382224203902297, - 0.06423350400291383, - 0.2336003180098487, - 0.1929565640166402, - 0.19159119202231523, - 0.04381403198931366, - 0.15971865398751106, - 0.21117090599727817, - 0.14840026099409442, - 0.22931794500618707, - 0.34392103597929236, - 0.058121850961470045, - 0.17819277601665817, - 0.26217249099863693, - 0.026371746018412523, - 0.5654182660073275, - 0.2637099669955205, - 0.2367993070220109, - 0.06048201101657469, - 0.005436070001451299, - 0.2870385560381692, - 0.33967009400657844, - 0.09657933401467744, - 0.24739491697982885, - 0.08958342796540819, - 0.02889520001190249, - 0.0709059610235272, - 0.08435331897635479, - 0.08843050003633834, - 0.0946175690041855, - 0.06565687400870956, - 0.059526682991418056, - 0.17116284200164955, - 0.07286768799531274, - 0.5917965219705366, - 0.05344187200535089, - 0.1356368709821254, - 0.17060745001072064, - 0.047184965005726553, - 0.02691819799656514, - 0.005754301018896513, - 0.035109946999000385, - 0.03711067991389427, - 0.020305016019847244, - 0.021365729975514114, - 0.028859869998996146, - 0.01683027998660691, - 0.007100764967617579, - 0.017440675990656018, - 0.01317254701280035, - 0.018001180986175314, - 0.01398634203360416, - 0.01615902199409902, - 0.015654545000870712, - 
0.03297068498795852, - 0.02122649597004056, - 0.01426987600279972, - 0.01777194900205359, - 0.028585267995367758, - 0.04514958798245061, - 0.06527780101168901, - 0.02591620797466021, - 0.12024695896252524, - 0.06991023996670265, - 0.04164175398182124, - 0.05605369101976976, - 0.036354270996525884, - 0.05189929000334814, - 0.031379731008200906, - 0.06600337097188458, - 0.035132240009261295, - 0.05473020295903552, - 0.07379320802283473, - 0.026810076014953665, - 0.2556990929879248, - 0.11875607298861723, - 0.06487386899243575, - 0.13657376100309193, - 0.0304050559643656, - 0.2123763519775821, - 0.07134174002567306, - 0.07266110795899294, - 0.03920501899847295, - 0.1138514399935957, - 0.12362647599366028, - 0.11541606500395574 - ], - "generation_latencies": [ - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, 
- 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 
0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0 - ], - "throughput_timeline": [], - "prefill_latencies": [ - 0.026439563996973448, - 0.027643855995847844, - 0.07213442299689632, - 0.07294419500976801, - 0.012003504991298541, - 0.10800254599598702, - 0.012901048001367599, - 0.058656995999626815, - 0.05743776599410921, - 0.07124273000226822, - 0.06403811200289056, - 0.06283450199407525, - 0.07463037499110214, - 0.07415439900069032, - 0.016889848993741907, - 0.07419871599995531, - 0.08658479400037322, - 0.04598875099327415, - 0.050779886994860135, - 0.05669224599841982, - 0.08554886799538508, - 0.035518292992492206, - 0.046842748997733, - 0.1220982200029539, - 0.09947611499228515, - 0.10507915601192508, - 0.015101828990736976, - 0.10244935099035501, - 0.11518316600995604, - 0.03527487600513268, - 0.027581469010328874, - 0.034748555000987835, - 0.030272111005615443, - 0.022887368002557196, - 0.037084883006173186, - 0.036905155997374095, - 0.019256982996012084, - 0.049365355996997096, - 0.014044832001673058, - 0.021781738003483042, - 0.02122811699518934, - 0.01494360200013034, - 0.015124283003387973, - 0.030999267008155584, - 0.03864031100238208, - 0.01636862200393807, - 0.023568133998196572, - 0.014274273009505123, - 0.020089066005311906, - 0.052545184997143224, - 0.02974483699654229, - 0.02804943799856119, - 0.016649295008392073, - 0.030698552989633754, - 0.009777522005606443, - 0.019352202012669295, - 
0.013927530992077664, - 0.025379654005519114, - 0.025512647000141442, - 0.014652248995844275, - 0.013005058004637249, - 0.01613620700663887, - 0.014557233997038566, - 0.03675816000031773, - 0.04857168300077319, - 0.06268229499983136, - 0.03865464399859775, - 0.049284819993772544, - 0.11726031899161171, - 0.13381234899861738, - 0.11572356800024863, - 0.12093810300575569, - 0.12310552399139851, - 0.12428124900907278, - 0.0899940639937995, - 0.13434927799971774, - 0.10025962500367314, - 0.1365212060045451, - 0.14397315500536934, - 0.1462956429895712, - 0.15483298200706486, - 0.17488868201326113, - 0.053899673992418684, - 0.041208961003576405, - 0.06661666299623903, - 0.03710376399976667, - 0.08129337900027167, - 0.08128705099807121, - 0.03443988000799436, - 0.018781275997753255, - 0.01868253300199285, - 0.031756663011037745, - 0.014381454995600507, - 0.01554199299425818, - 0.0219934680062579, - 0.04144595700199716, - 0.015163297997787595, - 0.014536962989950553, - 0.015340049998485483, - 0.09339778599678539, - 0.08581899999990128, - 0.0962318220117595, - 0.09376451000571251, - 0.023322998997173272, - 0.019406179009820335, - 0.02389392300392501, - 0.01957874500658363, - 0.037225335006951354, - 0.02632110299600754, - 0.029608924000058323, - 0.02383686300890986, - 0.010705628999858163, - 0.010580824993667193, - 0.0040627229900565, - 0.026762547000544146, - 0.005069185004686005, - 0.015546114009339362, - 0.08218112699978519, - 0.08223208299023099, - 0.20412326700170524, - 0.1253629889979493, - 0.2893206839944469, - 0.2093273450009292, - 0.09988529300608207, - 0.2175468740024371, - 0.21752996600116603, - 0.014048288998310454, - 0.11338692100252956, - 0.03446946101030335, - 0.018719769999734126, - 0.1821450479910709, - 0.18180357199162245, - 0.1904823920049239, - 0.17269112401118036, - 0.18718379799975082, - 0.17962980699667241, - 0.2988562239916064, - 0.312852811999619, - 0.019502566996379755, - 0.31672253098804504, - 0.02600333200825844, - 0.032092327004647814, - 
0.05500835400016513, - 0.0, - 0.02574514099978842, - 0.018677465996006504, - 0.025642331995186396, - 0.050687162001850083, - 0.020048001009854488, - 0.06157773699669633, - 0.03902663600456435, - 0.3619909010012634, - 0.0, - 0.2963139099883847, - 0.31380698099383153, - 0.32018409900774714, - 0.3182601379958214, - 0.3167055230005644, - 0.3044051850010874, - 0.3195996290014591, - 0.02397361199837178, - 0.028095746994949877, - 0.02609857999777887, - 0.03060992898826953, - 0.0, - 0.049416167006711476, - 0.052272002008976415, - 0.05847289400117006, - 0.05958871300390456, - 0.01922974901390262, - 0.016316388006089255, - 0.037378036009613425, - 0.038062161998823285, - 0.03617609999491833, - 0.06910680700093508, - 0.14215996400162112, - 0.164643462994718, - 0.17280417399888393, - 0.17127815600542817, - 0.21700500100268982, - 0.22430103199440055, - 0.08519934800278861, - 0.1005975499865599, - 0.10275947199261282, - 0.0978142910025781, - 0.029550926003139466, - 0.027039910986786708, - 0.10550626400799956, - 0.11125106700637843, - 0.027705852990038693, - 0.0451822900067782, - 0.017216416003066115, - 0.023983849998330697, - 0.01862699000048451, - 0.037348252008087, - 0.032118068003910594, - 0.021106390006025322, - 0.030313206996652298, - 0.059855277999304235, - 0.06598487500741612, - 0.0, - 0.023794704000465572, - 0.012881644011940807, - 0.026384500990388915, - 0.02245136299461592, - 0.029426831999444403, - 0.0, - 0.0, - 0.01309659999969881, - 0.021834799990756437, - 0.034588408991112374, - 0.035017482994589955, - 0.011341252000420354, - 0.034841722997953184, - 0.028896941992570646, - 0.04313684400403872, - 0.028300029996898957, - 0.011961288008023985, - 0.017733576998580247, - 0.04124829098873306, - 0.03493731300113723, - 0.032748441997682676, - 0.011819013991043903, - 0.029111195995938033, - 0.016662246009218507, - 0.0, - 0.0, - 0.031147569010499865, - 0.03718593099620193, - 0.0, - 0.03163316899735946, - 0.013014733005547896, - 0.017605325003387406, - 0.012784578997525387, - 
0.030796348000876606, - 0.0, - 0.04713615799846593, - 0.04046800600190181, - 0.0, - 0.016933419989072718, - 0.022399948997190222, - 0.014235210008337162, - 0.16329212198616005, - 0.16311165399383754, - 0.16886490100296214, - 0.0, - 0.02673264300392475, - 0.02191505600058008, - 0.0, - 0.03582825799821876, - 0.023815627006115392, - 0.040369335998548195, - 0.032087978994240984, - 0.03785366298689041, - 0.0, - 0.006649553994066082, - 0.017737425994710065, - 0.019882640990545042, - 0.026196438004262745, - 0.0, - 0.02757215200108476, - 0.01490085999830626, - 0.03570642300473992, - 0.026402442003018223, - 0.03817819200048689, - 0.025084877997869626, - 0.0, - 0.0, - 0.0, - 0.023293410005862825, - 0.0, - 0.06649968599958811, - 0.0, - 0.18383041099878028, - 0.18656631099293008, - 0.003977779007982463, - 0.012779896991560236, - 0.0, - 0.022553371993126348, - 0.0, - 0.0, - 0.010745745996246114, - 0.0, - 0.028397952002706006, - 0.03186945601191837, - 0.03253776198835112, - 0.03199879299791064, - 0.02032586300629191, - 0.03470763299264945, - 0.036514203005936, - 0.041137695996440016, - 0.0, - 0.0057965500018326566, - 0.003302848999737762, - 0.0, - 0.0, - 0.010642699999152683, - 0.0, - 0.0, - 0.0054447910079034045, - 0.0113913999957731, - 0.008240972005296499, - 0.0033114899997599423, - 0.0, - 0.003108578996034339, - 0.0, - 0.0, - 0.009822462001466192, - 0.006309174001216888, - 0.005013850997784175, - 0.005289983993861824, - 0.0, - 0.0030092399974819273, - 0.005108397002913989, - 0.00577650401100982, - 0.0, - 0.002277623992995359, - 0.004673053990700282, - 0.006425627012504265, - 0.009732227001222782, - 0.008043231995543465, - 0.005808482994325459, - 0.0, - 0.005681502996594645, - 0.005703032991732471, - 0.002760765011771582, - 0.0027244049997534603, - 0.00954677100526169, - 0.00943828398885671, - 0.0, - 0.00804781200713478, - 0.0, - 0.0037029639934189618, - 0.0, - 0.00598853400151711, - 0.0, - 0.0, - 0.004148688007262535, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 
0.002206919001764618, - 0.0, - 0.003782454994507134, - 0.0061067320057190955, - 0.007972013991093263, - 0.006906770999194123, - 0.009466369010624476, - 0.01252455600479152, - 0.0, - 0.0028093380096834153, - 0.01101747500069905, - 0.005452929995954037, - 0.0045381739910226315, - 0.010460372010129504, - 0.006426462001400068, - 0.0, - 0.007029877000604756, - 0.0060138769913464785, - 0.005499820996192284, - 0.0, - 0.0, - 0.0, - 0.004259966008248739, - 0.0, - 0.0, - 0.0005012860056012869, - 0.0, - 0.0, - 0.0007549729925813153, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.021575050996034406, - 0.006811023995396681, - 0.023557032996905036, - 0.025168473002850078, - 0.0, - 0.0, - 0.03253268099797424, - 0.02709125900582876, - 0.1400938829901861, - 0.0, - 0.20314206900366116, - 0.023214936009026133, - 0.02755105000687763, - 0.0, - 0.0, - 0.0, - 0.010134110998478718, - 0.0, - 0.04923538998991717, - 0.0, - 0.03963899699738249, - 0.006067634996725246, - 0.022130637007649057, - 0.028236440004548058, - 0.0, - 0.0, - 0.0, - 0.0, - 0.02127194800414145, - 0.029847315003280528, - 0.03619671800697688, - 0.03691921599966008, - 0.34613946499302983, - 0.3641084130067611, - 0.37608371299575083, - 0.018025407000095583, - 0.35637445900647435, - 0.35261786499177106, - 0.34788909800408874, - 0.3596701789938379, - 0.02684058400336653, - 0.02148074100841768, - 0.011331626999890432, - 0.014998707003542222, - 0.04541946300014388, - 0.03762722200190183, - 0.038414474998717196, - 0.01719154100283049, - 0.018501605998608284, - 0.009318555006757379, - 0.029360244996496476, - 0.01887436800461728, - 0.0, - 0.0117341549921548, - 0.011967616010224447, - 0.0, - 0.014845008990960196, - 0.02972772900830023, - 0.006515462999232113, - 0.0, - 0.0, - 0.0, - 0.0, - 0.012399644998367876, - 0.012443956002243795, - 0.0, - 0.00598902499768883, - 0.03142445500998292, - 0.0, - 0.0006692900060443208, - 0.0051758710033027455, - 0.0, - 0.0, - 0.008146599007886834, - 0.0019998229981865734, - 0.0, - 
0.006950741008040495, - 0.025387526999111287, - 0.0, - 0.0, - 0.02367916399089154, - 0.0, - 0.0, - 0.03137326499563642, - 0.02966347400797531, - 0.030197968997526914, - 0.0, - 0.029205812999862246, - 0.01831856300123036, - 0.02388378400064539, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.14800403099798132, - 0.16793641599360853, - 0.005321491000358947, - 0.0, - 0.006539253008668311, - 0.012162218990852125, - 0.13428384199505672, - 0.12189775200386066, - 0.0, - 0.0, - 0.018253508998895995, - 0.02013161800277885, - 0.0, - 0.00895267800660804, - 0.02888620100566186, - 0.0, - 0.018048596000880934, - 0.0, - 0.0069198090059217066, - 0.0, - 0.023484767007175833, - 0.03543026100669522, - 0.0, - 0.0, - 0.04959257999144029, - 0.0, - 0.0, - 0.03565506500308402, - 0.0, - 0.0, - 0.029789207997964695, - 0.006087750007282011, - 0.0033570049999980256, - 0.0049313849885948, - 0.0042327819974161685, - 0.005249756999546662, - 0.006799471011618152, - 0.008600774992373772, - 0.002513785002520308, - 0.0018903730087913573, - 0.00363659999857191, - 0.006043127999873832, - 0.0068874599965056404, - 0.01409445000172127, - 0.0044239669950911775, - 0.005929456994635984, - 0.006262438997509889, - 0.014822565994109027, - 0.013072115005343221, - 0.012598593995789997, - 0.006944962995476089, - 0.015848939001443796, - 0.008265732001746073, - 0.013691166997887194, - 0.009845417996984906, - 0.018130718992324546, - 0.024553658993681893, - 0.017268702998990193, - 0.00958029500907287, - 0.011216277998755686, - 0.013344870007131249, - 0.011636837007245049, - 0.16111104999436066, - 0.01219363599375356, - 0.014300555994850583, - 0.042802194991963916, - 0.008258585003204644, - 0.015609401991241612, - 0.014154596006846987, - 0.014551911997841671, - 0.019482252988382243, - 0.02172881699516438, - 0.03803253600199241, - 0.0321790550078731 - ], - "decode_latencies": [ - 0.005634041008306667, - 0.0009119530004682019, - 0.0004568090080283582, - 0.05688213399844244, - 0.022996125990175642, - 0.01144431799184531, - 
0.002990999011672102, - 0.002952801005449146, - 0.029259371003718115, - 0.01213178898615297, - 0.013465508003719151, - 0.01716267000301741, - 0.05050814199785236, - 0.011695831999531947, - 0.024970554004539736, - 0.011928637002711184, - 0.033711077994666994, - 0.017483453993918374, - 0.006395020987838507, - 0.007042274999548681, - 0.006003015005262569, - 0.006285365991061553, - 0.07452494399331044, - 0.025584726987290196, - 0.032117738999659196, - 0.007725728995865211, - 0.011753511003917083, - 0.03343107900582254, - 0.018389571996522136, - 0.006613945006392896, - 0.0014714670105604455, - 0.008022509995498694, - 0.0828427670057863, - 0.0074038330058101565, - 0.013192889004130848, - 0.014563384000211954, - 0.012913319005747326, - 0.0009298350050812587, - 0.0808698749897303, - 0.03069941999274306, - 0.0035895010078093037, - 0.0012142510095145553, - 0.00881881699024234, - 0.0011126780009362847, - 0.0060821669903816655, - 0.03355885201017372, - 0.019654287010780536, - 0.0257863340084441, - 0.004091044989763759, - 0.08024236500205006, - 0.031015469998237677, - 0.012454337993403897, - 0.005532019000384025, - 0.015233423007884994, - 0.08931644400581717, - 0.01964153000153601, - 5.0677001127041876e-05, - 0.019036472993320785, - 0.03551389700442087, - 0.01980779600853566, - 0.004035515012219548, - 0.07977376799681224, - 0.0057054630015045404, - 0.01925723000022117, - 0.12666287799947895, - 0.026013540002168156, - 0.08103631799167488, - 0.002743692006333731, - 0.01194000399846118, - 0.005882273995666765, - 0.0030200629989849404, - 0.037673305996577255, - 0.010232777000055648, - 0.00624232400150504, - 0.008304554998176172, - 0.001774598000338301, - 0.006186037004226819, - 0.0012663580127991736, - 0.011542501000803895, - 0.006072457996197045, - 0.11586970200005453, - 0.19738924699777272, - 0.0016565749974688515, - 0.014487577995168976, - 0.006570711004314944, - 0.026417998989927582, - 0.005801944003906101, - 0.008276672000647523, - 0.0038358140009222552, - 
0.012089600000763312, - 0.032267592003336176, - 0.07424941900535487, - 0.00660312500258442, - 0.03208844999608118, - 0.021193196007516235, - 0.004741277996799909, - 0.027388404996600002, - 0.016004229997633956, - 0.0031993949960451573, - 0.013858835998689756, - 0.006481653996161185, - 0.00549881299957633, - 0.001889366001705639, - 0.11338408201118, - 0.03049170400481671, - 0.012937345993123017, - 0.0013535070029320195, - 0.02101132899406366, - 0.03068982499826234, - 0.295473074002075, - 0.01903979800408706, - 0.07818655400478747, - 0.07676912999886554, - 0.17555815799278207, - 0.01411212800303474, - 0.0015714760083938017, - 0.008827216995996423, - 0.031256324000423774, - 0.003044072989723645, - 0.030410841995035298, - 0.017159091992652975, - 0.0056265349994646385, - 0.03761613600363489, - 0.032345686995540746, - 0.022504399006720632, - 0.0025405039923498407, - 0.012566856006742455, - 0.026458574007847346, - 0.01361393700062763, - 0.01391163699736353, - 0.013403318997006863, - 0.05232159099250566, - 0.002514908992452547, - 0.0847482920071343, - 0.025276724001741968, - 0.030615698007750325, - 0.012949661002494395, - 0.11780547400121577, - 0.2785016749985516, - 0.02302443799271714, - 0.19691770199278835, - 0.27894806399126537, - 0.27693521800392773, - 0.029429386006086133, - 0.003520648999256082, - 0.007808581998688169, - 0.008215335998102091, - 0.03990709000208881, - 0.026989292004145682, - 0.023899804000393488, - 0.0044466629915405065, - 0.005758975996286608, - 0.006328672010567971, - 0.015210587007459253, - 0.0065482189966132864, - 0.030009615002200007, - 0.3022056519985199, - 0.00478413600649219, - 0.007808196998666972, - 0.21474591401056387, - 0.006447773004765622, - 0.010135820004506968, - 0.03631784000026528, - 0.01607308999518864, - 0.02773330800118856, - 0.0029747130029136315, - 0.021105452993651852, - 0.02382240899896715, - 0.039833982998970896, - 0.025442341007874347, - 0.21858841499488335, - 0.023198708004201762, - 0.01456781299202703, - 
0.02065865600889083, - 0.04006616699916776, - 0.01579682900046464, - 0.20122552699467633, - 0.014221371005987749, - 0.014347966003697366, - 0.01612240300164558, - 0.021550454999669455, - 0.016593006992479786, - 0.005987349999486469, - 0.024208710005041212, - 0.020332688000053167, - 0.012858738002250902, - 0.01730237099400256, - 0.01606120800715871, - 0.09123609399830457, - 0.0250691049877787, - 0.1493045699899085, - 0.029773300993838347, - 0.025391229006345384, - 0.013681017997441813, - 0.019721835997188464, - 0.006516084002214484, - 0.037324606004403904, - 0.021751135995145887, - 0.0080344900052296, - 0.014668167001218535, - 0.19970328599447384, - 0.020759689999977127, - 0.018305470992345363, - 0.014673291996587068, - 4.870598786510527e-05, - 3.763800486922264e-05, - 0.02353903699258808, - 0.00767229899065569, - 0.023444395992555656, - 0.012111047995858826, - 0.017012308991979808, - 0.019299310995847918, - 0.022086466997279786, - 0.017171867002616636, - 0.027761081990320235, - 0.01635839100345038, - 0.028308656997978687, - 2.0608989871107042e-05, - 0.013625439998577349, - 0.025537375011481345, - 0.01966728399565909, - 0.007569093999336474, - 0.0170660439907806, - 0.1505316629918525, - 0.01861591600754764, - 0.007066569000016898, - 0.015805811999598518, - 0.02180835099716205, - 0.01434147500549443, - 0.1496609160094522, - 0.0067933829996036366, - 0.007998182001756504, - 0.006598022009711713, - 0.012266659003216773, - 0.0012127240042900667, - 0.02706727599434089, - 0.012217505005537532, - 0.17270431299402844, - 0.014579134003724903, - 0.006749697000486776, - 0.00012253900058567524, - 0.005758281011367217, - 0.0006801379931857809, - 0.037102961010532454, - 0.019207575998734683, - 0.007249871006933972, - 0.009624883998185396, - 0.009733849990880117, - 0.09256121898943093, - 0.005898565999814309, - 0.006765725993318483, - 0.019774191998294555, - 0.017311280011199415, - 0.021631647992762737, - 0.011646099999779835, - 0.0169371969968779, - 0.01094606400874909, - 
0.009881403995677829, - 0.014409462004550733, - 0.01392467599362135, - 0.1856290279974928, - 0.015723087009973824, - 0.010021957001299597, - 0.17161877399485093, - 0.018147598006180488, - 0.014290702005382627, - 0.018393232006928883, - 0.0384044870006619, - 0.014325130003271624, - 0.00817625000490807, - 0.013574076991062611, - 0.005902980003156699, - 0.003237893004552461, - 0.013097449991619214, - 0.03525821299990639, - 0.03293467700132169, - 0.009777768995263614, - 0.003989735996583477, - 0.0011991679930360988, - 0.011285688990028575, - 0.015466343000298366, - 0.002143994002835825, - 3.955500142183155e-05, - 0.00557144200138282, - 0.006723715006955899, - 0.017904096996062435, - 0.0031709360046079382, - 0.013617466000141576, - 0.0030438450048677623, - 0.0015549620002275333, - 0.01216521101014223, - 0.012067251998814754, - 0.013176421009120531, - 0.0025487240054644644, - 0.00287393199687358, - 0.0052394640079000965, - 0.002686400999664329, - 0.003103923998423852, - 0.0004267750045983121, - 0.03136728800018318, - 0.007560596990515478, - 0.0039686320087639615, - 0.0019282549910712987, - 0.0028939079929841682, - 0.0028324090089881793, - 0.0035418329935055226, - 0.0019909650000045076, - 0.017630768998060375, - 7.257401011884212e-05, - 0.0018088289943989366, - 0.0015168100071605295, - 0.0027783699915744364, - 0.007715544998063706, - 0.001713509002001956, - 0.0047717069974169135, - 0.004251381993526593, - 0.007445233000908047, - 0.0036193859996274114, - 0.009795122008654289, - 2.5258996174670756e-05, - 0.0011843160027638078, - 0.004490642008022405, - 0.0014589259953936562, - 0.0025601989909773692, - 0.0024872210051398724, - 0.0016176399949472398, - 0.0035902499948861077, - 0.0021269529970595613, - 0.0021721359953517094, - 0.0037929760001134127, - 0.002485318007529713, - 0.004096402000868693, - 5.7358003687113523e-05, - 0.008798014998319559, - 0.0031403239991050214, - 0.0003693080070661381, - 0.000578717008465901, - 0.004919881001114845, - 0.0031858879956416786, - 
0.002304884998011403, - 0.0013356359995668754, - 0.0047689070052001625, - 0.0011410480074118823, - 0.002269935008371249, - 0.0052612629951909184, - 0.002796090004267171, - 2.471900370437652e-05, - 0.009846625995123759, - 0.003879474999848753, - 0.00544721499318257, - 0.002651583999977447, - 0.0029714809934375808, - 0.0023609690106241032, - 0.0007415099971694872, - 0.0009350100008305162, - 0.005417560998466797, - 0.0025875989958876744, - 0.0036447489983402193, - 0.01044306300173048, - 0.005567343992879614, - 1.9098995835520327e-05, - 0.0007187459996202961, - 0.003767697009607218, - 0.0033726179972290993, - 0.00205633400764782, - 0.0030039140110602602, - 0.005537211007322185, - 0.001044674005242996, - 0.0023006420087767765, - 0.0004341830062912777, - 0.0007218950049718842, - 0.0030896690004738048, - 0.00029171899950597435, - 0.00039990199729800224, - 4.588499723467976e-05, - 0.0006722309917677194, - 0.0021746590064140037, - 0.00575927400495857, - 0.055263754999032244, - 0.011452850012574345, - 0.00022776899277232587, - 0.010353737001423724, - 0.01911377999931574, - 0.00015396700473502278, - 0.010884431001613848, - 0.01431059899914544, - 0.011964121003984474, - 0.005448665004223585, - 0.005663294010446407, - 0.017713692999677733, - 0.010462842998094857, - 0.016437218990176916, - 0.015255951992003247, - 0.0058221740036970004, - 0.009921832999680191, - 0.010056333994725719, - 0.016268491992377676, - 0.18819645600160584, - 0.0006929490045877174, - 0.0002658059966051951, - 0.0004785839992109686, - 0.010972317002597265, - 0.005547840992221609, - 0.0541134660015814, - 0.006256168999243528, - 0.04134191499906592, - 0.021692385998903774, - 0.007685803007916547, - 0.007281657002749853, - 0.00637036599800922, - 0.011608288987190463, - 0.3272625910030911, - 0.00975980400107801, - 0.007174226004281081, - 0.006652880008914508, - 0.013678947012522258, - 0.006196907997946255, - 0.013138460999471135, - 0.002057471006992273, - 0.010926636008662172, - 0.010655136007699184, - 
0.0013882300117984414, - 0.006100750993937254, - 0.005365461009205319, - 0.007145864001358859, - 0.00908071399317123, - 0.013966051003080793, - 0.0029291020036907867, - 0.002793039006064646, - 0.0022372020030161366, - 0.009131525992415845, - 0.006289398006629199, - 0.004399568002554588, - 0.33281657099723816, - 0.002191444000345655, - 0.007900176002294756, - 0.0018209239933639765, - 0.0007252140057971701, - 0.01030448499659542, - 0.001150156997027807, - 0.0070335960044758394, - 0.005930300001637079, - 0.006655519988271408, - 0.02179010600957554, - 2.4766006390564144e-05, - 0.014760231002583168, - 0.0034896480065071955, - 0.0017880039958981797, - 0.0025120840000454336, - 0.002822877009748481, - 0.01261680900643114, - 0.000480575006804429, - 0.002103798004100099, - 0.023919794999528676, - 0.0029796549933962524, - 0.007891011002357118, - 0.0073361260001547635, - 0.01682964999054093, - 0.012828037011786364, - 0.0012927310017403215, - 0.018761678991722874, - 0.005471715005114675, - 0.00733365400810726, - 0.0016458979953313246, - 0.01341243099886924, - 0.013289191003423184, - 0.005873859001439996, - 0.005044674995588139, - 0.0014483539998764172, - 3.917799040209502e-05, - 0.011706147997756489, - 0.012417603997164406, - 0.0032732399995438755, - 0.013743837989750318, - 0.005989073993987404, - 0.0009355700021842495, - 0.005742016001022421, - 0.00674791399796959, - 0.07186691000242718, - 0.006195189998834394, - 0.01978745000087656, - 0.005110912999953143, - 0.011178799002664164, - 0.016790658002719283, - 0.01351239399809856, - 0.006045242000254802, - 0.006313861988019198, - 0.005824804989970289, - 0.005488121998496354, - 0.001959992994670756, - 0.01814196900522802, - 5.938101094216108e-05, - 0.005681959999492392, - 0.0011815589969046414, - 0.01311970700044185, - 0.0004453160072444007, - 0.012538846989627928, - 0.15024364199780393, - 0.01026390898914542, - 0.005717949999962002, - 0.005133300990564749, - 0.010739969002315775, - 0.012841247997130267, - 0.0183410549943801, - 
0.0005113299994263798, - 0.021430537992273457, - 0.001146470007370226, - 0.0010916149913100526, - 0.0010744850005721673, - 0.002472883992595598, - 0.0006371490017045289, - 0.0020784109947271645, - 0.005566416992223822, - 0.0011542360007297248, - 0.0010332509991712868, - 0.0023456100025214255, - 0.0009254019969375804, - 0.0014068659947952256, - 0.0035796140000456944, - 0.0013704410084756091, - 0.0017112680070567876, - 0.008027743999264203, - 0.002211888990132138, - 0.0075393949955469, - 0.0033842289994936436, - 0.00375518600048963, - 0.001336855988483876, - 0.0049950419925153255, - 0.0067477730044629425, - 0.011584589010453783, - 0.003563714009942487, - 0.011390404004487209, - 0.011466060997918248, - 0.004376648998004384, - 0.0027005119918612763, - 0.006001703994115815, - 0.006588667994947173, - 0.007676356006413698, - 0.0033704270026646554, - 0.005787582995253615, - 0.004665640008170158, - 0.0035053699975833297, - 0.006297459010966122, - 0.005769068011431955, - 0.005402595998020843, - 0.006943440006580204, - 0.007738019994576462, - 0.012122886997531168, - 0.01276656100526452 - ], - "multi_turn_cache_hits": 74, - "multi_turn_cache_misses": 318, - "seed": 42, - "summary": { - "total_requests": 549, - "total_tokens": 147832, - "elapsed_time": 8.406119108200073, - "avg_throughput_tokens_per_sec": 17586.236656555528, - "requests_per_second": 65.30956710623536, - "end_to_end_latency_ms": { - "mean": 4470.002607346009, - "p50": 3735.95871499856, - "p95": 9286.451094801308, - "p99": 12368.833587403056 - }, - "storage_io_latency_ms": { - "mean": 245.69455378108705, - "p50": 128.49979402380995, - "p95": 881.7592621897347, - "p99": 1371.4309435128214 - }, - "generation_latency_ms": { - "mean": 0.0, - "p50": 0.0, - "p95": 0.0, - "p99": 0.0 - }, - "cache_stats": { - "cache_hit_rate": 0.927956807828581, - "cache_hits": 5500, - "cache_misses": 427, - "gpu_entries": 7, - "cpu_entries": 10, - "nvme_entries": 413, - "gpu_memory_used_gb": 3.182861328125, - "cpu_memory_used_gb": 
2.6524658203125, - "offloads_cpu": 423, - "offloads_nvme": 413, - "storage_health": { - "overall_status": "PASS", - "criteria": [ - { - "name": "NVMe Read P95 < 200ms", - "target": 200, - "actual": 87.53535100549925, - "unit": "ms", - "passed": true - }, - { - "name": "CPU RAM P95 < 150ms", - "target": 150, - "actual": 15.01023083765176, - "unit": "ms", - "passed": true - }, - { - "name": "Cache Hit Rate > 30%", - "target": 0.3, - "actual": 0.927956807828581, - "unit": "ratio", - "passed": true - } - ], - "passed_count": 3, - "total_count": 3 - }, - "prefill_writes": 440, - "decode_reads": 5500, - "prefill_bytes_written_gb": 8.93701171875, - "decode_bytes_read_gb": 95.8282470703125, - "system_prompt_hits": 1026, - "common_phrase_hits": 0, - "user_cache_hits": 4400, - "multi_turn_hits": 74, - "total_read_bytes": 102894796800, - "total_write_bytes": 9596043264, - "total_read_gb": 95.8282470703125, - "total_write_gb": 8.93701171875, - "read_write_ratio": 10.722627438124897, - "read_iops": 5500, - "write_iops": 440, - "gpu_read_p50_ms": 7.593225993332453, - "gpu_read_p95_ms": 126.69581020018083, - "gpu_read_p99_ms": 281.39111567870697, - "gpu_write_p50_ms": 26.258770500135142, - "gpu_write_p95_ms": 217.0312492526136, - "gpu_write_p99_ms": 354.9093873407401, - "cpu_read_p50_ms": 3.6888249960611574, - "cpu_read_p95_ms": 15.01023083765176, - "cpu_read_p99_ms": 21.191230089317106, - "nvme_read_p50_ms": 41.70222899119835, - "nvme_read_p95_ms": 159.59849349746946, - "nvme_read_p99_ms": 261.15431699872715, - "nvme_read_device_p50_ms": 21.312534998287447, - "nvme_read_device_p95_ms": 87.53535100549925, - "nvme_read_device_p99_ms": 157.89245549967745, - "nvme_read_host_p50_ms": 18.934832012746483, - "nvme_read_host_p95_ms": 132.99959748837864, - "nvme_read_host_p99_ms": 225.86588749254588 - }, - "qos_metrics": { - "interactive": { - "total_requests": 549, - "latency_ms": { - "mean": 4470.002607346009, - "p50": 3735.95871499856, - "p95": 9286.451094801307, - "p99": 
12368.833587403056, - "max": 13962.206897995202 - }, - "sla": { - "target_p95_ms": 50, - "actual_p95_ms": 9286.451094801307, - "compliance": 0.0, - "met": false - } - }, - "responsive": { - "no_data": true - }, - "batch": { - "no_data": true - } - }, - "prefix_cache_stats": { - "prefix_hits": 109, - "prefix_misses": 440, - "system_prompt_reuse": 109, - "common_phrase_reuse": 0, - "bytes_saved": 96337920 - }, - "autoscaling_stats": [], - "autoscaling_summary": null, - "multi_turn_stats": { - "cache_hits": 74, - "cache_misses": 318, - "hit_rate": 0.18877551020408162 - } - } -} \ No newline at end of file diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_gpu_cpu_trial1.json b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_gpu_cpu_trial1.json deleted file mode 100644 index c27fb184..00000000 --- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_gpu_cpu_trial1.json +++ /dev/null @@ -1,2875 +0,0 @@ -{ - "requests_completed": 549, - "total_tokens_generated": 148297, - "total_storage_io_latency": 85.37432557625289, - "total_generation_latency": 0.0, - "end_to_end_latencies": [ - 0.053452606007340364, - 0.05367491800279822, - 0.259031777997734, - 0.2815405940054916, - 0.28262081899447367, - 0.36569377199339215, - 0.390379546006443, - 0.3906261319934856, - 0.3921293860039441, - 0.393144473011489, - 0.41293133499857504, - 0.42930796700238716, - 0.429982468005619, - 0.4336329149955418, - 0.4329567690001568, - 0.46125186199788004, - 0.4741417330078548, - 0.47475290099100675, - 0.4862737160001416, - 0.49364392799907364, - 0.5069628119963454, - 0.5367650800035335, - 0.6114325199887389, - 0.6129173600056674, - 0.6274256460019387, - 0.6273081679973984, - 0.6275402640021639, - 0.6282047880085884, - 0.6527045990078477, - 0.6520410310040461, - 0.6633744680002565, - 0.6634506970003713, - 0.664787196990801, - 0.6715405229915632, - 0.6706476399995154, - 0.6700678579945816, - 
0.6742758239997784, - 0.6971008580003399, - 0.7031323139963206, - 0.7158846880047349, - 0.7162435259961057, - 0.7173895000014454, - 0.7239443830039818, - 0.7241373629949521, - 0.7242602740006987, - 0.7241392430005362, - 0.7249924509960692, - 0.7256414059957024, - 0.7315989780036034, - 0.7390046009968501, - 0.7407298530015396, - 0.7476287110039266, - 0.7481780249945587, - 0.8239456939918455, - 0.8275686140113976, - 0.838401128014084, - 0.8451604720030446, - 0.8587939799908781, - 0.8598574870120501, - 0.8661292729957495, - 0.8668413019913714, - 0.9639523200021358, - 0.9650668840040453, - 0.968388994995621, - 0.9738630500069121, - 0.9734659380046651, - 0.9952997770014917, - 0.9935580729943467, - 1.0055565729999216, - 1.0083998719928786, - 1.007051330001559, - 1.0081940410018433, - 1.0081257440033369, - 1.0848925070022233, - 1.0840351410006406, - 1.0875650369998766, - 1.0870510419917991, - 1.091222763992846, - 1.1069849649938988, - 1.1618646650022129, - 1.1813195499998983, - 1.638135269007762, - 1.6677846910024527, - 1.7182723979931325, - 2.2145051390107255, - 2.2574090659909416, - 2.524545886000851, - 2.534926546009956, - 2.5767208640027093, - 2.602625964995241, - 2.6289061029965524, - 2.777865052004927, - 3.1845313030062243, - 3.2477029919973575, - 3.3308873070054688, - 3.4330428410030436, - 3.444512885995209, - 4.25142667000182, - 4.333292275012354, - 4.4442550120002124, - 4.543629341991618, - 4.560067280996009, - 4.587252318000537, - 4.761398383998312, - 5.053511787002208, - 5.059128629000043, - 5.11667743200087, - 5.366689670001506, - 5.4191473859973485, - 5.656757921999088, - 5.672398119000718, - 5.734321884010569, - 6.106856689992128, - 6.142979361000471, - 6.267472332008765, - 6.440742145990953, - 6.627858652005671, - 6.689623062004102, - 6.7312024050042965, - 6.758408911991864, - 6.773702436010353, - 6.871084933998645, - 6.936903670997708, - 6.989478578005219, - 6.996047492997604, - 7.48518212599447, - 7.5101210409920895, - 7.526752896010294, - 
7.603261873999145, - 7.665261401009047, - 7.717058570997324, - 7.832114117001765, - 7.92122122499859, - 7.930951037997147, - 7.987784241006011, - 8.009232901997166, - 8.106902364001144, - 8.531406169000547, - 8.702136956999311, - 8.876861086013378, - 9.404561106988695, - 9.496925534011098, - 9.592393033992266, - 9.804192904004594, - 9.829362836011569, - 9.835634007002227, - 9.93300208299479, - 9.963231399000506, - 10.16410746499605, - 10.198852571003954, - 10.400805693003349, - 10.434608901996398, - 10.469927597005153, - 10.625812113998109, - 10.685865603009006, - 10.686895031001768, - 10.691765502007911, - 10.713801356992917, - 10.734960080008022, - 10.750978408002993, - 11.574238610992325, - 11.61385917798907, - 11.646242964998237, - 11.746351323003182, - 11.83845017600106, - 12.150091092000366, - 12.191248293005629, - 12.355542010991485, - 12.464985961007187, - 13.099686178000411, - 13.190671884003677, - 13.206074809990241, - 13.221700954003609, - 13.996008581991191, - 14.062391785992077, - 14.110145761995227, - 14.252838544998667, - 14.36610235599801, - 14.491416498000035, - 14.732455886012758, - 14.732938184999512, - 14.845291671997984, - 15.007807760994183, - 15.050055761996191, - 15.167686282991781, - 15.179892454005312, - 15.208277306999662, - 15.225428009987809, - 15.27744877699297, - 15.49052759699407, - 15.620312137005385, - 15.646428730993648, - 15.955127301000175, - 16.233067483000923, - 17.120673243989586, - 17.124406989998533, - 17.156237414004863, - 17.28065439799684, - 17.524419205990853, - 17.544941283005755, - 17.555929947993718, - 17.596207271009916, - 17.606323044004967, - 17.832763596990844, - 17.890095294991625, - 17.95098722500552, - 18.089680802004295, - 18.18867495599261, - 18.239044666988775, - 18.265297129008104, - 18.27103790899855, - 18.28645634499844, - 18.363337473987485, - 18.386277819998213, - 18.47031761899416, - 18.537581074997433, - 18.562832765994244, - 18.628793357987888, - 18.732437452999875, - 18.778707553996355, - 
18.846688215999166, - 18.8723339119897, - 18.97594983599265, - 19.19522384199081, - 19.263140426002792, - 19.279858177003916, - 19.491905835995567, - 19.66534383299586, - 19.796948799994425, - 19.949845784998615, - 21.099951027004863, - 21.250794695006334, - 21.258743005993892, - 21.325795290991664, - 21.347746200990514, - 21.529587594006443, - 21.617193558995496, - 21.922520774009172, - 21.99007330099994, - 22.067459287005477, - 22.146750997999334, - 22.454072481996263, - 22.50549183599651, - 22.53294562600786, - 22.750282148990664, - 22.813230767002096, - 22.861071901003015, - 23.055260228997213, - 23.1807922529988, - 23.274439132001135, - 23.312160836998373, - 23.321543040990946, - 23.50964010799362, - 23.768737801990937, - 23.77046856500965, - 24.172127141006058, - 24.203503684999305, - 24.243413304007845, - 24.281965551999747, - 24.31495695799822, - 24.423996444005752, - 24.439344780999818, - 24.567224137994344, - 24.63588324400189, - 24.645667739998316, - 24.715807066997513, - 24.754166091006482, - 24.81670486499206, - 26.17558219199418, - 26.223541353989276, - 26.249697139995988, - 26.317228116997285, - 26.351027791009983, - 26.39484494100907, - 26.512057553991326, - 26.542785946003278, - 26.567871822000598, - 26.708306507003726, - 26.71968523900432, - 26.769257831998402, - 26.805320932995528, - 26.88287441400462, - 26.967942271992797, - 27.129527615994448, - 27.185535518001416, - 27.216790600999957, - 27.488770466996357, - 27.66418836900266, - 27.7412819720048, - 27.82885593199171, - 27.987944394990336, - 28.07111908699153, - 28.14615840499755, - 28.2382508120063, - 28.278608068008907, - 28.308057956994162, - 28.43234040000243, - 28.464781588001642, - 28.74012533800851, - 29.064121832998353, - 29.083315550000407, - 29.109070335005526, - 29.18180407700129, - 29.191376870992826, - 29.201366550987586, - 29.243098609003937, - 29.31803750200197, - 29.344300312994164, - 29.442806967010256, - 29.46291098499205, - 29.533981122003752, - 29.55042261799099, - 
29.557028123002965, - 29.58603669500735, - 29.621756317006657, - 29.633057778002694, - 29.689566646993626, - 29.74582924999413, - 29.81880794398603, - 29.835412944012205, - 29.897025942002074, - 29.98562433499319, - 30.047102608004934, - 30.154880866000894, - 30.221725413997774, - 30.38835056500102, - 30.417566680000164, - 30.464393592003034, - 30.46717930599698, - 32.35659575399768, - 32.36357704999682, - 32.41316831999575, - 32.43387756400625, - 32.500303443011944, - 32.51764473899675, - 32.564899197008344, - 32.574910476992955, - 32.601461652011494, - 32.615751293997164, - 32.91588010500709, - 33.11494078399846, - 33.35120709199691, - 33.517555711005116, - 33.61086866800906, - 33.69265901099425, - 33.79641232200083, - 33.888697945993044, - 34.0597764460108, - 34.25079320700024, - 34.27330426800472, - 34.28971039399039, - 34.37897254599375, - 34.455142225007876, - 34.54729696200229, - 34.568331792994286, - 34.6874039809918, - 34.70452326501254, - 34.71118488100183, - 34.998698712995974, - 35.019471022998914, - 35.0460576480109, - 35.067572364001535, - 35.077052310996805, - 35.20318888100155, - 35.41262323499541, - 35.47034285200061, - 35.66737182800716, - 35.77386423099961, - 35.816897771001095, - 35.959184776991606, - 36.05317784899671, - 36.144025622008485, - 36.17028364101134, - 36.2196639120084, - 36.43043898700853, - 36.52032941200014, - 36.56917191800312, - 36.58784476300934, - 36.634001074999105, - 36.63916368399805, - 36.75911540600646, - 36.7906494110066, - 36.89258746399719, - 37.100022648999584, - 37.43895518600766, - 37.495196277988725, - 37.51569765000022, - 37.51628414299921, - 37.54723887, - 37.58936300998903, - 37.59537666400138, - 37.79908708500443, - 37.8569375579973, - 37.974191347006126, - 37.993587411998305, - 40.08288054300647, - 40.13005157900625, - 40.45543181699759, - 40.465368738005054, - 40.733543780996115, - 40.7334190190013, - 40.852934936992824, - 40.853663593996316, - 40.86402944198926, - 40.90899981599068, - 41.048764285005745, - 
41.19019993599795, - 41.31141450800351, - 41.32085782599461, - 41.36324941800558, - 41.450180511994404, - 41.57022752400371, - 41.71462483200594, - 41.72475119399314, - 41.73486419199617, - 41.78534713400586, - 41.80146968000918, - 41.832623249996686, - 41.911574382989784, - 41.93217602200457, - 41.93301796200103, - 42.03119082599005, - 42.228277759000775, - 42.26225202399655, - 42.3700055119989, - 42.70804338000016, - 42.80471767899871, - 42.864793335000286, - 42.96439643600024, - 43.091440446994966, - 43.139091942997766, - 43.294758436008124, - 43.45563275599852, - 43.55063954999787, - 43.565488292006194, - 43.718643308995524, - 43.86887351400219, - 43.90869068900065, - 44.071273734007264, - 44.12127786800556, - 44.152847207005834, - 44.169607317991904, - 44.325929372003884, - 44.36079918500036, - 44.37531825300539, - 44.382389587000944, - 44.49546936999832, - 44.57662362798874, - 44.7167816709989, - 44.7381808719947, - 44.81956151500344, - 45.04149950899591, - 45.04368916999374, - 45.059777620990644, - 45.17879341400112, - 45.20899639900017, - 45.415484736993676, - 45.48347267600184, - 45.53567135699268, - 45.54163833799248, - 45.639902178998454, - 46.13799304499116, - 46.17222609199234, - 46.280873719995725, - 46.376912631996674, - 46.40128704899689, - 46.51625167099701, - 46.578261136004585, - 46.66602678000345, - 46.93037890898995, - 47.01500482400297, - 47.027077464998, - 47.21827012699214, - 47.24939069499669, - 50.996485412004404, - 51.01745001800009, - 51.05432579500484, - 51.25466104000225, - 51.39113936299691, - 51.48857628200494, - 51.57294386200374, - 51.5879001620051, - 51.73291014500137, - 52.11753091799619, - 52.261962757998845, - 52.5131687480025, - 52.599543521995656, - 52.728401526997914, - 52.87176360900048, - 52.899352302003535, - 53.0519055689947, - 53.149666434008395, - 53.20249381499889, - 53.23105097199732, - 53.40230493899435, - 53.563913657999365, - 53.856622059989604, - 53.94366757100215, - 53.9847600540088, - 54.02629446100036, - 
54.035034726999584, - 54.034648175991606, - 54.035584486002335, - 54.04567450500326, - 54.04704702699382, - 54.051130811989424, - 54.05699639901286, - 54.05721877099131, - 54.05985346100351, - 54.061096395002096, - 54.06444693400408, - 54.06461527699139, - 54.06526569799462, - 54.06603060600173, - 54.065812664994155, - 54.06884601700585, - 54.069521169993095, - 54.071862719996716, - 54.08858179599338, - 54.091193273008685, - 54.09185243399406, - 54.09527474400238, - 54.10228158200334, - 54.10299348901026, - 54.112723765996634, - 54.125311223993776, - 54.124372146994574, - 54.124836475006305, - 54.126607213998795, - 54.12793981200957, - 54.12984402499569, - 54.1307741359924, - 54.13062637099938, - 54.1314077959978, - 54.13393318200542, - 54.13275802299904, - 54.13226571198902, - 54.15769564799848, - 54.166947767007514, - 54.173722288993304, - 54.17461828199157, - 54.1749615810113, - 54.17504055799509, - 54.18888949000393, - 54.24395789499977, - 54.334968684997875, - 54.956112078987644, - 55.95408808700449, - 56.380503844993655 - ], - "storage_latencies": [ - 0.006632845994317904, - 0.02162129599309992, - 0.15133022298687138, - 0.1562735320185311, - 0.192366551986197, - 0.13505638398055453, - 0.15351624799950514, - 0.032281926018185914, - 0.21549770199635532, - 0.2592784449807368, - 0.2684006649942603, - 0.09305478200258221, - 0.2333678449940635, - 0.1855456180201145, - 0.10177725202811416, - 0.24267534297541715, - 0.05676655100251082, - 0.06875019299332052, - 0.07059653999749571, - 0.05009320500539616, - 0.13077460599015467, - 0.26086763400235213, - 0.13519381098740268, - 0.366055046004476, - 0.344096195010934, - 0.3444424690242158, - 0.37171365099493414, - 0.2776123610237846, - 0.3742686609766679, - 0.16923542402219027, - 0.12002304098859895, - 0.0440922080015298, - 0.3628779209975619, - 0.39341801500995643, - 0.18874351697741076, - 0.029196599003626034, - 0.47198566202132497, - 0.20324314599565696, - 0.18066144998010714, - 0.12736719999520574, - 
0.1831432859908091, - 0.2370150480128359, - 0.1847428269975353, - 0.20303723400866147, - 0.09869485801027622, - 0.06204646598780528, - 0.2531789469794603, - 0.24006352000287734, - 0.07479580299695954, - 0.31716611701995134, - 0.3966054239717778, - 0.17196514502575155, - 0.2774424800009001, - 0.10444792399357539, - 0.20441982099146117, - 0.1325330810068408, - 0.09182319897809066, - 0.17474705798667856, - 0.2775060249987291, - 0.254273874044884, - 0.10227128100814298, - 0.2457991539995419, - 0.25686651101568714, - 0.5684515180037124, - 0.41539570201712195, - 0.1373926779924659, - 0.5285211820009863, - 0.27154429600341246, - 0.21833688899641857, - 0.609065391952754, - 0.3899248660163721, - 0.5199928909860319, - 0.2780962710385211, - 0.4638133459666278, - 0.1423862110095797, - 0.667680299928179, - 0.5300024580210447, - 0.11041685300006066, - 0.1136930869979551, - 0.5509140819485765, - 0.458914864982944, - 0.5412152010248974, - 0.7782295500073815, - 0.5542813910287805, - 0.28581109397055116, - 0.4221494320227066, - 0.10184798999398481, - 0.3231013710174011, - 0.12585416600632016, - 0.44984069804195315, - 0.3456212129967753, - 0.22384522498759907, - 0.09949119700468145, - 0.06810911498905625, - 0.3367690290178871, - 0.04365815799974371, - 0.4036732440436026, - 0.3715964750153944, - 0.18607149099989329, - 0.3112726230319822, - 0.026643547011190094, - 0.5819726509798784, - 0.46701378097350243, - 0.056580783988465555, - 0.3749610130180372, - 0.2728221110010054, - 0.02073913998901844, - 0.03128491700044833, - 0.2060311879904475, - 0.3656914129969664, - 0.15843500502523966, - 0.7243772639631061, - 0.383040789005463, - 0.047210637989337556, - 0.025842000992270187, - 0.4951969469693722, - 0.276751002020319, - 0.32450424700800795, - 0.46043728101358283, - 0.38698990202101413, - 0.2879525020107394, - 0.0939044980041217, - 0.09869805500784423, - 0.08558375200664159, - 0.3798667909723008, - 0.10996922600315884, - 0.043926572005148046, - 0.04601358700892888, - 0.48924941002042033, - 
0.2716946530272253, - 0.3809258029650664, - 0.2398179620213341, - 0.39489877500454895, - 0.10369110702595208, - 0.058981367998057976, - 0.06296186200052034, - 0.0208127039950341, - 0.4397171080345288, - 0.043509297000127845, - 0.05290720600169152, - 0.015616902994224802, - 0.05364176201692317, - 0.18687061200034805, - 0.34777080605272204, - 0.08495677397877444, - 0.12101901796995662, - 0.6730228919914225, - 0.07948091301659588, - 0.43149553099647164, - 0.102910837973468, - 0.06743351499608252, - 0.09480989497387782, - 0.0872836580092553, - 0.0742753959930269, - 0.048595051004667766, - 0.08755140399443917, - 0.08877810402191244, - 0.11866975798329804, - 0.12088135197700467, - 0.15077990001009312, - 0.09892640596081037, - 0.07304586500686128, - 0.11267752401181497, - 0.43761860101949424, - 0.057645788008812815, - 0.3173012169863796, - 0.6702742310008034, - 0.11438594199717045, - 0.1209423150139628, - 0.07051287099602632, - 0.07333210401702672, - 0.13267425300728064, - 0.0680662659869995, - 0.03670566200162284, - 0.08291188599832822, - 0.10094419901724905, - 0.680382049002219, - 0.041150596996885724, - 0.1373241930268705, - 0.06494600900623482, - 0.04743305301235523, - 0.10171880197594874, - 0.11878507898654789, - 0.1679677539941622, - 0.15102559298975393, - 0.15591175101872068, - 0.0873091649991693, - 0.0995748869900126, - 0.1703251909930259, - 0.9050085540075088, - 0.08180035301484168, - 0.0731878009828506, - 0.12480907602002844, - 0.09885411501454655, - 0.10004372002731543, - 0.010256618988933042, - 0.15775435497926082, - 0.13656450204143766, - 0.16488404202391393, - 0.08797697701083962, - 0.11643510901194531, - 0.05208949699590448, - 0.02184589600074105, - 0.2797037010022905, - 0.12528679400566034, - 0.038217936002183706, - 0.12405213600140996, - 0.09483066698885523, - 0.027198805983061902, - 0.08346116500615608, - 0.052233618014724925, - 0.07787212099356111, - 0.07772712901351042, - 0.2864333040342899, - 0.2887409140676027, - 0.2399135830346495, - 
0.047285926004406065, - 0.9399372479965677, - 0.11176012401119806, - 0.13919624299160205, - 0.0838263660116354, - 0.06811182503588498, - 0.07414232498558704, - 0.0472062549815746, - 0.06617374101188034, - 0.16752428901963867, - 0.04753846699895803, - 0.13516923600400332, - 0.14757259999169037, - 0.08186787097656634, - 1.7176636150252307, - 0.179736410966143, - 0.10378375301661436, - 0.046324197988724336, - 0.1502666809974471, - 0.05236769700422883, - 0.06775153499620501, - 0.05683425199822523, - 0.06761038399417885, - 0.047284225016483106, - 0.11415926399058662, - 0.108876254002098, - 0.09898011999030132, - 0.07883582201611716, - 1.206619521981338, - 0.04742818900558632, - 1.1897135669714771, - 0.020829290981055237, - 0.13958429600461386, - 0.08866687798581552, - 0.14718788501340896, - 0.046603727038018405, - 0.26797778000764083, - 0.1049593210045714, - 0.296967692047474, - 0.13604679401032627, - 0.06777437600248959, - 0.2277010379912099, - 0.09418823600572068, - 0.2290813500439981, - 0.2328886429895647, - 0.09021361102350056, - 0.04201815099804662, - 0.04718547601078171, - 0.04205362101492938, - 0.21396931598428637, - 0.09950287999527063, - 0.15394035901408643, - 0.0676184369949624, - 0.15290416400239337, - 0.07879308598057833, - 0.10241723600483965, - 0.011264843007666059, - 1.4067321599868592, - 0.1405268700036686, - 0.12572127801831812, - 0.06771565500821453, - 0.057810770013020374, - 0.0832769379921956, - 0.04186172300251201, - 0.05712397300521843, - 0.10963935700419825, - 0.09454083997115958, - 0.04099360198597424, - 0.2530123939795885, - 0.19870281098701525, - 0.08430822801892646, - 0.14926613800344057, - 0.11491871099860873, - 0.09878008099622093, - 0.00521281200053636, - 0.030741348993615247, - 0.07929884799523279, - 0.04734696800005622, - 0.027929170988500118, - 0.20216089799941983, - 0.19821575800597202, - 0.06775133399059996, - 0.06793581796227954, - 0.1363902190059889, - 0.06708948801679071, - 0.05061274699983187, - 0.14272352801344823, - 
0.04640642898448277, - 0.01035393399070017, - 0.06507781299296767, - 0.12352341898076702, - 0.07226433498726692, - 0.12429752702882979, - 0.01601854301407002, - 0.20088271399436053, - 0.10231722399475984, - 0.18713859299896285, - 0.03077939903596416, - 0.047492163997958414, - 0.12985665997257456, - 0.09887870999227744, - 0.10374039399903268, - 0.0424214210070204, - 0.13655379103147425, - 0.10634646397375036, - 1.4335239310166799, - 0.036078061995795, - 0.06265008200716693, - 0.1183698069798993, - 0.23337473500578199, - 0.12010280099639203, - 0.047736534019350074, - 0.24122194800293073, - 0.13736925700504798, - 0.2240284460131079, - 0.09936831201775931, - 0.12202318501658738, - 0.0051453409978421405, - 0.08350106699799653, - 0.08297134301392362, - 0.02608076701289974, - 0.10226450297341216, - 0.005116261003422551, - 0.4290998970536748, - 0.068031360002351, - 0.05246075097238645, - 0.07887794799171388, - 0.04898708900145721, - 0.07712703499419149, - 0.11638621697784401, - 0.0774584530008724, - 0.057683860024553724, - 0.08343702000274789, - 0.01646726200124249, - 0.05270364099123981, - 0.1689542599779088, - 0.06768455203564372, - 0.11352764499315526, - 0.07718659800593741, - 0.10489413603499997, - 0.1292430930043338, - 0.15652102799504064, - 0.06189951299165841, - 0.046936325015849434, - 0.028315135001321323, - 0.1313561480055796, - 0.08019718999275938, - 0.10174452100181952, - 0.0825629869941622, - 0.13520506503118668, - 0.1031978240062017, - 0.10899203101871535, - 0.10809510403487366, - 0.07728131097974256, - 0.06581887498032302, - 0.023284921000595205, - 0.03785365400835872, - 0.16160539801057894, - 0.07555236201733351, - 0.0319854330009548, - 1.4874614010041114, - 0.06869181001093239, - 0.04253357698325999, - 0.025739146003616042, - 0.10341335600242019, - 0.12441380199743435, - 0.07312378099595662, - 0.07980088399199303, - 0.09746115800226107, - 0.13401138296467252, - 0.08425896600238048, - 0.060831251001218334, - 0.03636582398030441, - 0.09827833401504904, - 
0.09386769999400713, - 0.11042098398320377, - 0.14036766800563782, - 0.270022069933475, - 0.2700197309750365, - 0.1012872150313342, - 0.10368557696347125, - 0.13070508097007405, - 0.06824822697672062, - 0.10135923298366833, - 0.052175369986798614, - 0.08152663100918289, - 0.10831181495450437, - 0.05730082199443132, - 0.05858518899185583, - 0.08384986997407395, - 0.20908337898436002, - 0.06842352803505491, - 0.025848712990409695, - 0.08268135102116503, - 0.09914894499524962, - 0.06200453198107425, - 0.16544282001268584, - 0.10429867898346856, - 0.08341491602186579, - 0.09864298401225824, - 0.13862560797133483, - 0.12498204597795848, - 0.23883665598987136, - 0.07469448400661349, - 0.06850840398692526, - 0.07245766899723094, - 0.011042201003874652, - 0.17063140105165076, - 0.10856598701502662, - 0.03627613201388158, - 0.14543386102013756, - 0.063050172990188, - 0.025922589004039764, - 0.07220230098755565, - 0.031210184999508783, - 0.12782400198921096, - 0.15458318301534746, - 0.13605349900899455, - 0.09370911403675564, - 0.09586965599737596, - 0.09661618401878513, - 0.005103344999952242, - 0.12613797002995852, - 0.02104829699965194, - 0.025953578995540738, - 0.2063007719698362, - 0.2022949200036237, - 0.10078945102577563, - 0.005202745000133291, - 0.13329786200483795, - 0.07798496101167984, - 0.04621978699287865, - 0.0949763079843251, - 0.17245778498181608, - 0.06845141699886881, - 0.057021655986318365, - 0.0788587639835896, - 0.07419688298250549, - 0.060188020986970514, - 0.02599599900713656, - 0.032691915999748744, - 0.11963099999411497, - 0.04452833201503381, - 0.07869895199837629, - 0.10362947297107894, - 0.0679757900070399, - 0.3700420959794428, - 0.24360363802406937, - 0.07866831999854185, - 0.0680528119992232, - 0.1498001859872602, - 0.09327205498993862, - 0.1092707829666324, - 0.11208327498752624, - 0.17057002602086868, - 0.0572283409856027, - 0.051380635995883495, - 0.10358597099548206, - 0.07276780302345287, - 0.07771733799017966, - 0.0709769989916822, - 
0.08017682797799353, - 0.15086804900784045, - 0.06239623097644653, - 0.19140763096220326, - 0.11671077700157184, - 0.16766619801637717, - 0.0789088959863875, - 0.10459842001728248, - 0.1931943030358525, - 0.06834764599625487, - 0.186008791992208, - 0.10321225599909667, - 0.15800265599682461, - 0.04170316501404159, - 0.2386770580051234, - 0.20068074799200986, - 0.08310137999069411, - 0.11159393102570903, - 0.07806641799106728, - 0.06843128900800366, - 0.09821863097022288, - 0.06406090699601918, - 0.04206638899631798, - 0.20476332501857542, - 0.19239664199994877, - 0.21942065496114083, - 0.054260880016954616, - 0.08526583302591462, - 0.009970045008230954, - 0.007305517021450214, - 0.00617399399925489, - 0.140643622042262, - 0.004927595000481233, - 0.12707469900487922, - 0.018390494005871005, - 0.010150664995308034, - 0.07929607597179711, - 0.0046487150102620944, - 0.009074969988432713, - 0.11095161302364431, - 0.017533687030663714, - 0.010599004992400296, - 0.028790191980078816, - 0.012619638000614941, - 0.00873558000603225, - 0.02098012500209734, - 0.022665261989459395, - 0.022542246981174685, - 0.03890366802806966, - 0.0274611529748654, - 0.022285394981736317, - 0.02825964702060446, - 0.04704270297952462, - 0.009278789002564736, - 0.028735398984281346, - 0.012382426008116454, - 0.022173949007992633, - 0.04004716902272776, - 0.013195268969866447, - 0.013428564969217405, - 0.014907246993971057, - 0.02026525499240961, - 0.03531024997937493, - 0.012062634006724693, - 0.013919328994234093, - 0.01299499798915349, - 0.008555651002097875, - 0.018609524981002323, - 0.015726643992820755, - 0.11089786198863294, - 0.05896200500137638, - 0.10389046800264623 - ], - "generation_latencies": [ - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, 
- 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 
0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0 - ], - "throughput_timeline": [], - "prefill_latencies": [ - 0.02040006799506955, - 0.00665886199567467, - 0.0009055699920281768, - 0.017381778001436032, - 0.00865850799891632, - 0.015389545005746186, - 0.020305803991504945, - 0.02413633700052742, - 0.02468874299665913, - 0.016999601997667924, - 0.02493964599852916, - 0.012617647997103631, - 0.07013115899462719, - 0.05950307699094992, - 0.016845021993503906, - 0.14061525999568403, - 
0.1405788650008617, - 0.1795076929993229, - 0.14856669699656777, - 0.12477992598724086, - 0.14860145800048485, - 0.14852015099313576, - 0.14896095400035847, - 0.12606441299431026, - 0.13356833199213725, - 0.023830436999560334, - 0.12638574400625657, - 0.15599359398765955, - 0.13269445300102234, - 0.016724633998819627, - 0.15838712800177746, - 0.05325013400579337, - 0.0008751290006330237, - 0.04874620599730406, - 0.047669631996541284, - 0.08621647499967366, - 0.03250968900101725, - 0.07382859800418373, - 0.07527583600312937, - 0.07995541499985848, - 0.006492521992186084, - 0.021614425990264863, - 0.09345137600030284, - 0.018478033991414122, - 0.01881078000587877, - 0.02062681999814231, - 0.022286078994511627, - 0.0360789849946741, - 0.014329733996419236, - 0.013011143993935548, - 0.017270423006266356, - 0.03708557100617327, - 0.028098975992179476, - 0.0381567510048626, - 0.03476457900251262, - 0.05454739701235667, - 0.041964691001339816, - 0.03182252300030086, - 0.031941901994287036, - 0.026552768002147786, - 0.07506656799523626, - 0.06431557699397672, - 0.06253659300273284, - 0.06288061098894104, - 0.04612407299282495, - 0.021409767010482028, - 0.03987180200056173, - 0.030492759004118852, - 0.08009424200281501, - 0.021407508000265807, - 0.11084407799353357, - 0.09176831899094395, - 0.017805422001401894, - 0.02327575300296303, - 0.025548566001816653, - 0.0314473629987333, - 0.03555269099888392, - 0.012270849998458289, - 0.050888426005258225, - 0.038099811994470656, - 0.03458966000471264, - 0.03506241399736609, - 0.04450340798939578, - 0.04468962400278542, - 0.046224328005337156, - 0.034141265001380816, - 0.02318261300388258, - 0.06778397200105246, - 0.02243723398714792, - 0.013710375991649926, - 0.013459781999699771, - 0.019682211001054384, - 0.008366776994080283, - 0.022988386001088656, - 0.023610316988197155, - 0.08512508599960711, - 0.0938651800097432, - 0.08700751799915452, - 0.0937360409880057, - 0.0857782370003406, - 0.0053297070116968825, - 
0.03063780500087887, - 0.11948714300524443, - 0.11949987600382883, - 0.021339929997338913, - 0.11998175400367472, - 0.12730122399807442, - 0.10639719599566888, - 0.1145076760003576, - 0.016454723008791916, - 0.01942053200036753, - 0.02575943300325889, - 0.026947548001771793, - 0.01858470700972248, - 0.04002144100377336, - 0.026603475998854265, - 0.09644762599782553, - 0.0837883780041011, - 0.026402029994642362, - 0.011108194012194872, - 0.07984850100183394, - 0.00788556098996196, - 0.03177034200052731, - 0.041331601998535916, - 0.023272036007256247, - 0.03060646599624306, - 0.016049757003202103, - 0.01563428100780584, - 0.02100818100734614, - 0.025830867001786828, - 0.37054560201067943, - 0.015689007996115834, - 0.015588844995363615, - 0.026372164007625543, - 0.025897453000652604, - 0.33069954899838194, - 0.04175330600992311, - 0.032129824001458474, - 0.02739857800770551, - 0.02095577699947171, - 0.0, - 0.02589014200202655, - 0.03588459400634747, - 0.02521880599670112, - 0.026108712001587264, - 0.05335027900582645, - 0.020735226993565448, - 0.005446445007692091, - 0.011457343003712595, - 0.021856359002413228, - 0.04117642799974419, - 0.026664712000638247, - 0.02320839700405486, - 0.033118385006673634, - 0.03076369600603357, - 0.02172036698902957, - 0.03176817200437654, - 0.0, - 0.01553273299941793, - 0.02104877600504551, - 0.0, - 0.031599982001353055, - 0.031195055998978205, - 0.010602037000353448, - 0.021149198000784963, - 0.026148618999286555, - 0.02730413000972476, - 0.011203115005628206, - 0.037409115000627935, - 0.010510331994737498, - 0.046190029999706894, - 0.03626077600347344, - 0.006539007008541375, - 0.0157044140069047, - 0.021756130998255685, - 0.04109210200840607, - 0.01636081600736361, - 0.02253586400183849, - 0.18626410499564372, - 0.036326727000414394, - 0.021509700003662147, - 0.010919035004917532, - 0.016091712008346803, - 0.02581619100237731, - 0.02477534400532022, - 0.005405577001511119, - 0.0, - 0.021607362999930046, - 0.006091549003031105, - 
0.020721168009913526, - 0.02179735001118388, - 0.020790884998859838, - 0.015591004994348623, - 0.01567819200863596, - 0.02057896100450307, - 0.026099598995642737, - 0.021608851006021723, - 0.0, - 0.041869142005452886, - 0.025902780005708337, - 0.01679385598981753, - 0.016131390002556145, - 0.03640734999498818, - 0.03534467901044991, - 0.025627608003560454, - 0.020786219000001438, - 0.046365351998247206, - 0.0, - 0.03610482100339141, - 0.02068442500603851, - 0.026722214999608696, - 0.026176009007031098, - 0.025436741998419166, - 0.03073416200641077, - 0.016776880001998506, - 0.7478645600058371, - 0.015539055006229319, - 0.02695644299092237, - 0.0, - 0.011191705998498946, - 0.02132382898707874, - 0.03205594299652148, - 0.021064087995910086, - 0.0, - 0.026310296001611277, - 0.045247622998431325, - 0.02112677499826532, - 0.026035788003355265, - 0.0, - 0.030797380008152686, - 0.014794301008805633, - 0.01593650500581134, - 0.0, - 0.021977202006382868, - 0.02602251901407726, - 0.021361322011216544, - 0.011165920994244516, - 0.010632640987751074, - 0.010852369989152066, - 0.02122466500441078, - 0.024685753989615478, - 0.0, - 0.02243905000796076, - 0.0, - 0.010460734003572725, - 0.015989509993232787, - 0.010966687012114562, - 0.03455106700130273, - 0.026774095007567666, - 0.026925270998617634, - 0.011060000004363246, - 0.0, - 0.02124989399453625, - 0.01563163599348627, - 0.0206624630081933, - 0.0, - 0.026076081005157903, - 0.016048630990553647, - 0.02615807000256609, - 0.026212990007479675, - 0.0, - 0.026882802994805388, - 0.026359847994172014, - 0.020651288999943063, - 0.021276011000736617, - 0.026380457013146952, - 0.015547078000963666, - 0.031733887997688726, - 0.0, - 0.01649672600615304, - 0.011256790996412747, - 0.030838699007290415, - 0.010501831988221966, - 0.016302752002957277, - 0.010740847006672993, - 0.0, - 0.0, - 0.0, - 0.0, - 0.02100770598917734, - 0.025733039001352154, - 0.016366883006412536, - 0.021331800002371892, - 0.0, - 0.01558389600540977, - 
0.20100414600165095, - 0.05008922600245569, - 0.02639128899318166, - 0.0, - 0.020774029995664023, - 0.0, - 0.031118810002226382, - 0.02071572899876628, - 0.020692360994871706, - 0.021060528000816703, - 0.03611154199461453, - 0.03616142399550881, - 0.020625717006623745, - 0.03110183301032521, - 0.0, - 0.021155408991035074, - 0.0, - 0.0, - 0.016281821008305997, - 0.015852149997954257, - 0.020827646003453992, - 0.0, - 0.0, - 1.3397562030004337, - 0.0, - 0.0, - 0.0, - 0.0, - 0.021487623002030887, - 0.047928867003065534, - 0.026655517998733558, - 0.03643542999634519, - 0.0, - 0.015589553004247136, - 0.03091365599539131, - 0.0, - 0.020937473003868945, - 0.024084857999696396, - 0.018543615995440632, - 0.03705884900409728, - 0.0, - 0.015283968008589, - 0.011398875998565927, - 0.03118210399406962, - 0.020871815009741113, - 0.045556276993011124, - 0.041956333006964996, - 0.03592829500848893, - 0.01044309300777968, - 0.01026017899857834, - 0.0, - 0.03621020799619146, - 0.010358362997067161, - 0.046009974990738556, - 0.02080324199050665, - 0.0, - 0.042532624007435516, - 0.015552819997537881, - 0.026286159991286695, - 0.03151013300521299, - 0.010825385004864074, - 0.0, - 0.0, - 0.0, - 0.02141154200944584, - 0.0, - 0.02560767499380745, - 0.020819095996557735, - 0.0, - 0.0, - 0.015886838009464554, - 0.0, - 0.0, - 0.02114221300871577, - 0.0, - 0.0, - 0.0, - 0.0, - 0.015725119999842718, - 0.0, - 0.03103463799925521, - 0.0, - 0.015651212990633212, - 0.011115308006992564, - 0.0, - 0.02079067200247664, - 0.0, - 0.0, - 0.0, - 0.026246592999086715, - 0.0, - 0.03145414299797267, - 0.021093664996442385, - 0.022473189994343556, - 0.015896269993390888, - 0.0, - 0.04114668800320942, - 0.034501031012041494, - 0.034164625001722015, - 0.020922908995999023, - 0.021149972002604045, - 0.036258000996895134, - 0.018122800000128336, - 0.02124312501109671, - 0.0, - 0.03125415999966208, - 0.010380152001744136, - 0.0, - 0.03660565899917856, - 0.016775989992311224, - 0.011085049001849256, - 0.0, - 
0.010440479003591463, - 0.0, - 0.026232639997033402, - 0.02122125300229527, - 0.0, - 0.0, - 0.01668634099769406, - 0.021336263991543092, - 0.02759889399749227, - 0.0, - 0.0, - 0.02098034399386961, - 0.02190597499429714, - 0.025600256005418487, - 0.0, - 0.0, - 0.015888611000264063, - 0.02680986700579524, - 0.01063842000439763, - 0.022491367999464273, - 0.0, - 0.015728295998997055, - 0.0, - 0.015855220990488306, - 0.02088601099967491, - 0.015357801006757654, - 0.028156904998468235, - 0.026300596000510268, - 0.0, - 0.031879222005954944, - 0.021150175991351716, - 0.0, - 0.03609509998932481, - 0.025889254990033805, - 0.01075171199045144, - 0.0, - 0.0, - 0.0, - 0.032905939006013796, - 0.031078465006430633, - 0.0, - 0.021125375002156943, - 0.01646820300084073, - 0.0, - 0.0, - 0.036422543998924084, - 0.0362532459985232, - 0.0, - 0.011184563001734205, - 0.025784572993870825, - 0.0, - 0.0, - 0.0, - 0.0, - 0.04089742399810348, - 0.01571571199747268, - 0.030675397996674292, - 0.0, - 0.0, - 0.02152357499289792, - 0.032277522986987606, - 0.02670159899571445, - 0.03633444399747532, - 0.025484491998213343, - 0.020827196989557706, - 0.0, - 0.0, - 0.0, - 0.020701899004052393, - 0.021358199999667704, - 0.0311397309997119, - 0.028157314998679794, - 0.026059483992867172, - 0.0, - 0.03810199900181033, - 0.020800640995730646, - 0.02141752500028815, - 0.0, - 0.01560846099164337, - 0.016982448010821827, - 0.0, - 0.012466332002077252, - 0.015880767998169176, - 0.0, - 0.0, - 0.0, - 0.0, - 0.041311954002594575, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.03126601199619472, - 0.01098694400570821, - 0.015509837990975939, - 0.0, - 0.0, - 0.015886759996647015, - 0.0, - 0.04733684900566004, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.010479079006472602, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0015303449908969924, - 0.0, - 0.000667309999698773, - 0.0011312670103507116, - 0.0006712790054734796, - 0.0010211269982391968, - 0.001349300000583753, - 0.0035540810058591887, - 
0.005712239013519138, - 0.006990141002461314, - 0.0052381790010258555, - 0.008055679994868115, - 0.0065711140050552785, - 0.003283138998085633, - 0.006611375996726565, - 0.005326276004780084, - 0.0016465399967273697, - 0.005134252001880668, - 0.006223761010915041, - 0.006174075999297202, - 0.0031299859983846545, - 0.0054511049966095015, - 0.004239354995661415, - 0.005303552999976091, - 0.005401456990512088, - 0.006583285998203792, - 0.01025042199762538, - 0.002900020990637131, - 0.004125569001189433, - 0.004204226992442273, - 0.0024280179932247847, - 0.005762264991062693, - 0.007111243001418188, - 0.012022362992865965, - 0.018217426986666396, - 0.012743312996462919, - 0.017025555003783666 - ], - "decode_latencies": [ - 0.00031362999288830906, - 0.00022614799672737718, - 0.0001488599955337122, - 0.00026380100462120026, - 0.0023620050051249564, - 0.001968734010006301, - 0.03755191700474825, - 2.2966996766626835e-05, - 0.007958611997310072, - 0.018710065007326193, - 0.0056052540021482855, - 0.020130278993747197, - 0.0068841739994240925, - 0.006276860993239097, - 0.0069547070015687495, - 0.04817758500576019, - 0.003005022997967899, - 0.07755273599468637, - 0.002089548986987211, - 0.012250528001459315, - 0.03200401000503916, - 0.007479173000319861, - 0.02016935299616307, - 0.008522772011929192, - 0.12675751899951138, - 0.00669490201107692, - 0.006926588001078926, - 0.002521854010410607, - 0.0012803900026483461, - 0.014393478995771147, - 0.006090206996304914, - 0.012258655988262035, - 0.008120925995172001, - 0.03715804799867328, - 0.00988623900047969, - 0.07124102700618096, - 0.015856173005886376, - 0.006517796995467506, - 0.013104626996209845, - 0.014244480000343174, - 0.014318366011139005, - 0.011958160990616307, - 0.0035417290055193007, - 0.01937165499839466, - 0.09877784799027722, - 0.013489789009327069, - 0.01237661499180831, - 0.006582078989595175, - 0.006303899004706182, - 0.04070681500888895, - 9.139599569607526e-05, - 0.032753546998719685, - 
0.008745411993004382, - 0.02028437099943403, - 0.004086833010660484, - 0.008048147996305488, - 0.0032114240020746365, - 0.011557783000171185, - 0.02027184200414922, - 0.027325325994752347, - 0.013284030006616376, - 0.024963740987004712, - 0.018287181010236964, - 0.11077327300154138, - 0.030568030997528695, - 0.013840127998264506, - 0.015314397009206004, - 0.012563968004542403, - 0.0021627340029226616, - 0.002118571996106766, - 0.07138649599801283, - 0.03997275199799333, - 0.011776718994951807, - 0.016416095997556113, - 0.09890198599896394, - 0.007408145000226796, - 0.012434741991455667, - 0.012939252002979629, - 0.020777318000909872, - 0.002108479995513335, - 0.006905764996190555, - 0.006349331000819802, - 0.01647584099555388, - 0.010575805004918948, - 0.02039459499064833, - 0.015464462994714268, - 0.0013300150021677837, - 0.013299042999278754, - 0.0065659009997034445, - 0.0017536829982418567, - 0.021968028988339938, - 8.73160024639219e-05, - 0.11525731001165695, - 0.0681422229972668, - 0.002256063002278097, - 0.020934144995408133, - 0.0113853700022446, - 0.019791197002632543, - 0.005290180997690186, - 0.06949936800810974, - 0.015534713995293714, - 0.01485416101058945, - 0.021968424000078812, - 0.005480054998770356, - 0.005382914998335764, - 0.006588420990738086, - 0.00517667100939434, - 0.010417912009870633, - 0.09958461600763258, - 0.0015325669955927879, - 0.005197427002713084, - 0.013653660003910773, - 0.02644120699551422, - 0.00535065500298515, - 0.006422153004677966, - 0.02013679100491572, - 0.10722223199263681, - 0.007768330004182644, - 0.016942314003244974, - 0.014719053986482322, - 0.09118329000193626, - 0.2773326260066824, - 0.010508563005714677, - 0.010284616000717506, - 0.07126237801276147, - 0.005285110994009301, - 0.011140325994347222, - 0.010548844991717488, - 0.005258812001557089, - 0.0066887209977721795, - 0.06873775499116164, - 0.02040472999215126, - 0.0910046540084295, - 0.005140040011610836, - 0.01028514800418634, - 0.01063118599995505, - 
0.010163409999222495, - 0.005293085006996989, - 0.015389530002721585, - 8.852299652062356e-05, - 3.8561003748327494e-05, - 0.02062128999386914, - 0.010341237997636199, - 0.007410118996631354, - 0.025836439002887346, - 0.0064551190007478, - 0.005187571994611062, - 0.015663285012124106, - 0.007506458990974352, - 0.010315011008060537, - 0.005272294991300441, - 0.02691048699489329, - 0.005275685005472042, - 0.010279202993842773, - 0.005307072991854511, - 0.005284054001094773, - 0.014469798989011906, - 0.010733859002357349, - 0.015488286997424439, - 0.012644560993066989, - 0.005649102997267619, - 0.005135522005730309, - 0.00531356199644506, - 0.007374737993814051, - 0.02074277400970459, - 0.0783384150126949, - 0.005210682997130789, - 0.005294635004247539, - 0.0001917930057970807, - 0.010355753998737782, - 0.020595484995283186, - 0.010508958002901636, - 0.005167709998204373, - 0.010287922996212728, - 0.015488707009353675, - 0.005202804008149542, - 0.005347022000933066, - 0.005120280999108218, - 0.005201292995479889, - 0.006260095004108734, - 0.005438378997496329, - 0.0003670409932965413, - 0.011038083001039922, - 0.010476186012965627, - 0.005311059998348355, - 0.005561792000662535, - 0.005160329994396307, - 0.010308780998457223, - 0.005173416997422464, - 0.010980211009155028, - 0.005245253996690735, - 0.00546857099107001, - 0.010362827000790276, - 0.015641471996787004, - 0.015304629007005133, - 0.005178621009690687, - 0.021189615988987498, - 0.005259458994260058, - 0.010227214996120892, - 0.010343260990339331, - 0.005159412990906276, - 0.02124291899963282, - 0.01038369799789507, - 0.020603326003765687, - 0.005310456996085122, - 0.020489360991632566, - 0.005691436002962291, - 0.005163383990293369, - 0.0051015470089623705, - 0.01041366699791979, - 0.025866914002108388, - 0.015472296989173628, - 0.005225233995588496, - 0.005194407989620231, - 0.0051638019940583035, - 0.0051655400020536035, - 0.01046620900160633, - 0.01041462499415502, - 0.010410640010377392, - 
0.005578726006206125, - 0.0051074340008199215, - 0.010544100005063228, - 0.015580941995722242, - 0.005590916989604011, - 0.005300478005665354, - 0.01955804499448277, - 0.005250059999525547, - 0.010478917000000365, - 0.0159756069915602, - 0.0005103809962747619, - 0.01032196199230384, - 0.010879499008296989, - 0.005140651002875529, - 0.011072774010244757, - 0.03571528599422891, - 0.005160175001947209, - 0.010483051999472082, - 0.005239803998847492, - 0.03302687899849843, - 0.005569713990553282, - 0.010370870004408062, - 0.010742482001660392, - 0.010312432001228444, - 0.010303916002158076, - 0.010681740008294582, - 0.005179750005481765, - 0.015327211003750563, - 0.025609971999074332, - 0.010384231005446054, - 0.010442607002914883, - 0.015874693999649025, - 0.010271695005940273, - 0.015279151004506275, - 0.015759759000502527, - 0.005623581004329026, - 0.0052085589995840564, - 0.005149873992195353, - 0.01411910800379701, - 0.021000966997235082, - 0.010329680007998832, - 0.005177151004318148, - 0.015838981009437703, - 0.005394626990891993, - 0.00512474900460802, - 0.005460391999804415, - 0.015451870000106283, - 0.010466637002537027, - 0.010437570002977736, - 0.00522591698972974, - 0.010310879995813593, - 0.020571896006003954, - 0.005115160995046608, - 0.015545849993941374, - 0.015570987001410685, - 0.0012612759892363101, - 0.017825196002377197, - 0.00012859700655099005, - 0.01116674799413886, - 0.00015580799663439393, - 0.005232237002928741, - 0.00527061200409662, - 0.005326116006472148, - 0.011189470009412616, - 0.01029055799881462, - 0.006105929001932964, - 0.005152468002052046, - 0.015309012000216171, - 0.010288714009220712, - 0.0204511560004903, - 0.005149962002178654, - 0.005152187994099222, - 0.005213971991906874, - 0.005240813989075832, - 0.010829696999280713, - 0.016418649989645928, - 0.012891045000287704, - 0.016042083996580914, - 0.005325926002115011, - 0.010281252005370334, - 0.020614014996681362, - 0.015335485004470684, - 0.005114929997944273, - 
0.015465459990082309, - 0.0052466579945757985, - 0.005433529004221782, - 0.010332795995054767, - 0.020570407999912277, - 0.005180038002436049, - 0.010335771003155969, - 0.0054306649981299415, - 0.010358690007706173, - 0.005140780995134264, - 0.015396479997434653, - 0.014322083996376023, - 0.015357686002971604, - 0.0052284609992057085, - 0.02563014000770636, - 0.005142178008100018, - 0.005241266990196891, - 0.005209064009250142, - 4.684500163421035e-05, - 0.01020526300999336, - 0.010635888000251725, - 0.01026615800219588, - 0.010368887000367977, - 0.0104358289972879, - 0.005219497994403355, - 0.010157209006138146, - 0.01646524399984628, - 0.02046262999647297, - 0.005531568007427268, - 0.005255929994746111, - 0.01609925700176973, - 0.0004607829905580729, - 0.0002233989944215864, - 0.010506954989978112, - 0.010317204010789283, - 0.0051619180012494326, - 0.021457279988680966, - 0.01547584000218194, - 0.010285962009220384, - 0.010324517992557958, - 0.0051314050069777295, - 0.00514769900473766, - 0.015578963997540995, - 0.005186477006645873, - 0.00027614200371317565, - 0.01042108099500183, - 9.515199053566903e-05, - 0.015509560995269567, - 0.015293864998966455, - 0.010752449001302011, - 0.025446783009101637, - 0.005179605010198429, - 0.005177194005227648, - 0.010449352994328365, - 0.0051161159935873, - 0.019515270992997102, - 0.010424349995446391, - 0.005768489005276933, - 0.005239846999756992, - 0.010449223991599865, - 0.010364551999373361, - 0.01032450600177981, - 0.010524329001782462, - 0.00518726599693764, - 0.015616992008290254, - 0.005143324000528082, - 0.00516119999520015, - 0.010485054008313455, - 0.011322285994538106, - 0.01028151999344118, - 0.015334517011069693, - 0.01537102299334947, - 0.005378147994633764, - 0.010345565009629354, - 0.009262727995519526, - 0.010417192010208964, - 0.0054291909909807146, - 0.010536339992540888, - 0.016984984002192505, - 0.0052115880098426715, - 0.016071603997261263, - 0.013724989999900572, - 0.010336496998206712, - 
0.010616425002808683, - 0.005170245000044815, - 0.005288325002766214, - 0.12297682100324892, - 0.0053383080085041, - 0.01527940999949351, - 0.00560858599783387, - 0.015376845010905527, - 0.005192355005419813, - 0.01047875199583359, - 0.025474485009908676, - 0.010566695011220872, - 0.010398180995252915, - 0.010162778999074362, - 0.015535008991719224, - 0.01036563799425494, - 0.01030946199898608, - 0.025934472985682078, - 0.005201507010497153, - 0.01176005600427743, - 0.010343383008148521, - 0.005099342000903562, - 0.010234772998956032, - 0.015588738999213092, - 0.011008959001628682, - 0.005313811998348683, - 0.010289977988577448, - 0.010681590996682644, - 0.005106092998175882, - 0.005232206007349305, - 0.005240547005087137, - 0.005160022003110498, - 0.015552720011328347, - 0.010983611005940475, - 0.0051891839975724, - 0.005173289988306351, - 0.010136842000065371, - 0.005257719007204287, - 0.005149941993295215, - 0.016380877001211047, - 0.006768079998437315, - 0.005249395995633677, - 0.021781987001304515, - 0.0051087049941997975, - 0.0051784099923679605, - 0.005458789994008839, - 0.016214115006732754, - 0.005681844995706342, - 0.010336512990761548, - 0.005251814000075683, - 0.005137720989296213, - 0.010558804002357647, - 0.005136879000929184, - 0.010881773006985895, - 0.005126162999658845, - 0.010454078001203015, - 0.0065943929948844016, - 0.010444519008160569, - 0.005164261005120352, - 0.005109360005008057, - 0.010399646998848766, - 0.02049479501147289, - 0.00012468300701584667, - 0.01016064100258518, - 0.0108778409921797, - 0.011715339991496876, - 0.005158974992809817, - 0.005221017010626383, - 0.011897224001586437, - 0.010334568010875955, - 0.005170705000637099, - 0.005270844005281106, - 4.921801155433059e-05, - 0.005105111005832441, - 0.025755923008546233, - 0.005286523009999655, - 0.010498381001525559, - 0.010402454994618893, - 0.010230006999336183, - 0.011050350993173197, - 0.00017581800057087094, - 0.01546569999482017, - 0.005149085001903586, - 
0.005229595000855625, - 0.006293263999396004, - 0.005507290989044122, - 0.0007095030014170334, - 0.00021435300004668534, - 0.010451710011693649, - 0.01564729100209661, - 0.0051617059943964705, - 0.02067437100049574, - 0.01015061599900946, - 0.005129620010848157, - 0.016044879987020977, - 0.00010062199726235121, - 0.01039750300697051, - 0.005349331011530012, - 0.010331461002351716, - 0.010318615997675806, - 0.0052251560118747875, - 0.010391254996648058, - 0.015738882997538894, - 0.005117217995575629, - 0.015434021013788879, - 0.005160310000064783, - 0.010172869006055407, - 0.012619025990716182, - 0.02564902500307653, - 0.005160732995136641, - 0.0051438279915601015, - 0.010275737004121765, - 0.010520935989916325, - 0.005142346999491565, - 0.01529075599682983, - 0.005158029991434887, - 0.015728957994724624, - 0.015506320996792056, - 0.01612230099271983, - 0.005585434002568945, - 0.001084703006199561, - 0.001048884994816035, - 0.0003575139999156818, - 0.0051526890019886196, - 0.0006633419980062172, - 0.000535054990905337, - 0.002157612005248666, - 0.00015372400230262429, - 0.005177465005544946, - 4.869599069934338e-05, - 0.00016117800259962678, - 0.005116509011713788, - 0.0010194339993176982, - 0.002851166995242238, - 0.010546956997131929, - 0.003093535007792525, - 0.003287223997176625, - 0.0012444309977581725, - 0.0036540909932227805, - 0.007326468999963254, - 0.0023179609997896478, - 0.0013457379973260686, - 0.0037219249934423715, - 0.001564570004120469, - 0.003802071005338803, - 0.002540273009799421, - 0.00614474099711515, - 0.0035235060058766976, - 0.0020848749991273507, - 0.0030761569942114875, - 0.00011049800377804786, - 0.0032840469939401373, - 0.0016655210056342185, - 0.0020146689930697903, - 0.0024495959951309487, - 0.0011240029998589307, - 0.0008108299953164533, - 0.000915138007258065, - 0.001446916998247616, - 0.00281543200253509, - 0.0047370859974762425, - 0.007937453003250994, - 0.005079669994302094, - 0.008126504995743744 - ], - "multi_turn_cache_hits": 
63, - "multi_turn_cache_misses": 320, - "seed": 42, - "summary": { - "total_requests": 549, - "total_tokens": 148297, - "elapsed_time": 53.61261463165283, - "avg_throughput_tokens_per_sec": 2766.0840833613365, - "requests_per_second": 10.240127323987496, - "end_to_end_latency_ms": { - "mean": 25717.335976189104, - "p50": 26512.057553991326, - "p95": 54093.905819999054, - "p99": 54182.24200263969 - }, - "storage_io_latency_ms": { - "mean": 155.5087897563805, - "p50": 100.78945102577563, - "p95": 462.4629199854099, - "p99": 1069.820933863516 - }, - "generation_latency_ms": { - "mean": 0.0, - "p50": 0.0, - "p95": 0.0, - "p99": 0.0 - }, - "cache_stats": { - "cache_hit_rate": 0.9268128161888701, - "cache_hits": 5496, - "cache_misses": 434, - "gpu_entries": 352, - "cpu_entries": 4, - "nvme_entries": 77, - "gpu_memory_used_gb": 6.3973388671875, - "cpu_memory_used_gb": 6.3485107421875, - "offloads_cpu": 81, - "offloads_nvme": 77, - "storage_health": { - "overall_status": "PASS", - "criteria": [ - { - "name": "Cache Hit Rate > 30%", - "target": 0.3, - "actual": 0.9268128161888701, - "unit": "ratio", - "passed": true - } - ], - "passed_count": 1, - "total_count": 1 - }, - "prefill_writes": 433, - "decode_reads": 5496, - "prefill_bytes_written_gb": 7.8096923828125, - "decode_bytes_read_gb": 100.0064697265625, - "system_prompt_hits": 1182, - "common_phrase_hits": 0, - "user_cache_hits": 4251, - "multi_turn_hits": 63, - "total_read_bytes": 107381129216, - "total_write_bytes": 8385593344, - "total_read_gb": 100.0064697265625, - "total_write_gb": 7.8096923828125, - "read_write_ratio": 12.80543007643372, - "read_iops": 5496, - "write_iops": 433, - "gpu_read_p50_ms": 10.15951250155922, - "gpu_read_p95_ms": 30.929795244446723, - "gpu_read_p99_ms": 103.41269050040877, - "gpu_write_p50_ms": 22.43905000796076, - "gpu_write_p95_ms": 119.69262720376717, - "gpu_write_p99_ms": 196.28733287972875 - }, - "qos_metrics": { - "interactive": { - "total_requests": 549, - "latency_ms": { - "mean": 
25717.335976189108, - "p50": 26512.057553991326, - "p95": 54093.905819999054, - "p99": 54182.24200263969, - "max": 56380.503844993655 - }, - "sla": { - "target_p95_ms": 50, - "actual_p95_ms": 54093.905819999054, - "compliance": 0.0, - "met": false - } - }, - "responsive": { - "no_data": true - }, - "batch": { - "no_data": true - } - }, - "prefix_cache_stats": { - "prefix_hits": 114, - "prefix_misses": 435, - "system_prompt_reuse": 114, - "common_phrase_reuse": 0, - "bytes_saved": 95289344 - }, - "autoscaling_stats": [], - "autoscaling_summary": null, - "multi_turn_stats": { - "cache_hits": 63, - "cache_misses": 320, - "hit_rate": 0.16449086161879894 - } - } -} \ No newline at end of file diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_gpu_cpu_trial2.json b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_gpu_cpu_trial2.json deleted file mode 100644 index 3949e0af..00000000 --- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_gpu_cpu_trial2.json +++ /dev/null @@ -1,2885 +0,0 @@ -{ - "requests_completed": 549, - "total_tokens_generated": 146891, - "total_storage_io_latency": 85.381408638641, - "total_generation_latency": 0.0, - "end_to_end_latencies": [ - 0.16409367299638689, - 0.2268519049976021, - 0.22715805999177974, - 0.22750141700089443, - 0.26824789500096813, - 0.28057444900332484, - 0.2868863180046901, - 0.36101435699674767, - 0.36094108200632036, - 0.39299531099095475, - 0.4092978230037261, - 0.5438222009979654, - 0.5507013019960141, - 0.563007014003233, - 0.576643199994578, - 0.5800187580025522, - 0.5858231440070085, - 0.6050003679993097, - 0.6097024839982623, - 0.6098238450067583, - 0.6323676030006027, - 0.6374471720046131, - 0.652128907997394, - 0.6530661569995573, - 0.6530128030135529, - 0.6673160920036025, - 0.6674593030038523, - 0.6673988919937983, - 0.6761061119905207, - 0.6810225969966268, - 0.6888001860061195, - 0.6877812949969666, - 
0.6904179229895817, - 0.6906978040060494, - 0.7022138330066809, - 0.7025707770080771, - 0.7041262330021709, - 0.7054235389950918, - 0.7086738370126113, - 0.7197619520011358, - 0.725515797996195, - 0.732747299989569, - 0.7342058240028564, - 0.733445538993692, - 0.7339703699981328, - 0.7355673629936064, - 0.7348996410000836, - 0.7375822829926619, - 0.8188414330070373, - 0.8310909139981959, - 0.8375968460022705, - 0.8386657000082778, - 0.8447310390038183, - 0.9198605139972642, - 0.9211990979965776, - 0.9209230029955506, - 0.9221596120041795, - 0.9245044359995518, - 0.9301425629964797, - 0.930220680005732, - 0.9339401300094323, - 0.9320329979964299, - 0.9386224189947825, - 0.9397764370078221, - 0.95714794700325, - 0.9593410019879229, - 0.9751088929915568, - 0.975078768999083, - 0.978241845004959, - 0.9802480090002064, - 0.9818787520052865, - 0.9854285420005908, - 0.9857305530022131, - 1.053772472005221, - 1.0612564510083757, - 1.063739267992787, - 1.0814009759924375, - 1.0859417519968702, - 1.0871586180001032, - 1.1625959719967796, - 1.2365197889885167, - 1.2472092870011693, - 1.2508162739977706, - 1.2980631529935636, - 1.4413266769988695, - 1.4838878919981653, - 2.0381255320098717, - 2.0928995550057152, - 2.1548304290045053, - 2.262782262012479, - 2.294057398001314, - 2.5000401370052714, - 2.558471191005083, - 2.749943657996482, - 2.7757685799879255, - 2.7767761839932064, - 2.8750721249962226, - 3.3171676600031788, - 3.5125721450021956, - 3.512411981006153, - 3.513992838008562, - 3.6331490079901414, - 3.7482659139932366, - 3.9602521220076596, - 4.188655218997155, - 4.2991980310034705, - 4.311281771006179, - 4.517781927002943, - 4.601607496995712, - 4.6828367710113525, - 4.775840023008641, - 5.073990464006783, - 5.342311874002917, - 5.398327520000748, - 5.427743298001587, - 5.4927941110072425, - 5.527094292003312, - 5.557486927005812, - 5.5683983380004065, - 5.578847709010006, - 5.698473534997902, - 5.7492549829912605, - 6.1338546039914945, - 6.190849470993271, - 
-    (… several hundred per-request end-to-end latency samples elided …)
-  ],
-  "storage_latencies": [
-    (… per-request storage-tier latency samples elided …)
-  ],
-  "generation_latencies": [
-    (… all 0.0 in this run …)
-  ],
-  "throughput_timeline": [],
-  "prefill_latencies": [
-    (… per-request prefill latency samples elided …)
-  ],
-  "decode_latencies": [
-    (… per-request decode latency samples elided …
0.011022769002011046, - 0.005822022998472676, - 0.000127135994262062, - 0.3839491899998393, - 0.005406333002611063, - 0.005275339994113892, - 0.00513161500566639, - 0.010575143009191379, - 0.005252085989923216, - 0.010381568004959263, - 0.0052739020029548556, - 0.005207074995269068, - 0.010460284000146203, - 0.0051900160033255816, - 0.016627390999929048, - 0.005133653001394123, - 0.005194195997319184, - 0.005135650993906893, - 0.010284528994816355, - 0.00534019700717181, - 0.010367641996708699, - 0.005173591009224765, - 0.010395411009085365, - 0.015667333005694672, - 4.5994995161890984e-05, - 0.0057028550072573125, - 0.005165431997738779, - 0.005189347008126788, - 0.014979460000176914, - 0.010613892998662777, - 0.010435241012601182, - 0.010886119998758659, - 0.005244938001851551, - 0.010491029999684542, - 0.005523169005755335, - 0.021335845987778157, - 0.010367491995566525, - 0.02074069700029213, - 0.011152677005156875, - 0.010642897002981044, - 0.005183664005016908, - 0.0103085610026028, - 0.005220434992224909, - 0.0051321550126885995, - 0.01542256900575012, - 0.005292759989970364, - 0.011576358010643162, - 0.00516863600932993, - 0.010356459009926766, - 0.015477552995434962, - 0.0056995549966814, - 0.005218777994741686, - 0.00996394400135614, - 0.015423097007442266, - 0.010459868994075805, - 0.005173489000299014, - 0.005195724996156059, - 0.01555153499066364, - 0.005637381007545628, - 0.010242699994705617, - 0.005205537992878817, - 0.015401286000269465, - 0.020421508001163602, - 0.010343841990106739, - 0.005333092005457729, - 0.01578371599316597, - 0.0051472479972289875, - 0.005142998998053372, - 0.02775042300345376, - 0.005578463998972438, - 0.01029450399801135, - 0.005148449999978766, - 0.006532282000989653, - 0.005167313996935263, - 0.010171029003686272, - 0.015270158008206636, - 0.005161947992746718, - 0.015153022002778016, - 0.005133547994773835, - 8.653500117361546e-05, - 0.015114038993488066, - 0.010308311000699177, - 0.010313710998161696, - 
0.005230091992416419, - 0.007327923012780957, - 0.005145265007740818, - 0.005249262001598254, - 0.011000151993357576, - 0.00012163599603809416, - 0.005185526009881869, - 0.020610799998394214, - 0.02079285599756986, - 0.010318664004444145, - 0.005123961993376724, - 0.005214004006120376, - 0.025435770992771722, - 0.005228175999945961, - 0.010287714001606219, - 0.015541908011073247, - 0.010359050997067243, - 0.02018551500805188, - 0.005201369000133127, - 0.005493863005540334, - 0.005265150000923313, - 0.010275990993250161, - 0.010376239995821379, - 0.00021791500330436975, - 0.010431757997139357, - 0.01779774299939163, - 0.010611791993142106, - 9.81940102064982e-05, - 0.011553931995877065, - 0.010413465002784505, - 0.010522830998525023, - 0.01564063600380905, - 0.015581052997731604, - 0.005192368000280112, - 0.010467957996297628, - 0.005198982005822472, - 0.005780475010396913, - 0.010619568987749517, - 0.00523457900271751, - 0.01539902199874632, - 0.02043992000108119, - 0.010295815009158105, - 0.010418204998131841, - 0.025478171999566257, - 0.005192272990825586, - 0.011190848992555402, - 0.016080795001471415, - 0.01029172699782066, - 0.010209435000433587, - 0.011288808993413113, - 0.02787582999735605, - 0.005157357009011321, - 0.005243398001766764, - 0.005180330990697257, - 0.005298496995237656, - 0.010423432002426125, - 0.015092418994754553, - 0.011954888992477208, - 0.02049829499446787, - 0.005116693006129935, - 0.005239334001089446, - 0.010550557999522425, - 0.005377206005505286, - 0.008182700999896042, - 0.010346556999138556, - 0.010132647992577404, - 0.005132240010425448, - 0.010622436995618045, - 0.0052696899947477505, - 0.00561524200020358, - 0.011249332994339056, - 0.010310308993211947, - 0.010436409997055307, - 0.005129176992340945, - 0.027341740002157167, - 0.005109416990308091, - 0.010333154001273215, - 0.01551111000298988, - 0.01032106700586155, - 0.025485144986305386, - 0.010441544000059366, - 0.010163825994823128, - 0.025467981002293527, - 
0.015650536995963193, - 0.005187179005588405, - 0.010479623000719585, - 0.01029425700835418, - 0.016513064998434857, - 0.005126998003106564, - 0.010289901998476125, - 0.0051644509949255735, - 0.005105450007249601, - 0.005131162004545331, - 0.005133005994139239, - 0.015293138989363797, - 0.00543455001024995, - 4.0194005123339593e-05, - 0.005218417005380616, - 0.016229152999585494, - 0.011617990996455774, - 0.015555041012703441, - 0.010463262995472178, - 0.005120528003317304, - 0.010247507991152816, - 0.005184726003790274, - 0.005362504001823254, - 0.015428906001034193, - 0.010351995995733887, - 0.005116961008752696, - 0.005172930992557667, - 0.010315747989807278, - 0.005243074003374204, - 0.00518986200040672, - 0.015438036003615707, - 0.005194151992327534, - 0.021990364999510348, - 0.00022063800133764744, - 0.005136384002980776, - 0.011119394999695942, - 0.013021888997172937, - 0.01699829001154285, - 0.010345105998567306, - 0.01036084299266804, - 0.015262328000972047, - 0.015288863010937348, - 0.011390357001801021, - 0.010346471011871472, - 0.00529268299578689, - 0.010937817991361953, - 0.010559556991211139, - 0.00519579098909162, - 0.010380268999142572, - 0.020769514012499712, - 0.01041628498933278, - 0.0155660610034829, - 0.005201064996072091, - 0.01035016700916458, - 0.005200561994570307, - 0.005178172999876551, - 0.011308506000204943, - 0.005201100997510366, - 0.010234197005047463, - 0.010401446998002939, - 0.016666513998643495, - 0.005293494992656633, - 0.005494502998772077, - 0.010283420997438952, - 0.0051854330085916445, - 0.025714079005410895, - 0.005136250998475589, - 0.005224168999120593, - 0.005178117993636988, - 0.015370945999165997, - 0.015410865991725586, - 0.015527821000432596, - 0.0052072550024604425, - 0.00524976899032481, - 0.02041888800158631, - 0.010499034004169516, - 0.005196529004024342, - 0.006915425008628517, - 0.0001596349902683869, - 6.664899410679936e-05, - 0.010371769007178955, - 0.00517054001102224, - 0.021164118006709032, - 
0.01629118199343793, - 0.01543770800344646, - 0.005294978996971622, - 0.015454627005965449, - 0.010914915998000652, - 0.005322943994542584, - 0.005176597012905404, - 0.005232518000411801, - 0.005147440999280661, - 0.015275413999916054, - 0.009767620998900384, - 0.010527774997171946, - 0.005117610999150202, - 0.01041688400437124, - 0.0053306030022213235, - 0.016010474995709956, - 0.010356265003792942, - 0.010218774987151846, - 0.005856692994711921, - 0.010410061993752606, - 0.010454222006956115, - 0.010534859000472352, - 0.03657188000215683, - 0.02089069200155791, - 0.005302453006152064, - 0.010377059006714262, - 0.005363491989555769, - 0.005229732996667735, - 0.005106167009216733, - 0.005208250004216097, - 0.006146101004560478, - 4.678100231103599e-05, - 0.01024683000287041, - 0.005115947002195753, - 0.010166380001464859, - 0.01024151299498044, - 0.010903757996857166, - 0.010411000999738462, - 0.005221038998570293, - 0.005212867996306159, - 0.005441775996587239, - 0.00024404098803643137, - 0.010327784009859897, - 0.010266598997986875, - 0.0018688170093810186, - 0.00571271100488957, - 0.02033518200914841, - 0.01120905700372532, - 0.02065130199480336, - 0.016346891003195196, - 0.0057762930082390085, - 0.005217659010668285, - 0.00024801900144666433, - 0.0057489739992888644, - 0.010401049003121443, - 0.0008439480006927624, - 0.0028103429940529168, - 0.0052678680076496676, - 0.005237006989773363, - 0.01581819399143569, - 0.005623347999062389, - 0.005310119013302028, - 0.0058120550093008205, - 0.003916336005204357, - 0.01105598900176119, - 0.005829488000017591, - 0.0056699550041230395, - 0.008177547002560459, - 0.0012425460008671507, - 0.010195572991506197, - 0.0014362090005306527, - 0.004353216994786635, - 0.00757532799616456, - 0.005813556999783032, - 9.590599802322686e-05, - 0.01050766499247402, - 0.006877207008074038, - 0.006871910998597741, - 0.0068274730001576245, - 0.005144418988493271, - 0.0029890770092606544, - 0.006941360989003442, - 0.001485397995566018, - 
0.0037987699906807393, - 0.0022946079989196733, - 0.00293578598939348, - 0.003239693003706634, - 0.0022776719997636974, - 0.005192968994379044, - 0.005713945007300936, - 0.004622426000423729 - ], - "multi_turn_cache_hits": 75, - "multi_turn_cache_misses": 297, - "seed": 42, - "summary": { - "total_requests": 549, - "total_tokens": 146891, - "elapsed_time": 51.472904920578, - "avg_throughput_tokens_per_sec": 2853.753838580722, - "requests_per_second": 10.665805647594585, - "end_to_end_latency_ms": { - "mean": 24335.68935563794, - "p50": 23836.146176006878, - "p95": 51915.50227839616, - "p99": 52094.94110224419 - }, - "storage_io_latency_ms": { - "mean": 155.52169150936433, - "p50": 104.03796200989746, - "p95": 450.08835620246833, - "p99": 997.6461316557815 - }, - "generation_latency_ms": { - "mean": 0.0, - "p50": 0.0, - "p95": 0.0, - "p99": 0.0 - }, - "cache_stats": { - "cache_hit_rate": 0.9298723404255319, - "cache_hits": 5463, - "cache_misses": 412, - "gpu_entries": 378, - "cpu_entries": 31, - "nvme_entries": 39, - "gpu_memory_used_gb": 6.0128173828125, - "cpu_memory_used_gb": 6.3109130859375, - "offloads_cpu": 70, - "offloads_nvme": 39, - "storage_health": { - "overall_status": "PASS", - "criteria": [ - { - "name": "CPU RAM P95 < 150ms", - "target": 150, - "actual": 15.771770998981083, - "unit": "ms", - "passed": true - }, - { - "name": "Cache Hit Rate > 30%", - "target": 0.3, - "actual": 0.9298723404255319, - "unit": "ratio", - "passed": true - } - ], - "passed_count": 2, - "total_count": 2 - }, - "prefill_writes": 448, - "decode_reads": 5463, - "prefill_bytes_written_gb": 7.5841064453125, - "decode_bytes_read_gb": 94.2763671875, - "system_prompt_hits": 1096, - "common_phrase_hits": 0, - "user_cache_hits": 4292, - "multi_turn_hits": 75, - "total_read_bytes": 101228478464, - "total_write_bytes": 8143372288, - "total_read_gb": 94.2763671875, - "total_write_gb": 7.5841064453125, - "read_write_ratio": 12.430781116708783, - "read_iops": 5463, - "write_iops": 448, - 
"gpu_read_p50_ms": 9.63814499846194, - "gpu_read_p95_ms": 28.541684598894783, - "gpu_read_p99_ms": 101.07552503468466, - "gpu_write_p50_ms": 25.818474001425784, - "gpu_write_p95_ms": 97.7127161531825, - "gpu_write_p99_ms": 195.47467540513023, - "cpu_read_p50_ms": 0.5242705010459758, - "cpu_read_p95_ms": 15.771770998981083, - "cpu_read_p99_ms": 19.75252379852464 - }, - "qos_metrics": { - "interactive": { - "total_requests": 549, - "latency_ms": { - "mean": 24335.68935563794, - "p50": 23836.146176006878, - "p95": 51915.50227839616, - "p99": 52094.94110224419, - "max": 52886.14732699352 - }, - "sla": { - "target_p95_ms": 50, - "actual_p95_ms": 51915.50227839616, - "compliance": 0.0, - "met": false - } - }, - "responsive": { - "no_data": true - }, - "batch": { - "no_data": true - } - }, - "prefix_cache_stats": { - "prefix_hits": 115, - "prefix_misses": 434, - "system_prompt_reuse": 115, - "common_phrase_reuse": 0, - "bytes_saved": 100007936 - }, - "autoscaling_stats": [], - "autoscaling_summary": null, - "multi_turn_stats": { - "cache_hits": 75, - "cache_misses": 297, - "hit_rate": 0.20161290322580644 - } - } -} \ No newline at end of file diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_gpu_cpu_trial3.json b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_gpu_cpu_trial3.json deleted file mode 100644 index 42f3812b..00000000 --- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_gpu_cpu_trial3.json +++ /dev/null @@ -1,2875 +0,0 @@ -{ - "requests_completed": 549, - "total_tokens_generated": 148164, - "total_storage_io_latency": 125.59847482843907, - "total_generation_latency": 0.0, - "end_to_end_latencies": [ - 0.04779652399884071, - 0.07631923900044058, - 0.2410641320020659, - 0.2783738599973731, - 0.2787208170047961, - 0.2856456059962511, - 0.3187843390041962, - 0.32272889000887517, - 0.48283531999913976, - 0.48293097599525936, - 0.4840428640018217, - 
0.49144638200232293, - 0.4924779379944084, - 0.5471271660062484, - 0.5711941680056043, - 0.6118039290013257, - 0.630837283009896, - 0.6427911579958163, - 0.6437614839960588, - 0.645422798988875, - 0.6456346670020139, - 0.6454080090043135, - 0.6663640519982437, - 0.6678441849944647, - 0.6689197199884802, - 0.6693757619941607, - 0.6718228730023839, - 0.6844254280003952, - 0.6974042429937981, - 0.7033777579927118, - 0.709462564002024, - 0.7168199959996855, - 0.7171093110082438, - 0.7173893019935349, - 0.7175622479990125, - 0.7182180410018191, - 0.718222590003279, - 0.7239049359923229, - 0.7250950559973717, - 0.8228534139925614, - 0.8360806480050087, - 0.8362192259955918, - 0.8358107219974045, - 0.8427366219984833, - 0.84259067500534, - 0.8458591680100653, - 0.8466670869966038, - 0.8462653250026051, - 0.8464456479996443, - 0.848315052993712, - 0.8496355819952441, - 0.8554195210017497, - 0.8633701310027391, - 0.863124950992642, - 0.8647789050010033, - 0.8656416379963048, - 0.8710143289936241, - 0.9402146739885211, - 0.9391985380061669, - 0.959621451998828, - 0.9606167530000675, - 0.9665201049938332, - 0.9742302809900139, - 0.9748478350084042, - 0.9927725890011061, - 0.9994661199889379, - 1.0000674510083627, - 1.0061780440009898, - 1.0057639429869596, - 1.0078174679947551, - 1.088058456996805, - 1.0884193740057526, - 1.0960987959988415, - 1.184174559006351, - 1.1906815910042496, - 1.1928696180111729, - 1.2005293550028, - 1.2031591289996868, - 1.2148592790035764, - 1.21633919699525, - 1.2281029110017698, - 1.2305980419914704, - 1.2342925739940256, - 1.2360119890072383, - 1.3356419159972575, - 1.3440305019903462, - 1.3433213979878929, - 1.3501767669949913, - 1.5158470750029664, - 1.5143557999981567, - 1.5225715169945033, - 1.5279040910099866, - 1.793107246994623, - 1.8061305430019274, - 1.8304271759989206, - 1.9092251120018773, - 1.9154226859973278, - 1.9708792769961292, - 1.9774713480001083, - 1.9987307659903308, - 2.1213397639949108, - 2.132910725005786, - 
2.142093129004934, - 2.143508004999603, - 1.9211331119877286, - 2.1432725670019863, - 2.1429932309983997, - 2.14366899500601, - 1.8627717460040003, - 1.971393108004122, - 2.1441007290122798, - 2.1532446939963847, - 2.131701022008201, - 2.158691404009005, - 2.1589313129952643, - 2.166897870003595, - 2.177794527000515, - 2.1831914450012846, - 2.1428294040088076, - 2.208501945991884, - 2.1450620099931257, - 2.2320234080107184, - 2.1530132799962303, - 2.1532528690004256, - 2.259397457994055, - 1.8976849389873678, - 2.260232425003778, - 2.377666030006367, - 2.3993233159999363, - 2.5589968529966427, - 2.5837417900038417, - 2.3394844879949233, - 2.749968461008393, - 2.9000899649981875, - 2.902038807005738, - 2.9016934800019953, - 2.906843589997152, - 2.4145438209961867, - 2.426304173000972, - 2.9522039190051146, - 2.5525272539962316, - 2.5590961769921705, - 2.9819015649991343, - 2.9913310300034937, - 2.991265126009239, - 2.578161265992094, - 2.5942180109996116, - 2.6049892229930265, - 2.9920146009972086, - 2.621006323999609, - 2.993209483989631, - 2.9930317050020676, - 2.9938820190000115, - 2.993669482995756, - 2.90768918399408, - 2.9958661319979, - 2.9960011590010254, - 2.9957552130072145, - 3.0149917230010033, - 3.0314675650006393, - 3.0406984080036636, - 3.197823263006285, - 3.204853101997287, - 2.9955691739887698, - 3.5182536950014764, - 3.578054366997094, - 3.609976192994509, - 3.6094129789998988, - 3.6524171869968995, - 3.6732529519940726, - 3.4730580350005766, - 3.6963989600044442, - 3.7006773250031983, - 3.708700470000622, - 3.7139058750035474, - 3.713981191001949, - 3.714723969998886, - 3.715524325001752, - 3.7414021879958455, - 3.6730592099920614, - 3.772605071993894, - 3.7828822399897035, - 3.807317487007822, - 3.811881478992291, - 3.81992436699511, - 3.835696920999908, - 3.7172026479966007, - 3.728903385999729, - 3.76266614299675, - 4.211910163998255, - 4.240962308991584, - 4.2420518870058, - 4.253118212000118, - 4.258672973999637, - 4.264551359010511, - 
4.269644006009912, - 4.280187157986802, - 4.29044836499088, - 4.311503541001002, - 4.2012491910136305, - 4.343763595999917, - 4.355679501997656, - 4.3728753620089265, - 4.378468366005109, - 3.8484137750056107, - 4.420285790998605, - 4.2581253410025965, - 4.462318199002766, - 4.269349345995579, - 4.301005479996093, - 4.157778612992843, - 4.543123969997396, - 4.567013308987953, - 4.5666224930027965, - 4.5898367399931885, - 4.590219331003027, - 4.605557251998107, - 4.4002108160057105, - 4.985057176003465, - 4.107549010994262, - 5.018395181992673, - 5.039005326994811, - 5.05589196299843, - 4.550964112000656, - 4.322307525013457, - 4.968005668997648, - 5.21001413599879, - 4.440991843992379, - 4.9906135280034505, - 5.211856779002119, - 5.212361092999345, - 5.212181410999619, - 5.214206866003224, - 5.0915511490020435, - 5.108463589000166, - 5.2696076090069255, - 5.43617184899631, - 5.439042605998111, - 4.97352351200243, - 5.460036642005434, - 5.085235546997865, - 5.549454442996648, - 5.554197227000259, - 5.57790275401203, - 5.586103954992723, - 5.596677034991444, - 5.611685372001375, - 5.617038128999411, - 5.622687031005626, - 5.63440165600332, - 5.6396200400049565, - 5.6627866009948775, - 5.705128361005336, - 5.705625764006982, - 5.706108252998092, - 5.706950912994216, - 5.707707481997204, - 5.707930055010365, - 5.7076212340034544, - 5.725758715998381, - 5.732221206999384, - 5.478106440001284, - 5.74937334000424, - 5.719313223002246, - 6.030490478995489, - 6.032573859993136, - 6.23025371399126, - 6.041938426002162, - 6.052035448999959, - 6.264578654998331, - 6.287330997001845, - 6.293133821003721, - 6.3208206109993625, - 6.326219782000408, - 6.336650960001862, - 6.343716030009091, - 6.344480536004994, - 6.371808444993803, - 6.247200540004997, - 6.3992714999913005, - 6.410748060006881, - 6.096462151006563, - 6.434325661000912, - 6.136459397996077, - 6.287425850998261, - 6.456052783993073, - 6.476513293993776, - 6.481774651008891, - 6.3256399870006135, - 6.502865841001039, 
- 6.536603740998544, - 6.556609129009303, - 6.567638681997778, - 6.608331868992536, - 6.439856058001169, - 6.444344378993264, - 6.672315137999249, - 6.694132892997004, - 6.709186643012799, - 6.742246036999859, - 6.337665949991788, - 6.777020636000088, - 6.5570647419954184, - 6.573579660995165, - 7.167572898993967, - 6.601742041995749, - 7.20178823301103, - 7.2196123610046925, - 6.46597410600225, - 7.2515289559960365, - 7.257287922009709, - 7.279913737002062, - 6.78834927699063, - 7.278616002004128, - 7.279738262994215, - 7.2853794309921796, - 6.594339963005041, - 7.329651227992144, - 7.341639771999326, - 7.346849914989434, - 7.357328783007688, - 7.385862053008168, - 7.395998817009968, - 7.396096199998283, - 7.396714346003137, - 7.2785730229952605, - 7.430155469002784, - 7.446267239007284, - 7.451090082002338, - 7.45142728999781, - 7.191986906007514, - 7.330851314996835, - 7.464588903996628, - 7.46511903499777, - 7.465312988992082, - 7.466777352994541, - 7.467321080999682, - 6.83285277801042, - 7.541454293997958, - 7.547948962004739, - 7.5527226730046095, - 7.585858232996543, - 7.451598579995334, - 7.451565762996324, - 7.634827875997871, - 7.478285709003103, - 7.710759447989403, - 7.720631706004497, - 7.780704068005434, - 7.596270672001992, - 7.842651485989336, - 7.8603057069994975, - 7.887835162997362, - 7.903621399003896, - 7.954706441989401, - 7.960792337995372, - 7.984431662000134, - 7.831604492006591, - 7.9860502849915065, - 7.860450613996363, - 8.036837345003732, - 8.037180775994784, - 8.037597543006996, - 8.037666600997909, - 8.038995263006655, - 8.040048321010545, - 8.039500145998318, - 8.039709898002911, - 8.04067303800548, - 8.040095578995533, - 8.041814262003754, - 8.042443298007129, - 8.042417228993145, - 8.043041457000072, - 7.986029341991525, - 8.043940243995166, - 8.04518722499779, - 8.044210193009349, - 8.04552007500024, - 8.04464476800058, - 8.04568192899751, - 8.047690552994027, - 8.047338732998469, - 8.053684477010393, - 8.076117672986584, - 
8.082300504000159, - 8.044366472007823, - 8.045501230008085, - 8.326228093996178, - 8.74850112698914, - 8.77776983899821, - 8.787489471011213, - 8.793019016011385, - 8.211703772001783, - 8.83378386500408, - 8.844716770006926, - 8.845746735998546, - 8.851737906996277, - 8.862527208999381, - 8.889873969004839, - 8.890461579998373, - 8.926070067012915, - 8.931778318001307, - 8.93290782799886, - 8.947091780006303, - 8.947597281003254, - 8.954073990011238, - 8.758931778997066, - 8.891371416000766, - 9.009196099010296, - 9.011117608999484, - 9.017915024000104, - 9.025748207001016, - 9.067088214011164, - 9.069253266003216, - 8.99259781598812, - 9.404682584005059, - 9.435410623991629, - 9.4472048930038, - 9.472231031002593, - 9.473554526004591, - 9.36452936900605, - 9.502747432008618, - 9.50303522500326, - 9.503369334997842, - 9.431065397002385, - 9.505525934000616, - 9.5058062790049, - 9.506081974002882, - 9.506385312997736, - 9.507378122987575, - 9.522942296011024, - 9.041410599005758, - 9.506313645004411, - 9.60259441898961, - 9.656008762001875, - 9.661084785999265, - 9.58400532300584, - 9.693035661999602, - 9.589428658000543, - 9.70806474899291, - 9.714632743998663, - 9.768985419999808, - 9.774138790002326, - 9.7859074720036, - 9.661814792998484, - 9.819899857000564, - 9.824558909996995, - 9.830895820996375, - 9.835802681001951, - 9.842000985998311, - 9.846860773002845, - 9.857242986996425, - 9.891073349004728, - 9.890801583009306, - 9.762808009996661, - 9.923608937999234, - 9.935900915996172, - 9.937403263000306, - 9.936983456995222, - 9.80407484099851, - 9.679233691000263, - 10.110083005012712, - 10.146404336002888, - 10.15209525100363, - 10.179436252001324, - 10.17936428700341, - 9.936654638993787, - 10.051657797012012, - 10.187031758003286, - 10.066222596011357, - 10.07339088000299, - 10.307029470990528, - 10.306902726995759, - 10.30707142400206, - 10.308654622000176, - 9.868792802008102, - 10.307018888997845, - 10.307731081004022, - 10.309690577007132, - 
10.310019338998245, - 10.312821797997458, - 10.316719851995003, - 10.404397264996078, - 10.40494111199223, - 10.157027057997766, - 10.435104142990895, - 10.476877123001032, - 10.483911967996391, - 10.493857699009823, - 11.057212352010538, - 11.058647833997384, - 10.39724400799605, - 11.077377978988807, - 10.310398495013942, - 10.463806797008147, - 10.4634771609999, - 11.140782369009685, - 11.146684892009944, - 10.523139298005844, - 10.332194041999173, - 11.155668973005959, - 11.156053787010023, - 11.051168684003642, - 11.157194835002883, - 11.157777455999167, - 11.158505601997604, - 11.15948023700912, - 11.161236902000383, - 11.158001248011715, - 11.160831695000525, - 11.158612114013522, - 11.159083763006493, - 11.15750113100512, - 11.160426640009973, - 11.160867543003405, - 11.162351913997554, - 11.163573326994083, - 11.164614234992769, - 11.173762884995085, - 11.180827684991527, - 11.181978131004144, - 11.186357844999293, - 11.187148911994882, - 11.193141010997351, - 11.194135023004492, - 11.193841622007312, - 11.194631595993997, - 11.19766836699273, - 11.197260910004843, - 11.197822538990295, - 11.198385670009884, - 11.198043111988227, - 11.20099424799264, - 11.201680815996951, - 11.201903740991838, - 11.205090682997252, - 11.205148230001214, - 11.205775018999702, - 11.206107200006954, - 11.20875989200431, - 11.209459657009575, - 11.208705343000474, - 11.209773640002823, - 11.212554479003302, - 11.211808364008903, - 11.268485032996978, - 11.391486464999616, - 11.576811840990558 - ], - "storage_latencies": [ - 0.020050696999533102, - 0.021733285000664182, - 0.15978080402419437, - 0.1271236580068944, - 0.1375726179976482, - 0.15042792599706445, - 0.02991852501872927, - 0.12941323099948931, - 0.16574597703584004, - 0.14468452100118157, - 0.1822280510532437, - 0.16150681101134978, - 0.12650481896707788, - 0.05900908798503224, - 0.34252655402815435, - 0.23544584796763957, - 0.13414414699946065, - 0.055198405010742135, - 0.18394694999733474, - 0.26658219401724637, - 
0.264605511983973, - 0.1963358579960186, - 0.09576586300681811, - 0.23887173701950815, - 0.27002976200310513, - 0.09815946899470873, - 0.2064623769983882, - 0.12100790301337838, - 0.21437607402913272, - 0.018519812991144136, - 0.05836144999193493, - 0.12485583101806697, - 0.12627438602794427, - 0.1202492119919043, - 0.11570841100183316, - 0.2876957909902558, - 0.12059779195988085, - 0.12619819701649249, - 0.2537868129875278, - 0.048727757020969875, - 0.2252111959969625, - 0.22982128500007093, - 0.12400605800212361, - 0.24231686498387717, - 0.12814205499307718, - 0.2285562469769502, - 0.20095728100568522, - 0.228283777993056, - 0.057321265994687565, - 0.25728929399338085, - 0.372843821067363, - 0.12169392300711479, - 0.24734511702263262, - 0.1485370750160655, - 0.14544134004972875, - 0.25295748698408715, - 0.2362059720180696, - 0.39311193402681965, - 0.25303413803339936, - 0.3407323940045899, - 0.08017971101799048, - 0.03349239299132023, - 0.3231541369896149, - 0.19405997400463093, - 0.10877073500887491, - 0.23145256398129277, - 0.1044766530249035, - 0.1896897820115555, - 0.11621754098450765, - 0.03217521200713236, - 0.44063166799605824, - 0.31726355697901454, - 0.280356853021658, - 0.6476361520035425, - 0.03748000999621581, - 0.18397230400296394, - 0.12369566300185397, - 0.40105667199532036, - 0.11642823199508712, - 0.3457284900068771, - 0.03791225100576412, - 0.5206234460056294, - 0.43838582596799824, - 0.45723312502377667, - 0.2900333680008771, - 0.3890618360310327, - 0.3347849510173546, - 0.5971393560321303, - 0.4381723729893565, - 0.1408876609930303, - 0.8140527790092165, - 0.33749516602256335, - 0.29557450201536994, - 0.7284621239814442, - 0.47147876002418343, - 0.8603732600167859, - 0.8436670239607338, - 0.5643408529867884, - 0.5523944010201376, - 0.4482108079682803, - 0.4919808399863541, - 0.7684570170240477, - 0.7218753789929906, - 0.7593545880081365, - 0.8018790880014421, - 0.7716466459969524, - 0.5845789340091869, - 0.8840758610022021, - 
0.134337384995888, - 0.5300556160073029, - 0.048800690012285486, - 0.7695661230100086, - 0.2763244190282421, - 0.35231422798824497, - 0.5620469529967522, - 0.8185932110209251, - 0.7401420929818414, - 0.6742969999904744, - 0.3254272670019418, - 0.551154291984858, - 0.541158831998473, - 0.7708695609908318, - 0.4355417129700072, - 0.7938952000258723, - 0.3405744289921131, - 0.16401790696545504, - 0.6964243960101157, - 0.7569701470056316, - 0.7779047069780063, - 0.15228411498537753, - 1.092236803015112, - 0.017193668987601995, - 0.4123579760052962, - 0.14310831599868834, - 0.984054405009374, - 0.5809284960123478, - 0.09247593799955212, - 0.10485176800284535, - 0.9159099629760021, - 0.1607672310055932, - 0.7872945530107245, - 0.6042801419971511, - 1.0067639159533428, - 0.2346513320080703, - 0.06870747002540156, - 1.170264169020811, - 0.686936779980897, - 0.2924922080273973, - 0.23604886601970065, - 0.029400035986327566, - 0.3208948109386256, - 0.3230406800284982, - 0.19085510296281427, - 0.023260389993083663, - 0.09586957699502818, - 0.20578568401106168, - 0.1683441779896384, - 0.24683193700911943, - 0.9237970209651394, - 0.6798578059970168, - 0.22764792999078054, - 0.13511840198771097, - 0.24090604302182328, - 0.34883268100384157, - 0.15651599399279803, - 0.6917273259605281, - 0.3125377820106223, - 0.19320579698251095, - 0.5992034390219487, - 0.27529791300185025, - 0.09307126600469928, - 0.30635105798137374, - 0.045151173006161116, - 0.2200160550128203, - 0.8138559800281655, - 0.29758836700057145, - 0.06581207198905759, - 0.04832348501076922, - 0.025835794003796764, - 0.14105874099186622, - 0.11505495703022461, - 0.14434676300152205, - 0.10710587000357918, - 0.2354130030144006, - 0.5133810350089334, - 0.19418971303093713, - 0.6070250509801554, - 0.2622535030095605, - 0.8310857339529321, - 0.8043205429858062, - 0.11788580799475312, - 0.3060065359750297, - 0.0691899049852509, - 0.5641733780066716, - 0.4199194189859554, - 0.5187648580322275, - 0.544506691978313, - 
0.22959608401288278, - 0.5395937020075507, - 0.29025363198888954, - 0.4136698199727107, - 0.5446210030204384, - 0.5118886660347925, - 0.12088365899398923, - 0.10486952899373136, - 0.15559234899410512, - 0.03586347399686929, - 0.005433272992377169, - 0.0720927940128604, - 0.21019229000376072, - 0.07102493901038542, - 0.1266416980070062, - 0.44893954703002237, - 0.06000998003582936, - 0.13318065895873588, - 0.1438673660013592, - 0.0809770560299512, - 0.173857878005947, - 0.057583942951168865, - 0.08304794899595436, - 0.09156035498017445, - 0.1599316150386585, - 0.16864294398692437, - 0.04974397401383612, - 0.03766499298217241, - 0.055897676007589325, - 0.13806095202744473, - 0.3295456710184226, - 0.038902492000488564, - 0.7016135649755597, - 0.16314342198893428, - 0.04650702803337481, - 0.022602002005442046, - 0.06098048599960748, - 0.10247121097927447, - 0.10472972696879879, - 0.1062997529952554, - 0.028879935998702422, - 0.09105048998026177, - 0.048308983023162, - 0.07781801102100872, - 0.09757105400785804, - 0.07059719301469158, - 0.07555907798814587, - 0.08029074003570713, - 0.1893793899944285, - 0.2833428439917043, - 0.33444475699798204, - 0.22536315000616014, - 0.7086262200027704, - 0.042734425005619414, - 0.3741547439713031, - 0.25134192800032906, - 0.0947237870132085, - 0.2679680579894921, - 0.28278265400149394, - 0.04160806101572234, - 0.267530703014927, - 0.10753837201627903, - 0.32767143700039014, - 0.24148643601802178, - 0.07916187099181116, - 0.3774724690010771, - 0.03168613201705739, - 0.04447866600821726, - 0.24518489395268261, - 0.11781199001416098, - 0.08561314397957176, - 0.25423177500488237, - 0.1438102020038059, - 0.14356480898277368, - 0.40973030298482627, - 0.11171000998001546, - 0.47410926801967435, - 0.314330027991673, - 0.07366788599756546, - 0.40238510198832955, - 0.10447641598875634, - 0.08579069099505432, - 0.14417123097518925, - 0.48086130697629414, - 0.07872019399655983, - 0.6871789679862559, - 0.37604748000740074, - 0.40476069299620576, 
- 0.05233773197687697, - 0.2037403760041343, - 0.15392204199451953, - 0.127111015986884, - 0.09454672700667288, - 0.06758882898429874, - 0.06089599302504212, - 0.05588856601389125, - 0.300935931969434, - 0.2238315589784179, - 0.10526467399904504, - 0.42206437498680316, - 0.0848014749790309, - 0.12715733800723683, - 0.13449858402600512, - 0.04480595998757053, - 0.1444035369786434, - 0.5061359250248643, - 0.09329197200713679, - 0.0903190389944939, - 0.11405742101487704, - 0.09219218401995022, - 0.35714662799728103, - 0.07215093000559136, - 0.09215577898430638, - 0.3752239459863631, - 0.5735047509515425, - 0.04991123299987521, - 0.19624204499996267, - 0.1617948460188927, - 0.05019483302021399, - 0.09538255199731793, - 0.022561623001820408, - 0.09486576200288255, - 0.03906164798536338, - 0.016048685007262975, - 0.4114499470015289, - 0.06449257901113015, - 0.03769718800322153, - 0.0612167909857817, - 0.15319365603500046, - 0.115267249988392, - 0.07263659802265465, - 0.03069137199781835, - 0.027214432979235426, - 0.14609838899923488, - 0.13107145501999184, - 0.0518811479996657, - 0.08874014500179328, - 0.08482393703889102, - 0.08029262701165862, - 0.09638042900769506, - 0.121293018994038, - 0.006861903020762838, - 0.04042177402880043, - 0.027824282020446844, - 0.19932309200521559, - 0.10286688798805699, - 0.06532888401125092, - 0.038568437987123616, - 0.05476745200576261, - 0.03476439199585002, - 0.07641804197919555, - 0.018137983002816327, - 0.05200673699437175, - 0.1264401409571292, - 0.11511592000897508, - 0.11397170899726916, - 0.05564595699252095, - 0.0843529810081236, - 0.06182708799315151, - 0.1329680020135129, - 0.0720246350101661, - 0.10284459598187823, - 0.11932955795782618, - 0.1256739790260326, - 0.24130183999659494, - 0.15503952697326895, - 0.12669239100068808, - 0.14038530401012395, - 0.1408305809745798, - 0.0561830189981265, - 0.06104514801700134, - 0.05917080398648977, - 0.03909348200249951, - 0.06864900299115106, - 0.07007887001964264, - 
0.03391280099458527, - 0.2014175210497342, - 0.03348828002344817, - 0.06602594503783621, - 0.08082784998987336, - 0.07724991798750125, - 0.18356797799060587, - 0.11589122800796758, - 0.04927823900652584, - 0.08632780602783896, - 0.12070571600634139, - 0.05236852201051079, - 0.013623752005514689, - 0.06784889100526925, - 0.09671121901192237, - 0.21657790799508803, - 0.08977177200722508, - 0.09543593600392342, - 0.04940340200846549, - 0.007162884998251684, - 0.02364798700727988, - 0.1138879130303394, - 0.02395352500025183, - 0.18727189500350505, - 0.2215952549886424, - 0.012061257992172614, - 0.2084136129997205, - 0.4679360059672035, - 0.011389181992853992, - 0.06358606199501082, - 0.07144519299617968, - 0.5119486910116393, - 0.09481279598549008, - 0.07164935101172887, - 0.01660422998247668, - 0.021727263985667378, - 0.11830485198879614, - 0.08503841800848022, - 0.24152526100806426, - 0.3851143259817036, - 0.014454791002208367, - 0.08368269901257008, - 0.6341798279900104, - 0.029531067004427314, - 0.07948218900128268, - 0.09969779700622894, - 0.5127178419934353, - 0.10404755000490695, - 0.3072130080254283, - 0.07233635601005517, - 0.411756831992534, - 0.1566083929646993, - 0.37033746598172, - 0.04523293201054912, - 0.10454520001076162, - 0.08341840301000047, - 0.5303117890143767, - 0.3614524139848072, - 0.15377432605600916, - 0.06661766700563021, - 0.14059650400304236, - 0.14373443899967242, - 0.4002059409976937, - 0.4838628880097531, - 0.1595194179972168, - 0.15318893296353053, - 1.0211009640188422, - 0.08643424000183586, - 2.2688007447868586e-05, - 0.12578471699089278, - 0.5481215169565985, - 0.18188287399243563, - 0.3701455969567178, - 0.4901342209923314, - 0.19245181998121552, - 0.01877351499570068, - 0.24255388000165112, - 0.10536983897327445, - 0.1668453909951495, - 0.1594165120040998, - 0.09177068297867663, - 0.09336083004018292, - 0.08640619799552951, - 0.38640101098280866, - 0.10509062597702723, - 0.0603302410163451, - 0.7237794190150453, - 
0.16051673000038136, - 0.6544297670188826, - 0.058095741012948565, - 0.1557813100516796, - 0.2354751300154021, - 0.19707682098669466, - 0.04275011000572704, - 0.22990247499546967, - 0.031127235997701064, - 4.348000220488757e-05, - 0.25160603297990747, - 0.25428352202288806, - 0.5904670829913812, - 0.2047260619874578, - 0.019385749998036772, - 0.07633823300420772, - 0.577104514974053, - 0.19561652198899537, - 0.016444414999568835, - 0.23241538897855207, - 0.39277110100374557, - 0.09139817699906416, - 0.1218541840207763, - 0.4905700050294399, - 0.20106742999632843, - 0.38946309496532194, - 0.14177461799408775, - 0.18160325699136592, - 0.13255965999269392, - 0.22134482499677688, - 0.0478240980010014, - 0.02299230601056479, - 0.0920578040095279, - 0.2220535129745258, - 0.12055463797878474, - 0.35978526197141036, - 0.15810338698793203, - 0.0582942409964744, - 0.238321488010115, - 0.021359089005272835, - 0.10814808198483661, - 0.151543176965788, - 0.6452835409872932, - 0.6307224860065617, - 0.15974305004056077, - 0.041166971015627496, - 0.07669584400719032, - 0.6090428390307352, - 0.025861735004582442, - 0.6522407370066503, - 0.7895520639722236, - 0.20245217099727597, - 0.6350400310038822, - 0.08456228999421, - 0.1949512680148473, - 0.054580625001108274, - 0.71644448202278, - 0.03618137800367549, - 0.02985672900103964, - 0.07612455196795054, - 0.6230506620195229, - 0.014688017006847076, - 0.07090603798860684, - 0.04450001199438702, - 0.7491337840619963, - 0.021895182988373563, - 0.09930352900119033, - 0.03628485098306555, - 0.07431411299330648, - 0.08786342697567306, - 0.028565488013555296, - 0.030801853019511327, - 0.17414596902381163, - 0.03102184298040811, - 0.028205180991790257, - 0.06731341002159752, - 0.04774439902394079, - 0.07435579699813388, - 0.034795454994309694, - 0.07815889999619685, - 0.031156747980276123, - 0.10026136197848246, - 0.033456595047027804, - 0.07445830698998179, - 0.05867070399108343, - 0.040301948043634184, - 0.03175030104466714, - 
0.017739833987434395, - 0.061944176006363705, - 0.11318652500631288 - ], - "generation_latencies": [ - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 
0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0 - ], - "throughput_timeline": [], - "prefill_latencies": 
[ - 0.019826055999146774, - 0.027852017999975942, - 0.02779686301073525, - 0.016361040994524956, - 0.018238813994685188, - 0.021448113999213092, - 0.022307469000224955, - 0.10199258099601138, - 0.03465168499678839, - 0.040585359995020553, - 0.018412022007396445, - 0.017487895995145664, - 0.029813801011187024, - 0.0665708889864618, - 0.0075185969908488914, - 0.007709644007263705, - 0.05135608899581712, - 0.007673195999814197, - 0.01153943600365892, - 0.026530929011642, - 0.05317911300517153, - 0.053201882998109795, - 0.04582607900374569, - 0.09621792500547599, - 0.1336445850029122, - 0.1371221260051243, - 0.11415151700202841, - 0.114130654008477, - 0.14636694198998157, - 0.16484001599019393, - 0.1461836569942534, - 0.08539099000336137, - 0.07965016399975866, - 0.13128119500470348, - 0.027200902011827566, - 0.03590254999289755, - 0.017849067007773556, - 0.018670856996322982, - 0.02667208999628201, - 0.014770991998375393, - 0.022693008999340236, - 0.05147982000198681, - 0.06201906900969334, - 0.06543064001016319, - 0.09269453299930319, - 0.0978517479961738, - 0.06299988900718745, - 0.06280319599318318, - 0.10544350299460348, - 0.07096775999525562, - 0.07565636599611025, - 0.07131656000274234, - 0.07510223900317214, - 0.07225774499238469, - 0.07133567999699153, - 0.08092363500327338, - 0.1238157810003031, - 0.046816571993986145, - 0.0777300759946229, - 0.08159148300183006, - 0.019523490002029575, - 0.08411196299130097, - 0.08846868699765764, - 0.08842568199906964, - 0.07911433400295209, - 0.07285860099364072, - 0.012574925000080839, - 0.019814249011687934, - 0.03069561001029797, - 0.022368293008184992, - 0.010601371992379427, - 0.017845288006355986, - 0.012861969997175038, - 0.02037569800450001, - 0.020748238996020518, - 0.03254173000459559, - 0.032493186998181045, - 0.019565381997381337, - 0.021231721999356523, - 0.013723007010412402, - 0.10351269699458499, - 0.11212204999173991, - 0.1182457160030026, - 0.12177814400638454, - 0.11874222900951281, - 
0.02328347299771849, - 0.024100479000480846, - 0.12159714999143034, - 0.005591031993390061, - 0.013336349991732277, - 0.03310289600631222, - 0.021720453005400486, - 0.029199478012742475, - 0.09025753600872122, - 0.07689386300626211, - 0.08958742699178401, - 0.07716400400386192, - 0.07756584699382074, - 0.10417475100257434, - 0.10984600000665523, - 0.1039429719967302, - 0.09705185701022856, - 0.10220621500047855, - 0.11051160198985599, - 0.10496609099209309, - 0.022428121999837458, - 0.10545178300526459, - 0.01607320700713899, - 0.017971597000723705, - 0.04886230001284275, - 0.01395125400449615, - 0.01649939699564129, - 0.03513406500860583, - 0.1096968689962523, - 0.08016209398920182, - 0.011081333999754861, - 0.09013527899514884, - 0.09004531799291726, - 0.15928981099568773, - 0.1600859180034604, - 0.09593625499110203, - 0.09210545000678394, - 0.09533251699758694, - 0.02043487600167282, - 0.022755839003366418, - 0.03544193699781317, - 0.020030205996590666, - 0.02746203100832645, - 0.022861564997583628, - 0.1205632479977794, - 0.2835234579979442, - 0.28577948499878403, - 0.4560973379993811, - 0.5725248850067146, - 0.46986062699579634, - 0.5735407220054185, - 0.3034440170013113, - 0.4877503869938664, - 0.013242727989563718, - 0.022569469001609832, - 0.3271208199876128, - 0.025171911998768337, - 0.04647462999855634, - 0.3163080380036263, - 0.0497076160099823, - 0.024094234002404846, - 0.055710115004330873, - 0.012008742996840738, - 0.0, - 0.016510386994923465, - 0.031908851000480354, - 0.03212637700198684, - 0.03759553999407217, - 0.0, - 0.0, - 0.03296172298723832, - 0.032980183998006396, - 0.030191707992344163, - 0.02869919899967499, - 0.04138768899429124, - 0.05102458300825674, - 0.01683685599709861, - 0.022095195003203116, - 0.022901721007656306, - 0.029930081000202335, - 0.04715258500073105, - 0.011660363001283258, - 0.023366302004433237, - 0.029410315997665748, - 0.025336798003991134, - 0.04838351999933366, - 0.036274711004807614, - 0.13320726399251726, - 
0.3003441360051511, - 0.021718860996770673, - 0.010736661992268637, - 0.1745119010010967, - 0.0360948909947183, - 0.013299033991643228, - 0.032820730993989855, - 0.012672868004301563, - 0.00674187000549864, - 0.0, - 0.020037592999869958, - 0.05041933599568438, - 0.021848974996828474, - 0.024610934997326694, - 0.02825212500465568, - 0.023832512000808492, - 0.026074110995978117, - 0.03236680199916009, - 0.03653748099168297, - 0.03487642599793617, - 0.026818340993486345, - 0.002182431999244727, - 0.18010663099994417, - 0.16404049399716314, - 0.02030643699981738, - 0.038125693987240084, - 0.03253890300402418, - 0.03572925001208205, - 0.029980564009747468, - 0.029774309994536452, - 0.040726300998358056, - 0.045879962999606505, - 0.02803587500238791, - 0.17077910198713653, - 0.0, - 0.0, - 0.03185859299264848, - 0.01576854100858327, - 0.011654130998067558, - 0.029201876008301042, - 0.0424134380009491, - 0.02102438399742823, - 0.018581848999019712, - 0.01318127999547869, - 0.027332800993463024, - 0.030534473989973776, - 0.0, - 0.028742484995746054, - 0.0, - 0.0, - 0.02464194099593442, - 0.030958265997469425, - 0.014482717000646517, - 0.060711968995747156, - 0.02959837400703691, - 0.016660590990795754, - 0.07322373999340925, - 0.01259773199853953, - 0.006359845996485092, - 0.01709352199395653, - 0.2631493049993878, - 0.03682330599986017, - 0.023164380007074215, - 0.0, - 0.012463197999750264, - 0.03719645999080967, - 0.006283209004322998, - 0.03980003300239332, - 0.012085608002962545, - 0.017341605998808518, - 0.034356345000560395, - 0.024289193999720737, - 0.03189929999643937, - 0.03009775999817066, - 0.011959071998717263, - 0.038590890006162226, - 0.05255648099409882, - 0.011607627006014809, - 0.020944015006534755, - 0.02884224300214555, - 0.03328636700462084, - 0.0, - 0.01568319600482937, - 0.0, - 0.017177219997392967, - 0.03268747300899122, - 0.0, - 0.018567612001788802, - 0.0, - 0.028620206998311915, - 0.02803319699887652, - 0.029278658999828622, - 0.03333855699747801, 
- 0.023002824003924616, - 0.0, - 0.0, - 0.019042995001655072, - 0.027645095993648283, - 0.033465323998825625, - 0.03334675899532158, - 0.0, - 0.0, - 0.023896476996014826, - 0.027970579991233535, - 0.026313719994504936, - 0.06670723500428721, - 0.0601639160013292, - 0.02378557100018952, - 0.017408503001206554, - 0.04974228599166963, - 0.2151693769992562, - 0.04604816800565459, - 0.035275456990348175, - 0.016603545998805203, - 0.0, - 0.0, - 0.040272422003909014, - 0.0, - 0.031723428997793235, - 0.0, - 0.0, - 0.024926928002969362, - 0.024320425000041723, - 0.0, - 0.0, - 0.0, - 0.023539116999018006, - 0.0, - 0.3090878660004819, - 0.02630422000947874, - 0.026457442989340052, - 0.03801343399391044, - 0.02224743900296744, - 0.011566424000193365, - 0.016449609989649616, - 0.0, - 0.017120964010246098, - 0.012148221998359077, - 0.0, - 0.043990125006530434, - 0.02294511199579574, - 0.017434825989766978, - 0.023064473003614694, - 0.03438280300179031, - 0.016464348998852074, - 0.01718763199460227, - 0.03316209600598086, - 0.021609393996186554, - 0.0, - 0.037184607004746795, - 0.010808816005010158, - 0.03298833000008017, - 0.03061326600436587, - 0.0, - 0.025599174987291917, - 0.025700743004563265, - 0.0, - 0.01062663400080055, - 0.03280399400682654, - 0.0, - 0.03723132399318274, - 0.0, - 0.034132608998334035, - 0.02203300900873728, - 0.0, - 0.033536885006469674, - 0.04459710099035874, - 0.02702066898928024, - 0.02132342298864387, - 0.0, - 0.01738859601027798, - 0.0, - 0.0, - 0.021642299005179666, - 0.0, - 0.0, - 0.04432913599885069, - 0.021833268998307176, - 0.01636370999040082, - 0.0, - 0.020948172998032533, - 0.021731380998971872, - 0.027535310000530444, - 0.03725287700945046, - 0.017897432000609115, - 0.0, - 0.0, - 0.011089534003986046, - 0.0, - 0.027174155999091454, - 0.006212152002262883, - 0.016475663011078723, - 0.03312340099364519, - 0.03551711300679017, - 0.03414263999729883, - 0.045971192987053655, - 0.01729938200151082, - 0.010429813002701849, - 0.04258955399564002, - 
0.0, - 0.026826478002476506, - 0.022848893000627868, - 0.022018875999492593, - 0.0, - 0.0, - 0.0019586899870773777, - 0.011115243003587238, - 0.022335917004966177, - 0.02300374599872157, - 0.0, - 0.01826846500625834, - 0.031231232002028264, - 0.027617860003374517, - 0.02689111200743355, - 0.0, - 0.03994193900143728, - 0.011644254991551861, - 0.012131474999478087, - 0.011790286996983923, - 0.022585482001886703, - 0.010675379002350383, - 0.0, - 0.006614224999793805, - 0.03974855500564445, - 0.022215960998437367, - 0.0, - 0.0007258749974425882, - 0.0, - 0.0, - 0.01656217299751006, - 0.0015338679950218648, - 0.0006584660004591569, - 0.0, - 0.0, - 0.0009107410005526617, - 0.016343120005331002, - 0.016828541993163526, - 0.0, - 0.0, - 0.0, - 0.013233732999651693, - 0.0, - 0.013480143999913707, - 0.0, - 0.17304461800085846, - 0.03327683800307568, - 0.01926002600521315, - 0.0, - 0.01141683199966792, - 0.022803507003118284, - 0.0, - 0.0, - 0.0, - 0.41242778800369706, - 0.42351943798712455, - 0.027458106997073628, - 0.0, - 0.0290191449894337, - 0.012210090004373342, - 0.0, - 0.0014445289998548105, - 0.0, - 0.0, - 0.0, - 0.03618368301249575, - 0.034780132002197206, - 0.0, - 0.027259081005468033, - 0.0, - 0.013743012998020276, - 0.0, - 0.02698085000156425, - 0.03331507899565622, - 0.0, - 0.01089726599457208, - 0.0388919089891715, - 0.0, - 0.0, - 0.0, - 0.03968535199237522, - 0.0, - 0.0, - 0.0, - 0.29096354699868243, - 0.3013197370019043, - 0.02345525100827217, - 0.03296987600333523, - 0.05045944399898872, - 0.0, - 0.03122741800325457, - 0.0, - 0.0, - 0.0, - 0.00760507800441701, - 0.01826141800847836, - 0.0, - 0.02720857399981469, - 0.02985970300505869, - 0.0, - 0.04743215499911457, - 0.03254160900542047, - 0.01853829200263135, - 0.0, - 0.020366941011161543, - 0.0, - 0.02468270799727179, - 0.0, - 0.023522878997027874, - 0.010699966995161958, - 0.028197250998346135, - 0.02584858400223311, - 0.02960577000339981, - 0.023171825989265926, - 0.016287302001728676, - 0.0, - 0.0, - 0.0, 
- 0.0, - 0.010980170001857914, - 0.0, - 0.0, - 0.03357581498858053, - 0.0, - 0.028673843000433408, - 0.0, - 0.0007853930001147091, - 0.0, - 0.0, - 0.03701407300832216, - 0.0, - 0.03776326098886784, - 0.0, - 0.0, - 0.127742191994912, - 0.03364070400130004, - 0.034660570992855355, - 0.0, - 0.002061794002656825, - 0.002517579006962478, - 0.005388317993492819, - 0.0, - 0.023770959000103176, - 0.0, - 0.0, - 0.024868609994882718, - 0.01559311500750482, - 0.0, - 0.026804681998328306, - 0.012793943009455688, - 0.018638472000020556, - 0.006681497005047277, - 0.0, - 0.020398313994519413, - 0.03890837199287489, - 0.046652714008814655, - 0.0, - 0.021477344998857006, - 0.0, - 0.031598396992194466, - 0.01374467600544449, - 0.03706888800661545, - 0.0, - 0.013521190005121753, - 0.0017698459996609017, - 0.0, - 0.0, - 0.0, - 0.012062149005942047, - 0.003342288007843308, - 0.0, - 0.0, - 0.0, - 0.004753261004225351, - 0.013200743997003883, - 0.010785855003632605, - 0.019723650999367237 - ], - "decode_latencies": [ - 0.011833675001980737, - 0.005254264993709512, - 0.010674788994947448, - 0.006254306994378567, - 0.005534740004804917, - 0.0007713740051258355, - 0.0017590070056030527, - 0.010586452001007274, - 0.005308282998157665, - 0.004258077999111265, - 0.0007851510017644614, - 0.004880974011030048, - 0.05417419000877999, - 0.002844592003384605, - 0.028555088996654376, - 0.005515475000720471, - 0.006170729990117252, - 0.007296634998056106, - 0.010647613991750404, - 0.012970419993507676, - 0.011265344001003541, - 0.02634453499922529, - 0.013352716996450908, - 0.013887958994018845, - 0.005324354991898872, - 0.008290256999316625, - 0.050696546997642145, - 0.04117652001150418, - 0.01509106598678045, - 0.01420635700924322, - 4.6558998292312026e-05, - 0.008698837002157234, - 0.014341059999424033, - 0.0063663480104878545, - 0.011236189006012864, - 0.004783922995557077, - 0.01862020000407938, - 0.016791084010037594, - 0.01782416200148873, - 0.0035887150006601587, - 0.02610318799270317, - 
0.01871132200176362, - 0.018646603013621643, - 0.008191931992769241, - 0.019230705991503783, - 0.003293508998467587, - 0.004813994004507549, - 0.010237580994726159, - 0.0031618100038031116, - 0.014114685007371008, - 0.005942784002400003, - 0.003261599995312281, - 0.018442721993778832, - 0.005834212992340326, - 0.023339821011177264, - 0.023445220998837613, - 0.021701517995097674, - 0.010328669988666661, - 0.020453135002753697, - 0.01220791599189397, - 0.01324185800331179, - 0.023334100987995043, - 0.016994083998724818, - 0.0128542269958416, - 0.014231253997422755, - 0.016194205993087962, - 0.00279134999436792, - 0.0030757429922232404, - 0.028180487992358394, - 0.01598549500340596, - 0.11981815099716187, - 0.06302504800260067, - 0.0029767270025331527, - 0.00038333600969053805, - 0.08841144699545112, - 0.00815625800169073, - 0.018247774001793005, - 0.0030047439940972254, - 0.08361140299530234, - 0.006113968993304297, - 0.06973799900151789, - 0.01872882699535694, - 0.045058797011733986, - 0.020935067994287238, - 0.009470407996559516, - 0.01949649100424722, - 0.035933883002144285, - 0.004372697992948815, - 0.008499691990436986, - 0.01016081300622318, - 0.019673300004797056, - 0.011687564008752815, - 0.016194566997000948, - 0.009311198999057524, - 0.024805865992675535, - 0.006720938006765209, - 0.011443137991591357, - 0.006244432996027172, - 0.0035775810101768, - 0.01464513799874112, - 0.011527288996148854, - 0.020216288001392968, - 0.10105975400074385, - 0.18548798000847455, - 0.012076635990524665, - 0.015222236994304694, - 0.012308351011597551, - 0.026244648994179443, - 0.004019378000521101, - 0.021214480002527125, - 0.10151329499785788, - 0.013990007995744236, - 0.007504622000851668, - 0.08896663998893928, - 0.010744090002845041, - 0.09565543800999876, - 0.012522603006800637, - 0.0010002219933085144, - 0.012805127989850007, - 0.2767643270053668, - 0.008293218998005614, - 0.0037193790049059317, - 0.012204739003209397, - 0.003034113993635401, - 0.03408466500695795, - 
0.006659666993073188, - 0.013781920992187224, - 0.018775667995214462, - 0.01703784200071823, - 0.027033542006392963, - 0.033450668008299544, - 0.0060206420021131635, - 0.013455715990858153, - 0.013571810006396845, - 0.02077725100389216, - 0.024342655000509694, - 0.016736669989768416, - 0.02945764000469353, - 0.005377708002924919, - 0.023947545007104054, - 0.022891827000421472, - 0.022496753008454107, - 0.035807809996185824, - 0.015505607007071376, - 0.02281282498734072, - 0.020502676998148672, - 0.005708555996534415, - 0.013557528000092134, - 0.029836859001079574, - 0.006248523990507238, - 0.005724528004066087, - 0.010930028991424479, - 0.01579203699657228, - 0.03181536099873483, - 0.010361748005379923, - 0.13951503399584908, - 0.00541745699592866, - 0.016196957003558055, - 0.0008728450047783554, - 0.02500500500900671, - 0.16599380600382574, - 0.010741391000919975, - 0.021006102993851528, - 0.011839006008813158, - 0.01773655200668145, - 0.010913755002547987, - 0.011736971995560452, - 0.011383382996427827, - 0.0245790519984439, - 0.016125580004882067, - 0.021806771997944452, - 0.017842262008343823, - 0.019895435994840227, - 0.015564100001938641, - 0.016399259999161586, - 0.00590507299057208, - 0.007209350005723536, - 0.005553602008149028, - 0.010977742000250146, - 0.005468192000989802, - 0.005800472994451411, - 0.015305722001357935, - 0.005861269004526548, - 0.005186308000702411, - 0.01600745199539233, - 0.14384761500696186, - 0.01095821600756608, - 0.018373286991845816, - 0.0053672750073019415, - 0.026079418996232562, - 0.010694939992390573, - 0.02710900901001878, - 0.02247051300946623, - 0.020660066002164967, - 0.031739723999635316, - 0.0052009819919476286, - 0.012363571004243568, - 0.005419681008788757, - 0.0008751220011617988, - 0.005397100991103798, - 0.0055162470089271665, - 0.011435386986704543, - 0.021490588012966327, - 0.005646766003337689, - 0.010477566989720799, - 0.021080257996800356, - 0.013736744003836066, - 0.01860566600225866, - 0.02080270298756659, 
- 0.021225367992883548, - 0.01629482800490223, - 0.010306162002962083, - 0.006053814999177121, - 0.005437542000436224, - 0.00020568500622175634, - 0.0059231549967080355, - 0.005527201996301301, - 0.02210314100375399, - 9.402100113220513e-05, - 0.019113126007141545, - 0.012277929010451771, - 0.007547740999143571, - 0.01727692800341174, - 4.745599289890379e-05, - 0.010722980994614772, - 0.010577837005257607, - 0.010759373995824717, - 0.0003334110078867525, - 0.020886799989966676, - 0.010384862005594186, - 0.017026112996973097, - 0.005215801997110248, - 0.01596756500657648, - 0.010188721003942192, - 0.010405309003544971, - 0.31354407699836884, - 0.011312779999570921, - 0.010836986999493092, - 0.00621300601051189, - 2.960900019388646e-05, - 0.017076115997042507, - 0.01605398699757643, - 0.006796688991016708, - 0.011363202997017652, - 0.01667610599542968, - 0.006025994996889494, - 0.01134764900780283, - 0.007221519001177512, - 0.005297363008139655, - 0.02175169599649962, - 0.005397610002546571, - 0.17251902200223412, - 0.005125907002366148, - 0.005449268006486818, - 0.005868239997653291, - 0.02690078699379228, - 0.02244632999645546, - 0.012222379999002442, - 0.00914166501024738, - 0.01670635699701961, - 0.01754087499284651, - 0.02257346999249421, - 0.02260637799918186, - 0.01636309600144159, - 0.0007676259992877021, - 0.010477344010723755, - 0.010639289001119323, - 0.01633212700835429, - 0.00033833399356808513, - 0.021719721000408754, - 0.012169256006018259, - 0.02245615399442613, - 0.005750087002525106, - 0.02400746000057552, - 0.024443928006803617, - 0.016873085987754166, - 0.007300796001800336, - 0.010954890007269569, - 0.2926139780029189, - 0.005511123992619105, - 0.011066674996982329, - 0.016098334002890624, - 0.017170880004414357, - 0.026578099001199007, - 0.01572799400310032, - 0.018580204996396787, - 0.017588385002454743, - 0.013760371002717875, - 0.021350921990233473, - 0.016595252993283793, - 0.02237810600490775, - 0.017453706997912377, - 0.005155507999006659, 
- 0.021858835010789335, - 0.01204559500911273, - 0.0228106150025269, - 0.016350714009604417, - 0.017095207003876567, - 0.02825463199405931, - 0.012228118008351885, - 0.005361792995245196, - 0.0057574759994167835, - 0.016502113998285495, - 0.011294583004200831, - 0.005835208008647896, - 0.016003010008716956, - 0.00016765198961365968, - 0.005552565009566024, - 0.01527889100543689, - 0.011717864996171556, - 0.015959644006215967, - 0.00910905199998524, - 0.011051874011172913, - 0.011106162011856213, - 0.023313126992434263, - 0.0053827660012757406, - 0.011759610002627596, - 0.018029156999546103, - 0.02058703498914838, - 0.005471272001159377, - 9.765599679667503e-05, - 0.0055711769964545965, - 0.0002531830104999244, - 0.011395244000595994, - 0.00547685899073258, - 0.005338065995601937, - 0.005614006004179828, - 0.00521122598729562, - 0.010500699994736351, - 0.005455760998302139, - 0.005524944004719146, - 0.005688107994501479, - 0.005472846998600289, - 0.010476330993697047, - 0.000957575990469195, - 0.0220457039977191, - 0.005989426004816778, - 0.0057865130074787885, - 0.005342371994629502, - 0.011101053009042516, - 0.010794995003379881, - 0.005266149993985891, - 0.022276286006672308, - 0.00025855300191324204, - 0.01140323501022067, - 0.011669254003209062, - 0.005360126990126446, - 0.006202076008776203, - 0.00633241499599535, - 0.01066918799187988, - 0.0055461629963247105, - 0.015664049991755746, - 0.028246315006981604, - 0.012736734992358834, - 0.01167695299955085, - 0.007415757994749583, - 0.011420052003813908, - 0.0004595099890138954, - 0.005533225004910491, - 0.02152704900072422, - 0.021158472998649813, - 0.011526047994266264, - 0.005382311006542295, - 0.006520249007735401, - 0.015865289999055676, - 0.016068337004981004, - 0.011074424997786991, - 0.010736191994510591, - 0.005477600003359839, - 0.006026123999617994, - 0.02154669800074771, - 0.01618821799638681, - 0.005243845997028984, - 6.79299992043525e-05, - 0.01201025300542824, - 0.006013691003317945, - 
0.011445190000813454, - 0.005412215992691927, - 0.017084027000237256, - 0.010465917002875358, - 0.01634443500370253, - 0.02200575699680485, - 0.0108177400106797, - 0.005145492992596701, - 0.005414194005425088, - 0.011384472003555857, - 0.00012320598762016743, - 0.011593916002311744, - 0.00018104999617207795, - 0.016026805998990312, - 0.005227702000411227, - 8.283900388050824e-05, - 0.013089254003716633, - 0.007415338011924177, - 8.526499732397497e-05, - 0.1339016139972955, - 0.012214175003464334, - 0.0032344100036425516, - 0.0059488599945325404, - 0.0066017350036418065, - 0.017593494994798675, - 0.4015440410003066, - 0.00561322299472522, - 0.005346893012756482, - 0.00046589699923060834, - 0.016241596997133456, - 0.005251925002085045, - 0.011197656000149436, - 0.017532313999254256, - 0.01274839100369718, - 0.017071567999664694, - 0.002550885998061858, - 0.0015936889976728708, - 0.010613113001454622, - 0.005235459000687115, - 0.011965825004153885, - 0.0002695980074349791, - 0.0017043139960151166, - 0.40665603800152894, - 0.016041892988141626, - 0.03428194799926132, - 0.010973249009111896, - 0.006195402005687356, - 0.036456329995417036, - 0.017695762988296337, - 0.011834248987725005, - 0.0059126450069015846, - 0.01718144500046037, - 0.03309945898945443, - 0.0054164869943633676, - 0.007363330994849093, - 0.01540243299677968, - 0.01820282099652104, - 0.01700239699857775, - 0.017385124010615982, - 0.01206520400592126, - 0.01548167600412853, - 0.00701970599766355, - 0.3124428209994221, - 0.016333211009623483, - 0.010191188004682772, - 0.005896162998396903, - 0.005409791003330611, - 0.00955836899811402, - 0.01597545099502895, - 0.0170461880043149, - 0.020940678004990332, - 0.02134813100565225, - 0.011050543005694635, - 0.005146979994606227, - 0.0052585439989343286, - 0.010512024993658997, - 0.005526419001398608, - 0.021326700007193722, - 0.01304790900030639, - 0.03673810200416483, - 0.01974278899433557, - 0.023326402995735407, - 0.011860689002787694, - 
0.006504600998596288, - 0.017536564002512023, - 0.01646848500240594, - 0.023301948996959254, - 0.006788550992496312, - 0.01235742200515233, - 0.2799967120081419, - 0.013183629998820834, - 0.012293896012124605, - 0.026656008994905278, - 0.02557804700336419, - 0.005651532002957538, - 0.0311661329906201, - 0.011480966990347952, - 0.020540817000437528, - 0.02615915700152982, - 0.0193375740054762, - 0.11512507799488958, - 0.005810888003907166, - 0.019067899003857747, - 0.0062929260020609945, - 0.018551767003373243, - 0.00011553500371519476, - 0.007719632994849235, - 0.011158085995703004, - 0.011524030996952206, - 0.011299139994662255, - 0.01153983100084588, - 0.022895624992088415, - 0.015790328005095944, - 0.006665512002655305, - 0.010329052995075472, - 0.006647995993262157, - 0.0007448520045727491, - 0.006469445987022482, - 0.008079853010713123, - 0.11893723999673966, - 0.03219378700305242, - 0.0014805320097366348, - 0.001318922993959859, - 0.006032488003256731, - 0.0007746069895802066, - 0.006281684007262811, - 7.929500134196132e-05, - 0.0034831929951906204, - 0.001310042993281968, - 0.006470114996773191, - 0.11743651299912017, - 0.01244171499274671, - 0.01285001500218641, - 0.019029963004868478, - 0.01550737199431751, - 0.012276401990675367, - 0.12025580198678654, - 0.010120239996467717, - 0.01886677999573294, - 0.006806395002058707, - 0.0059180730022490025, - 0.007214242999907583, - 0.00677113700658083, - 0.009485785994911566, - 0.007828464003978297, - 0.012199552002130076, - 0.007344885991187766, - 0.006213378990651108, - 0.00490893900860101, - 0.010839542010216974, - 0.004026125010568649, - 0.0029944429988972843, - 0.007064082994475029, - 0.0033181460021296516, - 0.0040591319993836805, - 0.002983134996611625, - 0.005124917995999567, - 0.003809027999523096, - 0.009752468002261594, - 0.004026476002763957, - 0.003517938996083103, - 0.00225745199713856, - 0.0014774250012123957, - 0.003815499003394507, - 0.007367348007392138, - 0.004245930002070963, - 
0.0018586879887152463, - 0.0015946360072121024, - 0.003855736998957582, - 0.00920465899980627, - 0.004470361993298866, - 0.005519703001482412, - 0.008042160989134572 - ], - "multi_turn_cache_hits": 80, - "multi_turn_cache_misses": 314, - "seed": 42, - "summary": { - "total_requests": 549, - "total_tokens": 148164, - "elapsed_time": 10.61742639541626, - "avg_throughput_tokens_per_sec": 13954.794173469867, - "requests_per_second": 51.707445811634116, - "end_to_end_latency_ms": { - "mean": 5920.393522644944, - "p50": 6287.425850998261, - "p95": 11181.517952599097, - "p99": 11209.622928166064 - }, - "storage_io_latency_ms": { - "mean": 228.77682118112764, - "p50": 140.8876609930303, - "p95": 735.4701053816827, - "p99": 920.0112331303534 - }, - "generation_latency_ms": { - "mean": 0.0, - "p50": 0.0, - "p95": 0.0, - "p99": 0.0 - }, - "cache_stats": { - "cache_hit_rate": 0.9276824756138581, - "cache_hits": 5516, - "cache_misses": 430, - "gpu_entries": 370, - "cpu_entries": 63, - "nvme_entries": 0, - "gpu_memory_used_gb": 6.37841796875, - "cpu_memory_used_gb": 2.0386962890625, - "offloads_cpu": 63, - "offloads_nvme": 0, - "storage_health": { - "overall_status": "PASS", - "criteria": [ - { - "name": "Cache Hit Rate > 30%", - "target": 0.3, - "actual": 0.9276824756138581, - "unit": "ratio", - "passed": true - } - ], - "passed_count": 1, - "total_count": 1 - }, - "prefill_writes": 433, - "decode_reads": 5516, - "prefill_bytes_written_gb": 7.3505859375, - "decode_bytes_read_gb": 93.3670654296875, - "system_prompt_hits": 1088, - "common_phrase_hits": 0, - "user_cache_hits": 4348, - "multi_turn_hits": 80, - "total_read_bytes": 100252123136, - "total_write_bytes": 7892631552, - "total_read_gb": 93.3670654296875, - "total_write_gb": 7.3505859375, - "read_write_ratio": 12.701989504450644, - "read_iops": 5516, - "write_iops": 433, - "gpu_read_p50_ms": 10.779938500490971, - "gpu_read_p95_ms": 83.31552124946029, - "gpu_read_p99_ms": 278.1162148938165, - "gpu_write_p50_ms": 
28.197250998346135, - "gpu_write_p95_ms": 167.21565038897086, - "gpu_write_p99_ms": 445.67240999545925 - }, - "qos_metrics": { - "interactive": { - "total_requests": 549, - "latency_ms": { - "mean": 5920.393522644944, - "p50": 6287.425850998261, - "p95": 11181.517952599097, - "p99": 11209.622928166064, - "max": 11576.811840990558 - }, - "sla": { - "target_p95_ms": 50, - "actual_p95_ms": 11181.517952599097, - "compliance": 0.0018214936247723523, - "met": false - } - }, - "responsive": { - "no_data": true - }, - "batch": { - "no_data": true - } - }, - "prefix_cache_stats": { - "prefix_hits": 116, - "prefix_misses": 433, - "system_prompt_reuse": 116, - "common_phrase_reuse": 0, - "bytes_saved": 102891520 - }, - "autoscaling_stats": [], - "autoscaling_summary": null, - "multi_turn_stats": { - "cache_hits": 80, - "cache_misses": 314, - "hit_rate": 0.20304568527918782 - } - } -} \ No newline at end of file diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_gpu_only_trial1.json b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_gpu_only_trial1.json deleted file mode 100644 index 197eef16..00000000 --- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_gpu_only_trial1.json +++ /dev/null @@ -1,2875 +0,0 @@ -{ - "requests_completed": 549, - "total_tokens_generated": 146900, - "total_storage_io_latency": 86.83119057111617, - "total_generation_latency": 0.0, - "end_to_end_latencies": [ - 0.15483045499422587, - 0.16587525300565176, - 0.1850858010002412, - 0.2271619740058668, - 0.2751319179951679, - 0.36607083899434656, - 0.4235986470011994, - 0.5006419209967135, - 0.5259312109992607, - 0.5358778099907795, - 0.5362829310033703, - 0.548115569006768, - 0.5483334449963877, - 0.558806403001654, - 0.5586658579995856, - 0.5590393140009837, - 0.5607536110037472, - 0.5610622039966984, - 0.5882386029988993, - 0.5951188489998458, - 0.5976580999995349, - 0.5973744119983166, - 
0.604536675993586, - 0.6060825150052551, - 0.6121647929976461, - 0.8708840269973734, - 0.8918950639927061, - 0.8923729749949416, - 0.9283270599989919, - 1.20942717899743, - 1.5870242130040424, - 1.6282930839952314, - 1.7637443849962438, - 1.7649334669986274, - 1.7798523959936574, - 1.84510852600215, - 1.9812828940048348, - 2.0595532770093996, - 2.334537985996576, - 2.3401638709910912, - 2.3812486910028383, - 2.433436741004698, - 2.512438085002941, - 2.5433983289985918, - 2.5597305270057404, - 2.893368240009295, - 3.001108821001253, - 3.0172364429890877, - 3.045000522994087, - 3.163195191998966, - 3.6252905590081355, - 3.6718929460039362, - 3.708441850001691, - 3.717989935990772, - 3.891085454990389, - 3.9439171359990723, - 3.9923027260083472, - 4.021721830999013, - 4.05701362300897, - 4.869106651007314, - 4.986078629997792, - 5.129202636991977, - 5.148421669000527, - 5.6475797000020975, - 5.668002709993743, - 5.677634334002505, - 5.8488464999973075, - 5.960949549000361, - 6.039651994011365, - 6.148375004995614, - 6.298543286000495, - 6.379410277004354, - 6.3967374689964345, - 6.42273592800484, - 6.501100401990698, - 6.498966257990105, - 6.929079683002783, - 7.110149006999563, - 7.185550593989319, - 7.268884863005951, - 7.343207129990333, - 7.529181960009737, - 7.629323230008595, - 7.7745205179962795, - 7.925848048995249, - 7.939021280995803, - 7.940836452005897, - 8.043259765996481, - 8.047534081997583, - 8.583551711999462, - 8.7752437320014, - 8.998583811000572, - 9.04686761800258, - 9.06884607100801, - 9.594473303994164, - 9.755000794000807, - 9.75748776299588, - 9.75570324101136, - 9.761911794994376, - 9.852064835009514, - 9.929349330996047, - 10.042592594996677, - 10.103652199002681, - 10.168441667003208, - 10.356843773988658, - 10.37810688499303, - 10.377797527995426, - 10.414645835000556, - 10.45154193599592, - 10.518650558995432, - 11.283769738991396, - 11.448456736994558, - 11.585264061999624, - 11.595297356994706, - 11.611868075007806, - 
11.647104875009973, - 11.797374408008181, - 11.80921336299798, - 11.92296440199425, - 11.977808169001946, - 12.217878619994735, - 12.255814335003379, - 12.270243979000952, - 12.308445317990845, - 12.664306613994995, - 12.675583670003107, - 12.681983086004038, - 12.681707501003984, - 12.691205879993504, - 12.693420815994614, - 12.708473227001377, - 12.736510204995284, - 12.782573866992607, - 13.163646607994451, - 14.034236595995026, - 14.092583785008173, - 14.111728521005716, - 14.321212584996829, - 14.36754366500827, - 14.446505408996018, - 14.476907803997165, - 14.545269402995473, - 14.575292455003364, - 14.74551448499551, - 15.313768232997973, - 15.314484273010748, - 15.39625698600139, - 15.424074694004958, - 15.477317858007154, - 15.62740658200346, - 15.776511543008382, - 15.78714823401242, - 15.809117392011103, - 15.858358501005569, - 15.99807997699827, - 16.153519829997094, - 16.301865531000658, - 16.32222051899589, - 16.399934325003414, - 16.420860775004257, - 17.48683058199822, - 17.57113534900418, - 17.58827332900546, - 17.59469876199728, - 17.63090953999199, - 17.6417527290032, - 17.66199187899474, - 17.756226700003026, - 17.813653394012363, - 17.824529117002385, - 17.88607962599781, - 18.274171198994736, - 18.296881547008525, - 18.49510628900316, - 18.613264724001056, - 18.93944465300592, - 18.974477938987548, - 19.025182338999002, - 19.081015567993745, - 19.241915417995187, - 19.46043436299078, - 19.65026594400115, - 19.674242359003983, - 19.6796455779986, - 19.678999857991585, - 19.808906281003146, - 19.95887238100113, - 19.984409023003536, - 20.019886662994395, - 20.143735883990303, - 21.844147623996832, - 21.842549048989895, - 21.85401717000059, - 22.011218355008168, - 22.062560595994, - 22.077103837989853, - 22.092447465998703, - 22.148460219003027, - 22.417267219003406, - 22.488012308007455, - 22.499635356012732, - 22.64731481600029, - 22.71139192400733, - 22.824504735996015, - 22.881647390007856, - 23.06434329399781, - 23.34581126901321, - 
23.346180364009342, - 23.454824454005575, - 23.512694941004156, - 23.52783172999625, - 23.531755962001625, - 23.652923068992095, - 23.66251644199656, - 23.918677698995452, - 23.943806729002972, - 24.06138410200947, - 24.15686862298753, - 24.18567460900522, - 24.293566510998062, - 24.29942317500536, - 24.360196353998617, - 24.385062871995615, - 24.432299559004605, - 24.468362177998642, - 24.56857404000766, - 24.588886177996756, - 24.72997373600083, - 24.827226890993188, - 24.937751121004112, - 26.519771407998633, - 26.61211499500496, - 26.89640981098637, - 26.917614156991476, - 26.928067920001922, - 27.04360214198823, - 27.189752118007164, - 27.20667497300019, - 27.243559316004394, - 27.367693815001985, - 27.387096386999474, - 27.67893113500031, - 27.72059230899322, - 27.752217262008344, - 27.789138318999903, - 28.24949606199516, - 28.28477089200169, - 28.34627297699626, - 28.438806813996052, - 28.82379375700839, - 28.872465332009597, - 28.873163512995234, - 28.981867991999025, - 29.101100289990427, - 29.112352923999424, - 29.17065412500233, - 29.307726787999854, - 29.428912129005766, - 29.443467018994852, - 29.507196442995337, - 29.577889772001072, - 29.589623275009217, - 29.59343802901276, - 29.663235043000896, - 29.745666363000055, - 29.791342764001456, - 29.846681764000095, - 29.89980234700488, - 29.940609583994956, - 30.0173311940016, - 30.051888312998926, - 30.240872424008558, - 30.256088169000577, - 30.37969747900206, - 30.46131026200601, - 30.490956861001905, - 30.615366828991682, - 30.73866292400635, - 30.78840554501221, - 30.81939123000484, - 31.06542278599227, - 31.071674457998597, - 31.102744300005725, - 31.176089986998704, - 31.178078545999597, - 32.90869780999492, - 32.985808911995264, - 33.15221654000925, - 33.265811846998986, - 33.41745990500203, - 33.416551801987225, - 33.55472443900362, - 33.606948394997744, - 33.80889862299955, - 33.947414410999045, - 34.04116943599365, - 34.206690148988855, - 34.24319780500082, - 34.25861893399269, - 
34.327594786009286, - 34.340910693004844, - 34.46082023800409, - 34.46395222999854, - 34.54732987200259, - 34.58970632799901, - 34.72166590500274, - 34.93577060000098, - 34.954436177999014, - 34.976176866010064, - 35.028385792000336, - 35.05412252199312, - 35.0595595860068, - 35.12530538300052, - 35.132873725000536, - 35.336412956006825, - 35.391604067001026, - 35.50005157099804, - 35.60170086899598, - 35.645071758990525, - 35.65293119699345, - 35.70261628799199, - 35.70315432398638, - 35.73043003799103, - 35.8578720420046, - 35.894203132003895, - 36.11389754900301, - 36.12371183899813, - 36.29939939500764, - 36.76753588300198, - 36.855613178006024, - 36.87606248099473, - 36.906375556995044, - 36.91822027400485, - 37.15848458900291, - 37.21675694799342, - 37.30656319799891, - 37.47227179299807, - 37.50415035900369, - 37.50490297799115, - 37.587995234993286, - 37.651739206005004, - 37.67135045200121, - 37.80450263300736, - 37.81586651901307, - 37.88236571299785, - 37.94530625999323, - 38.071746635003365, - 38.13742041699879, - 38.17444680900371, - 38.18435599299846, - 38.23775333700178, - 38.4672449100035, - 38.625090911998996, - 40.77460437300033, - 40.79979367599299, - 40.81530152300547, - 40.88531351799611, - 40.95818999399489, - 41.01922828699753, - 41.04064551000192, - 41.13898309299839, - 41.1893395389925, - 41.347604228998534, - 41.600594577001175, - 41.63694926099561, - 41.64346667598875, - 41.65855835500406, - 41.813655523001216, - 41.86503656599962, - 41.97279673498997, - 41.98746707600367, - 42.14336465600354, - 42.259126553006354, - 42.271552211997914, - 42.291794764998485, - 42.48703418399964, - 42.600169450000976, - 42.64161184099794, - 42.657959643998765, - 42.67814596700191, - 42.811872531994595, - 43.004216443005134, - 43.06535626699042, - 43.13849368400406, - 43.16019830100413, - 43.33258248500351, - 43.49373366999498, - 43.49882234999677, - 43.50956287600275, - 43.591815986001166, - 43.66824324300978, - 43.94837972900132, - 44.009262959996704, - 
44.02413719399192, - 44.12800487301138, - 44.165371273003984, - 44.2571880530013, - 44.32438953399833, - 44.51684838499932, - 44.563853075000225, - 44.5642249550001, - 44.598754411999835, - 44.61030359000142, - 44.65407342700928, - 44.98172451999562, - 45.09928259899607, - 45.09890397799609, - 45.16654062901216, - 45.27496580200386, - 45.36323993399856, - 45.44215057400288, - 45.60610286399606, - 45.737285467999754, - 45.859820019992185, - 45.94249024899909, - 45.97642627100868, - 45.98843587200099, - 46.030601260004914, - 46.095694884992554, - 46.10597819699615, - 46.12556802200561, - 46.17777652401128, - 46.187658604001626, - 46.2613419219997, - 46.26127090799855, - 46.32694388699019, - 46.3640472219995, - 46.47343536300468, - 46.513688122999156, - 46.64356668799883, - 47.18193648400484, - 47.20302444900153, - 47.576690981994034, - 47.58723014499992, - 47.6405153980013, - 47.64670880700578, - 47.67969484500645, - 47.68968549699639, - 47.73172761799651, - 47.76340329399682, - 50.52001914799621, - 50.546093919998384, - 50.608869193994906, - 50.68163712299429, - 50.69312788300158, - 50.72911543698865, - 50.73393053399923, - 50.78799798700493, - 50.87754252800369, - 50.9337454320048, - 50.970065089000855, - 50.971068639002624, - 51.049701098003425, - 51.05390468199039, - 51.17691026900138, - 51.21926692400302, - 51.326840948997415, - 51.58265494200168, - 51.72666898800526, - 51.757558129000245, - 51.83743516499817, - 51.91457762400387, - 52.13844050199259, - 52.391527722997125, - 52.42762587199104, - 52.47120484699553, - 52.497740120001254, - 52.53481958899647, - 52.551821160988766, - 52.810395256004995, - 52.96870873500302, - 53.035515745999874, - 53.08079463400645, - 53.177976883001975, - 53.24716933601303, - 53.37697902400396, - 53.37604228600685, - 53.500284441994154, - 53.64497267699335, - 53.69357212500472, - 53.69715340201219, - 53.818510708995746, - 53.891455592005514, - 53.893229604989756, - 54.037964982999256, - 54.11167423700681, - 54.28581047800253, - 
54.30166550799913, - 54.49298605500371, - 54.51361901201017, - 54.52714465399913, - 54.635249844010104, - 54.76913007599069, - 54.806232394999824, - 54.81674763499177, - 54.98772721900605, - 55.1056888759922, - 55.1108828879951, - 55.12255400500726, - 55.17490837899095, - 55.18322994900518, - 55.1866508230014, - 55.1904753389972, - 55.19176896099816, - 55.19046750299458, - 55.190749608998885, - 55.191215380007634, - 55.262853199004894, - 55.344543752013124, - 55.34954682100215, - 55.34876052199979, - 55.36503841099329, - 55.385349046002375, - 55.38699851599813, - 55.387021351998555, - 55.482339727997896, - 55.482890464001684, - 55.49439955600246, - 55.49472825600242, - 55.50274075400375, - 55.51427258600597, - 55.517659042991, - 55.518276073999004, - 55.52799337399483, - 55.53343488200335, - 55.53966940900136, - 55.54625800899521, - 55.54742505699687, - 55.54676350799855, - 55.553631916001905, - 55.55389231001027, - 55.56034794099105, - 55.560063176002586, - 55.56008498099982, - 55.573427333001746, - 55.57481326001289, - 55.57678003401088, - 55.57711423499859, - 55.598441433001426, - 55.60047167100129, - 55.60555294099322, - 55.60601384700567, - 55.60934505199839, - 55.609611149993725, - 55.61012584200944, - 55.61198814600357, - 55.611397301996476, - 55.61304084598669, - 55.61308303300757 - ], - "storage_latencies": [ - 0.1050109520292608, - 0.10892188201250974, - 0.08658670500153676, - 0.049786048009991646, - 0.17964924800617155, - 0.0392763049894711, - 0.130300537974108, - 0.10466290698968805, - 0.19143409199023154, - 0.26702041800308507, - 0.1672220250038663, - 0.13082015901454724, - 0.08504087799519766, - 0.2508131540234899, - 0.2146731650136644, - 0.13321028200152796, - 0.16637095701298676, - 0.27517651399830356, - 0.3078482299897587, - 0.2685931200248888, - 0.29593589299474843, - 0.03404579700145405, - 0.11260353900433984, - 0.3860341280378634, - 0.03408629300247412, - 0.16994892297952902, - 0.2289955629967153, - 0.2853773599927081, - 0.24885500097298063, - 
0.047892993010464124, - 0.10783463099505752, - 0.13807504101714585, - 0.07123660801153164, - 0.4005779529979918, - 0.027715877004084177, - 0.28944687999319285, - 0.17902723599399906, - 0.18815688097674865, - 0.046928299008868635, - 0.08521173098415602, - 0.09158707801543642, - 0.21279334697464947, - 0.07842234497366007, - 0.09418772599019576, - 0.3149940349685494, - 0.07771225500619039, - 0.05698770900198724, - 0.12731603598513175, - 0.20640824100701138, - 0.16325303399935365, - 0.1075158619787544, - 0.10078001601505093, - 0.11184848702396266, - 0.015571870986605063, - 0.02436772499640938, - 0.11235526896780357, - 0.6147813630232122, - 0.13999842999328393, - 0.05981113099551294, - 0.30721250698843505, - 0.10849770200729836, - 0.02561949400114827, - 0.5362228720041458, - 0.3710420840216102, - 0.07292539799527731, - 0.04157945400220342, - 0.33106390402826946, - 0.04214676302217413, - 0.07310446299379691, - 0.28005209397815634, - 0.09722394701384474, - 0.02036576000682544, - 0.08406128898786847, - 0.17295732400089037, - 0.2946303599892417, - 0.0472471749963006, - 0.39900226397730876, - 0.4889787229622016, - 0.12034564100031275, - 0.06364866900548805, - 0.31405435399210546, - 0.06875235799816437, - 0.3675636509869946, - 0.20040791199426167, - 0.7027991140057566, - 0.031165328007773496, - 0.14278012099384796, - 0.0836162359919399, - 0.04180259800341446, - 0.1445182260213187, - 0.5848482010187581, - 0.07430933599243872, - 0.023489957020501606, - 0.10773543200048152, - 0.5939898480282864, - 0.05001322798489127, - 0.2508414930343861, - 0.06343767199723516, - 0.04117172901169397, - 0.09840574799454771, - 0.3138470809790306, - 0.19476887900964357, - 0.12317953699675854, - 0.06689361200551502, - 0.05029341299086809, - 0.120276151021244, - 0.06289036302769091, - 0.11004106196924113, - 0.09002314004465006, - 0.07231237100495491, - 0.05234942599781789, - 0.1112254519975977, - 0.05353506898973137, - 0.044437409000238404, - 0.13620126398745924, - 0.05199986297520809, - 
0.04452094501175452, - 0.1177761169965379, - 0.13877953197516035, - 0.08339572401018813, - 0.036184825003147125, - 0.2549662330420688, - 0.15833164402283728, - 0.1766066140116891, - 0.03162054199492559, - 0.42711852697539143, - 0.2319778289529495, - 0.14196817899937741, - 0.12210410299303476, - 0.19760608895740006, - 0.6103456309938338, - 0.23163812700659037, - 0.16155581899511162, - 0.07928886798617896, - 0.03155944599711802, - 0.16871001302206423, - 0.06347455097420607, - 0.08534807000251021, - 0.0672539740044158, - 0.08360470896877814, - 0.03612631199939642, - 0.06827540500671603, - 0.04075163100787904, - 0.11634570198657457, - 0.07199382498220075, - 0.05098387501493562, - 0.026136340005905367, - 0.128335100991535, - 0.528728759047226, - 0.09431323701574001, - 0.0731729399994947, - 0.8745073659520131, - 0.4861551709618652, - 0.04811294298269786, - 0.1194486659951508, - 0.2290984200371895, - 0.09250674101349432, - 0.04190280000329949, - 0.1146211840241449, - 0.16834528897015844, - 0.09966375399380922, - 0.6244919370219577, - 0.06658976300968789, - 0.11329830496106297, - 0.11296097403101157, - 0.0690991729934467, - 0.08171291700273287, - 0.13050639000721276, - 0.1102793059690157, - 0.07756023501860909, - 0.955581803995301, - 0.20358237401524093, - 0.10715699002321344, - 0.06820182701630984, - 0.08890528097981587, - 0.09862327098380774, - 0.2513827520306222, - 0.10246502101654187, - 0.06753546200343408, - 0.03223927700310014, - 0.054995707017951645, - 0.18745048496930394, - 0.12048047001007944, - 0.09013920403958764, - 0.06851693900534883, - 0.041129472010652535, - 0.09905014297692105, - 0.1298167310160352, - 0.05822634999640286, - 0.11887558300804812, - 0.3241561770264525, - 0.031132299001910724, - 0.10057641098683234, - 0.16447286294715013, - 0.1486746109876549, - 0.06343879202904645, - 0.12412361604219768, - 0.045958036003867164, - 0.08828750402608421, - 0.1076218860107474, - 0.2126462689921027, - 0.05726703000254929, - 0.09717050298058894, - 
0.08372406304988544, - 0.11990519598475657, - 0.05260332599573303, - 0.16433822498947848, - 0.17967121601395775, - 0.005139070999575779, - 0.09502633602824062, - 0.07371137797599658, - 0.250825934970635, - 0.10146442195400596, - 0.11499924102099612, - 0.22395362495444715, - 0.06732155999634415, - 0.07526474300539121, - 0.2837063680199208, - 0.043028308005887084, - 0.09977628498745617, - 0.2034169519902207, - 0.09205828998528887, - 0.04685418101144023, - 0.10421607701573521, - 0.13550367001153063, - 0.1057319310202729, - 0.05297790898475796, - 0.11991747599677183, - 0.04649123499984853, - 0.06915139399643522, - 0.06455239102069754, - 0.09858684802020434, - 0.0429530139954295, - 0.10912969699711539, - 0.08337003001361154, - 0.056937553992611356, - 0.19374130900541786, - 0.08209015600732528, - 1.4981543779722415, - 0.21077892201719806, - 0.07742863499152008, - 0.0847037550265668, - 0.11846981800044887, - 0.09478120299172588, - 0.0836280070070643, - 0.24684336499194615, - 0.11513201698835474, - 0.11329116197885014, - 0.04415694200724829, - 0.041745128997717984, - 0.05191647898755036, - 0.0878963109862525, - 0.1259605980158085, - 0.042862054993747734, - 0.2801825619826559, - 1.5719985410105437, - 0.09429721602646168, - 0.1417913279874483, - 0.02614096899924334, - 0.08384910901077092, - 0.11677467603294645, - 0.15392379504919518, - 0.08287295098125469, - 0.26023821903800126, - 0.08430593201774172, - 0.05183741298969835, - 0.026048273997730576, - 0.08890853199409321, - 0.04197354301868472, - 0.006443239995860495, - 0.228229647007538, - 0.14516319404356182, - 0.11455049300275277, - 0.11467522202292457, - 0.047459737994358875, - 0.04167458198207896, - 0.05845415001385845, - 0.13034783900366165, - 0.025806467994698323, - 0.010328502001357265, - 0.11322916600329336, - 0.11153107395512052, - 0.0675362970068818, - 0.06295592001697514, - 0.1953879629727453, - 1.8511275030177785, - 0.10050952498568222, - 0.1051617919729324, - 0.08004400099162012, - 0.1527274259715341, - 
0.21238012397952843, - 0.14713094902981538, - 0.18066362802346703, - 0.0837972150038695, - 4.4840999180451035e-05, - 0.11928994602931198, - 0.068054066010518, - 0.07585677498718724, - 0.02577741301502101, - 0.17420451604994014, - 0.025505319994408637, - 0.03086170800088439, - 0.20656839801813476, - 0.21300715797406156, - 0.05685805900429841, - 0.03630238898040261, - 1.7647268930013524, - 0.086096939019626, - 0.16458244700334035, - 0.08907630495377816, - 0.1084124500193866, - 0.07232707799994387, - 0.031019254005514085, - 0.11434089799877256, - 0.19024803396314383, - 0.11364396498538554, - 0.07817987997259479, - 0.083630074019311, - 0.04490617901319638, - 0.21958280394028407, - 0.10410369701276068, - 0.09516098903259262, - 0.08212527399882674, - 0.037282097997376695, - 0.08507314001326449, - 0.09869114201865159, - 0.10505926802579779, - 0.1011077830044087, - 0.14010672896984033, - 0.041479396997601725, - 0.06420349203108344, - 0.026388811980723403, - 0.178013494994957, - 0.10526140200090595, - 0.04231251199962571, - 0.15657861600629985, - 0.0428176160203293, - 0.1148258920002263, - 0.09391768500790931, - 0.03125284401176032, - 0.15415410599962343, - 0.04659763700328767, - 0.08009042299818248, - 0.052169176982715726, - 0.0888166490040021, - 0.08768920598959085, - 0.10458471902529709, - 0.06232010101666674, - 0.12619163501949515, - 0.08086497700423934, - 0.11366018702392466, - 0.16251269896747544, - 0.10044655398814939, - 0.11812318502052221, - 0.0841533939819783, - 0.09403351297078189, - 2.2176338620192837, - 0.0841909430018859, - 0.10909325600368902, - 0.057779392969678156, - 0.005175753001822159, - 0.17362375099037308, - 0.20749314602289815, - 0.06988738600921351, - 0.04149047000100836, - 2.1532560710184043, - 0.06715826099389233, - 0.10321546901832335, - 0.0629815080028493, - 0.06806446600239724, - 0.047445780990528874, - 0.046228652019635774, - 0.00020067500008735806, - 0.09973901098419446, - 0.02196003100834787, - 0.05797206601710059, - 0.06151069201587234, - 
0.031063751986948773, - 0.1121764730050927, - 0.04713524800899904, - 0.10909077302494552, - 0.10194658502587117, - 0.08115790599549655, - 0.08301201698486693, - 0.09986492399184499, - 0.20936324997455813, - 0.10246749897487462, - 0.04295884899329394, - 0.01643586299906019, - 0.2161410919507034, - 0.07843151302949991, - 0.1454136629909044, - 0.15264996596670244, - 0.08395936099987011, - 0.04149156702624168, - 0.1966373619652586, - 0.03783969800861087, - 0.06759749799675774, - 0.07223516498925164, - 0.031064245020388626, - 0.14447452200693078, - 0.1698271809873404, - 0.07018132998200599, - 0.07409286199253984, - 0.08317954499216285, - 0.08615109599486459, - 0.005256545991869643, - 0.08859343599760905, - 0.1375365769927157, - 0.14609489901340567, - 0.09965940902475268, - 0.07959669800766278, - 0.1982217790064169, - 0.0879175380396191, - 0.18671930399432313, - 0.08652540300681721, - 0.15012506004131865, - 0.20385389201692306, - 0.03668379597365856, - 0.10352901399892289, - 0.026265672000590712, - 0.16639235298498534, - 0.08571264600323047, - 0.08733288099756464, - 0.05150278698420152, - 0.021747139006038196, - 0.04699312901357189, - 0.11207521200412884, - 0.06781838601455092, - 0.14633615605998784, - 0.09415362699655816, - 0.08254478500748519, - 0.07077800799743272, - 0.04797475201485213, - 0.02676260197767988, - 0.08469413398415782, - 0.12454624197562225, - 0.14654860901646316, - 0.057091881026281044, - 0.026218710001558065, - 0.06888687000900973, - 0.07347109798865858, - 0.07376086400472559, - 0.031208104992401786, - 2.6391631289734505, - 0.13196287296887022, - 0.07887382399349008, - 0.07672229499439709, - 0.14000125497113913, - 0.026704379997681826, - 0.06812343501951545, - 0.09926007004105486, - 0.2596155490464298, - 0.09712782700080425, - 0.06699270599347074, - 0.09124741998675745, - 0.027268421996268444, - 0.09474053698068019, - 0.07297656999435276, - 0.031387664974317886, - 0.0866190409869887, - 0.09301467399927787, - 0.1721175550192129, - 0.16733249102253467, - 
0.17681665200507268, - 0.10063055898353923, - 0.09604685902013443, - 0.09386566600005608, - 0.15972468101244885, - 0.05194990699237678, - 0.25097857098444365, - 0.051557982005761005, - 0.11561315397557337, - 0.04626919297152199, - 0.04688852197432425, - 0.12372286101162899, - 0.12887336198764388, - 0.005162076005944982, - 0.11522615497233346, - 0.10152549299527891, - 0.026713374987593852, - 0.06862083102168981, - 0.07469946899800561, - 0.18500869600393344, - 0.05819234601221979, - 0.07329397500143386, - 0.11868575504922774, - 0.11132117299712263, - 0.10794190196611453, - 0.07297451399790589, - 0.2280649309977889, - 0.10962389199994504, - 0.12794691797171254, - 0.09456271103408653, - 0.12481540597218554, - 0.020496013996307738, - 0.05296873900806531, - 0.08774404499854427, - 0.08616671299387235, - 0.15195504698203877, - 2.7472682140069082, - 0.04039308898791205, - 0.061700649006525055, - 0.08104005100904033, - 0.00017465300334151834, - 0.010919426989858039, - 0.010437368997372687, - 0.17674605998035986, - 0.041687656004796736, - 0.19676482894283254, - 0.1462664550053887, - 0.04501922498457134, - 0.168160978006199, - 0.23595082201063633, - 0.0760933360143099, - 0.1730763170053251, - 0.2830068839684827, - 0.34418472897959873, - 0.14026678800291847, - 0.04970377199060749, - 0.0580445860105101, - 0.2848812399606686, - 0.34460977801063564, - 0.38548319501569495, - 0.30844129604520276, - 0.08176134302630089, - 0.2009981390001485, - 0.36996142998395953, - 0.24984367299475707, - 0.24146817400469445, - 0.2906182389706373, - 0.21561053601908498, - 0.11978498595999554, - 0.2501274520182051, - 0.168340916003217, - 0.2683457579987589, - 0.2946185099717695, - 0.19748158397851512, - 0.3382068709906889, - 0.4414806540007703, - 0.2691646489984123, - 0.42407844100671355, - 0.3103845440055011, - 0.25334589795966167, - 0.28553533906233497, - 0.2819395959813846, - 0.3282630069879815, - 0.16268278703500982, - 0.3453921979817096 - ], - "generation_latencies": [ - 0.0, - 0.0, - 0.0, - 0.0, 
- 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 
0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0 - ], - "throughput_timeline": [], - "prefill_latencies": [ - 0.007513898002798669, - 0.028443991002859548, - 0.023544443989521824, - 0.046746733991312794, - 0.03699453000444919, - 
0.04718275800405536, - 0.043217013007961214, - 0.02620460699836258, - 0.019821673005935736, - 0.06716354200034402, - 0.08499718300299719, - 0.03290869599732105, - 0.023943841995787807, - 0.08356355899013579, - 0.08333027899789158, - 0.04186335600388702, - 0.04097870500118006, - 0.08965240100224037, - 0.09029463399201632, - 0.037735701000201516, - 0.049098231000243686, - 0.03040795400738716, - 0.05034330999478698, - 0.05207525799050927, - 0.03350997599773109, - 0.016106993003631942, - 0.1476945379981771, - 0.1008964110078523, - 0.09792527298850473, - 0.0903355560003547, - 0.09596244599379133, - 0.06875603900698479, - 0.132204666006146, - 0.1384347100101877, - 0.14407180198759306, - 0.056016299000475556, - 0.04752974500297569, - 0.0800154000025941, - 0.05583147599827498, - 0.15719744500529487, - 0.15051282499916852, - 0.054623944000923075, - 0.11858451100124512, - 0.08494065499689896, - 0.024520914012100548, - 0.10268634399108123, - 0.03079348200117238, - 0.024466096991091035, - 0.024593005000497214, - 0.023059246988850646, - 0.024761453998507932, - 0.011850113995024003, - 0.015747368001029827, - 0.014759640005649999, - 0.026133512001251802, - 0.030496191000565886, - 0.02038839399756398, - 0.04496894800104201, - 0.02822242199908942, - 0.02809136400173884, - 0.026953855995088816, - 0.028694459004327655, - 0.018123018002370372, - 0.0356416970025748, - 0.04430774999491405, - 0.0202353679924272, - 0.038354385003913194, - 0.02986045999568887, - 0.011194628998055123, - 0.008836291002808139, - 0.02250538900261745, - 0.017151213993201964, - 0.0104604819935048, - 0.015839100000448525, - 0.01586853200569749, - 0.03241497899580281, - 0.01677674199163448, - 0.04416462600056548, - 0.17784897700767033, - 0.017997912989812903, - 0.015302690997486934, - 0.02100834299926646, - 0.016579388990066946, - 0.050934633996803313, - 0.01582614699145779, - 0.23438244199496694, - 0.015676205002819188, - 0.03636036900570616, - 0.015955157999997027, - 0.015814728001714684, - 0.03619465199881233, 
- 0.020876876995316707, - 0.01610162299766671, - 0.021480823997990228, - 0.015997375012375414, - 0.016612330000498332, - 0.026312002999475226, - 0.02630947600118816, - 0.015365617000497878, - 0.10759462899295613, - 0.01581843799795024, - 0.09224504999292549, - 0.035929635996581055, - 0.2703548980061896, - 0.02113135099352803, - 0.04196811199653894, - 0.02032160600356292, - 0.020703064001281746, - 0.02180488199519459, - 0.03710306400898844, - 0.015377805000753142, - 0.021019641993916593, - 0.020507833993178792, - 0.02625264399102889, - 0.02621642401209101, - 0.021489912003744394, - 0.010946258000331, - 0.025972841001930647, - 0.034454498993000016, - 0.025936038000509143, - 0.01721073999942746, - 0.02565127599518746, - 0.020818257005885243, - 0.02258703399274964, - 0.029411658993922174, - 0.010797274007927626, - 0.015672800989705138, - 0.032665475009707734, - 0.015983221994247288, - 0.027863594994414598, - 0.01587004298926331, - 0.011145587006467395, - 0.3281757899967488, - 0.03128811799979303, - 0.036635059004765935, - 0.026191886005108245, - 0.020999394997488707, - 0.01621729500766378, - 0.015518294996581972, - 0.028389628991135396, - 0.03147032200649846, - 0.015827145994990133, - 0.02115677199617494, - 0.03120518900686875, - 0.02655446900462266, - 0.02647857200645376, - 0.015841190004721284, - 0.02659988999948837, - 0.031230426000547595, - 0.0, - 0.010441261998494156, - 0.021058455007732846, - 0.0, - 0.10563314199680462, - 0.031074293990968727, - 0.09491407099994831, - 0.025738559997989796, - 0.015567932001431473, - 0.031336938991444185, - 0.016368886004784144, - 0.021205415003350936, - 0.015499059998546727, - 0.021561532994383015, - 0.041757529994356446, - 0.020610791994840838, - 0.02558681700611487, - 0.016969958000117913, - 0.0, - 0.022345402991049923, - 0.031168678993708454, - 0.036091228990699165, - 0.013531782999052666, - 0.020491481001954526, - 0.025763839003047906, - 0.03561276900290977, - 0.025680122009362094, - 0.16184429099666886, - 
0.020832017995417118, - 0.015543598012300208, - 0.01600016599695664, - 0.1651333650079323, - 0.03140721200907137, - 0.03137013700325042, - 0.01066960500611458, - 0.04178398101066705, - 0.03786648699315265, - 0.022020658012479544, - 0.04174514800251927, - 0.015664703998481855, - 0.01064552400202956, - 0.025264865005738102, - 0.025645060988608748, - 0.031241365999449044, - 0.03213495499221608, - 0.025916743994457647, - 0.010297503002220765, - 0.016210756002692506, - 0.0, - 0.02096552000148222, - 0.0, - 0.021006741997553036, - 0.03677152199088596, - 0.026597141986712813, - 0.010673305005184375, - 0.0, - 0.021639482001774013, - 0.02685442499932833, - 0.02080809199833311, - 0.01882331000524573, - 0.02744881399848964, - 0.026565302003291436, - 0.020546016006846912, - 0.0368049440003233, - 0.021486298006493598, - 0.030873556999722496, - 0.015404435005621053, - 0.026813823002157733, - 0.032337665994418785, - 0.0411631530005252, - 0.03635053201287519, - 0.028803038003388792, - 0.0, - 0.009690239006886259, - 0.025565838994225487, - 0.010282319999532774, - 0.010522274998947978, - 0.022272285001236014, - 0.046368723997147754, - 0.0, - 0.03221950100851245, - 0.03673343200352974, - 0.02748266700655222, - 0.015735246997792274, - 0.02170035098970402, - 0.021534213999984786, - 0.020980961999157444, - 0.020786780994967557, - 0.0, - 0.036304926994489506, - 0.0, - 0.010480686993105337, - 0.01568045398744289, - 0.0, - 0.010330691002309322, - 0.020668379991548136, - 0.020726197006297298, - 0.02077930599625688, - 0.025952575000701472, - 0.020718393992865458, - 0.02178943299804814, - 0.04230488800385501, - 0.0206108900019899, - 0.0, - 0.0282252910110401, - 0.0, - 0.020681471985881217, - 0.010478058000444435, - 0.030966067002736963, - 0.0, - 0.03237565500603523, - 0.04120191499532666, - 0.02123869099887088, - 0.0, - 0.011273581985733472, - 0.026015088995336555, - 0.030816238999250345, - 0.02616997699078638, - 0.042003772003226914, - 0.03156890800164547, - 0.031414197001140565, - 0.0, - 
0.02113657200243324, - 0.015478568995604292, - 0.027462793994345702, - 0.011030054985894822, - 0.010414043004857376, - 0.02266283099015709, - 0.020805250998819247, - 0.0210021590028191, - 0.0, - 0.0, - 0.056557992997113615, - 0.0, - 0.0, - 0.0, - 0.0, - 0.020694798004115, - 0.016373348989873193, - 0.016185850006877445, - 0.02102874100091867, - 0.021534860992687754, - 0.05163567399722524, - 0.010530763989663683, - 0.0, - 0.010798062998219393, - 0.0, - 0.015814031998161227, - 0.0, - 0.0, - 0.03635909900185652, - 0.02071588599937968, - 0.0, - 0.0, - 0.01787340300506912, - 0.02273214999877382, - 0.015455132001079619, - 0.04211112100165337, - 0.04137068899581209, - 0.0, - 0.0, - 0.02128746199014131, - 0.037663279988919385, - 0.02054006399703212, - 0.0, - 0.026172182013397105, - 0.025861551010166295, - 0.0, - 0.015404024001327343, - 0.03146900699357502, - 0.03636352800822351, - 0.02092234000156168, - 0.021668219997081906, - 0.0, - 0.031244140001945198, - 0.03167795900662895, - 1.7124516180047067, - 0.02792789500381332, - 0.020814728995901532, - 0.015543868008535355, - 0.027658826002152637, - 0.0, - 0.025950007999199443, - 0.03152924501046073, - 0.010231341992039233, - 0.025226028010365553, - 0.010247393001918681, - 0.01093846300500445, - 0.030982094001956284, - 0.03129690099740401, - 0.025809351005591452, - 0.0, - 0.0, - 0.0, - 0.025756675997399725, - 0.011129298000014387, - 0.016647758006001823, - 0.020714845013571903, - 0.029590771009679884, - 0.016330253012711182, - 0.0, - 0.03084401499654632, - 0.010977304002153687, - 0.030907956999726593, - 0.0, - 0.0, - 0.012157763005234301, - 0.021498410002095625, - 0.02820518400403671, - 0.012219402007758617, - 0.03567530500004068, - 0.02589154899760615, - 0.0, - 0.020718968997243792, - 0.0, - 0.02605662700079847, - 0.020757133010192774, - 0.025561941001797095, - 0.010504835998290218, - 0.0, - 0.020501808990957215, - 0.01600068499101326, - 0.015623755010892637, - 0.015758616995299235, - 0.0, - 0.0, - 0.020916633991873823, - 
0.02619301699451171, - 0.0, - 0.0, - 0.02609465799469035, - 0.031150987997534685, - 0.022817870005383156, - 0.0, - 0.03116700598911848, - 0.04211206000763923, - 0.05135675299970899, - 0.026143918992602266, - 0.02592664599069394, - 0.01586120000865776, - 0.02904301400121767, - 0.026597207004670054, - 0.0, - 0.030773947000852786, - 0.0, - 0.061814726010197774, - 0.025674920005258173, - 0.021221634990070015, - 0.03111327400256414, - 0.0, - 0.021730245993239805, - 0.0312097120040562, - 0.015570457006106153, - 0.015684762998716906, - 0.0, - 0.0, - 0.021246624004561454, - 0.0364580520108575, - 0.005549304012674838, - 0.010632086006808095, - 0.0, - 0.0, - 0.0, - 0.027090190997114405, - 0.011192602003575303, - 0.04167627400602214, - 0.0, - 0.020702423993498087, - 0.0, - 0.016266686987364665, - 0.027307848999043927, - 0.0, - 0.0, - 0.02124269500200171, - 0.010320301007595845, - 0.0, - 0.030999213995528407, - 0.010495889000594616, - 0.027414019001298584, - 0.03281663000234403, - 0.0, - 0.022248334003961645, - 0.015629197005182505, - 0.010429298999952152, - 0.021105096995597705, - 0.015386965009383857, - 0.015791399011504836, - 0.026830239003174938, - 0.010327675001462922, - 0.0260248929989757, - 0.0, - 0.0, - 0.0, - 0.0, - 0.025871731995721348, - 0.016125027992529795, - 0.0, - 0.025645084999268875, - 0.031059300003107637, - 0.020817047989112325, - 0.0, - 0.03225582999584731, - 0.03604050799913239, - 0.01680466500693001, - 0.0, - 0.0, - 0.026206909999018535, - 0.036399654985871166, - 0.0, - 0.020653693994972855, - 0.0, - 0.0, - 0.013001062994590029, - 0.017929661000380293, - 0.0163463039934868, - 0.026850686001125723, - 0.0, - 0.030482518006465398, - 0.0, - 0.0279954470024677, - 0.031003462005173787, - 0.03591989500273485, - 0.02059851300145965, - 0.010439967998536304, - 0.010761454002931714, - 0.022681681002723053, - 0.02601306399446912, - 0.01577682299830485, - 0.0278060349955922, - 0.010341958011849783, - 0.015742616989882663, - 0.02134666999336332, - 0.0, - 
0.03414607798913494, - 0.0, - 0.010846559001947753, - 0.0, - 0.0, - 0.0, - 0.015656582996598445, - 0.02579576100106351, - 0.022474860001238994, - 0.0, - 0.010479690012289211, - 0.010591072001261637, - 0.026938875002088025, - 0.0, - 0.0260067130002426, - 0.0, - 0.02238055001362227, - 0.015832662000320852, - 0.0, - 0.0, - 0.01553205399250146, - 0.010639532993081957, - 0.0, - 0.0156390990014188, - 0.015523141992161982, - 0.010824148994288407, - 0.0, - 0.016319629998179153, - 0.020937285997206345, - 0.026709349011071026, - 0.025678295001853257, - 0.0, - 0.02143762999912724, - 0.0, - 0.0, - 0.01571765799599234, - 0.025879390988848172, - 0.0, - 0.021170676001929678, - 0.0, - 0.03720083799271379, - 0.010440183992614038, - 0.0, - 0.021703863007132895, - 0.0, - 0.02557815398904495, - 0.0, - 0.0, - 0.026040140000986867, - 0.0103209499939112, - 0.010706382992793806, - 0.015376772003946826, - 0.01025520199618768, - 0.025802631993428804, - 0.0, - 0.020673192993854173, - 0.03278452200174797, - 0.0, - 0.0, - 0.016147650007042103, - 0.0064703669922892, - 0.01088641099340748, - 0.1628114510094747, - 0.16573242799495347, - 0.18378175800899044, - 0.1757968330057338, - 0.19729973599896766 - ], - "decode_latencies": [ - 0.006010528988554142, - 0.009773044002940878, - 0.00600174099963624, - 0.07789056800538674, - 0.005800048005767167, - 0.047187137010041624, - 0.005826923996210098, - 0.0034971660061273724, - 0.007845930987969041, - 0.01996669900836423, - 0.08523929699731525, - 0.019794646999798715, - 0.006229325997992419, - 0.0005374800093704835, - 0.005947773010120727, - 0.028862884995760396, - 0.012568902995553799, - 0.002598305989522487, - 0.061311892000958323, - 0.006725329003529623, - 0.0025576400075806305, - 0.013333883995073847, - 0.07884469999407884, - 0.07888004300184548, - 0.020065258999238722, - 0.07234976399922743, - 0.08440551799139939, - 0.0063792959990678355, - 0.06561621899891179, - 0.0023428360000252724, - 0.012926990006235428, - 0.07825873099500313, - 
0.006624036002904177, - 0.04202370900020469, - 0.010371635013143532, - 0.0061194029985927045, - 0.0779853799904231, - 0.013135043991496786, - 6.062499596737325e-05, - 0.006535585998790339, - 0.006559400004334748, - 0.04199892800534144, - 0.007825544991646893, - 0.013778895008726977, - 0.00651038000069093, - 0.008683018997544423, - 0.005099826987134293, - 0.0016596289933659136, - 0.006585305993212387, - 0.01930691400775686, - 0.009469430005992763, - 0.015453691012226045, - 0.018717175000347197, - 0.010525959994993173, - 0.005300220989738591, - 0.002510717007680796, - 0.016728082991903648, - 0.00729519801097922, - 0.0051479170069796965, - 0.004887761999270879, - 0.019142445002216846, - 0.005139632994541898, - 0.016917654997087084, - 0.009965449993615039, - 0.018140334999770857, - 0.010696757992263883, - 0.042991085996618494, - 0.005416861997218803, - 0.01067008200334385, - 0.010615312989102677, - 0.020544095008517615, - 0.015440457995282486, - 0.005473027005791664, - 0.012942219007527456, - 0.04731175600318238, - 0.005209570997976698, - 0.006641701998887584, - 0.011398488000850193, - 0.010378234001109377, - 0.005199478997383267, - 0.01409537899598945, - 0.010344702997826971, - 0.0033488329936517403, - 0.012983686989173293, - 0.03663665399653837, - 0.010415415003080852, - 0.005590428991126828, - 0.005292467001709156, - 0.005399312998633832, - 0.005307036000886001, - 0.005268870998406783, - 0.01628787801018916, - 0.01040842199290637, - 0.01603338100539986, - 0.005194143988774158, - 0.005124488001456484, - 0.006081636995077133, - 0.005260726000415161, - 0.005200958999921568, - 0.005239299003733322, - 0.005322028999216855, - 0.0052040800073882565, - 0.005170703996554948, - 0.010340584994992241, - 0.012331527992500924, - 0.01542972499737516, - 0.005335741996532306, - 0.0002774640015559271, - 0.011271137002040632, - 0.010618507003528066, - 0.005580249999184161, - 0.010329637996619567, - 0.00010782500612549484, - 0.01045294500363525, - 0.010396504992968403, - 
0.010526206999202259, - 0.005150315002538264, - 0.00543859601020813, - 0.010451707988977432, - 0.005154323996976018, - 0.01016745799279306, - 0.005220848004682921, - 0.00523410300957039, - 0.0052538919990183786, - 0.010374429999501444, - 0.018085668998537585, - 0.010242026997730136, - 0.00029843700758647174, - 0.010142432991415262, - 0.01033389600343071, - 0.009662854005000554, - 0.015482524002436548, - 0.0076172480039531365, - 0.010235542998998426, - 0.010402929998235777, - 0.005275927003822289, - 0.015864664004766382, - 0.010355522012105212, - 0.010334759994293563, - 0.026234140997985378, - 0.010359940002672374, - 0.015570675997878425, - 0.010391204996267334, - 0.010462828999152407, - 9.547200170345604e-05, - 0.011438650006311946, - 0.011732661005225964, - 0.011666952996165492, - 0.005105150004965253, - 0.005313714005751535, - 0.01963439999963157, - 0.005547362001379952, - 8.315499871969223e-05, - 0.010241647993098013, - 0.0052031469967914745, - 0.0002165589976357296, - 0.015647616994101554, - 0.00016617598885204643, - 0.14930155698675662, - 0.005466514005092904, - 0.005133074009791017, - 0.005265280997264199, - 7.636600639671087e-05, - 0.010624762013321742, - 0.005182266992051154, - 0.005200883999350481, - 0.00567915900319349, - 0.00032243500754702836, - 0.010291984013747424, - 0.011257723002927378, - 0.005180141000892036, - 0.00519217700639274, - 0.010527670005103573, - 0.005157186998985708, - 0.010164765000808984, - 0.005133325990755111, - 0.0051431900064926594, - 0.010696042008930817, - 0.010400822997326031, - 0.015467361998162232, - 0.024006071995245293, - 0.005222960986429825, - 0.005139539003721438, - 0.010329967990401201, - 0.013012666997383349, - 0.010167758009629324, - 0.00010618800297379494, - 0.005156054001417942, - 0.005248393994406797, - 0.010226131009403616, - 0.009878608994768001, - 0.03582087600079831, - 0.010264966011163779, - 0.010321142006432638, - 0.01031523200799711, - 0.005340632007573731, - 0.0051476230000844225, - 0.01258930099720601, - 
0.010370286996476352, - 0.005109871999593452, - 0.010646089009242132, - 0.015509574004681781, - 0.01540862800902687, - 0.010701921011786908, - 0.010618725995300338, - 0.005182531007449143, - 0.0052055039996048436, - 0.010417024997877888, - 0.010530999992624857, - 0.010396822006441653, - 0.015382219993625768, - 0.005156357001396827, - 0.01033446000656113, - 0.016505701991263777, - 0.005233750998741016, - 0.005187660994124599, - 0.01562295299663674, - 0.01572624700202141, - 0.01119735901011154, - 0.005163006993825547, - 0.005258984994725324, - 0.005893462992389686, - 0.005345590005163103, - 0.019993124005850405, - 0.010687835005228408, - 0.005192361000808887, - 0.005178200008231215, - 0.015457418994628824, - 5.3417999879457057e-05, - 0.012889970996184275, - 0.010151972994208336, - 0.026782173008541577, - 6.916499114595354e-05, - 0.010379134997492656, - 0.005165145004866645, - 0.01034579701081384, - 0.01558407099219039, - 0.005165517999557778, - 0.01016926599550061, - 0.015466589000425301, - 0.005202850006753579, - 0.011331944988342002, - 0.005678815999999642, - 0.010320127010345459, - 0.00511822898988612, - 0.005181169995921664, - 0.010369234005338512, - 0.028262703999644145, - 0.00554071601072792, - 0.010343775997171178, - 0.017061192003893666, - 0.005159328007721342, - 0.015441176990862004, - 0.015506577998166904, - 0.005309640007908456, - 0.005248076005955227, - 0.0103148769994732, - 0.005123604001710191, - 0.010417532990686595, - 0.01536964600381907, - 0.005117754000821151, - 0.005186977999983355, - 0.01537512699724175, - 0.005139048007549718, - 0.005173899000510573, - 0.005168460993445478, - 0.015844751993427053, - 0.005374861997552216, - 0.005143909002072178, - 0.005316691007465124, - 0.006122531005530618, - 0.005162516012205742, - 0.010556323992204852, - 0.005165195005247369, - 0.010325876995921135, - 0.030594249998102896, - 0.030566704997909255, - 0.005521657003555447, - 0.015449664002517238, - 0.005302396006300114, - 0.015853232005611062, - 
0.011495811995700933, - 0.005414469997049309, - 0.01578082899504807, - 0.005185941990930587, - 0.0177117540006293, - 0.020948439996573143, - 0.005185586996958591, - 0.015577158992527984, - 0.0001391200057696551, - 0.011600533005548641, - 0.005638654998620041, - 0.019378306009457447, - 0.015447503988980316, - 0.005284754995955154, - 0.01029900299909059, - 0.0051852860051440075, - 0.019240911991801113, - 0.010263074000249617, - 0.005257263997918926, - 0.010295560001395643, - 0.005162695990293287, - 0.0056142579996958375, - 0.005111863007186912, - 0.010318179993191734, - 0.0052057449938729405, - 0.010425168002257124, - 0.015551477001281455, - 0.005186788999708369, - 0.010447927008499391, - 0.0051611620001494884, - 0.01541446200280916, - 0.015447393001522869, - 0.02034719500807114, - 0.010390146999270655, - 0.0052919979934813455, - 0.005204151995712891, - 0.010308512995834462, - 0.010342046996811405, - 0.01151389601000119, - 0.010336524996091612, - 0.00521631000447087, - 0.010255544009851292, - 0.010508364997804165, - 0.0153860970021924, - 0.005225614993833005, - 0.005242562008788809, - 0.010327230003895238, - 0.005218247999437153, - 0.010290282007190399, - 0.005169053009012714, - 0.0001322699972661212, - 0.005683823997969739, - 0.00029383199580479413, - 0.0051881219987990335, - 0.005562919002841227, - 0.00913758299429901, - 0.005375196007662453, - 0.010312926999176852, - 0.010402725005405955, - 0.005147081988980062, - 0.00524466999922879, - 0.012380612999550067, - 0.005121893991599791, - 0.015269763011019677, - 0.005283466991386376, - 0.00620399801118765, - 0.005194158002268523, - 0.010431454997160472, - 0.005153238991624676, - 0.009393087995704263, - 0.020637155990698375, - 0.005192035998334177, - 0.02476155900512822, - 0.005166532995644957, - 0.010479099990334362, - 0.012557777998154052, - 0.013046492997091264, - 0.005139667002367787, - 0.00523698799952399, - 0.010368095012381673, - 0.015384736994747072, - 0.0156277989881346, - 0.015512853002292104, - 
0.010325669005396776, - 0.010473214992089197, - 0.015452069012098946, - 0.01561905701237265, - 0.010352801997214556, - 0.005233456991845742, - 0.005244281011982821, - 0.005290979010169394, - 0.010297044005710632, - 7.688099867664278e-05, - 0.01035580100142397, - 0.005149259988684207, - 0.020833991991821676, - 0.015253082994604483, - 0.005146349998540245, - 0.02245672099525109, - 0.00024919099814724177, - 0.0009913370013237, - 0.02047767701151315, - 0.01143955800216645, - 0.015621914993971586, - 0.0055171489948406816, - 0.010404334010672756, - 0.014834286994300783, - 0.00514006000594236, - 0.010329284006729722, - 0.005232997995335609, - 0.005115902997204103, - 0.020646179997129366, - 0.015337192002334632, - 0.005176351012778468, - 0.0051266999944346026, - 0.005176919992663898, - 0.01551325700711459, - 0.005146690004039556, - 0.0052100980101386085, - 0.00527341099223122, - 0.005130134988576174, - 0.010919186010141857, - 0.005238348006969318, - 0.021612509997794405, - 0.0051320110069355, - 0.01646770398656372, - 0.01040862500667572, - 0.005138857988640666, - 0.010190104992943816, - 0.010740273995907046, - 0.025489211999229155, - 0.005184095003642142, - 5.109400080982596e-05, - 0.020634289990994148, - 0.017224657000042498, - 0.010342114997911267, - 0.005370435988879763, - 0.01026780900429003, - 0.010267498000757769, - 0.006079245009459555, - 0.005104409996420145, - 0.005215983008383773, - 0.0061480219883378595, - 0.010680103994673118, - 0.020585402991855517, - 0.005195115998503752, - 0.010448151006130502, - 0.015451717001269571, - 0.010327209005481564, - 4.4639993575401604e-05, - 0.005143155998666771, - 0.00514613300038036, - 0.010434568001073785, - 0.00509577699995134, - 0.010415297991130501, - 0.015589964998071082, - 0.010338248001062311, - 0.005246480999630876, - 0.010308309996617027, - 0.011143911993713118, - 0.008997662996989675, - 0.005275536997942254, - 0.005136103995027952, - 0.005328597006155178, - 0.02397893999295775, - 0.015269512005033903, - 
0.0051472439954523, - 0.02519418099836912, - 0.005204935994697735, - 0.005139462009537965, - 0.00514117899001576, - 0.022411805999581702, - 0.005108438999741338, - 0.010421642000437714, - 0.005283432008582167, - 0.01725972800340969, - 0.005174136997084133, - 0.005149499003891833, - 0.000122969999210909, - 0.00514388999727089, - 0.016855399997439235, - 0.0051229609962319955, - 0.011155662999954075, - 0.015439381008036435, - 0.005155740000191145, - 0.005424127011792734, - 0.0052303059928817675, - 0.015260974003467709, - 0.01035973800753709, - 0.00526626298960764, - 0.005252122005913407, - 0.010490266999113373, - 0.01029523900069762, - 0.005149684002390131, - 4.53649990959093e-05, - 0.0051818630017805845, - 0.00515699599054642, - 0.010148850997211412, - 0.00013072100409772247, - 0.005545569001697004, - 0.005216942998231389, - 0.010351414006436244, - 0.021318876009900123, - 0.005308071995386854, - 0.005287025996949524, - 0.005216499004745856, - 0.015399383002659306, - 0.010297542001353577, - 0.0007000029872870073, - 0.007942145006381907, - 0.010141460996237583, - 0.015321080005378462, - 0.0052648920100182295, - 0.006143960999906994, - 0.010602886992273852, - 0.01051466999342665, - 0.005238161989836954, - 0.005658879003021866, - 0.005335798996384256, - 0.0052256439958000556, - 0.015453033993253484, - 0.020898625007248484, - 5.1202994654886425e-05, - 0.004257489999872632, - 0.002230504003819078, - 0.003976228996179998, - 0.010291690996382385, - 0.005170491000171751, - 0.010324878996470943, - 0.005215386001509614, - 0.005307557003106922, - 0.014626167991082184, - 0.0052365879964781925, - 0.010772059002192691, - 0.005647728990879841, - 0.005138312990311533, - 0.010285882992320694, - 0.17320100399956573, - 0.18485681599122472, - 0.009080955001991242, - 0.005204788991250098, - 0.012098129009245895, - 0.011042698009987362, - 0.015685944992583245, - 0.007624507008586079, - 0.00013586100249085575, - 0.0053919099882477894, - 0.1070562569948379, - 0.033129075993201695, - 
0.01034779500332661, - 0.02194856500136666, - 0.007640059993718751, - 0.018902419004007243, - 0.005174395002541132, - 0.010429526009829715, - 0.005167748997337185, - 0.015492991005885415, - 0.010277883004164323, - 0.005178586987312883, - 0.005158664993359707, - 0.005208902002777904, - 0.027258086993242614, - 0.0052060650050407276, - 0.01533649698831141, - 0.01049665900063701, - 0.0005014179914724082, - 0.17883039500156883, - 0.01032221200875938 - ], - "multi_turn_cache_hits": 76, - "multi_turn_cache_misses": 296, - "seed": 42, - "summary": { - "total_requests": 549, - "total_tokens": 146900, - "elapsed_time": 54.64053273200989, - "avg_throughput_tokens_per_sec": 2688.480376289268, - "requests_per_second": 10.047486225887052, - "end_to_end_latency_ms": { - "mean": 29974.702574712268, - "p50": 30461.31026200601, - "p95": 55516.30446019699, - "p99": 55609.483422955964 - }, - "storage_io_latency_ms": { - "mean": 158.1624600566779, - "p50": 99.66375399380922, - "p95": 370.60982240654994, - "p99": 1672.2172840457608 - }, - "generation_latency_ms": { - "mean": 0.0, - "p50": 0.0, - "p95": 0.0, - "p99": 0.0 - }, - "cache_stats": { - "cache_hit_rate": 0.931140276120675, - "cache_hits": 5463, - "cache_misses": 404, - "gpu_entries": 450, - "cpu_entries": 0, - "nvme_entries": 0, - "gpu_memory_used_gb": 7.5921630859375, - "cpu_memory_used_gb": 0.0, - "offloads_cpu": 0, - "offloads_nvme": 0, - "storage_health": { - "overall_status": "PASS", - "criteria": [ - { - "name": "Cache Hit Rate > 30%", - "target": 0.3, - "actual": 0.931140276120675, - "unit": "ratio", - "passed": true - } - ], - "passed_count": 1, - "total_count": 1 - }, - "prefill_writes": 450, - "decode_reads": 5463, - "prefill_bytes_written_gb": 7.5921630859375, - "decode_bytes_read_gb": 94.4210205078125, - "system_prompt_hits": 1019, - "common_phrase_hits": 0, - "user_cache_hits": 4368, - "multi_turn_hits": 76, - "total_read_bytes": 101383798784, - "total_write_bytes": 8152023040, - "total_read_gb": 94.4210205078125, 
- "total_write_gb": 7.5921630859375, - "read_write_ratio": 12.4366428169467, - "read_iops": 5463, - "write_iops": 450, - "gpu_read_p50_ms": 10.151406007935293, - "gpu_read_p95_ms": 21.36624029808443, - "gpu_read_p99_ms": 96.48693589901087, - "gpu_write_p50_ms": 25.4134030037676, - "gpu_write_p95_ms": 99.55939889914575, - "gpu_write_p99_ms": 190.6759267838787 - }, - "qos_metrics": { - "interactive": { - "total_requests": 549, - "latency_ms": { - "mean": 29974.702574712268, - "p50": 30461.31026200601, - "p95": 55516.30446019699, - "p99": 55609.483422955964, - "max": 55613.08303300757 - }, - "sla": { - "target_p95_ms": 50, - "actual_p95_ms": 55516.30446019699, - "compliance": 0.0, - "met": false - } - }, - "responsive": { - "no_data": true - }, - "batch": { - "no_data": true - } - }, - "prefix_cache_stats": { - "prefix_hits": 108, - "prefix_misses": 441, - "system_prompt_reuse": 108, - "common_phrase_reuse": 0, - "bytes_saved": 96075776 - }, - "autoscaling_stats": [], - "autoscaling_summary": null, - "multi_turn_stats": { - "cache_hits": 76, - "cache_misses": 296, - "hit_rate": 0.20430107526881722 - } - } -} \ No newline at end of file diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_gpu_only_trial2.json b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_gpu_only_trial2.json deleted file mode 100644 index 58bbec1b..00000000 --- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_gpu_only_trial2.json +++ /dev/null @@ -1,2875 +0,0 @@ -{ - "requests_completed": 549, - "total_tokens_generated": 148262, - "total_storage_io_latency": 98.74481805328105, - "total_generation_latency": 0.0, - "end_to_end_latencies": [ - 0.17159596100100316, - 0.2315504619909916, - 0.2324604659952456, - 0.27158166900335345, - 0.36186876900319476, - 0.3947469130071113, - 0.41883955799858086, - 0.4721600549964933, - 0.5708825059991796, - 0.5718035489990143, - 0.5977575400029309, - 
0.5985978179960512, - 0.6055227759934496, - 0.6235441070020897, - 0.6246726989920717, - 0.6374808149994351, - 0.7496834869962186, - 0.7632349630002864, - 0.7640643329941668, - 0.7702364199940348, - 0.7770532249996904, - 0.7784076420066413, - 0.7793656440044288, - 0.8043996519991197, - 0.8057395049982006, - 0.8045243260130519, - 0.8054992089892039, - 0.8060047290055081, - 0.8060967779892962, - 0.8069216259900713, - 0.8148003519891063, - 0.8141518789925613, - 0.8145497609948507, - 0.8221060129872058, - 0.8276900539931376, - 0.8278362049895804, - 0.8299165300122695, - 0.8292160929995589, - 0.8713758390076691, - 0.8710802580026211, - 0.8763643949932884, - 0.8825998399988748, - 0.8834799640026176, - 0.8843470239953604, - 0.8890231580007821, - 0.9132367490092292, - 0.9140678640105762, - 0.9213106889947085, - 0.9377004739944823, - 0.9438780829950701, - 0.9440202940022573, - 0.9455217909999192, - 1.0551713789900532, - 1.0573294300120324, - 1.0599183949962026, - 1.059230277009192, - 1.1309632300108206, - 1.1350831600138918, - 1.1350522720022127, - 1.1331114399945363, - 1.1332993779942626, - 1.1339682459947653, - 1.06771192800079, - 1.1562287629931234, - 1.1588415470032487, - 1.1581078769959277, - 1.1567072739999276, - 1.1604001749947201, - 1.1753320030111354, - 1.174308703004499, - 1.1758526470075594, - 1.1778126230055932, - 1.1783227499981876, - 1.253333876011311, - 1.2561857649998274, - 1.2655535210069502, - 1.284184517004178, - 1.39096780300315, - 1.6050859249953646, - 1.6520206010027323, - 1.6885164989944315, - 2.382078159993398, - 2.4087442879972514, - 2.5653211520111654, - 2.6929361970105674, - 2.760534988992731, - 2.951111900008982, - 3.140671952001867, - 3.2433414139959496, - 3.2642208830075106, - 3.9381061570020393, - 4.240984686999582, - 4.28623952101043, - 4.399956348002888, - 4.601549872008036, - 4.965970122997533, - 5.0393283589946805, - 5.075657508990844, - 5.168189950010856, - 5.193966702994658, - 5.276978939989931, - 5.357212712988257, - 5.581537645004573, - 
5.594684263996896, - 5.9523283210000955, - 6.043607831001282, - 6.103722454005037, - 6.160258699004771, - 6.26964966399828, - 6.349602551999851, - 6.3702468529954785, - 6.4121701749973, - 6.441727830999298, - 6.461681058994145, - 6.482882500000414, - 6.499652458995115, - 7.262075125006959, - 7.38969707301294, - 7.496218223997857, - 7.551984865000122, - 7.555745411998942, - 7.587884843000211, - 7.690460569996503, - 8.073082464004983, - 8.110060763006913, - 8.131651219009655, - 8.15238089900231, - 8.173854963009944, - 8.224532586013083, - 8.314421754999785, - 8.333786745002726, - 8.437110578990541, - 8.96592016199429, - 9.18724619099521, - 9.228202735001105, - 9.260076791993924, - 9.39863809599774, - 9.528000250007608, - 9.550550796004245, - 9.66968969500158, - 9.722510675986996, - 9.930127209008788, - 10.03239465400111, - 10.068025946995476, - 10.070171165003558, - 10.210021719001816, - 10.2945943550003, - 10.293918139999732, - 10.31569676399522, - 10.335203025999363, - 10.371704794000834, - 10.435043915000279, - 10.541401609996683, - 10.590004989004228, - 10.610723259000224, - 10.62705021900183, - 11.349979453007109, - 11.376797517004889, - 11.418002630001865, - 11.554187778994674, - 11.645985576993553, - 11.681365047988947, - 11.715001235992531, - 11.775753282010555, - 11.781023479998112, - 11.797861787999864, - 12.178078544005984, - 12.358180473005632, - 12.359137003004435, - 12.72110494399385, - 12.993292144994484, - 13.035759621998295, - 13.841266257004463, - 13.955059843996423, - 14.274397172994213, - 14.404647368006408, - 14.573805863998132, - 14.693997125999886, - 14.709397329992498, - 14.833063715996104, - 14.855597118003061, - 14.885282059010933, - 15.030114978013444, - 15.112693661998492, - 15.241532125000958, - 15.27230011500069, - 15.45720901498862, - 15.484298009003396, - 15.56677501500235, - 15.63512464199448, - 15.63592907799466, - 15.653639420008403, - 16.026578624994727, - 16.049313136012643, - 16.97883576800814, - 16.993799060001038, - 
17.030797096012975, - 17.03166654400411, - 17.062247030000435, - 17.175460458995076, - 17.224316101011937, - 17.2343150319939, - 17.284799846995156, - 17.29478584199387, - 17.372792721987935, - 17.52527705299144, - 17.60347474399896, - 17.694011490006233, - 17.768787714012433, - 17.835166138000204, - 17.839994396999828, - 17.985533715007477, - 18.087354974995833, - 18.267512591002742, - 18.36992825199559, - 18.569247074003215, - 18.81728396100516, - 18.82234754700039, - 18.921305264011608, - 19.06330205600534, - 19.084059120999882, - 19.17691510800796, - 19.243758753000293, - 19.269901630003005, - 19.33075454599748, - 19.37875069300935, - 19.3936523609882, - 19.39363740499539, - 21.017664957995294, - 21.178384701997857, - 21.464710824002395, - 21.558205252993503, - 21.647278611999354, - 21.66163209899969, - 21.70456131499668, - 21.766030587998102, - 21.790292778998264, - 21.87122043200361, - 22.05735365000146, - 22.173797926996485, - 22.207160158999613, - 22.273763150005834, - 22.486428760006675, - 22.553913802010356, - 22.642065842999727, - 22.668328630999895, - 22.807578068997827, - 22.89080984800239, - 22.92850940200151, - 23.159901216000435, - 23.15994459101057, - 23.363903117002337, - 23.42672421899624, - 23.429323980002664, - 23.437400858994806, - 23.51791380699433, - 23.536874857993098, - 23.72103029100981, - 23.737718435004354, - 24.04791053799272, - 24.133838216992444, - 24.19746256498911, - 24.196206334992894, - 24.260073669996927, - 24.28127918599057, - 24.506605654998566, - 24.60950376398978, - 24.651435981999384, - 24.69785915200191, - 26.017649007990258, - 26.06364416499855, - 26.168165456008865, - 26.204925671001547, - 26.310535341996, - 26.422675355002866, - 26.474455474002752, - 26.501665838004556, - 26.519505087999278, - 26.535041053997702, - 26.561461353005143, - 26.675423177002813, - 26.735349975002464, - 26.939495744009037, - 26.97972935899452, - 26.989546191005502, - 27.25128729599237, - 27.395423976005986, - 27.411500500005786, - 
27.43737605700153, - 27.53595481000957, - 27.60723343899008, - 27.903051875007804, - 28.053258304003975, - 28.400949950999347, - 28.80595196200011, - 28.811639414998353, - 28.81647552800132, - 28.853769184002886, - 28.9459705939953, - 29.018427634990076, - 29.0235381500097, - 29.100955236994196, - 29.144010151008843, - 29.237289773998782, - 29.259171470999718, - 29.304597969006863, - 29.46109440899454, - 29.599792957000318, - 29.693516932995408, - 29.729490152996732, - 29.778904752005474, - 29.779201475990703, - 29.80215653500636, - 29.872362809997867, - 30.167931289004628, - 30.210426629011636, - 32.128547689004336, - 32.25671142101055, - 32.28463203100546, - 32.32523591800418, - 32.39684170499095, - 32.45452659999137, - 32.59069131599972, - 32.647899744988536, - 32.66257918099291, - 32.69452223900589, - 32.741262397990795, - 32.817689161005546, - 32.82596291700611, - 32.84994949899556, - 32.908164956999826, - 33.13025218099938, - 33.161141649005, - 33.20754381100414, - 33.26684454000497, - 33.26592797000194, - 33.27330119100225, - 33.37621314699936, - 33.468896455990034, - 33.53903665300459, - 33.69845391000854, - 33.791577427997254, - 33.83330408099573, - 33.88008806500875, - 33.89568681399396, - 33.958222761997604, - 34.24487393599702, - 34.24680073298805, - 34.272770700001274, - 34.310676417007926, - 34.344978055989486, - 34.45305434901093, - 34.4688921449997, - 34.65024443999573, - 34.750112833004096, - 34.95975400299358, - 34.98222591399099, - 35.02634572399256, - 35.07209214300383, - 35.20105734899698, - 35.22854982300487, - 35.24849643900234, - 35.67853125400143, - 35.81676775899541, - 35.88328820699826, - 35.93530823299079, - 36.13602721699863, - 36.21128064200457, - 36.23182530900522, - 36.27413384099782, - 36.28406353000901, - 36.31091649099835, - 36.63582478299213, - 36.658165917004226, - 36.72072307100461, - 36.71995159200742, - 36.77675405199989, - 36.798035610001534, - 36.834101186002954, - 36.84940411700518, - 36.87058113999956, - 
36.974921756002004, - 37.0108615479985, - 37.011235379002756, - 37.03186113599804, - 37.06814413399843, - 37.07875325100031, - 37.12490415299544, - 37.33841637099977, - 37.369063336998806, - 37.39242951699998, - 37.4266076519998, - 37.52268894101144, - 39.867591924994485, - 39.99149842299812, - 40.17875876300968, - 40.225039606011705, - 40.23096119299589, - 40.24664806500368, - 40.273138158998336, - 40.39036586599832, - 40.42671247299586, - 40.79285119299311, - 40.81720504599798, - 40.94666032200621, - 41.11319164300221, - 41.15993754300871, - 41.19075065200741, - 41.291848561988445, - 41.291966114004026, - 41.34877251800208, - 41.578189856998506, - 41.697210165002616, - 41.885798821007484, - 41.90659464999044, - 42.18424145899189, - 42.27852295599587, - 42.289101486996515, - 42.315911971003516, - 42.39712190300634, - 42.40587354400486, - 42.43050056799257, - 42.50574912200682, - 43.002517623011954, - 43.09592092598905, - 43.142160214993055, - 43.307225198994274, - 43.417827258002944, - 43.66999575099908, - 43.72601068299264, - 43.78928230699967, - 43.799323756000376, - 43.819015738001326, - 43.84007479700085, - 43.87675574400055, - 43.89755486900685, - 43.942180369995185, - 44.018949366000015, - 44.26434489600069, - 44.30523401100072, - 44.321575986003154, - 44.337055284995586, - 44.336644027003786, - 44.36314678299823, - 44.44160512900271, - 44.46221452399914, - 44.52389284799574, - 44.61231425400183, - 44.616044738999335, - 44.62745083200571, - 44.626294742003665, - 44.78491345400107, - 44.78667745999701, - 44.902217864990234, - 44.953986407010234, - 44.99118577199988, - 45.21130387901212, - 45.52703636599472, - 45.61008204000245, - 45.67850869600079, - 45.841284045003704, - 46.02942528600397, - 46.070997867995175, - 46.28962421399774, - 46.41264205799962, - 46.6348841440049, - 46.68340962799266, - 49.541172833996825, - 49.56981259200256, - 49.673990291004884, - 49.711950969009195, - 49.721235074990545, - 49.76519344600092, - 49.92242433599313, - 
50.048809329993674, - 50.115585460996954, - 50.16247952199774, - 50.17798482600483, - 50.224825647994294, - 50.250276824008324, - 50.29262656500214, - 50.29107969599136, - 50.33833053598937, - 50.35021170300024, - 50.45204951699998, - 50.59836568500032, - 50.65444918000139, - 50.7965662820061, - 50.874915487002, - 51.38703141499718, - 51.40519101699465, - 51.43727225900511, - 51.56839319701248, - 51.60096226099995, - 51.61720346599759, - 51.68126223300351, - 51.79784756799927, - 52.17421506300161, - 52.1999003529927, - 52.20034732299973, - 52.230382041991106, - 52.22992147300101, - 52.362915275996784, - 52.39365144900512, - 52.39194475700788, - 52.392447548991186, - 52.40600057299889, - 52.429050790990004, - 52.574091946007684, - 52.57491752499482, - 52.57622180400358, - 52.57529898699431, - 52.58431171700067, - 52.593423926999094, - 52.59527874899504, - 52.59373048800626, - 52.59605719099636, - 52.59502167200844, - 52.59725602499384, - 52.595933169999626, - 52.619954880996374, - 52.63325009700202, - 52.63759697499336, - 52.638115004010615, - 52.645801734004635, - 52.65479009099363, - 52.65607989800628, - 52.65933064100682, - 52.65980526599742, - 52.67395183199551, - 52.674358692995156, - 52.675305377997574, - 52.67906830301217, - 52.686741982994135, - 52.68804566700419, - 52.69251876800263, - 52.69985871198878, - 52.70053644500149, - 52.70536735300266, - 52.7175275030022, - 52.72547679999843, - 52.726503673009574, - 52.72802283598867, - 52.741508001010516, - 52.743294482992496, - 52.74219437300053, - 52.74215273899608, - 52.74214228100027, - 52.743554184999084 - ], - "storage_latencies": [ - 0.0732523179758573, - 0.13814878698030952, - 0.15988514102355111, - 0.14608485501958057, - 0.09545025501574855, - 0.256498470029328, - 0.20593036399804987, - 0.005562100996030495, - 0.11686944698158186, - 0.2688438089971896, - 0.07707164998282678, - 0.10776075998728629, - 0.11791579899727367, - 0.08111565699800849, - 0.12725408101687208, - 0.12873321399092674, - 
0.2896795920096338, - 0.33509834502183367, - 0.3151099260139745, - 0.24865476801642217, - 0.3132739319844404, - 0.39583665698592085, - 0.35531570301100146, - 0.15528604903374799, - 0.47889709401351865, - 0.24003836602787487, - 0.25995844400313217, - 0.29396551303216256, - 0.1329988490033429, - 0.26738536900666077, - 0.24246162200870458, - 0.23527700502017979, - 0.12849274001200683, - 0.294347518007271, - 0.3125863169989316, - 0.04646456199407112, - 0.27045533499040175, - 0.03570673200010788, - 0.5030173170089256, - 0.35191771801328287, - 0.2777841879869811, - 0.11822173398104496, - 0.2810346619953634, - 0.31907526397844777, - 0.19146290599019267, - 0.307425511040492, - 0.3089724110031966, - 0.2702875219838461, - 0.05514367501018569, - 0.07425963498826604, - 0.08965965900279116, - 0.26167611601704266, - 0.20401158298773225, - 0.3448468299902743, - 0.36658533998706844, - 0.1982595240115188, - 0.0240035530005116, - 0.5450974970153766, - 0.5823516660602763, - 0.2182414660055656, - 0.09090340798138641, - 0.214417073992081, - 0.33282670997141395, - 0.18326157599221915, - 0.4458128539699828, - 0.43656171804468613, - 0.21168215399666224, - 0.3055921360064531, - 0.2961604490119498, - 0.10034493201237638, - 0.2603178880090127, - 0.4766053349885624, - 0.2526453919999767, - 0.08757233100186568, - 0.5878789310372667, - 0.38462794601218775, - 0.7465365840180311, - 0.6473721129732439, - 0.49277773397625424, - 0.5676403369725449, - 0.46903406096680555, - 0.4959209779917728, - 0.11963316700712312, - 0.30740821798099205, - 0.12179597000067588, - 0.31556504998297896, - 0.38546285501797684, - 0.3346919459872879, - 0.18760830396786332, - 0.37871915198047645, - 0.07166881799639668, - 0.3062250559887616, - 0.17772769699513447, - 0.04200389998732135, - 0.05768239899771288, - 0.06662296598369721, - 0.3738353979861131, - 0.338261210010387, - 0.2479439939779695, - 0.1235776780085871, - 0.8644443890079856, - 0.3262873200001195, - 0.08373940498859156, - 0.3271757830225397, - 
0.03722464299062267, - 0.34350370899483096, - 0.06836049997946247, - 0.1507279740035301, - 0.1582716970006004, - 0.01619064700207673, - 0.11373225701390766, - 0.1267037699726643, - 0.3769171480234945, - 0.0476259669958381, - 0.1022271320107393, - 0.4614836989931064, - 0.8973575380223338, - 0.057582441018894315, - 0.7198375470325118, - 0.5037693230551668, - 0.010399816994322464, - 0.11917091297800653, - 0.06824878398037981, - 0.03721505501016509, - 0.08364153699949384, - 0.16254826802469324, - 0.30131812297622673, - 0.26085688201419543, - 0.10175476998847444, - 0.16774055700807367, - 0.03112462698481977, - 0.5010971480078297, - 0.037362362010753714, - 0.16022982499271166, - 0.04867014102637768, - 0.4474621749977814, - 0.24152198297088034, - 0.22801439800241496, - 0.10646291500597727, - 0.07044876700092573, - 0.08762920799199492, - 0.10204704001080245, - 0.4227094389643753, - 0.15261313198425341, - 0.3353132390184328, - 0.14673085500544403, - 0.4284920410136692, - 0.1058721369918203, - 0.36116645993024576, - 0.07870953899691813, - 0.15560808502777945, - 0.2803404500155011, - 0.0834587110002758, - 0.06864931201562285, - 0.03281315699859988, - 0.125142311968375, - 0.418848646004335, - 0.1158434410172049, - 0.12665429002663586, - 0.07647968499804847, - 0.07288907498877961, - 0.5295079110073857, - 0.4528303840197623, - 0.08394735201727599, - 0.03264072601450607, - 0.07888487800664734, - 0.11311941199528519, - 0.0637013380182907, - 0.06848846601496916, - 0.10566686202946585, - 0.031442583989701234, - 0.10413460801646579, - 0.12427422701148316, - 0.0936655950063141, - 0.03651196100690868, - 0.052891237020958215, - 0.0679630910162814, - 0.09866094803146552, - 0.1017221630027052, - 0.053377230986370705, - 0.16961251398606692, - 0.14803659199969843, - 0.14100156900531147, - 0.04250534302263986, - 0.3489238759939326, - 0.08796570800768677, - 0.061351325013674796, - 0.0724724790052278, - 0.06022507499437779, - 0.13425630101119168, - 0.18208463002520148, - 0.2433399730216479, - 
0.09383328398689628, - 0.2231830390082905, - 0.9617611479916377, - 0.01624051999533549, - 0.1159559430234367, - 0.11304775602184236, - 0.05659971800923813, - 0.041649185004644096, - 0.25782981800148264, - 0.871476004991564, - 0.09923849598271772, - 0.058915783010888845, - 0.07789812902046833, - 0.10452258594159503, - 0.055704326994600706, - 0.043438759006676264, - 0.764845473022433, - 0.13892648997716606, - 0.08956395601853728, - 0.18686522997450083, - 0.12499815497722011, - 0.14165624500310514, - 0.13093484402634203, - 0.0937017200194532, - 0.03652291100297589, - 0.09378458601713646, - 0.1143514130380936, - 0.14549392204207834, - 0.02763430699997116, - 0.1618271420011297, - 0.04217923201213125, - 0.09875505199306644, - 0.049110942985862494, - 0.08279138599755242, - 0.06283593300031498, - 0.051665395018062554, - 0.08277122103027068, - 0.056739493025816046, - 0.08570944696839433, - 1.1976587909739465, - 0.2039122259884607, - 0.11892764197546057, - 0.16782800998771563, - 0.9294318090251181, - 0.06327167502604425, - 0.08078271498379763, - 0.036872778000542894, - 0.03716147999512032, - 0.16270476604404394, - 0.11970651998126414, - 0.10347559700312559, - 1.1408198759891093, - 0.05319710996991489, - 0.0725882250117138, - 0.10948854700836819, - 0.10468471799686085, - 0.20366351197299082, - 0.09241682599531487, - 0.07275083501008339, - 1.2160652179591125, - 0.12679617697722279, - 0.3957446679705754, - 0.08333433499501552, - 0.08811057101411279, - 0.025894970007357188, - 0.08864926199021284, - 0.09144930700131226, - 0.22694939299253747, - 0.0439087249833392, - 0.14239911497861613, - 0.03617983000003733, - 0.1512393689918099, - 0.1875932899711188, - 0.11006252800871152, - 0.10324256202147808, - 0.04821289599931333, - 0.04223711899248883, - 1.4664685239986284, - 0.041955012013204396, - 0.14620630697754677, - 0.21918344801815692, - 0.0372455039905617, - 0.03194827801780775, - 0.0536674189788755, - 0.06310388501151465, - 0.27294770801381674, - 0.19809930205519777, - 
0.16163633398537058, - 0.15251630697457585, - 0.06286587099020835, - 0.10258765600156039, - 1.162768276029965, - 0.07587457699992228, - 0.05775662798259873, - 0.07251690099656116, - 0.141798721961095, - 1.4098454280028818, - 0.19870782001817133, - 0.10993126998073421, - 0.031027795994305052, - 0.09831007098546252, - 0.41028585002641194, - 0.16846141299174633, - 0.1628311489475891, - 0.09032799203123432, - 0.10641685703012627, - 0.02099216400529258, - 0.12998146502650343, - 0.06491752600413747, - 0.07197501799964812, - 0.14611319403047673, - 0.11961847799830139, - 0.16029398002137896, - 0.10992151800019201, - 0.05132669101294596, - 0.09024521702667698, - 0.11478864395758137, - 0.12456596698029898, - 0.020730636999360286, - 0.046725898995646276, - 0.24239868205040693, - 0.05201126397878397, - 0.1463835920440033, - 0.146452655972098, - 0.07834263901168015, - 0.11359772502328269, - 0.07803515304112807, - 0.07869110697356518, - 0.011004696003510617, - 0.01630590800778009, - 0.03679921998991631, - 0.08918131899554282, - 0.07490232199779712, - 0.12390128801052924, - 0.015797467989614233, - 0.24901325700921007, - 0.1316048739827238, - 0.056921257011708803, - 0.08449024699802976, - 0.1100993520085467, - 0.10409191498183645, - 0.025808941005379893, - 0.0678147330036154, - 0.14008370699593797, - 0.09401230099319946, - 0.09891040898219217, - 0.08872751897433773, - 0.13898649596376345, - 0.08354250498814508, - 0.020896661008009687, - 0.021465843965415843, - 0.11093447799794376, - 0.05288792100327555, - 0.06252860998210963, - 0.06924055202398449, - 0.11818173000938259, - 0.09874760299862828, - 0.1827699770074105, - 0.05982314300490543, - 0.059043997011031024, - 0.09511917398776859, - 0.10897004800790455, - 0.1819575979752699, - 0.08482373697916046, - 0.24399267497938126, - 0.03117286498309113, - 0.08421734497824218, - 0.01619167900935281, - 0.06936435899115168, - 0.025818698006332852, - 0.03650699199351948, - 0.11047759900975507, - 0.08503359601309057, - 0.07854586101893801, - 
0.0796215129958, - 0.11648684499959927, - 0.07410184101900086, - 0.11009889301203657, - 0.07725347400992177, - 0.16297091700835153, - 0.02036139699339401, - 0.11209203301405068, - 0.07754800099064596, - 0.053530414996203035, - 0.020812706992728636, - 0.04286810400662944, - 0.08839281799737364, - 0.05769145399972331, - 0.03193890399415977, - 0.1396984830062138, - 0.07319580600596964, - 0.09454060101415962, - 0.10333950202038977, - 0.09920326300198212, - 0.058665915013989434, - 0.0936306610237807, - 0.10001649997138884, - 0.05357375401945319, - 0.184503926007892, - 0.09449856402352452, - 0.09521203301846981, - 0.20327135801198892, - 0.03630786799476482, - 0.03143808100139722, - 0.02120668200950604, - 0.062685639000847, - 0.12622157495934516, - 0.10988432499289047, - 0.09250344101747032, - 0.06304486001317855, - 0.288797770961537, - 0.010372752003604546, - 0.05212442297488451, - 0.08291317199473269, - 0.06710281799314544, - 0.1157255749712931, - 0.10971538703597616, - 0.1591395110299345, - 0.04181610899104271, - 0.07338235099450685, - 0.22444611899845768, - 0.07381081499624997, - 0.05753844500577543, - 0.06190064699330833, - 0.03120102900720667, - 0.04693370700988453, - 0.22028237597260159, - 0.12800955696729943, - 0.1728080739849247, - 0.010580546979326755, - 0.10642777300381567, - 0.10887728897796478, - 0.11486533799325116, - 0.20461246391641907, - 0.0471789620060008, - 0.1195884439512156, - 0.1332225940132048, - 0.08008555500418879, - 0.16742973201326095, - 0.11357387495809235, - 0.06904285900236573, - 0.07771547198353801, - 0.113795598022989, - 0.08568520801782142, - 0.06694862898439169, - 0.11230349603283685, - 0.08807287600939162, - 0.03716033800446894, - 0.09624601202085614, - 0.07767914497526363, - 0.10369765700306743, - 0.20018928500940092, - 0.10468894898076542, - 0.10010257501562592, - 0.06811767898034304, - 0.12320005797664635, - 0.010487755003850907, - 0.10646459397685248, - 0.005184642010135576, - 0.041635236993897706, - 0.15874297198024578, - 
0.021664197018253617, - 0.10501420796208549, - 0.04831299898796715, - 0.11231469399353955, - 0.20688861804956105, - 0.16159832300036214, - 0.15234090301964898, - 0.11648416001116857, - 0.0723629309941316, - 0.026634673980879597, - 0.03631463101191912, - 0.042237297995598055, - 0.11645726498682052, - 0.1905255869642133, - 0.08906350898905657, - 0.20167101902188733, - 0.07286948499677237, - 0.10312006699678022, - 0.05480834901391063, - 2.749441771011334, - 0.062365576974116266, - 0.09903886799293105, - 0.05688417598139495, - 0.09844380003050901, - 0.13829520999570377, - 0.14023608699790202, - 0.1184576510131592, - 0.10439636898809113, - 0.06829434298560955, - 0.06816482498834375, - 0.2629594279715093, - 0.10215405200142413, - 0.2167910840071272, - 0.08410601399373263, - 0.031026880984427407, - 0.07935979502508417, - 0.14021122101985384, - 0.04036204899603035, - 0.09132617802242748, - 0.06282565701985732, - 0.068908575020032, - 0.04154051899968181, - 0.0987778259877814, - 0.04665144199680071, - 0.13886594197538216, - 0.02234637099900283, - 0.04802005201054271, - 0.07242467298055999, - 0.02904923698224593, - 0.20477051101624966, - 0.2158037450426491, - 0.09477120102383196, - 0.07284735499706585, - 0.12321427902497817, - 0.09139177900215145, - 2.722919849024038, - 0.2774492689932231, - 0.1417335890000686, - 0.11092335898138117, - 0.05564450401288923, - 0.13398031900578644, - 0.22390430097584613, - 0.1637142149702413, - 0.2272928979655262, - 0.09674043199629523, - 0.16311162698548287, - 0.14259211001626682, - 0.20968752005137503, - 0.20787018399278168, - 0.16222523398755584, - 0.1496565040142741, - 0.1568982939934358, - 0.1998918900062563, - 0.2580729669862194, - 0.27128810400608927, - 0.28563577999011613, - 0.5454269979818491, - 0.1884459619905101, - 0.25723148697579745, - 0.18721037400246132, - 0.32857122900895774, - 0.24705301801441237, - 0.1828937620157376, - 0.3037348280195147, - 0.3444902120245388, - 0.2254064200387802, - 0.20395859798009042, - 0.22396318000392057, 
- 0.3377018950122874, - 0.3490637060167501, - 0.5309083980391733, - 0.336766917956993, - 0.2545015300420346, - 0.39292443194426596, - 0.23253486400062684, - 0.2394695600232808 - ], - "generation_latencies": [ - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 
0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, 
- 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0 - ], - "throughput_timeline": [], - "prefill_latencies": [ - 0.02806143400084693, - 0.0318174879939761, - 0.031037744993227534, - 0.09112562600057572, - 0.07278458299697377, - 0.08437604999926407, - 0.03549889801070094, - 0.03176944200822618, - 0.02651031500136014, - 0.02416356401226949, - 0.03998900200531352, - 0.021697548989322968, - 0.012488531996496022, - 0.08470455199130811, - 0.13012938501196913, - 0.08465122101188172, - 0.13741148600820452, - 0.14027382399945054, - 0.022811004993855022, - 0.10475173901068047, - 0.02825760999985505, - 0.056556527997599915, - 0.06835289800073951, - 0.07550077298947144, - 0.004215538006974384, - 0.14221332300803624, - 0.08313594601349905, - 0.05233958500321023, - 0.05244880999089219, - 0.05221419200825039, - 0.08304757600126322, - 0.05438509500527289, - 0.05830703099491075, - 0.059337854996556416, - 0.059449667009175755, - 0.08887125500768889, - 0.08905546499590855, - 0.0894371879985556, - 0.08377681698766537, - 0.13792075699893758, - 0.08868977500242181, - 0.08956275800301228, - 0.09011384399491362, - 0.09044887300115079, - 0.05953123299696017, - 0.09020914700522553, - 0.09080287099641282, - 0.13963063500705175, - 0.08676872699288651, - 0.08917629400093574, - 0.10641964100068435, - 0.06597194999631029, - 0.09810354799265042, - 0.05757626300328411, - 0.06861077099165414, - 0.06295704199874308, - 0.06445907699526288, - 0.07466473200474866, - 0.02726938399428036, - 0.02717219501209911, - 0.037078377994475886, - 0.1329313070018543, - 0.11861669199424796, - 0.11259382500429638, - 0.14146649300528225, - 0.035044226999161765, - 0.023061945001245476, - 0.027082216998678632, - 0.020707265008240938, - 0.028914281996549107, - 0.026846512002521195, - 0.029316416999790817, - 0.008559326000977308, - 0.03055284300353378, - 0.010269509992212988, - 0.021368097004597075, - 0.020060310009284876, - 0.04913145399768837, - 0.04764213500311598, - 0.055075304000638425, - 0.054921800998272374, - 
0.012357782004983164, - 0.0615385430137394, - 0.06319590400380548, - 0.05490424399613403, - 0.019370327005162835, - 0.048378545994637534, - 0.04865225500543602, - 0.02972004300681874, - 0.029552865991718136, - 0.08420221800042782, - 0.02516511399880983, - 0.037936430002446286, - 0.03050227399216965, - 0.0376056899985997, - 0.019448213002760895, - 0.12856636100332253, - 0.1419043089990737, - 0.11865742900408804, - 0.009569167988956906, - 0.11575847199128475, - 0.0915139990102034, - 0.02411806900636293, - 0.01615326400496997, - 0.009637398004997522, - 0.09881009500531945, - 0.01706346799619496, - 0.017249665994313546, - 0.017278491010074504, - 0.09377371300070081, - 0.03444317899993621, - 0.09692146099405363, - 0.08802712900796905, - 0.11141834800946526, - 0.08999880599731114, - 0.07851983899308834, - 0.08411185299337376, - 0.02071872999658808, - 0.09768577700015157, - 0.02617514801386278, - 0.0164600490097655, - 0.0388175650004996, - 0.03650325199123472, - 0.09034935598901939, - 0.031141503000981174, - 0.04088131699245423, - 0.030229433003114536, - 0.0320119810057804, - 0.036789043006137945, - 0.03212798899039626, - 0.020612788997823372, - 0.02639097000064794, - 0.016099215994472615, - 0.010266605007927865, - 0.010588946999632753, - 0.04174089099979028, - 0.02746749299694784, - 0.02610832199570723, - 0.020621164003387094, - 0.34777725000458304, - 0.015377047006040812, - 0.005208925998886116, - 0.04662110201024916, - 0.021342129999538884, - 0.020815570998820476, - 0.015581655010464601, - 0.0, - 0.026194306003162637, - 0.02594869698805269, - 0.021391387010226026, - 0.0505513950047316, - 0.0, - 0.0216380769998068, - 0.041380519993253984, - 0.38714127900311723, - 0.04659965599421412, - 0.015585581000777893, - 0.010804199002450332, - 0.015784086994244717, - 0.02308992100006435, - 0.020798896992346272, - 0.049597946999710985, - 0.03151756900479086, - 0.025957867997931316, - 0.020857023002463393, - 0.0, - 0.053088404005393386, - 0.01592367400007788, - 0.03182821400696412, 
- 0.04139601001224946, - 0.026355962007073686, - 0.015711359010310844, - 0.015790231991559267, - 0.15251916299166624, - 0.015749309008242562, - 0.03226824699959252, - 0.005461714987177402, - 0.04167480800242629, - 0.022884550009621307, - 0.041752312987227924, - 0.182450748005067, - 0.02592996699968353, - 0.046197429997846484, - 0.18469133500184398, - 0.010900573004619218, - 0.02573106699855998, - 0.03098565600521397, - 0.026892284004134126, - 0.021103788996697403, - 0.020439344996702857, - 0.010438476005219854, - 0.03607374700368382, - 0.02092690499557648, - 0.0, - 0.020711085991933942, - 0.0, - 0.021026197005994618, - 0.020554512986564077, - 0.015885622997302562, - 0.03641797600721475, - 0.02122088299074676, - 0.02080164200742729, - 0.02617036399897188, - 0.015771677004522644, - 0.03114503998949658, - 0.031607163007720374, - 0.010836353991180658, - 0.02065308700548485, - 0.026049585998407565, - 0.025754285990842618, - 0.010498183997697197, - 0.020714460988529027, - 0.0, - 0.010338574007619172, - 0.04143054099404253, - 0.02556108900171239, - 0.020829803004744463, - 0.03128388700133655, - 0.010334371996577829, - 0.015400415009935386, - 0.030843931992421858, - 0.022127831995021552, - 0.02114973599964287, - 0.0, - 0.036224499010131694, - 0.041306539002107456, - 0.0, - 0.05140877699886914, - 0.031005056007415988, - 0.025643978005973622, - 0.0, - 0.015670937005779706, - 0.0, - 0.026320855002268218, - 0.041932976004318334, - 0.011001352002494968, - 0.0, - 0.005920661991694942, - 0.026826336994417943, - 0.00530909800727386, - 0.0, - 0.026218324986984953, - 0.021015858990722336, - 0.020950707999872975, - 0.025949094007955864, - 0.0, - 0.02111421299923677, - 0.025674054995761253, - 0.015611825991072692, - 0.023058838007273152, - 0.020570048989611678, - 0.022065219993237406, - 0.02607807000458706, - 0.02565704900189303, - 0.020765840003150515, - 0.0, - 0.041282889011199586, - 0.020977552005206235, - 0.03404544999648351, - 0.02080464300524909, - 0.027108152004075237, - 
0.034127562001231126, - 0.010302216003765352, - 0.03604251900105737, - 0.020955965999746695, - 0.0, - 0.02141158501035534, - 0.027133650000905618, - 0.03125486700446345, - 0.03125614101008978, - 1.0945826309907716, - 0.022266924992436543, - 0.016408529001637362, - 0.021739379997598007, - 0.02118139799858909, - 1.1057568579999497, - 0.021048058988526464, - 0.0, - 0.0, - 0.0, - 0.02085206500487402, - 0.0161525849980535, - 0.0, - 0.025786701997276396, - 0.025935440993634984, - 0.0, - 0.0, - 0.0, - 0.0, - 0.02061643300112337, - 0.0, - 0.011091634005424567, - 0.0, - 0.005314072986948304, - 0.05124067699944135, - 0.010742498998297378, - 0.015917648997856304, - 0.0, - 0.04197028999624308, - 0.033421525004087016, - 0.021791936000226997, - 0.026290377994882874, - 0.026787591006723233, - 0.03144108899869025, - 0.0, - 0.0, - 0.02638844499597326, - 0.03629962500417605, - 0.026331617002142593, - 0.036727564001921564, - 0.027902239991817623, - 0.0, - 0.0, - 0.0, - 0.02147963699826505, - 0.02088346799428109, - 0.020422278001205996, - 0.01032022200524807, - 0.03590844199061394, - 0.02234660100657493, - 0.020817732001887634, - 0.015369101005489938, - 0.025965687003917992, - 0.04659189000085462, - 0.0, - 0.020535991003271192, - 0.036298487000749446, - 0.010508224993827753, - 0.020533632996375673, - 0.02554297199822031, - 0.010349154996220022, - 0.026586920997942798, - 0.02120264500263147, - 0.010469300003023818, - 0.026143126990064047, - 0.015512255995417945, - 0.010383686007116921, - 0.02622710500145331, - 0.01046255799883511, - 0.02101507899351418, - 0.02110579700092785, - 0.010482116995262913, - 0.015413140004966408, - 0.01555943299899809, - 0.0, - 0.01091947099484969, - 0.025894962003803812, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.01592823999817483, - 0.0, - 0.027113552001537755, - 0.02081206398725044, - 0.020986710995202884, - 0.0, - 0.0, - 0.010253626998746768, - 0.026818137004738674, - 0.02641779500117991, - 0.021713905996875837, - 0.022283628990408033, - 
0.04143255201051943, - 0.0, - 0.01561596400279086, - 0.016325714997947216, - 0.01617494500533212, - 0.020569971995428205, - 0.026085038000019267, - 0.020710834010969847, - 0.015658763993997127, - 0.016155626988620497, - 0.0, - 0.02881559399247635, - 0.031174988005659543, - 0.0, - 0.015611556998919696, - 0.02597268200770486, - 0.0, - 0.025963395004509948, - 0.03105243000027258, - 0.03196975000901148, - 0.02095143600308802, - 0.0, - 0.020705182003439404, - 0.023500605006120168, - 0.0, - 0.01036837900755927, - 0.02130756100814324, - 0.04782380600227043, - 0.020724376998259686, - 0.04137356900901068, - 0.0, - 0.0, - 0.04762132200994529, - 0.0, - 0.025624585003242828, - 0.010910922996117733, - 0.025708360990392976, - 0.021891428012168035, - 0.010421148996101692, - 0.025776822003535926, - 0.01623969200591091, - 0.020750692987348884, - 0.023335106001468375, - 0.0, - 0.015477339999051765, - 0.0, - 0.0, - 0.016540874989004806, - 0.015883897998719476, - 0.0, - 0.0, - 0.0, - 0.010959356994135305, - 0.0, - 0.025698298995848745, - 0.0, - 0.0, - 0.017253462006920017, - 0.015883052998105995, - 0.021191454987274483, - 0.0, - 0.015620748003129847, - 0.020919814007356763, - 0.028015064002829604, - 0.0, - 0.0, - 0.030471723002847284, - 0.035905359996831976, - 0.020404234994202852, - 0.0, - 0.0, - 0.017012080003041774, - 0.030830884003080428, - 0.0, - 0.0, - 0.0, - 0.036631041002692655, - 0.030921293000574224, - 0.031130748000578023, - 0.03089778299909085, - 0.0, - 0.0, - 0.0, - 0.016184644991881214, - 0.0, - 0.0, - 0.0, - 0.0, - 0.010277608991600573, - 0.0, - 0.010435275005875155, - 0.022263324994128197, - 0.0, - 0.0, - 0.0, - 0.032319607998942956, - 0.026358650997281075, - 0.020546390005620196, - 0.016932582992012613, - 0.02080753300106153, - 0.0210999810078647, - 0.020764119995874353, - 0.0, - 0.020853807000094093, - 0.0, - 0.0, - 0.0511571720126085, - 0.0, - 0.0, - 0.0, - 0.015609687005053274, - 0.01558039800147526, - 0.010612546990159899, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, 
- 0.012278393987799063, - 0.026691342995036393, - 0.03117527099675499, - 0.020494274009251967, - 0.01551322199520655, - 0.010824212004081346, - 0.011286603010375984, - 0.0, - 0.01631073399039451, - 0.021416221003164537, - 0.0, - 0.010697777004679665, - 0.0, - 0.0, - 0.0, - 0.011161290996824391, - 0.020978435990400612, - 0.0, - 0.015486642005271278, - 0.0, - 0.031539446994429454, - 0.0, - 0.04689917300129309, - 0.01566615300544072, - 0.015910793998045847, - 0.03102960299293045, - 0.0313214740017429, - 2.595607993003796, - 0.031272545995307155, - 0.025671896000858396, - 0.0, - 0.0, - 0.020784774009371176, - 0.0, - 0.03619736299151555, - 0.0, - 0.015472244995180517, - 0.0, - 0.026177018007729203, - 0.02114790199266281, - 0.020480787992710248, - 0.0, - 0.0, - 0.0, - 0.031200239987811074, - 0.019055609009228647, - 0.0, - 0.010390064999228343, - 0.015571449999697506, - 0.0, - 0.0, - 0.0, - 0.012317475004238077, - 0.17955069700838067, - 0.031332935992395505, - 0.0, - 0.02736275400093291, - 0.008716860989807174, - 0.008299153996631503, - 0.1524175579979783, - 0.15328714399947785, - 0.18050788099935744, - 0.18058539000048768 - ], - "decode_latencies": [ - 0.027651033000438474, - 0.006605071990634315, - 0.02089744199474808, - 0.005496166995726526, - 0.011213231002329849, - 0.005978207991574891, - 0.006576113999472, - 0.0001118669897550717, - 0.04852117800328415, - 0.007014770002570003, - 0.006979001002036966, - 0.007144887000322342, - 0.0069479590019909665, - 0.022432222991483286, - 0.0072185250028269365, - 0.00838228699285537, - 0.02190375200007111, - 0.022629516999586485, - 0.07178710200241767, - 0.0001926319964695722, - 0.016264595004031435, - 0.01611638600297738, - 0.02473794700927101, - 0.008884360999218188, - 0.025901763001456857, - 0.018793662005919032, - 0.007490830001188442, - 0.006404255997040309, - 0.0073211619892390445, - 0.009254862001398578, - 0.006479561008745804, - 0.009537232996081002, - 0.00809864400071092, - 0.0675581559917191, - 0.007750042001134716, - 
0.011585860003833659, - 0.0203642470005434, - 0.010169430999667384, - 0.015838115010410547, - 0.007706181990215555, - 0.01388101899647154, - 0.006342973007122055, - 0.008802521988400258, - 0.007769219009787776, - 0.012889371006167494, - 0.009772605000762269, - 0.00669027199910488, - 0.006047856993973255, - 0.024923649994889274, - 0.05176100099924952, - 0.007040971991955303, - 0.01954745699185878, - 0.0077840839949203655, - 0.028114077998907305, - 0.007395444001303986, - 0.0028130179998697713, - 0.013286543995491229, - 0.11647849201108329, - 0.006948453010409139, - 0.0329160929977661, - 0.006534330997965299, - 0.013164492003852502, - 0.013554201999795623, - 0.006367945010424592, - 0.022727788003976457, - 0.014681414992082864, - 3.8461992517113686e-05, - 0.11985574501159135, - 0.006168063002405688, - 0.010116274002939463, - 0.019944978004787117, - 0.025791466992814094, - 0.016989188006846234, - 0.009958929003914818, - 0.031867785000940785, - 0.013834220007993281, - 0.010544487988227047, - 0.005443343005026691, - 0.003221564009436406, - 0.006771928005036898, - 0.009083448007004336, - 0.00906618099543266, - 0.015603708001435734, - 0.02402393899683375, - 0.010324375005438924, - 0.013867159999790601, - 0.0034889890084741637, - 0.013140627008397132, - 0.0032488909928360954, - 0.006139124001492746, - 0.02037588899838738, - 0.010382888998719864, - 0.002185787001508288, - 0.005378364003263414, - 0.005207973998039961, - 0.010392573996796273, - 0.01308921500458382, - 0.012455083997338079, - 0.017414564994396642, - 0.010415468001156114, - 0.025978108009439893, - 0.015672351990360767, - 0.013388407998718321, - 0.030906381987733766, - 0.005159835010999814, - 0.026519412000197917, - 7.128099969122559e-05, - 0.1801720670046052, - 0.015499897010158747, - 0.015361814002972096, - 0.010363135996158235, - 0.005153615013114177, - 0.002840511006070301, - 0.0154372550023254, - 0.015675460002967156, - 0.013625655003124848, - 0.013226993003627285, - 0.001677125008427538, - 
0.013317988996277563, - 0.005133295999257825, - 0.016024746000766754, - 0.010409422000520863, - 0.005141376008396037, - 0.005186841997783631, - 0.005276613999740221, - 0.0036247430107323453, - 0.020620613009668887, - 0.014531369000906125, - 0.005237411998677999, - 0.02059585798997432, - 0.010264864002238028, - 0.020518178993370384, - 0.0004172850021859631, - 0.005187968999962322, - 0.010289501995430328, - 0.015339075995143503, - 0.010324514005333185, - 0.00515309399634134, - 0.010235929003101774, - 0.005244173007667996, - 0.005273809001664631, - 0.01664313199580647, - 0.011352293993695639, - 0.010311805002857, - 0.0174478970002383, - 0.01558243200997822, - 0.007220371990115382, - 0.006269973004236817, - 0.0012809260078938678, - 0.010393814009148628, - 0.011481850000564009, - 0.006199956013006158, - 0.005570103996433318, - 0.005270597001072019, - 0.005198230006499216, - 0.005302188990754075, - 0.02585095100221224, - 0.02171171599184163, - 0.010318458007532172, - 0.010300916008418426, - 0.010485129998414777, - 0.0053500460053328425, - 0.00426714398781769, - 0.00015254798927344382, - 0.005170164004084654, - 6.0664009652100503e-05, - 0.005453600999317132, - 0.01016459200764075, - 0.005211007999605499, - 0.005247312001301907, - 0.015780265006469563, - 0.0053679300035582855, - 0.005354606997570954, - 0.005559696001000702, - 0.01038075600808952, - 0.010321681998902932, - 0.010633006997522898, - 0.015778324013808742, - 0.0051600959995994344, - 0.005145074988831766, - 4.2633997509256005e-05, - 0.00014667899813503027, - 0.010326290008379146, - 0.00519764400087297, - 0.010161199010326527, - 0.005232752999290824, - 0.005594476999249309, - 0.005198066995944828, - 0.0105676509992918, - 0.010336382008972578, - 0.01569752699288074, - 0.0152835789922392, - 0.01023218099726364, - 0.005326780999894254, - 0.025729186003445648, - 0.00010642599954735488, - 0.005358445996535011, - 0.010307213000487536, - 0.010296787004335783, - 0.00028716800443362445, - 0.010319452994735911, - 
0.005188177994568832, - 0.005581257995800115, - 0.010378389997640625, - 0.010258588008582592, - 0.005208556001889519, - 0.021129904009285383, - 0.01944676900166087, - 0.010161901998799294, - 0.010548448000918142, - 0.005166464994545095, - 0.03557594200538006, - 0.005172662000404671, - 0.005126806994667277, - 0.0051693079876713455, - 0.005188504001125693, - 0.010155794996535406, - 0.010247440994135104, - 0.0052218110067769885, - 0.02056349700433202, - 0.005162696004845202, - 0.00532872301118914, - 0.010583312003291212, - 0.005299951997585595, - 0.01584916999854613, - 0.005166250004549511, - 0.0052075869898544624, - 6.007000047247857e-05, - 9.56110015977174e-05, - 0.01528616200084798, - 0.015502159993047826, - 0.01031844099634327, - 0.010603381000692025, - 0.015363158003310673, - 0.015495396990445442, - 7.011400884948671e-05, - 0.010403895998024382, - 0.005174079007701948, - 0.005157956009497866, - 0.005147445001057349, - 0.010441279009683058, - 0.0052379900007508695, - 0.005102429990074597, - 0.019424063008045778, - 0.015919099998427555, - 5.654399865306914e-05, - 0.010548562990152277, - 0.010354097001254559, - 0.010424633990623988, - 0.015891258997726254, - 0.02036390699504409, - 0.005188251001527533, - 0.010383936009020545, - 0.016039933005231433, - 0.01601565899909474, - 0.010628325995639898, - 0.01022759499028325, - 0.010342369991121814, - 0.010201159995631315, - 0.005194344004848972, - 0.010571587001322769, - 0.0051194859988754615, - 0.010447013002703898, - 0.005830555004649796, - 0.005185807996895164, - 0.010337299012462609, - 0.015494991006562486, - 0.010384590990724973, - 8.177700510714203e-05, - 0.010676236008293927, - 0.00514969699725043, - 0.005231146002188325, - 0.01077849100693129, - 0.02048855400062166, - 0.01041464100126177, - 0.02075236699602101, - 0.005162407003808767, - 0.015551962002064101, - 0.010246833000564948, - 0.005787399990367703, - 0.010259303991915658, - 0.01567412100848742, - 0.01450104899413418, - 0.0103165290056495, - 
0.010335682993172668, - 0.010348171010264196, - 0.010823479999089614, - 0.010481908000656404, - 0.00512645099661313, - 0.016626176991849206, - 0.010312508005881682, - 0.005324494995875284, - 0.005174278994672932, - 0.020480895007494837, - 0.015359994999016635, - 0.010392968004452996, - 0.01267496201035101, - 0.005166743998415768, - 0.005160336004337296, - 0.031007219993625768, - 0.005204166998737492, - 0.005473883007653058, - 0.0051782020018436015, - 0.010416814999189228, - 0.015814127997145988, - 0.010496526985662058, - 5.936900561209768e-05, - 0.01014724699780345, - 0.015530841003055684, - 0.010220463998848572, - 0.010286914999596775, - 0.0051086790044792, - 0.005195227000513114, - 0.010245109995594248, - 0.020774240998434834, - 0.00516043599054683, - 0.010297255008481443, - 0.010467444008099847, - 0.010779080010252073, - 0.005164036992937326, - 0.005178837003768422, - 0.005186395996133797, - 0.010344840004108846, - 0.015594580996548757, - 0.020630273007554933, - 0.019813390012132004, - 0.020552237998344935, - 0.016083213995443657, - 0.010301711008651182, - 0.015328893001424149, - 0.010461754995048977, - 0.005154628001037054, - 9.07789944903925e-05, - 0.016416589001892135, - 3.438600106164813e-05, - 0.005231693008681759, - 0.010527122998610139, - 0.015706229998613708, - 0.027524639997864142, - 0.015692252985900268, - 0.005127698997966945, - 0.010476981013198383, - 0.005157154999324121, - 0.005503941996721551, - 0.005166140996152535, - 0.0051324710075277835, - 0.005168962001334876, - 0.02104485699965153, - 0.010335975995985791, - 0.005195815989281982, - 0.006165020007756539, - 0.015872914998908527, - 0.011023440994904377, - 0.010386861991719343, - 0.025741912992089055, - 0.005329561012331396, - 0.005312547989888117, - 0.010329429991543293, - 0.005297520008753054, - 0.0052251000015530735, - 0.010142630999325775, - 0.005139512999448925, - 0.005120877001900226, - 0.010471605986822397, - 0.005114897998282686, - 0.00523082900326699, - 0.0205536290013697, - 
0.02031622501090169, - 0.010168554988922551, - 0.010449198001879267, - 0.005912103995797224, - 0.01566924598591868, - 0.0051439059898257256, - 0.005123463997733779, - 0.005267964996164665, - 0.010311089994502254, - 0.005208379996474832, - 0.010475563001818955, - 0.010335739993024617, - 0.005228198002441786, - 0.015583785003400408, - 0.0053435479931067675, - 0.015827505994820967, - 0.005202765998546965, - 0.005410944999312051, - 0.0051765820098808035, - 0.005182270993827842, - 0.010588436998659745, - 0.01095165399601683, - 0.00588463299209252, - 0.005584636994171888, - 0.005348270002286881, - 0.015891529998043552, - 7.411801198031753e-05, - 0.01072653399023693, - 0.005175096011953428, - 0.006204103003256023, - 0.005214120013988577, - 0.015763454997795634, - 0.010408515998278745, - 0.015238956999382935, - 0.010263884003506973, - 0.0104544369969517, - 0.005263965998892672, - 0.010614817001624033, - 0.005100009992020205, - 0.005156804996659048, - 0.010628439995343797, - 0.010459502998855896, - 0.01585694999084808, - 0.0051190920057706535, - 0.01021466100064572, - 0.010284920004778542, - 0.010149689987883903, - 0.005201039995881729, - 0.010275451000779867, - 0.010390059993369505, - 0.015364005987066776, - 0.025390837006852962, - 0.028164993011159822, - 0.005346229008864611, - 0.005113357008667663, - 0.005169190990272909, - 0.005273347007459961, - 0.005214320990489796, - 0.01120725400687661, - 0.01073830499080941, - 0.005437492000055499, - 0.0051193530089221895, - 0.015392632005386986, - 0.005661011004121974, - 0.010300474998075515, - 0.027343987007043324, - 0.020508159999735653, - 0.005165595997823402, - 0.005843509003170766, - 0.005113450999488123, - 0.010397746998933144, - 0.005234134994680062, - 0.00512002500181552, - 0.011295240008621477, - 0.01661005000642035, - 0.015317434997996315, - 0.00533407399780117, - 0.025751160996151157, - 0.010253624990582466, - 0.015493147991946898, - 0.005329609994078055, - 0.005232687995885499, - 0.010863739007618278, - 
0.005144212991581298, - 0.005156604995136149, - 0.015419406990986317, - 0.010208580002654344, - 0.005162457004189491, - 0.01588641299167648, - 0.005164294998394325, - 0.013142726005753502, - 0.005977920998702757, - 0.005152571000508033, - 0.010793427005410194, - 0.010355896010878496, - 0.005384989999583922, - 0.005346674006432295, - 0.005193543998757377, - 0.005201181003940292, - 0.005277589996694587, - 0.005318309995345771, - 0.010400326995295472, - 0.005144687995198183, - 0.005200339001021348, - 0.015329927002312616, - 0.010369876996264793, - 0.020407286006957293, - 0.040877423001802526, - 0.005152689991518855, - 0.005630558996926993, - 0.005144086011569016, - 0.01542114099720493, - 0.005191165008000098, - 0.010289982994436286, - 8.666299981996417e-05, - 0.00514239699987229, - 0.010261136005283333, - 0.010386887995991856, - 0.01015387500228826, - 0.010735985008068383, - 0.015502873007790186, - 0.0156763880077051, - 0.020904437988065183, - 0.005157088002306409, - 0.0051758899935521185, - 0.01575962499191519, - 0.005217401994741522, - 0.005132382997544482, - 0.01021329200011678, - 0.005407071992522106, - 0.00516161099949386, - 0.010425458996905945, - 0.005172114004381001, - 0.015938034004648216, - 3.2880008802749217e-05, - 0.011081299991928972, - 0.005358541995519772, - 0.005182578999665566, - 0.0051352580048842356, - 0.005186276001040824, - 0.020243033999577165, - 0.005224042994086631, - 0.010301114001777023, - 0.01026615900627803, - 0.010374753008363768, - 0.017961314995773137, - 0.00513826499809511, - 0.010740601006546058, - 0.005135742001584731, - 0.005605270998785272, - 0.005173762998310849, - 0.010487541003385559, - 0.005297491996316239, - 0.0051139150018570945, - 0.005192637996515259, - 0.010476784998900257, - 0.005136749998200685, - 0.012596133994520642, - 0.005287939988193102, - 0.015286091991583817, - 0.007479154999600723, - 0.02255002399033401, - 0.046042134999879636, - 0.01838906698685605, - 0.005158069005119614, - 0.0051606269989861175, - 
0.0014987000031396747, - 0.005749850999563932, - 0.01541886999621056, - 0.0434942829888314, - 0.011690172003000043, - 0.010347328992793337, - 0.017944298000657, - 0.018476308992831036, - 0.0052484869956970215, - 0.010327630996471271, - 0.01372094900580123, - 0.012536244001239538, - 0.005188429990084842, - 0.019137133000185713, - 0.14830706699285656, - 0.020266388994059525 - ], - "multi_turn_cache_hits": 78, - "multi_turn_cache_misses": 314, - "seed": 42, - "summary": { - "total_requests": 549, - "total_tokens": 148262, - "elapsed_time": 51.71758222579956, - "avg_throughput_tokens_per_sec": 2866.7620105032443, - "requests_per_second": 10.615345427461394, - "end_to_end_latency_ms": { - "mean": 25762.78353464132, - "p50": 26422.675355002866, - "p95": 52627.93201059976, - "p99": 52735.03512180003 - }, - "storage_io_latency_ms": { - "mean": 179.86305656335347, - "p50": 110.92335898138117, - "p95": 494.66368038556544, - "p99": 1180.9113438008346 - }, - "generation_latency_ms": { - "mean": 0.0, - "p50": 0.0, - "p95": 0.0, - "p99": 0.0 - }, - "cache_stats": { - "cache_hit_rate": 0.9301972685887708, - "cache_hits": 5517, - "cache_misses": 414, - "gpu_entries": 435, - "cpu_entries": 0, - "nvme_entries": 0, - "gpu_memory_used_gb": 7.3746337890625, - "cpu_memory_used_gb": 0.0, - "offloads_cpu": 0, - "offloads_nvme": 0, - "storage_health": { - "overall_status": "PASS", - "criteria": [ - { - "name": "Cache Hit Rate > 30%", - "target": 0.3, - "actual": 0.9301972685887708, - "unit": "ratio", - "passed": true - } - ], - "passed_count": 1, - "total_count": 1 - }, - "prefill_writes": 435, - "decode_reads": 5517, - "prefill_bytes_written_gb": 7.3746337890625, - "decode_bytes_read_gb": 96.243408203125, - "system_prompt_hits": 968, - "common_phrase_hits": 0, - "user_cache_hits": 4471, - "multi_turn_hits": 78, - "total_read_bytes": 103340572672, - "total_write_bytes": 7918452736, - "total_read_gb": 96.243408203125, - "total_write_gb": 7.3746337890625, - "read_write_ratio": 
13.050601691688875, - "read_iops": 5517, - "write_iops": 435, - "gpu_read_p50_ms": 10.280613991199061, - "gpu_read_p95_ms": 29.388316598487958, - "gpu_read_p99_ms": 118.34518463176214, - "gpu_write_p50_ms": 26.049585998407565, - "gpu_write_p95_ms": 129.03526820591642, - "gpu_write_p99_ms": 292.3280389036466 - }, - "qos_metrics": { - "interactive": { - "total_requests": 549, - "latency_ms": { - "mean": 25762.783534641323, - "p50": 26422.675355002866, - "p95": 52627.93201059976, - "p99": 52735.03512180003, - "max": 52743.554184999084 - }, - "sla": { - "target_p95_ms": 50, - "actual_p95_ms": 52627.93201059976, - "compliance": 0.0, - "met": false - } - }, - "responsive": { - "no_data": true - }, - "batch": { - "no_data": true - } - }, - "prefix_cache_stats": { - "prefix_hits": 100, - "prefix_misses": 449, - "system_prompt_reuse": 100, - "common_phrase_reuse": 0, - "bytes_saved": 87293952 - }, - "autoscaling_stats": [], - "autoscaling_summary": null, - "multi_turn_stats": { - "cache_hits": 78, - "cache_misses": 314, - "hit_rate": 0.1989795918367347 - } - } -} \ No newline at end of file diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_gpu_only_trial3.json b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_gpu_only_trial3.json deleted file mode 100644 index bd940a26..00000000 --- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_gpu_only_trial3.json +++ /dev/null @@ -1,2875 +0,0 @@ -{ - "requests_completed": 549, - "total_tokens_generated": 147313, - "total_storage_io_latency": 78.35206160502275, - "total_generation_latency": 0.0, - "end_to_end_latencies": [ - 0.15975962800439447, - 0.18225778600026388, - 0.23585739400004968, - 0.24182755700894631, - 0.25305889500305057, - 0.27834574700682424, - 0.27927010899293236, - 0.46459749700443354, - 0.47679414101003204, - 0.4830173689988442, - 0.5306201170023996, - 0.5328180499927839, - 0.6076984289975371, - 
0.6085127139958786, - 0.6439369319996331, - 0.6581228059949353, - 0.6662103010021383, - 0.6659408980049193, - 0.680180723007652, - 0.6810294070019154, - 0.6836641830013832, - 0.6832668169954559, - 0.6848867799999425, - 0.6851768870110391, - 0.704176045008353, - 0.7969855179981096, - 0.7995650470111286, - 0.799839624989545, - 0.8058095660089748, - 0.8186188560066512, - 0.8268315269961022, - 0.833929963002447, - 0.8521210569888353, - 0.8522486299916636, - 0.8519476840010611, - 0.8662878890027059, - 0.8734554709953954, - 0.8741378980048466, - 0.8806320389994653, - 0.8826046009926358, - 0.8827190330048325, - 0.8826250609999988, - 0.8831004439998651, - 0.8861056539899437, - 0.8862668559886515, - 0.8861570359877078, - 0.8864993760071229, - 0.8997483969869791, - 0.9052424310066272, - 0.9077968089986825, - 0.9122009470011108, - 0.9146547549898969, - 0.9210292589996243, - 0.9197341249964666, - 0.9204372499953024, - 0.9221777359925909, - 0.9227043690043502, - 0.9237482220050879, - 0.9234317099908367, - 0.9233350620052079, - 0.9262294180080062, - 0.9267812719917856, - 0.9278118049987825, - 0.9305167160055134, - 0.9299295370001346, - 1.0043953919957858, - 1.0106020789971808, - 1.2239635570003884, - 1.325518206009292, - 1.9098352090077242, - 2.008264963005786, - 2.2855144889908843, - 2.2965430850017583, - 2.3321537070005434, - 2.4026946969970595, - 2.4273875880026026, - 2.612624096000218, - 2.655867801993736, - 2.6589227889926406, - 2.865280915997573, - 2.986387733995798, - 3.03896437700314, - 3.051306831999682, - 3.2619385669968324, - 3.3083227000024635, - 3.7420123430056265, - 3.7836130060022697, - 3.8164982679882087, - 4.15422887900786, - 4.171478380012559, - 4.583441676993971, - 4.589976807998028, - 4.601794335001614, - 5.419511977001093, - 5.440271390005364, - 5.451455856993562, - 5.528560973005369, - 5.616741773992544, - 6.0415977430093335, - 6.098876992997248, - 6.099244152006577, - 6.241055371006951, - 6.335483118993579, - 6.352224372996716, - 6.573063492003712, - 
6.594891399989137, - 6.596334449001006, - 6.606924827996409, - 6.607608317004633, - 6.694455639997614, - 6.725428297009785, - 6.7612102779967245, - 6.797617561009247, - 6.836963139008731, - 6.909429020990501, - 6.90985816999455, - 7.40501182799926, - 7.762832046006224, - 7.7723560629965505, - 7.810949544000323, - 8.067866789991967, - 8.079088561993558, - 8.301016201003222, - 8.300423200998921, - 8.502951510003186, - 8.51381468299951, - 9.02063089600415, - 9.034988924002391, - 9.076958314006333, - 9.076719475997379, - 9.446408593998058, - 9.471911248998367, - 9.483273769990774, - 9.834114782992401, - 9.891402460998506, - 10.129910151998047, - 10.234980849010753, - 10.292988984991098, - 10.358873910998227, - 10.506071016003261, - 10.506006562005496, - 10.526868610002566, - 10.585325818989077, - 10.593603988003451, - 10.709997404992464, - 11.563714293995872, - 11.64275337800791, - 11.674825444992166, - 11.760390136987553, - 11.798686083988287, - 11.828479433999746, - 11.87682568198943, - 11.907064918996184, - 12.006064603992854, - 12.06347796699265, - 12.093996066992986, - 12.122950293996837, - 12.313270807993831, - 12.433206742993207, - 12.458471546997316, - 12.592714361991966, - 12.712429605002399, - 12.727474586004973, - 12.74476189201232, - 12.862087633999181, - 12.912289968007826, - 12.981605609005783, - 13.072166989994003, - 13.094460939988494, - 13.176001942993025, - 14.20249589800369, - 14.431335567001952, - 14.720414335999521, - 14.75553247400967, - 14.803960819001077, - 14.8334022580093, - 14.860926499997731, - 15.055113345995778, - 15.07560543899308, - 15.20694582699798, - 15.222756265997305, - 15.248840512998868, - 15.372788758992101, - 15.5335566069989, - 15.57956974899571, - 15.808058286987944, - 15.869983562006382, - 15.902470259999973, - 15.911156508009299, - 15.974838775990065, - 16.006004958995618, - 16.152896097992198, - 16.28548757500539, - 16.370347606993164, - 17.473879116005264, - 17.539869076004834, - 17.560873733003973, - 17.90244587601046, - 
18.056900772004155, - 18.19064478200744, - 18.217659868998453, - 18.34657570499985, - 18.41102182600298, - 18.578301924004336, - 18.623160996998195, - 18.904237663999083, - 18.930591746990103, - 19.039069247999578, - 19.111053879998508, - 19.150629153999034, - 19.15700323000783, - 19.25639336000313, - 19.26212948698958, - 19.491285074007465, - 19.5101963240013, - 19.551046505002887, - 19.804978492000373, - 19.80961011198815, - 19.8691223479982, - 19.899302707999595, - 19.900845231008134, - 20.038872983001056, - 20.072045798006002, - 21.27141015099187, - 21.28829980699811, - 21.504975438001566, - 21.56209452501207, - 21.60882009001216, - 21.70045717198809, - 21.891472439005156, - 22.062081669995678, - 22.15523924099398, - 22.206442691007396, - 22.280261071995483, - 22.310246851993725, - 22.36450065000099, - 22.380251541006146, - 22.748377529991558, - 22.758227392012486, - 22.871732437997707, - 22.875195423999685, - 23.052826813000138, - 23.07258170899877, - 23.128640628012363, - 23.12948722699366, - 23.345185119003872, - 23.380903598008445, - 23.448244847008027, - 23.459507528998074, - 23.501705065005808, - 23.69107164500747, - 23.84994761699636, - 23.959508683008607, - 24.043143524992047, - 24.063460553006735, - 24.163775986002292, - 24.303287260991056, - 24.373134973997367, - 24.72704114500084, - 24.866171670000767, - 24.899665392993484, - 26.51399002499238, - 26.529468109991285, - 26.603467745007947, - 26.614613538011326, - 26.685537544995896, - 26.70143316499889, - 26.888052803988103, - 26.994542726999498, - 27.03336112400575, - 27.095668908994412, - 27.142663645994617, - 27.196031001003576, - 27.226721623999765, - 27.254530234000413, - 27.29990388698934, - 27.32776025100611, - 27.409126069003833, - 27.435525289009092, - 27.542678176003392, - 27.671999949001474, - 27.85251721799432, - 27.86274360299285, - 27.883827614990878, - 27.96130656299647, - 28.006949196002097, - 28.076964880005107, - 28.139778174008825, - 28.150508039994747, - 28.197889974006102, - 
28.405178629996954, - 28.543530034992727, - 28.57572201199946, - 28.59392654999101, - 28.610890134004876, - 28.715162485998007, - 28.88582280999981, - 28.93604491300357, - 29.26338042099087, - 29.335042884005816, - 29.38600380299613, - 29.41238293099741, - 29.415578580999863, - 29.584665083995787, - 29.585381979995873, - 29.802310628001578, - 29.83867248799652, - 29.859597847011173, - 29.933007006999105, - 30.10890368001128, - 30.196601807008847, - 30.196955571998842, - 30.336598373003653, - 30.373254984006053, - 30.449426737992326, - 30.53972423299274, - 30.611929651000537, - 30.6587584860099, - 30.809512432009797, - 30.82518111600075, - 30.847193969995715, - 30.86687532599899, - 30.907125931000337, - 32.64204429600795, - 32.759102729993174, - 32.77404324300005, - 32.84157179099566, - 32.86669353398611, - 32.89645556900359, - 32.91659946300206, - 33.100246866000816, - 33.39550885600329, - 33.420618653995916, - 33.54483392099792, - 33.566787518007914, - 33.57218640600331, - 33.613276207994204, - 33.70082295499742, - 33.82144679200428, - 33.836298774011084, - 33.98111417000473, - 34.12966738600517, - 34.17442370099889, - 34.49390972498804, - 34.575376801993116, - 34.74402407299203, - 34.961879317008425, - 35.04007891600486, - 35.054354951003916, - 35.14247164900007, - 35.278878982004244, - 35.368628019001335, - 35.3790320840053, - 35.46671367100498, - 35.493862374991295, - 35.56074562999129, - 35.62737978999212, - 35.67860482400283, - 35.78179853600159, - 35.79833617300028, - 35.86043899599463, - 35.86531712199212, - 35.922049928994966, - 36.10372312499385, - 36.182582014997024, - 36.29472639100277, - 36.29936868700315, - 36.32668307199492, - 36.33695678299409, - 36.38888599800703, - 36.39365476799139, - 36.57531320700946, - 36.71810405299766, - 36.935118787994725, - 37.01233661700098, - 37.04810820099374, - 37.12589002800814, - 37.1638133499946, - 37.31982758299273, - 37.36583039299876, - 37.398912974007544, - 37.46656983700814, - 37.48085813000216, - 
37.532780982990516, - 37.641556262999075, - 37.65662110000267, - 37.690491578003275, - 37.83214492700063, - 37.83063043100992, - 37.99320784800511, - 38.14146344001347, - 38.177632530991104, - 40.445925018997514, - 40.670962732998305, - 40.69299383199541, - 40.718005939997965, - 40.87446023800294, - 40.98838997600251, - 41.14764818600088, - 41.602034262992674, - 41.606828984993626, - 41.63954657300201, - 41.75896596399252, - 41.878441199005465, - 41.91416275300435, - 41.96504822399584, - 41.969841332000215, - 42.25777574500535, - 42.26913712100941, - 42.37295478201122, - 42.401371340994956, - 42.4291777239996, - 42.523805258999346, - 42.528551859999425, - 42.56692548499268, - 42.660291128995595, - 42.77837823300797, - 42.840162310996675, - 42.84586886598845, - 43.03557733799971, - 43.246513966005296, - 43.277491277010995, - 43.38394002598943, - 43.39746646498679, - 43.67097455999465, - 43.6865776689956, - 43.7774478020001, - 43.83384187599586, - 43.91900170799636, - 43.979841517997556, - 43.99889390599856, - 44.04711647098884, - 44.117364795005415, - 44.226622329995735, - 44.31503190600779, - 44.40593978999823, - 44.442029212004854, - 44.4654830530053, - 44.78127055699588, - 44.78101292499923, - 44.87764001300093, - 44.899181877000956, - 45.18823779199738, - 45.19659938800032, - 45.1975180110021, - 45.219285414001206, - 45.26345178599877, - 45.26356578699779, - 45.268884996010456, - 45.324131023997325, - 45.339361674996326, - 45.39074616099242, - 45.67006324601243, - 45.711064798000734, - 45.77919527700578, - 45.86850745900301, - 45.97163102099148, - 46.03859460300009, - 46.03827429100056, - 46.04977232401143, - 46.11542969499715, - 46.17234598599316, - 46.36564716299472, - 46.498642184000346, - 46.71031095601211, - 46.79335700000229, - 46.87811714000418, - 47.06340859600459, - 47.165808759003994, - 47.33286343498912, - 49.99853179200727, - 50.14982287499879, - 50.19061788699764, - 50.213855157999205, - 50.390914642004645, - 50.47802670199599, - 50.68655172199942, 
- 50.86604318600439, - 51.05816411499109, - 51.09406687998853, - 51.12848513099016, - 51.17056419700384, - 51.396173568005906, - 51.46629309200216, - 51.6043977609952, - 51.670170146011515, - 51.68162820600264, - 51.77231526200194, - 51.89675541900215, - 51.91674405999947, - 51.94044581599883, - 52.003512256997055, - 52.02874402300222, - 52.04494391501066, - 52.234414891005144, - 52.399003381011426, - 52.44794636500592, - 52.515664637001464, - 52.5628372880019, - 52.616623756010085, - 52.638831302989274, - 52.639354535989696, - 52.63952812500065, - 52.63963835898903, - 52.64437338699645, - 52.64475294799195, - 52.65566313499585, - 52.65514058599365, - 52.756641587999184, - 52.75622949200624, - 52.77812757200445, - 52.78048942600435, - 52.7788348269969, - 52.77861623300123, - 52.78830572900188, - 52.857849085005, - 52.85696577599447, - 52.859157815997605, - 52.871746443997836, - 52.88657551200595, - 52.887411194999004, - 52.88696327300568, - 52.88856655701238, - 52.88828601100249, - 52.88902058599342, - 52.90108359999431, - 52.902619497006526, - 52.907274891011184, - 52.90766403298767, - 52.91018273799273, - 52.91258546500467, - 52.91228012899228, - 52.915298580992385, - 52.92061533500964, - 52.92119397199713, - 52.92713736600126, - 52.93311744900711, - 52.93835505101015, - 52.9380542459985, - 52.939948582003126, - 52.95222084799025, - 52.9553371410002, - 52.95910978000029, - 52.95908127599978, - 52.96078975898854, - 52.9613425360003, - 52.96108132301015, - 52.96182507601043, - 52.96197058301186 - ], - "storage_latencies": [ - 0.10121397599868942, - 0.07499572700180579, - 0.11042139996425249, - 0.06742719700559974, - 0.09048782101308461, - 0.18127806899428833, - 0.18957229597435798, - 0.16961767298926134, - 0.09849388399743475, - 0.14070050499867648, - 0.15802898899710272, - 0.2956428799807327, - 0.14251567298197187, - 0.29127886600326747, - 0.04740045500511769, - 0.2885911760095041, - 0.28147969699057285, - 0.056157229002565145, - 0.13930618700396735, - 
0.06906301599519793, - 0.3145477589860093, - 0.33061052099219523, - 0.1794998539990047, - 0.16927627302356996, - 0.33984278299612924, - 0.08104167501733173, - 0.35260594698775094, - 0.21085586998378858, - 0.23774857996613719, - 0.20529992099909578, - 0.4338756939978339, - 0.38022814397118054, - 0.13469552899186965, - 0.25098881602752954, - 0.10878520201367792, - 0.14422595599899068, - 0.23953910598356742, - 0.1694105149799725, - 0.07366414900752716, - 0.2484513410454383, - 0.29061385001114104, - 0.23212670497014187, - 0.1475087130238535, - 0.2493986350018531, - 0.13339996300055645, - 0.1295507769973483, - 0.2539631249528611, - 0.15467194002121687, - 0.23128466999332886, - 0.5137989620270673, - 0.07947664699167944, - 0.17236439997213893, - 0.19227393198525533, - 0.01823149599658791, - 0.1713989560084883, - 0.25702447902585845, - 0.21043491798627656, - 0.3086349549848819, - 0.21530529897427186, - 0.18609318899689242, - 0.3841371769376565, - 0.05168264299572911, - 0.040577709980425425, - 0.5674542019551154, - 0.317771508009173, - 0.13256739499047399, - 0.2721287369786296, - 0.05902975396020338, - 0.5634358420356875, - 0.5632432049751515, - 0.5435261459933827, - 0.12920703303825576, - 0.1291633799992269, - 0.1279236640111776, - 0.623920053942129, - 0.22855304203403648, - 0.12569531201734208, - 0.37487061400315724, - 0.026323337995563634, - 0.041678998997667804, - 0.2641822939913254, - 0.11802116496255621, - 0.35413624900684226, - 0.0573567469837144, - 0.09888005499669816, - 0.0339291959971888, - 0.02116509799088817, - 0.2558007650222862, - 0.22932222102826927, - 0.450967898053932, - 0.02688810402469244, - 0.18041573798109312, - 0.3059243140160106, - 0.07262135698692873, - 0.21836830201209523, - 0.18772899699979462, - 0.5188272050436353, - 0.06544467702042311, - 0.47515641000063624, - 0.20177778699144255, - 0.06423125597939361, - 0.26302568105165847, - 0.07751615301822312, - 0.08719669499259908, - 0.30688227800419554, - 0.06811765699239913, - 0.17745467198255938, - 
0.0954070310399402, - 0.23240477999206632, - 0.10932085699460004, - 0.05883759800053667, - 0.1798571899998933, - 0.10586497398617212, - 0.04660465799679514, - 0.11544654800673015, - 0.21825297402392607, - 0.10930805801763199, - 0.19702429704193491, - 0.05731977698451374, - 0.08924555199337192, - 0.015547529983450659, - 0.1251782530307537, - 0.1750378230062779, - 0.05494933699083049, - 0.016236307012150064, - 0.05895254097413272, - 0.7479611269955058, - 0.05244074900110718, - 0.10451174399349838, - 0.049362179983290844, - 1.0008192929672077, - 0.026340572992921807, - 0.08524002801277675, - 0.09967650398903061, - 0.026140218004002236, - 0.5765975560061634, - 0.031677992999902926, - 0.26556835902738385, - 0.09338845098682214, - 0.12555534999410156, - 0.10224544203083497, - 0.13672538197715767, - 0.14017609000438824, - 0.0365864819905255, - 0.28381401002116036, - 0.12905760000285227, - 0.04873435301124118, - 0.17599774898553733, - 0.06711512399488129, - 0.17476702299609315, - 0.09857506398111582, - 0.3795274699659785, - 0.29374755901517347, - 0.11197979700227734, - 0.12542571399535518, - 0.06721706401731353, - 0.19069935601146426, - 0.06277235099696554, - 0.05740345601225272, - 0.05777069697796833, - 0.031828627004870214, - 0.0801848599803634, - 0.053698133982834406, - 0.16202490795694757, - 0.0934149050299311, - 0.13589689100626856, - 0.10037047701189294, - 0.0592787569767097, - 0.6367771360091865, - 0.042059966988745145, - 0.12194687699957285, - 0.20769054398988374, - 0.12010775099042803, - 0.06159631098853424, - 0.08862979401601478, - 0.04681690098368563, - 0.1691436589753721, - 0.8381168839987367, - 0.06200221600010991, - 0.12506985398067627, - 0.2740352019900456, - 0.07683565102342982, - 0.14750037401972804, - 0.08739763199992012, - 0.06283143399923574, - 0.10955698901670985, - 0.07626899299793877, - 0.03485751000698656, - 0.2687908600346418, - 0.046527298996807076, - 0.09448987901851069, - 0.16828685504151508, - 0.010188492000452243, - 0.1528802750108298, - 
0.043488582014106214, - 0.03136203499161638, - 0.07303320898790844, - 0.0832453519833507, - 0.06894536697654985, - 0.17638220803928562, - 0.15612919800332747, - 0.07801638598903082, - 0.05224102200008929, - 0.0584269250248326, - 0.0622410260111792, - 0.09105227899271995, - 0.09971471797325648, - 0.07871389501087833, - 0.11816698301117867, - 0.032855156998266466, - 0.13529253700107802, - 0.2010726800072007, - 0.07703051700082142, - 0.2113721460045781, - 0.9091033159784274, - 0.13627926399931312, - 0.25218469102401286, - 0.11509629999636672, - 0.1347581190202618, - 0.12012508399493527, - 0.22361427498981357, - 0.10353062201465946, - 0.0892055389995221, - 0.08624111000972334, - 0.09334827898419462, - 0.1521515289787203, - 0.14229947203421034, - 0.09179534199938644, - 0.05890113499481231, - 0.08381379600905348, - 0.10871082798985299, - 0.12277054201695137, - 0.03159662400139496, - 0.14259214304911438, - 0.2686223060154589, - 0.09504242599359713, - 0.056902248004917055, - 0.07928299500781577, - 0.03692267900623847, - 0.16397690102166962, - 0.03671243599092122, - 0.11191684899677057, - 0.06486404300085269, - 0.0881642549938988, - 0.11547583203355316, - 0.2474032270401949, - 0.29561784699035343, - 0.06797921801626217, - 0.07267764100106433, - 0.05809278099332005, - 0.07278744898212608, - 0.03719149300013669, - 0.03221190901240334, - 0.1099474959919462, - 0.0416594099951908, - 0.16738845301733818, - 0.11007214504934382, - 0.08769331999064889, - 0.03614277101587504, - 0.07902924699010327, - 0.22111059598682914, - 0.07285047398181632, - 0.04232484400563408, - 0.08291711301717442, - 0.13698970002587885, - 0.10020969300239813, - 0.11389554200286511, - 0.11685794898949098, - 0.20396620297105983, - 0.05758989001333248, - 0.10540236400265712, - 0.11111317996983416, - 0.18811219900089782, - 0.02603845900739543, - 0.09980755901779048, - 0.04723032498441171, - 0.4090488590009045, - 0.10538506202283315, - 0.047685227982583456, - 0.06223196901555639, - 0.07320392200199421, - 
0.17369952697481494, - 0.15110354802163783, - 0.08308486596797593, - 0.08263128300313838, - 0.09315471499576233, - 0.13715725200017914, - 0.09883427098975517, - 0.16556227301771287, - 0.07730309100588784, - 0.11358943600498606, - 0.09413166702142917, - 0.11879579199012369, - 0.21141787496162578, - 0.10468483102158643, - 0.1367529569834005, - 0.0731895320059266, - 0.25227757095126435, - 0.0992696030298248, - 0.025712497008498758, - 0.025641706000897102, - 0.04708702400967013, - 0.23625187804282177, - 0.02191807901544962, - 0.1631260660069529, - 0.04809103200386744, - 0.02081545199325774, - 0.04236815900367219, - 0.20416447902971413, - 0.05778199600172229, - 0.1184871660079807, - 0.1227228740171995, - 0.15245904700714163, - 0.07781763203092851, - 0.021747449995018542, - 0.14144160003343131, - 0.09938214901194442, - 0.10070979502052069, - 0.18243914101913106, - 0.07416443698457442, - 0.2768262609752128, - 0.07885132802766748, - 0.03692618801142089, - 0.04267051999340765, - 0.15740966198791284, - 0.1321920929767657, - 0.06254249399353284, - 0.021123231985257007, - 0.031356200997834094, - 0.051195906999055296, - 0.12001735602098051, - 0.09770176601887215, - 0.050059929984854534, - 0.057589746968005784, - 0.14354251901386306, - 0.120416808014852, - 0.07028288498986512, - 0.16371812298893929, - 0.08905021497048438, - 0.06844823600840755, - 0.030669034997117706, - 0.06741961897932924, - 0.025761810000403784, - 0.14537898903654423, - 0.1103846379701281, - 0.08548277498630341, - 0.06771421802113764, - 0.12820077696233056, - 0.06196861898934003, - 0.07162297799368389, - 0.09840333499596454, - 0.065534827997908, - 0.14036571499309503, - 0.07772979800938629, - 0.13531211794179399, - 0.1452081209863536, - 0.08512081898516044, - 0.13770514202769846, - 4.425599763635546e-05, - 0.08805374297662638, - 0.11542992100294214, - 0.026196413004072383, - 0.05296227999497205, - 0.06357881000440102, - 0.0844350999686867, - 0.08285354099643882, - 0.06727296799363103, - 0.21251184795983136, - 
0.10588161801570095, - 0.07719091299804859, - 0.012302101997192949, - 0.15864652500022203, - 0.06690668901137542, - 0.10491089598508552, - 0.07706128001154866, - 0.02152512602333445, - 0.1280852049967507, - 0.2002720780146774, - 0.07890651698107831, - 0.04724623399670236, - 0.16575065099459607, - 0.23483810998732224, - 0.10810768400551751, - 0.1271418720134534, - 0.1151972340157954, - 0.07836962500005029, - 0.1729563050030265, - 0.16688113204145338, - 0.0310794969991548, - 0.0698974469996756, - 0.055532819009386, - 0.06911216396838427, - 0.2359685059491312, - 0.08094005299790297, - 0.1480554379959358, - 0.06815805101359729, - 0.015357854004832916, - 0.03838801999518182, - 0.0057957399985753, - 0.12300070899073035, - 0.04173451999668032, - 0.11215557699324563, - 0.07317143598629627, - 0.07047195400809869, - 0.07452672500221524, - 0.06737630699353758, - 0.04703009300283156, - 3.6527999327518046e-05, - 0.09696208901004866, - 0.07246166297409218, - 0.05413951398804784, - 0.06889778601180296, - 0.08969877097115386, - 0.020789183006854728, - 0.08489010699850041, - 0.09572582902910654, - 0.14039642798888963, - 0.08283836903865449, - 0.0731390560104046, - 0.1473210399999516, - 2.1824106259882683, - 0.08375992502260488, - 0.18285913798899855, - 0.10479623595892917, - 0.13693444199452642, - 0.15557208501559217, - 0.10860968900669832, - 0.06289092601218726, - 0.190881668968359, - 0.1357114290294703, - 0.04656737200275529, - 0.0417666260182159, - 0.1302795499941567, - 0.18678532699414063, - 0.08081587897322606, - 0.05623456201283261, - 0.08320210000965744, - 0.17996685703110415, - 0.249108918957063, - 0.14036855200538412, - 0.0749774070282001, - 0.0630373349704314, - 0.11930620599014219, - 0.03130540199344978, - 0.10164066699508112, - 0.1671327430085512, - 0.08850126502511557, - 0.06267775698506739, - 0.09386280098988209, - 0.05590967202442698, - 0.059925230001681484, - 0.11166648501239251, - 0.04743142501683906, - 0.041259938006987795, - 0.09258406698063482, - 
0.050324656011071056, - 0.11956850497517735, - 0.04851554201741237, - 0.04761691897874698, - 0.20134412100014742, - 0.09439645396196283, - 0.005600322998361662, - 0.09721341299882624, - 0.14642747704056092, - 0.07580864998453762, - 0.1504030050127767, - 0.12319173401920125, - 0.20445484106312506, - 0.1783433370437706, - 0.10301169198646676, - 0.09497827597078867, - 0.13170703801733907, - 0.06815928503056057, - 0.21199159702518955, - 0.09487991801870521, - 0.13213850800821092, - 0.13910054200096056, - 0.09832245102734305, - 0.0895829779910855, - 0.13397111800441053, - 0.06617784799891524, - 0.04150059197854716, - 0.05790752699249424, - 0.16493158001685515, - 0.0934935560071608, - 0.005157127001439221, - 0.1311980620084796, - 0.19306297300499864, - 0.12916414701612666, - 0.04620746801083442, - 0.05607535499439109, - 0.08367862898739986, - 0.057375556003535166, - 0.10473671500221826, - 0.041334609006298706, - 0.12006850198667962, - 0.0839774139894871, - 0.005179373998544179, - 0.1266707119939383, - 0.1932403310056543, - 0.06620526700862683, - 0.050384025991661474, - 0.05807385899242945, - 0.05215370201040059, - 0.11510963500768412, - 0.09926393399655353, - 0.1459518379997462, - 0.1386336759896949, - 0.09281635699153412, - 0.031582714000251144, - 0.22291526600020006, - 0.3505415300169261, - 0.12378141899534967, - 0.23699443899386097, - 0.1713012459804304, - 0.13825811201240867, - 0.06121369599713944, - 0.14001558101153933, - 0.16505454400612507, - 0.15261720499256626, - 0.20521004100737628, - 0.19790154801739845, - 0.17363213299540803, - 0.16953024400572758, - 0.23348723800154403, - 0.10588786700100172, - 0.3603987370006507, - 0.14511584099091124, - 0.16852635101531632, - 0.27976255997782573, - 0.2139984540117439, - 0.18612318998202682, - 0.2583765479939757, - 0.19872935497551225, - 0.19055530698096845, - 0.26429594798537437, - 0.17499702000350226, - 0.36619898000208195, - 0.2010071630065795, - 0.2676273769757245, - 0.3644852250290569, - 0.22726853797212243, - 
0.4284666109451791, - 0.17030129001068417, - 0.13601745099003892, - 0.21118020497669932, - 0.24289200302155223, - 0.25216633100353647, - 0.3038664159394102 - ], - "generation_latencies": [ - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 
0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, 
- 0.0, - 0.0, - 0.0, - 0.0, - 0.0 - ], - "throughput_timeline": [], - "prefill_latencies": [ - 0.03724823000084143, - 0.03202932600106578, - 0.016993072000332177, - 0.010874931991565973, - 0.03722312500758562, - 0.06103396200342104, - 0.023475909998524003, - 0.041580271004932, - 0.02085218000866007, - 0.015548337003565393, - 0.02835710700310301, - 0.05414916299923789, - 0.05957827599195298, - 0.05959639600769151, - 0.07668654300505295, - 0.012377687991829589, - 0.035720453000976704, - 0.035477719007758424, - 0.02445648099819664, - 0.031322130002081394, - 0.03517868100607302, - 0.060378559996024705, - 0.1308598890027497, - 0.14174191698839422, - 0.14201001900073607, - 0.1604835420002928, - 0.1387354559992673, - 0.132230682997033, - 0.13878571799432393, - 0.05026883199752774, - 0.02377337899815757, - 0.03807715700531844, - 0.018598361988551915, - 0.05012627999531105, - 0.01298382600361947, - 0.04350692201114725, - 0.08740310299617704, - 0.04715489799855277, - 0.08786273900477681, - 0.07423372799530625, - 0.04379764800250996, - 0.048884130010264926, - 0.041139524008031, - 0.04736485899775289, - 0.04192321500158869, - 0.04723296499287244, - 0.0935270030022366, - 0.04936214099870995, - 0.02430434900452383, - 0.06024453199643176, - 0.057701244004420005, - 0.03212777200678829, - 0.055494031999842264, - 0.055888941991725005, - 0.062185778995626606, - 0.06251540800440125, - 0.06248293400858529, - 0.06046386799425818, - 0.05620809699757956, - 0.07376746900263242, - 0.04270714399171993, - 0.061366916008410044, - 0.03210762899834663, - 0.04132574899995234, - 0.02834698600054253, - 0.03086928100674413, - 0.03069156699348241, - 0.13398456799041014, - 0.10251914999389555, - 0.12179878399183508, - 0.10966191299667116, - 0.10265815099410247, - 0.1034811260033166, - 0.11490282199520152, - 0.013638135002111085, - 0.11605017600231804, - 0.02749598199443426, - 0.02712599300139118, - 0.027174433998879977, - 0.024277093005366623, - 0.024756024999078363, - 0.007610686996486038, - 
0.013646484003402293, - 0.04277684500266332, - 0.015573264987324364, - 0.003421625995542854, - 0.046290465994388796, - 0.029854325999622233, - 0.03295594199153129, - 0.008602736998000182, - 0.03670850700291339, - 0.02346045100421179, - 0.029987714005983435, - 0.009221646003425121, - 0.009286048996727914, - 0.024961039001937024, - 0.025510912004392594, - 0.026285989995812997, - 0.014408306000404991, - 0.015282123000361025, - 0.12439381500007585, - 0.0879660400096327, - 0.11760760900506284, - 0.032581164996372536, - 0.015929511995636858, - 0.02685487699636724, - 0.08817998500308022, - 0.12617135599430185, - 0.010467133994097821, - 0.03185474200290628, - 0.02114697099023033, - 0.15036487100587692, - 0.03126056199835148, - 0.015992262997315265, - 0.016032005005399697, - 0.08361345999583136, - 0.03115974499087315, - 0.02057439500640612, - 0.02088038600049913, - 0.01590000800206326, - 0.02646995299437549, - 0.02072828401287552, - 0.015592577008646913, - 0.032134644003235735, - 0.025672531002783217, - 0.012106529989978299, - 0.038354689997504465, - 0.01623884099535644, - 0.031039278008393012, - 0.032979210998746566, - 0.02181864299927838, - 0.01578019099542871, - 0.025752599991392344, - 0.0278667980019236, - 0.2910935090039857, - 0.34439156799635384, - 0.026152199992793612, - 0.02612282599147875, - 0.03140914200048428, - 0.026374238994321786, - 0.011364303005393595, - 0.005590704007772729, - 0.02176677400711924, - 0.025582310001482256, - 0.021689159999368712, - 0.0, - 0.02134889799344819, - 0.0210451489983825, - 0.0, - 0.005254169998806901, - 0.025651039002696052, - 0.01060920899908524, - 0.02287165900634136, - 0.026548256995738484, - 0.02105638499779161, - 0.04389048999291845, - 0.038293232006253675, - 0.018296768990694545, - 0.02686961401195731, - 0.03854347900778521, - 0.010351514007197693, - 0.021977188996970654, - 0.03111558400269132, - 0.0, - 0.015802991998498328, - 0.02120283398835454, - 0.026504052002565004, - 0.01031115600198973, - 0.0157597069919575, - 
0.0209926270035794, - 0.02572631298971828, - 0.028993568004807457, - 0.02264854199893307, - 0.010427992005134001, - 0.02648499199131038, - 0.025833758991211653, - 0.17117451400554273, - 0.011667256010696292, - 0.03885286800505128, - 0.020574867012328468, - 0.031357507992652245, - 0.020783750995178707, - 0.015733817999716848, - 0.015628566994564608, - 0.020645951997721568, - 0.026324878010200337, - 0.011081391989137046, - 0.03138920699711889, - 0.01656158700643573, - 0.010720543999923393, - 0.0, - 0.03170461900299415, - 0.01613943300617393, - 0.02087811600358691, - 0.030444624004303478, - 0.021149644991965033, - 0.015739547001430765, - 0.02625769199221395, - 0.03131272501195781, - 0.0, - 0.03171188200940378, - 0.021702348996768706, - 0.04649192500801291, - 0.03154084100970067, - 0.03142830000433605, - 0.8386546630063094, - 0.026296845011529513, - 0.026280132005922496, - 0.8330034789978527, - 0.0, - 0.020336432004114613, - 0.04521342000225559, - 0.01621400199655909, - 0.015478713001357391, - 0.03096103999996558, - 0.015568560003885068, - 0.03652736099320464, - 0.021731796994572505, - 0.0412549860047875, - 0.021250922000035644, - 0.010451484995428473, - 0.015665954997530207, - 0.0, - 0.026587503001792356, - 0.010697499994421378, - 0.015366895007900894, - 0.02600968199840281, - 0.03128857200499624, - 0.0, - 0.0, - 0.03105038299690932, - 0.010455244002514519, - 0.03582013699633535, - 0.021179871997446753, - 0.0319438090082258, - 0.028678009010036476, - 0.015675613001803868, - 0.026014043993200175, - 0.028011635004077107, - 0.0, - 0.02084382699104026, - 0.04150761499477085, - 0.0, - 0.01035291000152938, - 0.028497970997705124, - 0.027282522001769394, - 0.020805453008506447, - 0.02125150299980305, - 0.015728789003333077, - 0.015580416002194397, - 0.0, - 0.016533816989976913, - 0.0, - 0.0261056819872465, - 0.0, - 0.02560406900011003, - 0.026098549002199434, - 0.03364988100656774, - 0.025813395986915566, - 0.03380401700269431, - 0.03753721300745383, - 0.01617511099902913, - 
0.021086494001792744, - 0.015608511996106245, - 0.04031856400251854, - 0.0, - 0.015419597999425605, - 0.03097607499512378, - 0.016377116000512615, - 0.025990426991484128, - 0.0, - 0.010978856007568538, - 0.015615673997672275, - 0.026584924999042414, - 0.0, - 0.02090371299709659, - 0.0, - 0.015960733013343997, - 0.0, - 0.03832074000092689, - 0.025911011005518958, - 0.0, - 0.0, - 0.021297833998687565, - 0.03619736299151555, - 0.03281890599464532, - 0.02099512700806372, - 0.0, - 0.0, - 0.020980469009373337, - 0.041823681007372215, - 0.0214020459970925, - 0.0, - 0.0, - 0.011252595999394543, - 0.015903323001111858, - 0.0, - 0.041803658998105675, - 0.015841656000702642, - 0.04095588100608438, - 0.041496777994325384, - 0.0, - 0.0, - 0.010687041998608038, - 0.03151078999508172, - 0.01609405600174796, - 0.025792415995965712, - 0.03135521200601943, - 0.02127696599927731, - 0.01574720499047544, - 0.021212774008745328, - 0.015839466999750584, - 0.026033938003820367, - 0.0, - 0.0, - 0.010738164986832999, - 0.03108009000425227, - 0.0411417590075871, - 0.031167558001470752, - 0.0260545409983024, - 0.010418179997941479, - 0.02610388099856209, - 0.01595247299701441, - 0.015434758010087535, - 0.015884506006841548, - 0.010429362009745091, - 0.03146267200645525, - 0.0, - 0.021107013992150314, - 0.04227920400444418, - 0.02083895700343419, - 0.0, - 0.020817925003939308, - 0.02093975000025239, - 0.02303450700128451, - 0.010633843994583003, - 0.02636825900117401, - 0.01652984799875412, - 0.04283259599469602, - 0.0, - 0.03354682200006209, - 0.0, - 0.030957510010921396, - 0.015966314997058362, - 0.01597114698961377, - 0.0210777699976461, - 0.0, - 0.016478313991683535, - 0.017400357988663018, - 0.02111552099813707, - 0.011741259004338644, - 0.017778637004084885, - 0.026313035996281542, - 0.0, - 0.015648147993488237, - 0.0, - 0.03503459499916062, - 0.040738821000559255, - 0.03647884400561452, - 0.0, - 0.0, - 0.035934409010224044, - 0.04624711599899456, - 0.0, - 0.03173549300117884, - 0.0, - 
0.0, - 0.02321732799464371, - 0.000264004003838636, - 0.015362442994955927, - 0.0, - 0.046260743009042926, - 0.02061133401002735, - 0.015611371010891162, - 0.015711319996626116, - 0.020609033992514014, - 0.04234636700130068, - 0.01568447900353931, - 0.0, - 0.0, - 0.03659130800224375, - 0.026832755000214092, - 0.031777104988577776, - 0.006084378997911699, - 0.016002076998120174, - 0.04280979299801402, - 0.0, - 0.03599050200136844, - 0.01568083799793385, - 0.021353112999349833, - 0.0, - 0.051238415006082505, - 0.025954087992431596, - 0.026453180005773902, - 0.0, - 0.0, - 0.030966723003075458, - 0.021579224994638935, - 0.026155086001381278, - 0.0, - 0.0, - 0.02142526900570374, - 0.021300398992025293, - 0.030870069007505663, - 0.038440039003035054, - 0.0, - 0.0, - 0.0, - 0.023362480002106167, - 0.0, - 0.0, - 0.010492522007552907, - 0.010549022001214325, - 0.022756190999643877, - 0.010547516008955427, - 0.03404188899730798, - 0.011634252994554117, - 0.0, - 0.01037743000779301, - 0.0, - 0.03620088900788687, - 0.0, - 0.0058665520045906305, - 0.02338514299481176, - 0.025850265999906696, - 0.015318096004193649, - 0.010481100995093584, - 0.0, - 0.005634179004118778, - 0.04948481099563651, - 0.0, - 0.03421376000915188, - 0.02583246200811118, - 0.025749340013135225, - 0.02106164800352417, - 0.02593826899828855, - 0.020860969001660123, - 0.0, - 0.0205254739994416, - 0.030747529002837837, - 0.0, - 0.015346019004937261, - 0.015760799011331983, - 0.021233714011032134, - 0.0, - 0.01565106600173749, - 0.0, - 0.026181193999946117, - 0.036063309002202004, - 0.0, - 0.010508349994779564, - 0.03133131199865602, - 0.0, - 0.0, - 0.02578070599702187, - 0.02793554399977438, - 0.0205256159970304, - 0.0, - 0.02576442800636869, - 0.0, - 0.0, - 0.0, - 0.02763371700712014, - 0.02105576600297354, - 0.031064060996868648, - 0.02190459199482575, - 0.02608719999261666, - 0.0, - 0.0, - 0.03155037701071706, - 0.017852635006420314, - 0.011200723005458713, - 0.02050716600206215, - 0.031102394001209177, - 
0.015437271998962387, - 0.026152727994485758, - 0.022657352994428948, - 0.015438015005202033, - 0.0258019560133107, - 0.016492497990839183, - 0.0, - 0.011581868995563127, - 0.015836521008168347, - 0.026102400006493554, - 0.01601758300967049, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.020573617992340587, - 0.0, - 0.005152077006641775, - 0.0, - 0.01552937101223506, - 0.010450119996676221, - 0.0, - 0.026923379991785623, - 0.0, - 0.0, - 0.015767788005177863, - 0.020517283002845943, - 0.031127169000683352, - 0.01585298401187174, - 0.0, - 0.021041485990281217, - 0.026005146006355062, - 0.030985654011601582, - 0.026757573999930173, - 0.031245833000866696, - 0.01565581098839175, - 0.015599142003338784, - 0.006057716993382201, - 0.0, - 0.0, - 0.036296242993557826, - 0.0, - 0.010358188999816775, - 0.02177799100172706, - 0.021577482999418862, - 0.0, - 0.0, - 0.0, - 0.04211071199097205, - 0.036649078989285044, - 0.02102425201155711, - 0.015507363001233898, - 0.0, - 0.022375442989869043, - 0.04730258000199683, - 0.0, - 0.015572468997561373, - 0.0, - 0.0, - 0.0, - 0.01629188799415715, - 0.0207663710025372, - 0.0, - 0.0, - 0.03304210300848354, - 0.015787784999702126, - 0.021597191007458605, - 0.043019056000048295, - 0.03620850999141112, - 0.03841280400229152, - 0.05946198599121999, - 0.0, - 0.1336044690106064, - 0.10898629600706045 - ], - "decode_latencies": [ - 0.005458101994008757, - 0.0057838760112645105, - 0.06640703299490269, - 0.007362733987974934, - 0.05531144300766755, - 0.005198628001380712, - 0.005527424000320025, - 0.001245544000994414, - 0.09525062299508136, - 0.01121066500490997, - 0.006227688994840719, - 0.002453581997542642, - 0.10192410499439575, - 0.01719960500486195, - 0.005905073005123995, - 0.01244772199424915, - 0.0062201370019465685, - 0.012876526001491584, - 0.040567894000560045, - 0.012732449991744943, - 0.006873786012874916, - 0.0139232549990993, - 0.011247119007748552, - 5.1009003072977066e-05, - 0.019205909993615933, - 0.024853095004800707, - 
0.042223650991218165, - 0.01870018898625858, - 0.04977662899182178, - 0.007538156991358846, - 0.0055103040067479014, - 0.007577662996482104, - 0.012413718010066077, - 0.019173022999893874, - 0.01941968699975405, - 0.005910289008170366, - 0.012991800991585478, - 0.046214710993808694, - 0.006690119989798404, - 0.05179581500124186, - 0.012086126997019164, - 0.007715569998254068, - 0.007210273994132876, - 0.012918936990899965, - 0.020143018002272584, - 0.018750945004285313, - 0.024931602005381137, - 0.025197402996127494, - 0.007422132999636233, - 0.006090492010116577, - 0.0062075269961496815, - 0.012944550995598547, - 0.011357886003679596, - 0.009550834001856856, - 0.019336815996211953, - 0.002597034996142611, - 0.020177816986688413, - 0.0009384369914187118, - 0.0069356849999167025, - 0.0010307709890184924, - 0.022421699002734385, - 0.007393460997263901, - 0.009705946009489708, - 0.0006734539929311723, - 0.04286405800667126, - 0.009056340000825003, - 0.006572722995770164, - 0.006617674996959977, - 0.024913268003729172, - 0.041840901991236024, - 0.019015967001905665, - 0.011225809997995384, - 0.013465276002534665, - 0.01548908899712842, - 0.0007261690043378621, - 0.0076162410114193335, - 0.005427650001365691, - 0.007365225988905877, - 0.010652465003659017, - 0.02049057300610002, - 0.006366765999700874, - 3.832401125691831e-05, - 0.012718911995762028, - 0.005307111001457088, - 0.015484742005355656, - 0.01553072799288202, - 0.0054963309958111495, - 0.013190091995056719, - 0.0048109640047186986, - 0.040325074005522765, - 0.010357306004152633, - 0.013135639994288795, - 0.01916714799881447, - 0.005887653998797759, - 0.010413960000732914, - 0.005121620997670107, - 0.01870726099878084, - 0.14392874400073197, - 0.07183713300037198, - 0.007445955998264253, - 0.005327179009327665, - 0.0722024630085798, - 0.005285113002173603, - 0.006200757008627988, - 0.015435930006788112, - 0.010485156992217526, - 0.006551537007908337, - 0.010316614003386348, - 0.007248683003126644, - 
0.005319905991200358, - 0.005178852006793022, - 0.018022946998826228, - 0.010400607003248297, - 0.015640609999536537, - 0.010403197011328302, - 0.005136516992934048, - 0.07634064801095519, - 0.009252646996174008, - 0.015599560007103719, - 0.011006646993337199, - 0.01119044799997937, - 0.010322351998183876, - 0.01024202500411775, - 0.005191287011257373, - 0.0106684869970195, - 0.03090195800177753, - 0.015526957009569742, - 0.010733346993220039, - 0.010440227997605689, - 0.010702611005399376, - 0.004679010002291761, - 0.005319699994288385, - 0.005116471991641447, - 0.16463395000027958, - 0.00010809401283040643, - 0.018255789997056127, - 0.010417407000204548, - 0.01050021601258777, - 0.02087089499400463, - 0.005157934007002041, - 0.015441926996572874, - 0.0051914829964516684, - 0.00488573701295536, - 0.015726564000942744, - 0.00554861499404069, - 0.02076557899999898, - 0.010598738997941837, - 0.007893921007052995, - 0.005157452993444167, - 0.025892259000102058, - 0.0051960470009362325, - 0.004080558006535284, - 0.00516997100203298, - 0.011866941000334918, - 0.010327805008273572, - 0.005179745989153162, - 0.0041611570050008595, - 0.005111943988595158, - 0.0051276849990244955, - 0.00518023400218226, - 0.010368224990088493, - 0.00515931099653244, - 0.010525759003940038, - 0.010368885996285826, - 0.01046869000128936, - 0.010283530005835928, - 0.03110040898900479, - 0.005186655005672947, - 0.010448872999404557, - 0.005172073011635803, - 0.005134431994520128, - 0.0717168720002519, - 0.005262449005385861, - 0.005161875000339933, - 0.01079374601249583, - 0.011254577999352477, - 0.030988842001534067, - 0.005217337995418347, - 0.010600580004393123, - 0.005134617997100577, - 0.0051983159937663, - 2.9076007194817066e-05, - 0.005103540010168217, - 7.676100358366966e-05, - 0.005268105000141077, - 0.0001163849956355989, - 0.005186238995520398, - 0.010163431012188084, - 0.015474009996978566, - 0.0053262870060279965, - 0.005198641010792926, - 0.007919341995147988, - 
0.010372372998972423, - 0.005122597998706624, - 0.005227243003901094, - 0.005106431999593042, - 0.005541554986848496, - 0.005193342993152328, - 0.005114162995596416, - 0.026027735992101952, - 0.025246971999877132, - 0.010477342992089689, - 0.005198540005949326, - 0.010385489993495867, - 0.005295297989505343, - 0.015690204003476538, - 0.005193539007450454, - 0.010497894996660762, - 0.00518690500757657, - 0.010312934988178313, - 0.015456005989108235, - 0.010298977998900227, - 0.010156167991226539, - 0.010610598998027854, - 0.014102757995715365, - 0.02044702900457196, - 0.005196672005695291, - 0.010371638010838069, - 0.010791142005473375, - 0.010528851998969913, - 0.005288575994200073, - 0.00521068400121294, - 0.005162868998013437, - 0.005115748004755005, - 0.005381593000493012, - 0.021054533994174562, - 0.005207644993788563, - 0.020406023992109112, - 0.016193012008443475, - 0.01725011300004553, - 0.01551599899539724, - 0.005388456003856845, - 0.010523347998969257, - 0.010489278007298708, - 0.07654957100749016, - 0.010449323002831079, - 0.010394569006166421, - 0.015855943987844512, - 0.005116383996210061, - 0.005257287994027138, - 0.010307867007213645, - 0.0051186969940317795, - 0.00512899000023026, - 0.005222701001912355, - 0.010324688992113806, - 0.0052746809960808605, - 0.015494554012548178, - 0.0052806269959546626, - 0.005137284999364056, - 0.010254070002702065, - 0.010156938995351084, - 0.00035095099883619696, - 0.015316542994696647, - 0.010208733001491055, - 0.005179278989089653, - 0.010346969997044653, - 0.010382943000877276, - 0.01956071901076939, - 0.005178812993108295, - 0.020677692999015562, - 0.005182435008464381, - 0.015524009999353439, - 0.010362880988395773, - 0.005127385011292063, - 0.009322647005319595, - 0.005204935005167499, - 0.005217382000409998, - 0.005519195008673705, - 0.005197527993004769, - 0.005165603011846542, - 0.005269815999781713, - 0.015440425006090663, - 0.005295356997521594, - 0.0114689490001183, - 0.01449438500276301, - 
0.005178489998797886, - 0.025649676012108102, - 0.0053290479991119355, - 0.015546709997579455, - 0.005116368003655225, - 0.005125688010593876, - 0.005225418004556559, - 0.01061513100285083, - 0.01041103299940005, - 0.005331952008418739, - 0.01563177499338053, - 0.005172724006115459, - 0.01570331100083422, - 0.016046564996941015, - 0.005124700997839682, - 0.010390141993411817, - 0.005199295002967119, - 0.021261849004076794, - 0.020534006995148957, - 0.010378264996688813, - 0.005195541991270147, - 0.005159359992831014, - 0.0052358249959070235, - 0.005260531994281337, - 0.010407098001451232, - 0.010181985999224707, - 0.020588762999977916, - 0.01038279800559394, - 0.0003858819982269779, - 0.020360940994578414, - 0.005289483000524342, - 0.005181200001970865, - 0.010428174995467998, - 0.010299472996848635, - 0.016008191989385523, - 0.015596469995216466, - 0.020930708007654175, - 0.00511552399257198, - 0.005140464010764845, - 0.0051900679973186925, - 0.015557877995888703, - 0.010565503995167091, - 0.010582733011688106, - 0.005111174992634915, - 0.00530642201192677, - 0.005400150999776088, - 0.0180960339930607, - 0.005231432005530223, - 0.01571299199713394, - 0.005169387994101271, - 0.005252262999420054, - 0.010435227988637052, - 0.005251879993011244, - 0.01570338400779292, - 0.010265325006912462, - 0.015784340997925028, - 0.005136291001690552, - 0.020657760003814474, - 0.010335355007555336, - 0.005116675005410798, - 0.005169038005988114, - 0.010368732997449115, - 0.005689924000762403, - 0.015566265996312723, - 0.020434878999367356, - 0.010353790989029221, - 0.017232125988812186, - 0.005114581988891587, - 0.005877501011127606, - 0.006378849007887766, - 0.02581860999634955, - 0.005163693000213243, - 0.009392961990670301, - 0.005154904996743426, - 0.015288876005797647, - 0.01599708000139799, - 0.01031265000347048, - 0.01027052900462877, - 0.020611295010894537, - 0.010688416994526051, - 9.442899317946285e-05, - 0.015456638007890433, - 0.0052121479966444895, - 
0.015624368010321632, - 1.6672276019962737, - 0.010231726002530195, - 0.015441409006598406, - 0.010331209006835707, - 0.005119114997796714, - 0.010454369999933988, - 0.025539391004713252, - 0.015840720996493474, - 0.0003194739983882755, - 0.01025283700437285, - 0.005260681005893275, - 5.445200076792389e-05, - 0.015343015009420924, - 0.01574318201164715, - 0.01027345799957402, - 0.01029989100061357, - 0.020498899000813253, - 0.005377487002988346, - 0.005266942986054346, - 0.021268842989229597, - 0.010293952000210993, - 0.00515745700977277, - 0.0052536639996105805, - 0.005091223996714689, - 0.00531936100742314, - 0.020699212996987626, - 0.015259644002071582, - 0.005206428002566099, - 0.005186937996768393, - 0.010368643997935578, - 0.005396852997364476, - 0.005319572999724187, - 0.005207388996495865, - 0.010361763008404523, - 0.005168884992599487, - 0.005129180004587397, - 0.005257971002720296, - 3.354701038915664e-05, - 0.015412263994221576, - 0.005254378993413411, - 6.511900573968887e-05, - 0.015417785005411133, - 0.015488150005694479, - 0.020632297004340217, - 0.010323669004719704, - 0.010614340993924998, - 0.012055420986143872, - 5.1412993343546987e-05, - 0.010361749009462073, - 0.020770567993167788, - 0.005130530000315048, - 0.006228651007404551, - 0.005183259010664187, - 0.010356007012887858, - 0.005140423992997967, - 0.00519494699256029, - 0.005175939004402608, - 0.015678814001148567, - 0.010195755996392109, - 0.005150732002221048, - 0.01555888500297442, - 0.01565920999564696, - 0.010617824998917058, - 0.010459561002789997, - 0.01032672100700438, - 0.015655558003345504, - 0.01566614001058042, - 0.011196810999535955, - 0.0051149119972251356, - 7.684000593144447e-05, - 0.005775176003226079, - 0.005202236992772669, - 0.02543215500190854, - 0.02046736799820792, - 0.010552184001426212, - 0.005194924000534229, - 0.005194709010538645, - 0.0051011230098083615, - 0.01207110499672126, - 0.008108267007628456, - 0.012569379003252834, - 0.006864652008516714, - 
0.005193207005504519, - 0.00024631198903080076, - 0.010166395004489459, - 0.005207121997955255, - 0.005127458003698848, - 0.005157733001396991, - 0.02784478299145121, - 0.010650612995959818, - 0.005252502000075765, - 0.010265657008858398, - 0.0051365050021559, - 7.745000766590238e-05, - 0.005095899003208615, - 0.02178634599840734, - 0.00018739599909167737, - 0.010397166013717651, - 0.005169608994037844, - 0.010589010998955928, - 0.0052011619991390035, - 0.0052410190110094845, - 0.005714219994843006, - 0.015255109989084303, - 7.79410038376227e-05, - 0.015387098988867365, - 0.021245381998596713, - 0.010384473003796302, - 0.010397317993920296, - 0.02051358799508307, - 0.010979186001350172, - 8.975899254437536e-05, - 0.005132283011334948, - 0.015416563997860067, - 0.006317532999673858, - 5.6385004427284e-05, - 0.010629269003402442, - 0.00519478099886328, - 0.005316513997968286, - 0.010532560001593083, - 0.010332587989978492, - 0.010372921999078244, - 0.010234544999548234, - 0.013339706987608224, - 0.010416728997370228, - 0.005280380995827727, - 0.011023406987078488, - 0.005177717001060955, - 0.005138209002325311, - 0.005184086010558531, - 0.005141629997524433, - 0.010165931002120487, - 0.005380902992328629, - 0.010467300002346747, - 0.005172165998374112, - 0.010246549994917586, - 0.01535131799755618, - 0.005283575999783352, - 0.005992592006805353, - 0.005160253989743069, - 0.01043696100532543, - 0.01095457399787847, - 0.010377932005212642, - 0.005257936005364172, - 0.010444943007314578, - 0.005297897994751111, - 0.010278753004968166, - 0.005230622002272867, - 0.0052506379870465025, - 0.010509868006920442, - 0.019732266999199055, - 0.0109415920014726, - 0.015638512006262317, - 0.005140552995726466, - 0.02057435600727331, - 2.552151103998767, - 0.005239491001702845, - 0.08367423099116422, - 0.0052363599970703945, - 0.010371630996814929, - 0.023919860992464237, - 0.01026330899912864, - 0.02326721599092707, - 0.010335807004594244, - 0.01856046800094191, - 
0.010978870996041223, - 0.016517177995410748, - 0.005158515996299684, - 0.0011016049975296482, - 0.005307792001985945, - 0.005115093998028897, - 0.011945709004066885, - 0.011244652007007971, - 0.005294545000651851, - 0.005107238001073711, - 0.019516090003889985, - 0.015674080001190305, - 0.08457667598850094, - 0.020447205999516882, - 0.035267076003947295, - 0.010296100997948088, - 0.025537534995237365, - 0.015402128992718644, - 0.005282191006699577, - 0.019450064006377943, - 0.02528155699837953, - 0.01060741399123799, - 0.015453439991688356, - 0.03103494699462317, - 0.010246061996440403 - ], - "multi_turn_cache_hits": 75, - "multi_turn_cache_misses": 297, - "seed": 42, - "summary": { - "total_requests": 549, - "total_tokens": 147313, - "elapsed_time": 52.04018187522888, - "avg_throughput_tokens_per_sec": 2830.7549030707937, - "requests_per_second": 10.549540378553594, - "end_to_end_latency_ms": { - "mean": 26292.018300497162, - "p50": 27254.530234000413, - "p95": 52887.23202620167, - "p99": 52959.096098080045 - }, - "storage_io_latency_ms": { - "mean": 142.71778070131649, - "p50": 108.71082798985299, - "p95": 351.780180199421, - "p99": 630.6057366169987 - }, - "generation_latency_ms": { - "mean": 0.0, - "p50": 0.0, - "p95": 0.0, - "p99": 0.0 - }, - "cache_stats": { - "cache_hit_rate": 0.930964121748002, - "cache_hits": 5475, - "cache_misses": 406, - "gpu_entries": 449, - "cpu_entries": 0, - "nvme_entries": 0, - "gpu_memory_used_gb": 7.3653564453125, - "cpu_memory_used_gb": 0.0, - "offloads_cpu": 0, - "offloads_nvme": 0, - "storage_health": { - "overall_status": "PASS", - "criteria": [ - { - "name": "Cache Hit Rate > 30%", - "target": 0.3, - "actual": 0.930964121748002, - "unit": "ratio", - "passed": true - } - ], - "passed_count": 1, - "total_count": 1 - }, - "prefill_writes": 449, - "decode_reads": 5475, - "prefill_bytes_written_gb": 7.3653564453125, - "decode_bytes_read_gb": 91.9632568359375, - "system_prompt_hits": 1000, - "common_phrase_hits": 0, - 
"user_cache_hits": 4400, - "multi_turn_hits": 75, - "total_read_bytes": 98744795136, - "total_write_bytes": 7908491264, - "total_read_gb": 91.9632568359375, - "total_write_gb": 7.3653564453125, - "read_write_ratio": 12.485920745148086, - "read_iops": 5475, - "write_iops": 449, - "gpu_read_p50_ms": 10.246467994875275, - "gpu_read_p95_ms": 25.67205340747023, - "gpu_read_p99_ms": 101.0735119177844, - "gpu_write_p50_ms": 25.93826899828855, - "gpu_write_p95_ms": 106.78422800556272, - "gpu_write_p99_ms": 166.04284744302257 - }, - "qos_metrics": { - "interactive": { - "total_requests": 549, - "latency_ms": { - "mean": 26292.018300497162, - "p50": 27254.530234000413, - "p95": 52887.23202620167, - "p99": 52959.096098080045, - "max": 52961.97058301186 - }, - "sla": { - "target_p95_ms": 50, - "actual_p95_ms": 52887.23202620167, - "compliance": 0.0, - "met": false - } - }, - "responsive": { - "no_data": true - }, - "batch": { - "no_data": true - } - }, - "prefix_cache_stats": { - "prefix_hits": 109, - "prefix_misses": 440, - "system_prompt_reuse": 109, - "common_phrase_reuse": 0, - "bytes_saved": 94896128 - }, - "autoscaling_stats": [], - "autoscaling_summary": null, - "multi_turn_stats": { - "cache_hits": 75, - "cache_misses": 297, - "hit_rate": 0.20161290322580644 - } - } -} \ No newline at end of file diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_nvme_only_trial1.json b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_nvme_only_trial1.json deleted file mode 100644 index a0d0b9fb..00000000 --- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_nvme_only_trial1.json +++ /dev/null @@ -1,2907 +0,0 @@ -{ - "requests_completed": 549, - "total_tokens_generated": 147313, - "total_storage_io_latency": 553.2598749097524, - "total_generation_latency": 0.0, - "end_to_end_latencies": [ - 0.31965799399768, - 0.3983893450058531, - 0.4248207490018103, - 0.46022437799547333, - 
0.5509105820092373, - 0.5873291180032538, - 0.599661874002777, - 0.6159856000012951, - 0.6920091609936208, - 0.715832757006865, - 0.7183721239998704, - 0.7200420040026074, - 0.7314529239956755, - 0.7465914450003766, - 0.7556244070001412, - 0.7835762070026249, - 0.7841360489983344, - 0.7992805460089585, - 0.8150099659978878, - 0.9260951469914289, - 0.9354035030119121, - 0.9548282920004567, - 0.9569512379966909, - 0.975150181009667, - 0.9768707870098297, - 0.9881681300030323, - 0.9893617869965965, - 1.0061623279907508, - 1.0068226820003474, - 1.024901625001803, - 1.0275347700080601, - 1.080074388999492, - 1.081400595998275, - 1.081884651008295, - 1.102630898996722, - 1.1089100219978718, - 1.1118049130018335, - 1.1310830379952677, - 1.1514763300074264, - 1.1530813749996014, - 1.1699719239986734, - 1.1850058039999567, - 1.201451985994936, - 1.2039350420091068, - 1.274252543997136, - 1.2936436969903298, - 1.4070201859867666, - 1.4089701880002394, - 1.4207172250025906, - 1.443605206994107, - 1.4427537629962899, - 1.4533232940011658, - 1.4660417270060861, - 1.4756048909912352, - 1.4815714349970222, - 1.520407850999618, - 1.543174653997994, - 1.5453260889917146, - 1.5507015470066108, - 1.555061058010324, - 1.5548074850084959, - 1.573205702996347, - 1.595223045005696, - 1.6282530120079173, - 1.651761886998429, - 1.6591749729996081, - 1.6668910089938436, - 1.7052074739913223, - 1.7091881969972746, - 1.7141609800019069, - 1.735679119010456, - 1.7354904230014654, - 1.7355334129970288, - 1.7375503909861436, - 1.7541098380024778, - 1.7782317539968062, - 1.7970783189957729, - 1.7967887760023586, - 1.815894086001208, - 1.8290150350076146, - 1.8391276649927022, - 1.8521431270055473, - 2.010127167988685, - 2.0224769259948516, - 2.024501890002284, - 2.026606218001689, - 2.0305229789955774, - 2.0508325929986313, - 2.2360236539971083, - 2.2524744469992584, - 2.279282792005688, - 2.3138384710036917, - 2.325800798003911, - 2.3345671909919474, - 2.338480492006056, - 2.3655983979988378, - 
2.3997376350016566, - 2.3990999679954257, - 2.409018432008452, - 2.4093657240009634, - 2.4370238419942325, - 2.4456359869946027, - 2.446131061995402, - 2.4638375399954384, - 2.467306621998432, - 2.476082563007367, - 2.4886752469901694, - 2.4898462270066375, - 2.4918040690099588, - 2.4938914200029103, - 2.4944995129917515, - 2.49726218500291, - 2.497919560992159, - 2.501700783992419, - 2.5031404920009663, - 2.521439634001581, - 2.5341935100004775, - 2.5481274609919637, - 2.5498430989973713, - 2.561472674002289, - 2.6077627939957893, - 2.6056182850006735, - 2.6081297419877956, - 2.613892676003161, - 2.6399681309994776, - 2.6446914950065548, - 2.6633777919923887, - 2.6731891259987606, - 2.725401969990344, - 2.9052733139978955, - 2.9140078960044775, - 2.9197343710111454, - 3.0405557149933884, - 3.056468319002306, - 3.0588757829973474, - 3.096105629010708, - 3.1422284799919, - 3.1524904480029363, - 3.1677665219904156, - 3.1687732259888435, - 3.192775180999888, - 3.2089106339990394, - 3.205947392998496, - 3.212210382000194, - 3.2167606920120306, - 3.2153753449965734, - 3.216416355004185, - 3.2209843330056174, - 3.2273927480127895, - 3.2276890110078966, - 3.253983247996075, - 3.2541311319946544, - 3.284234919003211, - 3.2905558529892005, - 3.291087785997661, - 3.292594671002007, - 3.304308841994498, - 3.3228528239997104, - 3.326101419996121, - 3.325422164009069, - 3.3377170099993236, - 3.36178373900475, - 3.369998359994497, - 3.4096656590118073, - 3.430009413001244, - 3.4747720059967833, - 3.5007837890007067, - 3.5078224960016087, - 3.5109966629970586, - 3.519065493994276, - 3.5536190620041452, - 3.555929121997906, - 3.568462087001535, - 3.619714135013055, - 3.6195982999925036, - 3.633985182008473, - 3.6723814790020697, - 3.676963814999908, - 3.692631241006893, - 3.693701541007613, - 3.889652140001999, - 3.8919338609994156, - 3.92537195100158, - 3.9346378939953865, - 3.939980844996171, - 3.9584967990085715, - 3.960278393991757, - 3.962758808003855, - 3.9628983740112744, - 
3.9696036649984308, - 3.969952706989716, - 3.973569095003768, - 3.9782032600051025, - 3.981529358003172, - 3.9795628330030013, - 3.990960039998754, - 3.991176990995882, - 3.9946417050086893, - 3.9987749029969564, - 4.028330847999314, - 4.06330783899466, - 4.099650190997636, - 4.103741462997277, - 4.103032753002481, - 4.123836401005974, - 4.144384459999856, - 4.147904162993655, - 4.2855796340008965, - 4.316132445994299, - 4.317309032994672, - 4.318140610004775, - 4.323224370004027, - 4.326033777993871, - 4.327436601990485, - 4.328345568006625, - 4.343370835005771, - 4.344817663994036, - 4.345262691000244, - 4.350991316008731, - 4.355524170998251, - 4.363579661992844, - 4.424165402000654, - 4.424931379995542, - 4.438676441990538, - 4.471764763002284, - 4.474457360993256, - 4.4860731230000965, - 4.4988614939939, - 4.504642822997994, - 4.5116435829986585, - 4.574229619000107, - 4.618066686991369, - 4.679145418995176, - 4.700076409004396, - 4.708139522001147, - 4.7177556160022505, - 4.718006704992149, - 4.927585367011488, - 4.930083243001718, - 4.930946134001715, - 4.934530559999985, - 4.93674955899769, - 4.9379550159937935, - 4.938260712995543, - 4.938441257007071, - 4.945846314003575, - 4.960267028000089, - 4.959367830990232, - 5.028121855008067, - 5.029129128000932, - 5.275333477999084, - 5.277729748006095, - 5.307072833995335, - 5.354738802998327, - 5.363300912998966, - 5.367433795996476, - 5.376305537996814, - 5.376605899000424, - 5.392685974002234, - 5.398399429992423, - 5.4085326960048405, - 5.413616211997578, - 5.431533367009251, - 5.462424900004407, - 5.459182172009605, - 5.472027545998571, - 5.491322702000616, - 5.491247047990328, - 5.493525059995591, - 5.49511689398787, - 5.567544851001003, - 5.58883592300117, - 5.590953623992391, - 5.623487396995188, - 5.643789883994032, - 5.685861849997309, - 5.68923915499181, - 5.694472698989557, - 5.766094364997116, - 5.771231167003862, - 5.792661262006732, - 5.810035036003683, - 5.8106621830083895, - 5.834182058999431, - 
5.850950972002465, - 5.932678606011905, - 5.9400007290096255, - 5.940842414987856, - 5.980123551998986, - 5.994202376998146, - 6.013236284008599, - 6.013052128997515, - 6.063947856993764, - 6.0731690490065375, - 6.071358283996233, - 6.077608909996343, - 6.078524279000703, - 6.08048294200853, - 6.1126037950016325, - 6.113816230004886, - 6.140987178005162, - 6.1934914809971815, - 6.226352372992551, - 6.229453810999985, - 6.228977949998807, - 6.233972725996864, - 6.237878237996483, - 6.238010240995209, - 6.2442229309963295, - 6.251001898999675, - 6.260400510000181, - 6.28933042899007, - 6.295803389002685, - 6.588871118990937, - 6.6018569550069515, - 6.642595340003027, - 6.722514130000491, - 6.723392554995371, - 6.74502628899063, - 6.7714462430012645, - 6.816662179000559, - 6.850402663010755, - 6.876199930004077, - 6.915123376005795, - 6.932616202000645, - 6.947949736000737, - 6.964024497996434, - 7.0196884570032125, - 7.034690988002694, - 7.063369630996021, - 7.062356859998545, - 7.092488037989824, - 7.092723616005969, - 7.0922401400021045, - 7.1236110819882015, - 7.17704363699886, - 7.198842127007083, - 7.22346550700604, - 7.222925183988991, - 7.2446706619957695, - 7.269141010998283, - 7.323529549990781, - 7.3320008640002925, - 7.36159522899834, - 7.3624418239924125, - 7.384201507011312, - 7.384929863997968, - 7.411770235004951, - 7.412180860002991, - 7.490760364991729, - 7.4904222900077, - 7.522252870010561, - 7.551805142997182, - 7.571162716005347, - 7.573980351007776, - 7.57408705499256, - 7.595441254990874, - 7.626941088994499, - 7.638958782001282, - 7.650691722010379, - 7.650367031004862, - 7.685367340993253, - 7.724327079995419, - 7.728332329003024, - 7.737461109994911, - 7.736230059992522, - 7.745416097997804, - 7.7513859669998055, - 7.772504453008878, - 7.793300809993525, - 7.877015782010858, - 8.229329323992715, - 8.249018972011982, - 8.24853847900522, - 8.263793395002722, - 8.271216302993707, - 8.310009198001353, - 8.320604676991934, - 8.323973284001113, - 
8.330004531002487, - 8.330652082993765, - 8.331973445994663, - 8.361364762997255, - 8.418649315004586, - 8.420044589001918, - 8.457449308989453, - 8.480113307989086, - 8.498635106006986, - 8.504840995999984, - 8.505834791998495, - 8.505453289006255, - 8.528839926002547, - 8.542660104998504, - 8.578867705000448, - 8.607786828986718, - 8.608983836995321, - 8.636523336012033, - 8.648526167991804, - 8.660155217003194, - 8.663490229999297, - 8.725692191001144, - 8.73103142400214, - 8.752162931996281, - 8.7638936489966, - 8.767906143999426, - 8.777902187997825, - 8.83324573400023, - 8.853118440994876, - 8.869491812001797, - 8.871848664988647, - 8.927582014992367, - 8.929492690993357, - 8.929078208995634, - 8.963133237004513, - 8.96568899299018, - 8.985390129993903, - 8.985508147001383, - 9.052882859003148, - 9.066820285006543, - 9.120642488996964, - 9.140740354996524, - 9.162545120008872, - 9.187740128996666, - 9.197777169989422, - 9.265598917991156, - 9.273515879991464, - 9.299333271002979, - 9.340618184010964, - 9.380566371008172, - 9.374637883011019, - 9.375747313009924, - 9.454926956997951, - 9.456820934006828, - 9.486868300999049, - 9.491065187990898, - 9.5403337989992, - 9.545803911008989, - 9.551358741999138, - 9.560926440011826, - 9.605370700010099, - 9.607907988000079, - 9.632123543997295, - 9.640456360997632, - 9.67002889799187, - 9.700281066005118, - 9.72626792799565, - 9.744586587010417, - 9.77226756500022, - 9.803660838995711, - 9.816887180000776, - 10.406625830000849, - 10.409907257009763, - 10.476812098990194, - 10.565960968000581, - 10.590365192998433, - 10.602941345001454, - 10.626579596006195, - 10.675332166007138, - 10.706017343996791, - 10.749396750004962, - 10.76819811000314, - 10.76984730300319, - 10.771606734997476, - 10.773331823002081, - 10.787047157995403, - 10.788939582998864, - 10.817506548002711, - 10.84272612600762, - 10.853568074002396, - 10.872229425003752, - 10.880357796995668, - 10.90213794200099, - 10.901879326993367, - 
10.903198590007378, - 11.020710323995445, - 11.02676850798889, - 11.058355353990919, - 11.100492741999915, - 11.099642406989005, - 11.128682290000143, - 11.14435173998936, - 11.165116174001014, - 11.163591787000769, - 11.213171012001112, - 11.236290645989357, - 11.302121492000879, - 11.302803719008807, - 11.33656472999428, - 11.342557718002354, - 11.343894399993587, - 11.368111777002923, - 11.388263635992189, - 11.390347337001003, - 11.416188631992554, - 11.453486529004294, - 11.473153846993227, - 11.47751124399656, - 11.542062545006047, - 11.560999376000836, - 11.619902664999245, - 11.621424365002895, - 11.628271684996434, - 11.63429314699897, - 11.640218235988868, - 11.65052693701, - 11.664136648003478, - 11.664082032002625, - 11.665292883990332, - 11.689956531001371, - 11.690349818003597, - 11.690548263999517, - 11.720726776009542, - 11.73100022401195, - 11.73516197099525, - 11.751289175997954, - 11.767617770994548, - 11.767237004998606, - 11.7717585569917, - 11.860627325004316, - 11.880993359009153, - 11.885412363990326, - 11.926630747999297, - 11.92953825800214, - 12.125185397002497, - 12.127437753995764, - 12.147164907000843, - 12.348484846006613, - 12.550199461999, - 12.642151880994788, - 12.967299647003529, - 13.192594683001516, - 13.607662046008045, - 13.725743007991696, - 13.838724779998302, - 14.13463191500341, - 14.217424583999673, - 14.306490969989682, - 14.53879602100642, - 14.790775556990411, - 15.820652985014021, - 16.828438647004077, - 17.312745459988946, - 17.336109224997927, - 17.34127740400436, - 17.535011109997868, - 18.005119261011714, - 18.722270776997902, - 19.40438091100077, - 21.1752007890027 - ], - "storage_latencies": [ - 0.1459478129982017, - 0.3660980059939902, - 0.364523186974111, - 0.3559424359991681, - 0.4670210400072392, - 0.4893923360214103, - 0.21678478700050618, - 0.5177643969946075, - 0.39598689998092595, - 0.2513808419898851, - 0.4296737530094106, - 0.3780315000039991, - 0.1011545610090252, - 0.556076631997712, - 
0.30897698100307025, - 0.6090704380039824, - 0.2610629530099686, - 0.4648232019972056, - 0.06115171100827865, - 0.32901680198847316, - 0.4399791609903332, - 0.20239409600617364, - 0.31851802101300564, - 0.3493107519898331, - 0.5156473880051635, - 0.48667887899500784, - 0.37078376101271715, - 0.6932497870002408, - 0.39049896503274795, - 0.49709540203912184, - 0.12999103401671164, - 0.5062453350110445, - 0.1846610760258045, - 0.38067704997956753, - 0.12465771901770495, - 0.09481918600795325, - 0.35978538198105525, - 0.45322212898463476, - 0.2536262370122131, - 0.5353600989910774, - 0.8728302800300298, - 0.436191882006824, - 0.6943358199932845, - 0.7277451280242531, - 0.511705972981872, - 0.11887648800620809, - 0.5889492729911581, - 0.3435394589905627, - 0.3708824339992134, - 1.07870939001441, - 0.6567391650023637, - 0.4609303260076558, - 0.9498836370185018, - 0.9681975839775987, - 0.46724888097378425, - 0.15334491498651914, - 0.38559849200828467, - 0.613791130046593, - 0.8542527509998763, - 1.354168279998703, - 1.1186978309851838, - 0.37931479602411855, - 0.4705233159911586, - 0.8691484020091593, - 1.3870766339969123, - 0.31011958899034653, - 0.5418033559981268, - 0.1342139250045875, - 0.8314089750056155, - 0.13917064596898854, - 1.0921579510031734, - 0.8249977559607942, - 0.8720484350051265, - 0.8776255300181219, - 0.8740352709719446, - 0.1812806689995341, - 1.2243138340563746, - 0.3447081880440237, - 1.1599109969974961, - 0.34611446800408885, - 0.10760566001408733, - 0.4007968859950779, - 0.5816644090082264, - 0.8159302250132896, - 0.2641763019928476, - 0.21043843298684806, - 0.9772028579900507, - 0.39438751997658983, - 1.0802694449812407, - 1.0422673120046966, - 0.2486999290122185, - 0.8118118529964704, - 0.8067031230166322, - 0.4885296500142431, - 1.304708560972358, - 0.6339766369928839, - 1.356766096985666, - 0.07896832400001585, - 2.0621584850305226, - 0.7882473570207367, - 0.18553346498811152, - 0.6660375169740291, - 0.8446785679698223, - 1.7650311549805338, - 
1.953856160005671, - 1.41290671499155, - 0.5327839589735959, - 1.0873301000101492, - 1.2485718480311334, - 1.5990858499571914, - 0.7483823389920872, - 0.07009553798707202, - 0.3890184520132607, - 0.2136266710003838, - 0.9823135990445735, - 0.9106899139442248, - 0.807120629993733, - 0.7680909640184836, - 0.09664605297439266, - 0.47298425699409563, - 1.598756940002204, - 0.06380541100224946, - 0.4354061770136468, - 0.22605401198961772, - 1.1690951250348007, - 0.15437726200616453, - 0.7988772289827466, - 0.10279837298730854, - 0.30543695398955606, - 1.1461350100144045, - 2.507804936962202, - 2.3596864330320386, - 0.387491265006247, - 1.0416023380093975, - 0.9658142080006655, - 0.482080694011529, - 1.5207056400104193, - 1.7500893080141395, - 0.22794261999661103, - 0.5545980499882717, - 0.736316029986483, - 2.45482900897332, - 0.6274424239818472, - 0.5509921680059051, - 1.8880179389379919, - 0.6222510129882721, - 0.7122774820163613, - 0.7894662710023113, - 1.6764060069835978, - 0.07114757598901633, - 0.872204632993089, - 0.6093307329720119, - 0.7177581719879527, - 0.70888791399193, - 0.7063684750173707, - 0.8788550580211449, - 1.0069712299882667, - 0.7942767979693599, - 0.7852264329994796, - 0.14260506701248232, - 0.8171010939986445, - 0.17398671200498939, - 0.5474357360071735, - 0.9170335209928453, - 0.34397784899920225, - 0.12326839700108394, - 0.2594488699833164, - 0.5615331979788607, - 0.4574178039911203, - 0.42932231302256696, - 1.1730378130159806, - 2.165434426991851, - 0.36422999000933487, - 0.8120339559827698, - 0.3663092979986686, - 0.3020617319998564, - 0.5493891619989881, - 0.9297372899891343, - 0.26091820301371627, - 0.2804141530068591, - 1.0161868709838018, - 0.5138433429819997, - 0.6631795040157158, - 0.6766933459875872, - 0.28010975298820995, - 0.5049719029921107, - 0.681426526978612, - 0.6975172209786251, - 0.40904934499121737, - 0.5973885659768712, - 0.031998256003134884, - 0.5305066469882149, - 0.36840835999464616, - 2.190216137067182, - 
0.7187962910247734, - 0.6129145910090301, - 0.6705690340168076, - 0.017169243001262657, - 0.0800646739808144, - 1.870189114997629, - 0.7723323780082865, - 0.09884911497647408, - 1.718692849011859, - 0.7244877920456929, - 0.8331505399692105, - 0.6093173189874506, - 0.554050710023148, - 1.485299884021515, - 0.5853875080065336, - 0.16981712401320692, - 0.15922774201317225, - 0.18321736798679922, - 0.7637679100007517, - 0.6001720980129903, - 0.8105101779801771, - 0.35065366298658773, - 0.37158255898975767, - 0.2715474279684713, - 0.2271385689964518, - 1.1064135920169065, - 0.5122090960066998, - 0.465998303014203, - 0.2777165019797394, - 0.7257032849884126, - 1.1278405060002115, - 0.39445810398319736, - 0.45245555498695467, - 0.5017244450136786, - 0.4735655310068978, - 0.4180543890106492, - 0.21615787499467842, - 0.1481614990188973, - 0.22607050700753462, - 1.1792585870134644, - 0.32966560701606795, - 0.51741855002183, - 0.30788292895886116, - 0.5883228380116634, - 1.0950306790036848, - 0.8017701789794955, - 0.8124924959993223, - 1.07484689001285, - 0.9741210249776486, - 0.7969320309930481, - 0.4339695920061786, - 1.4163955579861067, - 0.9586606259981636, - 0.5435959240130614, - 0.46791072402265854, - 0.7077512949763332, - 1.1237912730139215, - 0.9269377619639272, - 0.6478590190235991, - 0.38735348101181444, - 0.80885030097852, - 0.09155675300280564, - 0.9137994600023376, - 0.4426459810201777, - 0.9646074490010506, - 0.8801063059945591, - 2.1227570619957987, - 1.45862290498917, - 0.44072059899917804, - 2.8650726059422595, - 0.7340476360113826, - 0.4743461289908737, - 0.5001359389862046, - 0.49111853701469954, - 0.5474081389984349, - 1.010177685006056, - 0.549287268993794, - 0.20149408499128185, - 1.045050753004034, - 0.18497159899561666, - 0.2766728270362364, - 0.9459764630300924, - 0.09172586702334229, - 2.2446326070057694, - 1.3406994930264773, - 0.4593782519950764, - 0.12056498603487853, - 0.4399248770059785, - 0.541994365004939, - 0.5821783200080972, - 
1.4557838659966365, - 0.09455491499102209, - 0.9930745480232872, - 0.4211928880249616, - 0.13690531901374925, - 0.5259615879767807, - 1.5924387690174626, - 0.7043301119847456, - 0.04475646500941366, - 3.8475839449674822, - 0.630701974965632, - 3.2639058599888813, - 0.4960378680116264, - 0.1731211439910112, - 1.1180520179186715, - 0.5947569869895233, - 0.32111888501094654, - 0.21153954799228813, - 1.710959771007765, - 0.6693048880260903, - 0.4445096399867907, - 0.8075299500342226, - 0.7435493190132547, - 0.8045610540139023, - 0.6994935819820967, - 0.7525528390106047, - 0.5945056880009361, - 0.7450886619772064, - 0.12749292099033482, - 0.5606518089771271, - 0.7088767310197, - 0.7465168369963067, - 1.2512838340480812, - 1.7943319309852086, - 0.8687533719930798, - 0.6181723689951468, - 0.7007403009920381, - 0.5002829790028045, - 1.3452042280259775, - 1.1554169150185771, - 0.6094788430054905, - 2.672827348971623, - 0.9093901149899466, - 0.7136440279864473, - 1.4195421369804535, - 1.0761230000061914, - 0.07972139099729247, - 0.9818213820108213, - 0.5900156470015645, - 0.1423266630008584, - 0.785822570003802, - 0.7927610660262872, - 0.5393225769512355, - 2.0074232179904357, - 0.6037925069540506, - 0.10610682900005486, - 0.31427940499270335, - 0.7534981189673999, - 0.539897722992464, - 0.45865475198661443, - 0.737735093003721, - 0.18599458498647436, - 0.2294694230076857, - 0.33494279098522384, - 0.31960276699101087, - 1.2581191429926548, - 0.7106195050000679, - 3.9307277650077594, - 0.611880640979507, - 0.19465495301119518, - 1.2766579520102823, - 0.41600058299081866, - 1.396230998041574, - 0.4408953589882003, - 0.059617959006573074, - 0.9375182020012289, - 0.40363092499319464, - 0.3658398500265321, - 0.1863323540019337, - 1.0400022539979545, - 1.9608578249753918, - 0.6703915549442172, - 0.4416321230237372, - 0.10449716499715578, - 1.1543876790237846, - 0.05827160400804132, - 0.33802609602571465, - 2.0439171140023973, - 2.510133860996575, - 0.9198733400116907, - 
0.6772065889963415, - 1.054418292012997, - 0.8382015510142082, - 1.407617881995975, - 1.4824902820400894, - 1.206472498990479, - 0.6561386560060782, - 0.9954312429763377, - 0.8999841579789063, - 0.9715157800237648, - 1.0491586729476694, - 0.07701629000075627, - 2.549274876975687, - 1.1069539159652777, - 0.47477541798434686, - 0.6794724379724357, - 0.12068449502112344, - 1.0476311499951407, - 0.7615730620018439, - 0.9312080399831757, - 0.49782742900424637, - 0.2978679379739333, - 0.22114267000870313, - 0.7971661539777415, - 0.3185767620016122, - 0.5931602439959534, - 1.307661356026074, - 0.5639065650029806, - 0.5625256780040218, - 0.20575248599925544, - 0.046154757001204416, - 1.1047116419358645, - 1.4354751460341504, - 0.3975591750058811, - 0.4185383139993064, - 0.5057316930033267, - 0.5811092240182916, - 0.5552083109942032, - 0.14544983700034209, - 1.0107843290024903, - 1.1699343220097944, - 0.6129947219887981, - 0.18671474201255478, - 0.10117920298944227, - 0.24570704498910345, - 0.4853278060181765, - 0.5149079170078039, - 0.4854975570051465, - 0.36186910400283523, - 0.3279435830190778, - 2.4532287240144797, - 0.4704146210133331, - 0.7472690039867302, - 0.09955616100342013, - 6.7231205710122595, - 0.8339541000168538, - 0.3989531909901416, - 0.04943439699127339, - 0.7637387640279485, - 0.16631754297122825, - 0.4591053440090036, - 0.8274114699743222, - 0.4178226809890475, - 0.5110143500060076, - 1.007242257008329, - 0.5900089550123084, - 0.37822225800482556, - 0.3897495429846458, - 0.6763729150115978, - 0.2774461300141411, - 3.5924514480138896, - 1.4023404700419633, - 0.5526738989865407, - 1.0640608500107192, - 0.0724815099965781, - 0.7179024299985031, - 0.6843819430068834, - 0.9919911929900991, - 1.0413085240143118, - 1.3828910490119597, - 1.013453904990456, - 1.359206550012459, - 6.405557205987861, - 1.085194956016494, - 1.1537337740155635, - 1.1496897080069175, - 0.46485612900869455, - 1.1514932989957742, - 0.1465990309807239, - 1.248877783989883, - 
0.5505914600071264, - 1.1186673730408074, - 2.1189125640521524, - 2.5831810609961394, - 2.903662662007264, - 1.3435717570246197, - 0.1460300689941505, - 1.2681071990082273, - 0.23352673501358368, - 0.34327955298067536, - 0.2032109970023157, - 0.6740828330075601, - 2.0589165189594496, - 1.4271351049974328, - 0.4480485009844415, - 2.558303172001615, - 0.35978001200419385, - 2.269550819983124, - 0.040877167994040065, - 0.42463391101046, - 0.416097401001025, - 0.6850185489893192, - 0.7700055250170408, - 2.3273362269828795, - 1.7764844639896182, - 0.4854612639901461, - 0.2255137259926414, - 0.5117418790177908, - 0.5471604080084944, - 0.3018948700046167, - 0.500869811992743, - 0.9384962100011762, - 2.7427644170675194, - 0.09520762601459865, - 0.07001295799273066, - 0.2034150110121118, - 0.6881718200020259, - 1.9238930429855827, - 0.5317270169907715, - 0.09331185498740524, - 0.3924633839924354, - 0.38257678599620704, - 0.22649184400506783, - 0.5202309559681453, - 0.19593399298901204, - 0.3538217389723286, - 0.2951452079869341, - 0.2919926220056368, - 1.892551754033775, - 0.134352888999274, - 0.43674384300538804, - 0.7629379509744467, - 0.4869853899872396, - 0.7852801410044776, - 0.23653616101364605, - 0.77795059797063, - 1.9431102780072251, - 0.25202797999372706, - 0.33656223901198246, - 1.2928936899988912, - 2.837406040984206, - 3.2188899320026394, - 2.10724365603528, - 1.2511603319871938, - 1.0091772149899043, - 1.59733696100011, - 3.1857732239877805, - 8.61888375900162, - 5.477455155021744, - 11.832243690980249, - 2.4636119679780677, - 7.193426187004661, - 4.13561919501808, - 7.4546612100093625, - 6.677426269990974, - 11.792253819017787, - 7.455233935965225, - 5.446166192996316, - 10.535853993002092, - 13.821385307994206, - 8.905584807987907, - 5.754208754980937, - 5.377647007961059, - 12.406091131007997, - 5.054776105986093 - ], - "generation_latencies": [ - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 
0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, 
- 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0 - ], - "throughput_timeline": [], - "prefill_latencies": [ - 0.02699205400131177, - 0.027552018989808857, - 0.08243055200728122, - 0.09928471301100217, - 0.07269709200772922, - 0.09417447898886167, - 0.012413140997523442, - 0.023037288992782123, - 0.03506901900982484, - 
0.03832929799682461, - 0.048054219005280174, - 0.02468508698802907, - 0.0182653020019643, - 0.012562427000375465, - 0.031315868996898644, - 0.07205562300805468, - 0.10271378200559411, - 0.035793304996332154, - 0.032344815001124516, - 0.1417889180011116, - 0.0836804559949087, - 0.04522692599857692, - 0.013304195992532186, - 0.032614016003208235, - 0.02946247800718993, - 0.05551961000310257, - 0.09862170000269543, - 0.1463767799868947, - 0.08125764600117691, - 0.1291644559969427, - 0.09024044699617662, - 0.13218140200478956, - 0.104003375992761, - 0.11492238500795793, - 0.03322352400573436, - 0.15243912000732962, - 0.03307486299308948, - 0.04436912499659229, - 0.019799126996076666, - 0.02298181300284341, - 0.021327649999875575, - 0.020330783008830622, - 0.01781720999861136, - 0.021790148006402887, - 0.01649590600572992, - 0.021463584998855367, - 0.05020351900020614, - 0.020976656989660114, - 0.05399623099947348, - 0.03798678999010008, - 0.0328011129895458, - 0.02874116699967999, - 0.012260595001862384, - 0.035677894004038535, - 0.03147223599080462, - 0.020428239004104398, - 0.03702547200373374, - 0.04617403700831346, - 0.028504488989710808, - 0.07603523199213669, - 0.03134917200077325, - 0.034344358995440416, - 0.05990402899624314, - 0.06009290799556766, - 0.02903582299768459, - 0.0299207610078156, - 0.01623005300643854, - 0.11993910200544633, - 0.16394380800193176, - 0.013577783000073396, - 0.007188759002019651, - 0.048665816997527145, - 0.03342424699803814, - 0.033675082988338545, - 0.012183408995042555, - 0.05286495998734608, - 0.06293369199556764, - 0.020881618998828344, - 0.05934455600799993, - 0.07980663300259039, - 0.07147935099783354, - 0.012328778000664897, - 0.01423444299143739, - 0.039649949001614004, - 0.02788935200078413, - 0.045030781999230385, - 0.03894781699636951, - 0.06865034799557179, - 0.031226131002767943, - 0.04956899699755013, - 0.10085491598874796, - 0.028742839000187814, - 0.0465509799978463, - 0.048161531012738124, - 0.11417915199126583, - 
0.03393579999101348, - 0.050757292003254406, - 0.021705341001506895, - 0.1910478729987517, - 0.024107663994072936, - 0.05501098699460272, - 0.061663651998969726, - 0.08670432800136041, - 0.07224480499280617, - 0.026882256992394105, - 0.007920185991679318, - 0.021749887993792072, - 0.025950138006010093, - 0.02887843200005591, - 0.03891684100381099, - 0.09625104899168946, - 0.06263598300574813, - 0.08845789699989837, - 0.04694084000948351, - 0.023295506005524658, - 0.24435950700717513, - 0.04026714300562162, - 0.10242498500156216, - 0.05509421200258657, - 0.039156527011073194, - 0.05627647999790497, - 0.059461877011926845, - 0.0726437910052482, - 0.016626869997708127, - 0.04211109600146301, - 0.23778384100296535, - 0.26730835500347894, - 0.17767151400039438, - 0.16285204800078645, - 0.21342462800384965, - 0.16304273299465422, - 0.027948126997216605, - 0.02056865900522098, - 0.2074938530131476, - 0.38626270499662496, - 0.22133427699736785, - 0.23239629600720946, - 0.04673834299319424, - 0.032251184005872346, - 0.07765782900969498, - 0.03799165500095114, - 0.06323317599890288, - 0.05530739399546292, - 0.032996815993101336, - 0.06462274299701676, - 0.004309405994717963, - 0.007167345000198111, - 0.0, - 0.04113037200295366, - 0.028754922008374706, - 0.0, - 0.026834924006834626, - 0.026146092000999488, - 0.06696745600493159, - 0.0064639089978300035, - 0.005549231995246373, - 0.006405332998838276, - 0.038955650001298636, - 0.056906775003881194, - 0.0, - 0.05024712099111639, - 0.08011046700994484, - 0.036469722996116616, - 0.05671306999283843, - 0.08586005699180532, - 0.06075367599260062, - 0.013897023003664799, - 0.02452891798748169, - 0.07800821001001168, - 0.14909882699430455, - 0.09597083900007419, - 0.07879237899032887, - 0.15736001399636734, - 0.3905598240089603, - 0.21854688999883365, - 0.24543905300379265, - 0.18448377899767365, - 0.6442126269976143, - 0.043055481000919826, - 0.028297233002376743, - 0.05499148000671994, - 0.0235849570017308, - 0.04347963799955323, - 
0.1469103519921191, - 0.07713253599649761, - 0.06632882400299422, - 0.3799268139991909, - 0.07568835900747217, - 0.12459012499311939, - 0.07345562899718061, - 0.07165642399922945, - 0.07096829799411353, - 0.09718876199622173, - 0.11112055799458176, - 0.07714520799345337, - 0.1008946689980803, - 0.09517928000423126, - 0.0721306600025855, - 0.06102286699751858, - 0.07856535099563189, - 0.0, - 0.47981848000199534, - 0.05145030899439007, - 0.3347307450021617, - 0.05985281500034034, - 0.10882064800534863, - 0.08720895599981304, - 0.06654400301340502, - 0.14869177600485273, - 0.1121319450030569, - 0.0, - 0.06360890201176517, - 0.02389755599142518, - 0.037962138987495564, - 0.06728462599858176, - 0.18136166800104547, - 0.042336133992648683, - 0.04990838799858466, - 0.04853950900724158, - 0.01573788899986539, - 0.08035497801029123, - 0.016074137995019555, - 0.0634114429994952, - 0.03769061199272983, - 0.031389032999868505, - 0.011306360000162385, - 0.0, - 0.23025551901082508, - 0.0, - 0.04962319799233228, - 0.24811585499264766, - 0.0, - 0.0, - 0.048040098990895785, - 0.03199717399547808, - 0.046166403000825085, - 0.017080762001569383, - 0.0, - 0.006182837998494506, - 0.011501157001475804, - 0.0, - 0.005027815990615636, - 0.03690255900437478, - 0.012193482994916849, - 0.028493407007772475, - 0.049580491991946474, - 0.06551770299847703, - 0.04361642499861773, - 0.08302115400147159, - 0.04015887599962298, - 0.11832665500696748, - 0.036071950002224185, - 0.0, - 0.023916988997370936, - 0.05338616999506485, - 0.04201199399540201, - 0.027851396007463336, - 0.0, - 0.0, - 0.024573244008934125, - 0.03301731200190261, - 0.025919352992787026, - 0.012953429992194287, - 0.03429659399262164, - 0.037774019001517445, - 0.05867988501267973, - 0.030237758008297533, - 0.05350674100918695, - 0.04612006701063365, - 0.05176163700525649, - 0.0, - 0.07367240299936384, - 0.0, - 0.07338953600265086, - 0.0, - 0.09906977698847186, - 0.10462740100047085, - 0.2719304070051294, - 0.12077110800601076, - 
0.040934268006822094, - 0.11858201499853749, - 0.0, - 0.0, - 0.21167382400017232, - 0.0, - 0.013238747997093014, - 0.0, - 0.021733265995862894, - 0.014469433997874148, - 0.0, - 0.0, - 0.0, - 0.012257418988156132, - 0.023524698000983335, - 0.05345124899758957, - 0.0, - 0.2957223970006453, - 0.0, - 0.06516877700050827, - 0.06328579399269074, - 0.06241080600011628, - 0.0, - 0.11591538299398962, - 0.1480987420072779, - 0.16140479700698052, - 0.1488966180040734, - 0.017095950999646448, - 0.18247701399377547, - 0.0, - 0.0, - 0.12377049399947282, - 0.1461953959951643, - 0.1712406570004532, - 0.16422617400530726, - 0.020494815005804412, - 0.16276460999506526, - 0.14775515199289657, - 0.33049653698981274, - 0.184878418003791, - 0.11878264498955105, - 0.12474002101225778, - 0.7618851279985392, - 0.0, - 0.047522068009129725, - 0.19938320000073873, - 0.06823793999501504, - 0.07111416199768428, - 0.07994716700341087, - 0.06860273399797734, - 0.058756029000505805, - 0.015425938996486366, - 0.0, - 0.05307078699115664, - 0.05305198499991093, - 0.047652199005824514, - 0.059983515995554626, - 0.0, - 0.060687610995955765, - 0.021178415001486428, - 0.059909884992521256, - 0.05330106800829526, - 0.0, - 0.03943125299701933, - 0.06483485100034159, - 0.0740712650003843, - 0.03709700000763405, - 0.0, - 0.06596553800045513, - 0.0, - 0.0, - 0.0, - 0.08665718699921854, - 0.05919878699933179, - 0.4911458529968513, - 0.3365269950008951, - 0.3905032860056963, - 0.5687458850006806, - 0.3003587130078813, - 0.36357826901075896, - 0.35667714200099, - 0.3726650190073997, - 0.054889211009140126, - 0.061539999005617574, - 0.0, - 0.041357046997291036, - 0.0, - 0.024462473011226393, - 0.05206260900013149, - 0.0, - 0.05343460899894126, - 0.0, - 0.0, - 0.04085571000177879, - 0.021339668994187377, - 0.040983138998853974, - 0.013929267995990813, - 0.0, - 0.018055870998068713, - 0.03701676698983647, - 0.0604551150026964, - 0.0, - 0.07097067100403365, - 0.0, - 0.11237295799946878, - 0.024383099007536657, - 
0.04635095699632075, - 0.01783521099423524, - 0.02043787599541247, - 0.0273951510025654, - 0.008374116994673386, - 0.0, - 0.0, - 0.02101482498983387, - 0.038300669999443926, - 0.029210751003120095, - 0.033636740990914404, - 0.03818052999849897, - 0.0, - 0.0503282369900262, - 0.020687567011918873, - 0.0, - 0.04194108099909499, - 0.031012006002129056, - 0.031072287005372345, - 0.05227802600711584, - 0.07907750899903476, - 0.026799785991897807, - 0.0, - 0.04726960200059693, - 0.0, - 0.0, - 0.0, - 0.0, - 0.14985331700881943, - 0.08243953900819179, - 0.08352871298848186, - 0.0, - 0.09772115701343864, - 0.3641808769898489, - 0.0, - 0.0, - 0.04520464000233915, - 0.03992734200437553, - 0.036760808012331836, - 0.0, - 0.0, - 0.05494916898896918, - 0.04207914898870513, - 0.0, - 0.021164644000236876, - 0.5130332650005585, - 0.04092433700861875, - 0.10740069299936295, - 0.04212513999664225, - 0.0, - 0.06366184499347582, - 0.03694526701292489, - 0.09383557201363146, - 0.05331990899867378, - 0.05281352599558886, - 0.03435585100669414, - 0.021324419998563826, - 0.0, - 0.0, - 0.0, - 0.0, - 0.08638890000293031, - 0.0630917469970882, - 0.0, - 0.0332756639982108, - 0.13676514600228984, - 0.1009605389990611, - 0.06918151800346095, - 0.07027677199221216, - 0.03056421800283715, - 0.0, - 0.0, - 0.018439045001287013, - 0.0, - 0.0, - 0.0, - 0.07377353000629228, - 0.0, - 0.12830376700730994, - 0.0, - 0.1938118200050667, - 0.09921330198994838, - 0.3084232039982453, - 0.20021501299925148, - 0.1727989739883924, - 0.0, - 0.06480415600526612, - 0.03807859800872393, - 0.01428825499897357, - 0.05468840499815997, - 0.09864936899975874, - 0.042277414991986006, - 0.05347407000954263, - 0.1353348209959222, - 0.06351988700043876, - 0.09979151500738226, - 0.060418695997213945, - 0.0, - 0.0946023879951099, - 0.05113541100581642, - 0.033680615000776015, - 0.05206948099657893, - 0.0, - 0.0, - 0.0, - 0.0, - 0.028225911999470554, - 0.0, - 0.03647143399575725, - 0.04854667800827883, - 0.0, - 
0.03678972300258465, - 0.45970444699923974, - 0.47117605499806814, - 0.0, - 0.038140032993396744, - 0.02371069000218995, - 0.056174661993281916, - 0.0, - 0.01326615599100478, - 0.021901595988310874, - 0.02087613201001659, - 0.0, - 0.03823582999757491, - 0.0, - 0.0, - 0.029338802996790037, - 0.03327979000459891, - 0.0, - 0.03763887700915802, - 0.012430140996002592, - 0.0, - 0.0, - 0.03982248599641025, - 0.01723778000450693, - 0.0, - 0.02653729399025906, - 0.0, - 0.0, - 0.0, - 0.012527645987574942, - 0.0, - 0.10100170600344427, - 0.3873394889960764, - 0.0, - 0.13588351900398266, - 0.20856762399489526, - 0.0, - 0.16490981499373447, - 0.10716050899645779, - 0.4781360390043119, - 0.08724685601191595, - 0.08818794100079685, - 0.027856303000589833, - 0.06800067200674675, - 0.0, - 0.0, - 0.08230414500576444, - 0.04874230400309898, - 0.07530073000816628, - 0.0, - 0.032513914004084654, - 0.0, - 0.053541999994195066 - ], - "decode_latencies": [ - 0.047853262003627606, - 0.021805199998198077, - 0.017078720004064962, - 0.03969506299472414, - 0.035124467001878656, - 0.0601294020016212, - 0.06896472298831213, - 0.07719992499914952, - 0.09288512700004503, - 0.0220678140030941, - 0.008871493002516218, - 0.0307847119984217, - 0.02646779001224786, - 0.06428968600812368, - 0.048761995989480056, - 0.03634147500270046, - 0.09848439099732786, - 0.09543044900055975, - 0.030277444995590486, - 0.011144385993247852, - 0.02595734100032132, - 0.03530884700012393, - 0.04360897600417957, - 0.016371696008718573, - 0.035715794991119765, - 0.09650325500115287, - 0.03204067399201449, - 0.07232932199258357, - 0.01687579500139691, - 0.13388606198714115, - 0.11406455500400625, - 0.021893502998864278, - 0.1146003500034567, - 0.019077442004345357, - 0.023079111007973552, - 0.07929422499728389, - 0.06091967401152942, - 0.06268701999215409, - 0.17711003999284003, - 0.03279117798956577, - 0.07720235400483944, - 0.03298391601128969, - 0.10936868499265984, - 0.053768912999657914, - 0.03034952400776092, - 
0.05847387699759565, - 0.044520109993754886, - 0.04184844899282325, - 0.058537044998956844, - 0.04486299300333485, - 0.037158648992772214, - 0.03574207400379237, - 0.05652512100641616, - 0.03995731500617694, - 0.02878121199319139, - 0.07342509699810762, - 0.06877295899903402, - 0.1316489049931988, - 0.02174717400339432, - 0.061990040994714946, - 0.05673833898617886, - 0.04208411599393003, - 0.13786759100912604, - 0.037807355998666026, - 0.10283447601250373, - 0.24604712599830236, - 0.041444408998358995, - 0.0168068679922726, - 0.14501990500139073, - 0.023133997005061246, - 0.1820349490008084, - 0.16614296699117403, - 0.04743052199773956, - 0.16147141800320242, - 0.1325415419996716, - 0.13954589000786655, - 0.021125071987626143, - 0.02011284300533589, - 0.037048086000140756, - 0.040983373997733, - 0.0965497069992125, - 0.03082979099417571, - 0.2272950829938054, - 0.09065495499817189, - 0.04589487200428266, - 0.012825027006329037, - 0.048372837001807056, - 0.04630660999100655, - 0.06714907800778747, - 0.05788072400901001, - 0.250487247001729, - 0.02093886598595418, - 0.07731032100855373, - 0.03215548300067894, - 0.044175039991387166, - 0.07273336699290667, - 0.036248245000024326, - 0.040844827992259525, - 0.12472385600267444, - 0.07115775499551091, - 0.2417787220038008, - 0.04261996199784335, - 0.05380746799346525, - 0.050103225992643274, - 0.15285456999845337, - 0.027582406997680664, - 0.14334791799774393, - 0.23274931000196375, - 0.058618670009309426, - 0.13838936400134116, - 0.06230642100854311, - 0.010665526002412662, - 0.0803349280031398, - 0.03628648399899248, - 0.04641043000447098, - 0.03238977900764439, - 0.06776564699248411, - 0.04578980400401633, - 0.005502569998498075, - 0.24081047500658315, - 0.0829684689961141, - 0.04371656499279197, - 0.4376831029949244, - 0.0729212029982591, - 0.06138108400045894, - 0.053477887995541096, - 0.035439129002043046, - 0.023321456988924183, - 0.017052712995791808, - 0.03784801899746526, - 0.18693287500354927, - 
0.01681050399201922, - 0.04000587899645325, - 0.30546876399603207, - 0.06107785399944987, - 0.12199657500605099, - 0.06475862600200344, - 0.19946949600125663, - 0.21522100499714725, - 0.008968716007075273, - 0.05238328900304623, - 0.10765548801282421, - 0.07720549700025003, - 0.05406642600428313, - 0.05457170600129757, - 0.07060574299248401, - 0.0296225170022808, - 0.0650659669918241, - 0.029494772999896668, - 0.02062722599657718, - 0.03822639699501451, - 0.08405559998936951, - 0.059571628997218795, - 0.04833195899846032, - 0.07972455999697559, - 0.0870019289868651, - 0.2674563969922019, - 0.025718605989823118, - 0.0391589689970715, - 6.443100573960692e-05, - 0.05669303900504019, - 0.04701629299961496, - 0.15780257699952926, - 0.03125213900057133, - 0.15193708099832293, - 0.0615117950073909, - 0.04964227399614174, - 0.39639630100282375, - 0.012597936991369352, - 0.031696809004643, - 0.04279774099995848, - 0.25214812699414324, - 0.03566045300976839, - 0.16158705300767906, - 0.02909949899185449, - 0.042132955000852235, - 0.06782533899240661, - 0.12800091299868654, - 0.062195550999604166, - 0.09974595099629369, - 0.19956533399818, - 0.040264655006467365, - 0.03639178498997353, - 0.029293943007360213, - 0.04107653100800235, - 0.024806886009173468, - 0.05302920799294952, - 0.011363447003532201, - 0.04697747000318486, - 0.03853648299991619, - 0.25876353499188554, - 0.034781184993335046, - 0.03793642300297506, - 0.037738013008493, - 0.0858170470019104, - 0.043531518007512204, - 0.02937496300728526, - 0.017308586000581272, - 0.027116527999169193, - 0.12625412300985772, - 0.0859988979937043, - 0.0013360050070332363, - 0.14696467301109806, - 0.04218460799893364, - 0.0628560129989637, - 0.024248717993032187, - 0.04122336099680979, - 0.18649376499524806, - 0.0370338780048769, - 0.04696486299508251, - 0.036283023000578396, - 0.037958507004077546, - 0.043251921990304254, - 0.03445195699168835, - 0.04242258699377999, - 0.023125698004150763, - 0.048281612995197065, - 
0.06830962999083567, - 0.12663839900051244, - 0.032048752007540315, - 0.23164574000111315, - 0.06944162699801382, - 0.04753966900170781, - 0.02867318100470584, - 0.059577564999926835, - 0.07968915399396792, - 0.03732738099643029, - 0.01253059899318032, - 0.02959919100976549, - 0.06111378301284276, - 0.028446034993976355, - 0.11422314899391495, - 0.01272781401348766, - 0.01940476799791213, - 0.05137811899476219, - 0.052701559994602576, - 0.0601904920040397, - 0.012741162994643673, - 0.2832186199957505, - 0.06676498599699698, - 0.11980632299673744, - 0.22284307899826672, - 0.023074345997883938, - 0.13791858300101012, - 0.1321214960044017, - 0.27488773300137836, - 0.02768426900729537, - 0.0697549960023025, - 0.055924751999555156, - 0.0378420350025408, - 0.15600058301060926, - 0.025546147997374646, - 0.08495716500328854, - 0.033735413002432324, - 0.04314379899005871, - 0.23129891999997199, - 0.0388443280098727, - 0.07885968299524393, - 0.05240433999279048, - 0.044063413006369956, - 0.05474699100886937, - 0.01973581600759644, - 0.0365973820007639, - 0.07438429100147914, - 0.025985621992731467, - 0.05903906800085679, - 0.04189561899693217, - 0.03630148399679456, - 0.013474696010234766, - 0.04684237600304186, - 0.06736503800493665, - 0.019546470008208416, - 0.07479800400324166, - 0.0015360139950644225, - 0.06030138899222948, - 0.030426668003201485, - 0.00873041499289684, - 0.06632102699950337, - 0.060085966004407965, - 0.023318195002502762, - 6.212500738911331e-05, - 0.033824394005932845, - 0.3071049380087061, - 0.29809430200839415, - 0.06449126401275862, - 0.028894002010929398, - 0.016723854991141707, - 0.04107628599740565, - 0.0727938319905661, - 0.05631917199934833, - 0.09979863700573333, - 0.037866755010327324, - 0.025974539006710984, - 0.10571343399351463, - 0.048447980007040314, - 0.16880589500942733, - 0.015282881009625271, - 0.09762166600557975, - 0.029948234994662926, - 0.02192846199613996, - 0.026749122989713214, - 0.03550410200841725, - 0.029128091991879046, - 
0.0664136609993875, - 0.09938235300069209, - 0.05008275399450213, - 0.08858084300300106, - 0.054920700000366196, - 0.04913712799316272, - 0.0665131330024451, - 0.07430050500261132, - 0.07605620799586177, - 0.09576293399732094, - 0.09506707300897688, - 0.04380918499373365, - 0.10901393200038001, - 0.10624478200043086, - 0.2300439459941117, - 0.042506868005148135, - 0.07791424900642596, - 0.03405735100386664, - 0.11966516698885243, - 0.07152205001330003, - 0.06859301599615719, - 0.08551027299836278, - 0.1309157969953958, - 0.07186786801321432, - 0.07296541100367904, - 0.11429150799813215, - 0.04781996899691876, - 0.04944445598812308, - 0.09920190399861895, - 0.3076125069928821, - 0.03593377101060469, - 0.0940457380056614, - 0.08224914399033878, - 0.32047943500219844, - 0.26426145898585673, - 0.03056152399221901, - 0.0762036870000884, - 0.010521559001062997, - 0.3357502930011833, - 0.028969595005037263, - 0.03324173900182359, - 0.11448155000107363, - 0.02142225998977665, - 0.06326986500062048, - 0.10697431900189258, - 0.029901984002208337, - 0.12110348600253928, - 0.1377172030042857, - 0.6839305780013092, - 0.04579613800160587, - 0.05348791100550443, - 0.04537882599106524, - 0.06641594000393525, - 0.05172297899844125, - 0.06287937199522275, - 0.021950829002889805, - 0.11074163200100884, - 0.025866544994642027, - 0.014304208001703955, - 0.048266494006384164, - 0.609104026996647, - 0.1322673870017752, - 0.03860199201153591, - 0.03445732899126597, - 0.05396401100733783, - 0.024249245005194098, - 0.04855350300204009, - 0.047918915006448515, - 0.06989266400341876, - 0.37927190100890584, - 0.23433170300268102, - 0.07038108799315523, - 0.017727844999171793, - 0.05963022299692966, - 0.03759054999682121, - 0.06927789401379414, - 0.08945291400596034, - 0.10153660400828812, - 0.043233877993770875, - 0.07664324999495875, - 0.042885863993433304, - 0.0781714920012746, - 0.052263101999415085, - 0.3212649089982733, - 0.025280179994297214, - 0.37133581399393734, - 0.07338737999089062, 
- 0.018439945008140057, - 0.04695152000931557, - 0.1308311250031693, - 0.053126736005651765, - 0.364805756995338, - 0.03425225899263751, - 0.055130940992967226, - 0.08010977899539284, - 0.11276384600205347, - 0.49797053598740604, - 0.03488133200153243, - 0.42125962900172453, - 0.07635344199661631, - 0.0524207690032199, - 0.06237376600620337, - 0.04592083100578748, - 0.012025601987261325, - 0.03782494999177288, - 0.03216842400433961, - 0.09336688100302126, - 0.03669069800525904, - 0.04411196301225573, - 0.05974059400614351, - 0.0741803980054101, - 0.059795652006869204, - 0.041398251996724866, - 0.013383573008468375, - 0.08230200900288764, - 0.08510273999127094, - 0.027613578000455163, - 0.08362559300439898, - 0.04017791499791201, - 0.06172382600198034, - 0.11869531800039113, - 0.4513342569989618, - 0.0261753939994378, - 0.018859159012208693, - 0.10058182899956591, - 0.3039740950043779, - 0.03482017100031953, - 0.04797958300332539, - 0.01995673000055831, - 0.0821469359943876, - 0.04595193899876904, - 0.04585582300205715, - 0.06430851599725429, - 0.06094204400142189, - 0.055362560000503436, - 0.04896682800608687, - 0.05050649099575821, - 0.06696711399126798, - 0.04448361799586564, - 0.03172421299677808, - 0.09369241599051747, - 0.41273926199937705, - 0.0743422039959114, - 0.07140685500053223, - 0.24809060899133328, - 0.06143550100387074, - 0.04537568800151348, - 0.08309466599894222, - 0.041279229990323074, - 0.059411131995148025, - 0.2351137329969788, - 0.02341446299396921, - 0.055069705995265394, - 0.2134580139972968, - 0.036398413009010255, - 0.06484577999799512, - 0.02835343599144835, - 0.05355524799961131, - 0.050443044005078264, - 0.034655595998628996, - 0.0651112089981325, - 0.4853304249991197, - 0.06195716100046411, - 0.12835328800429124, - 0.13664547000371385, - 0.2767921530030435, - 0.03477676899638027, - 0.059170057997107506, - 0.02849979599704966, - 0.05039169099472929, - 0.08322721200238448, - 0.030333239992614836, - 0.4636050130065996, - 
0.12742638400231954, - 0.4099826470046537, - 0.048925615003099665, - 0.23262277200410608, - 0.009890186003758572, - 0.06322168800397776, - 0.001319261995377019, - 0.018310487997950986, - 0.041146218005451374, - 0.05156204900413286, - 0.1121740270027658, - 0.25024403000134043, - 0.11034278199076653, - 0.06988872299552895, - 0.24054777099809144, - 0.05026585000450723, - 0.05427589899045415, - 0.04826982600206975, - 0.08047511900076643, - 0.12157718998787459, - 0.07990735699422657, - 0.07901892700465396, - 0.04907124099554494, - 0.01956483999674674, - 0.09304001700365916, - 0.0637635169987334, - 0.07581592199858278, - 0.07467823699698783, - 0.08062870199501049, - 0.04542037499777507, - 0.04350194599828683, - 0.04339664400322363, - 0.04101099800027441, - 0.033888182006194256, - 0.09295480500441045, - 0.0939233449898893, - 0.11310619300638791, - 0.05693646799772978, - 0.06395070000144187, - 0.15543718400294892, - 0.04506518099515233, - 0.10540527800912969, - 0.060489383002277464, - 0.08128469900111668, - 0.1227100769901881, - 0.2572983430000022, - 0.11813176098803524, - 0.12565616999927443, - 0.3231848040013574, - 0.25962870399234816, - 0.5424460349895526, - 0.16789453499950469, - 0.7278452859900426, - 0.22425526900042314, - 1.1492330399923958, - 0.6589163939934224, - 0.36142636800650507, - 0.5292331609962275, - 0.26834057200176176, - 0.7657272630021907, - 0.822596740006702, - 1.0634765959985089, - 0.4621248980110977, - 1.76706964999903, - 0.5916366659948835, - 0.9457246840029256, - 1.8474718910001684, - 0.588177158992039, - 0.4448569509986555, - 1.0871993280015886, - 0.5203078320046188, - 0.3303659920056816, - 0.8627884830057155 - ], - "multi_turn_cache_hits": 74, - "multi_turn_cache_misses": 298, - "seed": 42, - "summary": { - "total_requests": 549, - "total_tokens": 147313, - "elapsed_time": 10.874348640441895, - "avg_throughput_tokens_per_sec": 13546.834377936013, - "requests_per_second": 50.4857824732839, - "end_to_end_latency_ms": { - "mean": 6190.353436489755, - 
"p50": 5643.789883994032, - "p95": 11910.14339439571, - "p99": 17338.79667808127 - }, - "storage_io_latency_ms": { - "mean": 1007.7593349904414, - "p50": 609.3173189874506, - "p95": 2799.549391417533, - "p99": 8767.968304474483 - }, - "generation_latency_ms": { - "mean": 0.0, - "p50": 0.0, - "p95": 0.0, - "p99": 0.0 - }, - "cache_stats": { - "cache_hit_rate": 0.9325383304940374, - "cache_hits": 5474, - "cache_misses": 396, - "gpu_entries": 15, - "cpu_entries": 0, - "nvme_entries": 434, - "gpu_memory_used_gb": 0.0, - "cpu_memory_used_gb": 0.0, - "offloads_cpu": 0, - "offloads_nvme": 434, - "storage_health": { - "overall_status": "FAIL", - "criteria": [ - { - "name": "NVMe Write P95 < 500ms", - "target": 500, - "actual": 189.35884264719746, - "unit": "ms", - "passed": true - }, - { - "name": "NVMe Read P95 < 200ms", - "target": 200, - "actual": 293.52559285325685, - "unit": "ms", - "passed": false - }, - { - "name": "Cache Hit Rate > 30%", - "target": 0.3, - "actual": 0.9325383304940374, - "unit": "ratio", - "passed": true - } - ], - "passed_count": 2, - "total_count": 3 - }, - "prefill_writes": 449, - "decode_reads": 5474, - "prefill_bytes_written_gb": 7.373779296875, - "decode_bytes_read_gb": 91.890380859375, - "system_prompt_hits": 958, - "common_phrase_hits": 0, - "user_cache_hits": 4442, - "multi_turn_hits": 74, - "total_read_bytes": 98666545152, - "total_write_bytes": 7917535232, - "total_read_gb": 91.890380859375, - "total_write_gb": 7.373779296875, - "read_write_ratio": 12.461775320332418, - "read_iops": 5474, - "write_iops": 449, - "gpu_read_p50_ms": 4.097593002370559, - "gpu_read_p95_ms": 34.32291735443867, - "gpu_read_p99_ms": 55.22852120906493, - "gpu_write_p50_ms": 33.93579999101348, - "gpu_write_p95_ms": 67.96025080111576, - "gpu_write_p99_ms": 69.81346775399288, - "nvme_read_p50_ms": 59.06918349501211, - "nvme_read_p95_ms": 358.2194530514242, - "nvme_read_p99_ms": 862.9684650640406, - "nvme_write_p50_ms": 53.41038949700305, - "nvme_write_p95_ms": 
303.1812848545084, - "nvme_write_p99_ms": 487.407819908549, - "nvme_read_device_p50_ms": 35.97741150588263, - "nvme_read_device_p95_ms": 293.52559285325685, - "nvme_read_device_p99_ms": 707.8821929484494, - "nvme_read_host_p50_ms": 19.260006498370785, - "nvme_read_host_p95_ms": 84.66013173892861, - "nvme_read_host_p99_ms": 236.72035544383047, - "nvme_write_device_p50_ms": 14.280821502325125, - "nvme_write_device_p95_ms": 189.35884264719746, - "nvme_write_device_p99_ms": 343.4912211267512, - "nvme_write_host_p50_ms": 27.31539149681339, - "nvme_write_host_p95_ms": 115.94986300333402, - "nvme_write_host_p99_ms": 342.44433709522156 - }, - "qos_metrics": { - "interactive": { - "total_requests": 549, - "latency_ms": { - "mean": 6190.353436489756, - "p50": 5643.789883994032, - "p95": 11910.14339439571, - "p99": 17338.79667808127, - "max": 21175.2007890027 - }, - "sla": { - "target_p95_ms": 50, - "actual_p95_ms": 11910.14339439571, - "compliance": 0.0, - "met": false - } - }, - "responsive": { - "no_data": true - }, - "batch": { - "no_data": true - } - }, - "prefix_cache_stats": { - "prefix_hits": 98, - "prefix_misses": 451, - "system_prompt_reuse": 98, - "common_phrase_reuse": 0, - "bytes_saved": 84672512 - }, - "autoscaling_stats": [], - "autoscaling_summary": null, - "multi_turn_stats": { - "cache_hits": 74, - "cache_misses": 298, - "hit_rate": 0.1989247311827957 - } - } -} \ No newline at end of file diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_nvme_only_trial2.json b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_nvme_only_trial2.json deleted file mode 100644 index 88b7d942..00000000 --- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_nvme_only_trial2.json +++ /dev/null @@ -1,2907 +0,0 @@ -{ - "requests_completed": 549, - "total_tokens_generated": 146625, - "total_storage_io_latency": 562.261374020818, - "total_generation_latency": 0.0, - 
"end_to_end_latencies": [ - 0.23539620799419936, - 0.23629945299762767, - 0.2368339120002929, - 0.259204526009853, - 0.26693596701079514, - 0.2865899090102175, - 0.29956695598957594, - 0.3034186930017313, - 0.31723564300045837, - 0.31822132399247494, - 0.3211941050103633, - 0.3231561239954317, - 0.48832464299630374, - 0.5033665480004856, - 0.5163233960047364, - 0.5296306469972478, - 0.5334089480020339, - 0.5333249559917022, - 0.5334353319922229, - 0.5499358870001743, - 0.5516328880039509, - 0.5773924889945192, - 0.5927996509999502, - 0.5927243089972762, - 0.5997314539999934, - 0.6051628929999424, - 0.6164633690059418, - 0.6177078680047998, - 0.6171035540028242, - 0.6394934220006689, - 0.6417967570014298, - 0.6428347040055087, - 0.650397953009815, - 0.6777077970036771, - 0.6803812260041013, - 0.6984101560083218, - 0.7262785369966878, - 0.734772662006435, - 0.7408515029965201, - 0.751635337001062, - 0.7529673600074602, - 0.8711113419994945, - 0.8788162900018506, - 0.9312790080002742, - 0.9337115450034617, - 0.9580600830086041, - 0.9698306749924086, - 0.9707870999991428, - 0.985311669006478, - 0.9872588400030509, - 0.9887121459905757, - 0.998852201999398, - 1.0163883000059286, - 1.0385161129961489, - 1.06623837799998, - 1.0700398740009405, - 1.0709318169974722, - 1.0763517340092221, - 1.08267339199665, - 1.0829184569884092, - 1.0888086079939967, - 1.112174331996357, - 1.1237910770141752, - 1.126235119998455, - 1.1349668570037466, - 1.1401160349923884, - 1.262199820994283, - 1.2722589019977022, - 1.2721574129973305, - 1.2735631709947484, - 1.2758765120088356, - 1.2830805800040253, - 1.282378650008468, - 1.2997590170125477, - 1.3043315570103005, - 1.3067453740077326, - 1.4090340080001624, - 1.4326623889937764, - 1.4514251770015107, - 1.4726991280040238, - 1.477803211993887, - 1.4800184650084702, - 1.4826945320091909, - 1.5015070269873831, - 1.5186259879992576, - 1.5350239309918834, - 1.5373071920039365, - 1.6086997730017174, - 1.784576970996568, - 1.8854916239943122, - 
1.8884997840068536, - 1.8990616710070753, - 1.9231670500012115, - 1.9267106729967054, - 1.9262455379939638, - 1.9927980749926064, - 2.0083730449987343, - 2.009088945997064, - 2.012295188003918, - 2.0164610750071006, - 2.0175795709947124, - 2.0203320129949134, - 2.0229574819968548, - 2.055682259000605, - 2.06590383600269, - 2.068025246990146, - 2.112286175994086, - 2.1224310499965213, - 2.1266936119936872, - 2.1446773200004827, - 2.152339047010173, - 2.1521923410036834, - 2.1530619580007624, - 2.1697610539995367, - 2.173550653998973, - 2.176424394012429, - 2.1744956280017504, - 2.1808682550035883, - 2.187050852997345, - 2.218900758001837, - 2.220785693003563, - 2.226435432006838, - 2.2464128589926986, - 2.263281787003507, - 2.261984618002316, - 2.2606189780053683, - 2.2638702260010177, - 2.267243691996555, - 2.2838356380088953, - 2.3022068980062613, - 2.3112259450135753, - 2.354644699007622, - 2.4809326190006686, - 2.48781556400354, - 2.490147516000434, - 2.513149197999155, - 2.5107173529977445, - 2.526381480987766, - 2.6713036080036545, - 2.677779094010475, - 2.6803950409957906, - 2.688447530992562, - 2.6975460599933285, - 2.698130698991008, - 2.6997332659957465, - 2.715896022011293, - 2.7358138330018846, - 2.7518442970031174, - 2.7526017939962912, - 2.7751274019974517, - 2.788813980994746, - 2.794724931998644, - 2.819034421991091, - 2.8195355600037146, - 2.8212210070050787, - 2.8206811679992825, - 2.8220112870039884, - 2.855092812998919, - 2.862987166008679, - 2.8651446669973666, - 2.9217046559933806, - 2.9375390560016967, - 2.9393407570023555, - 2.972283265000442, - 2.973284830004559, - 2.981782791990554, - 2.9840642500057584, - 2.986127940006554, - 3.0540512389998185, - 3.0802481419959804, - 3.080342949993792, - 3.081518758001039, - 3.1066703029937344, - 3.1087902800063603, - 3.1125722889992176, - 3.1200397799984785, - 3.156620445995941, - 3.1581736200023443, - 3.176645746003487, - 3.1836490749992663, - 3.349129832990002, - 3.349297710999963, - 
3.3598857760080136, - 3.373511423007585, - 3.3832235720037716, - 3.4544798499991884, - 3.456767335999757, - 3.4679575500049395, - 3.468505256008939, - 3.4691902409977047, - 3.474680591010838, - 3.476306537995697, - 3.4773726599960355, - 3.4831152469996596, - 3.485049982002238, - 3.488455225000507, - 3.490930517000379, - 3.5181041680043563, - 3.5612461599957896, - 3.5639738780009793, - 3.630084098986117, - 3.6561889970034827, - 3.6588985579874134, - 3.6571446529997047, - 3.6581490289972862, - 3.690581098999246, - 3.719225191991427, - 3.719263338993187, - 3.721810214003199, - 3.732725460009533, - 3.740928883999004, - 3.763123019001796, - 3.7796387300040806, - 3.792552294995403, - 3.802200324003934, - 3.867332104011439, - 3.890463105009985, - 3.8934650489973137, - 3.899022685000091, - 3.926863457993022, - 3.927623459996539, - 3.9308788960042875, - 3.932803685995168, - 3.9659403760015266, - 3.9686729919922072, - 3.987723942991579, - 4.001368305005599, - 4.012165408988949, - 4.046064623995335, - 4.052872935004416, - 4.056055704000755, - 4.2809850980120245, - 4.29838684599963, - 4.331347065992304, - 4.343424498001696, - 4.422796925995499, - 4.4277506779908435, - 4.640832238001167, - 4.643896395005868, - 4.671195757997339, - 4.670364281992079, - 4.676528784999391, - 4.672305526008131, - 4.730943557005958, - 4.730326438002521, - 4.774435110986815, - 4.7916175440041116, - 4.803972338006133, - 4.824109674009378, - 4.846790135998162, - 4.85314196300169, - 4.867798413994024, - 4.883921478991397, - 4.89464293900528, - 4.904181856007199, - 4.92954715101223, - 4.928911128008622, - 4.939362277000328, - 4.940174575996934, - 4.940859579000971, - 4.98833145100798, - 4.994997374000377, - 4.997361952002393, - 4.998360543002491, - 5.014541092008585, - 5.0298882229981245, - 5.029336328996578, - 5.03640588100825, - 5.059373038006015, - 5.060704083007295, - 5.095136719988659, - 5.149457557999995, - 5.190731501992559, - 5.233483674004674, - 5.2349533189990325, - 5.272385406002286, - 
5.300730105009279, - 5.316346671999781, - 5.317629466007929, - 5.348365329002263, - 5.371639878008864, - 5.386754391001887, - 5.392076906995499, - 5.401014634990133, - 5.4015161149873165, - 5.41033066700038, - 5.658792933012592, - 5.681385015996057, - 5.6839360800076975, - 5.704052505985601, - 5.704830982009298, - 5.741987299988978, - 5.7454158019972965, - 5.7690549490071135, - 5.771080541002448, - 5.779628521006089, - 5.7914827420026995, - 5.789029085004586, - 5.79031273999135, - 5.808290521003073, - 5.849535571993329, - 5.848558229001355, - 5.8677729689952685, - 5.881182468001498, - 5.898474510002416, - 5.925394295001752, - 5.9436920630105305, - 5.945198298999458, - 5.967097094995552, - 5.974837211993872, - 6.004668579000281, - 6.042169545995421, - 6.049102849996416, - 6.0814603829931, - 6.114438336997409, - 6.162795376993017, - 6.221645302997786, - 6.232679691005615, - 6.240045169994119, - 6.253801864004345, - 6.276922614008072, - 6.367689924998558, - 6.384992383013014, - 6.391528402993572, - 6.450349784994614, - 6.465447223003139, - 6.506857610991574, - 6.521476431997144, - 6.548214733993518, - 6.558860117991571, - 6.6129828020057175, - 6.6507700639922405, - 6.651825356006157, - 6.945660569006577, - 6.969983048998984, - 6.975504864996765, - 6.986054658002104, - 6.998776901004021, - 7.020190184994135, - 7.02727292200143, - 7.05054176998965, - 7.137298304995056, - 7.183106671000132, - 7.192542447999585, - 7.223841672006529, - 7.237964507992729, - 7.259733669998241, - 7.278736555992509, - 7.29707653800142, - 7.300580023002112, - 7.342875977992662, - 7.378816044001724, - 7.436694068004726, - 7.471034817994223, - 7.477247212998918, - 7.4942818070121575, - 7.494977235008264, - 7.497335791995283, - 7.502542272995925, - 7.50143378599023, - 7.503269929002272, - 7.50603804300772, - 7.508287249002024, - 7.509092338994378, - 7.529765125000267, - 7.537121008994291, - 7.537739389998023, - 7.558422327012522, - 7.5679161839943845, - 7.642125500991824, - 7.643467561996658, - 
-        7.675569778002682,
-        ... (hundreds of latency samples, rising from ~7.7 to ~21.3, elided for brevity) ...
-        21.26088185000117
-    ],
-    "storage_latencies": [
-        0.11743784000282176,
-        ... (hundreds of storage-latency samples, mostly below 3 with a tail reaching ~14.7, elided) ...
-        5.286667804961326
-    ],
-    "generation_latencies": [
-        0.0,
-        ... (every generation-latency entry in this run is 0.0; remainder elided) ...
-        0.0
-    ],
-    "throughput_timeline": [],
-    "prefill_latencies": [
-        0.02734439000778366,
-        ... (hundreds of prefill-latency samples, roughly 0.0 to ~1.02, elided) ...
-        0.08485844999086112
-    ],
-    "decode_latencies": [
-        0.1074430129956454,
-        ... (decode-latency samples, mostly below 1.0, elided; array continues) ...
-        0.5903743199887685,
- 0.06157385899859946, - 0.09279292200517375, - 0.29897006799001247, - 0.04190532199572772, - 0.23266208400309552, - 0.07198983000125736, - 0.030408289007027633, - 0.03187584999250248, - 0.09095294900180306, - 0.041783313994528726, - 0.04845740299788304, - 0.0432855719991494, - 0.08925782798905857, - 0.052930006000678986, - 0.027021059999242425, - 0.056699427994317375, - 0.10100224601046648, - 0.1231948569911765, - 0.020973646998754703, - 0.0728960809938144, - 0.047254805991542526, - 0.033804755003075115, - 0.070693494999432, - 0.05672456999309361, - 0.034082245008903556, - 0.07278092599881347, - 0.08045814199431334, - 0.3537202199950116, - 0.031754411989822984, - 0.18884121401060838, - 0.4354863850021502, - 0.049785131006501615, - 0.03897535700525623, - 0.05777055300131906, - 0.059359567996580154, - 0.050560462012072094, - 0.055296457998338155, - 0.044819299000664614, - 0.04714894000790082, - 0.05253279799944721, - 0.06621712200285401, - 0.04218547100026626, - 0.04799670500506181, - 0.04088874100125395, - 0.4177852970024105, - 0.04548678899300285, - 0.05209211500186939, - 0.05945951500325464, - 0.2857690680102678, - 0.050264710996998474, - 0.12308346899226308, - 0.03073861800658051, - 0.20876398900873028, - 0.06305097400036175, - 0.04627153999172151, - 0.0551682939985767, - 0.06676134299777914, - 0.05081197200343013, - 0.3835699379997095, - 0.01821738699800335, - 0.06586567100021057, - 0.10989942100422923, - 0.002257953994558193, - 0.043675079010427, - 0.07255852800153662, - 0.04616812001040671, - 0.008297480992041528, - 0.09729329899710137, - 0.008037908002734184, - 0.06801463800366037, - 0.07765577999816742, - 0.04845741800090764, - 0.017496467000455596, - 0.20809882700268645, - 0.04237117699813098, - 0.22489827800018247, - 0.05016755098768044, - 0.07443266600603238, - 0.05047759000444785, - 0.05229482399590779, - 0.04811915699974634, - 0.07447477900132071, - 0.062441231988486834, - 0.16223966400139034, - 0.0563477760006208, - 0.23858764999022242, - 
0.20316717600508127, - 0.05968588299583644, - 0.2114872079982888, - 0.024702367998543195, - 0.06534179400478024, - 0.5162383550050436, - 0.06129753899585921, - 0.057165145000908524, - 0.4152468009997392, - 0.41436710199923255, - 0.053823171998374164, - 0.3024577119940659, - 0.4752122950012563, - 0.14609865201055072, - 0.09067178500117734, - 0.032524324997211806, - 0.1082093330041971, - 0.08002163800119888, - 0.07275432100868784, - 0.06554288500046823, - 0.06291187800525222, - 0.08130473799246829, - 0.2538920990045881, - 0.04670613999769557, - 0.09723286300140899, - 0.11476804099220317, - 0.08813837100751698, - 0.07512726499408018, - 0.04577345500001684, - 0.05111794099502731, - 0.061846746000810526, - 0.12358626299828757, - 0.10979661499732174, - 0.0906969360075891, - 0.042512312007602304, - 0.053664209990529343, - 0.014175541000440717, - 0.02516963001107797, - 0.05487372899369802, - 0.011897723990841769, - 0.04081360300187953, - 0.06013633499969728, - 0.09927238200907595, - 0.08515859500039369, - 0.13005231900024228, - 0.11870174598880112, - 0.0339878509985283, - 0.04200148300151341, - 0.10642905600252561, - 0.9001636210014112, - 0.04824075799842831, - 0.1531245369988028, - 0.05381926598784048, - 0.1590443460008828, - 0.35923563700634986, - 0.45772787400346715, - 0.34647588500229176, - 0.1548906189855188, - 0.1583912079950096, - 0.1309544869873207, - 0.17021527599717956, - 1.0065908910037251, - 0.18872391199693084, - 1.0962786190066254, - 0.41304609199869446, - 0.6891506709944224, - 0.6151813229953405, - 0.5090304269979242, - 0.31054986000526696, - 0.7660957539919764, - 0.7476826619968051, - 0.40741140699537937, - 2.1852482240065, - 0.6531350790028227, - 0.2866827640100382, - 1.148160483004176, - 1.773630771000171, - 0.7548705680092098, - 1.0790537950088037, - 0.5420942299970193, - 0.533326806005789, - 1.1594955140026286 - ], - "multi_turn_cache_hits": 76, - "multi_turn_cache_misses": 295, - "seed": 42, - "summary": { - "total_requests": 549, - "total_tokens": 
146625, - "elapsed_time": 11.224706411361694, - "avg_throughput_tokens_per_sec": 13062.702455325296, - "requests_per_second": 48.90996520357093, - "end_to_end_latency_ms": { - "mean": 5859.818478140482, - "p50": 5234.9533189990325, - "p95": 11917.281206205374, - "p99": 17629.31561932142 - }, - "storage_io_latency_ms": { - "mean": 1024.1555082346413, - "p50": 629.7585840075044, - "p95": 2956.784466575482, - "p99": 9674.216833143726 - }, - "generation_latency_ms": { - "mean": 0.0, - "p50": 0.0, - "p95": 0.0, - "p99": 0.0 - }, - "cache_stats": { - "cache_hit_rate": 0.9297647459938629, - "cache_hits": 5454, - "cache_misses": 412, - "gpu_entries": 22, - "cpu_entries": 0, - "nvme_entries": 426, - "gpu_memory_used_gb": 0.0, - "cpu_memory_used_gb": 0.0, - "offloads_cpu": 0, - "offloads_nvme": 426, - "storage_health": { - "overall_status": "FAIL", - "criteria": [ - { - "name": "NVMe Write P95 < 500ms", - "target": 500, - "actual": 189.21024774681428, - "unit": "ms", - "passed": true - }, - { - "name": "NVMe Read P95 < 200ms", - "target": 200, - "actual": 329.5353519933997, - "unit": "ms", - "passed": false - }, - { - "name": "Cache Hit Rate > 30%", - "target": 0.3, - "actual": 0.9297647459938629, - "unit": "ratio", - "passed": true - } - ], - "passed_count": 2, - "total_count": 3 - }, - "prefill_writes": 448, - "decode_reads": 5454, - "prefill_bytes_written_gb": 7.5706787109375, - "decode_bytes_read_gb": 92.9859619140625, - "system_prompt_hits": 1261, - "common_phrase_hits": 0, - "user_cache_hits": 4117, - "multi_turn_hits": 76, - "total_read_bytes": 99842916352, - "total_write_bytes": 8128954368, - "total_read_gb": 92.9859619140625, - "total_write_gb": 7.5706787109375, - "read_write_ratio": 12.282381205759526, - "read_iops": 5454, - "write_iops": 448, - "gpu_read_p50_ms": 2.991078988998197, - "gpu_read_p95_ms": 31.549266600632084, - "gpu_read_p99_ms": 48.13581535883715, - "gpu_write_p50_ms": 16.404943009547424, - "gpu_write_p95_ms": 37.97891489084577, - "gpu_write_p99_ms": 
59.363617848139235, - "nvme_read_p50_ms": 58.9307330083102, - "nvme_read_p95_ms": 395.6012370035751, - "nvme_read_p99_ms": 914.0823572059172, - "nvme_write_p50_ms": 46.96136050915811, - "nvme_write_p95_ms": 262.8929304992198, - "nvme_write_p99_ms": 492.23811698902864, - "nvme_read_device_p50_ms": 35.58107301068958, - "nvme_read_device_p95_ms": 329.5353519933997, - "nvme_read_device_p99_ms": 815.2104674081785, - "nvme_read_host_p50_ms": 19.04496799397748, - "nvme_read_host_p95_ms": 88.03859201725572, - "nvme_read_host_p99_ms": 286.19837940204917, - "nvme_write_device_p50_ms": 13.673149500391446, - "nvme_write_device_p95_ms": 189.21024774681428, - "nvme_write_device_p99_ms": 322.38521074759774, - "nvme_write_host_p50_ms": 26.88514949841192, - "nvme_write_host_p95_ms": 120.89401550474577, - "nvme_write_host_p99_ms": 294.1244499925233 - }, - "qos_metrics": { - "interactive": { - "total_requests": 549, - "latency_ms": { - "mean": 5859.818478140482, - "p50": 5234.9533189990325, - "p95": 11917.281206205373, - "p99": 17629.31561932142, - "max": 21260.88185000117 - }, - "sla": { - "target_p95_ms": 50, - "actual_p95_ms": 11917.281206205373, - "compliance": 0.0, - "met": false - } - }, - "responsive": { - "no_data": true - }, - "batch": { - "no_data": true - } - }, - "prefix_cache_stats": { - "prefix_hits": 117, - "prefix_misses": 432, - "system_prompt_reuse": 117, - "common_phrase_reuse": 0, - "bytes_saved": 98435072 - }, - "autoscaling_stats": [], - "autoscaling_summary": null, - "multi_turn_stats": { - "cache_hits": 76, - "cache_misses": 295, - "hit_rate": 0.20485175202156333 - } - } -} \ No newline at end of file diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_nvme_only_trial3.json b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_nvme_only_trial3.json deleted file mode 100644 index a4106baf..00000000 --- 
a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/kvcache_nvme_only_trial3.json +++ /dev/null @@ -1,2903 +0,0 @@ -{ - "requests_completed": 548, - "total_tokens_generated": 146684, - "total_storage_io_latency": 560.5828831137333, - "total_generation_latency": 0.0, - "end_to_end_latencies": [ - 0.36344698099128436, - 0.3663236160064116, - 0.38132128899451345, - 0.40034799800196197, - 0.4032474059931701, - 0.40639400099462364, - 0.41548262001015246, - 0.4584073579899268, - 0.4586846099991817, - 0.46036255099170376, - 0.47277040900371503, - 0.5101148460089462, - 0.5189002530096332, - 0.533661009001662, - 0.5526029400061816, - 0.5617266039917013, - 0.5786869019939331, - 0.584040253990679, - 0.5864102599880425, - 0.7064447589946212, - 0.7476361199951498, - 0.7691834650031524, - 0.7688774389971513, - 0.7830630480020773, - 0.794744863989763, - 0.7988161320099607, - 0.8088943859911524, - 0.8149096400011331, - 0.8278532340045786, - 0.8483545789931668, - 0.8485779950133292, - 0.8839274330093758, - 0.9096024970058352, - 0.9140295200049877, - 0.9266336750006303, - 0.9363022149918834, - 0.9745786550047342, - 0.9858521179994568, - 0.9858552610094193, - 0.9943785659997957, - 1.1036493710125796, - 1.1357918279973092, - 1.147195431010914, - 1.1565629570104647, - 1.158202999009518, - 1.166061064999667, - 1.1718427589948988, - 1.1838961399917025, - 1.2064005420106696, - 1.2134676839923486, - 1.2216140890086535, - 1.2431452890014043, - 1.270765063003637, - 1.2801087990083033, - 1.2818472530052532, - 1.281422611005837, - 1.2864237369940383, - 1.297993515006965, - 1.3118057559913723, - 1.3255366949888412, - 1.3337449050013674, - 1.3439099779934622, - 1.3514266449928982, - 1.3549733130057575, - 1.355977396000526, - 1.3882117539906176, - 1.4038163460063515, - 1.4199645059998147, - 1.4199826669937465, - 1.4396030710049672, - 1.5623472890001722, - 1.5772654430038529, - 1.5902888779964997, - 1.591405663988553, - 1.6144756139983656, - 1.6310051820037188, - 
1.6425652920006542, - 1.6727794930047821, - 1.7122303819924127, - 1.7495411450072424, - 1.7857203680032399, - 1.789754900004482, - 1.81056931099738, - 1.8164467559981858, - 1.8257673100015381, - 1.826776746995165, - 1.8408969279989833, - 1.8445882219966734, - 2.0147371850034688, - 2.0239293100021314, - 2.033988435010542, - 2.06940703500004, - 2.0878199620055966, - 2.100754062004853, - 2.110862009008997, - 2.1461376850056695, - 2.154795082009514, - 2.1644588229974033, - 2.1714968160085846, - 2.17186120300903, - 2.185032470006263, - 2.2065814710076666, - 2.2064226410002448, - 2.2087751859944547, - 2.208009320005658, - 2.359222246988793, - 2.363373843007139, - 2.3642784980038414, - 2.364316350998706, - 2.3683397699933266, - 2.3703812749881763, - 2.381146498999442, - 2.399738897991483, - 2.4110594210069394, - 2.420421281000017, - 2.420928952007671, - 2.4250152250024257, - 2.4364256279950496, - 2.4625415209884522, - 2.4781143459986197, - 2.477850045004743, - 2.500461445990368, - 2.518260692988406, - 2.5494776729901787, - 2.5748758879926754, - 2.5854330319998553, - 2.6360018529958325, - 2.656580749011482, - 2.663416477997089, - 2.667559061999782, - 2.697327896996285, - 2.706143447008799, - 2.839460996998241, - 2.8387387149996357, - 2.8468081280007027, - 2.8481412370019825, - 2.876421051012585, - 2.876605843004654, - 2.8915889670024626, - 2.902694999007508, - 2.902863366995007, - 2.905498798994813, - 2.908421382002416, - 2.974445678992197, - 2.9743059430038556, - 2.982275447997381, - 2.9831869400077267, - 3.1522265450039413, - 3.1857992690056562, - 3.2033692650002195, - 3.204460584995104, - 3.204539124999428, - 3.206363170000259, - 3.2137519899988547, - 3.2136032190028345, - 3.2143829079868738, - 3.223218523999094, - 3.234396294996259, - 3.2387531689892057, - 3.2662529289955273, - 3.2762148230103776, - 3.2825919030001387, - 3.330979706006474, - 3.343608441995457, - 3.351663997003925, - 3.35103416799393, - 3.4192344110051636, - 3.422163946001092, - 3.4626881889998913, - 
3.468825924996054, - 3.4720688860106748, - 3.495208617998287, - 3.495511222994537, - 3.4970201190008083, - 3.5525348549999762, - 3.5602450180012966, - 3.6123728359962115, - 3.6100015879928833, - 3.613786521003931, - 3.626020422001602, - 3.632151844998589, - 3.6443151230050717, - 3.65981656400254, - 3.6685309750027955, - 3.7341382050071843, - 3.734309662002488, - 3.7487863679998554, - 3.7502462720003678, - 3.7866094619967043, - 3.799782387999585, - 3.8020319870120147, - 3.803820542001631, - 3.804639042005874, - 3.8150547749974066, - 3.8139585080061806, - 3.822329788992647, - 3.825278453005012, - 3.832400775005226, - 3.8498287980037276, - 4.062860744001227, - 4.0634989620011766, - 4.093899286002852, - 4.1219672489969525, - 4.143128480995074, - 4.1435880869976245, - 4.145910481995088, - 4.18044871200982, - 4.199362569008372, - 4.202443503003451, - 4.210068717002287, - 4.224016171996482, - 4.2237710949993925, - 4.299154769993038, - 4.320781023998279, - 4.321135728008812, - 4.323756883997703, - 4.325497950005229, - 4.332416146004107, - 4.344457772007445, - 4.347055792008177, - 4.35515892400872, - 4.3700013650086476, - 4.370025244003045, - 4.3814003500010585, - 4.400117231009062, - 4.459144014996127, - 4.4722323559981305, - 4.482874670007732, - 4.520802355997148, - 4.530154467996908, - 4.541843101003906, - 4.5724363699991954, - 4.576104781997856, - 4.592837181000505, - 4.614304296002956, - 4.627859537999029, - 4.855606929006171, - 4.876416938001057, - 4.880595700000413, - 4.895703848000267, - 4.918912043998716, - 4.9348538299964275, - 4.934691767994082, - 4.981111529996269, - 4.98216307599796, - 5.002961666992633, - 5.046120016006171, - 5.048339037995902, - 5.068226814008085, - 5.0682837210042635, - 5.087168818005011, - 5.087758648995077, - 5.0917333790130215, - 5.13691661998746, - 5.182991753012175, - 5.416923709999537, - 5.462352432004991, - 5.4698169469920686, - 5.4823232130002, - 5.492699816008098, - 5.519678329001181, - 5.54931459799991, - 5.576530474994797, - 
5.576908967996133, - 5.577677232999122, - 5.602762870999868, - 5.6422795729886275, - 5.640134438988753, - 5.6678176679997705, - 5.681975138009875, - 5.683028332001413, - 5.689853429998038, - 5.690926376992138, - 5.710880315004033, - 5.768249207001645, - 5.769181338997441, - 5.772245793006732, - 5.783868176993565, - 5.836682359993574, - 5.886053940994316, - 5.9056544070044765, - 5.9078911949909525, - 5.920272431001649, - 5.93570830799581, - 6.002984306993312, - 6.004997729003662, - 6.027502161989105, - 6.050242798999534, - 6.074107732012635, - 6.125675934992614, - 6.128119194996543, - 6.1275096320023295, - 6.135902491005254, - 6.147903274002601, - 6.151621722994605, - 6.153315775998635, - 6.162261460995069, - 6.160413577003055, - 6.1692845369980205, - 6.190475488998345, - 6.190945570997428, - 6.192155138007365, - 6.220294544997159, - 6.2431752029951895, - 6.256516705994727, - 6.260164362000069, - 6.269228809993365, - 6.273061540006893, - 6.278988714009756, - 6.301718777991482, - 6.339578531013103, - 6.380104850002681, - 6.374940380002954, - 6.412042473006295, - 6.411187465011608, - 6.445605465996778, - 6.451021287997719, - 6.478281422998407, - 6.483962703990983, - 6.489532102001249, - 6.82135910699435, - 6.823805697000353, - 6.8329019650118425, - 6.904158366000047, - 6.913424576006946, - 6.964620565995574, - 7.00884474600025, - 7.010648386989487, - 7.022353838008712, - 7.043824396998389, - 7.078039993008133, - 7.090901029994711, - 7.09862932600663, - 7.140336936005042, - 7.16074350499548, - 7.174726773999282, - 7.179336703004083, - 7.226950916010537, - 7.229105270002037, - 7.246034661002341, - 7.260942569002509, - 7.260447305990965, - 7.293590059009148, - 7.371654361006222, - 7.379200255993055, - 7.381555964995641, - 7.384181575995171, - 7.414339231007034, - 7.444217695010593, - 7.518876749993069, - 7.523101102007786, - 7.615465124996263, - 7.621698186005233, - 7.622775608993834, - 7.638561095998739, - 7.654204300997662, - 7.677229870998417, - 7.7047720280097565, - 
7.727493525992031, - 7.7452741679881, - 7.745249131010496, - 7.7505814009928145, - 7.760697025994887, - 7.799486137009808, - 7.827998917011428, - 7.831117212001118, - 7.840355897991685, - 7.845830450998619, - 7.857731864001835, - 7.869948912994005, - 7.879657215002226, - 7.903231004005647, - 7.922159334004391, - 7.921798142997432, - 7.954183056994225, - 7.979644828999881, - 8.009368302009534, - 8.020898735005176, - 8.357641424998292, - 8.359589143990888, - 8.368330327008152, - 8.433656410998083, - 8.469576354997116, - 8.485070031994837, - 8.489541407005163, - 8.502359040008741, - 8.558375754990266, - 8.571072121994803, - 8.61492454400286, - 8.61546883599658, - 8.638785185001325, - 8.64711266598897, - 8.658320101996651, - 8.673274275002768, - 8.691276725003263, - 8.76742770599958, - 8.767554543999722, - 8.794737019998138, - 8.803807985008461, - 8.804718889994547, - 8.81444251499488, - 8.874300719005987, - 8.874539475000347, - 8.875903114007087, - 8.877379666999332, - 8.878829915003735, - 8.892723726996337, - 8.899617988005048, - 8.905866172004608, - 8.917382857995108, - 8.92781474700314, - 8.935066473000916, - 8.937963695003418, - 8.96411312399141, - 8.97242127500067, - 8.980468453009962, - 9.00637061499583, - 9.023477693001041, - 9.112602576991776, - 9.158379898988642, - 9.292698939010734, - 9.305559108994203, - 9.318207204007194, - 9.320485604999703, - 9.321312700994895, - 9.334677045990247, - 9.354686482009129, - 9.398182176999399, - 9.402132764997077, - 9.430286558999796, - 9.435174450991326, - 9.466140696007642, - 9.500299450010061, - 9.539292570989346, - 9.564563156993245, - 9.566999468996073, - 9.568204773997422, - 9.569063460003235, - 9.588670527009526, - 9.615111177001381, - 9.61569207899447, - 9.648839632995077, - 9.671077129009063, - 9.767208139994182, - 9.788043380001909, - 9.790077957994072, - 9.792300481989514, - 9.866425685991999, - 9.866368087998126, - 9.924064001999795, - 10.381658084996161, - 10.47920849500224, - 10.47936692800431, - 
10.516822517995024, - 10.524903652010835, - 10.594181925000157, - 10.594260379002662, - 10.685488014001749, - 10.75690045999363, - 10.754787640995346, - 10.780173125996953, - 10.78651154099498, - 10.854283640990616, - 10.851474342009169, - 10.866362250992097, - 10.883517554000719, - 10.915643756001373, - 10.93120637901302, - 10.943157728994265, - 10.944108944007894, - 10.96454706399527, - 11.027115550008602, - 11.068131318999804, - 11.082841891999124, - 11.083876094999141, - 11.09849062099238, - 11.194930841011228, - 11.221459702996071, - 11.24228535500879, - 11.245813312998507, - 11.263011351009482, - 11.265221013993141, - 11.285178136997274, - 11.322662323000259, - 11.337200577007025, - 11.390764181007398, - 11.420700764996582, - 11.448994600999868, - 11.457443159000832, - 11.527762796991738, - 11.542960902006598, - 11.551905778993387, - 11.555558243999258, - 11.584619029003079, - 11.589156936010113, - 11.589595323996036, - 11.650003436996485, - 11.67485425400082, - 11.676381690995186, - 11.676650289999088, - 11.681740855012322, - 11.698696991996258, - 11.700551550995442, - 11.706481342000188, - 11.711186435000855, - 11.720110740992823, - 11.72629021499597, - 11.736527243003366, - 11.753305413003545, - 11.755451591001474, - 11.770896884001559, - 11.796962104010163, - 11.811101887986297, - 11.86154045999865, - 11.876503097999375, - 11.910411998993368, - 11.914172224001959, - 11.943062588004977, - 11.944913213999826, - 12.013212744001066, - 12.138457078995998, - 12.176631719004945, - 12.21748654700059, - 12.234691284000291, - 12.494491862991708, - 12.57344167600968, - 12.574768952996237, - 12.78833701100666, - 13.1930069779919, - 13.281680180007243, - 13.405655007998575, - 13.73125978099415, - 14.028564027001266, - 14.201690825997503, - 14.591170216008322, - 14.684717885000282, - 14.702782603999367, - 14.874862487005885, - 15.071612932006246, - 16.000432282991824, - 17.188391349991434, - 17.437198537998484, - 17.570434004010167, - 17.613101793001988, - 
17.843000356995617, - 18.32898139298777, - 19.025760688993614, - 19.16499025899975 - ], - "storage_latencies": [ - 0.2435955290129641, - 0.270160173997283, - 0.07222874199214857, - 0.2867387769947527, - 0.20391719303734135, - 0.008548663012334146, - 0.14234822199796326, - 0.20000537698797416, - 0.07244226400507614, - 0.09440733399242163, - 0.04802443798689637, - 0.11896366797736846, - 0.14126403898990247, - 0.23351532297965605, - 0.07477331200789195, - 0.1979609100089874, - 0.48103603000345174, - 0.09280370999476872, - 0.2562436700100079, - 0.2747746030072449, - 0.3698136129823979, - 0.40521341100975405, - 0.35226521897129714, - 0.32679113700578455, - 0.3830175080074696, - 0.26483147096587345, - 0.43483113100228366, - 0.13945199301815592, - 0.538872295001056, - 0.45136154198553413, - 0.42946676597057376, - 0.47255249600857496, - 0.488223439999274, - 0.5292250940110534, - 0.5055185939854709, - 0.12291626901424024, - 0.5246495429892093, - 0.5252751560037723, - 0.31769936697673984, - 0.5826273070269963, - 0.41941976100497413, - 0.5052321069961181, - 0.6051380190328928, - 0.2908657929947367, - 0.07008961300016381, - 0.3673194859875366, - 0.2842998520063702, - 0.5777842920215335, - 0.5576774769870099, - 0.7289556949835969, - 0.8248660270037362, - 0.8454757150029764, - 0.3722435950185172, - 0.8162671429890906, - 0.9649504940898623, - 0.7321132190118078, - 0.9145077849680092, - 0.9246533539990196, - 0.44653548700443935, - 0.12135945299814921, - 0.6985189379774965, - 0.6973149139957968, - 0.26562118700530846, - 0.8796550239785574, - 0.9528265310364077, - 0.703957810983411, - 0.3628554119786713, - 0.3761999169946648, - 0.08007104598800652, - 0.3049446739896666, - 0.17822559901105706, - 0.873788255994441, - 0.9647497010009829, - 0.2652645019843476, - 1.1884101799951168, - 0.5856417739851167, - 0.5604473239945946, - 0.3250634670112049, - 0.7784298350161407, - 1.3368795449641766, - 0.25100605899933726, - 0.1496799090091372, - 0.9601267769612605, - 0.7794411059876438, - 
0.135280236005201, - 0.5573017289862037, - 0.307247232994996, - 1.5955205279606162, - 0.738718967026216, - 0.8331802249886096, - 0.2817290169914486, - 0.2353337120002834, - 0.3912260210054228, - 0.7703218889655545, - 0.5602851050061872, - 1.249287866972736, - 1.2807177510112524, - 1.2885346429829951, - 0.8298756109870737, - 0.09794034699734766, - 0.14532329801295418, - 0.9341795300279045, - 0.5591078629659023, - 1.0430225570016773, - 0.01537983502203133, - 1.905123883028864, - 0.8053009730065241, - 1.0445457010209793, - 0.28411699102434795, - 1.7548191139649134, - 1.5849389190116199, - 0.6449817359825829, - 0.6611548350192606, - 0.06788546200550627, - 1.2437412629660685, - 1.0280460660142126, - 2.0664655789587414, - 0.9921474840230076, - 0.8399449550051941, - 1.1481563749839552, - 0.38702019199263304, - 0.34402375303034205, - 2.1155230720178224, - 0.2146054819895653, - 0.7807306320028147, - 2.062939487004769, - 0.22958268100046553, - 2.172825479050516, - 2.2247467269189656, - 1.4247036309971008, - 0.20329069800209254, - 0.11813274600717705, - 1.0193291880132165, - 0.380405283998698, - 1.6336076669831527, - 0.06707524998637382, - 1.666401697031688, - 0.06929453799966723, - 1.0323669659992447, - 0.7374117220169865, - 0.7178206819808111, - 1.52740639600961, - 1.6822653729759622, - 1.623779778034077, - 0.43795043899444863, - 0.6505964020034298, - 0.5215725790039869, - 1.263644968974404, - 0.8870626990246819, - 0.6083636170078535, - 0.7862637710495619, - 0.15731580696592573, - 0.8027680759696523, - 0.780374345020391, - 0.675816599992686, - 0.984164855995914, - 0.718514808017062, - 1.0004186700243736, - 0.7232274709967896, - 0.31788112098001875, - 0.9552173760312144, - 0.7116899419925176, - 0.8843792789848521, - 0.38701986699015833, - 0.8199437310104258, - 0.09532910797861405, - 0.9709136439923896, - 0.9140442969946889, - 2.348655961031909, - 0.5681281750439666, - 1.3202482570050051, - 0.21697657500044443, - 0.44882911500462797, - 0.5352040110010421, - 
0.9022837889788207, - 0.6911999590229243, - 1.9615547790308483, - 0.3527437930024462, - 0.7211801670055138, - 0.6865561420127051, - 0.11020006400940474, - 0.2895280919910874, - 0.19801011199888308, - 0.11321724101435393, - 0.520263646991225, - 0.0864190960128326, - 0.4865365859877784, - 0.8108563669811701, - 0.474533144995803, - 0.7342185219895327, - 0.10314878397912253, - 0.5608611580100842, - 0.41163350899296347, - 1.909782188013196, - 0.9233540210116189, - 0.033636213003774174, - 0.5223615759750828, - 0.8296779949887423, - 1.542484374003834, - 0.6756767379847588, - 0.8041076759982388, - 0.7622608090023277, - 0.06675517598341685, - 0.5852919970056973, - 0.8382855979725718, - 0.8782931320456555, - 0.6817827049817424, - 0.0506822940078564, - 0.7153625509963604, - 0.4412994080194039, - 0.8078769929998089, - 0.6498046740016434, - 0.6610282570036361, - 0.6006105830165325, - 0.14629736098868307, - 0.6574548130010953, - 1.3316554039920447, - 0.6566989989951253, - 0.27496004101703875, - 1.0972692359791836, - 0.282709012972191, - 0.7014776689902646, - 0.2424448219971964, - 0.9086081179993926, - 0.2205116990226088, - 0.37591488398902584, - 0.5906561449810397, - 0.38990284499595873, - 0.39189537697529886, - 1.9363470649841474, - 0.6963304550008615, - 0.21934683500148822, - 0.047936716975527816, - 0.3151118309906451, - 0.2670963150158059, - 0.20193081301113125, - 1.2487374500196893, - 1.436772507004207, - 0.5197033489821479, - 0.6588641450362047, - 2.375036488985643, - 1.3274900659744162, - 0.5551679009804502, - 1.021587568000541, - 0.8758375700126635, - 0.7227329169982113, - 0.5463483200001065, - 1.0886684950091876, - 2.0678106830309844, - 0.8465654109895695, - 1.2924457940243883, - 0.26695483800722286, - 1.2971826539578615, - 1.0441879199934192, - 0.29180559200176504, - 0.4357855470152572, - 0.9791443229914876, - 1.7138487950433046, - 0.8098968859849265, - 0.44657321998965926, - 0.5396292780060321, - 0.567295655986527, - 0.49888183700386435, - 0.6430368849978549, - 
1.1245361210021656, - 0.603106738984934, - 2.3990515190089354, - 1.2364226980425883, - 1.0926682889985386, - 0.2466397169919219, - 0.5979204030008987, - 0.5300608160032425, - 0.5712524729897268, - 0.574609013972804, - 0.6283071829820983, - 1.3442018949863268, - 1.4414490230119554, - 0.7548427149740746, - 0.9391296109824907, - 3.3742400470364373, - 1.5360998440301046, - 1.514606858996558, - 0.7581049189611804, - 1.2236022639990551, - 0.26569975999882445, - 0.48022670499631204, - 0.41174079800839536, - 0.33522428502328694, - 0.13787520000187214, - 0.17440289800288156, - 1.6321370210062014, - 0.41872520398464985, - 0.5764279100112617, - 0.1819002720003482, - 0.5862536439963151, - 0.4111354090127861, - 1.7534178339701612, - 0.006351953008561395, - 0.44868001103168353, - 1.055365234031342, - 0.6447771409875713, - 0.47192529802850913, - 0.5267830969678471, - 0.9899374050291954, - 0.32330602599540725, - 1.293828122987179, - 0.5853932330355747, - 0.6176445199962473, - 0.15836714999750257, - 0.4368617049913155, - 3.512540778974653, - 3.014286922989413, - 0.1726693999953568, - 1.5462967129860772, - 0.5204695829743287, - 0.185102500996436, - 0.8342064889729954, - 0.6880553490045713, - 0.6324498450267129, - 0.36363039301068056, - 0.5968665629916359, - 1.0449566210154444, - 0.7470215749926865, - 0.9448475469689583, - 2.616185614009737, - 0.6256341230327962, - 0.04989408400433604, - 0.809217913003522, - 0.19807135598966852, - 1.998142024021945, - 0.09695934297633357, - 0.8098280159756541, - 0.08258679098798893, - 0.9956753499864135, - 0.9618924079986755, - 1.46868485599407, - 0.07636529400770087, - 1.121700829040492, - 0.9372101580083836, - 0.817667715047719, - 0.613530153001193, - 0.04503760200168472, - 0.5139400980406208, - 0.1778756040002918, - 1.01903378800489, - 0.703719678989728, - 0.3764961909764679, - 0.2776612450106768, - 0.39516604399250355, - 0.8634068709943676, - 1.0389273529872298, - 1.0387406449735863, - 1.101328626013128, - 0.3314418580266647, - 1.058571954985382, 
- 0.35823655902640894, - 0.28716959297889844, - 0.4058038149960339, - 0.05665409100765828, - 1.499071685000672, - 0.688000102963997, - 0.5310489389667055, - 0.061620590000529774, - 1.2906885080155917, - 0.8287800129619427, - 0.6544979910395341, - 2.3360710089618806, - 0.7361731650016736, - 0.5549681399861583, - 0.2808169499912765, - 0.07279621800989844, - 0.3222109690104844, - 1.435562004975509, - 0.4804482680046931, - 1.5751651339960517, - 0.3120796189905377, - 0.5930103769933339, - 2.4420430769823724, - 1.0423032400285592, - 0.9969563880003989, - 2.512311791040702, - 1.1991730139852734, - 0.7146368929970777, - 0.6789375149965053, - 0.9633130910224281, - 2.265634897034033, - 0.410713372999453, - 0.7837177790061105, - 1.697983309000847, - 1.1266499070479767, - 1.5396644049906172, - 0.8284295329940505, - 0.6139567320060451, - 0.37161593198834453, - 0.9383069870091276, - 1.1431211789895315, - 0.8386387600039598, - 0.9242000180092873, - 0.29296605300623924, - 0.9287677189859096, - 0.5402540540380869, - 1.0045608760119649, - 0.3558719670108985, - 0.9659812669851817, - 0.18845498301379848, - 0.04025226201338228, - 1.4225390099891229, - 1.018673939994187, - 1.1521582530258456, - 0.47712147698621266, - 0.42793525199522264, - 0.36978469998575747, - 0.5623699500138173, - 6.564043016987853, - 0.22642542999528814, - 0.9021951530012302, - 0.13269673200557008, - 0.6010194069967838, - 0.1533801810001023, - 0.4378357789828442, - 0.36350906698498875, - 0.4985604519752087, - 0.3630888879997656, - 0.4018131940101739, - 0.3744328929897165, - 0.42353454798285384, - 0.650431658999878, - 2.7395934860105626, - 0.6254818129964406, - 0.40302564699959476, - 0.768532854039222, - 3.342212388961343, - 0.12439967699174304, - 0.5677502719918266, - 0.4211917180218734, - 0.5872155010001734, - 0.08525019000808243, - 0.5456688659760403, - 0.7375242639827775, - 0.9166070190112805, - 0.5797837240243098, - 0.3066520910069812, - 0.26843341898347717, - 1.028260813007364, - 0.35231370602559764, - 
0.11182348401052877, - 1.4728657579689752, - 0.507360249015619, - 0.07042355301382486, - 0.8498226400115527, - 1.1610733320267173, - 1.008375044024433, - 0.7617271600174718, - 1.3609255159972236, - 6.495857069035992, - 1.1013345569808735, - 1.0109794699965278, - 1.0606538600113709, - 2.4978577230358496, - 0.17112029998679645, - 1.0809719130193116, - 1.1666452569625108, - 3.066173271028674, - 1.483953073984594, - 0.4190096050151624, - 1.3036166799720377, - 1.2691388870443916, - 1.2814967869635439, - 0.3382631209969986, - 1.1226315399835585, - 2.2308832339622313, - 0.1459154909971403, - 1.417005639988929, - 0.43560840901045594, - 0.362692058988614, - 1.9689264070184436, - 1.5168486749898875, - 0.37234752599033527, - 0.3528355919843307, - 1.6856981159944553, - 0.6986113019811455, - 2.2145800529397093, - 0.44388325698673725, - 0.7953985239873873, - 2.3859774439770263, - 0.47916928300401196, - 2.7482777859841008, - 0.4892963069723919, - 0.186484648991609, - 0.8753547289961716, - 0.04024070600280538, - 2.7926139019982656, - 0.2956741560046794, - 0.5827111499966122, - 0.5967409889999544, - 0.6527708590001566, - 0.5065253770299023, - 0.6455900889850454, - 1.1456480330089107, - 0.06949521499336697, - 0.35617210598138627, - 1.938929491007002, - 0.42805931098700967, - 0.07047177199274302, - 0.09819780099496711, - 0.3340070330305025, - 0.39475888399465475, - 0.11685177002800629, - 0.297593183000572, - 0.2942906300013419, - 0.15297197198378853, - 0.09704774800047744, - 0.3882305469887797, - 0.8414297060371609, - 0.3547178650042042, - 0.42447862899280153, - 1.982277309987694, - 0.853974453988485, - 0.8014063549489947, - 0.2845804300304735, - 2.047240096013411, - 2.8714691450004466, - 0.2781402090040501, - 0.4184884029964451, - 1.5052902050374541, - 3.5449959979596315, - 2.24159267990035, - 1.4067412080039503, - 1.0890305420034565, - 1.886207610979909, - 8.676042563005467, - 3.5527912719990127, - 5.904522515003919, - 12.275266414013458, - 4.306440100990585, - 2.821974526013946, - 
7.7097915390186245, - 7.880105950971483, - 6.74621125900012, - 12.366873619015678, - 7.7398450409964425, - 10.869675571026164, - 5.328234413987957, - 13.980686443959712, - 9.408051040998544, - 5.714650952024385, - 5.2576443910220405, - 12.181730936019449 - ], - "generation_latencies": [ - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 
0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, 
- 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0, - 0.0 - ], - "throughput_timeline": [], - "prefill_latencies": [ - 0.036508054996374995, - 0.03641709800285753, - 0.09182117599993944, - 0.028353083005640656, - 0.09840823999547865, - 0.01680864299123641, - 0.03825263399630785, - 0.05629780799790751, - 0.028054210997652262, - 0.037837479991139844, - 0.10525537199282553, - 0.10272561199963093, - 0.017646221996983513, - 0.03630448199692182, - 0.005871155997738242, - 0.0037357030087150633, - 0.0042758719937410206, - 0.005630996995023452, - 0.013416036003036425, - 0.012102294989745133, - 0.041345485995407216, - 0.010207430008449592, - 0.010794909001560882, - 0.015367308995337225, - 0.017956416006200016, - 0.01349988899892196, - 0.012342525005806237, - 0.014705022986163385, - 0.005622391006909311, - 0.015821574997971766, - 0.018335341999772936, - 0.02314935199683532, - 0.012634544997126795, - 0.02406500199867878, - 0.017850243995781057, - 0.028716963992337696, - 0.010359045991208404, - 0.009573345989338122, - 0.015352268994320184, - 0.017372536996845156, - 0.011041260993806645, - 0.013532509008655325, - 0.06337530999735463, - 0.017942464008228853, - 0.01968743299948983, - 0.023680425001657568, - 0.12026614601199981, - 0.028734455001540482, - 0.023386700995615683, - 0.03089429999818094, - 0.025915825986885466, - 0.04191237500344869, - 0.029196639006840996, - 0.05357694601116236, - 0.10702071699779481, - 0.03556646600191016, - 0.09078526799567044, - 0.03810006800631527, - 0.057971196991275065, - 0.060469204996479675, - 0.05306235900206957, - 0.062017708987696096, - 0.08422874999814667, - 0.03196250900509767, - 0.07113126799231395, - 0.1194869939936325, - 0.15284819599764887, - 0.13042021499131806, - 0.14075622099335305, - 0.07859296999231447, - 0.052751427007024176, - 0.046489483007462695, - 0.05754597899795044, - 0.042430656001670286, - 0.059878813990508206, - 0.04745488399930764, - 0.03029135600081645, 
- 0.06380565799190663, - 0.07685294299153611, - 0.058468724993872456, - 0.06238341199059505, - 0.054444520006654784, - 0.04163529700599611, - 0.022207527988939546, - 0.0758757840085309, - 0.03168417800043244, - 0.17533769899455365, - 0.11252753000007942, - 0.1070594190096017, - 0.12738934499793686, - 0.0308620009891456, - 0.02273509799852036, - 0.029732854993199, - 0.024567296990426257, - 0.032596114004263654, - 0.022116623993497342, - 0.05528261199651752, - 0.04335218999767676, - 0.04487285499635618, - 0.07306100999994669, - 0.06637725200562272, - 0.05235697999887634, - 0.09237133100396022, - 0.060739396009012125, - 0.0313030209945282, - 0.0743355629965663, - 0.028424140997231007, - 0.07170825499633793, - 0.057656944001792, - 0.0342969350022031, - 0.025974407006287947, - 0.019077682998613454, - 0.022029976011253893, - 0.030346851999638602, - 0.06927850500505883, - 0.18713566398946568, - 0.053045913999085315, - 0.017403046003892086, - 0.16681821100064553, - 0.06666335200134199, - 0.027013369006454013, - 0.23615352100750897, - 0.06398538898793049, - 0.08932868600822985, - 0.06094054799177684, - 0.1151743089867523, - 0.07921592700586189, - 0.05110589700052515, - 0.035466774002998136, - 0.04022874899965245, - 0.023439569005859084, - 0.19707394200668205, - 0.24215524000464939, - 0.05500655599462334, - 0.20813754199480172, - 0.06304504499712493, - 0.0581577830016613, - 0.25807628499751445, - 0.05384248000336811, - 0.2787757370097097, - 0.05242171400459483, - 0.03325473700533621, - 0.04363994000595994, - 0.0026736189902294427, - 0.08495354100887198, - 0.053300339000998065, - 0.04136639699572697, - 0.020457759004784748, - 0.02989484900899697, - 0.03408528899308294, - 0.0, - 0.0, - 0.14785860799020156, - 0.15972760099975858, - 0.06638490399927832, - 0.09817385600763373, - 0.0, - 0.1365922650002176, - 0.14022342200041749, - 0.12040493899257854, - 0.13129667400789913, - 0.11340457299957052, - 0.13503604900324717, - 0.13637250699684955, - 0.09779068999341689, - 
0.11191348099964671, - 0.1540983610029798, - 0.0828070030111121, - 0.11092601998825558, - 0.07406978801009245, - 0.08735484500357416, - 0.12076587499177549, - 0.6533028130070306, - 0.32377747500140686, - 0.11916127799486276, - 0.04559778599650599, - 0.03386087199032772, - 0.24864028798765503, - 0.05873873700329568, - 0.05743361198983621, - 0.06879826099611819, - 0.022211973002413288, - 0.04160447299364023, - 0.043205156005569734, - 0.018895797998993658, - 0.07922008600144181, - 0.4416529680020176, - 0.23942115101090167, - 0.2652616749983281, - 0.4784597819962073, - 0.23753457599377725, - 0.23327293399779592, - 0.024395972999627702, - 0.18779792099667247, - 0.2116898090025643, - 0.36503837299824227, - 0.22148101500351913, - 0.023459910007659346, - 0.03410594101296738, - 0.02646207700308878, - 0.0381038720079232, - 0.06320271700678859, - 0.04911182398791425, - 0.059646320005413145, - 0.03838679101318121, - 0.03399198199622333, - 0.0664618890004931, - 0.09147180199215654, - 0.0, - 0.03390036900236737, - 0.0, - 0.033821557997725904, - 0.08205563700175844, - 0.09717854199698195, - 0.024449871998513117, - 0.16962285399495158, - 0.042203478995361365, - 0.03703961899736896, - 0.03306160800275393, - 0.04577369900653139, - 0.05158047498844098, - 0.0889259820105508, - 0.08798504500009585, - 0.07407281499763485, - 0.0, - 0.08541074499953538, - 0.0, - 0.028900386998429894, - 0.0341771920066094, - 0.020501086008152924, - 0.0, - 0.04894475000037346, - 0.044804131001001224, - 0.04326132200367283, - 0.0, - 0.03157548799936194, - 0.011265765991993248, - 0.03938992500479799, - 0.0, - 0.0704721910005901, - 0.0, - 0.0222443319944432, - 0.03607071700389497, - 0.022630381005001254, - 0.025175382004817948, - 0.2278511409967905, - 0.26111833899631165, - 0.05120285099837929, - 0.23011513300298247, - 0.07423513299727347, - 0.021669570996891707, - 0.0, - 0.08846303400059696, - 0.04718630400020629, - 0.04250714600493666, - 0.052081565008847974, - 0.0, - 0.014724401000421494, - 
0.06796253799984697, - 0.044406341010471806, - 0.05799026098975446, - 0.0, - 0.02360461000353098, - 0.016726978006772697, - 0.0, - 0.0604724409931805, - 0.08187718800036237, - 0.0851748779969057, - 0.09558363399992231, - 0.11423978999664541, - 0.11926592500822153, - 0.06016963299771305, - 0.13905924100254197, - 0.04764863599848468, - 0.11794766200182494, - 0.3196919100009836, - 0.1672955880057998, - 0.0, - 0.0, - 0.07987943499756511, - 0.0, - 0.035367702002986334, - 0.0, - 0.0, - 0.2525355420075357, - 0.02937815399491228, - 0.0, - 0.04690010700142011, - 0.05068111499713268, - 0.01739691899274476, - 0.0, - 0.0, - 0.0, - 0.0363152719946811, - 0.11566198999935295, - 0.0, - 0.0194892669969704, - 0.04170741100097075, - 0.0, - 0.020572356996126473, - 0.2719452320015989, - 0.5870542890042998, - 0.0, - 0.05371869000373408, - 0.3543504379922524, - 0.02308201299456414, - 0.3098601730016526, - 0.03631024599599186, - 0.0, - 0.06674586900044233, - 0.015112288005184382, - 0.05316778899577912, - 0.03593071400246117, - 0.05498474900377914, - 0.022808804002124816, - 0.0, - 0.02626351900107693, - 0.037036567999166436, - 0.08584855899971444, - 0.05529749499692116, - 0.030147133002174087, - 0.035892230007448234, - 0.10785297799156979, - 0.0, - 0.020163629000307992, - 0.04088953699101694, - 0.030208779004169628, - 0.07600095499947201, - 0.03449444400030188, - 0.023325223999563605, - 0.03943576700112317, - 0.08449722100340296, - 0.07054456799232867, - 0.0, - 0.06447795400163159, - 0.03528423799434677, - 0.06426321000617463, - 0.0, - 0.0734858810028527, - 0.03144495100423228, - 0.06887180599733256, - 0.0, - 0.04589571899850853, - 0.03059617299004458, - 0.0, - 0.06860463399789296, - 0.0, - 0.06551678299729247, - 0.04206136999709997, - 0.07178763899719343, - 0.052510538007481955, - 0.0, - 0.0, - 0.19882186099130195, - 0.1077361690113321, - 0.13885575300082564, - 0.12024000599922147, - 0.166020826989552, - 0.0, - 0.12273660900245886, - 0.0, - 0.36365674101398326, - 0.5368992920120945, - 
0.34952077000343706, - 0.3330365760048153, - 0.6689841729967156, - 0.02447910199407488, - 0.35907836600381415, - 0.0, - 0.41893010601052083, - 0.0259891979949316, - 0.027384043001802638, - 0.0, - 0.0, - 0.05221346201142296, - 0.0, - 0.05135442499886267, - 0.042674116004491225, - 0.03762992299743928, - 0.030513206002069637, - 0.010558497000602074, - 0.0, - 0.03858086800028104, - 0.02029106899863109, - 0.08573802199680358, - 0.0, - 0.024003048994927667, - 0.026828701011254452, - 0.03207103499153163, - 0.0, - 0.054776782999397255, - 0.08946188200206961, - 0.03225710999686271, - 0.0011817449994850904, - 0.0, - 0.01217934700252954, - 0.026739395994809456, - 0.0, - 0.05853255699912552, - 0.0, - 0.040748194995103404, - 0.035388973003136925, - 0.04632128599041607, - 0.05062284698942676, - 0.07080347099690698, - 0.0453961659950437, - 0.03186967200599611, - 0.0, - 0.0, - 0.011488114003441297, - 0.0, - 0.0, - 0.1184014960017521, - 0.0, - 0.030642172001535073, - 0.063093270000536, - 0.054167522001080215, - 0.0, - 0.0, - 0.0, - 0.06459308600460645, - 0.057453779008938, - 0.11813323201204184, - 0.017743622011039406, - 0.34515872500196565, - 0.0, - 0.0, - 0.37774122500559315, - 0.0, - 0.5072984069993254, - 0.03704607400868554, - 0.023893613994005136, - 0.06280108900682535, - 0.06928262200381141, - 0.0, - 0.1257554740004707, - 0.11504911500378512, - 0.02740269500645809, - 0.05133845099771861, - 0.04610305199457798, - 0.02470222400734201, - 0.0, - 0.06326768599683419, - 0.0, - 0.0508151339890901, - 0.0, - 0.027788305000285618, - 0.0, - 0.036084260005736724, - 0.0, - 0.01568752300227061, - 0.0, - 0.03184470799169503, - 0.12129173400171567, - 0.04402242500509601, - 0.0, - 0.03551216200867202, - 0.046141997998347506, - 0.0, - 0.0, - 0.0, - 0.06969082700379658, - 0.09091002300556283, - 0.06335606200445909, - 0.12793472698831465, - 0.0, - 0.0, - 0.15563228199607693, - 0.32317050399433356, - 0.13090338099573273, - 0.0, - 0.04420358900097199, - 0.03927420699619688, - 0.05242145399097353, 
- 0.0654963580018375, - 0.11283787499996834, - 0.03357577900169417, - 0.07335305400192738, - 0.05683228399720974, - 0.09248223199392669, - 0.054279053991194814, - 0.03540511999744922, - 0.17280342199956067, - 0.0, - 0.02249866099737119, - 0.022911253006896004, - 0.0, - 0.0, - 0.0, - 0.06136938199051656, - 0.0, - 0.02477733099658508, - 0.04941094201058149, - 0.0, - 0.04916703399794642, - 0.06408476299839094, - 0.0, - 0.47618112500640564, - 0.495870962011395, - 0.07051580298866611, - 0.0, - 0.027116226992802694, - 0.0, - 0.03249515499919653, - 0.06639860400173347, - 0.037530433997744694, - 0.04715136998856906, - 0.04331952199572697, - 0.0, - 0.0, - 0.0, - 0.03422816700185649, - 0.025470870998105966, - 0.059051457996247336, - 0.0, - 0.0, - 0.015877637997618876, - 0.0, - 0.04521087200555485, - 0.072461583011318, - 0.0, - 0.008754127004067414, - 0.0, - 0.0, - 0.0, - 0.0, - 0.10092678100045305, - 0.1189985609962605, - 0.11407276900717989, - 0.2049356200004695, - 0.0, - 0.1942794189963024, - 0.0, - 0.18316911699366756, - 0.42061178499716334, - 0.10782276099780574, - 0.0773011069977656, - 0.4591118469979847, - 0.13643378199776635, - 0.028557822995935567, - 0.0, - 0.0, - 0.0611627909966046, - 0.03202733999933116, - 0.0, - 0.0, - 0.005809338006656617, - 0.04987015399092343, - 0.07782725300057791 - ], - "decode_latencies": [ - 0.1092569629981881, - 0.02740059200732503, - 0.14213472200208344, - 0.09245069700409658, - 0.06055534699407872, - 0.0011779460037359968, - 0.1232133529993007, - 0.025244055010261945, - 0.024188148992834613, - 0.009328483996796422, - 0.015281429994502105, - 0.009456309999222867, - 0.011138702990137972, - 0.14386266699875705, - 0.04719463699439075, - 0.009354578010970727, - 0.05028407200006768, - 0.05861361400457099, - 0.01262607000535354, - 0.024221679996117018, - 0.007945458011818118, - 0.00726715600467287, - 0.005683679002686404, - 0.08697532300720923, - 0.011770733995945193, - 0.03535899400594644, - 0.011190360994078219, - 0.10482193100324366, - 
0.11037857200426515, - 0.00812973000574857, - 0.013101108997943811, - 0.033876568006235175, - 0.011155928004882298, - 0.017825048009399325, - 0.007743777998257428, - 0.06655501200293656, - 0.025956323006539606, - 0.04081766300078016, - 0.09103673600475304, - 0.012250190993654542, - 0.07095100199512672, - 0.1503959900001064, - 0.10080719599500299, - 0.05807751799875405, - 0.11429783000494353, - 0.030741734997718595, - 0.04003353700682055, - 0.021754474990302697, - 0.188987839006586, - 0.05277714200201444, - 0.008306503994390368, - 0.06063157299649902, - 0.12728210000204854, - 0.09765004199289251, - 0.1107726560003357, - 0.1274281859950861, - 0.012249324994627386, - 0.008481035998556763, - 0.07309318499756046, - 0.03878479100239929, - 0.049941250996198505, - 0.18582426200737245, - 0.171334027996636, - 0.05017834799946286, - 0.016792855996754952, - 0.16776397300418466, - 0.2461642150010448, - 0.04326194801251404, - 0.014302325012977235, - 0.20818699200754054, - 0.02701563799928408, - 0.1833618470118381, - 0.23662595498899464, - 0.01508250400365796, - 0.01162725398899056, - 0.1365334189904388, - 0.14568731699546333, - 0.10658906999742612, - 0.05072423900128342, - 0.01906090100237634, - 0.17903565999586135, - 0.09606754999549594, - 0.035218927994719706, - 0.027452548005385324, - 0.04811037199397106, - 0.06377987501036841, - 0.14653286000248045, - 0.03685064100136515, - 0.05582750999019481, - 0.04416524300177116, - 0.03253350999148097, - 0.00923159800004214, - 0.054560842996579595, - 0.04609523400722537, - 0.19153993799409363, - 0.07030165899777785, - 0.07098288300039712, - 0.02709283299918752, - 0.0423029410012532, - 0.03974465999635868, - 0.23018261799006723, - 0.05972440900222864, - 0.08057927999470849, - 0.062321161007275805, - 0.009164668008452281, - 0.03585555999598, - 0.0512292079947656, - 0.04030567698646337, - 0.056830390996765345, - 0.059748135987319984, - 0.07895557100709993, - 0.14209463099541608, - 0.06568564599729143, - 0.15833479299908504, - 
0.040439323012833484, - 0.0498362630023621, - 0.03929873200831935, - 0.02379806300450582, - 0.046387974987737834, - 0.050170881004305556, - 0.1123369490087498, - 0.08008456700190436, - 0.03763068599801045, - 0.15752859799249563, - 0.046607634008978494, - 0.10439336000126787, - 0.042534918000455946, - 0.15114228198945057, - 0.047130425009527244, - 0.058951241997419856, - 0.05879371799528599, - 0.038991105000604875, - 0.3983446019992698, - 0.0957660889980616, - 0.03798769600689411, - 0.0040801649884087965, - 0.0527917329891352, - 9.570499241817743e-05, - 0.017884111002786085, - 0.05654652499652002, - 0.02166791800118517, - 0.044571439997525886, - 0.0672281059960369, - 0.0697134410002036, - 0.15536286900169216, - 0.15548987399961334, - 0.040725835991906933, - 0.05605901998933405, - 0.21064191800542176, - 0.08073143099318258, - 0.04822655599855352, - 0.01651468400086742, - 0.03756474998954218, - 0.07697052499861456, - 0.06064926901308354, - 0.015460316993994638, - 0.09343560801062267, - 0.0193405620084377, - 0.03874717600410804, - 0.04676954999740701, - 0.17571671000041533, - 0.19530230200325605, - 0.047135802000411786, - 0.08076619399071205, - 0.06828180400771089, - 0.04284980699594598, - 0.05797826898924541, - 0.08768622799834702, - 0.11768964299699292, - 0.06255084098665975, - 0.03335372099536471, - 0.06559493900567759, - 0.05908560000534635, - 0.06026358899543993, - 0.05912199799786322, - 0.029040020002867095, - 0.03715925999858882, - 0.03499013200053014, - 0.03082305399584584, - 0.06225514700054191, - 0.008214371002395637, - 0.024988807999761775, - 0.03617783299705479, - 0.050093145997379906, - 0.06302483299805317, - 0.038518403001944534, - 0.05758256700937636, - 0.04865295300260186, - 0.07715119500062428, - 0.08111365199147258, - 0.052914586005499586, - 0.029595945990877226, - 0.049675178990582936, - 0.07823426398681477, - 0.22431087099539582, - 0.05448811801034026, - 0.06286125900805928, - 0.09634595499665011, - 0.2898863650043495, - 0.03642166100325994, - 
0.05608601600397378, - 0.028403989999787882, - 0.22293610501219518, - 0.05793873900256585, - 0.03701884800102562, - 0.0753162839973811, - 0.039060127004631795, - 0.00013072400179225951, - 0.01845332198718097, - 0.020420116008608602, - 0.06444996599748265, - 0.022164811991387978, - 0.024920454001403414, - 0.06039147199771833, - 0.052919992987881415, - 0.05613632399763446, - 0.02085249799711164, - 0.05026189499767497, - 0.2468050079914974, - 0.04377966500760522, - 0.2597144009923795, - 0.028467597992857918, - 0.03468465700279921, - 0.04699452599743381, - 0.03287921399169136, - 0.03621096700953785, - 0.0684348869981477, - 0.2682535310013918, - 0.06662003400560934, - 0.0881110059999628, - 0.014090637996559963, - 0.031478432996664196, - 0.0014610369980800897, - 0.05288005599868484, - 0.04616020699904766, - 0.08199081999191549, - 0.13667200999043416, - 0.2342972140031634, - 0.03564418898895383, - 0.05695771600585431, - 0.1215844050020678, - 0.047396966998348944, - 0.041667516008601524, - 0.11289452899654862, - 0.04361132399935741, - 0.05626086599659175, - 0.1544901720044436, - 0.05715200801205356, - 0.08254428800137248, - 0.021761682000942528, - 0.03572669001005124, - 0.2783926170086488, - 0.060208095994312316, - 0.28597346699098125, - 0.013720321003347635, - 0.04054786999768112, - 0.024656148001668043, - 0.08648201100004371, - 0.0780154709937051, - 0.05681218901008833, - 0.09618525899713859, - 0.06677654100349173, - 0.03172088900464587, - 0.038636730998405255, - 0.05075030399893876, - 0.0662805200117873, - 0.053608248999807984, - 0.03182805700635072, - 0.04652395800803788, - 0.01834165099717211, - 0.08278856400283985, - 0.05672328399668913, - 0.02973749399825465, - 0.03017984099278692, - 0.05417726100131404, - 0.04468808999808971, - 0.12055704000522383, - 0.041667458004667424, - 0.04351868899539113, - 0.3486505100008799, - 0.04776087400387041, - 0.06726703600725159, - 0.021983915998134762, - 0.09750731999520212, - 0.048025399999460205, - 0.04242669101222418, - 
0.06620188499800861, - 0.09821751300478354, - 0.05651091400068253, - 0.03194631601218134, - 0.023642494008527137, - 0.041628267004853114, - 0.058788248992641456, - 0.06007832899922505, - 0.09532553800090682, - 0.059471902990480885, - 0.035034444998018444, - 0.028335394003079273, - 0.02973888599080965, - 0.08764831200824119, - 0.07877733599161729, - 0.0382234429998789, - 0.0539364319993183, - 0.057619183004135266, - 0.029204682999989018, - 0.04839969000022393, - 0.09316003900312353, - 0.0783539060066687, - 0.07100464700488374, - 0.08748574698984157, - 0.24737668699526694, - 0.6955870289966697, - 0.059723924001445994, - 0.27245550499355886, - 0.05385055601072963, - 0.04127154000161681, - 0.03161210300459061, - 0.0521881820022827, - 0.04686204199970234, - 0.11649471400596667, - 0.06560643500415608, - 0.06795157400483731, - 0.07685946399578825, - 0.024318604002473876, - 0.14834325799893122, - 0.06718271199497394, - 0.04917579899483826, - 0.038878259001648985, - 0.001445540998247452, - 0.11070321399893146, - 0.01383986699511297, - 0.0739672389900079, - 0.11270055000204593, - 0.06826476499554701, - 0.07097820199851412, - 0.06059538200497627, - 0.05484216701006517, - 0.05794369999784976, - 0.049813434001407586, - 0.046829354992951266, - 0.3682105890038656, - 0.009332617002655752, - 0.3697220570029458, - 0.09482107400253881, - 0.10382461799599696, - 0.40945972899498884, - 0.10158340900670737, - 0.05822842099587433, - 0.03951824399700854, - 0.4928661989979446, - 0.0887549329927424, - 0.4074837460066192, - 0.03677171700110193, - 0.030381265009054914, - 0.13407804499729536, - 0.06991486100014299, - 0.000186416000360623, - 0.03832205799699295, - 0.049894490992301144, - 0.05081688200880308, - 0.04647359800583217, - 0.05980396999802906, - 0.0447482419986045, - 0.09417406799911987, - 0.06918185501126572, - 0.08602142598829232, - 0.36554713700024877, - 0.03795410000020638, - 0.05666012299479917, - 0.07123360600962769, - 0.05426461700699292, - 0.058771533993422054, - 
0.056026122998446226, - 0.057431986002484336, - 0.07235291101096664, - 0.05132063399651088, - 0.03798558400012553, - 0.3041025999991689, - 0.08637177899072412, - 0.22159593099786434, - 0.08043770999938715, - 0.054444405002868734, - 0.11688030300138053, - 0.18184309800562914, - 0.11139285100216512, - 0.043840405996888876, - 0.1286100490106037, - 0.035683886002516374, - 0.08373142300115433, - 0.04409406399645377, - 0.07557377500052098, - 0.05456652400607709, - 0.05686479799624067, - 0.3982394530030433, - 0.06259888999920804, - 0.06952314800582826, - 0.07122301000345033, - 0.03771405499719549, - 0.04163938600686379, - 0.03851914500410203, - 0.42729917899123393, - 0.060679671994876117, - 0.0296799479983747, - 0.0823345969984075, - 0.05133780700271018, - 0.035009376995731145, - 0.08553427399601787, - 0.049669579006149434, - 0.04173896799329668, - 0.07427651699981652, - 0.14074070799688343, - 0.07878275300026871, - 0.42280531699361745, - 0.342456658006995, - 0.05531715399411041, - 0.0687957140034996, - 0.059968072004267015, - 0.06532166800752748, - 0.054437531987787224, - 0.06200690200785175, - 0.04134978700312786, - 0.039689444995019585, - 0.056503358006011695, - 0.03778893800335936, - 0.06424959300784394, - 0.02721871501125861, - 0.057195478992071, - 0.2624468019930646, - 0.009685320997959934, - 0.04731343500316143, - 0.05130167100287508, - 0.34257172800425906, - 0.05194063800445292, - 0.043979884998407215, - 0.027589920995524153, - 0.0447109270025976, - 0.00259286499931477, - 0.050113599994801916, - 0.05644953899900429, - 0.0710793819889659, - 0.02920395699038636, - 0.036567026007105596, - 0.05324281299544964, - 0.23626359101035632, - 0.08140000999264885, - 0.012390187999699265, - 0.39517214699299075, - 0.038492138002766296, - 0.030180841000401415, - 0.05107400601264089, - 0.06163091800408438, - 0.03041539499827195, - 0.06886199499422219, - 0.21656617600820027, - 0.18258739500015508, - 0.0908502940001199, - 0.043989983008941635, - 0.05346958199515939, - 
0.38475853399722837, - 0.05350657500093803, - 0.07424151099985465, - 0.03141615599452052, - 0.24739339400548488, - 0.04826961499929894, - 0.0655503510060953, - 0.07801861000189092, - 0.06883852000464685, - 0.030984272991190664, - 0.10368262798874639, - 0.030559474005713128, - 0.07121653099602554, - 0.10124053999606986, - 0.07817856600740924, - 0.05306262100930326, - 0.12705209200794343, - 0.24965420499211177, - 0.35486373500316404, - 0.06899605999933556, - 0.021343957996577956, - 0.09741966900764965, - 0.03453142600483261, - 0.1326694010058418, - 0.05177456700766925, - 0.5085950259963283, - 0.05879485698824283, - 0.05961498399847187, - 0.23621056300180499, - 0.07596107700373977, - 0.24324142299883533, - 0.16184133700153325, - 0.04556921099720057, - 0.08780835200741421, - 0.06461670399585273, - 0.08049239798856433, - 0.061398632009513676, - 0.07310563499049749, - 0.04661131699685939, - 0.07930066899280064, - 0.13023393700132146, - 0.06380815099691972, - 0.060172914003487676, - 0.08796603999508079, - 0.05071858500014059, - 0.047002102990518324, - 0.04953818800277077, - 0.05053554200276267, - 0.06489007800701074, - 0.030795995000516996, - 0.037400379005703144, - 0.06863664998672903, - 0.06844763900153339, - 0.02253937099885661, - 0.06270674499683082, - 0.05697964101273101, - 0.07240994599123951, - 0.05827495499397628, - 0.12171091500204057, - 0.14473143599752802, - 0.058929350998369046, - 0.04963892500381917, - 0.16438545599521603, - 0.3382104459888069, - 0.3547055600065505, - 0.16616009701101575, - 0.12301860099250916, - 0.21944069399614818, - 0.5435156000021379, - 0.1338238880125573, - 1.019235223007854, - 0.13161552499514073, - 0.5182512839965057, - 1.2757576079893624, - 0.2354066140105715, - 0.6423723719926784, - 0.8826011280034436, - 0.30119191099947784, - 0.7248506940086372, - 0.8584717179910513, - 0.5860078340047039, - 1.6023005710012512, - 0.9167075619916432, - 1.7031657890038332, - 1.16348775899678, - 0.9133047429932049, - 0.4131181190023199, - 
1.0760639700019965,
-    0.5944249559979653,
-    0.5501292789995205
-  ],
-  "multi_turn_cache_hits": 75,
-  "multi_turn_cache_misses": 297,
-  "seed": 42,
-  "summary": {
-    "total_requests": 548,
-    "total_tokens": 146684,
-    "elapsed_time": 11.267836093902588,
-    "avg_throughput_tokens_per_sec": 13017.938739752855,
-    "requests_per_second": 48.63400527245347,
-    "end_to_end_latency_ms": {
-      "mean": 6140.036848054977,
-      "p50": 5739.564761002839,
-      "p95": 12094.621561747768,
-      "p99": 17507.813334984672
-    },
-    "storage_io_latency_ms": {
-      "mean": 1022.9614655360095,
-      "p50": 643.9070129927131,
-      "p95": 2777.0962613933066,
-      "p99": 9064.007056341778
-    },
-    "generation_latency_ms": {
-      "mean": 0.0,
-      "p50": 0.0,
-      "p95": 0.0,
-      "p99": 0.0
-    },
-    "cache_stats": {
-      "cache_hit_rate": 0.930465827949677,
-      "cache_hits": 5473,
-      "cache_misses": 409,
-      "gpu_entries": 16,
-      "cpu_entries": 0,
-      "nvme_entries": 433,
-      "gpu_memory_used_gb": 0.0,
-      "cpu_memory_used_gb": 0.0,
-      "offloads_cpu": 0,
-      "offloads_nvme": 433,
-      "storage_health": {
-        "overall_status": "FAIL",
-        "criteria": [
-          {
-            "name": "NVMe Write P95 < 500ms",
-            "target": 500,
-            "actual": 184.88548159657512,
-            "unit": "ms",
-            "passed": true
-          },
-          {
-            "name": "NVMe Read P95 < 200ms",
-            "target": 200,
-            "actual": 305.0211516027045,
-            "unit": "ms",
-            "passed": false
-          },
-          {
-            "name": "Cache Hit Rate > 30%",
-            "target": 0.3,
-            "actual": 0.930465827949677,
-            "unit": "ratio",
-            "passed": true
-          }
-        ],
-        "passed_count": 2,
-        "total_count": 3
-      },
-      "prefill_writes": 449,
-      "decode_reads": 5473,
-      "prefill_bytes_written_gb": 7.364990234375,
-      "decode_bytes_read_gb": 91.2911376953125,
-      "system_prompt_hits": 1160,
-      "common_phrase_hits": 0,
-      "user_cache_hits": 4238,
-      "multi_turn_hits": 75,
-      "total_read_bytes": 98023112704,
-      "total_write_bytes": 7908098048,
-      "total_read_gb": 91.2911376953125,
-      "total_write_gb": 7.364990234375,
-      "read_write_ratio": 12.395282925050552,
-      "read_iops": 5473,
-      "write_iops": 449,
-      "gpu_read_p50_ms": 3.093189501669258,
-      "gpu_read_p95_ms": 41.00240149491582,
-      "gpu_read_p99_ms": 107.82302447303641,
-      "gpu_write_p50_ms": 37.08539250510512,
-      "gpu_write_p95_ms": 215.79176450541127,
-      "gpu_write_p99_ms": 260.7145385023614,
-      "nvme_read_p50_ms": 61.4110075039207,
-      "nvme_read_p95_ms": 369.2106971007887,
-      "nvme_read_p99_ms": 883.3475683738659,
-      "nvme_write_p50_ms": 53.045913999085315,
-      "nvme_write_p95_ms": 321.08334759832354,
-      "nvme_write_p99_ms": 503.6416246031877,
-      "nvme_read_device_p50_ms": 37.53224400134059,
-      "nvme_read_device_p95_ms": 305.0211516027045,
-      "nvme_read_device_p99_ms": 764.9997430053195,
-      "nvme_read_host_p50_ms": 19.654980991617776,
-      "nvme_read_host_p95_ms": 83.59523918552439,
-      "nvme_read_host_p99_ms": 245.60161646164494,
-      "nvme_write_device_p50_ms": 15.513343998463824,
-      "nvme_write_device_p95_ms": 184.88548159657512,
-      "nvme_write_device_p99_ms": 341.41933392093057,
-      "nvme_write_host_p50_ms": 30.788410003879108,
-      "nvme_write_host_p95_ms": 141.69743460370222,
-      "nvme_write_host_p99_ms": 310.0091035547667
-    },
-    "qos_metrics": {
-      "interactive": {
-        "total_requests": 548,
-        "latency_ms": {
-          "mean": 6140.036848054977,
-          "p50": 5739.564761002839,
-          "p95": 12094.621561747768,
-          "p99": 17507.813334984672,
-          "max": 19164.99025899975
-        },
-        "sla": {
-          "target_p95_ms": 50,
-          "actual_p95_ms": 12094.621561747768,
-          "compliance": 0.0,
-          "met": false
-        }
-      },
-      "responsive": {
-        "no_data": true
-      },
-      "batch": {
-        "no_data": true
-      }
-    },
-    "prefix_cache_stats": {
-      "prefix_hits": 112,
-      "prefix_misses": 437,
-      "system_prompt_reuse": 112,
-      "common_phrase_reuse": 0,
-      "bytes_saved": 95027200
-    },
-    "autoscaling_stats": [],
-    "autoscaling_summary": null,
-    "multi_turn_stats": {
-      "cache_hits": 75,
-      "cache_misses": 297,
-      "hit_rate": 0.20161290322580644
-    }
-  }
-}
\ No newline at end of file
diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/lmcache_cpu_offload_trial1.json b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/lmcache_cpu_offload_trial1.json
deleted file mode 100644
index 082a832a..00000000
--- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/lmcache_cpu_offload_trial1.json
+++ /dev/null
@@ -1,10 +0,0 @@
-{
-  "tier": "cpu_offload",
-  "num_prompts": 492,
-  "total_tokens": 61613,
-  "elapsed_time": 6.466309070587158,
-  "tokens_per_second": 9528.310405120394,
-  "requests_per_second": 76.08668169573359,
-  "backend": "lmcache",
-  "cpu_mem_gb": 32
-}
\ No newline at end of file
diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/lmcache_cpu_offload_trial2.json b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/lmcache_cpu_offload_trial2.json
deleted file mode 100644
index 2780079e..00000000
--- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/lmcache_cpu_offload_trial2.json
+++ /dev/null
@@ -1,10 +0,0 @@
-{
-  "tier": "cpu_offload",
-  "num_prompts": 492,
-  "total_tokens": 61605,
-  "elapsed_time": 6.556665658950806,
-  "tokens_per_second": 9395.781820276314,
-  "requests_per_second": 75.0381406635167,
-  "backend": "lmcache",
-  "cpu_mem_gb": 32
-}
\ No newline at end of file
diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/lmcache_cpu_offload_trial3.json b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/lmcache_cpu_offload_trial3.json
deleted file mode 100644
index e715c372..00000000
--- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/lmcache_cpu_offload_trial3.json
+++ /dev/null
@@ -1,10 +0,0 @@
-{
-  "tier": "cpu_offload",
-  "num_prompts": 492,
-  "total_tokens": 61605,
-  "elapsed_time": 6.618597030639648,
-  "tokens_per_second": 9307.863844076066,
-  "requests_per_second": 74.33599563810445,
-  "backend": "lmcache",
-  "cpu_mem_gb": 32
-}
\ No newline at end of file
diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/lmcache_gpu_only_trial1.json b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/lmcache_gpu_only_trial1.json
deleted file mode 100644
index 86d26cb8..00000000
--- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/lmcache_gpu_only_trial1.json
+++ /dev/null
@@ -1,9 +0,0 @@
-{
-  "tier": "gpu_only",
-  "num_prompts": 492,
-  "total_tokens": 61605,
-  "elapsed_time": 6.496809005737305,
-  "tokens_per_second": 9482.347402485879,
-  "requests_per_second": 75.72948497724296,
-  "backend": "lmcache"
-}
\ No newline at end of file
diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/lmcache_gpu_only_trial2.json b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/lmcache_gpu_only_trial2.json
deleted file mode 100644
index 44f3636c..00000000
--- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/lmcache_gpu_only_trial2.json
+++ /dev/null
@@ -1,9 +0,0 @@
-{
-  "tier": "gpu_only",
-  "num_prompts": 492,
-  "total_tokens": 61605,
-  "elapsed_time": 6.492191553115845,
-  "tokens_per_second": 9489.091548821209,
-  "requests_per_second": 75.78334618975789,
-  "backend": "lmcache"
-}
\ No newline at end of file
diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/lmcache_gpu_only_trial3.json b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/lmcache_gpu_only_trial3.json
deleted file mode 100644
index d7119b69..00000000
--- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/lmcache_gpu_only_trial3.json
+++ /dev/null
@@ -1,9 +0,0 @@
-{
-  "tier": "gpu_only",
-  "num_prompts": 492,
-  "total_tokens": 61733,
-  "elapsed_time": 6.46160626411438,
-  "tokens_per_second": 9553.816416027177,
-  "requests_per_second": 76.14205816476392,
-  "backend": "lmcache"
-}
\ No newline at end of file
diff --git
a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_gpu_cpu_nvme_trial1.log b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_gpu_cpu_nvme_trial1.log deleted file mode 100644 index e438c2ab..00000000 --- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_gpu_cpu_nvme_trial1.log +++ /dev/null @@ -1,113 +0,0 @@ -Using random seed: 42 -[ShareGPT] Loaded 319 conversations with 981 turns -[ShareGPT] Context tokens: mean=122.4, p50=17.0, p95=677.0 -[ShareGPT] Generation tokens: mean=261.9, p50=244.0, p95=633.0 -[KVCacheGenerator] Pre-generating 256 MB noise buffer... - -Integrated Multi-User KV Cache Benchmark - MLPerf Edition -Model: Mistral 7B -Users: 50 -Duration: 60s -Seed: 42 -Generation Mode: none (0.0ms/token) -Features: - - Phase-Aware Processing: Enabled - - Multi-turn Conversations: Enabled - - Prefix Caching: Enabled - - RAG Workload: Disabled - - Autoscaling: Disabled - - QoS Support: Enabled (Interactive/Responsive/Batch) - - Trace-Driven (BurstGPT): Disabled - - ShareGPT Dataset: Enabled -================================================================================ - -ShareGPT Dataset Statistics: - Conversations: 319 - Total Turns: 981 - -Starting benchmark... 
--------------------------------------------------------------------------------- - -================================================================================ -BENCHMARK RESULTS - MLPerf KV Cache Storage Benchmark -Generation Mode: none (0.0ms/token) -================================================================================ - -### STORAGE PERFORMANCE ASSESSMENT: PASS ✓ ### - Criteria Passed: 2/2 - ✓ CPU RAM P95 < 150ms: 15.70ms (target: 150.00ms) - ✓ Cache Hit Rate > 30%: 92.6% (target: 30.0%) - -### OVERALL PERFORMANCE ### -Requests Completed: 438 -Total Tokens Generated: 118293 -Throughput: 1950.91 tokens/sec -Requests/sec: 7.22 - -### END-TO-END LATENCY (Storage I/O + Token Generation) ### - Mean: 22496.61 ms - P50: 15972.13 ms - P95: 61651.50 ms - P99: 63370.45 ms - -### STORAGE I/O LATENCY (Primary Metric) ### - Mean: 189.40 ms - P50: 109.66 ms - P95: 669.80 ms - P99: 1119.20 ms - -### STORAGE PERFORMANCE ### - Cache Hit Rate: 92.6% - Total Read: 75.68 GB - Total Write: 6.68 GB - Read/Write Ratio: 11.33 - Read IOPS: 72.90 - Write IOPS: 6.33 - -### CACHE TIER DISTRIBUTION ### - GPU Entries: 61 (3.20 GB) - CPU Entries: 156 (1.71 GB) - NVMe Entries: 158 - -### PHASE-SPECIFIC METRICS ### - Prefill Writes: 380 - Prefill Bytes Written: 6.68 GB - Decode Reads: 4374 - Decode Bytes Read: 75.68 GB - -### TIER-SPECIFIC LATENCIES ### - GPU Read P95: 46.69 ms - GPU Write P95: 136.19 ms - CPU Read P95: 15.70 ms - -### CACHE TYPE BREAKDOWNS ### - System Prompt Hits: 859 - Common Phrase Hits: 0 - User Cache Hits: 3471 - Multi-turn Hits: 44 - -### PREFIX CACHING ### - Prefix Hits: 93 - Prefix Misses: 345 - System Prompt Reuse: 93 - Bytes Saved: 0.08 GB - -### MULTI-TURN CONVERSATIONS ### - Multi-turn Cache Hits: 44 - Multi-turn Cache Misses: 257 - Multi-turn Hit Rate: 14.6% - -### QOS LATENCY METRICS (Informational - includes simulated generation) ### - - INTERACTIVE: - Requests: 438 - Latency P95: 61651.50 ms - Latency P99: 63370.45 ms - SLA Met: ✗ (compliance: 
0.0%) - -================================================================================ -NOTES: - - Pure storage I/O benchmark (no generation simulation) -================================================================================ - -Results saved to lmcache_results_20260106_233959/kvcache_gpu_cpu_nvme_trial1.json diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_gpu_cpu_nvme_trial2.log b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_gpu_cpu_nvme_trial2.log deleted file mode 100644 index bd13269e..00000000 --- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_gpu_cpu_nvme_trial2.log +++ /dev/null @@ -1,115 +0,0 @@ -Using random seed: 42 -[ShareGPT] Loaded 319 conversations with 981 turns -[ShareGPT] Context tokens: mean=122.4, p50=17.0, p95=677.0 -[ShareGPT] Generation tokens: mean=261.9, p50=244.0, p95=633.0 -[KVCacheGenerator] Pre-generating 256 MB noise buffer... - -Integrated Multi-User KV Cache Benchmark - MLPerf Edition -Model: Mistral 7B -Users: 50 -Duration: 60s -Seed: 42 -Generation Mode: none (0.0ms/token) -Features: - - Phase-Aware Processing: Enabled - - Multi-turn Conversations: Enabled - - Prefix Caching: Enabled - - RAG Workload: Disabled - - Autoscaling: Disabled - - QoS Support: Enabled (Interactive/Responsive/Batch) - - Trace-Driven (BurstGPT): Disabled - - ShareGPT Dataset: Enabled -================================================================================ - -ShareGPT Dataset Statistics: - Conversations: 319 - Total Turns: 981 - -Starting benchmark... 
--------------------------------------------------------------------------------- - -================================================================================ -BENCHMARK RESULTS - MLPerf KV Cache Storage Benchmark -Generation Mode: none (0.0ms/token) -================================================================================ - -### STORAGE PERFORMANCE ASSESSMENT: PASS ✓ ### - Criteria Passed: 3/3 - ✓ NVMe Read P95 < 200ms: 39.04ms (target: 200.00ms) - ✓ CPU RAM P95 < 150ms: 26.89ms (target: 150.00ms) - ✓ Cache Hit Rate > 30%: 93.3% (target: 30.0%) - -### OVERALL PERFORMANCE ### -Requests Completed: 549 -Total Tokens Generated: 147313 -Throughput: 3504.35 tokens/sec -Requests/sec: 13.06 - -### END-TO-END LATENCY (Storage I/O + Token Generation) ### - Mean: 11741.78 ms - P50: 3959.79 ms - P95: 43183.22 ms - P99: 44894.89 ms - -### STORAGE I/O LATENCY (Primary Metric) ### - Mean: 267.17 ms - P50: 146.58 ms - P95: 1035.27 ms - P99: 1396.14 ms - -### STORAGE PERFORMANCE ### - Cache Hit Rate: 93.3% - Total Read: 91.87 GB - Total Write: 7.76 GB - Read/Write Ratio: 11.84 - Read IOPS: 91.18 - Write IOPS: 7.52 - -### CACHE TIER DISTRIBUTION ### - GPU Entries: 22 (3.04 GB) - CPU Entries: 8 (0.95 GB) - NVMe Entries: 419 - -### PHASE-SPECIFIC METRICS ### - Prefill Writes: 451 - Prefill Bytes Written: 7.76 GB - Decode Reads: 5471 - Decode Bytes Read: 91.87 GB - -### TIER-SPECIFIC LATENCIES ### - GPU Read P95: 114.41 ms - GPU Write P95: 228.24 ms - CPU Read P95: 26.89 ms - NVME Read P95: 81.56 ms - -### CACHE TYPE BREAKDOWNS ### - System Prompt Hits: 852 - Common Phrase Hits: 0 - User Cache Hits: 4548 - Multi-turn Hits: 71 - -### PREFIX CACHING ### - Prefix Hits: 90 - Prefix Misses: 459 - System Prompt Reuse: 90 - Bytes Saved: 0.07 GB - -### MULTI-TURN CONVERSATIONS ### - Multi-turn Cache Hits: 71 - Multi-turn Cache Misses: 301 - Multi-turn Hit Rate: 19.1% - -### QOS LATENCY METRICS (Informational - includes simulated generation) ### - - INTERACTIVE: - Requests: 549 
- Latency P95: 43183.22 ms - Latency P99: 44894.89 ms - SLA Met: ✗ (compliance: 0.0%) - -================================================================================ -NOTES: - - Pure storage I/O benchmark (no generation simulation) -================================================================================ - -Results saved to lmcache_results_20260106_233959/kvcache_gpu_cpu_nvme_trial2.json diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_gpu_cpu_nvme_trial3.log b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_gpu_cpu_nvme_trial3.log deleted file mode 100644 index d812bb8c..00000000 --- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_gpu_cpu_nvme_trial3.log +++ /dev/null @@ -1,115 +0,0 @@ -Using random seed: 42 -[ShareGPT] Loaded 319 conversations with 981 turns -[ShareGPT] Context tokens: mean=122.4, p50=17.0, p95=677.0 -[ShareGPT] Generation tokens: mean=261.9, p50=244.0, p95=633.0 -[KVCacheGenerator] Pre-generating 256 MB noise buffer... - -Integrated Multi-User KV Cache Benchmark - MLPerf Edition -Model: Mistral 7B -Users: 50 -Duration: 60s -Seed: 42 -Generation Mode: none (0.0ms/token) -Features: - - Phase-Aware Processing: Enabled - - Multi-turn Conversations: Enabled - - Prefix Caching: Enabled - - RAG Workload: Disabled - - Autoscaling: Disabled - - QoS Support: Enabled (Interactive/Responsive/Batch) - - Trace-Driven (BurstGPT): Disabled - - ShareGPT Dataset: Enabled -================================================================================ - -ShareGPT Dataset Statistics: - Conversations: 319 - Total Turns: 981 - -Starting benchmark... 
--------------------------------------------------------------------------------- - -================================================================================ -BENCHMARK RESULTS - MLPerf KV Cache Storage Benchmark -Generation Mode: none (0.0ms/token) -================================================================================ - -### STORAGE PERFORMANCE ASSESSMENT: PASS ✓ ### - Criteria Passed: 3/3 - ✓ NVMe Read P95 < 200ms: 87.54ms (target: 200.00ms) - ✓ CPU RAM P95 < 150ms: 15.01ms (target: 150.00ms) - ✓ Cache Hit Rate > 30%: 92.8% (target: 30.0%) - -### OVERALL PERFORMANCE ### -Requests Completed: 549 -Total Tokens Generated: 147832 -Throughput: 17586.24 tokens/sec -Requests/sec: 65.31 - -### END-TO-END LATENCY (Storage I/O + Token Generation) ### - Mean: 4470.00 ms - P50: 3735.96 ms - P95: 9286.45 ms - P99: 12368.83 ms - -### STORAGE I/O LATENCY (Primary Metric) ### - Mean: 245.69 ms - P50: 128.50 ms - P95: 881.76 ms - P99: 1371.43 ms - -### STORAGE PERFORMANCE ### - Cache Hit Rate: 92.8% - Total Read: 95.83 GB - Total Write: 8.94 GB - Read/Write Ratio: 10.72 - Read IOPS: 91.67 - Write IOPS: 7.33 - -### CACHE TIER DISTRIBUTION ### - GPU Entries: 7 (3.18 GB) - CPU Entries: 10 (2.65 GB) - NVMe Entries: 413 - -### PHASE-SPECIFIC METRICS ### - Prefill Writes: 440 - Prefill Bytes Written: 8.94 GB - Decode Reads: 5500 - Decode Bytes Read: 95.83 GB - -### TIER-SPECIFIC LATENCIES ### - GPU Read P95: 126.70 ms - GPU Write P95: 217.03 ms - CPU Read P95: 15.01 ms - NVME Read P95: 159.60 ms - -### CACHE TYPE BREAKDOWNS ### - System Prompt Hits: 1026 - Common Phrase Hits: 0 - User Cache Hits: 4400 - Multi-turn Hits: 74 - -### PREFIX CACHING ### - Prefix Hits: 109 - Prefix Misses: 440 - System Prompt Reuse: 109 - Bytes Saved: 0.09 GB - -### MULTI-TURN CONVERSATIONS ### - Multi-turn Cache Hits: 74 - Multi-turn Cache Misses: 318 - Multi-turn Hit Rate: 18.9% - -### QOS LATENCY METRICS (Informational - includes simulated generation) ### - - INTERACTIVE: - Requests: 
549 - Latency P95: 9286.45 ms - Latency P99: 12368.83 ms - SLA Met: ✗ (compliance: 0.0%) - -================================================================================ -NOTES: - - Pure storage I/O benchmark (no generation simulation) -================================================================================ - -Results saved to lmcache_results_20260106_233959/kvcache_gpu_cpu_nvme_trial3.json diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_gpu_cpu_trial1.log b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_gpu_cpu_trial1.log deleted file mode 100644 index 85461cc6..00000000 --- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_gpu_cpu_trial1.log +++ /dev/null @@ -1,111 +0,0 @@ -Using random seed: 42 -[ShareGPT] Loaded 319 conversations with 981 turns -[ShareGPT] Context tokens: mean=122.4, p50=17.0, p95=677.0 -[ShareGPT] Generation tokens: mean=261.9, p50=244.0, p95=633.0 -[KVCacheGenerator] Pre-generating 256 MB noise buffer... - -Integrated Multi-User KV Cache Benchmark - MLPerf Edition -Model: Mistral 7B -Users: 50 -Duration: 60s -Seed: 42 -Generation Mode: none (0.0ms/token) -Features: - - Phase-Aware Processing: Enabled - - Multi-turn Conversations: Enabled - - Prefix Caching: Enabled - - RAG Workload: Disabled - - Autoscaling: Disabled - - QoS Support: Enabled (Interactive/Responsive/Batch) - - Trace-Driven (BurstGPT): Disabled - - ShareGPT Dataset: Enabled -================================================================================ - -ShareGPT Dataset Statistics: - Conversations: 319 - Total Turns: 981 - -Starting benchmark... 
--------------------------------------------------------------------------------- - -================================================================================ -BENCHMARK RESULTS - MLPerf KV Cache Storage Benchmark -Generation Mode: none (0.0ms/token) -================================================================================ - -### STORAGE PERFORMANCE ASSESSMENT: PASS ✓ ### - Criteria Passed: 1/1 - ✓ Cache Hit Rate > 30%: 92.7% (target: 30.0%) - -### OVERALL PERFORMANCE ### -Requests Completed: 549 -Total Tokens Generated: 148297 -Throughput: 2766.08 tokens/sec -Requests/sec: 10.24 - -### END-TO-END LATENCY (Storage I/O + Token Generation) ### - Mean: 25717.34 ms - P50: 26512.06 ms - P95: 54093.91 ms - P99: 54182.24 ms - -### STORAGE I/O LATENCY (Primary Metric) ### - Mean: 155.51 ms - P50: 100.79 ms - P95: 462.46 ms - P99: 1069.82 ms - -### STORAGE PERFORMANCE ### - Cache Hit Rate: 92.7% - Total Read: 100.01 GB - Total Write: 7.81 GB - Read/Write Ratio: 12.81 - Read IOPS: 91.60 - Write IOPS: 7.22 - -### CACHE TIER DISTRIBUTION ### - GPU Entries: 352 (6.40 GB) - CPU Entries: 4 (6.35 GB) - NVMe Entries: 77 - -### PHASE-SPECIFIC METRICS ### - Prefill Writes: 433 - Prefill Bytes Written: 7.81 GB - Decode Reads: 5496 - Decode Bytes Read: 100.01 GB - -### TIER-SPECIFIC LATENCIES ### - GPU Read P95: 30.93 ms - GPU Write P95: 119.69 ms - -### CACHE TYPE BREAKDOWNS ### - System Prompt Hits: 1182 - Common Phrase Hits: 0 - User Cache Hits: 4251 - Multi-turn Hits: 63 - -### PREFIX CACHING ### - Prefix Hits: 114 - Prefix Misses: 435 - System Prompt Reuse: 114 - Bytes Saved: 0.09 GB - -### MULTI-TURN CONVERSATIONS ### - Multi-turn Cache Hits: 63 - Multi-turn Cache Misses: 320 - Multi-turn Hit Rate: 16.4% - -### QOS LATENCY METRICS (Informational - includes simulated generation) ### - - INTERACTIVE: - Requests: 549 - Latency P95: 54093.91 ms - Latency P99: 54182.24 ms - SLA Met: ✗ (compliance: 0.0%) - 
-================================================================================ -NOTES: - - Pure storage I/O benchmark (no generation simulation) -================================================================================ - -Results saved to lmcache_results_20260106_233959/kvcache_gpu_cpu_trial1.json diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_gpu_cpu_trial2.log b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_gpu_cpu_trial2.log deleted file mode 100644 index 4971d8c0..00000000 --- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_gpu_cpu_trial2.log +++ /dev/null @@ -1,113 +0,0 @@ -Using random seed: 42 -[ShareGPT] Loaded 319 conversations with 981 turns -[ShareGPT] Context tokens: mean=122.4, p50=17.0, p95=677.0 -[ShareGPT] Generation tokens: mean=261.9, p50=244.0, p95=633.0 -[KVCacheGenerator] Pre-generating 256 MB noise buffer... - -Integrated Multi-User KV Cache Benchmark - MLPerf Edition -Model: Mistral 7B -Users: 50 -Duration: 60s -Seed: 42 -Generation Mode: none (0.0ms/token) -Features: - - Phase-Aware Processing: Enabled - - Multi-turn Conversations: Enabled - - Prefix Caching: Enabled - - RAG Workload: Disabled - - Autoscaling: Disabled - - QoS Support: Enabled (Interactive/Responsive/Batch) - - Trace-Driven (BurstGPT): Disabled - - ShareGPT Dataset: Enabled -================================================================================ - -ShareGPT Dataset Statistics: - Conversations: 319 - Total Turns: 981 - -Starting benchmark... 
--------------------------------------------------------------------------------- - -================================================================================ -BENCHMARK RESULTS - MLPerf KV Cache Storage Benchmark -Generation Mode: none (0.0ms/token) -================================================================================ - -### STORAGE PERFORMANCE ASSESSMENT: PASS ✓ ### - Criteria Passed: 2/2 - ✓ CPU RAM P95 < 150ms: 15.77ms (target: 150.00ms) - ✓ Cache Hit Rate > 30%: 93.0% (target: 30.0%) - -### OVERALL PERFORMANCE ### -Requests Completed: 549 -Total Tokens Generated: 146891 -Throughput: 2853.75 tokens/sec -Requests/sec: 10.67 - -### END-TO-END LATENCY (Storage I/O + Token Generation) ### - Mean: 24335.69 ms - P50: 23836.15 ms - P95: 51915.50 ms - P99: 52094.94 ms - -### STORAGE I/O LATENCY (Primary Metric) ### - Mean: 155.52 ms - P50: 104.04 ms - P95: 450.09 ms - P99: 997.65 ms - -### STORAGE PERFORMANCE ### - Cache Hit Rate: 93.0% - Total Read: 94.28 GB - Total Write: 7.58 GB - Read/Write Ratio: 12.43 - Read IOPS: 91.05 - Write IOPS: 7.47 - -### CACHE TIER DISTRIBUTION ### - GPU Entries: 378 (6.01 GB) - CPU Entries: 31 (6.31 GB) - NVMe Entries: 39 - -### PHASE-SPECIFIC METRICS ### - Prefill Writes: 448 - Prefill Bytes Written: 7.58 GB - Decode Reads: 5463 - Decode Bytes Read: 94.28 GB - -### TIER-SPECIFIC LATENCIES ### - GPU Read P95: 28.54 ms - GPU Write P95: 97.71 ms - CPU Read P95: 15.77 ms - -### CACHE TYPE BREAKDOWNS ### - System Prompt Hits: 1096 - Common Phrase Hits: 0 - User Cache Hits: 4292 - Multi-turn Hits: 75 - -### PREFIX CACHING ### - Prefix Hits: 115 - Prefix Misses: 434 - System Prompt Reuse: 115 - Bytes Saved: 0.09 GB - -### MULTI-TURN CONVERSATIONS ### - Multi-turn Cache Hits: 75 - Multi-turn Cache Misses: 297 - Multi-turn Hit Rate: 20.2% - -### QOS LATENCY METRICS (Informational - includes simulated generation) ### - - INTERACTIVE: - Requests: 549 - Latency P95: 51915.50 ms - Latency P99: 52094.94 ms - SLA Met: ✗ (compliance: 
0.0%) - -================================================================================ -NOTES: - - Pure storage I/O benchmark (no generation simulation) -================================================================================ - -Results saved to lmcache_results_20260106_233959/kvcache_gpu_cpu_trial2.json diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_gpu_cpu_trial3.log b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_gpu_cpu_trial3.log deleted file mode 100644 index 5aafd59a..00000000 --- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_gpu_cpu_trial3.log +++ /dev/null @@ -1,111 +0,0 @@ -Using random seed: 42 -[ShareGPT] Loaded 319 conversations with 981 turns -[ShareGPT] Context tokens: mean=122.4, p50=17.0, p95=677.0 -[ShareGPT] Generation tokens: mean=261.9, p50=244.0, p95=633.0 -[KVCacheGenerator] Pre-generating 256 MB noise buffer... - -Integrated Multi-User KV Cache Benchmark - MLPerf Edition -Model: Mistral 7B -Users: 50 -Duration: 60s -Seed: 42 -Generation Mode: none (0.0ms/token) -Features: - - Phase-Aware Processing: Enabled - - Multi-turn Conversations: Enabled - - Prefix Caching: Enabled - - RAG Workload: Disabled - - Autoscaling: Disabled - - QoS Support: Enabled (Interactive/Responsive/Batch) - - Trace-Driven (BurstGPT): Disabled - - ShareGPT Dataset: Enabled -================================================================================ - -ShareGPT Dataset Statistics: - Conversations: 319 - Total Turns: 981 - -Starting benchmark... 
--------------------------------------------------------------------------------- - -================================================================================ -BENCHMARK RESULTS - MLPerf KV Cache Storage Benchmark -Generation Mode: none (0.0ms/token) -================================================================================ - -### STORAGE PERFORMANCE ASSESSMENT: PASS ✓ ### - Criteria Passed: 1/1 - ✓ Cache Hit Rate > 30%: 92.8% (target: 30.0%) - -### OVERALL PERFORMANCE ### -Requests Completed: 549 -Total Tokens Generated: 148164 -Throughput: 13954.79 tokens/sec -Requests/sec: 51.71 - -### END-TO-END LATENCY (Storage I/O + Token Generation) ### - Mean: 5920.39 ms - P50: 6287.43 ms - P95: 11181.52 ms - P99: 11209.62 ms - -### STORAGE I/O LATENCY (Primary Metric) ### - Mean: 228.78 ms - P50: 140.89 ms - P95: 735.47 ms - P99: 920.01 ms - -### STORAGE PERFORMANCE ### - Cache Hit Rate: 92.8% - Total Read: 93.37 GB - Total Write: 7.35 GB - Read/Write Ratio: 12.70 - Read IOPS: 91.93 - Write IOPS: 7.22 - -### CACHE TIER DISTRIBUTION ### - GPU Entries: 370 (6.38 GB) - CPU Entries: 63 (2.04 GB) - NVMe Entries: 0 - -### PHASE-SPECIFIC METRICS ### - Prefill Writes: 433 - Prefill Bytes Written: 7.35 GB - Decode Reads: 5516 - Decode Bytes Read: 93.37 GB - -### TIER-SPECIFIC LATENCIES ### - GPU Read P95: 83.32 ms - GPU Write P95: 167.22 ms - -### CACHE TYPE BREAKDOWNS ### - System Prompt Hits: 1088 - Common Phrase Hits: 0 - User Cache Hits: 4348 - Multi-turn Hits: 80 - -### PREFIX CACHING ### - Prefix Hits: 116 - Prefix Misses: 433 - System Prompt Reuse: 116 - Bytes Saved: 0.10 GB - -### MULTI-TURN CONVERSATIONS ### - Multi-turn Cache Hits: 80 - Multi-turn Cache Misses: 314 - Multi-turn Hit Rate: 20.3% - -### QOS LATENCY METRICS (Informational - includes simulated generation) ### - - INTERACTIVE: - Requests: 549 - Latency P95: 11181.52 ms - Latency P99: 11209.62 ms - SLA Met: ✗ (compliance: 0.2%) - 
-================================================================================ -NOTES: - - Pure storage I/O benchmark (no generation simulation) -================================================================================ - -Results saved to lmcache_results_20260106_233959/kvcache_gpu_cpu_trial3.json diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_gpu_only_trial1.log b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_gpu_only_trial1.log deleted file mode 100644 index fc2b8581..00000000 --- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_gpu_only_trial1.log +++ /dev/null @@ -1,111 +0,0 @@ -Using random seed: 42 -[ShareGPT] Loaded 319 conversations with 981 turns -[ShareGPT] Context tokens: mean=122.4, p50=17.0, p95=677.0 -[ShareGPT] Generation tokens: mean=261.9, p50=244.0, p95=633.0 -[KVCacheGenerator] Pre-generating 256 MB noise buffer... - -Integrated Multi-User KV Cache Benchmark - MLPerf Edition -Model: Mistral 7B -Users: 50 -Duration: 60s -Seed: 42 -Generation Mode: none (0.0ms/token) -Features: - - Phase-Aware Processing: Enabled - - Multi-turn Conversations: Enabled - - Prefix Caching: Enabled - - RAG Workload: Disabled - - Autoscaling: Disabled - - QoS Support: Enabled (Interactive/Responsive/Batch) - - Trace-Driven (BurstGPT): Disabled - - ShareGPT Dataset: Enabled -================================================================================ - -ShareGPT Dataset Statistics: - Conversations: 319 - Total Turns: 981 - -Starting benchmark... 
--------------------------------------------------------------------------------- - -================================================================================ -BENCHMARK RESULTS - MLPerf KV Cache Storage Benchmark -Generation Mode: none (0.0ms/token) -================================================================================ - -### STORAGE PERFORMANCE ASSESSMENT: PASS ✓ ### - Criteria Passed: 1/1 - ✓ Cache Hit Rate > 30%: 93.1% (target: 30.0%) - -### OVERALL PERFORMANCE ### -Requests Completed: 549 -Total Tokens Generated: 146900 -Throughput: 2688.48 tokens/sec -Requests/sec: 10.05 - -### END-TO-END LATENCY (Storage I/O + Token Generation) ### - Mean: 29974.70 ms - P50: 30461.31 ms - P95: 55516.30 ms - P99: 55609.48 ms - -### STORAGE I/O LATENCY (Primary Metric) ### - Mean: 158.16 ms - P50: 99.66 ms - P95: 370.61 ms - P99: 1672.22 ms - -### STORAGE PERFORMANCE ### - Cache Hit Rate: 93.1% - Total Read: 94.42 GB - Total Write: 7.59 GB - Read/Write Ratio: 12.44 - Read IOPS: 91.05 - Write IOPS: 7.50 - -### CACHE TIER DISTRIBUTION ### - GPU Entries: 450 (7.59 GB) - CPU Entries: 0 (0.00 GB) - NVMe Entries: 0 - -### PHASE-SPECIFIC METRICS ### - Prefill Writes: 450 - Prefill Bytes Written: 7.59 GB - Decode Reads: 5463 - Decode Bytes Read: 94.42 GB - -### TIER-SPECIFIC LATENCIES ### - GPU Read P95: 21.37 ms - GPU Write P95: 99.56 ms - -### CACHE TYPE BREAKDOWNS ### - System Prompt Hits: 1019 - Common Phrase Hits: 0 - User Cache Hits: 4368 - Multi-turn Hits: 76 - -### PREFIX CACHING ### - Prefix Hits: 108 - Prefix Misses: 441 - System Prompt Reuse: 108 - Bytes Saved: 0.09 GB - -### MULTI-TURN CONVERSATIONS ### - Multi-turn Cache Hits: 76 - Multi-turn Cache Misses: 296 - Multi-turn Hit Rate: 20.4% - -### QOS LATENCY METRICS (Informational - includes simulated generation) ### - - INTERACTIVE: - Requests: 549 - Latency P95: 55516.30 ms - Latency P99: 55609.48 ms - SLA Met: ✗ (compliance: 0.0%) - 
-================================================================================ -NOTES: - - Pure storage I/O benchmark (no generation simulation) -================================================================================ - -Results saved to lmcache_results_20260106_233959/kvcache_gpu_only_trial1.json diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_gpu_only_trial2.log b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_gpu_only_trial2.log deleted file mode 100644 index e8ad55e5..00000000 --- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_gpu_only_trial2.log +++ /dev/null @@ -1,111 +0,0 @@ -Using random seed: 42 -[ShareGPT] Loaded 319 conversations with 981 turns -[ShareGPT] Context tokens: mean=122.4, p50=17.0, p95=677.0 -[ShareGPT] Generation tokens: mean=261.9, p50=244.0, p95=633.0 -[KVCacheGenerator] Pre-generating 256 MB noise buffer... - -Integrated Multi-User KV Cache Benchmark - MLPerf Edition -Model: Mistral 7B -Users: 50 -Duration: 60s -Seed: 42 -Generation Mode: none (0.0ms/token) -Features: - - Phase-Aware Processing: Enabled - - Multi-turn Conversations: Enabled - - Prefix Caching: Enabled - - RAG Workload: Disabled - - Autoscaling: Disabled - - QoS Support: Enabled (Interactive/Responsive/Batch) - - Trace-Driven (BurstGPT): Disabled - - ShareGPT Dataset: Enabled -================================================================================ - -ShareGPT Dataset Statistics: - Conversations: 319 - Total Turns: 981 - -Starting benchmark... 
--------------------------------------------------------------------------------- - -================================================================================ -BENCHMARK RESULTS - MLPerf KV Cache Storage Benchmark -Generation Mode: none (0.0ms/token) -================================================================================ - -### STORAGE PERFORMANCE ASSESSMENT: PASS ✓ ### - Criteria Passed: 1/1 - ✓ Cache Hit Rate > 30%: 93.0% (target: 30.0%) - -### OVERALL PERFORMANCE ### -Requests Completed: 549 -Total Tokens Generated: 148262 -Throughput: 2866.76 tokens/sec -Requests/sec: 10.62 - -### END-TO-END LATENCY (Storage I/O + Token Generation) ### - Mean: 25762.78 ms - P50: 26422.68 ms - P95: 52627.93 ms - P99: 52735.04 ms - -### STORAGE I/O LATENCY (Primary Metric) ### - Mean: 179.86 ms - P50: 110.92 ms - P95: 494.66 ms - P99: 1180.91 ms - -### STORAGE PERFORMANCE ### - Cache Hit Rate: 93.0% - Total Read: 96.24 GB - Total Write: 7.37 GB - Read/Write Ratio: 13.05 - Read IOPS: 91.95 - Write IOPS: 7.25 - -### CACHE TIER DISTRIBUTION ### - GPU Entries: 435 (7.37 GB) - CPU Entries: 0 (0.00 GB) - NVMe Entries: 0 - -### PHASE-SPECIFIC METRICS ### - Prefill Writes: 435 - Prefill Bytes Written: 7.37 GB - Decode Reads: 5517 - Decode Bytes Read: 96.24 GB - -### TIER-SPECIFIC LATENCIES ### - GPU Read P95: 29.39 ms - GPU Write P95: 129.04 ms - -### CACHE TYPE BREAKDOWNS ### - System Prompt Hits: 968 - Common Phrase Hits: 0 - User Cache Hits: 4471 - Multi-turn Hits: 78 - -### PREFIX CACHING ### - Prefix Hits: 100 - Prefix Misses: 449 - System Prompt Reuse: 100 - Bytes Saved: 0.08 GB - -### MULTI-TURN CONVERSATIONS ### - Multi-turn Cache Hits: 78 - Multi-turn Cache Misses: 314 - Multi-turn Hit Rate: 19.9% - -### QOS LATENCY METRICS (Informational - includes simulated generation) ### - - INTERACTIVE: - Requests: 549 - Latency P95: 52627.93 ms - Latency P99: 52735.04 ms - SLA Met: ✗ (compliance: 0.0%) - 
-================================================================================ -NOTES: - - Pure storage I/O benchmark (no generation simulation) -================================================================================ - -Results saved to lmcache_results_20260106_233959/kvcache_gpu_only_trial2.json diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_gpu_only_trial3.log b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_gpu_only_trial3.log deleted file mode 100644 index e4079efb..00000000 --- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_gpu_only_trial3.log +++ /dev/null @@ -1,111 +0,0 @@ -Using random seed: 42 -[ShareGPT] Loaded 319 conversations with 981 turns -[ShareGPT] Context tokens: mean=122.4, p50=17.0, p95=677.0 -[ShareGPT] Generation tokens: mean=261.9, p50=244.0, p95=633.0 -[KVCacheGenerator] Pre-generating 256 MB noise buffer... - -Integrated Multi-User KV Cache Benchmark - MLPerf Edition -Model: Mistral 7B -Users: 50 -Duration: 60s -Seed: 42 -Generation Mode: none (0.0ms/token) -Features: - - Phase-Aware Processing: Enabled - - Multi-turn Conversations: Enabled - - Prefix Caching: Enabled - - RAG Workload: Disabled - - Autoscaling: Disabled - - QoS Support: Enabled (Interactive/Responsive/Batch) - - Trace-Driven (BurstGPT): Disabled - - ShareGPT Dataset: Enabled -================================================================================ - -ShareGPT Dataset Statistics: - Conversations: 319 - Total Turns: 981 - -Starting benchmark... 
--------------------------------------------------------------------------------- - -================================================================================ -BENCHMARK RESULTS - MLPerf KV Cache Storage Benchmark -Generation Mode: none (0.0ms/token) -================================================================================ - -### STORAGE PERFORMANCE ASSESSMENT: PASS ✓ ### - Criteria Passed: 1/1 - ✓ Cache Hit Rate > 30%: 93.1% (target: 30.0%) - -### OVERALL PERFORMANCE ### -Requests Completed: 549 -Total Tokens Generated: 147313 -Throughput: 2830.75 tokens/sec -Requests/sec: 10.55 - -### END-TO-END LATENCY (Storage I/O + Token Generation) ### - Mean: 26292.02 ms - P50: 27254.53 ms - P95: 52887.23 ms - P99: 52959.10 ms - -### STORAGE I/O LATENCY (Primary Metric) ### - Mean: 142.72 ms - P50: 108.71 ms - P95: 351.78 ms - P99: 630.61 ms - -### STORAGE PERFORMANCE ### - Cache Hit Rate: 93.1% - Total Read: 91.96 GB - Total Write: 7.37 GB - Read/Write Ratio: 12.49 - Read IOPS: 91.25 - Write IOPS: 7.48 - -### CACHE TIER DISTRIBUTION ### - GPU Entries: 449 (7.37 GB) - CPU Entries: 0 (0.00 GB) - NVMe Entries: 0 - -### PHASE-SPECIFIC METRICS ### - Prefill Writes: 449 - Prefill Bytes Written: 7.37 GB - Decode Reads: 5475 - Decode Bytes Read: 91.96 GB - -### TIER-SPECIFIC LATENCIES ### - GPU Read P95: 25.67 ms - GPU Write P95: 106.78 ms - -### CACHE TYPE BREAKDOWNS ### - System Prompt Hits: 1000 - Common Phrase Hits: 0 - User Cache Hits: 4400 - Multi-turn Hits: 75 - -### PREFIX CACHING ### - Prefix Hits: 109 - Prefix Misses: 440 - System Prompt Reuse: 109 - Bytes Saved: 0.09 GB - -### MULTI-TURN CONVERSATIONS ### - Multi-turn Cache Hits: 75 - Multi-turn Cache Misses: 297 - Multi-turn Hit Rate: 20.2% - -### QOS LATENCY METRICS (Informational - includes simulated generation) ### - - INTERACTIVE: - Requests: 549 - Latency P95: 52887.23 ms - Latency P99: 52959.10 ms - SLA Met: ✗ (compliance: 0.0%) - 
-================================================================================ -NOTES: - - Pure storage I/O benchmark (no generation simulation) -================================================================================ - -Results saved to lmcache_results_20260106_233959/kvcache_gpu_only_trial3.json diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_nvme_only_trial1.log b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_nvme_only_trial1.log deleted file mode 100644 index 6cd52f08..00000000 --- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_nvme_only_trial1.log +++ /dev/null @@ -1,115 +0,0 @@ -Using random seed: 42 -[ShareGPT] Loaded 319 conversations with 981 turns -[ShareGPT] Context tokens: mean=122.4, p50=17.0, p95=677.0 -[ShareGPT] Generation tokens: mean=261.9, p50=244.0, p95=633.0 -[KVCacheGenerator] Pre-generating 256 MB noise buffer... - -Integrated Multi-User KV Cache Benchmark - MLPerf Edition -Model: Mistral 7B -Users: 50 -Duration: 60s -Seed: 42 -Generation Mode: none (0.0ms/token) -Features: - - Phase-Aware Processing: Enabled - - Multi-turn Conversations: Enabled - - Prefix Caching: Enabled - - RAG Workload: Disabled - - Autoscaling: Disabled - - QoS Support: Enabled (Interactive/Responsive/Batch) - - Trace-Driven (BurstGPT): Disabled - - ShareGPT Dataset: Enabled -================================================================================ - -ShareGPT Dataset Statistics: - Conversations: 319 - Total Turns: 981 - -Starting benchmark... 
--------------------------------------------------------------------------------- - -================================================================================ -BENCHMARK RESULTS - MLPerf KV Cache Storage Benchmark -Generation Mode: none (0.0ms/token) -================================================================================ - -### STORAGE PERFORMANCE ASSESSMENT: FAIL ✗ ### - Criteria Passed: 2/3 - ✓ NVMe Write P95 < 500ms: 189.36ms (target: 500.00ms) - ✗ NVMe Read P95 < 200ms: 293.53ms (target: 200.00ms) - ✓ Cache Hit Rate > 30%: 93.3% (target: 30.0%) - -### OVERALL PERFORMANCE ### -Requests Completed: 549 -Total Tokens Generated: 147313 -Throughput: 13546.83 tokens/sec -Requests/sec: 50.49 - -### END-TO-END LATENCY (Storage I/O + Token Generation) ### - Mean: 6190.35 ms - P50: 5643.79 ms - P95: 11910.14 ms - P99: 17338.80 ms - -### STORAGE I/O LATENCY (Primary Metric) ### - Mean: 1007.76 ms - P50: 609.32 ms - P95: 2799.55 ms - P99: 8767.97 ms - -### STORAGE PERFORMANCE ### - Cache Hit Rate: 93.3% - Total Read: 91.89 GB - Total Write: 7.37 GB - Read/Write Ratio: 12.46 - Read IOPS: 91.23 - Write IOPS: 7.48 - -### CACHE TIER DISTRIBUTION ### - GPU Entries: 15 (0.00 GB) - CPU Entries: 0 (0.00 GB) - NVMe Entries: 434 - -### PHASE-SPECIFIC METRICS ### - Prefill Writes: 449 - Prefill Bytes Written: 7.37 GB - Decode Reads: 5474 - Decode Bytes Read: 91.89 GB - -### TIER-SPECIFIC LATENCIES ### - GPU Read P95: 34.32 ms - GPU Write P95: 67.96 ms - NVME Read P95: 358.22 ms - NVME Write P95: 303.18 ms - -### CACHE TYPE BREAKDOWNS ### - System Prompt Hits: 958 - Common Phrase Hits: 0 - User Cache Hits: 4442 - Multi-turn Hits: 74 - -### PREFIX CACHING ### - Prefix Hits: 98 - Prefix Misses: 451 - System Prompt Reuse: 98 - Bytes Saved: 0.08 GB - -### MULTI-TURN CONVERSATIONS ### - Multi-turn Cache Hits: 74 - Multi-turn Cache Misses: 298 - Multi-turn Hit Rate: 19.9% - -### QOS LATENCY METRICS (Informational - includes simulated generation) ### - - INTERACTIVE: - 
Requests: 549 - Latency P95: 11910.14 ms - Latency P99: 17338.80 ms - SLA Met: ✗ (compliance: 0.0%) - -================================================================================ -NOTES: - - Pure storage I/O benchmark (no generation simulation) -================================================================================ - -Results saved to lmcache_results_20260106_233959/kvcache_nvme_only_trial1.json diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_nvme_only_trial2.log b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_nvme_only_trial2.log deleted file mode 100644 index c7fd68ae..00000000 --- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_nvme_only_trial2.log +++ /dev/null @@ -1,115 +0,0 @@ -Using random seed: 42 -[ShareGPT] Loaded 319 conversations with 981 turns -[ShareGPT] Context tokens: mean=122.4, p50=17.0, p95=677.0 -[ShareGPT] Generation tokens: mean=261.9, p50=244.0, p95=633.0 -[KVCacheGenerator] Pre-generating 256 MB noise buffer... - -Integrated Multi-User KV Cache Benchmark - MLPerf Edition -Model: Mistral 7B -Users: 50 -Duration: 60s -Seed: 42 -Generation Mode: none (0.0ms/token) -Features: - - Phase-Aware Processing: Enabled - - Multi-turn Conversations: Enabled - - Prefix Caching: Enabled - - RAG Workload: Disabled - - Autoscaling: Disabled - - QoS Support: Enabled (Interactive/Responsive/Batch) - - Trace-Driven (BurstGPT): Disabled - - ShareGPT Dataset: Enabled -================================================================================ - -ShareGPT Dataset Statistics: - Conversations: 319 - Total Turns: 981 - -Starting benchmark... 
--------------------------------------------------------------------------------- - -================================================================================ -BENCHMARK RESULTS - MLPerf KV Cache Storage Benchmark -Generation Mode: none (0.0ms/token) -================================================================================ - -### STORAGE PERFORMANCE ASSESSMENT: FAIL ✗ ### - Criteria Passed: 2/3 - ✓ NVMe Write P95 < 500ms: 189.21ms (target: 500.00ms) - ✗ NVMe Read P95 < 200ms: 329.54ms (target: 200.00ms) - ✓ Cache Hit Rate > 30%: 93.0% (target: 30.0%) - -### OVERALL PERFORMANCE ### -Requests Completed: 549 -Total Tokens Generated: 146625 -Throughput: 13062.70 tokens/sec -Requests/sec: 48.91 - -### END-TO-END LATENCY (Storage I/O + Token Generation) ### - Mean: 5859.82 ms - P50: 5234.95 ms - P95: 11917.28 ms - P99: 17629.32 ms - -### STORAGE I/O LATENCY (Primary Metric) ### - Mean: 1024.16 ms - P50: 629.76 ms - P95: 2956.78 ms - P99: 9674.22 ms - -### STORAGE PERFORMANCE ### - Cache Hit Rate: 93.0% - Total Read: 92.99 GB - Total Write: 7.57 GB - Read/Write Ratio: 12.28 - Read IOPS: 90.90 - Write IOPS: 7.47 - -### CACHE TIER DISTRIBUTION ### - GPU Entries: 22 (0.00 GB) - CPU Entries: 0 (0.00 GB) - NVMe Entries: 426 - -### PHASE-SPECIFIC METRICS ### - Prefill Writes: 448 - Prefill Bytes Written: 7.57 GB - Decode Reads: 5454 - Decode Bytes Read: 92.99 GB - -### TIER-SPECIFIC LATENCIES ### - GPU Read P95: 31.55 ms - GPU Write P95: 37.98 ms - NVME Read P95: 395.60 ms - NVME Write P95: 262.89 ms - -### CACHE TYPE BREAKDOWNS ### - System Prompt Hits: 1261 - Common Phrase Hits: 0 - User Cache Hits: 4117 - Multi-turn Hits: 76 - -### PREFIX CACHING ### - Prefix Hits: 117 - Prefix Misses: 432 - System Prompt Reuse: 117 - Bytes Saved: 0.09 GB - -### MULTI-TURN CONVERSATIONS ### - Multi-turn Cache Hits: 76 - Multi-turn Cache Misses: 295 - Multi-turn Hit Rate: 20.5% - -### QOS LATENCY METRICS (Informational - includes simulated generation) ### - - INTERACTIVE: - 
Requests: 549 - Latency P95: 11917.28 ms - Latency P99: 17629.32 ms - SLA Met: ✗ (compliance: 0.0%) - -================================================================================ -NOTES: - - Pure storage I/O benchmark (no generation simulation) -================================================================================ - -Results saved to lmcache_results_20260106_233959/kvcache_nvme_only_trial2.json diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_nvme_only_trial3.log b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_nvme_only_trial3.log deleted file mode 100644 index b461807f..00000000 --- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/kvcache_nvme_only_trial3.log +++ /dev/null @@ -1,115 +0,0 @@ -Using random seed: 42 -[ShareGPT] Loaded 319 conversations with 981 turns -[ShareGPT] Context tokens: mean=122.4, p50=17.0, p95=677.0 -[ShareGPT] Generation tokens: mean=261.9, p50=244.0, p95=633.0 -[KVCacheGenerator] Pre-generating 256 MB noise buffer... - -Integrated Multi-User KV Cache Benchmark - MLPerf Edition -Model: Mistral 7B -Users: 50 -Duration: 60s -Seed: 42 -Generation Mode: none (0.0ms/token) -Features: - - Phase-Aware Processing: Enabled - - Multi-turn Conversations: Enabled - - Prefix Caching: Enabled - - RAG Workload: Disabled - - Autoscaling: Disabled - - QoS Support: Enabled (Interactive/Responsive/Batch) - - Trace-Driven (BurstGPT): Disabled - - ShareGPT Dataset: Enabled -================================================================================ - -ShareGPT Dataset Statistics: - Conversations: 319 - Total Turns: 981 - -Starting benchmark... 
--------------------------------------------------------------------------------- - -================================================================================ -BENCHMARK RESULTS - MLPerf KV Cache Storage Benchmark -Generation Mode: none (0.0ms/token) -================================================================================ - -### STORAGE PERFORMANCE ASSESSMENT: FAIL ✗ ### - Criteria Passed: 2/3 - ✓ NVMe Write P95 < 500ms: 184.89ms (target: 500.00ms) - ✗ NVMe Read P95 < 200ms: 305.02ms (target: 200.00ms) - ✓ Cache Hit Rate > 30%: 93.0% (target: 30.0%) - -### OVERALL PERFORMANCE ### -Requests Completed: 548 -Total Tokens Generated: 146684 -Throughput: 13017.94 tokens/sec -Requests/sec: 48.63 - -### END-TO-END LATENCY (Storage I/O + Token Generation) ### - Mean: 6140.04 ms - P50: 5739.56 ms - P95: 12094.62 ms - P99: 17507.81 ms - -### STORAGE I/O LATENCY (Primary Metric) ### - Mean: 1022.96 ms - P50: 643.91 ms - P95: 2777.10 ms - P99: 9064.01 ms - -### STORAGE PERFORMANCE ### - Cache Hit Rate: 93.0% - Total Read: 91.29 GB - Total Write: 7.36 GB - Read/Write Ratio: 12.40 - Read IOPS: 91.22 - Write IOPS: 7.48 - -### CACHE TIER DISTRIBUTION ### - GPU Entries: 16 (0.00 GB) - CPU Entries: 0 (0.00 GB) - NVMe Entries: 433 - -### PHASE-SPECIFIC METRICS ### - Prefill Writes: 449 - Prefill Bytes Written: 7.36 GB - Decode Reads: 5473 - Decode Bytes Read: 91.29 GB - -### TIER-SPECIFIC LATENCIES ### - GPU Read P95: 41.00 ms - GPU Write P95: 215.79 ms - NVME Read P95: 369.21 ms - NVME Write P95: 321.08 ms - -### CACHE TYPE BREAKDOWNS ### - System Prompt Hits: 1160 - Common Phrase Hits: 0 - User Cache Hits: 4238 - Multi-turn Hits: 75 - -### PREFIX CACHING ### - Prefix Hits: 112 - Prefix Misses: 437 - System Prompt Reuse: 112 - Bytes Saved: 0.09 GB - -### MULTI-TURN CONVERSATIONS ### - Multi-turn Cache Hits: 75 - Multi-turn Cache Misses: 297 - Multi-turn Hit Rate: 20.2% - -### QOS LATENCY METRICS (Informational - includes simulated generation) ### - - INTERACTIVE: - 
Requests: 548 - Latency P95: 12094.62 ms - Latency P99: 17507.81 ms - SLA Met: ✗ (compliance: 0.0%) - -================================================================================ -NOTES: - - Pure storage I/O benchmark (no generation simulation) -================================================================================ - -Results saved to lmcache_results_20260106_233959/kvcache_nvme_only_trial3.json diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/lmcache_cpu_offload_trial1.log b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/lmcache_cpu_offload_trial1.log deleted file mode 100644 index b4e1619f..00000000 --- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/logs/lmcache_cpu_offload_trial1.log +++ /dev/null @@ -1,759 +0,0 @@ -Loaded 492 prompts -INFO 01-06 23:45:51 [utils.py:253] non-default args: {'trust_remote_code': True, 'gpu_memory_utilization': 0.8, 'disable_log_stats': True, 'kv_transfer_config': KVTransferConfig(kv_connector='LMCacheConnectorV1', engine_id='c242eabe-278c-4795-a499-986f6277a0aa', kv_buffer_device='cuda', kv_buffer_size=1000000000.0, kv_role='kv_both', kv_rank=None, kv_parallel_size=1, kv_ip='127.0.0.1', kv_port=14579, kv_connector_extra_config={}, kv_connector_module_path=None, enable_permute_local_kv=False, kv_load_failure_policy='recompute'), 'model': 'mistralai/Mistral-7B-Instruct-v0.2'} -The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored. -INFO 01-06 23:45:52 [model.py:514] Resolved architecture: MistralForCausalLM -INFO 01-06 23:45:52 [model.py:1661] Using max model len 32768 -INFO 01-06 23:45:52 [scheduler.py:230] Chunked prefill is enabled with max_num_batched_tokens=16384. -WARNING 01-06 23:45:52 [vllm.py:932] Turning off hybrid kv cache manager because `--kv-transfer-config` is set. 
This will reduce the performance of vLLM on LLMs with sliding window attention or Mamba attention. If you are a developer of kv connector, please consider supporting hybrid kv cache manager for your connector by making sure your connector is a subclass of `SupportsHMA` defined in kv_connector/v1/base.py and use --no-disable-hybrid-kv-cache-manager to start vLLM. -(EngineCore_DP0 pid=552566) INFO 01-06 23:45:53 [core.py:93] Initializing a V1 LLM engine (v0.13.0) with config: model='mistralai/Mistral-7B-Instruct-v0.2', speculative_config=None, tokenizer='mistralai/Mistral-7B-Instruct-v0.2', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False), seed=0, served_model_name=mistralai/Mistral-7B-Instruct-v0.2, enable_prefix_caching=True, enable_chunked_prefill=True, pooler_config=None, compilation_config={'level': None, 'mode': , 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 
'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::kda_attention', 'vllm::sparse_attn_indexer'], 'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_split_points': [16384], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': , 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'eliminate_noops': True, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': , 'evaluate_guards': False}, 'local_cache_dir': None} -(EngineCore_DP0 pid=552566) INFO 01-06 23:45:53 [parallel_state.py:1203] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://10.10.50.98:35767 backend=nccl -(EngineCore_DP0 pid=552566) INFO 01-06 23:45:53 [parallel_state.py:1411] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank 0 -(EngineCore_DP0 pid=552566) INFO 01-06 23:45:54 [gpu_model_runner.py:3562] Starting to load model mistralai/Mistral-7B-Instruct-v0.2... -(EngineCore_DP0 pid=552566) /home/sped/Downloads/storage/kv_cache_benchmark/venv/lib/python3.10/site-packages/tvm_ffi/_optional_torch_c_dlpack.py:174: UserWarning: Failed to JIT torch c dlpack extension, EnvTensorAllocator will not be enabled. 
-(EngineCore_DP0 pid=552566) We recommend installing via `pip install torch-c-dlpack-ext` -(EngineCore_DP0 pid=552566) warnings.warn( -(EngineCore_DP0 pid=552566) INFO 01-06 23:45:56 [cuda.py:351] Using FLASH_ATTN attention backend out of potential backends: ('FLASH_ATTN', 'FLASHINFER', 'TRITON_ATTN', 'FLEX_ATTENTION') -(EngineCore_DP0 pid=552566) Loading safetensors checkpoint shards: 0% Completed | 0/3 [00:00= PR#20511) (token_database.py:74:lmcache.v1.token_database) -(EngineCore_DP0 pid=552566) [2026-01-06 23:46:09,091] LMCache INFO: Using hash algorithm: builtin (token_database.py:84:lmcache.v1.token_database) -(EngineCore_DP0 pid=552566) [2026-01-06 23:46:09,094] LMCache INFO: Creating LMCacheEngine with config: {'chunk_size': 256, 'local_cpu': True, 'max_local_cpu_size': 32.0, 'reserve_local_cpu_size': 0.0, 'local_disk': None, 'max_local_disk_size': 0.0, 'remote_url': None, 'remote_serde': 'naive', 'use_layerwise': False, 'save_decode_cache': False, 'pre_caching_hash_algorithm': 'builtin', 'enable_blending': False, 'blend_recompute_ratios': None, 'blend_thresholds': None, 'blend_check_layers': None, 'blend_min_tokens': 256, 'blend_special_str': ' # # ', 'enable_p2p': False, 'p2p_host': None, 'p2p_init_ports': None, 'p2p_lookup_ports': None, 'enable_controller': False, 'lmcache_instance_id': None, 'controller_pull_url': None, 'controller_reply_url': None, 'lmcache_worker_ports': None, 'lmcache_worker_ids': None, 'lmcache_worker_heartbeat_delay_time': 10, 'lmcache_worker_heartbeat_time': None, 'enable_pd': False, 'pd_role': None, 'pd_buffer_size': None, 'pd_buffer_device': None, 'pd_peer_host': None, 'pd_peer_init_port': None, 'pd_peer_alloc_port': None, 'pd_proxy_host': None, 'pd_proxy_port': None, 'transfer_channel': None, 'nixl_backends': None, 'nixl_buffer_size': None, 'nixl_buffer_device': None, 'gds_path': None, 'cufile_buffer_size': None, 'audit_actual_remote_url': None, 'internal_api_server_host': '0.0.0.0', 'extra_config': None, 'save_unfull_chunk': 
False, 'blocking_timeout_secs': 10, 'external_lookup_client': None, 'py_enable_gc': True, 'cache_policy': 'LRU', 'numa_mode': None, 'enable_async_loading': False, 'internal_api_server_enabled': False, 'internal_api_server_port_start': 6999, 'priority_limit': None, 'internal_api_server_include_index_list': None, 'internal_api_server_socket_path_prefix': None, 'runtime_plugin_locations': None, 'storage_plugins': None, 'lookup_timeout_ms': 3000, 'hit_miss_ratio': None, 'lookup_server_worker_ids': None, 'enable_scheduler_bypass_lookup': False, 'script_allowed_imports': None, 'enable_lazy_memory_allocator': False, 'lazy_memory_initial_ratio': 0.2, 'lazy_memory_expand_trigger_ratio': 0.5, 'lazy_memory_step_ratio': 0.1, 'lazy_memory_safe_size': 0.0, 'enable_chunk_statistics': False, 'chunk_statistics_auto_start_statistics': False, 'chunk_statistics_auto_exit_timeout_hours': 0.0, 'chunk_statistics_auto_exit_target_unique_chunks': 0, 'chunk_statistics_strategy': 'memory_bloom_filter', 'enable_kv_events': False, 'use_gpu_connector_v3': False, 'pin_timeout_sec': 300, 'pin_check_interval_sec': 30} (cache_engine.py:101:lmcache.v1.cache_engine) -(EngineCore_DP0 pid=552566) [2026-01-06 23:46:09,094] LMCache INFO: LMCacheWorker is not initialized (related configs: enable_controller: False, role: worker, worker_id: 0, worker_ids: [0]). (cache_engine.py:143:lmcache.v1.cache_engine) -(EngineCore_DP0 pid=552566) [2026-01-06 23:46:09,094] LMCache INFO: KV events are disabled. (cache_engine.py:172:lmcache.v1.cache_engine) -(EngineCore_DP0 pid=552566) [2026-01-06 23:46:09,094] LMCache INFO: Initializing usage context. 
(usage_context.py:412:lmcache.usage_context) -(EngineCore_DP0 pid=552566) [2026-01-06 23:46:10,398] LMCache INFO: Starting PinMonitor background thread (pin_monitor.py:156:lmcache.v1.pin_monitor) -(EngineCore_DP0 pid=552566) [2026-01-06 23:46:10,398] LMCache INFO: PinMonitor started (pin_monitor.py:176:lmcache.v1.pin_monitor) -(EngineCore_DP0 pid=552566) [2026-01-06 23:46:10,847] LMCache INFO: lmcache lookup server start on /tmp/engine_c242eabe-278c-4795-a499-986f6277a0aa_service_lookup_lmcache_rpc_port_0 (lmcache_lookup_client.py:357:lmcache.v1.lookup_client.lmcache_lookup_client) -(EngineCore_DP0 pid=552566) [2026-01-06 23:46:10,398] LMCache INFO: PinMonitor check: pinned_objects=0, timeout_objects=0, force_unpin_success=0 (pin_monitor.py:121:lmcache.v1.pin_monitor) -(EngineCore_DP0 pid=552566) [2026-01-06 23:46:10,849] LMCache WARNING: Please use the latest lmcache connector, otherwise some features may not work, such as DSA (vllm_v1_adapter.py:767:lmcache.integration.vllm.vllm_v1_adapter) -(EngineCore_DP0 pid=552566) [2026-01-06 23:46:10,849] LMCache INFO: Post initializing LMCacheEngine (cache_engine.py:221:lmcache.v1.cache_engine) -(EngineCore_DP0 pid=552566) [2026-01-06 23:46:10,849] LMCache INFO: Initialize storage manager on rank 0, use layerwise: False,save only first rank: False (cache_engine.py:233:lmcache.v1.cache_engine) -(EngineCore_DP0 pid=552566) [2026-01-06 23:46:10,884] LMCache INFO: Initializing LRUCachePolicy (lru.py:22:lmcache.v1.storage_backend.cache_policy.lru) -(EngineCore_DP0 pid=552566) [2026-01-06 23:46:10,884] LMCache INFO: NUMA mapping None (local_cpu_backend.py:350:lmcache.v1.storage_backend.local_cpu_backend) -(EngineCore_DP0 pid=552566) [2026-01-06 23:46:19,844] LMCache WARNING: Controller message sender is not initialized (local_cpu_backend.py:102:lmcache.v1.storage_backend.local_cpu_backend) -(EngineCore_DP0 pid=552566) [2026-01-06 23:46:19,844] LMCache INFO: Internal API server disabled. 
internal_api_server_enabled=False, port_offset=1, port=7000, socket_path=None, include_index_list=None (api_server.py:50:lmcache.v1.internal_api_server.api_server) -(EngineCore_DP0 pid=552566) [2026-01-06 23:46:19,844] LMCache INFO: LMCache initialized for role KVConnectorRole.WORKER with version 0.3.12-g78697950e, vllm version 0.13.0, lmcache cache_engine metadata: LMCacheEngineMetadata(model_name='mistralai/Mistral-7B-Instruct-v0.2', world_size=1, worker_id=0, fmt='vllm', kv_dtype=torch.bfloat16, kv_shape=(32, 2, 256, 8, 128), use_mla=False, role='worker', served_model_name='mistralai/Mistral-7B-Instruct-v0.2', chunk_size=256, kv_layer_groups_manager=KVLayerGroupsManager(kv_layer_groups=[])) (vllm_v1_adapter.py:840:lmcache.integration.vllm.vllm_v1_adapter) -(EngineCore_DP0 pid=552566) INFO 01-06 23:46:19 [utils.py:35] Connectors do not specify a kv cache layout, defaulting to NHD. -(EngineCore_DP0 pid=552566) Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 0%| | 0/51 [00:00\xba\xcf\xe0R\xcbs\xa9\x17\x81\x14" from vLLM (>= PR#20511) (token_database.py:74:lmcache.v1.token_database) -(EngineCore_DP0 pid=552566) [2026-01-06 23:46:23,629] LMCache INFO: Using hash algorithm: builtin (token_database.py:84:lmcache.v1.token_database) -(EngineCore_DP0 pid=552566) [2026-01-06 23:46:23,629] LMCache INFO: Internal API server disabled. 
internal_api_server_enabled=False, port_offset=0, port=6999, socket_path=None, include_index_list=None (api_server.py:50:lmcache.v1.internal_api_server.api_server) -(EngineCore_DP0 pid=552566) [2026-01-06 23:46:23,629] LMCache INFO: LMCache initialized for role KVConnectorRole.SCHEDULER with version 0.3.12-g78697950e, vllm version 0.13.0, lmcache cache_engine metadata: None (vllm_v1_adapter.py:840:lmcache.integration.vllm.vllm_v1_adapter) -INFO 01-06 23:46:24 [llm.py:360] Supported tasks: ['generate'] - Adding requests: 0%| | 0/492 [00:00, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::kda_attention', 'vllm::sparse_attn_indexer'], 'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_split_points': [16384], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': , 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'eliminate_noops': True, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': , 'evaluate_guards': False}, 'local_cache_dir': 
None} -(EngineCore_DP0 pid=553149) INFO 01-06 23:46:56 [parallel_state.py:1203] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://10.10.50.98:55861 backend=nccl -(EngineCore_DP0 pid=553149) INFO 01-06 23:46:56 [parallel_state.py:1411] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank 0 -(EngineCore_DP0 pid=553149) INFO 01-06 23:46:57 [gpu_model_runner.py:3562] Starting to load model mistralai/Mistral-7B-Instruct-v0.2... -(EngineCore_DP0 pid=553149) /home/sped/Downloads/storage/kv_cache_benchmark/venv/lib/python3.10/site-packages/tvm_ffi/_optional_torch_c_dlpack.py:174: UserWarning: Failed to JIT torch c dlpack extension, EnvTensorAllocator will not be enabled. -(EngineCore_DP0 pid=553149) We recommend installing via `pip install torch-c-dlpack-ext` -(EngineCore_DP0 pid=553149) warnings.warn( -(EngineCore_DP0 pid=553149) INFO 01-06 23:46:59 [cuda.py:351] Using FLASH_ATTN attention backend out of potential backends: ('FLASH_ATTN', 'FLASHINFER', 'TRITON_ATTN', 'FLEX_ATTENTION') -(EngineCore_DP0 pid=553149) Loading safetensors checkpoint shards: 0% Completed | 0/3 [00:00= PR#20511) (token_database.py:74:lmcache.v1.token_database) -(EngineCore_DP0 pid=553149) [2026-01-06 23:47:11,799] LMCache INFO: Using hash algorithm: builtin (token_database.py:84:lmcache.v1.token_database) -(EngineCore_DP0 pid=553149) [2026-01-06 23:47:11,801] LMCache INFO: Creating LMCacheEngine with config: {'chunk_size': 256, 'local_cpu': True, 'max_local_cpu_size': 32.0, 'reserve_local_cpu_size': 0.0, 'local_disk': None, 'max_local_disk_size': 0.0, 'remote_url': None, 'remote_serde': 'naive', 'use_layerwise': False, 'save_decode_cache': False, 'pre_caching_hash_algorithm': 'builtin', 'enable_blending': False, 'blend_recompute_ratios': None, 'blend_thresholds': None, 'blend_check_layers': None, 'blend_min_tokens': 256, 'blend_special_str': ' # # ', 'enable_p2p': False, 'p2p_host': None, 'p2p_init_ports': None, 'p2p_lookup_ports': None, 
'enable_controller': False, 'lmcache_instance_id': None, 'controller_pull_url': None, 'controller_reply_url': None, 'lmcache_worker_ports': None, 'lmcache_worker_ids': None, 'lmcache_worker_heartbeat_delay_time': 10, 'lmcache_worker_heartbeat_time': None, 'enable_pd': False, 'pd_role': None, 'pd_buffer_size': None, 'pd_buffer_device': None, 'pd_peer_host': None, 'pd_peer_init_port': None, 'pd_peer_alloc_port': None, 'pd_proxy_host': None, 'pd_proxy_port': None, 'transfer_channel': None, 'nixl_backends': None, 'nixl_buffer_size': None, 'nixl_buffer_device': None, 'gds_path': None, 'cufile_buffer_size': None, 'audit_actual_remote_url': None, 'internal_api_server_host': '0.0.0.0', 'extra_config': None, 'save_unfull_chunk': False, 'blocking_timeout_secs': 10, 'external_lookup_client': None, 'py_enable_gc': True, 'cache_policy': 'LRU', 'numa_mode': None, 'enable_async_loading': False, 'internal_api_server_enabled': False, 'internal_api_server_port_start': 6999, 'priority_limit': None, 'internal_api_server_include_index_list': None, 'internal_api_server_socket_path_prefix': None, 'runtime_plugin_locations': None, 'storage_plugins': None, 'lookup_timeout_ms': 3000, 'hit_miss_ratio': None, 'lookup_server_worker_ids': None, 'enable_scheduler_bypass_lookup': False, 'script_allowed_imports': None, 'enable_lazy_memory_allocator': False, 'lazy_memory_initial_ratio': 0.2, 'lazy_memory_expand_trigger_ratio': 0.5, 'lazy_memory_step_ratio': 0.1, 'lazy_memory_safe_size': 0.0, 'enable_chunk_statistics': False, 'chunk_statistics_auto_start_statistics': False, 'chunk_statistics_auto_exit_timeout_hours': 0.0, 'chunk_statistics_auto_exit_target_unique_chunks': 0, 'chunk_statistics_strategy': 'memory_bloom_filter', 'enable_kv_events': False, 'use_gpu_connector_v3': False, 'pin_timeout_sec': 300, 'pin_check_interval_sec': 30} (cache_engine.py:101:lmcache.v1.cache_engine) -(EngineCore_DP0 pid=553149) [2026-01-06 23:47:11,802] LMCache INFO: LMCacheWorker is not initialized (related configs: 
enable_controller: False, role: worker, worker_id: 0, worker_ids: [0]). (cache_engine.py:143:lmcache.v1.cache_engine)
-(EngineCore_DP0 pid=553149) [2026-01-06 23:47:11,802] LMCache INFO: KV events are disabled. (cache_engine.py:172:lmcache.v1.cache_engine)
-(EngineCore_DP0 pid=553149) [2026-01-06 23:47:11,802] LMCache INFO: Initializing usage context. (usage_context.py:412:lmcache.usage_context)
-(EngineCore_DP0 pid=553149) [2026-01-06 23:47:13,025] LMCache INFO: Starting PinMonitor background thread (pin_monitor.py:156:lmcache.v1.pin_monitor)
-(EngineCore_DP0 pid=553149) [2026-01-06 23:47:13,025] LMCache INFO: PinMonitor started (pin_monitor.py:176:lmcache.v1.pin_monitor)
-(EngineCore_DP0 pid=553149) [2026-01-06 23:47:13,025] LMCache INFO: PinMonitor check: pinned_objects=0, timeout_objects=0, force_unpin_success=0 (pin_monitor.py:121:lmcache.v1.pin_monitor)
-(EngineCore_DP0 pid=553149) [2026-01-06 23:47:13,397] LMCache INFO: lmcache lookup server start on /tmp/engine_7f248456-2ccb-496b-b4c8-14275dbab1c5_service_lookup_lmcache_rpc_port_0 (lmcache_lookup_client.py:357:lmcache.v1.lookup_client.lmcache_lookup_client)
-(EngineCore_DP0 pid=553149) [2026-01-06 23:47:13,399] LMCache WARNING: Please use the latest lmcache connector, otherwise some features may not work, such as DSA (vllm_v1_adapter.py:767:lmcache.integration.vllm.vllm_v1_adapter)
-(EngineCore_DP0 pid=553149) [2026-01-06 23:47:13,399] LMCache INFO: Post initializing LMCacheEngine (cache_engine.py:221:lmcache.v1.cache_engine)
-(EngineCore_DP0 pid=553149) [2026-01-06 23:47:13,399] LMCache INFO: Initialize storage manager on rank 0, use layerwise: False,save only first rank: False (cache_engine.py:233:lmcache.v1.cache_engine)
-(EngineCore_DP0 pid=553149) [2026-01-06 23:47:13,428] LMCache INFO: Initializing LRUCachePolicy (lru.py:22:lmcache.v1.storage_backend.cache_policy.lru)
-(EngineCore_DP0 pid=553149) [2026-01-06 23:47:13,428] LMCache INFO: NUMA mapping None 
(local_cpu_backend.py:350:lmcache.v1.storage_backend.local_cpu_backend)
-(EngineCore_DP0 pid=553149) [2026-01-06 23:47:22,338] LMCache WARNING: Controller message sender is not initialized (local_cpu_backend.py:102:lmcache.v1.storage_backend.local_cpu_backend)
-(EngineCore_DP0 pid=553149) [2026-01-06 23:47:22,339] LMCache INFO: Internal API server disabled. internal_api_server_enabled=False, port_offset=1, port=7000, socket_path=None, include_index_list=None (api_server.py:50:lmcache.v1.internal_api_server.api_server)
-(EngineCore_DP0 pid=553149) [2026-01-06 23:47:22,339] LMCache INFO: LMCache initialized for role KVConnectorRole.WORKER with version 0.3.12-g78697950e, vllm version 0.13.0, lmcache cache_engine metadata: LMCacheEngineMetadata(model_name='mistralai/Mistral-7B-Instruct-v0.2', world_size=1, worker_id=0, fmt='vllm', kv_dtype=torch.bfloat16, kv_shape=(32, 2, 256, 8, 128), use_mla=False, role='worker', served_model_name='mistralai/Mistral-7B-Instruct-v0.2', chunk_size=256, kv_layer_groups_manager=KVLayerGroupsManager(kv_layer_groups=[])) (vllm_v1_adapter.py:840:lmcache.integration.vllm.vllm_v1_adapter)
-(EngineCore_DP0 pid=553149) INFO 01-06 23:47:22 [utils.py:35] Connectors do not specify a kv cache layout, defaulting to NHD.
-(EngineCore_DP0 pid=553149) Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 0%| | 0/51 [00:00= PR#20511) (token_database.py:74:lmcache.v1.token_database)
-(EngineCore_DP0 pid=553149) [2026-01-06 23:47:26,218] LMCache INFO: Using hash algorithm: builtin (token_database.py:84:lmcache.v1.token_database)
-(EngineCore_DP0 pid=553149) [2026-01-06 23:47:26,218] LMCache INFO: Internal API server disabled. 
internal_api_server_enabled=False, port_offset=0, port=6999, socket_path=None, include_index_list=None (api_server.py:50:lmcache.v1.internal_api_server.api_server)
-(EngineCore_DP0 pid=553149) [2026-01-06 23:47:26,218] LMCache INFO: LMCache initialized for role KVConnectorRole.SCHEDULER with version 0.3.12-g78697950e, vllm version 0.13.0, lmcache cache_engine metadata: None (vllm_v1_adapter.py:840:lmcache.integration.vllm.vllm_v1_adapter)
-INFO 01-06 23:47:26 [llm.py:360] Supported tasks: ['generate']
- Adding requests: 0%| | 0/492 [00:00, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::kda_attention', 'vllm::sparse_attn_indexer'], 'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_split_points': [16384], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': , 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'eliminate_noops': True, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': , 'evaluate_guards': False}, 'local_cache_dir': 
None}
-(EngineCore_DP0 pid=553748) INFO 01-06 23:47:59 [parallel_state.py:1203] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://10.10.50.98:41443 backend=nccl
-(EngineCore_DP0 pid=553748) INFO 01-06 23:47:59 [parallel_state.py:1411] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank 0
-(EngineCore_DP0 pid=553748) INFO 01-06 23:47:59 [gpu_model_runner.py:3562] Starting to load model mistralai/Mistral-7B-Instruct-v0.2...
-(EngineCore_DP0 pid=553748) /home/sped/Downloads/storage/kv_cache_benchmark/venv/lib/python3.10/site-packages/tvm_ffi/_optional_torch_c_dlpack.py:174: UserWarning: Failed to JIT torch c dlpack extension, EnvTensorAllocator will not be enabled.
-(EngineCore_DP0 pid=553748) We recommend installing via `pip install torch-c-dlpack-ext`
-(EngineCore_DP0 pid=553748) warnings.warn(
-(EngineCore_DP0 pid=553748) INFO 01-06 23:48:01 [cuda.py:351] Using FLASH_ATTN attention backend out of potential backends: ('FLASH_ATTN', 'FLASHINFER', 'TRITON_ATTN', 'FLEX_ATTENTION')
-(EngineCore_DP0 pid=553748) Loading safetensors checkpoint shards: 0% Completed | 0/3 [00:00= PR#20511) (token_database.py:74:lmcache.v1.token_database)
-(EngineCore_DP0 pid=553748) [2026-01-06 23:48:14,319] LMCache INFO: Using hash algorithm: builtin (token_database.py:84:lmcache.v1.token_database)
-(EngineCore_DP0 pid=553748) [2026-01-06 23:48:14,322] LMCache INFO: Creating LMCacheEngine with config: {'chunk_size': 256, 'local_cpu': True, 'max_local_cpu_size': 32.0, 'reserve_local_cpu_size': 0.0, 'local_disk': None, 'max_local_disk_size': 0.0, 'remote_url': None, 'remote_serde': 'naive', 'use_layerwise': False, 'save_decode_cache': False, 'pre_caching_hash_algorithm': 'builtin', 'enable_blending': False, 'blend_recompute_ratios': None, 'blend_thresholds': None, 'blend_check_layers': None, 'blend_min_tokens': 256, 'blend_special_str': ' # # ', 'enable_p2p': False, 'p2p_host': None, 'p2p_init_ports': None, 'p2p_lookup_ports': None, 
'enable_controller': False, 'lmcache_instance_id': None, 'controller_pull_url': None, 'controller_reply_url': None, 'lmcache_worker_ports': None, 'lmcache_worker_ids': None, 'lmcache_worker_heartbeat_delay_time': 10, 'lmcache_worker_heartbeat_time': None, 'enable_pd': False, 'pd_role': None, 'pd_buffer_size': None, 'pd_buffer_device': None, 'pd_peer_host': None, 'pd_peer_init_port': None, 'pd_peer_alloc_port': None, 'pd_proxy_host': None, 'pd_proxy_port': None, 'transfer_channel': None, 'nixl_backends': None, 'nixl_buffer_size': None, 'nixl_buffer_device': None, 'gds_path': None, 'cufile_buffer_size': None, 'audit_actual_remote_url': None, 'internal_api_server_host': '0.0.0.0', 'extra_config': None, 'save_unfull_chunk': False, 'blocking_timeout_secs': 10, 'external_lookup_client': None, 'py_enable_gc': True, 'cache_policy': 'LRU', 'numa_mode': None, 'enable_async_loading': False, 'internal_api_server_enabled': False, 'internal_api_server_port_start': 6999, 'priority_limit': None, 'internal_api_server_include_index_list': None, 'internal_api_server_socket_path_prefix': None, 'runtime_plugin_locations': None, 'storage_plugins': None, 'lookup_timeout_ms': 3000, 'hit_miss_ratio': None, 'lookup_server_worker_ids': None, 'enable_scheduler_bypass_lookup': False, 'script_allowed_imports': None, 'enable_lazy_memory_allocator': False, 'lazy_memory_initial_ratio': 0.2, 'lazy_memory_expand_trigger_ratio': 0.5, 'lazy_memory_step_ratio': 0.1, 'lazy_memory_safe_size': 0.0, 'enable_chunk_statistics': False, 'chunk_statistics_auto_start_statistics': False, 'chunk_statistics_auto_exit_timeout_hours': 0.0, 'chunk_statistics_auto_exit_target_unique_chunks': 0, 'chunk_statistics_strategy': 'memory_bloom_filter', 'enable_kv_events': False, 'use_gpu_connector_v3': False, 'pin_timeout_sec': 300, 'pin_check_interval_sec': 30} (cache_engine.py:101:lmcache.v1.cache_engine)
-(EngineCore_DP0 pid=553748) [2026-01-06 23:48:14,322] LMCache INFO: LMCacheWorker is not initialized (related configs: 
enable_controller: False, role: worker, worker_id: 0, worker_ids: [0]). (cache_engine.py:143:lmcache.v1.cache_engine)
-(EngineCore_DP0 pid=553748) [2026-01-06 23:48:14,322] LMCache INFO: KV events are disabled. (cache_engine.py:172:lmcache.v1.cache_engine)
-(EngineCore_DP0 pid=553748) [2026-01-06 23:48:14,322] LMCache INFO: Initializing usage context. (usage_context.py:412:lmcache.usage_context)
-(EngineCore_DP0 pid=553748) [2026-01-06 23:48:15,549] LMCache INFO: Starting PinMonitor background thread (pin_monitor.py:156:lmcache.v1.pin_monitor)
-(EngineCore_DP0 pid=553748) [2026-01-06 23:48:15,549] LMCache INFO: PinMonitor started (pin_monitor.py:176:lmcache.v1.pin_monitor)
-(EngineCore_DP0 pid=553748) [2026-01-06 23:48:15,549] LMCache INFO: PinMonitor check: pinned_objects=0, timeout_objects=0, force_unpin_success=0 (pin_monitor.py:121:lmcache.v1.pin_monitor)
-(EngineCore_DP0 pid=553748) [2026-01-06 23:48:15,930] LMCache INFO: lmcache lookup server start on /tmp/engine_ee5da427-e9f9-4c15-8d53-6bd05e463996_service_lookup_lmcache_rpc_port_0 (lmcache_lookup_client.py:357:lmcache.v1.lookup_client.lmcache_lookup_client)
-(EngineCore_DP0 pid=553748) [2026-01-06 23:48:15,931] LMCache WARNING: Please use the latest lmcache connector, otherwise some features may not work, such as DSA (vllm_v1_adapter.py:767:lmcache.integration.vllm.vllm_v1_adapter)
-(EngineCore_DP0 pid=553748) [2026-01-06 23:48:15,931] LMCache INFO: Post initializing LMCacheEngine (cache_engine.py:221:lmcache.v1.cache_engine)
-(EngineCore_DP0 pid=553748) [2026-01-06 23:48:15,932] LMCache INFO: Initialize storage manager on rank 0, use layerwise: False,save only first rank: False (cache_engine.py:233:lmcache.v1.cache_engine)
-(EngineCore_DP0 pid=553748) [2026-01-06 23:48:15,960] LMCache INFO: Initializing LRUCachePolicy (lru.py:22:lmcache.v1.storage_backend.cache_policy.lru)
-(EngineCore_DP0 pid=553748) [2026-01-06 23:48:15,960] LMCache INFO: NUMA mapping None 
(local_cpu_backend.py:350:lmcache.v1.storage_backend.local_cpu_backend)
-(EngineCore_DP0 pid=553748) [2026-01-06 23:48:24,877] LMCache WARNING: Controller message sender is not initialized (local_cpu_backend.py:102:lmcache.v1.storage_backend.local_cpu_backend)
-(EngineCore_DP0 pid=553748) [2026-01-06 23:48:24,878] LMCache INFO: Internal API server disabled. internal_api_server_enabled=False, port_offset=1, port=7000, socket_path=None, include_index_list=None (api_server.py:50:lmcache.v1.internal_api_server.api_server)
-(EngineCore_DP0 pid=553748) [2026-01-06 23:48:24,878] LMCache INFO: LMCache initialized for role KVConnectorRole.WORKER with version 0.3.12-g78697950e, vllm version 0.13.0, lmcache cache_engine metadata: LMCacheEngineMetadata(model_name='mistralai/Mistral-7B-Instruct-v0.2', world_size=1, worker_id=0, fmt='vllm', kv_dtype=torch.bfloat16, kv_shape=(32, 2, 256, 8, 128), use_mla=False, role='worker', served_model_name='mistralai/Mistral-7B-Instruct-v0.2', chunk_size=256, kv_layer_groups_manager=KVLayerGroupsManager(kv_layer_groups=[])) (vllm_v1_adapter.py:840:lmcache.integration.vllm.vllm_v1_adapter)
-(EngineCore_DP0 pid=553748) INFO 01-06 23:48:24 [utils.py:35] Connectors do not specify a kv cache layout, defaulting to NHD.
-(EngineCore_DP0 pid=553748) Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 0%| | 0/51 [00:00= PR#20511) (token_database.py:74:lmcache.v1.token_database)
-(EngineCore_DP0 pid=553748) [2026-01-06 23:48:28,628] LMCache INFO: Using hash algorithm: builtin (token_database.py:84:lmcache.v1.token_database)
-(EngineCore_DP0 pid=553748) [2026-01-06 23:48:28,628] LMCache INFO: Internal API server disabled. 
internal_api_server_enabled=False, port_offset=0, port=6999, socket_path=None, include_index_list=None (api_server.py:50:lmcache.v1.internal_api_server.api_server)
-(EngineCore_DP0 pid=553748) [2026-01-06 23:48:28,628] LMCache INFO: LMCache initialized for role KVConnectorRole.SCHEDULER with version 0.3.12-g78697950e, vllm version 0.13.0, lmcache cache_engine metadata: None (vllm_v1_adapter.py:840:lmcache.integration.vllm.vllm_v1_adapter)
-INFO 01-06 23:48:29 [llm.py:360] Supported tasks: ['generate']
- Adding requests: 0%| | 0/492 [00:00, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::kda_attention', 'vllm::sparse_attn_indexer'], 'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_split_points': [16384], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': , 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'eliminate_noops': True, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': , 'evaluate_guards': False}, 'local_cache_dir': 
None}
-(EngineCore_DP0 pid=550925) INFO 01-06 23:43:18 [parallel_state.py:1203] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://10.10.50.98:49537 backend=nccl
-(EngineCore_DP0 pid=550925) INFO 01-06 23:43:18 [parallel_state.py:1411] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank 0
-(EngineCore_DP0 pid=550925) INFO 01-06 23:43:19 [gpu_model_runner.py:3562] Starting to load model mistralai/Mistral-7B-Instruct-v0.2...
-(EngineCore_DP0 pid=550925) /home/sped/Downloads/storage/kv_cache_benchmark/venv/lib/python3.10/site-packages/tvm_ffi/_optional_torch_c_dlpack.py:174: UserWarning: Failed to JIT torch c dlpack extension, EnvTensorAllocator will not be enabled.
-(EngineCore_DP0 pid=550925) We recommend installing via `pip install torch-c-dlpack-ext`
-(EngineCore_DP0 pid=550925) warnings.warn(
-(EngineCore_DP0 pid=550925) INFO 01-06 23:43:21 [cuda.py:351] Using FLASH_ATTN attention backend out of potential backends: ('FLASH_ATTN', 'FLASHINFER', 'TRITON_ATTN', 'FLEX_ATTENTION')
-(EngineCore_DP0 pid=550925) Loading safetensors checkpoint shards: 0% Completed | 0/3 [00:00= PR#20511) (token_database.py:74:lmcache.v1.token_database)
-(EngineCore_DP0 pid=550925) [2026-01-06 23:43:33,756] LMCache INFO: Using hash algorithm: builtin (token_database.py:84:lmcache.v1.token_database)
-(EngineCore_DP0 pid=550925) [2026-01-06 23:43:33,757] LMCache INFO: Creating LMCacheEngine with config: {'chunk_size': 256, 'local_cpu': False, 'max_local_cpu_size': 5.0, 'reserve_local_cpu_size': 0.0, 'local_disk': None, 'max_local_disk_size': 0.0, 'remote_url': None, 'remote_serde': 'naive', 'use_layerwise': False, 'save_decode_cache': False, 'pre_caching_hash_algorithm': 'builtin', 'enable_blending': False, 'blend_recompute_ratios': None, 'blend_thresholds': None, 'blend_check_layers': None, 'blend_min_tokens': 256, 'blend_special_str': ' # # ', 'enable_p2p': False, 'p2p_host': None, 'p2p_init_ports': None, 'p2p_lookup_ports': None, 
'enable_controller': False, 'lmcache_instance_id': None, 'controller_pull_url': None, 'controller_reply_url': None, 'lmcache_worker_ports': None, 'lmcache_worker_ids': None, 'lmcache_worker_heartbeat_delay_time': 10, 'lmcache_worker_heartbeat_time': None, 'enable_pd': False, 'pd_role': None, 'pd_buffer_size': None, 'pd_buffer_device': None, 'pd_peer_host': None, 'pd_peer_init_port': None, 'pd_peer_alloc_port': None, 'pd_proxy_host': None, 'pd_proxy_port': None, 'transfer_channel': None, 'nixl_backends': None, 'nixl_buffer_size': None, 'nixl_buffer_device': None, 'gds_path': None, 'cufile_buffer_size': None, 'audit_actual_remote_url': None, 'internal_api_server_host': '0.0.0.0', 'extra_config': None, 'save_unfull_chunk': False, 'blocking_timeout_secs': 10, 'external_lookup_client': None, 'py_enable_gc': True, 'cache_policy': 'LRU', 'numa_mode': None, 'enable_async_loading': False, 'internal_api_server_enabled': False, 'internal_api_server_port_start': 6999, 'priority_limit': None, 'internal_api_server_include_index_list': None, 'internal_api_server_socket_path_prefix': None, 'runtime_plugin_locations': None, 'storage_plugins': None, 'lookup_timeout_ms': 3000, 'hit_miss_ratio': None, 'lookup_server_worker_ids': None, 'enable_scheduler_bypass_lookup': False, 'script_allowed_imports': None, 'enable_lazy_memory_allocator': False, 'lazy_memory_initial_ratio': 0.2, 'lazy_memory_expand_trigger_ratio': 0.5, 'lazy_memory_step_ratio': 0.1, 'lazy_memory_safe_size': 0.0, 'enable_chunk_statistics': False, 'chunk_statistics_auto_start_statistics': False, 'chunk_statistics_auto_exit_timeout_hours': 0.0, 'chunk_statistics_auto_exit_target_unique_chunks': 0, 'chunk_statistics_strategy': 'memory_bloom_filter', 'enable_kv_events': False, 'use_gpu_connector_v3': False, 'pin_timeout_sec': 300, 'pin_check_interval_sec': 30} (cache_engine.py:101:lmcache.v1.cache_engine)
-(EngineCore_DP0 pid=550925) [2026-01-06 23:43:33,758] LMCache INFO: LMCacheWorker is not initialized (related configs: 
enable_controller: False, role: worker, worker_id: 0, worker_ids: [0]). (cache_engine.py:143:lmcache.v1.cache_engine)
-(EngineCore_DP0 pid=550925) [2026-01-06 23:43:33,758] LMCache INFO: KV events are disabled. (cache_engine.py:172:lmcache.v1.cache_engine)
-(EngineCore_DP0 pid=550925) [2026-01-06 23:43:33,758] LMCache INFO: Initializing usage context. (usage_context.py:412:lmcache.usage_context)
-(EngineCore_DP0 pid=550925) [2026-01-06 23:43:34,988] LMCache INFO: Starting PinMonitor background thread (pin_monitor.py:156:lmcache.v1.pin_monitor)
-(EngineCore_DP0 pid=550925) [2026-01-06 23:43:34,989] LMCache INFO: PinMonitor started (pin_monitor.py:176:lmcache.v1.pin_monitor)
-(EngineCore_DP0 pid=550925) [2026-01-06 23:43:34,989] LMCache INFO: PinMonitor check: pinned_objects=0, timeout_objects=0, force_unpin_success=0 (pin_monitor.py:121:lmcache.v1.pin_monitor)
-(EngineCore_DP0 pid=550925) [2026-01-06 23:43:35,421] LMCache INFO: lmcache lookup server start on /tmp/engine_8d26dd62-bf36-4d46-ad83-f02908fa3dc3_service_lookup_lmcache_rpc_port_0 (lmcache_lookup_client.py:357:lmcache.v1.lookup_client.lmcache_lookup_client)
-(EngineCore_DP0 pid=550925) [2026-01-06 23:43:35,422] LMCache WARNING: Please use the latest lmcache connector, otherwise some features may not work, such as DSA (vllm_v1_adapter.py:767:lmcache.integration.vllm.vllm_v1_adapter)
-(EngineCore_DP0 pid=550925) [2026-01-06 23:43:35,422] LMCache INFO: Post initializing LMCacheEngine (cache_engine.py:221:lmcache.v1.cache_engine)
-(EngineCore_DP0 pid=550925) [2026-01-06 23:43:35,422] LMCache INFO: Initialize storage manager on rank 0, use layerwise: False,save only first rank: False (cache_engine.py:233:lmcache.v1.cache_engine)
-(EngineCore_DP0 pid=550925) [2026-01-06 23:43:35,456] LMCache INFO: Initializing LRUCachePolicy (lru.py:22:lmcache.v1.storage_backend.cache_policy.lru)
-(EngineCore_DP0 pid=550925) [2026-01-06 23:43:35,456] LMCache INFO: NUMA mapping None 
(local_cpu_backend.py:350:lmcache.v1.storage_backend.local_cpu_backend)
-(EngineCore_DP0 pid=550925) [2026-01-06 23:43:36,872] LMCache WARNING: Controller message sender is not initialized (local_cpu_backend.py:102:lmcache.v1.storage_backend.local_cpu_backend)
-(EngineCore_DP0 pid=550925) [2026-01-06 23:43:36,873] LMCache INFO: Internal API server disabled. internal_api_server_enabled=False, port_offset=1, port=7000, socket_path=None, include_index_list=None (api_server.py:50:lmcache.v1.internal_api_server.api_server)
-(EngineCore_DP0 pid=550925) [2026-01-06 23:43:36,873] LMCache INFO: LMCache initialized for role KVConnectorRole.WORKER with version 0.3.12-g78697950e, vllm version 0.13.0, lmcache cache_engine metadata: LMCacheEngineMetadata(model_name='mistralai/Mistral-7B-Instruct-v0.2', world_size=1, worker_id=0, fmt='vllm', kv_dtype=torch.bfloat16, kv_shape=(32, 2, 256, 8, 128), use_mla=False, role='worker', served_model_name='mistralai/Mistral-7B-Instruct-v0.2', chunk_size=256, kv_layer_groups_manager=KVLayerGroupsManager(kv_layer_groups=[])) (vllm_v1_adapter.py:840:lmcache.integration.vllm.vllm_v1_adapter)
-(EngineCore_DP0 pid=550925) INFO 01-06 23:43:36 [utils.py:35] Connectors do not specify a kv cache layout, defaulting to NHD.
-(EngineCore_DP0 pid=550925) Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 0%| | 0/51 [00:00= PR#20511) (token_database.py:74:lmcache.v1.token_database)
-(EngineCore_DP0 pid=550925) [2026-01-06 23:43:40,584] LMCache INFO: Using hash algorithm: builtin (token_database.py:84:lmcache.v1.token_database)
-(EngineCore_DP0 pid=550925) [2026-01-06 23:43:40,585] LMCache INFO: Internal API server disabled. 
internal_api_server_enabled=False, port_offset=0, port=6999, socket_path=None, include_index_list=None (api_server.py:50:lmcache.v1.internal_api_server.api_server)
-(EngineCore_DP0 pid=550925) [2026-01-06 23:43:40,585] LMCache INFO: LMCache initialized for role KVConnectorRole.SCHEDULER with version 0.3.12-g78697950e, vllm version 0.13.0, lmcache cache_engine metadata: None (vllm_v1_adapter.py:840:lmcache.integration.vllm.vllm_v1_adapter)
-INFO 01-06 23:43:41 [llm.py:360] Supported tasks: ['generate']
- Adding requests: 0%| | 0/492 [00:00, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::kda_attention', 'vllm::sparse_attn_indexer'], 'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_split_points': [16384], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': , 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'eliminate_noops': True, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': , 'evaluate_guards': False}, 'local_cache_dir': 
None}
-(EngineCore_DP0 pid=551467) INFO 01-06 23:44:10 [parallel_state.py:1203] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://10.10.50.98:53245 backend=nccl
-(EngineCore_DP0 pid=551467) INFO 01-06 23:44:10 [parallel_state.py:1411] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank 0
-(EngineCore_DP0 pid=551467) INFO 01-06 23:44:10 [gpu_model_runner.py:3562] Starting to load model mistralai/Mistral-7B-Instruct-v0.2...
-(EngineCore_DP0 pid=551467) /home/sped/Downloads/storage/kv_cache_benchmark/venv/lib/python3.10/site-packages/tvm_ffi/_optional_torch_c_dlpack.py:174: UserWarning: Failed to JIT torch c dlpack extension, EnvTensorAllocator will not be enabled.
-(EngineCore_DP0 pid=551467) We recommend installing via `pip install torch-c-dlpack-ext`
-(EngineCore_DP0 pid=551467) warnings.warn(
-(EngineCore_DP0 pid=551467) INFO 01-06 23:44:12 [cuda.py:351] Using FLASH_ATTN attention backend out of potential backends: ('FLASH_ATTN', 'FLASHINFER', 'TRITON_ATTN', 'FLEX_ATTENTION')
-(EngineCore_DP0 pid=551467) Loading safetensors checkpoint shards: 0% Completed | 0/3 [00:00= PR#20511) (token_database.py:74:lmcache.v1.token_database)
-(EngineCore_DP0 pid=551467) [2026-01-06 23:44:25,115] LMCache INFO: Using hash algorithm: builtin (token_database.py:84:lmcache.v1.token_database)
-(EngineCore_DP0 pid=551467) [2026-01-06 23:44:25,117] LMCache INFO: Creating LMCacheEngine with config: {'chunk_size': 256, 'local_cpu': False, 'max_local_cpu_size': 5.0, 'reserve_local_cpu_size': 0.0, 'local_disk': None, 'max_local_disk_size': 0.0, 'remote_url': None, 'remote_serde': 'naive', 'use_layerwise': False, 'save_decode_cache': False, 'pre_caching_hash_algorithm': 'builtin', 'enable_blending': False, 'blend_recompute_ratios': None, 'blend_thresholds': None, 'blend_check_layers': None, 'blend_min_tokens': 256, 'blend_special_str': ' # # ', 'enable_p2p': False, 'p2p_host': None, 'p2p_init_ports': None, 'p2p_lookup_ports': None, 
'enable_controller': False, 'lmcache_instance_id': None, 'controller_pull_url': None, 'controller_reply_url': None, 'lmcache_worker_ports': None, 'lmcache_worker_ids': None, 'lmcache_worker_heartbeat_delay_time': 10, 'lmcache_worker_heartbeat_time': None, 'enable_pd': False, 'pd_role': None, 'pd_buffer_size': None, 'pd_buffer_device': None, 'pd_peer_host': None, 'pd_peer_init_port': None, 'pd_peer_alloc_port': None, 'pd_proxy_host': None, 'pd_proxy_port': None, 'transfer_channel': None, 'nixl_backends': None, 'nixl_buffer_size': None, 'nixl_buffer_device': None, 'gds_path': None, 'cufile_buffer_size': None, 'audit_actual_remote_url': None, 'internal_api_server_host': '0.0.0.0', 'extra_config': None, 'save_unfull_chunk': False, 'blocking_timeout_secs': 10, 'external_lookup_client': None, 'py_enable_gc': True, 'cache_policy': 'LRU', 'numa_mode': None, 'enable_async_loading': False, 'internal_api_server_enabled': False, 'internal_api_server_port_start': 6999, 'priority_limit': None, 'internal_api_server_include_index_list': None, 'internal_api_server_socket_path_prefix': None, 'runtime_plugin_locations': None, 'storage_plugins': None, 'lookup_timeout_ms': 3000, 'hit_miss_ratio': None, 'lookup_server_worker_ids': None, 'enable_scheduler_bypass_lookup': False, 'script_allowed_imports': None, 'enable_lazy_memory_allocator': False, 'lazy_memory_initial_ratio': 0.2, 'lazy_memory_expand_trigger_ratio': 0.5, 'lazy_memory_step_ratio': 0.1, 'lazy_memory_safe_size': 0.0, 'enable_chunk_statistics': False, 'chunk_statistics_auto_start_statistics': False, 'chunk_statistics_auto_exit_timeout_hours': 0.0, 'chunk_statistics_auto_exit_target_unique_chunks': 0, 'chunk_statistics_strategy': 'memory_bloom_filter', 'enable_kv_events': False, 'use_gpu_connector_v3': False, 'pin_timeout_sec': 300, 'pin_check_interval_sec': 30} (cache_engine.py:101:lmcache.v1.cache_engine)
-(EngineCore_DP0 pid=551467) [2026-01-06 23:44:25,118] LMCache INFO: LMCacheWorker is not initialized (related configs: 
enable_controller: False, role: worker, worker_id: 0, worker_ids: [0]). (cache_engine.py:143:lmcache.v1.cache_engine)
-(EngineCore_DP0 pid=551467) [2026-01-06 23:44:25,118] LMCache INFO: KV events are disabled. (cache_engine.py:172:lmcache.v1.cache_engine)
-(EngineCore_DP0 pid=551467) [2026-01-06 23:44:25,118] LMCache INFO: Initializing usage context. (usage_context.py:412:lmcache.usage_context)
-(EngineCore_DP0 pid=551467) [2026-01-06 23:44:26,350] LMCache INFO: Starting PinMonitor background thread (pin_monitor.py:156:lmcache.v1.pin_monitor)
-(EngineCore_DP0 pid=551467) [2026-01-06 23:44:26,350] LMCache INFO: PinMonitor started (pin_monitor.py:176:lmcache.v1.pin_monitor)
-(EngineCore_DP0 pid=551467) [2026-01-06 23:44:26,726] LMCache INFO: lmcache lookup server start on /tmp/engine_aa903ed9-c948-4010-830f-b1ae3e915ff2_service_lookup_lmcache_rpc_port_0 (lmcache_lookup_client.py:357:lmcache.v1.lookup_client.lmcache_lookup_client)
-(EngineCore_DP0 pid=551467) [2026-01-06 23:44:26,350] LMCache INFO: PinMonitor check: pinned_objects=0, timeout_objects=0, force_unpin_success=0 (pin_monitor.py:121:lmcache.v1.pin_monitor)
-(EngineCore_DP0 pid=551467) [2026-01-06 23:44:26,727] LMCache WARNING: Please use the latest lmcache connector, otherwise some features may not work, such as DSA (vllm_v1_adapter.py:767:lmcache.integration.vllm.vllm_v1_adapter)
-(EngineCore_DP0 pid=551467) [2026-01-06 23:44:26,728] LMCache INFO: Post initializing LMCacheEngine (cache_engine.py:221:lmcache.v1.cache_engine)
-(EngineCore_DP0 pid=551467) [2026-01-06 23:44:26,728] LMCache INFO: Initialize storage manager on rank 0, use layerwise: False,save only first rank: False (cache_engine.py:233:lmcache.v1.cache_engine)
-(EngineCore_DP0 pid=551467) [2026-01-06 23:44:26,752] LMCache INFO: Initializing LRUCachePolicy (lru.py:22:lmcache.v1.storage_backend.cache_policy.lru)
-(EngineCore_DP0 pid=551467) [2026-01-06 23:44:26,752] LMCache INFO: NUMA mapping None 
(local_cpu_backend.py:350:lmcache.v1.storage_backend.local_cpu_backend)
-(EngineCore_DP0 pid=551467) [2026-01-06 23:44:28,157] LMCache WARNING: Controller message sender is not initialized (local_cpu_backend.py:102:lmcache.v1.storage_backend.local_cpu_backend)
-(EngineCore_DP0 pid=551467) [2026-01-06 23:44:28,157] LMCache INFO: Internal API server disabled. internal_api_server_enabled=False, port_offset=1, port=7000, socket_path=None, include_index_list=None (api_server.py:50:lmcache.v1.internal_api_server.api_server)
-(EngineCore_DP0 pid=551467) [2026-01-06 23:44:28,157] LMCache INFO: LMCache initialized for role KVConnectorRole.WORKER with version 0.3.12-g78697950e, vllm version 0.13.0, lmcache cache_engine metadata: LMCacheEngineMetadata(model_name='mistralai/Mistral-7B-Instruct-v0.2', world_size=1, worker_id=0, fmt='vllm', kv_dtype=torch.bfloat16, kv_shape=(32, 2, 256, 8, 128), use_mla=False, role='worker', served_model_name='mistralai/Mistral-7B-Instruct-v0.2', chunk_size=256, kv_layer_groups_manager=KVLayerGroupsManager(kv_layer_groups=[])) (vllm_v1_adapter.py:840:lmcache.integration.vllm.vllm_v1_adapter)
-(EngineCore_DP0 pid=551467) INFO 01-06 23:44:28 [utils.py:35] Connectors do not specify a kv cache layout, defaulting to NHD.
-(EngineCore_DP0 pid=551467) Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 0%| | 0/51 [00:00= PR#20511) (token_database.py:74:lmcache.v1.token_database)
-(EngineCore_DP0 pid=551467) [2026-01-06 23:44:31,794] LMCache INFO: Using hash algorithm: builtin (token_database.py:84:lmcache.v1.token_database)
-(EngineCore_DP0 pid=551467) [2026-01-06 23:44:31,794] LMCache INFO: Internal API server disabled. 
internal_api_server_enabled=False, port_offset=0, port=6999, socket_path=None, include_index_list=None (api_server.py:50:lmcache.v1.internal_api_server.api_server) -(EngineCore_DP0 pid=551467) [2026-01-06 23:44:31,794] LMCache INFO: LMCache initialized for role KVConnectorRole.SCHEDULER with version 0.3.12-g78697950e, vllm version 0.13.0, lmcache cache_engine metadata: None (vllm_v1_adapter.py:840:lmcache.integration.vllm.vllm_v1_adapter) -INFO 01-06 23:44:32 [llm.py:360] Supported tasks: ['generate'] - Adding requests: 0%| | 0/492 [00:00, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::kda_attention', 'vllm::sparse_attn_indexer'], 'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_split_points': [16384], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': , 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'eliminate_noops': True, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': , 'evaluate_guards': False}, 'local_cache_dir': 
None} -(EngineCore_DP0 pid=552016) INFO 01-06 23:45:01 [parallel_state.py:1203] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://10.10.50.98:42505 backend=nccl -(EngineCore_DP0 pid=552016) INFO 01-06 23:45:01 [parallel_state.py:1411] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank 0 -(EngineCore_DP0 pid=552016) INFO 01-06 23:45:02 [gpu_model_runner.py:3562] Starting to load model mistralai/Mistral-7B-Instruct-v0.2... -(EngineCore_DP0 pid=552016) /home/sped/Downloads/storage/kv_cache_benchmark/venv/lib/python3.10/site-packages/tvm_ffi/_optional_torch_c_dlpack.py:174: UserWarning: Failed to JIT torch c dlpack extension, EnvTensorAllocator will not be enabled. -(EngineCore_DP0 pid=552016) We recommend installing via `pip install torch-c-dlpack-ext` -(EngineCore_DP0 pid=552016) warnings.warn( -(EngineCore_DP0 pid=552016) INFO 01-06 23:45:04 [cuda.py:351] Using FLASH_ATTN attention backend out of potential backends: ('FLASH_ATTN', 'FLASHINFER', 'TRITON_ATTN', 'FLEX_ATTENTION') -(EngineCore_DP0 pid=552016) Loading safetensors checkpoint shards: 0% Completed | 0/3 [00:00= PR#20511) (token_database.py:74:lmcache.v1.token_database) -(EngineCore_DP0 pid=552016) [2026-01-06 23:45:17,095] LMCache INFO: Using hash algorithm: builtin (token_database.py:84:lmcache.v1.token_database) -(EngineCore_DP0 pid=552016) [2026-01-06 23:45:17,097] LMCache INFO: Creating LMCacheEngine with config: {'chunk_size': 256, 'local_cpu': False, 'max_local_cpu_size': 5.0, 'reserve_local_cpu_size': 0.0, 'local_disk': None, 'max_local_disk_size': 0.0, 'remote_url': None, 'remote_serde': 'naive', 'use_layerwise': False, 'save_decode_cache': False, 'pre_caching_hash_algorithm': 'builtin', 'enable_blending': False, 'blend_recompute_ratios': None, 'blend_thresholds': None, 'blend_check_layers': None, 'blend_min_tokens': 256, 'blend_special_str': ' # # ', 'enable_p2p': False, 'p2p_host': None, 'p2p_init_ports': None, 'p2p_lookup_ports': None, 
'enable_controller': False, 'lmcache_instance_id': None, 'controller_pull_url': None, 'controller_reply_url': None, 'lmcache_worker_ports': None, 'lmcache_worker_ids': None, 'lmcache_worker_heartbeat_delay_time': 10, 'lmcache_worker_heartbeat_time': None, 'enable_pd': False, 'pd_role': None, 'pd_buffer_size': None, 'pd_buffer_device': None, 'pd_peer_host': None, 'pd_peer_init_port': None, 'pd_peer_alloc_port': None, 'pd_proxy_host': None, 'pd_proxy_port': None, 'transfer_channel': None, 'nixl_backends': None, 'nixl_buffer_size': None, 'nixl_buffer_device': None, 'gds_path': None, 'cufile_buffer_size': None, 'audit_actual_remote_url': None, 'internal_api_server_host': '0.0.0.0', 'extra_config': None, 'save_unfull_chunk': False, 'blocking_timeout_secs': 10, 'external_lookup_client': None, 'py_enable_gc': True, 'cache_policy': 'LRU', 'numa_mode': None, 'enable_async_loading': False, 'internal_api_server_enabled': False, 'internal_api_server_port_start': 6999, 'priority_limit': None, 'internal_api_server_include_index_list': None, 'internal_api_server_socket_path_prefix': None, 'runtime_plugin_locations': None, 'storage_plugins': None, 'lookup_timeout_ms': 3000, 'hit_miss_ratio': None, 'lookup_server_worker_ids': None, 'enable_scheduler_bypass_lookup': False, 'script_allowed_imports': None, 'enable_lazy_memory_allocator': False, 'lazy_memory_initial_ratio': 0.2, 'lazy_memory_expand_trigger_ratio': 0.5, 'lazy_memory_step_ratio': 0.1, 'lazy_memory_safe_size': 0.0, 'enable_chunk_statistics': False, 'chunk_statistics_auto_start_statistics': False, 'chunk_statistics_auto_exit_timeout_hours': 0.0, 'chunk_statistics_auto_exit_target_unique_chunks': 0, 'chunk_statistics_strategy': 'memory_bloom_filter', 'enable_kv_events': False, 'use_gpu_connector_v3': False, 'pin_timeout_sec': 300, 'pin_check_interval_sec': 30} (cache_engine.py:101:lmcache.v1.cache_engine) -(EngineCore_DP0 pid=552016) [2026-01-06 23:45:17,098] LMCache INFO: LMCacheWorker is not initialized (related configs: 
enable_controller: False, role: worker, worker_id: 0, worker_ids: [0]). (cache_engine.py:143:lmcache.v1.cache_engine) -(EngineCore_DP0 pid=552016) [2026-01-06 23:45:17,098] LMCache INFO: KV events are disabled. (cache_engine.py:172:lmcache.v1.cache_engine) -(EngineCore_DP0 pid=552016) [2026-01-06 23:45:17,098] LMCache INFO: Initializing usage context. (usage_context.py:412:lmcache.usage_context) -(EngineCore_DP0 pid=552016) [2026-01-06 23:45:18,325] LMCache INFO: Starting PinMonitor background thread (pin_monitor.py:156:lmcache.v1.pin_monitor) -(EngineCore_DP0 pid=552016) [2026-01-06 23:45:18,325] LMCache INFO: PinMonitor started (pin_monitor.py:176:lmcache.v1.pin_monitor) -(EngineCore_DP0 pid=552016) [2026-01-06 23:45:18,699] LMCache INFO: lmcache lookup server start on /tmp/engine_62d1e10b-1c92-43b0-9197-e042bcd8865d_service_lookup_lmcache_rpc_port_0 (lmcache_lookup_client.py:357:lmcache.v1.lookup_client.lmcache_lookup_client) -(EngineCore_DP0 pid=552016) [2026-01-06 23:45:18,325] LMCache INFO: PinMonitor check: pinned_objects=0, timeout_objects=0, force_unpin_success=0 (pin_monitor.py:121:lmcache.v1.pin_monitor) -(EngineCore_DP0 pid=552016) [2026-01-06 23:45:18,701] LMCache WARNING: Please use the latest lmcache connector, otherwise some features may not work, such as DSA (vllm_v1_adapter.py:767:lmcache.integration.vllm.vllm_v1_adapter) -(EngineCore_DP0 pid=552016) [2026-01-06 23:45:18,701] LMCache INFO: Post initializing LMCacheEngine (cache_engine.py:221:lmcache.v1.cache_engine) -(EngineCore_DP0 pid=552016) [2026-01-06 23:45:18,701] LMCache INFO: Initialize storage manager on rank 0, use layerwise: False,save only first rank: False (cache_engine.py:233:lmcache.v1.cache_engine) -(EngineCore_DP0 pid=552016) [2026-01-06 23:45:18,720] LMCache INFO: Initializing LRUCachePolicy (lru.py:22:lmcache.v1.storage_backend.cache_policy.lru) -(EngineCore_DP0 pid=552016) [2026-01-06 23:45:18,720] LMCache INFO: NUMA mapping None 
(local_cpu_backend.py:350:lmcache.v1.storage_backend.local_cpu_backend) -(EngineCore_DP0 pid=552016) [2026-01-06 23:45:20,130] LMCache WARNING: Controller message sender is not initialized (local_cpu_backend.py:102:lmcache.v1.storage_backend.local_cpu_backend) -(EngineCore_DP0 pid=552016) [2026-01-06 23:45:20,130] LMCache INFO: Internal API server disabled. internal_api_server_enabled=False, port_offset=1, port=7000, socket_path=None, include_index_list=None (api_server.py:50:lmcache.v1.internal_api_server.api_server) -(EngineCore_DP0 pid=552016) [2026-01-06 23:45:20,131] LMCache INFO: LMCache initialized for role KVConnectorRole.WORKER with version 0.3.12-g78697950e, vllm version 0.13.0, lmcache cache_engine metadata: LMCacheEngineMetadata(model_name='mistralai/Mistral-7B-Instruct-v0.2', world_size=1, worker_id=0, fmt='vllm', kv_dtype=torch.bfloat16, kv_shape=(32, 2, 256, 8, 128), use_mla=False, role='worker', served_model_name='mistralai/Mistral-7B-Instruct-v0.2', chunk_size=256, kv_layer_groups_manager=KVLayerGroupsManager(kv_layer_groups=[])) (vllm_v1_adapter.py:840:lmcache.integration.vllm.vllm_v1_adapter) -(EngineCore_DP0 pid=552016) INFO 01-06 23:45:20 [utils.py:35] Connectors do not specify a kv cache layout, defaulting to NHD. -(EngineCore_DP0 pid=552016) Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 0%| | 0/51 [00:00= PR#20511) (token_database.py:74:lmcache.v1.token_database) -(EngineCore_DP0 pid=552016) [2026-01-06 23:45:23,833] LMCache INFO: Using hash algorithm: builtin (token_database.py:84:lmcache.v1.token_database) -(EngineCore_DP0 pid=552016) [2026-01-06 23:45:23,833] LMCache INFO: Internal API server disabled. 
internal_api_server_enabled=False, port_offset=0, port=6999, socket_path=None, include_index_list=None (api_server.py:50:lmcache.v1.internal_api_server.api_server) -(EngineCore_DP0 pid=552016) [2026-01-06 23:45:23,833] LMCache INFO: LMCache initialized for role KVConnectorRole.SCHEDULER with version 0.3.12-g78697950e, vllm version 0.13.0, lmcache cache_engine metadata: None (vllm_v1_adapter.py:840:lmcache.integration.vllm.vllm_v1_adapter) -INFO 01-06 23:45:24 [llm.py:360] Supported tasks: ['generate'] - Adding requests: 0%| | 0/492 [00:00, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::kda_attention', 'vllm::sparse_attn_indexer'], 'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_split_points': [16384], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': , 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'eliminate_noops': True, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': , 'evaluate_guards': False}, 'local_cache_dir': 
None} -(EngineCore_DP0 pid=548986) INFO 01-06 23:40:28 [parallel_state.py:1203] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://10.10.50.98:55527 backend=nccl -(EngineCore_DP0 pid=548986) INFO 01-06 23:40:28 [parallel_state.py:1411] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank 0 -(EngineCore_DP0 pid=548986) INFO 01-06 23:40:29 [gpu_model_runner.py:3562] Starting to load model mistralai/Mistral-7B-Instruct-v0.2... -(EngineCore_DP0 pid=548986) /home/sped/Downloads/storage/kv_cache_benchmark/venv/lib/python3.10/site-packages/tvm_ffi/_optional_torch_c_dlpack.py:174: UserWarning: Failed to JIT torch c dlpack extension, EnvTensorAllocator will not be enabled. -(EngineCore_DP0 pid=548986) We recommend installing via `pip install torch-c-dlpack-ext` -(EngineCore_DP0 pid=548986) warnings.warn( -(EngineCore_DP0 pid=548986) INFO 01-06 23:40:31 [cuda.py:351] Using FLASH_ATTN attention backend out of potential backends: ('FLASH_ATTN', 'FLASHINFER', 'TRITON_ATTN', 'FLEX_ATTENTION') -(EngineCore_DP0 pid=548986) Loading safetensors checkpoint shards: 0% Completed | 0/3 [00:00, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::kda_attention', 'vllm::sparse_attn_indexer'], 'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_split_points': [16384], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': , 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 
112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'eliminate_noops': True, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': , 'evaluate_guards': False}, 'local_cache_dir': None} -(EngineCore_DP0 pid=549646) INFO 01-06 23:41:27 [parallel_state.py:1203] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://10.10.50.98:35041 backend=nccl -(EngineCore_DP0 pid=549646) INFO 01-06 23:41:27 [parallel_state.py:1411] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank 0 -(EngineCore_DP0 pid=549646) INFO 01-06 23:41:28 [gpu_model_runner.py:3562] Starting to load model mistralai/Mistral-7B-Instruct-v0.2... -(EngineCore_DP0 pid=549646) /home/sped/Downloads/storage/kv_cache_benchmark/venv/lib/python3.10/site-packages/tvm_ffi/_optional_torch_c_dlpack.py:174: UserWarning: Failed to JIT torch c dlpack extension, EnvTensorAllocator will not be enabled. 
-(EngineCore_DP0 pid=549646) We recommend installing via `pip install torch-c-dlpack-ext` -(EngineCore_DP0 pid=549646) warnings.warn( -(EngineCore_DP0 pid=549646) INFO 01-06 23:41:29 [cuda.py:351] Using FLASH_ATTN attention backend out of potential backends: ('FLASH_ATTN', 'FLASHINFER', 'TRITON_ATTN', 'FLEX_ATTENTION') -(EngineCore_DP0 pid=549646) Loading safetensors checkpoint shards: 0% Completed | 0/3 [00:00, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::kda_attention', 'vllm::sparse_attn_indexer'], 'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_split_points': [16384], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': , 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'eliminate_noops': True, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': , 'evaluate_guards': False}, 'local_cache_dir': None} -(EngineCore_DP0 pid=550304) INFO 01-06 23:42:26 [parallel_state.py:1203] world_size=1 rank=0 local_rank=0 
distributed_init_method=tcp://10.10.50.98:41083 backend=nccl -(EngineCore_DP0 pid=550304) INFO 01-06 23:42:26 [parallel_state.py:1411] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank 0 -(EngineCore_DP0 pid=550304) INFO 01-06 23:42:26 [gpu_model_runner.py:3562] Starting to load model mistralai/Mistral-7B-Instruct-v0.2... -(EngineCore_DP0 pid=550304) /home/sped/Downloads/storage/kv_cache_benchmark/venv/lib/python3.10/site-packages/tvm_ffi/_optional_torch_c_dlpack.py:174: UserWarning: Failed to JIT torch c dlpack extension, EnvTensorAllocator will not be enabled. -(EngineCore_DP0 pid=550304) We recommend installing via `pip install torch-c-dlpack-ext` -(EngineCore_DP0 pid=550304) warnings.warn( -(EngineCore_DP0 pid=550304) INFO 01-06 23:42:28 [cuda.py:351] Using FLASH_ATTN attention backend out of potential backends: ('FLASH_ATTN', 'FLASHINFER', 'TRITON_ATTN', 'FLEX_ATTENTION') -(EngineCore_DP0 pid=550304) Loading safetensors checkpoint shards: 0% Completed | 0/3 [00:00 0 else 0 - storage_tok_per_sec.append(st) - # Request rate based on storage time - rps = requests / io_time if io_time > 0 else 0 - req_per_sec.append(rps) - # Wall-clock throughput for reference - wc_elapsed = t.get('elapsed_time', io_time) - wc = tokens / wc_elapsed if wc_elapsed > 0 else 0 - tok_per_sec.append(wc) - elapsed.append(io_time) - - results_data[config_name] = { - 'name': display_name, - 'trials': len(trials), - 'tok_per_sec_mean': np.mean(tok_per_sec), - 'tok_per_sec_std': np.std(tok_per_sec), - 'storage_tok_per_sec_mean': np.mean(storage_tok_per_sec), - 'storage_tok_per_sec_std': np.std(storage_tok_per_sec), - 'req_per_sec_mean': np.mean(req_per_sec), - 'req_per_sec_std': np.std(req_per_sec), - 'elapsed_mean': np.mean(elapsed), - 'elapsed_std': np.std(elapsed), - } - -# Build report -lines = [] -lines.append('=' * 80) -lines.append('LMCACHE vs KV-CACHE COMPARISON RESULTS') -lines.append('=' * 80) -lines.append('') - -# Real inference section 
-for cfg in ['vllm_baseline', 'lmcache_gpu_only', 'lmcache_cpu_offload']:
-    if cfg not in results_data:
-        continue
-    d = results_data[cfg]
-    lines.append(d['name'])
-    lines.append('-' * 50)
-    lines.append(f" Trials: {d['trials']}")
-    lines.append(f" Tokens/sec: {d['tok_per_sec_mean']:8.2f} +/- {d['tok_per_sec_std']:7.2f}")
-    lines.append(f" Requests/sec: {d['req_per_sec_mean']:8.2f} +/- {d['req_per_sec_std']:7.2f}")
-    lines.append(f" Elapsed time: {d['elapsed_mean']:8.2f}s +/- {d['elapsed_std']:7.2f}s")
-    lines.append('')
-
-# kv-cache.py section with STORAGE THROUGHPUT
-for cfg in ['kvcache_gpu_only', 'kvcache_gpu_cpu', 'kvcache_gpu_cpu_nvme', 'kvcache_nvme_only']:
-    if cfg not in results_data:
-        continue
-    d = results_data[cfg]
-    lines.append(d['name'])
-    lines.append('-' * 50)
-    lines.append(f" Trials: {d['trials']}")
-    lines.append(f" Storage Throughput: {d['storage_tok_per_sec_mean']:8.2f} +/- {d['storage_tok_per_sec_std']:7.2f} tok/s")
-    lines.append(f" Storage Requests/sec: {d['req_per_sec_mean']:8.2f} +/- {d['req_per_sec_std']:7.2f}")
-    lines.append(f" Total I/O Time: {d['elapsed_mean']:8.2f}s +/- {d['elapsed_std']:7.2f}s")
-    lines.append('')
-
-# Comparative analysis
-lines.append('=' * 80)
-lines.append('COMPARATIVE ANALYSIS')
-lines.append('=' * 80)
-lines.append('')
-lines.append('Note: kv-cache.py tests use EQUAL total cache capacity for fair comparison.')
-lines.append(' Storage Throughput = tokens / total_storage_io_latency (correct metric)')
-lines.append('')
-
-lines.append('kv-cache.py Storage Tier Comparison (Storage Throughput):')
-for cfg in ['kvcache_gpu_only', 'kvcache_gpu_cpu', 'kvcache_gpu_cpu_nvme', 'kvcache_nvme_only']:
-    if cfg not in results_data:
-        continue
-    d = results_data[cfg]
-    tier_name = cfg.replace('kvcache_', '').upper().replace('_', ' ')
-    lines.append(f" {tier_name:20}: {d['storage_tok_per_sec_mean']:8.2f} tok/s")
-
-lines.append('')
-
-# Speedup calculation
-if 'kvcache_nvme_only' in results_data:
-    nvme_baseline = results_data['kvcache_nvme_only']['storage_tok_per_sec_mean']
-    lines.append(' Speedup vs NVMe-only:')
-    for cfg in ['kvcache_gpu_only', 'kvcache_gpu_cpu', 'kvcache_gpu_cpu_nvme']:
-        if cfg not in results_data:
-            continue
-        d = results_data[cfg]
-        speedup = d['storage_tok_per_sec_mean'] / nvme_baseline
-        tier_name = cfg.replace('kvcache_', '').replace('_', ' ')
-        lines.append(f" {tier_name:16}: {speedup:.2f}x")
-
-lines.append('')
-lines.append('LMCache vs kv-cache.py (NOTE: different tools, different purposes):')
-lines.append(' - LMCache: Real GPU inference with KV cache optimization')
-lines.append(' - kv-cache.py: Storage I/O simulator for MLPerf Storage benchmark')
-lines.append('')
-if 'lmcache_cpu_offload' in results_data and 'kvcache_gpu_cpu' in results_data:
-    lm = results_data['lmcache_cpu_offload']['tok_per_sec_mean']
-    kv = results_data['kvcache_gpu_cpu']['storage_tok_per_sec_mean']
-    lines.append(f" LMCache CPU offload: {lm:8.2f} tok/s (real inference)")
-    lines.append(f" kv-cache.py GPU+CPU: {kv:8.2f} tok/s (storage I/O sim)")
-    lines.append(f" Ratio: {lm/kv:.2f}x (expected: LMCache faster due to GPU compute)")
-
-output = '\n'.join(lines)
-print(output)
-
-with open('comparison_report.txt', 'w') as f:
-    f.write(output)
-
-print('\n\nSaved to comparison_report.txt')
diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/system_info.txt b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/system_info.txt
deleted file mode 100644
index 876c27d1..00000000
--- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/system_info.txt
+++ /dev/null
@@ -1,36 +0,0 @@
-=== LMCache vs KV-Cache Comparison: 20260106_233959 ===
-
-=== Hardware ===
-name, memory.total [MiB], driver_version
-NVIDIA H100 NVL, 95830 MiB, 580.95.05
-
-=== Software ===
-OS: Linux sped 6.5.0-15-generic #15~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Jan 12 18:54:30 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
-Python: Python 3.10.12
-vLLM: 0.13.0
-LMCache: unknown
-PyTorch: 2.9.0+cu128, CUDA: 12.8
-
-=== Configuration ===
-Model: mistralai/Mistral-7B-Instruct-v0.2 / mistral-7b
-Number of trials: 3
-Prompts per run: 500
-GPU memory for KV cache: 16GB
-CPU memory for KV cache: 32GB
-Cache directory: /mnt/nvme
-Dataset: ShareGPT_V3_unfiltered_cleaned_split.json
-
-=== LMCache Environment Variables ===
-LMCACHE_CHUNK_SIZE: 256 (production default)
-LMCACHE_LOCAL_CPU: True/False (per test)
-LMCACHE_MAX_LOCAL_CPU_SIZE: 32GB
-
-=== Memory ===
-               total        used        free      shared  buff/cache   available
-Mem:           251Gi       3.1Gi       191Gi       198Mi        57Gi       246Gi
-Swap:             0B          0B          0B
-
-=== Disk ===
-Filesystem      Size  Used Avail Use% Mounted on
-/dev/nvme4n1    7.0T  2.5T  4.6T  35% /mnt
-
diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/vllm_baseline_trial1.json b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/vllm_baseline_trial1.json
deleted file mode 100644
index 95383c06..00000000
--- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/vllm_baseline_trial1.json
+++ /dev/null
@@ -1,7 +0,0 @@
-{
-    "elapsed_time": 17.475845791996107,
-    "num_requests": 500,
-    "total_num_tokens": 239867,
-    "requests_per_second": 28.61091851869045,
-    "tokens_per_second": 13725.630384645445
-}
\ No newline at end of file
diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/vllm_baseline_trial2.json b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/vllm_baseline_trial2.json
deleted file mode 100644
index f73ec3f0..00000000
--- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/vllm_baseline_trial2.json
+++ /dev/null
@@ -1,7 +0,0 @@
-{
-    "elapsed_time": 17.45444850999047,
-    "num_requests": 500,
-    "total_num_tokens": 239867,
-    "requests_per_second": 28.645992436473318,
-    "tokens_per_second": 13742.456535519092
-}
\ No newline at end of file
diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/vllm_baseline_trial3.json b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/vllm_baseline_trial3.json
deleted file mode 100644
index 9c998cb9..00000000
--- a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/vllm_baseline_trial3.json
+++ /dev/null
@@ -1,7 +0,0 @@
-{
-    "elapsed_time": 17.48015343400766,
-    "num_requests": 500,
-    "total_num_tokens": 239867,
-    "requests_per_second": 28.60386791727179,
-    "tokens_per_second": 13722.247971424464
-}
\ No newline at end of file
diff --git a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/~$mlperf_storage_summary.xlsx b/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/~$mlperf_storage_summary.xlsx
deleted file mode 100644
index 1e0c8b3a..00000000
Binary files a/kv_cache_benchmark/vllm_lmcache_validate/lmcache_results_20260106_233959/~$mlperf_storage_summary.xlsx and /dev/null differ