This repository contains the source code and experimental data for the paper "Demystifying and Improving Lazy Promotion in Cache Eviction".
The project is organized as follows:
- simulator/: Contains the implementation of various lazy promotion techniques on a cache simulator based on libCacheSim.
- simulator-concurrent/: Contains the implementations of concurrent lazy promotion techniques.
- scripts/: Contains Python scripts to parse and process datasets, and to generate figures for the paper. This is the primary directory for reproducing the results.
- scripts/figures/: This directory is not present in the repository but is generated by the scripts in the scripts/ directory. It contains the figures generated from the scripts.
Tested on Ubuntu 24.04
The simulator depends on the following tools and libraries:
- cmake
- build-essential
- libglib2.0-dev
- libgoogle-perftools-dev (tcmalloc)
- zstd
You can install these dependencies using the provided script:
cd simulator/scripts
bash install_dependency.sh
This script supports Ubuntu, CentOS, and macOS.
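Alternatively, on Ubuntu you can install the packages listed above directly; a minimal sketch (package names may vary slightly across releases):
# manual installation of the build dependencies on Ubuntu
sudo apt update
sudo apt install -y cmake build-essential libglib2.0-dev libgoogle-perftools-dev zstd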
After installing the dependencies, you can build the simulator:
mkdir -p _build
cd _build
cmake -DCMAKE_BUILD_TYPE=Release ..
make [-j]
On stricter compilers you might need to run:
mkdir -p _build
cd _build
cmake -DCMAKE_C_FLAGS="-Wno-implicit-int" -DCMAKE_CXX_FLAGS="-std=c++17 -include cstdint -Wno-template-body -fpermissive" ..
make -j
The simulator executable cachesim will be available in the simulator/_build/bin/ directory.
The following sections describe how to reproduce the experiment results from the paper.
All simulations are run using the cachesim executable. The general command format is:
./_build/bin/cachesim [trace_path] [trace_type] [algorithm] -e [eviction_param] [cache_size] --ignore-obj-size 1
The traces used in the paper are available at https://ftp.pdl.cmu.edu/pub/datasets/twemcacheWorkload/cacheDatasets/. The trace_type is oracleGeneral.
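As an end-to-end sketch, assuming a trace has been downloaded from the directory above (<trace_file> is a placeholder for an actual trace name):
# download one trace from the dataset directory (replace <trace_file> with a real file name)
wget https://ftp.pdl.cmu.edu/pub/datasets/twemcacheWorkload/cacheDatasets/<trace_file>
# run FIFO with cache size 0.01, ignoring object sizes
./_build/bin/cachesim <trace_file> oracleGeneral fifo 0.01 --ignore-obj-size 1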
Here are the basic commands for each algorithm discussed in the paper:
1. FIFO
# sequential
./_build/bin/cachesim $file oracleGeneral fifo 0.01 --ignore-obj-size 1
# concurrent
./_build/bin/cachesim fifo <cache_size> --ignore-obj-size 1 --num-thread <thread_count>
2. LRU
# sequential
./_build/bin/cachesim $file oracleGeneral lru 0.01 --ignore-obj-size 1
# concurrent
./_build/bin/cachesim lru <cache_size> --ignore-obj-size 1 --num-thread <thread_count>
3. Prob-LRU
# sequential
./_build/bin/cachesim $file oracleGeneral lru-prob -e prob=0.5 0.01 --ignore-obj-size 1
# concurrent
./_build/bin/cachesim lru-prob -e prob=0.5 <cache_size> --ignore-obj-size 1 --num-thread <thread_count>
4. Delay-LRU
# sequential
./_build/bin/cachesim $file oracleGeneral lru-delay -e delay-time=0.2 0.01 --ignore-obj-size 1
# concurrent
./_build/bin/cachesim lru-delay -e delay-time=0.2 <cache_size> --ignore-obj-size 1 --num-thread <thread_count>
5. Batch
# sequential
./_build/bin/cachesim $file oracleGeneral batch -e batch-size=0.5 0.01 --ignore-obj-size 1
# concurrent
./_build/bin/cachesim batch -e batch-size=0.5 <cache_size> --ignore-obj-size 1 --num-thread <thread_count>
6. FIFO-reinsertion (FR) / CLOCK
# sequential
./_build/bin/cachesim $file oracleGeneral clock -e n_bit_counter=1 0.01 --ignore-obj-size 1
# concurrent
./_build/bin/cachesim clock -e n_bit_counter=1 <cache_size> --ignore-obj-size 1 --num-thread <thread_count>
7. RandomLRU
./_build/bin/cachesim $file oracleGeneral randomLRU -e n-samples=5 0.01 --ignore-obj-size 1
8. Belady-Random
./_build/bin/cachesim $file oracleGeneral randomBelady -e scaler=5 0.01 --ignore-obj-size 1
9. Belady-RandomLRU
./_build/bin/cachesim $file oracleGeneral beladyRandomLRU -e scaler=5 0.01 --ignore-obj-size 1
10. Offline-FR
./_build/bin/cachesim $file oracleGeneral opt-clock -e iter=5 0.01 --ignore-obj-size 1
11. Delay-FR
./_build/bin/cachesim $file oracleGeneral delayfr -e delay-ratio=0.05 0.01 --ignore-obj-size 1
12. AGE
./_build/bin/cachesim $file oracleGeneral age -e scaler=0.4 0.01 --ignore-obj-size 1
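To sweep a single algorithm over a directory of traces, a simple shell loop is enough; a sketch, with /path/to/traces as a placeholder:
# run AGE on every trace in a directory
for file in /path/to/traces/*; do
  ./_build/bin/cachesim "$file" oracleGeneral age -e scaler=0.4 0.01 --ignore-obj-size 1
done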
We recommend running these experiments on hardware with at least 256 GB of RAM; it is required for some of the larger traces.
Running all the experiments included in the paper would require a large amount of disk space just to store the traces
and take at least a day using 15 nodes (64 cores, 376 GB RAM each).
Considering that, we provide the simulation results in scripts/data/ and on Google Drive.
We ran our experiments using distComp, leveraging its ability to manage hundreds of thousands of experiments across multiple nodes. Even if you are only using a single node, it is useful for monitoring.
- Install dependencies
pip install redis psutil
or
sudo apt install python3-redis python3-psutil
The above commands need to be run on all nodes; you can use pssh or clush to do that.
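For example, with pssh and a host file listing the worker nodes (hosts.txt is a placeholder, and passwordless sudo on the workers is assumed):
# install the Python dependencies on every node listed in hosts.txt
pssh -h hosts.txt -i 'sudo apt install -y python3-redis python3-psutil'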
For the manager node, you need to set up redis-server using the redis.sh script available in the distComp repository.
- We provide a systemd service for the distComp worker so that you can start/stop/restart it easily.
[Unit]
Description=distComp Worker
After=network.target
[Service]
Type=simple
ExecStart=/usr/bin/python3 $DISTCOMP_DIR/redisWorker.py
Restart=always
User=$USER
WorkingDirectory=$DISTCOMP_DIR
StandardOutput=null
StandardError=null
[Install]
WantedBy=multi-user.target
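To use it, save the unit file with $DISTCOMP_DIR and $USER substituted and register it with systemd; the file name distcomp-worker.service below is only an example:
sudo cp distcomp-worker.service /etc/systemd/system/
sudo systemctl daemon-reload
# start the worker now and on every boot
sudo systemctl enable --now distcomp-worker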
There are two kinds of experiments in our paper: miss ratio and promotion count experiments, and throughput experiments.
We provide a script to generate them in scripts/generate_task.sh.
You might want to adjust the paths in the script.
It outputs a task file that contains all experiments.
Running the FIFO and LRU experiments is necessary for generating all figures, as they are used as the baselines.
You can run these two sets of experiments first by running:
grep -Ei ' (lru|fifo) ' ~/task > ~/fifo_lru_task
cd distComp; python redisManager.py --task loadTask --taskfile ~/fifo_lru_task
Here is the minimal set of experiments you can run to reproduce specific figures in our paper.
- figure 1b
grep -Ei ' (lru-delay|lru-prob|batch|clock|delayfr|age) ' ~/task > ~/fig1_task
cd distComp; python redisManager.py --task loadTask --taskfile ~/fig1_task
- figures 2a, 2c
grep -Ei ' (lru-prob) ' ~/task > ~/fig2_task
cd distComp; python redisManager.py --task loadTask --taskfile ~/fig2_task
- figures 3a, 3c
grep -Ei ' (batch) ' ~/task > ~/fig3_task
cd distComp; python redisManager.py --task loadTask --taskfile ~/fig3_task
- figures 4a, 4c
grep -Ei ' (lru-delay) ' ~/task > ~/fig4_task
cd distComp; python redisManager.py --task loadTask --taskfile ~/fig4_task
- figures 5a, 5c
grep -Ei ' (clock) ' ~/task > ~/fig5_task
cd distComp; python redisManager.py --task loadTask --taskfile ~/fig5_task
- figure 6b
grep -Ei ' (random) ' ~/task > ~/fig6_task
cd distComp; python redisManager.py --task loadTask --taskfile ~/fig6_task
- figure 7a
grep -Ei ' (lru-prob) ' ~/task > ~/fig7a_task
cd distComp; python redisManager.py --task loadTask --taskfile ~/fig7a_task
- figure 7b
grep -Ei ' (batch) ' ~/task > ~/fig7b_task
cd distComp; python redisManager.py --task loadTask --taskfile ~/fig7b_task
- figure 7c
grep -Ei ' (lru-delay) ' ~/task > ~/fig7c_task
cd distComp; python redisManager.py --task loadTask --taskfile ~/fig7c_task
- figure 7d
grep -Ei ' (clock) ' ~/task > ~/fig7d_task
cd distComp; python redisManager.py --task loadTask --taskfile ~/fig7d_task
- figures 8a, 8b
grep -Ei 'arc' ~/task > ~/fig8ab_task
cd distComp; python redisManager.py --task loadTask --taskfile ~/fig8ab_task
- figures 8c, 8d
grep -Ei 'twoq' ~/task > ~/fig8cd_task
cd distComp; python redisManager.py --task loadTask --taskfile ~/fig8cd_task
- figure 9a
grep -Ei ' (beladyRandomLRU) ' ~/task > ~/fig9a_task
cd distComp; python redisManager.py --task loadTask --taskfile ~/fig9a_task
- figures 9b, 9c
grep -Ei ' (randomBelady) ' ~/task > ~/fig9ab_task
cd distComp; python redisManager.py --task loadTask --taskfile ~/fig9ab_task
- figures 10a, 10b, 10c
grep -Ei ' (opt-clock) ' ~/task > ~/fig10_task
cd distComp; python redisManager.py --task loadTask --taskfile ~/fig10_task
- figures 11a, 11b, 11c
grep -Ei ' (lru-delay|lru-prob|batch|clock|delayfr|age) ' ~/task > ~/fig11_task
cd distComp; python redisManager.py --task loadTask --taskfile ~/fig11_task
- figures 12a, 12b, 12c
grep -Ei ' (delayfr) ' ~/task > ~/fig12_task
cd distComp; python redisManager.py --task loadTask --taskfile ~/fig12_task
- figures 13a, 13b, 13c
grep -Ei ' (age) ' ~/task > ~/fig13_task
cd distComp; python redisManager.py --task loadTask --taskfile ~/fig13_task
We run scalability experiments on a single r650 node in CloudLab. Before starting, run the following scripts to disable Turbo Boost and hyper-threading to ensure stable and repeatable performance:
bash disable_turbo.sh
sudo bash disable_hyperthreading.sh off # requires admin privileges
To restrict execution to a single NUMA domain, use lscpu to identify the CPUs that belong to the same NUMA node. Then use taskset to pin the program to those cores:
taskset -c <cpu_list> ./generate_scalability_task.sh
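For example, the CPU list of each NUMA node can be read from lscpu; the 0-31 range below is illustrative and depends on your machine:
lscpu | grep "NUMA node"   # e.g. "NUMA node0 CPU(s): 0-31"
taskset -c 0-31 ./generate_scalability_task.sh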
The script used to generate the scalability experiment tasks is:
scripts/generate_scalability_task.sh
We provide the scalability results in scripts/data/ and on Google Drive.
The scripts/ directory contains Python scripts for analyzing the simulation results and generating the figures used in the paper.
- parse_data.py: Parses the raw simulation output.
- process_data.py: Processes the parsed data to generate intermediate results.
- generate_figures.py: Generates the figures from the processed data.
- style.py: Defines the plotting style for the figures.
- matplotlib_wrapper.py: A wrapper for the matplotlib plotting library.
- datasets.txt: A list of datasets used in the experiments.
- generate_task.sh: A script to generate tasks for distComp.
To generate the figures, first you need to install the dependencies.
pip install -r requirements.txt
To parse the results, run parse_data.py. It takes exactly one argument, result_directory, and outputs scripts/data/data.feather and data.csv.
If scalability results are available, run parse_scalability.py, which takes scalability_result_directory as its argument.
You need to keep the scalability and miss ratio results separate, as they have distinct formats.
Next, run process_data.py to compute the metrics for our figures. It outputs scripts/data/processed.feather and processed.csv,
and scripts/data/processed_scalability.feather and processed_scalability.csv if scalability results are available.
Note: only the feather files are necessary; the csv files are provided only for quick and easy inspection of the raw data.
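Putting the steps together, a typical sequence looks like the following sketch, assuming the scripts are run from inside scripts/ and the result directories are placeholders for wherever you collected the raw output:
cd scripts
python parse_data.py /path/to/miss_ratio_results          # writes scripts/data/data.feather and data.csv
python parse_scalability.py /path/to/scalability_results  # only if scalability results were collected
python process_data.py                                    # writes processed.feather/.csv (and processed_scalability.*)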
We provide generate_figures.py to generate all of the figures in our paper.
To run the script, pass the names of the figures you want to generate; you can pass more than one figure.
For example, to generate figure 1b and figure 3c, run:
python generate_figures.py figure1b figure3c
The figures are generated as PDFs inside scripts/figures/.