This repository contains the source code to reproduce the results and figures presented in the paper ExaLogLog: Space-Efficient and Practical Approximate Distinct Counting up to the Exa-Scale which was accepted at EDBT 2025.
This work introduces ExaLogLog, a new data structure for approximate distinct counting, which has the same practical properties as the popular HyperLogLog algorithm. It is commutative, idempotent, mergeable, reducible, has a constant-time insert operation, and supports distinct counts up to the exa-scale. At the same time, as theoretically derived and experimentally verified, it requires 43% less space to achieve the same estimation error.
-
Create an Amazon EC2 c5.metal instance with Ubuntu Server 24.04 LTS and 20GiB of storage.
-
Clone the repository including submodules:
git clone https://github.com/dynatrace-research/exaloglog-paper.git && cd exaloglog-paper && git submodule init && git submodule update -
Install all required packages:
sudo apt update && sudo apt --yes install openjdk-21-jdk python-is-python3 python3-pip texlive texlive-latex-extra texlive-fonts-extra texlive-science black && pip install -r python/requirements.txt --break-system-packages -
To reproduce the estimation error results in the folder
results/errorrun thesimulateEstimationErrorstask (takes ~35min):./gradlew simulateEstimationErrors -
To reproduce the empirically determined memory-variance product (MVP) values based on the actual allocated memory and the serialization size of different data structure implementations for approximate distinct counting run the
runEmpiricalMVPComputationtask (takes ~1h):./gradlew runEmpiricalMVPComputationThe results can be found in the
results/comparison-empirical-mvpfolder. -
To reproduce the performance benchmark results in the folder
results/benchmarksdisable Turbo Boost (set P-state to 1), run therunBenchmarkstask (takes ~8h), and enable Turbo Boost again:sudo sh -c "echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo"; ./gradlew runBenchmarks; sudo sh -c "echo 0 > /sys/devices/system/cpu/intel_pstate/no_turbo" -
To calculate theoretical MVP constants as well as constants used in the Java implementation run the
calculateConstantstask (takes ~9min, not needed for the figures):./gradlew calculateConstantsThe output can then be found in the
results/constantsfolder. -
To (re-)generate all figures in the
paperdirectory execute thepdfFigurestask (takes ~90s):./gradlew pdfFigures -
The unit tests are executed by the
testtask (takes ~13min, not needed for the figures):./gradlew test