This is a project that attempts to benchmark the different backends used by recheck. See Regex-Radar/regex-radar#24 for more info.
The benchmark is still incomplete not taking into account of startup times, warmup times and IPC overhead (especially for the worker thread). Additionally the worker is implemented as a pool, but has a fixed size of 1. Some early conclusions:
nativeis pretty fast, as expectedjavais the fastest for a batch, slow for individual patterns, not sure whywasmis very fast for individual regex checks, and still has a good performance for batchesscalajsis the slowest as expectedthread-workerhas similar performance asscalajs, but the messaging overhead is noticablethread-worker-wasmhas a a similar performance towasm, with IPC overhead being noticable if a lot of checks are done in a small window of time.- increasing the
workerPoolSizetocpucount oravailableParallelism()does not seem to give any benefits which is odd, since especially batches should see some work being done in paralell.
Some variations to benchmark:
worker pools with multiple workers, measure the memory/process spawning overhead as additional cost- write some benchmarking suites, with setup/teardowns to measure:
- the overhead of spawning processes
- the overhead of the IPC
- the performance of the individual
- do some memory profiling, to check the memory overhead
As the aim of the benchmark is to figure out the performance gains vs the portability of the backend, wasm seems like a very good choice, if you ignore the experimental status of its build target. use thread-worker-wasm to keep the async requirement, with a small overhead for the IPC messaging being worth the price of being able to use async/await.
Attempts at visualising the initial results.
- Latency - lower is better
- Throughput - higher is better
6regexes in the dataset
| Backend | Avg Latency | (ms) | Throughput | (ops/sec) |
|---|---|---|---|---|
java |
██ | 97 |
██████████ | 10 |
native |
███ | 135 |
███████ | 7 |
wasm |
██████ | 276 |
████ | 4 |
scalajs |
████████████ | 613 |
██ | 2 |
threadWorker |
█████████████ | 628 |
██ | 2 |
- Throughput (ops/sec)
| Pattern | native (ops/sec) |
java (ops/sec) |
wasm (ops/sec) |
scalajs (ops/sec) |
threadWorker (ops/sec) |
|---|---|---|---|---|---|
(https?:\/\/(w{3}\.)?)+[a-zA-Z0-9\-]+\.[a-z]+(\/[^\s]*)* |
7 |
11 |
4 |
2 |
2 |
^(a+)+$ |
1232 |
39 |
1958 |
552 |
562 |
(a|aa)+$ |
999 |
48 |
1227 |
416 |
324 |
(x+x+)+y |
682 |
60 |
647 |
275 |
237 |
^([a-zA-Z0-9_.+-]+)+@(([a-zA-Z0-9-])+.)+[a-zA-Z0-9]{2,4}$ |
240 |
42 |
153 |
68 |
65 |
^([A-Z]:\\)?(\\[A-Za-z_\-\s0-9\.]+)+\\?$ |
486 |
46 |
327 |
155 |
143 |
# install dependencies
npm i
# run benchmark
npm run benchThe regular expressions used to benchmark are found in ./data/redos-vulnerable.json.