Cache the result of "single" statistics runs from the evobench-evaluator #26

@pflanze

When evobench-run finishes a job run, the run's output file (evobench.log.zstd) is currently processed twice by evobench-evaluator: once to produce the single.xlsx file, and once to produce the single*.svg files. evobench-run then starts evobench-evaluator on all existing evobench.log.zstd files for the same job to produce the summary.xlsx file, and again for the summary*.svg files; that latter part has to open up to 10 evobench.log.zstd files.

I used to think this was not a big deal, because I had made evobench-evaluator process all log files in parallel, and since gs-staging-1 has more than 10 cores, the cost was very manageable. But the log files can easily be too large to hold all 10 of them fully in RAM at once, so I had to remove the parallelism. Now the processing cost for just one job run is around 1 hour (2 hours towards the end, when 10 files have accumulated), while the benchmarking part of a job run finishes in under about 10 minutes, which is quite a disconnect. Calculating a trend (graph) would be even slower.

Caching the output (the "single" statistics data) of the first two runs described above, in a format that can be read back in, is all that's needed to solve this issue completely.
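To illustrate the caching idea, here is a minimal sketch in Python. It is not evobench code: the function names, the `.single-stats.json` suffix, the JSON format, and the `compute` callback (standing in for whatever parses one evobench.log.zstd file into its "single" statistics) are all assumptions made for the example. The cache is stored next to the log file and is reused as long as the log's size and modification time are unchanged.

```python
import json
import os
from pathlib import Path

# Hypothetical cache file suffix; not an existing evobench convention.
CACHE_SUFFIX = ".single-stats.json"


def _load_cache(cache_path, key):
    """Return the cached stats if the cache exists and matches `key`, else None."""
    try:
        with open(cache_path, encoding="utf-8") as f:
            cached = json.load(f)
    except (OSError, ValueError):
        # Missing, unreadable, or corrupt cache: treat as a cache miss.
        return None
    if isinstance(cached, dict) and cached.get("key") == key and "stats" in cached:
        return cached["stats"]
    return None


def cached_single_stats(log_path, compute):
    """Return the "single" statistics for one log file, computing them at most
    once per (size, mtime) state of that file.

    `compute` is a placeholder for the expensive parsing step; it takes the
    log path and must return JSON-serializable data.
    """
    log_path = Path(log_path)
    cache_path = log_path.with_name(log_path.name + CACHE_SUFFIX)
    st = log_path.stat()
    key = {"size": st.st_size, "mtime_ns": st.st_mtime_ns}

    stats = _load_cache(cache_path, key)
    if stats is not None:
        return stats

    stats = compute(log_path)
    # Write to a temporary file, then rename, so that an interrupted run
    # never leaves a half-written cache file behind.
    tmp_path = cache_path.with_name(cache_path.name + ".tmp")
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump({"key": key, "stats": stats}, f)
    os.replace(tmp_path, cache_path)
    return stats


def collect_single_stats(log_paths, compute):
    """Gather the per-run statistics for a summary, one log file at a time,
    so that at most one log has to be parsed (and held in RAM) at once."""
    return [cached_single_stats(p, compute) for p in log_paths]
```

With something along these lines, the summary step would only need to parse the newest log file; the statistics for the older runs would be read back from their cache files, and the single.xlsx and single*.svg outputs could be produced from the same cached data.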
