Added parameter optimization and evaluation functionality for iotools benchmarks#11
Added parameter optimization and evaluation functionality for iotools benchmarks#11DanteNiewenhuis wants to merge 11 commits intojblomer:masterfrom
Conversation
jalopezg-git
left a comment
There was a problem hiding this comment.
Many thanks, @DanteNiewenhuis! 👍
I have attached some preliminary comments, but I guess it's up to @jblomer to finally approve and merge.
gen_atlas.cxx
Outdated
| size_t pagesize = 65536; | ||
| size_t clustersize = 52428800; |
There was a problem hiding this comment.
| size_t pagesize = 65536; | |
| size_t clustersize = 52428800; | |
| size_t pageSize = (64 * 1024); | |
| size_t clusterSize = (50 * 1000 * 1000); |
Or even better, IMO, set it to 0 or -1U, and only call SetApproxUnzippedPageSize(), etc., if a value was actually set by the user.
If unset, importer should be using the default value from RNTupleWriteOptions.
There was a problem hiding this comment.
Same comments in gen_atlas.cxx apply here.
There was a problem hiding this comment.
Same comments in gen_atlas.cxx apply here.
There was a problem hiding this comment.
Same comment in gen_atlas.cxx apply here.
|
|
||
|
|
||
| import sys | ||
| sys.path.insert(0, "/home/dante-niewenhuis/Documents/CERN-parameters-new") |
There was a problem hiding this comment.
I think changing this path is not mentioned in the README.md. Also, what is it used for?
There was a problem hiding this comment.
Sadly, Python cannot import packages that are in the parent folder of a file.
To fix this, I add the optimization_tools folder to the paths so Python can find it.
I have changed the string so it is clearer that it is something the user should to change.
| page_sizes = [(16, "KB"), (32, "KB"), (64, "KB"), (128, "KB"), (256, "KB"), (512, "KB"), | ||
| (1, "MB"), (2, "MB"), (4, "MB"), (8, "MB"), (16, "MB")] |
There was a problem hiding this comment.
Per the convertToByte() function, MB here refers to a power-of-two unit, i.e. a mebibyte.
However, we typically use the MB (1000 * 1000) bytes as the unit for on-disk data (see, e.g. the default value for the fApproxZippedClusterSize member in RNTupleWriteOptions).
| return False | ||
|
|
||
|
|
||
| def convertToByte(value: int, form: str = "b") -> int: |
There was a problem hiding this comment.
This seems to duplicate this function in Parameters.py. I think we should get rid of one of them.
| """ | ||
| for line in benchmark_output.split("\n"): | ||
| if target in line: | ||
|
|
| for _ in tqdm(range(evaluations)): | ||
|
|
||
| # Reset cache | ||
| os.system('sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"') |
There was a problem hiding this comment.
The iotools repository already has clear_page_cache.c. The recipe in the Makefile also sets the suid bit, so that no sudo is required after
$ make clear_page_cache
jblomer
left a comment
There was a problem hiding this comment.
Very nice, thanks a lot!
Some minor comments.
Feel free to add your name and/or contact details to the README.
.gitignore
Outdated
| gen_h1 | ||
| gen_lhcb | ||
|
|
||
| *.pyc No newline at end of file |
There was a problem hiding this comment.
Missing newline at the end of the file
gen_atlas.cxx
Outdated
| size_t pagesize = (64 * 1024); | ||
| size_t clustersize = (50 * 1000 * 1000); |
There was a problem hiding this comment.
Here and in the other generators: use zero to indicate the default page size should be used if the option is not set. I think also the file name modification should only be done if there is a deviation from the default.
gen_cms.cxx
Outdated
| void Usage(char *progname) | ||
| { | ||
| std::cout << "Usage: " << progname << " -i <ttjet_13tev_june2019.root> -o <ntuple-path> -c <compression> [-m(t)]" | ||
| std::cout << "Usage: " << progname << " -o <ntuple-path> -c <compression> -p <page-size> -x <cluster-size> [-m(t)] <H1 root file>" |
There was a problem hiding this comment.
| std::cout << "Usage: " << progname << " -o <ntuple-path> -c <compression> -p <page-size> -x <cluster-size> [-m(t)] <H1 root file>" | |
| std::cout << "Usage: " << progname << " -o <ntuple-path> -c <compression> -p <page-size> -x <cluster-size> [-m(t)] <CMS root file>" |
optimization_tools/README.md
Outdated
| @@ -0,0 +1,12 @@ | |||
| This folder contains python files used to benchmark and optimize parameters based on the iotools benchmarks. | |||
|
|
|||
| To run the files, first the Paths in Benchmarking/variables.py need to be updated. | |||
There was a problem hiding this comment.
| To run the files, first the Paths in Benchmarking/variables.py need to be updated. | |
| To run the files, first the paths in Benchmarking/variables.py need to be updated. |
|
|
||
|
|
||
| import sys | ||
| sys.path.insert(0, "/home/dante-niewenhuis/Documents/CERN-parameters-new") |
There was a problem hiding this comment.
| sys.path.insert(0, "/home/dante-niewenhuis/Documents/CERN-parameters-new") | |
| sys.path.insert(0, "CERN-parameters-new") |
There was a problem hiding this comment.
This will not work in this case because "CERN-parameters-new" is a parent of the current working directory
| # root [12] auto ntuple = ROOT::Experimental::RNTupleReader::Open("DecayTree" , "generated/B2HHH~none.ntuple") | ||
| # root [13] ntuple->PrintInfo(ROOT::Experimental::ENTupleInfo::kStorageDetails) |
There was a problem hiding this comment.
Add to the comment what it means / why it is there
|
|
||
| for _ in tqdm(range(evaluations)): | ||
| # Reset cache | ||
| os.system('sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"') |
There was a problem hiding this comment.
For this we have the clean_page_cache utility
Added python files that can be used to optimize the parameters of reading and writing and RNTuple file.
The main functionality is evaluating single parameter configurations (see tutorials/evaluate_parameters.py),
and optimizing parameters for a specific benchmark (see tutorials/run_annealer.py).
Three different methods of optimizing parameters are provided in Algorithms.py:
To use the files, first the paths in variables.py have to be updated.
This update also adds two flags to the generator files for defining page and cluster size:
-p define page size in bytes (default 65536)
-x define cluster size in bytes (default 52428800)