Batch requests for docker image size #170
jovandeginste wants to merge 1 commit into devopshq:master
Conversation
allburov
left a comment
Could you also run
pre-commit install
pre-commit run --all-files
to format it properly 🙏
artifactory_cleanup/rules/docker.py (Outdated)

    """
    MANIFEST_FILENAME = "manifest.json"
    BATCH_SIZE = 40
Is there a way to have it in the config, with 40 as the default if not provided, for backward compatibility 🤔
Why 40 btw, let's do 42 maybe? 😆
Just kidding, let's change it to 100, idk, something that doesn't look like magic
I touched on that in the commit message: I tried different "magic numbers"; 50 was too high, so I went with 40. This has to do with the query limit size (the size of the POST body). This indeed does not mean that 40 will work for everybody, all the time. On the other hand, this is mostly a thing if you actually have more than 40 things to query.
I would be happy to make this a config variable. That would solve the case where "once in a blue moon" 40 is too large, and I need it to be 20.
I took a quick attempt to change this into a config variable, but I'm not sure which rules need to change. Any help here?
Maybe change it to 100 and add some env variable, just as a fast solution (not an ideal one, though)
ARTIFACTORY_CLEANUP_DOCKER_RULE_BATCH_SIZE
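As a rough sketch, the override could be read like this; the helper name and the fallback of 100 are illustrative, not part of the PR:

```python
import os


def get_batch_size(default: int = 100) -> int:
    """Read the batch size from the environment, falling back to a default.

    The variable name follows the suggestion above; a missing or
    non-numeric value falls back to the default.
    """
    value = os.environ.get("ARTIFACTORY_CLEANUP_DOCKER_RULE_BATCH_SIZE")
    try:
        return int(value) if value else default
    except ValueError:
        return default


BATCH_SIZE = get_batch_size()
```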
When I run this, it changes many files that I did not touch; are you sure this should "just work"? Maybe you have some system-level or user-level configuration for some of these tools...
78bdff2 to
7dedfd8
Compare
We have a registry with many thousands of images. The function to calculate the size of the docker images sometimes takes down the whole Artifactory instance because of the required memory, even when only a few images are returned by the initial query, since the size of _all_ images is then requested.

Instead of querying the size of _every_ asset on the registry, this function now batches the artifacts in slices of 40 (there are limits on the query size), and requests the size of those artifact paths per slice. It then fills in the size information for those artifacts in the internal dictionary, as before.

This means the query is a lot more efficient for us:

- The size query takes a few seconds instead of a few minutes, especially for queries returning only a few results!
- No single (background) query should take down the whole server.

We provide an environment variable `ARTIFACTORY_CLEANUP_DOCKER_RULE_BATCH_SIZE` to override this size.

Signed-off-by: Jo Vandeginste <Jo.Vandeginste@kuleuven.be>
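The batching described in the commit message can be sketched as follows; `fetch_sizes` is a hypothetical stand-in for the real per-batch AQL query, not the PR's actual implementation:

```python
from typing import Dict, List

BATCH_SIZE = 40  # kept small because of limits on the query (POST body) size


def batched(items: List[str], size: int) -> List[List[str]]:
    """Split a list of artifact paths into slices of at most `size`."""
    return [items[i : i + size] for i in range(0, len(items), size)]


def fetch_sizes(batch: List[str]) -> Dict[str, int]:
    """Placeholder for one query restricted to exactly these paths;
    the real rule asks Artifactory for the sizes here."""
    return {path: 0 for path in batch}


def collect_sizes(paths: List[str]) -> Dict[str, int]:
    """Query sizes per slice instead of for every asset on the registry."""
    sizes: Dict[str, int] = {}
    for batch in batched(paths, BATCH_SIZE):
        sizes.update(fetch_sizes(batch))
    return sizes
```

Only the artifacts returned by the initial query are ever asked about, so a query matching a handful of images issues a handful of small requests rather than one registry-wide one.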
7dedfd8 to
b0444bb
Compare