
Batch requests for docker image size #170

Open
jovandeginste wants to merge 1 commit into devopshq:master from kuleuven:batch-size

Conversation

@jovandeginste

We have a registry with many thousands of images. The function to calculate the size of the docker images sometimes takes down the whole Artifactory instance because of the memory it requires, even when only a few images are returned by the initial query, since the size of all images is then requested.

Instead of querying the size of every asset on the registry, this function now batches the artifacts in slices of 40 (there are limits on the query size), and requests the size of those artifact paths per slice. It then fills out the size information for those artifacts in the internal dictionary, as before.

This means the query is a lot more efficient for us:

  • The size query takes a few seconds instead of a few minutes.
  • Especially for queries returning only a few results!
  • No single (background) query should take down the whole server.
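The slicing approach described above can be sketched roughly like this (a minimal illustration, not the PR's actual code; the `batched` helper and the example paths are hypothetical):

```python
BATCH_SIZE = 40  # kept under Artifactory's query-size (POST body) limit

def batched(items, size=BATCH_SIZE):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Instead of one query covering every asset, issue one size query per slice:
paths = ["repo/image-%d/manifest.json" % i for i in range(95)]
batches = list(batched(paths))
# 95 paths split into slices of 40, 40, and 15
```

Each slice then becomes a single size request, and the results are merged back into the internal dictionary as before.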

Member

@allburov left a comment

Could you also run

pre-commit install
pre-commit run --all-files

to format it properly 🙏

"""

MANIFEST_FILENAME = "manifest.json"
BATCH_SIZE = 40
Member

Is there a way to have it in config, with 40 by default if not provided, for backward compatibility 🤔

Why 40 btw, let's do 42 maybe? 😆

Just kidding, let's change it to 100, idk, something that doesn't look like magic

Author

I touched on that in the commit message: I tried different "magic numbers"; 50 was too high, so I went with 40. This has to do with the query size limit (the size of the POST body). That indeed does not mean 40 will work for everybody, all the time. On the other hand, this mostly matters if you actually have more than 40 things to query.

I would be happy to make this a config variable. That would solve the case where "once in a blue moon" 40 is too large, and I need it to be 20.

Author

I took a quick attempt to change this into a config variable, but I'm not sure which rules need to change. Any help here?

Member

Maybe change it to 100 and add some env variable, just as a fast solution (not an ideal one tho)

Member

ARTIFACTORY_CLEANUP_DOCKER_RULE_BATCH_SIZE

Author

I added this now
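An override along these lines could look as follows (a sketch assuming the variable name from this thread; `DEFAULT_BATCH_SIZE` and `get_batch_size` are illustrative names, not the PR's actual code):

```python
import os

DEFAULT_BATCH_SIZE = 40

def get_batch_size():
    """Use ARTIFACTORY_CLEANUP_DOCKER_RULE_BATCH_SIZE if set, else the default."""
    value = os.environ.get("ARTIFACTORY_CLEANUP_DOCKER_RULE_BATCH_SIZE")
    return int(value) if value else DEFAULT_BATCH_SIZE
```

Falling back to 40 when the variable is unset keeps the behavior backward compatible.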

@jovandeginste
Author

Could you also run

pre-commit install
pre-commit run --all-files

to format it properly 🙏

When I run this, it changes many files that I did not touch; are you sure this should "just work"? Maybe you have some system-level or user-level configuration for some of these tools...

We have a registry with many thousands of images. The function to
calculate the size of the docker images sometimes takes down the whole
Artifactory instance because of the required memory. Even when only a
few images are returned by the initial query, since the size of _all_
images is then requested.

Instead of querying the size of _every_ asset on the registry, this
function now batches the artifacts in slices of 40 (there are limits on
the query size), and requests the size of those artifact paths per slice.
It then fills out the size information for those artifacts in the
internal dictionary, as before.

This means the query is a lot more efficient for us:
- The size query takes a few seconds instead of a few minutes.
- Especially for queries returning only a few results!
- No single (background) query should take down the whole server.

We provide an environment variable `ARTIFACTORY_CLEANUP_DOCKER_RULE_BATCH_SIZE`
to override this size.

Signed-off-by: Jo Vandeginste <Jo.Vandeginste@kuleuven.be>
