@vulyon vulyon commented Jul 4, 2025

Enable Docker Daemon-Free Image Squashing

This PR resolves Issue #24 ("Make it possible to run squashing without accessing Docker daemon") by letting docker-squash process Docker images directly from tar files. No running Docker daemon is needed, which makes squashing usable in CI/CD pipelines, air-gapped environments, and systems without Docker installed.

Key Benefits

  • No Docker Daemon Required: Squash images anywhere you can save them as a tar file.
  • Ideal for Restricted Environments: Works seamlessly in CI/CD, air-gapped setups, or when Docker isn't running.
  • OCI Format Support: The tool automatically detects and processes images from OCI-formatted input tar files. Please note: Currently, testing has been focused solely on OCI format images.
  • Preserves Layer History: Ensures compatibility and traceability for your squashed images.

How to Use

  1. Export the Image:

    $ docker save -o source.tar jboss/wildfly:latest
    
  2. Squash from Tar:

    $ python -m docker_squash.cli --input-tar source.tar --tag jboss/wildfly:squashed -f 8 --output-path squashed.tar --load-image false
    
    • Use --input-tar for your source image file.
    • --tag is recommended for the new image name.
    • --output-path specifies where to save the squashed tar file.
    • --load-image false prevents the tool from attempting to load the image directly into a Docker daemon.
  3. Load into Docker (Optional):

    $ docker load -i squashed.tar
    

This enhancement significantly broadens docker-squash's utility, making image size optimization more accessible across diverse development and deployment scenarios.

@vulyon vulyon force-pushed the support-image-tar branch from 7458af5 to f04a699 Compare July 4, 2025 07:52
Author

vulyon commented Jul 4, 2025

Hello @goldmann @rnc, it appears there's an issue with the runner environment. The CI/CD checks are failing due to the Ubuntu 20.xx runner deprecation (as per the error message). This doesn't seem to be related to my code changes.

@rnc rnc force-pushed the support-image-tar branch from f04a699 to cad66c8 Compare July 4, 2025 14:03
Collaborator

rnc commented Jul 4, 2025

@lyon-v I've fixed the runners and rebased. However, I think you also need to run ./support/run_formatter.py to fix up the formatting.

@vulyon vulyon force-pushed the support-image-tar branch from cad66c8 to c5feb62 Compare July 6, 2025 09:41
Author

vulyon commented Jul 6, 2025

@rnc Hi there! Apologies for the delayed response. I've fixed the code formatting and it has passed the checks now.

Author

vulyon commented Jul 7, 2025

@rnc @goldmann, do I need to squash these two commits into a single one?

Collaborator

@rnc rnc left a comment

In general, I think this is a great idea. However, I have questions and comments on some areas, and tests are also required, please. Thanks very much for the PR!

README.rst Outdated

::

$ python -m docker_squash.cli --input-tar source.tar --tag jboss/wildfly:squashed -f 8 --output-path squashed.tar --load-image false

Collaborator

I think this should be run without the -f parameter, as the log output below shows the squashed image as larger than the original, which is a confusing result for a README. Also, if both the docker squash and tar squash examples show the same result, IMHO it's more intuitive.

Author

@vulyon vulyon Aug 20, 2025

That is because the jboss/wildfly:latest image has changed.

(base) root@master:~# docker pull jboss/wildfly:latest
latest: Pulling from jboss/wildfly
f87ff222252e: Pull complete
8116b2f7ca5a: Pull complete
0b43aea4eeb1: Pull complete
13776e8da872: Pull complete
f26d32e28c29: Pull complete
Digest: sha256:35320abafdec6d360559b411aff466514d5741c3c527221445f48246350fdfe5
Status: Downloaded newer image for jboss/wildfly:latest
docker.io/jboss/wildfly:latest

(base) root@master:~# docker history jboss/wildfly:latest
IMAGE          CREATED       CREATED BY                                      SIZE     COMMENT
35320abafdec   3 years ago   /bin/sh -c #(nop) CMD ["/opt/jboss/wildfly/…    0B
<missing>      3 years ago   /bin/sh -c #(nop) EXPOSE 8080                   0B
<missing>      3 years ago   /bin/sh -c #(nop) USER jboss                    0B
<missing>      3 years ago   /bin/sh -c #(nop) ENV LAUNCH_JBOSS_IN_BACKG…    0B
<missing>      3 years ago   /bin/sh -c cd $HOME && curl -L -O https:…       270MB
<missing>      3 years ago   /bin/sh -c #(nop) USER root                     0B
<missing>      3 years ago   /bin/sh -c #(nop) ENV JBOSS_HOME=/opt/jboss…    0B
<missing>      3 years ago   /bin/sh -c #(nop) ENV WILDFLY_SHA1=238e67f4…    0B
<missing>      3 years ago   /bin/sh -c #(nop) ENV WILDFLY_VERSION=25.0.…    0B
<missing>      4 years ago   /bin/sh -c #(nop) ENV JAVA_HOME=/usr/lib/jv…    0B
<missing>      4 years ago   /bin/sh -c #(nop) USER jboss                    0B
<missing>      4 years ago   /bin/sh -c yum -y install java-11-openjdk-de…   239MB
<missing>      4 years ago   /bin/sh -c #(nop) USER root                     0B
<missing>      4 years ago   /bin/sh -c #(nop) MAINTAINER Marek Goldmann…    0B
<missing>      4 years ago   /bin/sh -c #(nop) USER jboss                    0B
<missing>      4 years ago   /bin/sh -c #(nop) WORKDIR /opt/jboss            0B
<missing>      4 years ago   /bin/sh -c groupadd -r jboss -g 1000 && user…   406kB
<missing>      4 years ago   /bin/sh -c yum update -y && yum -y install x…   33.5MB
<missing>      4 years ago   /bin/sh -c #(nop) MAINTAINER Marek Goldmann…    0B
<missing>      5 years ago   /bin/sh -c #(nop) CMD ["/bin/bash"]             0B
<missing>      5 years ago   /bin/sh -c #(nop) LABEL org.label-schema.sc…    0B
<missing>      5 years ago   /bin/sh -c #(nop) ADD file:61908381d3142ffba…   222MB

Author

I will fix README.rst.

parser.add_argument(
    "--input-tar",
    help="Path to tar file created by 'docker save'. Process tar file directly without requiring Docker daemon.",
)

Collaborator

I think we should investigate using mutually exclusive groups in argparse, as they have built-in support for accepting either --input-tar or the image option, and would avoid the manual checks below.
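
A rough sketch of the idea, assuming the positional image argument is made optional with nargs="?" so that it can join the group. The parser below is a trimmed-down stand-in and not the project's actual cli.py; build_parser is a hypothetical name:

```python
import argparse


def build_parser():
    # Hypothetical, trimmed-down parser; only the source selection is shown.
    parser = argparse.ArgumentParser(prog="docker-squash")
    # required=True makes argparse reject a call that names neither source.
    source = parser.add_mutually_exclusive_group(required=True)
    # A positional can only be part of a mutually exclusive group if it is optional.
    source.add_argument(
        "image", nargs="?", help="Image to squash (needs a Docker daemon)"
    )
    source.add_argument(
        "--input-tar", help="Path to a tar file created by 'docker save'"
    )
    return parser


args = build_parser().parse_args(["--input-tar", "source.tar"])
print(args.image, args.input_tar)  # prints: None source.tar
```

With this layout, passing both an image and --input-tar, or neither, makes argparse exit with a usage error, so no hand-written validation is needed.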

Collaborator

Also, I think it's valid for --output-path to be the same as --input-tar (?). In tar mode, should this be the default?

Author

Great! I have made the code changes.


def __init__(
    self, log, tar_path, from_layer=None, tmp_dir=None, tag=None, comment=""
):

Collaborator

TarImage derives from Image (which is good) but isn't calling super. Further, I think it duplicates some code from image.py (and potentially v2_image). Could more be done to normalise the code and avoid duplication?
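
To illustrate the shape I have in mind, here is a minimal sketch. The Image class below is a simplified stand-in with an assumed signature; the real base class in image.py takes more arguments (such as the Docker client), so the super() call would need adapting:

```python
class Image:
    """Simplified stand-in for the shared base class (signature is an assumption)."""

    def __init__(self, log, from_layer=None, tmp_dir=None, tag=None, comment=""):
        self.log = log
        self.from_layer = from_layer
        self.tmp_dir = tmp_dir
        self.tag = tag
        self.comment = comment


class TarImage(Image):
    def __init__(
        self, log, tar_path, from_layer=None, tmp_dir=None, tag=None, comment=""
    ):
        # Let the base class own the shared state instead of re-assigning it here.
        super().__init__(
            log, from_layer=from_layer, tmp_dir=tmp_dir, tag=tag, comment=comment
        )
        # Only the tar-specific state lives in the subclass.
        self.tar_path = tar_path
```

Anything both code paths need would then be written once in the base class and inherited.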

Author

Yes, I will fix this.

- Works in CI/CD pipelines and restricted environments
- Supports both Docker format and OCI format images
- Maintains complete layer history compatibility
- Can process images on systems where Docker is not installed

Collaborator

I would imagine that it's helpful when working with podman as well.

Author

Absolutely! That's a great point. The --input-tar feature is indeed very helpful for Podman users.

Since Podman uses podman save to export images in the same tar format as docker save, users can now:

# Export image with Podman
podman save myimage:latest -o image.tar

# Squash with docker-squash (no Docker daemon required)
docker-squash --input-tar image.tar --tag myimage:squashed --output-path squashed.tar

# Import back to Podman
podman load -i squashed.tar

This workflow is particularly valuable in environments where:

  • Only Podman is available (no Docker daemon)
  • Running in CI/CD pipelines with Podman
  • Working in rootless containers or restricted environments
  • Processing images offline without any container runtime

Should I add a Podman example to the documentation to highlight this use case?

    self.log.info("Detected Docker format image")
    self.oci_format = False
else:
    raise SquashError("Unable to detect image format - missing manifest files")

Collaborator

Is this duplicating v2_image::_get_manifest?

Author

You're absolutely right! There is indeed duplication with v2_image::_get_manifest. Both methods:

  1. Check for index.json to detect OCI format
  2. Set self.oci_format = True/False
  3. Handle manifest file reading

I should refactor this to reuse the existing logic. A few options:

Option 1: Extract common logic to base class

# In Image base class
def detect_image_format(self):
    if os.path.exists(os.path.join(self.old_image_dir, "index.json")):
        self.oci_format = True
        return "oci"
    elif os.path.exists(os.path.join(self.old_image_dir, "manifest.json")):
        self.oci_format = False
        return "docker"
    else:
        raise SquashError("Unable to detect image format")

Option 2: Have TarImage reuse v2_image's get_manifest

# In TarImage
def detect_image_format(self):
    try:
        # This will set self.oci_format as a side effect
        self.manifest = self.get_manifest()  # Inherit from v2_image logic
    except SquashError:
        raise SquashError("Unable to detect image format")

I lean toward Option 1, as it gives a cleaner separation of concerns. What do you think?
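
For reference, the same check can also be sketched as a standalone function that inspects the archive listing without unpacking it. This is only an illustration under my own assumptions: SquashError here is a local stand-in for the project's error type, and the function is not code from this PR:

```python
import tarfile


class SquashError(Exception):
    """Local stand-in for docker-squash's error type."""


def detect_image_format(tar_path):
    """Return "oci" or "docker" depending on which metadata file the archive holds."""
    with tarfile.open(tar_path) as tar:
        # Some tools store member names with a leading "./"; normalise them.
        names = {n[2:] if n.startswith("./") else n for n in tar.getnames()}
    # index.json is checked first, in the same order as Option 1, because an
    # archive may carry both files.
    if "index.json" in names:
        return "oci"
    if "manifest.json" in names:
        return "docker"
    raise SquashError("Unable to detect image format - missing manifest files")
```

A helper like this could live next to the shared base-class logic so that both the daemon path and the tar path call one implementation.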

self.log.info(
    "💡 Tip: Consider using --tag to specify a name for your squashed image"
)
self.log.info(" Example: --tag myimage:squashed")

Collaborator

Does a tag make sense for an output tar? It is probably only relevant if --load-image has been specified?

Author

I respectfully disagree with this assessment. The --tag parameter is meaningful for output tar files regardless of the --load-image setting. Here's why:

Tag is part of image metadata in tar format:

  • The Docker/Podman tar format stores tags in manifest.json under the RepoTags field
  • This metadata becomes part of the squashed tar file

Tag is useful in all scenarios:

  1. --load-image true: Image gets loaded with the specified tag
  2. --load-image false + --output-path: The output tar contains tag metadata, so when someone later runs docker load -i squashed.tar, the image will have the proper tag
  3. Distribution: Tagged tar files are more useful when shared with others

Without --tag, the consequences are significant:

# Without tag - image loads but has no name
$ docker load -i squashed.tar
Loaded image ID: sha256:abc123...
$ docker images
REPOSITORY TAG IMAGE ID
<none> <none> sha256:abc123... # Hard to identify!

# With tag - much more usable
$ docker load -i squashed.tar
Loaded image: myapp:squashed
$ docker images
REPOSITORY TAG IMAGE ID
myapp squashed sha256:abc123... # Clear identification

The tip message encourages good practices for tar-based workflows, not just --load-image scenarios. The tag becomes part of the portable tar artifact.
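
As a quick way to check this on any archive, the tags can be read straight from manifest.json without a daemon. This is a small sketch; read_repo_tags is a hypothetical helper name and not part of docker-squash:

```python
import json
import tarfile


def read_repo_tags(tar_path):
    """Return the RepoTags recorded in manifest.json of a 'docker save' style archive."""
    with tarfile.open(tar_path) as tar:
        # manifest.json holds a JSON list with one entry per image in the archive.
        manifest = json.load(tar.extractfile("manifest.json"))
    tags = []
    for entry in manifest:
        # RepoTags is null or absent for an untagged image.
        tags.extend(entry.get("RepoTags") or [])
    return tags
```

An empty result for squashed.tar would correspond to the untagged case shown above.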

Collaborator

rnc commented Aug 5, 2025

@lyon-v Did you wish to discuss any of the comments?

Author

vulyon commented Aug 20, 2025

My apologies for the slow response. I've been swamped with work lately, but I'll reply to or fix these issues shortly.

@vulyon vulyon requested a review from rnc August 20, 2025 08:22
@vulyon vulyon closed this by deleting the head repository Nov 20, 2025