From b470f7364eca5fabb4503354736cf34779eadcf8 Mon Sep 17 00:00:00 2001 From: Colin Walters Date: Thu, 12 Feb 2026 01:37:08 +0000 Subject: [PATCH] doc/plans: Update OCI sealing spec (kernel sigs, flattened layers) The biggest goal here is support for Linux kernel-native fsverity signatures to be attached to layers, which enables integration with IPE. Add support for a fully separate OCI "composefs signature" artifact which can be attached to an image. Drop the -impl.md doc...it's not useful to try to write this stuff in markdown. The spec has some implementation considerations, but it's easier to look at implementation side from a code draft. Add standardized-erofs-meta.md as a placeholder document outlining the goal of standardizing composefs EROFS serialization across implementations (canonical model: tar -> dumpfile -> EROFS). Assisted-by: OpenCode (Claude Opus 4.5) Signed-off-by: Colin Walters --- doc/plans/oci-sealing-impl.md | 210 ----------------- doc/plans/oci-sealing-spec.md | 332 ++++++++++++++++++++------- doc/plans/standardized-erofs-meta.md | 74 ++++++ 3 files changed, 321 insertions(+), 295 deletions(-) delete mode 100644 doc/plans/oci-sealing-impl.md create mode 100644 doc/plans/standardized-erofs-meta.md diff --git a/doc/plans/oci-sealing-impl.md b/doc/plans/oci-sealing-impl.md deleted file mode 100644 index dea523c9..00000000 --- a/doc/plans/oci-sealing-impl.md +++ /dev/null @@ -1,210 +0,0 @@ -# OCI Sealing Implementation in composefs-rs - -This document describes the implementation of OCI sealing in composefs-rs. For the generic specification applicable to any composefs implementation, see [oci-sealing-spec.md](oci-sealing-spec.md). - - - -## Current Implementation Status - -### What Exists - -The `composefs-oci` crate at `crates/composefs-oci/src/image.rs` already implements the core sealing mechanism. 
The `seal()` function computes the fsverity digest via `compute_image_id()`, creates an EROFS image from merged layers with whiteouts applied, and stores the digest in `config.labels["containers.composefs.fsverity"]`. A new config with updated labels is written via `write_config()`, returning both the SHA256 config digest and fsverity image digest. - -The implementation includes fsverity computation and verification through the `composefs` crate's fsverity module. Config label storage follows the OCI specification with digest mapping from SHA256 to fsverity maintained in split streams. Repository-level integrity verification is provided through `check_stream()` and `check_image()`. Mount operations check for the seal label and use fsverity verification when present. - -All objects in the repository are fsverity-enabled by default, with digests stored using the generic `ObjectID` type parameterized over `FsVerityHashValue`. Images are tracked separately in the `images/` directory, distinct from general objects due to the kernel security model that restricts non-root filesystem mounting. - -### Current Workflow - -The sealing workflow in composefs-rs begins with `create_filesystem()` building the filesystem from OCI layers. Layer tar streams are imported via `import_layer()`, converting them to composefs split streams. Files 64 bytes or smaller are stored inline in the split stream, while larger files are stored in the object store with fsverity digests. Layers are processed in order, applying overlayfs semantics including whiteout handling (`.wh.` files). Hardlinks are tracked properly across layers to maintain filesystem semantics. - -After building the filesystem, `compute_image_id()` generates the EROFS image and computes its fsverity digest. The digest is stored in the config label `containers.composefs.fsverity`. 
The `write_config()` function writes the new config to the repository with the digest mapping, and both the SHA256 config digest and fsverity image digest are returned. - -For mounting, the `mount()` operation requires the `containers.composefs.fsverity` label to be present. It extracts the image ID from the label and mounts at the specified path with kernel fsverity verification. - -## Repository Architecture - -The composefs-rs repository architecture at `crates/composefs/src/repository.rs` supports sealing without major changes. Objects are stored in a content-addressed layout under `objects/XX/YYY...` where `XX` is the first byte of the fsverity digest and `YYY` are the remaining 62 hex characters. All files in `objects/` must have fsverity enabled, enforced via `ensure_verity_equal()`. - -Images are tracked separately in the `images/` directory as symlinks to objects, with refs providing named references and garbage collection roots. Split streams are stored in the `streams/` directory, also as symlinks to objects. The repository has an "insecure" mode for development without fsverity filesystem support, but sealing operations should explicitly fail in this mode. - -Two-level naming allows access by fsverity digest (verified) or by ref name (unverified). The `ensure_stream()` method provides idempotent stream creation with SHA256-based deduplication. Streams can reference other streams via digest maps stored in split stream headers, enabling the layer→config relationship tracking. - -## Required Enhancements - -### Manifest Annotations - -Manifest annotations should be added to indicate sealed images and enable discovery without parsing configs. The sealing operation should add `containers.composefs.sealed` set to `"true"` and optionally `containers.composefs.image.fsverity` containing the image digest. This allows registries to discover sealed images and clients to optimize pull strategies. 
- -### Per-Layer Digest Annotations - -Per-layer digests enable incremental verification and caching. A `SealedImageInfo` structure should track the image fsverity digest, config SHA256 digest, optional config fsverity digest, and a list of layer seal information. Each `LayerSealInfo` entry should contain the original tar layer digest, the composefs fsverity of the layer, and the split stream digest in the repository. - -During sealing, layer descriptors should be annotated with `containers.composefs.layer.fsverity` after processing each layer. This allows verification of individual layers before merging and enables caching where shared layers have known composefs digests. - -### Verification API - -A standalone verification API separate from mounting should be implemented. The verification function should check manifest annotations for the seal flag, fetch and verify the config against the manifest's config descriptor, extract the fsverity digest from the config label, verify annotated layers if present, and optionally verify the image exists in the repository. - -This enables verification before mounting and provides detailed seal information without building the filesystem. The returned `SealedImageInfo` structure contains all digest relationships and layer details. - -### Pull Integration - -The `pull()` function in `crates/composefs-oci/src/image.rs` should be enhanced to handle sealed images. When a verify_seal flag is enabled, the pull operation should check manifest annotations for the sealed flag and verify the seal during pull if present. If the image is sealed and verification passes, some integrity checks can be skipped since the composefs digests are trusted. - -An optimization is that sealed images don't require re-computing digests during import if verification already passed. The pull result should include optional seal information alongside the manifest and config. 
- -### Push Integration - -Support for pushing sealed images back to registries requires preserving seal annotations through the registry round-trip. The push operation should construct the manifest with seal annotations, push the config with the composefs label, push layers optionally with layer annotations, and push the manifest with seal annotations. - -The challenge is maintaining digest mappings through the registry round-trip, as registries may re-compress or re-package layers while preserving content digests. - -### Insecure Mode Handling - -Repository sealing operations should explicitly fail when the repository is in insecure mode. The rationale is that if the repository doesn't enforce fsverity, sealing provides no security benefit. The check should be performed at the beginning of seal operations, returning an error if `repo.is_insecure()` is true. - -## Implementation Phases - -### Phase 1: Core Sealing (Completed) - -Phase 1 is complete with basic `seal()` implementation in `composefs-oci`, fsverity computation and storage, config label with digest, and mount with seal verification. - -### Phase 2: Manifest Annotations (Planned) - -Phase 2 will add manifest annotation support to `seal()`, create the `SealedImageInfo` type, implement the `verify_seal()` API, document the label/annotation schema, and add tests for sealed image workflows. - -Deliverables include `seal()` emitting manifests with annotations, standalone verification without mounting, and updated documentation in `doc/oci.md`. - -### Phase 3: Per-Layer Digests (Planned) - -Phase 3 will record per-layer fsverity during sealing, add layer annotations to manifests, implement incremental verification, and optimize pull for sealed images. - -Deliverables include full `SealedImageInfo` with layer details, layer-by-layer verification API, and performance improvements for sealed pulls. 
- -### Phase 4: Push/Registry Integration (Planned) - -Phase 4 will implement push support for sealed images, preserve annotations through registry round-trip, test with standard OCI registries, and document registry compatibility. - -Deliverables include bidirectional registry support, a registry compatibility matrix, and integration tests with real registries. - -### Phase 5: Advanced Features (Future) - -Future work includes dumpfile digest support, eager/lazy verification modes, zstd:chunked integration, the three-digest model, and signature integration. - -## API Design Considerations - -### Type Safety - -The generic `ObjectID` type parameterized over `FsVerityHashValue` provides type safety for digest handling. Both `Sha256HashValue` and `Sha512HashValue` implement the `FsVerityHashValue` trait with hex encoding/decoding, object pathname format, and algorithm ID constants. - -### Async/Await - -Operations like `seal()` and `pull()` are async to support parallel layer fetching with semaphore-based concurrency control. The repository is wrapped in `Arc` to enable sharing across async contexts. - -### Error Handling - -The codebase uses `anyhow::Result` for error handling with context. Seal operations should provide clear error messages distinguishing between fsverity failures, missing labels, and repository integrity issues. - -### Verification Modes - -Supporting both eager and lazy verification requires a configuration option, potentially as an enum `SealVerificationMode` with variants `Eager`, `Lazy`, and `Never`. Different defaults may apply for user versus system repositories. - -## Integration Points - -### Split Streams - -Split streams at `crates/composefs/src/splitstream.rs` are the intermediate format between OCI tar layers and composefs EROFS images. They contain inline data for small files and references to objects for large files. Split stream headers include digest maps linking SHA256 layer digests to fsverity digests. 
- -Per-layer sealing should leverage split streams to maintain the digest mapping. The split stream format doesn't need changes but seal metadata should reference split stream digests. - -### EROFS Generation - -EROFS image generation via `mkfs_erofs()` in `crates/composefs/src/erofs/` creates reproducible images from filesystem trees. The EROFS writer handles inline data, shared data, and metadata blocks with deterministic layout. The same input filesystem produces the same EROFS digest. - -Sealing relies on this determinism for verification. The EROFS format version may evolve, which is why dumpfile digests are being considered as a format-agnostic alternative. - -### Fsverity Module - -The fsverity module at `crates/composefs/src/fsverity/` provides userspace computation matching kernel behavior and ioctl wrappers for kernel interaction. Digest computation uses a hardcoded 4096-byte block size with no salt support, matching kernel fs-verity defaults. - -Sealing uses `compute_verity()` for userspace digest computation during EROFS generation and `enable_verity_maybe_copy()` to handle ETXTBSY by copying files if needed. Verification uses `measure_verity()` to get kernel-measured digests and `ensure_verity_equal()` to compare against expected values. - -## Open Implementation Questions - -### Config Annotation Method - -The current code calls `config.get_config_annotation()` which actually reads from labels, not annotations. This naming suggests potential confusion between OCI label and annotation semantics. Clarification is needed whether storing in labels is intentional or if annotations should be used for the digest. - -### Sealed Config Mutability - -Sealing modifies config content by adding the label, creating a new SHA256 for the config and breaking existing references to the old config digest. This may be acceptable since the sealed config is a new artifact, but it needs clear documentation about the relationship between sealed and unsealed images. 
- -### Performance at Scale - -Computing fsverity for large images is expensive as `compute_image_id()` builds the entire EROFS in memory. Streaming approaches or caching strategies should be considered for multi-GB images. The EROFS writer could be enhanced to support streaming output with incremental digest computation. - -### Seal Metadata Persistence - -Optionally persisting `SealedImageInfo` as `.seal.json` alongside images in the repository could enable faster seal information retrieval without re-parsing configs. This metadata cache would need invalidation strategies and shouldn't be security-critical. - -### Repository Ref Strategy - -Sealed images have different config digests than unsealed images. The ref strategy for managing variants should avoid keeping both sealed and unsealed versions indefinitely. Garbage collection should understand the relationship between sealed and unsealed images, potentially tracking seal derivation relationships. - -## Testing Strategy - -Testing should cover sealing unsealed images and verifying the config label is added correctly with the expected fsverity digest. Mounting sealed images should verify that fsverity is checked by the kernel. Verification API tests should check correct extraction of seal information from manifest and config. - -Per-layer annotation tests should verify layer digests are computed and annotated correctly. Pull integration tests should verify detection and verification of sealed images during pull. Push integration tests should verify seal metadata is preserved through registry round-trip. - -Negative tests should verify that seal operations fail in insecure mode, mounting fails with incorrect fsverity digest, and verification fails with missing or incorrect labels. - -Performance tests should measure sealing time for various image sizes and verify parallel layer processing performance. 
- -## Compatibility Considerations - -### OCI Registry Compatibility - -Standard OCI registries should store and serve sealed images without special handling. Unknown labels and annotations are preserved by spec-compliant registries. Testing should verify round-trip through common registries like Docker Hub, Quay, and GitHub Container Registry. - -### Existing Composefs-rs Versions - -The seal format version label enables detection of format changes. Forward compatibility means newer implementations can read older seals. Backward compatibility means older implementations should gracefully ignore newer seal formats they don't understand. - -### C Composefs Compatibility - -While composefs-rs aims to become the reference implementation, compatibility with the C composefs implementation should be maintained where feasible. EROFS images and dumpfiles should be interchangeable. Digest computation must match exactly between implementations. - -## Future Implementation Work - -### Dumpfile Digest Support - -Supporting dumpfile digests requires adding `containers.composefs.dumpfile.sha256` label computation during sealing. Verification should support parsing EROFS back to dumpfile format and verifying the digest. Caching the dumpfile→fsverity mapping requires careful security consideration to avoid cache poisoning. - -### zstd:chunked Integration - -Integration with zstd:chunked requires reading and writing TOC metadata with fsverity digests added to entries. The TOC format from the estargz/stargz-snapshotter projects would need extension for fsverity. Direct TOC→dumpfile conversion would enable unified metadata handling. - -### Non-Root Mounting Helper - -A separate composefs-mount-helper service would accept dumpfiles from unprivileged users, generate EROFS images, validate fsverity, and return mount file descriptors. This requires privileged service implementation with careful input validation on the dumpfile format. 
- -### Signature Integration - -Integrating with cosign or sigstore requires fetching and verifying signatures during pull, associating signatures with sealed images in the repository, and potentially storing signature references in seal metadata. The signature verification should happen before seal verification in the trust chain. - -## References - -See [oci-sealing-spec.md](oci-sealing-spec.md) for the generic specification and complete reference list. - -**Implementation references**: -- `crates/composefs-oci/src/image.rs` - OCI image operations including seal() -- `crates/composefs/src/repository.rs` - Repository management -- `crates/composefs/src/fsverity/` - Fsverity computation and verification -- `crates/composefs/src/splitstream.rs` - Split stream format -- `crates/composefs/src/erofs/` - EROFS generation - -**Related composefs-rs issues**: -- Check for existing issues about OCI sealing enhancements -- File new issues for specific implementation work items diff --git a/doc/plans/oci-sealing-spec.md b/doc/plans/oci-sealing-spec.md index 98d000bf..bbe7f2d0 100644 --- a/doc/plans/oci-sealing-spec.md +++ b/doc/plans/oci-sealing-spec.md @@ -8,164 +8,331 @@ Container images need cryptographic verification that efficiently covers the ent Hence verifying the integrity of an individual file would require re-synthesizing the entire tarball (using tar-split or equivalent) and computing its digest. +## Related projects + +- **[containerd EROFS snapshotter](https://github.com/containerd/containerd/blob/main/docs/snapshotters/erofs.md)**: Converts OCI layers to EROFS blobs with optional fsverity protection. Supports `enable_fsverity = true` to enable fs-verity on layer blobs. Uses reproducible builds with erofs-utils 1.8+ (`-T0 --mkfs-time`). dm-verity integration is planned but not yet implemented. + ## Solution The core primitive of composefs is fsverity, which allows incremental online verification of individual files. 
The complete filesystem tree metadata is itself stored as a file which can be verified in the same way. The critical design question is how to embed the composefs digest within OCI image metadata such that external signatures can efficiently cover the entire filesystem tree. -## Design Goals +## Core Design -The OCI sealing specification aims to provide efficient verification where a signature on an OCI manifest cryptographically covers the entire filesystem tree without re-hashing content. The specification defines standardized metadata locations for composefs digests and supports future format evolution without breaking existing images. +"composefs digest" here means the fsverity digest of the EROFS metadata file. The EROFS generation must be reproducible: given identical input filesystem trees, implementations must produce byte-for-byte identical EROFS images. See [standardized-erofs-meta.md](standardized-erofs-meta.md) for the goals and open questions around EROFS serialization standardization. Note that fsverity is parameterized by both the hash algorithm (currently SHA-256 or SHA-512) and the block size (e.g. 4k or 64k), and the resulting digest depends on both. -Incremental verification must be supported, enabling verification of individual layers or the complete flattened filesystem. The design accommodates both registry-provided sealed images and client-side sealing workflows while maintaining backward compatibility with existing OCI tooling and registries. +As a standardized short form for this combination, a string of the form `fsverity-${DIGEST}-${BLOCKSIZEBITS}` is used, where `BLOCKSIZEBITS` is log2 of the block size. The `fsverity-` prefix makes clear this is an fsverity Merkle tree digest, not a simple hash: -## Core Design +- `fsverity-sha256-12` (SHA-256, 4k block size, 2^12) +- `fsverity-sha512-12` (SHA-512, 4k block size) +- `fsverity-sha256-16` (SHA-256, 64k block size, 2^16) +- `fsverity-sha512-16` (SHA-512, 64k block size) -### Composefs Digest Storage +Digests are encoded as lowercase hexadecimal.
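The helpers below are a minimal sketch of converting between the `fsverity-${DIGEST}-${BLOCKSIZEBITS}` identifier and its parameters. They also check the lowercase-hex encoding of a digest. The function names are hypothetical, not part of any composefs API. Only the two hash algorithms named above are accepted.

```python
import re

# Digest algorithms named in the spec, mapped to their output size in bytes.
DIGEST_SIZES = {"sha256": 32, "sha512": 64}

_ALG_RE = re.compile(r"^fsverity-(sha256|sha512)-(\d+)$")


def format_algorithm(digest: str, block_size: int) -> str:
    """Build the short form, e.g. ("sha512", 4096) -> "fsverity-sha512-12"."""
    if digest not in DIGEST_SIZES:
        raise ValueError(f"unsupported digest: {digest}")
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError(f"block size must be a power of two: {block_size}")
    # log2 of a power of two is its bit length minus one.
    return f"fsverity-{digest}-{block_size.bit_length() - 1}"


def parse_algorithm(name: str) -> tuple[str, int]:
    """Inverse of format_algorithm: returns (digest, block size in bytes)."""
    m = _ALG_RE.match(name)
    if m is None:
        raise ValueError(f"not an fsverity algorithm identifier: {name}")
    return m.group(1), 1 << int(m.group(2))


def check_hex_digest(name: str, value: str) -> None:
    """Raise ValueError unless `value` is lowercase hex of the right length."""
    digest, _ = parse_algorithm(name)
    expected = 2 * DIGEST_SIZES[digest]
    if len(value) != expected or not re.fullmatch(r"[0-9a-f]+", value):
        raise ValueError(f"{name} digests must be {expected} lowercase hex characters")
```

Encoding the block size as log2 keeps the identifier short and rules out non-power-of-two sizes by construction.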
-The composefs fsverity digest is stored as a label in the OCI image config: +### Recommended default algorithm -```json -{ - "config": { - "Labels": { - "containers.composefs.fsverity": "sha256:a3b2c1d4e5f6..." - } - } -} -``` +The suggested default is `fsverity-sha512-12`. The 4k block size maximizes compatibility, as +not every system has a page size large enough for larger fsverity block sizes, and SHA-512 maximizes security (there are +post-quantum crypto arguments against SHA-256). -The config represents the container's identity rather than transport metadata. Manifests are transport artifacts that can vary across different distribution mechanisms. Adding the composefs label creates a new config and thus a new manifest, establishing the sealed image as a distinct artifact. This means sealing an image produces a new image with a different config digest, where the original unsealed image and sealed image coexist as separate artifacts that registries treat as distinct versions. +### Composefs Digest Storage -### Digest Type +Composefs digests can be stored in two locations: -The primary digest is the fs-verity digest of the EROFS image containing the merged, flattened filesystem. This digest provides fast verification at mount time through kernel fs-verity checks and is deterministic: the same input layers always produce the same EROFS digest. The digest covers the complete filesystem tree including all metadata such as permissions, timestamps, and extended attributes. +1. **Signature artifact** (primary): As annotations on the signature artifact layers, alongside the PKCS#7 signatures. This is the recommended approach because it allows signing existing unmodified OCI images: the original manifest is never touched. -### Merged Filesystem Representation +2. **Manifest annotations** (optional): As annotations on the image manifest layers. This is a convenience for tools that want to verify composefs digests without fetching a separate artifact. When both are present, they MUST agree.
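The rule that both storage locations MUST agree can be sketched as below. The sketch assumes the annotation keys from this spec's manifest examples:

- `composefs.layer.<algorithm>` on each layer;
- `composefs.merged.<algorithm>` on the final layer only.

The helper names are hypothetical. Extracting digests from the signature artifact is out of scope here; the caller is assumed to pass them in manifest layer order.

```python
from typing import Optional


def manifest_layer_digests(manifest: dict, algorithm: str) -> list[Optional[str]]:
    """Per-layer composefs digests from manifest annotations (None if absent)."""
    key = f"composefs.layer.{algorithm}"
    return [(layer.get("annotations") or {}).get(key) for layer in manifest["layers"]]


def check_agreement(
    manifest: dict,
    algorithm: str,
    signed_layers: list[str],
    signed_merged: Optional[str] = None,
) -> None:
    """Raise ValueError if manifest annotations contradict the signature artifact.

    `signed_layers` holds the per-layer digests taken from the signature
    artifact, assumed to be in manifest layer order.
    """
    annotated = manifest_layer_digests(manifest, algorithm)
    if len(annotated) != len(signed_layers):
        raise ValueError("layer count differs between manifest and signature artifact")
    for index, (from_manifest, from_artifact) in enumerate(zip(annotated, signed_layers)):
        # Manifest annotations are optional, so only a present value can conflict.
        if from_manifest is not None and from_manifest != from_artifact:
            raise ValueError(f"layer {index}: composefs digests disagree")
    if signed_merged is not None and manifest["layers"]:
        # The merged digest, if annotated, lives on the final layer only.
        final = manifest["layers"][-1].get("annotations") or {}
        merged = final.get(f"composefs.merged.{algorithm}")
        if merged is not None and merged != signed_merged:
            raise ValueError("merged composefs digests disagree")
```

A missing manifest annotation is treated as "no claim" rather than a mismatch, because the manifest location is optional.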
-The config label contains the digest of the merged, flattened filesystem. This represents the final filesystem state after extracting all layers in order, applying whiteouts (`.wh.` files), merging directories where the most-derived layer wins for metadata, and building the final composefs EROFS image. +When using manifest annotations, in [the manifest](https://github.com/opencontainers/image-spec/blob/main/manifest.md), +each layer may have an annotation with a composefs digest. -### Per-Layer Digests (Future Extension) +```json +{ + "layers": [ + { + "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", + "digest": "sha256:9834876dcfb05cb167a5c24953eba58c4ac89b1adf57f28f2f9d09af107ee8f0", + "size": 32654, + "annotations": { + "composefs.layer.fsverity-sha512-12": "3abb6677af34ac57c0ca5828fd94f9d886c26ce59a8ce60ecf6778079423dccff1d6f19cb655805d56098e6d38a1a710dee59523eed7511e5a9e4b8ccb3a4686" + } + }, + { + "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", + "digest": "sha256:3c3a4604a545cdc127456d94e421cd355bca5b528f4a9c1905b15da2eb4a4c6b", + "size": 16724, + "annotations": { + "composefs.layer.fsverity-sha512-12": "63e22ec2fbeebabf005e58fbfb0eee607c4aa417045a68a0cc63767b048e3559268d35e72f367d3b2dbd5dbddf12fc4397762ba149260b3795a0391713bddcd7" + } + }, + { + "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", + "digest": "sha256:ec4b8955958665577945c89419d1af06b5f7636b4ac3da7f12184802ad867736", + "size": 73109, + "annotations": { + "composefs.layer.fsverity-sha512-12": "2b59d179d9815994f687383a886ea34109889756efca5ab27318cc67ce2a21261d12fa6fee6b8c716f72214ead55ee0d789d6c35cff977d40ef5728ba9188a80" + } + } + ] +} +``` -Per-layer composefs digests may be added as manifest annotations: +Additionally, an optional merged digest may be provided on the **final layer only**, representing the *flattened* merged filesystem tree of the complete stack of all layers. 
The rationale is that it makes it easier for a runtime to avoid the overhead of individual mounts if it chooses to do so. This is especially suitable for e.g. a "base image" whose stack of mounts would commonly be shared with higher level applications. ```json { - "manifests": [ + "layers": [ { - "layers": [ - { - "digest": "sha256:...", - "annotations": { - "containers.composefs.layer.fsverity": "sha256:..." - } - } - ] + "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", + "digest": "sha256:9834876dcfb05cb167a5c24953eba58c4ac89b1adf57f28f2f9d09af107ee8f0", + "size": 32654, + "annotations": { + "composefs.layer.fsverity-sha512-12": "3abb6677af34ac57c0ca5828fd94f9d886c26ce59a8ce60ecf6778079423dccff1d6f19cb655805d56098e6d38a1a710dee59523eed7511e5a9e4b8ccb3a4686" + } + }, + { + "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", + "digest": "sha256:3c3a4604a545cdc127456d94e421cd355bca5b528f4a9c1905b15da2eb4a4c6b", + "size": 16724, + "annotations": { + "composefs.layer.fsverity-sha512-12": "63e22ec2fbeebabf005e58fbfb0eee607c4aa417045a68a0cc63767b048e3559268d35e72f367d3b2dbd5dbddf12fc4397762ba149260b3795a0391713bddcd7", + "composefs.merged.fsverity-sha512-12": "d015f70f8bee6cf6453dd5b771eec18994b861c646cec18e2a9dfdec93f631fbb9030e60cfc82b552d33b9a134312a876ef4e519bffe3ef872aefbd84e6198b3" + } } ] } ``` -Per-layer digests enable incremental verification during pull, create caching opportunities where shared layers have known composefs digests, and enable runtime choice between flattened versus layered mounting strategies. +Note: The `composefs.merged.fsverity-sha512-12` annotation appears only on the final layer and represents the complete flattened filesystem of all layers merged together. + +#### Whiteout Handling in Merged Filesystem + +The merged EROFS represents a fully flattened filesystem and is designed to be mounted directly, not stacked with other EROFS layers via overlayfs. 
During the merge process, OCI whiteouts (`.wh.*` files and opaque directory markers) are fully processed: files and directories marked for deletion in upper layers are removed from the merged result. The final merged EROFS contains no whiteout entries — it is a clean, whiteout-free snapshot of the complete filesystem tree as it would appear after all layers are applied. + +### Signatures + +#### Linux kernel fsverity signatures (recommended) + +The primary signature mechanism is Linux kernel [fsverity built-in signature verification](https://docs.kernel.org/filesystems/fsverity.html#built-in-signature-verification). The kernel's `FS_IOC_ENABLE_VERITY` ioctl accepts a PKCS#7 signature that is verified against the `.fs-verity` keyring. This provides a clear chain of trust: the same component that controls data access (the kernel) also validates the signature. The kernel additionally integrates with the [IPE](https://docs.kernel.org/admin-guide/LSM/ipe.html) (Integrity Policy Enforcement) subsystem. + +The recommended delivery mechanism for these signatures is a separate OCI artifact using the Referrer pattern, described below. This enables signing existing unmodified OCI images. -### Trust Chain +Signatures MAY also be embedded as manifest annotations using a `.signature` suffix on digest annotations (e.g. `composefs.layer.fsverity-sha512-12.signature` with base64-encoded PKCS#7), though this requires modifying the image manifest. -The trust chain for composefs-verified OCI images flows from external signatures through the manifest to the complete filesystem: +#### Digest-only verification (alternative) + +Kernel-based signing is not required. An implementation may instead rely on external trust in the composefs digests themselves — for example, by trusting the OCI manifest (verified via cosign/sigstore/GPG) and treating the composefs digest annotations as authoritative. 
In this model: ``` External signature (cosign/sigstore/GPG) ↓ signs -OCI Manifest (includes config descriptor) - ↓ digest reference -OCI Config (includes containers.composefs.fsverity label) - ↓ fsverity digest -Composefs EROFS image - ↓ contains -Complete merged filesystem tree +OCI Manifest (includes composefs digest annotations) + ↓ +Composefs EROFS image (verified against digest) + ↓ +Complete filesystem tree ``` -## Verification Process +The userspace tooling performing this verification must be trusted. A key benefit of composefs is that verification of large data is on-demand and continuous via the kernel's fsverity — the composefs digest covers the complete filesystem tree, so verifying it is cheap even though the underlying data may be large. -Verification begins by fetching the manifest from the registry and verifying the external signature on the manifest. The config descriptor is extracted from the manifest, and the config is fetched and verified to match the descriptor digest. The `containers.composefs.fsverity` label is extracted from the config, and the composefs image is mounted with fsverity verification. The kernel verifies the EROFS matches the expected fsverity digest. +#### Replacing diff_id validation -The security property is that signature verification happens once, while filesystem verification is delegated to kernel fs-verity with lazy or eager verification depending on mount options. +The OCI image specification requires a `diff_id` in the [image config](https://github.com/opencontainers/image-spec/blob/main/config.md) for each layer, which is the digest of the uncompressed tar stream. This is expensive to validate after extraction and provides no path to continual kernel-enforced verification. With composefs, validating `diff_id` becomes redundant: the composefs digest already cryptographically covers the complete filesystem tree derived from the layer. 
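In the digest-only model, the trusted tooling must compute fsverity digests in userspace to compare against the annotations. The sketch below follows the Merkle tree construction and `fsverity_descriptor` layout described in the kernel's fs-verity documentation. It assumes no salt, as composefs-rs currently does, and reads the whole file into memory. It is illustrative rather than the composefs-rs implementation, and all function names are hypothetical.

```python
import hashlib
import struct

# Kernel hash algorithm numbers: FS_VERITY_HASH_ALG_SHA256 = 1, _SHA512 = 2.
HASH_ALG_IDS = {"sha256": 1, "sha512": 2}


def _hash_blocks(buf: bytes, digest: str, block_size: int) -> list[bytes]:
    """Split `buf` into zero-padded blocks and hash each one (no salt)."""
    hashes = []
    for off in range(0, len(buf), block_size):
        block = buf[off:off + block_size].ljust(block_size, b"\0")
        hashes.append(hashlib.new(digest, block).digest())
    return hashes


def merkle_root(data: bytes, digest: str = "sha512", block_size: int = 4096) -> bytes:
    """Merkle tree root hash as described in the kernel fs-verity docs."""
    if not data:
        # Empty file: the root hash is all zeroes.
        return bytes(hashlib.new(digest).digest_size)
    hashes = _hash_blocks(data, digest, block_size)
    # Pack each level's hashes into blocks and hash again until one remains.
    # A single-block file therefore has the data block's hash as its root.
    while len(hashes) > 1:
        hashes = _hash_blocks(b"".join(hashes), digest, block_size)
    return hashes[0]


def build_descriptor(data_size: int, root: bytes, digest: str, block_size: int) -> bytes:
    """Serialize the 256-byte struct fsverity_descriptor (little-endian)."""
    return struct.pack(
        "<BBBBIQ64s32s144s",
        1,                            # version
        HASH_ALG_IDS[digest],         # hash_algorithm
        block_size.bit_length() - 1,  # log_blocksize
        0,                            # salt_size: no salt
        0,                            # __reserved_0x04
        data_size,                    # data_size
        root,                         # root_hash, zero-padded to 64 bytes
        b"",                          # salt, all zeroes
        b"",                          # __reserved, all zeroes
    )


def fsverity_digest(data: bytes, digest: str = "sha512", block_size: int = 4096) -> bytes:
    """The file digest is the hash of the fsverity_descriptor."""
    root = merkle_root(data, digest, block_size)
    return hashlib.new(digest, build_descriptor(len(data), root, digest, block_size)).digest()


def matches_annotation(data: bytes, annotation: str, digest: str = "sha512",
                       block_size: int = 4096) -> bool:
    """Compare against a lowercase-hex composefs digest annotation."""
    return fsverity_digest(data, digest, block_size).hex() == annotation
```

A production verifier would stream the file rather than load it. For a file that already has fsverity enabled, the `FS_IOC_MEASURE_VERITY` ioctl returns the kernel-computed digest without recomputing it in userspace.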
-## Metadata Schema +#### Separate Signing Artifacts with Referrer Support -### Config Labels +Composefs fsverity signatures can be stored as separate OCI artifacts, discoverable via the OCI referrer pattern. This follows the same approach as cosign: the signature artifact references the sealed image through the `subject` field and can be found via the `/referrers` API. -The image config contains the following labels: +Each layer in the signature artifact is a raw PKCS#7 DER-encoded signature blob — exactly the format expected by `FS_IOC_ENABLE_VERITY`. No JSON wrapping or base64 encoding. -The `containers.composefs.fsverity` label (string) contains the fsverity digest of the merged composefs EROFS in the format `:` where algorithm is `sha256` or `sha512`. +##### Signature Artifact Structure -The `containers.composefs.version` label (string, optional) contains the seal format version such as `1.0`. +The signature artifact is an OCI image manifest following the [artifacts guidance](https://github.com/opencontainers/image-spec/blob/main/artifacts-guidance.md) pattern (empty config, content in layers): -### Descriptor Annotations +```json +{ + "schemaVersion": 2, + "mediaType": "application/vnd.oci.image.manifest.v1+json", + "artifactType": "application/vnd.composefs.signature.v1", + "config": { + "mediaType": "application/vnd.oci.empty.v1+json", + "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a", + "size": 2 + }, + "layers": [ + { + "mediaType": "application/vnd.composefs.signature.v1+pkcs7", + "digest": "sha256:aaa...", + "size": 456, + "annotations": { + "composefs.signature.type": "manifest", + "composefs.digest": "ab12...manifest-fsverity-digest..." + } + }, + { + "mediaType": "application/vnd.composefs.signature.v1+pkcs7", + "digest": "sha256:bbb...", + "size": 789, + "annotations": { + "composefs.signature.type": "config", + "composefs.digest": "cd34...config-fsverity-digest..." 
+      }
+    },
+    {
+      "mediaType": "application/vnd.composefs.signature.v1+pkcs7",
+      "digest": "sha256:ccc...",
+      "size": 1234,
+      "annotations": {
+        "composefs.signature.type": "layer",
+        "composefs.digest": "3abb6677af34ac57...layer-1-composefs-digest..."
+      }
+    },
+    {
+      "mediaType": "application/vnd.composefs.signature.v1+pkcs7",
+      "digest": "sha256:ddd...",
+      "size": 1234,
+      "annotations": {
+        "composefs.signature.type": "layer",
+        "composefs.digest": "63e22ec2fbeeba...layer-2-composefs-digest..."
+      }
+    },
+    {
+      "mediaType": "application/vnd.composefs.signature.v1+pkcs7",
+      "digest": "sha256:eee...",
+      "size": 1234,
+      "annotations": {
+        "composefs.signature.type": "merged",
+        "composefs.digest": "d015f70f8bee6c...merged-composefs-digest..."
+      }
+    }
+  ],
+  "subject": {
+    "mediaType": "application/vnd.oci.image.manifest.v1+json",
+    "digest": "sha256:5b0bcabd1ed22e9fb1310cf6c2dec7cdef19f0ad69efa1f392e94a4333501270",
+    "size": 7682
+  },
+  "annotations": {
+    "composefs.algorithm": "fsverity-sha512-12"
+  }
+}
+```
 
-A descriptor may have the following annotation:
+##### Layer Ordering
 
-The `containers.composefs.layer.fsverity` annotation (string, optional) contains the fsverity digest of that individual layer.
+Each layer carries two annotations: `composefs.signature.type` identifies the group, and `composefs.digest` carries the fsverity digest that the PKCS#7 blob signs. This makes the artifact self-contained: a consumer can verify composefs digests using only the signature artifact and the image layers, without requiring composefs annotations on the original image manifest.
 
-### Label versus Annotation Semantics
+The layers MUST appear in this order:
 
-Config labels store the authoritative digest because the config represents container identity while the manifest is a transport artifact. Labels are part of the container specification and create a new artifact (sealed image) rather than mutating metadata. Manifest annotations are retained for discovery purposes, allowing registries to identify sealed images without parsing configs and enabling clients to optimize pull strategies.
+1. One `type: "manifest"` entry: the signature for the sealed image manifest, which is stored locally as a file with fsverity enabled
+2. One `type: "config"` entry: the signature for the image config, which is likewise stored as a file with fsverity enabled
+3. N `type: "layer"` entries: one per manifest layer, in manifest order. Each signature is applied to the corresponding EROFS blob via `FS_IOC_ENABLE_VERITY`.
+4. Zero or one `type: "merged"` entry: if present, the signature for the merged digest on the final layer, representing the complete flattened filesystem
 
-## Verification Modes
+Position within each group determines which source object is signed. The number of `layer` entries MUST equal the number of layers in the source manifest.
 
-### Eager Verification
+This design enables signing existing, unmodified OCI images: compute the composefs digest for each layer, sign the digests, and push the signature artifact as a referrer. The original image is never modified.
 
-Eager verification occurs during image pull. The composefs image is immediately created and its digest is verified against the config label. This makes the container ready to mount immediately after pull and is suitable for boot scenarios where operations should be read-only.
+##### Signature Format
 
-### Lazy Verification
+Each layer blob is a raw PKCS#7 signature, encoded using [DER](https://en.wikipedia.org/wiki/X.690#DER_encoding) (Distinguished Encoding Rules, ITU-T X.690), over the kernel's `fsverity_formatted_digest`:
 
-Lazy verification defers composefs creation until first mount. The pull operation stores layers and config but doesn't build the composefs image. On mount, the composefs image is built and verified against the label. This mode is suitable for application containers where many images may be pulled but only some are actually used.
+```c
+struct fsverity_formatted_digest {
+    char magic[8];          /* "FSVerity" */
+    __le16 digest_algorithm;
+    __le16 digest_size;
+    __u8 digest[];
+};
+```
 
-## Security Model
+Composefs algorithm identifiers map to kernel constants with no salt:
+- `fsverity-sha512-12` → `FS_VERITY_HASH_ALG_SHA512`, 4096-byte blocks
+- `fsverity-sha256-12` → `FS_VERITY_HASH_ALG_SHA256`, 4096-byte blocks
+- `fsverity-sha512-16` → `FS_VERITY_HASH_ALG_SHA512`, 65536-byte blocks
+- `fsverity-sha256-16` → `FS_VERITY_HASH_ALG_SHA256`, 65536-byte blocks
 
-### Registry-Provided Sealed Images
+All entries in a single signature artifact MUST use the same algorithm, which is declared in the `composefs.algorithm` annotation on the signature artifact manifest (e.g. `fsverity-sha512-12`).
 
-For images sealed by the registry or vendor, the seal is computed during the build process and the seal label is embedded in the published config. An external signature covers the manifest. Clients verify the chain: signature → manifest → config → composefs. Trust is placed in the image producer and the signature key.
+For manifest and config signatures, the fsverity digest is computed over the exact JSON bytes as stored in the registry. These files are stored locally with fsverity enabled so that reads are kernel-verified.
 
-### Client-Sealed Images
+Discovery uses the standard [OCI Distribution Spec referrers API](https://github.com/opencontainers/distribution-spec/blob/main/spec.md#listing-referrers):
 
-For images sealed locally by the client, the client pulls an image that may be unsigned and computes the seal locally. The client stores the sealed config in its local repository. On boot or mount, the client can re-fetch the manifest from the network to verify freshness. Trust is placed in the network fetch (TLS) and local verification.
+```
+GET /v2//referrers/?artifactType=application/vnd.composefs.signature.v1
+```
+
+Verification:
+1. Check that `subject` matches the sealed image manifest digest
+2. Extract layers in order and match them to source objects by position
+3. Read the `composefs.digest` annotation from each layer to learn the expected fsverity digest
+4. For each signature, pass it to `FS_IOC_ENABLE_VERITY` when enabling verity on the corresponding file (manifest JSON, config JSON, or EROFS layer blob)
+5. The kernel performs PKCS#7 validation; if it fails, `FS_IOC_ENABLE_VERITY` returns an error and the file must not be treated as verified
+6. If the source manifest also has composefs digest annotations, verify that they match the artifact's `composefs.digest` values
 
-## Attack Mitigation
+```
+External CA/Keystore
+    ↓ issues certificate for .fs-verity keyring
+PKCS#7 signatures (from artifact layers)
+    ↓ applied via FS_IOC_ENABLE_VERITY to each file
+Manifest JSON, Config JSON, EROFS layer blobs
+    ↓ kernel fsverity enforcement on every read
+Runtime file access
+```
 
-### Digest Mismatch
+##### Implementation Considerations
 
-If a config label doesn't match the actual EROFS, the mount operation fails the fsverity check. Verification APIs can detect this condition before mounting.
+This specification depends on Linux kernel fsverity (`CONFIG_FS_VERITY`, `CONFIG_FS_VERITY_BUILTIN_SIGNATURES`). Signature validation and file access enforcement are handled by the kernel.
 
-### Signature Bypass
+Manifest and config objects should be stored as regular files (not splitstream) so that fsverity can be enabled on them directly.
 
-Any attempt to modify the config label without updating the signature fails because the signature covers the manifest, which covers the config digest. Any config change produces a new digest, breaking the signature chain.
+Not all signature types are required. Implementations MAY omit entire groups (e.g. no manifest/config signatures, or no merged signatures). When a group is omitted, its entries are simply absent from the layers array and the relative ordering of the remaining groups is preserved. The `layer` group MUST always be present.
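The group ordering rules, the algorithm identifier mapping, and the `fsverity_formatted_digest` layout in this section are mechanical enough to check before anything is handed to the kernel. The C sketch below is illustrative only and is not taken from any composefs implementation. All `cfs_*` names are hypothetical. The numeric hash identifiers are assumed to match `FS_VERITY_HASH_ALG_SHA256` (1) and `FS_VERITY_HASH_ALG_SHA512` (2) from `<linux/fsverity.h>`. The `composefs.signature.type` annotations are assumed to have already been parsed into an enum. The sketch neither calls `FS_IOC_ENABLE_VERITY` nor produces PKCS#7 signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same numeric values as FS_VERITY_HASH_ALG_SHA256/SHA512 in <linux/fsverity.h>. */
enum { CFS_HASH_SHA256 = 1, CFS_HASH_SHA512 = 2 };

/* Hypothetical parsed form of a composefs algorithm identifier. */
struct cfs_algorithm {
    uint16_t hash_algorithm; /* value for fsverity_enable_arg.hash_algorithm */
    uint16_t digest_size;    /* digest length in bytes */
    uint32_t block_size;     /* value for fsverity_enable_arg.block_size */
};

/* Map "fsverity-<sha256|sha512>-<12|16>" to kernel parameters (salt is always
 * empty). The numeric suffix is log2 of the Merkle tree block size. */
static bool cfs_parse_algorithm(const char *id, struct cfs_algorithm *out)
{
    struct cfs_algorithm alg;

    if (strncmp(id, "fsverity-", 9) != 0)
        return false;
    id += 9;
    if (strncmp(id, "sha256-", 7) == 0)
        alg = (struct cfs_algorithm){CFS_HASH_SHA256, 32, 0};
    else if (strncmp(id, "sha512-", 7) == 0)
        alg = (struct cfs_algorithm){CFS_HASH_SHA512, 64, 0};
    else
        return false;
    id += 7;
    if (strcmp(id, "12") == 0)
        alg.block_size = 4096;
    else if (strcmp(id, "16") == 0)
        alg.block_size = 65536;
    else
        return false;
    *out = alg; /* *out is only written on success */
    return true;
}

/* Serialize struct fsverity_formatted_digest: the exact bytes a PKCS#7 blob
 * signs. Returns the byte count, or 0 if buf is too small. */
static size_t cfs_format_digest(const struct cfs_algorithm *alg,
                                const uint8_t *digest, uint8_t *buf, size_t len)
{
    size_t total = 12 + (size_t)alg->digest_size;

    if (len < total)
        return 0;
    memcpy(buf, "FSVerity", 8);                     /* magic[8], no NUL */
    buf[8] = (uint8_t)(alg->hash_algorithm & 0xff); /* __le16 digest_algorithm */
    buf[9] = (uint8_t)(alg->hash_algorithm >> 8);
    buf[10] = (uint8_t)(alg->digest_size & 0xff);   /* __le16 digest_size */
    buf[11] = (uint8_t)(alg->digest_size >> 8);
    memcpy(buf + 12, digest, alg->digest_size);     /* __u8 digest[] */
    return total;
}

enum cfs_sig_type { CFS_SIG_MANIFEST, CFS_SIG_CONFIG, CFS_SIG_LAYER, CFS_SIG_MERGED };

/* Accept manifest? config? layer{n} merged?, i.e. the required group order
 * with optional groups dropped, where n must equal the source layer count. */
static bool cfs_check_order(const enum cfs_sig_type *types, size_t n,
                            size_t manifest_layers)
{
    size_t i = 0, layers = 0;

    assert(types != NULL || n == 0);
    if (i < n && types[i] == CFS_SIG_MANIFEST)
        i++;
    if (i < n && types[i] == CFS_SIG_CONFIG)
        i++;
    while (i < n && types[i] == CFS_SIG_LAYER) {
        i++;
        layers++;
    }
    if (i < n && types[i] == CFS_SIG_MERGED)
        i++;
    /* Reject leftover entries, a missing layer group, or a count mismatch. */
    return i == n && layers > 0 && layers == manifest_layers;
}
```

In this sketch, a signer would pass the bytes from `cfs_format_digest()` to its PKCS#7 tooling. A verifier would copy `hash_algorithm` and `block_size` into `struct fsverity_enable_arg`, with an empty salt, alongside the matching signature blob. Because entries are matched to source objects purely by position, rejecting a malformed artifact up front with `cfs_check_order()` avoids applying a signature to the wrong file.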
 
-### Rollback Attack
+Clients that pull images with composefs signature artifacts are expected to also store the signature artifact locally alongside the image. This enables offline verification and allows fsverity signatures to be applied when files are later accessed. However, local storage of the signature artifact is not strictly required: a client could re-fetch the artifact from the registry when needed, or operate in digest-only mode, where the composefs digests themselves are trusted without kernel signature verification.
 
-For application containers, re-fetching the manifest on boot checks for freshness. For host systems, embedding the manifest in the boot artifact prevents rollback.
+##### Media Types
 
-### Layer Confusion
+- `application/vnd.composefs.signature.v1`: Artifact type for signature manifests
+- `application/vnd.composefs.signature.v1+pkcs7`: Layer media type for PKCS#7 DER signature blobs
 
-Per-layer fsverity annotations allow verification before merging. Implementations that maintain digest maps can link layer SHA256 digests to fsverity digests.
+## Storage model
+
+It is recommended to store the config, manifest, and unpacked layers. The EROFS can be generated on demand or cached (via an index associated with a given manifest).
 
 ## Relationship to Booting with composefs
 
 OCI sealing is independent from but complementary to composefs boot verification (UKI, BLS, etc.). These are separate mechanisms operating at different stages of the system lifecycle with different trust models.
 
-OCI sealing provides runtime verification of container images distributed through registries. The trust chain typically flows from external signatures (cosign, GPG) through OCI manifests to composefs digests.
+It is expected that boot-sealed images would *also* be OCI sealed, although this is not strictly required.
+
+### Bootable composefs UKI and kernel command line
 
-Boot verification is designed to be rooted in extant hardware mechanisms such as Secure Boot. The composefs digest is embedded directly in boot artifacts (UKI `.cmdline` section, BLS entry `options` field) and verified during early boot by the initramfs.
+In the default model, the UKI's kernel command line includes the digest of a slightly modified EROFS (one without `/boot`, among other differences).
 
-These mechanisms work together in a complete workflow where a sealed OCI image can be pulled from a registry, verified through OCI sealing, and then used to build a boot artifact with the composefs digest embedded for boot verification. However, each mechanism operates independently with its own trust anchor and threat model.
+However, it would also be possible to instead load signing keys into the kernel's `.fs-verity` keyring from the initramfs (the same keys used for application images, or different ones), and use the exact same scheme to mount the root filesystem from the initramfs.
 
 ## Future Directions
 
 ### Dumpfile Digest as Canonical Identifier
 
-The fsverity digest ties implementations to a specific EROFS format. A dumpfile digest (SHA256 of the composefs dumpfile format) would enable format evolution. This would be stored as an additional label `containers.composefs.dumpfile.sha256` alongside the fsverity digest.
+The fsverity digest ties implementations to a specific EROFS format; for more details, see [this issue](https://github.com/composefs/composefs/issues/198). A dumpfile digest (a classic SHA digest or an fsverity digest) of the composefs dumpfile would enable format evolution.
+
+This would also be stored as an annotation:
 
-The dumpfile format is format-agnostic, meaning the same dumpfile can generate different EROFS versions. This simplifies standardization since the dumpfile format is simpler than EROFS and provides future-proofing to migrate to composefs-over-squashfs or other formats.
+```json
+{
+  "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
+  "digest": "sha256:9834876dcfb05cb167a5c24953eba58c4ac89b1adf57f28f2f9d09af107ee8f0",
+  "size": 32654,
+  "annotations": {
+    "composefs.layer.fsverity-sha512-12": "3abb6677af34ac57c0ca5828fd94f9d886c26ce59a8ce60ecf6778079423dccff1d6f19cb655805d56098e6d38a1a710dee59523eed7511e5a9e4b8ccb3a4686",
+    "composefs.layer.fsverity-sha512-12.signature": "MIIBkgYJKo...base64-encoded-pkcs7...",
+    "composefs.dumpfile.sha512": "62d4b68bc4d336ff0982b93832d9a1f1d40206b49218299e5ac2e50f683d23f17bb99a1f3805339232abebd702eeda204827cfde244bf833e42b67a2fe632dc0"
+  }
+}
+```
 
-The challenge is that verification becomes slower as it requires parsing a saved EROFS from disk to dumpfile format. Caching the dumpfile digest to fsverity digest mapping introduces complexity and security implications. A use case split might apply dumpfile digests to application containers (for format flexibility) while using fsverity digests for host boot (for speed with minimal skew).
+One downside is that, because the mapping from the tar layer to the EROFS is not precomputed server side, there is no way to attach a kernel-native signature. It does, however, still allow efficient validation of the complete filesystem tree, given only the saved metadata (e.g. tar-split or splitstream) in combination with the fsverity digests of the content.
 
 ### Integration with zstd:chunked
 
@@ -173,10 +340,6 @@ Both zstd:chunked and composefs add new digests to OCI images. The zstd:chunked
 
 Adding fsverity to zstd:chunked TOC entries would allow using the TOC digest as a canonical composefs identifier. This would support a direct TOC → dumpfile → composefs pipeline, with a single metadata format serving both zstd:chunked and composefs use cases.
 
-### Three-Digest Model
-
-To support both flattened and layered mounting strategies, three digests could be stored per image: a base image digest, a derived layers digest, and a flattened digest. This would enable mounting a single flattened composefs for speed, mounting base and derived separately to avoid metadata amplification, or verifying the base from upstream while only rebuilding derived layers. This aligns with the existing `org.opencontainers.image.base.digest` standard.
-
 ## References
 
 **Design discussion**: [composefs/composefs#294](https://github.com/composefs/composefs/issues/294)
@@ -192,8 +355,7 @@ To support both flattened and layered mounting strategies, three digests could b
 
 **Standards**:
 - [OCI Image Specification](https://github.com/opencontainers/image-spec)
-- [Canonical JSON](https://wiki.laptop.org/go/Canonical_JSON)
 
 ## Contributors
 
-This specification synthesizes ideas from Colin Walters (original design proposals and iteration), Allison Karlitskaya (implementation and practical refinements), and Alexander Larsson (security model and non-root mounting insights). Significant assistance from Claude Sonnet 4.5 was used in synthesis.
+This specification synthesizes ideas from Colin Walters (original design proposals and iteration), Allison Karlitskaya (implementation and practical refinements), Alexander Larsson (security model and non-root mounting insights), and Giuseppe Scrivano (contributions across the board), with assistance from Claude Sonnet 4.5 and Claude Opus 4.
diff --git a/doc/plans/standardized-erofs-meta.md b/doc/plans/standardized-erofs-meta.md
new file mode 100644
index 00000000..a8b54613
--- /dev/null
+++ b/doc/plans/standardized-erofs-meta.md
@@ -0,0 +1,74 @@
+# Standardized EROFS Metadata Serialization
+
+This document outlines the goal of standardizing how composefs serializes filesystem trees to EROFS metadata images.
+
+## Goal
+
+Standardize how a filesystem tree, expressed canonically as a composefs dumpfile (or equivalent representation), is serialized to EROFS metadata. This enables reproducible EROFS generation across implementations.
+
+## Conceptual Model
+
+The canonical transformation model is:
+
+```
+tar layer → dumpfile → EROFS metadata
+```
+
+Even when implementations go directly from tar to EROFS for efficiency, the canonical model remains tar → dumpfile → EROFS. This means:
+
+1. Two implementations processing the same tar layer should produce equivalent dumpfiles
+2. Two implementations processing the same dumpfile MUST produce byte-identical EROFS images
+3. Therefore, two implementations processing the same tar layer should produce byte-identical EROFS images
+
+The dumpfile serves as the canonical intermediate representation that defines the filesystem tree independently of the serialization format.
+
+## Why This Matters
+
+- **Reproducible EROFS generation**: Given identical inputs, composefs-c, composefs-rs, and any future implementations must produce byte-for-byte identical EROFS images
+- **fsverity digest interoperability**: The OCI sealing specification relies on fsverity digests of EROFS images. These digests must match across implementations for signatures to verify correctly
+- **Ecosystem compatibility**: Container runtimes, build tools, and registries can use different implementations interchangeably
+
+Without standardized EROFS output, a signature created by one implementation would fail verification when the EROFS is regenerated by a different implementation.
+
+## Current State
+
+This standardization is a work in progress:
+
+- **[composefs/composefs#423](https://github.com/composefs/composefs/discussions/423)**: Discussion on compatible EROFS output across implementations
+- **[composefs-rs PR #225](https://github.com/containers/composefs-rs/pull/225)**: Initial reimplementation of composefs-c in Rust, with compatible EROFS output as a key goal
+
+## Open Questions
+
+The following details need to be standardized (future work):
+
+### EROFS Format Options
+- EROFS format version and feature flags
+- Block size (currently 4096)
+- Compression settings (composefs uses uncompressed metadata)
+
+### Inode Representation
+- Compact vs extended inode format
+- Inode numbering scheme
+- Handling of hardlinks (inode sharing)
+
+### Metadata Ordering
+- Inode table ordering (depth-first? breadth-first? by path?)
+- Directory entry ordering within directories
+- Xattr key ordering within an inode
+- Shared xattr table construction algorithm
+
+### Content Handling
+- Inline data threshold (currently about 64 bytes; the exact cutoff between inline and external content matters)
+- External file references via overlay metacopy xattrs
+- Symlink target storage
+
+### OCI-Specific Concerns
+- Whiteout representation (whiteouts are processed during the merge and should not appear in the final EROFS)
+- Root inode metadata normalization (copying from `/usr`)
+- Timestamp precision (seconds only, matching tar limitations)
+
+## References
+
+- [composefs dumpfile format](../splitstream.md): related binary format documentation
+- [OCI sealing specification](oci-sealing-spec.md): depends on reproducible EROFS generation
+- [EROFS documentation](https://docs.kernel.org/filesystems/erofs.html): kernel filesystem documentation