diff --git a/src/http-gateways/path-gateway.md b/src/http-gateways/path-gateway.md index c207ae6d..dfd95010 100644 --- a/src/http-gateways/path-gateway.md +++ b/src/http-gateways/path-gateway.md @@ -4,7 +4,7 @@ description: > The comprehensive low-level HTTP Gateway enables the integration of IPFS resources into the HTTP stack through /ipfs and /ipns namespaces, supporting both deserialized and verifiable response types. -date: 2025-10-13 +date: 2026-02-05 maturity: reliable editors: - name: Marcin Rataj @@ -36,6 +36,12 @@ thanks: affiliation: name: Protocol Labs url: https://protocol.ai/ + - name: Alex Potsides + github: achingbrain + url: https://achingbrain.net + affiliation: + name: Shipyard + url: https://ipshipyard.com xref: - url - trustless-gateway @@ -158,12 +164,13 @@ For example: - [application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw) – disables [IPLD codec deserialization](https://ipld.io/docs/codecs/), requests a verifiable raw [block](https://docs.ipfs.io/concepts/glossary/#block) to be returned - [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) – disables [IPLD codec deserialization](https://ipld.io/docs/codecs/), requests a verifiable [CAR](https://docs.ipfs.io/concepts/glossary/#car) stream to be returned with implicit or explicit [`dag-scope`](https://specs.ipfs.tech/http-gateways/trustless-gateway/#dag-scope-request-query-parameter) for blocks at the terminus of the specified path and the blocks required to traverse path segments from root CID to the terminus. -- [application/x-tar](https://en.wikipedia.org/wiki/Tar_(computing)) – returns UnixFS tree (files and directories) as a [TAR](https://en.wikipedia.org/wiki/Tar_(computing)) stream. Returned tree starts at a DAG which name is the same as the terminus segment. Produces 400 Bad Request for content that is not UnixFS. 
-- [application/vnd.ipld.dag-json](https://www.iana.org/assignments/media-types/application/vnd.ipld.dag-json) – requests [IPLD Data Model](https://ipld.io/docs/data-model/) representation serialized into [DAG-JSON format](https://ipld.io/docs/codecs/known/dag-json/). If the requested CID already has `dag-json` (0x0129) codec, data is validated as DAG-JSON before being returned as-is. Invalid DAG-JSON produces HTTP Error 500.
-- [application/vnd.ipld.dag-cbor](https://www.iana.org/assignments/media-types/application/vnd.ipld.dag-cbor) – requests [IPLD Data Model](https://ipld.io/docs/data-model/) representation serialized into [DAG-CBOR format](https://ipld.io/docs/codecs/known/dag-cbor/). If the requested CID already has `dag-cbor` (0x71) codec, data is validated as DAG-CBOR before being returned as-is. Invalid DAG-CBOR produces HTTP Error 500.
-- [application/json](https://www.iana.org/assignments/media-types/application/json) – same as `application/vnd.ipld.dag-json`, unless the CID's codec already is `json` (0x0200). Then, the raw JSON block can be returned as-is without any conversion.
-- [application/cbor](https://www.iana.org/assignments/media-types/application/cbor) – same as `application/vnd.ipld.dag-cbor`, unless the CID's codec already is `cbor` (0x51). Then, the raw CBOR block can be returned as-is without any conversion.
+- [application/x-tar](https://en.wikipedia.org/wiki/Tar_(computing)) – returns a UnixFS tree (files and directories) as a [TAR](https://en.wikipedia.org/wiki/Tar_(computing)) stream. The returned tree starts at a DAG whose name is the same as the terminus segment. Produces 406 Not Acceptable for content that is not UnixFS.
+- [application/vnd.ipld.dag-json](https://www.iana.org/assignments/media-types/application/vnd.ipld.dag-json) – returns the block when the CID codec is `dag-json` (0x0129). Implementations MAY validate block data before returning it, and SHOULD produce 406 Not Acceptable when the CID codec does not match.
+- [application/vnd.ipld.dag-cbor](https://www.iana.org/assignments/media-types/application/vnd.ipld.dag-cbor) – returns the block when the CID codec is `dag-cbor` (0x71). Implementations MAY validate block data before returning it, and SHOULD produce 406 Not Acceptable when the CID codec does not match.
+- [application/json](https://www.iana.org/assignments/media-types/application/json) – for blocks with the `json` (0x0200) CID codec, returns the block data as `application/json`. Implementations MAY validate block data before returning it. For deserialized UnixFS files (CID codec `dag-pb` or `raw`) whose content is valid JSON text, implementations SHOULD allow serving the file content as `application/json`. Implementations SHOULD produce 406 Not Acceptable in all other cases.
+- [application/cbor](https://www.iana.org/assignments/media-types/application/cbor) – returns the block when the CID codec is `cbor` (0x51). Implementations MAY validate block data before returning it, and SHOULD produce 406 Not Acceptable when the CID codec does not match.
 - [application/vnd.ipfs.ipns-record](https://www.iana.org/assignments/media-types/application/vnd.ipfs.ipns-record) – requests a verifiable :cite[ipns-record] to be returned. Produces 400 Bad Request if the content is not under the IPNS namespace, or contains a path.
+- [text/html](https://html.spec.whatwg.org/) – returns a human-readable representation of the requested data, which may include a link to download the raw data.
 
 :::note
@@ -339,6 +346,23 @@ responses (such as CAR), once HTTP 200 OK status is sent, gateways cannot change
 it. If a child block is missing during streaming, the gateway SHOULD terminate
 the stream. Clients MUST verify response completeness.
 
+### `406` Not Acceptable
+
+Returned when the requested response format does not match the CID's codec
+and the gateway does not perform cross-codec conversion.
+
+For example, requesting `?format=dag-json` on a `dag-cbor` block, or
+`?format=dag-cbor` on a `dag-pb` block, SHOULD return a 406 response.
+
+Similarly, requesting `?format=tar` for content that is not UnixFS SHOULD
+return 406.
+
+Implementations MAY include an actionable hint in the response body (e.g.,
+suggesting the client fetch the raw block with `?format=raw` and convert
+client-side).
+
+See :cite[ipip-0524] for details.
+
 ### `410` Gone
 
 Error to indicate that request was formally correct, but this specific Gateway
@@ -753,10 +777,10 @@ By default, implicit deserialized response type is based on `Accept` header and
   - Bytes representing a CBOR file, see [application/cbor](https://www.iana.org/assignments/media-types/application/cbor)
   - Works exactly the same as `raw`, but returned `Content-Type` is `application/cbor`
 - DAG-JSON (0x0129)
-  - If the `Accept` header includes `text/html`, implementation should return a generated HTML with options to download DAG-JSON as-is, or converted to DAG-CBOR.
+  - If the `Accept` header includes `text/html`, the implementation should return a generated HTML page with an option to download the DAG-JSON block as-is.
   - Otherwise, response works exactly the same as `raw` block, but returned `Content-Type` is [application/vnd.ipld.dag-json](https://www.iana.org/assignments/media-types/application/vnd.ipld.dag-json)
 - DAG-CBOR (0x71)
-  - If the `Accept` header includes `text/html`: implementation should return a generated HTML with options to download DAG-CBOR as-is, or converted to DAG-JSON.
+  - If the `Accept` header includes `text/html`, the implementation should return a generated HTML page with an option to download the DAG-CBOR block as-is.
   - Otherwise, response works exactly the same as `raw` block, but returned `Content-Type` is [application/vnd.ipld.dag-cbor](https://www.iana.org/assignments/media-types/application/vnd.ipld.dag-cbor)
 
 The following response types require an explicit opt-in, can only be requested with [`format`](#format-request-query-parameter) query parameter or [`Accept`](#accept-request-header) header:
diff --git a/src/ipips/ipip-0524.md b/src/ipips/ipip-0524.md
new file mode 100644
index 00000000..a833a9e6
--- /dev/null
+++ b/src/ipips/ipip-0524.md
@@ -0,0 +1,199 @@
+---
+title: "IPIP-0524: Remove cross-codec conversion from HTTP Gateways"
+date: 2026-02-06
+ipip: proposal
+editors:
+  - name: Alex Potsides
+    github: achingbrain
+    url: https://achingbrain.net
+    affiliation:
+      name: Shipyard
+      url: https://ipshipyard.com
+  - name: Marcin Rataj
+    github: lidel
+    url: https://lidel.org
+    affiliation:
+      name: Shipyard
+      url: https://ipshipyard.com
+relatedIssues:
+  - https://github.com/ipfs/gateway-conformance/issues/200
+order: 524
+tags: ['ipips']
+---
+
+## Summary
+
+Make IPFS HTTP Gateway responses easier to reason about by not requiring IPLD
+Data Model translations between codecs.
+
+## Motivation
+
+When sending an [Accept](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Accept)
+header or [format](https://specs.ipfs.tech/http-gateways/path-gateway/#format-request-query-parameter)
+query parameter to specify the response format of a request, the IPFS HTTP
+Gateway specs [allow translation](https://specs.ipfs.tech/http-gateways/path-gateway/#accept-request-header)
+of the requested content, via the [IPLD Data Model](https://ipld.io/docs/data-model/), into another codec.
+
+This adds significant complexity to HTTP Gateway implementations, since they
+need to be able to translate between arbitrary data types and handle all the
+various failure states.
+
+The conversions are also lossy due to differences in supported data types across
+different formats, so they lack general-purpose utility; an interested client
+can perform them itself if required.
+
+## Detailed design
+
+When the block's CID codec matches the requested response format,
+implementations MAY return the block as-is without parsing or validating it.
+This is effectively equivalent to requesting `?format=raw` but with a
+codec-specific `Content-Type` header.
+
+When the CID codec does not match the requested format, the gateway SHOULD
+return a [406 Not Acceptable](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/406)
+unless the server provides cross-codec conversion as an extra feature outside
+of this specification.
+
+For example, requesting a DAG-JSON block with the `application/cbor` format
+would result in a 406 response.
+
+Where a human-readable rendering of the data is desired, the `text/html` format
+can be requested. This allows browsing DAG-PB data, for example.
+
+A [400](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/400)
+may be returned if the request was invalid (for example, an unsupported format
+was requested).
+
+A [500](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/500)
+may be returned for other server-side errors.
+
+## Design rationale
+
+Simplifying the HTTP Gateway spec to remove these format translations and the
+additional logic required makes it more straightforward to create new
+implementations, and makes the returned data more transparent and therefore
+easier to understand, since it is not modified to fit the output format.
+
+Clients that wish to translate between different data formats may request raw
+blocks and do the translation themselves.
+
+### User benefit
+
+For gateway operators and implementers, removing the requirement to perform
+codec conversions server-side significantly reduces implementation complexity.
+
+For end users and application developers, the change makes gateway behavior
+easier to reason about: a request either returns data deserialized according
+to the rules of the CID's original codec, or fails with 406. This moves
+conversion to userland, encouraging users to fetch raw blocks with
+`?format=raw` and convert client-side, putting the application in full control
+and producing deterministic results regardless of which gateway is used.
+
+This matters in practice because codec libraries do not behave identically.
+[Cross-library dag-cbor tests (2026)](https://hyphacoop.github.io/dasl-testing/?group=tests-by-file&tag=dag-cbor)
+show that implementations differ on edge cases such as float handling, map key
+ordering, and encoding strictness. Relying on server-side conversion means
+the output depends on whichever library the gateway happens to use, which is
+not a foundation for robust software.
+
+### Compatibility
+
+Formally this is a breaking change: server-side IPLD Data Model translations
+between codecs are removed.
+
+In practice, nobody could build reliable software
+on top of conversion logic that behaved inconsistently across gateways
+written in different languages. Clients that needed data in a different
+format often already fetched `?format=raw` and converted client-side.
+
+This IPIP standardizes that robust real-world pattern and removes an
+unreliable niche feature that has seen limited use.
+
+#### Real-world `?format=` usage on `ipfs.io` and `dweb.link`
+
+A 24-hour sample of traffic on the `ipfs.io` and `dweb.link` public gateways
+(Feb 2026) shows that only 4.5% of all requests use the `?format=` query
+parameter, and the vast majority ask for `json`:
+
+| `?format=` value | % of requests with `format=` |
+|------------------|------------------------------|
+| `json`           | 99.11%                       |
+| `raw`            | 0.86%                        |
+| `dag-json`       | 0.02%                        |
+| `car`            | 0.01%                        |
+| other            | <0.01%                       |
+
+Note: `ipfs.io` and `dweb.link` serve deserialized responses. Trustless
+verifiable requests (`?format=raw`, `?format=car`) are redirected to
+`trustless-gateway.link`, which is why those formats appear so rarely here.
+
+Looking at what those `?format=json` requests actually point at is more
+revealing. The CID codec of the requested blocks breaks down as follows:
+
+| CID codec of requested block | % of `?format=json` |
+|------------------------------|---------------------|
+| `dag-pb` (CIDv0 `Qm...`)     | 60.0%               |
+| `dag-pb` (CIDv1 `bafy...`)   | 21.4%               |
+| `raw` (`bafk...`)            | 18.6%               |
+
+In this sample, all `?format=json` requests target `dag-pb` or `raw` blocks.
+None target the `json` codec (0x0200). In other words, these clients are
+reading regular JSON files stored as UnixFS, not asking the gateway to convert
+between IPLD codecs. The gateway serves them as plain HTTP file responses,
+which is covered by the UnixFS interop exception described later in this IPIP.
+
+The remaining formats (`dag-json` and `car`) together account for less than
+0.04% of `?format=` requests and do not depend on cross-codec conversion
+either, since they request data in the block's native codec.
+
+#### `json` and `dag-json` independence
+
+`application/json` and `application/vnd.ipld.dag-json` are now treated as
+independent formats, each matching only its own CID codec (`json`
+0x0200 and `dag-json` 0x0129). The old behavior where `application/json` was
+an alias for `application/vnd.ipld.dag-json` (falling back to dag-json
+conversion) no longer applies.
+
+#### UnixFS interop exception for `Accept: application/json`
+
+Note: the codec match requirement and 406 behavior described above do not
+apply to deserialized UnixFS file responses. Users commonly store valid JSON
+as UnixFS files (with `dag-pb` or `raw` codec), and serving those files with
+`Accept: application/json` is regular HTTP content serving, not codec
+conversion. See the `application/json` entry in the
+[Accept request header](https://specs.ipfs.tech/http-gateways/path-gateway/#accept-request-header)
+section of the Path Gateway spec for normative requirements.
+
+#### Opt-in backward compatibility
+
+Implementations MAY offer an opt-in configuration flag to restore the old
+codec conversion behavior for backward compatibility.
+
+#### Implementation-defined behavior
+
+- The content of the 406 error response body (e.g. actionable hints).
+- Handling of `?format=json` / `Accept: application/json` on non-json-codec
+  content (like `dag-pb` UnixFS files).
+- Whether to offer an opt-in flag for restoring codec conversion.
+- Validation of block data when the CID codec matches the requested format.
+
+### Security
+
+No negative security implications. This change restricts gateway behavior (returning
+406 instead of converting), which reduces attack surface.
+
+## Test fixtures
+
+Implementers can run the [gateway-conformance](https://github.com/ipfs/gateway-conformance/)
+test suite v0.10 or later. The following behaviors are verified by the test suite:
+
+- Requesting a block in a format that differs from its CID codec (e.g.
+  `dag-pb` block with `?format=dag-json`) returns HTTP 406.
+- Requesting a block in its native codec returns HTTP 200.
+- `?format=raw` works for any codec.
+- HTML rendering (`Accept: text/html`) of DAG-JSON/DAG-CBOR blocks is not
+  codec conversion and remains part of the spec.
+
+### Copyright
+
+Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
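
The format negotiation rules proposed in this diff (codec match returns the block, codec mismatch returns 406, the UnixFS `application/json` exception, TAR only for UnixFS, 400 for unsupported formats) can be condensed into a single decision function. The following is a minimal Python sketch, not taken from any gateway implementation: `response_status` and its `is_unixfs` / `is_json_text_file` flags are hypothetical, the caller is assumed to already know the CID codec and whether the content is UnixFS, and it assumes `application/vnd.ipld.car` and `text/html` are servable for any codec because neither involves cross-codec conversion.

```python
# Multicodec codes for the CID codecs discussed in IPIP-0524.
RAW, DAG_PB, CBOR, DAG_CBOR, DAG_JSON, JSON = 0x55, 0x70, 0x51, 0x71, 0x0129, 0x0200

# Codec-specific formats: each one matches exactly one CID codec (no json/dag-json aliasing).
CODEC_FORMATS = {
    "application/vnd.ipld.dag-json": DAG_JSON,
    "application/vnd.ipld.dag-cbor": DAG_CBOR,
    "application/json": JSON,
    "application/cbor": CBOR,
}

# Formats assumed servable for any codec, since none of them converts between codecs.
ANY_CODEC_FORMATS = {
    "application/vnd.ipld.raw",
    "application/vnd.ipld.car",
    "text/html",
}


def response_status(fmt: str, cid_codec: int, *, is_unixfs: bool = False,
                    is_json_text_file: bool = False) -> int:
    """Pick the HTTP status for a requested format (hypothetical helper).

    is_unixfs:         the requested content is a UnixFS DAG
    is_json_text_file: the content is a UnixFS file whose bytes are valid JSON
    """
    if fmt in ANY_CODEC_FORMATS:
        return 200
    if fmt == "application/x-tar":
        # TAR export only applies to UnixFS trees.
        return 200 if is_unixfs else 406
    if fmt in CODEC_FORMATS:
        if CODEC_FORMATS[fmt] == cid_codec:
            return 200  # codec matches: the block MAY be returned as-is
        if fmt == "application/json" and cid_codec in (DAG_PB, RAW) and is_json_text_file:
            return 200  # UnixFS interop exception: plain file serving, not conversion
        return 406  # codec mismatch: no cross-codec conversion
    return 400  # unsupported format


if __name__ == "__main__":
    print(response_status("application/vnd.ipld.dag-json", DAG_CBOR))  # 406
    print(response_status("application/json", DAG_PB, is_json_text_file=True))  # 200
```

A gateway that implements the optional backward-compatibility flag would replace the final `return 406` in the codec branch with a conversion path; that branch is deliberately left out so the sketch reflects only the default behavior.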