PEP 819: JSON Package Metadata #4751
Open
emmatyping wants to merge 12 commits into python:main from wheelnext:norender-json-metadata-pep
+649
−0
7711db1 PEP 9999: JSON Package Metadata (emmatyping)
3426023 Add newline to appendix (emmatyping)
d360af6 Remove extra newline in pep-9999.rst (emmatyping)
13552b3 Use data not attr for email compat (emmatyping)
b3ad6b6 Add more fixes for lint errors (emmatyping)
c60c080 Format JSON Schema to 2 spaces (emmatyping)
933ef4d Respond to feedback and add ``WHEEL.json`` (emmatyping)
f7adcac Be more lax about the build tag (emmatyping)
1b0170e Add newline to fix lint (emmatyping)
f6613f1 Claim PEP 819 (emmatyping)
4d28637 Update schema refs in PEP text (emmatyping)
8db80d0 Add spacing between under-review PEPs in CODEOWNERS (emmatyping)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,347 @@ | ||
| PEP: 819 | ||
| Title: JSON Package Metadata | ||
| Author: Emma Harper Smith <emma@python.org> | ||
| PEP-Delegate: Paul Moore | ||
| Discussions-To: Pending | ||
| Status: Draft | ||
| Type: Standards Track | ||
| Topic: Packaging | ||
| Created: 18-Dec-2025 | ||
| Post-History: Pending | ||
|
|
||
|
|
||
| Abstract | ||
| ======== | ||
|
|
||
| Python package metadata ("core metadata") was first defined in :pep:`241` to | ||
| use :rfc:`822` email headers to encode information about packages. This was | ||
| reasonable at the time; email messages were the only widely used, standardized | ||
| text format that had a parser in the standard library. However, | ||
| issues with handling different encodings, differing handling of line breaks, | ||
| and other differences between implementations have caused numerous packaging | ||
| bugs. To resolve these issues, this PEP proposes introducing | ||
| `JavaScript Object Notation (JSON) <https://www.json.org/json-en.html>`_ | ||
| encoded core metadata and wheel file format metadata files in Python packages. | ||
|
|
||
|
|
||
| Motivation | ||
| ========== | ||
|
|
||
| The email message format has a number of complexities and limitations which | ||
| reduce its utility as a portable textual interchange format for packaging | ||
| metadata. Due to the :mod:`email` parser requiring configuration changes to | ||
| properly generate valid core metadata, many projects do not use the | ||
| :mod:`!email` module and instead generate core metadata in a custom manner. | ||
| There are many pitfalls with generating email headers that these custom | ||
| generators can hit. First, core metadata field values may contain newlines. | ||
| These newlines must be handled properly so that multiple lines can be | ||
| "unfolded" per :rfc:`822`. One field that is particularly difficult to encode | ||
| is ``Description``, which may contain newlines and indentation. To encode | ||
| the field in email headers, CRLF line breaks must be followed by seven (7) | ||
| spaces and a pipe ("|") character. While ``Description`` may now be encoded in | ||
| the message body, similar escaping issues occur for the ``Author`` and | ||
| ``Maintainer`` fields. Improperly escaped newlines can lead to missing, | ||
| partial, or invalid core metadata. Second, as discussed in the | ||
| `core metadata specifications <https://packaging.python.org/specifications/core-metadata/>`__: | ||
|
|
||
| .. epigraph:: | ||
| The standard file format for metadata (including in wheels and installed | ||
| projects) is based on the format of email headers. However, email formats | ||
| have been revised several times, and exactly which email RFC applies to | ||
| packaging metadata is not specified. In the absence of a precise | ||
| definition, the practical standard is set by what the standard library | ||
| :mod:`email.parser` module can parse using the | ||
| :data:`email.policy.compat32` policy. | ||
|
|
||
| Since no specific email RFC is selected, the current core metadata | ||
| specification is ambiguous about whether a given core metadata document is valid. | ||
| :rfc:`822` is the only email standard to be explicitly listed in a PEP. | ||
| However, the core metadata specification also requires that core metadata be | ||
| encoded using UTF-8 when written to a file. This de facto makes the core | ||
| metadata follow :rfc:`6532`, which specifies internationalization of email | ||
| headers. This has practical interoperability concerns. Until a few years ago, | ||
| it was unspecified how to handle non-ASCII encoded content in core metadata, | ||
| causing confusion about how to properly encode non-ASCII emails in core | ||
| metadata. Third, the current format is difficult to properly validate and | ||
| parse. Many tools do not check for issues with the output of the :mod:`!email` | ||
| parser. If a document is malformed, the :mod:`!email` module may still parse | ||
| it without error as a valid email message. Furthermore, due to limitations | ||
| in the email format, fields like ``Project-URL`` must use custom encodings | ||
| of nested key-value items, further complicating parsing. Finally, the lack of | ||
| a schema makes it difficult to validate the contents of email message encoded | ||
| metadata. While introducing a specification for the current format has been | ||
| `discussed previously <https://discuss.python.org/t/python-metadata-format-specification-and-implementation/7550>`_, | ||
| no progress has been made, and converting to JSON was a suggested resolution | ||
| to the issues raised. | ||
|
|
||
| The ``WHEEL`` file format is currently encoded in a custom key-value format. | ||
| While this format is easy to parse and write, it requires manual parsing and | ||
| validation to ensure that the contents are valid. Moving to a JSON encoded | ||
| format will allow for easier parsing and validation of the contents, and | ||
| simplify packaging tools and services. | ||
|
|
||
|
|
||
| Rationale | ||
| ========= | ||
|
|
||
| Introducing a new core metadata file with a well-specified format will greatly | ||
| ease generating, parsing, and validating metadata. JSON is a natural choice for | ||
| storing package core metadata. It is easily machine readable and writable, is | ||
| understandable to humans, and is well supported across many languages. | ||
| Furthermore, :pep:`566` already specifies a canonicalization of email formatted | ||
| core metadata to JSON. JSON is also a frequently used format for data | ||
| interchange on the web. For discussion of other formats considered, please | ||
| refer to the rejected ideas section. | ||
|
|
||
| To maintain backwards compatibility, the JSON metadata file MUST be generated | ||
| alongside the existing email formatted metadata file. This ensures that tools | ||
| that do not support the new format can still read package metadata for new | ||
| packages. | ||
|
|
||
| The JSON formatted metadata file must be semantically equivalent to the email | ||
| encoded file. This ensures that the metadata is unambiguous between the two | ||
| formats, and tools may read either when both are present. To maintain | ||
| performance, this equivalence is not required to be verified by installers, | ||
| though other tools may do so. Some tools may choose to make the check dependent | ||
| on a configuration flag. | ||
|
|
||
| Package indexes SHOULD check that the metadata files are semantically | ||
| equivalent when the package is added to the index. This is a low-cost, one-time | ||
| check that ensures users of the index are served valid packages. | ||
|
|
||
|
|
||
| Specification | ||
| ============= | ||
|
|
||
| JSON Format Core Metadata File | ||
| ------------------------------ | ||
|
|
||
| A new optional but recommended file ``METADATA.json`` shall be introduced as a | ||
| metadata file for Python packages. If generated, the ``METADATA.json`` file | ||
| MUST be placed in the same directory as the current email formatted | ||
| ``METADATA`` or ``PKG-INFO`` file. | ||
|
|
||
| For wheels, this means that ``METADATA.json`` MUST be located in the | ||
| ``.dist-info`` directory. The wheel format minor version will be incremented to | ||
| indicate the change in the format. | ||
|
|
||
| For source distribution packages, the ``METADATA.json`` file MUST be located | ||
| in the root directory of the project sources. Tools that prefer the JSON | ||
| formatted metadata file MUST check for the existence of a ``METADATA.json`` | ||
| in the source distribution before reading the file. | ||
|
|
||
| The semantic contents of the ``METADATA`` and ``METADATA.json`` files MUST be | ||
| equivalent if ``METADATA.json`` is present. Installers MAY verify this | ||
| information. Public package indexes SHOULD verify the files are semantically | ||
| equivalent. | ||
|
|
||
| Conversion of ``METADATA`` to JSON Encoding | ||
| ------------------------------------------- | ||
|
|
||
| Conversion from the current email format for core metadata to JSON should | ||
| follow the process described in :pep:`566`, with the following modification: | ||
| the ``Project-URL`` entries should be converted into an object with keys | ||
| containing the labels and values containing the URLs from the original email | ||
| value. The overall process thus becomes: | ||
|
|
||
| #. The original key-value format should be read with | ||
| ``email.parser.HeaderParser``; | ||
| #. All transformed keys should be reduced to lower case. Hyphens should be | ||
| replaced with underscores, but otherwise should retain all other characters; | ||
| #. The transformed value for any field marked with "(Multiple use)" should be a | ||
| single list containing all the original values for the given key; | ||
| #. The ``Keywords`` field should be converted to a list by splitting the | ||
| original value on commas; | ||
| #. The ``Project-URL`` field should be converted into a JSON object with keys | ||
| containing the labels and values containing the URLs from the original email | ||
| value; | ||
| #. The message body, if present, should be set to the value of the | ||
| ``description`` key; | ||
| #. The result should be stored as a string-keyed dictionary. | ||
|
|
||
| One edge case in the above conversion is that the ``Project-URL`` label is | ||
| "free text, with a maximum length of 32 characters." This presents a problem | ||
| when trying to decode the label. Therefore this PEP sets the requirement that | ||
| the ``Project-URL`` label be any text *except* the comma (``,``) character. | ||
| This allows for unambiguous parsing of the ``Project-URL`` entries by splitting | ||
| the text on the left-most comma (``,``) character. | ||
|
|
||
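The conversion steps and the left-most-comma rule above can be sketched in Python as follows. This is a minimal illustration, not the reference implementation: the function name ``metadata_to_json_dict`` and the ``MULTIPLE_USE`` set are hypothetical, the set is an illustrative subset rather than the authoritative list of multiple-use fields, and stripping whitespace around the ``Project-URL`` label and URL is an assumption.

```python
from email.parser import HeaderParser

# Illustrative subset of fields marked "(Multiple use)"; not authoritative.
MULTIPLE_USE = {
    "classifier", "requires_dist", "requires_external", "provides_extra",
    "provides_dist", "obsoletes_dist", "platform", "supported_platform",
    "dynamic", "license_file",
}


def metadata_to_json_dict(text: str) -> dict:
    """Convert email-formatted core metadata into a JSON-ready dict."""
    msg = HeaderParser().parsestr(text)
    result = {}
    for raw_key, value in msg.items():
        # Lower-case the key and replace hyphens with underscores.
        key = raw_key.lower().replace("-", "_")
        if key == "keywords":
            # Split the original value on commas.
            result[key] = value.split(",")
        elif key == "project_url":
            # Split on the left-most comma: label before, URL after.
            # Stripping surrounding whitespace is an assumption.
            label, _, url = value.partition(",")
            result.setdefault(key, {})[label.strip()] = url.strip()
        elif key in MULTIPLE_USE:
            # Collect every occurrence into a single list.
            result.setdefault(key, []).append(value)
        else:
            result[key] = value
    # The message body, if present, becomes the description.
    body = msg.get_payload()
    if body:
        result["description"] = body
    return result
```

Because the split happens only at the first comma, a URL that itself contains commas survives intact, which is why the label is the part that must not contain one. The result can be passed to ``json.dumps`` to produce the file contents.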
| JSON Schema for Core Metadata | ||
| ----------------------------- | ||
|
|
||
| To enable verification of JSON encoded core metadata, a | ||
| `JSON schema <https://json-schema.org/>`_ for core metadata has been produced. | ||
| This schema will be updated with each revision to the core metadata | ||
| specification. The schema is available in | ||
| :ref:`0819-core-metadata-json-schema`. | ||
|
|
||
| Serving METADATA.json in the Simple Repository API | ||
| -------------------------------------------------- | ||
|
|
||
| :pep:`658` introduced a means of serving package metadata in the Simple | ||
| Repository API. The JSON encoded version of the package metadata may also be | ||
| served, via the following modifications to the Simple Repository API: | ||
|
|
||
| A new attribute ``data-dist-info-metadata-json`` may be added to anchor tags | ||
| in the Simple API. This attribute should have a value containing the hash | ||
| information for the ``METADATA.json`` file in the same format as | ||
| ``data-dist-info-metadata``. If ``data-dist-info-metadata-json`` is present, | ||
| the repository MUST serve the JSON encoded metadata file at the | ||
| distribution's path with ``.metadata.json`` appended to it. For example, if a | ||
| distribution is served at ``/simple/foo-1.0-py3-none-any.whl``, the JSON | ||
| encoded core metadata file MUST be served at | ||
| ``/simple/foo-1.0-py3-none-any.whl.metadata.json``. | ||
|
|
||
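As a small sketch of the URL rule above, a client could derive the metadata location from a distribution link like this. The helper name ``metadata_json_url`` is hypothetical, and dropping any URL fragment (such as a hash of the distribution file) before appending the suffix is an assumption about how clients would treat link fragments.

```python
from urllib.parse import urlsplit, urlunsplit


def metadata_json_url(distribution_url: str) -> str:
    """Return the URL where the JSON core metadata would be served."""
    parts = urlsplit(distribution_url)
    # Append the suffix to the path only. The fragment is dropped because
    # it would describe the distribution file, not the metadata file.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path + ".metadata.json", parts.query, "")
    )
```

For the example in the text, ``metadata_json_url("/simple/foo-1.0-py3-none-any.whl")`` yields ``/simple/foo-1.0-py3-none-any.whl.metadata.json``.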
| JSON Format Wheel Metadata File | ||
| ------------------------------- | ||
|
|
||
| A new optional but recommended file ``WHEEL.json`` shall be introduced as a | ||
| JSON encoded version of the ``WHEEL`` file. If generated, the ``WHEEL.json`` | ||
| file MUST be placed in the same directory as the current key-value formatted | ||
| ``WHEEL`` file, i.e. the ``.dist-info`` directory. The semantic contents of | ||
| the ``WHEEL`` and ``WHEEL.json`` files MUST be equivalent. | ||
|
|
||
| The ``WHEEL.json`` file SHOULD be preferred over the ``WHEEL`` file when both | ||
| are present. | ||
|
|
||
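A reader that follows this preference might look like the sketch below. The function name ``read_wheel_metadata`` and its return shape are hypothetical; the only behaviour taken from the text is that ``WHEEL.json`` is used when present and ``WHEEL`` is the fallback.

```python
import json
from pathlib import Path


def read_wheel_metadata(dist_info: Path):
    """Return ("json", dict) if WHEEL.json exists, else ("key-value", str)."""
    json_path = dist_info / "WHEEL.json"
    if json_path.is_file():
        # Prefer the JSON encoded file when both are present.
        return "json", json.loads(json_path.read_text(encoding="utf-8"))
    # Fall back to the key-value file, which older wheels always have.
    return "key-value", (dist_info / "WHEEL").read_text(encoding="utf-8")
```

Returning the raw text in the fallback case keeps the sketch short; a real tool would go on to parse the key-value contents.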
| Conversion of ``WHEEL`` to JSON Encoding | ||
| ---------------------------------------- | ||
|
|
||
| Conversion from the current key-value format for wheel file format metadata to | ||
| JSON should proceed as follows: | ||
|
|
||
| #. The original key-value format should be read. | ||
| #. All transformed keys should be reduced to lower case. Hyphens should be | ||
| replaced with underscores, but otherwise should retain all other characters. | ||
| #. The ``Tag`` field's entries should be converted to a list containing the | ||
| original values. | ||
| #. The result should be stored as a string-keyed dictionary. | ||
|
|
||
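The steps above can be sketched as follows. This is an illustration under stated assumptions, not the reference implementation: the function name ``wheel_to_json_dict`` is hypothetical, lines are split on the first colon, and all values are left as strings because the text does not say how values such as ``Root-Is-Purelib`` should be typed.

```python
def wheel_to_json_dict(text: str) -> dict:
    """Convert key-value WHEEL contents into a JSON-ready dict."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        raw_key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a key-value line: {line!r}")
        # Lower-case the key and replace hyphens with underscores.
        key = raw_key.strip().lower().replace("-", "_")
        value = value.strip()
        if key == "tag":
            # Collect every Tag entry into a single list.
            result.setdefault(key, []).append(value)
        else:
            result[key] = value
    return result
```

Passing the result to ``json.dumps`` would give candidate ``WHEEL.json`` contents, which should still be checked against the schema.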
| This follows a similar process to the conversion of ``METADATA`` to JSON | ||
| encoding. | ||
|
|
||
| JSON Schema for Wheel Metadata | ||
| ------------------------------ | ||
|
|
||
| To enable verification of JSON encoded wheel file format metadata, a | ||
| JSON schema for wheel metadata has been produced. | ||
| This schema will be updated with each revision to the wheel metadata | ||
| specification. The schema is available in :ref:`0819-wheel-json-schema`. | ||
|
|
||
| Deprecation of the ``METADATA``, ``PKG-INFO``, and ``WHEEL`` Files | ||
| ------------------------------------------------------------------ | ||
|
|
||
| The ``METADATA``, ``PKG-INFO``, and ``WHEEL`` files are now deprecated. This | ||
| means that a future PEP may make the ``METADATA``, ``PKG-INFO``, and ``WHEEL`` | ||
| files optional and require ``METADATA.json`` and ``WHEEL.json`` to be present. | ||
| Please see the next section for more information on backwards compatibility | ||
| caveats to that change. | ||
|
|
||
| Despite the ``METADATA`` and ``PKG-INFO`` files being deprecated, new core | ||
| metadata revisions should be implemented for both JSON and email to ensure that | ||
| they may remain semantically equivalent. Similarly, new ``WHEEL`` metadata keys | ||
| should be implemented for both JSON and key-value formats to ensure that they | ||
| may remain semantically equivalent. | ||
|
|
||
|
|
||
| Backwards Compatibility | ||
| ======================= | ||
|
|
||
| The specification for ``METADATA.json`` and ``WHEEL.json`` is designed such | ||
| that the new format is completely backwards compatible. Existing tools may read | ||
| metadata from the existing email formatted files, and new tools may take | ||
| advantage of the new format. | ||
|
|
||
| A future major revision of the wheel specification may make the ``METADATA``, | ||
| ``PKG-INFO``, and ``WHEEL`` files optional and make the ``METADATA.json`` and | ||
| ``WHEEL.json`` files required. | ||
|
|
||
| Note that tools will need to maintain parsing of email metadata and the | ||
| key-value formatted ``WHEEL`` file indefinitely to support parsing metadata | ||
| for old packages which only have the ``METADATA``, ``PKG-INFO``, | ||
| or ``WHEEL`` files. | ||
|
|
||
|
|
||
| Security Implications | ||
| ===================== | ||
|
|
||
| One attack vector with JSON encoded core metadata is a JSON payload | ||
| designed to consume excessive memory or CPU resources in a denial of service | ||
| (DoS) attack. While this attack is not likely to affect users, who can cancel | ||
| resource-intensive interactive operations, it may be an issue for package | ||
| indexes. | ||
|
|
||
| There are several mitigations that can be made to prevent this: | ||
|
|
||
| #. The length of the JSON payload can be restricted to a reasonable size. | ||
| #. The reader may configure a :class:`~json.JSONDecoder` to skip converting | ||
| :class:`int` and :class:`float` values to avoid quadratic number parsing | ||
| time complexity attacks. | ||
| #. I plan to contribute a change to :class:`~json.JSONDecoder` in Python | ||
| 3.15+ that will allow it to be configured to restrict the nesting of JSON | ||
| payloads to a reasonable depth. Core metadata currently has a maximum depth | ||
| of 2 to encode mapping and list fields. | ||
|
|
||
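The first two mitigations can be sketched with the standard library alone. The size limit, the constant name ``MAX_PAYLOAD_BYTES``, and the function name ``load_untrusted_metadata`` are hypothetical choices for illustration; the nesting-depth mitigation is not shown because it depends on a decoder option that does not exist yet.

```python
import json

# Hypothetical limit; a real index would choose its own value.
MAX_PAYLOAD_BYTES = 1_000_000

# Passing str as the number hooks leaves numeric literals as strings,
# so no potentially expensive int or float conversion is performed.
_decoder = json.JSONDecoder(parse_int=str, parse_float=str)


def load_untrusted_metadata(payload: bytes) -> dict:
    """Decode JSON core metadata from an untrusted source."""
    # Mitigation 1: reject oversized payloads before parsing anything.
    if len(payload) > MAX_PAYLOAD_BYTES:
        raise ValueError("metadata payload too large")
    # Mitigation 2: decode with number conversion disabled.
    data = _decoder.decode(payload.decode("utf-8"))
    if not isinstance(data, dict):
        raise ValueError("core metadata must be a JSON object")
    return data
```

Checking the length before decoding keeps the cost of rejecting an oversized payload constant, regardless of its contents.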
| With these mitigations in place, concerns about denial of service attacks with | ||
| JSON encoded core metadata are minimal. | ||
|
|
||
|
|
||
| Reference Implementation | ||
| ======================== | ||
|
|
||
| A reference implementation of the JSON schema for JSON core metadata is | ||
| available in :ref:`0819-core-metadata-json-schema`. | ||
|
|
||
| Furthermore, a reference implementation in the ``packaging`` library `is | ||
| available | ||
| <https://github.com/wheelnext/packaging/tree/PEP-9999-JSON-metadata>`__. | ||
|
|
||
| A reference implementation generating both ``METADATA.json`` and ``WHEEL.json`` | ||
| in the ``uv`` build backend `is also available <https://github.com/astral-sh/uv/pull/15510>`__. | ||
|
|
||
|
|
||
| Rejected Ideas | ||
| ============== | ||
|
|
||
| Using Another File Format (TOML, YAML, etc.) | ||
| -------------------------------------------- | ||
|
|
||
| While TOML or another format could be used for the new core metadata file | ||
| format, JSON has been chosen for a few reasons: | ||
|
|
||
| #. Core metadata is mostly meant as a machine interchange format to be used by | ||
| tools and services which wish to interoperate. Therefore the | ||
| human-readability of TOML is not an important consideration in this | ||
| selection. | ||
| #. JSON parsers are implemented in many languages' standard libraries and the | ||
| :mod:`json` module has been part of Python's standard library for a very | ||
| long time. | ||
| #. JSON is fast to parse and emit. | ||
| #. JSON schemas are JSON native and commonly used. | ||
|
|
||
|
|
||
| Open Issues | ||
| =========== | ||
|
|
||
| Where should the JSON schema be served? | ||
| --------------------------------------- | ||
|
|
||
| Where should the standard JSON Schema be served? Some options would be | ||
| packaging.python.org, pypi.org, python.org, or pypa.org. | ||
|
|
||
| My first choice would be packaging.python.org, but I am open to other options. | ||
|
|
||
|
|
||
| Acknowledgements | ||
| ================ | ||
|
|
||
| Thanks to Konstantin Schütze for writing the reference implementation of | ||
| this PEP in the ``uv`` build backend and for providing valuable feedback on the | ||
| specification. | ||
|
|
||
|
|
||
| Copyright | ||
| ========= | ||
|
|
||
| This document is placed in the public domain or under the | ||
| CC0-1.0-Universal license, whichever is more permissive. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,21 @@ | ||
| :orphan: | ||
|
|
||
| .. _0819-core-metadata-json-schema: | ||
|
|
||
| Appendix: JSON Schema for Core Metadata | ||
| ======================================= | ||
|
|
||
| .. literalinclude:: core-metadata.schema.json | ||
| :language: json | ||
| :linenos: | ||
| :name: core-metadata-schema | ||
|
|
||
| .. _0819-wheel-json-schema: | ||
|
|
||
| Appendix: JSON Schema for Wheel Metadata | ||
| ======================================== | ||
|
|
||
| .. literalinclude:: wheel.schema.json | ||
| :language: json | ||
| :linenos: | ||
| :name: wheel-schema |