[FEATURE] Add manifest file to ROR data dumps #352

@adambuttrick

Describe the problem you would like to solve

Integrators consuming ROR data dumps currently have no structured way to determine what a ZIP file contains without extracting and inspecting the files themselves. Identifying the schema version, expected record count, and which files correspond to which format requires parsing the actual data records. This is workable but adds complexity, particularly for integrations that need to quickly assess whether a new data dump is compatible before committing to a full download and extraction cycle.

Describe the solution you'd like

Include a machine-readable manifest file (e.g., manifest.json) at the root of each data dump ZIP. At a minimum, the manifest should contain:

  • The schema version(s) of the included data files (mapping each file to its admin.schema_version value)
  • The expected record count per file
  • The release date of the data dump
  • A checksum for each included file to support integrity verification

Example:

{
  "release_date": "2025-02-10",
  "files": [
    {
      "filename": "v2.0-2025-02-10-ror-data.json",
      "schema_version": "2.1",
      "record_count": 120345,
      "checksum_sha256": "a1b2c3..."
    }
  ]
}
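
To illustrate how an integrator might use such a manifest, here is a minimal Python sketch of a consumer-side check. It is not an existing ROR tool: the function name `verify_dump` and the `supported_versions` parameter are hypothetical, and it assumes the manifest uses the field names from the example above and that each data file is a single JSON array of records.

```python
import hashlib
import json
import zipfile


def verify_dump(zip_path, supported_versions=None, manifest_name="manifest.json"):
    """Compare a data dump ZIP against its manifest (hypothetical helper).

    Returns a list of problem descriptions; an empty list means every file
    listed in the manifest is present and matches its checksum and count.
    """
    problems = []
    with zipfile.ZipFile(zip_path) as zf:
        # Assumption: the manifest sits at the root of the ZIP.
        manifest = json.loads(zf.read(manifest_name))
        members = set(zf.namelist())

        for entry in manifest["files"]:
            name = entry["filename"]

            # Cheap compatibility check first: no data file is read yet.
            version = entry["schema_version"]
            if supported_versions is not None and version not in supported_versions:
                problems.append(f"{name}: unsupported schema version {version}")

            if name not in members:
                problems.append(f"{name}: listed in manifest but missing from ZIP")
                continue

            # Integrity check: SHA-256 of the uncompressed file contents.
            data = zf.read(name)
            if hashlib.sha256(data).hexdigest() != entry["checksum_sha256"]:
                problems.append(f"{name}: checksum mismatch")
                continue  # do not parse data that failed verification

            # Assumption: the file is one JSON array, one element per record.
            found = len(json.loads(data))
            if found != entry["record_count"]:
                problems.append(
                    f"{name}: expected {entry['record_count']} records, found {found}"
                )
    return problems
```

A caller could run `verify_dump("dump.zip", supported_versions={"2.1"})` and proceed only if the returned list is empty. The sketch reads each file fully into memory, which keeps it short; a real integration would probably hash and count in a streaming fashion.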

Who would benefit from this feature?

Developers building integrations that consume ROR data dumps, including publishers, repository managers, library system vendors, and data analysts maintaining local copies of ROR data. Any user who needs to programmatically assess the compatibility or validate the integrity of a data dump before processing it would benefit.

Metadata

Assignees: No one assigned
Labels: feature (Totally new functionality that does not exist in ROR currently)
Status: Backlog
Milestone: No milestone