Description
Describe the problem you would like to solve
Integrators consuming ROR data dumps currently have no structured way to determine what a ZIP file contains without extracting and inspecting the files themselves. Identifying the schema version, expected record count, and which files correspond to which format requires parsing the actual data records. This is workable but adds complexity, particularly for integrations that need to quickly assess whether a new data dump is compatible before committing to a full download and extraction cycle.
Describe the solution you'd like
Include a machine-readable manifest file (e.g., manifest.json) at the root of each data dump ZIP. At a minimum, the manifest should contain:
- The schema version(s) of the included data files (mapping each file to its admin.schema_version value)
- The expected record count per file
- The release date of the data dump
- A checksum for each included file to support integrity verification
Example:
{
"release_date": "2025-02-10",
"files": [
{
"filename": "v2.0-2025-02-10-ror-data.json",
"schema_version": "2.1",
"record_count": 120345,
"checksum_sha256": "a1b2c3..."
}
]
}
Who would benefit from this feature?
Developers building integrations that consume ROR data dumps, including publishers, repository managers, library system vendors, and data analysts maintaining local copies of ROR data. Any user who needs to programmatically assess compatibility or validate the integrity of a data dump before processing it would benefit.
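To illustrate how an integrator might use such a manifest, here is a minimal Python sketch of a pre-processing check. It assumes the manifest is named manifest.json, sits at the root of the ZIP, and uses the field names from the example above. It also assumes each data file is a JSON array of records. The check_dump function and the SUPPORTED_SCHEMA_VERSIONS set are hypothetical names for this illustration, not part of any existing ROR tooling.

```python
import hashlib
import json
import zipfile

# Hypothetical: the schema versions this particular integration can handle.
SUPPORTED_SCHEMA_VERSIONS = {"2.0", "2.1"}


def check_dump(zip_path):
    """Return a list of problems found when checking a dump against its manifest."""
    problems = []
    with zipfile.ZipFile(zip_path) as zf:
        # Assumed location and name of the proposed manifest.
        manifest = json.loads(zf.read("manifest.json"))
        for entry in manifest["files"]:
            name = entry["filename"]
            version = entry["schema_version"]
            # Compatibility check: needs only the manifest, not the data file.
            if version not in SUPPORTED_SCHEMA_VERSIONS:
                problems.append(f"{name}: unsupported schema version {version}")
                continue
            # Integrity check: compare the SHA-256 of the file's bytes.
            data = zf.read(name)
            if hashlib.sha256(data).hexdigest() != entry["checksum_sha256"]:
                problems.append(f"{name}: checksum mismatch")
                continue
            # Completeness check: assumes a top-level JSON array of records.
            if len(json.loads(data)) != entry["record_count"]:
                problems.append(f"{name}: record count mismatch")
    return problems
```

An empty list means the dump passed all three checks. Because the schema version check reads only the manifest, an incompatible dump can be rejected without parsing any data records.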