Describe the problem you would like to solve
Integrators consuming ROR data dumps currently have no structured way to determine what a ZIP file contains without extracting and inspecting the files themselves. Identifying the schema version, expected record count, and which files correspond to which format requires parsing the actual data records. This is workable but adds complexity, particularly for integrations that need to quickly assess whether a new data dump is compatible before committing to a full download and extraction cycle.
Describe the solution you'd like
Include a machine-readable manifest file (e.g., manifest.json) at the root of each data dump ZIP. At a minimum, the manifest should contain:
- The schema version(s) of the included data files (mapping each file to its admin.schema_version value)
- The expected record count per file
- The release date of the data dump
- A checksum for each included file to support integrity verification
Example:
{
  "release_date": "2025-02-10",
  "files": [
    {
      "filename": "v2.0-2025-02-10-ror-data.json",
      "schema_version": "2.1",
      "record_count": 120345,
      "checksum_sha256": "a1b2c3..."
    }
  ]
}
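To illustrate how an integrator might consume such a manifest, here is a minimal Python sketch. It assumes the manifest is named manifest.json, sits at the ZIP root, and uses the field names from the example above. The function names and SUPPORTED_SCHEMA_VERSIONS are hypothetical and not part of any existing ROR tooling.

```python
import hashlib
import json
import zipfile

# Hypothetical: the schema versions this particular integration can parse.
SUPPORTED_SCHEMA_VERSIONS = {"2.0", "2.1"}


def read_manifest(zf, name="manifest.json"):
    """Load the manifest from the ZIP root without extracting other members."""
    with zf.open(name) as fh:
        return json.load(fh)


def check_dump(zf):
    """Return a list of problems; an empty list means the dump looks usable."""
    problems = []
    members = set(zf.namelist())
    for entry in read_manifest(zf)["files"]:
        filename = entry["filename"]
        version = entry["schema_version"]
        if version not in SUPPORTED_SCHEMA_VERSIONS:
            problems.append(f"{filename}: unsupported schema version {version}")
            continue
        if filename not in members:
            problems.append(f"{filename}: listed in manifest but missing from ZIP")
            continue
        # Stream the member in 1 MiB chunks so large files are never held in memory.
        digest = hashlib.sha256()
        with zf.open(filename) as fh:
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != entry["checksum_sha256"]:
            problems.append(f"{filename}: checksum mismatch")
    return problems
```

A caller would open the downloaded dump with zipfile.ZipFile and pass it to check_dump. The compatibility check needs only the manifest, so an unsupported file is rejected before any of its records are read. The record_count field could be compared against the parsed data in the same way once the file is loaded.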
Who would benefit from this feature?
Developers building integrations that consume ROR data dumps, including publishers, repository managers, library system vendors, and data analysts maintaining local copies of ROR data. Any user who needs to programmatically assess compatibility or validate the integrity of a data dump before processing it would benefit.