Open
Labels
feature: Totally new functionality that does not exist in ROR currently
Description
Context
Several scripts in the https://github.com/ror-community/curation_ops repository are used by https://github.com/ror-community/ror-records workflows to generate and publish data dumps. While most of these scripts are path-agnostic and require minimal changes, some contain hard-coded references and conventions that should be updated to complete #353.
Scripts
generate_dump.py
Current behavior:
- Takes `-r` (release dir), `-e` (previous dump name), `-i` (input path), `-o` (output path)
- Reads previous dump from `{output_path}/{prev-release}.zip`
- Generates new dump at `{input_path}/{release}-{YYYY-MM-DD}-ror-data.json` / `.csv`
- Creates zip at `{output_path}/{release}-{YYYY-MM-DD}-ror-data.zip`
- Has no hardcoded ror-data repo references; the workflow handles all repo interactions
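For reference, the naming convention above can be sketched as follows. This is only an illustration of the path templates listed in this issue, not the code in `generate_dump.py`; the function name `dump_paths`, its parameters, and the `SUFFIX` constant name are hypothetical.

```python
from datetime import date
from pathlib import Path

# Assumed to correspond to the "-ror-data" suffix constant mentioned in this issue.
SUFFIX = "-ror-data"


def dump_paths(release, input_path, output_path, prev_release, today=None):
    """Build the file paths implied by the templates listed above (hypothetical helper)."""
    today = today or date.today()
    # {release}-{YYYY-MM-DD}-ror-data
    stem = f"{release}-{today:%Y-%m-%d}{SUFFIX}"
    return {
        # The previous dump is read from the output path.
        "previous_zip": Path(output_path) / f"{prev_release}.zip",
        # The new JSON and CSV files are written to the input path.
        "json": Path(input_path) / f"{stem}.json",
        "csv": Path(input_path) / f"{stem}.csv",
        # The new zip is written to the output path.
        "zip": Path(output_path) / f"{stem}.zip",
    }
```

Because every path derives from the arguments, nothing in this layout depends on where the workflow obtained the previous dump, which is why the script itself likely needs little or no change.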
Changes needed:
- Minimal or none. The script is I/O path agnostic and reads from paths passed as arguments. The workflow changes (sub-issue 01) handle downloading from release artifacts and uploading back.
- Optional: update the `-ror-data` suffix constant (line 18) if the naming convention changes. This is likely best kept as-is since "ror-data" is a known brand/convention for the dump files.
upload_dump_zenodo.py
Current behavior:
- `DUMP_FILE_DIR = "./"` (line 15) scans the current directory for the dump zip
- `get_dump_file()` (lines 19-24) matches the filename prefix against the release name
- `check_release_data()` (line 221) checks for `"ror-data.zip"` in the filename
- Error message (line 228): `"Dump file not found in ror-data"`
- `GITHUB_API_URL` (line 12) points to `ror-community/ror-updates` for release notes (this is a different repo from ror-data, used for release notes metadata)
- `format_description()` (lines 50-108) contains hardcoded HTML with links to ror-schema and ror-updates repos
Changes needed:
- Update error message on line 228 from `"ror-data"` to something generic (e.g., `"Dump file not found in working directory"`)
- Consider parameterizing `DUMP_FILE_DIR` as a CLI argument instead of hardcoding `"./"`, so the workflow can explicitly pass the download directory
- No other changes are strictly required. [FEATURE] Broken link checker for ROR records #337 handles downloading the artifact to the correct directory before running the script
- `requirements.txt` currently pins `requests` to `requests==2.27.1`. We should update to a more recent version
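A minimal sketch of the optional parameterization and the generic error message, assuming the script can use `argparse`. The flag names (`-r/--release`, `-d/--dump-dir`) and the helper `find_dump_file` are hypothetical and are not taken from `upload_dump_zenodo.py`; only the `"./"` default and the `ror-data.zip` filename check come from this issue.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse CLI arguments; the flag names here are illustrative assumptions."""
    parser = argparse.ArgumentParser(description="Upload a ROR data dump to Zenodo")
    parser.add_argument("-r", "--release", required=True,
                        help="release name, used as the dump filename prefix")
    # Defaults to "./" so existing invocations keep their current behavior.
    parser.add_argument("-d", "--dump-dir", default="./",
                        help="directory containing the dump zip")
    return parser.parse_args(argv)


def find_dump_file(release, dump_dir):
    """Return the path of the dump zip for `release` found in `dump_dir`."""
    for name in sorted(os.listdir(dump_dir)):
        # Match the release prefix and the "ror-data.zip" naming convention.
        if name.startswith(release) and name.endswith("ror-data.zip"):
            return os.path.join(dump_dir, name)
    # Report the directory actually searched instead of a repo name.
    raise FileNotFoundError(
        f"Dump file not found in {dump_dir!r} for release {release!r}"
    )
```

With something along these lines, the workflow could pass the artifact download directory explicitly, while running the script without `-d` would still scan the current directory.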
Files to review/modify
- `generate_dump.py`
- `upload_dump_zenodo.py`
- `requirements.txt`
Acceptance criteria
- `generate_dump.py` continues to work correctly with the modified workflow from sub-issue 01
- `upload_dump_zenodo.py` error messages are accurate and no longer reference "ror-data" as a directory
- Optional: `DUMP_FILE_DIR` is parameterizable via CLI argument with `"./"` as the default
- `requirements.txt` dependencies are reviewed and updated where appropriate