[FEATURE] Modify publish_dump_zenodo.yml to download dump from release artifact #356

@adambuttrick

Context

The publish_dump_zenodo.yml workflow currently checks out the ror-community/ror-data repository and scans it for a zip file matching the release name in order to upload the dump to Zenodo. As part of the migration away from the ror-data repo, this workflow needs to be updated to download the dump zip from a ror-records release artifact instead.

Current behavior

  • Checks out the ror-community/ror-data repo (lines 38-40)
  • Changes into ./ror-data directory (line 48)
  • Downloads and runs upload_dump_zenodo.py from curation_ops (schema-v2-1 branch) via raw curl
  • upload_dump_zenodo.py scans the current directory (DUMP_FILE_DIR = "./") for a zip matching the release name
  • The script also calls the GitHub API for ror-community/ror-updates to get release notes data (total orgs, added, updated counts)
  • Uses actions/checkout@v2 and Python 3.9 (both outdated)

Proposed changes

  • Remove the ror-data checkout step entirely
  • Add a step to download the dump zip from the ror-records release artifact using gh release download {release-tag} --pattern '*ror-data*.zip'
  • Download to a working directory, then cd into it before running the script so that the script's DUMP_FILE_DIR assumption (current directory) still works
  • Upgrade actions/checkout to v4
  • Upgrade Python to 3.11
  • Check out curation_ops properly instead of curling individual files (current workflow curls from the schema-v2-1 branch; the checkout should target main to match generate_dump.yml, or whichever branch is canonical at the time of implementation)
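
Taken together, the proposed changes could look roughly like the sketch below. It is an illustration under stated assumptions, not the final workflow: the input name release-tag, the job and step names, the dump directory, the ror-community owner for curation_ops and ror-records, and the setup-python version are all assumed, and the script path and arguments are left as placeholders because this issue does not specify them. Dependency installation and Zenodo credentials are omitted and would carry over from the existing workflow.

```yaml
# Sketch only. Names marked "assumed" or "hypothetical" are not taken from the existing workflow.
on:
  workflow_dispatch:
    inputs:
      release-tag:                      # hypothetical input name
        description: Tag of the ror-records release that holds the dump zip
        required: true

jobs:
  publish-dump:                         # hypothetical job name
    runs-on: ubuntu-latest
    steps:
      # Replaces the raw curl of individual files from the schema-v2-1 branch.
      - name: Check out curation_ops
        uses: actions/checkout@v4
        with:
          repository: ror-community/curation_ops   # owner assumed
          ref: main                                # or whichever branch is canonical
          path: curation_ops

      - name: Set up Python
        uses: actions/setup-python@v5              # version assumed
        with:
          python-version: "3.11"

      # Replaces the ror-data checkout: fetch the zip from the release instead.
      - name: Download dump zip from the release
        env:
          GH_TOKEN: ${{ github.token }}            # gh reads its token from GH_TOKEN
          RELEASE_TAG: ${{ inputs.release-tag }}
        run: |
          mkdir -p dump
          gh release download "$RELEASE_TAG" \
            --repo ror-community/ror-records \
            --pattern '*ror-data*.zip' \
            --dir dump

      # Run from the download directory so DUMP_FILE_DIR = "./" still resolves.
      - name: Publish dump to Zenodo
        working-directory: dump
        run: |
          # PATH/TO and the script arguments are placeholders; keep the existing invocation.
          python "$GITHUB_WORKSPACE/curation_ops/PATH/TO/upload_dump_zenodo.py"
```

Using working-directory on the publish step has the same effect as an explicit cd, and it keeps the script unchanged.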

Open question

  • upload_dump_zenodo.py appears to hardcode a check that the filename contains "ror-data.zip" (line 221) and an error message, "Dump file not found in ror-data" (line 228). Both are string conventions and should still work with the new artifact-based flow, but the error message will be misleading once the dump no longer comes from the ror-data repo. It could be updated in the curation_ops sub-issue.
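
One optional way to guard against a naming mismatch is a step in publish_dump_zenodo.yml that runs before the script and checks the downloaded file name. The fragment below is a sketch, not part of the existing workflow: the dump directory name is hypothetical, and requiring exactly one match is an assumption rather than something the script is known to enforce. It relies on bash, the default shell for run steps on Linux runners.

```yaml
      # Hypothetical guard step; place it after the download and before the Zenodo upload.
      - name: Verify the downloaded dump file name
        working-directory: dump                    # hypothetical download directory
        run: |
          shopt -s nullglob                        # an unmatched glob expands to nothing
          matches=( *ror-data.zip* )               # names containing "ror-data.zip"
          if [ "${#matches[@]}" -ne 1 ]; then
            echo "::error::Expected exactly one asset containing ror-data.zip, found ${#matches[@]}"
            exit 1
          fi
          echo "Found dump: ${matches[0]}"
```

The download pattern '*ror-data*.zip' is broader than the "ror-data.zip" substring the script appears to check, so a step like this would fail with a clear message if an asset matched the first but not the second.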

Files to modify

  • .github/workflows/publish_dump_zenodo.yml

Acceptance criteria

  • Workflow no longer checks out the ror-data repository
  • Dump zip is downloaded from the ror-records release asset
  • Zenodo publication works identically to the current behavior
  • actions/checkout upgraded to v4
  • Python version upgraded to 3.11
  • curation_ops is checked out properly rather than curled as raw files
