@joverlee521
Contributor

Description of proposed changes

Use uncompressed outputs so that the uploaded files are no longer double compressed.¹

Once `upload-to-s3` supports compressed local files, we can go back to using compressed intermediate files.

Related issue(s)

Related to #343

Checklist

  • Checks pass
  • Update changelog

¹ <#343>
@joverlee521
Contributor Author

Merging without review to fix the files on S3. Will re-run the ingest workflow after merge.

@joverlee521 joverlee521 merged commit 112aff7 into master Nov 13, 2025
9 of 13 checks passed
@joverlee521 joverlee521 deleted the uncompress-open-data branch November 13, 2025 22:20
@corneliusroemer
Member

Thanks and sorry for this. How did you notice the double compression? It hasn't caused any issues in any of our CI runs as far as I can tell?

@joverlee521
Contributor Author

> Thanks and sorry for this. How did you notice the double compression? It hasn't caused any issues in any of our CI runs as far as I can tell?

Yeah, the CI uses example data and the automated phylo workflows start from the *_with_restricted files, so nothing flagged this issue. I only noticed because I was curious why the metadata diff on Slack started reporting "Binary files /tmp/s3-file-GVSf4x and results/metadata.tsv differ". This led me to realize the S3 path for the diff was incorrect and prompted me to take a look at the files on S3.
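For anyone who wants to check a downloaded file directly instead of relying on the diff output, here is a minimal sketch. It assumes the files are gzip-compressed, which this thread does not actually state; other formats would need a different magic number and decompressor. The function name `compression_layers` is hypothetical and not part of ingest.

```python
import gzip
import zlib

# Assumption: gzip is the compression format. Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def compression_layers(data: bytes, max_layers: int = 5) -> int:
    """Count how many nested gzip layers wrap `data` (hypothetical helper)."""
    layers = 0
    while layers < max_layers and data[:2] == GZIP_MAGIC:
        try:
            # Peel off one layer and look at what is underneath.
            data = gzip.decompress(data)
        except (OSError, EOFError, zlib.error):
            # The magic bytes matched by coincidence, or the stream is truncated.
            break
        layers += 1
    return layers
```

A result of 2 or more for a downloaded file would indicate the kind of double compression described above; a correctly uploaded file should report 1.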

@joverlee521
Contributor Author

> the automated phylo workflows start from the *_with_restricted files, so nothing flagged this issue.

It would be good for us to use both the OPEN and RESTRICTED files and to not host duplicate OPEN records on S3. I'm updating ingest to upload completely separate files for OPEN and RESTRICTED records in #347.
