
New helper function to help with file downloads #55

Merged
prototaxites merged 24 commits into nf-core:main from muffato:download_file
Dec 9, 2025

Conversation

muffato (Member) commented Dec 8, 2025

Introducing curlAndExtract.
This function can be used when a test needs a tailored database to be downloaded and extracted (unzipped/untarred) before it starts. One version of the method auto-detects the archive format from the file name; another lets you state the format explicitly, for URLs whose name doesn't make it obvious.
cc @prototaxites: we have a few of those in our tests!
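To make the two entry points concrete, here is a minimal sketch in Java. Everything in it is hypothetical: `TarCommandSketch`, the `tarArgs` overloads and the `Compression` enum are illustrative names, not the merged API, and only tar archives are handled. It maps a file name to a tar decompression flag, with an explicit overload for URLs whose name gives no hint. An explicit flag is useful because GNU tar generally cannot auto-detect compression when it reads the archive from a pipe, as in `curl ... | tar -x -z -f - -C dest`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not the nft-utils source: builds the tar arguments that would
// extract an archive streamed on stdin, e.g. `curl -fsSL URL | tar -x -z -f - -C DEST`.
final class TarCommandSketch {

    // Assumed set of supported compressions, each with its tar decompression flag.
    enum Compression {
        NONE(null), GZIP("-z"), BZIP2("-j"), XZ("-J");

        final String tarFlag;

        Compression(String tarFlag) {
            this.tarFlag = tarFlag;
        }
    }

    // Auto-detecting variant: infer the compression from the URL's file name.
    static List<String> tarArgs(String url, String dest) {
        return tarArgs(detect(url), dest);
    }

    // Explicit variant, for URLs whose name gives no hint (e.g. "...?id=42").
    static List<String> tarArgs(Compression compression, String dest) {
        List<String> args = new ArrayList<>();
        args.add("tar");
        args.add("-x");
        if (compression.tarFlag != null) {
            args.add(compression.tarFlag);
        }
        // "-f -" reads the archive from stdin; "-C" extracts into dest.
        args.add("-f");
        args.add("-");
        args.add("-C");
        args.add(dest);
        return args;
    }

    static Compression detect(String url) {
        // Ignore any query string or fragment, then match the lower-cased suffix.
        String name = url.split("[?#]", 2)[0].toLowerCase(Locale.ROOT);
        if (name.endsWith(".tar.gz") || name.endsWith(".tgz")) return Compression.GZIP;
        if (name.endsWith(".tar.bz2") || name.endsWith(".tbz2")) return Compression.BZIP2;
        if (name.endsWith(".tar.xz") || name.endsWith(".txz")) return Compression.XZ;
        if (name.endsWith(".tar")) return Compression.NONE;
        throw new IllegalArgumentException(
                "Cannot infer archive format from " + url + "; pass the compression explicitly");
    }
}
```

With the auto-detecting overload, a name such as `mockup.tar.gz` maps to `-z`, while a URL like `https://example.org/download?id=42` raises an error and would need the explicit overload.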

muffato and others added 2 commits December 8, 2025 15:38
Co-authored-by: Jim Downie <19718667+prototaxites@users.noreply.github.com>
prototaxites (Contributor) left a comment

A couple of other thoughts:

  1. Is it worth having a plain curl function that just downloads a file to a specific location without extracting?

  2. Do we want to support any other common archive formats - .zip, for instance, with separate functions?

muffato (Member, Author) commented Dec 8, 2025

  1. Is it worth having a plain curl function that just downloads a file to a specific location without extracting?

I thought about that, but Nextflow is already able to stage standalone files. Is

  2. Do we want to support any other common archive formats - .zip, for instance, with separate functions?

I can rename the function to curlAndUntar to make it clear only tar archives are currently supported.

unzip doesn't support reading archives from stdin, so it'll have to run two separate commands (download, then extract):

   Archives read from standard input are not yet supported, except with funzip (and then only the first member of the archive can be extracted).

That could form curlAndUnzip, with curlAndExtract being a dispatcher based on the file extension?
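A sketch of the proposed dispatcher, under stated assumptions: `curlAndExtract`, `curlAndUntar` and `curlAndUnzip` here only build shell command lines (nothing is executed), the signatures and the `tmpFile` parameter are hypothetical, and only `.zip`, `.tar.gz`/`.tgz` and `.tar` names are recognised. It shows why the zip branch needs an extra step: tar can consume the `curl` output on stdin, whereas `unzip` needs the archive saved to a file first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the proposed dispatcher (not the merged nft-utils code).
// Each method only builds shell command lines; nothing is executed here.
final class ExtractDispatchSketch {

    // Dispatch on the file extension; tmpFile is only used by the zip branch.
    static List<String> curlAndExtract(String url, String dest, String tmpFile) {
        String name = url.toLowerCase(Locale.ROOT);
        if (name.endsWith(".zip")) return curlAndUnzip(url, dest, tmpFile);
        if (name.endsWith(".tar.gz") || name.endsWith(".tgz")) return curlAndUntar(url, dest, "-xzf");
        if (name.endsWith(".tar")) return curlAndUntar(url, dest, "-xf");
        throw new IllegalArgumentException("Unsupported archive type: " + url);
    }

    // tar can read the archive from stdin, so one pipeline is enough.
    static List<String> curlAndUntar(String url, String dest, String tarMode) {
        List<String> cmds = new ArrayList<>();
        cmds.add("mkdir -p " + q(dest)); // tar -C needs an existing directory
        cmds.add("curl -fsSL " + q(url) + " | tar " + tarMode + " - -C " + q(dest));
        return cmds;
    }

    // unzip cannot read archives from stdin: save to a file first, then extract.
    static List<String> curlAndUnzip(String url, String dest, String tmpFile) {
        List<String> cmds = new ArrayList<>();
        cmds.add("mkdir -p " + q(dest));
        cmds.add("curl -fsSL -o " + q(tmpFile) + " " + q(url));
        cmds.add("unzip -q -o " + q(tmpFile) + " -d " + q(dest));
        return cmds;
    }

    // POSIX single-quote escaping so URLs and paths reach the shell unchanged.
    static String q(String s) {
        return "'" + s.replace("'", "'\\''") + "'";
    }
}
```

If the tar pipeline is run through `bash -c`, prefixing it with `set -o pipefail` makes a failed `curl` fail the whole command instead of relying on `tar` to complain about truncated input.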

prototaxites (Contributor) commented

I thought about that, but Nextflow is already able to stage standalone files. Is

This is true, but I was thinking this might be helpful in cases where a file needs further processing before use, e.g. downloading a gzipped FASTA file and decompressing it.
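For the gzipped-FASTA case, a plain download helper could decompress while streaming. The sketch below is hypothetical (`DownloadGunzipSketch.downloadAndGunzip` is not an nft-utils method) and uses only the JDK: `URL.openStream()` for the download and `GZIPInputStream` for decompression, so no temporary `.gz` copy is written. It handles a single gzipped file, not an archive.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;

// Hypothetical helper (not part of nft-utils): download one gzipped file,
// e.g. a .fasta.gz, and write its decompressed content to `target`.
final class DownloadGunzipSketch {

    static Path downloadAndGunzip(String url, Path target) throws IOException {
        // Make sure the destination directory exists.
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // Stream: remote bytes -> gzip decoder -> target file (overwritten if present).
        try (InputStream raw = URI.create(url).toURL().openStream();
             InputStream gz = new GZIPInputStream(raw)) {
            Files.copy(gz, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}
```

Because it accepts any URL the JDK can open, a `file:` URL works too, which makes the helper easy to check offline.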

That could form curlAndUnzip, with curlAndExtract being a dispatcher based on the file extension?

I like that idea!

muffato (Member, Author) commented Dec 8, 2025

Almost there. Will finish after dinner.

muffato marked this pull request as ready for review December 9, 2025 11:32
muffato changed the title from "[WIP] New helper function to help with file downloads" to "New helper function to help with file downloads" Dec 9, 2025
maxulysse (Member) commented

I really like this. For me, this is more about setting things up outside Nextflow/nf-test, so we can rely less on modules to download and extract the files we need for tests.

prototaxites (Contributor) left a comment

This looks good to me! I'll do a quick build and try it out in my nf-test, to check that it handles a "real" use case, but I suspect there will be no issues.

prototaxites (Contributor) commented

It does indeed work!

        setup {
            curlAndExtract("https://tolit.cog.sanger.ac.uk/test-data/resources/gtdbtk/mockup.tar.gz", "${launchDir}/gtdbtk")
            curlAndExtract("https://tolit.cog.sanger.ac.uk/test-data/resources/metagenomeassembly/checkm2_database.tar.gz", "${launchDir}/checkm2")
            curlAndExtract("https://tolit.cog.sanger.ac.uk/test-data/resources/metagenomeassembly/genomad_db_v1.9.tar.gz", "${launchDir}/genomad")
        }
🚀 nf-test 0.9.3
https://www.nf-test.com
(c) 2021 - 2024 Lukas Forer and Sebastian Schoenherr

Load /Users/jd42/Projects/nft-utils/target/nft-utils-0.0.6.jar
Load .nf-test/plugins/nft-csv/0.1.0/nft-csv-0.1.0.jar

Test pipeline

  Test [b5152990] '-profile test' Successfully downloaded and extracted file: https://tolit.cog.sanger.ac.uk/test-data/resources/gtdbtk/mockup.tar.gz
Successfully downloaded and extracted file: https://tolit.cog.sanger.ac.uk/test-data/resources/metagenomeassembly/checkm2_database.tar.gz
Successfully downloaded and extracted file: https://tolit.cog.sanger.ac.uk/test-data/resources/metagenomeassembly/genomad_db_v1.9.tar.gz

    > Nextflow 25.10.2 is available - Please consider updating your version to it
    > N E X T F L O W  ~  version 25.10.0
    > Launching `/Users/jd42/Projects/metagenomeassembly/tests/../main.nf` [scruffy_stone] DSL2 - revision: 5a6c80ef98
    >
    > ------------------------------------------------------
    >    _____                                _______    _
    >   / ____|                              |__   __|  | |
    >  | (___   __ _ _ __   __ _  ___ _ __  ___ | | ___ | |
    >   \___ \ / _` | '_ \ / _` |/ _ \ '__||___|| |/ _ \| |
    >   ____) | (_| | | | | (_| |  __/ |        | | (_) | |____
    >  |_____/ \__,_|_| |_|\__, |\___|_|        |_|\___/|______|
    >                       __/ |
    >                      |___/
    >   sanger-tol/metagenomeassembly 1.3.1
    > ------------------------------------------------------
    >
    > Input/output options
    >   genomad_db                : /Users/jd42/Projects/metagenomeassembly/.nf-test/tests/b5152990c7ba186102006ae31d0e8ada/genomad/genomad_db

    >   checkm2_db                : /Users/jd42/Projects/metagenomeassembly/.nf-test/tests/b5152990c7ba186102006ae31d0e8ada/checkm2/CheckM2_database/uniref100.KO.1.dmnd
    >
    > Bin taxonomy options
    >   gtdbtk_db                 : /Users/jd42/Projects/metagenomeassembly/.nf-test/tests/b5152990c7ba186102006ae31d0e8ada/gtdbtk/r226_mockup

muffato (Member, Author) commented Dec 9, 2025

Super! Can you please merge, then? I don't have write permissions.

prototaxites merged commit 5c77648 into nf-core:main Dec 9, 2025
5 checks passed
muffato deleted the download_file branch December 9, 2025 21:36

Labels

None yet

Projects

None yet


3 participants