fetch-artifacts

A Julia-style artifact system for Python. Manage large binary files with TOML-based configuration, automatic downloading, content-addressable caching, and checksum verification.

Features

Julia-compatible: Uses the same Artifacts.toml format as Julia's Pkg.Artifacts
Content-addressable storage: Artifacts cached by git-tree-sha1 hash for deduplication
Lazy loading: Download artifacts only when accessed
Checksum verification: SHA256 verification for all downloads
Multiple mirrors: Support for fallback download sources
Simple API: Minimal code to load and use artifacts
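
To make the content-addressing above concrete: a git-tree-sha1 is git's hash of a directory tree, so it depends on file names, modes, and contents rather than on how an archive was compressed. The sketch below follows git's object format. It is not fetch-artifacts' own code; the function names are invented for illustration, and it ignores symlinks and any extra rules Julia's Pkg.Artifacts may apply, so its output may differ from the library's for such trees.

```python
import hashlib
import os
from pathlib import Path


def blob_digest(data: bytes) -> bytes:
    # Raw SHA-1 of a git blob object: "blob <size>" + NUL byte + content
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).digest()


def tree_digest(directory: Path) -> bytes:
    # Raw SHA-1 of a git tree object built from a directory on disk
    entries = []
    for child in directory.iterdir():
        name = child.name.encode()
        if child.is_dir():
            # git orders directories as if their names ended with "/"
            entries.append((name + b"/", b"40000", name, tree_digest(child)))
        else:
            mode = b"100755" if os.access(child, os.X_OK) else b"100644"
            entries.append((name, mode, name, blob_digest(child.read_bytes())))
    entries.sort(key=lambda entry: entry[0])
    body = b"".join(
        mode + b" " + name + b"\x00" + digest for _, mode, name, digest in entries
    )
    return hashlib.sha1(b"tree %d\x00" % len(body) + body).digest()


def git_tree_sha1(directory) -> str:
    # Hex form, as it would appear in a git-tree-sha1 field
    return tree_digest(Path(directory)).hex()
```

A quick sanity check: git's empty tree always hashes to 4b825dc642cb6eb9a060e54bf8d69288fbee4904, so calling git_tree_sha1 on an empty directory should return that value.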

Installation

pip install fetch-artifacts

Usage

1. Create an Artifacts.toml file

[MyDataset]
git-tree-sha1 = "d309b571f5693718c8612d387820a409479fe506"

    [[MyDataset.download]]
    url = "https://example.com/dataset.tar.xz"
    sha256 = "d309b571f5693718c8612d387820a409479fe50688d4c46c87ba8662c6acc09b"

2. Load artifacts in Python

import pandas as pd

from fetch_artifacts import artifact

# Get path to the artifact (downloads if needed)
dataset_path = artifact("MyDataset")

# Use the artifact
data = pd.read_csv(dataset_path / "data.csv")
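
The "downloads if needed" behaviour can be pictured as a lookup in a cache keyed by the content hash. The following is a minimal sketch under assumptions, not the library's implementation: resolve_artifact, the one-directory-per-hash cache layout, and the fetch callback are all hypothetical names chosen for illustration.

```python
from pathlib import Path
from typing import Callable


def resolve_artifact(
    tree_sha1: str,
    cache_dir: Path,
    fetch: Callable[[Path], None],
) -> Path:
    """Return the cached directory for tree_sha1, fetching it only if absent."""
    target = Path(cache_dir) / tree_sha1
    if not target.exists():
        # First access: the callback is expected to download and unpack into target
        fetch(target)
    return target
```

Because the directory name is the content hash, two projects that reference the same artifact would share one cached copy under this layout.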

3. Create and publish artifacts

from fetch_artifacts import create_artifact, bind_artifact

# Create archive from directory
result = create_artifact(
    directory="path/to/data",
    archive_path="output.tar.xz",
    compression="xz"
)

# Add to Artifacts.toml
bind_artifact(
    toml_path="Artifacts.toml",
    name="MyArtifact",
    git_tree_sha1=result['git_tree_sha1'],
    download_url="https://example.com/artifact.tar.xz",
    sha256=result['sha256']
)
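
For intuition about the archive side of this step, here is a rough standard-library sketch of packing a directory into a .tar.xz file and computing the SHA-256 of the resulting archive. It is an illustration only: pack_directory is a hypothetical helper, not part of fetch-artifacts, and it does not compute the git-tree-sha1.

```python
import hashlib
import tarfile


def pack_directory(directory, archive_path) -> str:
    """Write directory to an xz-compressed tarball and return the archive's SHA-256."""
    with tarfile.open(archive_path, "w:xz") as tar:
        # arcname="." stores member paths relative to the directory root
        tar.add(directory, arcname=".")

    digest = hashlib.sha256()
    with open(archive_path, "rb") as fh:
        # Hash in 1 MiB chunks so large archives are never held in memory at once
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Note that the archive hash changes with compression settings and file timestamps, whereas a tree hash depends only on the unpacked content, which is presumably why an entry records both values.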

4. Add existing remote files

from fetch_artifacts import add_artifact

# Download, compute hashes, and add to Artifacts.toml in one step
add_artifact(
    toml_path="Artifacts.toml",
    name="RemoteDataset",
    tarball_url="https://zenodo.org/records/12345/files/data.tar.xz"
)
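
The entry that ends up in Artifacts.toml has the same shape as the one in step 1. As a hedged illustration of that shape, the hypothetical helper below renders one artifact with a single download source as TOML text. It is not the library's code, and it performs no escaping, so it assumes the name is a valid bare TOML key and the values contain no quotes.

```python
def format_artifact_entry(name: str, tree_sha1: str, url: str, sha256: str) -> str:
    """Render one artifact with a single download source as TOML text."""
    return (
        f"[{name}]\n"
        f'git-tree-sha1 = "{tree_sha1}"\n'
        "\n"
        f"    [[{name}.download]]\n"
        f'    url = "{url}"\n'
        f'    sha256 = "{sha256}"\n'
    )
```

Appending the returned text to an existing Artifacts.toml (opened in "a" mode) would add the artifact without touching earlier entries.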

Advanced Usage

Custom cache directory:

from fetch_artifacts import set_cache_dir
set_cache_dir("/path/to/cache")

Check if artifact exists:

from fetch_artifacts import artifact_exists
if artifact_exists("MyArtifact"):
    print("Artifact is cached")

Clear cache:

from fetch_artifacts import clear_artifact_cache
clear_artifact_cache("MyArtifact")  # Clear specific artifact
clear_artifact_cache()              # Clear all artifacts

Custom metadata:

[MyEmulator]
git-tree-sha1 = "abc123..."
description = "Neural network emulator for cosmology"
version = "2.0"

    [[MyEmulator.download]]
    url = "https://zenodo.org/records/12345/files/emulator.tar.xz"
    sha256 = "def456..."

Access metadata:

from fetch_artifacts import load_artifacts

manager = load_artifacts("Artifacts.toml")
metadata = manager.artifacts["MyEmulator"].metadata
print(metadata["description"])  # "Neural network emulator for cosmology"

Why fetch-artifacts?

The usual ways of managing large datasets or model files in scientific computing each have drawbacks that fetch-artifacts is designed to avoid:

git-lfs: Expensive, coupled to git history, doesn't deduplicate across projects
Direct downloads: No versioning, no automatic checksums, manual management
fetch-artifacts: Content-addressable, automatic verification, global caching, platform-independent

Inspired by Julia's Pkg.Artifacts, fetch-artifacts brings the same robust workflow to Python.

Artifacts.toml Format

[ArtifactName]
git-tree-sha1 = "abc123..."  # Content hash (required)

    [[ArtifactName.download]]
    url = "https://primary.com/data.tar.xz"
    sha256 = "def456..."

    [[ArtifactName.download]]  # Optional fallback mirror
    url = "https://mirror.com/data.tar.xz"
    sha256 = "def456..."
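
Each [[ArtifactName.download]] block is one element of a TOML array of tables, so a parser returns the sources as a list in file order. The sketch below reads that list and tries each source in turn until one yields a payload with a matching SHA-256. It illustrates the fallback idea only and is not fetch-artifacts' implementation: download_sources, fetch_verified, and the injected fetch callable are hypothetical.

```python
import hashlib
from typing import Callable

try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API


def download_sources(toml_text: str, name: str) -> list:
    """Return the download entries for one artifact, in file order."""
    return tomllib.loads(toml_text)[name].get("download", [])


def fetch_verified(sources, fetch: Callable[[str], bytes]) -> bytes:
    """Try each source in order; return the first payload whose SHA-256 matches."""
    errors = []
    for source in sources:
        try:
            payload = fetch(source["url"])
        except OSError as exc:  # urllib's URLError is a subclass of OSError
            errors.append(f"{source['url']}: {exc}")
            continue
        if hashlib.sha256(payload).hexdigest() == source["sha256"]:
            return payload
        errors.append(f"{source['url']}: checksum mismatch")
    raise RuntimeError("all sources failed: " + "; ".join(errors))
```

With the standard library, fetch could be as simple as lambda url: urllib.request.urlopen(url).read(); passing it in as a parameter keeps the fallback logic testable without network access.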

Development

git clone https://github.com/CosmologicalEmulators/fetch-artifacts.git
cd fetch-artifacts
poetry install
poetry run pytest tests/ -v --cov=fetch_artifacts

License

MIT License. See LICENSE for details.

Contributing

Contributions welcome! Please:

Fork the repository
Create a feature branch
Add tests for new functionality
Ensure all tests pass
Submit a pull request
