
Simplifying the discovery and usage of machine-learning ready datasets in materials science and chemistry


MLMI2-CSSI/foundry


Foundry-ML simplifies access to machine learning-ready datasets in materials science and chemistry.

  • Search & Load - Find and use curated datasets with a few lines of code
  • Understand - Rich schemas describe what each field means
  • Cite - Automatic citation generation for publications
  • Publish - Share your datasets with the community
  • AI-Ready - MCP server for Claude and other AI assistants

Quick Start

pip install foundry-ml

from foundry import Foundry

# Connect
f = Foundry()

# Search
results = f.search("band gap", limit=5)

# Load
dataset = results.iloc[0].FoundryDataset
X, y = dataset.get_as_dict()['train']

# Understand
schema = dataset.get_schema()
print(schema['fields'])

# Cite
print(dataset.get_citation())
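The Quick Start unpacks the training split into X and y, but their container types depend on the dataset. The following is a minimal sketch, assuming each is a pandas DataFrame or Series, or a dict mapping field names to equal-length columns. The hypothetical helper to_arrays stacks them into NumPy arrays for libraries such as scikit-learn. It is not part of the Foundry API.

```python
from collections.abc import Mapping

import numpy as np
import pandas as pd


def _as_frame(obj) -> pd.DataFrame:
    """Coerce a DataFrame, Series, or field->column mapping into a DataFrame."""
    if isinstance(obj, pd.DataFrame):
        return obj
    if isinstance(obj, pd.Series):
        return obj.to_frame()
    if isinstance(obj, Mapping):
        return pd.DataFrame(dict(obj))
    raise TypeError(f"unsupported container: {type(obj).__name__}")


def to_arrays(X, y):
    """Hypothetical helper (not a Foundry API): return (X, y) as NumPy arrays.

    Assumes X and y are DataFrames, Series, or mappings of field name to
    equal-length column values, as described in the lead-in.
    """
    X_arr = _as_frame(X).to_numpy()
    y_frame = _as_frame(y)
    y_arr = y_frame.to_numpy()
    # A single target column is flattened to shape (n_samples,).
    if y_frame.shape[1] == 1:
        y_arr = y_arr.ravel()
    if len(X_arr) != len(y_arr):
        raise ValueError("X and y have different numbers of rows")
    return np.asarray(X_arr), np.asarray(y_arr)
```

Usage, continuing the Quick Start: X_arr, y_arr = to_arrays(X, y).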

Cloud Environments

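For Google Colab or a remote Jupyter server, where no local browser is available to complete the login, disable the browser and local redirect server: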

f = Foundry(no_browser=True, no_local_server=True)
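To reuse one notebook both locally and in Colab, the two flags can be chosen at runtime. The following is a minimal sketch. It relies on the common heuristic that Colab kernels have already imported the google.colab package. FOUNDRY_HEADLESS is a hypothetical environment variable read only by this sketch, not by Foundry itself.

```python
import os
import sys
from typing import Optional


def running_in_colab() -> bool:
    """Heuristic: Colab kernels import google.colab at startup."""
    return "google.colab" in sys.modules


def foundry_login_kwargs(force_headless: Optional[bool] = None) -> dict:
    """Return keyword arguments for Foundry() suited to the environment.

    force_headless overrides detection. FOUNDRY_HEADLESS=1 is a hypothetical
    switch for remote Jupyter servers, which this sketch cannot detect.
    """
    if force_headless is None:
        force_headless = running_in_colab() or os.environ.get("FOUNDRY_HEADLESS") == "1"
    if force_headless:
        return {"no_browser": True, "no_local_server": True}
    return {}
```

Usage: f = Foundry(**foundry_login_kwargs()).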

CLI

foundry search "band gap"
foundry schema 10.18126/abc123
foundry --help
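The same commands can be scripted from Python. The following is a sketch using only the standard library. It calls only the subcommands shown above, and the helper names foundry_cli and run_foundry are hypothetical. Passing arguments as a list keeps a multi-word query such as "band gap" together as one argument, just as the shell quotes do.

```python
import shutil
import subprocess
from typing import List


def foundry_cli(*args: str) -> List[str]:
    """Build the argument vector for a foundry CLI call."""
    return ["foundry", *args]


def run_foundry(*args: str) -> str:
    """Run a foundry CLI command and return its standard output."""
    if shutil.which("foundry") is None:
        raise FileNotFoundError("foundry CLI not found on PATH; install foundry-ml first")
    # check=True raises CalledProcessError if the command exits non-zero.
    result = subprocess.run(foundry_cli(*args), capture_output=True, text=True, check=True)
    return result.stdout
```

Usage: print(run_foundry("schema", "10.18126/abc123")).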

AI Agent Integration

foundry mcp install  # Add to Claude Code

Documentation

Features

Feature | Description
--- | ---
Search | Find datasets by keyword or DOI, or browse the catalog
Load | Automatic download, caching, and format conversion
PyTorch/TensorFlow | dataset.get_as_torch(), dataset.get_as_tensorflow()
CLI | Terminal-based workflows
MCP Server | AI assistant integration
HuggingFace Export | Publish to HuggingFace Hub

Available Datasets

Browse datasets at Foundry-ML.org, or list them from Python:

f = Foundry()
f.list(limit=20)  # See available datasets

How to Cite

If you use Foundry-ML, please cite:

@article{Schmidt2024,
  doi = {10.21105/joss.05467},
  year = {2024},
  publisher = {The Open Journal},
  volume = {9},
  number = {93},
  pages = {5467},
  author = {Kj Schmidt and Aristana Scourtas and Logan Ward and others},
  title = {Foundry-ML - Software and Services to Simplify Access to Machine Learning Datasets in Materials Science},
  journal = {Journal of Open Source Software}
}
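To keep this entry together with dataset citations in one bibliography, a small helper can append BibTeX entries to a .bib file and skip keys that are already present. The following is a minimal sketch. The names bibtex_key and add_to_bib are hypothetical, and dataset.get_citation() from the Quick Start is assumed, not confirmed, to return a BibTeX string.

```python
import re
from pathlib import Path

# Matches the entry type and key, e.g. "@article{Schmidt2024,".
_KEY_RE = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def bibtex_key(entry: str) -> str:
    """Return the citation key of a single BibTeX entry."""
    match = _KEY_RE.search(entry)
    if match is None:
        raise ValueError("no BibTeX key found")
    return match.group(1)


def add_to_bib(entry: str, bib_path) -> bool:
    """Append entry to bib_path unless its key already exists.

    Returns True if the entry was written, False if it was a duplicate.
    """
    path = Path(bib_path)
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    existing_keys = {m.group(1) for m in _KEY_RE.finditer(existing)}
    if bibtex_key(entry) in existing_keys:
        return False
    with path.open("a", encoding="utf-8") as fh:
        if existing and not existing.endswith("\n"):
            fh.write("\n")
        fh.write(entry.strip() + "\n\n")
    return True
```

Usage: add_to_bib(dataset.get_citation(), "references.bib").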

Contributing

Foundry is open source. To contribute:

  1. Fork the repository and create a branch from main
  2. Make your changes
  3. Open a Pull Request

See CONTRIBUTING.md for details.

Support

This work was supported by the National Science Foundation under NSF Award Number: 1931306 "Collaborative Research: Framework: Machine Learning Materials Innovation Infrastructure".

Foundry integrates with Materials Data Facility, DLHub, and MAST-ML.
