
GeYugong/bio-agent


🧬 Bio-Agent

Bio-Agent is a lightweight research agent for biomedical and structural biology literature mining.

It automatically fetches recent PubMed papers, parses and stores them locally, and exports structured Markdown reports, all without LLMs or API keys. The only external service it contacts is NCBI's public PubMed E-utilities.

Current focus: Protein Structure Prediction (e.g. AlphaFold and related methods)

✨ Features

🔍 Automated PubMed search (time window + keyword based)

📄 XML parsing into structured paper fields

🗄️ Local SQLite storage (incremental & reproducible)

📝 Markdown report export

🧠 Heuristic-based “Key Takeaways” digest (no LLM)

🔐 No API keys, no cloud dependency

📦 Installation

Requirements

Python 3.8+

Windows / macOS / Linux

Option 1: Development / Local Usage (Recommended)

git clone https://github.com/GeYugong/bio-agent.git
cd bio-agent

python -m venv .venv

Windows (PowerShell):

.\.venv\Scripts\Activate.ps1

macOS / Linux:

source .venv/bin/activate

Then install the package in editable mode:

pip install -e .

Verify installation:

bio-agent --help
bio-agent hello

Option 2: Standard Installation

pip install .

🚀 Quick Start

1️⃣ Fetch Recent PubMed Papers

bio-agent fetch --query "protein structure prediction OR AlphaFold" --days 30 --retmax 20

This will:

Retrieve recent PMIDs from PubMed

Download and parse XML records

Store results in a local SQLite database (bio_agent.db)
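The first two steps above can be sketched in plain Python. This is a minimal illustration, not the code in pubmed.py: the function names `build_esearch_url` and `parse_pubmed_xml` are hypothetical, the mapping of `--days` and `--retmax` onto the E-utilities `reldate` and `retmax` parameters is an assumption, and the sample record is a made-up placeholder rather than a real PubMed entry.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(query: str, days: int, retmax: int) -> str:
    """Build an esearch URL restricted to the last `days` days (assumed mapping)."""
    params = {
        "db": "pubmed",
        "term": query,
        "reldate": days,     # only records dated within the last N days
        "datetype": "pdat",  # apply the window to the publication date
        "retmax": retmax,    # maximum number of PMIDs to return
        "retmode": "json",
    }
    return f"{ESEARCH}?{urlencode(params)}"


def _text(node) -> str:
    """Collect all text under a node, including inline markup such as <i>."""
    return "".join(node.itertext()).strip() if node is not None else ""


def parse_pubmed_xml(xml_text: str) -> List[Dict[str, str]]:
    """Turn an efetch-style XML payload into a list of flat paper dicts."""
    papers = []
    for article in ET.fromstring(xml_text).iter("PubmedArticle"):
        citation = article.find("MedlineCitation")
        sections = citation.findall("Article/Abstract/AbstractText")
        papers.append({
            "pmid": _text(citation.find("PMID")),
            "title": _text(citation.find("Article/ArticleTitle")),
            "abstract": " ".join(_text(s) for s in sections),
        })
    return papers


# Placeholder record for illustration only (not a real PubMed entry).
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>00000001</PMID>
      <Article>
        <ArticleTitle>A placeholder title about <i>in silico</i> folding</ArticleTitle>
        <Abstract>
          <AbstractText Label="BACKGROUND">First part.</AbstractText>
          <AbstractText Label="RESULTS">Second part.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""

if __name__ == "__main__":
    print(build_esearch_url("protein structure prediction OR AlphaFold", 30, 20))
    print(parse_pubmed_xml(SAMPLE))
```

Using `itertext()` rather than `.text` keeps titles intact when PubMed embeds inline tags such as italics inside them.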

2️⃣ Export Markdown Reports

bio-agent export --limit 10 --out report.md

Filter by keyword:

bio-agent export --limit 20 --query-contains AlphaFold --out alphafold_report.md
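To make the export step concrete, here is a rough sketch of how stored papers might be filtered and rendered to Markdown. The function `render_report`, the report layout, and the dict keys are hypothetical. It is also an assumption that the keyword filter matches against title and abstract; the actual `--query-contains` flag may instead match the stored search query.

```python
from typing import Dict, List, Optional


def render_report(
    papers: List[Dict[str, str]],
    limit: int,
    query_contains: Optional[str] = None,
) -> str:
    """Render up to `limit` papers as Markdown, optionally filtered by keyword."""
    if query_contains:
        needle = query_contains.lower()
        # Assumed behaviour: case-insensitive match on title + abstract.
        papers = [
            p for p in papers
            if needle in f"{p['title']} {p['abstract']}".lower()
        ]
    lines = ["# PubMed Report", ""]
    for p in papers[:limit]:
        lines += [
            f"## {p['title']}",
            "",
            f"- PMID: {p['pmid']}",
            f"- Link: https://pubmed.ncbi.nlm.nih.gov/{p['pmid']}/",
            "",
            p["abstract"],
            "",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        {"pmid": "1", "title": "AlphaFold on new targets", "abstract": "Placeholder."},
        {"pmid": "2", "title": "Unrelated topic", "abstract": "Placeholder."},
    ]
    print(render_report(demo, limit=20, query_contains="AlphaFold"))
```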

3️⃣ Generate README Digest (Key Takeaways)

bio-agent digest

What this does:

Reads recent papers from SQLite

Generates 3 key takeaways using heuristic scoring

Injects the digest into the top of README.md

Uses markers to avoid duplicate insertion
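The marker technique in the last two points could look roughly like this. The marker strings and the `inject_digest` helper are assumptions chosen for illustration; the project's real markers may differ.

```python
# Hypothetical marker strings; HTML comments stay invisible in rendered Markdown.
START = "<!-- DIGEST:START -->"
END = "<!-- DIGEST:END -->"


def inject_digest(readme: str, digest: str) -> str:
    """Insert the digest at the top of the README, or replace an existing one."""
    block = f"{START}\n{digest.strip()}\n{END}"
    start = readme.find(START)
    end = readme.find(END)
    if start != -1 and end > start:
        # A previous digest exists: swap it in place instead of adding another.
        return readme[:start] + block + readme[end + len(END):]
    # First run: put the digest above everything else.
    return f"{block}\n\n{readme}"


if __name__ == "__main__":
    once = inject_digest("# Bio-Agent\n", "- takeaway one")
    twice = inject_digest(once, "- takeaway one")
    print(once == twice)  # re-running with the same digest changes nothing
```

Because the function replaces whatever sits between the markers, running it repeatedly (for example from a scheduled job) leaves exactly one digest block in the file.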

⚠️ No LLMs are used: all summaries are rule-based and reproducible.
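As an illustration of what rule-based and reproducible can mean in practice, the sketch below ranks papers by weighted keyword counts and breaks ties deterministically. The keyword weights, the `score` and `key_takeaways` functions, and the takeaway format are all assumptions; the scoring rules in digest.py are not documented here and may be quite different.

```python
import re
from typing import Dict, List

# Hypothetical keyword weights, chosen only for this example.
KEYWORDS = {"alphafold": 3.0, "structure prediction": 2.0, "benchmark": 1.0}


def score(paper: Dict[str, str]) -> float:
    """Sum weighted keyword occurrences in the title and abstract."""
    text = f"{paper['title']} {paper['abstract']}".lower()
    return sum(weight * text.count(word) for word, weight in KEYWORDS.items())


def first_sentence(text: str) -> str:
    """Return the text up to the first sentence-ending punctuation mark."""
    return re.split(r"(?<=[.!?])\s+", text.strip(), maxsplit=1)[0]


def key_takeaways(papers: List[Dict[str, str]], n: int = 3) -> List[str]:
    """Pick the n highest-scoring papers; ties are broken by PMID."""
    ranked = sorted(papers, key=lambda p: (-score(p), p["pmid"]))
    return [f"- {p['title']}: {first_sentence(p['abstract'])}" for p in ranked[:n]]


if __name__ == "__main__":
    demo = [
        {"pmid": "1", "title": "Unrelated study",
         "abstract": "Nothing relevant here. More text."},
        {"pmid": "2", "title": "AlphaFold benchmark",
         "abstract": "We benchmark AlphaFold on new targets. Details follow."},
        {"pmid": "3", "title": "Structure prediction survey",
         "abstract": "A survey of structure prediction methods."},
    ]
    print("\n".join(key_takeaways(demo, n=2)))
```

The explicit tie-break on PMID matters for reproducibility: the same database contents always produce the same digest, regardless of row order.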

📁 Project Structure

bio-agent/
├── src/
│   └── bio_agent/
│       ├── cli.py        # Typer-based CLI entrypoint
│       ├── pubmed.py     # PubMed E-utilities + XML parsing
│       ├── store.py      # SQLite schema & upsert logic
│       ├── exporter.py   # Markdown / README export
│       ├── digest.py     # Heuristic digest generation
│       └── summarize.py  # Extensible summarization logic
├── reports/              # Generated Markdown reports
├── bio_agent.db          # Local SQLite database
├── pyproject.toml
└── README.md
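The "schema & upsert logic" noted for store.py is what makes repeated fetches incremental. A minimal sketch of the idea follows, assuming a single `papers` table keyed by PMID; the table layout, column names, and the `upsert_papers` helper are hypothetical and not taken from the project.

```python
import sqlite3
from typing import Dict, List

# Hypothetical schema: one row per paper, keyed by PMID.
SCHEMA = """
CREATE TABLE IF NOT EXISTS papers (
    pmid     TEXT PRIMARY KEY,
    title    TEXT NOT NULL,
    abstract TEXT NOT NULL DEFAULT ''
)
"""

# ON CONFLICT ... DO UPDATE requires SQLite 3.24 or newer.
UPSERT = """
INSERT INTO papers (pmid, title, abstract)
VALUES (:pmid, :title, :abstract)
ON CONFLICT(pmid) DO UPDATE SET
    title = excluded.title,
    abstract = excluded.abstract
"""


def upsert_papers(conn: sqlite3.Connection, papers: List[Dict[str, str]]) -> None:
    """Insert new papers and refresh existing ones without creating duplicates."""
    conn.execute(SCHEMA)
    conn.executemany(UPSERT, papers)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # the CLI uses a file such as bio_agent.db
    paper = {"pmid": "1", "title": "Draft title", "abstract": "Placeholder."}
    upsert_papers(conn, [paper])
    upsert_papers(conn, [dict(paper, title="Final title")])
    print(conn.execute("SELECT pmid, title FROM papers").fetchall())
```

Keying on PMID means a second fetch over an overlapping time window updates existing rows rather than appending duplicates.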

🧠 Why No LLM or API?

This is a deliberate design choice:

✅ Fully local & reproducible

✅ No API keys, LLM costs, or LLM rate limits (NCBI's E-utilities usage limits still apply when fetching)

✅ Transparent logic for research workflows

✅ Suitable for scheduled or offline pipelines

LLM-based summarization (OpenAI / Claude / local models) can be added as an optional module in future versions.

🔮 Roadmap

Richer structured fields (methods, datasets, benchmarks)

Optional LLM-based summarization

Scheduled runs via cron / GitHub Actions

Paper embedding & topic clustering

Web UI (Streamlit / Gradio)

📜 License

MIT License

🙌 Acknowledgements

NCBI PubMed E-utilities

AlphaFold & protein structure prediction community

Python open-source ecosystem
