🧬 Bio-Agent
Bio-Agent is a lightweight research agent for biomedical and structural biology literature mining.
It automatically fetches recent PubMed papers, parses and stores them locally, and exports structured Markdown reports, all without LLMs, API keys, or paid services. The only external service it calls is NCBI's public PubMed E-utilities.
Current focus: Protein Structure Prediction (e.g. AlphaFold and related methods)
✨ Features
🔍 Automated PubMed search (time window + keyword based)
📄 XML parsing into structured paper fields
🗄️ Local SQLite storage (incremental & reproducible)
📝 Markdown report export
🧠 Heuristic-based “Key Takeaways” digest (no LLM)
🔐 No API keys, no cloud dependency
📦 Installation
Requirements
Python 3.8+
Windows / macOS / Linux
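Option 1: Development / Local Usage (Recommended)
git clone https://github.com/GeYugong/bio-agent.git
cd bio-agent
python -m venv .venv
Activate the environment on Windows (PowerShell):
.\.venv\Scripts\Activate.ps1
Or on macOS / Linux:
source .venv/bin/activate
Then install in editable mode:
pip install -e .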
Verify installation:
bio-agent --help
bio-agent hello
Option 2: Standard Installation
pip install .
🚀 Quick Start
1️⃣ Fetch Recent PubMed Papers
bio-agent fetch \
  --query "protein structure prediction OR AlphaFold" \
  --days 30 \
  --retmax 20
This will:
Retrieve recent PMIDs from PubMed
Download and parse XML records
Store results in a local SQLite database (bio_agent.db)
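The first two steps above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the code in pubmed.py: the function names `build_esearch_url` and `parse_articles` and the returned field names are hypothetical, and filtering on publication date (`datetype=pdat`) is a guess at how `--days` might be applied. The `esearch.fcgi` endpoint and its `db`, `term`, `reldate`, `datetype`, `retmax`, and `retmode` parameters are part of NCBI E-utilities. The sample XML is a trimmed, made-up record used only to exercise the parser.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(query: str, days: int, retmax: int) -> str:
    """Build an ESearch URL for PubMed records dated within the last `days` days."""
    params = {
        "db": "pubmed",
        "term": query,
        "reldate": days,     # relative date window, in days
        "datetype": "pdat",  # assumption: filter on publication date
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{ESEARCH}?{urlencode(params)}"


def _text(node: Optional[ET.Element]) -> str:
    """Return all text under a node, including text inside inline markup."""
    return "".join(node.itertext()).strip() if node is not None else ""


def parse_articles(xml_text: str) -> List[Dict[str, str]]:
    """Extract PMID, title, and abstract from an EFetch-style PubMed XML document."""
    root = ET.fromstring(xml_text)
    papers = []
    for art in root.iter("PubmedArticle"):
        sections = art.findall("MedlineCitation/Article/Abstract/AbstractText")
        papers.append({
            "pmid": _text(art.find("MedlineCitation/PMID")),
            "title": _text(art.find("MedlineCitation/Article/ArticleTitle")),
            "abstract": " ".join(_text(s) for s in sections),
        })
    return papers


# Made-up record for illustration only; not a real paper.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>00000000</PMID>
      <Article>
        <ArticleTitle>Example title</ArticleTitle>
        <Abstract>
          <AbstractText>First part.</AbstractText>
          <AbstractText>Second part.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""
```

Calling `parse_articles(SAMPLE_XML)` returns one dictionary per `PubmedArticle` element, which is a convenient shape to hand to the storage step.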
2️⃣ Export Markdown Reports
bio-agent export --limit 10 --out report.md
Filter by keyword:
bio-agent export \
  --limit 20 \
  --query-contains AlphaFold \
  --out alphafold_report.md
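As a rough picture of what a filtered export involves, the sketch below renders stored papers to Markdown. It is an assumption-laden illustration rather than the logic in exporter.py: `render_report` and the paper field names are hypothetical, the report layout is invented, and the README does not say which field `--query-contains` matches, so this version assumes a case-insensitive match on title or abstract.

```python
from typing import Dict, List, Optional


def render_report(
    papers: List[Dict[str, str]],
    limit: int,
    contains: Optional[str] = None,
) -> str:
    """Render up to `limit` papers as Markdown, optionally filtered by keyword."""
    if contains:
        needle = contains.lower()
        # Assumption: the keyword is matched against title and abstract.
        papers = [
            p for p in papers
            if needle in (p["title"] + " " + p["abstract"]).lower()
        ]

    lines = ["# PubMed Report", ""]
    for p in papers[:limit]:
        lines.append(f"## {p['title']}")
        lines.append("")
        lines.append(f"- PMID: {p['pmid']}")
        lines.append(f"- Link: https://pubmed.ncbi.nlm.nih.gov/{p['pmid']}/")
        lines.append("")
        lines.append(p["abstract"])
        lines.append("")
    return "\n".join(lines)
```

Filtering before applying the limit means `--limit 20` would yield up to 20 matching papers, not 20 papers of which only some match.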
3️⃣ Generate README Digest (Key Takeaways)
bio-agent digest
What this does:
Reads recent papers from SQLite
Generates 3 key takeaways using heuristic scoring
Injects the digest into the top of README.md
Uses markers to avoid duplicate insertion
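The scoring and marker steps above might look roughly like the following. Everything here is a hypothetical sketch, not the contents of digest.py: the marker strings, the keyword list, the scoring rule (raw keyword counts), and the function names are all assumptions chosen to show how marker-delimited injection stays idempotent.

```python
import re
from typing import Dict, List

# Hypothetical marker names; the real markers are not given in this README.
START = "<!-- BIO-AGENT-DIGEST:START -->"
END = "<!-- BIO-AGENT-DIGEST:END -->"

# Illustrative keywords only; the actual heuristic is not documented here.
KEYWORDS = ("alphafold", "structure prediction", "benchmark")


def score(paper: Dict[str, str]) -> int:
    """Count keyword occurrences in the title and abstract."""
    text = (paper["title"] + " " + paper["abstract"]).lower()
    return sum(text.count(k) for k in KEYWORDS)


def top_takeaways(papers: List[Dict[str, str]], n: int = 3) -> List[str]:
    """Return bullet lines for the n highest-scoring papers."""
    ranked = sorted(papers, key=score, reverse=True)
    return [f"- {p['title']} (PMID {p['pmid']})" for p in ranked[:n]]


def inject_digest(readme: str, bullets: List[str]) -> str:
    """Insert the digest at the top, or replace an existing marked block."""
    block = "\n".join([START, "## Key Takeaways", *bullets, END])
    if START in readme and END in readme:
        pattern = re.compile(
            re.escape(START) + r".*?" + re.escape(END), re.DOTALL
        )
        # A function replacement avoids backslash-escape handling in `block`.
        return pattern.sub(lambda _match: block, readme, count=1)
    return block + "\n\n" + readme
```

Because a second run finds the markers and replaces the block in place, running the digest repeatedly leaves a single copy in the file.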
📁 Project Structure
bio-agent/
├── src/
│   └── bio_agent/
│       ├── cli.py        # Typer-based CLI entrypoint
│       ├── pubmed.py     # PubMed E-utilities + XML parsing
│       ├── store.py      # SQLite schema & upsert logic
│       ├── exporter.py   # Markdown / README export
│       ├── digest.py     # Heuristic digest generation
│       └── summarize.py  # Extensible summarization logic
├── reports/ # Generated Markdown reports
├── bio_agent.db # Local SQLite database
├── pyproject.toml
└── README.md
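The tree describes store.py as holding the SQLite schema and upsert logic. As a hedged sketch of what incremental storage could look like, the example below keys rows on PMID so that re-fetching a paper updates it instead of duplicating it. The table name, columns, and function names are hypothetical, and the `ON CONFLICT ... DO UPDATE` clause assumes SQLite 3.24 or newer.

```python
import sqlite3
from typing import Dict, Iterable

# Hypothetical schema; the real one lives in store.py and may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS papers (
    pmid     TEXT PRIMARY KEY,
    title    TEXT NOT NULL,
    abstract TEXT NOT NULL DEFAULT ''
)
"""

# Insert a new row, or update the existing row with the same PMID.
UPSERT = """
INSERT INTO papers (pmid, title, abstract)
VALUES (:pmid, :title, :abstract)
ON CONFLICT(pmid) DO UPDATE SET
    title = excluded.title,
    abstract = excluded.abstract
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def upsert_papers(conn: sqlite3.Connection, papers: Iterable[Dict[str, str]]) -> None:
    """Write papers in one transaction; commits on success, rolls back on error."""
    with conn:
        conn.executemany(UPSERT, papers)
```

Passing `"bio_agent.db"` to `open_db` would persist the data on disk; the in-memory default is only there to make the sketch easy to try.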
🧠 Why No LLM or API?
This is a deliberate design choice:
✅ Fully local & reproducible
✅ No API cost or rate limits
✅ Transparent logic for research workflows
✅ Suitable for scheduled pipelines; steps that read only from the local database can run offline
LLM-based summarization (OpenAI / Claude / local models) can be added as an optional module in future versions.
🔮 Roadmap
Richer structured fields (methods, datasets, benchmarks)
Optional LLM-based summarization
Scheduled runs via cron / GitHub Actions
Paper embedding & topic clustering
Web UI (Streamlit / Gradio)
📜 License
MIT License
🙌 Acknowledgements
NCBI PubMed E-utilities
AlphaFold & protein structure prediction community
Python open-source ecosystem