Transform physical notebook images into structured markdown notes using AI vision models.
- Tag-based extraction: Generate contextual tags first, then use them to improve extraction accuracy (Claude only)
- Multiple model support: Choose between local TrOCR, Claude API, or local Ollama vision models
- Image optimization: Automatic resizing, compression, and optional grayscale conversion to reduce token usage
- Template-based output: Customizable markdown templates for consistent note formatting
- Custom prompts: Use different prompts for different extraction tasks (e.g., bullet points, detailed notes)
- Custom source: Specify your own source description instead of using the filename
- Test-driven development: Comprehensive pytest suite with 73% test coverage
```bash
git clone <repository-url>
cd notebook-parser
uv sync
```
- Get your API key from the Anthropic Console
- Set your API key (choose one method):
Option A: Using .env file (recommended)

```bash
# Copy the example file
cp .env.example .env
# Edit .env and add your key:
# ANTHROPIC_API_KEY=sk-ant-api03-your-key-here
```
Option B: Export environment variable

```bash
export ANTHROPIC_API_KEY=sk-ant-api03-your-key-here
```
- Parse a notebook image:

```bash
# Basic usage (outputs to results/notebook.md with bullet points template)
uv run notebook-parser parse -i notebook.jpg --model claude

# With tag-based extraction for better accuracy (recommended)
uv run notebook-parser parse -i notebook.jpg --model claude --tags

# With custom output path
uv run notebook-parser parse -i notebook.jpg -o note.md --model claude --tags

# With custom source description
uv run notebook-parser parse -i notebook.jpg --model claude --tags --source "My Lecture Notes"
```

Note: Uses Claude Sonnet 4.5 (latest model as of January 2026). If no output path is specified, files are saved to the results/ directory with the same name as the input file.
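Under the hood, the Claude backend presumably sends the optimized image and the selected prompt to Anthropic's Messages API. The sketch below is illustrative rather than the project's actual code; the model id, max_tokens value, and prompt handling are assumptions.

```python
# Minimal sketch of a Claude vision request (assumed flow, not the project's exact code).
import base64
import anthropic

def extract_notes(image_path: str, prompt: str) -> str:
    with open(image_path, "rb") as f:
        image_b64 = base64.standard_b64encode(f.read()).decode()
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(
        model="claude-sonnet-4-5",  # illustrative model id
        max_tokens=2048,            # illustrative limit
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": "image/jpeg",
                            "data": image_b64}},
                {"type": "text", "text": prompt},
            ],
        }],
    )
    return response.content[0].text
```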
- Install and start Ollama:

```bash
brew install ollama   # macOS
ollama serve
ollama pull llama3.2-vision
```

- Parse a notebook image:
```bash
uv run notebook-parser parse -i notebook.jpg -o note.md --model ollama
```

To use the local TrOCR model instead (no API key or server required):

```bash
uv run notebook-parser parse -i notebook.jpg -o note.md --model local
```

Usage:

```
notebook-parser parse [OPTIONS]
```

Required Options:
- `-i, --input PATH`: Input image file

Optional Output:

- `-o, --output PATH`: Output markdown file (default: results/<input-name>.md)

Model Options:

- `--model [local|claude|ollama]`: Model to use (default: local)
  - `local`: TrOCR (basic OCR, lower quality)
  - `claude`: Claude Sonnet 4.5 vision API (best quality, requires API key)
  - `ollama`: Local Ollama vision model (good quality, fully private)

Image Optimization Options:

- `--optimize/--no-optimize`: Optimize image for LLM vision (default: True)
- `--grayscale`: Convert to grayscale to save tokens (~3x reduction)

Claude-specific Options:

- `--api-key TEXT`: Anthropic API key (or set ANTHROPIC_API_KEY env var)
- `--tags`: Generate tags first, then use them as context for better extraction (recommended)

Ollama-specific Options:

- `--ollama-model TEXT`: Ollama model name (default: llama3.2-vision)
- `--ollama-url TEXT`: Ollama API endpoint (default: http://localhost:11434); see the request sketch after this reference

Template Options:

- `-t, --template PATH`: Custom template file (default: templates/bullet-points-template.md)
- `-p, --prompt TEXT`: Prompt name to use (without the .txt extension, e.g., 'bullet-points')

Source Options:

- `-s, --source TEXT`: Custom source description (default: image filename)
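For reference, the Ollama backend presumably posts the base64-encoded image to Ollama's /api/generate endpoint with the configured model name and URL. A rough sketch under that assumption (the prompt text and timeout are illustrative):

```python
# Sketch of an Ollama vision request (assumed flow behind --model ollama).
import base64
import requests

def extract_with_ollama(image_path: str, prompt: str,
                        model: str = "llama3.2-vision",
                        url: str = "http://localhost:11434") -> str:
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    resp = requests.post(f"{url}/api/generate", json={
        "model": model,
        "prompt": prompt,
        "images": [image_b64],  # Ollama accepts base64-encoded images for vision models
        "stream": False,
    }, timeout=300)
    resp.raise_for_status()
    return resp.json()["response"]
```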
Extract text quickly without template formatting:
```bash
notebook-parser read notebook.jpg
```

Examples:

```bash
export ANTHROPIC_API_KEY=sk-ant-api03-xxx

uv run notebook-parser parse -i page1.jpg --model claude
# Outputs to: results/page1.md

uv run notebook-parser parse -i page1.jpg --model claude --tags
# Generates contextual tags first, then uses them to improve extraction

uv run notebook-parser parse -i page1.jpg --model claude --tags --source "CS101 Lecture 3"
# Sets source to "CS101 Lecture 3" instead of "page1.jpg"

# Custom output path
uv run notebook-parser parse -i page1.jpg -o my-notes/page1.md --model claude --tags

# Grayscale conversion to reduce token usage
uv run notebook-parser parse -i page1.jpg --model claude --tags --grayscale

# Different Ollama vision model
uv run notebook-parser parse -i page1.jpg --model ollama --ollama-model llava

# Custom template and prompt
uv run notebook-parser parse -i page1.jpg --model claude --template templates/note-template.md --prompt clean-bullet-points

# Skip image optimization
uv run notebook-parser parse -i page1.jpg --model claude --no-optimize
```

The default template (templates/bullet-points-template.md) creates clean bullet point notes:
```markdown
**Title**: <extracted-title>
**Source**: <custom-source-or-filename>
**Date**: <current-date>
**Tags**: <generated-tags> #notes #handwritten
**Status**: Bullet Points

## Key Points
<extracted-bullet-points>
```

When using --tags, the tags field will include AI-generated Obsidian-compatible tags (e.g., #machine-learning #python) along with the default tags.
The note template (templates/note-template.md) provides a more structured format:
```markdown
**Title**: <extracted-title>
**Source**: <image-filename>
**Date**: <current-date>
**Tags**: #notes #handwritten
**Status**: Raw Note

## Key Idea
<extracted-content>

## Why It Matters
*To be filled*

## How I Might Use It
*To be filled*
```

Use with: `--template templates/note-template.md`
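How the templates get filled is not shown in this README; the sketch below is a hypothetical illustration of plain placeholder substitution, using the placeholder names from the templates above. The project's actual rendering code may differ.

```python
# Hypothetical template rendering via placeholder substitution.
from datetime import date

def render_template(template_path: str, fields: dict[str, str]) -> str:
    text = open(template_path, encoding="utf-8").read()
    values = {"current-date": date.today().isoformat(), **fields}
    for name, value in values.items():
        text = text.replace(f"<{name}>", value)
    return text

note = render_template(
    "templates/bullet-points-template.md",
    {
        "extracted-title": "Gradient Descent",
        "custom-source-or-filename": "CS101 Lecture 3",
        "generated-tags": "#machine-learning",
        "extracted-bullet-points": "- Loss decreases along the negative gradient",
    },
)
```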
Prompts are stored in the prompts/ directory and guide how the AI extracts text from your images.
- bullet-points (`prompts/bullet-points.txt`): Basic extraction as bullet points, handling arrows and schemas
- clean-bullet-points (`prompts/clean-bullet-points.txt`): Advanced extraction with interpretation, error correction, and cleaner output (recommended)
- generate-tags (`prompts/generate-tags.txt`): Generates Obsidian-compatible tags for the note (used automatically with --tags)
- bullet-points-with-tags (`prompts/bullet-points-with-tags.txt`): Context-aware extraction using generated tags (used automatically with --tags)
Note: When using the --tags flag, the system automatically uses generate-tags and bullet-points-with-tags prompts in a two-step process for improved accuracy.
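A rough sketch of that two-step flow, reusing the illustrative `extract_notes` helper from the Claude sketch above; how the generated tags are injected into the second prompt is an assumption.

```python
# Hedged sketch of the --tags flow: one pass to generate tags, a second pass
# that uses them as context. extract_notes() is the illustrative helper above.
tags = extract_notes("page1.jpg", open("prompts/generate-tags.txt").read())
second_prompt = open("prompts/bullet-points-with-tags.txt").read() + "\n\nTags: " + tags
bullets = extract_notes("page1.jpg", second_prompt)
```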
To create a custom prompt:

- Create a new `.txt` file in the `prompts/` directory
- Write your extraction instructions
- Use it with `--prompt your-prompt-name` (without the .txt extension)
Example prompt structure:
```text
These are handwritten notes, they may contain arrows and schemas too. Transform it to bullet points.
```
When using Claude or Ollama models with --optimize enabled (default):
- Claude: Images resized to max 1568px, JPEG quality 85
- Ollama: Images resized to max 1024px, JPEG quality 75
- Grayscale: Optional flag reduces token usage by ~3x
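As a rough sketch of what that optimization step might look like, assuming Pillow; the project's exact resampling and save options may differ.

```python
# Assumed image optimization pass: shrink, optionally grayscale, re-encode as JPEG.
from io import BytesIO
from PIL import Image

def optimize_image(path: str, max_side: int = 1568, quality: int = 85,
                   grayscale: bool = False) -> bytes:
    # Defaults mirror the Claude settings above; Ollama would use 1024 / 75.
    img = Image.open(path)
    img = img.convert("L" if grayscale else "RGB")  # single channel cuts tokens roughly 3x
    img.thumbnail((max_side, max_side))             # shrinks in place, keeps aspect ratio
    buf = BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    return buf.getvalue()
```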
Run the test suite:

```bash
uv run pytest -v
uv run pytest --cov=src   # with coverage report
```

| Model | Quality | Speed | Cost | Privacy |
|---|---|---|---|---|
| TrOCR (local) | Low | Fast | Free | Full |
| Ollama (local) | Good | Medium | Free | Full |
| Claude API | Best | Fast | $3/1K images* | API calls |
*Estimated cost based on average image size and token usage
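For context on the quality gap above: the local backend presumably uses Hugging Face's TrOCR, which is a line-level recognizer, so full handwritten pages tend to fare worse than with the vision LLMs. A minimal sketch, where the checkpoint name is an assumption rather than the one the project ships:

```python
# Assumed local OCR path using TrOCR from Hugging Face transformers.
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

def ocr_local(image_path: str) -> str:
    checkpoint = "microsoft/trocr-base-handwritten"  # illustrative checkpoint
    processor = TrOCRProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```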
API key errors:

- Ensure `ANTHROPIC_API_KEY` is set correctly
- Check that the API key has proper permissions at console.anthropic.com

Ollama connection errors:

- Verify Ollama is running: `ollama list`
- Check the Ollama URL: `curl http://localhost:11434/api/tags`
- Pull the model: `ollama pull llama3.2-vision`

Poor extraction quality:

- Use `--model claude` or `--model ollama` for better handwriting recognition
- TrOCR works best with typed text, not handwritten notes
- Benchmark and compare models
- Increase test coverage
- A/B test prompts
- Prune docs / unused options