An imperative orchestration harness for LLM agents. Workers delegate to workers; your code owns control flow.
Most agent frameworks are graph DSLs—you define nodes and edges, an engine runs the graph. llm-do is an imperative orchestration harness: your code owns control flow, llm-do intercepts at the tool layer. Think syscalls, not state machines.
| Aspect | Graph DSLs | llm-do Harness |
|---|---|---|
| Orchestration | Declarative: define Node A → Node B | Imperative: Worker A calls Worker B as a function |
| State | Global context passed through graph | Local scope—each worker receives only its arguments |
| Approvals | Checkpoints: serialize graph state, resume after input | Interception: blocking "syscall" at the tool level |
| Refactoring | Redraw edges, update graph definitions | Change code—extract functions, inline workers |
| Control flow | DSL constructs (branches, loops) | Native Python: if, for, try/except |
| Durability | Often built-in checkpointing/replay | None—restart on failure (or integrate external engine) |
| Visualization | Graph editors, visual debugging | Code is the source of truth |
For the full design rationale—including hardening prompts into code (and softening code back to prompts), security posture, and related research—see docs/concept.md.
This is the Unix philosophy for agents: workers are files, dangerous operations are gated syscalls, composition happens through code—not a DSL.
- **Delegation.** Workers call other workers like function calls. A summarizer delegates to an analyzer; an orchestrator coordinates specialists. Each runs with its own tools and model.
- **Unified function space.** Workers and Python tools are the same abstraction—they call each other freely. LLM reasoning and deterministic code interleave; which is which becomes an implementation detail.
- **Tight context.** Each worker does one thing well. No bloated multi-purpose prompts that try to handle everything. Task executors receive only relevant history—no conversation baggage from parent agents.
- **Guardrails by construction.** Tool approvals gate dangerous operations; tool schemas and toolset policies enforce constraints in code, not prompt instructions.
- **Progressive hardening.** Start with prompts for flexibility. As patterns stabilize, extract deterministic logic to tested Python code. Or go the other direction—soften rigid code into prompts when edge cases multiply.
We use uv for development. Install it via `curl -LsSf https://astral.sh/uv/install.sh | sh` or see the installation docs.
```bash
# Install
uv pip install -e .  # or: pip install -e .

# Set your API key
export ANTHROPIC_API_KEY="sk-ant-..."  # or OPENAI_API_KEY

# Optional default model (or use --model per run)
export LLM_DO_MODEL="anthropic:claude-haiku-4-5"

# Run a worker
cd examples/greeter
llm-do main.worker "Tell me a joke"
```

llm-do executes the entry point named by `--entry` from the files you pass. If omitted, the entry defaults to `main`, so name your entry worker `main` or pass `--entry` explicitly. See examples/ for more.
Use the OAuth helper to authenticate with Anthropic subscriptions:
```bash
llm-do-oauth login --provider anthropic
```

Credentials are stored at `~/.llm-do/oauth.json`. Clear them with:

```bash
llm-do-oauth logout --provider anthropic
```

Check login status:

```bash
llm-do-oauth status --provider anthropic
```

Workers are `.worker` files: YAML front matter (config) + body (instructions). Workers and Python tools form a unified function space—each is exposed as a callable tool, taking input and returning results. LLM reasoning and deterministic code call each other freely:
```
Worker ──calls──▶ Tool ──calls──▶ Worker ──calls──▶ Tool ...
reason            execute         reason
```
This dual recursion lets each component play to its strengths: LLMs handle ambiguity and context; Python handles precision and speed. See docs/concept.md for the full design philosophy.
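For orientation, a minimal worker might look like the sketch below. The front-matter delimiters and the `model` value mirror conventions used elsewhere in this README; any other keys are assumptions, so check docs/reference.md for the actual schema.

```
---
model: anthropic:claude-haiku-4-5
---
You are an analyzer. Extract the key claims from the input text
and return them as a bulleted list.
```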
Why "workers" not "agents"? llm-do is built on PydanticAI, which uses "agent" for its LLM orchestration primitive. We use "worker" to distinguish our composable, constrained prompt units from the underlying PydanticAI agents that execute them. A worker defines what to do; the PydanticAI agent executes it.
Projects grow organically from simple to complex:
Minimal — just a worker:
```
my-project/
└── orchestrator.worker
```
With helpers — orchestrator delegates to focused workers:
```
my-project/
├── orchestrator.worker   # Entry point
├── analyzer.worker       # Focused worker
└── formatter.worker      # Another focused worker
```
With hardened operations — extract reliable logic to Python:
```
my-project/
├── orchestrator.worker
├── analyzer.worker
├── tools.py              # Shared Python tools
├── input/
└── output/
```
This progression reflects progressive hardening: initially you might prompt the LLM to "rename the file to remove special characters". Once you see it works, extract that to a Python function—deterministic, testable, no LLM variability. See the pitchdeck examples for a concrete progression: pitchdeck_eval (all LLM) → pitchdeck_eval_hardened (extracted tools) → pitchdeck_eval_code_entry (Python orchestration).
Add custom tools by creating tools.py in your project root:
```python
# tools.py
from pydantic_ai.toolsets import FunctionToolset

tools = FunctionToolset()

@tools.tool
def sanitize_filename(name: str) -> str:
    """Remove special characters from filename."""
    return "".join(c if c.isalnum() or c in ".-_" else "_" for c in name)
```

Functions become LLM-callable tools. Reference the toolset name in your worker's `toolsets` config and pass tools.py to llm-do.
To access runtime context (for calling other tools/workers), accept a RunContext and use ctx.deps:
```python
# tools.py
from pydantic_ai.tools import RunContext
from pydantic_ai.toolsets import FunctionToolset

from llm_do.runtime import WorkerRuntime

tools = FunctionToolset()

@tools.tool
async def analyze_config(ctx: RunContext[WorkerRuntime], raw: str) -> str:
    """Delegate parsing to a worker."""
    return await ctx.deps.call("config_parser", {"input": raw})
```

You can also use:
- Server-side tools — Provider-executed capabilities like web search and code execution
```bash
# Run a worker
llm-do main.worker "input message"

# Run a worker with Python toolsets
llm-do main.worker tools.py "input message"

# Choose a non-default entry
llm-do orchestrator.worker tools.py --entry orchestrator "input"

# Override config at runtime
llm-do main.worker --set model=anthropic:claude-sonnet-4 "input"
```

Common flags: `--headless`, `--tui`, `--chat`, `--json`, `-v`/`-vv`, `--set`, `--approve-all`, `--model`. See docs/cli.md for details.
Model names follow PydanticAI conventions (e.g., anthropic:claude-sonnet-4-20250514, openai:gpt-4o-mini).
See docs/cli.md for full reference.
| Example | Demonstrates |
|---|---|
| greeter/ | Minimal project structure |
| pitchdeck_eval/ | Multi-worker orchestration for pitch decks |
| pitchdeck_eval_hardened/ | Progressive hardening: extracted Python tools |
| pitchdeck_eval_code_entry/ | Full hardening: Python orchestration, tool entry point |
| calculator/ | Custom Python tools |
| approvals_demo/ | Write approval for file operations |
| file_organizer/ | Hardening pattern: LLM semantic decisions + Python cleanup |
| code_analyzer/ | Shell commands with approval rules |
| web_searcher/ | Server-side tools (web search) |
Some experiments include standalone Python entry points. Run them from the repo root so imports resolve:
```bash
uv run experiments/inv/v2_direct/run.py
uv run -m experiments.inv.v2_direct.run
```

- docs/concept.md — Design philosophy
- docs/reference.md — API reference: calling workers from Python, writing toolsets
- docs/architecture.md — Internal structure: runtime scopes, execution flow
- docs/cli.md — CLI reference
- docs/notes/ — Working design notes and explorations
Experimental — Built on PydanticAI. APIs may change.
Working: Worker resolution, worker delegation, approvals, custom tools.
TUI: The interactive terminal UI (Textual-based) is experimental. Use --chat to keep it open for multi-turn input, or --headless for non-interactive mode.
llm-do excels at normal-code control flow and progressive hardening. It's not a durable workflow engine—no built-in checkpointing or replay. For that, use llm-do as a component within Temporal, Prefect, or similar.
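If you do need durability, one low-coupling approach is to make a headless run the unit of work your engine retries. A minimal sketch using only the documented CLI flags; the Temporal/Prefect wiring is omitted, and the output handling is an assumption:

```python
import subprocess

def run_worker_step(worker_file: str, message: str) -> str:
    """Run one worker non-interactively; suitable as a retryable activity/task.

    --headless avoids the interactive TUI. Approval policy is up to you;
    --approve-all is shown only to keep the sketch non-blocking.
    """
    result = subprocess.run(
        ["llm-do", worker_file, "--headless", "--approve-all", message],
        capture_output=True,
        text=True,
        check=True,  # let the workflow engine handle retries on failure
    )
    return result.stdout
```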
Tool approvals reduce risk but aren't guarantees. Prompt injection can trick LLMs into misusing granted tools. Treat approvals as one layer of defense.
For real isolation, run llm-do in a container or VM.
PRs welcome! Run uv run pytest before committing. See AGENTS.md.