The open-source toolkit for Harness Engineering — OpenAI's methodology for building software where humans steer and agents execute.
OpenAI published the methodology. We built the tooling.
Reins came from real project pressure: as the harness improved, coding agents became more autonomous, consistent, and reliable. The problem was portability. Those gains were trapped in one repo.
Reins packages the same approach so you can apply it to any project: scaffold the harness, score readiness, diagnose gaps, and iteratively evolve toward stronger agent autonomy.
| Layer | Role | Source of truth |
|---|---|---|
| Skill | Control plane. Teaches the agent when to run Reins and how to interpret output. | skill/Reins/SKILL.md |
| CLI | Execution plane. Produces deterministic JSON for init, audit, doctor, evolve. | cli/reins/src/lib/commands/ (routed via cli/reins/src/index.ts) |
| Human | Steering plane. Sets goals, accepts tradeoffs, and decides product/taste direction. | Prompts + repo decisions |
If this split is unclear, agents drift: they either skip Reins or use it incorrectly. Reins is designed so agents can repeatedly improve repo quality with explicit, machine-readable feedback loops.
1. Install the skill so your agent knows how to use Reins:
npx skills add WellDunDun/reins

The skill teaches your agent when and how to run every Reins command: command priority (local source vs. npx), JSON output parsing, and when to pair audit with doctor for remediation detail. Once it is installed, you just talk to your agent:
You: "Audit this codebase and show me the weakest dimensions"
Agent: runs reins audit, parses JSON, summarizes gaps
You: "Scaffold harness engineering in this repo"
Agent: runs reins init, then walks you through customization
You: "Evolve to the next maturity level"
Agent: runs audit, identifies current level, executes the evolution path
2. Or run the CLI directly for a quick score without the skill:
# "." means "current directory"
npx reins-cli@latest audit .

{
  "total_score": 6,
  "max_score": 18,
  "maturity_level": "L1: Assisted",
  "recommendations": [
    "Create ARCHITECTURE.md with domain map and layer rules",
    "Add linter configuration to enforce architectural constraints",
    "Create docs/golden-principles.md with mechanical taste rules"
  ]
}

# Check whether your installed skills are outdated
npx skills check
# Update installed skills (including Reins) when updates are available
npx skills update

If you run Reins directly (without the skill), prefer npx reins-cli@latest ... so agents always use the latest published CLI.
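As a rough illustration of the "parses JSON, summarizes gaps" step, here is a minimal TypeScript sketch that consumes an audit report. It assumes only the four fields visible in the sample output above; the `AuditReport` type and the `parseAuditReport` and `summarize` helpers are hypothetical names for this example, not part of the Reins CLI.

```typescript
// Shape of the audit report, limited to the fields shown in the sample output.
// (Assumption: the real report may contain more fields than these.)
interface AuditReport {
  total_score: number;
  max_score: number;
  maturity_level: string;
  recommendations: string[];
}

// Parse raw CLI output and reject anything missing the expected fields.
function parseAuditReport(raw: string): AuditReport {
  const data = JSON.parse(raw) as Partial<AuditReport>;
  if (
    typeof data.total_score !== "number" ||
    typeof data.max_score !== "number" ||
    typeof data.maturity_level !== "string" ||
    !Array.isArray(data.recommendations)
  ) {
    throw new Error("Unexpected audit report shape");
  }
  return data as AuditReport;
}

// Condense a report into a header line followed by one bullet per recommendation.
function summarize(report: AuditReport): string {
  const header = `${report.total_score}/${report.max_score} (${report.maturity_level})`;
  const gaps = report.recommendations.map((r) => `- ${r}`);
  return [header, ...gaps].join("\n");
}
```

Validating the shape before use keeps an agent from silently summarizing a malformed or truncated report.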
Install/refresh skill -> Audit -> Doctor/Evolve -> Apply changes -> Re-audit
That loop is the product: repeatedly steering agents toward a better repository state.
Most agent rollouts fail for one boring reason: agents can edit code, but the repository doesn't teach them how to reason safely.
Reins gives you a repeatable operating system for agent work:
- A map (AGENTS.md, architecture docs, indexed decisions)
- A score (0-18 maturity audit with concrete gaps)
- A plan (next-step evolution path by maturity level)
- A guardrail model (risk-policy.json + CI enforcement signals)
Agent-first development has multiple layers. Reins operates at the repository structure layer — complementary to session orchestration tools, not competing with them.
block-beta
columns 1
block:L3:1
columns 2
A["SESSION EXECUTION"] B["GSD, Flow-Next, etc."]
end
block:L2:1
columns 2
C["REPO READINESS"] D["Reins"]
end
block:L1:1
columns 2
E["THE CODEBASE"] F["Your project"]
end
style L3 fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
style L2 fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
style L1 fill:#0f172a,stroke:#475569,color:#94a3b8
style A fill:transparent,stroke:none,color:#93c5fd
style B fill:transparent,stroke:none,color:#64748b
style C fill:transparent,stroke:none,color:#60a5fa
style D fill:transparent,stroke:none,color:#f472b6
style E fill:transparent,stroke:none,color:#93c5fd
style F fill:transparent,stroke:none,color:#64748b
| Concern | Reins | Session orchestrators |
|---|---|---|
| When you use it | Once per repo, then evolve periodically | Every coding session |
| What it produces | Docs, audit scores, maturity roadmaps | Working code |
| What it prevents | Organizational rot, undocumented architecture | Context rot, wasted tokens |
Use them together. Reins scaffolds your repo so AGENTS.md tells the agent where everything is, ARCHITECTURE.md defines the rules, and golden principles are enforced in CI. Then a session orchestrator runs the actual coding work on top of that well-structured repo.
graph LR
Init["reins init\nScaffold"] --> Audit["reins audit\nScore 0-18"]
Audit --> Evolve["reins evolve\nLevel up"]
Evolve --> Doctor["reins doctor\nHealth check"]
Doctor --> Audit
style Init fill:#1e3a5f,stroke:#3b82f6,color:#e2e8f0
style Audit fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
style Evolve fill:#1e3a5f,stroke:#818cf8,color:#e2e8f0
style Doctor fill:#1e3a5f,stroke:#a78bfa,color:#e2e8f0
reins init . # Scaffold the full structure
reins init . --pack auto # Adaptive pack selection from project signals
reins init . --pack agent-factory # Optional advanced automation pack
reins audit . # Score against harness principles (0-18)
reins evolve . # Roadmap to next maturity level
reins doctor . # Health check with prescriptive fixes

Every repo sits on a maturity spectrum. The audit tells you where you are. The evolve command tells you what to do next.
graph LR
L0["L0: Manual\n0-4"] --> L1["L1: Assisted\n5-8"]
L1 --> L2["L2: Steered\n9-13"]
L2 --> L3["L3: Autonomous\n14-16"]
L3 --> L4["L4: Self-Correcting\n17-18"]
style L0 fill:#1e293b,stroke:#475569,color:#94a3b8
style L1 fill:#1e293b,stroke:#3b82f6,color:#93c5fd
style L2 fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
style L3 fill:#1e3a5f,stroke:#818cf8,color:#e2e8f0
style L4 fill:#312e81,stroke:#a78bfa,color:#e2e8f0
| Score | Level | What it means |
|---|---|---|
| 0-4 | L0: Manual | Traditional engineering, no agent infra |
| 5-8 | L1: Assisted | Agents help, humans still write code |
| 9-13 | L2: Steered | Humans steer, agents execute most code |
| 14-16 | L3: Autonomous | Agents handle full lifecycle |
| 17-18 | L4: Self-Correcting | System maintains and improves itself |
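The score bands in the table above translate directly into a lookup. The following is a minimal sketch of that mapping, not the CLI's actual implementation; the `maturityLevel` function and `MaturityLevel` type are hypothetical names.

```typescript
type MaturityLevel =
  | "L0: Manual"
  | "L1: Assisted"
  | "L2: Steered"
  | "L3: Autonomous"
  | "L4: Self-Correcting";

// Map a total audit score (0-18) to its maturity level using the table's bands.
function maturityLevel(score: number): MaturityLevel {
  if (!Number.isInteger(score) || score < 0 || score > 18) {
    throw new RangeError(`Score must be an integer from 0 to 18, got ${score}`);
  }
  if (score <= 4) return "L0: Manual";
  if (score <= 8) return "L1: Assisted";
  if (score <= 13) return "L2: Steered";
  if (score <= 16) return "L3: Autonomous";
  return "L4: Self-Correcting";
}
```

For example, `maturityLevel(6)` yields `"L1: Assisted"`, matching the sample audit output.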
AGENTS.md                      # Concise map (~100 lines) for agents
ARCHITECTURE.md                # Domain map, layer rules, dependency direction
risk-policy.json               # Risk tiers + docs drift rules (policy-as-code)
docs/
  golden-principles.md         # Mechanical taste rules enforced in CI
  design-docs/
    index.md                   # Design doc registry with verification status
    core-beliefs.md            # Agent-first operating principles
  product-specs/
    index.md                   # Product spec registry
  exec-plans/
    active/                    # Currently executing plans
    completed/                 # Historical plans with outcomes
    tech-debt-tracker.md       # Known debt with priority and ownership
  references/                  # External LLM-friendly reference docs
  generated/                   # Auto-generated docs (schema, API specs)
Optional pack:
reins init . --pack auto
reins init . --pack agent-factory

--pack auto keeps the base scaffold for unknown stacks and selects agent-factory when the repo looks Node/JS compatible.
--pack agent-factory adds an advanced automation layer:
- scripts/lint-structure.mjs (hard structural gate)
- scripts/doc-gardener.mjs + scripts/check-changed-doc-freshness.mjs (docs freshness loop)
- scripts/pr-review.mjs (soft golden-principles reviewer)
- .github/workflows/risk-policy-gate.yml (risk-tier + docs drift checks)
- .github/workflows/pr-review-bot.yml (PR feedback loop)
- .github/workflows/structural-lint.yml (CI enforcement gate)
reins evolve now includes pack recommendations, and reins evolve . --apply can scaffold compatible pack automation into an existing repo.
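To make the `--pack auto` behavior concrete, here is a rough sketch of adaptive pack selection. How Reins actually detects a Node/JS compatible repo is not documented here, so the signal files (`package.json`, `tsconfig.json`), the `"base"` pack name, and the `selectPack` function are all assumptions made for illustration.

```typescript
// "base" is a hypothetical label for the default scaffold without an extra pack.
type Pack = "base" | "agent-factory";

// Assumed signals that a repo is Node/JS compatible. The real detection
// logic in Reins may use different or additional signals.
const NODE_SIGNALS: readonly string[] = ["package.json", "tsconfig.json"];

// Choose a pack from the file names found at the repository root.
function selectPack(rootFiles: readonly string[]): Pack {
  const names = new Set(rootFiles);
  const looksLikeNode = NODE_SIGNALS.some((file) => names.has(file));
  return looksLikeNode ? "agent-factory" : "base";
}
```

Under these assumptions, a repo with a `package.json` at its root would get the agent-factory pack, while a repo with only a `go.mod` would keep the base scaffold.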
The audit scores six dimensions, each from 0 to 3, for a total of 0-18:
graph TD
Score["Total Score\n0-18"]
RK["Repository Knowledge\n0-3"]
AE["Architecture Enforcement\n0-3"]
AL["Agent Legibility\n0-3"]
GP["Golden Principles\n0-3"]
AW["Agent Workflow\n0-3"]
GC["Garbage Collection\n0-3"]
RK --> Score
AE --> Score
AL --> Score
GP --> Score
AW --> Score
GC --> Score
style Score fill:#312e81,stroke:#a78bfa,color:#e2e8f0
style RK fill:#1e3a5f,stroke:#3b82f6,color:#e2e8f0
style AE fill:#1e3a5f,stroke:#3b82f6,color:#e2e8f0
style AL fill:#1e3a5f,stroke:#3b82f6,color:#e2e8f0
style GP fill:#1e3a5f,stroke:#3b82f6,color:#e2e8f0
style AW fill:#1e3a5f,stroke:#3b82f6,color:#e2e8f0
style GC fill:#1e3a5f,stroke:#3b82f6,color:#e2e8f0
| Dimension | What it checks |
|---|---|
| Repository Knowledge | AGENTS.md, docs/, versioned execution plans |
| Architecture Enforcement | ARCHITECTURE.md, dependency rules, linters, policy signals |
| Agent Legibility | Bootable app, observability (or CLI diagnosability), lean dependencies |
| Golden Principles | Documented taste rules, CI gate depth, cleanup process |
| Agent Workflow | Agent config, risk policy, PR templates, CI enforcement |
| Garbage Collection | Debt tracking, doc-gardening, quality grades, docs drift rules |
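The arithmetic behind the total is simple: six dimensions, each worth 0-3, summed to a score out of 18. The sketch below shows that aggregation plus a helper for finding the weakest dimensions. The snake_case keys and both function names are hypothetical; the sample audit output above does not show a per-dimension breakdown.

```typescript
// Hypothetical keys for the six dimensions listed in the table.
const DIMENSIONS = [
  "repository_knowledge",
  "architecture_enforcement",
  "agent_legibility",
  "golden_principles",
  "agent_workflow",
  "garbage_collection",
] as const;

type Dimension = (typeof DIMENSIONS)[number];
type DimensionScores = Record<Dimension, number>;

// Sum the six dimension scores, validating that each is an integer from 0 to 3.
function totalScore(scores: DimensionScores): number {
  let total = 0;
  for (const dimension of DIMENSIONS) {
    const value = scores[dimension];
    if (!Number.isInteger(value) || value < 0 || value > 3) {
      throw new RangeError(`${dimension} must be an integer from 0 to 3`);
    }
    total += value;
  }
  return total;
}

// Return every dimension tied for the lowest score.
function weakestDimensions(scores: DimensionScores): Dimension[] {
  const lowest = Math.min(...DIMENSIONS.map((d) => scores[d]));
  return DIMENSIONS.filter((d) => scores[d] === lowest);
}
```

Returning all tied dimensions, rather than a single one, avoids hiding an equally weak area behind an arbitrary ordering.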
Reins audits itself in CI. Current score:
{
  "total_score": 18,
  "max_score": 18,
  "maturity_level": "L4: Self-Correcting"
}

CI gates: lint, test, typecheck, self-audit. Merging to master runs publish: if cli/reins/package.json has a version not yet on npm, it publishes reins-cli and creates a GitHub Release.
PRs from this repository that modify cli/reins/** and do not manually bump cli/reins/package.json are auto-patched by .github/workflows/auto-bump-cli-version.yml (fork PRs are skipped).
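The release rules above reduce to two small decisions: publish only when the local version is not yet on npm, and bump the version when a CLI change arrives without one. The sketch below models both as pure functions. It assumes "auto-patched" means a patch-level bump of a plain MAJOR.MINOR.PATCH version; `shouldPublish` and `bumpPatch` are hypothetical names and do not reflect the actual workflow scripts.

```typescript
// Publish only if the local version has not already been published.
function shouldPublish(
  localVersion: string,
  publishedVersions: readonly string[],
): boolean {
  return !publishedVersions.includes(localVersion);
}

// Increment the patch component of a plain MAJOR.MINOR.PATCH version.
// (Assumption: pre-release and build metadata suffixes are out of scope.)
function bumpPatch(version: string): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error(`Not a plain MAJOR.MINOR.PATCH version: ${version}`);
  }
  const [, major, minor, patch] = match;
  return `${major}.${minor}.${Number(patch) + 1}`;
}
```

Keeping these checks free of network and file access makes them easy to test in isolation from the CI environment.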
For CLI repositories, Reins treats strong diagnosability signals (for example doctor surfaces, CLI diagnostics in CI, and help/error coverage) as the equivalent of service observability infrastructure.
reins/
  cli/reins/                   # The CLI tool (Bun + TypeScript, zero deps)
    src/index.ts               # Thin CLI router
    src/lib/commands/          # Command handlers (init/audit/doctor/evolve)
    src/lib/audit/             # Audit runtime context + scoring internals
    package.json
  skill/                       # Agent skill (Claude Code)
    Reins/
      SKILL.md                 # Skill definition and routing
      HarnessMethodology.md    # Full methodology reference
      Workflows/
        Scaffold.md            # Scaffold workflow
        Audit.md               # Audit workflow
        Evolve.md              # Evolve workflow
- Bun v1.0+ or Node.js 18+
- No other dependencies
See CONTRIBUTING.md for guidelines.
Based on OpenAI's Harness Engineering (February 2026, Ryan Lopopolo). The five pillars:
- Repository as system of record — all knowledge versioned in-repo
- Layered domain architecture — strict layer ordering with forward-only dependencies
- Agent legibility — optimize for agent understanding, not just human readability
- Golden principles — encode human taste mechanically, enforce in CI
- Garbage collection — background agents clean drift continuously