Harness Engineering CLI — scaffold, audit, evolve, and doctor agent-readiness for any repo

WellDunDun/reins

reins

The open-source toolkit for Harness Engineering — OpenAI's methodology for building software where humans steer and agents execute.

OpenAI published the methodology. We built the tooling.

Why Reins exists

Reins came from real project pressure: as the harness improved, coding agents became more autonomous, consistent, and reliable. The problem was portability. Those gains were trapped in one repo.

Reins packages the same approach so you can apply it to any project: scaffold the harness, score readiness, diagnose gaps, and iteratively evolve toward stronger agent autonomy.

The model (for humans and agents)

| Layer | Role | Source of truth |
| --- | --- | --- |
| Skill | Control plane. Teaches the agent when to run Reins and how to interpret output. | skill/Reins/SKILL.md |
| CLI | Execution plane. Produces deterministic JSON for init, audit, doctor, evolve. | cli/reins/src/lib/commands/ (+ routed via cli/reins/src/index.ts) |
| Human | Steering plane. Sets goals, accepts tradeoffs, and decides product/taste direction. | Prompts + repo decisions |

If this split is unclear, agents drift: they either skip Reins or use it incorrectly. Reins is designed so agents can repeatedly improve repo quality with explicit, machine-readable feedback loops.

Quick start

1. Install the skill so your agent knows how to use Reins:

npx skills add WellDunDun/reins

The skill teaches your agent when and how to run every Reins command — command priority (local source vs. npx), JSON output parsing, and when to pair audit with doctor for remediation detail. Once installed, you talk:

You:   "Audit this codebase and show me the weakest dimensions"
Agent: runs reins audit, parses JSON, summarizes gaps

You:   "Scaffold harness engineering in this repo"
Agent: runs reins init, then walks you through customization

You:   "Evolve to the next maturity level"
Agent: runs audit, identifies current level, executes the evolution path

2. Or run the CLI directly for a quick score without the skill:

# "." means "current directory"
npx reins-cli@latest audit .
{
  "total_score": 6,
  "max_score": 18,
  "maturity_level": "L1: Assisted",
  "recommendations": [
    "Create ARCHITECTURE.md with domain map and layer rules",
    "Add linter configuration to enforce architectural constraints",
    "Create docs/golden-principles.md with mechanical taste rules"
  ]
}
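Because the audit output is plain JSON, a script can consume it without the skill. The following is a minimal sketch, assuming only the four fields shown in the sample output; `AuditResult` and `summarizeAudit` are hypothetical names for illustration, not part of Reins, and the real CLI may emit additional fields.

```typescript
// Fields taken from the sample output above; the real CLI may emit more.
interface AuditResult {
  total_score: number;
  max_score: number;
  maturity_level: string;
  recommendations?: string[];
}

// Hypothetical helper, not part of Reins: condense raw audit JSON into one line.
function summarizeAudit(rawJson: string): string {
  const result = JSON.parse(rawJson) as AuditResult;
  const recommendations = result.recommendations ?? [];
  const firstGap = recommendations.length > 0 ? recommendations[0] : "none";
  return `${result.total_score}/${result.max_score} (${result.maturity_level}); first gap: ${firstGap}`;
}

// Sample input copied from the example output above.
const sampleAudit = JSON.stringify({
  total_score: 6,
  max_score: 18,
  maturity_level: "L1: Assisted",
  recommendations: [
    "Create ARCHITECTURE.md with domain map and layer rules",
    "Add linter configuration to enforce architectural constraints",
    "Create docs/golden-principles.md with mechanical taste rules",
  ],
});

console.log(summarizeAudit(sampleAudit));
```

In practice the raw JSON would come from the CLI's standard output rather than an inline sample.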

Keep Reins fresh

# Check whether your installed skills are outdated
npx skills check

# Update installed skills (including Reins) when updates are available
npx skills update

If you run Reins directly (without the skill), prefer npx reins-cli@latest ... so agents always use the latest published CLI.

The steering loop

Install/refresh skill -> Audit -> Doctor/Evolve -> Apply changes -> Re-audit

That loop is the product: repeatedly steering agents toward a better repository state.

Why teams adopt Reins

Many agent rollouts fail for one boring reason: agents can edit code, but the repository doesn't teach them how to reason safely.

Reins gives you a repeatable operating system for agent work:

  • A map (AGENTS.md, architecture docs, indexed decisions)
  • A score (0-18 maturity audit with concrete gaps)
  • A plan (next-step evolution path by maturity level)
  • A guardrail model (risk-policy.json + CI enforcement signals)

Where Reins fits

Agent-first development has multiple layers. Reins operates at the repository structure layer — complementary to session orchestration tools, not competing with them.

block-beta
    columns 1
    block:L3:1
        columns 2
        A["SESSION EXECUTION"] B["GSD, Flow-Next, etc."]
    end
    block:L2:1
        columns 2
        C["REPO READINESS"] D["Reins"]
    end
    block:L1:1
        columns 2
        E["THE CODEBASE"] F["Your project"]
    end

    style L3 fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
    style L2 fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
    style L1 fill:#0f172a,stroke:#475569,color:#94a3b8
    style A fill:transparent,stroke:none,color:#93c5fd
    style B fill:transparent,stroke:none,color:#64748b
    style C fill:transparent,stroke:none,color:#60a5fa
    style D fill:transparent,stroke:none,color:#f472b6
    style E fill:transparent,stroke:none,color:#93c5fd
    style F fill:transparent,stroke:none,color:#64748b

| Concern | Reins | Session orchestrators |
| --- | --- | --- |
| When you use it | Once per repo, then evolve periodically | Every coding session |
| What it produces | Docs, audit scores, maturity roadmaps | Working code |
| What it prevents | Organizational rot, undocumented architecture | Context rot, wasted tokens |

Use them together. Reins scaffolds your repo so AGENTS.md tells the agent where everything is, ARCHITECTURE.md defines the rules, and golden principles are enforced in CI. Then a session orchestrator runs the actual coding work on top of that well-structured repo.

The four commands

graph LR
    Init["reins init\nScaffold"] --> Audit["reins audit\nScore 0-18"]
    Audit --> Evolve["reins evolve\nLevel up"]
    Evolve --> Doctor["reins doctor\nHealth check"]
    Doctor --> Audit

    style Init fill:#1e3a5f,stroke:#3b82f6,color:#e2e8f0
    style Audit fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
    style Evolve fill:#1e3a5f,stroke:#818cf8,color:#e2e8f0
    style Doctor fill:#1e3a5f,stroke:#a78bfa,color:#e2e8f0

reins init .           # Scaffold the full structure
reins init . --pack auto  # Adaptive pack selection from project signals
reins init . --pack agent-factory  # Optional advanced automation pack
reins audit .          # Score against harness principles (0-18)
reins evolve .         # Roadmap to next maturity level
reins doctor .         # Health check with prescriptive fixes

The maturity model

Every repo sits on a maturity spectrum. The audit tells you where you are. The evolve command tells you what to do next.

graph LR
    L0["L0: Manual\n0-4"] --> L1["L1: Assisted\n5-8"]
    L1 --> L2["L2: Steered\n9-13"]
    L2 --> L3["L3: Autonomous\n14-16"]
    L3 --> L4["L4: Self-Correcting\n17-18"]

    style L0 fill:#1e293b,stroke:#475569,color:#94a3b8
    style L1 fill:#1e293b,stroke:#3b82f6,color:#93c5fd
    style L2 fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
    style L3 fill:#1e3a5f,stroke:#818cf8,color:#e2e8f0
    style L4 fill:#312e81,stroke:#a78bfa,color:#e2e8f0

| Score | Level | What it means |
| --- | --- | --- |
| 0-4 | L0: Manual | Traditional engineering, no agent infra |
| 5-8 | L1: Assisted | Agents help, humans still write code |
| 9-13 | L2: Steered | Humans steer, agents execute most code |
| 14-16 | L3: Autonomous | Agents handle full lifecycle |
| 17-18 | L4: Self-Correcting | System maintains and improves itself |
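The score bands above map directly onto a lookup. The following is a small sketch of that mapping; `maturityLevel` is a hypothetical helper written for illustration, not the CLI's own implementation, and it assumes scores are whole numbers from 0 to 18.

```typescript
// Hypothetical helper: map a total audit score (0-18) to the level names in the table above.
function maturityLevel(score: number): string {
  if (!Number.isInteger(score) || score < 0 || score > 18) {
    throw new RangeError(`score must be an integer from 0 to 18, got ${score}`);
  }
  if (score <= 4) return "L0: Manual";
  if (score <= 8) return "L1: Assisted";
  if (score <= 13) return "L2: Steered";
  if (score <= 16) return "L3: Autonomous";
  return "L4: Self-Correcting";
}

console.log(maturityLevel(6));  // "L1: Assisted"
console.log(maturityLevel(18)); // "L4: Self-Correcting"
```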

What reins init scaffolds

AGENTS.md                        # Concise map (~100 lines) for agents
ARCHITECTURE.md                  # Domain map, layer rules, dependency direction
risk-policy.json                 # Risk tiers + docs drift rules (policy-as-code)
docs/
  golden-principles.md           # Mechanical taste rules enforced in CI
  design-docs/
    index.md                     # Design doc registry with verification status
    core-beliefs.md              # Agent-first operating principles
  product-specs/
    index.md                     # Product spec registry
  exec-plans/
    active/                      # Currently executing plans
    completed/                   # Historical plans with outcomes
    tech-debt-tracker.md         # Known debt with priority and ownership
  references/                    # External LLM-friendly reference docs
  generated/                     # Auto-generated docs (schema, API specs)

Optional packs:

reins init . --pack auto
reins init . --pack agent-factory

--pack auto keeps the base scaffold for unknown stacks and selects agent-factory when the repo looks Node/JS compatible.

--pack agent-factory adds an advanced automation layer:

  • scripts/lint-structure.mjs (hard structural gate)
  • scripts/doc-gardener.mjs + scripts/check-changed-doc-freshness.mjs (docs freshness loop)
  • scripts/pr-review.mjs (soft golden-principles reviewer)
  • .github/workflows/risk-policy-gate.yml (risk-tier + docs drift checks)
  • .github/workflows/pr-review-bot.yml (PR feedback loop)
  • .github/workflows/structural-lint.yml (CI enforcement gate)

reins evolve now includes pack recommendations, and reins evolve . --apply can scaffold compatible pack automation into an existing repo.

The six audit dimensions

Each scored 0-3, totaling 0-18:

graph TD
    Score["Total Score\n0-18"]
    RK["Repository Knowledge\n0-3"]
    AE["Architecture Enforcement\n0-3"]
    AL["Agent Legibility\n0-3"]
    GP["Golden Principles\n0-3"]
    AW["Agent Workflow\n0-3"]
    GC["Garbage Collection\n0-3"]

    RK --> Score
    AE --> Score
    AL --> Score
    GP --> Score
    AW --> Score
    GC --> Score

    style Score fill:#312e81,stroke:#a78bfa,color:#e2e8f0
    style RK fill:#1e3a5f,stroke:#3b82f6,color:#e2e8f0
    style AE fill:#1e3a5f,stroke:#3b82f6,color:#e2e8f0
    style AL fill:#1e3a5f,stroke:#3b82f6,color:#e2e8f0
    style GP fill:#1e3a5f,stroke:#3b82f6,color:#e2e8f0
    style AW fill:#1e3a5f,stroke:#3b82f6,color:#e2e8f0
    style GC fill:#1e3a5f,stroke:#3b82f6,color:#e2e8f0

| Dimension | What it checks |
| --- | --- |
| Repository Knowledge | AGENTS.md, docs/, versioned execution plans |
| Architecture Enforcement | ARCHITECTURE.md, dependency rules, linters, policy signals |
| Agent Legibility | Bootable app, observability (or CLI diagnosability), lean dependencies |
| Golden Principles | Documented taste rules, CI gate depth, cleanup process |
| Agent Workflow | Agent config, risk policy, PR templates, CI enforcement |
| Garbage Collection | Debt tracking, doc-gardening, quality grades, docs drift rules |
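The arithmetic behind the total is simple: six dimensions, each scored 0-3, summed to a value from 0 to 18. The sketch below shows that aggregation under those stated rules; `totalScore` and the sample values are illustrative assumptions, not real Reins output or the CLI's scoring code.

```typescript
// The six dimension names from the table above.
type Dimension =
  | "Repository Knowledge"
  | "Architecture Enforcement"
  | "Agent Legibility"
  | "Golden Principles"
  | "Agent Workflow"
  | "Garbage Collection";

// Hypothetical helper: validate each dimension is an integer in 0..3, then sum.
function totalScore(scores: Record<Dimension, number>): number {
  let total = 0;
  for (const name of Object.keys(scores) as Dimension[]) {
    const value = scores[name];
    if (!Number.isInteger(value) || value < 0 || value > 3) {
      throw new RangeError(`${name} must be an integer from 0 to 3, got ${value}`);
    }
    total += value;
  }
  return total;
}

// Illustrative values only.
const exampleScores: Record<Dimension, number> = {
  "Repository Knowledge": 2,
  "Architecture Enforcement": 1,
  "Agent Legibility": 1,
  "Golden Principles": 1,
  "Agent Workflow": 1,
  "Garbage Collection": 0,
};

console.log(totalScore(exampleScores)); // 6
```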

Self-apply: 18/18

Reins audits itself in CI. Current score:

{
  "total_score": 18,
  "max_score": 18,
  "maturity_level": "L4: Self-Correcting"
}

CI gates: lint, test, typecheck, self-audit.

Merging to master runs publish: if cli/reins/package.json has a version not yet on npm, it publishes reins-cli and creates a GitHub Release. PRs from this repository that modify cli/reins/** and do not manually bump cli/reins/package.json are auto-patched by .github/workflows/auto-bump-cli-version.yml (fork PRs are skipped).

For CLI repositories, Reins treats strong diagnosability signals (for example doctor surfaces, CLI diagnostics in CI, and help/error coverage) as the equivalent of service observability infrastructure.
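The publish rule described above (publish only when the version in package.json is not yet on npm) reduces to a membership check. The sketch below illustrates that decision only; `shouldPublish` is a hypothetical name, the version strings are made up for the example, and how the actual workflow obtains the list of published versions is not specified here.

```typescript
// Hypothetical helper: publish only when the local version is absent from the published list.
// Assumes exact string comparison of versions; the real workflow may compare differently.
function shouldPublish(localVersion: string, publishedVersions: string[]): boolean {
  return publishedVersions.indexOf(localVersion) === -1;
}

// Illustrative version numbers, not the actual release history of reins-cli.
const publishedExample = ["0.1.0", "0.1.1"];

console.log(shouldPublish("0.1.1", publishedExample)); // false: already on the registry
console.log(shouldPublish("0.1.2", publishedExample)); // true: new version, publish it
```

A check like this makes the publish step safe to run on every merge, since re-running it for an already published version is a no-op.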

Project structure

reins/
  cli/reins/            # The CLI tool (Bun + TypeScript, zero deps)
    src/index.ts        # Thin CLI router
    src/lib/commands/   # Command handlers (init/audit/doctor/evolve)
    src/lib/audit/      # Audit runtime context + scoring internals
    package.json
  skill/                # Agent skill (Claude Code)
    Reins/
      SKILL.md          # Skill definition and routing
      HarnessMethodology.md  # Full methodology reference
      Workflows/
        Scaffold.md     # Scaffold workflow
        Audit.md        # Audit workflow
        Evolve.md       # Evolve workflow

Requirements

  • Bun v1.0+ or Node.js 18+
  • No other dependencies

Contributing

See CONTRIBUTING.md for guidelines.

Methodology

Based on OpenAI's Harness Engineering (February 2026, Ryan Lopopolo). The five pillars:

  1. Repository as system of record — all knowledge versioned in-repo
  2. Layered domain architecture — strict layer ordering with forward-only dependencies
  3. Agent legibility — optimize for agent understanding, not just human readability
  4. Golden principles — encode human taste mechanically, enforce in CI
  5. Garbage collection — background agents clean drift continuously
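Pillar 2 is the kind of rule that can be checked mechanically. The following is a rough sketch of such a check, not the structural lint that Reins ships. It assumes one interpretation of "forward-only": layers are listed in order, and a layer may import only from itself or from layers listed before it. The layer names, `ImportEdge`, and `findLayerViolations` are all hypothetical.

```typescript
// An import from one layer to another, identified by layer name.
interface ImportEdge {
  from: string;
  to: string;
}

// Hypothetical check: return every edge that breaks the assumed ordering rule.
function findLayerViolations(layerOrder: string[], edges: ImportEdge[]): ImportEdge[] {
  return edges.filter((edge) => {
    const fromIndex = layerOrder.indexOf(edge.from);
    const toIndex = layerOrder.indexOf(edge.to);
    // Unknown layers are reported too, so typos do not silently pass.
    if (fromIndex === -1 || toIndex === -1) return true;
    // Assumed rule: a layer may import only from itself or an earlier layer.
    return toIndex > fromIndex;
  });
}

// Illustrative layer ordering and imports.
const exampleLayers = ["types", "config", "service", "ui"];
const exampleEdges: ImportEdge[] = [
  { from: "ui", to: "service" },    // allowed: later layer uses an earlier one
  { from: "service", to: "types" }, // allowed
  { from: "types", to: "ui" },      // violation: earlier layer reaches into a later one
];

console.log(findLayerViolations(exampleLayers, exampleEdges));
```

Run in CI, a check of this shape turns the layering rule from a documented convention into a gate, which is the point of pillar 4.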

License

MIT
