
lwyBZss8924d/ReCodeAgent


ReCodeAgent - Universal Recursive Code Generation Agent

Core Paper: ReCode: Unify Plan and Action for Universal Granularity Control (arXiv:2510.23564v2)

Project Status: Production-ready Rust implementation with Harbor integration for Terminal-Bench 2.0 evaluation
Updated Date: 2025-11-22


Project Objective: Turn the ReCode research paradigm into a production recursive code generation system that pairs a high-performance Rust core with the Codex CLI. The project now includes the Harbor Container-Unified Architecture for Terminal-Bench 2.0 benchmark evaluation.



📖 Project Overview

ReCodeAgent is a production implementation of the academic paper ReCode: Unify Plan and Action for Universal Granularity Control. It uses a hybrid architecture, a Rust orchestrator paired with a Codex CLI executor, to provide dynamic granularity control, moving from fixed-granularity decision-making toward a universal programming agent.

Core Value Propositions

  • 🔄 Recursive Code Generation: Placeholder functions auto-expand into executable code
  • High-Performance Rust Core: DFS tree traversal, AST parsing, checkpoint mechanism
  • 🔌 Codex CLI Integration: LLM calls via codex exec --json, authenticated with ~/.codex/auth.json
  • 🐳 Harbor Container-Unified Architecture: Seamless Terminal-Bench 2.0 evaluation
  • 🛠️ Tool Ecosystem Integration: File editing, command execution, environment interaction
  • 🔒 Production-Grade Reliability: Type safety, memory efficiency, JSONL event streaming

🚀 Quick Start

Running with Harbor (Recommended)

# Navigate to Harbor workspace
cd ~/harbor-workspace

# Run a single task
harbor run -d terminal-bench@2.0 -t regex-log -a recode-agent

# Specify prompt template
harbor run -d terminal-bench@2.0 -t regex-log -a recode-agent \
  --agent-kwarg template=recode_tb2_prompt.jinja2

# Limit max steps + debug mode
harbor run -d terminal-bench@2.0 -t password-recovery -a recode-agent \
  --agent-kwarg max_steps=50 --debug

# Batch run all tasks (4 concurrent)
harbor run -d terminal-bench@2.0 -a recode-agent -n 4

Local Development Testing

# Quick smoke test (10 steps)
cargo run --example terminal_bench_smoke --release

# CLI subcommand test
cargo run --release --manifest-path recode-core/Cargo.toml -- \
  execute --task-name test --instruction "test task" --working-dir /tmp --max-steps 5

View Results

# Latest task results
ls -lt ~/harbor-workspace/jobs/ | head -3

# Execution logs
cat ~/harbor-workspace/jobs/<job-id>/<task-id>/agent/command-2/stdout.txt | tail -50

# Verification result (0.0 = fail, 1.0 = success)
cat ~/harbor-workspace/jobs/<job-id>/<task-id>/verifier/reward.txt
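
When results are checked from a script rather than by eye, the reward file can be interpreted programmatically. The following is a minimal sketch, assuming `reward.txt` holds a single floating-point value as described above; the function name `reward_passed` is hypothetical and not part of ReCodeAgent.

```rust
use std::num::ParseFloatError;

/// Interprets the contents of a verifier `reward.txt` file.
/// Assumption: the file holds one float, where 1.0 means success
/// and 0.0 means failure.
fn reward_passed(contents: &str) -> Result<bool, ParseFloatError> {
    // Trim the trailing newline that `cat` would show.
    let reward: f64 = contents.trim().parse()?;
    Ok(reward >= 1.0)
}
```

The caller would read the file with `std::fs::read_to_string` and pass the contents in; a parse error indicates an unexpected file format rather than a failed task.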

🏗️ Architecture

Harbor Container-Unified Architecture

┌─────────────────────────────────────────────────────────────────┐
│                   macOS Development Environment (Host)          │
├─────────────────────────────────────────────────────────────────┤
│  ReCodeAgent Repository                                         │
│  ~/dev-space/ReCodeAgent/                                       │
│  ├── recode-core/src/           # Rust source code              │
│  └── recode-core/templates/     # Jinja2 Prompt templates       │
├─────────────────────────────────────────────────────────────────┤
│  Harbor Installation                                            │
│  ~/.local/share/uv/tools/harbor/.../agents/installed/           │
│  ├── recode_agent.py            # Harbor Agent definition       │
│  └── recode-assets/             # Deployment assets             │
│      ├── recode-agent           # Linux x86_64 binary           │
│      ├── templates/             # Jinja2 templates              │
│      └── scripts/               # Python bridge scripts         │
└─────────────────────────────────────────────────────────────────┘
                              │ Docker Volume
                              │ ${HOME}/.codex:/tmp/host-codex:ro
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│             Docker Container (Terminal-Bench 2.0)               │
├─────────────────────────────────────────────────────────────────┤
│  /app/                        # Working directory               │
│  ├── recode-agent             # ReCodeAgent binary              │
│  ├── AGENTS.md                # Rendered system prompt          │
│  │                            # (auto-loaded by Codex)          │
│  ├── instruction.md           # Task instruction                │
│  └── .codex/                  # Codex CLI config                │
│      ├── auth.json            # Auth (from host)                │
│      └── config.toml          # Model configuration             │
└─────────────────────────────────────────────────────────────────┘

Execution Flow

harbor run -d terminal-bench@2.0 -t <task> -a recode-agent
    │
    ├──▶ Step 1: Setup Codex auth (copy auth.json, config.toml)
    ├──▶ Step 2: Render AGENTS.md (recode-agent render-template)
    ├──▶ Step 3: Execute task (recode-agent execute + codex exec)
    │             └── DFS tree traversal + checkpoint self-verification
    └──▶ Step 4: Cleanup

CLI Subcommand Structure

enum Command {
    /// Legacy: Run with explicit bridge configuration
    Run { env_kind, python, bridge, bridge_args, instruction },

    /// Render AGENTS.md template (Harbor Step 2)
    RenderTemplate { template, output, task_name, instruction_path },

    /// Execute task (Harbor Step 3) - Core execution engine
    Execute { task_name, instruction, working_dir, max_steps, codex_home },
}

📁 Project Structure

recode-core/
├── Cargo.toml
├── Dockerfile                 # Container image config
│
├── examples/
│   └── terminal_bench_smoke.rs  # Quick local test (10 steps)
│
├── harbor-assets/             # Harbor deployment assets
│   ├── recode-agent           # Linux x86_64 binary
│   └── templates/             # Deployment templates
│
├── templates/                 # Jinja2 Prompt templates (source)
│   ├── recode_tb2_agents_md.jinja2      # Default AGENTS.md template
│   ├── recode_tb2_prompt.jinja2         # TB2 task prompt
│   ├── recode_microtexecute_tb2_prompt.jinja2  # Codex expansion
│   └── recode_tb2_checkpoint_minimal.jinja2    # Checkpoint
│
├── src/
│   ├── main.rs                # CLI entry (run, render-template, execute)
│   ├── codex/
│   │   └── thread_manager.rs  # Codex CLI integration
│   ├── execution/
│   │   └── python_executor.rs # Python code execution
│   ├── orchestrator/
│   │   ├── engine.rs          # Codex prompt assembly
│   │   └── runtime.rs         # DFS tree + checkpoint mechanism
│   └── tree/
│       ├── context.rs
│       └── node.rs
│
└── tests/
    └── fixtures/

🏗️ ReCode Core Methodology

Decision Process Formulation

We model LLM-based agent interaction with the environment as a simplified decision process:

$$\mathcal{M} = \langle \mathcal{S}, \mathcal{A}, \mathcal{O}, T, R \rangle$$

Where:

  • $\mathcal{S}$: State space
  • $\mathcal{A}$: Primitive action space (executable operations like run('crack egg'))
  • $\mathcal{O}$: Observation space
  • $T: \mathcal{S} \times \mathcal{A} \rightarrow \mathcal{S}$: Transition function
  • $R: \mathcal{S} \times \mathcal{A} \rightarrow \mathbb{R}$: Reward function

Beyond primitive actions, we introduce plan space $\mathcal{P}$ containing high-level intentions requiring decomposition (e.g., prepare_breakfast()).

Decision space: $\mathcal{D} = \mathcal{A} \cup \mathcal{P}$

Unified Plan & Action Representation

Key Insight: Plans and actions, though seemingly different, can be unified into a single executable code representation.

  • Actions (primitive): Executable operations like run('click the submit button')
  • Plans (abstract): Unimplemented placeholder functions like prepare_breakfast(), get_ingredients()

This unified representation enables seamless transitions between planning and execution.
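
One way to picture the decision space $\mathcal{D} = \mathcal{A} \cup \mathcal{P}$ is as a two-variant type. The sketch below is illustrative only: the `Decision` enum and the `classify` rule (any `run(...)` call is primitive, anything else is a plan) are assumptions made for this example, not the types used in `recode-core`.

```rust
/// A decision is either a primitive action or a plan awaiting expansion.
#[derive(Debug, PartialEq)]
enum Decision {
    /// Directly executable, e.g. `run('click the submit button')`.
    Action(String),
    /// Unimplemented placeholder, e.g. `prepare_breakfast()`.
    Plan(String),
}

/// Hypothetical classifier: treats `run(...)` calls as primitive actions
/// and every other call as a placeholder plan.
fn classify(call: &str) -> Decision {
    let call = call.trim();
    if call.starts_with("run(") {
        Decision::Action(call.to_string())
    } else {
        Decision::Plan(call.to_string())
    }
}
```

Because both variants carry the same kind of payload (a line of code), the agent can emit a mix of actions and plans in one code block and decide per line whether to execute or expand.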

Recursive Code Generation Algorithm

Algorithm 1: The ReCode Algorithm

Procedure ReCode(T, π, E, c):
    if c is None:                      // Initialize
        o_0 ← Reset(E)                 // Reset environment
        c ← Text2Code(T, o_0)          // Convert task to root placeholder
    end if

    code_block ← π(c)                  // LLM generates child code

    for each child u in code_block:
        if IsPrimitive(u):             // Primitive action
            Execute(u, E)
        else:                          // Placeholder function
            ReCode(T, π, E, u)         // Recursive expansion
        end if
    end for
end procedure
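
The recursion in Algorithm 1 can be sketched in a few lines of Rust. This is a simplified illustration under stated assumptions: a `HashMap` stands in for the LLM policy π, "executing" a primitive only records it in a trace, and only `run(...)` calls count as primitive. None of these names come from the actual `recode-core` source.

```rust
use std::collections::HashMap;

/// Stand-in for the LLM policy π: maps a placeholder call to its child calls.
type Policy = HashMap<&'static str, Vec<&'static str>>;

/// Assumption for this sketch: only `run(...)` calls are primitive.
fn is_primitive(call: &str) -> bool {
    call.starts_with("run(")
}

/// Depth-first expansion: primitives are executed in order, placeholders
/// are expanded recursively. "Executing" here just appends to `trace`.
fn recode(call: &str, policy: &Policy, trace: &mut Vec<String>) {
    // Placeholders the policy does not know expand to nothing in this sketch.
    let children = policy.get(call).cloned().unwrap_or_default();
    for child in children {
        if is_primitive(child) {
            trace.push(child.to_string());
        } else {
            recode(child, policy, trace);
        }
    }
}
```

With a policy that expands `prepare_breakfast()` into `get_ingredients()` followed by `run('crack egg')`, the primitives under `get_ingredients()` are executed before `run('crack egg')`, which is the depth-first order the algorithm describes. The sketch has no depth limit, so a cyclic policy would recurse forever.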

Implementation Details

  • Task Initialization: Task instruction → root placeholder function solve(instruction, observation)
  • Context Management: Unified variable namespace, persisted across recursion levels
  • Error Handling: Self-correction loop (max_rewrite=5)
  • Recursion Control: Maximum recursion depth 10
  • Checkpoint Mechanism: Completing the DFS tree does not guarantee the task is solved, so a checkpoint is injected for agent self-verification
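
The error-handling and recursion limits listed above can be sketched as a bounded retry loop. This is a hedged illustration, not the project's implementation: the constant and function names are hypothetical, and it assumes `max_rewrite=5` means five attempts in total.

```rust
/// Limits taken from the list above; the constant names are hypothetical.
const MAX_REWRITE: usize = 5;
const MAX_DEPTH: usize = 10;

/// Retries `attempt` until it succeeds or MAX_REWRITE attempts are used.
/// The closure receives the 1-based attempt number and the previous error,
/// so a caller could feed that error back into the next prompt.
fn expand_with_rewrites<F>(depth: usize, mut attempt: F) -> Result<String, String>
where
    F: FnMut(usize, Option<&str>) -> Result<String, String>,
{
    // Refuse to expand below the maximum recursion depth.
    if depth > MAX_DEPTH {
        return Err(format!("recursion depth {depth} exceeds {MAX_DEPTH}"));
    }
    let mut last_error: Option<String> = None;
    for n in 1..=MAX_REWRITE {
        match attempt(n, last_error.as_deref()) {
            Ok(code) => return Ok(code),
            Err(e) => last_error = Some(e),
        }
    }
    // All attempts failed: report the most recent error.
    Err(last_error.unwrap_or_else(|| "no attempts made".to_string()))
}
```

Passing the previous error into the closure reflects the self-correction idea: each rewrite can see why the last one failed instead of retrying blindly.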

🛠️ Development

Prerequisites

  • Rust 1.83+ (curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh)
  • Docker (for cross-compilation and Harbor)
  • Codex CLI (~/.codex/auth.json configured)
  • Harbor Framework (uv tool install harbor)

Build Commands

# Local macOS build (development)
cargo build --release --manifest-path recode-core/Cargo.toml

# Linux x86_64 build (Harbor container)
docker build --platform linux/amd64 -f Dockerfile.build-x86 -t recode-builder .
docker create --name tmp recode-builder
docker cp tmp:/build/recode-core/target/release/recode-core ./recode-agent-linux-x86_64
docker rm tmp

# Verify binary
file ./recode-agent-linux-x86_64
# Should output: ELF 64-bit LSB pie executable, x86-64...

Sync to Harbor

HARBOR_ASSETS=~/.local/share/uv/tools/harbor/lib/python3.13/site-packages/harbor/agents/installed/recode-assets

# Sync binary
cp ./recode-agent-linux-x86_64 $HARBOR_ASSETS/recode-agent
chmod +x $HARBOR_ASSETS/recode-agent

# Sync templates
cp -r recode-core/templates/*.jinja2 $HARBOR_ASSETS/templates/

# Sync scripts
cp scripts/terminal_bench_bridge.py $HARBOR_ASSETS/scripts/

Testing

cargo test                              # All tests
cargo test --test codex_turn_tests     # Codex integration tests
cargo clippy                            # Code linting

⚙️ Configuration

Prompt Templates

| Template | Purpose | Default |
|---|---|---|
| recode_tb2_agents_md.jinja2 | AGENTS.md system prompt | Yes |
| recode_tb2_prompt.jinja2 | TB2 task prompt | |
| recode_microtexecute_tb2_prompt.jinja2 | Codex expansion | |
| recode_tb2_checkpoint_minimal.jinja2 | Checkpoint verification | |

Harbor Agent Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| template | string | recode_tb2_agents_md.jinja2 | Jinja2 template filename |
| max_steps | int | 99999 | DFS tree maximum steps |

Usage: --agent-kwarg template=xxx --agent-kwarg max_steps=100
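
To make the `key=value` convention concrete, here is a small sketch of how such parameters could be parsed and merged over the documented defaults. It is written in Rust for consistency with the rest of this README; the actual Harbor agent is defined in `recode_agent.py`, and the `AgentParams` type and `apply_kwarg` method below are hypothetical.

```rust
/// Agent parameters with the defaults listed in the table above.
#[derive(Debug, PartialEq)]
struct AgentParams {
    template: String,
    max_steps: u64,
}

impl Default for AgentParams {
    fn default() -> Self {
        Self {
            template: "recode_tb2_agents_md.jinja2".to_string(),
            max_steps: 99_999,
        }
    }
}

impl AgentParams {
    /// Applies one `key=value` pair as passed to `--agent-kwarg`.
    fn apply_kwarg(&mut self, kwarg: &str) -> Result<(), String> {
        let (key, value) = kwarg
            .split_once('=')
            .ok_or_else(|| format!("expected key=value, got `{kwarg}`"))?;
        match key {
            "template" => self.template = value.to_string(),
            "max_steps" => {
                self.max_steps = value
                    .parse::<u64>()
                    .map_err(|e| format!("invalid max_steps `{value}`: {e}"))?;
            }
            other => return Err(format!("unknown parameter `{other}`")),
        }
        Ok(())
    }
}
```

Rejecting unknown keys, as this sketch does, is a design choice that surfaces typos such as `max_step=50` early; whether Harbor itself behaves this way is not stated here.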

Codex CLI Integration

  • Authentication: ~/.codex/auth.json (copied to container /app/.codex/)
  • Model config: ~/.codex/config.toml (default: gpt-5.1-codex-max)
  • AGENTS.md: Auto-discovered and loaded by Codex (95%+ token savings)
  • Command: codex exec --json for JSONL event streaming
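
JSONL means one JSON document per line, so a consumer can process events as they arrive instead of waiting for the full output. The sketch below shows only the line-splitting step using the standard library; it makes no assumption about the event schema emitted by `codex exec --json`, and the function name `read_jsonl_lines` is hypothetical.

```rust
use std::io::{self, BufRead};

/// Collects one raw JSON string per non-empty line of a JSONL stream.
/// Decoding each line (for example with `serde_json`) is left to the caller.
fn read_jsonl_lines<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    let mut events = Vec::new();
    for line in reader.lines() {
        let line = line?;
        let trimmed = line.trim();
        // Skip blank lines between events.
        if !trimmed.is_empty() {
            events.push(trimmed.to_string());
        }
    }
    Ok(events)
}
```

Because the function accepts any `BufRead`, the same code works on an in-memory buffer in tests and on a child process's stdout wrapped in `std::io::BufReader`.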

📚 Documentation

| Document | Description |
|---|---|
| WARP.md | Quick reference guide for WARP/Claude Code |
| CLAUDE.md | Claude Code instructions |
| Harbor ENV Guide | Detailed Harbor operation manual |
| Architecture | Technical architecture specification |
| Roadmap | Implementation roadmap |

📈 Benchmark Results

ReCodeAgent is evaluated on Terminal-Bench 2.0, a benchmark for terminal-based task automation.

# Run evaluation
harbor run -d terminal-bench@2.0 -a recode-agent -n 4

# View results
cat ~/harbor-workspace/jobs/<job-id>/result.json | jq '.stats'

🔬 Technology Stack

Core Dependencies (Rust)

  • tokio - Async runtime
  • clap - CLI argument parsing
  • serde / serde_json - Serialization / JSONL parsing
  • minijinja - Jinja2 template rendering
  • tree-sitter - High-performance AST parsing
  • tracing - Structured logging

External Dependencies

  • Codex CLI - LLM calls and tool execution
  • Harbor Framework - Benchmark evaluation platform
  • Docker - Container runtime

📝 License

Apache License 2.0 - See LICENSE file for details.


Last Updated: 2025-11-22
Version: v0.2.0
Status: ✅ Production-ready with Harbor Terminal-Bench 2.0 integration
