Introduces a framework for decomposing complex R&D goals into structured, executable plans using LLM-based planning agents. The core insight is treating LLM planning as a "compiler front-end" that produces an **Intermediate Representation (IR)**, the `PlanSpec`, which can be validated, scored, and eventually compiled into executable workflows.

This PR establishes the foundational infrastructure for plan generation and quality evaluation, with the goal of enabling autonomous research and development workflows.

## 🔗 Related links

- `agent/docs/PLAN-task-decomposition.md`: Full design document and implementation plan
- `agent/docs/E2E-test-results-2024-12-17.md`: Latest E2E test outputs

## 🚫 Blocked by

_None_

## 🔍 What does this change?

### Core Schema & Types

- **`schemas/plan-spec.ts`**: Full Zod schema for `PlanSpec` with 4 step types:
  - `research`: Parallelizable information gathering
  - `synthesize`: Combining findings (integrative) or evaluating results (evaluative)
  - `experiment`: Testing hypotheses (exploratory or confirmatory with preregistration)
  - `develop`: Building/implementing artifacts

- **`schemas/planning-fixture.ts`**: Types for test fixtures (`PlanningFixture`, `ExpectedPlanCharacteristics`)

- **`constants.ts`**: 12 agent capability profiles with `canHandle` mappings for executor assignment
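
For reviewers who want a quick mental model, the four step types and the `canHandle` lookup could be sketched roughly as below. This is a dependency-free illustration, not the code in this PR: the real schema is written with Zod, and every field name other than `preregisteredCommitments`, the three agent names, and the `candidateExecutors` helper are assumptions made for the example.

```typescript
// Hypothetical, dependency-free sketch of the four step types.
// The actual schema lives in schemas/plan-spec.ts and uses Zod;
// field names here are assumed for illustration only.
type ResearchStep = { type: "research"; id: string; query: string };
type SynthesizeStep = {
  type: "synthesize";
  id: string;
  mode: "integrative" | "evaluative";
  inputStepIds: string[];
};
type ExperimentStep = {
  type: "experiment";
  id: string;
  mode: "exploratory" | "confirmatory";
  preregisteredCommitments?: string[];
};
type DevelopStep = { type: "develop"; id: string; artifact: string };

// Discriminated union keyed on `type`.
type PlanStep = ResearchStep | SynthesizeStep | ExperimentStep | DevelopStep;
type StepType = PlanStep["type"];

// Hypothetical capability profile: each agent lists the step types it can handle.
interface AgentProfile {
  name: string;
  canHandle: StepType[];
}

// Three made-up profiles (the PR defines 12 in constants.ts).
const AGENT_PROFILES: AgentProfile[] = [
  { name: "literature-searcher", canHandle: ["research"] },
  { name: "analyst", canHandle: ["synthesize", "experiment"] },
  { name: "engineer", canHandle: ["develop"] },
];

// Return the names of every agent whose canHandle list includes the step's type.
function candidateExecutors(step: PlanStep, profiles: AgentProfile[]): string[] {
  return profiles
    .filter((profile) => profile.canHandle.includes(step.type))
    .map((profile) => profile.name);
}

// Usage: among the made-up profiles, only "analyst" lists "experiment".
const exampleStep: PlanStep = { type: "experiment", id: "s3", mode: "exploratory" };
console.log(candidateExecutors(exampleStep, AGENT_PROFILES));
```

Keying the union on `type` lets the compiler narrow each step to its own fields, which is the same property a Zod discriminated union would give at runtime.
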
- [x] does not modify any publishable blocks or libraries, or modifications do not need publishing

### 📜 Does this require a change to the docs?

The changes in this PR:

- [x] are internal and do not require a docs change

### 🕸️ Does this require a change to the Turbo Graph?

The changes in this PR:

- [x] do not affect the execution graph

## ⚠️ Known issues

1. **ct-database-goal fixture fails validation**: The LLM occasionally generates confirmatory experiments without `preregisteredCommitments`. This is a known prompt engineering issue that will be addressed in the revision workflow.

2. **explore-and-recommend generates unexpected content**: The LLM adds hypotheses and experiments not specified in the fixture expectations. This is valid behavior (more thorough than the minimum), but it indicates that the fixture expectations may need adjustment.
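
The rule behind the first issue can be illustrated with a small standalone check. This is a sketch under stated assumptions, not the validator in this PR: `ExperimentStepLike`, the `mode` field, and `findMissingPreregistration` are hypothetical names, and treating an empty commitments array as missing is also an assumption.

```typescript
// Hypothetical shape of an experiment step. Only `preregisteredCommitments`
// is a name taken from this PR description; the rest is assumed.
interface ExperimentStepLike {
  id: string;
  mode: "exploratory" | "confirmatory";
  preregisteredCommitments?: string[];
}

// Collect one error message per confirmatory experiment that has no
// commitments. An empty array is treated the same as a missing field
// (an assumption made for this sketch).
function findMissingPreregistration(steps: ExperimentStepLike[]): string[] {
  const errors: string[] = [];
  for (const s of steps) {
    const commitments = s.preregisteredCommitments ?? [];
    if (s.mode === "confirmatory" && commitments.length === 0) {
      errors.push(`Step ${s.id}: confirmatory experiment has no preregisteredCommitments`);
    }
  }
  return errors;
}

// Usage: only "e2" is confirmatory without commitments, so only it is reported.
const sampleSteps: ExperimentStepLike[] = [
  { id: "e1", mode: "exploratory" },
  { id: "e2", mode: "confirmatory" },
  { id: "e3", mode: "confirmatory", preregisteredCommitments: ["primary metric fixed in advance"] },
];
console.log(findMissingPreregistration(sampleSteps));
```

Returning messages instead of throwing would let a revision workflow feed the errors back to the LLM as a prompt for the next attempt.
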