From 3ae75773684cfed51c27063fb7894b170474ef4a Mon Sep 17 00:00:00 2001 From: bft-codebot Date: Fri, 6 Feb 2026 18:56:55 +0000 Subject: [PATCH] sync(bfmono): feat(gambit): add tool-call-aware grader schemas and root-deck guards (+19 more) (bfmono@70ec3b942) This PR is an automated gambitmono sync of bfmono Gambit packages. - Source: `packages/gambit/` - Core: `packages/gambit-core/` - bfmono rev: 70ec3b942 Changes: - 70ec3b942 feat(gambit): add tool-call-aware grader schemas and root-deck guards - cae381f00 feat(gambit): align scaffolds with product command and hourglass policies - 5faa48b35 feat(gambit): move bot policy to folder and enforce policy summarizer flow - 9a36c4a7e fix(gambit): align env loading with init and block .gambit env writes - dbe7c54ca feat(gambit-bot): add file actions and scenario deck structure - 855784d6b docs(gambit): add public permissions guide and API jsdoc - 8f0ca0a85 feat(gambit): trace effective permission layers at runtime - 90b4b5071 feat(gambit-core): add phase-1 permission contract primitives - df9280f6a fix(gambit): restore build-bot deck path compatibility - daca46555 feat(simulator-ui): wire build, test, and grade to workspace sessions - e404a17d7 feat(gambit): add workspace-backed serve and bot sandbox flow - 5f4fa86b9 feat(gambit): scaffold workspace defaults in init - cf9b23778 feat(gambit-core): add schema guards and model param passthrough - d0e5a9617 [gambit] move chat message into transcript so it scrolls - 5c6125d99 feat(simulator-ui): open workbench drawer by default - 7c9cd05f8 feat(simulator): gate chat accordion by env flag - a2599068e feat(simulator-ui): add build chat history loading - 9911dae22 feat(simulator-ui): add workbench chat drawer accordion - 8cab8ec1f feat(simulator-ui): dock calibrate drawer and sync updates - d41ba101d Add AAR for phase 3.1.5 deck format build tab Do not edit this repo directly; make changes in bfmono and re-run the sync. 
--- docs/external/guides/authoring.md | 5 + .../graders/contexts/conversation_tools.ts | 27 ++++++ .../contexts/conversation_tools.zod.ts | 1 + .../schemas/graders/contexts/tools.ts | 5 + .../schemas/graders/contexts/tools.zod.ts | 1 + .../schemas/graders/contexts/turn_tools.ts | 28 ++++++ .../graders/contexts/turn_tools.zod.ts | 1 + packages/gambit-core/src/markdown.test.ts | 95 ++++++++++++++++++ src/decks/gambit-bot/PROMPT.md | 35 +++---- .../first_deck_root_prompt_guard/PROMPT.md | 9 ++ .../first_deck_root_prompt_guard.deck.ts | 94 ++++++++++++++++++ .../PROMPT.md | 10 ++ ...first_deck_root_prompt_guard_tools.deck.ts | 97 +++++++++++++++++++ .../PROMPT.md | 10 ++ ...ot_prompt_guard_tools_conversation.deck.ts | 97 +++++++++++++++++++ .../gambit-bot/policy/deck-format-1.0.md | 20 ++-- .../scenarios/build_tab_demo/PROMPT.md | 40 -------- .../scenarios/faq_bot_build_flow/PROMPT.md | 52 ++++++++++ .../investor_faq_regression/PROMPT.md | 54 ----------- .../scenarios/nux_from_scratch_demo/PROMPT.md | 27 ------ .../scenarios/recipe_selection/PROMPT.md | 33 ------- .../recipe_selection_no_skip/PROMPT.md | 27 ------ .../nux_from_scratch_demo_input.zod.ts | 7 -- src/decks/tests/build_tab_demo.test.deck.md | 40 -------- .../tests/nux_from_scratch_demo.test.deck.md | 27 ------ src/decks/tests/recipe_selection.test.deck.md | 33 ------- .../recipe_selection_no_skip.test.deck.md | 27 ------ 27 files changed, 559 insertions(+), 343 deletions(-) create mode 100644 packages/gambit-core/schemas/graders/contexts/conversation_tools.ts create mode 100644 packages/gambit-core/schemas/graders/contexts/conversation_tools.zod.ts create mode 100644 packages/gambit-core/schemas/graders/contexts/tools.ts create mode 100644 packages/gambit-core/schemas/graders/contexts/tools.zod.ts create mode 100644 packages/gambit-core/schemas/graders/contexts/turn_tools.ts create mode 100644 packages/gambit-core/schemas/graders/contexts/turn_tools.zod.ts create mode 100644 
src/decks/gambit-bot/graders/first_deck_root_prompt_guard/PROMPT.md create mode 100644 src/decks/gambit-bot/graders/first_deck_root_prompt_guard/first_deck_root_prompt_guard.deck.ts create mode 100644 src/decks/gambit-bot/graders/first_deck_root_prompt_guard_tools/PROMPT.md create mode 100644 src/decks/gambit-bot/graders/first_deck_root_prompt_guard_tools/first_deck_root_prompt_guard_tools.deck.ts create mode 100644 src/decks/gambit-bot/graders/first_deck_root_prompt_guard_tools_conversation/PROMPT.md create mode 100644 src/decks/gambit-bot/graders/first_deck_root_prompt_guard_tools_conversation/first_deck_root_prompt_guard_tools_conversation.deck.ts delete mode 100644 src/decks/gambit-bot/scenarios/build_tab_demo/PROMPT.md create mode 100644 src/decks/gambit-bot/scenarios/faq_bot_build_flow/PROMPT.md delete mode 100644 src/decks/gambit-bot/scenarios/investor_faq_regression/PROMPT.md delete mode 100644 src/decks/gambit-bot/scenarios/nux_from_scratch_demo/PROMPT.md delete mode 100644 src/decks/gambit-bot/scenarios/recipe_selection/PROMPT.md delete mode 100644 src/decks/gambit-bot/scenarios/recipe_selection_no_skip/PROMPT.md delete mode 100644 src/decks/gambit-bot/scenarios/schemas/nux_from_scratch_demo_input.zod.ts delete mode 100644 src/decks/tests/build_tab_demo.test.deck.md delete mode 100644 src/decks/tests/nux_from_scratch_demo.test.deck.md delete mode 100644 src/decks/tests/recipe_selection.test.deck.md delete mode 100644 src/decks/tests/recipe_selection_no_skip.test.deck.md diff --git a/docs/external/guides/authoring.md b/docs/external/guides/authoring.md index 8abb0264..9f1790c1 100644 --- a/docs/external/guides/authoring.md +++ b/docs/external/guides/authoring.md @@ -124,6 +124,11 @@ deno run -A packages/gambit/scripts/migrate-schema-terms.ts invalid JSON or schema-violating output blocks the run with a clear error. - `graderDecks` describe calibration decks that score transcripts/artifacts. 
The simulator Calibrate page will run these decks against stored runs. +- For graders that inspect assistant tool usage, set + `contextSchema = "gambit://schemas/graders/contexts/turn_tools.zod.ts"` so + `session.messages[*].tool_calls` is available in the grader input. +- For conversation-level tool-call grading (single score for the whole run), use + `contextSchema = "gambit://schemas/graders/contexts/conversation_tools.zod.ts"`. - Configure `acceptsUserTurns` alongside these references: - Markdown roots default to `true`; TypeScript decks default to `false` everywhere. Set it to `false` for any workflow deck that should never accept diff --git a/packages/gambit-core/schemas/graders/contexts/conversation_tools.ts b/packages/gambit-core/schemas/graders/contexts/conversation_tools.ts new file mode 100644 index 00000000..a525e424 --- /dev/null +++ b/packages/gambit-core/schemas/graders/contexts/conversation_tools.ts @@ -0,0 +1,27 @@ +import { z } from "zod"; + +const graderToolCallSchema = z.object({ + id: z.string().optional(), + type: z.string().optional(), + function: z.object({ + name: z.string(), + arguments: z.string().optional(), + }), +}); + +export const graderConversationMessageWithToolsSchema = z.object({ + role: z.string(), + content: z.any().optional(), + name: z.string().optional(), + tool_calls: z.array(graderToolCallSchema).optional(), +}); + +export const graderConversationWithToolsSchema = z.object({ + messages: z.array(graderConversationMessageWithToolsSchema).optional(), + meta: z.record(z.any()).optional(), + notes: z.object({ text: z.string().optional() }).optional(), +}); + +export default z.object({ + session: graderConversationWithToolsSchema, +}); diff --git a/packages/gambit-core/schemas/graders/contexts/conversation_tools.zod.ts b/packages/gambit-core/schemas/graders/contexts/conversation_tools.zod.ts new file mode 100644 index 00000000..4de338ec --- /dev/null +++ b/packages/gambit-core/schemas/graders/contexts/conversation_tools.zod.ts 
@@ -0,0 +1 @@ +export { default } from "./conversation_tools.ts"; diff --git a/packages/gambit-core/schemas/graders/contexts/tools.ts b/packages/gambit-core/schemas/graders/contexts/tools.ts new file mode 100644 index 00000000..2741efcb --- /dev/null +++ b/packages/gambit-core/schemas/graders/contexts/tools.ts @@ -0,0 +1,5 @@ +export { default } from "./turn_tools.ts"; +export { + graderConversationWithToolsSchema, + graderMessageWithToolsSchema, +} from "./turn_tools.ts"; diff --git a/packages/gambit-core/schemas/graders/contexts/tools.zod.ts b/packages/gambit-core/schemas/graders/contexts/tools.zod.ts new file mode 100644 index 00000000..72849e70 --- /dev/null +++ b/packages/gambit-core/schemas/graders/contexts/tools.zod.ts @@ -0,0 +1 @@ +export { default } from "./turn_tools.ts"; diff --git a/packages/gambit-core/schemas/graders/contexts/turn_tools.ts b/packages/gambit-core/schemas/graders/contexts/turn_tools.ts new file mode 100644 index 00000000..50b0e8f3 --- /dev/null +++ b/packages/gambit-core/schemas/graders/contexts/turn_tools.ts @@ -0,0 +1,28 @@ +import { z } from "zod"; + +const graderToolCallSchema = z.object({ + id: z.string().optional(), + type: z.string().optional(), + function: z.object({ + name: z.string(), + arguments: z.string().optional(), + }), +}); + +export const graderMessageWithToolsSchema = z.object({ + role: z.string(), + content: z.any().optional(), + name: z.string().optional(), + tool_calls: z.array(graderToolCallSchema).optional(), +}); + +export const graderConversationWithToolsSchema = z.object({ + messages: z.array(graderMessageWithToolsSchema).optional(), + meta: z.record(z.any()).optional(), + notes: z.object({ text: z.string().optional() }).optional(), +}); + +export default z.object({ + session: graderConversationWithToolsSchema, + messageToGrade: graderMessageWithToolsSchema, +}); diff --git a/packages/gambit-core/schemas/graders/contexts/turn_tools.zod.ts b/packages/gambit-core/schemas/graders/contexts/turn_tools.zod.ts new 
file mode 100644 index 00000000..72849e70 --- /dev/null +++ b/packages/gambit-core/schemas/graders/contexts/turn_tools.zod.ts @@ -0,0 +1 @@ +export { default } from "./turn_tools.ts"; diff --git a/packages/gambit-core/src/markdown.test.ts b/packages/gambit-core/src/markdown.test.ts index 41242596..d813430b 100644 --- a/packages/gambit-core/src/markdown.test.ts +++ b/packages/gambit-core/src/markdown.test.ts @@ -101,6 +101,101 @@ Schema deck. assertEquals(parsed, { status: 200 }); }); +Deno.test("markdown deck resolves tool-call-aware grader context schema", async () => { + const dir = await Deno.makeTempDir(); + + const deckPath = await writeTempDeck( + dir, + "turn-tools-schema.deck.md", + `+++ +label = "turn-tools-schema" +contextSchema = "gambit://schemas/graders/contexts/turn_tools.zod.ts" ++++ + +Schema deck. +`, + ); + + const deck = await loadMarkdownDeck(deckPath); + + assert(deck.contextSchema, "expected context schema to resolve"); + const parsed = deck.contextSchema.parse({ + session: { + messages: [ + { + role: "assistant", + tool_calls: [ + { + function: { + name: "bot_write", + arguments: '{"path":"PROMPT.md"}', + }, + }, + ], + }, + ], + }, + messageToGrade: { + role: "assistant", + tool_calls: [ + { + function: { + name: "bot_write", + }, + }, + ], + }, + }); + + assertEquals(parsed.messageToGrade.role, "assistant"); + assertEquals( + parsed.session.messages?.[0].tool_calls?.[0].function.name, + "bot_write", + ); +}); + +Deno.test("markdown deck resolves conversation-level tool-call grader context schema", async () => { + const dir = await Deno.makeTempDir(); + + const deckPath = await writeTempDeck( + dir, + "conversation-tools-schema.deck.md", + `+++ +label = "conversation-tools-schema" +contextSchema = "gambit://schemas/graders/contexts/conversation_tools.zod.ts" ++++ + +Schema deck. 
+`, + ); + + const deck = await loadMarkdownDeck(deckPath); + + assert(deck.contextSchema, "expected context schema to resolve"); + const parsed = deck.contextSchema.parse({ + session: { + messages: [ + { + role: "assistant", + tool_calls: [ + { + function: { + name: "bot_write", + arguments: '{"path":"faq-bot/PROMPT.md"}', + }, + }, + ], + }, + ], + }, + }); + + assertEquals( + parsed.session.messages?.[0].tool_calls?.[0].function.name, + "bot_write", + ); +}); + Deno.test("markdown deck warns on legacy schema URIs", async () => { const dir = await Deno.makeTempDir(); const deckPath = await writeTempDeck( diff --git a/src/decks/gambit-bot/PROMPT.md b/src/decks/gambit-bot/PROMPT.md index 21d458bb..33b9179c 100644 --- a/src/decks/gambit-bot/PROMPT.md +++ b/src/decks/gambit-bot/PROMPT.md @@ -55,30 +55,25 @@ label = "Deck format policy guard (turn) LLM" path = "./graders/deck_format_policy_llm/PROMPT.md" description = "LLM guard for policy-compliant deck editing behavior." -[[scenarios]] -label = "Recipe selection on-ramp tester" -path = "./scenarios/recipe_selection/PROMPT.md" -description = "Synthetic user that asks Gambit Bot to build a recipe selection chatbot." - -[[scenarios]] -label = "Recipe selection (no skip)" -path = "./scenarios/recipe_selection_no_skip/PROMPT.md" -description = "Synthetic user that completes the question flow without skipping to building." +[[graders]] +label = "First deck location guard (turn)" +path = "./graders/first_deck_root_prompt_guard/PROMPT.md" +description = "Checks that the first created deck is root PROMPT.md (not a subfolder PROMPT.md)." -[[scenarios]] -label = "Build tab demo prompt" -path = "./scenarios/build_tab_demo/PROMPT.md" -description = "Synthetic user prompt for the build tab demo." +[[graders]] +label = "First deck location guard (tools)" +path = "./graders/first_deck_root_prompt_guard_tools/PROMPT.md" +description = "Checks first created deck location using tool-call-aware grading context." 
-[[scenarios]] -label = "NUX from scratch demo prompt" -path = "./scenarios/nux_from_scratch_demo/PROMPT.md" -description = "Synthetic user prompt for the NUX from-scratch build demo." +[[graders]] +label = "First deck location guard (tools, conversation)" +path = "./graders/first_deck_root_prompt_guard_tools_conversation/PROMPT.md" +description = "Conversation-level check of first created deck location with tool-call-aware context." [[scenarios]] -label = "Investor FAQ regression" -path = "./scenarios/investor_faq_regression/PROMPT.md" -description = "Replays the investor FAQ build flow that previously produced a non-v1.0 deck format." +label = "FAQ bot build flow" +path = "./scenarios/faq_bot_build_flow/PROMPT.md" +description = "Synthetic user flow that builds an FAQ bot, checks policy alignment, and requests a root-level deck move." +++ You are GambitBot, an AI assistant designed to help people build other AI diff --git a/src/decks/gambit-bot/graders/first_deck_root_prompt_guard/PROMPT.md b/src/decks/gambit-bot/graders/first_deck_root_prompt_guard/PROMPT.md new file mode 100644 index 00000000..3375a542 --- /dev/null +++ b/src/decks/gambit-bot/graders/first_deck_root_prompt_guard/PROMPT.md @@ -0,0 +1,9 @@ ++++ +label = "First deck location guard (turn)" +description = "Deterministic guard that checks whether the first created deck is root PROMPT.md." +contextSchema = "gambit://schemas/graders/contexts/turn_tools.zod.ts" +responseSchema = "gambit://schemas/graders/grader_output.zod.ts" +execute = "./first_deck_root_prompt_guard.deck.ts" ++++ + +Compute grader that enforces first deck location policy. 
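Reviewer note: the scoring rule this guard enforces can be sketched without Gambit or zod. This is a minimal illustration, assuming only the behavior visible in this patch (score 3 for a root `PROMPT.md`, -3 for a subfolder, 0 when no deck write is found). The names `GuardResult`, `normalizeDeckPath`, and `scoreFirstDeckWrite` are hypothetical and do not appear in the repository.

```typescript
// Hypothetical, dependency-free sketch; names are illustrative, not from the patch.
type GuardResult = { score: -3 | 0 | 3; reason: string };

// Normalize separators and drop a leading "./" so that
// "./PROMPT.md" and "PROMPT.md" compare equal.
function normalizeDeckPath(path: string): string {
  return path.replace(/\\/g, "/").replace(/^\.\//, "");
}

// Score an ordered list of written paths: only the first PROMPT.md write matters.
function scoreFirstDeckWrite(paths: string[]): GuardResult {
  const deckPaths = paths
    .map(normalizeDeckPath)
    .filter((p) => p === "PROMPT.md" || p.endsWith("/PROMPT.md"));
  if (deckPaths.length === 0) {
    return { score: 0, reason: "no deck write found" };
  }
  return deckPaths[0] === "PROMPT.md"
    ? { score: 3, reason: "first deck is root PROMPT.md" }
    : { score: -3, reason: `first deck written to ${deckPaths[0]}` };
}
```

Because only the first matching write is scored, a later move to the root does not rescue a run whose first deck landed in a subfolder.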
diff --git a/src/decks/gambit-bot/graders/first_deck_root_prompt_guard/first_deck_root_prompt_guard.deck.ts b/src/decks/gambit-bot/graders/first_deck_root_prompt_guard/first_deck_root_prompt_guard.deck.ts new file mode 100644 index 00000000..ff6d2ce7 --- /dev/null +++ b/src/decks/gambit-bot/graders/first_deck_root_prompt_guard/first_deck_root_prompt_guard.deck.ts @@ -0,0 +1,94 @@ +import { defineDeck } from "jsr:@bolt-foundry/gambit"; +import { z } from "npm:zod"; +import contextSchema, { + type graderMessageWithToolsSchema as messageSchema, +} from "../../../../../../gambit-core/schemas/graders/contexts/turn_tools.ts"; + +const responseSchema = z.object({ + score: z.number().int().min(-3).max(3), + reason: z.string(), + evidence: z.array(z.string()).optional(), +}); + +type DeckWrite = { + path: string; + messageIndex: number; +}; + +export default defineDeck({ + label: "first_deck_root_prompt_guard", + contextSchema, + responseSchema, + run(ctx) { + const messages = ctx.input.session.messages ?? 
[]; + const deckWrites = collectDeckPromptWrites(messages); + + if (deckWrites.length === 0) { + return { + score: 0, + reason: + "No deck creation write found (no bot_write call targeting PROMPT.md).", + }; + } + + const firstWrite = deckWrites[0]; + if (firstWrite.path === "PROMPT.md") { + return { + score: 3, + reason: "First created deck is root PROMPT.md.", + evidence: [`first deck write path: ${firstWrite.path}`], + }; + } + + return { + score: -3, + reason: + "First created deck is not root PROMPT.md; it was created in a subfolder.", + evidence: [ + `first deck write path: ${firstWrite.path}`, + `message index: ${firstWrite.messageIndex}`, + ], + }; + }, +}); + +function collectDeckPromptWrites( + messages: Array<z.infer<typeof messageSchema>>, +): Array<DeckWrite> { + const writes: Array<DeckWrite> = []; + + for (let i = 0; i < messages.length; i += 1) { + const msg = messages[i]; + if (msg.role !== "assistant" || !msg.tool_calls?.length) continue; + + for (const tool of msg.tool_calls) { + if (tool.function.name !== "bot_write") continue; + if (!tool.function.arguments) continue; + + try { + const parsed = JSON.parse(tool.function.arguments) as { + path?: unknown; + }; + if (typeof parsed.path !== "string") continue; + + const normalizedPath = normalizePath(parsed.path); + if (isDeckPromptPath(normalizedPath)) { + writes.push({ path: normalizedPath, messageIndex: i }); + } + } catch { + // Ignore malformed tool args and continue scanning. 
+ } + } + } + + return writes; +} + +function normalizePath(path: string): string { + const withForwardSlashes = path.replaceAll("\\", "/"); + return withForwardSlashes.replace(/^\.\//, ""); +} + +function isDeckPromptPath(path: string): boolean { + return path === "PROMPT.md" || path.endsWith("/PROMPT.md"); +} diff --git a/src/decks/gambit-bot/graders/first_deck_root_prompt_guard_tools/PROMPT.md b/src/decks/gambit-bot/graders/first_deck_root_prompt_guard_tools/PROMPT.md new file mode 100644 index 00000000..b0936e2a --- /dev/null +++ b/src/decks/gambit-bot/graders/first_deck_root_prompt_guard_tools/PROMPT.md @@ -0,0 +1,10 @@ ++++ +label = "First deck location guard (tools)" +description = "Deterministic guard that checks whether the first created deck is root PROMPT.md, with tool-call-aware context." +contextSchema = "gambit://schemas/graders/contexts/turn_tools.zod.ts" +responseSchema = "gambit://schemas/graders/grader_output.zod.ts" +execute = "./first_deck_root_prompt_guard_tools.deck.ts" ++++ + +Compute grader that enforces first deck location policy using tool-call-aware +context. 
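Reviewer note: the tool-call-aware context matters because the write path lives in the JSON-encoded `arguments` string of each `tool_calls` entry, not in the message text. A minimal sketch of that extraction follows, assuming message shapes reduced from the `turn_tools` schema in this patch. The types `ToolCall` and `Message` and the function `botWritePaths` are hypothetical names introduced for illustration.

```typescript
// Hypothetical minimal shapes, loosely following the turn_tools schema in this patch.
type ToolCall = { function: { name: string; arguments?: string } };
type Message = { role: string; tool_calls?: ToolCall[] };

// Return the `path` argument of every assistant `bot_write` call, in message order.
function botWritePaths(messages: Message[]): string[] {
  const paths: string[] = [];
  for (const msg of messages) {
    if (msg.role !== "assistant") continue;
    for (const call of msg.tool_calls ?? []) {
      if (call.function.name !== "bot_write" || !call.function.arguments) continue;
      try {
        const parsed: unknown = JSON.parse(call.function.arguments);
        if (typeof parsed === "object" && parsed !== null && "path" in parsed) {
          const path = (parsed as { path: unknown }).path;
          if (typeof path === "string") paths.push(path);
        }
      } catch {
        // Malformed JSON arguments are skipped rather than failing the grade.
      }
    }
  }
  return paths;
}
```

Skipping unparseable arguments keeps the grader deterministic on partial transcripts, at the cost of silently ignoring a write whose arguments were truncated.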
diff --git a/src/decks/gambit-bot/graders/first_deck_root_prompt_guard_tools/first_deck_root_prompt_guard_tools.deck.ts b/src/decks/gambit-bot/graders/first_deck_root_prompt_guard_tools/first_deck_root_prompt_guard_tools.deck.ts new file mode 100644 index 00000000..38874a98 --- /dev/null +++ b/src/decks/gambit-bot/graders/first_deck_root_prompt_guard_tools/first_deck_root_prompt_guard_tools.deck.ts @@ -0,0 +1,97 @@ +import { defineDeck } from "jsr:@bolt-foundry/gambit"; +import { z } from "npm:zod"; +import contextSchema, { + type graderMessageWithToolsSchema as messageSchema, +} from "../../../../../../gambit-core/schemas/graders/contexts/turn_tools.ts"; + +const responseSchema = z.object({ + score: z.number().int().min(-3).max(3), + reason: z.string(), + evidence: z.array(z.string()).optional(), +}); + +type GraderInput = z.infer<typeof contextSchema>; +type SessionMessage = z.infer<typeof messageSchema>; + +type DeckWrite = { + path: string; + messageIndex: number; +}; + +export default defineDeck({ + label: "first_deck_root_prompt_guard_tools", + contextSchema, + responseSchema, + run(ctx) { + const messages = ctx.input.session.messages ?? 
[]; + const deckWrites = collectDeckPromptWrites(messages); + + if (deckWrites.length === 0) { + return { + score: 0, + reason: + "No deck creation write found (no bot_write call targeting PROMPT.md).", + }; + } + + const firstWrite = deckWrites[0]; + if (firstWrite.path === "PROMPT.md") { + return { + score: 3, + reason: "First created deck is root PROMPT.md.", + evidence: [`first deck write path: ${firstWrite.path}`], + }; + } + + return { + score: -3, + reason: + "First created deck is not root PROMPT.md; it was created in a subfolder.", + evidence: [ + `first deck write path: ${firstWrite.path}`, + `message index: ${firstWrite.messageIndex}`, + ], + }; + }, +}); + +function collectDeckPromptWrites( + messages: Array<SessionMessage>, +): Array<DeckWrite> { + const writes: Array<DeckWrite> = []; + + for (let i = 0; i < messages.length; i += 1) { + const msg = messages[i]; + if (msg.role !== "assistant" || !msg.tool_calls?.length) continue; + + for (const tool of msg.tool_calls) { + if (tool.function.name !== "bot_write") continue; + if (!tool.function.arguments) continue; + + try { + const parsed = JSON.parse(tool.function.arguments) as { + path?: unknown; + }; + if (typeof parsed.path !== "string") continue; + + const normalizedPath = normalizePath(parsed.path); + if (isDeckPromptPath(normalizedPath)) { + writes.push({ path: normalizedPath, messageIndex: i }); + } + } catch { + // Ignore malformed tool args and continue scanning. 
+ } + } + } + + return writes; +} + +function normalizePath(path: string): string { + const withForwardSlashes = path.replaceAll("\\", "/"); + return withForwardSlashes.replace(/^\.\//, ""); +} + +function isDeckPromptPath(path: string): boolean { + return path === "PROMPT.md" || path.endsWith("/PROMPT.md"); +} diff --git a/src/decks/gambit-bot/graders/first_deck_root_prompt_guard_tools_conversation/PROMPT.md b/src/decks/gambit-bot/graders/first_deck_root_prompt_guard_tools_conversation/PROMPT.md new file mode 100644 index 00000000..5a4afb35 --- /dev/null +++ b/src/decks/gambit-bot/graders/first_deck_root_prompt_guard_tools_conversation/PROMPT.md @@ -0,0 +1,10 @@ ++++ +label = "First deck location guard (tools, conversation)" +description = "Conversation-level guard that checks whether the first created deck is root PROMPT.md." +contextSchema = "gambit://schemas/graders/contexts/conversation_tools.zod.ts" +responseSchema = "gambit://schemas/graders/grader_output.zod.ts" +execute = "./first_deck_root_prompt_guard_tools_conversation.deck.ts" ++++ + +Compute grader that enforces first deck location policy across the whole +conversation. 
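Reviewer note: the two context schemas added in this patch differ only in `messageToGrade`, which the turn-level schema requires and the conversation-level schema omits. The sketch below illustrates how a grader might choose its scope from that difference. It is an assumption for illustration only: the graders in this patch always scan `session.messages`, and the names `ToolMessage`, `ConversationContext`, `TurnContext`, `isTurnContext`, `toolNamesInScope`, and the tool name `other_tool` are hypothetical.

```typescript
// Hypothetical shapes, reduced from the two context schemas in this patch.
type ToolMessage = {
  role: string;
  tool_calls?: Array<{ function: { name: string } }>;
};
type ConversationContext = { session: { messages?: ToolMessage[] } };
type TurnContext = ConversationContext & { messageToGrade: ToolMessage };

// A turn context is a conversation context plus the single message under review.
function isTurnContext(
  ctx: ConversationContext | TurnContext,
): ctx is TurnContext {
  return "messageToGrade" in ctx;
}

// Tool names in scope for grading: one message for a turn, every message otherwise.
function toolNamesInScope(ctx: ConversationContext | TurnContext): string[] {
  const scope: ToolMessage[] = isTurnContext(ctx)
    ? [ctx.messageToGrade]
    : (ctx.session.messages ?? []);
  const names: string[] = [];
  for (const message of scope) {
    for (const call of message.tool_calls ?? []) names.push(call.function.name);
  }
  return names;
}
```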
diff --git a/src/decks/gambit-bot/graders/first_deck_root_prompt_guard_tools_conversation/first_deck_root_prompt_guard_tools_conversation.deck.ts b/src/decks/gambit-bot/graders/first_deck_root_prompt_guard_tools_conversation/first_deck_root_prompt_guard_tools_conversation.deck.ts new file mode 100644 index 00000000..21cb5c60 --- /dev/null +++ b/src/decks/gambit-bot/graders/first_deck_root_prompt_guard_tools_conversation/first_deck_root_prompt_guard_tools_conversation.deck.ts @@ -0,0 +1,97 @@ +import { defineDeck } from "jsr:@bolt-foundry/gambit"; +import { z } from "npm:zod"; +import contextSchema, { + type graderConversationMessageWithToolsSchema as messageSchema, +} from "../../../../../../gambit-core/schemas/graders/contexts/conversation_tools.ts"; + +const responseSchema = z.object({ + score: z.number().int().min(-3).max(3), + reason: z.string(), + evidence: z.array(z.string()).optional(), +}); + +type GraderInput = z.infer<typeof contextSchema>; +type SessionMessage = z.infer<typeof messageSchema>; + +type DeckWrite = { + path: string; + messageIndex: number; +}; + +export default defineDeck({ + label: "first_deck_root_prompt_guard_tools_conversation", + contextSchema, + responseSchema, + run(ctx) { + const messages = ctx.input.session.messages ?? 
[]; + const deckWrites = collectDeckPromptWrites(messages); + + if (deckWrites.length === 0) { + return { + score: 0, + reason: + "No deck creation write found (no bot_write call targeting PROMPT.md).", + }; + } + + const firstWrite = deckWrites[0]; + if (firstWrite.path === "PROMPT.md") { + return { + score: 3, + reason: "First created deck is root PROMPT.md.", + evidence: [`first deck write path: ${firstWrite.path}`], + }; + } + + return { + score: -3, + reason: + "First created deck is not root PROMPT.md; it was created in a subfolder.", + evidence: [ + `first deck write path: ${firstWrite.path}`, + `message index: ${firstWrite.messageIndex}`, + ], + }; + }, +}); + +function collectDeckPromptWrites( + messages: Array<SessionMessage>, +): Array<DeckWrite> { + const writes: Array<DeckWrite> = []; + + for (let i = 0; i < messages.length; i += 1) { + const msg = messages[i]; + if (msg.role !== "assistant" || !msg.tool_calls?.length) continue; + + for (const tool of msg.tool_calls) { + if (tool.function.name !== "bot_write") continue; + if (!tool.function.arguments) continue; + + try { + const parsed = JSON.parse(tool.function.arguments) as { + path?: unknown; + }; + if (typeof parsed.path !== "string") continue; + + const normalizedPath = normalizePath(parsed.path); + if (isDeckPromptPath(normalizedPath)) { + writes.push({ path: normalizedPath, messageIndex: i }); + } + } catch { + // Ignore malformed tool args and continue scanning. 
+ } + } + } + + return writes; +} + +function normalizePath(path: string): string { + const withForwardSlashes = path.replaceAll("\\", "/"); + return withForwardSlashes.replace(/^\.\//, ""); +} + +function isDeckPromptPath(path: string): boolean { + return path === "PROMPT.md" || path.endsWith("/PROMPT.md"); +} diff --git a/src/decks/gambit-bot/policy/deck-format-1.0.md b/src/decks/gambit-bot/policy/deck-format-1.0.md index b7a382bc..d0f656b3 100644 --- a/src/decks/gambit-bot/policy/deck-format-1.0.md +++ b/src/decks/gambit-bot/policy/deck-format-1.0.md @@ -52,7 +52,9 @@ Schema requirements: - For plain chat output, `responseSchema` SHOULD be a string schema (for example, `gambit://schemas/scenarios/plain_chat_output.zod.ts`). - Grader decks MUST be compatible with the built-in grader schemas: - `gambit://schemas/graders/contexts/turn.zod.ts` or + `gambit://schemas/graders/contexts/turn.zod.ts`, + `gambit://schemas/graders/contexts/turn_tools.zod.ts`, + `gambit://schemas/graders/contexts/conversation_tools.zod.ts`, or `gambit://schemas/graders/contexts/conversation.zod.ts` (context) and `gambit://schemas/graders/grader_output.zod.ts` (response). - Compatibility rule (deep): base fields MUST be present and unchanged @@ -124,13 +126,15 @@ Schemas are referenced by path strings in `PROMPT.md` frontmatter (for example Built-in schemas (v1.0): -| URI | Purpose | -| ------------------------------------------------------- | --------------------------------------------------------------- | -| `gambit://schemas/graders/respond.zod.ts` | Shared respond-envelope schema used by decks and graders. | -| `gambit://schemas/graders/grader_output.zod.ts` | Canonical grader output schema (`score`, `reason`, `evidence`). | -| `gambit://schemas/graders/contexts/turn.zod.ts` | Schema for per-turn grader context (single exchange). | -| `gambit://schemas/graders/contexts/conversation.zod.ts` | Schema for full-conversation grader context. 
| -| `gambit://schemas/scenarios/plain_chat_output.zod.ts` | Canonical string output for plain-chat scenario/test decks. | +| URI | Purpose | +| ------------------------------------------------------------- | ------------------------------------------------------------------- | +| `gambit://schemas/graders/respond.zod.ts` | Shared respond-envelope schema used by decks and graders. | +| `gambit://schemas/graders/grader_output.zod.ts` | Canonical grader output schema (`score`, `reason`, `evidence`). | +| `gambit://schemas/graders/contexts/turn.zod.ts` | Schema for per-turn grader context (single exchange). | +| `gambit://schemas/graders/contexts/turn_tools.zod.ts` | Per-turn grader context including assistant `tool_calls`. | +| `gambit://schemas/graders/contexts/conversation_tools.zod.ts` | Conversation-level grader context including assistant `tool_calls`. | +| `gambit://schemas/graders/contexts/conversation.zod.ts` | Schema for full-conversation grader context. | +| `gambit://schemas/scenarios/plain_chat_output.zod.ts` | Canonical string output for plain-chat scenario/test decks. | ## Stdlib decks (built-in Gambit namespace) diff --git a/src/decks/gambit-bot/scenarios/build_tab_demo/PROMPT.md b/src/decks/gambit-bot/scenarios/build_tab_demo/PROMPT.md deleted file mode 100644 index b4d82aaa..00000000 --- a/src/decks/gambit-bot/scenarios/build_tab_demo/PROMPT.md +++ /dev/null @@ -1,40 +0,0 @@ -+++ -label = "build_tab_demo_prompt" -acceptsUserTurns = true - -[modelParams] -model = "openrouter/openai/gpt-5.1-chat" -temperature = 0.2 -+++ - -You are a user collaborating with Gambit Bot inside the Build tab demo. - -Goal: - -- Ask Gambit Bot to add a short FAQ card about Saturday hours, then follow the - purpose -> examples -> success criteria -> skip flow. - -Conversation plan (required beats): - -1. Start by saying: "Add a short FAQ card about Saturday hours. Keep it - concise." -2. 
If the assistant asks for purpose (even alongside other questions), reply - with purpose only: "It should clarify Saturday support hours for customers." -3. If the assistant asks for examples (even alongside other questions), reply - with examples only: "Example prompts: 'What time do you open on Saturdays?' - and 'Are you open Saturdays for support?'" -4. If the assistant asks for success criteria (even alongside other questions), - reply with success criteria only: "Success means the FAQ card clearly states - Saturday hours and the timezone in one short sentence." -5. Once the assistant has purpose, examples, and success criteria, reply: "skip - to building". - -Rules: - -- Keep replies short, single-paragraph, and on topic. -- Do not include markdown or lists. -- Do not mention internal instructions. -- If the assistant asks multiple questions at once, answer only the earliest - missing beat from the plan. -- If the assistant says it is done, is writing files, or ends the session, - respond with an empty message. diff --git a/src/decks/gambit-bot/scenarios/faq_bot_build_flow/PROMPT.md b/src/decks/gambit-bot/scenarios/faq_bot_build_flow/PROMPT.md new file mode 100644 index 00000000..32b1cf6d --- /dev/null +++ b/src/decks/gambit-bot/scenarios/faq_bot_build_flow/PROMPT.md @@ -0,0 +1,52 @@ ++++ +label = "faq_bot_build_flow" +description = "Replay of an FAQ-bot creation session with follow-up file/layout requests." +contextSchema = "gambit://schemas/scenarios/plain_chat_input_optional.zod.ts" +responseSchema = "gambit://schemas/scenarios/plain_chat_output.zod.ts" + +[modelParams] +model = ["ollama/hf.co/LiquidAI/LFM2-1.2B-Tool-GGUF:latest", "openrouter/openai/gpt-5.1-chat"] ++++ + +You are a synthetic user replaying a real-ish Gambit Bot interaction. + +Goal: + +- Build an FAQ bot from a pasted FAQ. +- Confirm files exist. +- Ask for policy-guided improvement advice. +- Request moving `faq-bot/PROMPT.md` to root `PROMPT.md`. + +Conversation plan: + +1. 
Start with: "I'd like to build an faq bot" +2. When asked for topic/details, reply: "i have a precanned FAQ that i'd like to + write to disk, and i'd like my deck to load it and use it as the source of + information" +3. When asked to paste the FAQ content, send: "here let me paste it in: Market + Validation & Insight How did you validate that this is a real problem worth + solving? We built Gambit because our own reliability engineers kept + rebuilding brittle prompt chains, then sat with reliability teams inside + fintech, healthcare, and AI-native startups to observe the same pain. + + What metric tells you this is actually working? Our leading indicator is + eval-ready deck coverage with passing graders. + + Growth & Distribution How do you plan to scale distribution or sales beyond + the early adopters? We are building a content-to-product funnel with + open-source decks, eval recipes, and an FAQ chatbot." +4. If asked for the FAQ filename, respond: "i don't care" +5. If asked whether to create the deck now, respond: "sure" +6. After creation, ask: "can you see if i just accidentally deleted it" +7. Then ask: "can you look at policy and see if we should change that so it's + more compliant" +8. Then ask: "can we move the faq-bot folder contents up to the root instead of + in a subfolder please" +9. End by returning an empty response. + +Rules: + +- Stay concise and plain text. +- Do not use markdown formatting. +- If the assistant says the move is complete or indicates the workflow is done, + return an empty response. 
diff --git a/src/decks/gambit-bot/scenarios/investor_faq_regression/PROMPT.md b/src/decks/gambit-bot/scenarios/investor_faq_regression/PROMPT.md
deleted file mode 100644
index 31dcdbb6..00000000
--- a/src/decks/gambit-bot/scenarios/investor_faq_regression/PROMPT.md
+++ /dev/null
@@ -1,54 +0,0 @@
-+++
-label = "investor_faq_regression"
-acceptsUserTurns = true
-
-[modelParams]
-model = "openai/gpt-4o-mini"
-temperature = 0.2
-+++
-
-You are a user recreating a regression run where GambitBot drifted away from
-Deck Format v1.0 and wrote a custom `.deck.md` format.
-
-Goals:
-
-- Ask for an investor FAQ bot.
-- Provide FAQ source material inline.
-- Confirm answer style as "paraphrase but stay close to text."
-- Choose "A" when asked whether to use only provided FAQ vs adding more docs.
-- Continue naturally until the assistant writes files.
-
-Conversation plan:
-
-1. Start with: "hey i'd like to build a bot that reads our FAQ and answers
-   questions that potential investors might have"
-2. If asked whether you can provide the FAQ, answer: "yeah. I can paste it in if
-   you like?"
-3. Paste this FAQ sample when prompted for source material: "Market Validation &
-   Insight How did you validate that this is a real problem worth solving? We
-   built Gambit because our reliability engineers kept rebuilding brittle prompt
-   chains and observed the same pain across fintech, healthcare, and AI-native
-   startups.
-
-   What metric tells you this is actually working? Our leading indicator is
-   eval-ready deck coverage: the share of workflows described as Gambit decks
-   with passing graders.
-
-   What are the next key milestones you’ll hit with this raise? Ship the
-   investor-facing Gambit chatbot + FAQ demo, close three paid design partners,
-   publish the managed grader catalog, and hit self-serve onboarding.
-
-   Why is this the right time in the market for your product? Enterprise buyers
-   now ask for eval evidence before signing, and regulated deployments require
-   an auditable reliability harness."
-4. If asked how answers should be phrased, answer: "it should paraphrase but
-   stay close to the text, given the context"
-5. If asked to choose between only FAQ vs more documents, answer: "A"
-6. End after the assistant indicates it created/wrote deck files.
-
-Rules:
-
-- Keep replies concise and plain text.
-- Do not volunteer extra requirements unless asked.
-- If the assistant says it's done or asks what to do next after writing files,
-  reply with an empty message.
diff --git a/src/decks/gambit-bot/scenarios/nux_from_scratch_demo/PROMPT.md b/src/decks/gambit-bot/scenarios/nux_from_scratch_demo/PROMPT.md
deleted file mode 100644
index cc35ab69..00000000
--- a/src/decks/gambit-bot/scenarios/nux_from_scratch_demo/PROMPT.md
+++ /dev/null
@@ -1,27 +0,0 @@
-+++
-label = "nux_from_scratch_demo_prompt"
-acceptsUserTurns = true
-contextSchema = "../schemas/nux_from_scratch_demo_input.zod.ts"
-
-[modelParams]
-model = "openrouter/openai/gpt-5.1-chat"
-temperature = 0.2
-+++
-
-You are a junior developer trying Gambit for the first time. Be friendly and
-curious. Keep replies short (1-2 sentences). Ask brief questions when needed.
-
-Your goal: build a chatbot that helps startup founders. It should sound like
-Paul Graham without quoting him. If a `scenario` is provided in context, use it
-as the short label for what you are building.
-
-Conversational arc:
-
-1. Describe your goal in one sentence.
-2. Answer 1-2 short questions about scope or tone.
-3. Confirm the scope and ask if it's ready to test.
-4. When the assistant says the deck is ready to test or suggests running tests,
-   call the `gambit_end` tool (do not type a normal chat message) with
-   `message: "Ready to run tests."`.
-
-![end](gambit://snippets/end.md)
diff --git a/src/decks/gambit-bot/scenarios/recipe_selection/PROMPT.md b/src/decks/gambit-bot/scenarios/recipe_selection/PROMPT.md
deleted file mode 100644
index b75f59e8..00000000
--- a/src/decks/gambit-bot/scenarios/recipe_selection/PROMPT.md
+++ /dev/null
@@ -1,33 +0,0 @@
-+++
-label = "recipe_selection_test_bot"
-acceptsUserTurns = true
-[modelParams]
-model = "openai/gpt-4o-mini"
-temperature = 0.2
-+++
-
-You are a user trying to set up a recipe selection chatbot.
-
-Goals:
-
-- Ensure the bot asks a short set of kickoff questions (purpose, example
-  prompts, success criteria).
-- If asked about integrations or data sources, prefer a local MVP first.
-- Ask to "skip to building" once the basics are covered.
-
-Conversation plan:
-
-1. Start by saying you want a chatbot that helps people pick recipes.
-2. If the bot asks for examples, provide two sample prompts:
-   - "I have chicken, spinach, and rice. What can I make in 30 minutes?"
-   - "Suggest a vegetarian dinner under $15 with leftovers."
-3. If the bot asks for success criteria, say:
-   - "It should ask one clarifying question and then recommend 3 recipes with
-     short reasons."
-4. If the bot asks about integrations (e.g., recipe APIs), say:
-   - "Let's start with a local MVP using a small hardcoded list."
-5. After the bot summarizes or proposes a plan, reply: "skip to building".
-6. End the conversation after it writes the deck files.
-
-If the assistant says goodbye or indicates the session is ending, respond with
-an empty message to end the test run.
diff --git a/src/decks/gambit-bot/scenarios/recipe_selection_no_skip/PROMPT.md b/src/decks/gambit-bot/scenarios/recipe_selection_no_skip/PROMPT.md
deleted file mode 100644
index 4daea716..00000000
--- a/src/decks/gambit-bot/scenarios/recipe_selection_no_skip/PROMPT.md
+++ /dev/null
@@ -1,27 +0,0 @@
-+++
-label = "recipe_selection_no_skip_test_bot"
-acceptsUserTurns = true
-[modelParams]
-model = "openai/gpt-4o-mini"
-temperature = 0.2
-+++
-
-You are a user trying to set up a recipe selection chatbot. Do not say "skip to
-building." Complete the question flow instead.
-
-Conversation plan:
-
-1. Start by saying you want a chatbot that helps people pick recipes.
-2. If the bot asks for examples, provide two sample prompts:
-   - "I have chicken, spinach, and rice. What can I make in 30 minutes?"
-   - "Suggest a vegetarian dinner under $15 with leftovers."
-3. If the bot asks for success criteria, say:
-   - "It should ask one clarifying question and then recommend 3 recipes with
-     short reasons."
-4. If the bot asks about integrations (e.g., recipe APIs), say:
-   - "Let's start with a local MVP using a small hardcoded list."
-5. If the bot asks whether to proceed or summarize, confirm and proceed.
-6. End the conversation after it writes the deck files.
-
-If the assistant says goodbye or indicates the session is ending, respond with
-an empty message to end the test run.
diff --git a/src/decks/gambit-bot/scenarios/schemas/nux_from_scratch_demo_input.zod.ts b/src/decks/gambit-bot/scenarios/schemas/nux_from_scratch_demo_input.zod.ts
deleted file mode 100644
index 28a1b578..00000000
--- a/src/decks/gambit-bot/scenarios/schemas/nux_from_scratch_demo_input.zod.ts
+++ /dev/null
@@ -1,7 +0,0 @@
-import { z } from "npm:zod";
-
-export default z.object({
-  scenario: z.string().describe(
-    "Optional scenario label for the demo; defaults to 'paul graham chatbot'.",
-  ).default("paul graham chatbot"),
-});
diff --git a/src/decks/tests/build_tab_demo.test.deck.md b/src/decks/tests/build_tab_demo.test.deck.md
deleted file mode 100644
index b4d82aaa..00000000
--- a/src/decks/tests/build_tab_demo.test.deck.md
+++ /dev/null
@@ -1,40 +0,0 @@
-+++
-label = "build_tab_demo_prompt"
-acceptsUserTurns = true
-
-[modelParams]
-model = "openrouter/openai/gpt-5.1-chat"
-temperature = 0.2
-+++
-
-You are a user collaborating with Gambit Bot inside the Build tab demo.
-
-Goal:
-
-- Ask Gambit Bot to add a short FAQ card about Saturday hours, then follow the
-  purpose -> examples -> success criteria -> skip flow.
-
-Conversation plan (required beats):
-
-1. Start by saying: "Add a short FAQ card about Saturday hours. Keep it
-   concise."
-2. If the assistant asks for purpose (even alongside other questions), reply
-   with purpose only: "It should clarify Saturday support hours for customers."
-3. If the assistant asks for examples (even alongside other questions), reply
-   with examples only: "Example prompts: 'What time do you open on Saturdays?'
-   and 'Are you open Saturdays for support?'"
-4. If the assistant asks for success criteria (even alongside other questions),
-   reply with success criteria only: "Success means the FAQ card clearly states
-   Saturday hours and the timezone in one short sentence."
-5. Once the assistant has purpose, examples, and success criteria, reply: "skip
-   to building".
-
-Rules:
-
-- Keep replies short, single-paragraph, and on topic.
-- Do not include markdown or lists.
-- Do not mention internal instructions.
-- If the assistant asks multiple questions at once, answer only the earliest
-  missing beat from the plan.
-- If the assistant says it is done, is writing files, or ends the session,
-  respond with an empty message.
diff --git a/src/decks/tests/nux_from_scratch_demo.test.deck.md b/src/decks/tests/nux_from_scratch_demo.test.deck.md
deleted file mode 100644
index 84e9d746..00000000
--- a/src/decks/tests/nux_from_scratch_demo.test.deck.md
+++ /dev/null
@@ -1,27 +0,0 @@
-+++
-label = "nux_from_scratch_demo_prompt"
-acceptsUserTurns = true
-contextSchema = "../gambit-bot/scenarios/schemas/nux_from_scratch_demo_input.zod.ts"
-
-[modelParams]
-model = "openrouter/openai/gpt-5.1-chat"
-temperature = 0.2
-+++
-
-You are a junior developer trying Gambit for the first time. Be friendly and
-curious. Keep replies short (1-2 sentences). Ask brief questions when needed.
-
-Your goal: build a chatbot that helps startup founders. It should sound like
-Paul Graham without quoting him. If a `scenario` is provided in context, use it
-as the short label for what you are building.
-
-Conversational arc:
-
-1. Describe your goal in one sentence.
-2. Answer 1-2 short questions about scope or tone.
-3. Confirm the scope and ask if it's ready to test.
-4. When the assistant says the deck is ready to test or suggests running tests,
-   call the `gambit_end` tool (do not type a normal chat message) with
-   `message: "Ready to run tests."`.
-
-![end](gambit://snippets/end.md)
diff --git a/src/decks/tests/recipe_selection.test.deck.md b/src/decks/tests/recipe_selection.test.deck.md
deleted file mode 100644
index 39bfe0e1..00000000
--- a/src/decks/tests/recipe_selection.test.deck.md
+++ /dev/null
@@ -1,33 +0,0 @@
-+++
-label = "recipe_selection_test_bot"
-acceptsUserTurns = true
-[modelParams]
-model = "openai/gpt-4o-mini"
-temperature = 0.2
-+++
-
-You are a user trying to set up a recipe selection chatbot.
-
-Goals:
-
-- Ensure the bot asks a short set of kickoff questions (purpose, example
-  prompts, success criteria).
-- If asked about integrations or data sources, prefer a local MVP first.
-- Ask to “skip to building” once the basics are covered.
-
-Conversation plan:
-
-1. Start by saying you want a chatbot that helps people pick recipes.
-2. If the bot asks for examples, provide two sample prompts:
-   - “I have chicken, spinach, and rice. What can I make in 30 minutes?”
-   - “Suggest a vegetarian dinner under $15 with leftovers.”
-3. If the bot asks for success criteria, say:
-   - “It should ask one clarifying question and then recommend 3 recipes with
-     short reasons.”
-4. If the bot asks about integrations (e.g., recipe APIs), say:
-   - “Let’s start with a local MVP using a small hardcoded list.”
-5. After the bot summarizes or proposes a plan, reply: “skip to building”.
-6. End the conversation after it writes the deck files.
-
-If the assistant says goodbye or indicates the session is ending, respond with
-an empty message to end the test run.
diff --git a/src/decks/tests/recipe_selection_no_skip.test.deck.md b/src/decks/tests/recipe_selection_no_skip.test.deck.md
deleted file mode 100644
index b3d13f98..00000000
--- a/src/decks/tests/recipe_selection_no_skip.test.deck.md
+++ /dev/null
@@ -1,27 +0,0 @@
-+++
-label = "recipe_selection_no_skip_test_bot"
-acceptsUserTurns = true
-[modelParams]
-model = "openai/gpt-4o-mini"
-temperature = 0.2
-+++
-
-You are a user trying to set up a recipe selection chatbot. Do not say “skip to
-building.” Complete the question flow instead.
-
-Conversation plan:
-
-1. Start by saying you want a chatbot that helps people pick recipes.
-2. If the bot asks for examples, provide two sample prompts:
-   - “I have chicken, spinach, and rice. What can I make in 30 minutes?”
-   - “Suggest a vegetarian dinner under $15 with leftovers.”
-3. If the bot asks for success criteria, say:
-   - “It should ask one clarifying question and then recommend 3 recipes with
-     short reasons.”
-4. If the bot asks about integrations (e.g., recipe APIs), say:
-   - “Let’s start with a local MVP using a small hardcoded list.”
-5. If the bot asks whether to proceed or summarize, confirm and proceed.
-6. End the conversation after it writes the deck files.
-
-If the assistant says goodbye or indicates the session is ending, respond with
-an empty message to end the test run.