diff --git a/docs/debugging/exit-code-1-investigation.md b/docs/debugging/exit-code-1-investigation.md new file mode 100644 index 00000000..969ec368 --- /dev/null +++ b/docs/debugging/exit-code-1-investigation.md @@ -0,0 +1,267 @@ +# Happy CLI Exit Code 1 Investigation + +**Status:** ROOT CAUSE IDENTIFIED +**First observed:** 2026-01-13 +**Last updated:** 2026-01-13 + +## Summary + +Happy CLI crashes with exit code 1 when resuming sessions in stream-json mode. The crash is caused by Claude Code's **"only prompt commands are supported in streaming mode"** error. + +**Root Cause:** GitHub issue [#16712](https://github.com/anthropics/claude-code/issues/16712) - When resuming with `--input-format stream-json`, Claude Code injects a synthetic "No response requested" message BEFORE reading stdin, breaking the message chain. + +--- + +## CONFIRMED: Root Cause + +### Error Message (found in logs) +``` +"only prompt commands are supported in streaming mode" +``` + +This error appears **27 times** in a single session log, confirming it's the primary issue. + +### Mechanism +1. Happy starts session, Claude responds, session ends with pending state +2. Happy resumes with `--resume --input-format stream-json` +3. **Claude Code injects:** `[assistant] "No response requested."` (synthetic) +4. Happy sends user message via stdin +5. **Claude Code rejects:** "only prompt commands are supported in streaming mode" +6. Claude Code exits with code 1 + +### Why This Happens +Claude Code's stream-json resume behavior expects ONLY prompt commands (new user prompts). When Happy sends a continuation message after the synthetic injection, Claude Code rejects it. + +This is a **known Claude Code limitation**, not a Happy bug. + +--- + +## Crash Pattern + +### Timing +``` +[11:12:12.835] [claudeRemote] Message result +[11:12:12.836] [claudeRemote] Thinking state changed to: false +[11:12:12.837] [claudeRemote] Result received, exiting claudeRemote +[11:12:12.838] [PUSH] sendToAllDevices called +[11:12:12.839] [MessageQueue2] Waiting for messages... +[11:12:13.232] [Claude SDK] Process exit: code=1, stderr=43 bytes, stdout=5000 bytes +``` + +**Key observation:** ~395ms between "Waiting for messages" and crash. + +### stderr Content +Only contains version banner (43 bytes): +``` +[90mUsing Claude Code v2.1.3 from npm[0m +``` + +No actual error message - Claude Code exits without explaining why. + +--- + +## What We Ruled Out + +### 1. Claude Code Bug +Created `scripts/reproduce-crash.mjs` with identical flags: +- `--input-format stream-json` +- `--output-format stream-json` +- `--permission-prompt-tool stdio` +- `--mcp-config` (with Happy MCP server) +- `--resume` (with valid session ID) +- `DEBUG=true` environment + +**Result:** Process stayed alive 30+ seconds. No crash. + +### 2. stdin Closure +Checked `streamToStdin` logs - no `stdin.end()` called before crash. +stdin pipe was still open and waiting. + +### 3. MCP Transport Timeout +GitHub issue #11701 describes MCP transport timeout closing stdin. +Our crash pattern differs - stdin stays open. + +### 4. Synthetic Message Injection +GitHub issue #16712 describes resume injecting "No response requested". +**UPDATE:** This IS the cause. The error "only prompt commands are supported in streaming mode" appears 27 times in logs. + +--- + +## SUPERSEDED: Previous Hypothesis (Race Condition) + +**Note:** The race condition hypothesis is now superseded by the confirmed root cause above. Keeping for historical reference. + +### Concurrent Operations During Idle +When Happy waits for next user message, multiple systems are active: +1. **Push notifications** - `sendToAllDevices` sends to mobile +2. **Socket updates** - Broadcasts ready state to clients +3. **Mode hash checks** - Monitors for mode changes +4. **Keep-alive messages** - Periodic socket heartbeats + +### Possible Race Scenarios +1. **Mode hash change detection** triggers process restart while stdin write in progress +2. **Socket update** causes state change that affects Claude Code process +3. **Push notification callback** interferes with message queue +4. **Multiple concurrent awaits** on shared state + +### Why Options Correlate +Options (AskUserQuestion or `` XML) create longer idle time while user thinks. +More idle time = more opportunity for race condition to trigger. + +--- + +## Key Files to Investigate + +### claudeRemote.ts +```typescript +// Lines 190-213: After result, waits for next message +if (message.type === 'result') { + updateThinking(false); + logger.debug('[claudeRemote] Result received, exiting claudeRemote'); + opts.onReady(); // <-- Triggers push notifications, socket updates + const next = await opts.nextMessage(); // <-- Blocks here during crash + // ... +} +``` + +### claudeRemoteLauncher.ts +```typescript +// Lines 349-353: Mode hash check +if ((modeHash && msg.hash !== modeHash) || msg.isolate) { + logger.debug('[remote]: mode has changed, pending message'); + pending = msg; + return null; // <-- Could this race with other operations? +} +``` + +### query.ts +```typescript +// Lines 371-376: stderr capture (DEBUG enabled) +child.stderr.on('data', (data) => { + const text = data.toString() + stderrBuffer += text + if (process.env.DEBUG) { + console.error('Claude Code stderr:', text) + } +}) +``` + +--- + +## Debugging Steps + +### 1. Add Granular Logging (TODO) +In `claudeRemote.ts`, add logging: +```typescript +// Before nextMessage() +logger.debug('[claudeRemote] About to wait for next message'); + +// In nextMessage callback +logger.debug('[claudeRemote] nextMessage resolved/rejected'); +``` + +### 2. Instrument Concurrent Operations (TODO) +Log timestamps for: +- Push notification start/complete +- Socket update start/complete +- Mode hash check timing +- Any async operation during idle + +### 3. Monitor with DEBUG (DONE) +`spawnHappyCLI.ts` already modified to set `DEBUG=true`. +Next crash will capture full stderr. + +### 4. Check for Shared State Mutations (TODO) +Review all operations that modify: +- `mode` variable in claudeRemote +- `modeHash` in claudeRemoteLauncher +- `messages` PushableAsyncIterable state +- Any global/shared state + +--- + +## Reproduction Script + +Location: `scripts/reproduce-crash.mjs` + +```bash +# Basic test (no resume) +node scripts/reproduce-crash.mjs + +# With resume (needs valid session) +cd /Users/craigvanheerden/Repos +node infrastructure/happy-cli/scripts/reproduce-crash.mjs +``` + +Configuration flags at top of file: +- `USE_MCP` - Toggle MCP config +- `MCP_URL` - MCP server URL +- `USE_RESUME` - Toggle resume flag +- `SESSION_ID` - From command line arg + +--- + +## Related GitHub Issues + +| Issue | Title | Relevance | +|-------|-------|-----------| +| [#11701](https://github.com/anthropics/claude-code/issues/11701) | VS Code extension exit code 1 | MCP transport timeout pattern | +| [#16712](https://github.com/anthropics/claude-code/issues/16712) | tool_result via stdin on resume | Synthetic message injection | +| [#3187](https://github.com/anthropics/claude-code/issues/3187) | stream-json input hang | Second message hang pattern | +| [#5034](https://github.com/anthropics/claude-code/issues/5034) | Duplicate session entries | stream-json quirks | + +--- + +## Log Locations + +- Session logs: `~/.happy/logs/2026-01-13-HH-MM-SS-pid-XXXXX.log` +- Daemon logs: `~/.happy/logs/2026-01-13-HH-MM-SS-pid-XXXXX-daemon.log` +- Claude Code sessions: `~/.claude/projects/-Users-craigvanheerden-Repos/*.jsonl` + +### Useful grep commands +```bash +# Find crashes +grep "Process exit: code=1" ~/.happy/logs/*.log + +# Find DEBUG stderr output +grep "Claude Code stderr" ~/.happy/logs/*-daemon.log + +# Find timing around crash +grep -B10 "Process exit: code=1" ~/.happy/logs/*.log +``` + +--- + +## Next Actions + +1. [x] ~~Wait for next crash with DEBUG enabled~~ - Done, found root cause +2. [x] ~~Analyze stderr output for actual error~~ - Found "only prompt commands are supported in streaming mode" +3. [ ] **Monitor GitHub issue #16712** for Claude Code fix +4. [ ] **Potential workaround:** Don't use `--resume` with `--input-format stream-json` together +5. [ ] **Alternative:** Start fresh session instead of resuming when in stream-json mode + +--- + +## Potential Workarounds + +### Option 1: Avoid --resume with stream-json +Instead of resuming, start a fresh session each time. Loses conversation context but avoids the bug. + +### Option 2: Use --continue instead of --resume +May behave differently - needs testing. + +### Option 3: Wait for Claude Code fix +GitHub issue #16712 is open. Anthropic may fix this in a future release. + +### Option 4: Write tool_result to session file directly +The workaround mentioned in issue #16712 - fragile but works: +1. Write tool_result directly to `~/.claude/projects/.../.jsonl` +2. Resume with a text prompt + +--- + +## References + +- [GitHub Issue #16712](https://github.com/anthropics/claude-code/issues/16712) - Feature request to fix this behavior +- [Claude Code Headless Docs](https://code.claude.com/docs/en/headless) - Official documentation +- [Claude Agent SDK Spec](https://gist.github.com/SamSaffron/603648958a8c18ceae34939a8951d417) - Message format specification diff --git a/scripts/reproduce-crash.mjs b/scripts/reproduce-crash.mjs new file mode 100755 index 00000000..11fd6af6 --- /dev/null +++ b/scripts/reproduce-crash.mjs @@ -0,0 +1,133 @@ +#!/usr/bin/env node +/** + * Minimal reproduction script for Claude Code exit code 1 crash + * + * This script reproduces a crash that happens when: + * 1. Claude Code is spawned with --input-format stream-json and --permission-prompt-tool stdio + * 2. A conversation completes successfully (result message received) + * 3. ~400ms later, Claude Code exits with code 1 (no error message) + * + * To run: + * node scripts/reproduce-crash.mjs + * + * Expected: Process should wait indefinitely for next stdin message + * Actual: Process exits with code 1 after ~400ms + */ + +import { spawn } from 'child_process'; +import { createInterface } from 'readline'; + +// Configuration +const CLAUDE_CODE_PATH = 'claude'; // or full path to claude binary +const TEST_PROMPT = 'Say "hello" and nothing else.'; + +console.log('=== Claude Code Exit Code 1 Reproduction Script ===\n'); + +// Test configuration - toggle these to isolate the issue +const USE_MCP = true; // Happy always uses MCP +const MCP_URL = 'http://127.0.0.1:65222/'; // Use actual Happy MCP port or fake +const USE_RESUME = true; // Happy uses --resume for session continuity +const SESSION_ID = process.argv[2] || null; // Pass session ID as argument + +// Spawn Claude Code with the same flags Happy uses +const args = [ + '--output-format', 'stream-json', + '--input-format', 'stream-json', + '--verbose', + '--permission-prompt-tool', 'stdio', +]; + +// Add MCP config (Happy always passes this) +if (USE_MCP) { + args.push('--mcp-config', JSON.stringify({ + mcpServers: { + happy: { type: 'http', url: MCP_URL } + } + })); +} + +// Add resume flag (Happy uses this after first message in a session) +if (USE_RESUME && SESSION_ID) { + args.push('--resume', SESSION_ID); + console.log(`Using --resume with session: ${SESSION_ID}\n`); +} + +console.log(`Spawning: ${CLAUDE_CODE_PATH} ${args.join(' ')}\n`); + +const child = spawn(CLAUDE_CODE_PATH, args, { + stdio: ['pipe', 'pipe', 'pipe'], + env: { ...process.env, DEBUG: 'true' } +}); + +// Track timing +let resultReceivedAt = null; +let processExitedAt = null; + +// Handle stderr +child.stderr.on('data', (data) => { + console.log(`[stderr] ${data.toString().trim()}`); +}); + +// Handle stdout (stream-json output) +const rl = createInterface({ input: child.stdout }); +rl.on('line', (line) => { + try { + const msg = JSON.parse(line); + console.log(`[stdout] type=${msg.type}${msg.subtype ? ` subtype=${msg.subtype}` : ''}`); + + if (msg.type === 'result') { + resultReceivedAt = Date.now(); + console.log(`\n>>> Result received at ${new Date(resultReceivedAt).toISOString()}`); + console.log('>>> Now waiting for stdin... process should stay alive indefinitely'); + console.log('>>> If process exits with code 1 in ~400ms, bug is reproduced\n'); + } + } catch (e) { + console.log(`[stdout] (non-JSON) ${line}`); + } +}); + +// Handle process exit +child.on('close', (code, signal) => { + processExitedAt = Date.now(); + const timeSinceResult = resultReceivedAt ? processExitedAt - resultReceivedAt : 'N/A'; + + console.log(`\n=== Process Exited ===`); + console.log(`Exit code: ${code}`); + console.log(`Signal: ${signal}`); + console.log(`Time since result: ${timeSinceResult}ms`); + + if (code === 1 && resultReceivedAt && timeSinceResult < 1000) { + console.log(`\n>>> BUG REPRODUCED! Process exited with code 1 after ${timeSinceResult}ms`); + console.log('>>> This is the issue being reported.'); + } else if (code === 0) { + console.log('\n>>> Process exited normally (code 0)'); + } else if (signal) { + console.log(`\n>>> Process was killed by signal ${signal}`); + } +}); + +// Handle spawn error +child.on('error', (err) => { + console.error(`[error] Failed to spawn: ${err.message}`); + process.exit(1); +}); + +// Send initial message after a short delay +setTimeout(() => { + const userMessage = { + type: 'user', + message: { + role: 'user', + content: TEST_PROMPT + } + }; + + console.log(`[stdin] Sending user message: "${TEST_PROMPT}"`); + child.stdin.write(JSON.stringify(userMessage) + '\n'); + + // DO NOT close stdin - we want to test idle behavior + // child.stdin.end(); +}, 1000); + +// Keep the script running +console.log('Script will wait indefinitely. Press Ctrl+C to exit.\n'); diff --git a/src/ui/logger.test.ts b/src/ui/logger.test.ts new file mode 100644 index 00000000..41bdf6b5 --- /dev/null +++ b/src/ui/logger.test.ts @@ -0,0 +1,69 @@ +import { describe, it, expect } from 'vitest' + +/** + * Serialize a value for logging, with special handling for Error objects. + * (Copy of the function from logger.ts for testing) + */ +function serializeForLog(value: unknown): string { + if (typeof value === 'string') { + return value + } + if (value instanceof Error) { + return JSON.stringify({ + name: value.name, + message: value.message, + stack: value.stack, + ...value + }) + } + return JSON.stringify(value) +} + +describe('serializeForLog', () => { + it('returns strings as-is', () => { + expect(serializeForLog('hello')).toBe('hello') + }) + + it('serializes plain objects', () => { + expect(serializeForLog({ foo: 'bar' })).toBe('{"foo":"bar"}') + }) + + it('serializes Error objects with name, message, and stack', () => { + const error = new Error('test error') + const serialized = serializeForLog(error) + const parsed = JSON.parse(serialized) + + expect(parsed.name).toBe('Error') + expect(parsed.message).toBe('test error') + expect(parsed.stack).toBeDefined() + expect(parsed.stack).toContain('test error') + }) + + it('serializes TypeError with correct name', () => { + const error = new TypeError('invalid type') + const serialized = serializeForLog(error) + const parsed = JSON.parse(serialized) + + expect(parsed.name).toBe('TypeError') + expect(parsed.message).toBe('invalid type') + }) + + it('includes additional enumerable properties on errors', () => { + const error = new Error('with extra') as Error & { code: string } + error.code = 'ENOENT' + const serialized = serializeForLog(error) + const parsed = JSON.parse(serialized) + + expect(parsed.code).toBe('ENOENT') + }) + + it('handles null and undefined', () => { + expect(serializeForLog(null)).toBe('null') + expect(serializeForLog(undefined)).toBe(undefined) + }) + + it('handles numbers and booleans', () => { + expect(serializeForLog(42)).toBe('42') + expect(serializeForLog(true)).toBe('true') + }) +}) diff --git a/src/ui/logger.ts b/src/ui/logger.ts index ecf739b6..8f6bcf55 100644 --- a/src/ui/logger.ts +++ b/src/ui/logger.ts @@ -13,6 +13,27 @@ import { join, basename } from 'node:path' // Note: readDaemonState is imported lazily inside listDaemonLogFiles() to avoid // circular dependency: logger.ts ↔ persistence.ts +/** + * Serialize a value for logging, with special handling for Error objects. + * Error objects don't serialize well with JSON.stringify() because their + * properties (message, stack, name) are not enumerable. + */ +function serializeForLog(value: unknown): string { + if (typeof value === 'string') { + return value + } + if (value instanceof Error) { + return JSON.stringify({ + name: value.name, + message: value.message, + stack: value.stack, + // Include any additional enumerable properties (e.g., cause, code) + ...value + }) + } + return JSON.stringify(value) +} + /** * Consistent date/time formatting functions */ @@ -188,9 +209,7 @@ class Logger { body: JSON.stringify({ timestamp: new Date().toISOString(), level, - message: `${message} ${args.map(a => - typeof a === 'object' ? JSON.stringify(a, null, 2) : String(a) - ).join(' ')}`, + message: `${message} ${args.map(serializeForLog).join(' ')}`, source: 'cli', platform: process.platform }) @@ -201,9 +220,7 @@ class Logger { } private logToFile(prefix: string, message: string, ...args: unknown[]): void { - const logLine = `${prefix} ${message} ${args.map(arg => - typeof arg === 'string' ? arg : JSON.stringify(arg) - ).join(' ')}\n` + const logLine = `${prefix} ${message} ${args.map(serializeForLog).join(' ')}\n` // Send to remote server if configured if (this.dangerouslyUnencryptedServerLoggingUrl) {