-
Notifications
You must be signed in to change notification settings - Fork 0
fix(activity): stabilize codex and opencode states #6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Ensure codex in-flight ends within 2.5s of completion, pick active JSONL per process, and make opencode message completion authoritative. Remove wheel passive listener warnings and align docs/tests.
Start the server on the next available port during dev and ignore stale OpenCode status events when determining in-flight state.
1 similar comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ee9f28dc13
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
src/opencodeEvents.ts
Outdated
| if (status.toLowerCase() === "idle") { | ||
| state.inFlight = false; | ||
| state.lastInFlightSignalAt = undefined; | ||
| } else { | ||
| state.inFlight = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Treat paused/stopped status as idle
If OpenCode emits session.status values like paused or stopped, this branch sets inFlight to true for any status other than idle, so those sessions will be marked active (and kept active until the in-flight timeout) even though the rest of the code treats stopped/paused as idle (see deriveOpenCodeState’s idle regex). This causes false “active” indicators whenever the server uses non-idle-but-not-busy statuses. Consider treating paused/stopped as idle here too or mapping statuses consistently with deriveOpenCodeState.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chatgpt-codex-connector Thanks — paused/stopped are already treated as idle via IDLE_STATUS_RE = /^(idle|paused|stopped)$/i in src/opencodeEvents.ts, and status is normalized in summarizeEvent. No additional change needed here. If you’re seeing a different status string, share it and I’ll add it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary
- No code changes were required based on the trigger context.
Testing
⚠️ Not run (no code changes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
integrate-your-mind
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review: fix(activity): stabilize codex and opencode states
63 files changed | +6798 / -1422 | 27 commits
Critical (3)
C1. Codex file watcher never cleaned up on shutdown
src/server.ts:881-908
startCodexWatcher() creates a chokidar FSWatcher but there's no acquireRelease for it in the Effect runtime scope, and shutdown() never calls stopCodexWatcher(). The two existing acquireRelease pairs cover startReloadWatcher and startOpenCodeListener, but the codex watcher is missing. This is a resource leak on every run — the watcher continues polling the filesystem after SIGINT/SIGTERM, potentially preventing clean process exit.
C2. CONSENSUS_POLL_MS default disagrees between server and scanner
src/server.ts:134 vs src/scan.ts:787
server.ts:Math.max(50, Number(process.env.CONSENSUS_POLL_MS || 250))→ default 250, min 50scan.ts:Math.max(250, Number(process.env.CONSENSUS_POLL_MS || 500))→ default 500, min 250docs/configuration.md: documents default as 250
Two different defaults and minimums for the same env var. Users setting this variable get different enforcement depending on the code path.
C3. CONSENSUS_OPENCODE_TIMEOUT_MS code default contradicts docs
src/scan.ts:987 vs docs/configuration.md:79
- Code default: 1000ms
- Documented default: 5000ms
Users relying on the documented default get 1/5th the actual timeout, causing false "unreachable" results for slow opencode servers.
High (10)
H1. getThreadSnapshot is a read method with destructive side effects
src/codex/lifecycleGraph.ts:268-299
getThreadSnapshot calls this.dropThread(threadId) (permanently deletes the thread and all path mappings) and mutates state.lastEndAt. Any caller that calls it multiple times or calls it to "peek" at state gets different results on each call.
H2. summarizeTail treats pendingEndAt as in-flight, defeating deferred-end
src/codexLogs.ts:1500-1504
const hasPendingEnd = typeof state.pendingEndAt === "number";
const inFlight = state.inFlight || state.reviewMode || hasOpenCalls || hasPendingEnd || state.turnOpen;The entire purpose of pendingEndAt is to defer the idle transition to prevent flicker. But summarizeTail unconditionally treats it as inFlight = true, keeping the session active for the full timeout duration (2500ms) even after the agent has clearly stopped. This is the opposite of the intended behavior.
H3. noteRecent overflow counter drift prevents idle transitions in long sessions
src/codex/lifecycleGraph.ts:57-74
When >500 tool IDs are tracked, evicted IDs are counted as openAnonTools. But if a tool completed before eviction, the counter is permanently inflated. Over a long session, openCallCount drifts upward, preventing idle transitions since openCallCount > 0 forces inFlight = true.
H4. gwe() always resolves --show-toplevel from $main instead of $cwd
dev/worktrees.sh:199
cur="$("$env_bin" -u GIT_DIR ... "$git_bin" -C "$main" rev-parse --show-toplevel 2>/dev/null)"Should use "$cwd" instead of "$main". As written, $cur always equals $main, so the safety check if [ "$cur" = "$main" ] always triggers and the function always refuses to remove any worktree.
H5. shutdown() defined as const after void main() — temporal dead zone risk
dev/dev.mjs:99,213
shutdown is a const arrow function defined at line 213, but void main() is called at line 211. The child.on("exit") callback at line 96 could fire before module evaluation completes if the child dies synchronously, causing a ReferenceError. Define shutdown before startServer.
H6. /api/snapshot?mode=full races on concurrent requests
src/server.ts:293-340
Two concurrent ?mode=full requests both mutate lastBaseSnapshot/lastSnapshot without locking or deduplication, causing inconsistent broadcast state.
H7. CONSENSUS_OPENCODE_EVENT_ACTIVE_MS default mismatch
src/scan.ts:1721 vs docs/configuration.md:90
Code defaults to 1000ms, docs say 0ms. Agents appear active 1s longer than documented.
H8. Unbounded module-level caches in scan.ts
src/scan.ts:71,91,144
repoCache, opencodePrefetchFailures, pidIdentityCache grow without bound. No eviction strategy. pidIdentityCache is particularly concerning since PIDs are reused by the OS, accumulating stale entries.
H9. checkExistingSetup uses JSON.parse with unsafe type assertion
src/cli/setup.ts:16-23
return JSON.parse(content) as ConsensusConfig; // No validation!No runtime validation despite ConsensusConfigSchema existing for exactly this purpose. Corrupt config files will crash silently downstream.
H10. Test isolation: tailStates Map never cleared between tests
tests/integration/codexLogs.test.ts
The updateTail() function relies on module-level tailStates and sessionMetaCache Maps that persist across tests. The sessionMetaCache deduplication logic (now - cached.checkedAt < SESSION_META_RESYNC_MS) means two tests running within 10s using the same session path will get the first test's cached metadata.
Medium (17)
M1. codexLifecycleGraph singleton (343 lines) is entirely unused
src/codex/lifecycleGraph.ts:343
The codexLifecycleGraph singleton is exported but never imported or used anywhere in the codebase. This entire module is dead code shipping in production with runtime path resolution, env var reads, and mutable state.
M2. hasIncompletePart computed but never used
src/opencodeApi.ts:363,377
Variable is set but never referenced in the in-flight determination — dead code from the refactor.
M3. SSE busy status can be overridden by stale message API response
src/scan.ts:~3296-3315
The guard !statusAuthorityIsIdle only prevents message API from overriding an idle SSE status. It does not protect a busy SSE status from being overridden by a stale message API response, causing brief flicker to idle.
M4. expireInFlight clears lastActivityAt, losing historical data
src/opencodeEvents.ts:91-105
Setting lastActivityAt = undefined on timeout triggers the "no_signal" fallback path downstream, masking real state history from the UI.
M5. Duplicate Set allocations inside processLine() on every JSONL line
src/codexLogs.ts:1102-1119
itemStartWorkTypes and itemEndStatuses Sets are allocated inside processLine() (called per line). Module-level constants ITEM_START_WORK_TYPES and ITEM_END_STATUSES exist at the top of the file but aren't used.
M6. labelFor() renders "codex#undefined" when pid is absent
public/src/lib/format.ts:69
AgentSnapshot.pid is now optional, but the fallback codex#${agent.pid} doesn't handle the undefined case.
M7. Mock snapshot server doesn't work with flicker-detect script
scripts/mock-snapshot-server.js:78
Strict url === "/api/snapshot" comparison fails when flicker-detect calls with ?cached=1. The URL should be parsed to extract the pathname.
M8. Unbounded output string accumulation in dev server
dev/dev.mjs:86-94
The onData handler continues concatenating stdout/stderr to output forever after the promise resolves. For a long-running dev server, this grows without bound. Remove the listener after the promise settles.
M9. tmux-start.sh dropped the -A flag — TOCTOU race
dev/tmux-start.sh:14
The old new -A -s scout was atomic. The new has-session + new -s scout has a TOCTOU race where a session could be created between the check and the create.
M10. Variable shadowing in 1770-line scanCodexProcesses
src/scan.ts:1315,1381,1751,1831
jsonlPaths and cwdRaw are redeclared in nested scopes within the same massive function, making reasoning about state error-prone.
M11. setup.ts TOCTOU race on concurrent runs
src/cli/setup.ts:58-156
Read-modify-write of config file with no file locking. Concurrent npx consensus-cli setup runs overwrite each other.
M12. Test flakiness: real-time setTimeout with 30ms margins
tests/integration/codexLogs.test.ts (19+ locations)
Many tests use CONSENSUS_CODEX_INFLIGHT_TIMEOUT_MS = "50" then setTimeout(resolve, 80) — only 30ms margin. Under CI load, these will be flaky.
M13. Env var leak on test failure
tests/integration/codexLogs.test.ts:674-729
CONSENSUS_CODEX_INFLIGHT_TIMEOUT_MS is set but not cleaned in try/finally. If any assertion fails, the env var leaks to all subsequent tests.
M14. waitForExit always returns true
tests/integration/liveCodexHarness.ts:187-194
Promise.race can't distinguish process exit from delay timeout (both resolve), so the SIGKILL fallback never fires and hung processes are leaked.
M15. waitForPaneOutput auth guard is vacuous
tests/integration/codexLiveToolTimeout.test.ts:419-430
Predicate output.trim().length > 0 matches the shell prompt immediately. The auth-check regex runs against the prompt, which contains neither auth error strings, so it passes vacuously.
M16. E2E live demo: 30s sleep with zero assertions
e2e/ui/codexTuiLiveDemo.pw.ts:156
Adds 4 minutes to CI with no regression detection capability.
M17. Module-level process.env mutations with process.on("exit") cleanup
tests/integration/opencodeActivity.test.ts:7-8
Fragile cross-file ordering when both opencodeActivity.test.ts and opencodeSessionActivity.test.ts run in the same process.
Low (12)
| # | Issue | Location |
|---|---|---|
| L1 | isReviewEnter sets inFlight = true unconditionally (unlike all other handlers that check canSignal) |
src/codexLogs.ts:1185-1194 |
| L2 | stripQuotes regex removes mismatched quotes; also duplicated in scan.ts |
src/codexCmd.ts:11, src/scan.ts:254 |
| L3 | resolveCodexHome duplicated between codexNotify.ts and codexLogs.ts |
src/codexNotify.ts:7-17 |
| L4 | setupCodexHook won't update existing notifications line lacking # consensus-cli comment |
src/cli/setup.ts:134-138 |
| L5 | escapeHtml export now unused after removal from AgentPanel.tsx |
public/src/lib/format.ts |
| L6 | Command injection possible in test helper via JSON.stringify quoting (not shell-safe) |
e2e/ui/codexTuiLiveDemo.pw.ts:60-66 |
| L7 | flicker-detect.js writes to relative tmp/ without anchoring to __dirname |
scripts/flicker-detect.js:14 |
| L8 | flicker-detect.js arg parser silently swallows flags whose values start with - |
scripts/flicker-detect.js:29-30 |
| L9 | Replaced codexEventStore.test.ts drops coverage for multi-thread independence |
tests/unit/codexEventStore.test.ts |
| L10 | No test for function_call_output with unknown call_id |
tests/integration/codexLogs.test.ts |
| L11 | Fire-and-forget void (async () => {...})() silently swallows CDP errors |
tests/integration/codexLiveDebugger.test.ts:167 |
| L12 | wsProtocolSeq can theoretically overflow MAX_SAFE_INTEGER |
src/server.ts:56 |
Top Recommendations
- Fix the config default mismatches (C2, C3, H7) — most user-impacting. Code and docs disagree on 3 env vars.
- Add
acquireReleaseforstopCodexWatcher(C1) — one-line fix for resource leak on every shutdown. - Revisit
pendingEndAtin-flight logic (H2) — the deferred-end mechanism is currently self-defeating. - Guard
noteRecentcounter drift (H3) — long-running sessions permanently appear active. - Remove or wire up the 343-line unused
codexLifecycleGraphmodule (M1). - Extract
scanCodexProcesses(M10) — a 1770-line function is the root cause of several bugs found here.
🤖 Review performed by Claude Code agent swarm (4 parallel reviewers: core logic, server/config, frontend/infra, test quality)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Tail Codex JSONL incrementally when log growth exceeds MAX_READ_BYTES and surface catch-up via TailState.needsCatchUp. - Apply shouldParseJsonLine() before JSON.parse to skip noisy delta lines while still recording ingest time. - Add integration coverage for needsCatchUp behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
Review comment fixes pushed (4 commits):\n\n- fix(codex): ignore token_count for pending-end markers (65b2b43)\n- fix(setup): merge required codex notifications (c98983c)\n- fix(ui): avoid spatial cell-key wrapping (2d1c9bd)\n- test(ui): filter live TUI demo lane by match query (c5d17d9)\n\nBuild gate:\n- npm run build: ✓ built in 360ms\n\nTest gate:\n- npm run test:unit: # pass 171 # fail 0\n- npm run test:integration: # pass 71 # fail 0\n\nUI gate:\n- npm run test:ui: 19 passed (6.2s), 2 skipped (env-gated demos)\n\nEvidence gate (flicker):\n- tmp/flicker-summary.json (intervalMs=250 durationMs=120000 windowMs=10000 totalFlickerCount=0)\n- tmp/flicker-summary.transitions.jsonl\n\nDemo video gate (live TUI):\n- Playwright raw: test-results/ui-codexTuiLiveDemo.pw.ts--c9c35--active---idle-30s-capture-/video.webm\n- Trimmed 30s mp4: tmp/codexTuiLiveDemo-30s.mp4 |
|
CI fix: npm test now works in bash (GitHub Actions) without relying on ** globstar.\n\n- Commit: cf91d42 fix(test): make npm test portable across shells\n- Added scripts/run-node-tests.js and updated package.json test scripts.\n\nLocal verification (summary lines):\n- npm run build: ✓ built in 331ms\n- npm test: unit # pass 171 # fail 0; integration # pass 71 # fail 0\n\nGitHub Actions:\n- build check now passing (run 21740671370). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/codexLogs.ts
Outdated
| state.inFlight = false; | ||
| state.inFlightStart = false; | ||
| state.turnOpen = false; | ||
| state.reviewMode = false; | ||
| state.pendingEndAt = undefined; | ||
| state.lastEndAt = undefined; | ||
| state.lastToolSignalAt = undefined; | ||
| state.lastInFlightSignalAt = undefined; | ||
| if (state.openCallIds) state.openCallIds.clear(); | ||
| state.openItemCount = 0; | ||
| } | ||
| state.lastEventAt = undefined; | ||
| state.lastActivityAt = undefined; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔴 Stale Codex session file does not clear pendingEnd/open-call markers unless state.inFlight is true
When a Codex session JSONL file becomes stale, updateTailLegacy intends to clear all state so the session can’t remain active. However, the isStaleFile branch only clears pendingEndAt, turnOpen, reviewMode, openCallIds, etc. inside if (state.inFlight).
Actual behavior: if state.inFlight is already false but pendingEndAt is still set (or other in-flight-contributing fields are set), the stale-file branch won’t clear those fields; later summarizeTail treats pendingEndAt/open calls/turnOpen as in-flight, so a stale file can keep a session ghost-“inFlight”.
Expected behavior: stale-file handling should clear end markers / turnOpen / review/open-call tracking regardless of state.inFlight, because those fields also contribute to summary.inFlight.
Root Cause
In src/codexLogs.ts, the stale-file guard is:
if (isStaleFile) { if (state.inFlight) { ... state.pendingEndAt = undefined; ... if (state.openCallIds) state.openCallIds.clear(); state.openItemCount = 0; } state.lastEventAt = undefined; ... }(src/codexLogs.ts:913-929).
But summarizeTail defines in-flight as:
state.inFlight || state.reviewMode || hasOpenCalls || hasPendingEnd || state.turnOpen(src/codexLogs.ts:1505-1510).
So pendingEndAt/open-call/turnOpen can keep summary.inFlight true even when state.inFlight is false, and the stale-file path won’t clear them.
Impact: stale sessions can linger as active/in-flight (or at least “busy”) in snapshots, producing incorrect UI state and preventing agent cleanup.
(Refers to lines 913-929)
Was this helpful? React with 👍 or 👎 to provide feedback.
|
Follow-up fix for latest Devin inline review comment (
Build gate:
Test gate:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| const expireInFlight = () => { | ||
| if (!state.inFlight) return; | ||
| if (!Number.isFinite(inflightTimeoutMs) || inflightTimeoutMs <= 0) return; | ||
| if (process.env.CODEX_TEST_HOOKS === "1") { | ||
| "TEST_HOOK_EXPIRE_CHECK"; | ||
| } | ||
| if (!state.inFlight && !state.pendingEndAt) return; | ||
| if (!Number.isFinite(inflightTimeoutMs) || inflightTimeoutMs <= 0) return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔴 Codex tail expiration can get stuck when turnOpen/openCallIds are set but state.inFlight is false
Codex tail state can become permanently inFlight (via summarizeTail) because expireInFlight() returns early unless state.inFlight or pendingEndAt is set.
Root Cause & Impact
expireInFlight() starts with:
if (!state.inFlight && !state.pendingEndAt) return;(/home/ubuntu/repos/repo-ab792c74ddf9424a89ed9851e3e65943/src/codexLogs.ts:852-857)
But summarizeTail() treats a session as in-flight if any of these are true: turnOpen, openCallCount > 0, reviewMode, or pendingEndAt (/home/ubuntu/repos/repo-ab792c74ddf9424a89ed9851e3e65943/src/codexLogs.ts:1506-1511).
There are code paths that set turnOpen and/or openCallIds/openItemCount even when canSignal is false (so state.inFlight may remain false):
- Turn/thread start sets
state.turnOpen = trueunconditionally, and only setsstate.inFlight = trueinside theif (canSignal)block (/home/ubuntu/repos/repo-ab792c74ddf9424a89ed9851e3e65943/src/codexLogs.ts:1115-1123). - Tool-call start adds to
openCallIds/ incrementsopenItemCountunconditionally, and only setsstate.inFlight = trueinside theif (canSignal)block (/home/ubuntu/repos/repo-ab792c74ddf9424a89ed9851e3e65943/src/codexLogs.ts:1237-1256).
If canSignal is false (e.g., timestamp considered stale by CONSENSUS_CODEX_SIGNAL_MAX_AGE_MS, or file is stale and keepStale is false), the state can end up with turnOpen=true or openCallCount>0 while state.inFlight=false and pendingEndAt unset. In that case:
summarizeTail()reportsinFlight=true(becauseturnOpen/openCallCountcontributes).expireInFlight()never runs and never clearsturnOpen/open-call tracking.
Actual: agent can stay incorrectly active indefinitely.
Expected: stale/unsignalable events should not create sticky turnOpen/open-call state, or expiration should still run when these markers are set.
Prompt for agents
In src/codexLogs.ts, make expireInFlight() consider all fields that contribute to summarizeTail().inFlight, not just state.inFlight/pendingEndAt.
Concretely:
1) In updateTailLegacy(), change the early return in expireInFlight() (currently: if (!state.inFlight && !state.pendingEndAt) return;) so it also runs when state.turnOpen is true, state.reviewMode is true, or openCallCount > 0.
2) Alternatively (or additionally), prevent turnOpen/open-call tracking from being set when canSignal is false (stale/unsignalable events). For example:
- In the isTurnStart/isResponseStart branches and tool-call start branches, gate state.turnOpen mutation and openCallIds/openItemCount mutations behind canSignal.
3) Add/adjust a unit/integration test demonstrating the stuck-active case: write a stale-timestamp turn.started or tool-call event that sets turnOpen/openCallCount while canSignal=false, then verify a later updateTail call can clear/expire the in-flight summary.
Was this helpful? React with 👍 or 👎 to provide feedback.
Summary
Testing