Directory restructure + RAG token budget + persona recovery #271
Merged
Move all application source code up from the historical 3-level-deep src/debug/jtag/ path to src/. Delete 155 unused root files (old configs, debug dumps, screenshots, dead scripts). Update all hardcoded path references across CI workflows, precommit hook, CLAUDE.md, README.md, and ~70 documentation files. Verified: TypeScript compiles, 923 Rust tests pass, npm start deploys with 254 commands + 17 daemons, ping + chat working.
…token accumulation

ConversationHistorySource was converting its allocated token budget to a message count using TOKENS_PER_MESSAGE_ESTIMATE=50, then fetching exactly that many messages and returning ALL of them without verifying total tokens. With 200+ token messages this caused 4-5x budget overruns and context window overflows on DeepSeek, Fireworks, Together, and other providers.

Fix: fetch a generous batch (500), convert to LLM format, iterate newest-to-oldest accumulating actual token counts (chars/3), and stop when the allocated budget is exhausted. Token budget is now the ONLY constraint: no artificial message caps.

ChatRAGBuilder: removed the Math.min(50) hard cap that prevented 128K models from using more than 50 messages. The generous fetch limit now scales with the context window.
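To make the overrun concrete: a 1,000-token allocation divided by the 50-token estimate yields 20 messages, and 20 messages of 200 tokens each total 4,000 tokens, four times the allocation. A minimal sketch of the accumulation loop described above follows. The `LlmMessage` shape and the names `estimateTokens` and `selectWithinBudget` are hypothetical, not taken from the codebase; only the chars/3 estimate and the newest-to-oldest walk come from the commit message.

```typescript
// Hypothetical message shape; the real LLM message type may differ.
interface LlmMessage {
  name: string;
  content: string;
}

// Rough estimate from the commit message: about 3 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 3);
}

// Walk newest-to-oldest, keep messages while they fit in the budget,
// and return the kept messages in chronological order.
function selectWithinBudget(
  chronological: LlmMessage[],
  budgetTokens: number
): { kept: LlmMessage[]; dropped: LlmMessage[]; usedTokens: number } {
  const kept: LlmMessage[] = [];
  let usedTokens = 0;
  let cutoff = chronological.length; // index of the oldest kept message

  for (let i = chronological.length - 1; i >= 0; i--) {
    const cost = estimateTokens(chronological[i].content);
    if (usedTokens + cost > budgetTokens) break; // budget exhausted
    usedTokens += cost;
    kept.unshift(chronological[i]);
    cutoff = i;
  }

  return { kept, dropped: chronological.slice(0, cutoff), usedTokens };
}
```

The sketch stops at the first message that does not fit rather than skipping it, so the kept window stays contiguous. Whether the actual fix behaves the same way is not stated in the PR.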
…g it

Old messages that exceed the token budget are now compressed to "SenderName: first line..." and prepended as a conversation summary. 85% of the budget goes to recent verbatim messages, and 15% is reserved for consolidated older messages. The AI sees the full conversation arc instead of losing everything beyond the cutoff.
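A rough sketch of the 85/15 split and the "SenderName: first line..." compression, under stated assumptions: the `ChatLine` type, the function names, and the 80-character clip are all hypothetical. Only the split ratio and the summary line format come from the commit message.

```typescript
// Hypothetical message shape for illustration.
interface ChatLine {
  sender: string;
  text: string;
}

// 85% of the budget for recent verbatim messages, the rest for the summary.
function splitBudget(totalTokens: number): { recent: number; summary: number } {
  const recent = Math.floor((totalTokens * 85) / 100);
  return { recent, summary: totalTokens - recent };
}

// Compress one message to "SenderName: first line...".
// The maxChars clip is an assumption; the PR does not say how long a line may be.
function compressLine(msg: ChatLine, maxChars: number = 80): string {
  const firstLine = (msg.text.split("\n")[0] ?? "").trim();
  return `${msg.sender}: ${firstLine.slice(0, maxChars)}...`;
}

// Build the summary from the older messages, keeping the most recent of them
// first, until the summary budget (chars/3 estimate) is used up.
function summarizeOlder(older: ChatLine[], summaryBudgetTokens: number): string {
  const lines: string[] = [];
  let used = 0;
  for (let i = older.length - 1; i >= 0; i--) {
    const line = compressLine(older[i]);
    const cost = Math.ceil(line.length / 3);
    if (used + cost > summaryBudgetTokens) break;
    used += cost;
    lines.unshift(line); // keep chronological order in the output
  }
  return lines.join("\n");
}
```

The resulting string would be prepended ahead of the verbatim recent messages, so the model reads the compressed history first and the full-text tail after it.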
When evaluateAndPossiblyRespondWithCognition threw (API 400, timeout, network error), the bookmark was never updated. Rust's tick loop re-polled the same un-bookmarked message, re-enqueued it, and the persona retried the same failed message forever — silently stuck. Wrap response in try/finally so the bookmark always advances. A failed response attempt must not block the entire queue.
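The try/finally shape can be sketched as below. `processMessage`, `respond`, and `advanceBookmark` are hypothetical stand-ins for the persona's real methods; the PR only states that the response call is wrapped so the bookmark always advances. This sketch lets the error propagate to the caller after the bookmark moves; whether the real code logs or swallows it is not stated.

```typescript
// Hypothetical message type for illustration.
type QueuedMessage = { id: string };

async function processMessage(
  message: QueuedMessage,
  respond: (m: QueuedMessage) => Promise<void>,
  advanceBookmark: (messageId: string) => Promise<void>
): Promise<void> {
  try {
    // May throw on API 400, timeout, or network error.
    await respond(message);
  } finally {
    // Runs on success and on failure, so a failed message is never re-polled.
    await advanceBookmark(message.id);
  }
}
```

The trade-off is that a message whose response fails is skipped rather than retried, which matches the stated goal that one failed attempt must not block the queue.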
Pull request overview
This PR reorganizes the codebase by relocating all source files from src/debug/jtag/ to src/, deletes obsolete build/test configuration files, removes unused Python scripts, and fixes critical issues with RAG token budget enforcement and persona message bookmark recovery.
Changes:
- Relocated 3,596 source files from src/debug/jtag/ to src/ and updated all path references across documentation, workflows, and scripts
- Implemented accurate token counting in ConversationHistorySource to prevent context window overflows (replaced 50-token estimates with actual char/3 calculation)
- Added try/finally protection to persona message processing to ensure bookmarks advance even when AI generation fails
Reviewed changes
Copilot reviewed 43 out of 3818 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/api/data-seed/README.md | Updated documentation paths from src/debug/jtag to src |
| src/README.md | Updated setup instructions with new directory path |
| scripts/templates/*.ts, scripts/*.sh, scripts/*.cjs | Removed unused template files and build scripts |
| papers/**/*.md | Updated implementation file path references |
| docs/**/*.md | Updated architecture and documentation cross-references |
| open-ports.txt, main.ts, lerna.json | Removed obsolete tracking and configuration files |
| jest.config.*.js, eslint.config.js, babel.config.cjs | Removed unused test and linting configuration files |
| archive/devtools_full_demo.py | Removed obsolete Python DevTools recovery script (1,411 lines) |
| README.md, CLAUDE.md | Updated all path references in main documentation |
| .github/workflows/*.yml | Updated CI/CD workflow paths |
| .eslintrc.js, .eslintignore | Removed legacy ESLint configuration files |
| .continuum/genome/**/*.md, .continuum/genome/**/*.sh | Updated Python training script paths |
Summary
- Directory restructure: src/debug/jtag/ to src/. 3596 file renames, delete root garbage (screenshots, old configs, dead scripts), delete unused Python files, update all hardcoded path references across workflows, docs, scripts
- RAG token budget: ConversationHistorySource used TOKENS_PER_MESSAGE_ESTIMATE=50 to guess message counts, causing 4-5x budget overruns and context window overflows. Now accumulates actual token counts (chars/3) and stops at the allocated budget.
Test plan
- npm start deploys successfully (254 commands, 17 daemons)
- ./jtag ping confirms server + browser connected