Rust Sentinel Module + Provider-Scoped ModelRegistry + ModelCapabilities Type System (#269)
- Add SentinelRunner.ts with full step execution loop
- Support all step types: command, llm, condition, watch, sentinel, emit
- Implement loop control: once, count, until, while, continuous, event
- Variable substitution with $variable.path[0].property syntax (a hypothetical pipeline sketch follows the Tests list below)
- Safety limits: maxIterations, timeoutMs
- Nested sentinel spawning with await support
- PipelineSentinelDefinition type with proper discriminator
- Standalone test suite validating engine mechanics
- Rust SentinelModule for process isolation (from previous work)
Features:
- LLM tool calling: parse ```tool JSON blocks, execute via Commands
- Parallel steps: concurrent execution with failFast option
- Event triggers: SentinelTriggerManager with debounce/throttle
- ParallelStep type for concurrent nested step execution
Fixes:
- Success determination: use startsWith('Completed') not includes
(error messages containing 'completed' were incorrectly marked success)
Tests:
- Olympics validation: build-fix-loop and PR review patterns
- Structure validation for complex multi-step sentinels
- Condition branching and loop control validation
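To make the step/loop/substitution vocabulary concrete, here is a hypothetical pipeline definition built with serde_json; the field names and shape are illustrative assumptions, not the verified PipelineSentinelDefinition schema.

```rust
use serde_json::json;

fn main() {
    // Hypothetical pipeline: build, then loop an LLM fix step until the build passes.
    // Field names are illustrative assumptions, not the verified schema.
    let pipeline = json!({
        "type": "pipeline",
        "steps": [
            { "type": "command", "command": "build/run", "label": "build" },
            {
                "type": "loop",
                "mode": "until",                 // once | count | until | while | continuous | event
                "condition": "$build.success",   // $variable.path[0].property substitution
                "maxIterations": 5,              // safety limit
                "timeoutMs": 600_000,
                "steps": [
                    { "type": "llm", "prompt": "Fix the build errors: $build.errors[0].message" },
                    { "type": "command", "command": "build/run", "label": "build" }
                ]
            },
            { "type": "emit", "event": "build/fixed" }
        ]
    });
    println!("{}", serde_json::to_string_pretty(&pipeline).unwrap());
}
```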
- Add 'pipeline' sentinel type to sentinel/run command
- Add PipelineSentinelParams interface for JSON pipeline definitions
- Wire SentinelRunner into sentinel command system
- Fix WorkerClient to include 'command' field for Rust IPC compatibility
- Fix LoggerModule to extract nested 'payload' field from WorkerClient
- Replace console.log in SentinelRunner with proper Logger system

The WorkerClient was sending a 'type' field but Rust IPC expected 'command'. The LoggerModule was expecting flat params but WorkerClient nested data under 'payload'. Both patterns are now supported.

Live tested with multi-step pipelines via ./jtag sentinel/run --type=pipeline
…ucture
Pipeline execution now runs entirely in Rust SentinelModule:
- Shell steps with /bin/sh -c for commands with spaces
- LLM steps route through AIProviderModule
- Command steps route through ModuleRegistry
- Condition/loop steps with variable interpolation
- Step results written to steps.jsonl
Logging infrastructure properly integrated:
- Module logs: .continuum/jtag/logs/system/modules/sentinel.log
- Pipeline logs: .continuum/jtag/logs/system/sentinels/{handle}/
- LoggerModule resolve_log_path handles sentinel categories
TypeScript commands now route to Rust:
- sentinel/run passes pipeline JSON to Rust
- sentinel/logs/* use RustCoreIPCClient typed methods
- sentinel/status, sentinel/save route through Rust
Deleted ~7,776 lines of TypeScript sentinel infrastructure:
- AgentSentinel, BuildSentinel, OrchestratorSentinel
- TaskSentinel, VisualSentinel, SentinelRunner
- SentinelWorkspace, SentinelTrigger, SentinelLogWriter
- Associated tests
sentinel.rs at 2096 lines - needs decomposition (ironic).
… processes
This is foundational infrastructure for sentinels and other spawned tasks
to execute ANY command (Rust or TypeScript) without knowing where it's
implemented.
New: runtime/command_executor.rs
- CommandExecutor struct with registry + WebSocket TS bridge
- execute() returns CommandResult
- execute_json() convenience method for most use cases
- Global executor initialized at startup
- TS bridge via WebSocket to JTAGSystemServer (port 9001)
Integration:
- Initialized in ipc/mod.rs after runtime is ready
- SentinelModule uses execute_json() for command steps
- Sentinel no longer needs to know if command is Rust or TS
API:
```rust
// From anywhere in continuum-core:
let result = runtime::execute_command("code/edit", params).await?;
let json = runtime::execute_command_json("any/command", params).await?;
```
This enables sentinels to call file editing, screenshot, and other
TypeScript commands that were previously unreachable from Rust.
- Replace WebSocket-based TypeScript command routing with Unix socket
- Use existing CommandRouterServer at /tmp/jtag-command-router.sock
- CommandRouterServer now uses getCommandsInterface() for proper routing
- Sentinel command steps can now execute both Rust and TypeScript commands

Architecture:
- Rust commands: route via ModuleRegistry (direct, 0ms)
- TypeScript commands: route via Unix socket → CommandRouterServer → CommandDaemon
- Browser commands (screenshot, etc.) require CLI routing (architectural constraint)

Tested with a multi-step pipeline: health-check (Rust) + help (TypeScript) + shell
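A minimal sketch of a Rust-side client for that socket, assuming newline-delimited JSON framing (the actual wire format is not shown in this excerpt):

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;

/// Send one command over the TS bridge socket and read one JSON reply.
/// Assumes newline-delimited JSON framing - an illustrative guess, not the verified protocol.
fn execute_ts_command(command: &str, params: serde_json::Value) -> std::io::Result<serde_json::Value> {
    let mut stream = UnixStream::connect("/tmp/jtag-command-router.sock")?;
    let request = serde_json::json!({ "command": command, "params": params });
    stream.write_all(request.to_string().as_bytes())?;
    stream.write_all(b"\n")?;

    let mut line = String::new();
    BufReader::new(stream).read_line(&mut line)?;
    serde_json::from_str(&line).map_err(std::io::Error::other)
}
```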
Pull request overview
This PR implements a complete Rust-based sentinel execution engine with universal command routing. The changes represent a significant architectural shift, moving pipeline execution from TypeScript to Rust and establishing a bidirectional command bridge between the two layers.
Changes:
- New Rust CommandExecutor module for universal command routing (Rust-to-Rust direct, Rust-to-TypeScript via Unix socket)
- Complete removal of TypeScript sentinel implementations (BuildSentinel, OrchestratorSentinel, VisualSentinel, TaskSentinel, AgentSentinel)
- Refactored command implementations to delegate to Rust SentinelModule
- Updated logging paths from .sentinel-workspaces/ to .continuum/jtag/logs/system/sentinels/
- New comprehensive LoRA mesh distribution architecture documentation
Reviewed changes
Copilot reviewed 42 out of 43 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| command_executor.rs | New universal command executor with socket-based TypeScript bridge |
| logger.rs | Updated sentinel log paths, dual payload pattern support |
| sentinel.ts | Complete TypeScript bindings for Rust SentinelModule |
| CommandRouterServer.ts | Enhanced routing through JTAGSystemServer command interface |
| SentinelRunServerCommand.ts | Simplified to fire-and-forget Rust delegation |
| sentinel/status, logs/* | Migrated to Rust SentinelModule queries |
| Multiple .ts files | Removed legacy TypeScript sentinel implementations |
| LORA-MESH-DISTRIBUTION.md | New 830-line architecture document |
Files not reviewed (1)
- src/debug/jtag/package-lock.json: Language not supported
runtime/command_executor.rs:

```rust
static GLOBAL_EXECUTOR: std::sync::OnceLock<Arc<CommandExecutor>> = std::sync::OnceLock::new();

/// Initialize the global command executor (called once at startup)
pub fn init_executor(registry: Arc<ModuleRegistry>) {
    let log = super::logger("command-executor");
    let _ = GLOBAL_EXECUTOR.set(Arc::new(CommandExecutor::new(registry)));
    log.info(&format!("Initialized (TS bridge: {})", TS_COMMAND_SOCKET));
}

/// Get the global command executor
/// Panics if not initialized - this is intentional, executor MUST be initialized at startup
pub fn executor() -> Arc<CommandExecutor> {
    GLOBAL_EXECUTOR.get()
        .expect("CommandExecutor not initialized - call init_executor() at startup")
        .clone()
}
```
The global executor initialization using std::sync::OnceLock doesn't handle re-initialization gracefully. If init_executor is called twice, the second call is silently ignored (line 134 uses let _ =). This could mask bugs where the executor is initialized with the wrong registry.
Consider either:
- Panicking if already initialized (fail-fast)
- Returning a Result to indicate if initialization succeeded
- Adding a debug log when re-initialization is attempted
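A minimal sketch of the Result-returning variant, reusing GLOBAL_EXECUTOR from the snippet above:

```rust
/// Fail-fast variant: surface double-initialization instead of silently ignoring it.
/// OnceLock::set returns Err if a value was already stored.
pub fn try_init_executor(registry: Arc<ModuleRegistry>) -> Result<(), String> {
    GLOBAL_EXECUTOR
        .set(Arc::new(CommandExecutor::new(registry)))
        .map_err(|_| "CommandExecutor already initialized".to_string())
}
```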
logger.rs:

```rust
// WorkerClient sends data nested under "payload" field, extract it
// ORMRustClient sends data at top level - support both patterns
let payload_value = if let Some(nested) = params.get("payload") {
    nested.clone()
} else {
    params.clone()
};

let payload: WriteLogPayload =
    serde_json::from_value(payload_value).map_err(|e| format!("Invalid payload: {e}"))?;
```
The logger payload extraction logic at lines 563-569 creates an inconsistency where the same data can be passed either nested under "payload" or at the top level. This dual-pattern support increases complexity and makes the API ambiguous for callers.
Consider standardizing on one pattern across the codebase, or document this behavior clearly in the command schema.
CommandRouterServer.ts:

```typescript
// Get JTAGSystemServer instance
const { JTAGSystemServer } = await import('../../../system/core/system/server/JTAGSystemServer');
const system = JTAGSystemServer.instance;

if (!system) {
  throw new Error('JTAGSystemServer not initialized');
}
```
The CommandRouterServer now requires JTAGSystemServer to be initialized before it can route commands (line 123-128). However, there's no explicit initialization order guarantee or error handling if the system isn't ready yet. If a Rust worker tries to execute a command before JTAGSystemServer is fully initialized, it will fail with "JTAGSystemServer not initialized".
Consider adding initialization ordering documentation or a startup sequencing mechanism to ensure the server is ready before workers connect.
SentinelRunServerCommand.ts:

```typescript
// Use sentinel/run which spawns a task for the pipeline
const result = await rustClient.sentinelRun({
  type: 'pipeline',
  command: 'pipeline', // Internal: tells Rust this is a pipeline
  args: [],
  workingDir,
  env: { PIPELINE_JSON: JSON.stringify(pipeline) },
```
The pipeline execution in sentinel/run (lines 55-61) passes the entire pipeline definition through an environment variable (PIPELINE_JSON). Large pipeline definitions could exceed environment variable size limits (typically 128KB on Linux, 32KB on Windows).
Consider passing the pipeline via a temporary file instead, or implementing a size check with an appropriate error message.
Suggested change:

```typescript
// Serialize pipeline and enforce a conservative env var size limit
const pipelineJson = JSON.stringify(pipeline);
const maxEnvSize = process.platform === 'win32' ? 32 * 1024 : 128 * 1024;
if (pipelineJson.length > maxEnvSize) {
  throw new Error(
    `Pipeline definition is too large to pass via environment variable (size=${pipelineJson.length} bytes, limit=${maxEnvSize} bytes). ` +
    'Please reduce the pipeline size or use a different execution mechanism that does not rely on environment variables.'
  );
}

// Use sentinel/run which spawns a task for the pipeline
const result = await rustClient.sentinelRun({
  type: 'pipeline',
  command: 'pipeline', // Internal: tells Rust this is a pipeline
  args: [],
  workingDir,
  env: { PIPELINE_JSON: pipelineJson },
```
WorkerClient.ts:

```typescript
// NOTE: Rust IPC expects 'command' field, not 'type'
// The JTAGRequest interface uses 'type' but ORMRustClient uses 'command'
// We need to include both for compatibility
const request: WorkerRequest<TReq> & { command: string } = {
  id: generateUUID(),
  type,
  command: type, // Rust IPC looks for 'command' field
  timestamp: new Date().toISOString(),
  payload,
  userId: userId ?? this.defaultUserId
```
WorkerClient now duplicates the command in both 'type' and 'command' fields (lines 238-244) for Rust IPC compatibility. This creates redundancy and increases the chance of inconsistency if one field is updated but not the other.
Consider updating the Rust IPC layer to accept 'type' consistently, or create a mapping layer that doesn't require duplicating data in the request object.
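On the Rust side, one hedged option (the actual IPC request struct isn't shown in this PR) is a serde alias so either field deserializes into the same member:

```rust
use serde::Deserialize;

/// Hypothetical request shape: `alias` lets the Rust IPC layer accept
/// either {"command": ...} or {"type": ...}, so TS need not send both.
#[allow(dead_code)]
#[derive(Deserialize)]
struct IpcRequest {
    id: String,
    #[serde(alias = "type")]
    command: String,
    payload: serde_json::Value,
}
```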
Three root causes fixed in generate-command-schemas.ts:
1. extractDescription now finds the NEAREST JSDoc block (not the first), strips * prefixes, and skips title-pattern lines like "Foo - Types"
2. readReadmeDescription reads the first paragraph of a command's README.md as the primary description source, falling back to cleaned JSDoc
3. deduplicateSchemas merges entries sharing a name — unions params, marks variant-only params optional, picks the best description. sentinel/run: 7 entries → 1 with 26 merged params

Also: cleaned sentinel JSDoc, removed duplicate SentinelListParams from SentinelLoadTypes.ts. Broken descriptions: 205 → 0.
Updated interface-level JSDoc across 35 command Types files that had generic descriptions like "Parameters for X command" or missing JSDoc entirely. Also fixed extractDescription to handle single-line /** text */ JSDoc blocks. Result: 0/253 commands with weak descriptions.
…up stale refs
- Add persona autonomy philosophy and Three Pillars convergence (Sentinel + Genome + Academy)
- Rewrite Implementation Status for Rust-centric architecture, remove deleted TS class references
- Expand Olympics from 12 to 24 validation tasks across 12 categories
- Add TODO implementation roadmap with 6 prioritized phases (A-F)
- Clean up References section, remove dead links
…step signatures
New step types:
- Parallel: concurrent branch execution with context snapshots and failFast
- Emit: publish interpolated events on MessageBus for inter-sentinel composition
- Watch: block until matching event arrives (glob patterns, configurable timeout)
- Sentinel: execute nested pipelines inline (recursive composition)
Enhanced existing:
- Loop: 4 termination modes (count, while, until, continuous) with maxIterations safety limit
- Interpolation: named outputs ({{named.label.output}}), type-preserving JSON interpolation
- All 9 step handlers now take uniform PipelineContext for consistent registry/bus access
Pipeline engine is now ~90% complete with 9 step types covering sequential, conditional,
looping, parallel, event-driven, and nested composition patterns.
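A sketch of the uniform handler shape, with placeholder types standing in for the real sentinel module types (field names are assumptions):

```rust
use std::sync::Arc;

// Placeholder types standing in for the real sentinel module types.
struct ModuleRegistry;
struct MessageBus;
struct EmitStep { event: String }
#[derive(Debug)]
struct StepError(String);

/// Shared context every step handler receives (illustrative fields).
struct PipelineContext {
    registry: Arc<ModuleRegistry>,   // command routing
    bus: Arc<MessageBus>,            // emit/watch inter-sentinel events
    variables: serde_json::Value,    // interpolation state
}

/// Uniform handler shape: step definition in, shared context in,
/// JSON output (available to later interpolation) out.
async fn execute_emit_step(
    step: &EmitStep,
    _ctx: &PipelineContext,
) -> Result<serde_json::Value, StepError> {
    Ok(serde_json::json!({ "emitted": step.event }))
}
```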
Tested all pipeline step types through unit tests and live execution:
- Shell: echo, nonzero exit, space-in-cmd passthrough, timeout, interpolation
- Condition: true/false branches, interpolated conditions, failing substeps
- Loop: count, while, until, continuous modes, iteration variable, safety limits
- Parallel: concurrent branches (timing-verified), failure propagation, multi-step
- Emit: bus publishing, event/payload interpolation, requires-bus guard
- Watch: event matching (exact/wildcard/segment), timeout, ignores non-matching
- Sentinel: nested pipelines, input inheritance/override, failure propagation
- Executor integration: linear pipelines, stop-on-failure, condition branching, loop+interpolation, parallel, emit+watch composition, nested sentinel, step output forwarding, empty pipeline, missing registry error

Also deleted 2 broken TypeScript test files referencing removed infrastructure (SentinelExecutionLog, SentinelWorkspace).
…d, fix native tool passing

Phase 1: Extract AgentToolExecutor from PersonaToolExecutor
- Universal tool execution (corrections, loop detection, parsing, content cleaning)
- PersonaToolExecutor delegates to AgentToolExecutor, keeps persona-specific logic

Phase 2: ai/agent command — universal agentic loop
- generate → parse tool calls → execute tools → feed results → re-generate
- Model-adaptive: native JSON tools (Anthropic/OpenAI) or XML fallback (DeepSeek)
- Safety caps tiered by provider (25/10/5 iterations)
- Added to ADMIN_COMMANDS to prevent recursive persona self-invocation

Phase 3: Rust LLM step dual-mode routing
- agentMode=false (default): fast in-process Rust call to ai/generate
- agentMode=true: routes to TypeScript ai/agent via CommandExecutor IPC
- Added tools, agentMode, maxIterations fields to PipelineStep::Llm

Bug fixes discovered during verification:
- Fix NativeToolSpec serde: remove rename_all=camelCase that broke tool deserialization (input_schema was silently dropped because Rust expected inputSchema)
- Fix refreshToolDefinitions: pass includeDescription+includeSignature to list command (tool definitions had empty descriptions and no parameters)
…ommands

The ai_provider Rust module claims the "ai/" prefix, intercepting TypeScript-implemented commands like ai/agent. Using the standard command_executor::execute() caused infinite recursion (registry routes back to ai_provider → stack overflow crashing continuum-core).

Added execute_ts/execute_ts_json to CommandExecutor that go directly to the TypeScript Unix socket, bypassing the Rust ModuleRegistry. Updated ai_provider.rs and sentinel llm.rs to use these methods.

Verified: 103 sentinel tests pass, sentinel agentMode LLM step completes end-to-end (Claude calls data/list tool, returns results).
Phase 1 of god-class decomposition. PersonaResponseGenerator.calculateSimilarity, jaccardSimilarity, checkSemanticLoop and PersonaMessageEvaluator.computeTextSimilarity replaced with Rust IPC calls through RustCognitionBridge.
- New: persona/text_analysis module (similarity.rs, types.rs) with 20 unit tests
- New: cognition/text-similarity and cognition/check-semantic-loop IPC commands
- Net: -164 lines TS algorithm code, +642 lines (Rust impl + tests + bridge + IPC)
…, truncated tool, semantic)

Adds garbage_detection.rs (8 checks ported from GarbageDetector.ts) and loop_detection.rs (per-persona DashMap state) to the Rust text_analysis module. PersonaResponseGenerator drops ~230 lines of inline validation, replaced by a single cognition/validate-response IPC call returning ValidationResult.

Fix UTF-8 panic: find_consecutive_repeat now uses byte-level comparison instead of string slicing, preventing char-boundary panics on emoji/multibyte text.
…s all 13 Rust modules
- New utils/params.rs: Params<'a> wrapper with typed extraction methods (str, uuid, u64, f64, bool, json<T>, aliases)
- Migrated all 13 ServiceModule implementations from manual params.get().and_then() chains to the Params helper
- Eliminated ~160 manual extraction patterns across code.rs, voice.rs, memory.rs, ai_provider.rs, channel.rs, search.rs, mcp.rs, agent.rs, embedding.rs, models.rs, sentinel/mod.rs, sentinel/logs.rs, cognition.rs
- Phase 3 text_analysis: mention_detection.rs, response_cleaning.rs, validation.rs moved from TS to Rust
- Net: -726 lines, zero params.get() in IPC command handlers (3 intentional non-IPC uses remain)
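A minimal sketch of the Params pattern (method names follow the commit; exact signatures are assumptions):

```rust
use serde_json::Value;

/// Minimal version of the Params<'a> idea: borrow the raw JSON params
/// and expose typed extraction instead of repeated get().and_then() chains.
struct Params<'a>(&'a Value);

impl<'a> Params<'a> {
    fn str(&self, key: &str) -> Result<&'a str, String> {
        self.0.get(key).and_then(Value::as_str)
            .ok_or_else(|| format!("missing string param '{key}'"))
    }
    fn u64_or(&self, key: &str, default: u64) -> u64 {
        self.0.get(key).and_then(Value::as_u64).unwrap_or(default)
    }
    fn bool_opt(&self, key: &str) -> Option<bool> {
        self.0.get(key).and_then(Value::as_bool)
    }
}

fn main() {
    let raw = serde_json::json!({ "name": "build", "timeoutMs": 5000 });
    let params = Params(&raw);
    assert_eq!(params.str("name").unwrap(), "build");
    assert_eq!(params.u64_or("timeoutMs", 1000), 5000);
    assert_eq!(params.bool_opt("verbose"), None);
}
```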
- Add CommandResult::json() helper to replace the serde_json::to_value().unwrap() pattern
- Fix 15 to_value unwraps in data.rs, 17 lock/serde unwraps in logger.rs
- Replace all Mutex/RwLock .lock().unwrap() with poison-recovering .unwrap_or_else()
- Fix NaN-unsafe partial_cmp().unwrap() in embedding.rs, metrics.rs
- Fix child.stdout/stderr.take().unwrap() in sentinel executor
- Fix OnceLock/Array shape unwraps in voice STT/TTS modules
- Fix FFI unwraps with proper error returns
- Harden Params helper: u32 overflow protection, add bool_opt/i64_or/f64/f32
- Static LazyLock regexes for agent.rs, interpolation.rs (was per-call)
- Remove duplicate regex in garbage_detection, dead code in response_cleaning
- Compress channel.rs InboxMessage parsing from 50 lines to 14 via Params

30 files changed, 725 tests pass, zero warnings
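The poison-recovering lock pattern referenced above, in isolation:

```rust
use std::sync::Mutex;

fn main() {
    let counter = Mutex::new(0u32);
    // Instead of .lock().unwrap() (panics if another thread panicked while
    // holding the lock), recover the inner value from a poisoned mutex:
    let mut guard = counter.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
    *guard += 1;
}
```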
- Delete GarbageDetector.ts (488 lines) — replaced by Rust garbage_detection.rs
- Remove dead checkResponseRedundancy method + disabled redundancy gate (103 lines)
- Fix ai/sleep: add params.userId to identity chain (tool executor sets this)
Three duplicate type systems (Rust ai/types.rs, TS AIProviderTypesV2.ts, TS AdapterTypes.ts) caused a production bug where --provider="groq" silently routed to DeepSeek because TS used preferredProvider while Rust expected provider.
- Fix Rust u64 fields generating bigint: add #[ts(type = "number")]
- Rewrite AIProviderTypesV2.ts as a re-export layer from generated types
- Rewrite AdapterTypes.ts to re-export from the unified source
- Rename preferredProvider → provider across 15 files
- Rename responseTime → responseTimeMs across 30 files
- Rename supportsFunctions → supportsTools across 15 adapter configs
- Simplify AIProviderRustClient: remove RustAIResponse, direct passthrough
- Fix ToolResult field names: tool_use_id → toolUseId, is_error → isError
- Fix ContentPart tool_result is_error: boolean | null compatibility

Verified: clean build, npm start, ping healthy, provider routing works for both anthropic and groq, AI personas responding in chat.
SystemDaemon: ORM returns plain objects, not class instances. configCache.get() failed because the POJO has no prototype methods. Fix: Object.assign(new SystemConfigEntity(), data) to hydrate.

PersonaAutonomousLoop: 10+ personas each polling tasks every 10s = 1+ query/second hammering an empty collection, causing cascading timeouts. Fix: increase intervals (60s/60s/120s) + stagger starts with a random 0-15s offset to prevent a thundering herd.
Replace 42 TypeScript setIntervals (14 personas × 3 timers) with ONE Rust tick loop in ChannelModule. All background scheduling now runs in Rust: task polling, self-task generation, and training readiness checks.

Rust changes:
- Add tick_interval field to ModuleConfig (ServiceModule trait)
- Wire Runtime.start_tick_loops() to spawn tokio tasks for tick-enabled modules
- ChannelModule.tick() polls tasks, generates self-tasks, checks training
- New SelfTaskGenerator struct in persona module (451 lines, 5 tests)

TS cleanup (-937 lines):
- PersonaAutonomousLoop: 345 → 188 lines (thin signal-based service loop)
- Delete PersonaCentralNervousSystem.ts, CNSFactory.ts, SelfTaskGenerator.ts
- Remove dead CNS callback methods from PersonaUser.ts
- All cognition preserved: Rust engine handles priority, fast-path, scheduling
Expose tick loop configuration to TypeScript via a ts-rs generated type. The runtime tick loop now re-reads its interval each iteration (sleep-based, not a fixed interval), allowing dynamic adjustment via channel/tick-config.
- ChannelTickConfig struct with tick_interval_ms, enable flags, threshold
- channel/tick-config command for runtime get/set (100ms floor)
- Runtime tick loop uses sleep + config re-read (supports dynamic cadence)
- Generated ChannelTickConfig.ts in shared/generated/runtime/
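A minimal sketch of the sleep-based loop shape, assuming an illustrative config holder (the real one is the module's ChannelTickConfig):

```rust
use std::sync::{Arc, RwLock};
use std::time::Duration;

// Illustrative config holder - stands in for the real ChannelTickConfig.
struct TickConfig { tick_interval_ms: u64, enabled: bool }

async fn tick_loop(config: Arc<RwLock<TickConfig>>) {
    loop {
        // Re-read every iteration: a runtime channel/tick-config update
        // changes the cadence on the next tick, no restart needed.
        let (interval_ms, enabled) = {
            let cfg = config.read().unwrap_or_else(|p| p.into_inner());
            (cfg.tick_interval_ms.max(100), cfg.enabled) // 100ms floor
        };
        tokio::time::sleep(Duration::from_millis(interval_ms)).await;
        if enabled {
            // module.tick().await would run here
        }
    }
}
```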
Consolidates response_cap, mention detection, rate limiting, sleep mode, directed mention filter, and fast-path decision into single cognition/full-evaluate Rust command. Removes ~250 lines of sequential async TS gating logic. New Rust module persona/evaluator.rs with 19 tests. Dual-write pattern keeps TS sleep/rate-limiter state in sync during migration. AI personas verified responding through new gate path.
New persona/model_selection.rs with domain-to-trait mapping and adapter priority chain (trait → current → any → base_model). 12 Rust tests. Replaces getEffectiveModel() + determineRelevantTrait() in PersonaResponseGenerator (~75 lines → 10 lines). Adapter registry synced from TS genome at init. cognition/select-model and cognition/sync-adapters IPC commands added.
… Rust

ONE async Rust IPC call replaces 3 separate sync TS calls (parse + correct + strip). 68 new Rust tests covering Anthropic XML, function-style, bare JSON, markdown, and old-style XML formats, plus parameter correction, tool-name codec, and integration.
GenomePagingEngine with 22 tests: eviction scoring (age/priority*10), critical adapter protection (priority>0.9), memory budget enforcement, and skill activation decisions. Rust decides what to evict/load, TypeScript executes GPU operations. PersonaGenome now delegates to Rust when bridge is available, falls back to local logic for tests.
Phase 5 of persona decision migration. The post-inference adequacy check (did another AI already answer?) now runs as a single Rust IPC call instead of N separate textSimilarity calls. Rust handles length filtering (>100 chars) and Jaccard n-gram similarity (>0.2 threshold) internally.

All 5 phases complete:
1. Unified evaluation gate (5 TS gates → 1 Rust call)
2. Model selection (4-tier priority chain)
3. Tool call parsing (5 format adapters)
4. Genome paging (LRU eviction + memory budget)
5. Post-inference adequacy (batch similarity check)
…t analysis optimization

Phase A: Unified Per-Persona State
- Created PersonaCognition struct (persona/unified.rs) aggregating engine, inbox, rate_limiter, sleep_state, adapter_registry, genome_engine
- CognitionState now holds a single DashMap<Uuid, PersonaCognition> instead of 7 separate maps
- Single lock per command, atomic cross-field access, better cache locality
- Updated all ~18 command handlers across cognition module, channel module, and IPC server

Phase B: Eliminate Dual Response Tracking
- Removed TS rateLimiter.trackResponse() from PersonaMessageEvaluator (Rust sole authority)
- Added cognition/has-evaluated and cognition/mark-evaluated IPC commands
- Added hasEvaluatedMessage/markMessageEvaluated to RustCognitionBridge
- PersonaUser chat dedup stays TS-local (separate concern from CognitionEngine pipeline dedup)

Phase C: Dead Code Removal (-250 lines net)
- Removed dead evaluated_messages HashSet from RateLimiterState in evaluator.rs
- Removed dead AIDecisionService imports from PersonaMessageEvaluator
- Gutted TS RateLimiter: removed trackResponse, isRateLimited, getResponseCount, hasReachedResponseCap, getRateLimitInfo, resetRoom (all now in Rust)
- Updated RateLimiter.test.ts to match (337 → 112 lines)
- Removed dead comment blocks and fallback code

Phase D: Text Analysis Optimization
- Added build_word_ngrams() and jaccard_from_sets() for compute-once reuse
- check_semantic_loop: response ngrams computed once, reused across N history comparisons
- check_response_adequacy: original ngrams computed once, reused across N response comparisons
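A sketch of the compute-once pattern named in Phase D; the real build_word_ngrams/jaccard_from_sets signatures are assumptions:

```rust
use std::collections::HashSet;

/// Word n-grams as owned strings (illustrative version of build_word_ngrams).
fn build_word_ngrams(text: &str, n: usize) -> HashSet<String> {
    let words: Vec<&str> = text.split_whitespace().collect();
    words.windows(n).map(|w| w.join(" ")).collect()
}

/// Jaccard similarity over prebuilt sets (illustrative jaccard_from_sets).
fn jaccard_from_sets(a: &HashSet<String>, b: &HashSet<String>) -> f64 {
    let inter = a.intersection(b).count() as f64;
    let union = a.union(b).count() as f64;
    if union == 0.0 { 0.0 } else { inter / union }
}

fn main() {
    let response = "the build failed with a type error in the registry";
    let history = ["the build failed again", "unrelated chatter about lunch"];
    // Compute the response's ngrams ONCE, reuse across all history comparisons.
    let response_ngrams = build_word_ngrams(response, 2);
    for prior in history {
        let sim = jaccard_from_sets(&response_ngrams, &build_word_ngrams(prior, 2));
        println!("similarity vs {prior:?}: {sim:.2}");
    }
}
```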
Tool definitions and formatted memories were appended to the system prompt AFTER the RAG budget was calculated, causing unbounded context growth that crashed local models with NaN/Inf errors and emergency truncation.
- New ToolDefinitionsSource (priority 45, 10% budget): handles native JSON tool specs and XML tool definitions within the budget system
- SemanticMemorySource now produces systemPromptSection with formatted memories instead of PersonaResponseGenerator doing it as a bypass
- PersonaResponseGenerator: deleted 3 bypass blocks (~100 lines), now reads tool specs and memories from RAG metadata only
- RAGSourceContext/RAGBuildOptions carry provider + toolCapability so tool-aware sources know what format to produce
- ToolFormatAdapter: default toolCapability changed from 'none' to 'xml' so all models get tools (budget truncation handles tight contexts)
- Budget rebalanced across 11 sources to total 100%
- Fixed <command> → <cmd> bug in PersonaToolDefinitions example
PersonaIdentitySource: 5-line stub → full rich prompt with room context, member list, self-awareness block, response format rules, meta-awareness. Previously AIs didn't know who was in the room or how to format responses.

ToolDefinitionsSource: 4-tier priority ordering (critical > essential > rest) with sub-ordering by prefix (chat > code > decision > data > ai). Budget truncation now drops the least-important tools instead of keeping alphabetically first. Native providers get a 3000-token minimum (17 tools instead of 2).
Fix protocol mismatch: when native-capable providers (Groq, Together, Fireworks) output tool calls in text instead of structured tool_calls, the system now synthesizes native format instead of falling back to the XML path.
- PersonaResponseGenerator + AiAgentServerCommand: branching fix — text-parsed tool calls from native providers route through the native protocol (tool_use + tool_result content blocks), not XML
- ToolFormatAdapter: coerceParamsToSchema() fixes type mismatches (string "true" → boolean) so APIs don't reject tool_use blocks
- Rust parsers.rs: improved function-style format detection
- PromptCapture integration for replay/debugging
- RAG audit dump replaced with concise structured logging
- PromptCapture: JSONL-based prompt/response capture for replay and debugging — captures complete LLM context per persona per inference
- PersonaIdentitySource: richer identity prompts with personality traits
- ConversationHistorySource: improved message windowing and context
- ToolDefinitionsSource: budget-aware tool injection refinements
- ChatRAGBuilder: cleaner composition pipeline
- Rust garbage_detection: expanded patterns for AI-generated noise
- CLAUDE.md: documentation updates
Candle inference:
- Remove 3 artificial limits from candle_adapter.rs (temp clamp, char/token truncation)
- Improve quantized generation with logits sanitization and NaN detection
- Set Candle context windows to 1400 tokens (empirically validated safe threshold at 1000)

RAG budget (ChatRAGBuilder):
- Small-context guard: skip non-essential system prompt injections for models with <1500 token budget
- Provider-aware context window lookups in calculateSafeMessageCount and calculateAdjustedMaxTokens
- Reduces Candle persona prompts from ~11K chars to ~3.8K chars (~860 tokens, safely under the 1000 threshold)

Tool calling:
- Rust parser improvements for text-embedded tool calls (function= format from Groq/Together)
- Tool definitions overhaul with proper parameter names and descriptions
- ToolFormatAdapter improvements for cross-provider compatibility

AI provider:
- registerLocalModels() in AIProviderDaemonServer for proper Candle model registration
- Provider-aware model config propagation through RAG pipeline

RAG sources:
- CodeToolSource, ConversationHistorySource improvements
- PersonaIdentitySource budget-aware progressive inclusion
- ToolDefinitionsSource refinements

Verified: all 4 Candle personas produce coherent output; all cloud providers (Groq, Together, DeepSeek, Fireworks) responding correctly. NaN threshold test confirms the 1000-token boundary.
…isions

ModelRegistry now uses provider:modelId as the internal key instead of just modelId. The same model on different providers (e.g., Llama-3.1-8B at 1400 tokens on Candle vs 131072 on Together) no longer collides via last-writer-wins.
- ModelRegistry: provider-scoped keys, secondary index, getAll(), scoped and unscoped resolution (unscoped returns the largest context window)
- ModelContextWindows: added provider? param to all 7 exported functions
- Fixed getInferenceSpeed bug: local providers (candle/ollama/sentinel) no longer incorrectly return 1000 TPS when found in the registry
- ChatRAGBuilder: deleted 3 isLocal workaround blocks, removed MODEL_CONTEXT_WINDOWS import, passes provider to all utility calls
- RAGBudgetManager: provider param on constructor + static methods
- GovernanceSource, ActivityContextSource: pass provider to lookups
- RAG budget/load commands + MediaResize: added provider to params
- 12 new tests (37 total passing) covering scoped lookups, ambiguity resolution, inference speed classification, and isSlowLocalModel
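The real ModelRegistry is TypeScript; this Rust sketch only illustrates the keying scheme and the unscoped-resolution rule (largest context window wins):

```rust
use std::collections::HashMap;

#[derive(Clone)]
struct ModelEntry { provider: String, model_id: String, context_window: u32 }

/// Illustration of provider-scoped keying (the real registry is TypeScript).
struct Registry { by_key: HashMap<String, ModelEntry> }

impl Registry {
    fn register(&mut self, e: ModelEntry) {
        // "provider:modelId" key - the same model on two providers no longer collides
        self.by_key.insert(format!("{}:{}", e.provider, e.model_id), e);
    }
    /// Scoped lookup: exact provider:modelId.
    fn get(&self, provider: &str, model_id: &str) -> Option<&ModelEntry> {
        self.by_key.get(&format!("{provider}:{model_id}"))
    }
    /// Unscoped lookup: ambiguity resolved by largest context window.
    fn resolve(&self, model_id: &str) -> Option<&ModelEntry> {
        self.by_key.values()
            .filter(|e| e.model_id == model_id)
            .max_by_key(|e| e.context_window)
    }
}

fn main() {
    let mut r = Registry { by_key: HashMap::new() };
    r.register(ModelEntry { provider: "candle".into(), model_id: "llama-3.1-8b".into(), context_window: 1400 });
    r.register(ModelEntry { provider: "together".into(), model_id: "llama-3.1-8b".into(), context_window: 131072 });
    assert_eq!(r.get("candle", "llama-3.1-8b").unwrap().context_window, 1400);
    assert_eq!(r.resolve("llama-3.1-8b").unwrap().context_window, 131072);
}
```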
…ization

Defines the complete capability vocabulary for algorithmic model selection:
- QuantFormat enum: FP32 to Q2_K, GPTQ, AWQ
- WeightFormat enum: GGUF, SafeTensors, MLX, PyTorch
- AdapterMethod enum: LoRA, QLoRA, DoRA, IA3, prefix/prompt tuning, full
- AdapterTarget enum: attention Q/K/V/O, MLP gate/up/down, embedding, lm_head
- InferenceRuntime enum: Candle, llama.cpp, MLX, Ollama, vLLM, Transformers
- Accelerator enum: Metal, CUDA, ROCm, CPU, cloud

Composite types:
- QuantizationProfile: format + bits + can-train-in-quantized (QLoRA flag)
- LoRAProfile: max/recommended rank, alpha, concurrent adapters, stacking
- FineTuningProfile: supported methods + LoRA config + gradient checkpointing
- HardwareProfile: VRAM (inference + training), measured TPS, offload layers
- ModelAdapterProfile: top-level composite attached to ModelMetadata

Query helpers: isFineTunable(), supportsLoRA(), supportsAdapterStacking(), estimateAdapterVramMB(), fitsInVram()

21 new tests (58 total) covering all helpers, enum values, and registry integration (filtering models by fine-tunability across providers).
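A Rust sketch of the query-helper idea over an illustrative subset of these types (the real definitions are TypeScript, and the field names here are assumptions):

```rust
/// Illustrative subset of the capability vocabulary (real types are TypeScript).
enum AdapterMethod { LoRA, QLoRA, DoRA, Full }

struct LoRAProfile { recommended_rank: u32, max_concurrent_adapters: u32, supports_stacking: bool }

struct FineTuningProfile { methods: Vec<AdapterMethod>, lora: Option<LoRAProfile> }

struct HardwareProfile { inference_vram_mb: u32, training_vram_mb: u32 }

struct ModelAdapterProfile { fine_tuning: FineTuningProfile, hardware: HardwareProfile }

impl ModelAdapterProfile {
    /// Fine-tunable if any PEFT/full method is supported at all.
    fn is_fine_tunable(&self) -> bool { !self.fine_tuning.methods.is_empty() }

    /// LoRA-family methods count as LoRA support.
    fn supports_lora(&self) -> bool {
        self.fine_tuning.methods.iter()
            .any(|m| matches!(m, AdapterMethod::LoRA | AdapterMethod::QLoRA | AdapterMethod::DoRA))
    }

    /// Compare against the relevant VRAM budget for the requested mode.
    fn fits_in_vram(&self, available_mb: u32, training: bool) -> bool {
        let needed = if training { self.hardware.training_vram_mb } else { self.hardware.inference_vram_mb };
        needed <= available_mb
    }
}
```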
Summary

- execute_ts/execute_ts_json: direct-to-TypeScript execution path that bypasses the Rust ModuleRegistry
- Provider-scoped ModelRegistry keys (${provider}:${modelId}) with a secondary index for fast unscoped lookups

Key Components
- SentinelRunner
- CommandExecutor
- ModelRegistry
- ModelCapabilities
- ModelContextWindows

Provider-Scoped Registry Architecture

Bug fixed: getInferenceSpeed() now correctly returns 40 TPS for local providers (candle/ollama/sentinel) instead of assuming 1000 TPS (cloud) for any registry hit.

Consumer cleanup: deleted 3 duplicated isLocal workaround patterns from ChatRAGBuilder, passed provider through 8 consumer files.

ModelCapabilities — The "Knowing" Layer

Defines everything needed for algorithmic model selection and LoRA genome paging:
- QuantFormat — 14 quantization levels (FP32 → Q2_K, GPTQ, AWQ)
- AdapterMethod — 8 PEFT techniques (LoRA, QLoRA, DoRA, IA3, prefix/prompt tuning)
- AdapterTarget — 9 targetable transformer layers (attention Q/K/V/O, MLP, embeddings)
- InferenceRuntime — 9 runtimes (Candle, llama.cpp, MLX, Ollama, vLLM, cloud)
- ModelAdapterProfile — composite type combining quantization + fine-tuning + hardware
- Query helpers: isFineTunable(), supportsLoRA(), supportsAdapterStacking(), estimateAdapterVramMB(), fitsInVram()

Test Results

Test plan