Sentinel LoRA training + RAG budget fix + fallback elimination #270

Merged

joelteply merged 22 commits into main from feature/sentinel-lora-training on Feb 18, 2026
Conversation

@joelteply
Contributor

Summary

  • Sentinel pipeline engine: 9 step types, 103 Rust tests, agentic loop
  • LoRA training pipeline: PEFT train → discover → load → merge → inference (proven E2E)
  • Academy Dojo: dual-sentinel teacher/student architecture with coding challenges
  • Candle inference: ModelBackend trait, BF16 context cap at 2048, hard error on oversized prompts
  • RAG token budget fix: totalBudget from contextWindow (not hardcoded 8000), chars/3 estimation, isSmallContext threshold
  • Type strictness: modelId/provider required across pipeline, eliminated all candle/LOCAL_MODELS fallbacks
  • Ollama purge: complete removal

Test plan

  • TypeScript compiles clean (npm run build:ts)
  • Rust compiles clean (cargo check)
  • All cloud AI personas respond to chat messages
  • All 4 candle personas respond (Helper AI, Teacher AI, CodeReview AI, Local Assistant)
  • Candle prompt sizes within 2048 context (verified 137-935 tokens in candle.log)
  • 103 Rust tests pass

- genome/dataset-prepare: extract chat history into JSONL training data
- genome/train: PEFT LoRA training with Python subprocess, returns adapter
- genome/training-pipeline: Sentinel pipeline orchestrating prepare→train→register→activate
- AdapterPackage: manifest.json packaging with SHA-256 content hashing, size calculation
- GenomeLayerEntity persistence: genome/train creates database record after training
- genome/paging-adapter-register: accepts layerId to hydrate from persisted entity
- LoRATrainingPipeline wires layerId via {{steps.1.0.data.layerId}} interpolation
- 74 unit tests (11 new for AdapterPackage + pipeline layerId wiring)
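The manifest.json content hashing described above can be sketched in a few lines. The entry shape, the `buildManifest` helper, and the in-memory-buffer input are illustrative assumptions, not the actual AdapterPackage schema:

```typescript
import { createHash } from 'node:crypto';

// Hypothetical manifest entry; the real AdapterPackage fields may differ.
interface ManifestEntry {
  path: string;
  sha256: string;    // content hash, detects corrupted or tampered adapter files
  sizeBytes: number; // size calculation recorded alongside the hash
}

// Hash each packaged file's contents with SHA-256 and record its size.
function buildManifest(files: Record<string, Buffer>): ManifestEntry[] {
  return Object.entries(files).map(([path, buf]) => ({
    path,
    sha256: createHash('sha256').update(buf).digest('hex'),
    sizeBytes: buf.length,
  }));
}
```

Hashing content rather than trusting filenames means a re-trained adapter with the same path still registers as a distinct artifact.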
Entities: AcademySession, AcademyCurriculum, AcademyExamination
Commands: genome/dataset-synthesize (LLM data generation), genome/academy-session (orchestration)
Pipelines: TeacherPipeline (curriculum→synthesize→exam→grade), StudentPipeline (train→answer→report)
Extended PipelineStep bindings with emit/watch/parallel/sentinel step types
101 unit tests passing, integration tests for both new commands
…op-relative refs

Three enhancements to the Rust interpolation engine that enable the Academy
dual-sentinel pipeline to execute end-to-end:

- Multi-pass nested interpolation: {{steps.0.output.topics.{{input.iteration}}.name}}
  resolves inner patterns first via regex matching innermost {{}} only
- traverse_json_path(): array indexing (topics.0.name) and JSON string auto-parsing
  for structured LLM output traversal
- {{loop.N.field}} syntax: stable intra-loop step referencing via _loop_base offset,
  so loop.0.data always means "first sub-step of current iteration"
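A TypeScript sketch of the multi-pass idea (the engine itself is Rust; the context shape and helper names here are illustrative):

```typescript
type Ctx = Record<string, unknown>;

// Walk a dot path like "steps.0.output.topics.1.name", treating numeric
// segments as array indices (mirrors the traverse_json_path() behavior).
function resolvePath(ctx: Ctx, path: string): unknown {
  return path.split('.').reduce<unknown>((cur, seg) => {
    if (cur == null) return undefined;
    if (Array.isArray(cur)) return cur[Number(seg)];
    return (cur as Record<string, unknown>)[seg];
  }, ctx);
}

// Innermost-only pattern: a {{ ... }} whose body contains no further braces.
const INNERMOST = /\{\{([^{}]+)\}\}/g;

// Repeatedly resolve innermost patterns until a fixed point is reached,
// so nested templates unwrap from the inside out.
function interpolate(template: string, ctx: Ctx, maxPasses = 10): string {
  let out = template;
  for (let pass = 0; pass < maxPasses; pass++) {
    const next = out.replace(INNERMOST, (_m: string, path: string) =>
      String(resolvePath(ctx, path.trim()) ?? ''));
    if (next === out) break; // nothing left to resolve
    out = next;
  }
  return out;
}
```

On `{{steps.0.output.topics.{{input.iteration}}.name}}`, the first pass resolves only the innermost `{{input.iteration}}`; the second pass resolves the now-complete outer path.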

Pipeline command routing fix: sentinel command steps now route through TypeScript
(execute_ts_json) instead of Rust module registry, avoiding the data/ prefix
collision where Rust DataModule intercepted commands meant for TypeScript context
injection (dbPath, sessionId, userId).

ORMRustClient.store() fix: returns Rust-generated entity ID instead of echoing
back original input data (which lacked the auto-generated UUID).

Pipeline template fixes: correct watch payload paths (data.payload.X), entity ID
paths (data.data.id), LLM output traversal (output.X not data.X for parsed JSON),
session-scoped adapter names, system default model for student exams.

106 Rust sentinel tests pass. Demonstrated 6 of 9 step types in live dual-sentinel
orchestration: LLM, Command, Emit, Watch, Loop, Condition.
… to inbox

- SentinelEntity class with field decorators, registered in EntityRegistry
- SentinelEscalationService: event-driven bridge routing sentinel lifecycle
  events (complete/error/cancelled) to owning persona's inbox
- Persona ownership: parentPersonaId on all sentinels, academy-session wires it
- Execution tracking: handle→entity mapping, persistExecutionResult()
- sentinel/save + sentinel/run extended with persona ownership params
- TaskEntity: new 'sentinel' domain + 4 sentinel task types
- Architecture docs updated with lessons learned + multi-modal roadmap
- 111 unit tests passing (11 new for SentinelEntity + escalation rules)
- MemoryType.SENTINEL: sentinel executions stored as durable persona memories
- SentinelTriggerService: auto-execute sentinels on event/cron/immediate triggers
  with debounce, concurrent execution guards, and dynamic registration
- PersonaTaskExecutor: sentinel task handlers + recallSentinelPatterns() for
  querying past sentinel executions when processing similar tasks
- InboxTask metadata: typed sentinel fields (sentinelName, entityId, handle, status)
- 125 unit tests passing (14 new: memory types, cron parsing, trigger validation)
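The debounce and concurrent-execution guards can be sketched as follows; the class and method names are assumptions, not the real SentinelTriggerService API:

```typescript
// Guards a sentinel against rapid re-triggering (debounce) and against
// overlapping runs of the same sentinel (concurrency guard).
class TriggerGuard {
  private lastFired = new Map<string, number>();
  private running = new Set<string>();

  // `now` is injectable so the debounce window is testable.
  constructor(private debounceMs: number, private now: () => number = Date.now) {}

  tryFire(sentinelId: string): boolean {
    const t = this.now();
    const last = this.lastFired.get(sentinelId) ?? -Infinity;
    if (t - last < this.debounceMs) return false;  // debounced: fired too recently
    if (this.running.has(sentinelId)) return false; // already executing
    this.lastFired.set(sentinelId, t);
    this.running.add(sentinelId);
    return true;
  }

  finish(sentinelId: string): void {
    this.running.delete(sentinelId);
  }
}
```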
- genome/phenotype-validate command: LLM-as-judge scores pre/post training responses
- Student pipeline pre-test baseline (loop.1) before training establishes comparison point
- Quality gate condition (loop.10): only registers adapters with measurable improvement
- inference:demo and quality:gate:failed event payloads in AcademyTypes
- 138 tests passing (13 new covering phenotype scoring, quality gate, pipeline structure)
Phase C complete:
- genome/compose command: merges multiple LoRA layers into stacked genome
- Student pipeline: paging-activate after registration (LRU eviction)
- Student pipeline: post-loop genome/compose step merges all trained adapters
- Fix GenomeAssemblyTypes Timestamp import (pre-existing tech debt)

Phase D remediation:
- Teacher pipeline restructured with inner exam retry loop
- On failure: synthesizes targeted remedial data from weakAreas feedback
- Re-emits dataset:ready for student re-training, up to maxTopicAttempts
- TopicRemediatePayload and RemediationDatasetReadyPayload types

153 tests passing (15 new covering composition, paging, remediation)
- CompetitionTypes: CompetitorEntry, TopicGap, GapAnalysis, TournamentRound/Ranking, competition events
- CompetitionEntity: academy_competitions collection with 2+ competitor validation
- genome/academy-competition: spawns 1 shared teacher + N student sentinels per competitor
- genome/gap-analysis: per-topic field stats, weakness identification, remediation priorities
- 177 tests passing (24 new for competition types, entity, command types, gap analysis, events)
Candle is the ONLY local inference path. All 75+ files updated:
- Type system: ollamaModelName→trainedModelName, InferenceRuntime.OLLAMA→CANDLE,
  ModelTier 'ollama-capable'→'local-capable', embedding provider 'ollama'→'fastembed'
- Runtime: inference-worker routes 'candle'/'local' to CandleAdapter,
  VisionDescriptionService uses candle, PersonaModelConfigs deduplicated
- Comments: all Ollama references replaced with Candle/PEFT/local equivalents
- Tests: fixtures updated (219 affected unit tests pass, 0 regressions)
- Wire format: TS maps trainedModelName→ollama_model_name for Rust compat
  (Rust-side rename deferred to separate cargo test cycle)
Two critical bugs causing external WebSocket clients to hang forever:

1. JTAGRouter.handleIncomingRequest had no try/catch around
   routeToSubscriber — thrown errors propagated without sending
   a response, leaving clients waiting indefinitely.

2. CommandDaemon.processMessage threw on missing sessionId instead
   of returning an error response, triggering the above silent hang.
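The shape of both fixes, sketched below; the request/response types and the function signature are illustrative, not the actual JTAGRouter/CommandDaemon code:

```typescript
// Illustrative request/response shapes; the real JTAG wire types differ.
type Req = { id: string; sessionId?: string };
type Res = { id: string; success: boolean; error?: string; data?: unknown };

async function handleIncomingRequest(
  req: Req,
  routeToSubscriber: (req: Req) => Promise<unknown>,
  send: (res: Res) => void,
): Promise<void> {
  // Fix 2: a missing sessionId becomes an error response, not a throw.
  if (!req.sessionId) {
    send({ id: req.id, success: false, error: 'Missing sessionId' });
    return;
  }
  try {
    const data = await routeToSubscriber(req);
    send({ id: req.id, success: true, data });
  } catch (err) {
    // Fix 1: a thrown error still yields exactly one response, so external
    // WebSocket clients never wait forever.
    send({ id: req.id, success: false, error: String(err) });
  }
}
```

The invariant is "every request gets exactly one response"; any code path that can exit without calling `send` reintroduces the hang.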

Also: ConnectionBroker ESM fix, vitest config with path aliases,
raw WebSocket diagnostic script, and integration tests properly
registering client via JTAGClient.registerClient().

All 10 sentinel-lora-training integration tests pass (50s).
- Remove unused TEXT_LENGTH imports from AcademySessionEntity and
  AcademyCurriculumEntity
- Use TEXT_LENGTH.UNLIMITED constant in AcademyExaminationEntity
  instead of raw maxLength: 0
- Replace all 9 `as any` casts in GenomeAcademySessionServerCommand
  with proper DataCreate/DataUpdate/PipelineSentinelParams types
- Replace all 15 `as any` casts in GenomeAcademyCompetitionServerCommand
  with same proper types
- Add missing browser command for academy-competition
- Add READMEs for genome/dataset-synthesize, genome/academy-session,
  and genome/academy-competition

177 unit tests pass, TypeScript compiles clean.
- genome/train: DataCreate.execute(), UUID layerId type
- genome/compose: DataRead.execute<GenomeLayerEntity>(), DataCreate,
  typed GenomePagingAdapterRegister and GenomeActivate params
- genome/gap-analysis: DataRead.execute<CompetitionEntity>(),
  DataList.execute<BaseEntity>() with readonly items
- genome/paging-adapter-register: DataRead.execute<GenomeLayerEntity>()

Only 1 `as any` remains across all genome server commands (enum
check in job-create). All 10 integration tests pass live.
Five gaps prevented trained LoRA adapters from affecting inference:
1. activeAdapters not on Rust wire type (TS-only)
2. AIProviderRustClient stripped activeAdapters from IPC payload
3. CandleAdapter.generate_text() never called load_lora/apply_lora
4. Candle registered as quantized() which rejects LoRA
5. Model mismatch: training on SmolLM2, inference on Llama-3.1-8B

Fixes:
- Add ActiveAdapterRequest to Rust wire type (ts-rs generated)
- Wire activeAdapters through AIProviderRustClient to Candle
- Add ensure_adapters() to CandleAdapter for LoRA loading + stacking
- Switch Candle to regular mode (BF16, LoRA-compatible)
- Unify all model references to LOCAL_MODELS.DEFAULT (Llama-3.2-3B)
- Eliminate duplicate model mapping table in PEFTLoRAAdapter
- Add AdapterStore: filesystem-based single source of truth for
  adapter discovery (replaces hardcoded paths in LimbicSystem)
- Add path validation in PersonaGenome.getActiveAdaptersForRequest()
- Fix SystemPaths.genome.adapters to match actual directory
- Fix lora.rs directory path resolution for adapter loading

48 files changed across Rust, TypeScript, and Python.
All native AIs (Helper, Teacher, CodeReview, Local Assistant) verified
responding after deployment.
WP1: KnowledgeTypes.ts foundation — SourceKnowledge, ExtractedFact,
     DataSourceConfig (5 source types), BenchmarkDefinition/Result
WP2: groundingContext on genome/dataset-synthesize — grounded synthesis
     forces LLM to trace all answers to verified facts
WP3: KnowledgeExplorationPipeline — builds sentinel pipelines that
     explore git repos, web pages, conversations, or documents then
     extract structured facts via LLM
WP4: TeacherPipeline rewrite — dynamic step indexing, optional
     knowledge exploration, backward compatible
WP5: BenchmarkPipeline — auto-generates persistent test suites from
     extracted knowledge, plus runner pipeline for scoring
WP6: SearchRateLimiter — Brave API quota tracking, 24hr LRU cache,
     in-flight request deduplication
WP7: Documentation updates — completion criteria table, Phase D.5,
     PRACTICAL-ROADMAP LoRA status correction

4 E2E tests: knowledge-synthesis-repo, benchmark-generation,
web-research-synthesis, sentinel-multi-step-pipeline
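The three SearchRateLimiter behaviors from WP6 (quota tracking, 24-hour result caching, in-flight deduplication) can be sketched like this. The class shape and API are assumptions, and a plain TTL map stands in for the LRU cache:

```typescript
class SearchRateLimiter {
  private cache = new Map<string, { value: unknown; expires: number }>();
  private inFlight = new Map<string, Promise<unknown>>();
  private used = 0;

  constructor(
    private quota: number,
    private ttlMs = 24 * 60 * 60 * 1000, // 24h cache lifetime
    private now: () => number = Date.now,
  ) {}

  async search(query: string, fetcher: (q: string) => Promise<unknown>): Promise<unknown> {
    const hit = this.cache.get(query);
    if (hit && hit.expires > this.now()) return hit.value; // cached: no quota spent

    const pending = this.inFlight.get(query);
    if (pending) return pending; // dedupe concurrent identical queries

    if (this.used >= this.quota) throw new Error('Search quota exhausted');
    this.used++;

    const p = fetcher(query)
      .then((value) => {
        this.cache.set(query, { value, expires: this.now() + this.ttlMs });
        return value;
      })
      .finally(() => this.inFlight.delete(query));
    this.inFlight.set(query, p);
    return p;
  }
}
```

The in-flight map is registered synchronously before the first await, which is what lets two concurrent identical queries share a single upstream request.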
Replace Node.js spawn() in BaseServerLoRATrainer with
RustCoreIPCClient.sentinelExecute() — Python training subprocess
now runs under Rust's SentinelModule which provides:
- kill_on_drop: automatic cleanup if handle is dropped
- Timeout enforcement at the Rust tokio level
- Log capture to .sentinel-workspaces/{handle}/logs/
- Handle-based tracking: cancellable, status-queryable
- Concurrent execution limits (max_concurrent in Rust)

Sentinel handle propagates through LoRATrainingResult →
GenomeTrainResult so callers can inspect logs/status.

Verified: lora-inference-improvement E2E test passes (0% → 100%)
…rain

SentinelEventBridge polls Rust sentinel handles and emits TypeScript Events,
bridging the IPC boundary for widgets and services. genome/train now supports
async mode (returns handle immediately) alongside sync mode (default, blocks).
TrainingCompletionHandler processes async results on completion.

E2E verified: 0% → 80% Nexaflux improvement with full pipeline.
…se types

- sentinel/run sync mode (async=false): sentinelExecute polls until
  completion, returns output directly instead of unavailable stepResults
- CLI timeout: sentinel commands added to 300s category (LLM pipeline
  steps need minutes, not the 10s default)
- sentinelExecute crash fix: pipeline-type sentinels don't produce log
  streams, added try/catch fallback to status.handle.error
- BenchmarkPipeline runner: data/list+filter instead of data/read
  (academy_benchmarks is a dynamic collection without registered entity)
- BenchmarkPipeline: removed apostrophe from grading prompt that broke
  shell single-quote wrapping in CLI test harness
- recipe-load test: fixed response structure (client proxy returns flat
  payload, not wrapped in commandResult), collection → collectionName
- genome-crud test: replaced undefined DATA_COMMANDS with literals,
  reduced embedding dims from 768→16, fixed nested result.data.id path
- genome-fine-tuning-e2e: generates inline dataset if fixture missing
- All 6 test suites: replaced references to unavailable stepResults/
  stepsCompleted with success+output fields from sync pipeline mode
- CRUDTestUtils: added DATA_COMMANDS constant for shared test use

Validated: sentinel-pipeline 4/4, genome-crud 4/4, recipe-load 4/4,
benchmark-generation 4/4, lora-inference-improvement 0%→100%
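Sync mode is essentially a poll loop over the sentinel handle until a terminal status; a sketch, where the status shape and names are assumptions rather than the real sentinelExecute API:

```typescript
// Terminal vs running states; the real handle status fields may differ.
type SentinelStatus =
  | { state: 'running' }
  | { state: 'complete'; output: string }
  | { state: 'error'; error: string };

async function pollUntilDone(
  getStatus: () => Promise<SentinelStatus>,
  intervalMs = 500,
  timeoutMs = 300_000, // matches the 300s CLI category for sentinel commands
): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const s = await getStatus();
    if (s.state === 'complete') return s.output; // return output directly
    if (s.state === 'error') throw new Error(s.error);
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error('Sentinel timed out');
}
```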
Monitors active personas, triggers training for those with enough
accumulated data. Throttles to max 1 concurrent GPU training job.
Integrates with PersonaUser serviceInbox loop.
- BenchmarkEntity and BenchmarkResultEntity: proper entities for academy
  benchmarks, replacing hardcoded collection strings with registered types
- BenchmarkPipeline: uses entity .collection constants instead of raw strings
- CodingChallengePipeline: deterministic coding challenge evaluation via
  sentinel — reads buggy source, runs tests, LLM fixes, re-runs tests,
  scores pass/fail with no LLM grading bias
- sentinelExecute: fix empty output for pipeline-type sentinels by falling
  back to last step output from steps log when combined log is empty
- Integration tests: coding-challenge-benchmark (100% score on task-manager
  3-bug challenge), benchmark-generation regression test updated
The RAG budget used chars/4 estimation (250 tokens/msg), but the Llama tokenizer
averages chars/3, a 35% underestimate. Combined with the hardcoded totalBudget=8000,
a minMessages=5 floor, a Math.max(50, ...) output floor, and an isSmallContext
threshold set too low at 1500, candle personas (2048 context) were building prompts
that exceeded the context window and failed silently.

Fixes:
- totalBudget derived from contextWindow * 0.75 (not hardcoded 8000)
- avgTokensPerMessage: 250 → 350 (chars/3 estimation)
- Removed minMessages floor that forced 5 messages when budget allowed 4
- Removed Math.max(50,...) output token floor (0 = budget blown, not 50)
- isSmallContext threshold: 1500 → 3000 (skips injections for small models)
- calculateAdjustedMaxTokens uses actual content chars/3 not flat 250/msg
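The corrected budget arithmetic, using the constants listed above; the function names are illustrative:

```typescript
const CHARS_PER_TOKEN = 3;          // Llama tokenizer averages ~3 chars/token
const BUDGET_FRACTION = 0.75;       // totalBudget = contextWindow * 0.75
const SMALL_CONTEXT_THRESHOLD = 3000;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN); // chars/3, not chars/4
}

function totalBudget(contextWindow: number): number {
  return Math.floor(contextWindow * BUDGET_FRACTION); // not hardcoded 8000
}

function isSmallContext(contextWindow: number): boolean {
  return contextWindow < SMALL_CONTEXT_THRESHOLD; // candle's 2048 qualifies
}

// How many history messages fit, with no artificial minMessages floor:
// stop as soon as the next message would blow the budget.
function messagesThatFit(messages: string[], contextWindow: number): number {
  let budget = totalBudget(contextWindow);
  let count = 0;
  for (const msg of messages) {
    const cost = estimateTokens(msg);
    if (cost > budget) break;
    budget -= cost;
    count++;
  }
  return count;
}
```

For a 2048-token model this yields a 1536-token budget, so an oversized prompt now surfaces at budgeting time instead of failing silently at inference.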

Type strictness (compiler-enforced, no runtime fallbacks):
- modelId and provider REQUIRED on RAGBuildOptions, AIGenerateParams,
  ThoughtStreamParams, RAGInspectParams
- model and provider REQUIRED on ModelConfig (UserEntity)
- getModelConfigForProvider() throws on unknown provider (no candle fallback)
- PersonaUser validates merged modelConfig (entity + provider defaults)
- Eliminated all || 'candle', ?? 'candle', || LOCAL_MODELS.DEFAULT fallbacks
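What the no-fallback lookup looks like in miniature; the provider table contents here are illustrative:

```typescript
interface ModelConfig { model: string; provider: string }

// Hypothetical defaults table; entries stand in for the real registry.
const PROVIDER_DEFAULTS: Record<string, ModelConfig> = {
  anthropic: { model: 'claude-sonnet', provider: 'anthropic' },
  candle: { model: 'llama-3.2-3b', provider: 'candle' },
};

function getModelConfigForProvider(provider: string): ModelConfig {
  const cfg = PROVIDER_DEFAULTS[provider];
  if (!cfg) {
    // No `|| 'candle'` fallback: a misconfigured provider fails loudly at
    // the call site instead of silently routing to the wrong local model.
    throw new Error(`Unknown provider: ${provider}`);
  }
  return cfg;
}
```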
Rust:
- ModelBackend trait unifying safetensors and GGUF backends
- backends/llama_safetensors.rs + llama_gguf.rs with BF16_PRACTICAL_CONTEXT
- Vendored quantized_llama.rs for future GGUF context window fix
- DomainClassifier for persona task routing
- Self-task generator, genome paging, cognition module updates
- Channel module and unified persona updates

TypeScript:
- Academy session command + types for coding challenges
- CodingStudent/CodingTeacher/ProjectStudent/ProjectTeacher pipelines
- CandleGrpcAdapter with correct model ID and IPC query
- ModelContextWindows: Llama-3.2-3B at 2048 (BF16 practical limit)
- ModelRegistry console.log cleanup
- PersonaGenome, PersonaAutonomousLoop, RustCognitionBridge updates
- TrainingDataAccumulator, MotorCortex, PersonaMemory fixes
- QueueItemTypes, PersonaTaskExecutor updates
- Project scaffolds (ecommerce-api, url-shortener)
- Integration + unit tests

All compiles clean (TypeScript + Rust).
Copilot AI review requested due to automatic review settings on February 18, 2026 at 21:33
Contributor

Copilot AI left a comment


Pull request overview

This PR implements a comprehensive Sentinel pipeline engine with LoRA training capabilities, fixes RAG token budget calculations, and removes all Ollama references in favor of Candle-based local inference.

Changes:

  • Introduced Sentinel pipeline system with 9 step types and 103 Rust tests for orchestrating complex multi-step workflows
  • Added end-to-end LoRA training pipeline: dataset preparation → PEFT training → adapter registration → activation → inference
  • Implemented Academy Dojo architecture with dual-sentinel teacher/student system for autonomous skill training
  • Fixed RAG token budget to use model's actual context window instead of hardcoded values
  • Made modelId/provider required across pipeline to eliminate fallback chains and ensure correct token calculations
  • Replaced all Ollama references with Candle for local inference

Reviewed changes

Copilot reviewed 191 out of 288 changed files in this pull request and generated 7 comments.

Summary per file:
- src/debug/jtag/system/sentinel/pipelines/LoRATrainingPipeline.ts: New pipeline template for orchestrating the LoRA training workflow
- src/debug/jtag/system/sentinel/index.ts: Added exports for the SentinelEntity class and escalation/trigger services
- src/debug/jtag/system/sentinel/entities/SentinelEntity.ts: Database entity for persisting sentinel definitions with execution history
- src/debug/jtag/system/rag/shared/RAGTypes.ts: Changed modelId/provider from optional to required for correct budget calculation
- src/debug/jtag/system/genome/fine-tuning/shared/FineTuningTypes.ts: Added QLoRA quantization options and removed Ollama references
- src/debug/jtag/commands/genome/train/shared/GenomeTrainTypes.ts: New command types for executing LoRA training via PEFT
- src/debug/jtag/commands/genome/dataset-prepare/shared/GenomeDatasetPrepareTypes.ts: New command for collecting training data from chat history
- src/debug/jtag/commands/genome/academy-session/shared/GenomeAcademySessionTypes.ts: Entry point for the Academy Dojo dual-sentinel training system
- src/debug/jtag/commands/sentinel/run/server/SentinelRunServerCommand.ts: Added async/sync modes and sentinel handle registration for escalation
- src/debug/jtag/commands/ai/generate/shared/AIGenerateTypes.ts: Made model/provider required instead of optional
- src/debug/jtag/cli.ts: Added sentinel commands to the long-timeout category and CLI timeout override support
Files not reviewed (1)
  • src/debug/jtag/package-lock.json: Language not supported



import type { Pipeline, PipelineStep } from '../../../workers/continuum-core/bindings/modules/sentinel';
import type { UUID } from '../../core/types/CrossPlatformUUID';
import { LOCAL_MODELS } from '@system/shared/Constants';

Copilot AI Feb 18, 2026


The import of LOCAL_MODELS is used for the default value in line 51, but this creates a dependency on a constant that may not be relevant to all pipeline contexts. Consider making baseModel a required parameter or documenting why LOCAL_MODELS.DEFAULT is the appropriate fallback for LoRA training pipelines.

Comment on lines +199 to +200
modelId: string; // Target model ID — drives context window, token budget, everything
provider: string; // AI provider (e.g. 'anthropic', 'candle', 'deepseek') — scopes model lookup

Copilot AI Feb 18, 2026


Making modelId and provider required is a breaking change that affects all callers of RAGBuildOptions. Consider adding migration documentation or a deprecation period with warnings for callers that don't provide these fields.

Suggested change:
- modelId: string; // Target model ID — drives context window, token budget, everything
- provider: string; // AI provider (e.g. 'anthropic', 'candle', 'deepseek') — scopes model lookup
+ // NOTE: These fields remain optional for backward compatibility. Callers SHOULD provide them;
+ // omission is deprecated and may become a hard requirement in a future major version.
+ modelId?: string; // Target model ID — drives context window, token budget, everything
+ provider?: string; // AI provider (e.g. 'anthropic', 'candle', 'deepseek') — scopes model lookup

Comment on lines +46 to +58
if (params.layerId) {
const readResult = await DataRead.execute<GenomeLayerEntity>({
collection: GenomeLayerEntity.collection,
id: params.layerId,
});

if (!readResult.success || !readResult.data) {
return createGenomePagingAdapterRegisterResultFromParams(params, {
success: false,
registered: false,
error: `GenomeLayerEntity not found for layerId: ${params.layerId}`,
});
}

Copilot AI Feb 18, 2026


The layerId hydration logic creates two distinct code paths (layerId vs raw params). Consider extracting this into a helper function like resolveAdapterParams(params) to improve readability and testability.

Comment on lines +38 to +39
model: string;
provider: 'openai' | 'anthropic' | 'local' | 'candle' | 'groq' | 'deepseek';

Copilot AI Feb 18, 2026


Making model and provider required is a breaking API change. All existing callers that relied on defaults will break. Consider providing migration guidance or a compatibility layer that infers these from context when missing.

Suggested change:
- model: string;
- provider: 'openai' | 'anthropic' | 'local' | 'candle' | 'groq' | 'deepseek';
+ // May be omitted by callers that rely on defaults inferred from context.
+ model?: string;
+ provider?: 'openai' | 'anthropic' | 'local' | 'candle' | 'groq' | 'deepseek';

Comment on lines +90 to +99
try {
const parsed = JSON.parse(result.output);
if (Array.isArray(parsed)) {
stepResults = parsed;
} else if (parsed.stepResults) {
stepResults = parsed.stepResults;
}
} catch {
// Output wasn't JSON — that's fine, raw text is also valid
}

Copilot AI Feb 18, 2026


The JSON parsing logic assumes two possible structures (array or object with stepResults property) without documenting when each format is expected. Add comments explaining which sentinel types produce which output format.

Comment on lines +328 to +333
// Merge Rust-generated fields (id, metadata) into the returned entity
// Rust auto-generates the UUID if not provided; the original `data` may lack it
const rustRecord = response.result?.data;
const mergedData = rustRecord
? { ...data, id: rustRecord.id ?? data.id } as T
: data;

Copilot AI Feb 18, 2026


The merge logic prioritizes rustRecord.id over data.id, but doesn't document why this is necessary or what happens if both are present and differ. Add a comment explaining the precedence rules.

Comment on lines +381 to +383
// Extract --timeout from params (CLI-level override, not a command parameter)
const userTimeoutMs = params.timeout ? Number(params.timeout) : undefined;
delete params.timeout;

Copilot AI Feb 18, 2026


Deleting params.timeout after extraction could cause issues if the command legitimately expects a timeout parameter. Consider using a different naming convention (e.g., --cli-timeout) to avoid conflicts.

Suggested change
// Extract --timeout from params (CLI-level override, not a command parameter)
const userTimeoutMs = params.timeout ? Number(params.timeout) : undefined;
delete params.timeout;
// Extract --timeout from params (CLI-level override)
const userTimeoutMs = params.timeout ? Number(params.timeout) : undefined;

@joelteply merged commit b3bcdee into main on Feb 18, 2026
2 of 5 checks passed
@joelteply deleted the feature/sentinel-lora-training branch on February 18, 2026 at 21:37