
feat: short-term memory system#52

Open
cmac86 wants to merge 10 commits into main from feat/short-term-memory

Conversation

cmac86 commented Feb 6, 2026

Summary

  • Adds deterministic short-term memory with three storage mechanisms: auto-store from tool memory_hint, explicit memory_short tool (store/get/delete/list), and HTTP API
  • File-based JSON persistence on Docker caal-memory volume with 7-day default TTL
  • Memory context injected as LLM awareness hint (after first user message) to enable tool chaining — e.g. "is my flight on time?" triggers memory_short(get) → flight_tracker
  • Frontend Memory Panel (Brain icon) with entry list, source badges (tool/voice/api), inline edit, and clear all
  • i18n translations for en, fr, it
  • Fixes Docker permission issues for /app/registry_cache.json and memory persistence

Architecture

src/caal/memory/          # Package (future-proofed for long-term memory)
├── base.py               # Shared types (MemoryEntry, MemorySource, MemoryStore)
├── short_term.py         # ShortTermMemory singleton, file persistence, TTL
└── __init__.py

src/caal/integrations/memory_tool.py  # MemoryTools mixin (memory_short function_tool)
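A hedged sketch of what the shared types in base.py might look like, inferred from the names listed above; the field names (key, value, ttl, created_at) and the 7-day default are assumptions based on this PR's description, not the actual code:

```python
# Illustrative sketch of base.py types -- names/fields are assumptions.
from dataclasses import dataclass, field
from enum import Enum
import time
from typing import Optional


class MemorySource(str, Enum):
    TOOL = "tool"    # auto-stored from a tool's memory_hint
    VOICE = "voice"  # explicit "remember ..." via the memory_short tool
    API = "api"      # stored through the HTTP API


@dataclass
class MemoryEntry:
    key: str
    value: str
    source: MemorySource
    created_at: float = field(default_factory=time.time)
    ttl: Optional[float] = 7 * 24 * 3600  # 7-day default; None = no expiry

    def is_expired(self, now: Optional[float] = None) -> bool:
        if self.ttl is None:
            return False
        return (now or time.time()) - self.created_at > self.ttl
```

These are the source values the frontend badges (tool/voice/api) would render.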

Three storage paths:

  1. Tool hint — n8n workflows return memory_hint in response → auto-stored
  2. Explicit tool — user says "remember my flight is UA1234" → memory_short(store)
  3. HTTP API — POST /memory for external systems
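The first path (auto-store) can be sketched roughly as follows; the helper name and the assumption that memory_hint maps keys to values are illustrative, not the PR's actual identifiers:

```python
# Hypothetical sketch of the tool-hint auto-store path: when a tool's
# response carries a memory_hint, store it without the model having to
# call memory_short explicitly.
def auto_store_hint(tool_name: str, response: dict, store: dict) -> None:
    hint = response.get("memory_hint")
    if hint is None:
        return  # tool returned no hint; nothing to store
    # Assumed shape: a mapping of memory keys to values.
    for key, value in hint.items():
        store[key] = value  # real code would wrap this in a MemoryEntry
```

For example, an n8n flight-tracker workflow returning `{"result": "on time", "memory_hint": {"flight": "UA1234"}}` would land a `flight` entry in memory as a side effect.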

Context injection serves as an awareness layer — the LLM sees what's in memory so it knows to chain tools (e.g. pull email from memory → send via Gmail), but retrieval still goes through the tool for verification.

Test plan

  • API: store, get, list, delete, clear via curl
  • Explicit tool: "remember my flight number is UA1234" → stored and retrievable
  • Tool chaining: "is my flight on time?" → memory_short(get) → web_search
  • Cross-tool chaining: "email Ashley" → memory_short(get) → gmail(send)
  • Clean greeting: memory not announced on session start
  • Frontend panel: entries display with source badges, timestamps, TTL
  • Inline edit: pencil icon → textarea → save
  • Docker persistence: entries survive container restart via /app/data volume
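The last test item (survival across restarts) reduces to a JSON round-trip through the data directory; a sketch under assumed file and field names, with a temp dir standing in for /app/data:

```python
# Sketch of the file-based JSON persistence: save entries, then reload
# in a fresh call to simulate a container restart.
import json
import tempfile
from pathlib import Path


def save(path: Path, entries: dict) -> None:
    path.write_text(json.dumps(entries))


def load(path: Path) -> dict:
    return json.loads(path.read_text()) if path.exists() else {}


data_dir = Path(tempfile.mkdtemp())      # stands in for /app/data
memory_file = data_dir / "short_term.json"
save(memory_file, {"flight": "UA1234"})
restored = load(memory_file)             # simulates restart + reload
```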

🤖 Generated with Claude Code

cmac86 and others added 10 commits February 3, 2026 14:16
Adds deterministic short-term memory with three storage mechanisms:
- Auto-store from tool responses via memory_hint field
- Explicit memory_short tool (store/get/delete/list actions)
- HTTP API endpoints for external access

Backend: src/caal/memory/ package with file-based JSON persistence,
singleton pattern, TTL support, and context injection into LLM.

Frontend: Memory Panel UI with Brain icon button, entry list,
detail modal, and clear all functionality.

Includes i18n translations for en, fr, it.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Change default TTL from 24h to 7 days (604800s)
- Allow tools to specify custom TTL in memory_hint:
  - Simple value: uses default 7d TTL
  - {"value": ..., "ttl": seconds}: custom TTL
  - {"value": ..., "ttl": null}: no expiry

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
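The three memory_hint TTL formats from the commit above can be sketched as one small resolver; the helper name and constant are assumptions, but the three cases mirror the list exactly:

```python
# Illustrative TTL resolution for memory_hint values.
DEFAULT_TTL = 604_800  # 7 days in seconds


def resolve_hint(hint):
    """Return (value, ttl) where ttl is seconds or None for no expiry."""
    if isinstance(hint, dict) and "value" in hint:
        # {"value": ..., "ttl": seconds} -> custom TTL
        # {"value": ..., "ttl": null}    -> no expiry
        return hint["value"], hint.get("ttl", DEFAULT_TTL)
    return hint, DEFAULT_TTL  # simple value -> default 7d TTL
```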
Replace linear execute→stream→retry with a loop that supports
multi-step tool chaining. Model can now: call tool A → get result →
call tool B → get result → generate text response.

Previously, after one tool execution the code tried to stream a text
response. If the model wanted to chain (call another tool), it
produced 0 text chunks, triggering a retry without tools that crashed
Ollama (tool references in messages but no tools registered).

New flow:
- Loop non-streaming chat() calls (max 5 rounds)
- Each round: if tool_calls → execute → loop back
- When no tool_calls → yield content or stream final response
- Safety fallback: _strip_tool_messages converts tool messages to
  plain text if Ollama still crashes on the streaming path

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
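The new flow above can be sketched as a small loop; the `chat` and `execute_tool` callables stand in for the Ollama client and tool dispatch, and the message shapes are simplified assumptions:

```python
# Illustrative multi-round tool-chaining loop: call chat() non-streaming,
# execute any tool calls, feed results back, and stop at the first round
# that returns plain content (max 5 rounds as a safety cap).
MAX_ROUNDS = 5


def run_chat(messages, chat, execute_tool):
    for _ in range(MAX_ROUNDS):
        reply = chat(messages)  # one non-streaming round
        calls = reply.get("tool_calls")
        if not calls:
            return reply.get("content", "")  # final text response
        messages.append(reply)
        for call in calls:
            result = execute_tool(call["name"], call.get("args", {}))
            messages.append({"role": "tool", "content": result})
    return ""  # safety: give up after MAX_ROUNDS
```

This is what lets "is my flight on time?" run memory_short(get) in round one and flight_tracker in round two before any text is produced, instead of crashing on a tool-less retry.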
…o context

- Deduplicate identical tool calls within a single round (same name + args)
- Accumulate tool names/params across chained rounds for frontend indicator
- Keep tool indicator showing after response (don't clear when tools were used)
- Include tool call arguments in ToolDataCache context injection

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Memory file was failing with permission denied because /app is
owned by root. Now uses CAAL_MEMORY_DIR=/app/data (the caal-memory
volume) and entrypoint ensures directory is writable by agent user.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
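The entrypoint-side fix above might look roughly like this; paths under /tmp stand in for the real /app/data volume, and the root-run chown step is omitted here:

```shell
# Illustrative entrypoint snippet: resolve the memory dir from the env
# var, create it on the volume, and verify it is writable before the
# agent process starts.
MEMORY_DIR="${CAAL_MEMORY_DIR:-/tmp/caal-data}"
mkdir -p "$MEMORY_DIR"
# Smoke-test writability (the real entrypoint would chown to the agent user).
touch "$MEMORY_DIR/.write_test" && rm "$MEMORY_DIR/.write_test"
echo "memory dir ready: $MEMORY_DIR"
```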
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Prevents the LLM from using memory data in the initial greeting.
Memory context is now skipped when there are no user messages yet.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…haining

Context injection helps the LLM know what's in memory so it can
chain tools correctly (e.g. memory_short → flight_tracker). Without
it, the model may skip memory and go to other tools directly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…missions

- Memory detail modal now has pencil icon to edit values in-place
- Add registry_cache.json symlink to entrypoint.sh (same pattern as
  settings.json) to fix permission denied on /app/registry_cache.json

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
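The symlink pattern referenced above (same trick as settings.json) can be sketched as follows, with /tmp paths standing in for /app and the writable volume:

```shell
# Illustrative symlink fix: keep the real file on the writable data
# volume and symlink it into the root-owned app dir, so writes to
# /app/registry_cache.json land on the volume instead of failing.
APP_DIR=/tmp/app
DATA_DIR=/tmp/app-data
mkdir -p "$APP_DIR" "$DATA_DIR"
touch "$DATA_DIR/registry_cache.json"
ln -sfn "$DATA_DIR/registry_cache.json" "$APP_DIR/registry_cache.json"
```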
Ministral-3's recommended instruction temperature is 0.15. The old 0.7
default was overriding the Modelfile setting on every API call.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
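For reference, the Modelfile side of this would be a single parameter line (illustrative fragment; the fix above additionally stops the per-call default from overriding it):

```
PARAMETER temperature 0.15
```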