A modular AI assistant framework written in C++ with an agentic tool-calling loop, dynamic plugin system, and a Markdown-based skills architecture.
Quick Start • Architecture • Agentic Loop • Skills System • Plugins • Configuration • GLM 4.7 Setup
OpenCrank is a personal AI assistant framework that runs as a single native binary with optional shared-library plugins. It connects to messaging channels (Telegram, WhatsApp), AI providers (Claude, Llama.cpp), and exposes a WebSocket gateway with a built-in web UI — all orchestrated through a central event loop in pure C++.
The AI doesn't just answer questions — it acts. OpenCrank implements a full agentic loop that lets the AI read/write files, execute shell commands, browse the web, manage persistent memory, and invoke user-defined skills, all through iterative tool calls until the task is complete.
| Feature | Description |
|---|---|
| Agentic Tool Loop | Multi-iteration loop where the AI calls tools, reads results, and decides next steps autonomously |
| Dynamic Plugin System | Load .so plugins at runtime — channels, AI providers, and tools |
| Skills System | Drop a SKILL.md file into a directory and the AI learns new capabilities |
| Memory & Tasks | SQLite-backed persistent memory with BM25 full-text search and task management |
| Multiple Channels | Telegram, WhatsApp, and WebSocket gateway with web UI |
| Multiple AI Providers | Claude API and Llama.cpp (local models via OpenAI-compatible API) |
| Built-in Tools | File I/O, shell execution, web browsing, content chunking, memory/task management |
| Session Management | Per-user conversation history with configurable scoping (DM, group, per-peer) |
| Rate Limiting | Token-bucket and sliding-window rate limiters per user |
| AI Process Monitor | Heartbeat tracking, hang detection, automatic typing indicators |
| Minimal Binary | Small core binary; all optional functionality lives in plugins |
- C++ compiler with C++11 support (g++ or clang++)
- Development headers: libcurl-dev, libsqlite3-dev, libssl-dev

Fedora/RHEL:

```shell
sudo dnf install gcc-c++ libcurl-devel sqlite-devel openssl-devel
```

Ubuntu/Debian:

```shell
sudo apt-get install build-essential libcurl4-openssl-dev libsqlite3-dev libssl-dev
```

```shell
git clone https://github.com/user/opencrank-cpp.git
cd opencrank-cpp
make    # Build binary + all plugins
```

```shell
cp config.example.json config.json
# Edit config.json — add your API keys and bot tokens
./bin/opencrank config.json
```

| Command | Description |
|---|---|
| `make` | Build main binary and all plugins |
| `make core` | Build only the core objects |
| `make plugins` | Build only plugins (requires core) |
| `make debug` | Debug build (`-g -O0`) |
| `make release` | Optimized build (`-O3`, stripped) |
| `make clean` | Remove all build artifacts |
| `make install` | Install to `/usr/local` |
bin/
├── opencrank # Main binary (orchestrator)
└── plugins/
├── telegram.so # Telegram channel
├── whatsapp.so # WhatsApp channel
├── claude.so # Claude AI provider
├── llamacpp.so # Llama.cpp local AI provider
├── gateway.so # WebSocket gateway + web UI
└── polls.so # Poll system
┌────────────────────────────────────────────────────────────────┐
│ Application Singleton │
│ Config · PluginLoader · SessionManager · ThreadPool · Agent │
│ SkillManager · AIProcessMonitor · RateLimiter │
└───────────────┬──────────────┬─────────────────┬───────────────┘
│ │ │
┌───────────┴──┐ ┌──────┴──────┐ ┌──────┴──────┐
│ Channels │ │ AI Agents │ │ Tools │
│ (plugins) │ │ (plugins) │ │ (built-in │
│ │ │ │ │ + plugins) │
├──────────────┤ ├─────────────┤ ├─────────────┤
│ telegram.so │ │ claude.so │ │ Browser │
│ whatsapp.so │ │ llamacpp.so │ │ Memory │
│ gateway.so │ │ │ │ File I/O │
│ │ │ │ │ Shell │
└──────────────┘ └─────────────┘ └─────────────┘
│
┌───────┴────────┐
│ Agentic Loop │
│ (tool calls) │
└───────┬────────┘
│
┌───────────┴───────────┐
│ Skills System │
│ (SKILL.md prompts) │
└──────────────────────┘
- Startup — `Application::init()` loads `config.json`, discovers plugins from the plugin directory, initializes channels, AI providers, and tools, loads skills from workspace directories, and builds the system prompt.
- Message Routing — When a channel plugin receives a message, it fires a callback. The `MessageHandler` performs deduplication and rate limiting, then enqueues the message into the `ThreadPool`.
- Command Dispatch — If the message starts with `/`, it's matched against registered commands (built-in or skill commands). Otherwise, it's forwarded to the AI provider.
- Agentic Loop — The AI response is parsed for JSON tool calls (`{"tool": "...", "arguments": {...}}`). If found, the referenced tool is executed, results are injected back into the conversation, and the AI is called again. This repeats until the AI produces a final response with no tool calls, or the iteration limit is reached.
- Response Delivery — The final text is split into chunks (if needed) and sent back through the originating channel.
The core of OpenCrank's intelligence is its agentic loop — an iterative cycle that allows the AI to act on the world, not just respond.
User Message
│
▼
┌─────────────────────┐
│ Build system prompt │◄──── Skills prompt + Tools prompt
│ + conversation │
└──────────┬──────────┘
│
▼
┌─────────────────────┐
│ Call AI Provider │──── Claude API / Llama.cpp
└──────────┬──────────┘
│
▼
┌─────────────────────┐ ┌──────────────────────┐
│ Parse AI response │────►│ Has tool call JSON? │
└──────────────────────┘ └──────────┬───────────┘
│ │
Yes No
│ │
▼ ▼
┌─────────────┐ Return final
│Execute tool │ response to user
 │Inject result│
└──────┬──────┘
│
▼
Loop back to
"Call AI Provider"
(max 10 iterations)
The AI uses a JSON format to invoke tools:
{"tool": "shell", "arguments": {"command": "ls -la /workspace"}}Results are injected back as plain text:
[TOOL_RESULT tool=shell success=true]
total 42
drwxr-xr-x 5 user user 4096 Jan 15 10:30 .
-rw-r--r-- 1 user user 1234 Jan 15 10:28 config.json
...
[/TOOL_RESULT]
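To make the control flow concrete, here is a self-contained sketch of such a loop with a stubbed provider and tool; all names (`call_ai`, `parse_tool_call`, `execute_tool`) are hypothetical illustrations, not OpenCrank's actual API.

```cpp
// Sketch only: hypothetical names, not OpenCrank's real API.
#include <iostream>
#include <string>
#include <vector>

struct ToolCall { std::string tool, arguments; };

// Stub provider: pretend the AI asks for one tool call, then answers.
std::string call_ai(const std::vector<std::string>& convo) {
    return convo.size() < 2
        ? std::string("{\"tool\": \"shell\", \"arguments\": {\"command\": \"ls\"}}")
        : std::string("Here is the directory listing you asked for.");
}

// Naive detector: a real implementation would parse the JSON properly.
bool parse_tool_call(const std::string& response, ToolCall& out) {
    if (response.find("{\"tool\"") == std::string::npos) return false;
    out.tool = "shell";
    out.arguments = response;                           // placeholder extraction
    return true;
}

std::string execute_tool(const ToolCall&) { return "total 42\nconfig.json"; }

int main() {
    std::vector<std::string> convo = {"User: what is in the workspace?"};
    const int max_iterations = 10;
    for (int i = 0; i < max_iterations; ++i) {
        std::string response = call_ai(convo);
        ToolCall call;
        if (!parse_tool_call(response, call)) {         // no tool call -> final answer
            std::cout << response << "\n";
            return 0;
        }
        convo.push_back(response);                      // keep the tool request in context
        convo.push_back("[TOOL_RESULT tool=" + call.tool + " success=true]\n" +
                        execute_tool(call) + "\n[/TOOL_RESULT]");
    }
    std::cout << "Iteration limit reached\n";
}
```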
| Tool | Description |
|---|---|
| `read` | Read file contents (with line ranges) |
| `write` | Write/create files |
| `shell` | Execute shell commands (with timeout) |
| `list_dir` | List directory contents |
| `browser_fetch` | Fetch web page content |
| `browser_links` | Extract links from a URL |
| `memory_save` | Save content to persistent memory |
| `memory_search` | Full-text search across memory |
| `memory_get` | Read a specific memory record |
| `task_create` | Create a tracked task |
| `task_list` | List pending tasks |
| `task_complete` | Mark a task as done |
| `content_chunk` | Retrieve chunks of large content |
| `content_search` | Search within large chunked content |
When a tool returns content larger than 15,000 characters, OpenCrank automatically chunks it and provides a summary to the AI. The AI can then request specific chunks or search within the content using the `content_chunk` and `content_search` tools, avoiding context window overflow.
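The chunking idea is straightforward; below is a minimal sketch assuming a simple character-based splitter with illustrative chunk and overlap sizes (4,000 and 200) — OpenCrank's actual `ContentChunker` may differ.

```cpp
// Sketch of the chunking idea, not OpenCrank's ContentChunker implementation.
#include <string>
#include <vector>

// Split oversized tool output into overlapping windows so the AI can page
// through it with content_chunk / content_search instead of receiving it whole.
std::vector<std::string> chunk_content(const std::string& text,
                                       size_t chunk_size = 4000,
                                       size_t overlap = 200) {
    std::vector<std::string> chunks;
    if (text.size() <= 15000) { chunks.push_back(text); return chunks; }
    for (size_t pos = 0; pos < text.size(); ) {
        chunks.push_back(text.substr(pos, chunk_size));
        if (pos + chunk_size >= text.size()) break;
        pos += chunk_size - overlap;                 // overlap keeps search hits intact
    }
    return chunks;
}
```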
- Path sandboxing — File operations are restricted to the workspace directory, and directory traversal is blocked (a sketch of this check follows the list).
- Command timeout — Shell commands have a configurable timeout (default 20s).
- Iteration limit — The agentic loop stops after 10 iterations (configurable).
- Error limit — 3 consecutive tool errors halt the loop.
- Token limit recovery — If the context window overflows, the agent automatically truncates conversation history and retries.
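As a rough illustration of the sandboxing rule above, here is a sketch of a workspace containment check using POSIX `realpath`; the function name and exact behaviour are assumptions, not OpenCrank's actual code.

```cpp
// Sketch of a workspace sandbox check (assumed behaviour, POSIX only).
#include <climits>
#include <cstdlib>
#include <string>

// Resolve the requested path and make sure it stays under the workspace root.
bool path_in_workspace(const std::string& workspace, const std::string& requested) {
    char root[PATH_MAX], target[PATH_MAX];
    if (!realpath(workspace.c_str(), root)) return false;
    if (!realpath(requested.c_str(), target)) return false;   // also rejects dangling paths
    std::string r(root), t(target);
    return t.compare(0, r.size(), r) == 0 &&
           (t.size() == r.size() || t[r.size()] == '/');       // block "/workspace-evil" prefix matches
}
```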
Skills are the mechanism for teaching the AI new capabilities without writing C++ code. A skill is simply a SKILL.md Markdown file placed in a directory.
- At startup, the `SkillManager` scans configured directories for subdirectories containing a `SKILL.md` file.
- Each `SKILL.md` is parsed for YAML-like frontmatter (name, description, metadata) and a Markdown body containing instructions (a parsing sketch follows this list).
- Eligible skills are injected into the AI's system prompt as an `<skills>` XML block, giving the AI awareness of available capabilities.
- When a user sends a message, the AI can read and follow the instructions in any active skill to accomplish the task.
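A minimal sketch of that frontmatter split, assuming plain `key: value` lines between `---` markers; the real loader also handles the nested `metadata` JSON, which this sketch ignores.

```cpp
// Sketch of frontmatter extraction (assumed format, not the actual SKILL.md parser).
#include <fstream>
#include <map>
#include <sstream>
#include <string>

// Split a SKILL.md into "key: value" frontmatter pairs and the Markdown body.
bool load_skill(const std::string& path,
                std::map<std::string, std::string>& meta, std::string& body) {
    std::ifstream in(path);
    if (!in) return false;
    std::string line;
    if (!std::getline(in, line) || line != "---") return false;   // frontmatter starts with ---
    while (std::getline(in, line) && line != "---") {
        size_t colon = line.find(':');
        if (colon != std::string::npos) {
            std::string value = line.substr(colon + 1);
            if (!value.empty() && value[0] == ' ') value.erase(0, 1);  // drop the space after ':'
            meta[line.substr(0, colon)] = value;
        }
    }
    std::ostringstream rest;
    rest << in.rdbuf();                                            // everything after the second ---
    body = rest.str();
    return true;
}
```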
skills/
├── weather/
│ └── SKILL.md # Weather lookup instructions
├── translate/
│ └── SKILL.md # Translation instructions
└── summarize/
└── SKILL.md # Document summarization instructions
Each skill file uses YAML-style frontmatter followed by Markdown instructions:
````markdown
---
name: weather
description: Get current weather and forecasts (no API key required).
homepage: https://wttr.in/:help
metadata: { "opencrank": { "emoji": "🌤️", "requires": { "bins": ["curl"] } } }
---
# Weather
## Open-Meteo (JSON)
Free, no key, good for programmatic use:
```shell
curl -s "https://api.open-meteo.com/v1/forecast?latitude=51.5&longitude=-0.12&current_weather=true"
```
Find coordinates for a city, then query. Returns JSON with temp, windspeed, weathercode.
````

Skills are loaded from multiple directories with a priority system — higher-priority sources override lower ones:
| Priority | Source | Description |
|---|---|---|
| 1 (highest) | Workspace | skills/ in the current workspace directory |
| 2 | Managed | ~/.config/opencrank/skills/ (user-installed) |
| 3 | Bundled | Built-in skills shipped with OpenCrank |
| 4 (lowest) | Extra | Additional directories from config |
The frontmatter supports rich metadata for controlling skill behavior:
| Field | Description |
|---|---|
| `name` | Skill identifier |
| `description` | Short description shown in `/skills` list |
| `homepage` | URL for documentation |
| `metadata.opencrank.emoji` | Display emoji |
| `metadata.opencrank.always` | Always include in system prompt |
| `metadata.opencrank.requires.bins` | Required binaries (eligibility check) |
| `metadata.opencrank.requires.any_bins` | At least one must exist |
| `metadata.opencrank.requires.env` | Required environment variables |
| `metadata.opencrank.os` | OS restrictions (darwin, linux, win32) |
Before a skill is included in the system prompt, OpenCrank checks:
- Binary requirements — Are the required CLI tools installed? (`curl`, `ffmpeg`, etc.)
- Environment variables — Are the needed API keys set?
- OS restrictions — Is the skill compatible with the current platform?
- Config filters — Does the skill pass the user's skill filter list?
Skills that fail eligibility checks are silently excluded.
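The binary check can be done with a simple `$PATH` scan. A sketch under that assumption (hypothetical function names, not the actual `SkillManager` API):

```cpp
// Sketch of a "required binaries" eligibility check (assumed logic, POSIX only).
#include <cstdlib>
#include <sstream>
#include <string>
#include <sys/stat.h>
#include <vector>

// True if `name` resolves to an executable somewhere on $PATH.
bool binary_on_path(const std::string& name) {
    const char* path = std::getenv("PATH");
    if (!path) return false;
    std::stringstream dirs(path);
    std::string dir;
    while (std::getline(dirs, dir, ':')) {
        struct stat sb;
        std::string candidate = dir + "/" + name;
        if (stat(candidate.c_str(), &sb) == 0 && (sb.st_mode & S_IXUSR)) return true;
    }
    return false;
}

// A skill is eligible only if every entry in requires.bins is present.
bool skill_bins_eligible(const std::vector<std::string>& required_bins) {
    for (std::vector<std::string>::const_iterator it = required_bins.begin();
         it != required_bins.end(); ++it)
        if (!binary_on_path(*it)) return false;
    return true;
}
```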
Skills can also register as chat commands. When a skill is loaded, it becomes available as /skillname in chat, allowing users to invoke skill-specific functionality directly.
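How the `/skillname` dispatch might look is sketched below; the registration mechanism shown (a plain map from command string to handler) is an illustrative assumption, not OpenCrank's actual registry.

```cpp
// Sketch of skill-command dispatch (assumed behaviour): each loaded skill is
// registered under /<name>, and unmatched slash commands fall through to the AI.
#include <functional>
#include <iostream>
#include <map>
#include <string>

int main() {
    std::map<std::string, std::function<std::string(const std::string&)>> commands;
    commands["/weather"] = [](const std::string& args) {        // hypothetical skill command
        return "Following the weather SKILL.md for: " + args;
    };

    std::string message = "/weather London";
    std::string cmd  = message.substr(0, message.find(' '));
    std::string args = cmd.size() < message.size() ? message.substr(cmd.size() + 1) : "";

    auto it = commands.find(cmd);
    std::cout << (it != commands.end() ? it->second(args)
                                       : "No command matched — forwarding to the AI provider.")
              << "\n";
}
```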
Plugins are shared libraries (.so) loaded at runtime via dlopen. Each plugin implements one of three interfaces:
| Type | Interface | Purpose |
|---|---|---|
| Channel | `ChannelPlugin` | Messaging integrations (Telegram, WhatsApp, Gateway) |
| AI | `AIPlugin` | LLM providers (Claude, Llama.cpp) |
| Tool | `ToolProvider` | Agent tools (Browser, Memory) |
| Plugin | Type | Description |
|---|---|---|
| `telegram.so` | Channel | Telegram Bot API with long-polling |
| `whatsapp.so` | Channel | WhatsApp Business API bridge |
| `gateway.so` | Channel | WebSocket server with JSON-RPC protocol and built-in web UI |
| `claude.so` | AI | Anthropic Claude API (Sonnet, Opus, Haiku) |
| `llamacpp.so` | AI | Llama.cpp server via OpenAI-compatible API (fully local) |
| `polls.so` | Tool | Interactive poll creation and management |
Plugins are discovered from the following locations:

- `plugins_dir` from `config.json`
- `./plugins`
- `/usr/lib/opencrank/plugins`
- `/usr/local/lib/opencrank/plugins`
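For illustration, a minimal `dlopen`-based loading sketch (link with `-ldl`); the exported factory symbol name `opencrank_create_plugin` is a placeholder assumption, since the real symbol is defined by `OPENCRANK_DECLARE_PLUGIN`.

```cpp
// Hypothetical loader sketch: the exported symbol name is an assumption,
// not necessarily what OPENCRANK_DECLARE_PLUGIN actually emits.
#include <dlfcn.h>
#include <iostream>
#include <string>

struct Plugin;                                    // opaque plugin base for the sketch
typedef Plugin* (*CreateFn)();                    // factory signature the .so would export

Plugin* load_plugin(const std::string& path) {
    void* handle = dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL);
    if (!handle) { std::cerr << dlerror() << "\n"; return nullptr; }
    // Resolve the plugin factory; "opencrank_create_plugin" is a placeholder name.
    CreateFn create = reinterpret_cast<CreateFn>(dlsym(handle, "opencrank_create_plugin"));
    if (!create) { std::cerr << dlerror() << "\n"; dlclose(handle); return nullptr; }
    return create();
}
```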
```cpp
#include <opencrank/core/loader.hpp>
#include <opencrank/core/channel.hpp>

class MyChannel : public opencrank::ChannelPlugin {
public:
    const char* name() const override { return "My Channel"; }
    const char* version() const override { return "1.0.0"; }
    const char* channel_id() const override { return "mychannel"; }

    bool init(const opencrank::Config& cfg) override { /* ... */ return true; }
    void shutdown() override { /* ... */ }
    bool start() override { /* ... */ return true; }
    bool stop() override { return true; }

    opencrank::ChannelStatus status() const override { return opencrank::ChannelStatus::RUNNING; }
    opencrank::ChannelCapabilities capabilities() const override { return {}; }

    opencrank::SendResult send_message(const std::string& to, const std::string& text) override { /* ... */ return {}; }
    void poll() override { /* ... */ }
};

OPENCRANK_DECLARE_PLUGIN(MyChannel, "mychannel", "1.0.0", "My custom channel", "channel")
```

Build as a shared library:

```shell
g++ -std=c++11 -fPIC -shared -I./include mychannel.cpp -o mychannel.so
```

OpenCrank includes a built-in persistent memory system backed by SQLite with BM25 full-text search.
- File-based memory — Save and retrieve Markdown documents in a `memory/` directory
- Automatic chunking — Large documents are split into overlapping chunks for search
- BM25 search — Full-text search using SQLite FTS5
- Session transcripts — Conversation history is indexed for search
- Task management — Create, list, and complete tracked tasks with due dates
The memory system is exposed to the AI through agent tools (memory_save, memory_search, memory_get, task_create, task_list, task_complete). The AI can autonomously decide to save important information or search past conversations.
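To illustrate the search side, here is a small standalone FTS5/BM25 query using the SQLite C API (link with `-lsqlite3`); the table name and schema are assumptions, not the actual layout of `.opencrank/memory.db`.

```cpp
// Sketch of a BM25 query against an FTS5 index (assumed schema).
#include <iostream>
#include <sqlite3.h>
#include <string>

int main() {
    sqlite3* db = nullptr;
    if (sqlite3_open("memory.db", &db) != SQLITE_OK) return 1;

    // Hypothetical index: one row per memory chunk.
    sqlite3_exec(db,
        "CREATE VIRTUAL TABLE IF NOT EXISTS memory_fts USING fts5(path, content);",
        nullptr, nullptr, nullptr);

    // bm25() ranks matches; lower scores are more relevant, so ORDER BY ascending.
    const char* sql =
        "SELECT path, bm25(memory_fts) AS score FROM memory_fts "
        "WHERE memory_fts MATCH ? ORDER BY score LIMIT 5;";
    sqlite3_stmt* stmt = nullptr;
    sqlite3_prepare_v2(db, sql, -1, &stmt, nullptr);
    sqlite3_bind_text(stmt, 1, "build errors", -1, SQLITE_TRANSIENT);
    while (sqlite3_step(stmt) == SQLITE_ROW)
        std::cout << sqlite3_column_text(stmt, 0) << "  ("
                  << sqlite3_column_double(stmt, 1) << ")\n";
    sqlite3_finalize(stmt);
    sqlite3_close(db);
}
```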
All configuration lives in a single config.json file. See config.example.json for all options with descriptions.
Telegram Bot with Claude:
```json
{
  "plugins": ["telegram", "claude"],
  "telegram": { "bot_token": "..." },
  "claude": { "api_key": "..." }
}
```

WebSocket Gateway with Web UI:

```json
{
  "plugins": ["gateway", "claude"],
  "gateway": { "port": 18789, "bind": "0.0.0.0" },
  "claude": { "api_key": "..." }
}
```

Fully Local (Llama.cpp):

```json
{
  "plugins": ["gateway", "llamacpp"],
  "llamacpp": { "url": "http://localhost:8080" }
}
```

| Option | Default | Description |
|---|---|---|
| `plugins` | `[]` | List of plugins to load |
| `plugins_dir` | `./bin/plugins` | Plugin search directory |
| `workspace_dir` | `.` | Working directory for file operations |
| `log_level` | `info` | Logging: debug, info, warn, error |
| `system_prompt` | (built-in) | Custom system prompt for the AI |
| `skills.bundled_dir` | (auto) | Directory for bundled skills |
| `skills.managed_dir` | (auto) | Directory for user-installed skills |
| `telegram.bot_token` | — | Telegram Bot API token |
| `telegram.poll_timeout` | `30` | Long-poll timeout in seconds |
| `claude.api_key` | — | Anthropic API key |
| `claude.model` | `claude-sonnet-4-20250514` | Model to use |
| `claude.max_tokens` | `4096` | Max tokens per response |
| `claude.temperature` | `1.0` | Sampling temperature |
| `llamacpp.url` | `http://localhost:8080` | Llama.cpp server URL |
| `llamacpp.model` | `local-model` | Model name for API |
| `gateway.port` | `18789` | WebSocket server port |
| `gateway.bind` | `0.0.0.0` | Bind address |
| `gateway.auth.token` | (none) | Authentication token |
| `browser.timeout` | `30` | HTTP fetch timeout |
| `memory.db_path` | `.opencrank/memory.db` | SQLite database path |
| `memory.chunk_tokens` | `400` | Chunk size for indexing |
| `session.max_history` | `20` | Messages to keep in context |
| `session.timeout` | `3600` | Session timeout in seconds |
| `rate_limit.max_tokens` | `10` | Rate limit bucket size |
| `rate_limit.refill_rate` | `2` | Tokens refilled per second |
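For reference, the `rate_limit.*` options map onto a standard token-bucket algorithm. A minimal sketch using the defaults above (bucket size 10, 2 tokens refilled per second) — not the actual `RateLimiter` class:

```cpp
// Token-bucket sketch matching the rate_limit.* defaults above.
#include <algorithm>
#include <chrono>

class TokenBucket {
public:
    TokenBucket(double max_tokens = 10.0, double refill_rate = 2.0)
        : max_(max_tokens), rate_(refill_rate), tokens_(max_tokens),
          last_(std::chrono::steady_clock::now()) {}

    // Returns true and consumes one token if the caller is within its budget.
    bool allow() {
        auto now = std::chrono::steady_clock::now();
        double elapsed = std::chrono::duration<double>(now - last_).count();
        last_ = now;
        tokens_ = std::min(max_, tokens_ + elapsed * rate_);   // refill since last call
        if (tokens_ < 1.0) return false;                       // over the limit
        tokens_ -= 1.0;
        return true;
    }

private:
    double max_, rate_, tokens_;
    std::chrono::steady_clock::time_point last_;
};
```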
OpenCrank supports local AI inference via the llamacpp plugin, which connects to a Llama.cpp server running an OpenAI-compatible API. This guide shows how to set up and run GLM 4.7 Flash — Z.ai's 30B MoE reasoning model optimized for local deployment.
- Best-in-class performance: Leads SWE-Bench, GPQA, coding, and reasoning benchmarks
- Efficient: Uses ~3.6B active parameters (30B total MoE)
- Large context: Supports up to 200K tokens
- Tool-calling ready: Excellent for agentic workflows
- Runs locally: Works with 24GB RAM/VRAM (4-bit quantized), 32GB for full precision
Install dependencies and build the latest Llama.cpp with GPU support:
```shell
# Install build dependencies
sudo apt-get update
sudo apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev -y

# Clone and build llama.cpp (with CUDA support)
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build \
    -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON
cmake --build llama.cpp/build --config Release -j \
    --clean-first --target llama-cli llama-server

# Copy binaries
cp llama.cpp/build/bin/llama-* llama.cpp/
```

Note: Change `-DGGML_CUDA=ON` to `-DGGML_CUDA=OFF` if you don't have a GPU or want CPU-only inference.
Install the Hugging Face CLI and download the 4-bit quantized model:
```shell
pip install -U huggingface_hub

# Download the recommended 4-bit model (~18GB)
huggingface-cli download unsloth/GLM-4.7-Flash-GGUF \
    --local-dir models/GLM-4.7-Flash-GGUF \
    --include "*UD-Q4_K_XL*"
```

Other quantization options: You can choose different quantization levels (`UD-Q2_K_XL`, `UD-Q4_K_XL`, `UD-Q8_0`, etc.) based on your available memory. The 4-bit version requires ~18GB RAM/VRAM and provides the best quality-to-size ratio.
Launch the Llama.cpp server with OpenAI-compatible API:
```shell
./llama.cpp/llama-server \
    --model models/GLM-4.7-Flash-GGUF/GLM-4.7-Flash-UD-Q4_K_XL.gguf \
    --alias "GLM-4.7-Flash" \
    --fit on \
    --seed 3407 \
    --temp 0.7 \
    --top-p 1.0 \
    --min-p 0.01 \
    --repeat-penalty 1.0 \
    --ctx-size 16384 \
    --port 8080 \
    --jinja
```

Important parameters:
- `--temp 0.7 --top-p 1.0`: Recommended for tool-calling and agentic use cases
- `--min-p 0.01`: Required for llama.cpp (default is 0.05, which causes issues)
- `--repeat-penalty 1.0`: Disables repeat penalty (critical for GLM 4.7)
- `--jinja`: Use Jinja templating for chat formatting
- `--ctx-size 16384`: Context window (can be increased up to 202,752)

For general conversation (non-agentic use), use: `--temp 1.0 --top-p 0.95`
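Before wiring up OpenCrank, you can sanity-check the server with a direct request to its OpenAI-compatible endpoint. A small libcurl sketch (compile with `-lcurl`); the model alias matches the `--alias` used above, and the URL assumes the default port:

```cpp
// Quick connectivity check against the llama.cpp server's OpenAI-compatible
// /v1/chat/completions endpoint, using libcurl (OpenCrank's own HTTP dependency).
#include <curl/curl.h>
#include <iostream>
#include <string>

static size_t collect(char* data, size_t size, size_t nmemb, void* out) {
    static_cast<std::string*>(out)->append(data, size * nmemb);
    return size * nmemb;
}

int main() {
    const char* body =
        "{\"model\": \"GLM-4.7-Flash\", "
        "\"messages\": [{\"role\": \"user\", \"content\": \"Say hello.\"}]}";
    std::string response;

    CURL* curl = curl_easy_init();
    if (!curl) return 1;
    struct curl_slist* headers = curl_slist_append(nullptr, "Content-Type: application/json");
    curl_easy_setopt(curl, CURLOPT_URL, "http://localhost:8080/v1/chat/completions");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);

    CURLcode rc = curl_easy_perform(curl);
    std::cout << (rc == CURLE_OK ? response : curl_easy_strerror(rc)) << "\n";

    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
}
```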
Update your config.json to use the llamacpp plugin:
```json
{
  "plugins": ["gateway", "llamacpp"],
  "llamacpp": {
    "url": "http://localhost:8080",
    "model": "GLM-4.7-Flash"
  },
  "gateway": {
    "port": 18789,
    "bind": "0.0.0.0"
  },
  "workspace_dir": ".",
  "session": {
    "max_history": 20,
    "timeout": 3600
  }
}
```

```shell
make    # Build if not already done
./bin/opencrank --config config.json
```

The gateway web UI will be available at http://localhost:18789. You can now chat with GLM 4.7 Flash running entirely locally!
Repetition or looping outputs?
- Ensure you're using the latest GGUF (Jan 21+ update fixed a `scoring_func` bug)
- Verify `--repeat-penalty 1.0` is set
- Try re-downloading the model: `huggingface-cli download unsloth/GLM-4.7-Flash-GGUF --local-dir models/GLM-4.7-Flash-GGUF --include "*UD-Q4_K_XL*"`
Out of memory?
- Use a lower quantization: `UD-Q2_K_XL` instead of `UD-Q4_K_XL`
- Reduce `--ctx-size` to 8192 or 4096
- Enable CPU offloading with `--n-gpu-layers 0` (slower but uses less VRAM)
Slow inference?
- Enable GPU acceleration by rebuilding with `-DGGML_CUDA=ON`
- Reduce batch size: `--batch-size 512`
- Use a smaller quantized model
Tool calling not working?
- Ensure temperature is set to `0.7` and `--jinja` is enabled
- Check that OpenCrank's agentic loop is functioning (use the `/info` command)
- Review logs for tool call parsing errors
While GLM 4.7 Flash can technically run on Ollama, chat template compatibility issues currently make Ollama unreliable for this model. Use Llama.cpp directly as described above for best results.
| Command | Description |
|---|---|
| `/start` | Welcome message |
| `/help` | Show available commands |
| `/skills` | List loaded skills with eligibility status |
| `/ping` | Check if bot is alive |
| `/info` | Show bot version and system info |
| `/new` | Start a new conversation (clear history) |
| `/status` | Show session status and memory stats |
| `/tools` | List available agent tools |
| `/fetch <url>` | Fetch and display web page content |
| `/links <url>` | Extract links from a web page |
Or just send a message to chat with the AI directly.
opencrank-cpp/
├── OpenCrank.jpg # Logo
├── config.example.json # Example configuration with all options
├── Makefile # Main build system
├── Makefile.plugin # Shared rules for plugin builds
│
├── include/opencrank/
│ ├── ai/
│ │ └── ai.hpp # AIPlugin interface, ConversationMessage, CompletionResult
│ ├── core/
│ │ ├── application.hpp # Application singleton (lifecycle, system prompt)
│ │ ├── agent.hpp # Agentic loop, AgentTool, ContentChunker
│ │ ├── builtin_tools.hpp # File I/O, shell, content tools
│ │ ├── browser_tool.hpp # Web fetching and link extraction
│ │ ├── memory_tool.hpp # Memory/task agent tools
│ │ ├── message_handler.hpp # Message routing and dispatch
│ │ ├── ai_monitor.hpp # AI heartbeat and hang detection
│ │ ├── plugin.hpp # Base Plugin interface
│ │ ├── channel.hpp # ChannelPlugin interface
│ │ ├── tool.hpp # ToolProvider interface
│ │ ├── loader.hpp # Plugin dynamic loading (dlopen)
│ │ ├── registry.hpp # Plugin and command registry
│ │ ├── session.hpp # Session management and routing
│ │ ├── config.hpp # JSON config reader
│ │ ├── http_client.hpp # libcurl HTTP wrapper
│ │ ├── rate_limiter.hpp # Token-bucket rate limiter
│ │ ├── thread_pool.hpp # Worker thread pool
│ │ ├── logger.hpp # Leveled logging
│ │ ├── types.hpp # Message, SendResult, ChannelCapabilities
│ │ └── utils.hpp # String, path, phone utilities
│ ├── memory/
│ │ ├── manager.hpp # Memory indexing, search, and tasks
│ │ ├── store.hpp # SQLite storage backend
│ │ └── types.hpp # MemoryChunk, MemoryConfig, MemorySearchResult
│ └── skills/
│ ├── manager.hpp # Skill loading, filtering, prompt generation
│ ├── loader.hpp # SKILL.md parser (frontmatter + content)
│ └── types.hpp # Skill, SkillEntry, SkillMetadata, SkillRequirements
│
├── src/
│ ├── main.cpp # Entry point
│ ├── ai/ # AI provider implementations
│ ├── core/ # Core framework implementation
│ ├── memory/ # Memory system implementation
│ ├── skills/ # Skills system implementation
│ └── plugins/ # Plugin source code
│ ├── claude/ # Claude AI plugin
│ ├── llamacpp/ # Llama.cpp AI plugin
│ ├── telegram/ # Telegram channel plugin
│ ├── whatsapp/ # WhatsApp channel plugin
│ ├── gateway/ # WebSocket gateway + web UI
│ └── polls/ # Polls plugin
│
└── skills/ # Workspace skills directory
└── weather/
└── SKILL.md # Example: weather lookup skill
MIT License
Inspired by OpenClaw — a TypeScript-based personal AI assistant. Huge thanks to unsloth.ai for the best optimized models! (https://unsloth.ai/docs/models/glm-4.7-flash)
