
Conversation

maceip commented Jan 8, 2026

Motivation and Context

This PR adds LiteRT-LM as a new LLMProvider to Koog, enabling on-device inference with Google's LiteRT-LM engine. Users can run LLMs locally on Android and JVM platforms without requiring network connectivity, which is valuable for:

  • Privacy-sensitive applications
  • Offline-capable AI features
  • Reduced latency for on-device inference
  • Edge deployment scenarios

The implementation follows the existing Ollama provider patterns for consistency with Koog's architecture.

Breaking Changes

None. This is a purely additive change introducing a new module and provider.


Type of the changes

  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Tests improvement
  • Refactoring

Checklist

  • The pull request has a description of the proposed change
  • I read the Contributing Guidelines before opening the pull request
  • The pull request uses develop as the base branch
  • Tests for the changes have been added
  • All new and existing tests passed

Additional steps for pull requests adding a new feature

  • An issue describing the proposed change exists
  • The pull request includes a link to the issue
  • The change was discussed and approved in the issue
  • Docs have been added / updated

Summary of Changes

New module: prompt-executor-litertlm-client

Key components:

  • LiteRTLMClient: Main client implementing the LLMClient interface with a conversation API
  • ManagedConversation: Thread-safe multi-turn conversation with history tracking
  • LiteRTLMToolBridge: Bridges Koog's ToolDescriptor to LiteRT-LM's annotation-based tool system
  • LiteRTLMClientFactory: KMP factory with platform-specific implementations (JVM actual; stubs for iOS/JS/Wasm)
  • LiteRTLMModels: Model definitions for Gemma 3n variants

Features:

  • Stateless execute() and executeStreaming() for single requests
  • conversation() API for multi-turn interactions with context preservation
  • Multimodal support (image/audio) via typed methods (sendImage, sendAudio, etc.)
  • Tool calling support via ToolExecutor callback pattern
  • Thread-safe concurrent access with Mutex synchronization
  • Conversation history tracking with timestamps
  • Kotlin 2.3 forward-compatible @MustUseReturnValue annotations
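
For illustration, a minimal sketch of how this API surface might be used. The names execute, conversation, and sendImage come from the feature list above; the constructor parameter, the model constant, and the prompt DSL details are assumptions rather than confirmed signatures:

```kotlin
// Hypothetical usage sketch; names not listed in the PR description are assumed.
suspend fun demo(imageBytes: ByteArray) {
    // Assumed constructor parameter: path to a local .litertlm model file.
    val client = LiteRTLMClient(modelPath = "/models/gemma-3n-e4b.litertlm")

    // Stateless single request via execute().
    val response = client.execute(
        prompt = prompt("demo") { user("Summarize this repo in one sentence.") },
        model = LiteRTLMModels.GEMMA_3N_E4B, // assumed constant name
    )
    println(response)

    // Multi-turn conversation with preserved context; send() is assumed,
    // sendImage() is named in the feature list.
    val chat = client.conversation()
    chat.send("What can you see in this image?")
    chat.sendImage(imageBytes)
}
```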

maceip and others added 14 commits January 9, 2026 11:14

Add support for LiteRT-LM, Google's on-device inference engine that
enables running LLMs locally on Android and JVM platforms.

Changes:
- Add LiteRTLM provider to LLMProvider sealed class
- Create LiteRTLMModels with Gemma-3n-E4B model support
- Add prompt-executor-litertlm-client module with LiteRTLMClient
- Add litertlm-jvm and litertlm-android dependencies to version catalog

The LiteRT-LM client supports:
- Synchronous and streaming response generation
- Temperature control via SamplerConfig
- System message and conversation context
- CPU and GPU backends for inference

Note: The LiteRT-LM library dependency is marked as compileOnly. Users
must add the LiteRT-LM runtime dependency to their project when using
this provider.
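
Since the dependency is compileOnly, a consuming JVM project would declare the runtime itself. A sketch in Gradle Kotlin DSL; the coordinates and version placeholders are illustrative, not confirmed artifacts:

```kotlin
// build.gradle.kts of the consuming project. Check LiteRT-LM's releases and
// Koog's version catalog for the real group/artifact/version strings.
dependencies {
    implementation("ai.koog:prompt-executor-litertlm-client:<koog-version>")
    // Runtime that the Koog module declares only as compileOnly:
    runtimeOnly("com.google.ai.edge:litertlm-jvm:<litertlm-version>")
}
```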
Add test coverage for the LiteRT-LM client:
- Unit tests for configuration, error handling, and provider validation
- Integration test template for local testing with actual models

The integration tests are disabled by default and require:
- LiteRT-LM library dependency
- A valid model file (set via MODEL_PATH env var)
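
As a sketch, the disabled-by-default gating might look like the following; the class and test names are illustrative, not the module's actual test code:

```kotlin
import org.junit.jupiter.api.Assumptions.assumeTrue
import org.junit.jupiter.api.Test

class LiteRTLMIntegrationTest {
    @Test
    fun `responds from a local model`() {
        // Runs only when MODEL_PATH points at a real .litertlm file;
        // otherwise the test is skipped rather than failed.
        val modelPath = System.getenv("MODEL_PATH")
        assumeTrue(modelPath != null, "MODEL_PATH not set; skipping integration test")

        // ... construct the client with modelPath and assert on a completion ...
    }
}
```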
Add prebuilt native libraries from google-ai-edge/LiteRT-LM for
Android ARM64 platform:
- libGemmaModelConstraintProvider.so
- libLiteRtGpuAccelerator.so
- libLiteRtOpenClAccelerator.so
- libLiteRtTopKOpenClSampler.so
- libLiteRtTopKWebGpuSampler.so
- libLiteRtWebGpuAccelerator.so

These libraries enable GPU-accelerated inference on Android devices.

Source: https://github.com/google-ai-edge/LiteRT-LM/tree/main/prebuilt/android_arm64

Model files (.litertlm) are too large for git. Users should download
models separately for testing.

…support

Addresses several implementation issues:

1. Conversation history support (issue 2):
   - Now processes all messages in the prompt, not just the last user message
   - Maintains context through multi-turn conversations
   - Handles System, User, Assistant, Tool.Call, Tool.Result, Reasoning

2. Multimodal content handling (issue 3):
   - Added support for Image content via Content.ImageBytes
   - Added support for Audio content via Content.AudioBytes
   - Validates model capabilities before processing
   - File attachments converted to text representation

3. Configurable sampler (issue 4):
   - Added defaultTopK, defaultTopP, defaultTemperature constructor params
   - Temperature still overridable via prompt.params (see the sketch after this list)

4. Tool support (issue 6):
   - Tools parameter accepted in createConversationConfig
   - Added TODO noting LiteRT-LM uses annotation-based tool registration
   - Tool calls/results from history preserved as context strings
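
A sketch of the configurable defaults from item 3 at the call site; the parameter names defaultTopK, defaultTopP, and defaultTemperature come from the commit message, while modelPath and the per-prompt override syntax are assumptions:

```kotlin
// Constructor defaults (names per the commit); modelPath is assumed.
val client = LiteRTLMClient(
    modelPath = "/models/gemma-3n-e4b.litertlm",
    defaultTopK = 40,
    defaultTopP = 0.95,
    defaultTemperature = 0.8,
)

// A temperature set in prompt.params still overrides the default:
val request = prompt("demo", LLMParams(temperature = 0.2)) {
    user("Answer in one word.")
}
```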
These binaries are for the Android NDK and are not usable by the JVM-only
LiteRT-LM client module. Users targeting Android should use the
litertlm-android dependency directly which includes the native libs.

- executeStreaming now passes the tools parameter instead of emptyList()
- Added guard for empty content parts in buildUserMessage

Add expect/actual pattern for cross-platform LiteRT-LM client creation:
- commonMain: LiteRTLMClientConfig and factory function declarations
- jvmMain: Full implementation using LiteRTLMClient
- androidMain: Stub with guidance for adding litertlm-android dependency
- appleMain/jsMain/wasmJsMain: Stubs returning UnsupportedOperationException

Also update build.gradle.kts to work with full KMP via convention plugin.
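
Roughly, the expect/actual split described above looks like this; the config field and error message are placeholders, and each actual lives in its own source set:

```kotlin
// commonMain: shared declaration usable from any target.
expect fun createLiteRTLMClient(config: LiteRTLMClientConfig): LLMClient

// jvmMain: full implementation backed by LiteRTLMClient.
actual fun createLiteRTLMClient(config: LiteRTLMClientConfig): LLMClient =
    LiteRTLMClient(modelPath = config.modelPath) // assumed field name

// appleMain / jsMain / wasmJsMain: stubs that fail loudly on unsupported targets.
actual fun createLiteRTLMClient(config: LiteRTLMClientConfig): LLMClient =
    throw UnsupportedOperationException(
        "LiteRT-LM is only supported on JVM; on Android, use the litertlm-android dependency."
    )
```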
Refactored configuration and client to match official LiteRT-LM patterns:
- Add LiteRTLMEngineConfig with visionBackend, audioBackend, maxNumTokens
- Add LiteRTLMSamplerConfig with Double types and seed parameter
- Add NPU backend option
- Add require() validation in config classes (matching official style)
- Add ImageFile/AudioFile support for file:// URLs
- Add cancelProcess() for conversation cancellation
- Use @Volatile and a synchronized lock pattern for thread safety
- Update KDoc with example usage matching official documentation style
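
For example, the require()-based validation style mentioned above typically reads like this; the field names echo the commit, but the exact bounds and defaults are illustrative:

```kotlin
// Illustrative only: the real class is LiteRTLMSamplerConfig, but its exact
// fields, defaults, and bounds may differ from this sketch.
data class LiteRTLMSamplerConfig(
    val topK: Int = 40,
    val topP: Double = 0.95,
    val temperature: Double = 0.8,
    val seed: Int? = null,
) {
    init {
        require(topK > 0) { "topK must be positive, was $topK" }
        require(topP in 0.0..1.0) { "topP must be in [0, 1], was $topP" }
        require(temperature >= 0.0) { "temperature must be non-negative, was $temperature" }
    }
}
```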
… duplication

- Add sendMultimodal() private helper for sync multimodal sends
- Add sendMultimodalStreaming() private helper for async multimodal sends
- Public API unchanged, internal code quality improved

- Add back agent modules to settings.gradle.kts
- Add dokka entries for agent modules in build.gradle.kts
- Restore test-utils dependency in litertlm-client commonTest


Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

maceip force-pushed the add-litert-lm-provider branch from 3f84fe4 to b4af5ca on January 9, 2026 at 10:18