`tail -f` for GPU clouds. Survives disconnects, aggregates multi-node.
Stop losing your training logs when SSH drops. Watch from anywhere, reconnect seamlessly.
You're training a model on RunPod, Vast.ai, or Lambda. You SSH in, start training, and:
- Your terminal disconnects after an hour
- You lose visibility into what's happening
- You resort to tmux hacks just to keep logs alive
- Multi-node training means logs scattered across machines
```bash
# On your GPU instance
pip install logtap
logtap collect &
python train.py 2>&1 | logtap ingest run1
```

```bash
# From your laptop (or phone)
logtap tail run1 --follow
# Connection drops... reconnects automatically
# "reconnected (missed 0 lines)"
```

On the GPU instance:
```bash
pip install logtap
export LOGTAP_API_KEY=secret
logtap collect --port 8000
```

Start training and stream logs:

```bash
python train.py 2>&1 | logtap ingest run1 --tag node=$(hostname)
```

From your laptop:
```bash
export LOGTAP_SERVER=http://<gpu-ip>:8000
export LOGTAP_API_KEY=secret
logtap tail run1 --follow
```

Disconnect, close your terminal, or switch networks.
Re-run `logtap tail` anytime to resume where you left off.
Works the same on RunPod, Vast.ai, Lambda, and any ephemeral GPU cloud.
- **Survives Disconnects** - Resume from where you left off with cursor-based streaming
- **Pipe-Friendly** - Works with any training script via stdin
- **Multi-Node Ready** - Tag runs with `node=gpu1` and filter/aggregate
- **Zero Infra** - No database, no complex setup, just pip install
- **Lightweight** - <50MB memory, append-only file storage
tmux and mosh help keep SSH sessions alive. logtap solves a different problem.
- SSH can still drop (web terminals, proxies, idle timeouts)
- tmux doesn't aggregate logs across machines
- tmux can't be viewed from another device without SSH
- tmux sessions die when ephemeral instances stop
logtap streams logs over HTTP:
- survives disconnects
- resumes without gaps
- aggregates multi-node training via tags
- works from anywhere (no SSH required)
You can still use tmux. You just don't have to rely on it.
```bash
pip install logtap
logtap collect --api-key secret
python train.py 2>&1 | logtap ingest run1 --api-key secret
export LOGTAP_SERVER=http://your-gpu-ip:8000
export LOGTAP_API_KEY=secret
logtap tail run1 --follow
```

| Command | Description |
|---|---|
| `logtap collect` | Start collector server (accepts ingested runs) |
| `logtap ingest <run>` | Pipe stdin to collector |
| `logtap tail <run>` | Tail a run with `--follow` for streaming |
| `logtap runs` | List active runs |
| `logtap doctor` | Check server connectivity and diagnose issues |
```bash
# Auto-generate run name
python train.py | logtap ingest

# Add tags for multi-node
python train.py | logtap ingest run1 --tag node=gpu1 --tag rank=0

# Quiet mode (no status messages)
python train.py | logtap ingest run1 --quiet
```

```bash
# Follow mode (like tail -f)
logtap tail run1 --follow

# Resume from specific cursor (survives disconnects!)
logtap tail run1 --follow --since 5000

# Filter by tag
logtap tail run1 --tag node=gpu1

# Output formats
logtap tail run1 --output jsonl | jq '.line'
```

```bash
logtap collect \
  --port 8000 \
  --api-key secret \
  --data-dir ~/.logtap/runs \
  --max-disk-mb 5000 \
  --retention-hours 72
```

Tag each node and aggregate:
```bash
# Node 1
python train.py | logtap ingest run1 --tag node=gpu1

# Node 2
python train.py | logtap ingest run1 --tag node=gpu2

# Watch all nodes
logtap tail run1 --follow

# Watch specific node
logtap tail run1 --follow --tag node=gpu1
```

| Variable | Default | Description |
|---|---|---|
| `LOGTAP_SERVER` | `http://localhost:8000` | Collector URL |
| `LOGTAP_API_KEY` | - | API key for auth |

Set these to avoid typing `--server` and `--api-key` every time.
- Collector writes logs to append-only files with cursor tracking
- Ingest streams stdin over HTTP chunked POST
- Tail uses SSE (Server-Sent Events) with resume support
- Reconnect passes `?since=<cursor>` to continue without gaps
No database. No message queue. Just files and HTTP.
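The append-only-file-plus-cursor design can be sketched in a few lines of Python (a hypothetical illustration of the idea, not logtap's actual storage code):

```python
import os
import tempfile

class RunLog:
    """Append-only line store; the cursor is just a line count."""

    def __init__(self, path):
        self.path = path
        open(path, "a").close()  # create the file if it doesn't exist

    def append(self, line):
        # Only ever append; earlier lines are immutable.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(line.rstrip("\n") + "\n")

    def read_since(self, cursor):
        """Yield (cursor, line) pairs for every line after `cursor`.

        A client that reconnects with its last cursor receives exactly
        the lines it missed -- no gaps, no duplicates.
        """
        with open(self.path, encoding="utf-8") as f:
            for i, line in enumerate(f, start=1):
                if i > cursor:
                    yield i, line.rstrip("\n")

log = RunLog(os.path.join(tempfile.mkdtemp(), "run1.log"))
log.append("step 1: loss=2.31")
log.append("step 2: loss=1.98")
# The client saw up to cursor 1, then disconnected; resuming:
print(list(log.read_since(1)))  # [(2, 'step 2: loss=1.98')]
```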
For scripting or custom integrations:
| Endpoint | Description |
|---|---|
| `POST /runs/{id}/ingest` | Stream lines (chunked POST) |
| `GET /runs/{id}/stream` | SSE stream with `?since=&follow=` |
| `GET /runs/{id}/query` | Query with `?from=&to=&search=` |
| `GET /runs` | List runs |
| `GET /health` | Health check with capabilities |
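As an illustration, a minimal stdlib-only client for these endpoints might look like the following. The SSE parsing follows the generic `data:` framing; the auth header name and exact payload format are assumptions, not logtap's documented wire format:

```python
import urllib.request

SERVER = "http://localhost:8000"
API_KEY = "secret"

def ingest_request(run_id, lines):
    """Build a POST to /runs/{id}/ingest carrying newline-delimited lines."""
    body = "".join(line + "\n" for line in lines).encode("utf-8")
    return urllib.request.Request(
        f"{SERVER}/runs/{run_id}/ingest",
        data=body,
        headers={"X-API-Key": API_KEY},  # header name is an assumption
        method="POST",
    )

def parse_sse(chunk):
    """Extract the `data:` payloads from a Server-Sent Events text chunk."""
    return [
        line[len("data:"):].strip()
        for line in chunk.splitlines()
        if line.startswith("data:")
    ]

req = ingest_request("run1", ["step 1: loss=2.31"])
print(req.full_url)  # http://localhost:8000/runs/run1/ingest

# A chunk read from GET /runs/run1/stream?since=5000&follow=1 looks like:
sse = "data: step 5001: loss=1.2\n\ndata: step 5002: loss=1.1\n\n"
print(parse_sse(sse))  # ['step 5001: loss=1.2', 'step 5002: loss=1.1']
```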
logtap also works as a simple remote log viewer (the original use case):
```bash
# On server with log files
logtap serve --log-dir /var/log

# From client
logtap tail syslog --server http://myserver:8000 --follow
logtap query auth.log --regex "Failed password"
```

- **API Key Auth** - Optional but recommended for production
- **Path Traversal Protection** - Comprehensive defense with symlink-safe containment checks (see SECURITY.md)
- **ReDoS Protection** - Uses google-re2 for guaranteed linear-time regex matching
- **Read-Only by Default** - Collector only writes to its data directory
- **Input Validation** - Rejects control characters, NUL bytes, and malicious path patterns
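A symlink-safe containment check of the kind described is typically built on `os.path.realpath`, which resolves both symlinks and `..` segments before comparing prefixes (a generic sketch, not logtap's actual implementation; the data directory path is hypothetical):

```python
import os

def is_contained(base_dir, requested):
    """Return True only if `requested` resolves to a path inside `base_dir`."""
    base = os.path.realpath(base_dir)
    # realpath resolves symlinks and ".." first, so a symlink pointing
    # outside base_dir is rejected the same way a "../" escape is.
    target = os.path.realpath(os.path.join(base, requested))
    return target == base or target.startswith(base + os.sep)

print(is_contained("/srv/logtap-data", "run1.log"))       # True
print(is_contained("/srv/logtap-data", "../etc/passwd"))  # False
```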
```bash
git clone https://github.com/cainky/logtap.git
cd logtap

# Install with uv
uv sync --extra dev

# Run tests
uv run pytest

# Run collector in dev mode
uv run logtap collect --reload
```

GPL v3 - see LICENSE
Kyle Cain - @cainky