Repo: coder/registry
Scope: Update the AgentAPI base module to configure state persistence and add shutdown script for log capture.
Changes:
- Add state_file_path variable (default /home/coder/.agentapi/state.json)
- Add pid_file_path variable (default /tmp/agentapi.pid)
- Add enable_state_persistence variable (default true)
- Pass flags to AgentAPI startup: --state-file, --load-state, --save-state, --pid-file
- Add coder_script resource with run_on_stop = true that:
  - Sends SIGUSR1 to AgentAPI to save state
  - Fetches last 10 messages from AgentAPI /messages (best-effort, may be fewer if truncated)
  - Truncates if payload > 64KB
  - POSTs to coderd snapshot endpoint
  - Sends SIGTERM to AgentAPI
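The size cap in the list above could be sketched roughly as follows. This is only an illustration: the helper names and the MAX_SNAPSHOT_BYTES variable are hypothetical, and the issue does not say how truncation should keep the payload valid JSON, so the sketch shows a plain byte-level cut at 64 KB.

```shell
#!/bin/bash
# Hypothetical limit taken from the "payload > 64KB" rule in this issue.
MAX_SNAPSHOT_BYTES=65536

# Reads a payload on stdin and writes at most $1 bytes (default 64 KB) to stdout.
# Payloads at or under the limit pass through unchanged.
# Assumption: a raw byte cut is acceptable; it does not preserve JSON validity.
truncate_payload() {
  local limit="${1:-$MAX_SNAPSHOT_BYTES}"
  head -c "$limit"
}

# Prints the size in bytes of the payload read on stdin.
payload_size() {
  wc -c | tr -d '[:space:]'
}
```

A real implementation would more likely drop whole messages until the payload fits, which needs a JSON-aware tool and knowledge of the /messages response shape.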
Script environment (available in run_on_stop context):
- CODER_AGENT_TOKEN: Agent auth token for coderd API calls
- CODER_AGENT_URL: Base URL for coderd API (via agent manifest)
- CODER_WORKSPACE_NAME: Workspace name (via agent manifest)
Note on task ID: CODER_TASK_ID is available during Terraform provisioning but NOT during agent runtime. The shutdown script must have the task ID embedded at provisioning time via Terraform interpolation (see Terraform example below).
Startup script change (in main.sh):
- AgentAPI writes its own PID file via --pid-file. Registry scripts should not rely on $! for the AgentAPI PID.
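On the consuming side, reading the PID file and signaling the process might look like the sketch below. The function name signal_from_pid_file is hypothetical; the only behavior taken from this issue is that the PID comes from the file AgentAPI writes and that a missing process must not cause an error exit.

```shell
#!/bin/bash
# Hypothetical helper: send a signal to the process recorded in a PID file.
# Always returns 0, even when the file is missing, unreadable, malformed, or
# the process is already gone, so callers under `set -e` are not aborted.
signal_from_pid_file() {
  local pid_file="$1" signal="$2" pid=""
  if [[ -r "$pid_file" ]]; then
    pid=$(tr -d '[:space:]' < "$pid_file")
  fi
  # Accept only a purely numeric PID to avoid signaling something unintended.
  if [[ ! "$pid" =~ ^[0-9]+$ ]]; then
    echo "no usable PID in $pid_file" >&2
    return 0
  fi
  # kill -0 checks that the process exists without delivering a signal.
  if kill -0 "$pid" 2>/dev/null; then
    kill -s "$signal" "$pid" 2>/dev/null || true
  else
    echo "process $pid is not running" >&2
  fi
  return 0
}
```

For example, `signal_from_pid_file /tmp/agentapi.pid USR1` followed later by `signal_from_pid_file /tmp/agentapi.pid TERM` would mirror the save-then-terminate order described in this issue.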
Terraform resource (in registry module):
resource "coder_script" "shutdown" {
  agent_id    = coder_agent.main.id
  run_on_stop = true
  script      = <<-EOF
    #!/bin/bash
    set -euo pipefail

    # Task ID embedded at provisioning time.
    TASK_ID="${data.coder_task.me.id}"
    AGENTAPI_PID=$(cat /tmp/agentapi.pid 2>/dev/null || echo "")

    # Save state early (SIGUSR1 triggers save without exit).
    if [[ -n "$AGENTAPI_PID" ]] && kill -0 "$AGENTAPI_PID" 2>/dev/null; then
      kill -USR1 "$AGENTAPI_PID" || true
    fi

    # Capture and post snapshot (best-effort).
    if curl -sf http://localhost:4321/messages >/dev/null 2>&1; then
      # Fetch, truncate, post logic here
      ...
    fi

    # Terminate AgentAPI.
    if [[ -n "$AGENTAPI_PID" ]]; then
      kill -TERM "$AGENTAPI_PID" 2>/dev/null || true
    fi
  EOF
}
Files (in registry repo):
- coder/modules/agentapi/main.tf
- coder/modules/agentapi/scripts/main.sh (PID file)
- coder/modules/agentapi/scripts/shutdown.sh (new)
Acceptance criteria:
- AgentAPI starts with state persistence flags when enabled
- Shutdown script captures and posts log snapshot
- State file written to persistent storage location
- Script handles missing AgentAPI process gracefully (no error exit)
- Script uses set -euo pipefail but snapshot POST failures don't abort shutdown
- Task ID correctly embedded via data.coder_task.me.id at provisioning time
- AgentAPI writes PID file via --pid-file and shutdown script uses it for signaling
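One way to reconcile set -euo pipefail with the requirement that snapshot POST failures must not abort shutdown is a small wrapper that absorbs the failure. The sketch below is an assumption about how this could be done, not the module's actual code; best_effort is a hypothetical name.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical wrapper: run a command, log any failure to stderr, and always
# return 0 so that errexit does not terminate the calling script.
best_effort() {
  local status=0
  # The `||` keeps a non-zero exit from triggering errexit and records it.
  "$@" || status=$?
  if (( status != 0 )); then
    echo "best-effort step failed (exit $status): $*" >&2
  fi
  return 0
}
```

Wrapping the snapshot POST in `best_effort` would let the script continue on to the SIGTERM step even if coderd is unreachable or rejects the request.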
Dependencies:
- Tasks: Log snapshot storage endpoint #1253
- AgentAPI: State persistence #1256
- Bug: Agent loses authorization before run_on_stop script completes coder#19467
References:
- PRD: Start/Pause/Resume the Task Workspace
- RFC: Tasks: Start/Pause/Resume Lifecycle: Log Snapshots (Shutdown script implementation), AgentAPI State Persistence