Skip to content

Registry: AgentAPI base module shutdown script and state persistence #1257

@mafredri

Description

@mafredri

Repo: coder/registry

Scope: Update the AgentAPI base module to configure state persistence and add shutdown script for log capture.

Changes:

  • Add state_file_path variable (default /home/coder/.agentapi/state.json)
  • Add pid_file_path variable (default /tmp/agentapi.pid)
  • Add enable_state_persistence variable (default true)
  • Pass flags to AgentAPI startup: --state-file, --load-state, --save-state, --pid-file
  • Add coder_script resource with run_on_stop = true that:
    1. Sends SIGUSR1 to AgentAPI to save state
    2. Fetches last 10 messages from AgentAPI /messages (best-effort, may be fewer if truncated)
    3. Truncates if payload > 64KB
    4. POSTs to coderd snapshot endpoint
    5. Sends SIGTERM to AgentAPI

Script environment (available in run_on_stop context):

  • CODER_AGENT_TOKEN: Agent auth token for coderd API calls
  • CODER_AGENT_URL: Base URL for coderd API (via agent manifest)
  • CODER_WORKSPACE_NAME: Workspace name (via agent manifest)

Note on task ID: CODER_TASK_ID is available during Terraform provisioning but NOT during agent runtime. The shutdown script must have the task ID embedded at provisioning time via Terraform interpolation (see Terraform example below).

Startup script change (in main.sh):

  • AgentAPI writes its own PID file via --pid-file. Registry scripts should not rely on $! for the AgentAPI PID.

Terraform resource (in registry module):

resource "coder_script" "shutdown" {
  agent_id     = coder_agent.main.id
  run_on_stop  = true
  script       = <<-EOF
    #!/bin/bash
    set -euo pipefail

    # Task ID embedded at provisioning time.
    TASK_ID="${data.coder_task.me.id}"
    AGENTAPI_PID=$(cat /tmp/agentapi.pid 2>/dev/null || echo "")

    # Save state early (SIGUSR1 triggers save without exit).
    if [[ -n "$AGENTAPI_PID" ]] && kill -0 "$AGENTAPI_PID" 2>/dev/null; then
      kill -USR1 "$AGENTAPI_PID" || true
    fi

    # Capture and post snapshot (best-effort).
    if curl -sf http://localhost:4321/messages >/dev/null 2>&1; then
      # Fetch, truncate, post logic here
      ...
    fi

    # Terminate AgentAPI.
    if [[ -n "$AGENTAPI_PID" ]]; then
      kill -TERM "$AGENTAPI_PID" 2>/dev/null || true
    fi
  EOF
}

Files (in registry repo):

  • coder/modules/agentapi/main.tf
  • coder/modules/agentapi/scripts/main.sh (PID file)
  • coder/modules/agentapi/scripts/shutdown.sh (new)

Acceptance criteria:

  • AgentAPI starts with state persistence flags when enabled
  • Shutdown script captures and posts log snapshot
  • State file written to persistent storage location
  • Script handles missing AgentAPI process gracefully (no error exit)
  • Script uses set -euo pipefail but snapshot POST failures don't abort shutdown
  • Task ID correctly embedded via data.coder_task.me.id at provisioning time
  • AgentAPI writes PID file via --pid-file and shutdown script uses it for signaling

Dependencies:

References:

Metadata

Metadata

Assignees

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions