Merged
84 changes: 50 additions & 34 deletions AGENTS.md
@@ -122,23 +122,17 @@ silo/
**Config Struct** (`manager.go`):
```go
type Config struct {
Version string // CLI version
ImageTag string // Docker image tag
Port int // Frontend port
LLMBaseURL string // Inference engine URL
DefaultModel string // Default LLM model
InferencePort int
InferenceModelFile string // GGUF filename
InferenceShmSize string
InferenceContextSize int
InferenceBatchSize int
InferenceGPULayers int // 999 = all layers on GPU
InferenceTensorSplit string
InferenceMainGPU int
InferenceThreads int
InferenceHTTPThreads int
InferenceFit string
InferenceGPUDevices string // Quoted CSV: "0", "1", "2"
Version string // CLI version
ImageTag string // Docker image tag
Port int // Frontend port
LLMBaseURL string // Inference engine URL
DefaultModel string // Default LLM model
EnableProxyAgent bool // Enable remote proxy agent
EnableDeepResearch bool // Enable deep research service
DeepResearchImage string // GHCR image (sha-tagged)
DeepResearchPort int // Default 3031
SearchProvider string // "perplexity" or "tavily"
PerplexityAPIKey string // Required for deep research
}
```

@@ -224,29 +218,18 @@ version: "0.1.2" # CLI version
image_tag: "0.1.2" # Docker image tag
port: 80 # Frontend port
llm_base_url: "http://inference-engine:30000/v1"
default_model: "GLM-4.7-Q4_K_M.gguf"
inference_port: 30000
inference_model_file: "GLM-4.7-Q4_K_M.gguf"
inference_shm_size: "16g"
inference_context_size: 8192
inference_batch_size: 256
inference_gpu_layers: 999 # 999 = all layers on GPU
inference_tensor_split: "1,1,1"
inference_main_gpu: 0
inference_threads: 16
inference_http_threads: 8
inference_fit: "off"
inference_gpu_devices: "\"0\", \"1\", \"2\"" # Quoted CSV for YAML
default_model: "model-name"

# Service toggles
enable_inference_engine: false # Enable llama.cpp inference
enable_proxy_agent: false # Enable remote proxy agent
enable_deep_research: true # Enable deep research service

# Deep research configuration
deep_research_image: "ghcr.io/eternisai/deep_research:sha-XXXXXXX" # See manager.go for current default
# NOTE: Image uses SHA tags from GHCR (not semver from Docker Hub)
# The default is pinned in manager.go and auto-updated during silo upgrade
deep_research_image: "ghcr.io/eternisai/deep_research:sha-XXXXXXX"
deep_research_port: 3031
search_provider: "perplexity"
search_provider: "perplexity" # "perplexity" or "tavily"
perplexity_api_key: "" # Required for deep research web search
```
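
As a rough sketch of how this file could be read back into the `Config` struct above (assuming `gopkg.in/yaml.v3`; the actual loader in `internal/config` may differ, and `LoadFromFile` is an illustrative name):

```go
// Hypothetical sketch only: reading config.yml into Config via the yaml
// struct tags shown in manager.go. The real loader may differ.
package config

import (
	"os"

	"gopkg.in/yaml.v3"
)

// LoadFromFile unmarshals a config.yml like the example above.
func LoadFromFile(path string) (*Config, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var cfg Config
	if err := yaml.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}
```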

@@ -265,10 +248,43 @@
1. **Template-driven Configuration**: docker-compose.yml and config.yml generated from embedded templates with user values (see the sketch after this list)
2. **Single-responsibility Packages**: Each package handles one concern (installer, updater, docker, config)
3. **Stateful Operations**: Tracks install timestamps and versions in state.json
4. **Selective Image Pulls**: Only pulls backend/frontend (inference engine image is larger, pre-packaged)
4. **Selective Image Pulls**: Pulls backend/frontend/deep-research; inference managed separately via docker run
5. **Non-blocking Updates**: Version checks warn but don't fail operations
6. **Graceful Degradation**: Warns on errors but continues where possible
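
A minimal sketch of the template-driven approach in principle 1, assuming the templates are embedded from `internal/assets` with `go:embed`; `Render` and the exact file layout are illustrative, not the real API:

```go
// Sketch only: render an embedded template with the user's config values.
package assets

import (
	"bytes"
	"embed"
	"text/template"
)

//go:embed config.yml.tmpl docker-compose.yml.tmpl
var templates embed.FS

// Render executes the named template with the given values and returns
// the generated file contents.
func Render(name string, values any) ([]byte, error) {
	tmpl, err := template.ParseFS(templates, name)
	if err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, values); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}
```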

## Deep Research Deployment

The deep research service uses a different deployment model than frontend/backend:

| Aspect | Frontend / Backend | Deep Research |
|--------|-------------------|---------------|
| **Registry** | Docker Hub (`eternis/silo-box-*`) | GHCR (`ghcr.io/eternisai/deep_research`) |
| **Versioning** | Semantic versioning (`0.1.2`) | Commit SHA tags (`sha-2e9f2ef`) |
| **Update Source** | CLI queries Docker Hub API for latest | Version pinned as `DefaultDeepResearchImage` in `manager.go` |
| **Pull Criticality** | Critical (blocks upgrade on failure) | Non-critical (warns but continues) |

### Update Flow

1. Push changes to `silo_deep_research` repo
2. GitHub Actions builds and pushes to GHCR with `sha-{commit}` tag
3. Update `DefaultDeepResearchImage` constant in `internal/config/manager.go` (see the sketch after this list)
4. Release new CLI version (`gh workflow run Release`)
5. Users run `silo upgrade` to get new image
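
Step 3 amounts to bumping a single pinned constant; illustratively (the SHA placeholder is kept as in the docs, not a real tag):

```go
// Illustrative only: the pinned default in internal/config/manager.go.
const DefaultDeepResearchImage = "ghcr.io/eternisai/deep_research:sha-XXXXXXX"
```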

### GHCR Authentication

If the deep research image is private, users need GHCR auth:

```bash
# Create PAT (classic) with read:packages scope
# If org uses SAML SSO, authorize PAT for the org
echo "YOUR_PAT" | docker login ghcr.io -u YOUR_USERNAME --password-stdin
```

### Graceful Pull Handling

The CLI pulls services individually. If the deep research image fails to pull (for example, missing GHCR auth), the CLI logs a warning but continues deploying frontend/backend.
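
A rough sketch of that behaviour, assuming a minimal puller interface; the real types and function names in `internal/docker` and the updater differ:

```go
// Sketch: frontend/backend pulls are critical, deep research is best-effort.
package deploy

import (
	"context"
	"fmt"
	"log"
)

// ImagePuller abstracts whatever the docker package exposes for image pulls.
type ImagePuller interface {
	Pull(ctx context.Context, image string) error
}

// PullAll aborts on a failed critical pull but only warns when the deep
// research image cannot be pulled.
func PullAll(ctx context.Context, p ImagePuller, critical []string, deepResearchImage string) error {
	for _, img := range critical {
		if err := p.Pull(ctx, img); err != nil {
			return fmt.Errorf("pulling %s: %w", img, err) // critical: abort
		}
	}
	if deepResearchImage != "" {
		if err := p.Pull(ctx, deepResearchImage); err != nil {
			// Typical cause: missing GHCR auth for a private image.
			log.Printf("warning: failed to pull %s, continuing without deep research: %v", deepResearchImage, err)
		}
	}
	return nil
}
```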

## Development Workflow

```bash
```
52 changes: 37 additions & 15 deletions internal/assets/config.yml.tmpl
@@ -6,21 +6,43 @@ port: {{.Port}}
llm_base_url: "{{.LLMBaseURL}}"
default_model: "{{.DefaultModel}}"

# Inference Engine Configuration
inference_port: {{.InferencePort}}
inference_model_file: "{{.InferenceModelFile}}"
inference_shm_size: "{{.InferenceShmSize}}"
inference_context_size: {{.InferenceContextSize}}
inference_batch_size: {{.InferenceBatchSize}}
inference_gpu_layers: {{.InferenceGPULayers}}
inference_tensor_split: "{{.InferenceTensorSplit}}"
inference_main_gpu: {{.InferenceMainGPU}}
inference_threads: {{.InferenceThreads}}
inference_http_threads: {{.InferenceHTTPThreads}}
inference_fit: "{{.InferenceFit}}"
inference_gpu_devices: '{{.InferenceGPUDevices}}'

# Service Toggles
enable_inference_engine: {{.EnableInferenceEngine}}
enable_proxy_agent: {{.EnableProxyAgent}}
proxy_server_url: "{{.ProxyServerURL}}"
enable_deep_research: {{.EnableDeepResearch}}

# Deep Research
deep_research_image: "{{.DeepResearchImage}}"
deep_research_port: {{.DeepResearchPort}}
search_provider: "{{.SearchProvider}}"
perplexity_api_key: "{{.PerplexityAPIKey}}"

# SGLang Inference Engine (managed separately from docker-compose)
sglang:
enabled: {{.SGLang.Enabled}}
image: "{{.SGLang.Image}}"
container_name: "{{.SGLang.ContainerName}}"
port: {{.SGLang.Port}}
gpu_devices:{{- if .SGLang.GPUDevices}}
{{- range .SGLang.GPUDevices}}
- "{{.}}"
{{- end}}
{{- else}} []
{{- end}}
shm_size: "{{.SGLang.ShmSize}}"
model_path: "{{.SGLang.ModelPath}}"
huggingface_cache: "{{.SGLang.HuggingFaceCache}}"
dp_size: {{.SGLang.DPSize}}
tp_size: {{.SGLang.TPSize}}
max_running_requests: {{.SGLang.MaxRunningRequests}}
max_total_tokens: {{.SGLang.MaxTotalTokens}}
context_length: {{.SGLang.ContextLength}}
mem_fraction_static: {{.SGLang.MemFractionStatic}}
chunked_prefill_size: {{.SGLang.ChunkedPrefillSize}}
schedule_policy: "{{.SGLang.SchedulePolicy}}"
kv_cache_dtype: "{{.SGLang.KVCacheDtype}}"
attention_backend: "{{.SGLang.AttentionBackend}}"
disable_radix_cache: {{.SGLang.DisableRadixCache}}
reasoning_parser: "{{.SGLang.ReasoningParser}}"
trust_remote_code: {{.SGLang.TrustRemoteCode}}
log_level: "{{.SGLang.LogLevel}}"
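
The `{{.SGLang.*}}` references above imply a nested struct roughly like the following; this is inferred from the template fields only, so the actual types and tags in `internal/config` may differ:

```go
// Inferred sketch of the nested SGLang config the template expects.
package config

type SGLangConfig struct {
	Enabled            bool     `yaml:"enabled"`
	Image              string   `yaml:"image"`
	ContainerName      string   `yaml:"container_name"`
	Port               int      `yaml:"port"`
	GPUDevices         []string `yaml:"gpu_devices"`
	ShmSize            string   `yaml:"shm_size"`
	ModelPath          string   `yaml:"model_path"`
	HuggingFaceCache   string   `yaml:"huggingface_cache"`
	DPSize             int      `yaml:"dp_size"`
	TPSize             int      `yaml:"tp_size"`
	MaxRunningRequests int      `yaml:"max_running_requests"`
	MaxTotalTokens     int      `yaml:"max_total_tokens"`
	ContextLength      int      `yaml:"context_length"`
	MemFractionStatic  float64  `yaml:"mem_fraction_static"`
	ChunkedPrefillSize int      `yaml:"chunked_prefill_size"`
	SchedulePolicy     string   `yaml:"schedule_policy"`
	KVCacheDtype       string   `yaml:"kv_cache_dtype"`
	AttentionBackend   string   `yaml:"attention_backend"`
	DisableRadixCache  bool     `yaml:"disable_radix_cache"`
	ReasoningParser    string   `yaml:"reasoning_parser"`
	TrustRemoteCode    bool     `yaml:"trust_remote_code"`
	LogLevel           string   `yaml:"log_level"`
}
```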
43 changes: 1 addition & 42 deletions internal/assets/docker-compose.yml.tmpl
@@ -53,48 +53,7 @@ services:
depends_on:
- backend
restart: unless-stopped
{{if .EnableInferenceEngine}}
inference-engine:
image: ghcr.io/ggml-org/llama.cpp:full-cuda
ports:
- "{{.InferencePort}}:{{.InferencePort}}"
volumes:
- {{.DataDir}}/models:/models:ro
shm_size: '{{.InferenceShmSize}}'
ipc: host
command:
- --server
- -m
- /models/{{.InferenceModelFile}}
- -c
- "{{.InferenceContextSize}}"
- -b
- "{{.InferenceBatchSize}}"
- -ngl
- "{{.InferenceGPULayers}}"
- --tensor-split
- {{.InferenceTensorSplit}}
- -mg
- "{{.InferenceMainGPU}}"
- -t
- "{{.InferenceThreads}}"
- --threads-http
- "{{.InferenceHTTPThreads}}"
- -fit
- "{{.InferenceFit}}"
- --host
- 0.0.0.0
- --port
- "{{.InferencePort}}"
deploy:
resources:
reservations:
devices:
- driver: nvidia
device_ids: [{{.InferenceGPUDevices}}]
capabilities: [gpu]
restart: unless-stopped
{{end}}

{{if .EnableProxyAgent}}
silo-proxy-agent:
image: eternis/silo-proxy-agent
13 changes: 6 additions & 7 deletions internal/cli/root.go
@@ -9,13 +9,12 @@ import (
)

var (
verbose bool
configDir string
imageTag string
port int
enableInferenceEngine bool
enableProxyAgent bool
log *logger.Logger
verbose bool
configDir string
imageTag string
port int
enableProxyAgent bool
log *logger.Logger
)

var rootCmd = &cobra.Command{
2 changes: 0 additions & 2 deletions internal/cli/up.go
@@ -38,7 +38,6 @@ By default, the inference engine is NOT started. Use --all to include it.`,
if port > 0 {
cfg.Port = port
}
cfg.EnableInferenceEngine = enableInferenceEngine
cfg.EnableProxyAgent = enableProxyAgent

if err := config.Validate(cfg); err != nil {
@@ -130,7 +129,6 @@ func init() {

upCmd.Flags().StringVar(&imageTag, "image-tag", config.DefaultImageTag, "Docker image tag (first install only)")
upCmd.Flags().IntVar(&port, "port", config.DefaultPort, "Application port (first install only)")
upCmd.Flags().BoolVar(&enableInferenceEngine, "enable-inference-engine", config.DefaultEnableInferenceEngine, "Enable local inference engine (first install only)")
upCmd.Flags().BoolVar(&enableProxyAgent, "enable-proxy-agent", config.DefaultEnableProxyAgent, "Enable proxy agent (first install only)")
upCmd.Flags().BoolVar(&upAll, "all", false, "Include inference engine")
}
80 changes: 7 additions & 73 deletions internal/config/manager.go
@@ -17,24 +17,9 @@ const (
DefaultLLMBaseURL = "http://host.docker.internal:30000/v1"
DefaultModel = "glm47-awq"

// Inference engine defaults (legacy llama.cpp - kept for compatibility)
DefaultInferencePort = 30000
DefaultInferenceModelFile = "GLM-4.7-Q4_K_M.gguf"
DefaultInferenceShmSize = "16g"
DefaultInferenceContextSize = 8192
DefaultInferenceBatchSize = 256
DefaultInferenceGPULayers = 999
DefaultInferenceTensorSplit = "1,1,1"
DefaultInferenceMainGPU = 0
DefaultInferenceThreads = 16
DefaultInferenceHTTPThreads = 8
DefaultInferenceFit = "off"
DefaultInferenceGPUDevices = `"0", "1", "2"`

// Service toggles
DefaultEnableInferenceEngine = false
DefaultEnableProxyAgent = false
DefaultEnableDeepResearch = true
DefaultEnableProxyAgent = false
DefaultEnableDeepResearch = true
Comment on lines +21 to +22
⚠️ Potential issue | 🔴 Critical

DefaultEnableDeepResearch = true will be impossible to override to false via config file due to mergeConfigs logic.

The isZeroValue helper (line 246) treats false as the zero value for booleans. When LoadOrDefault merges an existing config with defaults, any enable_deep_research: false set by the user will be detected as "zero" and silently replaced with the default true. This means users cannot disable deep research once it's enabled by default.

The same issue affects SGLang.TrustRemoteCode (default true) and SGLang.DisableRadixCache (default true).

Possible fixes:

  1. Change the merge strategy for bools to always prefer the existing config's value (i.e., never treat a bool as "missing").
  2. Use *bool (pointer) for toggles so that nil represents "not set" vs explicit false.
  3. Track which fields were explicitly present in the YAML before merging.
Option 1: Always prefer existing bool values in merge
 // mergeStructFields recursively merges struct fields
 func mergeStructFields(existing, defaults, result reflect.Value) {
 	for i := 0; i < existing.NumField(); i++ {
 		existingField := existing.Field(i)
 		defaultField := defaults.Field(i)
 		resultField := result.Field(i)
 
 		if !resultField.CanSet() {
 			continue
 		}
 
 		// For nested structs, merge fields recursively
 		if existingField.Kind() == reflect.Struct {
 			mergeStructFields(existingField, defaultField, resultField)
 			continue
 		}
 
+		// Bools have no distinguishable "unset" zero value;
+		// always keep whatever the existing config says.
+		if existingField.Kind() == reflect.Bool {
+			resultField.Set(existingField)
+			continue
+		}
+
 		if isZeroValue(existingField) {
 			resultField.Set(defaultField)
 		} else {
 			resultField.Set(existingField)
 		}
 	}
 }

Note: Option 1 means a brand-new config file (all bools at Go zero = false) would not pick up true defaults, so you'd need to ensure NewDefaultConfig is used for fresh installs rather than going through the merge path. The existing code already does this (line 185), but verify the full flow.

```bash
#!/bin/bash
# Verify how LoadOrDefault handles fresh install vs existing config
# Check if there's a code path where mergeConfigs is called for a brand-new config
ast-grep --pattern $'func LoadOrDefault($_, $_) ($_, $_) {
  $$$
}'
```
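
For comparison, a minimal sketch of option 2 (pointer booleans, so an absent key is distinguishable from an explicit `false`); the names are illustrative:

```go
// Sketch of option 2: *bool lets nil mean "not set in the YAML file".
package config

// toggles shows only the fields relevant to the merge problem.
type toggles struct {
	EnableDeepResearch *bool `yaml:"enable_deep_research"`
}

// boolOrDefault resolves a pointer toggle against its default value.
func boolOrDefault(v *bool, def bool) bool {
	if v == nil {
		return def // key absent from config file: fall back to the default
	}
	return *v // explicit true/false always wins
}
```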


// Proxy agent defaults
DefaultProxyServerURL = "ballast.proxy.rlwy.net:16587"
@@ -121,24 +106,9 @@ type Config struct {
DataDir string `yaml:"-"`
SocketFile string `yaml:"-"`

// Inference engine configuration
InferencePort int `yaml:"inference_port"`
InferenceModelFile string `yaml:"inference_model_file"`
InferenceShmSize string `yaml:"inference_shm_size"`
InferenceContextSize int `yaml:"inference_context_size"`
InferenceBatchSize int `yaml:"inference_batch_size"`
InferenceGPULayers int `yaml:"inference_gpu_layers"`
InferenceTensorSplit string `yaml:"inference_tensor_split"`
InferenceMainGPU int `yaml:"inference_main_gpu"`
InferenceThreads int `yaml:"inference_threads"`
InferenceHTTPThreads int `yaml:"inference_http_threads"`
InferenceFit string `yaml:"inference_fit"`
InferenceGPUDevices string `yaml:"inference_gpu_devices"`

// Service toggles
EnableInferenceEngine bool `yaml:"enable_inference_engine"`
EnableProxyAgent bool `yaml:"enable_proxy_agent"`
EnableDeepResearch bool `yaml:"enable_deep_research"`
EnableProxyAgent bool `yaml:"enable_proxy_agent"`
EnableDeepResearch bool `yaml:"enable_deep_research"`

// Proxy agent configuration
ProxyServerURL string `yaml:"proxy_server_url"`
@@ -173,24 +143,9 @@ func NewDefaultConfig(paths *Paths) *Config {
DataDir: paths.AppDataDir,
SocketFile: paths.SocketFile,

// Inference engine defaults
InferencePort: DefaultInferencePort,
InferenceModelFile: DefaultInferenceModelFile,
InferenceShmSize: DefaultInferenceShmSize,
InferenceContextSize: DefaultInferenceContextSize,
InferenceBatchSize: DefaultInferenceBatchSize,
InferenceGPULayers: DefaultInferenceGPULayers,
InferenceTensorSplit: DefaultInferenceTensorSplit,
InferenceMainGPU: DefaultInferenceMainGPU,
InferenceThreads: DefaultInferenceThreads,
InferenceHTTPThreads: DefaultInferenceHTTPThreads,
InferenceFit: DefaultInferenceFit,
InferenceGPUDevices: DefaultInferenceGPUDevices,

// Service toggles
EnableInferenceEngine: DefaultEnableInferenceEngine,
EnableProxyAgent: DefaultEnableProxyAgent,
EnableDeepResearch: DefaultEnableDeepResearch,
EnableProxyAgent: DefaultEnableProxyAgent,
EnableDeepResearch: DefaultEnableDeepResearch,

// Proxy agent defaults
ProxyServerURL: DefaultProxyServerURL,
@@ -360,35 +315,14 @@ func Validate(config *Config) error {
return fmt.Errorf("default_model cannot be empty")
}

// Inference engine validation (only when enabled)
if config.EnableInferenceEngine {
if config.InferencePort < 1 || config.InferencePort > 65535 {
return fmt.Errorf("inference_port must be between 1 and 65535")
}
if config.InferenceModelFile == "" {
return fmt.Errorf("inference_model_file cannot be empty")
}
if config.InferenceContextSize < 1 {
return fmt.Errorf("inference_context_size must be positive")
}
if config.InferenceBatchSize < 1 {
return fmt.Errorf("inference_batch_size must be positive")
}
if config.InferenceThreads < 1 {
return fmt.Errorf("inference_threads must be positive")
}
if config.InferenceHTTPThreads < 1 {
return fmt.Errorf("inference_http_threads must be positive")
}
}

// Proxy agent validation (only when enabled)
if config.EnableProxyAgent {
if config.ProxyServerURL == "" {
return fmt.Errorf("proxy_server_url cannot be empty when proxy agent is enabled")
}
}


return nil
}
