Skip to content

Feat/api to provision cert#8

Merged
lwlee2608 merged 29 commits intomainfrom
feat/API-to-provision-cert
Feb 5, 2026
Merged

Feat/api to provision cert#8
lwlee2608 merged 29 commits intomainfrom
feat/API-to-provision-cert

Conversation

@JckHoe
Copy link
Contributor

@JckHoe JckHoe commented Jan 27, 2026

Summary by CodeRabbit

  • New Features

    • Agent certificate management endpoints: create, retrieve (ZIP download), and delete per-agent TLS certificates.
    • Admin API key authentication for certificate endpoints (expects X-API-Key header).
    • New TLS configuration options for CA/key, domain names, IP addresses, and agent certificate storage.
    • Convenience scripts to create, get, delete, and list agent certificates.
  • Documentation

    • Added a brief potential-issues note about certificate lifecycle and safety considerations.

@JckHoe JckHoe requested a review from lwlee2608 January 27, 2026 18:16
@coderabbitai
Copy link

coderabbitai bot commented Jan 27, 2026

Caution

Review failed

The pull request is closed.

Walkthrough

This PR adds an internal cert package and runtime certificate management, new TLS configuration fields, an admin API key config, API-key auth middleware, HTTP routes and handlers to create/get/delete per-agent certificates at /agents/:id/certificate (ZIP responses), wiring to initialize a cert service when gRPC TLS is enabled, and helper shell scripts to exercise the endpoints. TLS remains opt-in (still disabled by default).

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Router
    participant AuthMiddleware
    participant CertHandler
    participant CertService
    participant FileSystem

    Client->>Router: POST /agents/{id}/certificate (X-API-Key)
    Router->>AuthMiddleware: run APIKeyAuth
    AuthMiddleware->>Router: allow or reject
    Router->>CertHandler: CreateAgentCertificate
    CertHandler->>CertService: AgentCertExists?
    CertService->>FileSystem: stat/read agent cert files
    CertService-->>CertHandler: exists? / status
    CertHandler->>CertService: GenerateAgentCert(agentID)
    CertService->>CertService: load or generate CA and agent cert/key
    CertService->>FileSystem: write agent cert & key
    CertService-->>CertHandler: cert & key
    CertHandler->>CertService: GetCACert()
    CertService->>FileSystem: read CA cert
    CertService-->>CertHandler: CA cert bytes
    CertHandler->>CertHandler: create ZIP archive
    CertHandler-->>Client: HTTP 201 + ZIP
Loading
sequenceDiagram
    participant Client
    participant Router
    participant AuthMiddleware
    participant CertHandler
    participant CertService
    participant FileSystem

    Client->>Router: GET /agents/{id}/certificate (X-API-Key)
    Router->>AuthMiddleware: run APIKeyAuth
    AuthMiddleware->>Router: allow or reject
    Router->>CertHandler: GetAgentCertificate
    CertHandler->>CertService: AgentCertExists?
    CertService->>FileSystem: read agent cert & key
    CertService-->>CertHandler: cert & key bytes
    CertHandler->>CertService: GetCACert()
    CertService->>FileSystem: read CA cert
    CertService-->>CertHandler: CA cert bytes
    CertHandler->>CertHandler: create ZIP archive
    CertHandler-->>Client: HTTP 200 + ZIP
Loading
sequenceDiagram
    participant Client
    participant Router
    participant AuthMiddleware
    participant CertHandler
    participant CertService
    participant FileSystem

    Client->>Router: DELETE /agents/{id}/certificate (X-API-Key)
    Router->>AuthMiddleware: run APIKeyAuth
    AuthMiddleware->>Router: allow or reject
    Router->>CertHandler: DeleteAgentCertificate
    CertHandler->>CertService: AgentCertExists?
    CertService->>FileSystem: stat agent cert path
    CertHandler->>CertService: DeleteAgentCert(agentID)
    CertService->>FileSystem: remove agent cert & key files
    CertService-->>CertHandler: deletion result
    CertHandler-->>Client: HTTP 200 + deletion info
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I nibble keys and carve a CA,
Agents get bundles wrapped today,
ZIPs snug with cert and secret seed,
X-API-Key guards the rabbit mead,
Hops and bytes — secure sprouts grow free.

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'Feat/api to provision cert' is partially related to the changeset. It refers to a real aspect of the change (API for certificate provisioning) but uses vague and truncated phrasing ('cert' instead of 'certificates', 'provision' rather than 'manage/generate'). It omits the core infrastructure component (internal/cert package) and oversimplifies the scope, which includes auto-generation, storage, and deletion endpoints. Consider using a more descriptive title such as 'Add certificate management API with auto-generation' or 'Implement agent certificate provisioning service' to better reflect both the API endpoints and the underlying service implementation.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/API-to-provision-cert

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@internal/api/http/handler/cert.go`:
- Around line 91-95: Sanitize req.AgentID before using it in filenames and the
Content-Disposition header: create a safeAgentID by validating/replacing
disallowed characters (e.g., allow only alphanumerics, dashes, underscores;
replace others with '_', or reject invalid IDs) and use safeAgentID when
building the files map (currently using fmt.Sprintf("%s-cert.pem", req.AgentID)
and "%s-key.pem") and when composing the Content-Disposition header; also ensure
GenerateAgentCert validation aligns with this sanitization so the handler
(agentCertPEM, agentKeyPEM, caCertBytes, and the files map) never writes or
references paths with unsafe characters.
- Around line 130-181: The DeleteServerCerts handler (method
CertHandler.DeleteServerCerts) is currently unprotected and can delete CA/server
keys; add authentication and admin-only authorization and an explicit
confirmation requirement: register an authentication middleware for the cert
routes so requests reach DeleteServerCerts only after identity is validated,
enforce an admin role check inside DeleteServerCerts (or via an authorization
middleware) that verifies the caller has admin privileges before proceeding, and
require a confirmed deletion token or special header (e.g., X-Confirm-Delete
with a configured value) that must be present and valid before performing file
removals on certService.CaKeyPath, certService.CaCertPath,
certService.ServerKeyPath, and certService.ServerCertPath; return 401/403 when
unauthenticated/unauthorized and 400 when confirmation header is
missing/invalid, and log authorization and confirmation failures.

In `@internal/api/http/router.go`:
- Around line 27-29: Before creating certHandler using handler.NewCertHandler,
add a nil check for srvs.CertService to mirror the pattern used for GrpcServer
and ensure that certHandler and its routes are only registered if
srvs.CertService is not nil.

In `@internal/cert/generate.go`:
- Around line 98-141: The code uses agentID directly as the certificate
CommonName in GenerateAgentCert; validate the agentID earlier (e.g., in the
ProvisionAgent handler) to allow only alphanumeric, hyphen, underscore and limit
length to <=64 chars: add a package-level regexp (agentIDPattern :=
regexp.MustCompile(`^[A-Za-z0-9_-]{1,64}$`)), import "regexp", check
agentIDPattern.MatchString(req.AgentID) in ProvisionAgent (or before calling
Service.GenerateAgentCert) and return HTTP 400 with an error if it fails so
GenerateAgentCert only receives validated input.
🧹 Nitpick comments (5)
misc/delete-server-certs.sh (1)

1-2: Consider adding error handling and status feedback.

For a utility script, adding -f (fail silently on HTTP errors) or -w to print status codes would help users understand if the operation succeeded.

💡 Optional improvement
 #!/bin/bash
-curl -X DELETE http://localhost:8080/cert/server
+curl -X DELETE http://localhost:8080/cert/server -w "\nHTTP Status: %{http_code}\n"
misc/provision-agent-cert.sh (1)

1-2: Consider parameterizing the agent ID for reusability.

The script works as a demo, but parameterizing the agent ID would make it more useful.

💡 Optional improvement
 #!/bin/bash
-curl -X POST http://localhost:8080/cert/agent -H "Content-Type: application/json" -d '{"agent_id": "agent-1"}' --output agent-1-certs.zip
+AGENT_ID="${1:-agent-1}"
+curl -X POST http://localhost:8080/cert/agent -H "Content-Type: application/json" -d "{\"agent_id\": \"$AGENT_ID\"}" --output "${AGENT_ID}-certs.zip"
internal/api/http/router.go (1)

28-29: Consider adding authentication/authorization for certificate endpoints.

The DELETE /cert/server endpoint can delete server certificates, and POST /cert/agent can provision new certificates. These are sensitive operations that should likely be protected by authentication middleware to prevent unauthorized access.

internal/cert/utils.go (1)

104-107: Consider distinguishing between "not exists" and other errors.

The current implementation returns false for both non-existent files and permission errors. For certificate management, this is likely acceptable since permission errors would surface during actual read/write operations.

internal/api/http/handler/cert.go (1)

43-48: Redundant validation check.

This check is redundant because the DTO already has binding:"required" on the AgentID field. The ShouldBindJSON call at line 36 will return an error if agent_id is missing or empty.

♻️ Consider removing redundant check
-	if req.AgentID == "" {
-		ctx.JSON(http.StatusBadRequest, gin.H{
-			"error": "agent_id is required",
-		})
-		return
-	}

Comment on lines 130 to 181
func (h *CertHandler) DeleteServerCerts(ctx *gin.Context) {
if h.certService == nil {
slog.Warn("Server cert deletion requested but TLS is disabled")
ctx.JSON(http.StatusBadRequest, gin.H{
"error": "TLS is not enabled on this server",
})
return
}

slog.Info("Deleting all server certificates")

certPaths := []string{
h.certService.CaCertPath,
h.certService.CaKeyPath,
h.certService.ServerCertPath,
h.certService.ServerKeyPath,
}

deletedFiles := []string{}
var errors []string

for _, path := range certPaths {
if _, err := os.Stat(path); os.IsNotExist(err) {
slog.Debug("Certificate file does not exist, skipping", "path", path)
continue
}

if err := os.Remove(path); err != nil {
slog.Error("Failed to delete certificate file", "error", err, "path", path)
errors = append(errors, fmt.Sprintf("%s: %v", path, err))
} else {
slog.Info("Deleted certificate file", "path", path)
deletedFiles = append(deletedFiles, path)
}
}

if len(errors) > 0 {
ctx.JSON(http.StatusInternalServerError, gin.H{
"error": "Failed to delete some certificate files",
"deleted_files": deletedFiles,
"errors": errors,
})
return
}

ctx.JSON(http.StatusOK, gin.H{
"message": "Successfully deleted server certificates",
"deleted_files": deletedFiles,
})

slog.Info("Server certificates deleted successfully", "count", len(deletedFiles))
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find route registrations for cert handler
echo "=== Finding cert handler route registrations ==="
rg -n "DeleteServerCerts|certHandler|/certs.*delete" --type go -A 5 -B 5

echo -e "\n=== Looking for router setup and middleware ==="
rg -n "func.*[Rr]outer|[Rr]outer\(\)|NewRouter" --type go -A 10 | head -100

Repository: EternisAI/silo-proxy

Length of output: 1606


🏁 Script executed:

#!/bin/bash
# Find the main router setup and middleware patterns
echo "=== Finding authentication middleware ==="
rg -n "auth|Auth|JWT|middleware" --type go -g "router*" -g "server*" -g "main.go" -A 3 -B 3 | head -150

Repository: EternisAI/silo-proxy

Length of output: 4425


🏁 Script executed:

#!/bin/bash
# Look for the cert handler initialization and route group
echo "=== Finding cert handler setup ==="
fd -e go -x rg -l "CertHandler" {} \; | xargs -I {} sh -c 'echo "=== File: {} ===" && cat -n {}'

Repository: EternisAI/silo-proxy

Length of output: 7457


🏁 Script executed:

#!/bin/bash
# Check what middleware exists in the middleware package
echo "=== Middleware files in package ==="
find internal/api/http/middleware -type f -name "*.go" 2>/dev/null | sort

echo -e "\n=== Checking middleware implementations ==="
cat internal/api/http/middleware/*.go 2>/dev/null | head -200

Repository: EternisAI/silo-proxy

Length of output: 1128


Add authentication/authorization protection to the DeleteServerCerts endpoint.

This endpoint is completely unprotected and can be called by any client with network access. It is a destructive operation that deletes the CA private key and server certificates, which would break TLS connectivity for all agents and the server itself.

The endpoint needs:

  1. Authentication middleware protecting the /cert/* routes
  2. Authorization checks to restrict access to administrators only
  3. Optionally, a confirmation mechanism (e.g., deletion token, special header requirement)

Currently, no authentication middleware exists in the codebase—only request logging is implemented.

🤖 Prompt for AI Agents
In `@internal/api/http/handler/cert.go` around lines 130 - 181, The
DeleteServerCerts handler (method CertHandler.DeleteServerCerts) is currently
unprotected and can delete CA/server keys; add authentication and admin-only
authorization and an explicit confirmation requirement: register an
authentication middleware for the cert routes so requests reach
DeleteServerCerts only after identity is validated, enforce an admin role check
inside DeleteServerCerts (or via an authorization middleware) that verifies the
caller has admin privileges before proceeding, and require a confirmed deletion
token or special header (e.g., X-Confirm-Delete with a configured value) that
must be present and valid before performing file removals on
certService.CaKeyPath, certService.CaCertPath, certService.ServerKeyPath, and
certService.ServerCertPath; return 401/403 when unauthenticated/unauthorized and
400 when confirmation header is missing/invalid, and log authorization and
confirmation failures.

Comment on lines 27 to 29
certHandler := handler.NewCertHandler(srvs.CertService)
engine.POST("/cert/agent", certHandler.ProvisionAgent)
engine.DELETE("/cert/server", certHandler.DeleteServerCerts)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add nil check for CertService to match GrpcServer pattern.

The GrpcServer is nil-checked before creating adminHandler (line 22), but CertService is used unconditionally. If CertService is nil, this could cause a nil pointer dereference when the handler methods are invoked.

🔧 Proposed fix
-	certHandler := handler.NewCertHandler(srvs.CertService)
-	engine.POST("/cert/agent", certHandler.ProvisionAgent)
-	engine.DELETE("/cert/server", certHandler.DeleteServerCerts)
+	if srvs.CertService != nil {
+		certHandler := handler.NewCertHandler(srvs.CertService)
+		engine.POST("/cert/agent", certHandler.ProvisionAgent)
+		engine.DELETE("/cert/server", certHandler.DeleteServerCerts)
+	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
certHandler := handler.NewCertHandler(srvs.CertService)
engine.POST("/cert/agent", certHandler.ProvisionAgent)
engine.DELETE("/cert/server", certHandler.DeleteServerCerts)
if srvs.CertService != nil {
certHandler := handler.NewCertHandler(srvs.CertService)
engine.POST("/cert/agent", certHandler.ProvisionAgent)
engine.DELETE("/cert/server", certHandler.DeleteServerCerts)
}
🤖 Prompt for AI Agents
In `@internal/api/http/router.go` around lines 27 - 29, Before creating
certHandler using handler.NewCertHandler, add a nil check for srvs.CertService
to mirror the pattern used for GrpcServer and ensure that certHandler and its
routes are only registered if srvs.CertService is not nil.

Comment on lines 98 to 158
func (s *Service) GenerateAgentCert(agentID string) (*x509.Certificate, *rsa.PrivateKey, error) {
slog.Info("Generating agent certificate", "agent_id", agentID)

caCert, caKey, err := loadCA(s.CaCertPath, s.CaKeyPath)
if err != nil {
slog.Error("Failed to load CA for agent cert generation", "error", err, "agent_id", agentID)
return nil, nil, fmt.Errorf("failed to load CA: %w", err)
}

agentKey, err := rsa.GenerateKey(rand.Reader, 4096)
if err != nil {
return nil, nil, fmt.Errorf("failed to generate agent key: %w", err)
}

serialNumber, err := rand.Int(rand.Reader, new(big.Int).Lsh(big.NewInt(1), 128))
if err != nil {
return nil, nil, fmt.Errorf("failed to generate serial number: %w", err)
}

agentTemplate := &x509.Certificate{
SerialNumber: serialNumber,
Subject: pkix.Name{
Organization: []string{"Silo Proxy"},
CommonName: agentID,
},
NotBefore: time.Now(),
NotAfter: time.Now().Add(365 * 24 * time.Hour),
KeyUsage: x509.KeyUsageDigitalSignature | x509.KeyUsageKeyEncipherment,
ExtKeyUsage: []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
BasicConstraintsValid: true,
}

agentCertBytes, err := x509.CreateCertificate(rand.Reader, agentTemplate, caCert, &agentKey.PublicKey, caKey)
if err != nil {
return nil, nil, fmt.Errorf("failed to create agent certificate: %w", err)
}

agentCert, err := x509.ParseCertificate(agentCertBytes)
if err != nil {
return nil, nil, fmt.Errorf("failed to parse agent certificate: %w", err)
}

slog.Info("Generated agent certificate", "agent_id", agentID)
return agentCert, agentKey, nil
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Validate agentID before using it as certificate CommonName.

The agentID is used directly as the certificate's CommonName (line 121). Since this value comes from user input via the HTTP request, consider validating it to ensure it:

  1. Contains only alphanumeric characters, hyphens, or underscores
  2. Has a reasonable length limit (e.g., 64 characters)
  3. Doesn't contain special characters that could cause issues in filenames or certificate parsing

This prevents potential issues with malformed certificate subjects.

🔒 Suggested validation approach

Add validation in the handler before calling GenerateAgentCert:

import "regexp"

var agentIDPattern = regexp.MustCompile(`^[a-zA-Z0-9_-]{1,64}$`)

// In ProvisionAgent handler, after binding:
if !agentIDPattern.MatchString(req.AgentID) {
    ctx.JSON(http.StatusBadRequest, gin.H{
        "error": "agent_id must be 1-64 alphanumeric characters, hyphens, or underscores",
    })
    return
}
🤖 Prompt for AI Agents
In `@internal/cert/generate.go` around lines 98 - 141, The code uses agentID
directly as the certificate CommonName in GenerateAgentCert; validate the
agentID earlier (e.g., in the ProvisionAgent handler) to allow only
alphanumeric, hyphen, underscore and limit length to <=64 chars: add a
package-level regexp (agentIDPattern :=
regexp.MustCompile(`^[A-Za-z0-9_-]{1,64}$`)), import "regexp", check
agentIDPattern.MatchString(req.AgentID) in ProvisionAgent (or before calling
Service.GenerateAgentCert) and return HTTP 400 with an error if it fails so
GenerateAgentCert only receives validated input.


certHandler := handler.NewCertHandler(srvs.CertService)
engine.POST("/cert/agent", certHandler.ProvisionAgent)
engine.DELETE("/cert/server", certHandler.DeleteServerCerts)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new endpoints looks weird, I was expecting a CRD

@railway-app railway-app bot temporarily deployed to Silo / production January 28, 2026 04:32 Inactive
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@internal/cert/service.go`:
- Around line 60-136: When you call s.GenerateCA() (and assign caCert, caKey)
set a caRotated flag (e.g., caRotated := true) so the code knows a fresh CA was
created; ensure loadCA path leaves caRotated false. Later, when deciding whether
to reuse server certs (the fileExists checks and the branch that may call
s.GenerateServerCert), treat server certs as missing if caRotated is true so
they are regenerated with the new CA. Update logic around s.GenerateCA, loadCA,
fileExists, and s.GenerateServerCert to use this flag and keep existing
writeCertToFile/writeKeyToFile flows unchanged.
- Around line 141-146: The ensureDirectory method (func (s *Service)
ensureDirectory(filePath string)) creates the cert/key directory with mode 0755;
change the os.MkdirAll call to use 0700 to restrict directory traversal to the
owner for defense-in-depth (i.e., replace the 0755 mode argument with 0700 in
the os.MkdirAll(dir, ...) call and keep the existing error logging/return
behavior).

In `@internal/cert/utils.go`:
- Around line 53-60: The current load of CA key uses x509.ParsePKCS8PrivateKey
and errors out for PKCS#1 keys; update the logic around key, err :=
x509.ParsePKCS8PrivateKey(keyBlock.Bytes) and the subsequent caKey type
assertion so that if ParsePKCS8 fails you fall back to
x509.ParsePKCS1PrivateKey(keyBlock.Bytes), and if ParsePKCS8 succeeds but the
type assertion caKey, ok := key.(*rsa.PrivateKey) fails, also attempt parsing
with x509.ParsePKCS1PrivateKey; ensure the final caKey is an *rsa.PrivateKey or
return a clear error.
🧹 Nitpick comments (1)
internal/cert/utils.go (1)

119-121: Improve error handling to distinguish between missing and inaccessible files.

fileExists returns false for both missing files and permission/IO errors, causing unnecessary regeneration attempts when permissions are the real issue. Consider returning (bool, error) and checking os.IsNotExist() specifically to distinguish actual missing files from access errors. This requires updating the 4 callers in service.go (lines 54–55, 100–101).

Comment on lines +60 to +138
if !caCertExists || !caKeyExists {
slog.Info("CA certificate not found, generating new CA", "cert_path", s.CaCertPath)

var err error
caCert, caKey, err = s.GenerateCA()
if err != nil {
slog.Error("Failed to generate CA certificate", "error", err)
return fmt.Errorf("failed to generate CA certificate: %w", err)
}

if err := s.ensureDirectory(s.CaCertPath); err != nil {
return err
}

if err := writeCertToFile(caCert, s.CaCertPath); err != nil {
slog.Error("Failed to write CA certificate", "error", err, "path", s.CaCertPath)
return fmt.Errorf("failed to write CA certificate: %w", err)
}

if err := s.ensureDirectory(s.CaKeyPath); err != nil {
return err
}

if err := writeKeyToFile(caKey, s.CaKeyPath); err != nil {
slog.Error("Failed to write CA key", "error", err, "path", s.CaKeyPath)
return fmt.Errorf("failed to write CA key: %w", err)
}

slog.Info("Generated CA certificate", "cert_path", s.CaCertPath, "key_path", s.CaKeyPath)
} else {
slog.Debug("Using existing CA certificate", "cert_path", s.CaCertPath)

var err error
caCert, caKey, err = loadCA(s.CaCertPath, s.CaKeyPath)
if err != nil {
slog.Error("Failed to load existing CA certificate", "error", err)
return fmt.Errorf("failed to load existing CA certificate: %w", err)
}
}

serverCertExists := fileExists(s.ServerCertPath)
serverKeyExists := fileExists(s.ServerKeyPath)

if !serverCertExists || !serverKeyExists {
slog.Info("Server certificate not found, generating new server certificate",
"cert_path", s.ServerCertPath,
"domains", s.DomainNames,
"ips", s.IPAddresses)

serverCert, serverKey, err := s.GenerateServerCert(caCert, caKey, s.DomainNames, s.IPAddresses)
if err != nil {
slog.Error("Failed to generate server certificate", "error", err)
return fmt.Errorf("failed to generate server certificate: %w", err)
}

if err := s.ensureDirectory(s.ServerCertPath); err != nil {
return err
}

if err := writeCertToFile(serverCert, s.ServerCertPath); err != nil {
slog.Error("Failed to write server certificate", "error", err, "path", s.ServerCertPath)
return fmt.Errorf("failed to write server certificate: %w", err)
}

if err := s.ensureDirectory(s.ServerKeyPath); err != nil {
return err
}

if err := writeKeyToFile(serverKey, s.ServerKeyPath); err != nil {
slog.Error("Failed to write server key", "error", err, "path", s.ServerKeyPath)
return fmt.Errorf("failed to write server key: %w", err)
}

slog.Info("Generated server certificate", "cert_path", s.ServerCertPath, "key_path", s.ServerKeyPath)
} else {
slog.Debug("Using existing server certificate", "cert_path", s.ServerCertPath)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Regenerate server certs when the CA is rotated.
If the CA cert or key is missing, a new CA is generated, but existing server certs are still reused. That yields a server cert signed by the old CA, which breaks trust. Track CA regeneration and force server cert regeneration when it happens.

🐛 Suggested fix
-	if !caCertExists || !caKeyExists {
+	caRegenerated := false
+	if !caCertExists || !caKeyExists {
 		slog.Info("CA certificate not found, generating new CA", "cert_path", s.CaCertPath)
@@
 		if err := writeKeyToFile(caKey, s.CaKeyPath); err != nil {
 			slog.Error("Failed to write CA key", "error", err, "path", s.CaKeyPath)
 			return fmt.Errorf("failed to write CA key: %w", err)
 		}
+		caRegenerated = true
@@
-	if !serverCertExists || !serverKeyExists {
+	if caRegenerated || !serverCertExists || !serverKeyExists {
 		slog.Info("Server certificate not found, generating new server certificate",
 			"cert_path", s.ServerCertPath,
 			"domains", s.DomainNames,
 			"ips", s.IPAddresses)
🤖 Prompt for AI Agents
In `@internal/cert/service.go` around lines 60 - 136, When you call s.GenerateCA()
(and assign caCert, caKey) set a caRotated flag (e.g., caRotated := true) so the
code knows a fresh CA was created; ensure loadCA path leaves caRotated false.
Later, when deciding whether to reuse server certs (the fileExists checks and
the branch that may call s.GenerateServerCert), treat server certs as missing if
caRotated is true so they are regenerated with the new CA. Update logic around
s.GenerateCA, loadCA, fileExists, and s.GenerateServerCert to use this flag and
keep existing writeCertToFile/writeKeyToFile flows unchanged.

Comment on lines +141 to +148
func (s *Service) ensureDirectory(filePath string) error {
dir := filepath.Dir(filePath)
if err := os.MkdirAll(dir, 0755); err != nil {
slog.Error("Failed to create directory", "error", err, "path", dir)
return fmt.Errorf("failed to create directory %s: %w", dir, err)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the ensureDirectory function and its usage context
cd internal/cert && cat -n service.go | sed -n '130,160p'

Repository: EternisAI/silo-proxy

Length of output: 1029


🏁 Script executed:

# Search for where ensureDirectory is called
rg "ensureDirectory" --context 5

Repository: EternisAI/silo-proxy

Length of output: 3015


🏁 Script executed:

# Look at the full service.go file to understand what this service does
wc -l internal/cert/service.go

Repository: EternisAI/silo-proxy

Length of output: 92


🏁 Script executed:

# Check the beginning of service.go to understand the purpose
head -80 internal/cert/service.go

Repository: EternisAI/silo-proxy

Length of output: 2057


🌐 Web query:

Best practices for TLS private key directory permissions Linux 0700 vs 0755

💡 Result:

What to prefer for a TLS private-key directory: 0700 vs 0755

Best practice (most common and safest default): use 0700 on the directory containing private keys, owned by root:root, and restrict the key file to 0600 (or even 0400 if the key never needs to be edited). This follows “least privilege” and provides defense-in-depth even if a key file’s mode is accidentally loosened. Certbot historically does exactly this by creating key-containing directories as 0700. [1], [2]

Why 0755 is usually not ideal for private-key directories

A directory needs the execute (x) bit to be traversed. With 0755, any local user can traverse the directory path, and if a key file’s permissions are ever mis-set (e.g., 0644), the key becomes readable. With 0700, even if a key file accidentally becomes world-readable, non-root users still can’t reach it because the parent directory blocks access. Let’s Encrypt community support explicitly notes this “directory protection” model (e.g., /etc/letsencrypt/archive as 0700). [2]

When 0755 (or 0750) can be acceptable

Use a more-open directory only if you need non-root daemons to access the files, and then do it in a controlled way:

  • Prefer group-based access:
    • Directory: 0750 (or similar) with an appropriate group (e.g., ssl-cert)
    • Private key file: 0640 with that same group
    • Add only the required service accounts to the group
      This pattern is commonly recommended in real deployments to avoid “world” access while still enabling multiple services. [3], [1]

Certbot’s docs also describe this approach for services that need to read privkey.pem without running as root: change group ownership and set the key to 0640. [1]

What about 0710 / 0750?

These are often better than 0755:

  • 0710: only owner can list; group can traverse (useful when a service needs to open a known filename but you don’t want group members listing the directory).
  • 0750: group can list + traverse (fine if group members are trusted service accounts).

Practical recommendations (quick checklist)

  • Single-service, simplest & safest: directory 0700, key 0600 (or 0400). (CIS-style hardening commonly targets very restrictive key modes.) [4]
  • Multiple services need access: directory 0750 (root + ssl-cert group), key 0640 (root:ssl-cert). [3], [1]
  • Avoid putting private keys in any directory that is 0755 unless you have a strong operational reason and you’re confident key file modes/groups are tightly managed.

Sources: Certbot documentation on permissions and group access [1]; Let’s Encrypt community explanations of 0700 parent-directory protection [2]; GitLab issue proposing 750 dir / 640 file for cert material [3]; CIS/benchmark guidance emphasizing highly restrictive private-key file permissions [4].


🏁 Script executed:

# Check how the key files are written - what permissions are set
rg -A 10 "writeKeyToFile" internal/cert/service.go

Repository: EternisAI/silo-proxy

Length of output: 940


🏁 Script executed:

# Look for the writeKeyToFile function implementation
rg -B 5 -A 15 "func writeKeyToFile"

Repository: EternisAI/silo-proxy

Length of output: 949


🏁 Script executed:

# Check if there's any chmod or permission setting for the key files
rg "Chmod|0[0-7]{3}|perm" internal/cert/

Repository: EternisAI/silo-proxy

Length of output: 302


Use 0700 permissions for the certificate/key directory.
The directory is created with 0755, allowing any local user to traverse it. While key files are correctly restricted to 0600, using 0700 for the directory provides defense-in-depth: if a key file's permissions are ever accidentally loosened (e.g., 0644), non-root users still cannot access it. This follows Certbot and Let's Encrypt best practices.

Change line 143 to use os.MkdirAll(dir, 0700).

🤖 Prompt for AI Agents
In `@internal/cert/service.go` around lines 141 - 146, The ensureDirectory method
(func (s *Service) ensureDirectory(filePath string)) creates the cert/key
directory with mode 0755; change the os.MkdirAll call to use 0700 to restrict
directory traversal to the owner for defense-in-depth (i.e., replace the 0755
mode argument with 0700 in the os.MkdirAll(dir, ...) call and keep the existing
error logging/return behavior).

Comment on lines +53 to +76
key, err := x509.ParsePKCS8PrivateKey(keyBlock.Bytes)
if err != nil {
return nil, nil, fmt.Errorf("failed to parse CA key: %w", err)
}

caKey, ok := key.(*rsa.PrivateKey)
if !ok {
return nil, nil, fmt.Errorf("CA key is not an RSA private key")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd /tmp && find . -name "utils.go" -path "*/cert/*" 2>/dev/null | head -5

Repository: EternisAI/silo-proxy

Length of output: 46


🏁 Script executed:

# Search for the cert utils file
git ls-files | grep -E "cert/utils\.go"

Repository: EternisAI/silo-proxy

Length of output: 86


🏁 Script executed:

# Look for internal/cert directory
git ls-files | grep -E "internal/cert"

Repository: EternisAI/silo-proxy

Length of output: 137


🌐 Web query:

Go x509 ParsePKCS8PrivateKey ParsePKCS1PrivateKey compatibility "BEGIN RSA PRIVATE KEY" PKCS#1 PKCS#8 PEM

💡 Result:

In Go, the compatibility is:

  • PEM -----BEGIN RSA PRIVATE KEY----- = PKCS#1 (RSA-only). Decode PEM → call x509.ParsePKCS1PrivateKey(der) (returns *rsa.PrivateKey). [1]
  • PEM -----BEGIN PRIVATE KEY----- = PKCS#8 “PrivateKeyInfo” wrapper (can contain RSA/ECDSA/Ed25519/X25519, etc.). Decode PEM → call x509.ParsePKCS8PrivateKey(der) (returns any, you type-assert). [2]
  • PEM -----BEGIN ENCRYPTED PRIVATE KEY----- = encrypted PKCS#8; x509.ParsePKCS8PrivateKey only parses unencrypted PKCS#8, so you must decrypt/convert first (often via OpenSSL to an unencrypted PKCS#8 or PKCS#1 form). [2]

Typical PEM dispatch:

block, _ := pem.Decode(pemBytes)
switch block.Type {
case "RSA PRIVATE KEY":
    k, err := x509.ParsePKCS1PrivateKey(block.Bytes)
    _ = k; _ = err
case "PRIVATE KEY":
    kAny, err := x509.ParsePKCS8PrivateKey(block.Bytes)
    _ = kAny; _ = err
default:
    // e.g. "ENCRYPTED PRIVATE KEY" (encrypted PKCS#8), "EC PRIVATE KEY", etc.
}

Note: historically, Go’s PKCS#1 parser has rejected some valid-but-uncommon RSA private key encodings (e.g., missing CRT parameters), which can show up when keys are produced by certain toolchains. [3]

Sources: [1] https://golangdoc.github.io/pkg/1.12/crypto/x509/index.html, [2] https://pkg.go.dev/crypto/x509, [3] golang/go#14512


🏁 Script executed:

cat -n internal/cert/utils.go

Repository: EternisAI/silo-proxy

Length of output: 3623


🏁 Script executed:

# Get line count first to determine how to read it
wc -l internal/cert/utils.go

Repository: EternisAI/silo-proxy

Length of output: 90


Support PKCS#1 keys when loading existing CA keys.

x509.ParsePKCS8PrivateKey fails on PKCS#1-formatted keys (PEM blocks like BEGIN RSA PRIVATE KEY), which breaks startups for users with such keys. Add a fallback to x509.ParsePKCS1PrivateKey.

Suggested fix
	key, err := x509.ParsePKCS8PrivateKey(keyBlock.Bytes)
	if err != nil {
-		return nil, nil, fmt.Errorf("failed to parse CA key: %w", err)
+		if pkcs1Key, err2 := x509.ParsePKCS1PrivateKey(keyBlock.Bytes); err2 == nil {
+			return caCert, pkcs1Key, nil
+		}
+		return nil, nil, fmt.Errorf("failed to parse CA key: %w", err)
	}

	caKey, ok := key.(*rsa.PrivateKey)
	if !ok {
		return nil, nil, fmt.Errorf("CA key is not an RSA private key")
	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
key, err := x509.ParsePKCS8PrivateKey(keyBlock.Bytes)
if err != nil {
return nil, nil, fmt.Errorf("failed to parse CA key: %w", err)
}
caKey, ok := key.(*rsa.PrivateKey)
if !ok {
return nil, nil, fmt.Errorf("CA key is not an RSA private key")
key, err := x509.ParsePKCS8PrivateKey(keyBlock.Bytes)
if err != nil {
if pkcs1Key, err2 := x509.ParsePKCS1PrivateKey(keyBlock.Bytes); err2 == nil {
return caCert, pkcs1Key, nil
}
return nil, nil, fmt.Errorf("failed to parse CA key: %w", err)
}
caKey, ok := key.(*rsa.PrivateKey)
if !ok {
return nil, nil, fmt.Errorf("CA key is not an RSA private key")
}
🤖 Prompt for AI Agents
In `@internal/cert/utils.go` around lines 53 - 60, The current load of CA key uses
x509.ParsePKCS8PrivateKey and errors out for PKCS#1 keys; update the logic
around key, err := x509.ParsePKCS8PrivateKey(keyBlock.Bytes) and the subsequent
caKey type assertion so that if ParsePKCS8 fails you fall back to
x509.ParsePKCS1PrivateKey(keyBlock.Bytes), and if ParsePKCS8 succeeds but the
type assertion caKey, ok := key.(*rsa.PrivateKey) fails, also attempt parsing
with x509.ParsePKCS1PrivateKey; ensure the final caKey is an *rsa.PrivateKey or
return a clear error.

@railway-app railway-app bot temporarily deployed to Silo / production January 28, 2026 04:48 Inactive
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@internal/cert/service.go`:
- Around line 195-208: The DeleteAgentCert function currently accepts an
unvalidated agentID which can enable path traversal; update DeleteAgentCert (and
similarly CreateAgentCertificate and GetAgentCertificate handlers) to validate
agentID before using it to build paths: reject any agentID containing "..", "/",
"\" or os.PathSeparator or any path-separator characters (or ensure agentID ==
filepath.Base(agentID) and matches an allowed pattern), and return a clear error
for invalid input; perform this validation in a shared helper (e.g.,
isValidAgentID) and call it from DeleteAgentCert, CreateAgentCertificate, and
GetAgentCertificate so RemoveAll/Join cannot operate on escaped paths.
🧹 Nitpick comments (2)
internal/cert/generate.go (1)

59-96: Consider removing ExtKeyUsage from the CA certificate.

CA certificates typically don't need ExtKeyUsage constraints. The ExtKeyUsage on lines 79 is inherited/enforced by the issued certificates, not the CA itself. While not harmful, it's unconventional and could be confusing.

♻️ Suggested change
 	caTemplate := &x509.Certificate{
 		SerialNumber: serialNumber,
 		Subject: pkix.Name{
 			Organization: []string{"Silo Proxy CA"},
 			CommonName:   "Silo Proxy Root CA",
 		},
 		NotBefore:             time.Now(),
 		NotAfter:              time.Now().Add(10 * 365 * 24 * time.Hour),
 		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageDigitalSignature,
-		ExtKeyUsage:           []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth, x509.ExtKeyUsageClientAuth},
 		BasicConstraintsValid: true,
 		IsCA:                  true,
 		MaxPathLenZero:        true,
 	}
internal/api/http/handler/cert.go (1)

89-173: Consider reusing createCertZip to reduce code duplication.

GetAgentCertificate (lines 133-166) duplicates the zip creation logic that exists in createCertZip. While the inputs differ slightly (PEM bytes vs. cert objects), you could refactor to have a shared helper that accepts map[string][]byte.

♻️ Suggested refactor

Extract a helper method:

func (h *CertHandler) createZipFromFiles(files map[string][]byte) (*bytes.Buffer, error) {
	zipBuffer := new(bytes.Buffer)
	zipWriter := zip.NewWriter(zipBuffer)

	for filename, content := range files {
		f, err := zipWriter.Create(filename)
		if err != nil {
			return nil, fmt.Errorf("failed to create zip entry for %s: %w", filename, err)
		}
		if _, err := f.Write(content); err != nil {
			return nil, fmt.Errorf("failed to write zip entry for %s: %w", filename, err)
		}
	}

	if err := zipWriter.Close(); err != nil {
		return nil, fmt.Errorf("failed to close zip writer: %w", err)
	}

	return zipBuffer, nil
}

Then use it in both createCertZip and GetAgentCertificate.

@railway-app railway-app bot temporarily deployed to Silo / production January 28, 2026 04:52 Inactive
@railway-app railway-app bot temporarily deployed to Silo / production January 28, 2026 06:30 Inactive
@railway-app railway-app bot temporarily deployed to Silo / production January 28, 2026 06:34 Inactive
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@internal/api/http/router.go`:
- Around line 36-37: The DELETE /server-certs route is an unprotected
destructive endpoint; update router.go to only register or allow access to it
when authorized: either (A) wrap the engine.DELETE registration of
certHandler.DeleteServerCerts behind a config flag (e.g.,
ServerCertsCleanupEnabled) so it’s only added in non-prod/testing, or (B) attach
an authentication/authorization middleware (e.g., RequireAdmin or RequireRole)
to the route so only authorized principals can call
certHandler.DeleteServerCerts; ensure you reference and use the existing
certHandler.DeleteServerCerts symbol and the router engine registration point
when implementing the check.

@lwlee2608
Copy link
Collaborator

Summary

This PR adds a comprehensive certificate management API that enables dynamic provisioning of agent certificates. The
implementation includes auto-generation of CA and server certificates on startup, and REST endpoints to manage
agent certificates.

Scope: 958 additions across 14 files
Core Changes: New /internal/cert package, certificate management handlers, configuration updates


Architecture Review

Strengths

  1. Clean Layer Separation: The cert service (internal/cert) is properly separated from HTTP handlers, following the
    project's architecture principles.
  2. Smart Certificate Management (service.go:55-141):
    - Auto-generates CA and server certificates on first run if missing
    - Reuses existing certificates on subsequent starts
    - Creates directories as needed
  3. Proper Security Practices:
    - Key files written with 0600 permissions (utils.go:1045)
    - Certificate files written with 0644 permissions (utils.go:1032)
    - PKCS8 format for private keys
    - 4096-bit RSA keys
  4. Good Error Handling: Consistent structured logging with slog throughout.

Issues & Recommendations

Critical

  1. Dangerous Temp Endpoint (handler/cert.go:312-363, router.go:524)
    // Temp endpoint for cleaning
    engine.DELETE("/server-certs", certHandler.DeleteServerCerts)

This endpoint deletes CA keys with no authentication or authorization. This is extremely dangerous even for
development.

Recommendation:

  • Remove this endpoint before merging to production
  • If needed for development, add it behind a feature flag or only in debug mode
  • At minimum, add authentication and require confirmation
  1. Exposed Service Fields (handler/cert.go:323-328)
    certPaths := []string{
    h.certService.CaCertPath,
    h.certService.CaKeyPath,
    // ...
    }

The handler directly accesses service struct fields, breaking encapsulation.

Recommendation: Add getter methods to the service or move this logic into the cert service itself.

High Priority

  1. Code Duplication (handler/cert.go:89-173 vs handler/cert.go:275-310)

The zip creation logic appears twice - once inline in GetAgentCertificate and once in the helper method
createCertZip. The inline version in GetAgentCertificate reads files from disk, while createCertZip works with
in-memory certificates.

Recommendation: Refactor to use a single zip creation method. Either:

  • Make createCertZip accept both cert objects and file bytes
  • Extract the common zip writing logic into a shared helper
  1. Missing Tests

No tests are included for the new functionality. For a security-sensitive feature like certificate management, tests
are critical.

Recommendation: Add tests for:

  • Certificate generation (CA, server, agent)
  • Handler endpoints (success and error cases)
  • Service methods (exists, list, delete)
  • Edge cases (missing CA, invalid agent IDs)
  1. No Validation on Agent ID (handler/cert.go:38-44)

Agent IDs are used directly in file paths without validation. This could potentially lead to path traversal issues.

Recommendation: Add validation to ensure agent IDs contain only safe characters (alphanumeric, hyphens,
underscores).

Medium Priority

  1. Misleading Script (misc/delete-server-certs.sh:2)
    curl -X DELETE http://localhost:8080/cert/server

The script references /cert/server but the actual endpoint is /server-certs (router.go:524).

Recommendation: Fix the script or remove it if the endpoint will be removed.

  1. Configuration Parsing (service.go:32-46)

IP address parsing silently skips invalid IPs with just a warning. If a user misconfigures an IP, the server might
start with an incomplete certificate.

Recommendation: Consider failing fast on invalid IPs in production, or at least log at ERROR level.

  1. Certificate Expiry

Certificates expire after 1 year but there's no renewal mechanism or expiry warnings.

Recommendation: Document the manual renewal process or add expiry warnings to logs on startup.


Code Quality Issues

Minor Issues

  1. Inconsistent Error Messages

handler/cert.go:59 returns "Failed to generate agent certificate" but doesn't include helpful context in the
user-facing message (though it's in logs).

Recommendation: Consider including actionable information in error responses where appropriate.

  1. Magic Numbers

generate.go:16,60,107: RSA key size 4096 and certificate validity periods are hardcoded.

Recommendation: Extract to named constants or configuration.

  1. Route Grouping Side Effect (router.go:507-525)

The refactoring of /agents into a route group is fine, but it changes the HTTP path structure. The admin handler
used to be at /agents, now it's nested within a group.

Impact: This appears to maintain the same path (/agents) so it should be backward compatible.


Testing Recommendations

Create test coverage for:

  1. Certificate Generation:
    - CA generation produces valid certificates
    - Server certificates include correct SANs
    - Agent certificates have ClientAuth usage
  2. Handler Tests:
    - POST creates certificate and returns zip
    - POST returns 409 if certificate exists
    - GET returns 404 for non-existent certificates
    - DELETE removes certificate files
    - All endpoints return 400 when TLS disabled
  3. Service Tests:
    - ensureCertificates creates CA if missing
    - ensureCertificates reuses existing CA
    - ListAgentCerts handles missing directory
    - DeleteAgentCert returns error if not exists

Configuration Review

The new config fields look good:

ca_key_file: ./certs/ca/ca-key.pem # Good - explicit path
domain_names: "localhost" # Good - comma-separated
ip_addresses: "127.0.0.1" # Good - comma-separated
agent_cert_dir: ./certs/agents # Good - organized


Security Considerations

  1. ✅ Private keys use secure permissions (0600)
  2. ✅ Certificates use appropriate key usage extensions
  3. ✅ Strong RSA key sizes (4096 bits)
  4. ❌ No authentication on certificate endpoints (anyone can create certs)
  5. ❌ No rate limiting on certificate creation
  6. ❌ Temp delete endpoint is a security risk

Documentation Needs

The PR should update CLAUDE.md to document:

  • New certificate API endpoints
  • Configuration options for domain names and IPs
  • Certificate lifecycle (generation, retrieval, deletion)
  • Manual certificate renewal process

Summary

Overall Assessment: The implementation is well-structured with good separation of concerns and solid cryptographic
practices. However, there are critical security concerns (the delete endpoint) and missing tests that should be
addressed before merging.

Recommendation:

  • ⚠️ Do not merge until the temp delete endpoint is removed or secured
  • 🔴 High priority: Add input validation and tests
  • 🟡 Medium priority: Fix code duplication and documentation

Estimated Risk: Medium-High (security endpoint, no tests, no input validation)

@railway-app railway-app bot temporarily deployed to Silo / production February 4, 2026 13:41 Inactive
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@internal/api/http/handler/cert.go`:
- Around line 52-62: Remove the unreliable pre-check using
h.certService.AgentCertExists and instead handle "already exists" at generation
time: ensure certService.GenerateAgentCert returns a sentinel error (e.g.,
ErrAgentCertAlreadyExists) when a cert cannot be created due to an existing
cert, and in the HTTP handler map that specific error to
ctx.JSON(http.StatusConflict, ...) with the same message; for all other errors
return the existing 500 behavior. Alternatively, if you keep the pre-check,
still detect ErrAgentCertAlreadyExists from GenerateAgentCert and turn it into a
409 to avoid the race. Refer to certService.GenerateAgentCert,
certService.AgentCertExists, and the HTTP handler surrounding the
GenerateAgentCert call.
🧹 Nitpick comments (1)
internal/api/http/handler/cert.go (1)

243-248: Keep deletion logic inside cert.Service.
Direct field access couples the handler to storage details; consider a DeleteServerCerts() method on the service to own path selection and deletion.

Comment on lines 52 to 63
if h.certService.AgentCertExists(agentID) {
ctx.JSON(http.StatusConflict, gin.H{
"error": "Certificate already exists for this agent",
})
return
}

slog.Info("Creating agent certificate", "agent_id", agentID)

agentCert, agentKey, err := h.certService.GenerateAgentCert(agentID)
if err != nil {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Potential race between existence check and generation.
Concurrent requests can both pass AgentCertExists and then one will fail in GenerateAgentCert (likely as a 500). Consider letting the service enforce “already exists” and map that error to 409 instead of a pre-check.

🤖 Prompt for AI Agents
In `@internal/api/http/handler/cert.go` around lines 52 - 62, Remove the
unreliable pre-check using h.certService.AgentCertExists and instead handle
"already exists" at generation time: ensure certService.GenerateAgentCert
returns a sentinel error (e.g., ErrAgentCertAlreadyExists) when a cert cannot be
created due to an existing cert, and in the HTTP handler map that specific error
to ctx.JSON(http.StatusConflict, ...) with the same message; for all other
errors return the existing 500 behavior. Alternatively, if you keep the
pre-check, still detect ErrAgentCertAlreadyExists from GenerateAgentCert and
turn it into a 409 to avoid the race. Refer to certService.GenerateAgentCert,
certService.AgentCertExists, and the HTTP handler surrounding the
GenerateAgentCert call.

@railway-app railway-app bot temporarily deployed to Silo / production February 4, 2026 13:48 Inactive
@JckHoe JckHoe force-pushed the feat/API-to-provision-cert branch from 49f6eb9 to 60a7854 Compare February 4, 2026 13:51
@railway-app railway-app bot temporarily deployed to Silo / production February 4, 2026 13:51 Inactive
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@cmd/silo-proxy-agent/main.go`:
- Line 54: The call internalhttp.SetupRoute(engine, services, "") is hardcoding
an empty admin API key which forces cert routes to 503; replace the literal ""
with the actual configured admin API key (e.g. pull it from the agent
configuration or services' config like services.Config.AdminAPIKey or a
configuredAdminAPIKey variable) so the routes are enabled when a key is present,
or explicitly disable the routes by passing a clear disable flag/option instead
of an empty string if you intend to keep them off.
🧹 Nitpick comments (2)
misc/delete-agent-cert.sh (1)

1-5: Require explicit ADMIN_API_KEY and fail fast on curl errors.
Defaulting to a placeholder key and ignoring curl failures makes debugging harder.

🛠️ Proposed hardening
 #!/bin/bash
+set -euo pipefail
 AGENT_ID=${1:-agent-1}
-API_KEY=${ADMIN_API_KEY:-your-api-key-here}
-curl -X DELETE "http://localhost:8080/agents/${AGENT_ID}/certificate" \
-  -H "X-API-Key: ${API_KEY}"
+: "${ADMIN_API_KEY:?ADMIN_API_KEY is required}"
+curl -fsS -X DELETE "http://localhost:8080/agents/${AGENT_ID}/certificate" \
+  -H "X-API-Key: ${ADMIN_API_KEY}"
internal/cert/service.go (1)

3-44: Consider failing fast on invalid IPs in config.
Silently skipping bad IPs can hide misconfiguration and produce unexpected SANs.

💡 One way to surface invalid values
 import (
 	"crypto/rsa"
 	"crypto/x509"
 	"fmt"
 	"log/slog"
 	"net"
 	"os"
 	"path/filepath"
+	"strings"
 )
@@
 	ipAddresses := ParseCommaSeparated(IPAddressesConfig)
 	if len(ipAddresses) > 0 {
+		var invalid []string
 		for _, ipStr := range ipAddresses {
 			if ip := net.ParseIP(ipStr); ip != nil {
 				s.IPAddresses = append(s.IPAddresses, ip)
 			} else {
-				slog.Warn("Invalid IP address in configuration, skipping", "ip", ipStr)
+				invalid = append(invalid, ipStr)
 			}
 		}
+		if len(invalid) > 0 {
+			return nil, fmt.Errorf("invalid IP addresses in configuration: %s", strings.Join(invalid, ", "))
+		}
 	}

}))
engine.Use(gin.Recovery())
internalhttp.SetupRoute(engine, services)
internalhttp.SetupRoute(engine, services, "")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Don’t hardcode an empty admin API key.
Passing "" means cert routes will always return 503 even if a key is configured. If the agent should support these endpoints, wire through the configured key; if not, consider explicitly disabling the routes.

✅ Suggested wiring
-internalhttp.SetupRoute(engine, services, "")
+internalhttp.SetupRoute(engine, services, config.Http.AdminAPIKey)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
internalhttp.SetupRoute(engine, services, "")
internalhttp.SetupRoute(engine, services, config.Http.AdminAPIKey)
🤖 Prompt for AI Agents
In `@cmd/silo-proxy-agent/main.go` at line 54, The call
internalhttp.SetupRoute(engine, services, "") is hardcoding an empty admin API
key which forces cert routes to 503; replace the literal "" with the actual
configured admin API key (e.g. pull it from the agent configuration or services'
config like services.Config.AdminAPIKey or a configuredAdminAPIKey variable) so
the routes are enabled when a key is present, or explicitly disable the routes
by passing a clear disable flag/option instead of an empty string if you intend
to keep them off.

@railway-app railway-app bot temporarily deployed to Silo / production February 5, 2026 12:35 Inactive
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@internal/api/http/handler/cert.go`:
- Around line 208-230: The createCertZipFromBytes function currently defers
zipWriter.Close(), which discards any Close error and can return a corrupt ZIP
with nil error; replace the deferred close with an explicit call to
zipWriter.Close() before returning (after the loop that writes files), capture
its error and propagate it (return nil, err or wrap with context) so callers see
failures finalizing the archive; reference the existing zipWriter, zipBuffer and
the files loop in createCertZipFromBytes to locate where to insert the explicit
close and error handling.
🧹 Nitpick comments (1)
internal/api/http/handler/cert.go (1)

168-174: Minor TOCTOU between existence check and deletion.

Similar to CreateAgentCertificate, concurrent delete requests may pass the existence check but one will fail during DeleteAgentCert. Consider handling "not found" errors from DeleteAgentCert and mapping them to 404 for consistency, rather than returning 500.

♻️ Suggested approach
-	if !h.certService.AgentCertExists(agentID) {
-		slog.Warn("Certificate not found for deletion", "agent_id", agentID)
-		ctx.JSON(http.StatusNotFound, gin.H{
-			"error": "Certificate not found for this agent",
-		})
-		return
-	}
-
 	slog.Info("Deleting agent certificate", "agent_id", agentID)
 
 	certDir := h.certService.GetAgentCertDir(agentID)
 	if err := h.certService.DeleteAgentCert(agentID); err != nil {
+		if errors.Is(err, cert.ErrAgentCertNotFound) {
+			slog.Warn("Certificate not found for deletion", "agent_id", agentID)
+			ctx.JSON(http.StatusNotFound, gin.H{
+				"error": "Certificate not found for this agent",
+			})
+			return
+		}
 		slog.Error("Failed to delete agent certificate", "error", err, "agent_id", agentID)

This requires the service to return a sentinel error like ErrAgentCertNotFound.

@railway-app railway-app bot temporarily deployed to Silo / production February 5, 2026 12:40 Inactive
@lwlee2608 lwlee2608 merged commit a5d6d67 into main Feb 5, 2026
2 of 4 checks passed
@lwlee2608 lwlee2608 deleted the feat/API-to-provision-cert branch February 5, 2026 12:41
@lwlee2608 lwlee2608 restored the feat/API-to-provision-cert branch February 5, 2026 12:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants