
grafana metrics #560

Merged
webpiratt merged 8 commits into main from vultisig/487-grafana-metrics
Feb 24, 2026

Conversation

@evgensheff
Collaborator

@evgensheff evgensheff commented Feb 13, 2026

Closes vultisig/recipes#487
Closes vultisig/recipes#531

Summary by CodeRabbit

  • New Features

    • App Store metrics added (installations, policies, fees) with periodic collector and last-update timestamp.
    • Metrics endpoint can be protected with a configurable Bearer token.
  • Changes

    • Metrics configuration accepts a token; servers report whether the endpoint is protected.
    • Metrics collection integrated across services to surface App Store data.
  • Behavior

    • Vault/plugin authentication moved from broad group-level enforcement to explicit per-route middleware.
    • Policy creation/update no longer requires a request-context vault public key match.

@evgensheff evgensheff requested a review from webpiratt February 13, 2026 13:21
@coderabbitai
Contributor

coderabbitai bot commented Feb 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds Prometheus App Store metrics and a periodic Postgres-backed collector; integrates the collector into verifier startup/shutdown; introduces token-protected metrics HTTP endpoint and propagates token through configs/commands; moves vault/plugin authentication from group-level to selective per-route middleware.

Changes

Cohort / File(s) Summary
Metrics definitions
internal/metrics/appstore.go
New Prometheus gauges/gaugevecs and AppStoreMetrics with methods to reset/populate per-plugin installations, policies, fees, and to update a last-update timestamp.
Collector implementation
internal/metrics/appstore_collector.go
New DatabaseQuerier interface and AppStoreCollector with NewAppStoreCollector, Start, Stop; ticker-driven collect loop queries DB sources, logs per-source errors, updates metrics, and sets timestamp.
Metrics registry
internal/metrics/registry.go
Adds ServiceAppStore constant and registerAppStoreMetrics; integrates appstore metrics into the metrics registration switch.
Storage queries (Postgres)
internal/storage/postgres/appstore_metrics.go
Implements PostgresBackend methods: GetInstallationsByPlugin, GetPoliciesByPlugin, GetFeesByPlugin returning per-plugin maps with query/scan/rows error handling.
Verifier integration
cmd/verifier/main.go
Initializes AppStoreMetrics, creates/starts AppStoreCollector, registers ServiceAppStore with metrics server, passes logger into StartMetricsServer, and defers collector stop on shutdown.
Metrics server & config
internal/metrics/server.go, config/config.go, plugin/tx_indexer/pkg/config/config.go
Adds Token to MetricsConfig; NewServer gains token param; introduces authMiddleware enforcing Bearer token with constant-time validation and conditional handler registration/logging when token is set or absent.
Command binaries
cmd/tx_indexer/main.go, cmd/worker/main.go
Propagates cfg.Metrics.Token into metrics server configuration when metrics are enabled.
API: policy creation
internal/api/policy.go
Removes reliance on request-context vault_public_key in CreatePluginPolicy and UpdatePluginPolicyById; uses policy.PublicKey for trial checks, vault operations, and public-key comparisons.
API server route middleware
internal/api/server.go, plugin/server/server.go
Moves vault authentication from group-level to explicit per-route middleware for selected endpoints; removes VerifierAuthMiddleware from /vault/reshare and plugin policy create/update routes and adjusts route registrations accordingly.
🚥 Pre-merge checks | ✅ 3 | ❌ 3

❌ Failed checks (1 warning, 2 inconclusive)

  • Docstring Coverage — ⚠️ Warning. Docstring coverage is 54.55%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
  • Title check — ❓ Inconclusive. The title 'grafana metrics' is vague and generic, lacking specificity about the actual changes made to the codebase. Consider a more descriptive title such as 'Add app store metrics collection and Grafana dashboard support' or 'Implement Prometheus-based app store metrics with token auth'.
  • Out of Scope Changes check — ❓ Inconclusive. The PR includes token-based authentication for metrics endpoints, which was not explicitly mentioned in the linked issues and represents a partial scope expansion. Clarify whether token-based metrics authentication aligns with security requirements, or whether it should be addressed in a separate PR.
✅ Passed checks (3 passed)
  • Description Check — ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Linked Issues check — ✅ Passed. The PR addresses both linked issues: it adds app store metrics collection supporting the Grafana dashboard requirements [#487] and removes authentication from specific endpoints as required [#531].
  • Description check — ✅ Passed. The PR description clearly states closure of the linked issues and provides context about the changes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@cmd/verifier/main.go`:
- Around line 102-104: The Postgres backend must satisfy the DatabaseQuerier
interface by providing the exact methods GetInstallationsByPlugin,
GetPoliciesByPlugin, and GetFeesByPlugin; add these methods to the
PostgresBackend (or create a PostgresBackendAdapter) that delegate to the
existing implementations: call GetPluginInstallationsCount (or wrap it to return
the expected installation slice/result) for GetInstallationsByPlugin, map
GetPoliciesByPlugin to call GetPluginPolicies or GetAllPluginPolicies and adapt
the return types, and implement GetFeesByPlugin to call GetFeesByPluginID and
convert its output to the DatabaseQuerier-expected shape; ensure method
signatures, return types and error handling exactly match the DatabaseQuerier
interface so PostgresBackend implements DatabaseQuerier.

In `@internal/metrics/appstore_collector.go`:
- Around line 64-88: The collect method currently calls
c.metrics.UpdateTimestamp() unconditionally, which advances the "last successful
metrics update" even when all DB queries fail; modify
AppStoreCollector.collect() to track success (e.g., a boolean flag) that is set
to true when any of the calls to c.db.GetInstallationsByPlugin,
c.db.GetPoliciesByPlugin, or c.db.GetFeesByPlugin succeed and their
corresponding c.metrics.UpdateInstallations, UpdatePolicies, or UpdateFees are
called, and only call c.metrics.UpdateTimestamp() if that flag is true.
🧹 Nitpick comments (4)
cmd/verifier/main.go (1)

106-110: Redundant nil check on appStoreCollector.

At this point in the code, appStoreCollector was just assigned on line 103 and is guaranteed to be non-nil (assuming NewAppStoreCollector never returns nil, which it doesn't). The nil guard is unnecessary.

Proposed fix
 		defer func() {
-			if appStoreCollector != nil {
-				appStoreCollector.Stop()
-			}
+			appStoreCollector.Stop()
 		}()
internal/metrics/appstore.go (1)

83-93: Reset() before Set() creates a brief scrape window with missing data.

Between Reset() on line 84 and the subsequent Set() calls, a Prometheus scrape could observe zero or absent time series. The same pattern applies to UpdatePolicies and UpdateFees. This can cause brief dips/gaps in Grafana panels.

A safer approach is to build the new label set, then swap — or simply overwrite without resetting (stale plugin IDs would linger, but that's often preferable to data gaps). If you need to prune removed plugins, consider collecting the current label set and deleting only the ones no longer present.

Example: delete only stale labels instead of resetting
// Helper to update a GaugeVec without a full Reset.
// previousKeys should be tracked per metric (e.g., stored on AppStoreMetrics).
func updateGaugeVec(gv *prometheus.GaugeVec, data map[string]int64, previousKeys *map[string]struct{}) {
	current := make(map[string]struct{}, len(data))
	for pluginID, count := range data {
		gv.WithLabelValues(pluginID).Set(float64(count))
		current[pluginID] = struct{}{}
	}
	// Remove stale entries
	// Remove stale entries, then record the new label set. The assignment
	// stays inside the nil guard so a nil previousKeys doesn't panic.
	if previousKeys != nil {
		for old := range *previousKeys {
			if _, ok := current[old]; !ok {
				gv.DeleteLabelValues(old)
			}
		}
		*previousKeys = current
	}
}
internal/metrics/appstore_collector.go (2)

64-66: DB queries use context.Background() with no timeout — could block indefinitely.

If the database is slow or unresponsive, collect() will block the entire ticker loop. Consider using a context with a timeout derived from the collection interval.

Proposed fix
 func (c *AppStoreCollector) collect() {
-	ctx := context.Background()
+	ctx, cancel := context.WithTimeout(context.Background(), c.interval)
+	defer cancel()

59-62: Stop() will panic if called twice due to double close(c.stopCh).

This may not be an issue in practice given the current single call site, but it's a latent footgun. A sync.Once or a select-based guard would make it safe.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@internal/metrics/appstore_collector.go`:
- Around line 59-62: Stop() currently closes c.stopCh unconditionally which will
panic if Stop() is called more than once; add a sync.Once field (e.g., stopOnce
sync.Once) to AppStoreCollector and change Stop() to call c.stopOnce.Do(func(){
close(c.stopCh) }) and then wait on <-c.doneCh (or close doneCh once worker
exits) so repeated Stop() calls are safe; also import sync and initialize the
struct normally (no special init required).
🧹 Nitpick comments (1)
internal/metrics/appstore_collector.go (1)

64-89: Consider adding a context timeout to prevent the collector from hanging on slow DB queries.

context.Background() with no deadline means a stalled database connection will block the collector goroutine indefinitely. Since this runs on a periodic ticker, a reasonable timeout (e.g., half the collection interval) would prevent cascading delays.

Proposed fix
 func (c *AppStoreCollector) collect() {
-	ctx := context.Background()
+	ctx, cancel := context.WithTimeout(context.Background(), c.interval/2)
+	defer cancel()

Also, UpdateTimestamp() is called up to three times per cycle (once per successful query). A single call at the end when any query succeeded would be cleaner:

Minor cleanup
 func (c *AppStoreCollector) collect() {
 	ctx := context.Background()
+	anySuccess := false

 	installations, err := c.db.GetInstallationsByPlugin(ctx)
 	if err != nil {
 		c.logger.Errorf("Failed to collect installations: %v", err)
 	} else {
 		c.metrics.UpdateInstallations(installations)
-		c.metrics.UpdateTimestamp()
+		anySuccess = true
 	}

 	policies, err := c.db.GetPoliciesByPlugin(ctx)
 	if err != nil {
 		c.logger.Errorf("Failed to collect policies: %v", err)
 	} else {
 		c.metrics.UpdatePolicies(policies)
-		c.metrics.UpdateTimestamp()
+		anySuccess = true
 	}

 	fees, err := c.db.GetFeesByPlugin(ctx)
 	if err != nil {
 		c.logger.Errorf("Failed to collect fees: %v", err)
 	} else {
 		c.metrics.UpdateFees(fees)
-		c.metrics.UpdateTimestamp()
+		anySuccess = true
 	}
+
+	if anySuccess {
+		c.metrics.UpdateTimestamp()
+	}
 }

@evgensheff evgensheff force-pushed the vultisig/487-grafana-metrics branch from 6144d70 to 872a56f on February 18, 2026 05:21
Added metrics auth
Rm reshare and policy control endpoints auth
@evgensheff evgensheff requested a review from webpiratt February 18, 2026 06:46
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (3)
cmd/verifier/main.go (2)

105-109: Nil guard in the deferred Stop() is dead code.

Start() is called on line 103 without a nil guard; if NewAppStoreCollector ever returned nil, execution would already have panicked there. The check if appStoreCollector != nil in the defer is redundant.

♻️ Proposed cleanup
-		defer func() {
-			if appStoreCollector != nil {
-				appStoreCollector.Stop()
-			}
-		}()
+		defer appStoreCollector.Stop()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/verifier/main.go` around lines 105 - 109, The deferred nil check is dead
code: since NewAppStoreCollector(...) is followed by appStoreCollector.Start()
(so appStoreCollector is expected non-nil or would have panicked), remove the
redundant guard in the defer and always call appStoreCollector.Stop(); update
the defer that references appStoreCollector to simply call
appStoreCollector.Stop() (keep the existing defer block around the Stop call)
and ensure references are to appStoreCollector, NewAppStoreCollector, Start, and
Stop so reviewers can locate the change.

102-102: Hardcoded 30-second collection interval should be configurable.

Different environments (staging vs. production) may need different scrape cadences. Consider exposing this as cfg.Metrics.CollectInterval (with a sensible default of 30s).

♻️ Proposed refactor

In the metrics config struct (e.g., config.MetricsConfig):

+ CollectInterval time.Duration `mapstructure:"collect_interval"`

With a default fallback in main.go:

-	appStoreCollector = internalMetrics.NewAppStoreCollector(db, appStoreMetrics, logger, 30*time.Second)
+	collectInterval := cfg.Metrics.CollectInterval
+	if collectInterval == 0 {
+		collectInterval = 30 * time.Second
+	}
+	appStoreCollector = internalMetrics.NewAppStoreCollector(db, appStoreMetrics, logger, collectInterval)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/verifier/main.go` at line 102, Replace the hardcoded 30s interval passed
to internalMetrics.NewAppStoreCollector with a configurable field: add
CollectInterval time.Duration `mapstructure:"collect_interval"` to your
config.MetricsConfig (or config.Metrics) and use cfg.Metrics.CollectInterval
when constructing appStoreCollector; ensure you apply a sensible default of
30*time.Second if cfg.Metrics.CollectInterval is zero/empty before calling
internalMetrics.NewAppStoreCollector so existing behavior remains when the
config omits the value.
internal/metrics/appstore_collector.go (1)

68-93: UpdateTimestamp() is called up to three times per cycle — once per successful source.

If the intent is "last time at least one source succeeded," a single call at the end is sufficient and avoids writing the same timestamp value three times.

♻️ Proposed refactor
 func (c *AppStoreCollector) collect() {
 	ctx, cancel := context.WithTimeout(context.Background(), c.interval*9/10)
 	defer cancel()
+	anySuccess := false

 	installations, err := c.db.GetInstallationsByPlugin(ctx)
 	if err != nil {
 		c.logger.Errorf("Failed to collect installations: %v", err)
 	} else {
 		c.metrics.UpdateInstallations(installations)
-		c.metrics.UpdateTimestamp()
+		anySuccess = true
 	}

 	policies, err := c.db.GetPoliciesByPlugin(ctx)
 	if err != nil {
 		c.logger.Errorf("Failed to collect policies: %v", err)
 	} else {
 		c.metrics.UpdatePolicies(policies)
-		c.metrics.UpdateTimestamp()
+		anySuccess = true
 	}

 	fees, err := c.db.GetFeesByPlugin(ctx)
 	if err != nil {
 		c.logger.Errorf("Failed to collect fees: %v", err)
 	} else {
 		c.metrics.UpdateFees(fees)
-		c.metrics.UpdateTimestamp()
+		anySuccess = true
 	}
+
+	if anySuccess {
+		c.metrics.UpdateTimestamp()
+	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metrics/appstore_collector.go` around lines 68 - 93,
AppStoreCollector.collect() currently calls c.metrics.UpdateTimestamp() after
each successful source update (in the branches after
c.metrics.UpdateInstallations, UpdatePolicies, UpdateFees); change this so the
timestamp is only updated once per collect cycle when at least one source
succeeded: introduce a local bool (e.g., anySucceeded) that you set true after
each successful UpdateInstallations/UpdatePolicies/UpdateFees call, and after
all three fetches call c.metrics.UpdateTimestamp() only if anySucceeded is true.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/metrics/appstore_collector.go`:
- Around line 68-69: The collect() method uses context.Background() which can
block shutdown; change it to use a bounded context (e.g. ctx, cancel :=
context.WithTimeout(context.Background(), timeout)) and defer cancel so any hung
DB/query inside AppStoreCollector.collect() times out before the next collection
and before Stop() waits on <-c.doneCh; pick timeout shorter than the collection
interval (use the collector's existing interval value or a new field like
c.collectInterval) and pass the bounded ctx into any DB/query calls inside
collect().
- Around line 38-58: Add a start guard to prevent Start() from spawning multiple
goroutines: add a startOnce sync.Once field to the AppStoreCollector struct and
wrap the goroutine launch in Start() with c.startOnce.Do(func() { ... }),
leaving the existing defer close(c.doneCh) and stop logic intact; this mirrors
the existing stopOnce used by Stop() and prevents a second Start() from causing
a panic by closing doneCh twice.

---

Nitpick comments:
In `@cmd/verifier/main.go`:
- Around line 105-109: The deferred nil check is dead code: since
NewAppStoreCollector(...) is followed by appStoreCollector.Start() (so
appStoreCollector is expected non-nil or would have panicked), remove the
redundant guard in the defer and always call appStoreCollector.Stop(); update
the defer that references appStoreCollector to simply call
appStoreCollector.Stop() (keep the existing defer block around the Stop call)
and ensure references are to appStoreCollector, NewAppStoreCollector, Start, and
Stop so reviewers can locate the change.
- Line 102: Replace the hardcoded 30s interval passed to
internalMetrics.NewAppStoreCollector with a configurable field: add
CollectInterval time.Duration `mapstructure:"collect_interval"` to your
config.MetricsConfig (or config.Metrics) and use cfg.Metrics.CollectInterval
when constructing appStoreCollector; ensure you apply a sensible default of
30*time.Second if cfg.Metrics.CollectInterval is zero/empty before calling
internalMetrics.NewAppStoreCollector so existing behavior remains when the
config omits the value.

In `@internal/metrics/appstore_collector.go`:
- Around line 68-93: AppStoreCollector.collect() currently calls
c.metrics.UpdateTimestamp() after each successful source update (in the branches
after c.metrics.UpdateInstallations, UpdatePolicies, UpdateFees); change this so
the timestamp is only updated once per collect cycle when at least one source
succeeded: introduce a local bool (e.g., anySucceeded) that you set true after
each successful UpdateInstallations/UpdatePolicies/UpdateFees call, and after
all three fetches call c.metrics.UpdateTimestamp() only if anySucceeded is true.

Collaborator

@Ehsan-saradar Ehsan-saradar left a comment

I think we should remove the auth-header check here.

vaultGroup.POST("/reshare", s.ReshareVault)

and
pluginGroup.POST("/policy", s.CreatePluginPolicy)

Plugins can still validate the auth header since it’s sent by the verifier. However, the agent/extension doesn’t have an access token, so we need to remove the auth check on the verifier side for reshare and policy requests.

@evgensheff
Collaborator Author

@Ehsan-saradar please check if everything is sufficient

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/api/policy.go (1)

79-101: ⚠️ Potential issue | 🟠 Major

Signature verification must precede the billing/trial checks to prevent unauthenticated probing.

Since the middleware-level auth requirement was intentionally removed from this endpoint, policy.PublicKey is now fully attacker-controlled until verifyPolicySignature succeeds. In the current ordering, the two operations on lines 80 and 88–96 are executed against an unverified caller-supplied key:

  • Line 80: IsTrialActive hits the database with an arbitrary policy.PublicKey, allowing any caller to enumerate whether a given public key has an active trial.
  • Lines 88–96: vaultStorage.Exist hits object storage with an arbitrary key, similarly leaking billing vault presence.

Both are I/O operations against sensitive state and incur real resource cost, yet neither has been authorized. Moving verifyPolicySignature to immediately after UUID generation ensures the cryptographic proof of ownership is established before any state is probed.

🔒 Proposed fix: hoist signature check before billing/trial probe
 	if policy.ID == uuid.Nil {
 		policy.ID = uuid.New()
 	}
 
+	if !s.verifyPolicySignature(policy) {
+		s.logger.Error("invalid policy signature")
+		return c.JSON(http.StatusBadRequest, NewErrorResponseWithMessage(msgInvalidPolicySignature))
+	}
+
 	var (
 		isTrialActive bool
 		err           error
 	)
 	err = s.db.WithTransaction(c.Request().Context(), func(ctx context.Context, tx pgx.Tx) error {
 		isTrialActive, _, err = s.db.IsTrialActive(ctx, tx, policy.PublicKey)
 		return err
 	})
 	if err != nil {
 		s.logger.WithError(err).Warnf("Failed to check trial info")
 	}
 
 	if !isTrialActive {
 		filePathName := common.GetVaultBackupFilename(policy.PublicKey, vtypes.PluginVultisigFees_feee.String())
 		exist, err := s.vaultStorage.Exist(filePathName)
 		if err != nil {
 			s.logger.WithError(err).Error("failed to check vault existence")
 			return c.JSON(http.StatusInternalServerError, NewErrorResponseWithMessage(msgInternalError))
 		}
 		if !exist {
 			return c.JSON(http.StatusForbidden, NewErrorResponseWithMessage(msgAccessDeniedBilling))
 		}
 	}
 
-	if !s.verifyPolicySignature(policy) {
-		s.logger.Error("invalid policy signature")
-		return c.JSON(http.StatusBadRequest, NewErrorResponseWithMessage(msgInvalidPolicySignature))
-	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/api/policy.go` around lines 79 - 101, The signature verification
must run before any billing/trial or storage probes because policy.PublicKey is
attacker-controlled until proven; move the call to
s.verifyPolicySignature(policy) to immediately after the policy UUID is
generated/parsed and before calling s.db.IsTrialActive(...) and
s.vaultStorage.Exist(...). If s.verifyPolicySignature(policy) returns false, log
and return the existing BadRequest response (msgInvalidPolicySignature) without
touching IsTrialActive or vaultStorage.Exist; keep the existing error handling
and responses for the DB/storage calls but only execute them after
verifyPolicySignature succeeds.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/api/server.go`:
- Around line 160-164: The ReshareVault endpoint is currently unauthenticated
and allows resharing without proof-of-ownership; update the route registration
so that vaultGroup.POST("/reshare", s.ReshareVault) requires the same
VaultAuthMiddleware used by GetVault, and also protect the existence check by
adding authentication to vaultGroup.GET("/exist/:pluginId/:publicKeyECDSA",
s.ExistVault). Concretely, apply s.VaultAuthMiddleware (either by adding it to
the vaultGroup middleware stack or by passing it into the POST/GET handlers) so
ReshareVault and ExistVault enforce the caller’s identity/ownership before
proceeding.
- Around line 171-174: The PUT /plugin/policy route is missing the
VaultAuthMiddleware so UpdatePluginPolicyById will not find the vault_public_key
in context and will always error; update the route registration for
pluginGroup.PUT("/policy", ...) to include s.VaultAuthMiddleware (same pattern
used by DELETE and other sensitive routes) so that VaultAuthMiddleware sets the
vault_public_key context key before UpdatePluginPolicyById executes.

---

Outside diff comments:
In `@internal/api/policy.go`:
- Around line 79-101: The signature verification must run before any
billing/trial or storage probes because policy.PublicKey is attacker-controlled
until proven; move the call to s.verifyPolicySignature(policy) to immediately
after the policy UUID is generated/parsed and before calling
s.db.IsTrialActive(...) and s.vaultStorage.Exist(...). If
s.verifyPolicySignature(policy) returns false, log and return the existing
BadRequest response (msgInvalidPolicySignature) without touching IsTrialActive
or vaultStorage.Exist; keep the existing error handling and responses for the
DB/storage calls but only execute them after verifyPolicySignature succeeds.

Contributor

@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/api/policy.go (1)

79-97: ⚠️ Potential issue | 🔴 Critical

Signature verification must precede all database and storage operations to prevent unauthenticated writes.

policy.PublicKey is entirely user-supplied and unauthenticated at this point. However, IsTrialActive() (line 80) performs a database INSERT when called with a public key that has no existing trial record. This write happens in a transaction before verifyPolicySignature() runs at line 99.

Attack scenario: An attacker can call CreatePluginPolicy repeatedly with arbitrary public keys they do not control. For each key without an existing trial record, IsTrialActive will insert a new trial entry into the fees table. The global rate limiter (5 req/min, burst 30) is insufficient to prevent this DOS-style record creation.

Move verifyPolicySignature() to execute immediately after binding the policy and before any database or storage operations. Only allow database/storage access for callers who cryptographically prove ownership of the private key.

Reordering sketch
 func (s *Server) CreatePluginPolicy(c echo.Context) error {
     var policy types.PluginPolicy
     if err := c.Bind(&policy); err != nil { ... }
     if policy.ID == uuid.Nil { policy.ID = uuid.New() }
 
+    // Verify signature first — caller must control the vault key before any DB/storage access.
+    if !s.verifyPolicySignature(policy) {
+        s.logger.Error("invalid policy signature")
+        return c.JSON(http.StatusBadRequest, NewErrorResponseWithMessage(msgInvalidPolicySignature))
+    }
+
     err = s.db.WithTransaction(c.Request().Context(), func(ctx context.Context, tx pgx.Tx) error {
         isTrialActive, _, err = s.db.IsTrialActive(ctx, tx, policy.PublicKey)
         return err
     })
     ...
-
-    if !s.verifyPolicySignature(policy) {
-        s.logger.Error("invalid policy signature")
-        return c.JSON(http.StatusBadRequest, NewErrorResponseWithMessage(msgInvalidPolicySignature))
-    }
     ...
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/api/policy.go` around lines 79 - 97, The code currently calls
IsTrialActive (which can perform DB INSERTs) before verifying the caller owns
policy.PublicKey; move the signature verification to immediately after the
policy is bound and before any DB/storage calls so unauthenticated keys cannot
trigger writes. Concretely, in CreatePluginPolicy (the handler that calls
s.db.WithTransaction and s.vaultStorage.Exist), call verifyPolicySignature(...)
right after binding/parsing the policy object and before invoking
s.db.IsTrialActive or any vaultStorage methods; ensure verifyPolicySignature
returns an error on failure and short-circuits the handler to avoid any
DB/Storage access when signature verification fails.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@internal/api/policy.go`:
- Around line 79-97: The code currently calls IsTrialActive (which can perform
DB INSERTs) before verifying the caller owns policy.PublicKey; move the
signature verification to immediately after the policy is bound and before any
DB/storage calls so unauthenticated keys cannot trigger writes. Concretely, in
CreatePluginPolicy (the handler that calls s.db.WithTransaction and
s.vaultStorage.Exist), call verifyPolicySignature(...) right after
binding/parsing the policy object and before invoking s.db.IsTrialActive or any
vaultStorage methods; ensure verifyPolicySignature returns an error on failure
and short-circuits the handler to avoid any DB/Storage access when signature
verification fails.

@webpiratt webpiratt closed this Feb 20, 2026
@webpiratt webpiratt reopened this Feb 23, 2026
@webpiratt webpiratt merged commit 57e27ba into main Feb 24, 2026
7 checks passed
@webpiratt webpiratt deleted the vultisig/487-grafana-metrics branch February 24, 2026 08:14
evgensheff added a commit that referenced this pull request Feb 24, 2026


Development

Successfully merging this pull request may close these issues.

  • verifier: remove auth /reshare and policy create
  • Grafana product metrics
