Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews:
Walkthrough

Adds Prometheus App Store metrics and a periodic Postgres-backed collector; integrates the collector into verifier startup/shutdown; introduces a token-protected metrics HTTP endpoint and propagates the token through configs/commands; moves vault/plugin authentication from group-level to selective per-route middleware.
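The walkthrough mentions a token-protected metrics HTTP endpoint. A minimal sketch of such token gating, using only the standard library; the `authorized` helper, the bearer scheme, and the handler wiring are assumptions for illustration, not this PR's actual code:

```go
package main

import (
	"crypto/subtle"
	"net/http"
)

// authorized reports whether the Authorization header carries the configured
// bearer token, using a constant-time comparison to avoid timing leaks.
func authorized(header, token string) bool {
	want := "Bearer " + token
	return subtle.ConstantTimeCompare([]byte(header), []byte(want)) == 1
}

// metricsTokenMiddleware rejects any /metrics request without a valid token.
func metricsTokenMiddleware(token string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !authorized(r.Header.Get("Authorization"), token) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```

The same check could equally be expressed as Echo middleware; the important property is the constant-time comparison and returning 401 before the metrics handler runs.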
🚥 Pre-merge checks | ✅ 3 | ❌ 3

❌ Failed checks (1 warning, 2 inconclusive)
✅ Passed checks (3 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches: 🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@cmd/verifier/main.go`:
- Around line 102-104: The Postgres backend must satisfy the DatabaseQuerier
interface by providing the exact methods GetInstallationsByPlugin,
GetPoliciesByPlugin, and GetFeesByPlugin; add these methods to the
PostgresBackend (or create a PostgresBackendAdapter) that delegate to the
existing implementations: call GetPluginInstallationsCount (or wrap it to return
the expected installation slice/result) for GetInstallationsByPlugin, map
GetPoliciesByPlugin to call GetPluginPolicies or GetAllPluginPolicies and adapt
the return types, and implement GetFeesByPlugin to call GetFeesByPluginID and
convert its output to the DatabaseQuerier-expected shape; ensure method
signatures, return types and error handling exactly match the DatabaseQuerier
interface so PostgresBackend implements DatabaseQuerier.
In `@internal/metrics/appstore_collector.go`:
- Around line 64-88: The collect method currently calls
c.metrics.UpdateTimestamp() unconditionally, which advances the "last successful
metrics update" even when all DB queries fail; modify
AppStoreCollector.collect() to track success (e.g., a boolean flag) that is set
to true when any of the calls to c.db.GetInstallationsByPlugin,
c.db.GetPoliciesByPlugin, or c.db.GetFeesByPlugin succeed and their
corresponding c.metrics.UpdateInstallations, UpdatePolicies, or UpdateFees are
called, and only call c.metrics.UpdateTimestamp() if that flag is true.
🧹 Nitpick comments (4)
cmd/verifier/main.go (1)
106-110: Redundant nil check on `appStoreCollector`.

At this point in the code, `appStoreCollector` was just assigned on line 103 and is guaranteed to be non-nil (assuming `NewAppStoreCollector` never returns nil, which it doesn't). The nil guard is unnecessary.

Proposed fix
```diff
 defer func() {
-	if appStoreCollector != nil {
-		appStoreCollector.Stop()
-	}
+	appStoreCollector.Stop()
 }()
```

internal/metrics/appstore.go (1)
83-93: `Reset()` before `Set()` creates a brief scrape window with missing data.

Between `Reset()` on line 84 and the subsequent `Set()` calls, a Prometheus scrape could observe zero or absent time series. The same pattern applies to `UpdatePolicies` and `UpdateFees`. This can cause brief dips/gaps in Grafana panels.

A safer approach is to build the new label set and then swap, or simply overwrite without resetting (stale plugin IDs would linger, but that's often preferable to data gaps). If you need to prune removed plugins, consider collecting the current label set and deleting only the ones no longer present.
Example: delete only stale labels instead of resetting
```go
// Helper to update a GaugeVec without a full Reset.
// previousKeys should be tracked per metric (e.g., stored on AppStoreMetrics).
func updateGaugeVec(gv *prometheus.GaugeVec, data map[string]int64, previousKeys *map[string]struct{}) {
	current := make(map[string]struct{}, len(data))
	for pluginID, count := range data {
		gv.WithLabelValues(pluginID).Set(float64(count))
		current[pluginID] = struct{}{}
	}
	// Remove stale entries
	if previousKeys != nil {
		for old := range *previousKeys {
			if _, ok := current[old]; !ok {
				gv.DeleteLabelValues(old)
			}
		}
	}
	*previousKeys = current
}
```

internal/metrics/appstore_collector.go (2)
64-66: DB queries use `context.Background()` with no timeout and could block indefinitely.

If the database is slow or unresponsive, `collect()` will block the entire ticker loop. Consider using a context with a timeout derived from the collection interval.

Proposed fix
```diff
 func (c *AppStoreCollector) collect() {
-	ctx := context.Background()
+	ctx, cancel := context.WithTimeout(context.Background(), c.interval)
+	defer cancel()
```
59-62: `Stop()` will panic if called twice due to a double `close(c.stopCh)`.

This may not be an issue in practice given the current single call site, but it's a latent footgun. A `sync.Once` or a select-based guard would make it safe.
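The `sync.Once` guard suggested here can be sketched as follows; `Collector` and its fields are a simplified stand-in for the real `AppStoreCollector`, whose worker loop and construction differ:

```go
package main

import "sync"

// Collector demonstrates the idempotent-Stop pattern from the review:
// stopOnce guards the channel close so repeated Stop() calls are safe.
type Collector struct {
	stopCh   chan struct{}
	doneCh   chan struct{}
	stopOnce sync.Once
}

func NewCollector() *Collector {
	return &Collector{
		stopCh: make(chan struct{}),
		doneCh: make(chan struct{}),
	}
}

// Start launches the worker goroutine; the worker closes doneCh on exit.
// The real collector would run its ticker loop where the receive is.
func (c *Collector) Start() {
	go func() {
		defer close(c.doneCh)
		<-c.stopCh
	}()
}

// Stop is safe to call multiple times: sync.Once ensures stopCh is closed
// exactly once, and every caller still waits for the worker to finish.
func (c *Collector) Stop() {
	c.stopOnce.Do(func() {
		close(c.stopCh)
	})
	<-c.doneCh
}
```

A second `Stop()` simply skips the close and returns immediately, since `doneCh` is already closed by then.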
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@internal/metrics/appstore_collector.go`:
- Around line 59-62: Stop() currently closes c.stopCh unconditionally which will
panic if Stop() is called more than once; add a sync.Once field (e.g., stopOnce
sync.Once) to AppStoreCollector and change Stop() to call c.stopOnce.Do(func(){
close(c.stopCh) }) and then wait on <-c.doneCh (or close doneCh once worker
exits) so repeated Stop() calls are safe; also import sync and initialize the
struct normally (no special init required).
🧹 Nitpick comments (1)
internal/metrics/appstore_collector.go (1)
64-89: Consider adding a context timeout to prevent the collector from hanging on slow DB queries.
`context.Background()` with no deadline means a stalled database connection will block the collector goroutine indefinitely. Since this runs on a periodic ticker, a reasonable timeout (e.g., half the collection interval) would prevent cascading delays.

Proposed fix
```diff
 func (c *AppStoreCollector) collect() {
-	ctx := context.Background()
+	ctx, cancel := context.WithTimeout(context.Background(), c.interval/2)
+	defer cancel()
```

Also, `UpdateTimestamp()` is called up to three times per cycle (once per successful query). A single call at the end when any query succeeded would be cleaner:

Minor cleanup
```diff
 func (c *AppStoreCollector) collect() {
 	ctx := context.Background()
+	anySuccess := false

 	installations, err := c.db.GetInstallationsByPlugin(ctx)
 	if err != nil {
 		c.logger.Errorf("Failed to collect installations: %v", err)
 	} else {
 		c.metrics.UpdateInstallations(installations)
-		c.metrics.UpdateTimestamp()
+		anySuccess = true
 	}

 	policies, err := c.db.GetPoliciesByPlugin(ctx)
 	if err != nil {
 		c.logger.Errorf("Failed to collect policies: %v", err)
 	} else {
 		c.metrics.UpdatePolicies(policies)
-		c.metrics.UpdateTimestamp()
+		anySuccess = true
 	}

 	fees, err := c.db.GetFeesByPlugin(ctx)
 	if err != nil {
 		c.logger.Errorf("Failed to collect fees: %v", err)
 	} else {
 		c.metrics.UpdateFees(fees)
-		c.metrics.UpdateTimestamp()
+		anySuccess = true
 	}
+
+	if anySuccess {
+		c.metrics.UpdateTimestamp()
+	}
 }
```
Force-pushed from 6144d70 to 872a56f (Compare)
- Added metrics auth
- Rm reshare and policy control endpoints auth
Actionable comments posted: 2
🧹 Nitpick comments (3)
cmd/verifier/main.go (2)
105-109: Nil guard in the deferred `Stop()` is dead code.

`Start()` is called on line 103 without a nil guard; if `NewAppStoreCollector` ever returned `nil`, execution would already have panicked there. The check `if appStoreCollector != nil` in the defer is redundant.

♻️ Proposed cleanup
```diff
-	defer func() {
-		if appStoreCollector != nil {
-			appStoreCollector.Stop()
-		}
-	}()
+	defer appStoreCollector.Stop()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/verifier/main.go` around lines 105 - 109, The deferred nil check is dead code: since NewAppStoreCollector(...) is followed by appStoreCollector.Start() (so appStoreCollector is expected non-nil or would have panicked), remove the redundant guard in the defer and always call appStoreCollector.Stop(); update the defer that references appStoreCollector to simply call appStoreCollector.Stop() (keep the existing defer block around the Stop call) and ensure references are to appStoreCollector, NewAppStoreCollector, Start, and Stop so reviewers can locate the change.
102: Hardcoded 30-second collection interval should be configurable.

Different environments (staging vs. production) may need different scrape cadences. Consider exposing this as `cfg.Metrics.CollectInterval` (with a sensible default of `30s`).

♻️ Proposed refactor
In the metrics config struct (e.g., `config.MetricsConfig`):

```diff
+	CollectInterval time.Duration `mapstructure:"collect_interval"`
```

With a default fallback in `main.go`:

```diff
-	appStoreCollector = internalMetrics.NewAppStoreCollector(db, appStoreMetrics, logger, 30*time.Second)
+	collectInterval := cfg.Metrics.CollectInterval
+	if collectInterval == 0 {
+		collectInterval = 30 * time.Second
+	}
+	appStoreCollector = internalMetrics.NewAppStoreCollector(db, appStoreMetrics, logger, collectInterval)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/verifier/main.go` at line 102, Replace the hardcoded 30s interval passed to internalMetrics.NewAppStoreCollector with a configurable field: add CollectInterval time.Duration `mapstructure:"collect_interval"` to your config.MetricsConfig (or config.Metrics) and use cfg.Metrics.CollectInterval when constructing appStoreCollector; ensure you apply a sensible default of 30*time.Second if cfg.Metrics.CollectInterval is zero/empty before calling internalMetrics.NewAppStoreCollector so existing behavior remains when the config omits the value.

internal/metrics/appstore_collector.go (1)
68-93: `UpdateTimestamp()` is called up to three times per cycle, once per successful source.

If the intent is "last time at least one source succeeded," a single call at the end is sufficient and avoids writing the same timestamp value three times.
♻️ Proposed refactor
```diff
 func (c *AppStoreCollector) collect() {
 	ctx, cancel := context.WithTimeout(context.Background(), c.interval*9/10)
 	defer cancel()
+	anySuccess := false

 	installations, err := c.db.GetInstallationsByPlugin(ctx)
 	if err != nil {
 		c.logger.Errorf("Failed to collect installations: %v", err)
 	} else {
 		c.metrics.UpdateInstallations(installations)
-		c.metrics.UpdateTimestamp()
+		anySuccess = true
 	}

 	policies, err := c.db.GetPoliciesByPlugin(ctx)
 	if err != nil {
 		c.logger.Errorf("Failed to collect policies: %v", err)
 	} else {
 		c.metrics.UpdatePolicies(policies)
-		c.metrics.UpdateTimestamp()
+		anySuccess = true
 	}

 	fees, err := c.db.GetFeesByPlugin(ctx)
 	if err != nil {
 		c.logger.Errorf("Failed to collect fees: %v", err)
 	} else {
 		c.metrics.UpdateFees(fees)
-		c.metrics.UpdateTimestamp()
+		anySuccess = true
 	}
+
+	if anySuccess {
+		c.metrics.UpdateTimestamp()
+	}
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/metrics/appstore_collector.go` around lines 68 - 93, AppStoreCollector.collect() currently calls c.metrics.UpdateTimestamp() after each successful source update (in the branches after c.metrics.UpdateInstallations, UpdatePolicies, UpdateFees); change this so the timestamp is only updated once per collect cycle when at least one source succeeded: introduce a local bool (e.g., anySucceeded) that you set true after each successful UpdateInstallations/UpdatePolicies/UpdateFees call, and after all three fetches call c.metrics.UpdateTimestamp() only if anySucceeded is true.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/metrics/appstore_collector.go`:
- Around line 68-69: The collect() method uses context.Background() which can
block shutdown; change it to use a bounded context (e.g. ctx, cancel :=
context.WithTimeout(context.Background(), timeout)) and defer cancel so any hung
DB/query inside AppStoreCollector.collect() times out before the next collection
and before Stop() waits on <-c.doneCh; pick timeout shorter than the collection
interval (use the collector's existing interval value or a new field like
c.collectInterval) and pass the bounded ctx into any DB/query calls inside
collect().
- Around line 38-58: Add a start guard to prevent Start() from spawning multiple
goroutines: add a startOnce sync.Once field to the AppStoreCollector struct and
wrap the goroutine launch in Start() with c.startOnce.Do(func() { ... }),
leaving the existing defer close(c.doneCh) and stop logic intact; this mirrors
the existing stopOnce used by Stop() and prevents a second Start() from causing
a panic by closing doneCh twice.
---
Nitpick comments:
In `@cmd/verifier/main.go`:
- Around line 105-109: The deferred nil check is dead code: since
NewAppStoreCollector(...) is followed by appStoreCollector.Start() (so
appStoreCollector is expected non-nil or would have panicked), remove the
redundant guard in the defer and always call appStoreCollector.Stop(); update
the defer that references appStoreCollector to simply call
appStoreCollector.Stop() (keep the existing defer block around the Stop call)
and ensure references are to appStoreCollector, NewAppStoreCollector, Start, and
Stop so reviewers can locate the change.
- Line 102: Replace the hardcoded 30s interval passed to
internalMetrics.NewAppStoreCollector with a configurable field: add
CollectInterval time.Duration `mapstructure:"collect_interval"` to your
config.MetricsConfig (or config.Metrics) and use cfg.Metrics.CollectInterval
when constructing appStoreCollector; ensure you apply a sensible default of
30*time.Second if cfg.Metrics.CollectInterval is zero/empty before calling
internalMetrics.NewAppStoreCollector so existing behavior remains when the
config omits the value.
In `@internal/metrics/appstore_collector.go`:
- Around line 68-93: AppStoreCollector.collect() currently calls
c.metrics.UpdateTimestamp() after each successful source update (in the branches
after c.metrics.UpdateInstallations, UpdatePolicies, UpdateFees); change this so
the timestamp is only updated once per collect cycle when at least one source
succeeded: introduce a local bool (e.g., anySucceeded) that you set true after
each successful UpdateInstallations/UpdatePolicies/UpdateFees call, and after
all three fetches call c.metrics.UpdateTimestamp() only if anySucceeded is true.
I think we should remove the auth-header check here.
verifier/internal/api/server.go
Line 162 in 18cfaed
and
verifier/internal/api/server.go
Line 173 in 18cfaed
Plugins can still validate the auth header since it’s sent by the verifier. However, the agent/extension doesn’t have an access token, so we need to remove the auth check on the verifier side for reshare and policy requests.
@Ehsan-saradar please check if everything is sufficient
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/api/policy.go (1)
79-101: ⚠️ Potential issue | 🟠 Major

Signature verification must precede the billing/trial checks to prevent unauthenticated probing.
Since the middleware-level auth requirement was intentionally removed from this endpoint, `policy.PublicKey` is now fully attacker-controlled until `verifyPolicySignature` succeeds. In the current ordering, the two operations on lines 80 and 88–96 are executed against an unverified caller-supplied key:
- Line 80: `IsTrialActive` hits the database with an arbitrary `policy.PublicKey`, allowing any caller to enumerate whether a given public key has an active trial.
- Lines 88–96: `vaultStorage.Exist` hits object storage with an arbitrary key, similarly leaking billing vault presence.

Both are I/O operations against sensitive state and incur real resource cost, yet neither has been authorized. Moving `verifyPolicySignature` to immediately after UUID generation ensures the cryptographic proof of ownership is established before any state is probed.

🔒 Proposed fix: hoist signature check before billing/trial probe
```diff
 	if policy.ID == uuid.Nil {
 		policy.ID = uuid.New()
 	}

+	if !s.verifyPolicySignature(policy) {
+		s.logger.Error("invalid policy signature")
+		return c.JSON(http.StatusBadRequest, NewErrorResponseWithMessage(msgInvalidPolicySignature))
+	}
+
 	var (
 		isTrialActive bool
 		err           error
 	)
 	err = s.db.WithTransaction(c.Request().Context(), func(ctx context.Context, tx pgx.Tx) error {
 		isTrialActive, _, err = s.db.IsTrialActive(ctx, tx, policy.PublicKey)
 		return err
 	})
 	if err != nil {
 		s.logger.WithError(err).Warnf("Failed to check trial info")
 	}

 	if !isTrialActive {
 		filePathName := common.GetVaultBackupFilename(policy.PublicKey, vtypes.PluginVultisigFees_feee.String())
 		exist, err := s.vaultStorage.Exist(filePathName)
 		if err != nil {
 			s.logger.WithError(err).Error("failed to check vault existence")
 			return c.JSON(http.StatusInternalServerError, NewErrorResponseWithMessage(msgInternalError))
 		}
 		if !exist {
 			return c.JSON(http.StatusForbidden, NewErrorResponseWithMessage(msgAccessDeniedBilling))
 		}
 	}

-	if !s.verifyPolicySignature(policy) {
-		s.logger.Error("invalid policy signature")
-		return c.JSON(http.StatusBadRequest, NewErrorResponseWithMessage(msgInvalidPolicySignature))
-	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/api/policy.go` around lines 79 - 101, The signature verification must run before any billing/trial or storage probes because policy.PublicKey is attacker-controlled until proven; move the call to s.verifyPolicySignature(policy) to immediately after the policy UUID is generated/parsed and before calling s.db.IsTrialActive(...) and s.vaultStorage.Exist(...). If s.verifyPolicySignature(policy) returns false, log and return the existing BadRequest response (msgInvalidPolicySignature) without touching IsTrialActive or vaultStorage.Exist; keep the existing error handling and responses for the DB/storage calls but only execute them after verifyPolicySignature succeeds.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/api/server.go`:
- Around line 160-164: The ReshareVault endpoint is currently unauthenticated
and allows resharing without proof-of-ownership; update the route registration
so that vaultGroup.POST("/reshare", s.ReshareVault) requires the same
VaultAuthMiddleware used by GetVault, and also protect the existence check by
adding authentication to vaultGroup.GET("/exist/:pluginId/:publicKeyECDSA",
s.ExistVault). Concretely, apply s.VaultAuthMiddleware (either by adding it to
the vaultGroup middleware stack or by passing it into the POST/GET handlers) so
ReshareVault and ExistVault enforce the caller’s identity/ownership before
proceeding.
- Around line 171-174: The PUT /plugin/policy route is missing the
VaultAuthMiddleware so UpdatePluginPolicyById will not find the vault_public_key
in context and will always error; update the route registration for
pluginGroup.PUT("/policy", ...) to include s.VaultAuthMiddleware (same pattern
used by DELETE and other sensitive routes) so that VaultAuthMiddleware sets the
vault_public_key context key before UpdatePluginPolicyById executes.
---
Outside diff comments:
In `@internal/api/policy.go`:
- Around line 79-101: The signature verification must run before any
billing/trial or storage probes because policy.PublicKey is attacker-controlled
until proven; move the call to s.verifyPolicySignature(policy) to immediately
after the policy UUID is generated/parsed and before calling
s.db.IsTrialActive(...) and s.vaultStorage.Exist(...). If
s.verifyPolicySignature(policy) returns false, log and return the existing
BadRequest response (msgInvalidPolicySignature) without touching IsTrialActive
or vaultStorage.Exist; keep the existing error handling and responses for the
DB/storage calls but only execute them after verifyPolicySignature succeeds.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/api/policy.go (1)
79-97: ⚠️ Potential issue | 🔴 Critical

Signature verification must precede all database and storage operations to prevent unauthenticated writes.
`policy.PublicKey` is entirely user-supplied and unauthenticated at this point. However, `IsTrialActive()` (line 80) performs a database INSERT when called with a public key that has no existing trial record. This write happens in a transaction before `verifyPolicySignature()` runs at line 99.

Attack scenario: An attacker can call `CreatePluginPolicy` repeatedly with arbitrary public keys they do not control. For each key without an existing trial record, `IsTrialActive` will insert a new trial entry into the fees table. The global rate limiter (5 req/min, burst 30) is insufficient to prevent this DOS-style record creation.

Move `verifyPolicySignature()` to execute immediately after binding the policy and before any database or storage operations. Only allow database/storage access for callers who cryptographically prove ownership of the private key.

Reordering sketch
```diff
 func (s *Server) CreatePluginPolicy(c echo.Context) error {
 	var policy types.PluginPolicy
 	if err := c.Bind(&policy); err != nil { ... }

 	if policy.ID == uuid.Nil {
 		policy.ID = uuid.New()
 	}

+	// Verify signature first — caller must control the vault key before any DB/storage access.
+	if !s.verifyPolicySignature(policy) {
+		s.logger.Error("invalid policy signature")
+		return c.JSON(http.StatusBadRequest, NewErrorResponseWithMessage(msgInvalidPolicySignature))
+	}
+
 	err = s.db.WithTransaction(c.Request().Context(), func(ctx context.Context, tx pgx.Tx) error {
 		isTrialActive, _, err = s.db.IsTrialActive(ctx, tx, policy.PublicKey)
 		return err
 	})
 	...
-
-	if !s.verifyPolicySignature(policy) {
-		s.logger.Error("invalid policy signature")
-		return c.JSON(http.StatusBadRequest, NewErrorResponseWithMessage(msgInvalidPolicySignature))
-	}
 	...
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/api/policy.go` around lines 79 - 97, The code currently calls IsTrialActive (which can perform DB INSERTs) before verifying the caller owns policy.PublicKey; move the signature verification to immediately after the policy is bound and before any DB/storage calls so unauthenticated keys cannot trigger writes. Concretely, in CreatePluginPolicy (the handler that calls s.db.WithTransaction and s.vaultStorage.Exist), call verifyPolicySignature(...) right after binding/parsing the policy object and before invoking s.db.IsTrialActive or any vaultStorage methods; ensure verifyPolicySignature returns an error on failure and short-circuits the handler to avoid any DB/Storage access when signature verification fails.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@internal/api/policy.go`:
- Around line 79-97: The code currently calls IsTrialActive (which can perform
DB INSERTs) before verifying the caller owns policy.PublicKey; move the
signature verification to immediately after the policy is bound and before any
DB/storage calls so unauthenticated keys cannot trigger writes. Concretely, in
CreatePluginPolicy (the handler that calls s.db.WithTransaction and
s.vaultStorage.Exist), call verifyPolicySignature(...) right after
binding/parsing the policy object and before invoking s.db.IsTrialActive or any
vaultStorage methods; ensure verifyPolicySignature returns an error on failure
and short-circuits the handler to avoid any DB/Storage access when signature
verification fails.
Closes vultisig/recipes#487
Closes vultisig/recipes#531
Summary by CodeRabbit