Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds App Store Prometheus metrics, a periodic AppStoreCollector that queries the DB and updates those metrics, and optional bearer-token auth for the /metrics endpoint; updates configs/examples to include a token and adds Grafana dashboards for App Store product metrics. Changes
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (4)
cmd/verifier/main.go (1)
103-103: Consider making the collection interval configurable.The 30-second interval is hardcoded. For a product metrics dashboard, this is reasonable, but extracting it to
MetricsConfigwould allow tuning without code changes (e.g., increasing it to reduce DB load as the dataset grows).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/verifier/main.go` at line 103, The hardcoded 30-second interval passed to internalMetrics.NewAppStoreCollector should be made configurable: add a field (e.g., CollectionInterval time.Duration) to your MetricsConfig struct, load it from the existing configuration source or environment with a sensible default (e.g., 30*time.Second), and replace the literal 30*time.Second in the call that constructs appStoreCollector with metricsConfig.CollectionInterval so NewAppStoreCollector receives the configurable interval.internal/metrics/appstore.go (1)
83-93: Minor scrape-window gap afterReset()on gauge vectors.Between
Reset()on line 84 and the completion of the loop, a Prometheus scrape could observe partial or zeroed data. The same applies toUpdatePoliciesandUpdateFees. For a dashboard with a 30-second collection interval this is unlikely to cause visible issues, but if precise consistency matters, you could build into a temporary map and swap atomically via re-registration or use a custom collector.This is low-risk given the use case (product metrics dashboard).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/metrics/appstore.go` around lines 83 - 93, The current UpdateInstallations (and similarly UpdatePolicies and UpdateFees) calls GaugeVec.Reset() then sets values, which can expose a scrape window with partial/zeroed metrics; fix by preparing a new prometheus.GaugeVec instance, populate it fully off-line, then swap it in atomically: create a new gauge vec (e.g., newAppstoreInstallationsTotal), set all label values and grand total on that new vec, then under a mutex unregister the old appstoreInstallationsTotal and register the new one and replace the pointer on AppStoreMetrics; apply the same pattern for appstorePoliciesTotal and appstoreFeesTotal and update UpdatePolicies/UpdateFees to build and swap their gauge vectors instead of Reset()-then-set.internal/metrics/appstore_collector.go (1)
64-67:Stop()panics on double-close and deadlocks ifStart()was never called.
close(c.stopCh)will panic if called twice. Also,doneChis never closed ifStart()is not invoked, so<-c.doneChblocks forever.Consider adding a
sync.Oncefor safe shutdown, or at minimum document the expected lifecycle contract.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/metrics/appstore_collector.go` around lines 64 - 67, The Stop method on AppStoreCollector currently does close(c.stopCh) which panics if Stop is called twice and then waits on <-c.doneCh which blocks if Start never ran; to fix, make shutdown idempotent and safe by adding a sync.Once (e.g. a field shutdownOnce) and use shutdownOnce.Do to close stopCh only once, and ensure doneCh is either closed in Start or created/closed in Stop when Start wasn't run (or use a buffered signal/ctx) so <-c.doneCh never blocks indefinitely; update AppStoreCollector.Stop and Start to coordinate using stopCh, doneCh, and shutdownOnce to guarantee no double-close panics and no deadlocks.product-metrics.json (1)
13-13: Hardcoded datasource UID may not match all Grafana deployments.The datasource
"uid": "prometheus"is repeated across all panels. Consider using a Grafana template variable (e.g.,"uid": "${DS_PROMETHEUS}") so the dashboard is portable across environments.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@product-metrics.json` at line 13, The dashboard JSON currently hardcodes the Prometheus datasource UID via the "datasource": {"type": "prometheus", "uid": "prometheus"} entry in product-metrics.json; replace the hardcoded "uid": "prometheus" with a Grafana template variable such as "uid": "${DS_PROMETHEUS}" for every panel/occurrence (search for the "datasource" objects) so the dashboard is portable across deployments and can be mapped via the DS_PROMETHEUS variable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/metrics/appstore_collector.go`:
- Around line 25-39: NewAppStoreCollector should guard against zero or negative
interval values to avoid time.NewTicker panics in Start(): inside
NewAppStoreCollector check if interval <= 0 and replace it with a sane default
(e.g., time.Minute or another configured default) and optionally emit a warning
via the provided logger; ensure the sanitized interval is stored on the returned
AppStoreCollector so Start (which calls time.NewTicker) always receives a
positive duration. Include mention of NewAppStoreCollector,
AppStoreCollector.interval, and Start in the change.
- Around line 69-93: The collect method on AppStoreCollector currently uses
context.Background() for DB calls and always calls c.metrics.UpdateTimestamp(),
which can hang on slow DBs and mark timestamps stale; replace the background
context with a context.WithTimeout (use the collector's interval or a fraction
thereof) and defer cancel, then pass that ctx into
c.db.GetInstallationsByPlugin, GetPoliciesByPlugin and GetFeesByPlugin so
queries time out; track whether any Update* call succeeded and only call
c.metrics.UpdateTimestamp() if at least one of c.metrics.UpdateInstallations,
UpdatePolicies or UpdateFees ran successfully; ensure errors are still logged
via c.logger.Errorf and the cancel function is invoked to avoid leaks.
In `@internal/metrics/server.go`:
- Line 20: The Token field in the Config struct currently has a JSON tag that
will include the secret when the struct is marshalled; change the struct tag on
Token (the field named Token) to hide it (e.g., json:"-") or implement a custom
MarshalJSON/MarshalText on the Config type to omit or redact Token so serialized
output (debug logs, endpoints) cannot leak the secret.
- Around line 38-49: In bearerAuthMiddleware, reject Authorization headers that
do not start with the exact "Bearer " prefix (use strings.HasPrefix and return
401 if absent) and replace the direct string equality check with a constant-time
comparison: after confirming the prefix, extract the token (without TrimPrefix
accepting fallback), ensure lengths match, and use
crypto/subtle.ConstantTimeCompare on the byte slices of providedToken and the
expected token; keep the existing 401 response path when the comparison fails.
In `@internal/storage/postgres/metrics.go`:
- Around line 71-102: GetFeesByPlugin uses incorrect column names and an
unnecessary join; update the SQL to select f.plugin_id and SUM(f.amount) (use
f.amount not f.value), remove the WHERE f.is_collected filter and the JOIN to
plugin_policies (fees already has plugin_id), and ensure the query returns two
columns matching the Scan targets (pluginID string, total int64) so the
rows.Scan(&pluginID, &total) succeeds; keep using p.pool.Query(ctx, query) and
existing result map population.
---
Nitpick comments:
In `@cmd/verifier/main.go`:
- Line 103: The hardcoded 30-second interval passed to
internalMetrics.NewAppStoreCollector should be made configurable: add a field
(e.g., CollectionInterval time.Duration) to your MetricsConfig struct, load it
from the existing configuration source or environment with a sensible default
(e.g., 30*time.Second), and replace the literal 30*time.Second in the call that
constructs appStoreCollector with metricsConfig.CollectionInterval so
NewAppStoreCollector receives the configurable interval.
In `@internal/metrics/appstore_collector.go`:
- Around line 64-67: The Stop method on AppStoreCollector currently does
close(c.stopCh) which panics if Stop is called twice and then waits on
<-c.doneCh which blocks if Start never ran; to fix, make shutdown idempotent and
safe by adding a sync.Once (e.g. a field shutdownOnce) and use shutdownOnce.Do
to close stopCh only once, and ensure doneCh is either closed in Start or
created/closed in Stop when Start wasn't run (or use a buffered signal/ctx) so
<-c.doneCh never blocks indefinitely; update AppStoreCollector.Stop and Start to
coordinate using stopCh, doneCh, and shutdownOnce to guarantee no double-close
panics and no deadlocks.
In `@internal/metrics/appstore.go`:
- Around line 83-93: The current UpdateInstallations (and similarly
UpdatePolicies and UpdateFees) calls GaugeVec.Reset() then sets values, which
can expose a scrape window with partial/zeroed metrics; fix by preparing a new
prometheus.GaugeVec instance, populate it fully off-line, then swap it in
atomically: create a new gauge vec (e.g., newAppstoreInstallationsTotal), set
all label values and grand total on that new vec, then under a mutex unregister
the old appstoreInstallationsTotal and register the new one and replace the
pointer on AppStoreMetrics; apply the same pattern for appstorePoliciesTotal and
appstoreFeesTotal and update UpdatePolicies/UpdateFees to build and swap their
gauge vectors instead of Reset()-then-set.
In `@product-metrics.json`:
- Line 13: The dashboard JSON currently hardcodes the Prometheus datasource UID
via the "datasource": {"type": "prometheus", "uid": "prometheus"} entry in
product-metrics.json; replace the hardcoded "uid": "prometheus" with a Grafana
template variable such as "uid": "${DS_PROMETHEUS}" for every panel/occurrence
(search for the "datasource" objects) so the dashboard is portable across
deployments and can be mapped via the DS_PROMETHEUS variable.
There was a problem hiding this comment.
🧹 Nitpick comments (2)
plugin/progress/service_nil.go (1)
10-10: Add a compile-time interface assertion.The idiomatic Go pattern
var _ Service = (*NilService)(nil)catches interface drift at compile time rather than at the call site, which can surface in distant packages.♻️ Proposed addition
// NilService implements Service for plugins where progress tracking is not required type NilService struct{} + +var _ Service = (*NilService)(nil)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/progress/service_nil.go` at line 10, Add a compile-time interface assertion to ensure NilService implements Service: add the idiomatic line var _ Service = (*NilService)(nil) near the NilService type declaration (reference symbols: NilService and Service) so any interface drift is caught at compile time rather than at runtime.plugin/progress/service.go (1)
14-17: Consider documenting the nil-Progress-without-error contract.
GetProgresscan legitimately return(nil, nil)(asNilServicedoes), but there's no indication in the interface itself that callers must nil-check before dereferencing. A brief comment prevents future consumers and implementors from misinterpreting the contract.📝 Suggested doc additions
+// Service provides plugin progress tracking. +// Implementations may return (nil, nil) from GetProgress to indicate that +// no progress is available without it being an error condition. type Service interface { + // GetProgress returns the current progress for policyID, or (nil, nil) if unavailable. GetProgress(ctx context.Context, policyID uuid.UUID) (*Progress, error) + // GetProgressBatch returns progress keyed by policyID; missing entries are omitted. GetProgressBatch(ctx context.Context, policyIDs []uuid.UUID) (map[uuid.UUID]*Progress, error) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/progress/service.go` around lines 14 - 17, Add clear documentation to the Service interface stating that GetProgress may legitimately return (nil, nil) and callers must nil-check the *Progress before dereferencing; also document expected semantics of GetProgressBatch (e.g., map may omit entries or contain nil *Progress values for missing policies) so implementors like NilService and callers know the contract for nil Progress results without an error. Target the Service interface and mention GetProgress, GetProgressBatch, Progress, and NilService in the comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@plugin/progress/service_nil.go`:
- Line 10: Add a compile-time interface assertion to ensure NilService
implements Service: add the idiomatic line var _ Service = (*NilService)(nil)
near the NilService type declaration (reference symbols: NilService and Service)
so any interface drift is caught at compile time rather than at runtime.
In `@plugin/progress/service.go`:
- Around line 14-17: Add clear documentation to the Service interface stating
that GetProgress may legitimately return (nil, nil) and callers must nil-check
the *Progress before dereferencing; also document expected semantics of
GetProgressBatch (e.g., map may omit entries or contain nil *Progress values for
missing policies) so implementors like NilService and callers know the contract
for nil Progress results without an error. Target the Service interface and
mention GetProgress, GetProgressBatch, Progress, and NilService in the comment.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (5)
grafana-dashboard-appstore.json (5)
255-256:"spanNulls": falsemay produce broken lines in timeseries panelsIf the AppStore collector runs on a period longer than the Prometheus scrape interval (e.g., every 5 min vs a 1-min scrape), Prometheus will store stale-then-absent data points, and
spanNulls: falsewill render visible line gaps. For a monotonically increasing gauge like a grand total, connecting across nulls is always safe and produces a cleaner chart.✏️ Proposed fix (apply to panels 4, 5, and 6)
- "spanNulls": false + "spanNulls": trueAlso applies to: 325-326, 395-396
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@grafana-dashboard-appstore.json` around lines 255 - 256, The timeseries panels currently set "spanNulls": false which creates visible gaps when the AppStore collector reports less frequently than Prometheus scrapes; change the panels that display the AppStore grand total (panels referenced in the review as panels 4, 5 and 6 and the other occurrences noted) to use "spanNulls": true so the line connects across nulls. Locate the JSON objects for those panels (look for the timeseries panel definitions that contain "showPoints" and "spanNulls") and update spanNulls from false to true for each occurrence.
429-432: Default time rangenow-6his too narrow for product/business metricsInstallation counts, active policies, and cumulative fees are slow-moving business metrics; a 6-hour window provides little useful signal. A 24-hour or 7-day default better serves a product dashboard.
✏️ Proposed fix
"time": { - "from": "now-6h", + "from": "now-7d", "to": "now" },🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@grafana-dashboard-appstore.json` around lines 429 - 432, The dashboard's default time window is set to "now-6h" under the "time" block (property "from": "now-6h"), which is too short for product/business metrics; update that "from" value to a longer window such as "now-7d" (or "now-24h" if you prefer daily) so installation counts, active policies, and cumulative fees show meaningful trends by default.
7-7: Empty"timezone"defaults to browser timezone — prefer"utc"An empty string causes Grafana to use each viewer's local browser timezone, making timestamps inconsistent across team members. Operational and product metric dashboards conventionally use
"utc".✏️ Proposed fix
- "timezone": "", + "timezone": "utc",🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@grafana-dashboard-appstore.json` at line 7, The dashboard's "timezone" field is an empty string which causes Grafana to use each viewer's browser timezone; update the "timezone" value in grafana-dashboard-appstore.json from "" to "utc" so the dashboard consistently uses UTC across all viewers.
15-18: Hardcoded datasource UID"prometheus"reduces portabilityAll six panels share
"uid": "prometheus". This works only in environments where the Prometheus datasource was provisioned with exactly that UID. A dashboard variable (${DS_PROMETHEUS}) is the standard Grafana pattern for portable datasource references.✏️ Proposed fix — add a datasource template variable and reference it
Add to the dashboard's
"templating"section:"templating": { "list": [ { "name": "DS_PROMETHEUS", "type": "datasource", "query": "prometheus", "current": {} } ] }Then replace the hardcoded datasource block in all six panels:
- "datasource": { - "type": "prometheus", - "uid": "prometheus" - }, + "datasource": "${DS_PROMETHEUS}",🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@grafana-dashboard-appstore.json` around lines 15 - 18, Add a templated datasource variable named DS_PROMETHEUS in the dashboard's "templating" section (a datasource type variable that queries "prometheus") and then update each panel's "datasource" block (the six panels currently using "uid": "prometheus") to reference that variable instead of the hardcoded UID (use the template variable placeholder so the panels use ${DS_PROMETHEUS}). This ensures portability across environments where the Prometheus datasource UID differs.
3-3: Add a stableuidto the dashboardThe JSON has
"id": nullbut no"uid"field. Without a stable UID, every Grafana provisioning run/import creates a duplicate dashboard instance instead of updating the existing one. This also prevents stable deep-linking and annotation anchoring.✏️ Proposed fix
{ "dashboard": { "id": null, + "uid": "vultisig-appstore-metrics", "title": "Vultisig App Store Metrics",🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@grafana-dashboard-appstore.json` at line 3, The dashboard JSON currently has "id": null but no stable "uid", causing provisioning to create duplicates; add a constant top-level "uid" field (for example "uid": "appstore-dashboard" or another predictable string/UUID) to grafana-dashboard-appstore.json so Grafana can match and update the same dashboard across imports; leave "id": null for provisioning but ensure the new "uid" value is committed and stays stable across deployments.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@grafana-dashboard-appstore.json`:
- Around line 42-52: The sortBy transformation is referencing the pre-rename
field name "Value" so sorting silently fails after the organize/rename step;
update the sortBy option in each affected table panel (the sortBy block that
currently lists field: "Value") to reference the post-rename column name used by
the organize transformation — replace "Value" with "Installations" for the
Installations panel, "Active Policies" for the Active Policies panel, and "Total
Fees (USDC)" for the Total Fees panel so the sortBy matches the renamed fields.
---
Nitpick comments:
In `@grafana-dashboard-appstore.json`:
- Around line 255-256: The timeseries panels currently set "spanNulls": false
which creates visible gaps when the AppStore collector reports less frequently
than Prometheus scrapes; change the panels that display the AppStore grand total
(panels referenced in the review as panels 4, 5 and 6 and the other occurrences
noted) to use "spanNulls": true so the line connects across nulls. Locate the
JSON objects for those panels (look for the timeseries panel definitions that
contain "showPoints" and "spanNulls") and update spanNulls from false to true
for each occurrence.
- Around line 429-432: The dashboard's default time window is set to "now-6h"
under the "time" block (property "from": "now-6h"), which is too short for
product/business metrics; update that "from" value to a longer window such as
"now-7d" (or "now-24h" if you prefer daily) so installation counts, active
policies, and cumulative fees show meaningful trends by default.
- Line 7: The dashboard's "timezone" field is an empty string which causes
Grafana to use each viewer's browser timezone; update the "timezone" value in
grafana-dashboard-appstore.json from "" to "utc" so the dashboard consistently
uses UTC across all viewers.
- Around line 15-18: Add a templated datasource variable named DS_PROMETHEUS in
the dashboard's "templating" section (a datasource type variable that queries
"prometheus") and then update each panel's "datasource" block (the six panels
currently using "uid": "prometheus") to reference that variable instead of the
hardcoded UID (use the template variable placeholder so the panels use
${DS_PROMETHEUS}). This ensures portability across environments where the
Prometheus datasource UID differs.
- Line 3: The dashboard JSON currently has "id": null but no stable "uid",
causing provisioning to create duplicates; add a constant top-level "uid" field
(for example "uid": "appstore-dashboard" or another predictable string/UUID) to
grafana-dashboard-appstore.json so Grafana can match and update the same
dashboard across imports; leave "id": null for provisioning but ensure the new
"uid" value is committed and stays stable across deployments.
Closes: vultisig/recipes#532
Summary by CodeRabbit
New Features
Configuration
Observability