Skip to content

feat(metrics): Add MetricsPersistence SPI for backend-agnostic metrics storage (#3337)#3616

Open
obelix74 wants to merge 11 commits intoapache:mainfrom
obelix74:feat-3337-metrics-persistence-spi
Open

feat(metrics): Add MetricsPersistence SPI for backend-agnostic metrics storage (#3337)#3616
obelix74 wants to merge 11 commits intoapache:mainfrom
obelix74:feat-3337-metrics-persistence-spi

Conversation

@obelix74
Copy link
Contributor

This commit introduces a Service Provider Interface (SPI) for persisting Iceberg metrics reports, addressing the extensibility concerns raised in the design review.

Key components:

  • MetricsPersistence: Main SPI interface with write and query operations
  • NoOpMetricsPersistence: Default do-nothing implementation for backends that don't support metrics persistence
  • ScanMetricsRecord: Immutable interface for scan metrics data
  • CommitMetricsRecord: Immutable interface for commit metrics data
  • MetricsQueryCriteria: Query parameters with filtering and pagination
  • MetricsContext: Context for conversion (realm, catalog, principal info)
  • MetricsPersistenceFactory: Factory for realm-scoped instances
  • MetricsRecordConverter: Converts Iceberg reports to SPI records

Relates to: #3337

Checklist

  • 🛡️ Don't disclose security issues! (contact security@apache.org)
  • 🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
  • 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
  • 💡 Added comments for complex logic
  • 🧾 Updated CHANGELOG.md (if needed)
  • 📚 Updated documentation in site/content/in-dev/unreleased (if needed)

@obelix74
Copy link
Contributor Author

@dimas-b This is the PR for the SPI, as you suggested. This contains the document feedback incorporated.

Copy link
Contributor

@dimas-b dimas-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the delay, @obelix74 ... Some preliminary comments below.

I'll probably stick to this PR for SPI reviews rather than the doc 🤔 GH seems more convenient.

@obelix74 obelix74 requested a review from dimas-b February 4, 2026 02:50
@obelix74
Copy link
Contributor Author

obelix74 commented Feb 4, 2026

@dimas-b Thank you again for the detailed review. I have addressed all your questions. There are a few open items.

  • Did not use CatalogEntity - explained why.
  • Added comments about query sub-interfaces and separate query methods. I will follow your lead when you respond.

Copy link
Contributor

@dimas-b dimas-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall... Just a couple more minor comments.

@dimas-b
Copy link
Contributor

dimas-b commented Feb 4, 2026

@obelix74 : please rebase to fix CI

Copy link
Contributor

@dimas-b dimas-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, last comment from me :)

Anand Kumar Sankaran added 9 commits February 4, 2026 17:59
…s storage

This commit introduces a Service Provider Interface (SPI) for persisting
Iceberg metrics reports, addressing the extensibility concerns raised in
the design review.

Key components:
- MetricsPersistence: Main SPI interface with write and query operations
- NoOpMetricsPersistence: Default do-nothing implementation for backends
  that don't support metrics persistence
- ScanMetricsRecord: Immutable interface for scan metrics data
- CommitMetricsRecord: Immutable interface for commit metrics data
- MetricsQueryCriteria: Query parameters with filtering and pagination
- MetricsContext: Context for conversion (realm, catalog, principal info)
- MetricsPersistenceFactory: Factory for realm-scoped instances
- MetricsRecordConverter: Converts Iceberg reports to SPI records

Design principles:
- Backend-agnostic: Can be implemented by JDBC, NoSQL, or custom backends
- No instanceof checks: Service code calls interface methods directly
- Idempotent writes: Same reportId written twice has no effect
- Graceful degradation: Unsupported backends return empty results

Relates to: apache#3337
Remove fields that can be obtained from ambient request context at write time:
- principalName: Available from SecurityContext/PolarisPrincipal
- requestId: Not well-defined in Polaris; unclear what request it refers to
- otelTraceId/otelSpanId: Available from OTel context via Span.current()

Keep reportTraceId as it's a client-provided value from the report metadata
that cannot be obtained from the ambient context.

Rename otelTraceId filter in MetricsQueryCriteria to reportTraceId to match
the field that is actually stored in the records.

This keeps the SPI focused on business data (the metrics themselves) rather
than infrastructure concerns (tracing, authentication) which the persistence
implementation can obtain from the ambient context at write time if needed.
…tern

- Create ReportIdToken for cursor-based pagination using report ID (UUID)
- Remove limit() and offset() from MetricsQueryCriteria
- Update MetricsPersistence to use PageToken parameter and return Page<T>
- Update NoOpMetricsPersistence to return empty Page objects
- Register ReportIdToken via service loader

This change makes the SPI truly backend-agnostic by using the existing
Polaris PageToken pattern instead of RDBMS-specific offset pagination.
Each backend can implement cursors in their optimal way (keyset for RDBMS,
continuation tokens for NoSQL).

Addresses reviewer feedback on MetricsQueryCriteria.offset() field.
Per reviewer feedback:
- Replace Iceberg's TableIdentifier with separate namespace/tableName strings
  in MetricsRecordIdentity to avoid Iceberg dependencies in Polaris SPI
- Remove catalogName from records, keep only catalogId since catalog names
  can change over time (via rename operations)
- Update MetricsQueryCriteria to use catalogId (OptionalLong) instead of catalogName
- Update MetricsRecordConverter to extract namespace/tableName from TableIdentifier

The service layer (MetricsRecordConverter) still accepts TableIdentifier and
performs the conversion to primitives for the SPI records.
Per reviewer feedback, namespace is now represented as a List<String>
of individual levels rather than a dot-separated string. This avoids
ambiguity when namespace segments contain dots.

Changes:
- MetricsRecordIdentity: namespace() now returns List<String>
- MetricsQueryCriteria: namespace() now returns List<String>
- MetricsRecordConverter: namespaceToList() converts Iceberg Namespace
  to List<String> using Arrays.asList()

The persistence implementation handles the serialization format.
Per reviewer feedback:
- r2766326028: Use table ID (same as catalog ID) since table names can change
- r2766343275: Avoid denormalizing table names to prevent correctness issues
- r2766321215: Return builder with table info, add time ranges at call site

Changes:
- MetricsRecordIdentity: tableName() -> tableId() (long)
- MetricsQueryCriteria: tableName() -> tableId() (OptionalLong)
- MetricsQueryCriteria.forTable(): Returns builder with catalogId/tableId
- MetricsRecordConverter: tableIdentifier(TableIdentifier) -> tableId(long) + namespace(List<String>)

The caller (PersistingMetricsReporter) now needs to resolve table entity ID
before creating records, similar to how catalogId is resolved.
Per reviewer feedback - since we query by tableId, namespace is implicit.
If users want to query by namespace, the service layer should resolve
namespace to table IDs using the current catalog state, then query by
those IDs. This avoids confusion with table moves over time.

Namespace is still stored in MetricsRecordIdentity for display purposes.
@obelix74 obelix74 force-pushed the feat-3337-metrics-persistence-spi branch from 9a9e149 to 7d4212c Compare February 5, 2026 02:09
dimas-b
dimas-b previously approved these changes Feb 5, 2026
Copy link
Contributor

@dimas-b dimas-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR is reasonable to merge at this point, assume future SPI changes are not ruled out.

@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Feb 5, 2026
Added @beta annotation from Guava to all public types in the
Metrics Persistence SPI package to signal that this API is
experimental and may change in future releases.

Annotated types:
- MetricsPersistence
- ScanMetricsRecord
- CommitMetricsRecord
- MetricsQueryCriteria
- MetricsRecordIdentity
- ReportIdToken
dimas-b
dimas-b previously approved these changes Feb 5, 2026
@dimas-b
Copy link
Contributor

dimas-b commented Feb 5, 2026

This PR is one week old and it looks like it did not attract attention from many contributors 😅 (thanks for your review, @evindj !). So, unless fresh comments appear, I'm planning to merge tomorrow.

Since the new SPI is marked as Beta, if concerns are uncovered later, we'll address on main.

Allow callers to specify the timestamp for metrics records, defaulting
to Instant.now() if not provided. This enables the reporter to use the
received timestamp rather than the conversion time.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants