feat(metrics): Evolve PolarisMetricsReporter interface with timestamp parameter and comprehensive documentation#3468
Conversation
This commit adds the core infrastructure for emitting metrics as events when reportMetrics() is called on the Iceberg REST catalog API. Changes: - Add REPORT_METRICS_REQUEST attribute to EventAttributes.java - Add BEFORE_REPORT_METRICS and AFTER_REPORT_METRICS to PolarisEventType.java - Update reportMetrics() in IcebergRestCatalogEventServiceDelegator.java to emit BEFORE/AFTER events with catalog name, namespace, table, and request - Add ReportMetricsEventTest.java with unit tests verifying event emission This enables event listeners to receive metrics report events, allowing for use cases like audit logging and metrics persistence. Added tests and a Feature flag
|
@dimas-b Created this PR for the event emission and added a feature flag for backwards compatibility. |
...java/org/apache/polaris/service/catalog/iceberg/IcebergRestCatalogEventServiceDelegator.java
Outdated
Show resolved
Hide resolved
I am not sure backwards compatibility is important here as currently no listener bundled with Polaris actually processes the metrics-related events. |
...java/org/apache/polaris/service/catalog/iceberg/IcebergRestCatalogEventServiceDelegator.java
Outdated
Show resolved
Hide resolved
This commit removes the event-based metrics reporting system and introduces a new MetricsProcessor interface with CDI support. This is the foundation for a simpler, more direct metrics processing architecture. Changes: - Remove ENABLE_METRICS_EVENT_EMISSION feature flag - Remove BEFORE_REPORT_METRICS and AFTER_REPORT_METRICS event types - Remove REPORT_METRICS_REQUEST event attribute - Remove event emission from IcebergRestCatalogEventServiceDelegator.reportMetrics() - Remove ReportMetricsEventTest - Add MetricsProcessor interface for processing metrics reports - Add MetricsProcessingContext with rich contextual information (realm ID, principal, request ID, OpenTelemetry trace context) - Add MetricsProcessorConfiguration for type-safe configuration - Add CDI producer in ServiceProducers for MetricsProcessor The new MetricsProcessor interface provides: - Simpler, more direct processing (no events) - Rich context with realm, principal, request ID, OTel trace - CDI-based extensibility via @Identifier annotations - Type-safe configuration Implementations will be added in subsequent PRs. This commit provides the foundational interfaces and CDI infrastructure. Backward compatibility: The existing PolarisMetricsReporter interface and configuration remain unchanged and functional.
- Remove ENABLE_METRICS_EVENT_EMISSION feature flag entry - Add polaris.metrics.processor.type configuration property
runtime/service/src/main/java/org/apache/polaris/service/reporting/MetricsProcessor.java
Outdated
Show resolved
Hide resolved
|
@adutra and @dimas-b thanks for all the feedback. Indeed, @singhpk234 also did not like the event based approach. I have made the following changes to address your feedback.
The existing PolarisMetricsReporter interface and polaris.iceberg-metrics.reporting.type configuration remain unchanged and functional. What's Not Included This PR intentionally excludes implementations to keep the scope focused: These will be added in subsequent PRs. Configuration Next Steps Follow-up PRs will add:
I am open to removing |
|
Perhaps I should keep the |
...ime/service/src/main/java/org/apache/polaris/service/reporting/MetricsProcessingContext.java
Outdated
Show resolved
Hide resolved
...ime/service/src/main/java/org/apache/polaris/service/reporting/MetricsProcessingContext.java
Outdated
Show resolved
Hide resolved
...ime/service/src/main/java/org/apache/polaris/service/reporting/MetricsProcessingContext.java
Outdated
Show resolved
Hide resolved
...ime/service/src/main/java/org/apache/polaris/service/reporting/MetricsProcessingContext.java
Outdated
Show resolved
Hide resolved
runtime/service/src/main/java/org/apache/polaris/service/reporting/MetricsProcessor.java
Outdated
Show resolved
Hide resolved
runtime/service/src/main/java/org/apache/polaris/service/reporting/PolarisMetricsReporter.java
Outdated
Show resolved
Hide resolved
runtime/service/src/main/java/org/apache/polaris/service/reporting/PolarisMetricsReporter.java
Show resolved
Hide resolved
runtime/service/src/main/java/org/apache/polaris/service/reporting/PolarisMetricsReporter.java
Outdated
Show resolved
Hide resolved
.../service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogAdapter.java
Outdated
Show resolved
Hide resolved
runtime/service/src/main/java/org/apache/polaris/service/reporting/PolarisMetricsReporter.java
Outdated
Show resolved
Hide resolved
runtime/service/src/main/java/org/apache/polaris/service/reporting/PolarisMetricsReporter.java
Outdated
Show resolved
Hide resolved
...java/org/apache/polaris/service/catalog/iceberg/IcebergRestCatalogEventServiceDelegator.java
Outdated
Show resolved
Hide resolved
runtime/service/src/main/java/org/apache/polaris/service/reporting/PolarisMetricsReporter.java
Outdated
Show resolved
Hide resolved
.../testFixtures/java/org/apache/polaris/service/events/listeners/TestPolarisEventListener.java
Outdated
Show resolved
Hide resolved
dimas-b
left a comment
There was a problem hiding this comment.
LGTM 👍 Thanks again, @obelix74 !
Let's give this PR another day in review in case other people have comments.
CC: @cccs-cat001
singhpk234
left a comment
There was a problem hiding this comment.
Thank you @obelix74 for the change it mostly LGTM too !
| * @see MetricsReportingConfiguration | ||
| */ | ||
| public interface PolarisMetricsReporter { | ||
| public void reportMetric(String catalogName, TableIdentifier table, MetricsReport metricsReport); |
There was a problem hiding this comment.
we release this public interface in 1.3 right ? i wonder if we keep this api with default impl to the new method with null or something ?
361b7e9
There was a problem hiding this comment.
@singhpk234 I had a deprecated backward compatible method and the consensus was to remove it since this SPI is not meant to be used by developers outside Polaris.
There was a problem hiding this comment.
Didn't fully get it, wdym by developers outside Polaris ? can this not be used downstream projects, my recommendation is to make sure version upgrades are seemless so keep this and add more if we want and just update the default of this work under timestamp null assumption.
There was a problem hiding this comment.
Let me also check previous discussions, can you please point me to that ?
There was a problem hiding this comment.
There was a problem hiding this comment.
Per our standing evolution guidelines public classes / methods can change in any release. Unlike REST API changes, it is not considered a "major" change for versioning purposes. Essentially java methods are not part of the API surface in the SemVer sense.
We should and do try to make java API changes in a backward -compatible manner when practical.
Regarding this particular PR and java interfaces that are part of an SPI (defined and called by Polaris, implemented by 3rd party plugins), I do not see a practical way to evolve them in a backward-compatible manner without causing excessive maintenance burden in Polaris code.
In this case, Polaris would have to define a new interface for the new method signature and perform instanceof checks at runtime in order to decide whether to call the old or the new method. I do believe it would be an overkill at the current stage of the project (still evolving actively). It is not too hard for downstream implementations to adjust code for the newly added parameter.
That said, I'd like to propose moving the SPI evolution discussion to the dev ML as it is a big and complex topic.
@singhpk234 : please clarify whether you consider this comment thread a blocker for merging (or not).
There was a problem hiding this comment.
@dimas-b can we not add the api in the same interface and add the default impl of old api call this null clock, i would like to understand this more if this is possible. If its not possible i am supportive of the change !
@singhpk234 : please clarify whether you consider this comment thread a blocker for merging (or not).
thanks for asking the clarification i believe anything is thats not marked as 'nit' / 'not blocker' / 'not' is expected to be resolved before merging (per here) , i really appreciate you checking in !
There was a problem hiding this comment.
can we not add the api in the same interface and add the default impl of old api call this null clock [...]
I do not see a point in adding a default impl for the old method alone. Existing implementations will have overrides for it already (javac will not let them miss that).
We could add a default impl. for the new method and redirect to the old one (without the Instant). This will allow existing implementations to compile without changes. However, this creates uncertainty for the implementor regarding which method should do the real work. Javadoc helps, but adds cognitive load. We could add default to both methods and deprecate the old one, but this will add "cruft" to the interface definition, when, I believe, adapting to the new interface in downstream projects is very easy.
Please note that providing a custom implementation for this interface requires a downstream build. So all upgrades will go through local builds and CI (assuming normal software engineering practices), where the need for adjustments will be become apparent.
There was a problem hiding this comment.
When I uptook 1.3.0, I was glad that my compilation failed since if the old method was deprecated, I would be confused why I need to uptake the new approach and how to do it.
Having said that, this PR started as adding events for metrics, we removed it. It only contains this extra attribute. It is not a blocker for me. If we don't agree on how to proceed, I would rather abandon this PR and focus on the simplified metrics persistence code here: #3385. That allows Polaris to persist table metrics to the database.
There was a problem hiding this comment.
Thats fair, though in this case we are forcing the upgrades to have this new interface implementation when having the old api was not hurting ? taking the case LOGGERs can be configured to implicitly log timestamp additionally and the way we have reporting wired if i wanna do custom one can just do timestamp internally rather than requesting from the signature ?
metricsReporter.reportMetric(
catalogName, tableIdentifier, reportMetricsRequest.report(), clock.instant());
it not the time when the request hit the server but its the time when the reportMetric is called so does it matter of i do this vs
reportMetric(catalogName, tableIdentifier, reportMetricsRequest.report())) {
timestamp = clock.instant()
}
with that being said i think its fine if we wanna move forward :), but above is my thought process. Please move forward @dimas-b trust your judgement here !
CHANGELOG.md
Outdated
| - The EclipseLink Persistence implementation has been completely removed. | ||
| - The default request ID header name has changed from `Polaris-Request-Id` to `X-Request-ID`. | ||
| - The (Before/After)CommitTableEvent has been removed. | ||
| - The `PolarisMetricsReporter.reportMetric()` method signature has been extended to include a |
There was a problem hiding this comment.
i belive we need to freeze the changelog for 1.3 since its released and then add this in an unrelease section ?
There was a problem hiding this comment.
I will move it to unreleased section.
There was a problem hiding this comment.
@singhpk234 I don't see a 1.3.0 section in the CHANGELOG - I synced my fork to upstream.
https://github.com/apache/polaris/blob/main/CHANGELOG.md
Am I missing something?
There was a problem hiding this comment.
Perhaps we need to rename Unreleased to 1.3.0-incubating in CHANGELOG and move this line alone to Unreleased.
There was a problem hiding this comment.
no you are not ideally there needs to a update in the main branch to freeze 1.3 section and create the unreleased section as soon as a version of Polaris is released, seems like we need to do it first ?
There was a problem hiding this comment.
As far as the CHANGELOG "diff" in this PR is concerned, it looks correct to me.
I'd expect @pingtimeout (as the release manager for 1.3.0) to open a reconciliation PR for main and add a dedicated section for 1.3.0 soon 😉 I'm sure @pingtimeout will be able to deal with conflicts if this PR merges first 🙂
There was a problem hiding this comment.
@obelix74 : please do not add a 1.3.0 section to CHANGELOG in this PR :)
cccs-cat001
left a comment
There was a problem hiding this comment.
Great idea! I appreciate the improvement :)
|
It looks like we have consensus in all discussions and some approvals. If no new concerns are raised. I propose to merge on Jan 23. |
|
Thanks all. |
|
@obelix74 : unfortunately, it looks like there's a conflict on |
Resolved. |
… parameter and comprehensive documentation (apache#3468) Enhance the `PolarisMetricsReporter` SPI interface by adding a timestamp parameter to the `reportMetric()` method, enabling accurate time-series metrics reporting to external systems.
This PR enhances the PolarisMetricsReporter SPI interface by adding a timestamp parameter to the reportMetric() method, enabling accurate time-series metrics reporting to external systems.
Checklist
CHANGELOG.md(if needed)site/content/in-dev/unreleased(if needed)