
Add realtime metric for data freshness#2

Open
jugomezv wants to merge 3 commits into master from jugomez/RealtimeLastMileIngestionDelay

Conversation

@jugomezv
Owner

@jugomezv jugomezv commented Nov 2, 2022

Add a metric that measures the time lag from when an event was posted to the last stage of the ingestion stream until the event was consumed by the realtime server.

Tests performed: ran the realtime LLC cluster integration test, which includes a table with two partitions, and used jvisualvm to visualize metrics for both partitions after the test completed. Enabled warn logging and monitored the time delta for each consumed row in the realtime server, then confirmed via jvisualvm that the last values logged are the last values contained in the metrics. Confirmed we get a metric per table, per partition.

Sample log traces of delay in milliseconds:
14:14:40.789 WARN [LLRealtimeSegmentDataManager_mytable__1__23__20221102T2114Z] [mytable__1__23__20221102T2114Z] ADDING REALTIME DATA FRESHNESS: delta=905[MS]
14:14:40.791 WARN [LLRealtimeSegmentDataManager_mytable__1__23__20221102T2114Z] [mytable__1__23__20221102T2114Z] ADDING REALTIME DATA FRESHNESS: delta=907[MS]
[Screenshot: OSSDataFreshnessMetricPartition1]
[Screenshot: OSSDataFreshnessMetricPartition0]
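
For reference, a minimal, self-contained sketch of the measurement itself, assuming the stream message metadata exposes the timestamp at which the record reached the last stage of the ingestion stream (the class and method names below are illustrative, not the exact PR diff):

// Illustrative only: the lag is the realtime server's wall-clock time minus the record's
// ingestion timestamp, clamped at zero to guard against clock skew between hosts.
public final class IngestionLagSketch {
  private IngestionLagSketch() {
  }

  public static long computeIngestionLagMs(long recordIngestionTimeMs) {
    long lagMs = System.currentTimeMillis() - recordIngestionTimeMs;
    return Math.max(lagMs, 0L);
  }

  public static void main(String[] args) {
    // Simulate a record posted to the stream ~905 ms ago (matching the sample log above).
    long recordIngestionTimeMs = System.currentTimeMillis() - 905L;
    System.out.println("ingestion lag = " + computeIngestionLagMs(recordIngestionTimeMs) + " ms");
  }
}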

vvivekiyer and others added 2 commits November 2, 2022 13:44
…#9678)

* Enable dictionary code changes

* Address review comments.

* Checkstyle violation

* Add e2e query execution test

* Review comments
* Customize stopword for Lucene Index



Nit: suggest renaming the metric to something more descriptive like PARTITION_INGESTION_LAG_MS.


@GSharayu GSharayu Nov 7, 2022


I think we should also change milliseconds to something like partitionIngestionLagMs


@mcvsubbu mcvsubbu left a comment


  1. Consider emitting one metric for a table-host combination. That can be the maximum lag amongst all partitions for that table in the host. We have other metrics that indicate consumption rate on a per-partition basis, so that should help if we want to debug further.
  2. We should be emitting zero if we get back 0 events from the poll, right?
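
A rough sketch of what point 1 might look like: track the latest lag per partition on this server and expose the maximum as a single per-table, per-host gauge value. The class below is hypothetical, not an existing Pinot API.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-table, per-host tracker: each consuming partition reports its latest lag,
// and the table-level gauge is the worst (maximum) lag across partitions on this host.
public class TableIngestionLagTracker {
  private final Map<Integer, Long> _latestLagMsPerPartition = new ConcurrentHashMap<>();

  // Called from the consuming loop after each batch; an empty poll would report 0 (point 2).
  public void updatePartitionLag(int partitionGroupId, long lagMs) {
    _latestLagMsPerPartition.put(partitionGroupId, Math.max(lagMs, 0L));
  }

  // Value emitted for the table-level gauge on this host.
  public long getMaxLagMs() {
    return _latestLagMsPerPartition.values().stream().mapToLong(Long::longValue).max().orElse(0L);
  }
}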

@jugomezv jugomezv force-pushed the jugomez/RealtimeLastMileIngestionDelay branch from d68f32c to 7d58fd2 on November 3, 2022 20:22
@jugomezv
Owner Author

jugomezv commented Nov 3, 2022

@mcvsubbu We considered a metric per table, but that won't work because partitions are distributed across servers. I'm not sure what value would be added by keeping a metric per host, per table; I have scheduled a discussion for next Monday, and it would be nice to close on this then.

Zero is now issued when we hit an empty batch, as of the most recent change.

@jugomezv jugomezv force-pushed the jugomez/RealtimeLastMileIngestionDelay branch from 7d58fd2 to ed1c4a3 on November 3, 2022 20:37
@jugomezv jugomezv force-pushed the jugomez/RealtimeLastMileIngestionDelay branch from ed1c4a3 to 1315db0 on November 3, 2022 20:41
@GSharayu

GSharayu commented Nov 7, 2022

Hey @jugomezv, can you please rebase this PR onto the latest master? I am confused about which changes belong to the Stop words PR and which are yours. It will help with the review.

@sajjad-moradi

Hey @jugomezv, can you please rebase this PR onto the latest master? I am confused about which changes belong to the Stop words PR and which are yours. It will help with the review.

+1

@GSharayu I think you only need to review LLRealtimeSegmentDataManager.java and ServerGauge.java.


@sajjad-moradi sajjad-moradi left a comment


Looks good overall.

One thing to consider:
I believe we need to come up with a strategy for the period in which the consuming segment is being completed. There's no consumption during that period, so we have no idea what the lag is. Should we set the value of the new metric to zero? Should we keep reporting the latest value? Or maybe increase it linearly as time passes? None of these options represents reality, because we don't even know whether there is data available in the stream at that time!
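
For illustration, the "increase it linearly" option could be expressed by aging the last observed lag at read time. The sketch below is just one way to frame that trade-off and is not part of this PR; the class and method names are hypothetical.

// Hypothetical: while no consumption is happening (e.g. during segment completion),
// the reported lag grows with the time elapsed since the last observed value.
public class AgedIngestionLagGauge {
  private volatile long _lastObservedLagMs = 0L;
  private volatile long _lastUpdateTimeMs = System.currentTimeMillis();

  // Called whenever a batch is consumed and a fresh lag value is available.
  public void onLagObserved(long lagMs) {
    _lastObservedLagMs = Math.max(lagMs, 0L);
    _lastUpdateTimeMs = System.currentTimeMillis();
  }

  // Gauge read: last observed lag plus the time since it was observed.
  public long currentLagMs() {
    return _lastObservedLagMs + (System.currentTimeMillis() - _lastUpdateTimeMs);
  }
}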

long lastMileIngestionDelayMs = System.currentTimeMillis() - msgMetadata.getRecordIngestionTimeMs();
lastMileIngestionDelayMs = lastMileIngestionDelayMs < 0 ? 0 : lastMileIngestionDelayMs;
_serverMetrics.setValueOfPartitionGauge(_tableNameWithType, _partitionGroupId,
ServerGauge.PARTITION_INGESTION_LAG_MS, lastMileIngestionDelayMs);


lastMile and end2end (for the _t header) are internal to our projects at LinkedIn. From Pinot OSS's perspective, there's only one Kafka topic, and that's the one the RT servers consume from. Maybe drop lastMile, or simply change the variable name to match the metric, partitionIngestionLagMs?
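
In other words, the reviewed lines could read roughly as follows after the rename; this is only a sketch of the suggestion, not the final diff:

long partitionIngestionLagMs =
    Math.max(System.currentTimeMillis() - msgMetadata.getRecordIngestionTimeMs(), 0L);
_serverMetrics.setValueOfPartitionGauge(_tableNameWithType, _partitionGroupId,
    ServerGauge.PARTITION_INGESTION_LAG_MS, partitionIngestionLagMs);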

@sajjad-moradi

... used jvisualvm to visualize metrics for both partitions after test completes ...

The metrics you highlighted in the screenshots are not the ones you added.

