Skip to content

Conversation

@fightBoxing
Copy link

Problem
In production environments, there is a common need to join streaming data with dimension data stored in Iceberg tables. The dimension data needs to be periodically refreshed to ensure join accuracy. Currently, Flink lacks native support for Iceberg lookup joins, forcing users to work around this limitation or use alternative solutions.

Solution
This PR implements Iceberg lookup join functionality for Flink, enabling efficient joins between streaming data and Iceberg dimension tables. The implementation includes:

IcebergLookupCache: A cache mechanism for storing and managing lookup data with TTL support
IcebergLookupReader: A reader component for loading and refreshing lookup data from Iceberg tables
IcebergTableSource enhancement: Updated to support lookup join operations
Configuration options: New config options for customizing lookup join behavior (cache size, refresh interval, etc.)
Integration tests: Comprehensive test coverage (IcebergLookupJoinITCase)
Changes
Added IcebergLookupCache for efficient caching of lookup data
Added IcebergLookupReader for reading lookup data from Iceberg tables
Added IcebergLookupJoinITCase for integration testing
Updated IcebergTableSource to support lookup join operations
Added configuration options in FlinkConfigOptions for lookup join settings
Updated build.gradle files for v1.16, v1.17, and v1.18
Benefits
Enables real-time joins with Iceberg dimension tables
Reduces data latency by avoiding frequent full table scans
Improves performance through intelligent caching strategies
Seamlessly integrates with existing Flink lookup join framework
Supports periodic data refresh to ensure data freshness
Testing
Added integration tests to validate lookup join functionality
Tested cache refresh mechanisms
Verified correctness of join results
Ensures backward compatibility

Copy link
Contributor

@mxm mxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @fightBoxing! A couple of comments:

  • Please only target the newest version of Flink (currently 2.1). Remove any code for older versions. Backports need to be dealt later on.
  • Please make sure the code is clean and compiles.
  • Please make sure to read https://iceberg.apache.org/contribute/
  • Comments should be in English

In general, this type of change may warrant a design document which should be reviewed before the code changes. It may be necessary to break up the changes into multiple PRs to make it easier to review the different components. I hope this makes sense. Thank you for your time and effort.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants