[FLINK] Implement Iceberg lookup join functionality, with source code and unit test code #15056
[FLINK] Implement Iceberg lookup join functionality
Problem
In production environments, there is a common need to join streaming data with dimension data stored in Iceberg tables. The dimension data needs to be periodically refreshed to ensure join accuracy. Currently, Flink lacks native support for Iceberg lookup joins, forcing users to work around this limitation or use alternative solutions.
Solution
This PR implements Iceberg lookup join functionality for Flink, enabling efficient joins between streaming data and Iceberg dimension tables. The implementation includes:
Changes
- IcebergLookupCache for efficient caching of lookup data
- IcebergLookupReader for reading lookup data from Iceberg tables
- IcebergLookupJoinITCase for integration testing
- IcebergTableSource to support lookup join operations
- FlinkConfigOptions for lookup join settings
Benefits
Testing
Versions
This implementation is backported to Flink 1.16, 1.17, and 1.18 to support multiple Flink versions in production environments.
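For reviewers unfamiliar with the pattern, the periodic-refresh caching mentioned under Problem can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the class name TtlLookupCache, its constructor parameters, and the loader callback are all hypothetical, and the real IcebergLookupCache may be structured quite differently. The sketch assumes a per-key time-to-live, after which the next lookup reloads the rows for that key.

```java
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a TTL-based lookup cache; NOT the IcebergLookupCache from this PR. */
class TtlLookupCache<K, V> {

    /** Cached rows for one join key, with the time they were loaded. */
    private static final class Entry<V> {
        final List<V> rows;
        final long loadedAtMillis;

        Entry(List<V> rows, long loadedAtMillis) {
            this.rows = rows;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, List<V>> loader; // assumption: reads dimension rows for a key
    private final long ttlMillis;
    private final LongSupplier clock;           // injectable so tests can control time

    TtlLookupCache(Function<K, List<V>> loader, Duration ttl, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttl.toMillis();
        this.clock = clock;
    }

    /** Returns cached rows for the key, reloading them once the entry is older than the TTL. */
    List<V> lookup(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry == null || now - entry.loadedAtMillis >= ttlMillis) {
            entry = new Entry<>(loader.apply(key), now);
            entries.put(key, entry);
        }
        return entry.rows;
    }
}
```

In an actual lookup join the loader would read matching rows from the Iceberg dimension table, and the clock would typically be System::currentTimeMillis; both are left abstract here because the sketch does not depend on any Flink or Iceberg API.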