Spark: Fix date rebase issues in ORC writing and metrics collection #15188
+25 −6
Problem
When Spark writes historical dates (pre-1582, before the Gregorian calendar cutover) to ORC files, the values written to the file and the min/max metrics collected for them can disagree because of inconsistent Julian-Gregorian calendar rebasing. This mismatch causes consistency checks to fail.
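To make the rebase problem concrete, here is a minimal sketch using only JDK classes. It is not code from this PR; the class and method names are hypothetical. It shows that the same year/month/day label maps to different epoch-day counts under the proleptic Gregorian calendar (`java.time.LocalDate`) and the hybrid Julian/Gregorian calendar (`java.util.GregorianCalendar`, default cutover 1582-10-15).

```java
import java.time.LocalDate;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical demo class; not part of Iceberg, Spark, or ORC.
final class CalendarRebaseDemo {
  private static final long MILLIS_PER_DAY = 86_400_000L;

  // Days since 1970-01-01 in the proleptic Gregorian calendar.
  static long prolepticEpochDay(int year, int month, int day) {
    return LocalDate.of(year, month, day).toEpochDay();
  }

  // Days since 1970-01-01 in the hybrid calendar, which interprets
  // labels before 1582-10-15 as Julian dates.
  static long hybridEpochDay(int year, int month, int day) {
    GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
    cal.clear();                      // reset all fields so the time is midnight UTC
    cal.set(year, month - 1, day);    // Calendar months are 0-based
    return Math.floorDiv(cal.getTimeInMillis(), MILLIS_PER_DAY);
  }

  public static void main(String[] args) {
    // Last Julian day before the cutover: the two calendars differ by 10 days.
    System.out.println(hybridEpochDay(1582, 10, 4) - prolepticEpochDay(1582, 10, 4)); // 10
    // Modern dates agree.
    System.out.println(hybridEpochDay(2000, 1, 1) - prolepticEpochDay(2000, 1, 1));   // 0
  }
}
```

If one code path counts days one way and another path counts them the other way, any pre-1582 value will be off by a calendar-dependent number of days, while modern dates are unaffected.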
Modification
Updated GenericOrcWriters.java to ensure dates are correctly handled during the write process.
Adjusted OrcMetrics.java to align metrics collection with the corrected writing logic, ensuring the min/max values reflect the actual data on disk.
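The following sketch illustrates why the writer and the metrics need to agree. It does not reproduce the actual changes in GenericOrcWriters.java or OrcMetrics.java; all names are hypothetical, and it assumes, purely for illustration, that the on-disk values use hybrid-calendar day counts while the mismatched bounds are computed from proleptic day counts.

```java
import java.time.LocalDate;
import java.util.GregorianCalendar;
import java.util.TimeZone;
import java.util.stream.LongStream;

// Hypothetical illustration; not the Iceberg implementation.
final class MetricsAlignmentSketch {
  static long proleptic(int y, int m, int d) {
    return LocalDate.of(y, m, d).toEpochDay();
  }

  static long hybrid(int y, int m, int d) {
    GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
    cal.clear();
    cal.set(y, m - 1, d); // Calendar months are 0-based
    return Math.floorDiv(cal.getTimeInMillis(), 86_400_000L);
  }

  // Returns {min, max} of the given day counts.
  static long[] minMax(long[] days) {
    return new long[] {
      LongStream.of(days).min().getAsLong(), LongStream.of(days).max().getAsLong()
    };
  }

  // True when every value lies inside the inclusive {min, max} bounds.
  static boolean withinBounds(long[] days, long[] bounds) {
    return LongStream.of(days).allMatch(v -> v >= bounds[0] && v <= bounds[1]);
  }

  public static void main(String[] args) {
    // Assumption: these are the values that end up on disk.
    long[] onDisk = {hybrid(1582, 10, 1), hybrid(1582, 10, 4)};
    // Bounds computed from a differently rebased view of the same dates.
    long[] mismatched = minMax(new long[] {proleptic(1582, 10, 1), proleptic(1582, 10, 4)});
    // Bounds computed from the same values that were written.
    long[] aligned = minMax(onDisk);

    System.out.println(withinBounds(onDisk, mismatched)); // false
    System.out.println(withinBounds(onDisk, aligned));    // true
  }
}
```

Deriving the bounds from the same representation that is written keeps min/max consistent with the data regardless of which calendar that representation uses.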
Verification Results
Ran ./gradlew :iceberg-orc:test and confirmed that the specific test cases for historical dates now pass without metric mismatches.
Fixes #14214