@HennesyChihiro
Problem
When Spark writes historical dates (before the 1582 Gregorian cutover) to ORC files, the values written to disk and the collected metrics disagree because of Julian-to-Gregorian calendar rebasing. This mismatch causes consistency checks to fail.
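For reviewers less familiar with the rebase problem, here is a minimal, standalone sketch of the underlying discrepancy. It is not code from this PR: the class and method names (`CalendarRebaseSketch`, `prolepticDate`, `hybridDate`) are hypothetical and it uses only the JDK. It shows that the same epoch-day value maps to different calendar dates depending on whether it is read with the proleptic Gregorian calendar (`java.time`) or the hybrid Julian/Gregorian calendar (`java.util.GregorianCalendar`).

```java
import java.time.LocalDate;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical illustration only; not part of GenericOrcWriters or OrcMetrics.
class CalendarRebaseSketch {
  private static final long MILLIS_PER_DAY = 86_400_000L;

  // Interpret an epoch-day count in the proleptic Gregorian calendar.
  static String prolepticDate(long epochDay) {
    return LocalDate.ofEpochDay(epochDay).toString();
  }

  // Interpret the same epoch-day count in the hybrid Julian/Gregorian calendar,
  // which switches to Julian rules before the October 1582 cutover.
  static String hybridDate(long epochDay) {
    GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
    cal.setTimeInMillis(epochDay * MILLIS_PER_DAY);
    return String.format(
        "%04d-%02d-%02d",
        cal.get(Calendar.YEAR),
        cal.get(Calendar.MONTH) + 1, // Calendar months are zero-based
        cal.get(Calendar.DAY_OF_MONTH));
  }

  public static void main(String[] args) {
    long historical = LocalDate.of(1000, 1, 1).toEpochDay();
    long modern = LocalDate.of(2020, 1, 1).toEpochDay();

    // The two calendars disagree on the pre-1582 value...
    System.out.println("proleptic: " + prolepticDate(historical));
    System.out.println("hybrid:    " + hybridDate(historical));

    // ...and agree on the modern one.
    System.out.println("proleptic: " + prolepticDate(modern));
    System.out.println("hybrid:    " + hybridDate(modern));
  }
}
```

If the writer and the metrics collector interpret the stored day count under different calendars, the min/max bounds can describe dates that do not match what is on disk, which is the kind of mismatch described above.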

Modification
Updated GenericOrcWriters.java to ensure dates are correctly handled during the write process.

Adjusted OrcMetrics.java to align metrics collection with the corrected writing logic, ensuring the min/max values reflect the actual data on disk.

Verification Results
Ran ./gradlew :iceberg-orc:test and confirmed that the specific test cases for historical dates now pass without metric mismatches.

Fixes #14214

Development

Successfully merging this pull request may close these issues.

Proleptic timestamps marked as hybrid in ORC metadata