Upgrade Iceberg to 1.9.2 and Avro to 1.12.0 #784
Conversation
Tracking the CI issues in this issue #787
The CI issue is resolved, please rebase your PR
| "type": ["null", "bytes"], | ||
| "default": null |
Let's not modify the test schema as part of this change. It seems like it is not required.
I believe this is needed for the new parquet library version, but let me revert and see.
Yep, it fails with:
Error: org.apache.xtable.hudi.ITHudiConversionSource.insertAndUpsertData(HoodieTableType, PartitionConfig)[3] -- Time elapsed: 10.34 s <<< ERROR!
org.apache.hudi.exception.HoodieInsertException: Failed to bulk insert for commit time 20260119224246020
at org.apache.hudi.table.action.commit.JavaBulkInsertCommitActionExecutor.execute(JavaBulkInsertCommitActionExecutor.java:63)
at org.apache.hudi.table.HoodieJavaCopyOnWriteTable.bulkInsert(HoodieJavaCopyOnWriteTable.java:118)
at org.apache.hudi.table.HoodieJavaCopyOnWriteTable.bulkInsert(HoodieJavaCopyOnWriteTable.java:85)
at org.apache.hudi.client.HoodieJavaWriteClient.bulkInsert(HoodieJavaWriteClient.java:169)
at org.apache.hudi.client.HoodieJavaWriteClient.bulkInsert(HoodieJavaWriteClient.java:158)
at org.apache.xtable.TestJavaHudiTable.insertRecordsWithCommitAlreadyStarted(TestJavaHudiTable.java:195)
at org.apache.xtable.hudi.ITHudiConversionSource.insertAndUpsertData(ITHudiConversionSource.java:245)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.tryRemoveAndExec(ForkJoinPool.java:1062)
at java.base/java.util.concurrent.ForkJoinPool.awaitJoin(ForkJoinPool.java:1688)
at java.base/java.util.concurrent.ForkJoinTask.doJoin(ForkJoinTask.java:397)
at java.base/java.util.concurrent.ForkJoinTask.get(ForkJoinTask.java:1004)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
Caused by: java.lang.RuntimeException: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: org.apache.avro.AvroTypeException: Invalid default for field bytes_field: "" not a "bytes"
at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
at org.apache.hudi.table.action.commit.JavaBulkInsertHelper.bulkInsert(JavaBulkInsertHelper.java:131)
at org.apache.hudi.table.action.commit.JavaBulkInsertHelper.bulkInsert(JavaBulkInsertHelper.java:84)
at org.apache.hudi.table.action.commit.JavaBulkInsertCommitActionExecutor.execute(JavaBulkInsertCommitActionExecutor.java:58)
... 17 more
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: org.apache.avro.AvroTypeException: Invalid default for field bytes_field: "" not a "bytes"
at org.apache.hudi.execution.JavaLazyInsertIterable.computeNext(JavaLazyInsertIterable.java:71)
at org.apache.hudi.execution.JavaLazyInsertIterable.computeNext(JavaLazyInsertIterable.java:37)
at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:119)
... 21 more
Caused by: org.apache.hudi.exception.HoodieException: org.apache.avro.AvroTypeException: Invalid default for field bytes_field: "" not a "bytes"
at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:75)
at org.apache.hudi.execution.JavaLazyInsertIterable.computeNext(JavaLazyInsertIterable.java:67)
... 23 more
Caused by: org.apache.avro.AvroTypeException: Invalid default for field bytes_field: "" not a "bytes"
at org.apache.avro.Schema.validateDefault(Schema.java:1719)
at org.apache.avro.Schema$Field.<init>(Schema.java:578)
at org.apache.avro.Schema$Field.<init>(Schema.java:614)
at org.apache.hudi.avro.HoodieAvroUtils.addMetadataFields(HoodieAvroUtils.java:291)
at org.apache.hudi.io.HoodieWriteHandle.<init>(HoodieWriteHandle.java:96)
at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:89)
at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:76)
at org.apache.hudi.io.CreateHandleFactory.create(CreateHandleFactory.java:45)
at org.apache.hudi.execution.CopyOnWriteInsertHandler.consume(CopyOnWriteInsertHandler.java:85)
at org.apache.hudi.execution.CopyOnWriteInsertHandler.consume(CopyOnWriteInsertHandler.java:42)
at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:69)
... 24 more
https://github.com/apache/incubator-xtable/actions/runs/21153333020/job/60833552403?pr=784
This looks like an Avro issue, not Parquet. Is the Avro upgrade required?
Oops, yeah, I meant the new Avro library version. Let me double-check if the upgrade is necessary.
The Avro upgrade is required: Iceberg 1.9.2 relies on Avro 1.12.0 APIs. The schema change here in basic_schema.avsc is needed for compatibility with Avro 1.12.0's stricter default-value handling.
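For context, a minimal, self-contained sketch of the kind of default-value validation that trips the Hudi test above. It constructs an Avro `Schema.Field` directly, which is the same code path as `HoodieAvroUtils.addMetadataFields` in the stack trace; the empty-string default used in the second half is an assumption for illustration, since the pre-upgrade contents of basic_schema.avsc are not shown here.

```java
import org.apache.avro.AvroTypeException;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

public class AvroDefaultValidationSketch {
  public static void main(String[] args) {
    // The field shape from the updated basic_schema.avsc: a nullable bytes field
    // whose default is null.
    Schema nullableBytes = SchemaBuilder.unionOf().nullType().and().bytesType().endUnion();

    // Accepted: the default (null) matches the first branch of the union ("null").
    Schema.Field updated =
        new Schema.Field("bytes_field", nullableBytes, null, Schema.Field.NULL_DEFAULT_VALUE);
    System.out.println("accepted default: " + updated.defaultVal());

    // Rejected (assumed pre-upgrade shape): an empty-string default does not match
    // the union's first branch, so constructing the field throws AvroTypeException,
    // similar to the "Invalid default for field bytes_field" error in the CI log above.
    try {
      new Schema.Field("bytes_field", nullableBytes, null, "");
    } catch (AvroTypeException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```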
 * Validates that the metadata for the table is properly created/updated. {@link
 * ITConversionController} validates that the table and its data can be properly read.
 */
@Execution(ExecutionMode.SAME_THREAD)
This is so that GitHub CI won't run into concurrency issues; I see this is done for TestDeltaSync as well.
What is the concurrency issue? This wasn't required on the lower version of Iceberg
I remember it's about the InMemory catalog the tests are using.
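For reference, a minimal sketch of the JUnit 5 annotation being discussed; the class and test names are hypothetical. `ExecutionMode.SAME_THREAD` keeps every test in the annotated class on a single thread even when parallel execution is enabled elsewhere, which avoids interleaved access to shared state such as an in-memory catalog.

```java
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.parallel.Execution;
import org.junit.jupiter.api.parallel.ExecutionMode;

// Run every test in this class on the same thread, even if the build enables
// parallel test execution, so tests sharing one catalog instance cannot interleave.
@Execution(ExecutionMode.SAME_THREAD)
class SharedCatalogIT {

  @Test
  void createsTableMetadata() {
    // exercise the shared in-memory catalog here
  }

  @Test
  void updatesTableMetadata() {
    // exercise the shared in-memory catalog here
  }
}
```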
| "type": ["null", "bytes"], | ||
| "default": null |
I believe this is needed for the new parquet library version, but let me revert and see.
CI is green. @the-other-tim-brown, could you take another look? I tried removing as many unrelated changes as possible.
@kevinjqliu why not upgrade to the latest Iceberg? Can you update the PR description with why each version was chosen?
Thanks for the review @the-other-tim-brown. I was going to update #783 on why 1.9.2 is picked.
@kevinjqliu for the Hive support, is there a new import we should use to keep the version consistent? Or is it removed completely?
unfortunately
I added a note in the PR description on why we need the Avro version upgrade. I want to keep this PR scoped to just upgrading the Iceberg library version. LMK what you think.
I think there are also other updates required to bump the Iceberg version. Let me take a look.
What is the purpose of the pull request
Upgrades Iceberg and Avro libraries to newer versions to benefit from bug fixes, performance improvements, and new features.
Version Changes
- Iceberg: upgraded to 1.9.2
- Avro: upgraded to 1.12.0
Note:
iceberg-hive-runtime is pinned to 1.7.2 as it was removed in Iceberg 1.8.0. The Hive runtime functionality has been restructured in newer Iceberg versions.
Verify this pull request
This pull request is already covered by existing tests, such as (please describe tests).