
Conversation

@kevinjqliu commented Jan 15, 2026

What is the purpose of the pull request

Upgrades Iceberg and Avro libraries to newer versions to benefit from bug fixes, performance improvements, and new features.

Version Changes

Library                 Previous   New
Iceberg                 1.4.2      1.9.2
Iceberg Hive Runtime    1.4.2      1.7.2
Avro                    1.11.4     1.12.0

Note: The Avro upgrade is required because Iceberg 1.9.2 relies on Avro 1.12.0 APIs; the change to basic_schema.avsc is needed for Avro 1.12.0's stricter default value validation. The iceberg-hive-runtime artifact is no longer published after 1.7.2, so it stays at 1.7.2 for now.

Verify this pull request

This pull request is already covered by existing tests.

@the-other-tim-brown (Contributor):

Tracking the CI issues in this issue #787

@the-other-tim-brown (Contributor):

Tracking the CI issues in this issue #787

The CI issue is resolved; please rebase your PR.

@kevinjqliu force-pushed the kevinjqliu/iceberg-upgrade branch from a4b2a8c to afea83c on January 19, 2026 18:54
Comment on lines +75 to +76
"type": ["null", "bytes"],
"default": null
Contributor:

Let's not modify the test schema as part of this change. It seems like it is not required.

Author:

I believe this is needed for the new parquet library version, but let me revert and see.

Author (@kevinjqliu, Jan 19, 2026):

Yep, it fails with:

Error:  org.apache.xtable.hudi.ITHudiConversionSource.insertAndUpsertData(HoodieTableType, PartitionConfig)[3] -- Time elapsed: 10.34 s <<< ERROR!
org.apache.hudi.exception.HoodieInsertException: Failed to bulk insert for commit time 20260119224246020
	at org.apache.hudi.table.action.commit.JavaBulkInsertCommitActionExecutor.execute(JavaBulkInsertCommitActionExecutor.java:63)
	at org.apache.hudi.table.HoodieJavaCopyOnWriteTable.bulkInsert(HoodieJavaCopyOnWriteTable.java:118)
	at org.apache.hudi.table.HoodieJavaCopyOnWriteTable.bulkInsert(HoodieJavaCopyOnWriteTable.java:85)
	at org.apache.hudi.client.HoodieJavaWriteClient.bulkInsert(HoodieJavaWriteClient.java:169)
	at org.apache.hudi.client.HoodieJavaWriteClient.bulkInsert(HoodieJavaWriteClient.java:158)
	at org.apache.xtable.TestJavaHudiTable.insertRecordsWithCommitAlreadyStarted(TestJavaHudiTable.java:195)
	at org.apache.xtable.hudi.ITHudiConversionSource.insertAndUpsertData(ITHudiConversionSource.java:245)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.tryRemoveAndExec(ForkJoinPool.java:1062)
	at java.base/java.util.concurrent.ForkJoinPool.awaitJoin(ForkJoinPool.java:1688)
	at java.base/java.util.concurrent.ForkJoinTask.doJoin(ForkJoinTask.java:397)
	at java.base/java.util.concurrent.ForkJoinTask.get(ForkJoinTask.java:1004)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
Caused by: java.lang.RuntimeException: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: org.apache.avro.AvroTypeException: Invalid default for field bytes_field: "" not a "bytes"
	at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
	at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
	at org.apache.hudi.table.action.commit.JavaBulkInsertHelper.bulkInsert(JavaBulkInsertHelper.java:131)
	at org.apache.hudi.table.action.commit.JavaBulkInsertHelper.bulkInsert(JavaBulkInsertHelper.java:84)
	at org.apache.hudi.table.action.commit.JavaBulkInsertCommitActionExecutor.execute(JavaBulkInsertCommitActionExecutor.java:58)
	... 17 more
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: org.apache.avro.AvroTypeException: Invalid default for field bytes_field: "" not a "bytes"
	at org.apache.hudi.execution.JavaLazyInsertIterable.computeNext(JavaLazyInsertIterable.java:71)
	at org.apache.hudi.execution.JavaLazyInsertIterable.computeNext(JavaLazyInsertIterable.java:37)
	at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:119)
	... 21 more
Caused by: org.apache.hudi.exception.HoodieException: org.apache.avro.AvroTypeException: Invalid default for field bytes_field: "" not a "bytes"
	at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:75)
	at org.apache.hudi.execution.JavaLazyInsertIterable.computeNext(JavaLazyInsertIterable.java:67)
	... 23 more
Caused by: org.apache.avro.AvroTypeException: Invalid default for field bytes_field: "" not a "bytes"
	at org.apache.avro.Schema.validateDefault(Schema.java:1719)
	at org.apache.avro.Schema$Field.<init>(Schema.java:578)
	at org.apache.avro.Schema$Field.<init>(Schema.java:614)
	at org.apache.hudi.avro.HoodieAvroUtils.addMetadataFields(HoodieAvroUtils.java:291)
	at org.apache.hudi.io.HoodieWriteHandle.<init>(HoodieWriteHandle.java:96)
	at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:89)
	at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:76)
	at org.apache.hudi.io.CreateHandleFactory.create(CreateHandleFactory.java:45)
	at org.apache.hudi.execution.CopyOnWriteInsertHandler.consume(CopyOnWriteInsertHandler.java:85)
	at org.apache.hudi.execution.CopyOnWriteInsertHandler.consume(CopyOnWriteInsertHandler.java:42)
	at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:69)
	... 24 more

https://github.com/apache/incubator-xtable/actions/runs/21153333020/job/60833552403?pr=784

Contributor:

This looks like an avro issue, not parquet. Is the avro upgrade required?

Author:

Oops, yes, I meant the new Avro library version. Let me double check whether the upgrade is necessary.

Author (@kevinjqliu, Jan 22, 2026):

The Avro upgrade is required: Iceberg 1.9.2 relies on Avro 1.12.0 APIs.

The schema change here in basic_schema.avsc is needed for compatibility with Avro 1.12.0's stricter default value handling.
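
For illustration only, here is a minimal sketch of the field shape the diff moves to; the record and class names are hypothetical and not from the PR. The assumption, based on the CI error above, is that the old field was declared as "type": "bytes" with "default": "", which Avro 1.12.0 rejects when validating defaults, while a nullable union with a null default is accepted by both Avro 1.11.x and 1.12.0.

import org.apache.avro.Schema;

public class BytesFieldDefaultSketch {
  public static void main(String[] args) {
    // Updated shape from the diff: a nullable union whose default is null.
    // (The previous shape is what triggered "Invalid default for field bytes_field"
    // in the CI log above when running against Avro 1.12.0.)
    String schemaJson =
        "{\"type\": \"record\", \"name\": \"Sample\", \"fields\": ["
            + "{\"name\": \"bytes_field\", \"type\": [\"null\", \"bytes\"], \"default\": null}"
            + "]}";

    // A null default is always valid for a union whose first branch is "null",
    // so this parses cleanly.
    Schema schema = new Schema.Parser().parse(schemaJson);
    System.out.println(schema.getField("bytes_field").defaultVal());
  }
}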

* Validates that the metadata for the table is properly created/updated. {@link
* ITConversionController} validates that the table and its data can be properly read.
*/
@Execution(ExecutionMode.SAME_THREAD)
Author:

This is so that GitHub CI won't run into concurrency issues; I see this is done for TestDeltaSync as well.

Contributor:

What is the concurrency issue? This wasn't required on the lower version of Iceberg.

Author:

I remember it's about the in-memory catalog the tests are using.
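
For context, a minimal sketch (class and test names are hypothetical, not from the PR) of the JUnit 5 annotation the diff adds. With ExecutionMode.SAME_THREAD, every test in the annotated class runs sequentially on the same thread even when the build enables parallel test execution, which avoids concurrent access to shared state such as a single in-memory catalog instance; TestDeltaSync already uses the same pattern.

import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.parallel.Execution;
import org.junit.jupiter.api.parallel.ExecutionMode;

// Forces all tests in this class onto one thread so they never race on shared state.
@Execution(ExecutionMode.SAME_THREAD)
class SharedCatalogSerialTest {

  @Test
  void writesToSharedCatalog() {
    // would mutate the shared in-memory catalog
  }

  @Test
  void readsFromSharedCatalog() {
    // never runs concurrently with writesToSharedCatalog
  }
}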


@kevinjqliu changed the title from "upgrade to iceberg 1.10.1" to "Upgrade Iceberg to 1.9.2 and Avro to 1.12.0" on Jan 19, 2026
@kevinjqliu marked this pull request as ready for review on January 20, 2026 00:28
@kevinjqliu (Author) commented Jan 20, 2026:

CI is green. @the-other-tim-brown, could you take another look?

I tried removing as many unrelated changes as possible.

@the-other-tim-brown (Contributor):

@kevinjqliu why not upgrade to the latest iceberg? Can you update the PR description with why each version was chosen?

@kevinjqliu (Author):

Thanks for the review, @the-other-tim-brown. I'll double check whether the Avro version upgrade is needed here.

I was going to update #783 with why 1.9.2 was picked.
The latest Iceberg version (1.10.x) uses Parquet 1.15, which conflicts with the Parquet version used by Hudi, specifically because of the new variant feature that was added.
I chose 1.9.x for now; we can deal with the Parquet version as a follow-up.

@the-other-tim-brown (Contributor):

@kevinjqliu for the hive support, is there a new import we should use to keep the version consistent? Or is it removed completely?

@kevinjqliu (Author):

Unfortunately, iceberg-hive-runtime is no longer published after 1.7.2 (https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-hive-runtime).
I want to punt this to a follow-up.

@kevinjqliu (Author):

I added a note in the PR description on why we need the avro version upgrade.

I want to keep this PR scoped to just upgrading the iceberg library version. LMK what you think.

@jbonofre (Member) commented Jan 22, 2026:

I think there are also other updates required to bump the Iceberg version. Let me take a look.

