-
Notifications
You must be signed in to change notification settings - Fork 281
chore: Annotate classes/methods/fields that are used by Apache Iceberg #3237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add documentation detailing all Comet classes and methods that form the public API used by Apache Iceberg. This helps contributors understand which APIs may affect Iceberg integration and need backward compatibility considerations. The documentation covers: - org.apache.comet.parquet: FileReader, RowGroupReader, ReadOptions, ParquetColumnSpec, column readers, BatchReader, Native JNI methods - org.apache.comet: CometSchemaImporter - org.apache.comet.vector: CometVector - org.apache.comet.shaded.arrow: RootAllocator, ValueVector Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a custom Java annotation @IcebergApi to mark all classes, methods, constructors, and fields that form the public API used by Apache Iceberg. This makes it easy to identify which APIs need backward compatibility considerations when making changes. The annotation is applied to: - org.apache.comet.parquet: FileReader, RowGroupReader, ReadOptions, WrappedInputFile, ParquetColumnSpec, AbstractColumnReader, ColumnReader, BatchReader, MetadataColumnReader, ConstantColumnReader, Native, TypeUtil, Utils - org.apache.comet: CometSchemaImporter - org.apache.comet.vector: CometVector Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add annotations to: - AbstractColumnReader.nativeHandle (protected field accessed by Iceberg subclasses) - AbstractCometSchemaImporter.close() (called by Iceberg) Also update documentation to include these APIs. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3237 +/- ##
============================================
+ Coverage 56.12% 59.99% +3.86%
- Complexity 976 1475 +499
============================================
Files 119 175 +56
Lines 11743 16165 +4422
Branches 2251 2681 +430
============================================
+ Hits 6591 9698 +3107
- Misses 4012 5114 +1102
- Partials 1140 1353 +213 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Add a new Maven module containing dedicated unit tests for all @IcebergApi annotated classes, ensuring the public API contract with Apache Iceberg remains stable and tested. Key changes: - Add iceberg-public-api module with 169 tests covering all @IcebergApi classes - Fix CometVector constructor visibility (protected -> public) to match API annotation - Add IcebergApiVerificationTest for reflection-based API verification - Add tests for FileReader, BatchReader, ColumnReader, Native, TypeUtil, Utils - Add tests for CometVector, CometSchemaImporter, WrappedInputFile Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
How this will be maintained in the future ? |
The next step is #3240 which adds unit tests to ensure that the API is as expected. This will help prevent accidental changes to the current API methods. We should probably update the release process documentation to suggest auditing the Iceberg API use before creating a new release. |
| // Methods used by Iceberg | ||
| public void setRequestedSchemaFromSpecs(List<ParquetColumnSpec> specs) | ||
| public RowGroupReader readNextRowGroup() throws IOException | ||
| public void skipNextRowGroup() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| public void skipNextRowGroup() | |
| public boolean skipNextRowGroup() |
|
|
||
| ```java | ||
| // Constructor | ||
| public WrappedInputFile(org.apache.iceberg.io.InputFile inputFile) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
datafusion-comet/common/src/main/java/org/apache/comet/parquet/WrappedInputFile.java
Line 40 in 640dd03
| public WrappedInputFile(Object inputFile) { |
java.lang.Object
| @After | ||
| public void tearDown() throws IOException { | ||
| if (tempDir != null && Files.exists(tempDir)) { | ||
| Files.walk(tempDir).sorted(Comparator.reverseOrder()).map(Path::toFile).forEach(File::delete); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Files.walk(tempDir).sorted(Comparator.reverseOrder()).map(Path::toFile).forEach(File::delete); | |
| try (Stream<Path> stream : Files.walk(tempDir)) { | |
| stream.sorted(Comparator.reverseOrder()).map(Path::toFile).forEach(File::delete); | |
| } |
The returned stream contains references to one or more open directories. The directories are closed by closing the stream.
| /** Checks if native library is available. */ | ||
| protected static boolean isNativeLibraryAvailable() { | ||
| try { | ||
| Class.forName("org.apache.comet.NativeBase"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not guarantee that the native library is available and loadable.
| @@ -189,6 +191,7 @@ public BatchReader( | |||
| * @deprecated since 0.10.0, will be removed in 0.11.0. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it should be undeprecated ?!
Currently is is both deprecated and important (used by Iceberg)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we should remove the deprecated annotation for the time being.
| @Test | ||
| public void testCurrentBatchMethodExists() throws NoSuchMethodException { | ||
| Method method = BatchReader.class.getMethod("currentBatch"); | ||
| assertThat(method).isNotNull(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| assertThat(method).isNotNull(); |
| @Test | ||
| public void testCloseMethodExists() throws NoSuchMethodException { | ||
| Method method = BatchReader.class.getMethod("close"); | ||
| assertThat(method).isNotNull(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| assertThat(method).isNotNull(); |
| // Methods used by Iceberg | ||
| public static AbstractColumnReader getColumnReader( | ||
| DataType sparkType, | ||
| ColumnDescriptor descriptor, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parameter types do not match with https://github.com/apache/datafusion-comet/pull/3237/changes#diff-54de18a6f3ec3c2944f1628012f8c0b0af863da30419a8bc989eb6cd8ccb8cd1R39-R46
| ColumnDescriptor descriptor, | |
| ParquetColumnSpec columnSpec, |
| CometSchemaImporter importer, | ||
| int batchSize, | ||
| boolean useDecimal128, | ||
| boolean isConstant |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| boolean isConstant | |
| boolean useLazyMaterialization, | |
| boolean useLegacyTimestamp |
|
|
||
| 1. Creates `CometSchemaImporter` with a `RootAllocator` | ||
| 2. Uses `Utils.getColumnReader()` to create appropriate column readers | ||
| 3. Calls `reset()` and `setPageReader()` for each row group |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which reset() method ?
Maybe:
| 3. Calls `reset()` and `setPageReader()` for each row group | |
| 3. Calls `Native.resetBatch()` and `setPageReader()` for each row group |
?!
- Fix documentation: skipNextRowGroup returns boolean, WrappedInputFile takes Object, update Utils.getColumnReader signature, fix TypeUtil reference, clarify Native.resetBatch() - Use try-with-resources for Files.walk() in AbstractApiTest - Remove unnecessary isNotNull() assertions in test files (getMethod/getConstructor throw if not found) - Add @IcebergApi to BatchReader.close() and ReadOptions.Builder constructor Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
would it be more resilient to have a @publicapi attribute which took a list of libraries you knew used it it? There's some stuff in hadoop (e.g. hdfs MiniDFSCluster) that document that hbase & others use it... |
Thanks for the suggestion @steveloughran. Honestly, I'm not anticipating any other systems to have dependencies on Comet, so this may be be overkill right now. However, if this is an approach that people are more comfortable with, then I'm fine with moving to something like this. @mbutrovich @comphead @parthchandra any thoughts on this? |
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
@huaxingao @parthchandra @martin-g in an effort to make this PR easier to review, I changed the scope to just adding the annotations, rather than also adding tests and contributor guide documentation. I would appreciate another review. |
I like this suggestion because it forces us to think in terms of APIs that are for general purpose use. We can do this in a follow up PR, I feel. |
parthchandra
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comment, otherwise lgtm
| @@ -189,6 +191,7 @@ public BatchReader( | |||
| * @deprecated since 0.10.0, will be removed in 0.11.0. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we should remove the deprecated annotation for the time being.
Rationale
Iceberg depends on Comet, therefore Comet has a public API that needs to be maintained. This isn't documented very well.
This PR adds an
@IcebergApiannotation to the public API.There is also a new page in the contributor guide documenting this API.I generated this information by analyzing the latest from Iceberg's main branch.
🤖 Generated with Claude Code