feat: pass ExternalCatalog properties into federated catalogs#3480
feat: pass ExternalCatalog properties into federated catalogs#3480dimas-b merged 1 commit intoapache:mainfrom
Conversation
polaris-core/src/main/java/org/apache/polaris/core/catalog/ExternalCatalogFactory.java
Outdated
Show resolved
Hide resolved
.../main/java/org/apache/polaris/service/catalog/iceberg/IcebergRESTExternalCatalogFactory.java
Outdated
Show resolved
Hide resolved
...src/main/java/org/apache/polaris/extensions/federation/hive/HiveFederatedCatalogFactory.java
Outdated
Show resolved
Hide resolved
...main/java/org/apache/polaris/extensions/federation/hadoop/HadoopFederatedCatalogFactory.java
Outdated
Show resolved
Hide resolved
|
👋 Hi @adutra, I appreciate your thoughtful review and suggestions. I made updates based on your feedback:
If you have a moment, I would really appreciate another look. If there is anything else you would prefer to see structured differently, I am happy to revise. Thanks again! 🙇♂️ |
...t/java/org/apache/polaris/service/catalog/iceberg/IcebergRESTExternalCatalogFactoryTest.java
Outdated
Show resolved
Hide resolved
dimas-b
left a comment
There was a problem hiding this comment.
Thanks for you contribution, @yj-lee0503 ! This looks like a valuable change to me (with a comment).
Also, I wonder if @dennishuo could review too - thx!
| if (catalogProperties != null && !catalogProperties.isEmpty()) { | ||
| LoggerFactory.getLogger(getClass()) | ||
| .warn( | ||
| "catalogProperties were provided but {} does not override createCatalog with " |
There was a problem hiding this comment.
This is a reasonable approach to ensuring backward compatibility with older implementations of this interface.
However, Polaris core will always call this method (if I'm not mistaken), and if a custom implementation does not override it, the server will be producing a lot of WARN messages in runtime.
At the same time, in a new implementation of this interface it may be tempting to the author to override only the old method (especially if helping tools are involved), which would be sub-optimal since the intention is obviously to override this method.
Our standing evolution guidelines make it clear that java interfaces may change at any time. I tend to prefer to make a breaking change in this case and simply add the new parameter to the old method. Old implementations will be able to catch that at CI time and the fix is rather trivial. The benefit is simpler Polaris code base and easier OSS code maintenance (which has more exposure than private implementations).
Apologies if this was already considered and I missed it. This is just my personal opinion. Please consider it non-blocking.
There was a problem hiding this comment.
Hi @dimas-b . Thank you for the thoughtful feedback (and the kind words). ✨ Also, I appreciate sharing the evolution guidelines.
I’m leaning toward updating the PR to the breaking-change approach: adding the new parameter directly to the existing interface method and removing the default-method delegation + WARN, unless @dennishuo, @adutra, or others feel strongly that we should preserve compatibility for external implementations right now. I've prepared the changes to be pushed. Please let me know!
Thanks for looping in @dennishuo as well. I look forward to hearing from you all!. 🙇
There was a problem hiding this comment.
I pushed the changes that were discussed in this thread. I can revert the changes if needed. Please let me know. Thank you all again for your review.
There was a problem hiding this comment.
I am OK with the breaking change approach, FYI.
There was a problem hiding this comment.
The latest state of this PR LGTM 👍 Thanks, @yj-lee0503 !
Given that we're introducing breaking changes into SPI classes, it might be a good idea to also send an email about this PR to the dev ML for general awareness.
Thanks again for your review and guidance, @dimas-b . I sent an email to the dev mailing list. 😄 🙇 |
|
Thanks, @yj-lee0503 ! Could you also post a link to the |
Ah! I am sorry. I sent it via email. I don't think I have access to the Slack channel unless I am horribly mistaken. I thought it would require |
|
Email is quite fine. They should show up here: https://lists.apache.org/list.html?dev@polaris.apache.org ... but I do not see anything about this PR 🤔 You can send to dev AT polaris.apache.org from any email. You may want to subscribe with your personal email too. |
I tried to send it twice using my personal email and once using my work email. I think my work email address worked. :) Here is a link. |
|
Thanks, @yj-lee0503 ! Let's give some grace time for people to be able to catch up with email. I think it would be reasonable to merge on Feb 5 if no objections are raised. |
Sounds amazing! Thanks again for all of your guidance and support. 🙇♂️🙏😄 It was a fantastic learning opportunity for me. |
|
@yj-lee0503 : Could you rebase to fix CI checks, please? |
Pass ExternalCatalog.properties through to federated catalog clients fix double init and add backward compatibility update changelog on ExternalCatalog properties Replace custom merge logic with RESTUtil.merge() in all federated catalog factories (IcebergREST, Hive, Hadoop) Inline logger in ExternalCatalogFactory interface Simplify tests to verify RESTUtil.merge() behavior we depend on Remove redundant RESTUtil.merge() tests Resolve commit conflicts. Simplify ExternalCatalogFactory to breaking change per review Remove deprecated 2-param createCatalog/createGenericCatalog methods and make the 3-param versions the only abstract methods. This follows Polaris evolution guidelines that Java interfaces may change at any time, and avoids runtime WARN noise for legacy implementations. Move CHANGELOG entry from Changes to Breaking changes section.
56b3e8e to
a294004
Compare
Hello @dimas-b ! I rebased my branch. Thanks for your help. 🙇♂️ |
Summary
ExternalCatalog.propertiesthrough federation factories (Iceberg REST, Hive, Hadoop)Context
External catalog properties such as
rest.client.proxy.*and REST client timeout settings were not reaching the Iceberg REST HTTP client. This blocks federation in controlled egress environments where outbound traffic must go through an allowlisted forward proxy.Changes
Tests
./gradlew format compileAll✅./gradlew rat✅./gradlew :polaris-core:test✅./gradlew :polaris-extensions-federation-hadoop:test(NO-SOURCE) ✅./gradlew :polaris-extensions-federation-hive:test(NO-SOURCE) ✅./gradlew :polaris-runtime-service:testfails with 4 failures inAwsCloudWatchEventListenerTest(Testcontainers/Docker initialization issue; unrelated)Manual integration test (EKS + Squid forward proxy + External Iceberg REST catalog)
I validated this PR end to end in a real AWS EKS environment with an external Iceberg REST catalog behind a Squid forward proxy. Federation requests succeeded and proxy usage was confirmed via Squid access logs.
Setup
Evidence (Squid access logs)
When calling the federation API (example:
GET /api/catalog/v1/<catalog>/namespaces), Squid showed a successful HTTPS tunnel:This indicates traffic originated from the Polaris pod and went through the proxy via CONNECT to the external catalog host.
Before vs After
CONNECT <external-catalog-host>:443entries observedTCP_TUNNEL/200 CONNECT <external-catalog-host>:443entries observedFederation API Results
All requests were observed going through Squid.
Note
Related to #3465
Checklist
CHANGELOG.md(if needed)site/content/in-dev/unreleased(if needed)