feat: pass ExternalCatalog properties into federated catalogs #3480

Merged
dimas-b merged 1 commit into apache:main from yj-lee0503:ylee1212/proxy-config-addition
Feb 5, 2026

@yj-lee0503
Contributor

@yj-lee0503 yj-lee0503 commented Jan 19, 2026

Summary

  • pass ExternalCatalog.properties through federation factories (Iceberg REST, Hive, Hadoop)
  • merge catalog properties with connection config, with connection config taking precedence
  • document proxy/timeout settings for Iceberg REST federation
  • add tests that exercise production merge logic
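The precedence rule in the second bullet can be sketched in plain Java. This is only an illustration of the "connection config wins" semantics using `java.util` maps; the class name `PropertyMergeSketch`, the method `merge`, and the sample keys and hosts are hypothetical and are not taken from the Polaris code base.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: the class and method names are hypothetical, not Polaris code. */
final class PropertyMergeSketch {

  /** Returns a new map of catalog properties overlaid by connection-config properties. */
  static Map<String, String> merge(
      Map<String, String> catalogProperties, Map<String, String> connectionProperties) {
    Map<String, String> merged = new HashMap<>();
    if (catalogProperties != null) {
      merged.putAll(catalogProperties); // lower precedence: applied first
    }
    if (connectionProperties != null) {
      merged.putAll(connectionProperties); // higher precedence: overwrites duplicate keys
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, String> catalogProps = new HashMap<>();
    catalogProps.put("rest.client.proxy.hostname", "squid.egress-proxy.svc.cluster.local");
    catalogProps.put("uri", "https://stale.example.invalid"); // assumed conflicting key

    Map<String, String> connectionProps = new HashMap<>();
    connectionProps.put("uri", "https://catalog.example.invalid/api/catalog");

    Map<String, String> merged = merge(catalogProps, connectionProps);
    System.out.println(merged.get("uri")); // value from the connection config
    System.out.println(merged.get("rest.client.proxy.hostname")); // value from catalog properties
  }
}
```

Keys that appear only in the catalog properties (such as the proxy hostname here) survive the merge, while any key also set by the connection config takes the connection config's value.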

Context

External catalog properties such as rest.client.proxy.* and REST client timeout settings were not reaching the Iceberg REST HTTP client. This blocks federation in controlled egress environments where outbound traffic must go through an allowlisted forward proxy.

Changes

  • Add a shared merge helper in IcebergRESTExternalCatalogFactory and use it for REST catalog initialization
  • Honor ExternalCatalog.properties for Hive/Hadoop federation as well
  • Update docs with supported proxy/timeout keys and CLI examples

Tests

  • ./gradlew format compileAll
  • ./gradlew rat
  • ./gradlew :polaris-core:test
  • ./gradlew :polaris-extensions-federation-hadoop:test (NO-SOURCE) ✅
  • ./gradlew :polaris-extensions-federation-hive:test (NO-SOURCE) ✅
  • ./gradlew :polaris-runtime-service:test fails with 4 failures in AwsCloudWatchEventListenerTest (Testcontainers/Docker initialization issue; unrelated)

Manual integration test (EKS + Squid forward proxy + External Iceberg REST catalog)

I validated this PR end to end in a real AWS EKS environment with an external Iceberg REST catalog behind a Squid forward proxy. Federation requests succeeded and proxy usage was confirmed via Squid access logs.

Setup

  • Polaris built from this PR branch (1.4.0-incubating-SNAPSHOT)
  • AWS EKS cluster
  • Squid deployed as a Kubernetes service (domain allowlist for the external catalog)
  • External catalog: Iceberg REST
  • Proxy configured via external catalog properties (rest.client.proxy.hostname, rest.client.proxy.port)
{
  "type": "EXTERNAL",
  "name": "test-external-catalog",
  "properties": {
    "rest.client.proxy.hostname": "squid.egress-proxy.svc.cluster.local",
    "rest.client.proxy.port": "3128",
    "rest.client.connection-timeout-ms": "30000",
    "rest.client.socket-timeout-ms": "120000"
  },
  "connectionConfigInfo": {
    "connectionType": "ICEBERG_REST",
    "uri": "https://<external-catalog-host>/polaris/api/catalog",
    "remoteCatalogName": "<remote-catalog>",
    "authenticationParameters": {
      "authenticationType": "OAUTH",
      "tokenUri": "https://<external-catalog-host>/polaris/api/catalog/v1/oauth/tokens",
      "clientId": "<redacted>",
      "clientSecret": "<redacted>",
      "scopes": ["<scope>"]
    }
  }
}
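For readers unfamiliar with how such string properties might end up as client settings, here is a rough sketch that reads the four keys from the payload above and turns them into JDK types. It uses `java.net.ProxySelector` and `java.time.Duration` as stand-ins and is not how the Iceberg REST client is implemented; the class `ProxyPropertiesSketch` and its methods are hypothetical names introduced only for this example.

```java
import java.net.InetSocketAddress;
import java.net.ProxySelector;
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Illustrative only: a JDK-based stand-in, not the Iceberg REST client's own logic. */
final class ProxyPropertiesSketch {
  static final String PROXY_HOST = "rest.client.proxy.hostname";
  static final String PROXY_PORT = "rest.client.proxy.port";

  /** Builds a proxy selector when both hostname and port are present. */
  static Optional<ProxySelector> proxyFrom(Map<String, String> props) {
    String host = props.get(PROXY_HOST);
    String port = props.get(PROXY_PORT);
    if (host == null || port == null) {
      return Optional.empty(); // no proxy configured: caller keeps its default route
    }
    // createUnresolved avoids a DNS lookup at configuration time
    InetSocketAddress address = InetSocketAddress.createUnresolved(host, Integer.parseInt(port));
    return Optional.of(ProxySelector.of(address));
  }

  /** Parses a millisecond-valued property such as a timeout into a Duration. */
  static Optional<Duration> millis(Map<String, String> props, String key) {
    String value = props.get(key);
    return value == null ? Optional.empty() : Optional.of(Duration.ofMillis(Long.parseLong(value)));
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put(PROXY_HOST, "squid.egress-proxy.svc.cluster.local");
    props.put(PROXY_PORT, "3128");
    props.put("rest.client.connection-timeout-ms", "30000");
    props.put("rest.client.socket-timeout-ms", "120000");

    System.out.println(proxyFrom(props).isPresent());
    System.out.println(millis(props, "rest.client.connection-timeout-ms").orElse(Duration.ZERO));
  }
}
```

The point of the sketch is only that the values travel as plain strings in `properties` and must reach the HTTP client unchanged for the proxy and timeouts to take effect.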

Evidence (Squid access logs)

When calling the federation API (example: GET /api/catalog/v1/<catalog>/namespaces), Squid showed a successful HTTPS tunnel:

<timestamp> <duration> <polaris-pod-ip> TCP_TUNNEL/200 <bytes> CONNECT <external-catalog-host>:443 - HIER_DIRECT/<external-ip> -

This indicates traffic originated from the Polaris pod and went through the proxy via CONNECT to the external catalog host.

Before vs After

| Behavior | Before PR #3480 | After PR #3480 |
| --- | --- | --- |
| Proxy properties in external catalog config | Proxy settings were not propagated to the Iceberg REST HTTP client | Proxy settings are passed through and applied by the REST client |
| Squid access logs for external catalog traffic | No CONNECT <external-catalog-host>:443 entries observed | TCP_TUNNEL/200 CONNECT <external-catalog-host>:443 entries observed |
| Network path from Polaris to external catalog | Direct egress (proxy bypass) | Routed via the configured Squid forward proxy |
| Federation API calls (namespaces/tables) | Failed or could not complete in proxy-only egress environments | Succeeded; namespaces and tables retrieved from the remote catalog |

Federation API Results

  • List namespaces: returned namespaces from the remote catalog. ✅
  • List tables: returned tables from the remote catalog. ✅

All requests were observed going through Squid.

Note

  • This changes an external extension point (ExternalCatalogFactory). Per CONTRIBUTING guidance, I plan to discuss this on the dev mailing list before marking the PR ready for review.

Related to #3465

@yj-lee0503
Contributor Author

👋 Hi @adutra, I appreciate your thoughtful review and suggestions.

I made updates based on your feedback:

  • Removed the logger field from ExternalCatalogFactory and inlined logging in the default methods
  • Replaced the custom merge logic with org.apache.iceberg.rest.RESTUtil.merge() in the Iceberg REST, Hive, and Hadoop factories
  • Adjusted the tests to align with the RESTUtil.merge() behavior we rely on

If you have a moment, I would really appreciate another look. If there is anything else you would prefer to see structured differently, I am happy to revise.

Thanks again! 🙇‍♂️

@yj-lee0503 yj-lee0503 requested a review from adutra January 20, 2026 21:49
@yj-lee0503 yj-lee0503 marked this pull request as ready for review January 21, 2026 02:06
@yj-lee0503 yj-lee0503 requested a review from adutra January 22, 2026 02:44
@yj-lee0503 yj-lee0503 requested a review from adutra January 27, 2026 21:33
adutra
adutra previously approved these changes Jan 28, 2026
@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Jan 28, 2026
@dimas-b dimas-b requested a review from dennishuo January 31, 2026 02:17
Contributor

@dimas-b dimas-b left a comment

Thanks for your contribution, @yj-lee0503! This looks like a valuable change to me (with a comment).

Also, I wonder if @dennishuo could review too - thx!

if (catalogProperties != null && !catalogProperties.isEmpty()) {
  LoggerFactory.getLogger(getClass())
      .warn(
          "catalogProperties were provided but {} does not override createCatalog with "
Contributor

This is a reasonable approach to ensuring backward compatibility with older implementations of this interface.

However, Polaris core will always call this method (if I'm not mistaken), and if a custom implementation does not override it, the server will produce a lot of WARN messages at runtime.

At the same time, the author of a new implementation of this interface may be tempted to override only the old method (especially if helper tools are involved), which would be sub-optimal since the intention is obviously to override this method.

Our standing evolution guidelines make it clear that Java interfaces may change at any time. I would prefer to make a breaking change in this case and simply add the new parameter to the old method. Old implementations will be able to catch that at CI time, and the fix is rather trivial. The benefit is a simpler Polaris code base and easier maintenance of the OSS code (which has more exposure than private implementations).

Apologies if this was already considered and I missed it. This is just my personal opinion. Please consider it non-blocking.

Contributor Author

@yj-lee0503 yj-lee0503 Jan 31, 2026

Hi @dimas-b. Thank you for the thoughtful feedback (and the kind words). ✨ Also, I appreciate you sharing the evolution guidelines.

I’m leaning toward updating the PR to the breaking-change approach: adding the new parameter directly to the existing interface method and removing the default-method delegation + WARN, unless @dennishuo, @adutra, or others feel strongly that we should preserve compatibility for external implementations right now. I've prepared the changes and they are ready to be pushed. Please let me know!

Thanks for looping in @dennishuo as well. I look forward to hearing from you all! 🙇

Contributor Author

I pushed the changes that were discussed in this thread. I can revert the changes if needed. Please let me know. Thank you all again for your review.

Contributor

I am OK with the breaking change approach, FYI.

Contributor Author

Thank you again for your review, @adutra and @dimas-b. 🙇‍♂️

Contributor

@dimas-b dimas-b left a comment

The latest state of this PR LGTM 👍 Thanks, @yj-lee0503 !

Given that we're introducing breaking changes into SPI classes, it might be a good idea to also send an email about this PR to the dev ML for general awareness.

@yj-lee0503
Contributor Author

Thanks again for your review and guidance, @dimas-b . I sent an email to the dev mailing list. 😄 🙇

@dimas-b
Contributor

dimas-b commented Feb 2, 2026

Thanks, @yj-lee0503 ! Could you also post a link to the dev ML thread? I'm not able to find it 😅

@yj-lee0503
Contributor Author

Ah! I am sorry. I sent it via email. I don't think I have access to the Slack channel unless I am horribly mistaken. I thought it would require an *@apache.org email address, which I don't have as a first-time contributor.

@dimas-b
Contributor

dimas-b commented Feb 2, 2026

Email is quite fine. They should show up here: https://lists.apache.org/list.html?dev@polaris.apache.org ... but I do not see anything about this PR 🤔

You can send to dev AT polaris.apache.org from any email. You may want to subscribe with your personal email too.

@yj-lee0503
Contributor Author

yj-lee0503 commented Feb 3, 2026

I tried to send it twice using my personal email and once using my work email. I think my work email address worked. :) Here is a link.

@dimas-b
Contributor

dimas-b commented Feb 3, 2026

Thanks, @yj-lee0503 !

Let's give some grace time for people to be able to catch up with email. I think it would be reasonable to merge on Feb 5 if no objections are raised.

@yj-lee0503
Contributor Author

Sounds amazing! Thanks again for all of your guidance and support. 🙇‍♂️🙏😄 It was a fantastic learning opportunity for me.

@dimas-b
Contributor

dimas-b commented Feb 5, 2026

@yj-lee0503 : Could you rebase to fix CI checks, please?

Pass ExternalCatalog.properties through to federated catalog clients

fix double init and add backward compatibility

update changelog on ExternalCatalog properties

Replace custom merge logic with RESTUtil.merge() in all federated catalog factories (IcebergREST, Hive, Hadoop)

Inline logger in ExternalCatalogFactory interface

Simplify tests to verify RESTUtil.merge() behavior we depend on

Remove redundant RESTUtil.merge() tests

Resolve commit conflicts.

Simplify ExternalCatalogFactory to breaking change per review

Remove deprecated 2-param createCatalog/createGenericCatalog methods
and make the 3-param versions the only abstract methods. This follows
Polaris evolution guidelines that Java interfaces may change at any
time, and avoids runtime WARN noise for legacy implementations.

Move CHANGELOG entry from Changes to Breaking changes section.
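
The trade-off discussed in review (default-method delegation with a WARN versus a single abstract method) can be sketched as follows. This is a simplified illustration, not the actual `ExternalCatalogFactory` interface: the interface names `CompatFactory` and `StrictFactory` are hypothetical, the parameter lists are shortened to one and two parameters instead of the real two and three, the return type is a plain `String`, and `java.util.logging` stands in for the logger used in the PR.

```java
import java.util.Map;
import java.util.logging.Logger;

/** Illustrative only: hypothetical, simplified interfaces, not the Polaris SPI. */
final class FactoryEvolutionSketch {

  /** Option A: keep the old method and add a default overload that warns and drops properties. */
  interface CompatFactory {
    String createCatalog(String connectionConfig);

    default String createCatalog(String connectionConfig, Map<String, String> catalogProperties) {
      if (catalogProperties != null && !catalogProperties.isEmpty()) {
        // Emitted on every call for implementations that never override this overload
        Logger.getLogger(getClass().getName())
            .warning("catalogProperties ignored by " + getClass().getName());
      }
      return createCatalog(connectionConfig);
    }
  }

  /** Option B (breaking change): one abstract method, so old implementations fail to compile. */
  interface StrictFactory {
    String createCatalog(String connectionConfig, Map<String, String> catalogProperties);
  }
}
```

Under option A a legacy implementation keeps compiling but silently loses the properties at runtime; under option B the mismatch surfaces at build time, which is the behavior the final commit settled on.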
@yj-lee0503 yj-lee0503 force-pushed the ylee1212/proxy-config-addition branch from 56b3e8e to a294004 Compare February 5, 2026 20:31
@yj-lee0503
Contributor Author

Hello @dimas-b ! I rebased my branch. Thanks for your help. 🙇‍♂️

@dimas-b dimas-b enabled auto-merge (squash) February 5, 2026 21:26
@dimas-b dimas-b merged commit 216ba55 into apache:main Feb 5, 2026
15 checks passed
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Feb 5, 2026
@yj-lee0503
Contributor Author

@dimas-b and @adutra - thanks a ton for your feedback and patience with my first-time contribution. I am excited that these changes were merged. 😄 🚀 I hope you have a fabulous day/weekend.

Development

Successfully merging this pull request may close these issues.

Federation (ExternalCatalog ICEBERG_REST): rest.client.proxy.* properties not being applied

3 participants