
@davseitsev

This is a revival and extension of #2850; credit to @sshkvar.

This PR introduces a new catalog property, unique-table-location, which enables generating unique table locations for catalogs that support table rename operations. The feature is disabled by default to preserve current behavior.

When enabled, a unique suffix is added to the table path, so that each table gets its own dedicated storage location, including when tables are renamed. This addresses a key issue: after renaming a table and creating a new one with the original name, both tables would otherwise share the same location. Such overlap can lead to:

  • Data loss during the DeleteOrphanFilesSparkAction, which may inadvertently delete files belonging to other tables in the shared location.
  • Difficulties in analyzing table-specific storage costs, as storage usage cannot be cleanly attributed to individual tables.
  • Inability to apply path-based rules such as S3 Intelligent-Tiering, Lifecycle Rules, or fine-grained permissions, which depend on isolated storage paths.
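To make the rename scenario above concrete, here is a minimal, self-contained Java sketch. It is not the PR's implementation: the class and method names, the `warehouse/namespace/table` path layout, and the `"_" + UUID` suffix (borrowed from NessieCatalog's behavior) are assumptions chosen for illustration.

```java
import java.util.UUID;

// Hypothetical sketch: names and path layout are illustrative, not Iceberg's API.
public class UniqueTableLocationSketch {

  // Builds a default table location; appends "_" + random UUID when the flag is on
  // (the same suffix style NessieCatalog uses).
  public static String defaultTableLocation(
      String warehouse, String namespace, String tableName, boolean uniqueTableLocation) {
    String base = warehouse + "/" + namespace + "/" + tableName;
    return uniqueTableLocation ? base + "_" + UUID.randomUUID() : base;
  }

  // Simulates: create db.events, rename it to db.events_old, then create a new db.events.
  // Returns true if the renamed table and the recreated table share a location.
  public static boolean locationsCollideAfterRename(boolean uniqueTableLocation) {
    String warehouse = "s3://bucket/warehouse";
    String original = defaultTableLocation(warehouse, "db", "events", uniqueTableLocation);
    // A rename only changes the catalog entry; the renamed table keeps its original location.
    String renamedTableLocation = original;
    String recreated = defaultTableLocation(warehouse, "db", "events", uniqueTableLocation);
    return renamedTableLocation.equals(recreated);
  }

  public static void main(String[] args) {
    System.out.println("unique-table-location=false, collide: " + locationsCollideAfterRename(false));
    System.out.println("unique-table-location=true, collide: " + locationsCollideAfterRename(true));
  }
}
```

With the flag off, the recreated table lands exactly where the renamed table's files live, which is the overlap that makes orphan-file cleanup and path-based rules unsafe. With the flag on, each create call gets a fresh UUID suffix, so the locations diverge.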

NessieCatalog already supports it, but it's not configurable:

```java
return location + "_" + UUID.randomUUID();
```

A similar feature was added to Trino a while ago in trinodb/trino#6063; see also the related discussion in trinodb/trino#5632 (comment).

@davseitsev
Author

@RussellSpitzer @kbendick @rdblue could you take a look? What do you think?

@mrcnc
Contributor

mrcnc commented May 6, 2025

+1 to having a catalog property for unique table locations

@RussellSpitzer
Member

I think this makes a lot of sense but I'm not sure if this should be a client side decision. I'd like us to explore the idea of "owned locations" for tables and talk more about catalog responsibilities. I think as a nice "best effort" feature this is a good thing to do, but I really think the Catalog needs to own/manage where tables are allowed to be located.

@RussellSpitzer
Member

Brief notes from the catalog sync: the common sentiment was that this is good to get in, and REST catalogs can still do what they want (including ignoring client-generated unique paths). More reviewers should be coming as well.

@kongul

kongul commented May 19, 2025

Really looking forward to having this unique table location feature in Iceberg.
Business analysts at our company have gotten used to renaming tables a lot.

@davseitsev
Author

Is there anything I should add to the PR?

@davseitsev
Author

@nastra, I see you contributed a lot to catalogs, could you please review the PR?

@davseitsev
Author

Thank you for the review! Because I have some questions, I'll commit each change separately and squash them into a single commit once the PR is ready.

@davseitsev force-pushed the main branch 2 times, most recently from 3364136 to ea47451 on August 10, 2025 18:04
github-actions bot added the GCP label Aug 10, 2025
@davseitsev
Author

@nastra do you have more comments?
Also, I'm wondering whether it makes sense to prevent creating a table in an existing directory when the user specifies a custom location. Maybe we should validate this when unique-table-location=true.
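One possible shape for the validation asked about here is a prefix-overlap check against locations already used by other tables. This is a hypothetical sketch: the class and method names, and the assumption that the full set of existing table locations is available at create time, are illustrative and not part of the PR.

```java
import java.util.Collection;

// Hypothetical sketch of a custom-location check; not part of Iceberg or this PR.
public class CustomLocationCheckSketch {

  // Normalizes a location so "s3://b/t" and "s3://b/t/" compare equal,
  // and so "s3://b/t2" is not treated as nested under "s3://b/t".
  private static String withTrailingSlash(String location) {
    return location.endsWith("/") ? location : location + "/";
  }

  // Returns true if the candidate equals, is nested under, or contains any existing location.
  public static boolean overlapsExisting(String candidate, Collection<String> existingLocations) {
    String c = withTrailingSlash(candidate);
    for (String existing : existingLocations) {
      String e = withTrailingSlash(existing);
      if (c.startsWith(e) || e.startsWith(c)) {
        return true;
      }
    }
    return false;
  }

  // Rejects an overlapping custom location only when unique-table-location is enabled.
  public static void validateCustomLocation(
      String candidate, Collection<String> existingLocations, boolean uniqueTableLocation) {
    if (uniqueTableLocation && overlapsExisting(candidate, existingLocations)) {
      throw new IllegalArgumentException(
          "Location overlaps an existing table location: " + candidate);
    }
  }
}
```

The check is deliberately naive (plain string prefixes, no scheme or path normalization beyond a trailing slash); a real catalog would also need an efficient way to find nearby table locations rather than scanning all of them.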

@pvary
Copy link
Contributor

pvary commented Aug 12, 2025

This PR is quite big, I might be able to review it next week. Thanks, Peter

@davseitsev
Copy link
Author

Resolved conflicts with the latest master

@davseitsev
Copy link
Author

Merged latest master

@pvary
Copy link
Contributor

pvary commented Jan 12, 2026

Restarting the jobs

@pvary closed this Jan 12, 2026
@pvary reopened this Jan 12, 2026
@pvary
Copy link
Contributor

pvary commented Jan 12, 2026

@nastra: any more comments on this?

@nastra
Copy link
Contributor

nastra commented Jan 13, 2026

> @nastra: any more comments on this?

It's been a while, so let me do another full pass


```java
// Test-suite override: opts this catalog out of the new unique-table-location tests.
@Override
protected boolean supportsUniqueTableLocation() {
  return false;
}
```
Contributor

This might be the same issue I mentioned in TestRESTCatalog, where the flag just isn't passed through to the backing catalog.

Author

In the initial version of this PR, I configured RESTServerExtension to use a unique table path. After @pvary's review I removed it, because it affected all the tests, not only the new ones. Please see these comments:
#12892 (comment)
#12892 (comment)
#12892 (comment)

Author

@nastra, what do you suggest we do with the REST catalog tests? Can we keep them excluded in this PR, or should I re-enable them and address the issue @pvary mentioned so they don’t force all tests to use a unique location?

Contributor

So the original issue that was flagged, I believe, is that UNIQUE_TABLE_LOCATION was set to true by default for all tests, which is a behavioral change. We don't want to set this by default. Instead, we only want to set it in the tests that verify unique table location behavior, which you are already doing in the new tests you added to CatalogTests.
My expectation is actually that TestRESTCatalog would work with the new tests you added as well. I think the reason they don't currently work with TestRESTCatalog is that the flag isn't fully passed through to the backendCatalog used by the REST catalog.
Does that make sense? You'd need to make sure that the UNIQUE_TABLE_LOCATION flag is passed through to the underlying/backend catalog, which in the case of TestRESTCatalog is the InMemoryCatalog.
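The pass-through being described can be sketched in isolation. Everything here is hypothetical: `BackendCatalog` stands in for the InMemoryCatalog behind TestRESTCatalog, `backendFor` is an invented test helper, and the `initialize(name, properties)` shape only loosely mirrors how catalogs receive a property map. Only the `unique-table-location` property name comes from this PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of forwarding the flag to a backend catalog in a test setup.
public class FlagPassthroughSketch {
  public static final String UNIQUE_TABLE_LOCATION = "unique-table-location";

  // Stand-in for the backend catalog used by the REST test server.
  public static class BackendCatalog {
    private final Map<String, String> properties = new HashMap<>();

    public void initialize(String name, Map<String, String> props) {
      properties.putAll(props);
    }

    // Reads the flag; defaults to false, matching the PR's disabled-by-default behavior.
    public boolean uniqueTableLocation() {
      return Boolean.parseBoolean(properties.getOrDefault(UNIQUE_TABLE_LOCATION, "false"));
    }
  }

  // Copies the base properties and adds the flag only when a test asks for it.
  public static BackendCatalog backendFor(Map<String, String> baseProps, boolean enableUniqueLocation) {
    Map<String, String> props = new HashMap<>(baseProps);
    if (enableUniqueLocation) {
      props.put(UNIQUE_TABLE_LOCATION, "true");
    }
    BackendCatalog backend = new BackendCatalog();
    backend.initialize("backend", props);
    return backend;
  }
}
```

Adding the flag only when a test requests it keeps the default (disabled) behavior for every other test, which avoids the behavioral change that was flagged for the RESTServerExtension approach.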

Author

@nastra I enabled new tests for TestRESTCatalog and set UNIQUE_TABLE_LOCATION for backendCatalog, please take a look.

There are two other places where I can't set UNIQUE_TABLE_LOCATION only for new tests:

Author

I also removed the supportsUniqueTableLocation flag and disabled the new test in RESTCompatibilityKitCatalogTests.

@nastra requested a review from danielcweeks January 13, 2026 11:26
@davseitsev force-pushed the main branch 2 times, most recently from 262c9eb to b72b7c1 on January 29, 2026 20:28
@davseitsev
Author

I rebased on the latest master, resolved conflicts, addressed all review comments, removed the old Spark-version tests as @nastra suggested, and added tests for Spark 4.1. There are two open threads; if they don't require further changes, this PR is ready.

7 participants