Catalogs: Add support for unique table locations via catalog property #12892
@RussellSpitzer @kbendick @rdblue could you take a look? What do you think?
+1 to having a catalog property for unique table locations
I think this makes a lot of sense, but I'm not sure this should be a client-side decision. I'd like us to explore the idea of "owned locations" for tables and talk more about catalog responsibilities. As a nice "best effort" feature this is a good thing to do, but I really think the catalog needs to own and manage where tables are allowed to be located.
Brief notes from the Catalog sync: the common sentiment was that this is good to get in; REST catalogs can still do what they want (including ignoring client-generated unique paths). More reviewers should be coming as well.
Really looking forward to having this unique table locations feature in Iceberg.
Is there anything I should add to the PR?
@nastra, I see you've contributed a lot to catalogs; could you please review the PR?
aws/src/integration/java/org/apache/iceberg/aws/glue/TestGlueCatalogTable.java
core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java
core/src/test/java/org/apache/iceberg/jdbc/TestJdbcCatalog.java
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/SparkCatalogConfig.java
Thank you for the review! I'll commit all the changes separately and squash them into a single commit when it's ready, because I have some questions.
@nastra do you have more comments?
aws/src/test/java/org/apache/iceberg/aws/dynamodb/TestDynamoDbCatalog.java
aws/src/test/java/org/apache/iceberg/aws/dynamodb/TestDynamoDbCatalog.java
core/src/main/java/org/apache/iceberg/inmemory/InMemoryCatalog.java
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/SparkCatalogConfig.java
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/sql/TestUniqueLocation.java
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/sql/TestUniqueLocation.java
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/sql/TestUniqueLocation.java
This PR is quite big; I might be able to review it next week. Thanks, Peter
aws/src/integration/java/org/apache/iceberg/aws/glue/GlueTestBase.java
core/src/test/java/org/apache/iceberg/catalog/CatalogTests.java
aws/src/integration/java/org/apache/iceberg/aws/glue/TestGlueCatalogTable.java
Resolved conflicts with the latest master.
Merged the latest master.
Restarting the jobs.
@nastra: any more comments on this?
It's been a while, so let me do another full pass.
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java
```java
@Override
protected boolean supportsUniqueTableLocation() {
  return false;
}
```
This might be the same issue I mentioned in TestRESTCatalog, where the flag just isn't passed through to the backing catalog.
In the initial version of this PR, I configured RESTServerExtension to use unique table paths. But after @pvary's review I removed that, because it affected all tests, not only the new ones. Please see these comments:
- #12892 (comment)
- #12892 (comment)
- #12892 (comment)
So I believe the original issue that was flagged is that UNIQUE_TABLE_LOCATION was set to true by default for all tests, which is a behavioral change. We don't want to set this by default. Instead, we only want to set it in the specific tests that verify unique table location behavior, which you are already doing in the new tests you added to CatalogTests.
I'd actually expect the new tests you added to work with TestRESTCatalog as well. I think the reason they don't currently work there is that the flag isn't fully passed through to the backendCatalog used by the REST catalog.
Does that make sense? You'd need to make sure that the UNIQUE_TABLE_LOCATION flag is passed through to the underlying backend catalog, which for TestRESTCatalog is the InMemoryCatalog.
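To illustrate the pass-through point, here is a minimal stdlib-only sketch. FlagPassThrough, BackendCatalog, and backendFor are hypothetical stand-ins, not Iceberg's InMemoryCatalog or RESTServerExtension, and mapping UNIQUE_TABLE_LOCATION to the string "unique-table-location" is an assumption based on the property name in the PR description. It shows the intended test setup: the flag is off by default and only reaches the backend when a test explicitly passes it in.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: per-test properties must be forwarded to the catalog that assigns locations. */
public final class FlagPassThrough {
  // Assumed key, taken from the property name in the PR description.
  static final String UNIQUE_TABLE_LOCATION = "unique-table-location";

  /** Hypothetical stand-in for a backend catalog such as InMemoryCatalog (not the real class). */
  static final class BackendCatalog {
    private final Map<String, String> properties = new HashMap<>();

    void initialize(String name, Map<String, String> props) {
      properties.putAll(props);
    }

    boolean uniqueTableLocation() {
      // Off unless explicitly enabled, matching "disabled by default".
      return Boolean.parseBoolean(properties.getOrDefault(UNIQUE_TABLE_LOCATION, "false"));
    }
  }

  /** Builds the backend for one test; only tests that opt in get the flag. */
  static BackendCatalog backendFor(Map<String, String> testOverrides) {
    Map<String, String> props = new HashMap<>();
    props.put("warehouse", "/tmp/warehouse"); // hypothetical default property
    props.putAll(testOverrides);              // forward per-test properties, including the flag
    BackendCatalog backend = new BackendCatalog();
    backend.initialize("backend", props);
    return backend;
  }

  public static void main(String[] args) {
    // Existing tests: no override, so behavior is unchanged.
    System.out.println(backendFor(Map.of()).uniqueTableLocation());
    // New unique-location tests: opt in explicitly.
    System.out.println(backendFor(Map.of(UNIQUE_TABLE_LOCATION, "true")).uniqueTableLocation());
  }
}
```

Running main prints false and then true. If the REST front end dropped testOverrides instead of forwarding them, the second call would also report false, which is the failure mode described in this thread.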
@nastra I enabled the new tests for TestRESTCatalog and set UNIQUE_TABLE_LOCATION for backendCatalog; please take a look.
There are two other places where I can't set UNIQUE_TABLE_LOCATION for the new tests only:
- TestBaseWithCatalog.java#REST_SERVER_EXTENSION: RESTServerExtension is not configurable per test, so I can't run my Spark tests against the REST catalog.
- RESTCompatibilityKitCatalogTests.java#RESTServerExtension: the same issue.
I also removed the supportsUniqueTableLocation flag and disabled the new test in RESTCompatibilityKitCatalogTests.
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/sql/TestUniqueTableLocation.java
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/sql/TestUniqueTableLocation.java
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/SparkCatalogConfig.java
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/TestBaseWithCatalog.java
bigquery/src/test/java/org/apache/iceberg/gcp/bigquery/TestBigQueryCatalog.java
I rebased on the latest master, resolved conflicts, addressed all review comments, removed the tests for the old Spark versions as @nastra suggested, and added tests for Spark 4.1. There are two open threads; if they don't require further changes, this PR is ready.
This is a revival and extension of #2850; credit to @sshkvar.
This PR introduces a new catalog property, unique-table-location, which enables generating unique table locations for catalogs that support table rename operations. The feature is disabled by default to preserve current behavior.
When enabled, a unique suffix is added to the table path, ensuring that each table has its own dedicated storage location, including in scenarios involving table renames. This addresses a key issue: after renaming a table and creating a new one with the original name, both tables would otherwise share the same location. Such overlap can lead to:
- Problems with DeleteOrphanFilesSparkAction, which may inadvertently delete files belonging to other tables in the shared location.

NessieCatalog already supports this, but it's not configurable:
iceberg/nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java, line 258 in ee2ffb4
A similar feature was added to Trino a while ago (trinodb/trino#6063); see also the related discussion in trinodb/trino#5632 (comment).
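A minimal sketch of the rename scenario from the description, assuming a simple warehouse/namespace/table layout and a "-" plus UUID suffix. TableLocations and defaultLocation are hypothetical names; this is not the code from this PR or from NessieCatalog, and the real suffix format may differ. As the description implies, a rename does not move the old table's directory, so the sketch models two successive creates under the same name.

```java
import java.util.UUID;

/** Hypothetical sketch of default-location generation with an optional unique suffix. */
public final class TableLocations {
  private TableLocations() {}

  /**
   * Returns a default location for a table.
   *
   * @param warehouse warehouse root without a trailing slash (assumed layout)
   * @param namespace namespace name, e.g. "db"
   * @param tableName table name, e.g. "events"
   * @param uniqueLocation when true, append a random UUID so a new table that reuses a name
   *     never lands in the directory of a previously renamed table
   */
  public static String defaultLocation(
      String warehouse, String namespace, String tableName, boolean uniqueLocation) {
    String base = warehouse + "/" + namespace + "/" + tableName;
    if (!uniqueLocation) {
      return base;
    }
    // 32 hex characters once the dashes are removed; the "-" separator is an arbitrary choice.
    String suffix = UUID.randomUUID().toString().replace("-", "");
    return base + "-" + suffix;
  }

  public static void main(String[] args) {
    String warehouse = "s3://bucket/warehouse";

    // Flag off: create "events", rename it (its location stays), create a new "events".
    String oldTable = defaultLocation(warehouse, "db", "events", false);
    String newTable = defaultLocation(warehouse, "db", "events", false);
    System.out.println("flag off, shared location: " + oldTable.equals(newTable));

    // Flag on: each create gets its own directory even when the name is reused.
    String oldUnique = defaultLocation(warehouse, "db", "events", true);
    String newUnique = defaultLocation(warehouse, "db", "events", true);
    System.out.println("flag on, shared location: " + oldUnique.equals(newUnique));
  }
}
```

With the flag off, main reports a shared location; with it on, the two creates get different directories (barring a UUID collision), so an orphan-file cleanup scoped to one table's location cannot reach the other table's files.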