-
Notifications
You must be signed in to change notification settings - Fork 3k
[REST | SPARK]: Reference implementation of referenced-by in the loadTable call #13979
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
gaborkaszab
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @singhpk234 !
I took a look for my own benefit, and left some observations and questions meanwhile. Let me know what you think!
api/src/main/java/org/apache/iceberg/catalog/ContextAwareTableCatalog.java
Outdated
Show resolved
Hide resolved
| * @throws NoSuchTableException if the table does not exist | ||
| */ | ||
| Table loadTableViaView(TableIdentifier identifier, Map<String, Object> context) | ||
| throws NoSuchTableException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the referencing view doesn't exist? Shouldn't we check for the referencing view and throw NoSuchViewException, or should we silently omit if the view doesn't exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the concern that its a public API, that we should validate this ? My intention was to treat this as a runtime property, I renamed this API to loadTableWithContext, please let me know if this helps addressing this concern.
| * @return the loaded table | ||
| * @throws NoSuchTableException if the table does not exist | ||
| */ | ||
| Table loadTableViaView(TableIdentifier identifier, Map<String, Object> context) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we expect more context params to this function? Can't we have a simple String parameter to hold the name of the view? Or Maybe a TableIdentifier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we expect more context params to this function
That was the intention, to leave door open to pass more runtime context to the catalog, i was trying to model this as something like we already do i.e SessionContext this is a more runtime context so that engines can plumb their context back to the catalog and then catalog client can use this to make subsequent calls to the server. Hence didn't just have view identifier in the signature.
|
|
||
| @Override | ||
| public Table loadTableViaView(TableIdentifier identifier, Map<String, Object> viewContext) { | ||
| if (delegate instanceof ContextAwareTableCatalog) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delegate is a BaseSessionCatalog.AsCatalog that now implements ContextAwareTableCatalog. Is there a use-case or configuration where delegate doesn't implement the new interface?
I see the concept in this class is rather to have a new member of type ContextAwareTableCatalog that is converted from delegate and this new function could directly call that. Similarly as nsCatalog.
Would this also work here? Do I miss something?
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java
Outdated
Show resolved
Hide resolved
api/src/main/java/org/apache/iceberg/catalog/ContextAwareTableCatalog.java
Outdated
Show resolved
Hide resolved
| } | ||
|
|
||
| @Override | ||
| public Table loadTableViaView(Identifier ident, String version, Map<String, Object> context) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as in core/: Now a loadTableViaView function performs the load table even not via view, that seems a bit weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed it to loadTableWithContext where context would be a runtime context, let me know does is sounds better.
api/src/main/java/org/apache/iceberg/catalog/ContextAwareTableCatalog.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java
Outdated
Show resolved
Hide resolved
| String[] namespaceParts = new String[parts.length - 1]; | ||
| System.arraycopy(parts, 0, namespaceParts, 0, parts.length - 1); | ||
| String viewName = parts[parts.length - 1]; | ||
| queryParams.put( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for my understanding:
Let's take the following input: "ns1.ns2.view1"
We convert this to "ns1%1Fns2%1F.view1"
Is this because we wan't to tell the REST catalog which part is the namespace and which is the view name? If we include a dot in the query params, can't we just send the view path as it is with dots and let the catalog server figure out the namespace/view name part that we do here? Or can we also use the same special char between the namespace and the view name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one way to look at this is anything from the last . to the end is the view name and the every thing before it is the namespace, the reason %1F was chosen the separator was to use something similar, the reason i choose the %1F is the language of the existing rest spec for the nested namespaces.
I think we are yet to conclude on the separator discussion.
97296ab to
bcec3f2
Compare
api/src/main/java/org/apache/iceberg/catalog/ContextAwareTableCatalog.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java
Outdated
Show resolved
Hide resolved
...ain/scala/org/apache/spark/sql/catalyst/plans/logical/views/UnResolvedRelationFromView.scala
Outdated
Show resolved
Hide resolved
a34c0f2 to
fb6e824
Compare
20d30b3 to
737fc50
Compare
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
3271d52 to
f368355
Compare
f368355 to
57e3e30
Compare
gaborkaszab
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went through the api/ and core/ part for now (haven't checked Spark). Left some questions and nits. I also found some of my previous comments so I'm not sure this was ready for review.
Anyway, some test coverage in TestRESTCatalog would be great!
| * <p>Common context keys: | ||
| * | ||
| * <ul> | ||
| * <li>{@link #VIEW_IDENTIFIER_KEY}: A List<TableIdentifier> of view(s) referencing this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to escape the lt and gt chars?
"List<TableIdentifier>"
|
|
||
| default Table loadTableWithContext( | ||
| SessionContext sessionContext, TableIdentifier ident, Map<String, Object> context) { | ||
| return loadTable(sessionContext, ident); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered throwing UnsupportedOperationException instead of silently ignore the context param?
|
|
||
| @Override | ||
| public Table loadTable(TableIdentifier identifier, Map<String, Object> loadingContext) { | ||
| return BaseSessionCatalog.this.loadTableWithContext(context, identifier, loadingContext); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe consider renaming context to sessionContext to avoid confusion with the new type of context?
| .get( | ||
| paths.table(identifier), | ||
| snapshotModeToParam(mode), | ||
| referencedByToQueryParam(snapshotModeToParam(mode), viewContext), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: here at the calcite the reader could have the expression that the order of referencedByToQueryParam and snapshotModeToParam is interchangeable, but in practice it isn't. Maybe having a separate function to paramsForLoadTable that produces both?
| Map<String, String> queryParams = Maps.newHashMap(params); | ||
| Object viewIdentifierObj = context.get(ContextAwareTableCatalog.VIEW_IDENTIFIER_KEY); | ||
|
|
||
| if (!(viewIdentifierObj instanceof List)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe Preconditions.checkArgument(viewIdentifierObj instanceof List, error msg)
| } | ||
|
|
||
| @SuppressWarnings("unchecked") | ||
| List<TableIdentifier> viewIdentifiers = (List<TableIdentifier>) viewIdentifierObj; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we include a check into the above that not just check for List type but for List? Wouldn't this result in a ClassCastException or something similar if we input List that contains anything other than TableIdentifier?
| } | ||
|
|
||
| // Format per REST API spec: | ||
| // - Namespace parts joined by namespace separator (0x1F, URL-encoded as %1F) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
separator is configurable now, the comment now is invalid
| // - Commas in view names must be percent-encoded as %2C | ||
| List<String> formattedViews = Lists.newArrayListWithCapacity(viewIdentifiers.size()); | ||
| for (TableIdentifier viewId : viewIdentifiers) { | ||
| // Encode namespace using RESTUtil which handles the namespace separator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I don't think these comments in the for loop are necessary. The code seems pretty straightforward and the longer comment above the loop also helps
|
|
||
| @Override | ||
| public Table loadTable(SessionContext context, TableIdentifier identifier) { | ||
| public Table loadTableWithContext( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this had been discussed already, but can't this be simply loadTable? The params will tell us anyway that this is with context.
About the change
This provides a reference implementation for passing the view name that table is referenced in as part of which the loadTable call is being made.
Details on the spec change proposal and motivation here