-
Notifications
You must be signed in to change notification settings - Fork 3k
SPEC: Add referenced-by in loadTable API #13810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
fe3b4cb to
c66b870
Compare
ac2ecee to
afdb92b
Compare
|
devlist thread - https://lists.apache.org/thread/01gb9rygdd1gqks7lnl1o6440qocnh9m |
094b0f0 to
89e088b
Compare
| type: string | ||
| enum: [ all, refs ] | ||
| - in: query | ||
| name: referenced-by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we call it something like referenced-by-view or from-view since it can only be referenced by a view here? I understand that the description already explicitly says that it's a view, but people can still misuse it given a generic name like referenced by.
flyrain
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @singhpk234 !
gaborkaszab
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment about the expected behavior when the referenced-by view doesn't exist.
89e088b to
d62e21c
Compare
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
|
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
gaborkaszab
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine, just left some nits or questions
|
Thanks for your great work! Although we should trust the clients, I still have some concern about the security risk because it relies on client-provided information without server-side validation. The server cannot verify if the referenced view actually exists or legitimately references the table being loaded, potentially allowing permission bypasses. For example: Jerry has the privilege to read the view We would better add more constraints in the |
|
@qqqttt123 thank you for the feedback ! We covered this discussion in here as well as in the design doc here as proposal mentions its first step towards enable, there will subsequent work required in the catalog end to complete E2E Please let us know if it helps answer your concern ! |
Yes, it helps a lot. Thanks. We shouldn't involve the authorization concepts for the spec. I found that Trino add |
- Rename parameter from 'referenced-list' to 'referenced-by' - Update namespace separator docs to align with PR apache#14448 - Clarify parsing rules for dot separator between namespace and view name - Document comma separator for multiple view identifiers - Add example showing multiple comma-separated views - Specify URL encoding for commas in view names (%2C) Format: ?referenced-by=namespace.view1,namespace.view2 where namespace parts use configurable separator (default %1F) Separators used: - %1F (or configured): between namespace parts - . (dot): between namespace and view name - , (comma): between multiple view identifiers
4e98998 to
430b5bd
Compare
| name: referenced-by | ||
| description: | ||
| A comma-separated list of fully qualified view names (namespace and view name) representing the view | ||
| reference chain when a table is loaded via a view. The list should be ordered with the outermost view |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need the reference chain and not the last view to reference the table? Is there some case that we can't handle with just the direct reference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its for the cases when the direct referencing view is not a definer view but some view in the chain might be a DEFINER view ? Hence we decided to send the whole reference chain instead of just the directly referencing view.
we discussed this in the catalog sync as well, Notes : Link
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I was there, but I don't remember there being a specific conclusion to that question. I think we need to be clear about exactly what this is for before moving forward so that we know that this requires the full reference list.
If I understand correctly, the idea is that you have a nested view that is run as INVOKER within a view run as DEFINER. If the inner view were run as DEFINER, we would not need any references from outside of that view because it is self-contained, so it must be that we have an INVOKER within a DEFINER. Then the question is how to handle the inner INVOKER: is the definer of the outer view considered the invoker of the inner view? If so, we need to know about the outer view.
This may seem like an obvious choice, but always using the access privileges from the original query is how PostgreSQL works:
If the view has the security_invoker property set to true, access to the underlying base relations is determined by the permissions of the user executing the query, rather than the view owner. Thus, the user of a security invoker view must have the relevant permissions on the view and its underlying base relations.
If any of the underlying base relations is a security invoker view, it will be treated as if it had been accessed directly from the original query. Thus, a security invoker view will always check its underlying base relations using the permissions of the current user, even if it is accessed from a view without the security_invoker property.
If we used this interpretation, that the INVOKER is not context-dependent, then we would have a simpler spec and behavior (DEFINER=definer's permissions, INVOKER=query permissions), but at the cost of removing flexibility for catalog implementers.
I'm trying to think whether this is a situation where the simpler option would be too restrictive. I think the case where you would want to leave the inner view as INVOKER is when you want both INVOKER and DEFINER behavior when accessed directly (INVOKER) and indirectly (DEFINER). That seems like a weird pattern with easy work-arounds to me: if you want to delegate when accessing indirectly, you can embed the INVOKER view SQL in the DEFINER view.
I think I'd vote for the simpler option, but I'm interested in a bit more discussion on this before we finalize it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any prior art for engines/frameworks/systems that allow you override a downstream invoker?
This scenario to be concrete
Definer => Invoker => Table
Definer View made by User A
Table is Accesible to User A
User B queries definer view
Does User B see rows from the table?
From Ryan's quote this doesn't seem like it would be allowed in postgres
I was checking Trino and it doesn't look like it supports this either.
In the INVOKER security mode, tables referenced in the view are accessed using the permissions of the user executing the query (the invoker of the view). A view created in this mode is simply a stored query.
Snowflake doesn't support Invoker views, so it's always Definer
Not that I want to forbid a catalog being able to do this, but I think it would be helpful to know if anyone actually plans on allowing this pattern? Feels like it would be a security hole?
Are there any other use cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like it would be a security hole?
I'm not sure I follow the case where this could be a security hole. Any time you get the permissions of a DEFINER, you must have access to the DEFINER view. Wouldn't it be strange if the catalog's intent was to nest an INVOKER view inside a DEFINER view in order to protect data referenced by the INVOKER? And I don't think it's a hole if that's the case because the catalog is what gets to decide (at least with the referenced-by chain) what the behavior is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I mean more like a practice I would not encourage.
I feel like if you have an invoker, that is a signal that the data behind the invoker is very sensitive and only certain users should have it. If you add a definer on top of the invoker view it's like you are bypassing that. Better to build a definer view in the first place
About the change
This change proposes referenced-by in loadTable request, which is expected to contain the FQN of the view (only 2 part identifier), which a rest catalog would expect from the client based on that it knows the table is being loaded in the context of view (view referencing the table) so that catalog can an action accordingly.
This would be really helpful in following cases :
Supporting the security for views, i.e definer / invoker mode :
Definermode means the access to table should be authorized against the principal which created the view, this will be a very normal case where one would want to create a view and grant access to view but not to the underlying table.Invokermode means the access to table should be authorized against the principal which which is calling the loadTable essentially what happens in default.when a view is defined in a definer mode security,
referenced-bywould help give catalog proper signal that this loadTable is happening in the context of view (i.e view is referencing the table) so the catalog can reverse lookup who the creator was and what the security mode is defined, and take proper authZ action.Reference Implementation
TODO: send a dev list thread.