Skip to content

Conversation

@singhpk234
Copy link
Contributor

@singhpk234 singhpk234 commented Aug 14, 2025

About the change

This change proposes referenced-by in loadTable request, which is expected to contain the FQN of the view (only 2 part identifier), which a rest catalog would expect from the client based on that it knows the table is being loaded in the context of view (view referencing the table) so that catalog can an action accordingly.

This would be really helpful in following cases :

Supporting the security for views, i.e definer / invoker mode :
Definer mode means the access to table should be authorized against the principal which created the view, this will be a very normal case where one would want to create a view and grant access to view but not to the underlying table.

Invoker mode means the access to table should be authorized against the principal which which is calling the loadTable essentially what happens in default.

when a view is defined in a definer mode security, referenced-by would help give catalog proper signal that this loadTable is happening in the context of view (i.e view is referencing the table) so the catalog can reverse lookup who the creator was and what the security mode is defined, and take proper authZ action.

Reference Implementation

TODO: send a dev list thread.

@singhpk234 singhpk234 force-pushed the feature/loaded-via-view branch from fe3b4cb to c66b870 Compare August 14, 2025 01:43
@singhpk234 singhpk234 force-pushed the feature/loaded-via-view branch 5 times, most recently from ac2ecee to afdb92b Compare August 17, 2025 23:25
@singhpk234 singhpk234 changed the title SPEC: Add loaded-via in loadTable API SPEC: Add referenced-by in loadTable API Aug 21, 2025
@singhpk234
Copy link
Contributor Author

singhpk234 commented Aug 21, 2025

@singhpk234 singhpk234 added the Specification Issues that may introduce spec changes. label Aug 28, 2025
@github-actions github-actions bot added the API label Sep 2, 2025
@singhpk234 singhpk234 force-pushed the feature/loaded-via-view branch 4 times, most recently from 094b0f0 to 89e088b Compare September 3, 2025 00:59
@github-actions github-actions bot removed the API label Sep 3, 2025
@singhpk234 singhpk234 marked this pull request as ready for review September 5, 2025 01:04
type: string
enum: [ all, refs ]
- in: query
name: referenced-by
Copy link
Contributor

@flyrain flyrain Sep 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call it something like referenced-by-view or from-view since it can only be referenced by a view here? I understand that the description already explicitly says that it's a view, but people can still misuse it given a generic name like referenced by.

Copy link
Contributor

@flyrain flyrain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @singhpk234 !

Copy link
Collaborator

@gaborkaszab gaborkaszab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment about the expected behavior when the referenced-by view doesn't exist.

@singhpk234 singhpk234 force-pushed the feature/loaded-via-view branch from 89e088b to d62e21c Compare September 17, 2025 17:47
@github-actions
Copy link

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Oct 24, 2025
@github-actions
Copy link

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

Copy link
Collaborator

@gaborkaszab gaborkaszab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine, just left some nits or questions

@qqqttt123
Copy link

qqqttt123 commented Jan 29, 2026

Thanks for your great work! Although we should trust the clients, I still have some concern about the security risk because it relies on client-provided information without server-side validation. The server cannot verify if the referenced view actually exists or legitimately references the table being loaded, potentially allowing permission bypasses.

For example:
Tom creates a view A using SQL, has the privilege to read table B and table C.

CREATE VIEW A 
SQL SECURITY DEFINER
AS SELECT * FROM B WHERE department = 'Engineering';

Jerry has the privilege to read the view A, if he mocks a request load table C referenced by view A, it will produce a security risk.

We would better add more constraints in the createVIew request. We should add required table identifiers in the request. So we can validate the table identifiers in the server side when loading tables referenced by the view.

@singhpk234
Copy link
Contributor Author

singhpk234 commented Jan 29, 2026

@qqqttt123 thank you for the feedback !
I believe in order for the catalog to authorize the access to the table based on the clients input, its requires a notion of trust between catalog and the client, we neither define AuthZ in IRC nor we define notion of Trusted Engine in IRC, hence how the catalog wanna authorize the access to the table based on reference-by is not intentionally described here.

We covered this discussion in here as well as in the design doc here as proposal mentions its first step towards enable, there will subsequent work required in the catalog end to complete E2E

Please let us know if it helps answer your concern !

@qqqttt123
Copy link

@qqqttt123 thank you for the feedback ! I believe in order for the catalog to authorize the access to the table based on the clients input, its requires a notion of trust between catalog and the client, we neither define AuthZ in IRC nor we define notion of Trusted Engine in IRC, hence how the catalog wanna authorize the access to the table based on reference-by is not intentionally described here.

We covered this discussion in here as well as in the design doc here as proposal mentions its first step towards enable, there will subsequent work required in the catalog end to complete E2E

Please let us know if it helps answer your concern !

Yes, it helps a lot. Thanks. We shouldn't involve the authorization concepts for the spec.

I found that Trino add ICEBERG_VIEW_RUN_AS_OWNER as the property of the view. Maybe we should add table identifiers of the view to the properties, too. It will benefits authorization of the Iceberg REST catalog.

- Rename parameter from 'referenced-list' to 'referenced-by'
- Update namespace separator docs to align with PR apache#14448
- Clarify parsing rules for dot separator between namespace and view name
- Document comma separator for multiple view identifiers
- Add example showing multiple comma-separated views
- Specify URL encoding for commas in view names (%2C)

Format: ?referenced-by=namespace.view1,namespace.view2
where namespace parts use configurable separator (default %1F)

Separators used:
- %1F (or configured): between namespace parts
- . (dot): between namespace and view name
- , (comma): between multiple view identifiers
name: referenced-by
description:
A comma-separated list of fully qualified view names (namespace and view name) representing the view
reference chain when a table is loaded via a view. The list should be ordered with the outermost view
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the reference chain and not the last view to reference the table? Is there some case that we can't handle with just the direct reference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its for the cases when the direct referencing view is not a definer view but some view in the chain might be a DEFINER view ? Hence we decided to send the whole reference chain instead of just the directly referencing view.

we discussed this in the catalog sync as well, Notes : Link

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I was there, but I don't remember there being a specific conclusion to that question. I think we need to be clear about exactly what this is for before moving forward so that we know that this requires the full reference list.

If I understand correctly, the idea is that you have a nested view that is run as INVOKER within a view run as DEFINER. If the inner view were run as DEFINER, we would not need any references from outside of that view because it is self-contained, so it must be that we have an INVOKER within a DEFINER. Then the question is how to handle the inner INVOKER: is the definer of the outer view considered the invoker of the inner view? If so, we need to know about the outer view.

This may seem like an obvious choice, but always using the access privileges from the original query is how PostgreSQL works:

If the view has the security_invoker property set to true, access to the underlying base relations is determined by the permissions of the user executing the query, rather than the view owner. Thus, the user of a security invoker view must have the relevant permissions on the view and its underlying base relations.

If any of the underlying base relations is a security invoker view, it will be treated as if it had been accessed directly from the original query. Thus, a security invoker view will always check its underlying base relations using the permissions of the current user, even if it is accessed from a view without the security_invoker property.

If we used this interpretation, that the INVOKER is not context-dependent, then we would have a simpler spec and behavior (DEFINER=definer's permissions, INVOKER=query permissions), but at the cost of removing flexibility for catalog implementers.

I'm trying to think whether this is a situation where the simpler option would be too restrictive. I think the case where you would want to leave the inner view as INVOKER is when you want both INVOKER and DEFINER behavior when accessed directly (INVOKER) and indirectly (DEFINER). That seems like a weird pattern with easy work-arounds to me: if you want to delegate when accessing indirectly, you can embed the INVOKER view SQL in the DEFINER view.

I think I'd vote for the simpler option, but I'm interested in a bit more discussion on this before we finalize it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any prior art for engines/frameworks/systems that allow you override a downstream invoker?

This scenario to be concrete

Definer => Invoker => Table 

Definer View made by User A
Table is Accesible to User A
User B queries definer view

Does User B see rows from the table?


From Ryan's quote this doesn't seem like it would be allowed in postgres

I was checking Trino and it doesn't look like it supports this either.

In the INVOKER security mode, tables referenced in the view are accessed using the permissions of the user executing the query (the invoker of the view). A view created in this mode is simply a stored query.

Snowflake doesn't support Invoker views, so it's always Definer


Not that I want to forbid a catalog being able to do this, but I think it would be helpful to know if anyone actually plans on allowing this pattern? Feels like it would be a security hole?

Are there any other use cases?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like it would be a security hole?

I'm not sure I follow the case where this could be a security hole. Any time you get the permissions of a DEFINER, you must have access to the DEFINER view. Wouldn't it be strange if the catalog's intent was to nest an INVOKER view inside a DEFINER view in order to protect data referenced by the INVOKER? And I don't think it's a hole if that's the case because the catalog is what gets to decide (at least with the referenced-by chain) what the behavior is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I mean more like a practice I would not encourage.

I feel like if you have an invoker, that is a signal that the data behind the invoker is very sensitive and only certain users should have it. If you add a definer on top of the invoker view it's like you are bypassing that. Better to build a definer view in the first place

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core not-stale OPENAPI spark Specification Issues that may introduce spec changes.

Projects

None yet

Development

Successfully merging this pull request may close these issues.