Description
Hello, I had some questions around Iceberg-rust regarding data interactions with S3, authn, and authz.
- How does connecting an Iceberg catalog to a specific S3 bucket work? I understand how a table is laid out on S3 as Parquet data files and Avro metadata files, but I am not sure how this file organization relates to a deployed catalog, or how exactly to configure that relationship.
- Where does PyIceberg fit in with Iceberg-rust? Would it be possible to deploy Iceberg-rust on the server side and interact with the REST catalog through PyIceberg? I like Python as an interface for data consumers to interact with a catalog, and for basic table management.
- What are the options for writing tables with Iceberg-rust? As of now, is it only possible with a distributed engine like Spark or Trino? What would be the bottlenecks to writes from DuckDB, Polars, or Ibis plus a backend? The vast majority of my datasets are currently under 50 GB, and most workloads touch a fraction of that. I would like to use Iceberg for its superior data management compared with plain files, but initially for use cases that can mostly be handled on a single node and don't really need the power of a distributed engine.
- How do authentication and authorization work in the current Iceberg-rust? The access control system I described above works for AWS S3 and sharing files. Are there any pointers on where I could learn to integrate IAM permissions into a catalog and its tables? It seems the creators of https://github.com/hansetag/iceberg-catalog are in the middle of implementing some of these exact features. I would love to contribute to these features and implement them for my use case. As far as I can tell, non-AWS credentials are vended to consumers, and the catalog uses AWS credentials to sign S3 requests on the users' behalf, but I am not sure. I am also not sure how this implementation compares with the open-source implementation released by Databricks.
- Where exactly does OpenDAL fit into the Iceberg-rust catalog? Would OpenDAL help standardize access to data from the catalog? The custom metadata feature (Tracking issues of user metadata support, opendal#4842) could also be useful for connecting tables to different authz commands.