
Questions around Iceberg-rust #450

@ChristianCasazza

Hello, I have some questions about Iceberg-rust regarding data interactions with S3, authentication (authn), and authorization (authz).

  1. How does connecting an Iceberg catalog to a specific S3 bucket work? I understand the structure on S3, where a table is divided into Parquet data files and Avro metadata files, but I am not sure how this file organization relates to a deployed catalog, or how exactly to configure that relationship.

  2. Where does PyIceberg fit in with Iceberg-rust? Would it be possible to deploy Iceberg-rust on the server side and interact with the REST catalog through PyIceberg? I like Python as an interface for data consumers to interact with a catalog, and for basic management of tables.

  3. What are the table write options with Iceberg-rust? As of now, is writing only possible with a distributed engine like Spark or Trino? What would be the bottlenecks to writes from DuckDB, Polars, or Ibis plus a backend? The vast majority of my datasets are currently smaller than 50 GB, and most workloads touch a fraction of that. I would like to use Iceberg for its superior data management compared with plain files, but initially for use cases that can mostly run on a single node and don't really need the power of distributed engines.

  4. How do authentication and authorization work with the current Iceberg-rust? The access control system I described above works for AWS S3 and sharing files. Any pointers on where I could learn to integrate IAM permissions into a catalog and its tables? It seems the creators of https://github.com/hansetag/iceberg-catalog are in the middle of implementing some of these exact features. I would love to contribute to these features and implement them for my use case. It seems to work by vending non-AWS credentials to consumers while the catalog uses AWS credentials to sign S3 requests on the users' behalf, but I am not sure. I am also not sure how this implementation compares with the open-source implementation released by Databricks.

  5. Where exactly does OpenDAL fit into the Iceberg-rust catalog? Would OpenDAL help standardize accessing data from the catalog? The custom user metadata feature (opendal#4842, "Tracking issues of user metadata support") could also be useful for connecting tables to different authz commands.
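
For context on question 1, here is a minimal sketch of the relationship being asked about, assuming the usual Iceberg design in which the catalog only stores a pointer from a table identifier to that table's current metadata file, and everything else lives under the table location in the bucket. `ToyCatalog`, the bucket name, the `<warehouse>/<namespace>/<table>` layout, and the `vN.metadata.json` naming are hypothetical choices for illustration. They are not the Iceberg-rust API, and real catalogs may name and place files differently.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCatalog:
    """Hypothetical in-memory stand-in for an Iceberg catalog (not a real API)."""

    # Root location in the bucket; the bucket name is an assumption.
    warehouse: str
    # Table identifier -> location of the table's current metadata file.
    pointers: dict = field(default_factory=dict)

    def table_location(self, namespace: str, table: str) -> str:
        # Assumed layout: <warehouse>/<namespace>/<table>
        return f"{self.warehouse}/{namespace}/{table}"

    def commit(self, namespace: str, table: str, version: int) -> str:
        # A commit writes a new metadata file, then swaps the catalog pointer.
        # Data files (Parquet) would sit under <location>/data/ and are
        # reached only through the metadata, never through the catalog itself.
        location = self.table_location(namespace, table)
        metadata_file = f"{location}/metadata/v{version}.metadata.json"
        self.pointers[f"{namespace}.{table}"] = metadata_file
        return metadata_file

    def load_table(self, identifier: str) -> str:
        # Readers ask the catalog only for the current metadata location.
        return self.pointers[identifier]


catalog = ToyCatalog(warehouse="s3://my-bucket/warehouse")
catalog.commit("analytics", "events", 1)
catalog.commit("analytics", "events", 2)
print(catalog.load_table("analytics.events"))
# prints s3://my-bucket/warehouse/analytics/events/metadata/v2.metadata.json
```

In this picture, configuring a catalog for a bucket amounts to choosing the warehouse root and giving the catalog (or its clients) credentials to read and write under it. Whether Iceberg-rust catalogs follow this exactly is part of the question.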

Labels: question (Further information is requested)
