Migrate to REST Catalog and MinIO for Execution Results #4126

@mengw15

Feature Summary

Background

Currently, computing units (CUs) write execution results and execution records directly to PostgreSQL / Iceberg, and execution result files are stored inside the computing units themselves. This leads to two issues:

  • Result persistence logic lives inside the computing units.
  • Computing units need direct DB access and JDBC configuration.

To improve isolation, we want to move execution-result handling out of the computing units and into dedicated services.

Proposed Solution

Architecture Overview

Migrate computing units to use existing services instead of direct database access:

  1. REST Catalog Service (e.g., tabulario/iceberg-rest) for Iceberg catalog operations

    • CUs use a REST Catalog client instead of a direct JDBC connection.
    • Database credentials are known only to the REST Catalog Service.
  2. MinIO (S3-compatible object storage) for storing result files

    • CUs write Parquet files directly to MinIO using the S3 API.
    • Storage is global and shared, accessible from all CUs.

Components

1. REST Catalog Service Integration

  • Deploy existing REST Catalog Service
  • CUs use RESTCatalog client
  • Database credentials for Iceberg catalog are only known to REST Catalog Service, not to CUs.
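
As a rough illustration of the points above, the sketch below shows what the CU-side catalog configuration could look like with PyIceberg. The host name `rest-catalog`, port `8181`, catalog name `results`, and both function names are assumptions made for this example, not part of the proposal. The point is that the CU holds only a REST endpoint and a warehouse location, with no JDBC URL or database credentials.

```python
def rest_catalog_properties(catalog_uri: str, warehouse: str) -> dict[str, str]:
    """Build client-side properties for an Iceberg REST catalog.

    Only the REST endpoint and the warehouse location are passed. No JDBC URL,
    database user, or database password appears on the CU side.
    """
    return {
        "type": "rest",          # PyIceberg catalog type selector
        "uri": catalog_uri,      # REST Catalog Service endpoint
        "warehouse": warehouse,  # warehouse location or identifier
    }


def open_catalog(props: dict[str, str]):
    """Open the catalog with PyIceberg (needs `pip install pyiceberg`)."""
    # Imported lazily so the property builder works without PyIceberg installed.
    from pyiceberg.catalog import load_catalog

    # "results" is a hypothetical local name for the catalog.
    return load_catalog("results", **props)


if __name__ == "__main__":
    # Hypothetical endpoint and bucket, for illustration only.
    print(rest_catalog_properties("http://rest-catalog:8181", "s3://results/"))
```

Keeping the property builder separate from `open_catalog` makes it easy to check in a unit test that no database settings leak into the CU configuration.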

2. MinIO for File Storage

  • Configure Iceberg to use MinIO (S3) as warehouse instead of local file system.
  • CUs write result files directly to MinIO buckets using S3 API.
  • CUs read result files directly from MinIO.
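
A minimal sketch of the storage side, again assuming PyIceberg on the CU: the `s3.*` keys below are PyIceberg FileIO properties, while the environment variable names, the default endpoint `http://minio:9000`, the bucket name, and the helper names are hypothetical choices for this example.

```python
import os


def minio_fileio_properties(endpoint: str, access_key: str, secret_key: str,
                            region: str = "us-east-1") -> dict[str, str]:
    """Build S3 FileIO properties that point Iceberg reads and writes at MinIO."""
    return {
        "s3.endpoint": endpoint,             # MinIO URL instead of AWS S3
        "s3.access-key-id": access_key,
        "s3.secret-access-key": secret_key,
        "s3.region": region,                 # MinIO accepts an arbitrary region
    }


def warehouse_uri(bucket: str, prefix: str = "") -> str:
    """Return an s3:// warehouse location for a bucket and optional prefix."""
    prefix = prefix.strip("/")
    return f"s3://{bucket}/{prefix}" if prefix else f"s3://{bucket}"


def properties_from_env() -> dict[str, str]:
    """Read MinIO settings from hypothetical environment variables."""
    return minio_fileio_properties(
        endpoint=os.environ.get("MINIO_ENDPOINT", "http://minio:9000"),
        access_key=os.environ["MINIO_ACCESS_KEY"],
        secret_key=os.environ["MINIO_SECRET_KEY"],
    )


if __name__ == "__main__":
    print(warehouse_uri("results", "executions"))
```

These properties would be merged with the REST catalog properties before the catalog is opened, so that both table metadata files and Parquet data files land in the same MinIO bucket.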

High-level Data Flow

New Architecture:

  1. Computing Unit → REST Catalog Service (Iceberg catalog operations via REST API)

  2. REST Catalog Service → PostgreSQL (Iceberg catalog metadata)

  3. Computing Unit → MinIO (result file storage via S3 API)
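
The three hops above can also be written down as data, which makes the isolation property easy to check mechanically. This is only an illustrative model. The `FLOW` list and `direct_targets` helper are names invented for this sketch.

```python
# Each entry is (source, destination, purpose), mirroring the list above.
FLOW = [
    ("Computing Unit", "REST Catalog Service", "Iceberg catalog operations via REST API"),
    ("REST Catalog Service", "PostgreSQL", "Iceberg catalog metadata"),
    ("Computing Unit", "MinIO", "result file storage via S3 API"),
]


def direct_targets(source: str) -> set[str]:
    """Return the services that `source` talks to directly."""
    return {dst for src, dst, _ in FLOW if src == source}


if __name__ == "__main__":
    # In the new architecture the CU never connects to PostgreSQL itself.
    assert "PostgreSQL" not in direct_targets("Computing Unit")
    print(sorted(direct_targets("Computing Unit")))
```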

High-level diagram of current architecture and the proposed one

Current:

Image

New:

Image
