Feature Summary
Background
Currently, computing units are responsible for writing execution results and execution records directly to PostgreSQL / Iceberg, and execution result files are stored inside the computing units. This leads to several issues:
- Result persistence logic lives inside the computing units.
- Computing units need direct DB access and JDBC configuration.
To improve isolation, we want to move execution-result handling out of the computing units into a dedicated microservice.
Proposed Solution
Architecture Overview
Migrate computing units to use existing services instead of direct database access:
- REST Catalog Service (e.g., tabulario/iceberg-rest) for Iceberg catalog operations
  - CUs use a REST Catalog client instead of a direct JDBC connection.
  - Database credentials are known only to the REST Catalog Service.
- MinIO (S3-compatible object storage) for storing result files
  - CUs write Parquet files directly to MinIO using the S3 API.
  - Global, shared storage accessible from all CUs.
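Below is a minimal sketch of the CU side. It assumes the Iceberg Java catalog and `S3FileIO` property names (`type`, `uri`, `warehouse`, `io-impl`, `s3.endpoint`, ...). The point is that the CU's configuration names only the REST Catalog Service and MinIO, with no JDBC URL or database password. The endpoints, bucket, credentials, and the helpers `cu_catalog_properties` and `has_db_credentials` are hypothetical. The sketch is written in Python only to show the shape of the map; a JVM-based CU would pass the same key/value pairs when initializing its catalog.

```python
# Hypothetical sketch: Iceberg catalog properties for a computing unit (CU).
# Endpoints, bucket, and keys are placeholders, not real deployment values.

def cu_catalog_properties(
    rest_uri: str = "http://iceberg-rest:8181",  # hypothetical REST Catalog Service address
    s3_endpoint: str = "http://minio:9000",       # hypothetical MinIO address
    warehouse: str = "s3://results/warehouse",    # hypothetical bucket and prefix
    access_key: str = "cu-access-key",            # object-storage credentials, not DB credentials
    secret_key: str = "cu-secret-key",
) -> dict:
    """Properties a CU could pass when building an Iceberg REST catalog client."""
    return {
        "type": "rest",                                   # catalog over HTTP, no JDBC
        "uri": rest_uri,
        "warehouse": warehouse,
        "io-impl": "org.apache.iceberg.aws.s3.S3FileIO",  # read/write files via the S3 API
        "s3.endpoint": s3_endpoint,                       # send S3 calls to MinIO, not AWS
        "s3.path-style-access": "true",                   # MinIO is usually addressed path-style
        "s3.access-key-id": access_key,
        "s3.secret-access-key": secret_key,
        "client.region": "us-east-1",                     # placeholder; the AWS SDK generally expects a region
    }


def has_db_credentials(props: dict) -> bool:
    """True if any property points at, or authenticates to, a database directly."""
    return any(
        key.startswith("jdbc.") or str(value).startswith("jdbc:")
        for key, value in props.items()
    )


props = cu_catalog_properties()
print(has_db_credentials(props))  # False
```

A check like `has_db_credentials` could run at CU start-up as a guard against database secrets leaking into CU configuration.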
Components
1. REST Catalog Service Integration
- Deploy an existing REST Catalog Service.
- CUs use the `RESTCatalog` client.
- Database credentials for the Iceberg catalog are known only to the REST Catalog Service, not to the CUs.
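On the service side, the database credentials live only in the REST Catalog Service's own environment. To our understanding, the tabulario/iceberg-rest image converts environment variables prefixed with `CATALOG_` into catalog properties: it strips the prefix, replaces `__` with `-` and `_` with `.`, and lowercases the result. The sketch below mirrors that convention in Python and should be checked against the image's README. All values, including the PostgreSQL host and database name, are hypothetical. These variables would be set on the REST Catalog Service container and not on any CU.

```python
# Sketch of how the REST Catalog Service (not the CU) holds the database credentials.
# The CATALOG_* -> property-name mapping mirrors our understanding of the
# tabulario/iceberg-rest convention; all values below are hypothetical placeholders.

CATALOG_ENV_PREFIX = "CATALOG_"


def env_to_catalog_properties(env: dict) -> dict:
    """Convert CATALOG_* environment variables into Iceberg catalog properties."""
    props = {}
    for key, value in env.items():
        if key.startswith(CATALOG_ENV_PREFIX):
            name = key[len(CATALOG_ENV_PREFIX):]
            # "__" becomes "-", then each remaining "_" becomes ".", then lowercase
            name = name.replace("__", "-").replace("_", ".").lower()
            props[name] = value
    return props


# Environment of the REST Catalog Service container only (hypothetical values).
service_env = {
    "CATALOG_URI": "jdbc:postgresql://postgres:5432/iceberg_catalog",
    "CATALOG_JDBC_USER": "iceberg",
    "CATALOG_JDBC_PASSWORD": "secret",
    "CATALOG_WAREHOUSE": "s3://results/warehouse",
    "CATALOG_IO__IMPL": "org.apache.iceberg.aws.s3.S3FileIO",
    "CATALOG_S3_ENDPOINT": "http://minio:9000",
    "CATALOG_S3_PATH__STYLE__ACCESS": "true",
    "PATH": "/usr/bin",  # variables without the CATALOG_ prefix are ignored
}

service_props = env_to_catalog_properties(service_env)
print(service_props["jdbc.password"])  # secret
print(service_props["io-impl"])        # org.apache.iceberg.aws.s3.S3FileIO
```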
2. MinIO for File Storage
- Configure Iceberg to use MinIO (S3) as the warehouse instead of the local file system.
- CUs write result files directly to MinIO buckets using the S3 API.
- CUs read result files directly from MinIO.
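Because the warehouse is a single shared bucket, any CU can locate a result file from the `s3://` path recorded in the Iceberg metadata. The stdlib sketch below turns such a path into the bucket/key pair an S3 client needs. It also builds the path-style URL (endpoint/bucket/key) commonly used with MinIO. The helper names, endpoint, and object path are hypothetical.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("s3", "s3a") or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def path_style_url(endpoint: str, uri: str) -> str:
    """Build a path-style object URL: <endpoint>/<bucket>/<key>."""
    bucket, key = split_s3_uri(uri)
    return f"{endpoint.rstrip('/')}/{bucket}/{key}"


# Hypothetical result-file path inside the shared warehouse bucket.
uri = "s3://results/warehouse/exec_42/data/part-00000.parquet"
print(split_s3_uri(uri))
# ('results', 'warehouse/exec_42/data/part-00000.parquet')
print(path_style_url("http://minio:9000", uri))
# http://minio:9000/results/warehouse/exec_42/data/part-00000.parquet
```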
High-level Data Flow
New Architecture:
- Computing Unit → REST Catalog Service (Iceberg catalog operations via REST API)
- REST Catalog Service → PostgreSQL (Iceberg catalog metadata)
- Computing Unit → MinIO (result file storage via S3 API)
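The data flow above can be modeled with in-memory stand-ins to show the ordering. A CU first uploads the result file to object storage, then registers it through the catalog service. Readers only see files the catalog has committed.

This is a deliberately simplified, hypothetical model. `FakeObjectStore`, `FakeRestCatalog`, and the table and key names are invented. Real Iceberg commits go through metadata and manifest files, and the catalog only tracks a pointer to the current table metadata. The optimistic check in `commit` loosely mirrors how Iceberg rejects a commit whose base state is stale.

```python
# Hypothetical in-memory model of the new data flow.
# FakeObjectStore stands in for MinIO; FakeRestCatalog stands in for the
# REST Catalog Service, whose `db` dict plays the role of the PostgreSQL tables.


class FakeObjectStore:
    def __init__(self):
        self.objects = {}  # object key -> bytes

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        return self.objects[key]


class FakeRestCatalog:
    def __init__(self):
        self.db = {}  # table name -> committed data-file keys

    def files(self, table: str) -> list:
        return list(self.db.get(table, []))

    def commit(self, table: str, expected: list, new_files: list) -> None:
        # Optimistic concurrency: reject the commit if the table changed meanwhile.
        current = self.db.get(table, [])
        if current != expected:
            raise RuntimeError("commit conflict, retry")
        self.db[table] = current + new_files


def cu_write_result(store, catalog, table, key, data):
    before = catalog.files(table)         # 1. read current table state via the catalog
    store.put(key, data)                  # 2. upload the result file to object storage
    catalog.commit(table, before, [key])  # 3. register the file through the catalog


def cu_read_results(store, catalog, table):
    # Only files the catalog has committed are visible to readers.
    return [store.get(k) for k in catalog.files(table)]


store = FakeObjectStore()
catalog = FakeRestCatalog()
cu_write_result(store, catalog, "exec_1.results", "warehouse/exec_1/data/part-0.parquet", b"row-batch-0")
cu_write_result(store, catalog, "exec_1.results", "warehouse/exec_1/data/part-1.parquet", b"row-batch-1")
results = cu_read_results(store, catalog, "exec_1.results")
print(results)  # [b'row-batch-0', b'row-batch-1']
```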
Figure: high-level diagrams of the current architecture (Current) and the proposed one (New).
