
Conversation

@ImDoubD-datazip (Collaborator) commented Dec 22, 2025

Description

This PR adds DB2 LUW as a source connector. It currently supports two sync modes:

  • Full Refresh
  • Incremental

Chunking is done via primary keys when they are present; otherwise, RID-based chunking is used for tables without primary keys.
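For illustration, here is a minimal sketch of how one RID chunk becomes a backfill query. The helper name is hypothetical and the boundary values come from the chunking step; the query shape matches the backfill query that appears in the sync logs later in this thread.

```go
package main

import "fmt"

// buildRIDChunkQuery renders the backfill query for one RID chunk.
// Hypothetical helper; min/max are RID() boundaries from the chunker.
func buildRIDChunkQuery(schema, table string, min, max int64) string {
	fq := fmt.Sprintf(`"%s"."%s"`, schema, table)
	return fmt.Sprintf("SELECT * FROM %s WHERE RID(%s) >= %d AND RID(%s) < %d", fq, fq, min, fq, max)
}

func main() {
	fmt.Println(buildRIDChunkQuery("DB2INST1", "ALL_DB2_TYPES", 6, 7))
}
```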

Prerequisite

To run the DB2 LUW connector, the IBM Data Server Driver for ODBC and CLI must be installed on the machine.

Steps to run DB2 LUW on your machine:

# IBM DB2 CLI environment
export IBM_DB_HOME=/pathto/clidriver
export PATH=$IBM_DB_HOME/bin:$PATH
export CGO_CFLAGS="-I$IBM_DB_HOME/include"
export CGO_LDFLAGS="-L$IBM_DB_HOME/lib -Wl,-rpath,$IBM_DB_HOME/lib"
export DYLD_LIBRARY_PATH=$IBM_DB_HOME/lib
  • Then run the discover command and sync.

In this PR, the base Alpine image has been changed to debian:bookworm-slim for better support of DB2 and other database drivers.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

This has been tested on a DB2 LUW VM instance; both full refresh and incremental sync modes were verified.

Documentation

  • Documentation Link: [link to README, olake.io/docs, or olake-docs]
  • N/A (bug fix, refactor, or test changes only)

Related PRs (if any):

SELECT
    TRIM(TABSCHEMA) AS table_schema,
    TRIM(TABNAME) AS table_name
FROM SYSCAT.TABLES
WHERE TYPE IN ('T', 'V')
Collaborator

Should we select views as well?

Collaborator Author

Will ask product and make changes accordingly.

Comment on lines 73 to 77
err := d.client.QueryRowContext(ctx, existsQuery).Scan(&hasRows)

if err != nil {
return nil, fmt.Errorf("failed to check if table has rows: %s", err)
}
Collaborator
Suggested change (removes the blank line):

err := d.client.QueryRowContext(ctx, existsQuery).Scan(&hasRows)
if err != nil {
	return nil, fmt.Errorf("failed to check if table has rows: %s", err)
}

return chunks, nil
}
// split chunks via physical identifier RID()
splitViaRID := func(ctx context.Context, stream types.StreamInterface) (*types.Set[types.Chunk], error) {
Collaborator

Is it safe to use RID for chunking?

Collaborator Author

For tables without primary keys, I think we should use it: such tables are unlikely to have any indexed column, so RID-based chunking is the better option.

@vishalm0509 (Collaborator)

Column type: DBCLOB

  • In the database: (screenshot)
  • In the destination (Glue): (screenshot)

@vishalm0509 (Collaborator) commented Dec 30, 2025

Column type: VARBINARY

  • Mapped as "varbinary": types.String
  • Also column col_varbinary
  • In the database: (screenshot)
  • In the destination: (screenshot)

@vishalm0509 (Collaborator)

  • Column name: col_long_vargraphic
  • Column type: LONG VARGRAPHIC
  • Not mapped in OLake
  • In the database: (screenshot)
  • In the destination: (screenshot)

@vishalm0509 (Collaborator)

Incremental test

Table: DB2_ALL_DATATYPES

"sync_mode": "incremental",
"cursor_field": "COL_TIMESTAMP:COL_TIME",
(screenshot)

I checked col_bigInt; it works fine. We need to check timestamp-based columns, similar to Oracle.

switch v := cursorValue.(type) {
case time.Time:
if a.driver.Type() == string(constants.DB2) {
return v.Format("2006-01-02 15:04:05.000000")
Collaborator

Isn't there a timestamp-aware format for DB2?

Collaborator Author

This is the format we require for DB2 timestamps saved in state.
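For reference, a minimal Go round-trip of this layout (values are illustrative; db2StateLayout is a hypothetical name for the constant used in the diff above):

```go
package main

import (
	"fmt"
	"time"
)

// Layout used to persist DB2 timestamps in state, matching the
// Format call in the diff above.
const db2StateLayout = "2006-01-02 15:04:05.000000"

func main() {
	ts := time.Date(2024, 1, 1, 10, 15, 30, 123456000, time.UTC)
	s := ts.Format(db2StateLayout)
	fmt.Println(s) // 2024-01-01 10:15:30.123456

	back, err := time.Parse(db2StateLayout, s)
	if err != nil {
		panic(err)
	}
	fmt.Println(back.Equal(ts)) // true
}
```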

@vishalm0509 (Collaborator)

Column type: DBCLOB

  • In the database: (screenshot)
  • In the destination (Glue): (screenshot)

This is still the case in Glue.

@vishalm0509 (Collaborator)

  • Column name: col_long_vargraphic
  • Column type: LONG VARGRAPHIC
  • Not mapped in OLake
  • In the database: (screenshot)
  • In the destination: (screenshot)

Still not resolved.

@vishalm0509 (Collaborator)

col_timestamp

  • Database: 2024-01-01-10.15.30.123456
  • Destination: 2024-01-01 10:15:30.123000 UTC

Please check this as well.

@vishalm0509 (Collaborator)

Column: CHAR_ONE

  • Datatype: CHARACTER
  • Database: CHAR_ONE
  • Destination: "Q0hBUl9PTkUgIA=="

This as well.

@vishalm0509 (Collaborator)

col_time (also col_date)

  • Datatype: TIME
  • Database: 10:15:30
  • Destination: 0001-01-01 10:15:30 +0000 UTC

"Same case with MSSQL (SQL Server) also"

Database: 14:20:30
Destination: 0001-01-01 14:20:30 +0553 LMT

@vishalm0509 (Collaborator)

Incremental test

Table: DB2_ALL_DATATYPES

"sync_mode": "incremental",
"cursor_field": "COL_TIMESTAMP:COL_TIME",
(screenshot)

I checked `col_bigInt`; it works fine. We need to check `timestamp`-based columns, similar to Oracle.

This issue is still there.

fi
;;
"Linux")
download_url="https://public.dhe.ibm.com/ibmdl/export/pub/software/data/db2/drivers/odbc_cli/linuxx64_odbc_cli.tar.gz"
Contributor

So a Linux arm64 driver doesn't exist. Is that why we skip that case?

Collaborator Author

Yes.

build.sh (outdated)
Comment on lines 102 to 106
# Clean up any partial downloads from the failed go installer
rm -rf "$install_dir/clidriver" 2>/dev/null
rm -f "$install_dir"/*.tar.gz 2>/dev/null
rm -f "$install_dir"/*.zip 2>/dev/null

Contributor

Let's just do curl. Keep it simple silly!

Collaborator Author

OK.

@vaibhav-datazip (Collaborator) left a comment

Tested both full-refresh and incremental mode on OLake-CLI:

  • using only the primary cursor
  • using the fallback cursor as well
  • using float, int, string, and timestamp as cursor values
  • filtering using string, timestamp, and int

}

if hasRows {
return nil, fmt.Errorf("stats not populated for table[%s]. Please run command:\tRUNSTATS ON TABLE %s.%s WITH DISTRIBUTION AND DETAILED INDEXES ALL;\t to update table statistics", stream.ID(), stream.Namespace(), stream.Name())
Collaborator

Instead of writing "Please run command", you can say "Please run CLP command:".


func (d *DB2) splitTableIntoChunks(ctx context.Context, stream types.StreamInterface) (*types.Set[types.Chunk], error) {
// split chunks via primary key
splitViaPrimaryKey := func(ctx context.Context, stream types.StreamInterface) (*types.Set[types.Chunk], error) {
Collaborator

I tried syncing the following table with 3 records:

CREATE TABLE ALL_DB2_TYPES (
    COL_SMALLINT       SMALLINT,
    COL_INTEGER        INTEGER,
    COL_BIGINT         BIGINT,
    COL_DECIMAL        DECIMAL(10,2),
    COL_NUMERIC        NUMERIC(8,4),
    COL_REAL           REAL,
    COL_DOUBLE         DOUBLE,
    COL_DECFLOAT16     DECFLOAT(16),
    COL_DECFLOAT34     DECFLOAT(34),
    COL_CHAR10         CHAR(10),
    COL_VARCHAR50      VARCHAR(50),
    COL_VARGRAPHIC50   VARGRAPHIC(50),
    COL_GRAPHIC10      GRAPHIC(10),
    COL_LONGVARCHAR    LONG VARCHAR,
    COL_LONGVARGRAPHIC LONG VARGRAPHIC,
    COL_CHAR_BIT       CHAR(10) FOR BIT DATA,
    COL_VARCHAR_BIT    VARCHAR(20) FOR BIT DATA,
    COL_VARBINARY      VARBINARY(50),
    COL_DATE           DATE,
    COL_TIME           TIME,
    COL_TIMESTAMP      TIMESTAMP,
    COL_BOOLEAN        BOOLEAN,
    COL_CLOB           CLOB(1M),
    COL_DBCLOB         DBCLOB(500K),
    COL_BLOB           BLOB(500K),
    COL_XML            XML
);

I got the following error while syncing:

2026-01-09T09:06:38Z DEBUG Starting backfill for DB2INST1.ALL_DB2_TYPES with chunk {6 7} using query: SELECT * FROM "DB2INST1"."ALL_DB2_TYPES" WHERE RID("DB2INST1"."ALL_DB2_TYPES") >= 6 AND RID("DB2INST1"."ALL_DB2_TYPES") < 7
2026-01-09T09:06:38Z INFO Sync completed, wait 5 seconds cleanup in progress...
2026-01-09T09:06:43Z FATAL error occurred while reading records: error occurred while waiting for connections: thread[DB2INST1.ALL_DB2_TYPES_01KEH03WW4FHZKVDDVHQFEKQ43]: failed to insert chunk min[%!s(int64=4)] and max[%!s(int64=6)] of stream DB2INST1.ALL_DB2_TYPES, insert func error: %!s(<nil>), thread error: failed to flush data while closing: failed to write records: failed to send batch: rpc error: code = Internal desc = grpc: error while marshaling: string field contains invalid UTF-8

Collaborator Author

The data in the database contains invalid UTF-8 values.
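If sanitizing at read time is acceptable, one possible mitigation (a sketch only, not what the PR currently does) is to coerce string values to valid UTF-8 before they reach the gRPC writer:

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// sanitizeUTF8 replaces invalid byte sequences with U+FFFD so that
// protobuf string marshaling does not fail. Hypothetical helper.
func sanitizeUTF8(s string) string {
	if utf8.ValidString(s) {
		return s
	}
	return strings.ToValidUTF8(s, "\uFFFD")
}

func main() {
	fmt.Println(sanitizeUTF8("ok\xffvalue")) // ok�value
}
```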

}

func (d *DB2) splitTableIntoChunks(ctx context.Context, stream types.StreamInterface) (*types.Set[types.Chunk], error) {
// split chunks via primary key
Collaborator

With 0 records, I tried syncing the following table:

(same ALL_DB2_TYPES table definition as above)

and got this error:

2026-01-09T08:56:32Z INFO Sync completed, wait 5 seconds cleanup in progress...
2026-01-09T08:56:37Z FATAL error occurred while reading records: error occurred while waiting for context groups: failed to get or split chunks: failed to get the min and max rid: sql: Scan error on column index 0, name "1": converting NULL to int64 is unsupported
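MIN/MAX over an empty table returns SQL NULL, which database/sql cannot scan into a plain int64. A sketch of one way to guard the chunker (names are hypothetical):

```go
package db2

import (
	"context"
	"database/sql"
	"fmt"
)

// minMaxRID scans MIN/MAX RID values with NULL-aware types so that an
// empty table yields ok=false instead of a Scan error.
func minMaxRID(ctx context.Context, db *sql.DB, query string) (int64, int64, bool, error) {
	var lo, hi sql.NullInt64
	if err := db.QueryRowContext(ctx, query).Scan(&lo, &hi); err != nil {
		return 0, 0, false, fmt.Errorf("failed to get the min and max rid: %w", err)
	}
	if !lo.Valid || !hi.Valid { // zero rows: nothing to chunk
		return 0, 0, false, nil
	}
	return lo.Int64, hi.Int64, true, nil
}
```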

Comment on lines +51 to +62
logger.Debugf("Starting backfill for %s with chunk %v using query: %s", stream.ID(), chunk, stmt)

reader := jdbc.NewReader(ctx, stmt, func(ctx context.Context, query string, queryArgs ...any) (*sql.Rows, error) {
return d.client.QueryContext(ctx, query, args...)
})

return reader.Capture(func(rows *sql.Rows) error {
record := make(types.Record)
if err := jdbc.MapScan(rows, record, d.dataTypeConverter); err != nil {
return fmt.Errorf("failed to scan record data as map: %s", err)
}
return OnMessage(ctx, record)
Collaborator

What isolation level are we using here?

Collaborator Author

Read committed, or cursor stability (CS) as DB2 calls it.
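For reference, cursor stability is DB2's default isolation level. If it ever needs to be pinned explicitly, a sketch using the CURRENT ISOLATION special register follows; this register is per-connection, so it must run on a dedicated *sql.Conn, and this is not what the connector currently does:

```go
package db2

import (
	"context"
	"database/sql"
	"fmt"
)

// setCursorStability pins one connection to DB2's cursor stability (CS),
// i.e. read committed. Hypothetical helper; the connector relies on the
// server default rather than setting this explicitly.
func setCursorStability(ctx context.Context, conn *sql.Conn) error {
	if _, err := conn.ExecContext(ctx, "SET CURRENT ISOLATION = CS"); err != nil {
		return fmt.Errorf("failed to set isolation level: %w", err)
	}
	return nil
}
```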

"dbclob": types.String,

// date / time
"time": types.String,
Collaborator

The time I am seeing in DBeaver is different from Iceberg:

(screenshot: DBeaver)

In Iceberg:

(screenshot: Iceberg)

Collaborator Author (@ImDoubD-datazip, Jan 9, 2026)

In Iceberg it is stored as UTC: 10:00 becomes 4:30 (a -5:30 offset).
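Concretely, the same arithmetic in Go: a 10:00 value in a +05:30 zone renders as 04:30 in UTC.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// A +05:30 local time rendered in UTC: 10:00 becomes 04:30.
	ist := time.FixedZone("UTC+05:30", 5*3600+30*60)
	local := time.Date(2024, 1, 1, 10, 0, 0, 0, ist)
	fmt.Println(local.UTC()) // 2024-01-01 04:30:00 +0000 UTC
}
```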

"decfloat": types.Float64,

// boolean
"boolean": types.Bool,
Collaborator

There is no boolean type in DB2; found this while testing.

Collaborator Author

(screenshot)

}

if hasRows {
return nil, fmt.Errorf("stats not populated for table[%s]. Please run CLP command:\tRUNSTATS ON TABLE %s.%s WITH DISTRIBUTION AND DETAILED INDEXES ALL;\t to update table statistics", stream.ID(), stream.Namespace(), stream.Name())
Collaborator

LOB (CLOB, DBCLOB, BLOB) and XML columns don't support distribution statistics; running the suggested RUNSTATS command against such tables fails:

SQL2310N  The utility could not generate statistics.  Error "-668" was returned.

Please mention this in the docs as well.

"real": types.Float32,
"float": types.Float64,
"numeric": types.Float64,
"double": types.Float64,
Collaborator

Testing with the DECFLOAT34 datatype. In the database it was:

(screenshot)

In Iceberg it is:

(screenshot)

Is the scientific notation due to Spark? Is there a way we can get it as it is in the database?

reader := jdbc.NewReader(ctx, stmt, func(ctx context.Context, query string, queryArgs ...any) (*sql.Rows, error) {
return d.client.QueryContext(ctx, query, args...)
})

Collaborator

In some cases blank values are becoming NULL; in others they stay blank.

(screenshots)
