Convert indexer tables to TimescaleDB hypertables #486
aditya1702 wants to merge 54 commits into opxdr-bytea-2
Conversation
… tests) Bring in BYTEA type definitions, indexer changes, GraphQL schema/resolver updates, processor changes, and test files from opxdr-bytea-2 branch. These files had no modifications on the timescale branch.
…hemas Change hash, account_id, operation_xdr, and address columns from TEXT to BYTEA type while preserving all TimescaleDB hypertable configuration, composite primary keys, and chunk settings.
… statechanges) Convert hash, account_id, operation_xdr, and address columns from TEXT to BYTEA in BatchInsert, BatchCopy, and query methods. Uses HashBytea, AddressBytea, and pgtypeBytesFromNullAddressBytea for type conversions. Preserves TimescaleDB junction table patterns (ledger_created_at columns, composite primary keys, parameter numbering).
Adopt BYTEA types (HashBytea, AddressBytea, XDRBytea, NullAddressBytea) in test data while preserving TimescaleDB-specific patterns:
- Keep ledger_created_at in junction table INSERTs
- Use generic "duplicate key value violates unique constraint" assertions (TimescaleDB chunk-based constraint names differ from standard PG)
- Keep 5-value return from processLedgersInBatch (includes startTime/endTime)
Rename inner err to addrErr in address BYTEA conversion loops to avoid shadowing the outer err variable from pgx.CopyFrom.
Pull request overview
This PR converts the indexer tables (transactions, operations, state_changes, and their account junction tables) to TimescaleDB hypertables with columnstore compression for efficient time-series data storage. The changes include significant schema modifications, removal of foreign keys (which aren't supported on hypertables), updates to primary keys to include the partitioning column (ledger_created_at), and implementation of automatic chunk compression for historical backfills. Additionally, the RPC service health tracking functionality has been removed, with health checks now performed on-demand.
Changes:
- Convert transactions, operations, and state_changes tables to TimescaleDB 3.0+ hypertables with ledger_created_at as the partition column
- Add automatic chunk compression logic for backfilled historical data
- Remove RPC health tracking background service, replacing it with on-demand health checks
- Update all primary keys and ON CONFLICT clauses to include ledger_created_at
- Upgrade container images from PostgreSQL 14 to TimescaleDB 2.25.0
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/db/migrations/2025-06-10.2-transactions.sql | Converts transactions and transactions_accounts to TimescaleDB hypertables with new primary keys |
| internal/db/migrations/2025-06-10.3-operations.sql | Converts operations and operations_accounts to TimescaleDB hypertables with segmentation |
| internal/db/migrations/2025-06-10.4-statechanges.sql | Converts state_changes to TimescaleDB hypertable, removes FK constraint |
| internal/data/transactions.go | Updates BatchInsert/BatchCopy to include ledger_created_at in junction table inserts |
| internal/data/operations.go | Updates BatchInsert/BatchCopy to include ledger_created_at in junction table inserts |
| internal/data/statechanges.go | Updates ON CONFLICT clause to match new composite primary key |
| internal/data/query_utils.go | Fixes column parsing to properly handle comma-separated column lists |
| internal/data/*_test.go | Updates test data inserts to include ledger_created_at in junction tables |
| internal/services/ingest_backfill.go | Adds compressBackfilledChunks function to compress historical data chunks |
| internal/services/rpc_service.go | Removes TrackRPCServiceHealth method and related documentation |
| internal/services/rpc_service_test.go | Removes tests for TrackRPCServiceHealth functionality |
| internal/integrationtests/infrastructure/helpers.go | Replaces heartbeat channel with direct health polling |
| internal/integrationtests/infrastructure/main_setup.go | Removes health tracking goroutine startup |
| internal/integrationtests/infrastructure/containers.go | Updates DB container to use TimescaleDB latest |
| internal/db/dbtest/dbtest.go | Adds TimescaleDB extension enablement for test databases |
| docker-compose.yaml | Updates image to TimescaleDB 2.25.0 and adds extension creation |
| .github/workflows/go.yaml | Updates CI image to TimescaleDB 2.25.0 |
Comments suppressed due to low confidence (1)
internal/db/migrations/2025-06-10.2-transactions.sql:48
- The migration file is dated 2025-06-10 (the current date is February 11, 2026), which suggests the migration was created in June 2025. Later migrations (2025-12-14, 2026-01-12, 2026-01-15, 2026-01-16) already exist, so the ordering appears chronologically consistent.
More critically, the TimescaleDB declarative syntax (WITH tsdb.hypertable, tsdb.partition_column, etc.) is only supported in TimescaleDB 3.0+, which was released in November 2024. The docker-compose.yaml specifies timescale/timescaledb:2.25.0-pg17, which is TimescaleDB 2.x and does not support this syntax. This migration will fail.
You should either:
- Update to TimescaleDB 3.x (e.g., timescale/timescaledb:latest-pg17), OR
- Use the traditional SELECT create_hypertable() function-based syntax for TimescaleDB 2.x compatibility
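If staying on TimescaleDB 2.x, the declarative WITH options could be replaced with the function-based API. This is only a sketch of the equivalent calls for the transactions table, using the table and column names from the migration; the interval and compression settings mirror the declarative options and are not a verified drop-in replacement:

```sql
-- Sketch: function-based equivalents for the declarative tsdb.* options,
-- assuming TimescaleDB 2.x where WITH (tsdb.hypertable, ...) is unavailable.
SELECT create_hypertable(
    'transactions',
    'ledger_created_at',
    chunk_time_interval => INTERVAL '1 day'
);

-- Compression ordering, the 2.x counterpart of tsdb.orderby:
ALTER TABLE transactions SET (
    timescaledb.compress,
    timescaledb.compress_orderby = 'ledger_created_at DESC'
);
```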
-- +migrate Up
-- Table: transactions (TimescaleDB hypertable with columnstore)
CREATE TABLE transactions (
ledger_created_at TIMESTAMPTZ NOT NULL,
to_id BIGINT NOT NULL,
hash BYTEA NOT NULL,
envelope_xdr TEXT,
fee_charged BIGINT NOT NULL,
result_code TEXT NOT NULL,
meta_xdr TEXT,
ledger_number INTEGER NOT NULL,
is_fee_bump BOOLEAN NOT NULL DEFAULT false,
ingested_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
PRIMARY KEY (ledger_created_at, to_id)
) WITH (
tsdb.hypertable,
tsdb.partition_column = 'ledger_created_at',
tsdb.chunk_interval = '1 day',
tsdb.orderby = 'ledger_created_at DESC'
);
CREATE INDEX idx_transactions_hash ON transactions(hash);
CREATE INDEX idx_transactions_to_id ON transactions(to_id);
-- Table: transactions_accounts (TimescaleDB hypertable for automatic cleanup with retention)
CREATE TABLE transactions_accounts (
ledger_created_at TIMESTAMPTZ NOT NULL,
tx_to_id BIGINT NOT NULL,
account_id BYTEA NOT NULL,
PRIMARY KEY (ledger_created_at, account_id, tx_to_id)
) WITH (
tsdb.hypertable,
tsdb.partition_column = 'ledger_created_at',
tsdb.chunk_interval = '1 day',
tsdb.orderby = 'ledger_created_at DESC',
tsdb.sparse_index = 'bloom(account_id)'
);
CREATE INDEX idx_transactions_accounts_tx_to_id ON transactions_accounts(tx_to_id);
CREATE INDEX idx_transactions_accounts_account_id ON transactions_accounts(account_id);
-- +migrate Down
-- Tables
DROP TABLE IF EXISTS transactions_accounts CASCADE;
DROP TABLE IF EXISTS transactions CASCADE;
healthRes, err := rpcService.GetHealth()
if err == nil {
	if healthRes.Status == "healthy" {
		return nil
The WaitForRPCHealthAndRun function has a critical bug: after successfully checking RPC health (lines 60-65), it returns nil immediately, but never calls the onReady callback function that was passed as a parameter. This breaks the intended workflow where onReady should be executed after the RPC service becomes healthy.
The function should call onReady() before returning nil when the health check succeeds, similar to the original implementation that was removed in rpc_service_test.go.
Suggested change:
-		return nil
+		return onReady()
POSTGRES_DB: postgres
POSTGRES_PASSWORD: postgres
PGHOST: localhost
PGHOST: /var/run/postgresql
The PGHOST environment variable is set to /var/run/postgresql in the postgres service (line 95) but overridden to localhost in the test environment (line 104). This configuration mismatch is confusing.
The PGHOST=/var/run/postgresql setting in the service configuration is typically used for Unix domain socket connections within the container, but since the service exposes port 5432:5432 and the tests connect via TCP from the host, the test environment correctly uses localhost. The service-level PGHOST setting appears unnecessary and potentially misleading.
Consider removing line 95 (PGHOST: /var/run/postgresql) from the postgres service configuration as it's not needed for the GitHub Actions workflow where connections are made via TCP from the host.
Suggested change:
-PGHOST: /var/run/postgresql
if healthRes.Status == "healthy" {
	return nil
}
}
The infinite polling loop with no delay will cause 100% CPU usage when the RPC service is not yet healthy. The default case in the select statement immediately retries GetHealth() without any sleep/delay between attempts.
Add a time.Sleep() after the health check to prevent CPU spin-looping, for example: time.Sleep(100 * time.Millisecond) or use a ticker instead of the default case.
Suggested change:
 }
 }
+// Avoid a tight spin loop when the RPC service is not yet healthy.
+time.Sleep(100 * time.Millisecond)
internal/services/ingest_backfill.go (outdated)
var chunks []string
for rows.Next() {
	var chunk string
	if err := rows.Scan(&chunk); err != nil {
		continue
	}
	chunks = append(chunks, chunk)
}
rows.Close()
The rows.Close() call on line 645 should be deferred to ensure it's called even if there's an error during iteration. Additionally, the error from rows.Scan() is being silently ignored on line 640. If there's a scanning error, it should be logged at minimum.
Consider replacing line 645 with defer rows.Close() after line 631, and handle the scanning error properly instead of continuing silently.
Suggested change:
-	var chunks []string
-	for rows.Next() {
-		var chunk string
-		if err := rows.Scan(&chunk); err != nil {
-			continue
-		}
-		chunks = append(chunks, chunk)
-	}
-	rows.Close()
+	defer rows.Close()
+	var chunks []string
+	for rows.Next() {
+		var chunk string
+		if err := rows.Scan(&chunk); err != nil {
+			log.Ctx(ctx).Warnf("Failed to scan chunk row for table %s: %v", table, err)
+			continue
+		}
+		chunks = append(chunks, chunk)
+	}
+	if err := rows.Err(); err != nil {
+		log.Ctx(ctx).Warnf("Error encountered while iterating chunks for table %s: %v", table, err)
+	}
What
[TODO: Short statement about what is changing.]
Why
[TODO: Why this change is being made. Include any context required to understand the why.]
Known limitations
[TODO or N/A]
Issue that this PR addresses
[TODO: Attach the link to the GitHub issue or task. Include the priority of the task here in addition to the link.]
Checklist
PR Structure
all if the changes are broad or impact many packages.
Thoroughness
Release