Shard parquet files based on configured number of columns limit #131

yeya24 · 2026-01-06T07:39:21Z

This PR introduces automatic sharding of Parquet files based on a limit on the number of columns in each generated Parquet file. The parquet-go library supports at most 32767 columns, so this PR shards the output whenever the number of columns would otherwise exceed that limit.

The implementation uses the same approach as thanos-io/thanos-parquet-gateway#34: a map tracks the label names seen so far, and a new shard is started whenever adding more label names would reach the limit.

It handles both the sharded-reader and the single-reader paths. Previously, the single-reader path did not shard at all; now, if it detects more label names than the configured limit, it uses the same code path to shard the block.
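The check that moves the single-reader path onto the sharded path can be sketched as counting distinct label names across a block's series and comparing that count against the limit. The input shape (one slice of label names per series) and the names `countDistinctLabelNames` and `needsSharding` are hypothetical; a real implementation would likely read label names from the block's index instead.

```go
package main

import "fmt"

// countDistinctLabelNames returns the number of distinct label names across all
// series. Each inner slice holds one series' label names (hypothetical input shape).
func countDistinctLabelNames(seriesLabelNames [][]string) int {
	names := make(map[string]struct{})
	for _, ls := range seriesLabelNames {
		for _, n := range ls {
			names[n] = struct{}{}
		}
	}
	return len(names)
}

// needsSharding reports whether a block converted via the single-reader path
// must fall back to sharding: true when it has more distinct label names
// (assumed to be one column each) than maxColumns.
func needsSharding(seriesLabelNames [][]string, maxColumns int) bool {
	return countDistinctLabelNames(seriesLabelNames) > maxColumns
}

func main() {
	block := [][]string{
		{"__name__", "job"},
		{"__name__", "instance"},
		{"__name__", "pod"},
	}
	fmt.Println(countDistinctLabelNames(block)) // 4
	fmt.Println(needsSharding(block, 3))        // true
	fmt.Println(needsSharding(block, 4))        // false
}
```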

Signed-off-by: yeya24 <benye@amazon.com>

yeya24 added 2 commits January 5, 2026 23:29

shard parquet files based on configured number of columns limit

fd43830

Signed-off-by: yeya24 <benye@amazon.com>

remove explicit reference of 32767 limit value and fix unit test

0d469a3

Signed-off-by: yeya24 <benye@amazon.com>

yeya24 force-pushed the shard-conversion-column-limit branch from da336a1 to 0d469a3 January 6, 2026 17:47

try to reduce usage further

a44f928

Signed-off-by: yeya24 <benye@amazon.com>
