@yeya24 yeya24 commented Jan 6, 2026

This PR introduces automatic sharding of parquet files based on a limit on the number of columns in the generated file. The parquet Go library supports at most 32767 columns, so this PR shards the output whenever the number of columns would exceed that limit.

The implementation follows the same approach as thanos-io/thanos-parquet-gateway#34: it uses a map to keep track of the label names seen so far and starts a new shard whenever the limit is reached.
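
For illustration only, here is a minimal sketch of that map-based splitting idea. It is not the code in this PR: the `series` type, the `shardByLabelNames` function and the `maxLabels` parameter are hypothetical names, and the sketch counts only label names, ignoring any non-label columns that also count toward the real column limit.

```go
package main

import "fmt"

// series is a hypothetical stand-in for one time series: label name -> label value.
type series map[string]string

// shardByLabelNames groups series, in input order, into shards so that no
// shard contains more than maxLabels distinct label names. A map tracks the
// label names seen in the current shard; when adding the next series would
// push the count over the limit, the current shard is closed and a new one
// is started.
func shardByLabelNames(all []series, maxLabels int) ([][]series, error) {
	var shards [][]series
	var current []series
	seen := make(map[string]struct{})

	for _, s := range all {
		// A single series with too many labels can never fit in any shard.
		if len(s) > maxLabels {
			return nil, fmt.Errorf("series has %d labels, limit is %d", len(s), maxLabels)
		}

		// Count how many label names this series would add to the current shard.
		added := 0
		for name := range s {
			if _, ok := seen[name]; !ok {
				added++
			}
		}

		// Split before exceeding the limit.
		if len(seen)+added > maxLabels {
			shards = append(shards, current)
			current = nil
			seen = make(map[string]struct{})
		}

		for name := range s {
			seen[name] = struct{}{}
		}
		current = append(current, s)
	}

	if len(current) > 0 {
		shards = append(shards, current)
	}
	return shards, nil
}

func main() {
	input := []series{
		{"__name__": "up", "job": "a"},
		{"__name__": "up", "instance": "x"},
		{"__name__": "up", "zone": "z", "pod": "p"},
	}
	shards, err := shardByLabelNames(input, 3)
	if err != nil {
		panic(err)
	}
	for i, sh := range shards {
		fmt.Printf("shard %d: %d series\n", i, len(sh))
	}
}
```

With a limit of 3, the first two series share a shard (three distinct label names) and the third series starts a new one. Splitting in input order keeps the logic simple but does not try to minimize the number of shards.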

It handles both the sharded reader and the single reader paths. Previously, the single reader path did not shard at all; now, if it detects more label names than the configured limit, it uses the same code path to shard blocks.

@yeya24 yeya24 force-pushed the shard-conversion-column-limit branch from da336a1 to 0d469a3 on January 6, 2026 17:47
Signed-off-by: yeya24 <benye@amazon.com>