@bohutang
Owner

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

refactor(query): stream style block writer for hash join spill

  • preallocate shifted offsets in binary compression to avoid per-push bounds checks
  • keep block reader imports aligned after streaming writer refactor
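One possible reading of the first bullet, as a minimal sketch: the PR text does not show the actual code, so the helper name `shift_offsets` and the `u64` offset type are assumptions made for illustration. The idea is to allocate the rebased offsets buffer once and fill it through zipped iterators, rather than growing a `Vec` with repeated `push` calls that each check capacity.

```rust
/// Rebase a window of binary-column offsets so that the first one becomes 0.
/// Hypothetical helper for illustration; not taken from the PR.
fn shift_offsets(offsets: &[u64]) -> Vec<u64> {
    let base = match offsets.first() {
        Some(&first) => first,
        None => return Vec::new(),
    };
    // Allocate the full output once, then fill it in place.
    let mut shifted = vec![0u64; offsets.len()];
    for (dst, &src) in shifted.iter_mut().zip(offsets) {
        // Offsets are assumed to be non-decreasing, so this cannot underflow.
        *dst = src - base;
    }
    shifted
}
```

Iterating over `iter_mut().zip(..)` gives the compiler a fixed trip count, which typically lets it drop per-element checks; whether that matches the change in this PR is an assumption.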

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

drmingdrmer and others added 30 commits July 31, 2025 17:25
…abendlabs#18458)

* refactor(meta): move Lua functions to metactl namespace

Move all Lua functions from global scope to metactl namespace
to prevent conflicts with other Lua libraries:

- metactl.new_grpc_client() replaces new_grpc_client()
- metactl.spawn() replaces spawn()
- metactl.sleep() replaces sleep()
- metactl.NULL replaces NULL

* docs(meta): add Lua API documentation and benchmarking tools

Add comprehensive documentation for the metactl Lua runtime API,
including all available functions, client methods, and usage patterns.
Add benchmark script and runner for performance testing of concurrent
meta operations.

- Complete API documentation with examples and best practices
- Benchmark script with configurable concurrent workers
- Python test runner with meta service setup automation

* chore: add README to benchmark dir
* fix: vacuum drop table with limit does not work

* add result set

* fix test

* fix test
…atabendlabs#18461)

* feat: add Lua admin client support and metrics subcommand to metactl

- Add MetricsArgs and metrics subcommand to metactl CLI
- Implement LuaAdminClient with admin API methods (metrics, status, transfer_leader, etc.)
- Add new_admin_client function to Lua environment
- Add comprehensive test suite for Lua admin client functionality
- Update utils.py to improve error handling in run_command

* M  tests/metactl/subcommands/cmd_metrics.py
…endlabs#18450)

* [chore] update comment in rule_grouping_sets_to_union.rs

* Addressed

---------

Co-authored-by: sundyli <543950155@qq.com>
* chore: refine cte profile

* chore: add setting

* make lint
* refactor(query): refactor row fetcher for avoid oom
* fix: collect statistics for MaterializeCTERef

* fix test

* make lint

* fix test
* improve stream write

* fix virtual column builder

* add block statistics

* remove null in bloom index builder

* use metahll

* avoid large string

* fix

* use segment level stats

* add test

* remove unused code

* fix review comment

* fix test

---------

Co-authored-by: Bohu <overred.shuttler@gmail.com>
* fix: extend the task time unit by 5x, because slow hardware can easily cause CI failures.

* chore: fix test

* chore: fix test
…atabendlabs#18465)

* fix: attach table does not carry the indexes of the original table

* chore: fix test

* chore: fix test

* chore: fix test

* chore: fix test

* chore: fix test

* chore: fix test

* chore: add more index type on attach table test

* chore: fix test

* chore: fix test
* fix: make update table meta idempotent

* add ut

* refine

* fix

* rename func

* polish unit test

---------

Co-authored-by: dantengsky <dantengsky@gmail.com>
…stage (databendlabs#18453)

* feat(query): add zero table
* fix: missing 'values' when displaying insert stmt.

* feat: add header X-DATABEND-CLIENT-CAPS.
* chore(query): add max node quota
drmingdrmer and others added 26 commits September 15, 2025 22:07
…ndlabs#18722)

* fix(meta-service): detach the SysData to avoid race condition

When creating a new level in state-machine, it should detach the SysData
to avoid race condition with snapshot building.

Before this commit, the new writable level and the snapshot compactor
shared the same data, so newly applied data increased the
`last-log-id` of a newly built snapshot, resulting in a snapshot that
lacks some log entries it declares to have.
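The race described above can be sketched in a few lines. This is an illustrative model only: the single-field `SysData`, the `last_applied` field, and both function names are assumptions, not the meta-service types. It contrasts a writer and compactor that alias one `SysData` with a writer that works on a detached copy.

```rust
use std::sync::{Arc, Mutex};

/// Simplified stand-in for the state-machine system data (assumed shape).
#[derive(Clone)]
struct SysData {
    last_applied: u64,
}

/// Shared layout: the writer and the compactor alias one SysData.
/// Returns (writer's last_applied, last_applied seen by the snapshot).
fn apply_with_shared(initial: u64, new_log_id: u64) -> (u64, u64) {
    let writer = Arc::new(Mutex::new(SysData { last_applied: initial }));
    let compactor_view = Arc::clone(&writer);
    // Applying a new entry also moves the id the snapshot will declare.
    writer.lock().unwrap().last_applied = new_log_id;
    let writer_id = writer.lock().unwrap().last_applied;
    let snapshot_id = compactor_view.lock().unwrap().last_applied;
    (writer_id, snapshot_id)
}

/// Detached layout: the new writable level gets its own copy.
fn apply_with_detached(initial: u64, new_log_id: u64) -> (u64, u64) {
    let compactor_view = SysData { last_applied: initial };
    let mut writer = compactor_view.clone();
    writer.last_applied = new_log_id;
    (writer.last_applied, compactor_view.last_applied)
}
```

With the shared layout the snapshot reports the newer id even though it never compacted that entry; with the detached copy it keeps the id it started from.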

* M  src/meta/raft-store/src/sm_v003/compact_immutable_levels_test.rs
* chore: move GetSubTable to separate file

* chore: replace Arc<Mutex<SysData>> with SysData
* chore: add error check on private task test script

* chore: codefmt

* chore: codefmt

* chore: codefmt

* chore: enable private_task_warehouse.sh on ci test private task

* chore: codefmt
…mpatibility (databendlabs#18724)

* fix(query): Set Parquet default encoding to `PLAIN` to ensure data compatibility

* add comments

* fix

* only set encode for decimal column
…#18728)

- Add ANY_VALUE as an alias for the ANY aggregate function to improve compatibility with standard SQL and other database systems
- ANY_VALUE is widely used in BI tools and analytical workloads as shown in the Snowflake paper analyzing 667M queries
- Add comprehensive tests to verify ANY_VALUE works correctly
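As a rough illustration of what an aggregate alias amounts to, the sketch below resolves a function name through an alias table before lookup. The function `resolve_aggregate`, the table contents, and the case-insensitive matching are assumptions for this example, not Databend's function registry.

```rust
use std::collections::HashMap;

/// Map a user-supplied aggregate name to its canonical name.
/// Hypothetical helper; the alias table here holds only the one alias
/// named in the commit message.
fn resolve_aggregate(name: &str) -> String {
    let aliases: HashMap<&str, &str> = HashMap::from([("any_value", "any")]);
    // Assumption: function names are matched case-insensitively.
    let lowered = name.to_lowercase();
    match aliases.get(lowered.as_str()) {
        Some(canonical) => (*canonical).to_string(),
        None => lowered,
    }
}
```

Under this model `ANY_VALUE(x)` and `ANY(x)` reach the same implementation, so the alias needs no aggregate logic of its own.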
…bs#18736)

Add new compaction modules: compact_all, compact_conductor, compact_min_adjacent
* refactor: new setting `max_vacuum_threads`

Add new setting `max_vacuum_threads`, which controls the degree of concurrency during vacuum operations.

* cargo fmt
- Add DropCallback to call a callback when being dropped.

- Remove `CompactingData`, use `LeveledMap` directly.

- Refine `WriterPermit` and `CompactorPermit` logging.

- When building a snapshot, acquire both the writer and
  compactor permits, because the build modifies both the writable and
  the `immutable` data.
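The first item in the list above names a `DropCallback` that calls a callback when dropped. A minimal sketch of that pattern follows; the field layout and the `new` constructor are assumptions, and the real type in the meta crate may differ.

```rust
/// Runs a closure exactly once when the guard goes out of scope.
/// Illustrative only; not the definition used in the PR.
struct DropCallback<F: FnOnce()> {
    callback: Option<F>,
}

impl<F: FnOnce()> DropCallback<F> {
    fn new(callback: F) -> Self {
        Self { callback: Some(callback) }
    }
}

impl<F: FnOnce()> Drop for DropCallback<F> {
    fn drop(&mut self) {
        // `take` leaves None behind, so the closure cannot run twice.
        if let Some(callback) = self.callback.take() {
            callback();
        }
    }
}
```

A guard like this is a common way to attach logging or bookkeeping to the release of a permit, which may be how it relates to the `WriterPermit` and `CompactorPermit` logging mentioned in the same list.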
…databendlabs#18741)

During long-running SQL queries, the system repeatedly logs empty pages
with rows=0 which creates excessive log noise. This change only logs
non-empty pages and final completion status.

Changes:
- Skip logging empty pages (rows=0) during query execution
- Only log when pages contain actual data (rows>0)
- Log final completion status when query ends with empty page
- Preserve all error and cleanup logs for debugging

This significantly reduces log volume while maintaining visibility
into actual data processing and query completion.
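The logging rule described above reduces to a small predicate. The sketch below is an assumption about its shape: `should_log_page` and `count_log_lines` are hypothetical names, and treating the last page as the completion event is a simplification of the query handler.

```rust
/// Decide whether a result page deserves a log line:
/// non-empty pages always do, empty pages only when they end the query.
fn should_log_page(rows: usize, is_final: bool) -> bool {
    rows > 0 || is_final
}

/// Count how many log lines a sequence of page sizes would produce,
/// treating the last page as the completion event.
fn count_log_lines(page_rows: &[usize]) -> usize {
    let last = page_rows.len().saturating_sub(1);
    page_rows
        .iter()
        .enumerate()
        .filter(|&(i, &rows)| should_log_page(rows, i == last))
        .count()
}
```

For a long-running query that polls many empty pages before and after one data page, only the data page and the final page are logged.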
* feat(rbac): procedure object support rbac

* 1. replace p_id with procedure_id
  2. add new function create_id_value_with_cleanup to process ownership kv
  3. modify test

* no need to assert the seq of ownership key

* refactor create_id_value

* refactor cleanup_old_fn

* fix
…cation (databendlabs#18732)

* refactor(query): refactor the join partition to reduce memory amplification
databendlabs#18744)

* fix: fuse_vacuum2 panics while vacuuming an empty table with the data_retention_num_snapshots_to_keep policy

Return early if the table has no snapshot

* revert test config file

* improve(vacuum): enhance vacuum drop table logging for better progress tracking

- Replace verbose TableMeta output with concise table_name(id:table_id) format
- Add clear start/completion markers with === delimiters
- Improve result summary with specific counts of success/failed operations
- Add detailed progress information while preserving all debug data
- Failed table IDs are still logged separately for troubleshooting

* tweak logs

---------

Co-authored-by: BohuTANG <overred.shuttler@gmail.com>
…ndlabs#18749)

- Remove `is_opened` flag from `RaftStore`

- Remove obsolete config no_sync, which is only used by sled tree store

- Add config to MetaRaftLog

- Remove `RaftStoreInner`
@github-actions

Pull request description must contain CLA like the following:

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

## Summary

Summary about this PR

- Close #issue

@bohutang force-pushed the refactor/stream_writer branch from ed36db3 to 7d70667 on September 20, 2025 00:52