@ngngwr (Collaborator) commented Dec 18, 2025

Issues

Rebalance Pipeline Optimisation Efforts

Description

ReadClusterDataStage wastes a lot of time waiting on the actual ZK calls.

Tests

Unit and Integration Tests Added.

…d mode

In distributed controller mode, when a CONTROLLER_PARTICIPANT becomes
the leader for a managed cluster, it creates a new HelixManager with
InstanceType.CONTROLLER. This CONTROLLER instance runs the intensive
ReadClusterDataStage pipeline which benefits significantly from caching.

This change enables ZkCacheBaseDataAccessor for CONTROLLER instances,
caching the following ZK paths:
- /<cluster>/LIVEINSTANCES
- /<cluster>/INSTANCES
- /<cluster>/IDEALSTATES
- /<cluster>/CONFIGS

CONTROLLER_PARTICIPANT instances (which participate in the grand cluster
for leader election only) continue to use the non-cached ZkBaseDataAccessor
since they don't run the ReadClusterDataStage pipeline.
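
The selection rule described above can be sketched without any Helix dependency. This is an illustrative sketch only, not the code in this PR: the class `AccessorSelectionSketch`, its local `InstanceType` enum (a stand-in for Helix's own), and the helpers `cachedPaths` and `useCachedAccessor` are hypothetical names introduced here.

```java
import java.util.List;

public class AccessorSelectionSketch {
  // Hypothetical stand-in for Helix's InstanceType, limited to the two
  // values this description discusses.
  public enum InstanceType { CONTROLLER, CONTROLLER_PARTICIPANT }

  // Builds the four ZK paths the description lists as cached for a cluster.
  public static List<String> cachedPaths(String clusterName) {
    String root = "/" + clusterName;
    return List.of(
        root + "/LIVEINSTANCES",
        root + "/INSTANCES",
        root + "/IDEALSTATES",
        root + "/CONFIGS");
  }

  // Only CONTROLLER runs ReadClusterDataStage, so only it gets the cache.
  // CONTROLLER_PARTICIPANT keeps the non-cached accessor.
  public static boolean useCachedAccessor(InstanceType type) {
    return type == InstanceType.CONTROLLER;
  }

  public static void main(String[] args) {
    for (InstanceType type : InstanceType.values()) {
      String accessor = useCachedAccessor(type)
          ? "ZkCacheBaseDataAccessor caching " + cachedPaths("myCluster")
          : "ZkBaseDataAccessor (no cache)";
      System.out.println(type + " -> " + accessor);
    }
  }
}
```

In the real change the decision would presumably sit where ZKHelixManager creates its data accessor; the sketch only shows which instance type maps to which accessor and which paths would be handed to the cache.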
@ngngwr changed the title from "Ngangwar/cache accessor for controller" to "Added ZKHelixManager to use ZKCache" on Dec 18, 2025
@ngngwr changed the title from "Added ZKHelixManager to use ZKCache" to "Updated ZKHelixManager to use ZKCache" on Dec 18, 2025