Description
Release https://github.com/Altinity/clickhouse-operator/releases/tag/release-0.25.5 added readiness probes back to clickhouse-keeper, but this change prevents cluster formation on fresh setups.
@alex-zaitsev I found one thing. The readiness probe addition works when you upgrade existing installations to a release with the probes, but it fails on fresh installations: the pod keeps failing its readiness check while it tries to resolve the FQDNs of the other hosts in the cluster. I see logs such as the following on a fresh setup with readiness probes:
2025.10.28 09:41:26.967231 [ 65 ] {} <Information> RaftInstance: [ELECTION TIMEOUT] current role: candidate, log last term 8, state term 8, target p 1, my p 1, hb dead, pre-vote NOT done
2025.10.28 09:41:26.967248 [ 65 ] {} <Warning> RaftInstance: total 1 nodes (including this node) responded for pre-vote (term 8, live 0, dead 1), at least 2 nodes should respond. failure count 10
2025.10.28 09:41:26.967262 [ 65 ] {} <Information> RaftInstance: [PRE-VOTE INIT] my id 0, my role candidate, term 8, log idx 716359, log term 8, priority (target 1 / mine 1)
2025.10.28 09:41:26.967270 [ 65 ] {} <Warning> RaftInstance: failed to send prevote request: peer 2 (chk-clickhouse-keeper-chkeeper-chkeeper-0-2:9444) is busy
2025.10.28 09:41:26.967280 [ 65 ] {} <Warning> RaftInstance: failed to send prevote request: peer 1 (chk-clickhouse-keeper-chkeeper-chkeeper-0-1:9444) is busy
2025.10.28 09:41:28.845396 [ 64 ] {} <Warning> RaftInstance: Election timeout, initiate leader election
2025.10.28 09:41:28.845469 [ 64 ] {} <Information> RaftInstance: [PRIORITY] decay, target 1 -> 1, mine 1
2025.10.28 09:41:28.845483 [ 64 ] {} <Information> RaftInstance: [ELECTION TIMEOUT] current role: candidate, log last term 8, state term 8, target p 1, my p 1, hb dead, pre-vote NOT done
2025.10.28 09:41:28.845497 [ 64 ] {} <Warning> RaftInstance: total 1 nodes (including this node) responded for pre-vote (term 8, live 0, dead 1), at least 2 nodes should respond. failure count 11
2025.10.28 09:41:28.845521 [ 64 ] {} <Information> RaftInstance: [PRE-VOTE INIT] my id 0, my role candidate, term 8, log idx 716359, log term 8, priority (target 1 / mine 1)
2025.10.28 09:41:28.845529 [ 64 ] {} <Warning> RaftInstance: failed to send prevote request: peer 2 (chk-clickhouse-keeper-chkeeper-chkeeper-0-2:9444) is busy
2025.10.28 09:41:28.845539 [ 64 ] {} <Warning> RaftInstance: failed to send prevote request: peer 1 (chk-clickhouse-keeper-chkeeper-chkeeper-0-1:9444) is busy
2025.10.28 09:41:29.681296 [ 67 ] {} <Warning> RaftInstance: peer (2) response error: failed to resolve host chk-clickhouse-keeper-chkeeper-chkeeper-0-2 due to error 2, Host not found (non-authoritative), try again later
2025.10.28 09:41:30.015660 [ 66 ] {} <Warning> RaftInstance: Election timeout, initiate leader election
2025.10.28 09:41:30.015747 [ 66 ] {} <Information> RaftInstance: [PRIORITY] decay, target 1 -> 1, mine 1
2025.10.28 09:41:30.015767 [ 66 ] {} <Information> RaftInstance: [ELECTION TIMEOUT] current role: candidate, log last term 8, state term 8, target p 1, my p 1, hb dead, pre-vote NOT done
2025.10.28 09:41:30.015785 [ 66 ] {} <Information> RaftInstance: reset RPC client for peer 2
2025.10.28 09:41:30.015909 [ 66 ] {} <Warning> RaftInstance: total 1 nodes (including this node) responded for pre-vote (term 8, live 0, dead 1), at least 2 nodes should respond. failure count 12
2025.10.28 09:41:30.015929 [ 66 ] {} <Information> RaftInstance: [PRE-VOTE INIT] my id 0, my role candidate, term 8, log idx 716359, log term 8, priority (target 1 / mine 1)
2025.10.28 09:41:30.015948 [ 66 ] {} <Warning> RaftInstance: failed to send prevote request: peer 1 (chk-clickhouse-keeper-chkeeper-chkeeper-0-1:9444) is busy
As a result, the first replica stays stuck and cluster formation never completes. During an upgrade the other hosts already exist, which I think is why cluster formation succeeds in that case.
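To make the suspected circular dependency concrete, here is a minimal Python model. It is a sketch, not the operator's actual logic. It assumes replicas are created one at a time, that each must pass readiness before the next is created, and that readiness effectively requires a Raft quorum of resolvable peers. The names `Replica`, `is_ready`, `rollout`, and the `keeper-N` hostnames are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    exists: bool = False  # pod created and its hostname resolvable


def is_ready(replica, replicas):
    """Assumed readiness rule: ready only when a majority of replicas exist."""
    if not replica.exists:
        return False
    reachable = sum(1 for r in replicas if r.exists)  # includes this replica
    quorum = len(replicas) // 2 + 1
    return reachable >= quorum


def rollout(replicas):
    """Create replicas in order, stopping at the first one that is not ready.

    Returns the names of the replicas processed before the rollout
    either stalled or completed.
    """
    processed = []
    for replica in replicas:
        replica.exists = True  # no-op for replicas that already exist
        processed.append(replica.name)
        if not is_ready(replica, replicas):
            break  # assumed behavior: wait for readiness before continuing
    return processed


names = ["keeper-0", "keeper-1", "keeper-2"]

# Fresh install: nothing exists yet, so the first replica can never see a quorum.
fresh = [Replica(n) for n in names]
print(rollout(fresh))  # ['keeper-0']

# Upgrade: all hosts already exist, so every replica sees a quorum.
existing = [Replica(n, exists=True) for n in names]
print(rollout(existing))  # ['keeper-0', 'keeper-1', 'keeper-2']
```

Under these assumptions the fresh rollout stalls at the first replica with 1 of the 2 required voters, which matches the "total 1 nodes ... at least 2 nodes should respond" warnings in the logs, while the upgrade path completes.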
Related issue: #1846