Keeper readiness probes fail when upgrading setups #1856

Release https://github.com/Altinity/clickhouse-operator/releases/tag/release-0.25.5 added readiness probes back to clickhouse-keeper. However, this change prevents cluster formation on fresh setups.
@alex-zaitsev I found one thing. The readiness probe addition works when you upgrade existing installations to the version with probes, but it fails on fresh installations: the pod itself keeps failing its readiness check while trying to resolve the FQDNs of the other hosts in the cluster. On a fresh setup with readiness probes, I see logs such as the following.
```
2025.10.28 09:41:26.967231 [ 65 ] {} <Information> RaftInstance: [ELECTION TIMEOUT] current role: candidate, log last term 8, state term 8, target p 1, my p 1, hb dead, pre-vote NOT done
2025.10.28 09:41:26.967248 [ 65 ] {} <Warning> RaftInstance: total 1 nodes (including this node) responded for pre-vote (term 8, live 0, dead 1), at least 2 nodes should respond. failure count 10
2025.10.28 09:41:26.967262 [ 65 ] {} <Information> RaftInstance: [PRE-VOTE INIT] my id 0, my role candidate, term 8, log idx 716359, log term 8, priority (target 1 / mine 1)
2025.10.28 09:41:26.967270 [ 65 ] {} <Warning> RaftInstance: failed to send prevote request: peer 2 (chk-clickhouse-keeper-chkeeper-chkeeper-0-2:9444) is busy
2025.10.28 09:41:26.967280 [ 65 ] {} <Warning> RaftInstance: failed to send prevote request: peer 1 (chk-clickhouse-keeper-chkeeper-chkeeper-0-1:9444) is busy
2025.10.28 09:41:28.845396 [ 64 ] {} <Warning> RaftInstance: Election timeout, initiate leader election
2025.10.28 09:41:28.845469 [ 64 ] {} <Information> RaftInstance: [PRIORITY] decay, target 1 -> 1, mine 1
2025.10.28 09:41:28.845483 [ 64 ] {} <Information> RaftInstance: [ELECTION TIMEOUT] current role: candidate, log last term 8, state term 8, target p 1, my p 1, hb dead, pre-vote NOT done
2025.10.28 09:41:28.845497 [ 64 ] {} <Warning> RaftInstance: total 1 nodes (including this node) responded for pre-vote (term 8, live 0, dead 1), at least 2 nodes should respond. failure count 11
2025.10.28 09:41:28.845521 [ 64 ] {} <Information> RaftInstance: [PRE-VOTE INIT] my id 0, my role candidate, term 8, log idx 716359, log term 8, priority (target 1 / mine 1)
2025.10.28 09:41:28.845529 [ 64 ] {} <Warning> RaftInstance: failed to send prevote request: peer 2 (chk-clickhouse-keeper-chkeeper-chkeeper-0-2:9444) is busy
2025.10.28 09:41:28.845539 [ 64 ] {} <Warning> RaftInstance: failed to send prevote request: peer 1 (chk-clickhouse-keeper-chkeeper-chkeeper-0-1:9444) is busy
2025.10.28 09:41:29.681296 [ 67 ] {} <Warning> RaftInstance: peer (2) response error: failed to resolve host chk-clickhouse-keeper-chkeeper-chkeeper-0-2 due to error 2, Host not found (non-authoritative), try again later
2025.10.28 09:41:30.015660 [ 66 ] {} <Warning> RaftInstance: Election timeout, initiate leader election
2025.10.28 09:41:30.015747 [ 66 ] {} <Information> RaftInstance: [PRIORITY] decay, target 1 -> 1, mine 1
2025.10.28 09:41:30.015767 [ 66 ] {} <Information> RaftInstance: [ELECTION TIMEOUT] current role: candidate, log last term 8, state term 8, target p 1, my p 1, hb dead, pre-vote NOT done
2025.10.28 09:41:30.015785 [ 66 ] {} <Information> RaftInstance: reset RPC client for peer 2
2025.10.28 09:41:30.015909 [ 66 ] {} <Warning> RaftInstance: total 1 nodes (including this node) responded for pre-vote (term 8, live 0, dead 1), at least 2 nodes should respond. failure count 12
2025.10.28 09:41:30.015929 [ 66 ] {} <Information> RaftInstance: [PRE-VOTE INIT] my id 0, my role candidate, term 8, log idx 716359, log term 8, priority (target 1 / mine 1)
2025.10.28 09:41:30.015948 [ 66 ] {} <Warning> RaftInstance: failed to send prevote request: peer 1 (chk-clickhouse-keeper-chkeeper-chkeeper-0-1:9444) is busy
```
As a result, the first replica is stuck and the cluster never finishes forming. During an upgrade the hosts already exist, which I think is why cluster formation succeeds in that case.
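To make it easier to confirm the same pattern in other environments, here is a rough Python sketch that summarizes a Keeper log excerpt like the one above: which peer hostnames failed to resolve, which peers were reported busy, and the highest pre-vote failure count seen. The function name `summarize_keeper_log` and the regular expressions are my own, written only against the message formats quoted in this issue, so they are an assumption and may not match other Keeper versions.

```python
import re
from collections import Counter

# Patterns are derived only from the log lines quoted in this issue (assumption:
# other Keeper/NuRaft versions may word these messages differently).
RESOLVE_RE = re.compile(r"failed to resolve host (\S+) due to error")
BUSY_RE = re.compile(r"failed to send prevote request: peer (\d+) \((\S+?):(\d+)\) is busy")
FAILCOUNT_RE = re.compile(r"failure count (\d+)")


def summarize_keeper_log(lines):
    """Hypothetical helper: count DNS failures and busy peers in Keeper log lines."""
    unresolved = Counter()  # hostname -> number of "failed to resolve host" warnings
    busy = Counter()        # hostname -> number of "peer ... is busy" warnings
    max_failures = 0        # highest pre-vote "failure count" observed
    for line in lines:
        m = RESOLVE_RE.search(line)
        if m:
            unresolved[m.group(1)] += 1
            continue
        m = BUSY_RE.search(line)
        if m:
            busy[m.group(2)] += 1
            continue
        m = FAILCOUNT_RE.search(line)
        if m:
            max_failures = max(max_failures, int(m.group(1)))
    return {
        "unresolved": dict(unresolved),
        "busy": dict(busy),
        "max_prevote_failures": max_failures,
    }


if __name__ == "__main__":
    import sys

    # Usage: pipe a saved Keeper log into this script on stdin.
    print(summarize_keeper_log(sys.stdin))
```

If the summary shows peers under `unresolved` on a fresh install but not after an upgrade, that would be consistent with the hostname-resolution explanation above, though it does not prove it.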
Related issue: https://github.com/Altinity/clickhouse-operator/issues/1846
