[Suggestion] Tools to check replica freshness during rolling restarts #15098
Replies: 3 comments 1 reply
We had exactly the same wait-check in mind during rolling restarts (we only thought of quorum queues, but the same obviously applies to streams). What we planned to do is first take note of the last log index of the leader of each QQ, and then wait until all the followers reach that log index. Do I remember correctly that this kind of waiting is already done when adding new members, since they stay in the promotable membership state until they are synced? @Ayanda-D contributed multiple CLI commands and improvements around QQ safety recently; I'm sure he is interested in this topic too.
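The wait-check described above could be sketched roughly as follows. This is only an illustration in Python: `get_leader_index` and `get_follower_indices` are hypothetical callbacks, since RabbitMQ does not expose such an API today.

```python
import time

def wait_for_followers(get_leader_index, get_follower_indices,
                       poll_interval=1.0, timeout=300.0):
    """Block until every follower has applied the leader's last log index.

    get_leader_index / get_follower_indices are hypothetical callbacks
    standing in for whatever mechanism exposes Raft log indices.
    """
    target = get_leader_index()  # snapshot the leader's tail once, up front
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # All followers caught up to the snapshotted index: safe to proceed.
        if all(idx >= target for idx in get_follower_indices()):
            return True
        time.sleep(poll_interval)
    return False  # timed out before followers caught up
```

Snapshotting the leader index once (rather than chasing its moving tail) is what makes the wait terminate under sustained publishing: followers only need to reach the index the leader had when the check started.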
There is some prior art for this in rabbitmq-server/deps/rabbit/src/rabbit_stream_coordinator.erl, lines 209 to 221 at 765d2c5. The stream coordinator asks the writer for its replica information and compares the tail of the writer's log to the writer's knowledge of the replicas' last replicated chunks. If the max diff in the timestamps is too large (a replica is behind on replication), then the stream coordinator refuses to add a new member.
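The gate described above amounts to something like the following sketch. The function and parameter names, and the threshold value, are illustrative assumptions, not the actual values or API in rabbit_stream_coordinator.erl.

```python
def safe_to_add_member(writer_tail_ts_ms, replica_chunk_ts_ms,
                       max_diff_ms=5_000):
    """Freshness gate sketch: compare the timestamp of the writer's last
    chunk against each replica's last replicated chunk, and refuse when
    any replica lags too far behind.

    writer_tail_ts_ms: timestamp of the chunk at the writer's log tail.
    replica_chunk_ts_ms: timestamps of each replica's last replicated chunk.
    max_diff_ms: illustrative threshold, not RabbitMQ's actual value.
    """
    worst_lag = max(writer_tail_ts_ms - ts for ts in replica_chunk_ts_ms)
    return worst_lag <= max_diff_ms
```

Using chunk timestamps rather than raw offsets has the nice property of expressing lag in wall-clock terms, which is easier for operators to set a threshold against.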
A similar command sounds good, but I'm curious how it's going to work in the most painful scenario for QQs: when there are many thousands of them. Thousands of cluster-wide calls are operationally problematic beyond a certain scale, so any solution we pick should be node-local to the extent possible, even if it won't be perfectly precise.
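A node-local variant might have each node inspect only its own replicas, comparing each local member's last applied index against the commit index it has most recently heard from its leader, with no cluster-wide calls. The shape below is purely illustrative; the data structure and names are assumptions, not an existing RabbitMQ API.

```python
def node_local_lag_report(local_members, threshold):
    """Hypothetical node-local freshness check.

    local_members: mapping of queue name -> (last_applied_index,
        known_commit_index) for replicas hosted on THIS node only.
    threshold: max acceptable lag in log entries.

    Returns the queues whose local replica is behind by more than
    `threshold` entries. Imprecise (the known commit index may be
    stale), but requires no cross-node calls.
    """
    return [name
            for name, (applied, commit) in local_members.items()
            if commit - applied > threshold]
```

The trade-off matches the comment above: the commit index a follower knows about can lag the leader's true tail, so the report errs on the optimistic side, but it scales to thousands of queues because each node only reads its own state.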
RabbitMQ series
4.2.x
Operating system (distribution) used
Linux
How is RabbitMQ deployed?
Other
What would you like to suggest for a future version of RabbitMQ?
During a rolling restart, QQs and streams can lose availability if one restart takes a long time and then another member restarts too quickly. When publishing at high throughput to a stream, for example, a node which is restarting will come back up behind on replication. If one of the other two nodes goes offline too quickly then the stream won't be able to make progress (moving the commit offset forward) until the first node catches up on replication. This can happen for QQs too but it is less likely since QQs can't accept data at the same very high throughput, and under normal usage they would take snapshots that would reduce the amount of data to replicate.
rabbitmq-queues check_if_node_is_quorum_critical does not catch these situations since it only checks membership. The restarted node which is behind on replication still counts towards a quorum membership-wise, but it won't help make progress until it catches up. It would be useful to have a similar command (or to extend check_if_node_is_quorum_critical) which would list queues where a replica is far enough behind that stopping another replica would halt progress. Then automation performing rolling restarts could wait to restart other nodes until replicas are close enough to their leaders. Ideally this "freshness" metric could be quantified, so that automation could distinguish between positive progress and replication becoming stuck for some reason (for example, a partition).
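Automation consuming a quantified freshness metric could distinguish steady catch-up from stuck replication along these lines. This is a sketch under stated assumptions: `get_lag` stands in for the proposed command's output and does not exist in RabbitMQ today.

```python
import time

def wait_until_fresh(get_lag, max_lag, stall_timeout=60.0, poll=5.0):
    """Wait until replica lag drops below `max_lag`, but abort if lag
    stops shrinking for `stall_timeout` seconds (e.g. a partition has
    stalled replication).

    get_lag: hypothetical callback returning the worst replica lag,
        standing in for the freshness command proposed above.
    """
    best = float("inf")
    last_progress = time.monotonic()
    while True:
        lag = get_lag()
        if lag <= max_lag:
            return True   # fresh enough: safe to restart the next node
        if lag < best:
            best = lag    # still making progress; reset the stall clock
            last_progress = time.monotonic()
        elif time.monotonic() - last_progress > stall_timeout:
            return False  # replication looks stuck: stop the rollout
        time.sleep(poll)
```

The key point is the two exits: "close enough" lets the rolling restart continue, while "no progress for a while" halts it instead of restarting another node on top of a stuck replica.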