Improve/fix Block server retry logic [JIRA: RCS-235] #1183

@shino

Handling of insufficient_vnodes in local get.

If a local get fails with an insufficient_vnodes error, the block server
retries through RetryFun, which is a wrapper around do_get_block/11.
The block is then fetched by try_local_get, i.e. with n_val=1.

{error, {insufficient_vnodes,_,need,_}} ->
    RetryFun(NumRetries + 2);

Unused condition

{error, {insufficient_vnodes,_,need,_} = Reason} ->
    RetryFun([{local_one, Reason}|ErrorReasons]);
{error, Why} when Why == notfound;
                  Why == timeout;
                  Why == disconnected;
                  Why == <<"{insufficient_vnodes,0,need,1}">>;
                  Why == {insufficient_vnodes,0,need,1} ->

The Why == {insufficient_vnodes,0,need,1} alternative in the second
clause is unreachable, because that term is already matched by the
first clause, which is broader.

{error, {insufficient_vnodes,_,need,_}} ->
    RetryFun(NumRetries + 2);
{error, Why} when Why == notfound;
                  Why == timeout;
                  Why == disconnected;
                  Why == <<"{insufficient_vnodes,0,need,1}">>;
                  Why == {insufficient_vnodes,0,need,1} ->

Side note: this may be a Dialyzer issue (or just a limitation?)
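
The clause ordering can be checked in isolation with a minimal sketch. The module name, the function classify/1 and the returned atoms are hypothetical, introduced only for illustration; only the patterns and guards are copied from the snippet above.

```erlang
-module(clause_order_sketch).
-export([classify/1]).

%% Hypothetical stand-in for the case expression quoted in the issue.
%% The returned atoms only report which clause was selected.
classify({error, {insufficient_vnodes, _, need, _}}) ->
    first_clause;
classify({error, Why}) when Why == notfound;
                            Why == timeout;
                            Why == disconnected;
                            Why == <<"{insufficient_vnodes,0,need,1}">>;
                            Why == {insufficient_vnodes, 0, need, 1} ->
    %% The last guard alternative can never be the one that succeeds:
    %% any such tuple has already been taken by the clause above.
    second_clause;
classify(_Other) ->
    no_match.
```

With this sketch, classify({error, {insufficient_vnodes,0,need,1}}) selects the first clause, while the binary form <<"{insufficient_vnodes,0,need,1}">> still reaches the second one, so only the tuple alternative is dead.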

Contemplate N values in retries on timeout

Assuming PG (proxy get) is disabled or not used, the current implementation
alternates n_val across retries: N=1 => N=all => N=1 => N=all => ....

It is not immediately obvious which is better:

  1. the current alternating n_val, or
  2. N=1 at first and N=all for each subsequent request.
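
To make the two options concrete, here is a minimal sketch of the n_val schedule each one produces. The module name, both function names, the atoms one and all, and the 0-based attempt counter are assumptions made for illustration; this is not how riak_cs_block_server tracks retries.

```erlang
-module(n_val_schedule_sketch).
-export([alternating/1, one_then_all/1]).

%% Attempt is 0-based: 0 is the first request, 1 the first retry, etc.

%% Option 1: alternate N=1 and N=all, as the issue describes the
%% current behaviour.
alternating(Attempt) when is_integer(Attempt), Attempt >= 0 ->
    case Attempt rem 2 of
        0 -> one;
        1 -> all
    end.

%% Option 2: N=1 for the first request only, N=all for every retry.
one_then_all(0) ->
    one;
one_then_all(Attempt) when is_integer(Attempt), Attempt > 0 ->
    all.
```

For attempts 0 to 3, alternating/1 yields one, all, one, all, while one_then_all/1 yields one, all, all, all. The second schedule never goes back to a single-replica read once the first attempt has failed.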

Are retries after proxy get notfound necessary?

When proxy get is triggered, the remote cluster is the origin cluster.
Might notfound from the origin indicate data loss, or a cluster-wide
failure at the origin? 🃏

https://github.com/basho/riak_cs/blob/develop/src/riak_cs_block_server.erl#L313-L318


Some of the above findings are based on #1177 (h/t @kuenishi)
