@mrForza mrForza commented Dec 15, 2025

This patch adds a new way to execute map_callrw: the common arguments are passed to all storages, while the buckets' arguments are split among the storages that own at least one bucket from bucket_ids. To support this, the map_callrw API gets a new string option, mode.

@TarantoolBot document

Title: vshard: mode option for router.map_callrw()

This string option controls which storages execute the user function via map_callrw. Possible values:

  1. full - the user function is executed on all storages in the
    cluster.
  2. partial - the user function is executed only on the storages
    that own at least one bucket from bucket_ids.

After these changes map_callrw works in 4 different scenarios, depending on the mode and bucket_ids options:

  1. <mode = 'full', bucket_ids = nil> - the user function is executed
    with args on all storages.
  2. <mode = 'full', bucket_ids = {[bid_1] = {b_arg_1}, ...}> - storages
    that own at least one bucket from bucket_ids execute the user
    function with args plus the arguments of their buckets. The other
    storages execute the user function with args only.
  3. <mode = 'partial', bucket_ids = {bid_1, ...}> - the user function is
    executed with args on the storages that own at least one bucket from
    bucket_ids.
  4. <mode = 'partial', bucket_ids = {[bid_1] = {b_arg_1}, ...}> - the same
    as the 3rd scenario, but the buckets' arguments (b_arg_1, ..., b_arg_n)
    are passed to the user function as additional arguments.

map_callrw now also fails with an error for <mode = 'full', bucket_ids = {1, 2, ...}> and <mode = 'partial', bucket_ids = nil>.
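
The four scenarios above can be modelled with a small planner. This is only a sketch of the described semantics, not vshard's implementation: `plan_calls`, the `owners` table (bucket id to replicaset id) and the string replicaset ids are hypothetical, and the array and map forms of `bucket_ids` are told apart here by whether the values are tables, which is an assumption.

```lua
-- Hypothetical model of which storage gets which arguments.
-- replicasets: array of all replicaset ids in the cluster.
-- owners: bucket id -> replicaset id, assumed to be up to date.
local function plan_calls(replicasets, owners, args, opts)
    local plan = {}
    if opts.mode == 'full' then
        -- Full mode: every storage runs the function with the common args.
        for _, rs_id in ipairs(replicasets) do
            plan[rs_id] = {args = args}
        end
    end
    for key, value in pairs(opts.bucket_ids or {}) do
        -- Array form: the value is a bucket id.
        -- Map form: the key is a bucket id, the value is its arguments.
        local is_map = type(value) == 'table'
        local bucket_id = is_map and key or value
        local rs_id = assert(owners[bucket_id], 'unknown bucket')
        local call = plan[rs_id] or {args = args}
        plan[rs_id] = call
        if is_map then
            call.bucket_args = call.bucket_args or {}
            call.bucket_args[bucket_id] = value
        end
    end
    return plan
end

local replicasets = {'rs1', 'rs2', 'rs3'}
local owners = {[1] = 'rs1', [2] = 'rs1', [3] = 'rs2'}

-- Scenario 2: all storages are called, rs1 and rs2 also get bucket args.
local full_plan = plan_calls(replicasets, owners, {'common'},
    {mode = 'full', bucket_ids = {[1] = {'a'}, [3] = {'b'}}})

-- Scenario 3: only the owner of buckets 1 and 2 is called.
local partial_plan = plan_calls(replicasets, owners, {'common'},
    {mode = 'partial', bucket_ids = {1, 2}})
```

In this model rs3 appears in `full_plan` without `bucket_args`, and `partial_plan` contains only rs1.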

Closes #559

@Gerold103 Gerold103 left a comment

Thanks 🔥! I sure hope it will help the customers 😁.

@Gerold103

The commit message doesn't mention the ticket. You need to write Closes #559 right before the TarantoolBot doc request.

@Gerold103 Gerold103 assigned mrForza and unassigned Gerold103 Dec 18, 2025
@mrForza mrForza force-pushed the mrforza/gh-559-full-map-call-rw-with-split-args branch from 1f4e9b4 to 11e76dc Compare December 22, 2025 12:37
@mrForza mrForza assigned Serpentian and unassigned Serpentian and mrForza Dec 23, 2025
@Serpentian Serpentian left a comment

Thank you for the patch! It looks great, but I'm afraid it's a little bit broken now. Let's fix it and test the new mode with a broken cache on the router.

@Serpentian Serpentian assigned mrForza and unassigned Serpentian Dec 23, 2025
@mrForza mrForza force-pushed the mrforza/gh-559-full-map-call-rw-with-split-args branch 2 times, most recently from 1fa423e to afbe43e Compare December 26, 2025 10:41
@mrForza mrForza assigned Serpentian and unassigned mrForza Dec 26, 2025
@mrForza mrForza requested a review from Serpentian December 26, 2025 11:04
@Serpentian Serpentian assigned mrForza and unassigned Serpentian Dec 29, 2025
@mrForza mrForza force-pushed the mrforza/gh-559-full-map-call-rw-with-split-args branch 3 times, most recently from 4b2f73b to 2b8e8f1 Compare January 15, 2026 07:29
@mrForza mrForza assigned Serpentian and unassigned mrForza Jan 15, 2026
@mrForza mrForza requested a review from Serpentian January 15, 2026 07:42
@Serpentian Serpentian left a comment

Thank you for the patchset. Very good! These seem to be the last comments from me.

@Serpentian Serpentian assigned mrForza and unassigned Serpentian Jan 15, 2026
@mrForza mrForza force-pushed the mrforza/gh-559-full-map-call-rw-with-split-args branch from 2b8e8f1 to de86696 Compare January 16, 2026 08:48
@mrForza mrForza assigned Serpentian and unassigned mrForza Jan 16, 2026
@mrForza mrForza requested a review from Serpentian January 16, 2026 10:33
@Serpentian Serpentian left a comment

Thank you a lot! Clean work, IMHO. Hope it'll help TDG make requests safer.

@Serpentian Serpentian assigned Gerold103 and mrForza and unassigned Serpentian Jan 16, 2026
This patch moves the initialization of `rid` out to `router_map_callrw`
and passes this variable to the ref-functions. It keeps future features
tidy, for example `make full map_callrw with split args`, in which the
logic of `router_map_callrw` becomes more complex.

Needed for tarantool#559

NO_DOC=refactoring
NO_TEST=refactoring
This patch adds a new way to execute `map_callrw`: the common arguments
are passed to all storages, while the buckets' arguments are split among
the storages that own at least one bucket from `bucket_ids`. To support
this, the `map_callrw` API gets a new string option, `mode`.

We also change the logic of the ref stages in `map_callrw`. Now we first
ref the storages that own at least one bucket from `bucket_ids` with the
`router_ref_storage_by_buckets` function. This covers the
`partial map_callrw` scenarios and a part of `full map_callrw` with split
args. After that, if the mode is `full`, we ref the remaining storages
with the `router_ref_remaining_storages` function.

Closes tarantool#559

@TarantoolBot document

Title: vshard: `mode` option for `router.map_callrw()`

This string option controls which storages execute the user function
via `map_callrw`. Possible values:
1) mode = 'partial'. The user function is executed on the storages that
   own at least one bucket from 'bucket_ids'. The 'bucket_ids' option
   can be given in two forms: as a numeric array of bucket ids or as a
   map from bucket id to that bucket's arguments. In the first form the
   user function receives only args; in the second it additionally
   receives the buckets' arguments.
2) mode = 'full'. The user function is executed with args on all
   storages in the cluster. If 'bucket_ids' is passed as a map of
   buckets' arguments, the user function additionally receives the
   buckets' arguments on the storages that own at least one bucket from
   'bucket_ids'.

If the 'mode' option is not specified, it is derived from the
'bucket_ids' option: 'partial' if 'bucket_ids' is present, 'full'
otherwise. `map_callrw` now also fails with an error for
`<mode = 'full', bucket_ids = {1, 2, ...}>` and
`<mode = 'partial', bucket_ids = nil>`.
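
The defaulting and validation rules from this doc request could be sketched as follows. `resolve_mode` and its error strings are hypothetical, the function returns `nil, err` instead of raising (the patch may report errors differently), and a plain array of ids is detected by the presence of a non-table value, which is an assumption.

```lua
-- Hypothetical helper: derive the effective mode and reject the two
-- invalid combinations named in the doc request.
local function resolve_mode(opts)
    opts = opts or {}
    local bucket_ids = opts.bucket_ids
    local mode = opts.mode
    if mode == nil then
        -- No explicit mode: 'partial' when bucket_ids is given, else 'full'.
        mode = bucket_ids ~= nil and 'partial' or 'full'
    end
    if mode ~= 'full' and mode ~= 'partial' then
        return nil, 'unknown mode'
    end
    if mode == 'partial' and bucket_ids == nil then
        return nil, "mode 'partial' requires bucket_ids"
    end
    if mode == 'full' and bucket_ids ~= nil then
        -- In full mode only the map form {[bid] = {args}} is accepted.
        for _, value in pairs(bucket_ids) do
            if type(value) ~= 'table' then
                return nil, "mode 'full' requires bucket_ids to be a map"
            end
        end
    end
    return mode
end
```

For example, `resolve_mode({bucket_ids = {1, 2}})` yields 'partial' in this sketch, while `resolve_mode({mode = 'full', bucket_ids = {1, 2}})` yields `nil` and an error string.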
@mrForza mrForza force-pushed the mrforza/gh-559-full-map-call-rw-with-split-args branch from de86696 to edfeebc Compare January 19, 2026 09:21
@mrForza mrForza requested a review from Gerold103 January 19, 2026 10:04
@mrForza mrForza assigned Gerold103 and unassigned Gerold103 and mrForza Jan 19, 2026
Comment on lines 777 to 779
-local function router_ref_storage_all(router, timeout, rid)
+local function router_ref_storage_all(router, timeout, refed_replicasets, rid)
Collaborator

I don't understand. Why do you need "refed replicasets"? The stages are Ref + Map.

Ref works in 2 ways:

  • Ref all storages regardless of buckets. This was done previously by router_ref_storage_all() without any replicasets referenced in advance.
  • Or ref specific storages which have at least one bucket. For that we had router_ref_storage_by_buckets(). It was already doing all the checks to make sure that the referenced storage has the needed buckets.

This patch was only supposed to add additional arguments to these already existing ways of execution. Not change the way of referencing. It was previously "ref all or ref by buckets" and remains like this.

What changed here? What am I missing?

@mrForza mrForza Jan 21, 2026

> Why do you need "refed replicasets"?

We need refed_replicasets in router_ref_storage_all to check that the storages referenced by router_ref_storage_by_buckets still hold their refs (in other words, that the refs have not expired). Also, if we simply called router_ref_storage_by_buckets and then router_ref_storage_all without refed_replicasets, we would have to complicate the logic of these functions to avoid double refs on the previously refed storages. See Nikita's comment related to this problem.
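
A rough illustration of why the set of already refed replicasets matters. `split_by_refed` is a hypothetical helper and the shape of `refed_replicasets` (a set keyed by replicaset id) is an assumption; the sketch only separates the storages whose existing ref should be re-checked from the ones that still need a new ref.

```lua
-- Hypothetical helper: avoid a second ref on storages that were already
-- refed by the by-buckets stage.
local function split_by_refed(replicasets, refed_replicasets)
    local to_recheck, to_ref = {}, {}
    for _, rs_id in ipairs(replicasets) do
        if refed_replicasets[rs_id] then
            -- Already refed: only verify that the ref has not expired.
            table.insert(to_recheck, rs_id)
        else
            -- Not refed yet: create a new ref.
            table.insert(to_ref, rs_id)
        end
    end
    return to_recheck, to_ref
end

local to_recheck, to_ref =
    split_by_refed({'rs1', 'rs2', 'rs3'}, {rs1 = true})
```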


> What changed here? What am I missing?

To show what has changed, I'll draw a table of map_callrw scenarios:

| mode | bucket_ids | ref-functions which we use |
|---|---|---|
| partial | {bid_1, ..., bid_n} | router_ref_storage_by_buckets |
| partial | {bid_1 = {bargs_1}, ..., bid_n = {bargs_n}} | router_ref_storage_by_buckets |
| full | nil | router_ref_storage_all |
| full | {bid_1 = {bargs_1}, ..., bid_n = {bargs_n}} | router_ref_storage_by_buckets + router_ref_storage_all |

Our main goal is to implement full map_callrw with split args at minimal cost. We don't need to introduce new ref-functions or new ways of referencing for it. We only want to minimally modify the existing ref-functions and combine them. Full map_callrw with split args needs 2 important things:

  1. ref all storages
  2. build a table of bucket arguments

Minimally modified, router_ref_storage_by_buckets + router_ref_storage_all give us both. Also, as you can see, map_callrw scenarios 1-3 keep using the old ref-functions. In my opinion, this patch implements full map_callrw with split args without seriously complicating the map_callrw business logic.
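
The scenario table in this comment reduces to a simple rule, sketched below. `ref_stages` is a hypothetical helper that only returns the names of the ref-functions mentioned in this thread; it does not call them.

```lua
-- Hypothetical helper: list the ref stages for a mode/bucket_ids pair.
local function ref_stages(mode, bucket_ids)
    local stages = {}
    if bucket_ids ~= nil then
        -- Any bucket_ids, array or map, needs the by-buckets stage first.
        table.insert(stages, 'router_ref_storage_by_buckets')
    end
    if mode == 'full' then
        -- Full mode then refs every storage.
        table.insert(stages, 'router_ref_storage_all')
    end
    return stages
end

local partial_stages = ref_stages('partial', {1, 2})
local full_stages = ref_stages('full', nil)
local full_split_stages = ref_stages('full', {[1] = {'a'}})
```

Only the last case, full mode with split args, combines both stages.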

Comment on lines +1210 to 1216
    if mode == MAP_CALLRW_FULL then
        timeout, err, err_id, replicasets_to_map =
            router_ref_storage_all(router, timeout, replicasets_to_map, rid)
        if not timeout then
            goto fail
        end
    end
Collaborator

So now in the full mode with buckets we call router_ref_storage_by_buckets() and then router_ref_storage_all()? Why not call just router_ref_storage_all()? The effect would be the same - all storages receiving a ref.

Contributor Author

No, in full map_callrw with split args we must invoke router_ref_storage_by_buckets. Without it, router_group_map_callrw_args would build an incorrect table of the form {rs_id = {b_id = {b_args}}}, because the router's cache may be broken. router_ref_storage_by_buckets finds the buckets in the cluster, updates the router's cache (router.route_map) and refs the storages that own the given bucket ids, whereas router_ref_storage_all doesn't do that. See Nikita's comment related to this problem.
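
To make the stale-cache problem concrete, here is a simplified grouping sketch. `group_bucket_args` is a hypothetical stand-in for the grouping step, and the route map is reduced to a plain bucket id to replicaset id table, which is an assumption about its shape.

```lua
-- Hypothetical grouping: build {rs_id = {b_id = {b_args}}} from a route map.
local function group_bucket_args(route_map, bucket_args)
    local grouped = {}
    for bucket_id, b_args in pairs(bucket_args) do
        local rs_id = route_map[bucket_id]
        if rs_id ~= nil then
            grouped[rs_id] = grouped[rs_id] or {}
            grouped[rs_id][bucket_id] = b_args
        end
    end
    return grouped
end

local bucket_args = {[1] = {'a'}, [2] = {'b'}}
-- The stale cache still places bucket 2 on rs1, but it has moved to rs2.
local stale_map = {[1] = 'rs1', [2] = 'rs1'}
local fresh_map = {[1] = 'rs1', [2] = 'rs2'}

local by_stale = group_bucket_args(stale_map, bucket_args)
local by_fresh = group_bucket_args(fresh_map, bucket_args)
```

With the stale map the arguments of bucket 2 go to rs1, which no longer owns it; with the refreshed map they go to rs2.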

Collaborator

The possibility of the router's cache being broken is quite low. We just introduced one more network hop just for that.

Would it work to perhaps pass grouped bucket IDs to replicasets as arguments of router_ref_storage_all()? Like make this one call ref_storage_with buckets for storages where we expect specific buckets, and just "ref" for storages where we don't expect any specific buckets.

At least then we wouldn't have to do 2 network hops to each replicaset just to build/check the refs. Building refs must be possible with a single hop.
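
One possible reading of this single-hop suggestion, as a sketch only: build exactly one ref request per replicaset from the grouped bucket ids. `build_ref_requests` and the `kind` values are hypothetical names, and nothing here reflects code that exists in the patch.

```lua
-- Hypothetical sketch: one request per replicaset, so each storage is
-- contacted once during the ref stage.
local function build_ref_requests(replicasets, grouped)
    local requests = {}
    for _, rs_id in ipairs(replicasets) do
        local bucket_ids = {}
        for bucket_id in pairs(grouped[rs_id] or {}) do
            table.insert(bucket_ids, bucket_id)
        end
        table.sort(bucket_ids)
        if #bucket_ids > 0 then
            -- Buckets are expected here: ref and check them in one call.
            requests[rs_id] = {kind = 'ref_with_buckets',
                               bucket_ids = bucket_ids}
        else
            -- No specific buckets are expected: a plain ref is enough.
            requests[rs_id] = {kind = 'ref'}
        end
    end
    return requests
end

local requests = build_ref_requests({'rs1', 'rs2', 'rs3'}, {
    rs1 = {[1] = {'a'}, [2] = {'b'}},
    rs2 = {[3] = {'c'}},
})
```

A storage that reports a missing bucket would still require a fallback, such as rediscovering the bucket and retrying; that part is not modelled here.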

Development

Successfully merging this pull request may close these issues.

Full map_callrw with split args