Parallel implementation of the (batch-based) bind join algorithm by hartig · Pull Request #499 · LiUSemWeb/HeFQUIN

hartig · 2026-02-10T11:45:32Z

Our current implementation performs all bind-join requests sequentially. This PR introduces an implementation that can perform them all in parallel.

More specifically, the new implementation issues the bind-join requests without blocking, handling the processing of their responses in parallel (in the threads that the federation access manager uses to perform the requests).

The algorithm works as follows: For every sequence of solution mappings from the input, the algorithm splits this sequence into batches, each of which is then used for a separate bind-join request. Each batch is associated with the sub-multiset of the input solution mappings that it covers, whereas the batch itself consists of versions of these input solution mappings that are already restricted to the join variables (and that contain no blank nodes, see below). Hence, while the number of such restricted solution mappings per batch is fixed (see the batchSize argument of the constructor), the sub-multiset of input solution mappings associated with a batch may be larger than the batch size, because several input solution mappings may have the same restriction to the join variables and such a restriction needs to be included in the batch only once.
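To make the relation between a batch and the input solution mappings it covers more concrete, here is a minimal Java sketch. It is not HeFQUIN's code: solution mappings are modeled as plain Map<String, String> objects (variable name to RDF term), and BatchSketch, Batch, split, and restrict are hypothetical names. Blank-node filtering is omitted, and, unlike the operator described above, the sketch simply returns the last, possibly non-full batch as well.

```java
import java.util.*;

/**
 * Hypothetical sketch (not HeFQUIN's API): solution mappings are modeled as
 * Map<String, String> from variable names to RDF terms.
 */
class BatchSketch {

    /** A batch: distinct restricted mappings plus all input mappings it covers. */
    record Batch(Set<Map<String, String>> restricted,
                 List<Map<String, String>> covered) {}

    /** Restricts a solution mapping to the given join variables. */
    static Map<String, String> restrict(Map<String, String> sm, Set<String> joinVars) {
        Map<String, String> r = new HashMap<>();
        for (String v : joinVars) {
            if (sm.containsKey(v)) r.put(v, sm.get(v));
        }
        return r;
    }

    /**
     * Splits the inputs into batches of at most batchSize distinct restricted
     * mappings; each batch also records every input mapping it covers.
     */
    static List<Batch> split(List<Map<String, String>> inputs,
                             Set<String> joinVars, int batchSize) {
        List<Batch> batches = new ArrayList<>();
        Batch current = new Batch(new LinkedHashSet<>(), new ArrayList<>());
        for (Map<String, String> sm : inputs) {
            Map<String, String> r = restrict(sm, joinVars);
            // Close the current batch only if r is new to it and it is already full.
            if (!current.restricted().contains(r)
                    && current.restricted().size() == batchSize) {
                batches.add(current);
                current = new Batch(new LinkedHashSet<>(), new ArrayList<>());
            }
            current.restricted().add(r);   // a duplicate restriction is stored once
            current.covered().add(sm);      // but every covered input is remembered
        }
        if (!current.covered().isEmpty()) batches.add(current);
        return batches;
    }
}
```

For example, with batchSize 2 and join variable x, the inputs {x:a, y:1}, {x:a, y:2}, {x:b, y:3}, {x:c, y:4} yield two batches; the first contains two restricted mappings ({x:a} and {x:b}) but covers three input mappings.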

After splitting the current sequence of input solution mappings into batches, the last batch may not be full, in which case it is kept and will be populated further once the next sequence of input solution mappings is passed to the operator. The full batches are used to create bind-join requests, one per batch. The response to such a request is the subset of the solutions for the query/pattern of this operator that are join partners for at least one of the solutions that were used for creating the request.
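The carry-over of a non-full batch from one input sequence to the next could look roughly like the following hypothetical Java sketch (BatchAccumulator and its methods are invented for illustration). The items stand for the already-restricted solution mappings, and deduplication is left out. The sketch assumes that, once the child operator is exhausted, the remaining partial batch is still used for a final request.

```java
import java.util.*;

/**
 * Hypothetical sketch: keeps a non-full batch across input sequences.
 * Items stand for the (already restricted) solution mappings.
 */
class BatchAccumulator<T> {
    private final int batchSize;
    private List<T> pending = new ArrayList<>();

    BatchAccumulator(int batchSize) {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        this.batchSize = batchSize;
    }

    /** Adds the next input sequence and returns the batches that became full. */
    List<List<T>> add(List<T> inputs) {
        List<List<T>> full = new ArrayList<>();
        for (T item : inputs) {
            pending.add(item);
            if (pending.size() == batchSize) {
                full.add(pending);          // one bind-join request per full batch
                pending = new ArrayList<>();
            }
        }
        return full;                        // a partial batch stays in 'pending'
    }

    /** Called when the child has no more input; returns the last partial batch, if any. */
    Optional<List<T>> finish() {
        if (pending.isEmpty()) return Optional.empty();
        List<T> last = pending;
        pending = new ArrayList<>();
        return Optional.of(last);
    }
}
```

With batchSize 3, adding [1, 2, 3, 4, 5] yields the full batch [1, 2, 3] and keeps [4, 5]; adding [6, 7] next yields [4, 5, 6] and keeps [7], which finish() then returns.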

Each of the requests is issued using the asynchronous functionality of the federation access manager, which results in a CompletableFuture. The algorithm connects this future to an internal response processor to process the response once it arrives (joining the solution mappings from the response with the solution mappings covered by the corresponding batch). All these futures are collected so that the algorithm can wait for their completion after the child operator has stopped producing input for this operator.
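The following Java sketch illustrates this non-blocking pattern with the standard CompletableFuture API: each request future is chained via thenAccept to a response-processing step, and CompletableFuture.allOf is used to wait for all of them at the end. The federation access manager is replaced by a hypothetical issueRequestAsync method running on an ExecutorService, and the "join" is reduced to a trivial filter, so this only shows the control flow, not HeFQUIN's actual classes.

```java
import java.util.*;
import java.util.concurrent.*;

/**
 * Hypothetical sketch: requests are issued without blocking, each future is
 * chained to a response processor, and all futures are awaited at the end.
 */
class ParallelRequestsSketch {

    /** Stand-in for an asynchronous request; returns the values that match the batch. */
    static CompletableFuture<List<Integer>> issueRequestAsync(List<Integer> batch,
                                                              ExecutorService pool) {
        // Pretend the federation member holds exactly the even numbers.
        return CompletableFuture.supplyAsync(
            () -> batch.stream().filter(v -> v % 2 == 0).toList(), pool);
    }

    static List<String> run(List<List<Integer>> batches) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // Written to concurrently by the response processors, hence thread-safe.
        Queue<String> output = new ConcurrentLinkedQueue<>();
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        try {
            for (List<Integer> batch : batches) {
                // Issue the request without blocking; process the response when it arrives.
                CompletableFuture<Void> f = issueRequestAsync(batch, pool)
                    .thenAccept(response -> {
                        for (Integer v : response) output.add("joined:" + v);
                    });
                futures.add(f);
            }
            // Once the input is exhausted, wait for all outstanding requests.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0])).join();
        } finally {
            pool.shutdown();
        }
        List<String> result = new ArrayList<>(output);
        Collections.sort(result); // completion order is nondeterministic
        return result;
    }
}
```

Because thenAccept is a non-async stage, the processing step usually runs in the thread that completes the request future (a pool thread here), which mirrors the description above; if the future has already completed when the stage is attached, it runs in the calling thread instead.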

This implementation is also capable of separating out every input solution mapping that assigns a blank node to any of the join variables. Such solution mappings are not even considered when creating the requests, because they cannot have any join partners in the results obtained from the federation member. Of course, if the algorithm is used with outer-join semantics, these solution mappings are still returned in the output (without being joined with anything).
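A minimal Java sketch of separating out these solution mappings, assuming (for illustration only) that solution mappings are Map<String, String> objects and that blank nodes are written as "_:label". BlankNodeFilterSketch, Split, separate, and directOutput are hypothetical names.

```java
import java.util.*;

/** Hypothetical sketch of setting aside inputs with blank nodes in join variables. */
class BlankNodeFilterSketch {

    /** Inputs usable for requests vs. inputs that are set aside. */
    record Split(List<Map<String, String>> forRequests,
                 List<Map<String, String>> withBlankNodes) {}

    // Assumption of this sketch: blank nodes are written as "_:label".
    static boolean isBlankNode(String term) {
        return term.startsWith("_:");
    }

    static Split separate(List<Map<String, String>> inputs, Set<String> joinVars) {
        List<Map<String, String>> ok = new ArrayList<>();
        List<Map<String, String>> bnodes = new ArrayList<>();
        for (Map<String, String> sm : inputs) {
            // Only bindings of join variables matter; other variables may hold blank nodes.
            boolean hasBlank = joinVars.stream()
                .map(sm::get)
                .anyMatch(t -> t != null && isBlankNode(t));
            (hasBlank ? bnodes : ok).add(sm);
        }
        return new Split(ok, bnodes);
    }

    /** Set-aside inputs emitted as-is: none for an inner join, all for an outer join. */
    static List<Map<String, String>> directOutput(Split split, boolean outerJoin) {
        return outerJoin ? split.withBlankNodes() : List.of();
    }
}
```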

Another feature of this implementation is that it switches into a full-retrieval mode as soon as it encounters an input solution mapping that does not have a binding for any of the join variables (which can happen only if none of the join variables is a certain variable). Such an input solution mapping is compatible with (and, thus, can be joined with) every solution mapping that the federation member has for the query/pattern of this bind-join operator. Therefore, when switching into full-retrieval mode, this implementation performs a request to retrieve the complete set of these solution mappings and then uses this set to find join partners for the current and all future batches of input solution mappings (with the complete set available locally, there is no need to issue any further bind-join requests).
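The mode switch can be sketched in Java as follows. This is a hypothetical, single-threaded simplification: FullRetrievalSketch is an invented class, fetchAll stands in for the request that retrieves the complete set of solutions, and solution mappings are again modeled as Map<String, String>. In regular mode, process returns an empty Optional to indicate that a bind-join request would be issued for the batch.

```java
import java.util.*;
import java.util.function.Supplier;

/** Hypothetical sketch of switching into full-retrieval mode (not HeFQUIN's API). */
class FullRetrievalSketch {
    private final Set<String> joinVars;
    private final Supplier<List<Map<String, String>>> fetchAll;
    // Non-null once the operator has switched into full-retrieval mode.
    private List<Map<String, String>> allSolutions = null;

    FullRetrievalSketch(Set<String> joinVars,
                        Supplier<List<Map<String, String>>> fetchAll) {
        this.joinVars = joinVars;
        this.fetchAll = fetchAll;
    }

    static boolean bindsNoJoinVar(Map<String, String> sm, Set<String> joinVars) {
        return joinVars.stream().noneMatch(sm::containsKey);
    }

    // Two solution mappings are compatible if they agree on all shared variables.
    static boolean compatible(Map<String, String> a, Map<String, String> b) {
        for (Map.Entry<String, String> e : a.entrySet()) {
            String other = b.get(e.getKey());
            if (other != null && !other.equals(e.getValue())) return false;
        }
        return true;
    }

    /** Joins the batch locally in full-retrieval mode; empty means "issue a bind-join request". */
    Optional<List<Map<String, String>>> process(List<Map<String, String>> batch) {
        if (allSolutions == null
                && batch.stream().anyMatch(sm -> bindsNoJoinVar(sm, joinVars))) {
            allSolutions = fetchAll.get(); // one request for the complete set
        }
        if (allSolutions == null) return Optional.empty();

        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> in : batch) {
            for (Map<String, String> sol : allSolutions) {
                if (compatible(in, sol)) {
                    Map<String, String> merged = new HashMap<>(in);
                    merged.putAll(sol);
                    out.add(merged);
                }
            }
        }
        return Optional.of(out);
    }
}
```

Once allSolutions is set, every later batch is joined locally, so fetchAll is called at most once.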

hartig · 2026-02-10T11:56:47Z

@AdrianaConcha there are still a few things to be done here, but can you please give it an initial try in your experiment setup first? Just for a handful of queries at first, only to see whether it works (it should ;) and whether it has an effect. I tried it with one query on my machine and saw the execution time drop to about half, for a query in which the bind join had to process five batches.

To enable the parallel version, edit the config file. There is now a new entry for PhysicalOpParallelBindJoinWithVALUES. That entry needs to be uncommented, and the other bind-join-related entries before it need to be commented out.

parallel implementation of the (batch-based) bind join; only for the …
…VALUES-based variation so far

13997ea

hartig marked this pull request as draft February 10, 2026 11:57

hartig added 7 commits February 10, 2026 22:07

renames the executable operators for bind joins

be4f671

adds 'getAllPossible' to LogicalToPhysicalOpConverter and 'createAll'…
… to PhysicalOpRegistry, and removes almost all the functions from PhysicalPlanFactory that create plans with a specific type of root operator

fb5da1d

merges all PhysicalOpBindJoinWith* classes into one: PhysicalOpBindJo…
…inSPARQL

8fd6877

adds ExecOpParallelBindJoinSPARQLwithFILTER and ExecOpParallelBindJoi…
…nSPARQLwithUNION

d828eaa

remove obsolete import

7a67887

removes BaseForPhysicalOpSingleInputJoinAtSPARQLEndpoint (and moves i…
…ts code directly into PhysicalOpBindJoinSPARQL)

893ac54

mention PR #499 in CHANGELOG.md

c23d279

hartig marked this pull request as ready for review February 12, 2026 21:30

hartig changed the title ~~WIP: Parallel implementation of the (batch-based) bind join algorithm~~ Parallel implementation of the (batch-based) bind join algorithm Feb 12, 2026

hartig merged commit 391350e into main Feb 12, 2026
1 check passed

hartig deleted the ParallelBindJoin branch February 12, 2026 21:32

hartig merged 8 commits into main from ParallelBindJoin

1 participant
