
Conversation

@nirandaperera
Contributor

@nirandaperera commented Nov 5, 2025

This PR adds a fanout node.

/**
 * @brief Fanout policy controlling how messages are propagated.
 */
enum class FanoutPolicy : uint8_t {
    /**
     * @brief Process messages as they arrive and immediately forward them.
     *
     * Messages are forwarded as soon as they are received from the input channel.
     * The next message is not processed until all output channels have completed
     * sending the current one, ensuring backpressure and synchronized flow.
     */
    BOUNDED,

    /**
     * @brief Forward messages without enforcing backpressure.
     *
     * In this mode, messages may be accumulated internally before being
     * broadcast, or they may be forwarded immediately depending on the
     * implementation and downstream consumption rate.
     *
     * This mode disables coordinated backpressure between outputs, allowing
     * consumers to process at independent rates, but can lead to unbounded
     * buffering and increased memory usage.
     *
     * @note Consumers might not receive any messages until *all* upstream
     * messages have been sent, depending on the implementation and buffering
     * strategy.
     */
    UNBOUNDED,
};

/**
 * @brief Broadcast messages from one input channel to multiple output channels.
 *
 * The node continuously receives messages from the input channel and forwards
 * them to all output channels according to the selected fanout policy
 * (see ::FanoutPolicy).
 *
 * Each output channel receives a shallow copy of the same message; no payload
 * data is duplicated. All copies share the same underlying payload, ensuring
 * zero-copy broadcast semantics.
 *
 * @param ctx The node context to use.
 * @param ch_in Input channel from which messages are received.
 * @param chs_out Output channels to which messages are broadcast.
 * @param policy The fanout strategy to use (see ::FanoutPolicy).
 *
 * @return Streaming node representing the fanout operation.
 *
 * @throws std::invalid_argument If an unknown fanout policy is specified.
 *
 * @note Since messages are shallow-copied, releasing a payload (`release<T>()`)
 * is only valid on messages that hold exclusive ownership of the payload.
 */
Node fanout(
    std::shared_ptr<Context> ctx,
    std::shared_ptr<Channel> ch_in,
    std::vector<std::shared_ptr<Channel>> chs_out,
    FanoutPolicy policy
);
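
For context, a minimal usage sketch based only on the declaration above; the context and channel construction helpers (make_context, make_channel) are placeholders for illustration and are not part of this PR:

// Sketch only: make_context()/make_channel() are hypothetical setup helpers.
std::shared_ptr<Context> ctx = make_context();
std::shared_ptr<Channel> ch_in = make_channel(ctx);
std::vector<std::shared_ptr<Channel>> chs_out = {make_channel(ctx), make_channel(ctx)};

// Broadcast every message arriving on ch_in to both outputs, with backpressure.
Node node = fanout(ctx, ch_in, chs_out, FanoutPolicy::BOUNDED);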

Depends on #648

Closes #560

Signed-off-by: niranda perera <niranda.perera@gmail.com>
@copy-pr-bot

copy-pr-bot commented Nov 5, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


Signed-off-by: niranda perera <niranda.perera@gmail.com>
@nirandaperera marked this pull request as ready for review November 7, 2025 20:13
@nirandaperera requested review from a team as code owners November 7, 2025 20:13
@nirandaperera added the improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) labels Nov 7, 2025
Signed-off-by: niranda perera <niranda.perera@gmail.com>
@nirandaperera requested a review from a team as a code owner November 7, 2025 22:29
Signed-off-by: niranda perera <niranda.perera@gmail.com>
Co-authored-by: Mads R. B. Kristensen <madsbk@gmail.com>
Contributor

@wence- left a comment


I wonder if there is a much simpler implementation lurking here.

Comment on lines 144 to 146
RAPIDSMPF_EXPECTS(
co_await ch_out->send(msg.copy(res)), "failed to send message"
);
Contributor


This shouldn't be an error I think. Consider the case where the consumer has "seen enough". It wants to shut down the input channel to signal to the producer "I don't need any more inputs". The producer task should then exit.
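
For illustration, a minimal sketch of that behavior, assuming Channel::send() reports false once the receiving side has shut the channel down (names taken from the excerpt above):

// Sketch only: treat a refused send as "consumer is done", not an error.
if (!(co_await ch_out->send(msg.copy(res)))) {
    // The consumer shut down the channel; stop forwarding and exit the task.
    co_return;
}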

Contributor Author


Oh okay. That's a good point. I didn't think about it. This could make purging messages a little complicated, but let me try this out.

Contributor


Let's do it in a followup.

Contributor Author


Ah! I added this capability now.

Comment on lines 15 to 41
class FanoutPolicy(IntEnum):
    """
    Fanout policy controlling how messages are propagated.

    Attributes
    ----------
    BOUNDED : int
        Process messages as they arrive and immediately forward them.
        Messages are forwarded as soon as they are received from the input channel.
        The next message is not processed until all output channels have completed
        sending the current one, ensuring backpressure and synchronized flow.
    UNBOUNDED : int
        Forward messages without enforcing backpressure.
        In this mode, messages may be accumulated internally before being
        broadcast, or they may be forwarded immediately depending on the
        implementation and downstream consumption rate.

        This mode disables coordinated backpressure between outputs, allowing
        consumers to process at independent rates, but can lead to unbounded
        buffering and increased memory usage.

        Note: Consumers might not receive any messages until *all* upstream
        messages have been sent, depending on the implementation and buffering
        strategy.
    """
    BOUNDED = <uint8_t>cpp_FanoutPolicy.BOUNDED
    UNBOUNDED = <uint8_t>cpp_FanoutPolicy.UNBOUNDED
Contributor


Just reimport the C++ enum.

from rapidsmpf.streaming.core.fanout import FanoutPolicy

I think.

Comment on lines 92 to 94
# Validate policy
if not isinstance(policy, (FanoutPolicy, int)):
    raise TypeError(f"policy must be a FanoutPolicy enum value, got {type(policy)}")
Contributor


Can we only accept FanoutPolicy as the type?

Contributor Author


I tried that, and Cython was complaining that the FanoutPolicy enum is not a type identifier.

Contributor


You need to cimport it as well.

owner.append(ch_out)
_chs_out.push_back((<Channel>ch_out)._handle)

cdef cpp_FanoutPolicy _policy = <cpp_FanoutPolicy>(<int>policy)
Contributor


If you use the re-exported cdef enum, this is unnecessary. Check some of the pylibcudf Cython code to see how it's done there.

Signed-off-by: niranda perera <niranda.perera@gmail.com>
This reverts commit d99d7c4.
Signed-off-by: niranda perera <niranda.perera@gmail.com>
Comment on lines 51 to 56
for (auto& ch_out : chs_out) {
    // do a reservation for each copy, so that it will fall back to host memory
    // if needed
    // TODO: change this
    auto res = ctx->br()->reserve_or_fail(msg.copy_cost(), try_memory_types(msg)[0]);
    tasks.push_back(ch_out->send(msg.copy(res)));
Contributor


We own (or should own) the message, so one of these copies is redundant (the "last" output channel can just take ownership of the input message).
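
For illustration, a sketch of that optimization using the names from the excerpt above (assuming Channel::send() can take a moved-in message):

// Sketch only: copy for all but the last output channel, move for the last one.
for (size_t i = 0; i + 1 < chs_out.size(); ++i) {
    auto res = ctx->br()->reserve_or_fail(msg.copy_cost(), try_memory_types(msg)[0]);
    tasks.push_back(chs_out[i]->send(msg.copy(res)));
}
tasks.push_back(chs_out.back()->send(std::move(msg)));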

Contributor Author


Good point. Let me move the msg to the last channel.

Contributor Author


@wence- I've addressed this now

// intentionally not locking the mtx here, because we only need to know a
// lower-bound on the last completed idx (ch_next_idx values are monotonically
// increasing)
size_t last_completed_idx = std::ranges::min(state.ch_next_idx);
Contributor


This is UB, because another thread (one of the consumers) might be updating an entry in state.ch_next_idx simultaneously.

Contributor Author

@nirandaperera Nov 12, 2025


Yes, but it's strictly increasing, and we only need an approximate value here. So the running min computed by std::ranges::min may not be the exact minimum at that instant, but it will be less than or equal to it. The consequence would be not cleaning up all of the finished messages yet, but I felt that was a better trade-off than re-locking the mutex.
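
For illustration, one lock-free way to obtain such a lower bound would be to make the per-channel counters atomic (a sketch of the idea only; the PR ultimately removed the unlocked read instead; num_channels, i, and idx are illustrative):

// Sketch only: atomic per-channel progress counters let the producer read a
// safe lower bound without holding the mutex. (Requires <atomic>, <vector>, <algorithm>.)
std::vector<std::atomic<size_t>> ch_next_idx(num_channels);

// Consumer i, after it finishes forwarding message `idx`:
ch_next_idx[i].store(idx + 1, std::memory_order_release);

// Producer: each loaded value is <= that counter's current value, so the
// computed minimum is a valid lower bound on the slowest consumer's progress.
size_t lower_bound = ch_next_idx[0].load(std::memory_order_acquire);
for (auto const& c : ch_next_idx) {
    lower_bound = std::min(lower_bound, c.load(std::memory_order_acquire));
}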

Contributor Author


Hmm. Thinking about this again, maybe I can do a ranges::min_max during request_data.wait, and purge until the min value. That would eliminate this.

Contributor Author


I've removed this now

Comment on lines 221 to 228
// start send tasks for each output channel
coro::task_container<coro::thread_pool> tasks(ctx->executor());
for (size_t i = 0; i < chs_out.size(); i++) {
    RAPIDSMPF_EXPECTS(
        tasks.start(unbounded_fo_send_task(*ctx, i, chs_out[i], state)),
        "failed to start send task"
    );
}
Contributor


Do we actually need a task container here, or can the "process inputs" loop be a task as well, so that we instead do:

std::vector<...> tasks = {process_inputs(), unbounded_fo_send_task(...), ...};
coro_results(co_await coro::when_all(std::move(tasks)));

with appropriate scheduling of the tasks we're running.

Contributor Author


done

Comment on lines 235 to 237
return std::ranges::any_of(state.ch_next_idx, [&](size_t next_idx) {
    return state.recv_messages.size() == next_idx;
});
Contributor


Is this more complicated than it needs to be?

I think this is "just" a boolean flag for "at least one of the consumers wants more data".

Could we have such a flag in the state struct that is updated when a consumer is ready and then flipped back to false here?

I think the ch_next_idx of each consumer then doesn't need to be part of the state struct; it's a local piece of information rather than a signalling mechanism.
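
For concreteness, a sketch of that flag-based alternative (illustrative only, reusing names from the excerpts above; not what was ultimately merged):

// Sketch only: a single flag meaning "at least one consumer wants more data",
// set by consumers and cleared by the producer, instead of inspecting every
// ch_next_idx in the shared state.
struct State {
    // ... existing members (mutex, request_data condvar, recv_messages, ...)
    bool more_data_requested{false};
};

// Consumer, once it has caught up with recv_messages (flag set under the
// state mutex, notification after unlocking, as in the excerpt above):
state.more_data_requested = true;  // while holding the state mutex
lock.unlock();
co_await state.request_data.notify_one();

// Producer, while holding the state mutex, when deciding whether to pull
// another input (std::exchange from <utility>):
bool wants_more = std::exchange(state.more_data_requested, false);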

Contributor Author


Yes, that's a good point. The only issue would be cleaning up finished messages; currently I use ch_next_idx to find the boundary.

Contributor Author


The cost of this test is O(n_out_channels), which would be fairly small IMO.

… fanout-node

Signed-off-by: niranda perera <niranda.perera@gmail.com>
@nirandaperera
Contributor Author

@wence- @madsbk I think I addressed all the comments now. Could you please take another look?

Signed-off-by: niranda perera <niranda.perera@gmail.com>
} else {
    // request more data from the input channel
    lock.unlock();
    co_await request_data.notify_one();
Contributor


I was reminded what was bugging me here. If there are multiple tasks running notify_one (or notify_all) on the same condvar, we can have lost wakeups unfortunately. I have tried to fix this in libcoro here: jbaldwin/libcoro#416
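
As background for readers, the general pattern that makes condition-variable code robust to racing notifications is to publish state under the lock and wait on a predicate. A sketch follows, using std::condition_variable purely as an analogy (the actual code uses libcoro's coroutine condition variable, whose separate internal issue is tracked in jbaldwin/libcoro#416):

#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable cv;
bool data_requested = false;  // the state the notification is about

// Notifier: update the state under the lock, then notify.
void request_more() {
    {
        std::lock_guard<std::mutex> lk(m);
        data_requested = true;
    }
    cv.notify_one();
}

// Waiter: the predicate re-checks the state, so a notification that raced
// ahead of the wait (or a spurious wakeup) is handled correctly.
void wait_for_request() {
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [] { return data_requested; });
    data_requested = false;
}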

Contributor Author


Oh! 🙁 I think your rationale in jbaldwin/libcoro#416 makes sense. Should we hold this PR off until jbaldwin/libcoro#416 merges? I did a bunch of repeated test runs but didn't encounter a hang yet.

Contributor


I think the reason for that is that these tasks are typically not all firing at once. I think we are probably OK to merge now, but keep that in mind in case we see issues.

Co-authored-by: Lawrence Mitchell <wence@gmx.li>
@nirandaperera requested a review from wence- November 25, 2025 16:12
Signed-off-by: niranda perera <niranda.perera@gmail.com>
@nirandaperera requested a review from madsbk November 25, 2025 19:40
nirandaperera and others added 3 commits November 25, 2025 12:23
Co-authored-by: Mads R. B. Kristensen <madsbk@gmail.com>
Co-authored-by: Mads R. B. Kristensen <madsbk@gmail.com>
Signed-off-by: niranda perera <niranda.perera@gmail.com>
@nirandaperera requested a review from madsbk November 25, 2025 22:48
Member

@madsbk left a comment


LGTM, thanks @nirandaperera

@nirandaperera
Contributor Author

/merge

@nirandaperera
Contributor Author

/merge

@rapids-bot merged commit 089a73a into rapidsai:main Dec 2, 2025
89 checks passed