-
Notifications
You must be signed in to change notification settings - Fork 4.5k
Description
What would you like to happen?
Hi dear team,
I would like to ask if the current implementation of PubSubIO is sufficient for cases where message ordering is important.
There is a note:
Integration with Dataflow: Don't enable message ordering for subscriptions when configuring Dataflow with Pub/Sub. Dataflow has its own mechanism for total message ordering, ensuring chronological order across all messages as part of windowing operations. This method of ordering differs from Pub/Sub's ordering key-based approach. Using ordering keys with Dataflow can potentially reduce pipeline performance.
https://docs.cloud.google.com/pubsub/docs/ordering#:~:text=Using ordering keys with Dataflow,with the same ordering key.
Additionally, there's a notification when starting the pipeline indicating that enabling ordering in the subscription could cause performance issues.
With #31608, there was a significant improvement that allows writing with an ordering key, but reading still lacks this functionality.
I've tested a few approaches, and it appears that whether ordering is enabled or disabled in the subscription, there's a chance that messages will be passed to the next step in the wrong order after PubsubIO.readMessagesWithAttributesAndMessageIdAndOrderingKey(), even if the next step is just sorting back by ordering key.
This is a big disadvantage for us compared to KafkaIO, which outputs KV<key, message> instead of just message.
Is there a chance to fix this and to make the output of PubSubIO optionally KV (key-value) depending on the use case, to truly maintain the correct order, even with a potential performance downgrade?
Thank you
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner