Skip to content

[Feature Request]: PubSubIO read considering OrderingKey #37163

@aprochko

Description

@aprochko

What would you like to happen?

Hi dear team,

I would like to ask if the current implementation of PubSubIO is sufficient for cases where message ordering is important.

There is a note:
Integration with Dataflow: Don't enable message ordering for subscriptions when configuring Dataflow with Pub/Sub. Dataflow has its own mechanism for total message ordering, ensuring chronological order across all messages as part of windowing operations. This method of ordering differs from Pub/Sub's ordering key-based approach. Using ordering keys with Dataflow can potentially reduce pipeline performance.

https://docs.cloud.google.com/pubsub/docs/ordering#:~:text=Using ordering keys with Dataflow,with the same ordering key.

Additionally, there's a notification when starting the pipeline indicating that enabling ordering in the subscription could cause performance issues.

With #31608, there was a significant improvement that allows writing with an ordering key, but reading still lacks this functionality.

I've tested a few approaches, and it appears that whether ordering is enabled or disabled in the subscription, there's a chance that messages will be passed to the next step in the wrong order after PubsubIO.readMessagesWithAttributesAndMessageIdAndOrderingKey(), even if the next step is just sorting back by ordering key.

This is a big disadvantage for us compared to KafkaIO, which outputs KV<key, message> instead of just message.

Is there a chance to fix this and to make the output of PubSubIO optionally KV (key-value) depending on the use case, to truly maintain the correct order, even with a potential performance downgrade?

Thank you

Issue Priority

Priority: 2 (default / most feature requests should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions