feat(trainer): run dataset and model initializers in parallel #292

Sayan4496 wants to merge 1 commit into kubeflow:main
Conversation
Pull request overview
Updates the container backend to reduce startup latency by running dataset and model initializer containers concurrently instead of sequentially.
Changes:
- Execute dataset and model initializers in parallel via ThreadPoolExecutor.
- Wait for initializer completion and propagate failures to the caller.
- Add a debug log after all initializers finish successfully.
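A minimal sketch of the concurrent pattern the review describes, with placeholder functions standing in for the actual initializer container runs (the real backend's function and variable names differ):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Placeholders: in the real backend these start the dataset and
# model initializer containers and block until they exit.
def run_dataset_initializer() -> str:
    return "dataset-initializer done"

def run_model_initializer() -> str:
    return "model-initializer done"

initializers = [run_dataset_initializer, run_model_initializer]
results = []

with ThreadPoolExecutor(max_workers=len(initializers)) as executor:
    # Submit both initializers so they run concurrently.
    futures = [executor.submit(fn) for fn in initializers]
    # Wait for all to complete; result() re-raises any initializer failure.
    for future in as_completed(futures):
        results.append(future.result())
```

Because `as_completed` yields futures in completion order, `results` is unordered; the caller only cares that every initializer finished without raising.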
```python
# Wait for all initializers to complete and propagate errors
for future in as_completed(futures):
    future.result()
```
When iterating `as_completed(futures)`, the first failing initializer raises immediately, so exceptions from any other initializer futures are never consumed, and which failure surfaces depends on completion order. That makes debugging nondeterministic when multiple initializers fail. Consider capturing results for all futures instead: map future→name, collect the exception from every `future.result()` into a list, then raise a combined, deterministic error after all have finished.
```diff
-# Wait for all initializers to complete and propagate errors
-for future in as_completed(futures):
-    future.result()
+# Wait for all initializers to complete and collect errors deterministically
+exceptions: list[Exception] = []
+for future in as_completed(futures):
+    try:
+        future.result()
+    except Exception as exc:
+        exceptions.append(exc)
+if exceptions:
+    if len(exceptions) == 1:
+        raise RuntimeError("Initializer failed") from exceptions[0]
+    messages = "\n".join(
+        f"{idx + 1}) {type(exc).__name__}: {exc}"
+        for idx, exc in enumerate(exceptions)
+    )
+    raise RuntimeError(
+        f"Multiple initializers failed:\n{messages}"
+    )
```
Signed-off-by: Sayan Deyashi <deyashisayan2@gmail.com>
Summary
Run dataset and model initializer containers in parallel in the container backend instead of executing them sequentially.
Motivation
Previously, when both dataset and model initializers were configured, they executed sequentially, increasing total startup time.
Since Docker/Podman allow multiple containers to mount the same volume simultaneously, running them in parallel reduces initialization latency to approximately the maximum of the two durations rather than their sum.
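A back-of-the-envelope illustration of that claim, using hypothetical durations (real timings depend on dataset and model size):

```python
# Hypothetical initializer durations, in seconds.
dataset_init_s = 40.0
model_init_s = 25.0

# Sequential execution pays the sum of both durations.
sequential_s = dataset_init_s + model_init_s
# Parallel execution is bounded by the slower initializer.
parallel_s = max(dataset_init_s, model_init_s)

print(sequential_s, parallel_s)
```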
Changes
- Run dataset and model initializers in parallel via ThreadPoolExecutor.

Testing
184 passed

Fixes #290