chore: migrate to a10.2 gpu for gpu e2e by jaiakash · Pull Request #3220 · kubeflow/trainer

jaiakash · 2026-02-18T05:00:40Z

What this PR does / why we need it:
This PR migrates the GPU E2E runner to A10.2 with 2 GPUs. It also updates the Qwen example to run with 2 GPUs.

Related #3201

Checklist:

Docs included if any changes are user facing

Signed-off-by: Akash Jaiswal <akashjaiswal3846@gmail.com>

review-notebook-app · 2026-02-18T05:00:46Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

coveralls · 2026-02-18T05:05:32Z

Pull Request Test Coverage Report for Build 22127251024

Details

0 of 0 changed or added relevant lines in 0 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage remained the same at 55.998%

Totals
Change from base Build 22127246076:	0.0%
Covered Lines:	1391
Relevant Lines:	2484
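
The overall percentage in the report can be reproduced from the two totals above. A minimal check, assuming coverage is simply covered lines divided by relevant lines (the variable names are illustrative):

```python
# Totals copied from the Coveralls report above.
covered_lines = 1391
relevant_lines = 2484

# Assumption: overall coverage = covered / relevant, expressed as a percentage.
coverage_pct = covered_lines / relevant_lines * 100

print(f"{coverage_pct:.3f}%")  # prints "55.998%"
```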

💛 - Coveralls

Copilot

Pull request overview

Migrates the GPU E2E GitHub Actions job to a new A10.2 self-hosted runner configuration with 2 GPUs, and updates example notebooks/test parameters to request 2 GPUs accordingly.

Changes:

Update GPU E2E workflow to run on the oracle-vm-gpu-a10-2 runner label.
Increase GPU allocation in the TorchTune Qwen2.5 Alpaca notebook (resources_per_node["gpu"] = 2).
Run the JAX MNIST notebook in GPU E2E with num_gpu=2.
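
As a rough illustration of how the changes listed above fit together, the sketch below checks that each example's GPU request fits on the new runner. Only the runner label, `resources_per_node["gpu"] = 2`, and `num_gpu=2` come from the overview; the helper `fits_runner`, the dictionary names, and the job names are hypothetical and are not part of the PR diff or of any Kubeflow API.

```python
# Hypothetical sketch: everything except the three values noted below is
# illustrative and does not appear in the PR.
RUNNER_LABEL = "oracle-vm-gpu-a10-2"  # runner label named in the overview
RUNNER_GPU_COUNT = 2                  # the PR description says the runner has 2 GPUs

# GPU requests described in the overview.
torchtune_resources_per_node = {"gpu": 2}  # TorchTune Qwen2.5 Alpaca notebook
jax_notebook_params = {"num_gpu": 2}       # JAX MNIST notebook parameter


def fits_runner(requested_gpus: int, available_gpus: int = RUNNER_GPU_COUNT) -> bool:
    """Return True when a positive GPU request does not exceed the runner's GPUs."""
    return 0 < requested_gpus <= available_gpus


requests = {
    "torchtune-qwen2.5": torchtune_resources_per_node["gpu"],
    "jax-mnist": jax_notebook_params["num_gpu"],
}

for name, gpus in requests.items():
    status = "ok" if fits_runner(gpus) else "exceeds runner capacity"
    print(f"{name}: requests {gpus} GPU(s) on {RUNNER_LABEL} -> {status}")
```

A request for 3 GPUs would fail this check, which is the kind of mismatch that would leave a job unschedulable on a 2-GPU runner.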

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
examples/torchtune/qwen2_5/qwen2.5-1.5B-with-alpaca.ipynb	Requests 2 GPUs for the TorchTune Qwen2.5 training example (plus notebook metadata churn).
.github/workflows/test-e2e-gpu.yaml	Switches GPU E2E runner label to A10.2 and updates JAX notebook execution params to use 2 GPUs.

Copilot · 2026-02-18T05:06:43Z

examples/torchtune/qwen2_5/qwen2.5-1.5B-with-alpaca.ipynb

+      "id": "7c49a6d5",
      "metadata": {},
      "source": [
        "# Fine-tune Qwe2.5-1.5B with Alpaca Dataset"


The notebook title has a typo: "Qwe2.5" should be "Qwen2.5" to match the actual model name and avoid confusion when searching/reading the example.

Suggested change

- "# Fine-tune Qwe2.5-1.5B with Alpaca Dataset"
+ "# Fine-tune Qwen2.5-1.5B with Alpaca Dataset"

tenzen-y

Thanks for updating infra.

tenzen-y · 2026-02-18T05:28:33Z

examples/torchtune/qwen2_5/qwen2.5-1.5B-with-alpaca.ipynb

tenzen-y: Why do we need to change this?

jaiakash: The runner now has 2 GPUs, so I set the Qwen notebook example to request 2 GPUs by default.

tenzen-y: Sounds good, thank you for describing that!
/lgtm
/approve

google-oss-prow · 2026-02-18T09:15:48Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [tenzen-y]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

chore: migrate to a10.2 gpu for gpu e2e

d6770fb

Signed-off-by: Akash Jaiswal <akashjaiswal3846@gmail.com>

Copilot AI review requested due to automatic review settings February 18, 2026 05:00

google-oss-prow bot requested review from akshaychitneni and jinchihe February 18, 2026 05:00

google-oss-prow bot added the size/L label Feb 18, 2026

Copilot started reviewing on behalf of jaiakash February 18, 2026 05:01

Copilot AI reviewed Feb 18, 2026

tenzen-y reviewed Feb 18, 2026

jaiakash mentioned this pull request Feb 18, 2026

feat: add Megatron-Core GPT Tensor Parallelism example notebook #3201

Open

1 task

google-oss-prow bot assigned tenzen-y Feb 18, 2026

google-oss-prow bot added the lgtm label Feb 18, 2026

google-oss-prow bot added the approved label Feb 18, 2026

google-oss-prow bot merged commit 3264245 into kubeflow:master Feb 18, 2026
40 of 42 checks passed

google-oss-prow bot added this to the v2.2 milestone Feb 18, 2026

jaiakash mentioned this pull request Feb 19, 2026

Validate GPU support on JAX notebooks #3184

Open

google-oss-prow[bot] merged 1 commit into kubeflow:master from jaiakash:a10-2-gpu

3 participants
