
chore: migrate to a10.2 gpu for gpu e2e #3220

Merged
google-oss-prow[bot] merged 1 commit into kubeflow:master from jaiakash:a10-2-gpu
Feb 18, 2026

Conversation

@jaiakash
Member

@jaiakash jaiakash commented Feb 18, 2026

What this PR does / why we need it:
This PR migrates the GPU E2E runner to A10.2, which has 2 GPUs. It also updates the Qwen example to run with 2 GPUs.

Related #3201

Checklist:

  • Docs included if any changes are user facing

Signed-off-by: Akash Jaiswal <akashjaiswal3846@gmail.com>
Copilot AI review requested due to automatic review settings February 18, 2026 05:00
@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@coveralls

Pull Request Test Coverage Report for Build 22127251024

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 55.998%

Totals Coverage Status
Change from base Build 22127246076: 0.0%
Covered Lines: 1391
Relevant Lines: 2484

💛 - Coveralls

Contributor

Copilot AI left a comment

Pull request overview

Migrates the GPU E2E GitHub Actions job to a new A10.2 self-hosted runner configuration with 2 GPUs, and updates example notebooks/test parameters to request 2 GPUs accordingly.

Changes:

  • Update GPU E2E workflow to run on the oracle-vm-gpu-a10-2 runner label.
  • Increase GPU allocation in the TorchTune Qwen2.5 Alpaca notebook (resources_per_node["gpu"] = 2).
  • Run the JAX MNIST notebook in GPU E2E with num_gpu=2.
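The two parameter changes in the list above can be sketched in plain Python. This is only an illustration under stated assumptions: `with_gpu_count` is a hypothetical helper, the starting value `{"gpu": 1}` is assumed (the PR does not state the previous request), and the real notebook passes `resources_per_node` to the Kubeflow Trainer SDK, which is not reproduced here.

```python
# Values taken from the PR description; everything else is illustrative.
RUNNER_LABEL = "oracle-vm-gpu-a10-2"  # runner label from the workflow change
GPUS_ON_RUNNER = 2                    # the PR describes the runner as having 2 GPUs


def with_gpu_count(resources_per_node: dict, gpus: int) -> dict:
    """Return a copy of resources_per_node with the GPU request set to gpus.

    Hypothetical helper, not part of any Kubeflow API.
    """
    if gpus < 1:
        raise ValueError("gpus must be a positive integer")
    updated = dict(resources_per_node)  # copy so the caller's dict is untouched
    updated["gpu"] = gpus
    return updated


# Assumed previous request of 1 GPU, raised to match the new runner.
qwen_resources = with_gpu_count({"gpu": 1}, GPUS_ON_RUNNER)

# Parameter the workflow passes when executing the JAX MNIST notebook.
jax_notebook_params = {"num_gpu": GPUS_ON_RUNNER}

print(RUNNER_LABEL, qwen_resources, jax_notebook_params)
```

Deriving both values from one constant keeps the notebook request and the workflow parameter from drifting apart if the runner changes again.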

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File | Description
examples/torchtune/qwen2_5/qwen2.5-1.5B-with-alpaca.ipynb | Requests 2 GPUs for the TorchTune Qwen2.5 training example (plus notebook metadata churn).
.github/workflows/test-e2e-gpu.yaml | Switches GPU E2E runner label to A10.2 and updates JAX notebook execution params to use 2 GPUs.

"id": "7c49a6d5",
"metadata": {},
"source": [
"# Fine-tune Qwe2.5-1.5B with Alpaca Dataset"

Copilot AI Feb 18, 2026

The notebook title has a typo: "Qwe2.5" should be "Qwen2.5" to match the actual model name and avoid confusion when searching/reading the example.

Suggested change
- "# Fine-tune Qwe2.5-1.5B with Alpaca Dataset"
+ "# Fine-tune Qwen2.5-1.5B with Alpaca Dataset"
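For anyone applying the suggestion outside the GitHub UI, a minimal sketch of the same fix on the notebook's JSON follows. It assumes the standard notebook layout (a top-level `cells` list whose markdown cells carry a `source` string or list of strings); `fix_title_typo` is a hypothetical helper, and the in-memory notebook below stands in for the real file.

```python
def fix_title_typo(notebook: dict, wrong: str = "Qwe2.5", right: str = "Qwen2.5") -> int:
    """Replace `wrong` with `right` in markdown cells; return how many lines changed."""
    changed = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue  # leave code cells untouched
        source = cell.get("source", [])
        # A cell's source may be a single string or a list of strings.
        lines = [source] if isinstance(source, str) else list(source)
        fixed = [line.replace(wrong, right) for line in lines]
        changed += sum(old != new for old, new in zip(lines, fixed))
        cell["source"] = fixed[0] if isinstance(source, str) else fixed
    return changed


# Stand-in for the cell quoted in the review comment.
nb = {
    "cells": [
        {
            "cell_type": "markdown",
            "id": "7c49a6d5",
            "metadata": {},
            "source": ["# Fine-tune Qwe2.5-1.5B with Alpaca Dataset"],
        }
    ]
}

count = fix_title_typo(nb)
print(count, nb["cells"][0]["source"][0])
```

Running the helper a second time changes nothing, because the corrected title no longer contains the misspelled string.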

Member

@tenzen-y tenzen-y left a comment

Thanks for updating infra.

Member

Why do we need to change this?

Member Author

@jaiakash jaiakash Feb 18, 2026

The runner now has 2 GPUs, so the Qwen example notebook itself requests 2 GPUs by default.

Member

Sounds good, thank you for describing that!
/lgtm
/approve

@google-oss-prow

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 3264245 into kubeflow:master Feb 18, 2026
40 of 42 checks passed
@google-oss-prow google-oss-prow bot added this to the v2.2 milestone Feb 18, 2026