chore: migrate to a10.2 gpu for gpu e2e #3220
google-oss-prow[bot] merged 1 commit into kubeflow:master from
Signed-off-by: Akash Jaiswal <akashjaiswal3846@gmail.com>
Pull Request Test Coverage Report for Build 22127251024
💛 - Coveralls
Pull request overview
Migrates the GPU E2E GitHub Actions job to a new A10.2 self-hosted runner configuration with 2 GPUs, and updates example notebooks/test parameters to request 2 GPUs accordingly.
Changes:
- Update the GPU E2E workflow to run on the `oracle-vm-gpu-a10-2` runner label.
- Increase the GPU allocation in the TorchTune Qwen2.5 Alpaca notebook (`resources_per_node["gpu"] = 2`).
- Run the JAX MNIST notebook in GPU E2E with `num_gpu=2`.
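The changes above assume that both E2E notebooks' per-node GPU requests fit on the new 2-GPU runner. The following is a minimal, hypothetical pre-flight check (not part of this PR) that expresses that constraint. `RUNNER_GPUS`, `gpus_requested`, and `fits_on_runner` are illustrative names. The JAX entry simply mirrors `num_gpu=2` as a resources dict and does not reflect how the workflow actually passes it.

```python
# Hypothetical pre-flight check (not part of this PR): confirm that the GPU
# requests used by the E2E notebooks fit on the 2-GPU A10 runner.
from typing import Mapping

RUNNER_GPUS = 2  # assumption: GPUs exposed by the oracle-vm-gpu-a10-2 runner


def gpus_requested(resources_per_node: Mapping[str, object]) -> int:
    """Return the per-node GPU count from a resources dict (0 if absent)."""
    return int(resources_per_node.get("gpu", 0))


def fits_on_runner(resources_per_node: Mapping[str, object],
                   runner_gpus: int = RUNNER_GPUS) -> bool:
    """True if a single-node job's GPU request can be scheduled on the runner."""
    requested = gpus_requested(resources_per_node)
    if requested < 0:
        raise ValueError("GPU request must be non-negative")
    return requested <= runner_gpus


# Values taken from this PR: the Qwen notebook and the JAX MNIST run both ask for 2.
e2e_requests = {
    "torchtune-qwen2.5": {"gpu": 2},
    "jax-mnist": {"gpu": 2},  # illustrative mirror of num_gpu=2 in the workflow
}

for name, resources in e2e_requests.items():
    print(name, fits_on_runner(resources))
```

A request of `{"gpu": 3}` would fail this check on the new runner. That is why the examples were raised to exactly 2 rather than higher.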
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| examples/torchtune/qwen2_5/qwen2.5-1.5B-with-alpaca.ipynb | Requests 2 GPUs for the TorchTune Qwen2.5 training example (plus notebook metadata churn). |
| .github/workflows/test-e2e-gpu.yaml | Switches GPU E2E runner label to A10.2 and updates JAX notebook execution params to use 2 GPUs. |
    "id": "7c49a6d5",
    "metadata": {},
    "source": [
     "# Fine-tune Qwe2.5-1.5B with Alpaca Dataset"
The notebook title has a typo: "Qwe2.5" should be "Qwen2.5" to match the actual model name and avoid confusion when searching/reading the example.
Suggested change:

    - "# Fine-tune Qwe2.5-1.5B with Alpaca Dataset"
    + "# Fine-tune Qwen2.5-1.5B with Alpaca Dataset"
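The suggested title fix can also be applied programmatically. Below is a minimal sketch using only the standard library, assuming the standard nbformat layout (a top-level `"cells"` list whose entries have `"cell_type"` and `"source"`, where `source` is a string or a list of strings). The helper names `fix_title` and `fix_notebook_file` are hypothetical and not part of the repository.

```python
# Sketch: apply the suggested notebook title fix with the stdlib json module.
# Assumes the standard nbformat layout: {"cells": [{"cell_type", "source", ...}]}.
import json
from pathlib import Path

OLD_TITLE = "# Fine-tune Qwe2.5-1.5B with Alpaca Dataset"
NEW_TITLE = "# Fine-tune Qwen2.5-1.5B with Alpaca Dataset"


def fix_title(nb: dict, old: str = OLD_TITLE, new: str = NEW_TITLE) -> int:
    """Replace `old` with `new` in markdown cells; return how many lines changed."""
    changed = 0
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue  # only the markdown title cell carries the typo
        src = cell.get("source", [])
        # nbformat allows source as one string or a list of line strings.
        lines = src.splitlines(keepends=True) if isinstance(src, str) else list(src)
        fixed = [line.replace(old, new) for line in lines]
        changed += sum(a != b for a, b in zip(lines, fixed))
        cell["source"] = fixed
    return changed


def fix_notebook_file(path: str) -> int:
    """Rewrite the notebook at `path` in place if the title needed fixing."""
    p = Path(path)
    nb = json.loads(p.read_text(encoding="utf-8"))
    changed = fix_title(nb)
    if changed:
        # Jupyter writes notebooks with indent=1 and a trailing newline.
        p.write_text(json.dumps(nb, indent=1, ensure_ascii=False) + "\n",
                     encoding="utf-8")
    return changed


nb = {"cells": [{"cell_type": "markdown", "id": "7c49a6d5", "metadata": {},
                 "source": [OLD_TITLE]}]}
print(fix_title(nb))  # 1
```

Running `fix_title` a second time returns 0, so the fix is idempotent. That matters if it is wired into a pre-commit hook or a CI step.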
Why do we need to change this?
The runner now has 2 GPUs, so I changed the Qwen notebook example itself to request 2 GPUs by default.
Sounds good, thank you for describing that!
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: tenzen-y
What this PR does / why we need it:
This PR migrates the GPU E2E runner to an A10.2 machine with 2 GPUs. It also updates the Qwen example to run with 2 GPUs.
Related #3201
Checklist: