
Conversation

@RexBearIU (Collaborator)

Description

This update reorganizes the multi-host TPU reinforcement learning tutorial for MaxText, Tunix, and vLLM, adding a table of contents and revising the sections for environment setup, checkpoint conversion, and Docker image creation. It separates the steps for stable versus local builds, updates the workload submission commands for GRPO and GSPO, and adds a section for troubleshooting.

Tests

Verified the updated documentation by walking through the entire workflow, including environment setup, Docker image builds, and workload submission. The commands executed successfully as described. Attached are two test logs confirming the results.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.


codecov bot commented Dec 24, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.


```
xpk workload create-pathways --workload $WORKLOAD \
--docker-image <path/to/gcr.io> --cluster $TPU_CLUSTER \
--docker-image $CLOUD_IMAGE_NAME --cluster $TPU_CLUSTER \
```

Collaborator:

This should actually be `gcr.io/$PROJECT_ID/$CLOUD_IMAGE_NAME`.

Collaborator Author:

You're absolutely right! Nice catch.
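
For illustration only, the corrected invocation might read roughly as follows (variable names are taken from the snippet above; the tutorial's full flag set may differ):

```bash
# Sketch of the corrected command: reference the image by its full
# registry path rather than by the bare image name.
xpk workload create-pathways --workload $WORKLOAD \
  --docker-image gcr.io/$PROJECT_ID/$CLOUD_IMAGE_NAME \
  --cluster $TPU_CLUSTER
```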

Alternatively, clone the repositories locally and build from local sources:

```bash
# Clone repositories (if not already done)
```

Collaborator:

This needs to be done outside of maxtext.
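
As a hedged illustration of that point (the repository URLs are assumptions, not taken from the tutorial), the clone step could be run from a working directory outside the maxtext checkout:

```bash
# Run from a directory outside the maxtext checkout, per the comment above.
cd ~/workspace

# Clone repositories (if not already done)
git clone https://github.com/AI-Hypercomputer/maxtext.git
git clone https://github.com/google/tunix.git
git clone https://github.com/vllm-project/vllm.git
```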

Collaborator:

Please add a comment.

```
run_name=${RUN_NAME} \
base_output_directory=${BASE_OUTPUT_DIRECTORY} \
hf_access_token=$HF_TOKEN"
hf_access_token=${HF_TOKEN}"
```

@SurbhiJainUSC (Collaborator), Dec 26, 2025:

Do we need to set hf_access_token?

Collaborator Author:

Technically no, as the code block currently ignores that flag. However, I suggest keeping it since it's consistent with our docs and other examples. It causes no issues, and the implementation might be updated to use it later anyway.
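
Purely as a sketch of the convention being kept (the TRAIN_ARGS variable is hypothetical; the tutorial's actual command may differ), the token is exported once and forwarded through the hf_access_token flag even though it is currently ignored:

```bash
# Placeholder token; the flag is kept for consistency with other examples
# even though the current implementation ignores it.
export HF_TOKEN="<your-hugging-face-token>"

# Hypothetical argument string mirroring the quoted diff above.
TRAIN_ARGS="run_name=${RUN_NAME} \
  base_output_directory=${BASE_OUTPUT_DIRECTORY} \
  hf_access_token=${HF_TOKEN}"
```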

- A Pathways-ready GKE cluster (see [create GKE cluster](https://docs.cloud.google.com/ai-hypercomputer/docs/workloads/pathways-on-cloud/create-gke-cluster)).

Set up the following environment variables:
## Create Virtual Environment and Install MaxText Dependencies

@SurbhiJainUSC (Collaborator), Dec 29, 2025:

We don't need to install MaxText dependencies for multi-host. This section can be removed. Please verify.

Collaborator Author:

I’ve verified this and you're right. We don't need to install MaxText dependencies for multi-host, so I have removed that section.
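
For orientation only, the environment variables referenced elsewhere in this thread could be set up roughly as follows (names come from the quoted snippets; all values are placeholders):

```bash
# -- Project and cluster (placeholders) --
export PROJECT_ID=<your-gcp-project>
export TPU_CLUSTER=<your-pathways-gke-cluster>
export CLOUD_IMAGE_NAME=<your-image-name>

# -- Training outputs and checkpoint (placeholders) --
export BASE_OUTPUT_DIRECTORY=gs://<your-bucket>/rl-tutorial
export RUN_NAME=<your-run-name>
export MAXTEXT_CKPT_PATH=${BASE_OUTPUT_DIRECTORY}/${RUN_NAME}/0/items

# -- Workload and Hugging Face access (placeholders) --
export WORKLOAD=${RUN_NAME}
export HF_TOKEN=<your-hugging-face-token>
```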

```bash
export MAXTEXT_CKPT_PATH=${BASE_OUTPUT_DIRECTORY}/${RUN_NAME}/0/items

# -- Workload configuration --
export WORKLOAD=${RUN_NAME}
```

Collaborator:

$RUN_NAME and $WORKLOAD are duplicate variables. Maybe we can just keep $WORKLOAD for simplicity.

Collaborator:

Can you also add a note to address b/470463466?

@RexBearIU (Collaborator Author), Dec 30, 2025:

Sure. I've added a note pointing to the Troubleshooting section to clarify the process for handling failed workloads.
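
As a sketch of the simplification suggested above (assuming only $WORKLOAD is kept; the troubleshooting note is paraphrased from this reply):

```bash
# Hypothetical simplification: drop RUN_NAME and key everything off WORKLOAD.
export WORKLOAD=<your-workload-name>
export MAXTEXT_CKPT_PATH=${BASE_OUTPUT_DIRECTORY}/${WORKLOAD}/0/items

# Note: if a workload fails, consult the Troubleshooting section before
# resubmitting (see b/470463466).
```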

@RexBearIU force-pushed the jackyf/docs/rl_multi branch from e06d035 to a6e8759 on December 30, 2025, 04:24.