AI-Hypercomputer · RobMulla · Dec 16, 2025
@@ -1,23 +1,26 @@
-<!--
- Copyright 2024 Google LLC
+# How-to Guides
 
- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
+Practical step-by-step guides for common tasks, optimizations, and workflows in MaxText.
 
-      https://www.apache.org/licenses/LICENSE-2.0
+## Performance & Optimization
+*   [**Optimization Factors**](guides/optimization.md)
+    *   Running custom models, configuring sharding strategies, and writing high-performance Pallas kernels.
+*   [**Monitoring & Debugging**](guides/monitoring_and_debugging.md)
+    *   Tools for diagnosing performance issues, including Goodput monitoring, Cloud Logging, and XProf profiling.
 
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
- -->
+## Data & Storage
+*   [**Data Input Pipelines**](guides/data_input_pipeline.md)
+    *   Configuring data loaders for high performance. Includes Grain (ArrayRecord), Hugging Face, and TFDS pipelines.
+*   [**Checkpointing**](guides/checkpointing_solutions.md)
+    *   Strategies for saving and restoring model state, including GCS checkpointing, emergency recovery, and multi-tier solutions.
 
-# How-to guides
+## Development Workflows
+*   [**Python Notebooks**](guides/run_python_notebook.md)
+    *   Interactive development using Jupyter/Colab on TPUs. Covers local port-forwarding and Colab setups.
 
 ```{toctree}
 :maxdepth: 1
+:hidden:
 
 guides/optimization.md
 guides/data_input_pipeline.md

@@ -1,23 +1,22 @@
-<!--
- Copyright 2024 Google LLC
+# Reference Documentation
 
- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
+Technical reference material for MaxText architecture, metrics, and configurations.
 
-      https://www.apache.org/licenses/LICENSE-2.0
+## Core Concepts
+*   [**Architecture Overview**](reference/architecture.md)
+    *   Deep dive into the design of MaxText and the choices behind its JAX AI stack.
+*   [**Core Concepts**](reference/core_concepts.md)
+    *   Explanations of fundamental topics like quantization, tiling, MoE configuration, and JAX/XLA/Pallas interactions.
 
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
- -->
-
-# Reference documentation
+## Benchmarks & Models
+*   [**Performance Metrics**](reference/performance_metrics.md)
+    *   Understanding key metrics like Model FLOPs Utilization (MFU), step time, and tokens/second.
+*   [**Supported Models**](reference/models.md)
+    *   List of supported architectures, model tiering levels, and configuration details.
 
 ```{toctree}
 :maxdepth: 1
+:hidden:
 
 reference/performance_metrics.md
 reference/models.md

@@ -1,12 +1,38 @@
 # Run MaxText
 
+MaxText provides flexible execution options ranging from local development and single-host experimentation to massively scalable training on thousands of chips. Choose the runbook that matches your infrastructure and goals.
+
+## Local & Single Host
+Ideal for development, debugging, and small-scale experimentation.
+
+*   [**Localhost / Single VM**](run_maxtext/run_maxtext_localhost.md)
+    *   The best starting point. Run directly on a single TPU VM or GPU machine (e.g., an A3 VM with H100 GPUs).
+    *   Great for learning the basics, testing configurations, and running small models.
+
+*   [**Single Host GPU Guide**](run_maxtext/run_maxtext_single_host_gpu.md)
+    *   Specific instructions for setting up and running on NVIDIA GPUs (e.g., H100 GPUs on A3 VMs), including CUDA and Docker setup.
+
+*   [**Decoupled Mode (No Cloud Dependencies)**](run_maxtext/decoupled_mode.md)
+    *   Run tests and development loops completely offline without Google Cloud dependencies (GCS, JetStream, etc.).
+
+## Multi-Host & Cluster (At Scale)
+For large-scale training jobs running on GKE clusters.
+
+*   [**Running with XPK (Recommended)**](run_maxtext/run_maxtext_via_xpk.md)
+    *   The standard way to run production workloads on GKE.
+    *   Uses the Accelerated Processing Kit (XPK) to orchestrate Docker containers across TPU/GPU clusters.
+
+*   [**Running with Pathways**](run_maxtext/run_maxtext_via_pathways.md)
+    *   Advanced orchestration using the Pathways backend on GKE.
+    *   Supports both batch jobs and interactive "headless" workloads for development.
+
 ```{toctree}
 :maxdepth: 1
+:hidden:
 
 run_maxtext/run_maxtext_localhost.md
 run_maxtext/run_maxtext_single_host_gpu.md
 run_maxtext/run_maxtext_via_xpk.md
 run_maxtext/run_maxtext_via_pathways.md
 run_maxtext/decoupled_mode.md
-
 ```