From c1a0c9746f8be057cbb3ac1bf18547e824b4d5fb Mon Sep 17 00:00:00 2001
From: jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com>
Date: Mon, 9 Feb 2026 09:48:12 -0800
Subject: [PATCH] [PyTorch][Core][JAX] Expand troubleshooting docs (#2602)

* expand troubleshooting docs

Signed-off-by: Jeremy Berchtold

* Update README.rst

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com>

* Update README.rst

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com>

* Update README.rst

Signed-off-by: jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com>

---------

Signed-off-by: Jeremy Berchtold
Signed-off-by: jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
---
 README.rst | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/README.rst b/README.rst
index cd55a2e18f..3cc5f81293 100644
--- a/README.rst
+++ b/README.rst
@@ -315,6 +315,37 @@ Troubleshooting
       cd transformer_engine
       pip install -v -v -v --no-build-isolation .
 
+**Problems using UV or Virtual Environments:**
+
+1. **Import Error:**
+
+   * **Symptoms:** Cannot import ``transformer_engine``
+   * **Solution:** Ensure your UV environment is active and that you have used ``uv pip install --no-build-isolation `` instead of a regular pip install to your system environment.
+
+2. **cuDNN Sublibrary Loading Failed:**
+
+   * **Symptoms:** Errors at runtime with ``CUDNN_STATUS_SUBLIBRARY_LOADING_FAILED``
+   * **Solution:** This can occur when TE is built against the container's system installation of cuDNN, but pip packages inside the virtual environment pull in pip packages for ``nvidia-cudnn-cu12/cu13``. To resolve this, when building TE from source please specify the following environment variables to point to the cuDNN in your virtual environment.
+
+
+   .. code-block:: bash
+
+      export CUDNN_PATH=$(pwd)/.venv/lib/python3.12/site-packages/nvidia/cudnn
+      export CUDNN_HOME=$CUDNN_PATH
+      export LD_LIBRARY_PATH=$CUDNN_PATH/lib:$LD_LIBRARY_PATH
+
+3. **Building Wheels:**
+
+   * **Symptoms:** Regular TE installs work correctly but UV wheel builds fail at runtime.
+   * **Solution:** Ensure that ``uv build --wheel --no-build-isolation -v`` is used during the wheel build as well as the pip installation of the wheel. Use ``-v`` for verbose output to verify that TE is not pulling in a mismatching version of PyTorch or JAX that differs from the UV environment's version.
+
+**JAX-specific Common Issues and Solutions:**
+
+1. **FFI Issues:**
+
+   * **Symptoms:** ``No registered implementation for custom call to for platform CUDA``
+   * **Solution:** Ensure ``--no-build-isolation`` is used during installation. If pre-building wheels, ensure that the wheel is both built and installed with ``--no-build-isolation``. See "Problems using UV or Virtual Environments" above if using UV.
+
 .. troubleshooting-end-marker-do-not-remove
 
 Breaking Changes