From e20869fb758dd35bfe583693192eda8b180bd8f7 Mon Sep 17 00:00:00 2001
From: Ruban Kumar <92320771+rkamd@users.noreply.github.com>
Date: Thu, 29 Aug 2024 08:44:05 -0500
Subject: [PATCH 1/2] Add a note about thread oversubscription in AOCL (#2684)

* Add a note about thread oversubscription in AOCL

* Apply suggestions from code review

Co-authored-by: Jeffrey Novotny
---
 docs/how-to/Programmers_Guide.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/docs/how-to/Programmers_Guide.rst b/docs/how-to/Programmers_Guide.rst
index a142f1635..591a2bb30 100644
--- a/docs/how-to/Programmers_Guide.rst
+++ b/docs/how-to/Programmers_Guide.rst
@@ -900,6 +900,12 @@ There are three client executables that can be used with rocBLAS. They are:
 
 These three clients can be built by following the instructions in the Building and Installing section of the User Guide. After building the rocBLAS clients, they can be found in the directory ``rocBLAS/build/release/clients/staging``.
 
+.. note::
+   The ``rocblas-bench`` and ``rocblas-test`` executables use AMD's ILP64 version of AOCL-BLAS 4.2 as the host reference BLAS to verify correctness. However, there is a known issue with AOCL-BLAS that can cause these executables to hang. This problem can arise because the AOCL-BLAS library launches multiple threads to perform computations. If the number of threads matches the total number of CPU threads, it can lead to thread oversubscription, causing the program to hang.
+   To prevent this issue, we recommend limiting the number of threads that the AOCL-BLAS library uses to fewer than the available CPU cores. You can do this by setting the ``OMP_NUM_THREADS`` environment variable.
+
+   For example, on a server with 32 cores, you can limit the number of threads to 28 by setting ``export OMP_NUM_THREADS=28``
+
 The next three sections will provide a brief explanation and the usage of each rocBLAS client.
 
 rocblas-bench

From d7fb603dd0ca0ea237c849bc4c2a5bd6160f3554 Mon Sep 17 00:00:00 2001
From: Ruban Kumar
Date: Thu, 29 Aug 2024 12:32:41 -0400
Subject: [PATCH 2/2] review comment

---
 docs/how-to/Programmers_Guide.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/how-to/Programmers_Guide.rst b/docs/how-to/Programmers_Guide.rst
index 591a2bb30..f7495e0a2 100644
--- a/docs/how-to/Programmers_Guide.rst
+++ b/docs/how-to/Programmers_Guide.rst
@@ -901,7 +901,7 @@ There are three client executables that can be used with rocBLAS. They are:
 These three clients can be built by following the instructions in the Building and Installing section of the User Guide. After building the rocBLAS clients, they can be found in the directory ``rocBLAS/build/release/clients/staging``.
 
 .. note::
-   The ``rocblas-bench`` and ``rocblas-test`` executables use AMD's ILP64 version of AOCL-BLAS 4.2 as the host reference BLAS to verify correctness. However, there is a known issue with AOCL-BLAS that can cause these executables to hang. This problem can arise because the AOCL-BLAS library launches multiple threads to perform computations. If the number of threads matches the total number of CPU threads, it can lead to thread oversubscription, causing the program to hang.
+   The ``rocblas-bench`` and ``rocblas-test`` executables use AMD's ILP64 version of AOCL-BLAS 4.2 as the host reference BLAS to verify correctness. However, there is a known issue with AOCL-BLAS that can cause these executables to hang. This problem can arise because the AOCL-BLAS library launches multiple threads to perform computations. If the number of threads matches the total number of CPU logical cores, it can lead to thread oversubscription, causing the program to hang.
    To prevent this issue, we recommend limiting the number of threads that the AOCL-BLAS library uses to fewer than the available CPU cores. You can do this by setting the ``OMP_NUM_THREADS`` environment variable.
 
    For example, on a server with 32 cores, you can limit the number of threads to 28 by setting ``export OMP_NUM_THREADS=28``
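A minimal sketch of the ``OMP_NUM_THREADS`` workaround described in the note, assuming a Bash-compatible shell on Linux. It reads the number of online logical CPUs with ``getconf _NPROCESSORS_ONLN`` (unlike GNU ``nproc``, this is not influenced by an ``OMP_NUM_THREADS`` value that is already set), reserves four of them to mirror the 32-to-28 example, and exports the result before the clients are launched. The helper name ``threads_for`` and the reserve of four are illustrative assumptions, not part of rocBLAS or AOCL-BLAS; tune the reserve for your system.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions: a Bash-compatible shell, and that reserving
# 4 logical CPUs (as in the 32 -> 28 example) leaves enough headroom.

# Hypothetical helper: print a thread count below the logical-CPU total
# where possible, and never less than one thread.
threads_for() {
  local total=$1 reserved=$2
  if [ "$total" -gt "$reserved" ]; then
    echo $((total - reserved))
  else
    echo 1
  fi
}

# Logical CPUs currently online (read via sysconf, not from OMP_* variables).
TOTAL=$(getconf _NPROCESSORS_ONLN)
RESERVED=4

# Cap the OpenMP threads that the AOCL-BLAS host reference may start.
export OMP_NUM_THREADS="$(threads_for "$TOTAL" "$RESERVED")"
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS} (logical CPUs online: ${TOTAL})"

# Then run the clients from the staging directory, for example:
#   cd rocBLAS/build/release/clients/staging
#   ./rocblas-test
```

Source the script (for example ``source ./cap_threads.sh``, where the file name is hypothetical) rather than executing it, so that the export stays in the current shell and a later ``./rocblas-bench`` or ``./rocblas-test`` run inherits the cap.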