

@DaAwesomeP

This removes the executable flag for files committed to the Git repo that aren't actually executable.

When developing on Windows, please be sure to set your Git settings so that files are not marked as executable by default. Note that there is both a per-repository setting and a system-wide default; on Windows, the system default is chosen during Git installation.

DaAwesomeP pushed a commit to mitmedialab/amd_blis that referenced this pull request Dec 19, 2025
- Added a set of thresholds (based on input dimensions) that
  determine and set the ideal number of threads to be used
  for CGEMM (on ZEN4 and ZEN5 architectures).

- The thread-setting logic is as follows:
    - The underlying (single-threaded) kernels work on blocks
      of MR x k of A, k x NR of B and MR x NR of C. Thus, it is
      initially assumed that the optimal number of threads is
      ceil(m/MR)*ceil(n/NR). This is an upper bound on the
      ideal number of threads.

    - The ideal thread count can be lower than this upper bound,
      depending on how much work each thread receives, which is
      mainly determined by the value of 'k'.

    - If 'k' is small, the arithmetic intensity (AI) is low and
      memory bandwidth becomes the limiting factor, favoring
      smaller thread counts. Conversely, if 'k' is large, the AI
      is high and the workload scales well with more threads.

    - So, we limit the number of threads when 'k' is small to avoid
      bandwidth contention: with fewer threads, each thread gets
      more bandwidth, improving efficiency. When 'k' is large, we
      allow more threads, since the computation is compute-bound
      rather than bandwidth-bound and benefits from a higher
      thread count.

- The new logic will now set the upper bound for the optimal number of threads
  (based on the number of tiles), and then further reduce it based on the values
  of 'm', 'n' and 'k'. This comes under the 'AOCL_DYNAMIC' feature for CGEMM,
  specifically for ZEN4 and ZEN5 architectures.

AMD-Internal: [CPUPL-6498]

Co-authored-by: Vignesh Balasubramanian <vignbala@amd.com>
Co-authored-by: Varaganti, Kiran <Kiran.Varaganti@amd.com>
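The thread-count heuristic described in the commit message above can be sketched in C. This is a minimal illustration under stated assumptions, not the AOCL-BLIS implementation: the micro-tile sizes `MR`/`NR`, the function name `pick_nthreads`, and every `k` threshold and cap are hypothetical placeholders, since the commit does not give the actual values. As background for the `k` dependence: counting 8 real flops per complex multiply-add and 8 bytes per single-precision complex element, and assuming A and B are each read once and C is read and written once, CGEMM performs roughly mnk / (mk + kn + 2mn) flops per byte, which approaches k/2 when k is small. That is why small-`k` problems tend to be bandwidth-bound.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL micro-tile sizes: the real ZEN4/ZEN5 CGEMM kernel
   dimensions are defined inside AOCL-BLIS and may differ. */
#define MR 12
#define NR 4

/* Integer ceil(a / b) for b > 0. */
static size_t ceil_div(size_t a, size_t b)
{
    return (a + b - 1) / b;
}

/* Sketch of the heuristic: the number of MR x NR tiles of C is an upper
   bound on useful threads, which is then capped by a k-dependent limit.
   The k thresholds and caps below are HYPOTHETICAL placeholders. */
static size_t pick_nthreads(size_t m, size_t n, size_t k, size_t max_threads)
{
    assert(max_threads >= 1);
    if (m == 0 || n == 0)
        return 1;

    /* Upper bound: at most one tile of C per thread. */
    size_t nt = ceil_div(m, MR) * ceil_div(n, NR);

    /* Small k: low arithmetic intensity (bandwidth-bound), so allow few
       threads. Large k: compute-bound, so allow up to max_threads. */
    size_t k_cap;
    if (k < 32)
        k_cap = 4;
    else if (k < 256)
        k_cap = 16;
    else
        k_cap = max_threads;

    if (nt > k_cap)
        nt = k_cap;
    if (nt > max_threads)
        nt = max_threads;
    return nt;
}
```

With these placeholder values, `pick_nthreads(1200, 400, 16, 64)` returns 4 (there are 10,000 tiles, but `k` is small), while `pick_nthreads(1200, 400, 1024, 64)` returns 64. Capping by `k` after computing the tile-based bound mirrors the two-step order described in the commit: first the upper bound, then a reduction based on `m`, `n` and `k`.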
@DaAwesomeP (Author)

This appears to be fixed in dev.

DaAwesomeP closed this Dec 19, 2025