@nicolasvasilache commented Jan 14, 2021

…ng MLIR

Prerequisites:
==============

First, `export MLIR_SOURCE_DIR=...`

```
(mkdir -p ${MLIR_SOURCE_DIR}/../build && \
 cd ${MLIR_SOURCE_DIR}/../build && \
 cmake -G Ninja ../llvm -DLLVM_ENABLE_PROJECTS="mlir" \
   -DBUILD_SHARED_LIBS=ON -DLLVM_BUILD_LLVM_DYLIB=1 -DMLIR_LINK_MLIR_DYLIB=1 \
   -DLLVM_BUILD_EXAMPLES=OFF -DLLVM_TARGETS_TO_BUILD="X86" -DCMAKE_BUILD_TYPE=Release \
   -DLLVM_ENABLE_ASSERTIONS=ON -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ && \
 cmake --build . --target MLIR check-mlir)
```
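
As an optional sanity check (the `bin/` and `lib/` paths below are simply the standard LLVM build-tree layout, not something this setup defines), the build tree should now contain `mlir-opt` and the MLIR libraries:

```
# Optional sanity check of the prerequisite build; paths follow the standard LLVM build layout.
ls ${MLIR_SOURCE_DIR}/../build/bin/mlir-opt
ls ${MLIR_SOURCE_DIR}/../build/lib | head
```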

Codegen:
========

```
MLIR_DIR=${MLIR_SOURCE_DIR}/../build cmake -GNinja \
  -DCMAKE_CXX_COMPILER=clang++-11 -DCMAKE_C_COMPILER=clang-11 \
  -DMLIR_SOURCE=${MLIR_SOURCE_DIR} -DUSE_MKL=OFF -DMLIR_BUILD=${MLIR_SOURCE_DIR}/../build/lib \
  -B build ./Codegen/matmul && \
cmake --build build
```

Benchmark:
==========

```
rm -f build/matmul_* && cmake --build build --target matmul-compile; \
for f in $(find build/ -maxdepth 1 -executable -type f | sort --version-sort); do $f; done; \
ls *out | sort --version-sort | xargs tail -n 1
```
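
Each run writes a `*_mlir_perf.out` file in the current directory, so a single configuration can be checked on its own with the same `tail` invocation (the file name below is one of those listed in the results):

```
# Inspect one configuration instead of the whole sweep; file name as produced by the run above.
tail -n 1 matmul_1024x1024x1024_mlir_perf.out
```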

Results (on my machine, peak ~96 GFLOPS double precision):
==========================================================

==> matmul_18x32x96_mlir_perf.out <==
32.44 GFLOPS

==> matmul_24x64x96_mlir_perf.out <==
33.86 GFLOPS

==> matmul_24x64x512_mlir_perf.out <==
40.66 GFLOPS

==> matmul_48x64x128_mlir_perf.out <==
42.69 GFLOPS

==> matmul_192x64x128_mlir_perf.out <==
41.60 GFLOPS

==> matmul_192x128x128_mlir_perf.out <==
36.87 GFLOPS

==> matmul_192x256x256_mlir_perf.out <==
34.32 GFLOPS

==> matmul_384x256x256_mlir_perf.out <==
35.13 GFLOPS

==> matmul_480x512x256_mlir_perf.out <==
30.80 GFLOPS

==> matmul_1020x1152x1152_mlir_perf.out <==
12.49 GFLOPS

==> matmul_1024x1024x1024_mlir_perf.out <==
35.26 GFLOPS

==> matmul_2304x2304x2560_mlir_perf.out <==
24.42 GFLOPS
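
As a rough back-of-the-envelope check on these numbers (assuming, since it is not spelled out above, the usual convention of 2*M*N*K flops per matmul), the reported rate for the 1024x1024x1024 case implies a runtime of about 61 ms:

```
# Back-of-envelope sketch (assumption: GFLOPS = 2*M*N*K / runtime).
# Implied runtime in milliseconds for the 1024x1024x1024 case at 35.26 GFLOPS.
echo "2 * 1024^3 * 1000 / (35.26 * 10^9)" | bc -l   # ~60.9
```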

Notes:
======

1. The ODM numbers were with F32; good register/tile sizes still need to be explored for F64.
2. Fixed some issues that were preventing AVX512 from being used; a few more compiler-flag tweaks may still be needed.
3. There seem to be some core MLIR regressions: manually trying different tile sizes can produce code that segfaults.
4. MLIR OSS lacks the hoistings that were used internally; linalg on tensors is a better abstraction for this but is still WIP.
5. MLIR OSS lacks the full/partial splitting + outlining strategies that were used internally.
