177 commits
e823d51
ckProfiler and device-level XDL GEMM operator (#48)
Nov 14, 2021
b491ebf
FP16 data in-register transpose (#41)
Nov 15, 2021
3737bb0
Add bfp16/int8 support into XDL GEMM operator (#50)
Nov 15, 2021
89e1ebd
updated bfloat16_to_float
Nov 16, 2021
0a66c54
fixed multiple definition issue of bfp16/fp32 conversion function whe…
Nov 16, 2021
a651ea4
Fixed bfp16 host_conv_fwd (#52)
Nov 18, 2021
970fa3e
v5r1 fusion kernels for inference (#49)
Nov 18, 2021
64350af
Use __builtin_memcpy to implement bit_cast and for accessing vector f…
Nov 18, 2021
567f5e9
add args for packed gemm (#54)
Nov 24, 2021
237d4ca
added test for magic number division (#58)
Nov 30, 2021
4041850
fix layout naming convention (#56)
Nov 30, 2021
d798c9b
fixed c_buffer alloc
Dec 2, 2021
2cbb897
add static_buffer_v2 zero out
Dec 2, 2021
d7a0a3f
renaming/comments
Dec 2, 2021
41cdd38
GEMM/Conv+BiasAdd+ReLU+Add (#55)
Dec 3, 2021
fd3d907
fix ReLU formula (#61)
Dec 4, 2021
a4f2423
manually apply bug fix changes in pr #63 (#64)
Dec 13, 2021
acbd7bd
Fusion Conv+Bias+ReLU(+Add) (#62)
Dec 26, 2021
6260ced
Fix building issue for examples (#66)
Jan 18, 2022
4d40b19
Add gemm_shuffle host api (#71)
rocking5566 Jan 21, 2022
ca47a6c
Do not hardcode the function parameter, use template instead. (#72)
rocking5566 Jan 25, 2022
4be7f01
add split-k GEMM (#59)
ltqin Feb 3, 2022
6d92959
Replace llvm Intrinsics with clang buildins (#65)
Feb 3, 2022
690c75a
References for conv2d fwd bias relu and add (#75)
ltqin Feb 4, 2022
823657e
GEMM+Bias+ReLU+Add (#76)
asroy Feb 7, 2022
904cbe2
fix build breaks (#81)
rosenrodt Feb 11, 2022
6f928a0
Support alpha beta scaling for GEMM (#78)
rocking5566 Feb 11, 2022
b53e9d0
Batched GEMM for fp16 (#79)
zjing14 Feb 11, 2022
20a672d
Add small tile size for fp16/fp32 and NN layout (#80)
zjing14 Feb 11, 2022
880fbee
NHWC conv 2d: fwd bfp16/int8, Device level tuning and host API (#73)
ltqin Feb 12, 2022
2778e99
Initial Setup for CI (#86)
JehandadKhan Feb 19, 2022
19c5d6e
Gemm alpha beta profiler (fp32 & fp16) (#91)
rocking5566 Feb 21, 2022
6dfb92b
Conv3d new (#94)
j4yan Feb 23, 2022
756a761
Unify Convolution FWD XDL 1D/2D implementation. (#93)
aosewski Feb 23, 2022
22d438a
Add gridwise GEMM pipeline (#89)
asroy Feb 23, 2022
bdedf64
Space filling curve (#96)
j4yan Feb 25, 2022
e221d11
Split k f16 (#97)
zjing14 Feb 25, 2022
6d4450e
Allow distinct K0/K1 values for A/B block descriptor (#98)
rosenrodt Feb 28, 2022
992f71e
Update test CMakeLists to add new tests automatically and add Jenkins…
JehandadKhan Mar 3, 2022
c254e5a
NHWC conv 2d: bwd fp32/fp16/bfp16/int8, Device level tuning and host …
ltqin Mar 4, 2022
0619ebf
Refactor threadwise copy using sfcurve (#101)
j4yan Mar 4, 2022
0c79af1
fix type in PR #101 (#107)
asroy Mar 4, 2022
7e9a9d3
[Bf16 & int8] [example & ckprofiler] (#100)
rocking5566 Mar 4, 2022
7a9b93f
Example for conv2d backward weight fp16 (#106)
ltqin Mar 5, 2022
5b17887
Fix Tests build (#109)
asroy Mar 5, 2022
ad41aa0
Int8 qunatization gemm xdl (#108)
rocking5566 Mar 5, 2022
12dfba3
revert changes in threadwise copy due to PR #101 (space filling curve…
asroy Mar 5, 2022
e17c0d8
Reduction in Composable Kernel (#82)
qianfengz Mar 5, 2022
245f741
improve parallelism for testing (#112)
asroy Mar 7, 2022
5d37d7b
Reorganize files, Part 1 (#119)
asroy Mar 9, 2022
827301d
Pr82 followup (#115)
qianfengz Mar 10, 2022
9e33fe7
Use Space Filling Curve in Threadwise Copy (#118)
j4yan Mar 11, 2022
c78d1be
revise count_vgpr script to capture all possible syntaxes (#124)
rosenrodt Mar 11, 2022
9a17e7f
Consider gemm requant relu requant as gemm fusuion (#116)
rocking5566 Mar 12, 2022
b51808d
Fix conv2d bwd data bug when filter is 1x1 and stride = 2 (#132)
ltqin Mar 21, 2022
485ea46
Gemm_c_shuffle (4 layouts) X (fp32 bf16 int8) (#131)
rocking5566 Mar 21, 2022
cb87b04
refactored deviceBatchedGemm; removed GridwiseBatchedGemm; added fp32…
j4yan Mar 21, 2022
9a8ee8a
Reduction for int8 and bfloat16 (#125)
qianfengz Mar 22, 2022
716f1c7
Grouped GEMM for fp16 (#126)
zjing14 Mar 22, 2022
d91f9f1
Batched gemm bf16 (#142)
j4yan Mar 22, 2022
2206136
clean (#143)
asroy Mar 23, 2022
f91579a
Unified conv3D API + support for all data types. (#133)
aosewski Mar 23, 2022
f95267f
Gemm+Reduce Fusion (#128)
asroy Mar 24, 2022
12f4cfc
fixed alloc mem size (#145)
zjing14 Mar 24, 2022
3ba1493
Gemm test return value (#148)
rocking5566 Mar 24, 2022
313bbea
ctest of batched_gemm returns 0 or 1 (#149)
j4yan Mar 25, 2022
fe6ce55
Grouped gemm test fix (#150)
zjing14 Mar 28, 2022
0536f2b
Unified implementation of 1d/2d/3d conv bwd-data. fp32/fp16/bfp16/int…
ltqin Mar 29, 2022
98e1e2d
Refine kernel parameter of int8 (ScalarPerVector) (#155)
rocking5566 Mar 29, 2022
34c661e
Batched gemm and reduction (#156)
j4yan Mar 30, 2022
982f8bb
Fix return type to be conformant with CTest. (#160)
aosewski Mar 31, 2022
c8f3acf
batched_gemm: use profiler in ctest (#163)
j4yan Mar 31, 2022
f015c77
use single threaded tensor generator (#161)
rosenrodt Mar 31, 2022
ecf337b
fixed issue164 (#165)
j4yan Mar 31, 2022
cd167e4
Compile for gfx908 and gfx90a (#130)
asroy Mar 31, 2022
c0e95f6
Patch for bwd data #134 (#168)
ltqin Mar 31, 2022
7db48f9
Tune & add conflict-free LDS gemm kernels (#159)
rosenrodt Mar 31, 2022
6468781
fix build (#171)
asroy Apr 1, 2022
82c8b9f
Improve Reduction kernel api (#152)
qianfengz Apr 5, 2022
781cacd
NHWC Conv2d Bwd weight fp16 ckprofiler and test (#166)
ltqin Apr 5, 2022
6717168
Patch for bwd data comments (#174)
ltqin Apr 5, 2022
abf4bdb
Common forward convolution utility refactor. (#141)
aosewski Apr 5, 2022
ac0d806
Fix typo in batched gemm profiler (#176)
j4yan Apr 7, 2022
4221505
Compile CK for all targets (#188)
illsilin Apr 15, 2022
c1ef731
Use ck::half_t for Host Reduction (#195)
qianfengz Apr 21, 2022
860e291
removed unused lds loads (#196)
zjing14 Apr 21, 2022
7353ec0
Fix `clang-format` (#189)
JehandadKhan Apr 21, 2022
1a0cd5d
Convolution FWD profiler refactor. (#183)
aosewski Apr 21, 2022
08a979f
use inline asm for 4x4 int8 transposition (#187)
rosenrodt Apr 22, 2022
31d869a
Clang-format only modified files. (#181)
aosewski Apr 22, 2022
7c0b149
profiler: fix fp32 c-shuffle gemm tuning parameter (#194)
rosenrodt Apr 22, 2022
3956085
add comments to batched_gemm (#186)
j4yan Apr 25, 2022
95e9343
Hotfix for gemm test (#214)
rosenrodt Apr 29, 2022
97d8c50
Add gfx90a CI stage for tests (#208)
JehandadKhan Apr 29, 2022
c77ae65
Update to gemm_reduce and batched_gemm_reduce (#213)
qianfengz Apr 29, 2022
8a2c69e
use integer value for GEMM test (#219)
asroy Apr 30, 2022
8eca05a
Introduce GoogleTest framework. (#204)
aosewski Apr 30, 2022
a3c910a
Add Benchmark test into CI (#226)
illsilin May 8, 2022
ec7c2e9
Code refactor (#175)
asroy May 9, 2022
968bd93
Update README.md (#228)
whchung May 9, 2022
f03a173
Resolution of issue #153: Add compiler warning on comparing int and s…
myamlak May 9, 2022
712e464
Post PR183 review fixes. (#224)
aosewski May 10, 2022
76764d8
Manual control of MAC cluster for improved interwave performance (#184)
rosenrodt May 11, 2022
0f912e2
enable convnd bwd data test (#234)
ltqin May 12, 2022
cec69bc
Add host API (#220)
JehandadKhan May 12, 2022
9f71ff4
Validate examples in CI (#233)
rosenrodt May 13, 2022
aafc3ac
elementwise op (#238)
rocking5566 May 19, 2022
0ffe956
Gemm reduce max (#209)
rocking5566 May 20, 2022
bb4b82a
Hotfix eltiwseop (#242)
rocking5566 May 20, 2022
b9b9c3b
[Perf][Bwd-weights]Lds re-layout to avoid ds read/write bank conflict…
shaojiewang May 20, 2022
b31b588
remove unused conv bwd data profiler header and cpp (#245)
shaojiewang May 20, 2022
070619f
[conv bwd-weight]Binding gemm k1 to conv n (#202)
shaojiewang May 20, 2022
a054f7d
Refactor block to C tile map (#235)
rosenrodt May 20, 2022
44943e0
remove options.hpp.in (#240)
May 20, 2022
ac54331
example of conv bwd weight 1d/2d/3d fp32/fp16/bf16 xdl (#244)
shaojiewang May 20, 2022
ba58a93
fix build (#246)
May 23, 2022
0d08cf1
add GetWorkSpaceSize to base arg (#253)
shaojiewang May 24, 2022
1085794
Add performance tests as a stage of CI. (#247)
illsilin May 24, 2022
63eee2d
Overhaul to Reducton and its dependants (#237)
qianfengz May 24, 2022
40b59a6
Navi21 gemm (#197)
j4yan May 24, 2022
61851ae
minor fix for recent PR (#255)
May 25, 2022
e579c9e
Tensile-style block to C tile map (#239)
rosenrodt May 25, 2022
82d7d99
Hotfix binary elementwise (for broadcast on fastest axis) (#254)
rocking5566 May 25, 2022
97c4d48
Add pooling example (#257)
qianfengz May 26, 2022
3e6c261
Add FP64 XDL GEMM built-in function (#199)
ltqin May 26, 2022
91d8b7d
Fixing conv bug (#258)
May 27, 2022
d32a67a
gemm + layernorm (#261)
rocking5566 May 30, 2022
85fc91c
Minor fix for recent PR (#260)
May 31, 2022
7b1e2c3
Multi-kernel CGEMM (#230)
myamlak May 31, 2022
b6eaf3e
Pass gemm_descs for grouped gemm via __constant__ buff (#232)
zjing14 May 31, 2022
86185bd
Unify the naming of the math functions used by the host and kernel (#…
qianfengz Jun 2, 2022
1c5d06f
use old ctile to avoid conv2d fwd bias relu add compute error (#271)
shaojiewang Jun 2, 2022
1677cf7
Adding Resnet50 test to Performance tests (#268)
illsilin Jun 2, 2022
1ced00a
Add performance tests on MI200 in CI, reporting number of CUs, add st…
illsilin Jun 10, 2022
fb9b6b1
Use new github credentials (#278)
illsilin Jun 16, 2022
561ec12
example for convnd bwd weight bf16 splitk (#265)
shaojiewang Jun 16, 2022
6eb5549
Gemm + bias + relu + add + layernorm (#272)
rocking5566 Jun 17, 2022
c7a96ed
add p_workspace to baseargument (#275)
ltqin Jun 17, 2022
63cdd92
use universal workspace pointer in bwd-weight (#286)
shaojiewang Jun 17, 2022
1f543bf
Regulate reduction accumulator operations and Element-wise operations…
qianfengz Jun 17, 2022
e4584d9
Don't look up the /sys/module/amdgpu/version file. (#287)
illsilin Jun 17, 2022
56adf7e
GEMM with Multiple Source, GEMM+Bias+Add+FastGeLU example and ckProfi…
Jun 19, 2022
ccbd8d9
update readme and script (#290)
Jun 21, 2022
1ae2410
bring up to date with the usage of __builtin_amdgcn_sched_barrier (#293)
rosenrodt Jun 21, 2022
be60d60
Create MIT LICENSE (#229)
Jun 21, 2022
15c89e8
Standalone softmax kernel (#284)
rosenrodt Jun 21, 2022
4634b12
fix Issue 291 (#294)
shaojiewang Jun 21, 2022
a2edd7d
Testing all fwd convolution specializations. (#259)
aosewski Jun 23, 2022
a49115b
update license (#297)
Jun 23, 2022
d1db6a0
Absolute include path (#281)
Jun 25, 2022
d3051d7
add license in file (#303)
Jun 25, 2022
b653c5e
Switch to standard ROCm packaging (#301)
lawruble13 Jun 25, 2022
aebd211
External Interface (#304)
Jun 27, 2022
1223511
external api for gemm + layernorm (#285)
rocking5566 Jun 27, 2022
eccf877
Remove incorrect old packaging statement (#308)
lawruble13 Jun 30, 2022
93c99f3
Standalone sweep once softmax kernel w/ ckProfiler (#295)
rosenrodt Jun 30, 2022
ab6c82c
Grouped Gemm ckProfiler hotfix (#313)
zjing14 Jun 30, 2022
fa9a0a5
Gemm + bias + c_permute (#312)
zjing14 Jul 1, 2022
0dcb349
Improve external interface for GEMM and GEMM+add+add+fastgelu (#311)
Jul 1, 2022
1c8126a
add batch_stride into batched gemm (#314)
zjing14 Jul 1, 2022
63fd5da
Single-kernel GEMM + layernorm (#263)
rosenrodt Jul 1, 2022
8e37478
modified grouped gemm addressing method (#307)
guangzlu Jul 1, 2022
9e4429f
Gemm+Bilinear (#316)
Jul 2, 2022
334361c
Batched Gemm with C Permute (#305)
zjing14 Jul 6, 2022
4fe9c39
N-D Tensor Contraction example, instance, and client example (#270)
Jul 7, 2022
763ca61
add conv1d/3d bwd weight instances (#318)
shaojiewang Jul 8, 2022
6391474
GEMM pipeline v2 (#317)
poyenc Jul 8, 2022
92a0945
convnd_fwd fp16 example
Jul 11, 2022
d789a53
update example
Jul 12, 2022
2189220
update example
Jul 12, 2022
d41a59a
update instance
Jul 12, 2022
ba816e6
updating refernce conv
Jul 12, 2022
0cb8ba9
update reference conv
Jul 13, 2022
11edd0f
update conv fwd profiler
Jul 14, 2022
615e1d3
update conv 1d and 3d instance
Jul 14, 2022
0a0c952
update include path
Jul 14, 2022
6b6360b
clean
Jul 14, 2022
48 changes: 48 additions & 0 deletions .gitignore
@@ -0,0 +1,48 @@
# Compiled Object files
*.slo
*.lo
*.o
*.obj

# Precompiled Headers
*.gch
*.pch
*.ipch

# Compiled Dynamic libraries
*.so
*.dylib
*.dll

# Fortran module files
*.mod

# Compiled Static libraries
*.lai
*.la
*.a
*.lib

# Executables
*.exe
*.out
*.app

# vim tags
tags
.tags
.*.swp

# Editors
.vscode

# build-in-source directory
build*

# emacs temporary/backup files
.\#*
\#*\#
*~

# GDB temporary files
.gdb_history
135 changes: 106 additions & 29 deletions CMakeLists.txt
@@ -1,10 +1,26 @@
cmake_minimum_required(VERSION 3.5)
cmake_minimum_required(VERSION 3.14)

# Check support for CUDA/HIP in Cmake
project(composable_kernel)

list(APPEND CMAKE_MODULE_PATH "${PROJECT_SOURCE_DIR}/cmake")

enable_testing()

set(ROCM_SYMLINK_LIBS OFF)
find_package(ROCM 0.8 REQUIRED PATHS /opt/rocm)

include(ROCMInstallTargets)
include(ROCMPackageConfigHelpers)
include(ROCMSetupVersion)
include(ROCMInstallSymlinks)
include(ROCMCreatePackage)
include(CheckCXXCompilerFlag)

rocm_setup_version(VERSION 0.2.0)
include(TargetFlags)
list(APPEND CMAKE_PREFIX_PATH ${CMAKE_INSTALL_PREFIX} ${CMAKE_INSTALL_PREFIX}/llvm ${CMAKE_INSTALL_PREFIX}/hip /opt/rocm /opt/rocm/llvm /opt/rocm/hip)

## C++
enable_language(CXX)
set(CMAKE_CXX_STANDARD 17)
@@ -30,35 +46,42 @@ message("OpenMP_gomp_LIBRARY: ${OpenMP_gomp_LIBRARY}")
message("OpenMP_pthread_LIBRARY: ${OpenMP_pthread_LIBRARY}")
message("OpenMP_CXX_FLAGS: ${OpenMP_CXX_FLAGS}")

set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
link_libraries(${OpenMP_gomp_LIBRARY})
link_libraries(${OpenMP_pthread_LIBRARY})

## HIP
find_package(HIP REQUIRED)
message(STATUS "Build with HIP ${hip_VERSION}")

## half
#find_path(HALF_INCLUDE_DIR half.hpp)
message("HALF_INCLUDE_DIR: ${HALF_INCLUDE_DIR}")

# CMAKE_CXX_FLAGS
SET(BUILD_DEV ON CACHE BOOL "BUILD_DEV")
if(BUILD_DEV)
string(APPEND CMAKE_CXX_FLAGS " -Werror -Weverything")
# Override HIP version in config.h, if necessary.
# The variables set by find_package() can't be overwritten,
# therefore let's use intermediate variables.
set(CK_HIP_VERSION_MAJOR "${HIP_VERSION_MAJOR}")
set(CK_HIP_VERSION_MINOR "${HIP_VERSION_MINOR}")
set(CK_HIP_VERSION_PATCH "${HIP_VERSION_PATCH}")
if( DEFINED CK_OVERRIDE_HIP_VERSION_MAJOR )
set(CK_HIP_VERSION_MAJOR "${CK_OVERRIDE_HIP_VERSION_MAJOR}")
message(STATUS "CK_HIP_VERSION_MAJOR overriden with ${CK_OVERRIDE_HIP_VERSION_MAJOR}")
endif()
message("CMAKE_CXX_FLAGS: ${CMAKE_CXX_FLAGS}")
if( DEFINED CK_OVERRIDE_HIP_VERSION_MINOR )
set(CK_HIP_VERSION_MINOR "${CK_OVERRIDE_HIP_VERSION_MINOR}")
message(STATUS "CK_HIP_VERSION_MINOR overriden with ${CK_OVERRIDE_HIP_VERSION_MINOR}")
endif()
if( DEFINED CK_OVERRIDE_HIP_VERSION_PATCH )
set(CK_HIP_VERSION_PATCH "${CK_OVERRIDE_HIP_VERSION_PATCH}")
message(STATUS "CK_HIP_VERSION_PATCH overriden with ${CK_OVERRIDE_HIP_VERSION_PATCH}")
endif()
message(STATUS "Build with HIP ${HIP_VERSION}")

## tidy
include(EnableCompilerWarnings)
set(MIOPEN_TIDY_ERRORS ERRORS * -readability-inconsistent-declaration-parameter-name)
set(CK_TIDY_ERRORS ERRORS * -readability-inconsistent-declaration-parameter-name)
if(CMAKE_CXX_COMPILER MATCHES ".*hcc" OR CMAKE_CXX_COMPILER MATCHES ".*clang\\+\\+")
set(MIOPEN_TIDY_CHECKS -modernize-use-override -readability-non-const-parameter)
set(CK_TIDY_CHECKS -modernize-use-override -readability-non-const-parameter)
# Enable tidy on hip
elseif(MIOPEN_BACKEND STREQUAL "HIP" OR MIOPEN_BACKEND STREQUAL "HIPNOGPU")
set(MIOPEN_TIDY_ERRORS ALL)
elseif(CK_BACKEND STREQUAL "HIP" OR CK_BACKEND STREQUAL "HIPNOGPU")
set(CK_TIDY_ERRORS ALL)
endif()


include(ClangTidy)
enable_clang_tidy(
CHECKS
@@ -150,13 +173,12 @@ enable_clang_tidy(
-cppcoreguidelines-narrowing-conversions
-altera-struct-pack-align
-cppcoreguidelines-prefer-member-initializer

${MIOPEN_TIDY_CHECKS}
${MIOPEN_TIDY_ERRORS}
${CK_TIDY_CHECKS}
${CK_TIDY_ERRORS}
HEADER_FILTER
"\.hpp$"
EXTRA_ARGS
-DMIOPEN_USE_CLANG_TIDY
-DCK_USE_CLANG_TIDY
)

include(CppCheck)
@@ -180,19 +202,74 @@ enable_cppcheck(
unmatchedSuppression
FORCE
SOURCES
host/host_tensor/src
host/driver_offline/src
composable_kernel/src/kernel_wrapper
library/src
INCLUDE
host/host_tensor/include
host/solver/include
host/driver_offline/include
composable_kernel/include/*
${CMAKE_CURRENT_SOURCE_DIR}/include
${CMAKE_CURRENT_BINARY_DIR}/include
${CMAKE_CURRENT_SOURCE_DIR}/library/include
DEFINE
CPPCHECK=1
__linux__=1
)

add_subdirectory(host)
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/lib)
set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/lib)
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/bin)

include_directories(BEFORE
${PROJECT_SOURCE_DIR}/include
${PROJECT_SOURCE_DIR}/library/include
)


SET(BUILD_DEV ON CACHE BOOL "BUILD_DEV")
if(BUILD_DEV)
add_compile_options(-Werror)
add_compile_options(-Weverything)
endif()
message("CMAKE_CXX_FLAGS: ${CMAKE_CXX_FLAGS}")

add_custom_target(check COMMAND ${CMAKE_CTEST_COMMAND} --output-on-failure -C ${CMAKE_CFG_INTDIR})

rocm_package_setup_component(tests
LIBRARY_NAME composablekernel
PACKAGE_NAME tests # Prevent -static suffix on package name
)

add_subdirectory(library)
add_subdirectory(example)
add_subdirectory(test)
add_subdirectory(profiler)

#Create an interface target for the include only files and call it "composablekernels"
include(CMakePackageConfigHelpers)

set(version 1.0.0)
write_basic_package_version_file(
"${CMAKE_CURRENT_BINARY_DIR}/composable_kernelConfigVersion.cmake"
VERSION "${version}"
COMPATIBILITY AnyNewerVersion
)

configure_package_config_file(${CMAKE_CURRENT_SOURCE_DIR}/Config.cmake.in
"${CMAKE_CURRENT_BINARY_DIR}/composable_kernelConfig.cmake"
INSTALL_DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/composable_kernel
NO_CHECK_REQUIRED_COMPONENTS_MACRO
)

rocm_install(FILES
"${CMAKE_CURRENT_BINARY_DIR}/composable_kernelConfig.cmake"
"${CMAKE_CURRENT_BINARY_DIR}/composable_kernelConfigVersion.cmake"
DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/composable_kernel
)

set(CPACK_RESOURCE_FILE_LICENSE "${CMAKE_CURRENT_SOURCE_DIR}/LICENSE")
set(CPACK_RPM_PACKAGE_LICENSE "MIT")

rocm_create_package(
NAME composablekernel
DESCRIPTION "High Performance Composable Kernel for AMD GPUs"
MAINTAINER "MIOpen Kernels Dev Team <dl.MIOpen@amd.com>"
LDCONFIG
HEADER_ONLY
)
11 changes: 11 additions & 0 deletions Config.cmake.in
@@ -0,0 +1,11 @@
@PACKAGE_INIT@

set(_composable_kernel_supported_components device_operations host_tensor)

foreach(_comp ${composable_kernel_FIND_COMPONENTS})
if(NOT _comp IN_LIST _composable_kernel_supported_components)
set(composable_kernel_FOUND False)
set(composable_kernel_NOT_FOUND_MESSAGE "Unsupported component: ${_comp}")
endif()
include("${CMAKE_CURRENT_LIST_DIR}/composable_kernel${_comp}Targets.cmake")
endforeach()
95 changes: 95 additions & 0 deletions Dockerfile
@@ -0,0 +1,95 @@
FROM ubuntu:18.04

ARG ROCMVERSION=5.1
ARG OSDB_BKC_VERSION

RUN set -xe

ARG BUILD_THREADS=8
ARG DEB_ROCM_REPO=http://repo.radeon.com/rocm/apt/.apt_$ROCMVERSION/
# Add rocm repository
RUN apt-get update
RUN apt-get install -y wget gnupg
RUN wget -qO - http://repo.radeon.com/rocm/rocm.gpg.key | apt-key add -
RUN sh -c "echo deb [arch=amd64] $DEB_ROCM_REPO ubuntu main > /etc/apt/sources.list.d/rocm.list"
RUN wget --no-check-certificate -qO - https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null | apt-key add -
RUN sh -c "echo deb https://apt.kitware.com/ubuntu/ bionic main | tee -a /etc/apt/sources.list"

# ADD requirements.txt requirements.txt
# Install dependencies
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated \
apt-utils \
build-essential \
cmake-data=3.15.1-0kitware1 \
cmake=3.15.1-0kitware1 \
curl \
g++ \
gdb \
git \
hip-rocclr \
jq \
libelf-dev \
libncurses5-dev \
libnuma-dev \
libpthread-stubs0-dev \
llvm-amdgpu \
pkg-config \
python \
python3.8 \
python-dev \
python3-dev \
python-pip \
python3-pip \
software-properties-common \
wget \
rocm-dev \
rocm-device-libs \
rocm-cmake \
vim \
zlib1g-dev \
openssh-server \
clang-format-10 \
kmod && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*

# Setup ubsan environment to printstacktrace
RUN ln -s /usr/bin/llvm-symbolizer-3.8 /usr/local/bin/llvm-symbolizer
ENV UBSAN_OPTIONS=print_stacktrace=1

# Install an init system
RUN wget https://github.com/Yelp/dumb-init/releases/download/v1.2.0/dumb-init_1.2.0_amd64.deb
RUN dpkg -i dumb-init_*.deb && rm dumb-init_*.deb

# Install cget
RUN pip install cget

# Install rclone
RUN pip install https://github.com/pfultz2/rclone/archive/master.tar.gz

ARG PREFIX=/opt/rocm
# Install dependencies
RUN cget install pfultz2/rocm-recipes
# Install rbuild
RUN pip3 install https://github.com/RadeonOpenCompute/rbuild/archive/6d78a0553babdaea8d2da5de15cbda7e869594b8.tar.gz
# Install packages for processing the performance results
RUN pip3 install --upgrade pip
RUN pip3 install sqlalchemy
RUN pip3 install pymysql
RUN pip3 install pandas
RUN pip3 install setuptools-rust
RUN pip3 install sshtunnel
# Setup ubsan environment to printstacktrace
ENV UBSAN_OPTIONS=print_stacktrace=1

ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8
ADD rbuild.ini /rbuild.ini
ADD dev-requirements.txt dev-requirements.txt
RUN rbuild prepare -s develop -d $PREFIX
RUN groupadd -f render

# Install the new rocm-cmake version
RUN git clone -b master https://github.com/RadeonOpenCompute/rocm-cmake.git && \
cd rocm-cmake && mkdir build && cd build && \
cmake .. && cmake --build . && cmake --build . --target install