Draft
34 commits
27a67bf
Added changes/additions to Dockerfile
amd-bartgips Oct 8, 2025
11462b3
auto format
amd-bartgips Oct 16, 2025
06fbe3c
auto format
amd-bartgips Oct 16, 2025
ebc50d8
WIP: parallel functionality
amd-bartgips Oct 17, 2025
b30ba37
perf(mituna_interface): optimize job state updates and improve enqueu…
amd-bartgips Oct 17, 2025
3001f9e
used yapf formatter
amd-bartgips Oct 21, 2025
a19f1bb
changed default base image and properly passed it through to second d…
amd-bartgips Nov 6, 2025
c396afc
changed to newer version of clang-format (12 no longer available)
amd-bartgips Nov 7, 2025
c284083
Big overhaul in order to build docker image and run on mi355. Used ne…
amd-bartgips Nov 10, 2025
d80dd29
Added text() wrapper to SQL queries
amd-bartgips Nov 11, 2025
5f67df3
More text() wrapping for load_job etc
amd-bartgips Nov 11, 2025
ad1c371
Further updates to work with new sqlalchemy version (mostly text() wr…
amd-bartgips Nov 11, 2025
3fc26d5
Added string sanitize function to avoid errors with Sql queries
amd-bartgips Nov 11, 2025
98a1a68
fixed bug with first batch grabbing all available jobs (instead of be…
amd-bartgips Nov 11, 2025
95b6d4e
continuous polling loop for job queue
amd-bartgips Nov 11, 2025
b104982
refactor: improve job state tracking and retry handling
amd-bartgips Nov 11, 2025
1602588
feat(docker): enable COMGR and HIPRTC for MIOpen build
amd-bartgips Nov 11, 2025
1055570
refactor(mituna): simplify job enqueue logic and improve progress tra…
amd-bartgips Nov 11, 2025
93e1c2d
refactor: replace sets with Manager lists for multiprocess job tracking
amd-bartgips Nov 12, 2025
372102c
Added reset for consecutive_empty_fetches to make sure the process do…
amd-bartgips Nov 17, 2025
ca4a2e4
feat(celery): add machine registration and tracking for tuning jobs
amd-bartgips Nov 17, 2025
1240e79
feat(db): add unique constraint on machine hostname
amd-bartgips Nov 18, 2025
089df04
feat(celery): improve machine registration robustness and error handling
amd-bartgips Nov 18, 2025
29a1f05
feat(machine): add SQLAlchemy validator for avail_gpus field
amd-bartgips Nov 18, 2025
f75e451
yapf formatting
amd-bartgips Nov 18, 2025
08ddbe0
refactor(machine): convert avail_gpus to hybrid_property with getter/…
amd-bartgips Nov 18, 2025
9477576
fix(machine): handle avail_gpus type conversion for hybrid property
amd-bartgips Nov 18, 2025
fb09e8d
feat(miopen): add detection and handling of database-locked jobs
amd-bartgips Nov 21, 2025
1ff4047
refactor: reorganize imports and reduce verbose logging in MITunaInte…
amd-bartgips Nov 22, 2025
b2713f0
feat(miopen): add filtering options for applicability updates
amd-bartgips Dec 2, 2025
10773d1
feat(miopen): fix SQLAlchemy subquery usage with explicit select()
amd-bartgips Dec 4, 2025
3ee4459
NOTE: This commit may introduce a bug with NCHW layouts, they do not …
amd-bartgips Dec 22, 2025
2cce59a
feat(config): update database name to silo_heuristic_2d, share change…
amd-bartgips Jan 22, 2026
5c72433
Merge branch 'develop' into silo/3d_conv_benchmark_mi355
amd-ahyttine Jan 23, 2026
2 changes: 2 additions & 0 deletions .gitignore
@@ -2,5 +2,7 @@
__pycache__
*.rej
*.orig
.cline*
*.egg-info
myvenv
venv
177 changes: 153 additions & 24 deletions Dockerfile
@@ -6,7 +6,7 @@ ARG OSDB_BKC_VERSION=
ARG HASVER=${ROCMVERSION:+$ROCMVERSION}
ARG HASVER=${HASVER:-$OSDB_BKC_VERSION}

ARG BASEIMAGE=rocm/miopen:ci_3708da
ARG BASEIMAGE=rocm/miopen:ci_7c45f0
ARG UBUNTU=ubuntu:22.04

#use UBUNTU with rocm version set
@@ -18,6 +18,8 @@ FROM $USEIMAGE as dtuna-ver-0
#args before from are wiped
ARG ROCMVERSION=
ARG OSDB_BKC_VERSION=
# pass through baseimage for later use
ARG BASEIMAGE

RUN test -d /opt/rocm*; \
if [ $? -eq 0 ] ; then \
@@ -71,17 +73,21 @@ RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -f -y --all
apt-utils \
build-essential \
cmake \
clang-format-12 \
clang-format \
curl \
doxygen \
gdb \
git \
lbzip2 \
lcov \
libboost-filesystem-dev \
libbz2-dev \
libeigen3-dev \
libncurses5-dev \
libnuma-dev \
libpthread-stubs0-dev \
mysql-client \
nlohmann-json3-dev \
openssh-server \
pkg-config \
python3 \
@@ -117,30 +123,81 @@ ENV UBSAN_OPTIONS=print_stacktrace=1
RUN wget https://github.com/Yelp/dumb-init/releases/download/v1.2.0/dumb-init_1.2.0_amd64.deb
RUN dpkg -i dumb-init_*.deb && rm dumb-init_*.deb

# Install frugally-deep and its dependencies (header-only libraries)
RUN . /env; if [ -z $SKIP_MIOPEN_BUILD ]; then \
# Clone FunctionalPlus
git clone https://github.com/Dobiasd/FunctionalPlus.git /tmp/FunctionalPlus && \
cd /tmp/FunctionalPlus && \
mkdir build && cd build && \
cmake -DCMAKE_INSTALL_PREFIX=/usr/local .. && \
make install && \
# Clone frugally-deep
git clone https://github.com/Dobiasd/frugally-deep.git /tmp/frugally-deep && \
cd /tmp/frugally-deep && \
mkdir build && cd build && \
cmake -DCMAKE_INSTALL_PREFIX=/usr/local .. && \
make install && \
# Clean up
rm -rf /tmp/FunctionalPlus /tmp/frugally-deep; \
fi


# ============================================
# Check if BOTH MIOpen and Fin are already installed
# ============================================
# We check both together because Fin depends on MIOpen headers
# If either is missing, we build both to ensure compatibility
RUN if [ -f /opt/rocm/lib/libMIOpen.so ] && [ -d /opt/rocm/include/miopen ] && \
([ -f /opt/rocm/bin/fin ] || [ -f /opt/rocm/miopen/bin/fin ]); then \
echo "=== Both MIOpen and Fin already installed, skipping builds ==="; \
echo "export SKIP_MIOPEN_BUILD=1" >> /env; \
echo "export SKIP_FIN_BUILD=1" >> /env; \
else \
echo "=== Building MIOpen and Fin from source (Fin needs MIOpen headers) ==="; \
fi

# ============================================
# Clone MIOpen (if needed)
# ============================================
ARG ROCM_LIBS_DIR=/root/rocm-libraries
ARG MIOPEN_DIR=$ROCM_LIBS_DIR/projects/miopen
#Clone MIOpen
RUN git clone --filter=blob:none --sparse https://github.com/ROCm/rocm-libraries.git $ROCM_LIBS_DIR

RUN . /env; if [ -z $SKIP_MIOPEN_BUILD ]; then \
git clone --filter=blob:none --sparse https://github.com/ROCm/rocm-libraries.git $ROCM_LIBS_DIR; \
else \
mkdir -p $ROCM_LIBS_DIR/projects && mkdir -p $MIOPEN_DIR; \
fi

# Run sparse-checkout from the git repo root
RUN . /env; if [ -z $SKIP_MIOPEN_BUILD ]; then \
cd $ROCM_LIBS_DIR && git sparse-checkout set projects/miopen; \
fi

WORKDIR $MIOPEN_DIR
RUN git sparse-checkout set projects/miopen
ARG MIOPEN_BRANCH=4940cf3ec
RUN git pull && git checkout $MIOPEN_BRANCH

# not sure what this commit is, using latest develop for now
# ARG MIOPEN_BRANCH=4940cf3ec
ARG MIOPEN_BRANCH=develop
RUN . /env; if [ -z $SKIP_MIOPEN_BUILD ]; then \
git pull && git checkout $MIOPEN_BRANCH; \
fi

ARG PREFIX=/opt/rocm
ARG MIOPEN_DEPS=$MIOPEN_DIR/deps

# Install dependencies # included in rocm/miopen:ci_xxxxxx
ARG BUILD_MIOPEN_DEPS=
ARG ARCH_TARGET=
RUN . /env; if [ -z $NO_ROCM_INST ] || ! [ -z $BUILD_MIOPEN_DEPS ]; then\
RUN . /env; if [ -z $SKIP_MIOPEN_BUILD ] && ([ -z $NO_ROCM_INST ] || ! [ -z $BUILD_MIOPEN_DEPS ]); then\
pip install cget; \
if ! [ -z $ARCH_TARGET ]; then \
sed -i "s#\(composable_kernel.*\)#\1 -DGPU_TARGETS=\"$ARCH_TARGET\"#" requirements.txt; \
fi; \
apt-get remove -y composablekernel-dev miopen-hip; \
CXX=/opt/rocm/llvm/bin/clang++ cget install -f ./dev-requirements.txt --prefix $MIOPEN_DEPS -DCMAKE_POLICY_VERSION_MINIMUM=3.5; \
git checkout requirements.txt; \
echo "=== DEBUG: cget install completed, checking for composable_kernel ==="; \
ls -la $MIOPEN_DEPS/lib/cmake/ || echo "No cmake configs found"; \
fi

ARG TUNA_USER=miopenpdb
@@ -150,36 +207,85 @@ WORKDIR $MIOPEN_DIR/build
ARG MIOPEN_CACHE_DIR=/tmp/${TUNA_USER}/cache
ARG MIOPEN_USER_DB_PATH=/tmp/$TUNA_USER/config/miopen
# build kdb objects with offline clang compiler, disable comgr + hiprtc (which would make target id specific code objects)
ARG MIOPEN_CMAKE_ARGS="-DMIOPEN_USE_COMGR=Off -DMIOPEN_USE_HIPRTC=Off -DMIOPEN_INSTALL_CXX_HEADERS=On -DMIOPEN_CACHE_DIR=${MIOPEN_CACHE_DIR} -DMIOPEN_USER_DB_PATH=${MIOPEN_USER_DB_PATH} -DMIOPEN_BACKEND=${BACKEND} -DCMAKE_PREFIX_PATH=${MIOPEN_DEPS}"
ARG MIOPEN_CMAKE_ARGS="-DMIOPEN_USE_COMGR=on -DMIOPEN_USE_HIPRTC=On -DMIOPEN_INSTALL_CXX_HEADERS=On -DMIOPEN_CACHE_DIR=${MIOPEN_CACHE_DIR} -DMIOPEN_USER_DB_PATH=${MIOPEN_USER_DB_PATH} -DMIOPEN_BACKEND=${BACKEND} -DCMAKE_PREFIX_PATH=${MIOPEN_DEPS} -DBUILD_TESTING=Off -DMIOPEN_USE_MLIR=OFF"

RUN echo "MIOPEN: Selected $BACKEND backend."
RUN if [ $BACKEND = "OpenCL" ]; then \
cmake -DMIOPEN_HIP_COMPILER=/opt/rocm/llvm/bin/clang++ ${MIOPEN_CMAKE_ARGS} $MIOPEN_DIR ; \
else \
CXX=/opt/rocm/llvm/bin/clang++ cmake ${MIOPEN_CMAKE_ARGS} $MIOPEN_DIR ; \
RUN . /env; if [ -z $SKIP_MIOPEN_BUILD ]; then \
echo "MIOPEN: Selected $BACKEND backend."; \
fi


# Debug: Check if cmake directory exists and list its contents
RUN . /env; if [ -z $SKIP_MIOPEN_BUILD ]; then \
echo "=== DEBUG: Current directory ==="; \
pwd; \
echo "=== DEBUG: Parent directory contents ==="; \
ls -la ..; \
echo "=== DEBUG: Parent cmake directory ==="; \
ls -la ../cmake/ || echo "cmake directory not found!"; \
echo "=== DEBUG: CMAKE_MODULE_PATH value ==="; \
echo "../cmake"; \
echo "=== DEBUG: Checking if cmake files exist ==="; \
test -f ../cmake/ClangCheck.cmake && echo "ClangCheck.cmake EXISTS" || echo "ClangCheck.cmake NOT FOUND"; \
test -f ../cmake/TargetFlags.cmake && echo "TargetFlags.cmake EXISTS" || echo "TargetFlags.cmake NOT FOUND"; \
test -f ../cmake/CheckCXXLinkerFlag.cmake && echo "CheckCXXLinkerFlag.cmake EXISTS" || echo "CheckCXXLinkerFlag.cmake NOT FOUND"; \
fi

RUN make -j $(nproc)
RUN make install

#Build Fin
WORKDIR $MIOPEN_DIR
RUN git submodule update --init --recursive
RUN . /env; if [ -z $SKIP_MIOPEN_BUILD ]; then \
if [ $BACKEND = "OpenCL" ]; then \
cmake -DMIOPEN_HIP_COMPILER=/opt/rocm/llvm/bin/clang++ ${MIOPEN_CMAKE_ARGS} .. ; \
else \
CXX=/opt/rocm/llvm/bin/clang++ cmake ${MIOPEN_CMAKE_ARGS} .. ; \
fi; \
fi

RUN . /env; if [ -z $SKIP_MIOPEN_BUILD ]; then \
make -j $(nproc) MIOpen; \
make -j $(nproc) MIOpenDriver; \
fi

RUN . /env; if [ -z $SKIP_MIOPEN_BUILD ]; then \
make install; \
fi

# ============================================
# Build Fin (if needed)
# ============================================
# Fin is built as a submodule of MIOpen, so we only build it if MIOpen was also built
ARG FIN_DIR=$MIOPEN_DIR/fin

# Initialize Fin submodule (only runs if MIOpen was built)
RUN . /env; if [ -z $SKIP_FIN_BUILD ]; then \
echo "=== Initializing Fin as MIOpen submodule ==="; \
cd $MIOPEN_DIR && git submodule update --init --recursive; \
fi

WORKDIR $FIN_DIR

# Can be a branch or a SHA
ARG FIN_BRANCH=develop
RUN if ! [ -z $FIN_BRANCH ]; then \
git fetch && git checkout $FIN_BRANCH; \
RUN . /env; if [ -z $SKIP_FIN_BUILD ]; then \
if ! [ -z $FIN_BRANCH ]; then \
git fetch && git checkout $FIN_BRANCH; \
fi; \
fi

# Install dependencies
#RUN cmake -P install_deps.cmake

WORKDIR $FIN_DIR/_hip
RUN CXX=/opt/rocm/llvm/bin/clang++ cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_PREFIX_PATH=$MIOPEN_DEPS $FIN_DIR

RUN make -j $(nproc)
RUN make install
RUN . /env; if [ -z $SKIP_FIN_BUILD ]; then \
CXX=/opt/rocm/llvm/bin/clang++ cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_PREFIX_PATH=$MIOPEN_DEPS $FIN_DIR; \
fi

RUN . /env; if [ -z $SKIP_FIN_BUILD ]; then \
make -j $(nproc); \
fi

RUN . /env; if [ -z $SKIP_FIN_BUILD ]; then \
make install; \
fi

#SET MIOPEN ENVIRONMENT VARIABLES
ENV MIOPEN_LOG_LEVEL=6
@@ -209,3 +315,26 @@ RUN python3 setup.py install

# reset WORKDIR to /tuna
WORKDIR /tuna

# save BASEIMAGE as env variable
ENV BASEIMAGE=${BASEIMAGE}

# install mysql-server and mysql-client
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -f -y --allow-unauthenticated \
mysql-server \
mysql-client

# install redis-server
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -f -y --allow-unauthenticated \
redis-server

# install RabbitMQ server
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -f -y --allow-unauthenticated \
rabbitmq-server

# install iproute2
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -f -y --allow-unauthenticated \
iproute2

# clean up apt cache
RUN apt-get clean && rm -rf /var/lib/apt/lists/*
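
The Dockerfile changes above skip both builds only when MIOpen (library plus headers) and a `fin` binary are already present. The same decision can be sketched in Python for illustration. The helper name `miopen_and_fin_installed` and the temporary directory standing in for `/opt/rocm` are assumptions made for this sketch and are not part of the PR.

```python
import tempfile
from pathlib import Path


def miopen_and_fin_installed(rocm_root: Path) -> bool:
    """Mirror the Dockerfile test: MIOpen library + headers AND a fin binary."""
    has_miopen = ((rocm_root / "lib" / "libMIOpen.so").is_file()
                  and (rocm_root / "include" / "miopen").is_dir())
    # fin may live in either of the two locations the Dockerfile probes.
    has_fin = ((rocm_root / "bin" / "fin").is_file()
               or (rocm_root / "miopen" / "bin" / "fin").is_file())
    return has_miopen and has_fin


# Demo against a throwaway directory that stands in for /opt/rocm (assumption).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "lib").mkdir()
    (root / "lib" / "libMIOpen.so").touch()
    (root / "include" / "miopen").mkdir(parents=True)
    only_miopen = miopen_and_fin_installed(root)  # fin is still missing
    (root / "bin").mkdir()
    (root / "bin" / "fin").touch()
    both_present = miopen_and_fin_installed(root)
```

If either half is missing the function returns `False`, which corresponds to the Dockerfile branch that builds both MIOpen and Fin from source.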
2 changes: 1 addition & 1 deletion alembic/versions/054211043da5_benchmark.py
@@ -9,7 +9,7 @@
import sqlalchemy as sa
from sqlalchemy.sql import func as sqla_func
from sqlalchemy import Column, Integer, DateTime, text, ForeignKey, String
from tuna.miopen.benchmark import ModelEnum, FrameworkEnum
from tuna.miopen.db.benchmark import ModelEnum, FrameworkEnum
from sqlalchemy.dialects.mysql import TINYINT, DOUBLE, MEDIUMBLOB, LONGBLOB
from sqlalchemy import Float, BigInteger, String
from sqlalchemy import Enum
38 changes: 38 additions & 0 deletions alembic/versions/a1b2c3d4e5f6_add_machine_hostname_unique.py
@@ -0,0 +1,38 @@
"""add_machine_hostname_unique

Revision ID: a1b2c3d4e5f6
Revises: 219858383a66
Create Date: 2025-11-18 02:38:00.000000

"""
from alembic import op
import sqlalchemy as sa

# revision identifiers, used by Alembic.
revision = 'a1b2c3d4e5f6'
down_revision = '219858383a66'
branch_labels = None
depends_on = None


def upgrade() -> None:
# First, remove any duplicate hostnames if they exist
# Keep the oldest entry (lowest id) for each hostname
op.execute("""
DELETE m1 FROM machine m1
INNER JOIN machine m2
WHERE m1.id > m2.id
AND m1.hostname = m2.hostname
""")

# Then add the unique constraint on hostname
# Using prefix length of 255 since hostname is TEXT type
op.create_index('idx_hostname',
'machine', ['hostname'],
unique=True,
mysql_length={'hostname': 255})


def downgrade() -> None:
# Remove the unique constraint
op.drop_index('idx_hostname', 'machine')
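
The migration above first deletes duplicate hostnames (keeping the lowest `id`) and only then creates the unique index, since the index creation would fail while duplicates exist. A minimal sketch of that ordering, using the standard-library `sqlite3` module and an in-memory table: SQLite does not support MySQL's multi-table `DELETE m1 FROM ... JOIN` form, so the sketch substitutes a correlated `EXISTS` subquery with the same effect. The sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE machine (id INTEGER PRIMARY KEY, hostname TEXT)")
conn.executemany("INSERT INTO machine (id, hostname) VALUES (?, ?)",
                 [(1, "node-a"), (2, "node-b"), (3, "node-a")])

# Delete every row that has an older twin (same hostname, lower id).
conn.execute("""
    DELETE FROM machine
    WHERE EXISTS (
        SELECT 1 FROM machine AS older
        WHERE older.hostname = machine.hostname
          AND older.id < machine.id
    )
""")
# Only now is it safe to enforce uniqueness.
conn.execute("CREATE UNIQUE INDEX idx_hostname ON machine (hostname)")
conn.commit()

remaining = conn.execute(
    "SELECT id, hostname FROM machine ORDER BY id").fetchall()

# A second row with an existing hostname is now rejected by the index.
try:
    conn.execute("INSERT INTO machine (hostname) VALUES ('node-a')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The `mysql_length` prefix used in the real migration has no SQLite counterpart and is omitted here.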
22 changes: 10 additions & 12 deletions requirements.txt
@@ -1,7 +1,6 @@
aioredis==2.0.1
alembic==1.8.1
asn1crypto==0.24.0
astroid==2.15.4
astroid>=3.0.0
asyncio==3.4.3
attrs==19.3.0
backcall==0.1.0
@@ -11,7 +10,7 @@ celery==5.3.4
cryptography==43.0.1
decorator==4.3.0
docutils==0.20
flask==2.2.5
flask>=3.0.0
flower==2.0.1
idna==3.7
importlib-metadata>=6.6.0
@@ -23,40 +22,39 @@ markdown-it-py==3.0.0
mccabe==0.6.1
myst-parser==3.0.1
more-itertools==8.3.0
numpy==1.24.2
numpy>=1.26.0
opentelemetry-api==1.12.0rc2
opentelemetry-distro==0.32b0
opentelemetry-exporter-otlp-proto-http==1.11.1
packaging==24.1
pandas==1.5.3
pandas>=2.1.0
paramiko==3.5.0
parso==0.3.1
pathlib2==2.3.5
pexpect==4.6.0
pickleshare==0.7.5
pluggy==0.13.1
pluggy>=1.5.0
prompt-toolkit==3.0.36
protobuf<5.0.0dev,>=3.19.5
ptyprocess==0.6.0
py==1.10.0
pyasn1==0.4.4
pycparser==2.19
Pygments==2.18.0
pylint<=2.17.0-dev0,>=2.15.4
pylint>=3.0.0
pymysql==1.1.1
PyNaCl==1.5
pyparsing==2.4.7
pytest==7.4.4
pytest>=8.0.0
pytest-asyncio==0.21
pyyaml==6.0
pyyaml
redis==5.0.1
six==1.12.0
sqlalchemy==1.3.23
six>=1.16.0
sqlalchemy>=2.0.0
sphinx==7.4.7
sphinx_rtd_theme==2.0.0
traitlets==4.3.2
twine==5.1.1
typed-ast==1.5.4
types-PyYAML==6.0.12.6
types-paramiko==3.0.0.4
types-PyMySQL==1.0.19.5
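
The move from `sqlalchemy==1.3.23` to `sqlalchemy>=2.0.0` is what drives the `text()` wrapping mentioned in the commit list: SQLAlchemy 2.x no longer accepts a plain string in `Connection.execute()`. A minimal sketch of the pattern, assuming SQLAlchemy is installed and using an in-memory SQLite engine; the `job` table and its `state` column are hypothetical stand-ins, not the project's schema.

```python
from sqlalchemy import create_engine, text

# In-memory SQLite keeps the sketch self-contained (the project uses MySQL).
engine = create_engine("sqlite:///:memory:")

# engine.begin() opens a transaction and commits it when the block exits.
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE job (id INTEGER PRIMARY KEY, state TEXT)"))
    # Bound parameters (:state) avoid hand-building SQL strings.
    conn.execute(text("INSERT INTO job (state) VALUES (:state)"),
                 {"state": "new"})
    new_jobs = conn.execute(
        text("SELECT COUNT(*) FROM job WHERE state = :state"),
        {"state": "new"},
    ).scalar()
```

Passing values as bound parameters also sidesteps most of the quoting problems that a manual string-sanitizing step would otherwise have to handle.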
2 changes: 1 addition & 1 deletion tests/test_celery.py
@@ -29,7 +29,7 @@
import pytest
from time import sleep
from multiprocessing import Value
import aioredis
import redis.asyncio as aioredis
import pytest_asyncio
from sqlalchemy.inspection import inspect
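
The import change above keeps the `aioredis` name while taking the module from redis-py, which has shipped an asyncio client under `redis.asyncio` since version 4.2. A small sketch of how calling code can stay unchanged behind that alias; the helper `load_async_redis` is hypothetical and only guards against redis-py being absent.

```python
import importlib


def load_async_redis():
    """Return the redis.asyncio module, or None if redis-py is not installed."""
    try:
        return importlib.import_module("redis.asyncio")
    except ImportError:
        return None


# Same alias as in the test file, so existing `aioredis.*` calls keep working.
aioredis = load_async_redis()

# The module-level from_url() entry point is what aioredis-style code expects.
has_from_url = aioredis is not None and hasattr(aioredis, "from_url")
```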
