25 commits
09ea2f4
Add CI campaign source-of-truth plan
kondratyevd Feb 4, 2026
c76e026
Add advisory CI integrity, quality, container, gitops, and security c…
kondratyevd Feb 4, 2026
7049587
Run CI checks on pull requests only
kondratyevd Feb 4, 2026
73d2169
Fix CI baseline failures and docker filter dependency trigger
kondratyevd Feb 4, 2026
1b2c2ba
Fix remaining CI failures in integrity, shell, python, and af-pod-mon…
kondratyevd Feb 4, 2026
9b769d0
Align python and shell lint checks with baseline style
kondratyevd Feb 4, 2026
05beb61
Refresh CI plan to current optimization workstreams
kondratyevd Feb 4, 2026
43e0d63
ci(worker2): add scoped integration scenarios and fixture matrix
kondratyevd Feb 4, 2026
4ca5aa8
ci(worker1): tighten unit coverage flow and clean transient test arti…
kondratyevd Feb 4, 2026
625737d
ci(worker3): optimize advisory security and runtime workflow scope
kondratyevd Feb 4, 2026
dd78608
docs(ci): align CI plan with active workflow surface
kondratyevd Feb 4, 2026
52dab13
ci(docker): enable gha cache for advisory image builds
kondratyevd Feb 4, 2026
fd4b57a
ci: fix lint formatting and unit coverage reporting
kondratyevd Feb 4, 2026
a592a4c
ci: add formatter autofix workflow and resolve lint-python import order
kondratyevd Feb 4, 2026
ef3821b
ci: harden autofix workflow push ref handling
kondratyevd Feb 4, 2026
15fad76
docs: modernize CI badges with status and policy signal
kondratyevd Feb 4, 2026
6304b85
ci: trim duplicate checks and path-scope fast lint workflows
kondratyevd Feb 4, 2026
237dc69
ci: harden runtime and security workflow execution behavior
kondratyevd Feb 4, 2026
5cb774d
ci: path-scope and harden autofix workflow runtime behavior
kondratyevd Feb 4, 2026
61f918b
docs: elevate CI section with expanded runtime badges and CI profile …
kondratyevd Feb 4, 2026
dbe0c48
ci: pin formatter and lint toolchain versions for deterministic runs
kondratyevd Feb 4, 2026
903233f
docs(ci): sync CI plan with current runtime strategy
kondratyevd Feb 4, 2026
600da03
ci: fix actionlint shell warnings in workflow summaries
kondratyevd Feb 4, 2026
6999e2f
ci: remove docker build timeouts and trim build context
kondratyevd Feb 4, 2026
7f675f1
ci: bound docker builds and improve build visibility
kondratyevd Feb 4, 2026
126 changes: 126 additions & 0 deletions .codex/CI_PLAN.md
@@ -0,0 +1,126 @@
# CI/CD Campaign Plan (Current State)

## Mission
Deliver one draft PR from `codex/ci` to `main` with a stable advisory-first CI baseline, then optimize test depth, integration realism, and security signal without broad refactors.

## Current Status
- PR branch: `codex/ci`
- Delivery model: single PR `codex/ci -> main`
- PR #21 is open against `main` (not draft).
- Existing CI baseline is green on fast PR checks; container build jobs are the long pole.
- Root-context Docker builds are cache-enabled and use a repo-level `.dockerignore` to reduce context size.
- Lint workflows are check-only; formatter autofix workflow can commit formatting-only fixes to PR branches.

## Success Criteria
- CI remains stable on `pull_request` runs for all configured workflows.
- Optimization phase adds meaningful unit and integration coverage for repo-owned code.
- Security checks include a nightly advisory scan plus a PR-time advisory signal.
- `README.md` keeps A-E category badges aligned with active workflows.
- `.codex/CI_PLAN.md` remains the single source of truth.

## In-Scope / Out-of-Scope Paths
In scope:
- `.github/**`
- `apps/**`
- `deploy/**`
- `docker/**` (except exclusions)
- `README.md`
- `.codex/CI_PLAN.md`

Out of scope:
- `docker/dask-gateway-server/**`
- `docs/**`
- `docs/source/demos/**`
- `docker/kaniko-build-jobs/**`
- `slurm/**`
- `.cursor/**`

Approved exceptions:
- `slurm/**` is used as a dependency-only trigger in container reliability path filters because maintained Dockerfiles copy `slurm/` artifacts.
- CI auto-commit is enabled for formatter-only fixes in `ci-format-autofix.yml` to reduce lint iteration noise.
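
The precedence implied by the lists above (exclusions win over the broader in-scope globs) can be sketched as a small path check. This is only an illustration under stated assumptions: the helper name `is_in_scope` is hypothetical, and `fnmatchcase` lets `*` cross `/`, which is looser than the glob semantics of GitHub Actions path filters. The `slurm/**` trigger-only exception is not modeled here.

```python
from fnmatch import fnmatchcase

# Patterns copied from the in-scope and out-of-scope lists in this plan.
IN_SCOPE = (
    ".github/**",
    "apps/**",
    "deploy/**",
    "docker/**",
    "README.md",
    ".codex/CI_PLAN.md",
)
OUT_OF_SCOPE = (
    "docker/dask-gateway-server/**",
    "docs/**",
    "docs/source/demos/**",
    "docker/kaniko-build-jobs/**",
    "slurm/**",
    ".cursor/**",
)


def is_in_scope(path: str) -> bool:
    """Return True when a repo-relative POSIX path is inside the campaign scope.

    Hypothetical helper for illustration; not part of the repository.
    """
    # Exclusions take precedence, matching "`docker/**` (except exclusions)".
    if any(fnmatchcase(path, pattern) for pattern in OUT_OF_SCOPE):
        return False
    return any(fnmatchcase(path, pattern) for pattern in IN_SCOPE)
```

A path that matches neither list is treated as out of scope, which is the conservative reading of the lists.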

## Active Workflow Surface
- `.github/workflows/ci-workflow-integrity.yml`
- `.github/workflows/lint-python.yml`
- `.github/workflows/lint-shell.yml`
- `.github/workflows/lint-json.yml`
- `.github/workflows/lint-yaml.yml`
- `.github/workflows/ci-format-autofix.yml`
- `.github/workflows/ci-repo-quality.yml`
- `.github/workflows/ci-integration-scenarios.yml`
- `.github/workflows/lint-docker.yml`
- `.github/workflows/ci-gitops-deployability.yml`
- `.github/workflows/ci-security-advisory.yml`
- `.github/workflows/nightly-security-advisory.yml`

## Check Architecture
### A) CI System Integrity (advisory)
- Workflow: `ci-workflow-integrity.yml`
- Checks: actionlint + workflow YAML parse.
- Risk: broken workflow definitions and silent CI drift.

### B) Repo Quality and Tests (advisory)
- Workflows: `lint-python.yml`, `lint-shell.yml`, `lint-json.yml`, `lint-yaml.yml`, `ci-format-autofix.yml`, `ci-repo-quality.yml`, `ci-integration-scenarios.yml`
- Checks: black/isort check-only, py_compile, pytest unit advisory with coverage threshold, shellcheck/shfmt/bash -n, JSON/YAML parse, auto-format commits for changed Python/shell/JSON/YAML files, integration scenario matrix tests via mocked container/monitoring flows.
- Execution model: fast workflows are path-scoped with PR concurrency cancellation; formatter/lint tool versions are pinned for deterministic behavior.
- Risk: script/runtime regressions.
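
As a rough illustration of the `py_compile` check named above, the sketch below byte-compiles a list of files and collects syntax failures without leaving `__pycache__` artifacts next to the sources. The function name `find_compile_failures` is an assumption made for this example; the actual workflow step may invoke `py_compile` differently.

```python
import os
import py_compile
import tempfile


def find_compile_failures(paths):
    """Return {path: error message} for Python files that fail to byte-compile.

    Hypothetical helper for illustration; not taken from the workflows.
    """
    failures = {}
    # Write .pyc output to a scratch directory so the checkout stays clean.
    with tempfile.TemporaryDirectory() as scratch:
        for index, path in enumerate(paths):
            target = os.path.join(scratch, f"{index}.pyc")
            try:
                # doraise=True raises PyCompileError instead of printing to stderr.
                py_compile.compile(str(path), cfile=target, doraise=True)
            except py_compile.PyCompileError as exc:
                failures[str(path)] = exc.msg
    return failures
```

In an advisory setup, a non-empty result would typically be written to the run summary rather than failing the job.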

### C) Container Reliability (advisory)
- Workflow: `lint-docker.yml`
- Checks: hadolint, targeted Docker Buildx jobs with GitHub Actions layer cache, smoke checks via `.github/scripts/container-smoke.sh`.
- Execution model: path-scoped change detection, 120-minute per-job timeout cap for Docker build jobs, root-context `.dockerignore` optimization, BuildKit plain progress logging, and advisory summaries in run output.
- Risk: image build/runtime regressions.

### D) GitOps Deployability (advisory)
- Workflow: `ci-gitops-deployability.yml`
- Checks: kustomize render + kubeconform schema validation.
- Execution model: overlay-scoped detection, explicit job timeouts, and advisory plan/result summaries in run output.
- Risk: Flux reconciliation failures from invalid manifests.

### E) Security Posture (advisory)
- Workflows: `nightly-security-advisory.yml`, `ci-security-advisory.yml`
- Checks: nightly Trivy filesystem scan plus PR-time advisory Trivy vulnerability/config scans with run summaries and artifacts.
- Execution model: path-scoped PR scans, explicit scan timeouts, and summary tables for scan scope/outcomes.
- Risk: security drift in dependencies/configuration.

## Optimization Workstreams (Current)
### Worker 1: Coverage Optimizer
File lane:
- `tests/unit/**`
- `tests/conftest.py`
- `.github/workflows/lint-python.yml`
- `.github/workflows/ci-repo-quality.yml`

Goal:
- Increase meaningful Python test coverage and publish coverage in CI (advisory threshold first).

### Worker 2: Integration Scenarios
File lane:
- `tests/integration/**`
- `tests/fixtures/**`
- `.github/workflows/ci-integration-scenarios.yml` (new)
- `.github/scripts/integration/**`

Goal:
- Add realistic automated integration scenarios with deterministic mocks and PR advisory execution.

### Worker 3: Security and Runtime Optimizer
File lane:
- `.github/workflows/nightly-security-advisory.yml`
- `.github/workflows/ci-security-advisory.yml` (new)
- `.github/workflows/lint-docker.yml`
- `.github/workflows/ci-gitops-deployability.yml`

Goal:
- Add PR-time advisory security checks and reduce CI runtime/noise safely.

## Branch and Sync Rules
- No side branches.
- No force-push on shared campaign work.
- Daily sync: merge `main` into `codex/ci` (no rebase).

## Constraint Challenge Protocol
If any hard constraint must be challenged, submit an `EXCEPTION REQUEST` with:
1) challenged constraint,
2) concrete risk if unchanged,
3) minimal exception requested,
4) rollback path.

No exception is implemented without explicit user approval.
14 changes: 14 additions & 0 deletions .dockerignore
@@ -0,0 +1,14 @@
# Keep root-context Docker builds lean for CI.
# This file affects builds that use `context: .` in GitHub Actions.
**

# Keep required sources for maintained root-context Dockerfiles.
!docker/
!docker/interlink-slurm-plugin/
!docker/interlink-slurm-plugin/**
!docker/purdue-af/
!docker/purdue-af/**
!slurm/
!slurm/slurm-24.05.1-1.el8.x86_64.rpm
!slurm/slurm-configs/
!slurm/slurm-configs/**
30 changes: 30 additions & 0 deletions .github/scripts/container-smoke.sh
@@ -0,0 +1,30 @@
#!/usr/bin/env bash
set -euo pipefail

if [ "$#" -ne 2 ]; then
	echo "Usage: $0 <image> <profile>" >&2
	exit 2
fi

image="$1"
profile="$2"

docker image inspect "$image" >/dev/null

case "$profile" in
af-pod-monitor)
	docker run --rm --entrypoint python "$image" -c "import prometheus_client"
	;;
interlink-slurm-plugin)
	docker run --rm --entrypoint /bin/sh "$image" -lc 'test -x /sidecar/slurm-sidecar'
	;;
purdue-af)
	docker run --rm --entrypoint /bin/bash "$image" -lc 'python --version && jupyter --version >/dev/null'
	;;
*)
	echo "Unknown profile: $profile" >&2
	exit 2
	;;
esac

echo "Smoke checks passed for profile: $profile"
45 changes: 45 additions & 0 deletions .github/scripts/integration/mock-docker-cli.sh
@@ -0,0 +1,45 @@
#!/usr/bin/env bash
set -euo pipefail

if [ -n "${MOCK_DOCKER_LOG:-}" ]; then
	printf '%s\n' "$*" >>"$MOCK_DOCKER_LOG"
fi

cmd="${1:-}"
shift || true

case "$cmd" in
image)
	subcmd="${1:-}"
	shift || true
	if [ "$subcmd" != "inspect" ]; then
		echo "mock docker unsupported image subcommand: $subcmd" >&2
		exit 64
	fi

	if [ -n "${MOCK_DOCKER_INSPECT_STDOUT:-}" ]; then
		printf '%s\n' "$MOCK_DOCKER_INSPECT_STDOUT"
	fi
	if [ -n "${MOCK_DOCKER_INSPECT_STDERR:-}" ]; then
		printf '%s\n' "$MOCK_DOCKER_INSPECT_STDERR" >&2
	fi

	exit "${MOCK_DOCKER_INSPECT_EXIT:-0}"
	;;

run)
	if [ -n "${MOCK_DOCKER_RUN_STDOUT:-}" ]; then
		printf '%s\n' "$MOCK_DOCKER_RUN_STDOUT"
	fi
	if [ -n "${MOCK_DOCKER_RUN_STDERR:-}" ]; then
		printf '%s\n' "$MOCK_DOCKER_RUN_STDERR" >&2
	fi

	exit "${MOCK_DOCKER_RUN_EXIT:-0}"
	;;

*)
	echo "mock docker unsupported command: $cmd" >&2
	exit 64
	;;
esac
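
One way a test might drive the mock above is to install it as `docker` at the front of `PATH` and steer it through the `MOCK_DOCKER_*` variables it reads. The sketch below is a minimal example under that assumption; the helper name `run_with_mock_docker` and its parameters are hypothetical and are not taken from the repository's `tests/integration` suite.

```python
import os
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_with_mock_docker(mock_script, command, run_exit=0, run_stdout=""):
    """Run `command` with `mock_script` installed as `docker` on PATH.

    Hypothetical helper for illustration. Returns the CompletedProcess and
    the list of docker invocations the mock appended to MOCK_DOCKER_LOG.
    """
    with tempfile.TemporaryDirectory() as tmp:
        bin_dir = Path(tmp) / "bin"
        bin_dir.mkdir()
        fake_docker = bin_dir / "docker"
        shutil.copy(mock_script, fake_docker)
        fake_docker.chmod(0o755)
        log_path = Path(tmp) / "docker.log"

        env = dict(os.environ)
        # Put the fake binary first so it shadows any real docker CLI.
        env["PATH"] = f"{bin_dir}{os.pathsep}{env.get('PATH', '')}"
        env["MOCK_DOCKER_LOG"] = str(log_path)
        env["MOCK_DOCKER_RUN_EXIT"] = str(run_exit)
        env["MOCK_DOCKER_RUN_STDOUT"] = run_stdout

        result = subprocess.run(
            command, env=env, capture_output=True, text=True, check=False
        )
        # Read the log before the temporary directory is removed.
        calls = log_path.read_text().splitlines() if log_path.exists() else []
    return result, calls
```

A scenario could then pass `.github/scripts/integration/mock-docker-cli.sh` as `mock_script`, run `.github/scripts/container-smoke.sh` as `command`, and assert on both the exit code and the recorded `docker` calls. The `MOCK_DOCKER_INSPECT_*` variables could be wired up the same way.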
7 changes: 7 additions & 0 deletions .github/scripts/integration/run-integration-scenarios.sh
@@ -0,0 +1,7 @@
#!/usr/bin/env bash
set -euo pipefail

repo_root="$(cd "$(dirname "${BASH_SOURCE[0]}")/../../.." && pwd)"

cd "$repo_root"
python3 -m unittest discover -s tests/integration -p 'test_*.py' -v