(fix):Resolve Windows CUDA Flash Attention prebuilt wheels #604

Closed
tonyjohnvan wants to merge 2 commits into ace-step:main from tonyjohnvan:fix/windows-cuda-deps-and-flash-attn

Conversation

tonyjohnvan (Contributor) commented on Feb 16, 2026

Summary

Fix two dependency bugs affecting Windows CUDA users, introduced/exposed by the DGX Spark PR (#575):

  1. Windows uv run fails with an "unsatisfiable" torch resolution: adding win32/AMD64 to required-environments forced uv lock to cross-resolve all four platforms simultaneously. Since uv.lock is gitignored, Windows users had no lockfile, and the implicit uv lock failed to resolve torch==2.7.1+cu128 from the explicit PyTorch index. Fix: remove win32/AMD64 from required-environments (restoring the pre-#575 local-resolution behavior) and upgrade Windows torch to 2.10.0+cu128 to align with Linux x86_64 (see the sketch after this list).

  2. flash-attn unavailable on Python 3.12: the only Windows flash-attn wheel (sdbds, cp311, built against torch 2.7.1) excluded Python 3.12 users entirely. Fix: switch to mjun0812/flash-attention-prebuild-wheels, which provides cp311 and cp312 wheels built against torch 2.10 for both Windows (cu126, backward-compatible with the cu128 runtime) and Linux (cu128).
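
For concreteness, here is a minimal pyproject.toml sketch of the Bug 1 fix (the torch pin and the required-environments change). It is an illustration rather than the PR's literal diff: the index name pytorch-cu128 and the platform entries remaining in required-environments are assumptions, while the version pins, markers, and removal of win32/AMD64 come from the summary above.

```toml
# Sketch of the Bug 1 fix. Not the PR's literal diff: the index name and the
# entries left in required-environments are illustrative assumptions.

[project]
dependencies = [
    # Windows torch stack aligned with Linux x86_64 (was 2.7.1+cu128);
    # Linux and macOS torch entries are unchanged and omitted here.
    "torch==2.10.0+cu128; sys_platform == 'win32' and platform_machine == 'AMD64'",
    "torchvision==0.25.0+cu128; sys_platform == 'win32' and platform_machine == 'AMD64'",
    "torchaudio==2.10.0+cu128; sys_platform == 'win32' and platform_machine == 'AMD64'",
]

[tool.uv]
# win32/AMD64 removed from this list, so Windows resolves locally at sync time
# instead of forcing `uv lock` to cross-resolve all four platforms at once.
required-environments = [
    "sys_platform == 'linux' and platform_machine == 'x86_64'",
    "sys_platform == 'linux' and platform_machine == 'aarch64'",
]

# The "explicit" PyTorch index mentioned above: cu128 wheels are only pulled
# from it for packages pinned to it via tool.uv.sources.
[[tool.uv.index]]
name = "pytorch-cu128"
url = "https://download.pytorch.org/whl/cu128"
explicit = true

[tool.uv.sources]
torch = [
    { index = "pytorch-cu128", marker = "sys_platform == 'win32' and platform_machine == 'AMD64'" },
]
```

Bug 2's flash-attn wheel sources are sketched separately under Risk and Compatibility below.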

Scope

Files changed:

  • pyproject.toml — Upgrade Windows torch/torchvision/torchaudio from 2.7.1 to 2.10.0+cu128, add platform_machine == 'AMD64' marker, remove win32 from required-environments
  • acestep/third_parts/nano-vllm/pyproject.toml — Replace sdbds flash-attn wheel with mjun0812 wheels (cp311 + cp312) for Windows; add cp312 wheel for Linux
  • requirements.txt — Mirror the same torch upgrade and flash-attn wheel changes for pip users

Explicitly out of scope:

  • Linux x86_64 torch (unchanged at 2.10.0+cu128)
  • Linux aarch64 / DGX Spark torch (unchanged at 2.10.0+cu130)
  • macOS arm64 torch (unchanged at >=2.9.1)
  • triton / triton-windows dependencies (unchanged)
  • All application code (zero Python source changes)

Risk and Compatibility

| Platform | torch | flash-attn | Status |
|---|---|---|---|
| Windows AMD64 | 2.7.1+cu128 → 2.10.0+cu128 | sdbds cp311 → mjun0812 cp311+cp312 | Changed |
| Linux x86_64 | 2.10.0+cu128 | added cp312 wheel | Additive only |
| Linux aarch64 (DGX Spark) | 2.10.0+cu130 | N/A (SDPA fallback) | Unchanged |
| macOS arm64 | >=2.9.1 | N/A (SDPA fallback) | Unchanged |
  • Windows torch upgrade: all 2.10.0+cu128 wheels verified to exist on the PyTorch index for cp311, cp312, and cp313 (win_amd64)
  • Windows flash-attn cu126 wheels are backward-compatible with the cu128 CUDA runtime, since CUDA minor versions are forward-compatible within the same major version (see the sketch after this list)
  • required-environments removal for Windows: restores the exact behavior that worked before #575; uv resolves locally at sync time instead of requiring cross-platform lockfile resolution
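
The cu126/cu128 compatibility noted above is what makes the mjun0812 Windows wheels usable here. As a rough sketch (again not the literal diff), the prebuilt wheels can be wired up as marker-gated URL sources. Only the Linux cp311 URL below is taken verbatim from this PR; the remaining combinations are indicated by comments because their exact filenames live on the mjun0812 release page.

```toml
[tool.uv.sources]
flash-attn = [
    # Linux x86_64, CPython 3.11, cu128, torch 2.10 (URL referenced in this PR)
    { url = "https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.7.12/flash_attn-2.8.3+cu128torch2.10-cp311-cp311-linux_x86_64.whl", marker = "sys_platform == 'linux' and platform_machine == 'x86_64' and python_version == '3.11'" },
    # Linux cp312 (cu128, torch 2.10) and Windows cp311/cp312 (cu126, torch 2.10)
    # entries follow the same shape, each gated by python_version and
    # platform_machine; the cu126 Windows wheels run on a cu128 CUDA runtime
    # because CUDA minor versions are forward-compatible within a major version.
]
```

Marker-gated URL sources keep the dependency itself generic ("flash-attn") while letting each platform/Python combination resolve to a matching prebuilt wheel, which is the pattern the Walkthrough below describes.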

Regression Checks

  • uv lock succeeds (resolves 172 packages)
  • uv sync --dry-run succeeds
  • Lockfile contains correct Windows torch 2.10.0+cu128 wheels (cp311 + cp312)
  • Lockfile contains correct flash-attn entries for all four platform/python combos
  • DGX Spark (aarch64) resolution unchanged in lockfile
  • macOS arm64 resolution unchanged in lockfile
  • Windows end-to-end: uv run acestep (needs Windows tester)
  • Linux end-to-end: flash-attn import on Python 3.12 (needs Linux tester)

Reviewer Notes

  • uv.lock is gitignored (.gitignore line 112) — this is the root cause of Bug 1. Users never receive a lockfile, so uv run triggers uv lock on their machine. Committing uv.lock to version control would be a stronger fix but is a separate discussion.
  • No cp312 flash-attn for Windows + torch 2.7.1 exists anywhere — the sdbds repo only builds cp311, and mjun0812 only builds against torch 2.10. Upgrading Windows torch to 2.10.0 was necessary to unlock cp312 flash-attn support.
  • Follow-up: Consider un-gitignoring uv.lock so users get reproducible installs without needing to resolve dependencies themselves.

Summary by CodeRabbit

  • Chores
    • Updated PyTorch, torchvision, and torchaudio to 2.10.0+cu128 for Windows (AMD64) and aligned Linux x86_64 builds to 2.10.0+cu128.
    • Added/explicitly declared macOS ARM64 (Apple Silicon) CPU/MPS wheel entries and preserved Linux aarch64 configurations.
    • Introduced a platform-generic flash-attn dependency and added curated prebuilt wheel sources for Windows and Linux to broaden compatibility.
    • Simplified platform/Python gating for several native-accelerator wheels to reduce overly specific version constraints.

coderabbitai bot (Contributor) commented on Feb 16, 2026

📝 Walkthrough

Dependency manifests updated: platform- and Python-gated wheel sources and dependency lines for PyTorch, torchvision, torchaudio, Triton, and flash-attn were reworked. Windows entries now include platform_machine == 'AMD64' and bumped PyTorch versions; platform-specific flash-attn wheel URLs were replaced with generic entries, and new wheel source blocks were added.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Nano-vLLM pyproject (acestep/third_parts/nano-vllm/pyproject.toml) | Reworked Triton and flash-attn dependency constraints: removed Python-version-gated platform-specific Triton entries; replaced platform/Python-specific flash-attn wheel URLs with a generic flash-attn entry. |
| Top-level pyproject.toml (pyproject.toml) | Bumped Windows PyTorch/torchvision/torchaudio to 2.10.0+cu128 / 0.25.0+cu128 / 2.10.0+cu128 with sys_platform == 'win32' and platform_machine == 'AMD64' gating; added tool.uv.sources wheel URLs for flash-attn and platform/python markers; added flash-attn dependency gated off macOS/ARM. |
| Requirements manifest (requirements.txt) | Updated platform-specific wheel lines: added/adjusted Windows AMD64 gating, macOS arm64 entries, Linux x86_64 / aarch64 entries for CUDA/CPU variants; replaced Windows-specific flash-attn URL with a generic flash-attn specifier and aligned other platform flash-attn entries. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • (fix):DGX Spark dependencies #575: Modifies the same manifest-level dependency entries (pyproject.toml/requirements.txt) to remap platform/architecture-specific PyTorch, Triton, and flash-attn wheel sources and markers.

Suggested reviewers

  • ChuxiJ

Poem

🐰 I nibbled through manifests, neat and quick,
Swapped wheels and markers, trimmed each wick,
AMD64 now gets its proper gate,
Flash-attn moved and versions update,
Hooray — builds hop forward, light and slick!

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately describes the main fix: resolving Windows CUDA Flash Attention prebuilt wheels configuration, which is the core objective. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Merge Conflict Detection | ✅ Passed | ✅ No merge conflicts detected when merging into main |


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@requirements.txt`:
- Around line 54-56: The Linux entry for "flash-attn" in requirements.txt
currently uses a generic specifier and should be replaced with explicit prebuilt
wheel URLs (like the Windows entries) so pip won't attempt to compile from
source; update the single "flash-attn; sys_platform == 'linux' and
platform_machine == 'x86_64'" line to explicit wheel installs for Linux CPython
3.11 and 3.12 (using python_version == '3.11' and python_version == '3.12'
conditionals) mirroring the style used for the Windows lines and consistent with
the wheel URLs used in acestep/third_parts/nano-vllm/pyproject.toml so users on Linux get
the prebuilt wheels.

tonyjohnvan force-pushed the fix/windows-cuda-deps-and-flash-attn branch from 98447c7 to b151636 on February 16, 2026 07:11
tonyjohnvan marked this pull request as draft on February 16, 2026 07:15
…sh-attn

Two bugs introduced/exposed by the DGX Spark PR (ace-step#575):

1. Adding win32/AMD64 to required-environments forced uv lock to
   cross-resolve all four platforms. Since uv.lock is gitignored,
   Windows users had no lockfile and uv lock failed resolving
   torch==2.7.1+cu128 from the explicit pytorch index.
   Fix: remove win32 from required-environments so Windows resolves
   locally at sync time (restoring pre-ace-step#575 behavior), and upgrade
   Windows torch to 2.10.0+cu128 to align with Linux x86_64.

2. flash-attn only had a Python 3.11 (cp311) wheel for Windows
   (sdbds, built against torch 2.7.1). Python 3.12 users could not
   install flash-attn at all.
   Fix: switch to mjun0812/flash-attention-prebuild-wheels which
   provides cp311 and cp312 wheels built against torch 2.10 for both
   Windows (cu126, backward-compatible with cu128 runtime) and Linux
   (cu128). Added cp312 entries for both platforms.

Co-authored-by: Cursor <cursoragent@cursor.com>
tonyjohnvan force-pushed the fix/windows-cuda-deps-and-flash-attn branch from b151636 to 217ba43 on February 16, 2026 07:18
- Added flash-attn dependency to pyproject.toml and requirements.txt for better platform support.
- Updated flash-attn source URLs in pyproject.toml to include specific wheels for Windows and Linux based on Python versions.
- Removed old flash-attn wheel links from nano-vllm's pyproject.toml to streamline dependency management.
tonyjohnvan marked this pull request as ready for review on February 16, 2026 10:15
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@acestep/third_parts/nano-vllm/pyproject.toml`:
- Around line 14-19: Update the pyproject.toml dependency entries to guard
platform/Python compatibility: make the "flash-attn" dependency conditional
(only Linux x86_64 with Python >=3.11 and Windows x86_64 for supported Python
versions) by adding appropriate environment markers (e.g., sys_platform and
platform_machine and python_version) and tighten "triton-windows" to exclude
Windows ARM by adding platform_machine == 'AMD64'; also update or raise the
requires-python metadata from ">=3.10" to ">=3.11" (or add python_version
markers on flash-attn) so the nano-vllm path dependency won’t force unresolved
wheels on unsupported OS/architectures and Python versions.

Comment on lines 14 to 19
"torch>=2.4.0",
# Triton and Flash Attention are optional on ROCm (Python 3.12) - SDPA fallback used instead
"triton-windows>=3.0.0,<3.4; sys_platform == 'win32' and python_version == '3.11'",
"triton>=3.0.0; sys_platform == 'linux' and python_version == '3.11'",
"triton>=3.0.0; sys_platform == 'linux'",
"triton-windows>=3.0.0,<3.4; sys_platform == 'win32'",
"transformers>=4.51.0",
"flash-attn @ https://github.com/sdbds/flash-attention-for-windows/releases/download/2.8.2/flash_attn-2.8.2+cu128torch2.7.1cxx11abiFALSEfullbackward-cp311-cp311-win_amd64.whl ; sys_platform == 'win32' and python_version == '3.11' and platform_machine == 'AMD64'",
"flash-attn @ https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.7.12/flash_attn-2.8.3+cu128torch2.10-cp311-cp311-linux_x86_64.whl ; sys_platform == 'linux' and python_version == '3.11' and platform_machine == 'x86_64'",
"flash-attn",
"xxhash",
coderabbitai bot (Contributor) commented:

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, read the pyproject.toml file to see the full dependency section and requires-python
cat -n acestep/third_parts/nano-vllm/pyproject.toml | head -50

Repository: ace-step/ACE-Step-1.5

Length of output: 916


🏁 Script executed:

# Also check if there's a git history or PR context
cd acestep/third_parts/nano-vllm && git log --oneline -10 2>/dev/null || echo "No git history available"

Repository: ace-step/ACE-Step-1.5

Length of output: 137


🏁 Script executed:

# Search for any documentation about flash-attn wheels or platform support
rg -i "flash.attn|triton.windows" --type python --type toml -A 3 -B 3

Repository: ace-step/ACE-Step-1.5

Length of output: 95


🏁 Script executed:

# Search for PR context or discussion about flash-attn wheels
rg "flash.attn|cp311|cp312" --type toml --type md -B 2 -A 2

Repository: ace-step/ACE-Step-1.5

Length of output: 3731


🏁 Script executed:

# Check the git commit for more details about flash-attn update
cd acestep/third_parts/nano-vllm && git show 779bddf --stat 2>/dev/null || echo "Cannot show commit details"

Repository: ace-step/ACE-Step-1.5

Length of output: 50377


🌐 Web query:

flash-attn wheel availability python versions platforms 2024 2025

💡 Result:

Official flash-attn (Dao-AILab) distribution in 2024–2025

  • PyPI does not provide prebuilt wheels for the 2024–2025 releases (it uploads source tarballs only), so pip install flash-attn generally means building from source. This is true for:
    • 2.7.0.post2 (Nov 13, 2024): only flash_attn-2.7.0.post2.tar.gz (no .whl files) and metadata says Python >= 3.8. [2]
    • 2.8.3 (Aug 15, 2025): only flash_attn-2.8.3.tar.gz (no .whl files) and metadata says Python >= 3.9. [1]
  • Upstream states the project is primarily Linux-focused; Windows “might work” but is not the main supported path (and they ask for help setting up prebuilt Windows wheels). [3]

Where wheels were available in practice (community, 2024–2025 timeframe)

Because PyPI is source-only, most people relied on unofficial/community wheel builders:

  • Linux (x86_64 / manylinux): mjun0812/flash-attention-prebuild-wheels publishes large matrices of wheels by Python / PyTorch / CUDA combos. [4]
  • Windows (win_amd64): multiple community repos published wheels for specific setups, e.g.
    • NeedsMoar builds: wheels for Python 3.10 and 3.11 (CUDA-specific). [5]
    • sunsetcoder builds: wheels (notably mentions Python 3.10 availability). [6]

Practical summary for “wheel availability” (2024–2025)

  • If you installed from PyPI in 2024–2025: you were not getting wheels; you were building from source. [1][2]
  • If you needed wheels: you typically used community GitHub/HF wheel repos (Linux coverage tends to be broader than Windows; Windows wheels are often narrower and GPU/CUDA-specific). [3][4][5][6]
  • Aggregators such as flashattn.dev exist to help find matching community wheels by platform/Python/PyTorch/CUDA. [7]

Add platform/Python markers for flash-attn to prevent failed resolution on unsupported configurations.

At Line 18, flash-attn is unconditional in this submodule's pyproject.toml. While the parent project applies these markers at the root level, nano-vllm as a path dependency should also declare them defensively. Community wheels for flash-attn are available only for Linux x86_64 (Python 3.11+) and Windows x86_64 (Python 3.12–3.13); there are no macOS or Windows ARM wheels. If nano-vllm is installed standalone or in an environment where parent markers are not evaluated, installs on Python 3.10, macOS, or Windows ARM will fail during dependency resolution.

Similarly, Line 16 (triton-windows) should exclude Windows ARM by adding platform_machine == 'AMD64' to match the x86_64-only wheel availability.

Additionally, consider raising requires-python from >=3.10 to >=3.11 to align with flash-attn wheel availability, or apply version-specific markers as shown in the parent project's configuration.

✅ Suggested markers (align with root project)
-    "triton-windows>=3.0.0,<3.4; sys_platform == 'win32'",
-    "flash-attn",
+    "triton-windows>=3.0.0,<3.4; sys_platform == 'win32' and platform_machine == 'AMD64'",
+    "flash-attn; (sys_platform == 'linux' and platform_machine == 'x86_64' and python_version >= '3.11') or (sys_platform == 'win32' and platform_machine == 'AMD64' and python_version >= '3.12')",
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    "torch>=2.4.0",
-    # Triton and Flash Attention are optional on ROCm (Python 3.12) - SDPA fallback used instead
-    "triton-windows>=3.0.0,<3.4; sys_platform == 'win32' and python_version == '3.11'",
-    "triton>=3.0.0; sys_platform == 'linux' and python_version == '3.11'",
-    "triton>=3.0.0; sys_platform == 'linux'",
-    "triton-windows>=3.0.0,<3.4; sys_platform == 'win32'",
-    "transformers>=4.51.0",
-    "flash-attn @ https://github.com/sdbds/flash-attention-for-windows/releases/download/2.8.2/flash_attn-2.8.2+cu128torch2.7.1cxx11abiFALSEfullbackward-cp311-cp311-win_amd64.whl ; sys_platform == 'win32' and python_version == '3.11' and platform_machine == 'AMD64'",
-    "flash-attn @ https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.7.12/flash_attn-2.8.3+cu128torch2.10-cp311-cp311-linux_x86_64.whl ; sys_platform == 'linux' and python_version == '3.11' and platform_machine == 'x86_64'",
-    "flash-attn",
-    "xxhash",
+    "torch>=2.4.0",
+    "triton>=3.0.0; sys_platform == 'linux'",
+    "triton-windows>=3.0.0,<3.4; sys_platform == 'win32' and platform_machine == 'AMD64'",
+    "transformers>=4.51.0",
+    "flash-attn; (sys_platform == 'linux' and platform_machine == 'x86_64' and python_version >= '3.11') or (sys_platform == 'win32' and platform_machine == 'AMD64' and python_version >= '3.12')",
+    "xxhash",

ChuxiJ closed this on Feb 16, 2026