(fix): Resolve Windows CUDA Flash Attention prebuilt wheels #604
tonyjohnvan wants to merge 2 commits into ace-step:main
Conversation
📝 Walkthrough
Dependency manifests updated: platform- and Python-gated wheel sources and dependency lines for PyTorch, torchvision, torchaudio, Triton, and flash-attn were reworked; Windows entries now gain a platform_machine == 'AMD64' marker and updated prebuilt flash-attn wheel sources.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@requirements.txt`:
- Around lines 54-56: The Linux entry for "flash-attn" in requirements.txt currently uses a generic specifier and should be replaced with explicit prebuilt wheel URLs (like the Windows entries) so pip won't attempt to compile from source. Update the single "flash-attn; sys_platform == 'linux' and platform_machine == 'x86_64'" line to explicit wheel installs for Linux CPython 3.11 and 3.12 (using python_version == '3.11' and python_version == '3.12' conditionals), mirroring the style used for the Windows lines and consistent with the wheel URLs used in acestep/third_parts/nano-vllm/pyproject.toml, so users on Linux get the prebuilt wheels (a sketch follows below).
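A minimal sketch of what such explicit Linux lines in requirements.txt could look like is shown below. The cp311 URL is the one already referenced in nano-vllm's pyproject.toml; the cp312 URL is a hypothetical counterpart following the same naming pattern and must be checked against the actual mjun0812 release assets.

```
# Sketch only; verify wheel URLs against the mjun0812 release assets before adopting.
flash-attn @ https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.7.12/flash_attn-2.8.3+cu128torch2.10-cp311-cp311-linux_x86_64.whl ; sys_platform == 'linux' and platform_machine == 'x86_64' and python_version == '3.11'
# Hypothetical cp312 counterpart (same naming pattern, not verified):
flash-attn @ https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.7.12/flash_attn-2.8.3+cu128torch2.10-cp312-cp312-linux_x86_64.whl ; sys_platform == 'linux' and platform_machine == 'x86_64' and python_version == '3.12'
```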
Force-pushed from 98447c7 to b151636
…sh-attn
Two bugs introduced/exposed by the DGX Spark PR (ace-step#575):
1. Adding win32/AMD64 to required-environments forced uv lock to cross-resolve all four platforms. Since uv.lock is gitignored, Windows users had no lockfile and uv lock failed resolving torch==2.7.1+cu128 from the explicit pytorch index. Fix: remove win32 from required-environments so Windows resolves locally at sync time (restoring pre-ace-step#575 behavior), and upgrade Windows torch to 2.10.0+cu128 to align with Linux x86_64.
2. flash-attn only had a Python 3.11 (cp311) wheel for Windows (sdbds, built against torch 2.7.1). Python 3.12 users could not install flash-attn at all. Fix: switch to mjun0812/flash-attention-prebuild-wheels which provides cp311 and cp312 wheels built against torch 2.10 for both Windows (cu126, backward-compatible with cu128 runtime) and Linux (cu128). Added cp312 entries for both platforms.
Co-authored-by: Cursor <cursoragent@cursor.com>
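For context on Bug 1, a rough TOML sketch of the kind of configuration involved is shown below. It is illustrative only: the actual `required-environments` entries, index configuration, and version pins live in this repository's root pyproject.toml and may differ from what is shown here.

```toml
# Illustrative sketch, not the repository's actual configuration.
[tool.uv]
# Listing a Windows environment here forces `uv lock` to cross-resolve for Windows,
# which fails for users who have no committed uv.lock; the PR drops the win32 entry.
required-environments = [
    "sys_platform == 'linux' and platform_machine == 'x86_64'",
    # "sys_platform == 'win32' and platform_machine == 'AMD64'",  # removed by this PR
]

[project]
name = "example"        # placeholder metadata so the sketch parses on its own
version = "0.0.0"
dependencies = [
    # Windows CUDA build aligned with Linux x86_64, per the PR description.
    "torch==2.10.0+cu128; sys_platform == 'win32' and platform_machine == 'AMD64'",
]
```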
Force-pushed from b151636 to 217ba43
- Added flash-attn dependency to pyproject.toml and requirements.txt for better platform support.
- Updated flash-attn source URLs in pyproject.toml to include specific wheels for Windows and Linux based on Python versions.
- Removed old flash-attn wheel links from nano-vllm's pyproject.toml to streamline dependency management.
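As a rough illustration of the gating described above (not the exact diff), the flash-attn entries might be split per platform and Python version along these lines; the Linux cp311 URL is taken from this PR, while the other URLs are hypothetical placeholders that must be filled with real release assets from mjun0812/flash-attention-prebuild-wheels (cu126 builds for Windows, per the commit message).

```toml
# Sketch only; the <...> URLs are hypothetical placeholders, not verified assets.
dependencies = [
    "flash-attn @ https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.7.12/flash_attn-2.8.3+cu128torch2.10-cp311-cp311-linux_x86_64.whl ; sys_platform == 'linux' and platform_machine == 'x86_64' and python_version == '3.11'",
    "flash-attn @ <linux-cp312-cu128-wheel-url> ; sys_platform == 'linux' and platform_machine == 'x86_64' and python_version == '3.12'",
    "flash-attn @ <windows-cp311-cu126-wheel-url> ; sys_platform == 'win32' and platform_machine == 'AMD64' and python_version == '3.11'",
    "flash-attn @ <windows-cp312-cu126-wheel-url> ; sys_platform == 'win32' and platform_machine == 'AMD64' and python_version == '3.12'",
]
```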
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@acestep/third_parts/nano-vllm/pyproject.toml`:
- Around line 14-19: Update the pyproject.toml dependency entries to guard
platform/Python compatibility: make the "flash-attn" dependency conditional
(only Linux x86_64 with Python >=3.11 and Windows x86_64 for supported Python
versions) by adding appropriate environment markers (e.g., sys_platform and
platform_machine and python_version) and tighten "triton-windows" to exclude
Windows ARM by adding platform_machine == 'AMD64'; also update or raise the
requires-python metadata from ">=3.10" to ">=3.11" (or add python_version
markers on flash-attn) so the nano-vllm path dependency won’t force unresolved
wheels on unsupported OS/architectures and Python versions.
| "torch>=2.4.0", | ||
| # Triton and Flash Attention are optional on ROCm (Python 3.12) - SDPA fallback used instead | ||
| "triton-windows>=3.0.0,<3.4; sys_platform == 'win32' and python_version == '3.11'", | ||
| "triton>=3.0.0; sys_platform == 'linux' and python_version == '3.11'", | ||
| "triton>=3.0.0; sys_platform == 'linux'", | ||
| "triton-windows>=3.0.0,<3.4; sys_platform == 'win32'", | ||
| "transformers>=4.51.0", | ||
| "flash-attn @ https://github.com/sdbds/flash-attention-for-windows/releases/download/2.8.2/flash_attn-2.8.2+cu128torch2.7.1cxx11abiFALSEfullbackward-cp311-cp311-win_amd64.whl ; sys_platform == 'win32' and python_version == '3.11' and platform_machine == 'AMD64'", | ||
| "flash-attn @ https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.7.12/flash_attn-2.8.3+cu128torch2.10-cp311-cp311-linux_x86_64.whl ; sys_platform == 'linux' and python_version == '3.11' and platform_machine == 'x86_64'", | ||
| "flash-attn", | ||
| "xxhash", |
🧩 Analysis chain
🏁 Script executed:
# First, read the pyproject.toml file to see the full dependency section and requires-python
cat -n acestep/third_parts/nano-vllm/pyproject.toml | head -50
Length of output: 916
🏁 Script executed:
# Also check if there's a git history or PR context
cd acestep/third_parts/nano-vllm && git log --oneline -10 2>/dev/null || echo "No git history available"
Length of output: 137
🏁 Script executed:
# Search for any documentation about flash-attn wheels or platform support
rg -i "flash.attn|triton.windows" --type python --type toml -A 3 -B 3
Length of output: 95
🏁 Script executed:
# Search for PR context or discussion about flash-attn wheels
rg "flash.attn|cp311|cp312" --type toml --type md -B 2 -A 2
Length of output: 3731
🏁 Script executed:
# Check the git commit for more details about flash-attn update
cd acestep/third_parts/nano-vllm && git show 779bddf --stat 2>/dev/null || echo "Cannot show commit details"
Length of output: 50377
🌐 Web query:
flash-attn wheel availability python versions platforms 2024 2025
💡 Result:
Official flash-attn (Dao-AILab) distribution in 2024–2025
- PyPI does not provide prebuilt wheels for the 2024–2025 releases (it uploads source tarballs only), so `pip install flash-attn` generally means building from source. This is true for:
  - 2.7.0.post2 (Nov 13, 2024): only `flash_attn-2.7.0.post2.tar.gz` (no `.whl` files); metadata says Python >= 3.8. [2]
  - 2.8.3 (Aug 15, 2025): only `flash_attn-2.8.3.tar.gz` (no `.whl` files); metadata says Python >= 3.9. [1]
- Upstream states the project is primarily Linux-focused; Windows “might work” but is not the main supported path (and they ask for help setting up prebuilt Windows wheels). [3]
Where wheels were available in practice (community, 2024–2025 timeframe)
Because PyPI is source-only, most people relied on unofficial/community wheel builders:
- Linux (x86_64 / manylinux): `mjun0812/flash-attention-prebuild-wheels` publishes large matrices of wheels by Python / PyTorch / CUDA combos. [4]
- Windows (win_amd64): multiple community repos published wheels for specific setups, e.g.
- NeedsMoar builds: wheels for Python 3.10 and 3.11 (CUDA-specific). [5]
- sunsetcoder builds: wheels (notably mentions Python 3.10 availability). [6]
Practical summary for “wheel availability” (2024–2025)
- If you installed from PyPI in 2024–2025: you were not getting wheels; you were building from source. [1][2]
- If you needed wheels: you typically used community GitHub/HF wheel repos (Linux coverage tends to be broader than Windows; Windows wheels are often narrower and GPU/CUDA-specific). [3][4][5][6]
- Aggregators such as flashattn.dev exist to help find matching community wheels by platform/Python/PyTorch/CUDA. [7]
Add platform/Python markers for flash-attn to prevent failed resolution on unsupported configurations.
At Line 18, flash-attn is unconditional in this submodule's pyproject.toml. While the parent project applies these markers at the root level, nano-vllm as a path dependency should also declare them defensively. Community wheels for flash-attn are available only for Linux x86_64 (Python 3.11+) and Windows x86_64 (Python 3.12–3.13); there are no macOS or Windows ARM wheels. If nano-vllm is installed standalone or in an environment where parent markers are not evaluated, installs on Python 3.10, macOS, or Windows ARM will fail during dependency resolution.
Similarly, Line 16 (triton-windows) should exclude Windows ARM by adding platform_machine == 'AMD64' to match the x86_64-only wheel availability.
Additionally, consider raising requires-python from >=3.10 to >=3.11 to align with flash-attn wheel availability, or apply version-specific markers as shown in the parent project's configuration.
✅ Suggested markers (align with root project)
- "triton-windows>=3.0.0,<3.4; sys_platform == 'win32'",
- "flash-attn",
+ "triton-windows>=3.0.0,<3.4; sys_platform == 'win32' and platform_machine == 'AMD64'",
+ "flash-attn; (sys_platform == 'linux' and platform_machine == 'x86_64' and python_version >= '3.11') or (sys_platform == 'win32' and platform_machine == 'AMD64' and python_version >= '3.12')",📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "torch>=2.4.0", | |
| # Triton and Flash Attention are optional on ROCm (Python 3.12) - SDPA fallback used instead | |
| "triton-windows>=3.0.0,<3.4; sys_platform == 'win32' and python_version == '3.11'", | |
| "triton>=3.0.0; sys_platform == 'linux' and python_version == '3.11'", | |
| "triton>=3.0.0; sys_platform == 'linux'", | |
| "triton-windows>=3.0.0,<3.4; sys_platform == 'win32'", | |
| "transformers>=4.51.0", | |
| "flash-attn @ https://github.com/sdbds/flash-attention-for-windows/releases/download/2.8.2/flash_attn-2.8.2+cu128torch2.7.1cxx11abiFALSEfullbackward-cp311-cp311-win_amd64.whl ; sys_platform == 'win32' and python_version == '3.11' and platform_machine == 'AMD64'", | |
| "flash-attn @ https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.7.12/flash_attn-2.8.3+cu128torch2.10-cp311-cp311-linux_x86_64.whl ; sys_platform == 'linux' and python_version == '3.11' and platform_machine == 'x86_64'", | |
| "flash-attn", | |
| "xxhash", | |
| "torch>=2.4.0", | |
| "triton>=3.0.0; sys_platform == 'linux'", | |
| "triton-windows>=3.0.0,<3.4; sys_platform == 'win32' and platform_machine == 'AMD64'", | |
| "transformers>=4.51.0", | |
| "flash-attn; (sys_platform == 'linux' and platform_machine == 'x86_64' and python_version >= '3.11') or (sys_platform == 'win32' and platform_machine == 'AMD64' and python_version >= '3.12')", | |
| "xxhash", |
Summary
Fix two dependency bugs affecting Windows CUDA users, introduced/exposed by the DGX Spark PR (#575):
- Windows `uv run` fails with "unsatisfiable" torch resolution — Adding `win32`/`AMD64` to `required-environments` forced `uv lock` to cross-resolve all four platforms simultaneously. Since `uv.lock` is gitignored, Windows users had no lockfile and the implicit `uv lock` failed resolving `torch==2.7.1+cu128` from the explicit PyTorch index. Fix: remove `win32`/`AMD64` from `required-environments` (restoring the pre-#575 local resolution behavior) and upgrade Windows torch to `2.10.0+cu128` to align with Linux x86_64.
- flash-attn unavailable on Python 3.12 — The only Windows flash-attn wheel (sdbds, `cp311`, built against torch 2.7.1) excluded Python 3.12 users entirely. Fix: switch to mjun0812/flash-attention-prebuild-wheels, which provides `cp311` and `cp312` wheels built against torch 2.10 for both Windows (cu126, backward-compatible with the `cu128` runtime) and Linux (cu128).
Scope
Files changed:
- `pyproject.toml` — Upgrade Windows torch/torchvision/torchaudio from 2.7.1 to 2.10.0+cu128, add the `platform_machine == 'AMD64'` marker, remove `win32` from `required-environments`
- `acestep/third_parts/nano-vllm/pyproject.toml` — Replace the sdbds flash-attn wheel with mjun0812 wheels (cp311 + cp312) for Windows; add a cp312 wheel for Linux
- `requirements.txt` — Mirror the same torch upgrade and flash-attn wheel changes for pip users
Explicitly out of scope:
Risk and Compatibility
- `2.10.0+cu128` wheels verified to exist on the PyTorch index for cp311, cp312, cp313 (win_amd64)
- `cu126` wheels are backward-compatible with the `cu128` CUDA runtime (CUDA minor versions are forward-compatible within the same major version)
- `required-environments` removal for Windows: restores the exact behavior that worked before #575 — uv resolves locally at sync time instead of requiring cross-platform lockfile resolution
Regression Checks
- `uv lock` succeeds (resolves 172 packages)
- `uv sync --dry-run` succeeds
- `uv run acestep` (needs Windows tester)
Reviewer Notes
- `uv.lock` is gitignored (`.gitignore` line 112) — this is the root cause of Bug 1. Users never receive a lockfile, so `uv run` triggers `uv lock` on their machine. Committing `uv.lock` to version control would be a stronger fix but is a separate discussion.
- A possible follow-up: commit `uv.lock` so users get reproducible installs without needing to resolve dependencies themselves.
Summary by CodeRabbit