Pull request overview
This PR adds Olive optimization recipes and per-model licensing for the Qwen-3 LLM family across CPU, CUDA, and WebGPU execution providers.
Changes:
- Added CPU, CUDA, and WebGPU Olive JSON recipes for Qwen-Qwen3-{0.6B, 1.7B, 4B, 4B-Instruct-2507, 4B-Thinking-2507, 8B, 14B, 32B}.
- Added per-backend README.md files explaining setup and usage for each model/backend combination.
- Added Apache 2.0 LICENSE files at the root of each Qwen-Qwen3-* model directory.
Reviewed changes
Copilot reviewed 56 out of 56 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| Qwen-Qwen3-8B/webgpu/README.md | WebGPU README for Qwen-Qwen3-8B Olive recipes and usage. |
| Qwen-Qwen3-8B/webgpu/Qwen-Qwen3-8B_webgpu_int4_default.json | WebGPU INT4 Olive recipe for Qwen-Qwen3-8B. |
| Qwen-Qwen3-8B/cuda/README.md | CUDA README for Qwen-Qwen3-8B Olive recipes and usage. |
| Qwen-Qwen3-8B/cuda/Qwen-Qwen3-8B_cuda_int4_kquant_last.json | CUDA INT4 k_quant_last Olive recipe for Qwen-Qwen3-8B. |
| Qwen-Qwen3-8B/cpu/README.md | CPU README for Qwen-Qwen3-8B Olive recipes and usage. |
| Qwen-Qwen3-8B/cpu/Qwen-Qwen3-8B_cpu_int4_int8_kquant_mixed.json | CPU INT4/INT8 mixed k_quant_mixed Olive recipe for Qwen-Qwen3-8B. |
| Qwen-Qwen3-8B/LICENSE | Apache 2.0 license for Qwen-Qwen3-8B assets. |
| Qwen-Qwen3-4B/webgpu/README.md | WebGPU README for Qwen-Qwen3-4B Olive recipes and usage. |
| Qwen-Qwen3-4B/webgpu/Qwen-Qwen3-4B_webgpu_int4_default.json | WebGPU INT4 Olive recipe for Qwen-Qwen3-4B. |
| Qwen-Qwen3-4B/cuda/README.md | CUDA README for Qwen-Qwen3-4B Olive recipes and usage. |
| Qwen-Qwen3-4B/cuda/Qwen-Qwen3-4B_cuda_int4_kquant_last.json | CUDA INT4 k_quant_last Olive recipe for Qwen-Qwen3-4B. |
| Qwen-Qwen3-4B/cpu/README.md | CPU README for Qwen-Qwen3-4B Olive recipes and usage. |
| Qwen-Qwen3-4B/cpu/Qwen-Qwen3-4B_cpu_int4_int8_kquant_mixed.json | CPU INT4/INT8 mixed k_quant_mixed Olive recipe for Qwen-Qwen3-4B. |
| Qwen-Qwen3-4B/LICENSE | Apache 2.0 license for Qwen-Qwen3-4B assets. |
| Qwen-Qwen3-4B-Thinking-2507/webgpu/README.md | WebGPU README for Qwen-Qwen3-4B-Thinking-2507 Olive recipes and usage. |
| Qwen-Qwen3-4B-Thinking-2507/webgpu/Qwen-Qwen3-4B-Thinking-2507_webgpu_int4_default.json | WebGPU INT4 Olive recipe for Qwen-Qwen3-4B-Thinking-2507. |
| Qwen-Qwen3-4B-Thinking-2507/cuda/README.md | CUDA README for Qwen-Qwen3-4B-Thinking-2507 Olive recipes and usage. |
| Qwen-Qwen3-4B-Thinking-2507/cuda/Qwen-Qwen3-4B-Thinking-2507_cuda_int4_kquant_last.json | CUDA INT4 k_quant_last Olive recipe for Qwen-Qwen3-4B-Thinking-2507. |
| Qwen-Qwen3-4B-Thinking-2507/cpu/README.md | CPU README for Qwen-Qwen3-4B-Thinking-2507 Olive recipes and usage. |
| Qwen-Qwen3-4B-Thinking-2507/cpu/Qwen-Qwen3-4B-Thinking-2507_cpu_int4_int8_kquant_mixed.json | CPU INT4/INT8 mixed k_quant_mixed Olive recipe for Qwen-Qwen3-4B-Thinking-2507. |
| Qwen-Qwen3-4B-Thinking-2507/LICENSE | Apache 2.0 license for Qwen-Qwen3-4B-Thinking-2507 assets. |
| Qwen-Qwen3-4B-Instruct-2507/webgpu/README.md | WebGPU README for Qwen-Qwen3-4B-Instruct-2507 Olive recipes and usage. |
| Qwen-Qwen3-4B-Instruct-2507/webgpu/Qwen-Qwen3-4B-Instruct-2507_webgpu_int4_default.json | WebGPU INT4 Olive recipe for Qwen-Qwen3-4B-Instruct-2507. |
| Qwen-Qwen3-4B-Instruct-2507/cuda/README.md | CUDA README for Qwen-Qwen3-4B-Instruct-2507 Olive recipes and usage. |
| Qwen-Qwen3-4B-Instruct-2507/cuda/Qwen-Qwen3-4B-Instruct-2507_cuda_int4_kquant_last.json | CUDA INT4 k_quant_last Olive recipe for Qwen-Qwen3-4B-Instruct-2507. |
| Qwen-Qwen3-4B-Instruct-2507/cpu/README.md | CPU README for Qwen-Qwen3-4B-Instruct-2507 Olive recipes and usage. |
| Qwen-Qwen3-4B-Instruct-2507/cpu/Qwen-Qwen3-4B-Instruct-2507_cpu_int4_int8_kquant_mixed.json | CPU INT4/INT8 mixed k_quant_mixed Olive recipe for Qwen-Qwen3-4B-Instruct-2507. |
| Qwen-Qwen3-4B-Instruct-2507/LICENSE | Apache 2.0 license for Qwen-Qwen3-4B-Instruct-2507 assets. |
| Qwen-Qwen3-32B/webgpu/README.md | WebGPU README for Qwen-Qwen3-32B Olive recipes and usage. |
| Qwen-Qwen3-32B/webgpu/Qwen-Qwen3-32B_webgpu_int4_default.json | WebGPU INT4 Olive recipe for Qwen-Qwen3-32B. |
| Qwen-Qwen3-32B/cuda/README.md | CUDA README for Qwen-Qwen3-32B Olive recipes and usage. |
| Qwen-Qwen3-32B/cuda/Qwen-Qwen3-32B_cuda_int4_kquant_last.json | CUDA INT4 k_quant_last Olive recipe for Qwen-Qwen3-32B. |
| Qwen-Qwen3-32B/cpu/README.md | CPU README for Qwen-Qwen3-32B Olive recipes and usage. |
| Qwen-Qwen3-32B/cpu/Qwen-Qwen3-32B_cpu_int4_int8_kquant_mixed.json | CPU INT4/INT8 mixed k_quant_mixed Olive recipe for Qwen-Qwen3-32B. |
| Qwen-Qwen3-32B/LICENSE | Apache 2.0 license for Qwen-Qwen3-32B assets. |
| Qwen-Qwen3-14B/webgpu/README.md | WebGPU README for Qwen-Qwen3-14B Olive recipes and usage. |
| Qwen-Qwen3-14B/webgpu/Qwen-Qwen3-14B_webgpu_int4_default.json | WebGPU INT4 Olive recipe for Qwen-Qwen3-14B. |
| Qwen-Qwen3-14B/cuda/README.md | CUDA README for Qwen-Qwen3-14B Olive recipes and usage. |
| Qwen-Qwen3-14B/cuda/Qwen-Qwen3-14B_cuda_int4_kquant_last.json | CUDA INT4 k_quant_last Olive recipe for Qwen-Qwen3-14B. |
| Qwen-Qwen3-14B/cpu/README.md | CPU README for Qwen-Qwen3-14B Olive recipes and usage. |
| Qwen-Qwen3-14B/cpu/Qwen-Qwen3-14B_cpu_int4_int8_kquant_mixed.json | CPU INT4/INT8 mixed k_quant_mixed Olive recipe for Qwen-Qwen3-14B. |
| Qwen-Qwen3-14B/LICENSE | Apache 2.0 license for Qwen-Qwen3-14B assets. |
| Qwen-Qwen3-1.7B/webgpu/README.md | WebGPU README for Qwen-Qwen3-1.7B Olive recipes and usage. |
| Qwen-Qwen3-1.7B/webgpu/Qwen-Qwen3-1.7B_webgpu_int4_default.json | WebGPU INT4 Olive recipe for Qwen-Qwen3-1.7B. |
| Qwen-Qwen3-1.7B/cuda/README.md | CUDA README for Qwen-Qwen3-1.7B Olive recipes and usage. |
| Qwen-Qwen3-1.7B/cuda/Qwen-Qwen3-1.7B_cuda_int4_kquant_last.json | CUDA INT4 k_quant_last Olive recipe for Qwen-Qwen3-1.7B. |
| Qwen-Qwen3-1.7B/cpu/README.md | CPU README for Qwen-Qwen3-1.7B Olive recipes and usage. |
| Qwen-Qwen3-1.7B/cpu/Qwen-Qwen3-1.7B_cpu_int4_int8_kquant_mixed.json | CPU INT4/INT8 mixed k_quant_mixed Olive recipe for Qwen-Qwen3-1.7B. |
| Qwen-Qwen3-1.7B/LICENSE | Apache 2.0 license for Qwen-Qwen3-1.7B assets. |
| Qwen-Qwen3-0.6B/webgpu/README.md | WebGPU README for Qwen-Qwen3-0.6B Olive recipes and usage. |
| Qwen-Qwen3-0.6B/webgpu/Qwen-Qwen3-0.6B_webgpu_int4_default.json | WebGPU INT4 Olive recipe for Qwen-Qwen3-0.6B. |
| Qwen-Qwen3-0.6B/cuda/README.md | CUDA README for Qwen-Qwen3-0.6B Olive recipes and usage. |
| Qwen-Qwen3-0.6B/cuda/Qwen-Qwen3-0.6B_cuda_int4_kquant_last.json | CUDA INT4 k_quant_last Olive recipe for Qwen-Qwen3-0.6B. |
| Qwen-Qwen3-0.6B/cpu/README.md | CPU README for Qwen-Qwen3-0.6B Olive recipes and usage. |
| Qwen-Qwen3-0.6B/cpu/Qwen-Qwen3-0.6B_cpu_int4_int8_kquant_mixed.json | CPU INT4/INT8 mixed k_quant_mixed Olive recipe for Qwen-Qwen3-0.6B. |
| Qwen-Qwen3-0.6B/LICENSE | Apache 2.0 license for Qwen-Qwen3-0.6B assets. |
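
The file layout in the table above follows a regular naming pattern, so it can be enumerated and cross-checked against the 56 changed files. The snippet below is a minimal sketch of that pattern only. It is not the script used to generate the recipes, and the names `MODELS`, `RECIPE_SUFFIX`, and `expected_files` are hypothetical helpers introduced for illustration.

```python
from pathlib import PurePosixPath

# Model sizes listed in the PR overview (the Qwen-Qwen3-{...} suffixes).
MODELS = [
    "0.6B", "1.7B", "4B", "4B-Instruct-2507",
    "4B-Thinking-2507", "8B", "14B", "32B",
]

# Recipe file suffix per backend, as it appears in the file table.
RECIPE_SUFFIX = {
    "cpu": "int4_int8_kquant_mixed",
    "cuda": "int4_kquant_last",
    "webgpu": "int4_default",
}


def expected_files(size):
    """Return the paths expected for one model directory (hypothetical helper)."""
    root = PurePosixPath(f"Qwen-Qwen3-{size}")
    files = [root / "LICENSE"]
    for backend, suffix in RECIPE_SUFFIX.items():
        files.append(root / backend / "README.md")
        files.append(root / backend / f"{root.name}_{backend}_{suffix}.json")
    return files


all_files = [str(path) for size in MODELS for path in expected_files(size)]
print(len(all_files))
```

Each model directory contributes one LICENSE plus a README and a recipe for each of the three backends, which is seven files per model across eight models.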
devang-ml reviewed on Jan 30, 2026
        }
    },
    "engine": { "target": "local_system" },
    "passes": {
It is worth adding other quantization techniques, such as kld_gradient, which is giving us better quality.
Description
This PR adds recipes for all Qwen-3 LLMs on the CPU EP, CUDA EP, and WebGPU EP.
Motivation and Context
The recipes were auto-generated with the following bash script.