Add Qwen-3 0.6B CPU recipe #233

Merged
jambayk merged 3 commits into main from kvaishnavi/qwen3-0.6b on Feb 10, 2026

@kunal-vaishnavi kunal-vaishnavi commented Feb 9, 2026

Description

This PR adds a recipe for Qwen-3 0.6B on the CPU EP.

Motivation and Context

Among the quantized Qwen-3 0.6B variants, the one that uses KLD gradient with a ratio of 0.65 shows the best tradeoff between MMLU score and model size. The recipes for the other variants are in the broader PR.

Results

Sample size = 1000

| Model | Hardware | Quantization Algo | Model Size | MMLU Score | Δ vs FP32 |
| --- | --- | --- | --- | --- | --- |
| Qwen-3 0.6B | CPU | None (FP32) | 2.83 GB | 46.2 | |
| Qwen-3 0.6B | CPU | k_quant_down | 984 MB | 42.7 | -3.5 (-7.6%) |
| Qwen-3 0.6B | CPU | kld_gradient (0.75) | 569 MB | 41.5 | -4.7 (-10.2%) |
| Qwen-3 0.6B | CPU | kld_gradient (0.65) | 598 MB | 43.3 | -2.9 (-6.3%) |
| Qwen-3 0.6B | CPU | dq | 476 MB | 42.4 | -3.8 (-8.2%) |
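
For readers who want to re-derive the "Δ vs FP32" column, here is a minimal sketch. It assumes the column is the absolute MMLU difference from the FP32 baseline, followed by that difference as a percentage of the baseline score. The helper name `delta_vs_fp32` is hypothetical and is not part of the recipes; the numbers are copied from the table above.

```python
# Rows copied from the results table: (quantization algo, size in MB, MMLU score).
FP32_SCORE = 46.2

rows = [
    ("k_quant_down", 984, 42.7),
    ("kld_gradient (0.75)", 569, 41.5),
    ("kld_gradient (0.65)", 598, 43.3),
    ("dq", 476, 42.4),
]


def delta_vs_fp32(score, baseline=FP32_SCORE):
    """Return (absolute change, relative change in percent), each rounded to one decimal.

    Hypothetical helper for illustration; assumes the relative change is
    measured against the FP32 baseline score.
    """
    absolute = score - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 1), round(relative, 1)


for algo, size_mb, score in rows:
    absolute, relative = delta_vs_fp32(score)
    print(f"{algo:<20} {size_mb:>4} MB  {absolute:+.1f} ({relative:+.1f}%)")
```

Under that assumption, the computed values agree with the table. The kld_gradient (0.65) row has the smallest MMLU drop of the four quantized variants, while its model size is the third smallest.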

Copilot AI review requested due to automatic review settings February 9, 2026 20:26

Copilot AI left a comment

Pull request overview

Adds a new model folder for Qwen/Qwen3-0.6B with Olive recipes targeting CPU, CUDA, and WebGPU execution providers, intended to complement the quantized Qwen-3 recipes added elsewhere.

Changes:

  • Added CPU (FP32) ModelBuilder recipe + metadata/README.
  • Added CUDA (FP16) ModelBuilder recipe + metadata/README.
  • Added WebGPU (FP16) ModelBuilder recipe + metadata/README, plus a model-local LICENSE file.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| Qwen-Qwen3-0.6B/LICENSE | Adds Apache-2.0 license text for the model folder. |
| Qwen-Qwen3-0.6B/cpu/info.yaml | Registers the CPU FP32 recipe for discovery. |
| Qwen-Qwen3-0.6B/cpu/README.md | Documents how to run the CPU recipe. |
| Qwen-Qwen3-0.6B/cpu/Qwen-Qwen3-0.6B_cpu_fp32.json | Olive config to export/build FP32 on CPU EP. |
| Qwen-Qwen3-0.6B/cuda/info.yaml | Registers the CUDA FP16 recipe for discovery. |
| Qwen-Qwen3-0.6B/cuda/README.md | Documents how to run the CUDA recipe. |
| Qwen-Qwen3-0.6B/cuda/Qwen-Qwen3-0.6B_cuda_fp16.json | Olive config to export/build FP16 on CUDA EP. |
| Qwen-Qwen3-0.6B/webgpu/info.yaml | Registers the WebGPU FP16 recipe for discovery. |
| Qwen-Qwen3-0.6B/webgpu/README.md | Documents how to run the WebGPU recipe. |
| Qwen-Qwen3-0.6B/webgpu/Qwen-Qwen3-0.6B_webgpu_fp16.json | Olive config to export/build FP16 on WebGPU EP. |

@kunal-vaishnavi kunal-vaishnavi changed the title from "Add Qwen-3 0.6B recipes" to "Add Qwen-3 0.6B CPU recipe" Feb 10, 2026
@jambayk jambayk merged commit 11dc8c5 into main Feb 10, 2026
7 checks passed
@jambayk jambayk deleted the kvaishnavi/qwen3-0.6b branch February 10, 2026 23:17