Merged
2 changes: 1 addition & 1 deletion .aitk/configs/checks.json
@@ -1,6 +1,6 @@
 {
   "configCheck": 139,
-  "copyCheck": 179,
+  "copyCheck": 178,
   "extensionCheck": 1,
   "gitignoreCheck": 40,
   "inferenceModelCheck": 25,
1 change: 1 addition & 0 deletions .aitk/requirements/requirements-Profiling.txt
@@ -5,6 +5,7 @@ mpmath==1.3.0
 numpy==2.2.4
 # onnx==1.17.0
 onnx==1.17.0
+onnxruntime-genai-winml==0.11.2
 # uvpip:uninstall onnxruntime-winml;pre
 # We also need to uninstall in case user tries new version, uses previous version and then updates again
 # because uninstalling winml will remove onnxruntime folder but we will not install windowsml to add it back
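The `# uvpip:uninstall onnxruntime-winml;pre` line above is a directive comment aimed at the toolkit's installer rather than at pip itself. As a rough sketch only, here is how such a directive could be parsed; the grammar `# uvpip:<action> <package>;<phase>` and the parser below are inferred from this single example and are not code from this repository:

```python
import re

# Hypothetical parser for "# uvpip:<action> <package>;<phase>" directive
# comments, modeled on the one line visible in requirements-Profiling.txt.
# The directive grammar is an assumption, not the real tool's definition.
DIRECTIVE = re.compile(r"^#\s*uvpip:(?P<action>\w+)\s+(?P<package>[\w.-]+);(?P<phase>\w+)\s*$")

def parse_uvpip_directive(line: str):
    """Return (action, package, phase) for a directive line, else None."""
    m = DIRECTIVE.match(line)
    if m is None:
        return None
    return m.group("action"), m.group("package"), m.group("phase")
```

For the line in this diff, such a parser would yield the action `uninstall`, the package `onnxruntime-winml`, and the phase `pre` (i.e., uninstall before installing the new requirements).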
7 changes: 5 additions & 2 deletions Qwen-Qwen2.5-1.5B-Instruct/aitk/README.md
@@ -2,14 +2,17 @@
 
 This repository demonstrates the optimization of the [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) model using **post-training quantization (PTQ)** techniques. The optimization process is divided into these workflows:
 
-- QDQ for AMD NPU
+- Quark Quantization for AMD NPU
 - PTQ + AOT for QNN NPU
+  This process extends the QDQ flow and compiling specifically for **Qualcomm NPUs**
+- Int4 Quantization for QNN GPU
 - OpenVINO for Intel® CPU/GPU/NPU
+  This process uses OpenVINO specific passes like `OpenVINOOptimumConversion`, `OpenVINOIoUpdate` and `OpenVINOEncapsulation`
 - Float downcasting for NVIDIA TRT for RTX GPU
 - DML for general GPU
-  This process uses AutoAWQ and ModelBuilder
+  This process uses ModelBuilder
 
 **For some python packages, users need to install visual studio 2022 or visual studio 2022 build tools with c++ development tools modules.**
 
 ## **QDQ Model with 4-bit Weights & 16-bit Activations**
 

[Review comment on lines +5 to +8, Copilot AI, Feb 12, 2026]
The workflow list now mentions "Quark Quantization for AMD NPU" and "Int4 Quantization for QNN GPU", but this README doesn't include any corresponding sections/usage guidance (and there's no other mention of Quark/QNN GPU later). Either add links/sections that explain how to run these workflows (e.g., which *.json.config to execute), or remove the bullets to avoid advertising unsupported steps.
Suggested change:
-- Quark Quantization for AMD NPU
 - PTQ + AOT for QNN NPU
   This process extends the QDQ flow and compiling specifically for **Qualcomm NPUs**
-- Int4 Quantization for QNN GPU

[Review comment on the prerequisites sentence, Copilot AI, Feb 12, 2026]
Capitalize product/term names in this prerequisite sentence for readability/accuracy (Python, Visual Studio 2022, Build Tools, C++). Also consider using the official Visual Studio wording ("C++ development workload/tools") rather than "modules".
Suggested change:
-**For some python packages, users need to install visual studio 2022 or visual studio 2022 build tools with c++ development tools modules.**
+**For some Python packages, users need to install Visual Studio 2022 or Visual Studio 2022 Build Tools with the C++ development workload (or C++ build tools) installed.**
7 changes: 5 additions & 2 deletions deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/README.md
@@ -2,14 +2,17 @@
 
 This repository demonstrates the optimization of the [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) model using **post-training quantization (PTQ)** techniques. The optimization process is divided into these workflows:
 
-- QDQ for AMD NPU
+- Quark Quantization for AMD NPU
 - PTQ + AOT for QNN NPU
+  This process extends the QDQ flow and compiling specifically for **Qualcomm NPUs**
+- Int4 Quantization for QNN GPU
 - OpenVINO for Intel® CPU/GPU/NPU
+  This process uses OpenVINO specific passes like `OpenVINOOptimumConversion`, `OpenVINOIoUpdate` and `OpenVINOEncapsulation`
 - Float downcasting for NVIDIA TRT for RTX GPU
 - DML for general GPU
-  This process uses AutoAWQ and ModelBuilder
+  This process uses ModelBuilder
 
 **For some python packages, users need to install visual studio 2022 or visual studio 2022 build tools with c++ development tools modules.**
 
 ## **QDQ Model with 4-bit Weights & 16-bit Activations**
 

[Review comment on lines +5 to +8, Copilot AI, Feb 12, 2026]
The workflow list now mentions "Quark Quantization for AMD NPU" and "Int4 Quantization for QNN GPU", but this README doesn't include any corresponding sections/usage guidance (and there's no other mention of Quark/QNN GPU later). Either add links/sections that explain how to run these workflows (e.g., which *.json.config to execute), or remove the bullets to avoid advertising unsupported steps.
Suggested change:
-- Quark Quantization for AMD NPU
 - PTQ + AOT for QNN NPU
   This process extends the QDQ flow and compiling specifically for **Qualcomm NPUs**
-- Int4 Quantization for QNN GPU

[Review comment on the prerequisites sentence, Copilot AI, Feb 12, 2026]
Capitalize product/term names in this prerequisite sentence for readability/accuracy (Python, Visual Studio 2022, Build Tools, C++). Also consider using the official Visual Studio wording ("C++ development workload/tools") rather than "modules".
Suggested change:
-**For some python packages, users need to install visual studio 2022 or visual studio 2022 build tools with c++ development tools modules.**
+**For some Python packages, users need to install Visual Studio 2022 with the C++ development workload, or Visual Studio 2022 Build Tools with the C++ build tools installed.**
18 changes: 0 additions & 18 deletions meta-llama-Llama-3.1-8B-Instruct/aitk/_copy.json.config
@@ -47,24 +47,6 @@
       "dst": "llama3_1_dml_config.json.config",
       "replacements": []
     },
-    {
-      "src": "../../deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/README.md",
-      "dst": "README.md",
-      "replacements": [
-        {
-          "find": "# DeepSeek-R1-Distill-Qwen-1.5B Model Optimization",
-          "replace": "# Llama-3.1-8B-Instruct Model Optimization"
-        },
-        {
-          "find": "[DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)",
-          "replace": "[Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)"
-        },
-        {
-          "find": "> ⚠️ If got 6033 error, replace `genai_config.json` in `./model` folder",
-          "replace": ""
-        }
-      ]
-    },
     {
       "src": "../../deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/requirements.txt",
       "dst": "requirements.txt",
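The copy-config entries above share one shape: a `src` file, a `dst` file, and a list of `find`/`replace` rewrites applied during the copy. As an illustration only, a minimal interpreter for one such entry might look like the following; the `apply_copy_entry` helper is a hypothetical sketch based on the visible JSON structure, not the tool this repository actually uses:

```python
import json
from pathlib import Path

def apply_copy_entry(entry: dict, src_root: Path, dst_root: Path) -> None:
    """Copy entry["src"] to entry["dst"], applying each find/replace pair.

    Hypothetical helper mirroring the structure of _copy.json.config
    entries ({"src": ..., "dst": ..., "replacements": [...]}).
    """
    text = (src_root / entry["src"]).read_text(encoding="utf-8")
    for rule in entry["replacements"]:
        # Plain substring replacement, as the "find"/"replace" keys suggest.
        text = text.replace(rule["find"], rule["replace"])
    (dst_root / entry["dst"]).write_text(text, encoding="utf-8")
```

Under this reading, the entry deleted by this diff would have copied the DeepSeek README into the Llama folder while retitling it, which is consistent with the PR replacing the templated copy with a standalone README.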
7 changes: 5 additions & 2 deletions meta-llama-Llama-3.2-1B-Instruct/aitk/README.md
@@ -2,14 +2,17 @@
 
 This repository demonstrates the optimization of the [Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) model using **post-training quantization (PTQ)** techniques. The optimization process is divided into these workflows:
 
-- QDQ for AMD NPU
+- Quark Quantization for AMD NPU
 - PTQ + AOT for QNN NPU
+  This process extends the QDQ flow and compiling specifically for **Qualcomm NPUs**
+- Int4 Quantization for QNN GPU
 - OpenVINO for Intel® CPU/GPU/NPU
+  This process uses OpenVINO specific passes like `OpenVINOOptimumConversion`, `OpenVINOIoUpdate` and `OpenVINOEncapsulation`
 - Float downcasting for NVIDIA TRT for RTX GPU
 - DML for general GPU
-  This process uses AutoAWQ and ModelBuilder
+  This process uses ModelBuilder
 
 **For some python packages, users need to install visual studio 2022 or visual studio 2022 build tools with c++ development tools modules.**
 
 ## **QDQ Model with 4-bit Weights & 16-bit Activations**
 

[Review comment on lines +5 to +8, Copilot AI, Feb 12, 2026]
The workflow list now mentions "Quark Quantization for AMD NPU" and "Int4 Quantization for QNN GPU", but this README doesn't include any corresponding sections/usage guidance (and there's no other mention of Quark/QNN GPU later). Either add links/sections that explain how to run these workflows (e.g., which *.json.config to execute), or remove the bullets to avoid advertising unsupported steps.
Suggested change:
-- Quark Quantization for AMD NPU
 - PTQ + AOT for QNN NPU
   This process extends the QDQ flow and compiling specifically for **Qualcomm NPUs**
-- Int4 Quantization for QNN GPU

[Review comment on the prerequisites sentence, Copilot AI, Feb 12, 2026]
Capitalize product/term names in this prerequisite sentence for readability/accuracy (Python, Visual Studio 2022, Build Tools, C++). Also consider using the official Visual Studio wording ("C++ development workload/tools") rather than "modules".
Suggested change:
-**For some python packages, users need to install visual studio 2022 or visual studio 2022 build tools with c++ development tools modules.**
+**For some Python packages, users need to install Visual Studio 2022 with the C++ development workload or Visual Studio 2022 Build Tools with the C++ build tools.**
7 changes: 5 additions & 2 deletions microsoft-Phi-3.5-mini-instruct/aitk/README.md
@@ -2,14 +2,17 @@
 
 This repository demonstrates the optimization of the [Microsoft Phi-3.5 Mini Instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) model using **post-training quantization (PTQ)** techniques. The optimization process is divided into these workflows:
 
-- QDQ for AMD NPU
+- Quark Quantization for AMD NPU
 - PTQ + AOT for QNN NPU
+  This process extends the QDQ flow and compiling specifically for **Qualcomm NPUs**
+- Int4 Quantization for QNN GPU
 - OpenVINO for Intel® CPU/GPU/NPU
+  This process uses OpenVINO specific passes like `OpenVINOOptimumConversion`, `OpenVINOIoUpdate` and `OpenVINOEncapsulation`
 - Float downcasting for NVIDIA TRT for RTX GPU
 - DML for general GPU
-  This process uses AutoAWQ and ModelBuilder
+  This process uses ModelBuilder
 
 **For some python packages, users need to install visual studio 2022 or visual studio 2022 build tools with c++ development tools modules.**
 
 ## **QDQ Model with 4-bit Weights & 16-bit Activations**
 

[Review comment on lines +5 to +8, Copilot AI, Feb 12, 2026]
The workflow list now mentions "Quark Quantization for AMD NPU" and "Int4 Quantization for QNN GPU", but this README doesn't include any corresponding sections/usage guidance (and there's no other mention of Quark/QNN GPU later). Either add links/sections that explain how to run these workflows (e.g., which *.json.config to execute), or remove the bullets to avoid advertising unsupported steps.
Suggested change:
-- Quark Quantization for AMD NPU
 - PTQ + AOT for QNN NPU
   This process extends the QDQ flow and compiling specifically for **Qualcomm NPUs**
-- Int4 Quantization for QNN GPU

[Review comment on the prerequisites sentence, Copilot AI, Feb 12, 2026]
Capitalize product/term names in this prerequisite sentence for readability/accuracy (Python, Visual Studio 2022, Build Tools, C++). Also consider using the official Visual Studio wording ("C++ development workload/tools") rather than "modules".
Suggested change:
-**For some python packages, users need to install visual studio 2022 or visual studio 2022 build tools with c++ development tools modules.**
+**For some Python packages, users need to install Visual Studio 2022 or Visual Studio 2022 Build Tools with the C++ development tools workload.**