Merged
2 changes: 1 addition & 1 deletion .aitk/configs/checks.json
@@ -1,6 +1,6 @@
 {
   "configCheck": 139,
-  "copyCheck": 179,
+  "copyCheck": 178,
   "extensionCheck": 1,
   "gitignoreCheck": 40,
   "inferenceModelCheck": 25,
1 change: 1 addition & 0 deletions .aitk/requirements/requirements-Profiling.txt
@@ -5,6 +5,7 @@ mpmath==1.3.0
 numpy==2.2.4
 # onnx==1.17.0
 onnx==1.17.0
+onnxruntime-genai-winml==0.11.2
 # uvpip:uninstall onnxruntime-winml;pre
 # We also need to uninstall in case user tries new version, uses previous version and then updates again
 # because uninstalling winml will remove onnxruntime folder but we will not install windowsml to add it back
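The `# uvpip:uninstall onnxruntime-winml;pre` line above is a directive comment aimed at the toolkit's installer rather than at pip itself. As a rough sketch only, here is how such a directive could be parsed; the grammar `# uvpip:<action> <package>;<phase>` and the parser below are inferred from this single example and are not code from this repository:

```python
import re

# Hypothetical parser for "# uvpip:<action> <package>;<phase>" directive
# comments, modeled on the one line visible in requirements-Profiling.txt.
# The directive grammar is an assumption, not the real tool's definition.
DIRECTIVE = re.compile(r"^#\s*uvpip:(?P<action>\w+)\s+(?P<package>[\w.-]+);(?P<phase>\w+)\s*$")

def parse_uvpip_directive(line: str):
    """Return (action, package, phase) for a directive line, else None."""
    m = DIRECTIVE.match(line)
    if m is None:
        return None
    return m.group("action"), m.group("package"), m.group("phase")
```

For the line in this diff, such a parser would yield the action `uninstall`, the package `onnxruntime-winml`, and the phase `pre` (i.e., uninstall before installing the new requirements).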
7 changes: 5 additions & 2 deletions Qwen-Qwen2.5-1.5B-Instruct/aitk/README.md
@@ -2,14 +2,17 @@
 
 This repository demonstrates the optimization of the [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) model using **post-training quantization (PTQ)** techniques. The optimization process is divided into these workflows:
 
-- QDQ for AMD NPU
+- Quark Quantization for AMD NPU
 - PTQ + AOT for QNN NPU
+  This process extends the QDQ flow and compiling specifically for **Qualcomm NPUs**
+- Int4 Quantization for QNN GPU
 - OpenVINO for Intel® CPU/GPU/NPU
+  This process uses OpenVINO specific passes like `OpenVINOOptimumConversion`, `OpenVINOIoUpdate` and `OpenVINOEncapsulation`
 - Float downcasting for NVIDIA TRT for RTX GPU
 - DML for general GPU
-  This process uses AutoAWQ and ModelBuilder
+  This process uses ModelBuilder
 
 **For some python packages, users need to install visual studio 2022 or visual studio 2022 build tools with c++ development tools modules.**
 
 ## **QDQ Model with 4-bit Weights & 16-bit Activations**
 

[Review comment on lines +5 to +8, Copilot AI, Feb 12, 2026]
The workflow list now mentions "Quark Quantization for AMD NPU" and "Int4 Quantization for QNN GPU", but this README doesn't include any corresponding sections/usage guidance (and there's no other mention of Quark/QNN GPU later). Either add links/sections that explain how to run these workflows (e.g., which *.json.config to execute), or remove the bullets to avoid advertising unsupported steps.
Suggested change:
-- Quark Quantization for AMD NPU
 - PTQ + AOT for QNN NPU
   This process extends the QDQ flow and compiling specifically for **Qualcomm NPUs**
-- Int4 Quantization for QNN GPU

[Review comment on the prerequisites sentence, Copilot AI, Feb 12, 2026]
Capitalize product/term names in this prerequisite sentence for readability/accuracy (Python, Visual Studio 2022, Build Tools, C++). Also consider using the official Visual Studio wording ("C++ development workload/tools") rather than "modules".
Suggested change:
-**For some python packages, users need to install visual studio 2022 or visual studio 2022 build tools with c++ development tools modules.**
+**For some Python packages, users need to install Visual Studio 2022 or Visual Studio 2022 Build Tools with the C++ development workload (or C++ build tools) installed.**
7 changes: 5 additions & 2 deletions deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/README.md
@@ -2,14 +2,17 @@
 
 This repository demonstrates the optimization of the [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) model using **post-training quantization (PTQ)** techniques. The optimization process is divided into these workflows:
 
-- QDQ for AMD NPU
+- Quark Quantization for AMD NPU
 - PTQ + AOT for QNN NPU
+  This process extends the QDQ flow and compiling specifically for **Qualcomm NPUs**
+- Int4 Quantization for QNN GPU
 - OpenVINO for Intel® CPU/GPU/NPU
+  This process uses OpenVINO specific passes like `OpenVINOOptimumConversion`, `OpenVINOIoUpdate` and `OpenVINOEncapsulation`
 - Float downcasting for NVIDIA TRT for RTX GPU
 - DML for general GPU
-  This process uses AutoAWQ and ModelBuilder
+  This process uses ModelBuilder
 
 **For some python packages, users need to install visual studio 2022 or visual studio 2022 build tools with c++ development tools modules.**
 
 ## **QDQ Model with 4-bit Weights & 16-bit Activations**
 

[Review comment on lines +5 to +8, Copilot AI, Feb 12, 2026]
The workflow list now mentions "Quark Quantization for AMD NPU" and "Int4 Quantization for QNN GPU", but this README doesn't include any corresponding sections/usage guidance (and there's no other mention of Quark/QNN GPU later). Either add links/sections that explain how to run these workflows (e.g., which *.json.config to execute), or remove the bullets to avoid advertising unsupported steps.
Suggested change:
-- Quark Quantization for AMD NPU
 - PTQ + AOT for QNN NPU
   This process extends the QDQ flow and compiling specifically for **Qualcomm NPUs**
-- Int4 Quantization for QNN GPU

[Review comment on the prerequisites sentence, Copilot AI, Feb 12, 2026]
Capitalize product/term names in this prerequisite sentence for readability/accuracy (Python, Visual Studio 2022, Build Tools, C++). Also consider using the official Visual Studio wording ("C++ development workload/tools") rather than "modules".
Suggested change:
-**For some python packages, users need to install visual studio 2022 or visual studio 2022 build tools with c++ development tools modules.**
+**For some Python packages, users need to install Visual Studio 2022 with the C++ development workload, or Visual Studio 2022 Build Tools with the C++ build tools installed.**
18 changes: 0 additions & 18 deletions meta-llama-Llama-3.1-8B-Instruct/aitk/_copy.json.config
@@ -47,24 +47,6 @@
       "dst": "llama3_1_dml_config.json.config",
       "replacements": []
     },
-    {
-      "src": "../../deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/README.md",
-      "dst": "README.md",
-      "replacements": [
-        {
-          "find": "# DeepSeek-R1-Distill-Qwen-1.5B Model Optimization",
-          "replace": "# Llama-3.1-8B-Instruct Model Optimization"
-        },
-        {
-          "find": "[DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)",
-          "replace": "[Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)"
-        },
-        {
-          "find": "> ⚠️ If got 6033 error, replace `genai_config.json` in `./model` folder",
-          "replace": ""
-        }
-      ]
-    },
     {
       "src": "../../deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/requirements.txt",
       "dst": "requirements.txt",
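The copy-config entries above share one shape: a `src` file, a `dst` file, and a list of `find`/`replace` rewrites applied during the copy. As an illustration only, a minimal interpreter for one such entry might look like the following; the `apply_copy_entry` helper is a hypothetical sketch based on the visible JSON structure, not the tool this repository actually uses:

```python
import json
from pathlib import Path

def apply_copy_entry(entry: dict, src_root: Path, dst_root: Path) -> None:
    """Copy entry["src"] to entry["dst"], applying each find/replace pair.

    Hypothetical helper mirroring the structure of _copy.json.config
    entries ({"src": ..., "dst": ..., "replacements": [...]}).
    """
    text = (src_root / entry["src"]).read_text(encoding="utf-8")
    for rule in entry["replacements"]:
        # Plain substring replacement, as the "find"/"replace" keys suggest.
        text = text.replace(rule["find"], rule["replace"])
    (dst_root / entry["dst"]).write_text(text, encoding="utf-8")
```

Under this reading, the entry deleted by this diff would have copied the DeepSeek README into the Llama folder while retitling it, which is consistent with the PR replacing the templated copy with a standalone README.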
7 changes: 5 additions & 2 deletions meta-llama-Llama-3.2-1B-Instruct/aitk/README.md
@@ -2,14 +2,17 @@
 
 This repository demonstrates the optimization of the [Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) model using **post-training quantization (PTQ)** techniques. The optimization process is divided into these workflows:
 
-- QDQ for AMD NPU
+- Quark Quantization for AMD NPU
 - PTQ + AOT for QNN NPU
+  This process extends the QDQ flow and compiling specifically for **Qualcomm NPUs**
+- Int4 Quantization for QNN GPU
 - OpenVINO for Intel® CPU/GPU/NPU
+  This process uses OpenVINO specific passes like `OpenVINOOptimumConversion`, `OpenVINOIoUpdate` and `OpenVINOEncapsulation`
 - Float downcasting for NVIDIA TRT for RTX GPU
 - DML for general GPU
-  This process uses AutoAWQ and ModelBuilder
+  This process uses ModelBuilder
 
 **For some python packages, users need to install visual studio 2022 or visual studio 2022 build tools with c++ development tools modules.**
 
 ## **QDQ Model with 4-bit Weights & 16-bit Activations**
 

[Review comment on lines +5 to +8, Copilot AI, Feb 12, 2026]
The workflow list now mentions "Quark Quantization for AMD NPU" and "Int4 Quantization for QNN GPU", but this README doesn't include any corresponding sections/usage guidance (and there's no other mention of Quark/QNN GPU later). Either add links/sections that explain how to run these workflows (e.g., which *.json.config to execute), or remove the bullets to avoid advertising unsupported steps.
Suggested change:
-- Quark Quantization for AMD NPU
 - PTQ + AOT for QNN NPU
   This process extends the QDQ flow and compiling specifically for **Qualcomm NPUs**
-- Int4 Quantization for QNN GPU

[Review comment on the prerequisites sentence, Copilot AI, Feb 12, 2026]
Capitalize product/term names in this prerequisite sentence for readability/accuracy (Python, Visual Studio 2022, Build Tools, C++). Also consider using the official Visual Studio wording ("C++ development workload/tools") rather than "modules".
Suggested change:
-**For some python packages, users need to install visual studio 2022 or visual studio 2022 build tools with c++ development tools modules.**
+**For some Python packages, users need to install Visual Studio 2022 with the C++ development workload or Visual Studio 2022 Build Tools with the C++ build tools.**
7 changes: 5 additions & 2 deletions microsoft-Phi-3.5-mini-instruct/aitk/README.md
@@ -2,14 +2,17 @@
 
 This repository demonstrates the optimization of the [Microsoft Phi-3.5 Mini Instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) model using **post-training quantization (PTQ)** techniques. The optimization process is divided into these workflows:
 
-- QDQ for AMD NPU
+- Quark Quantization for AMD NPU
 - PTQ + AOT for QNN NPU
+  This process extends the QDQ flow and compiling specifically for **Qualcomm NPUs**
+- Int4 Quantization for QNN GPU
 - OpenVINO for Intel® CPU/GPU/NPU
+  This process uses OpenVINO specific passes like `OpenVINOOptimumConversion`, `OpenVINOIoUpdate` and `OpenVINOEncapsulation`
 - Float downcasting for NVIDIA TRT for RTX GPU
 - DML for general GPU
-  This process uses AutoAWQ and ModelBuilder
+  This process uses ModelBuilder
 
 **For some python packages, users need to install visual studio 2022 or visual studio 2022 build tools with c++ development tools modules.**
 
 ## **QDQ Model with 4-bit Weights & 16-bit Activations**
 

[Review comment on lines +5 to +8, Copilot AI, Feb 12, 2026]
The workflow list now mentions "Quark Quantization for AMD NPU" and "Int4 Quantization for QNN GPU", but this README doesn't include any corresponding sections/usage guidance (and there's no other mention of Quark/QNN GPU later). Either add links/sections that explain how to run these workflows (e.g., which *.json.config to execute), or remove the bullets to avoid advertising unsupported steps.
Suggested change:
-- Quark Quantization for AMD NPU
 - PTQ + AOT for QNN NPU
   This process extends the QDQ flow and compiling specifically for **Qualcomm NPUs**
-- Int4 Quantization for QNN GPU

[Review comment on the prerequisites sentence, Copilot AI, Feb 12, 2026]
Capitalize product/term names in this prerequisite sentence for readability/accuracy (Python, Visual Studio 2022, Build Tools, C++). Also consider using the official Visual Studio wording ("C++ development workload/tools") rather than "modules".
Suggested change:
-**For some python packages, users need to install visual studio 2022 or visual studio 2022 build tools with c++ development tools modules.**
+**For some Python packages, users need to install Visual Studio 2022 or Visual Studio 2022 Build Tools with the C++ development tools workload.**