feat: update readme and profiling req #241
```diff
@@ -2,14 +2,17 @@
 This repository demonstrates the optimization of the [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) model using **post-training quantization (PTQ)** techniques. The optimization process is divided into these workflows:

 - QDQ for AMD NPU
 - Quark Quantization for AMD NPU
 - PTQ + AOT for QNN NPU
+  This process extends the QDQ flow and compiles specifically for **Qualcomm NPUs**.
 - Int4 Quantization for QNN GPU
 - OpenVINO for Intel® CPU/GPU/NPU
+  This process uses OpenVINO-specific passes such as `OpenVINOOptimumConversion`, `OpenVINOIoUpdate`, and `OpenVINOEncapsulation`.
 - Float downcasting for NVIDIA TRT for RTX GPU
+  This process uses AutoAWQ and ModelBuilder.
 - DML for general GPU
+  This process uses ModelBuilder.

 **For some python packages, users need to install visual studio 2022 or visual studio 2022 build tools with c++ development tools modules.**
```

Suggested change:

```diff
-**For some python packages, users need to install visual studio 2022 or visual studio 2022 build tools with c++ development tools modules.**
+**For some Python packages, users need to install Visual Studio 2022 or Visual Studio 2022 Build Tools with the C++ development workload (or C++ build tools) installed.**
```
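The workflows in these bullets are driven by JSON pass-configuration files. As a rough sketch only (the surrounding field names are assumed from the general shape of such configs, not taken from this PR), the OpenVINO passes named in the added bullet might be chained like this:

```json
{
  "input_model": {
    "type": "HfModel",
    "model_path": "Qwen/Qwen2.5-1.5B-Instruct"
  },
  "passes": {
    "conversion": { "type": "OpenVINOOptimumConversion" },
    "io_update": { "type": "OpenVINOIoUpdate" },
    "encapsulation": { "type": "OpenVINOEncapsulation" }
  },
  "output_dir": "models/qwen2_5-openvino"
}
```

Entries under `passes` typically run in the order they are declared, so the Optimum-based conversion would happen before the I/O update and the final encapsulation step.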
```diff
@@ -2,14 +2,17 @@
 This repository demonstrates the optimization of the [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) model using **post-training quantization (PTQ)** techniques. The optimization process is divided into these workflows:

 - QDQ for AMD NPU
 - Quark Quantization for AMD NPU
 - PTQ + AOT for QNN NPU
+  This process extends the QDQ flow and compiles specifically for **Qualcomm NPUs**.
 - Int4 Quantization for QNN GPU
```

**Copilot AI** commented on lines +5 to +8 (Feb 12, 2026):

> Capitalize product/term names in this prerequisite sentence for readability/accuracy (Python, Visual Studio 2022, Build Tools, C++). Also consider using the official Visual Studio wording (“C++ development workload/tools”) rather than “modules”.

Suggested change:

```diff
-**For some python packages, users need to install visual studio 2022 or visual studio 2022 build tools with c++ development tools modules.**
+**For some Python packages, users need to install Visual Studio 2022 with the C++ development workload, or Visual Studio 2022 Build Tools with the C++ build tools installed.**
```
```diff
@@ -2,14 +2,17 @@
 This repository demonstrates the optimization of the [Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) model using **post-training quantization (PTQ)** techniques. The optimization process is divided into these workflows:

 - QDQ for AMD NPU
 - Quark Quantization for AMD NPU
 - PTQ + AOT for QNN NPU
+  This process extends the QDQ flow and compiles specifically for **Qualcomm NPUs**.
 - Int4 Quantization for QNN GPU
```

**Copilot AI** commented on lines +5 to +8 (Feb 12, 2026):

> Capitalize product/term names in this prerequisite sentence for readability/accuracy (Python, Visual Studio 2022, Build Tools, C++). Also consider using the official Visual Studio wording (“C++ development workload/tools”) rather than “modules”.

Suggested change:

```diff
-**For some python packages, users need to install visual studio 2022 or visual studio 2022 build tools with c++ development tools modules.**
+**For some Python packages, users need to install Visual Studio 2022 with the C++ development workload or Visual Studio 2022 Build Tools with the C++ build tools.**
```
```diff
@@ -2,14 +2,17 @@
 This repository demonstrates the optimization of the [Microsoft Phi-3.5 Mini Instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) model using **post-training quantization (PTQ)** techniques. The optimization process is divided into these workflows:

 - QDQ for AMD NPU
 - Quark Quantization for AMD NPU
 - PTQ + AOT for QNN NPU
+  This process extends the QDQ flow and compiles specifically for **Qualcomm NPUs**.
 - Int4 Quantization for QNN GPU
```

**Copilot AI** commented on lines +5 to +8 (Feb 12, 2026):

> Capitalize product/term names in this prerequisite sentence for readability/accuracy (Python, Visual Studio 2022, Build Tools, C++). Also consider using the official Visual Studio wording (“C++ development workload/tools”) rather than “modules”.

Suggested change:

```diff
-**For some python packages, users need to install visual studio 2022 or visual studio 2022 build tools with c++ development tools modules.**
+**For some Python packages, users need to install Visual Studio 2022 or Visual Studio 2022 Build Tools with the C++ development tools workload.**
```
A separate review comment on this README:

> The workflow list now mentions “Quark Quantization for AMD NPU” and “Int4 Quantization for QNN GPU”, but this README doesn’t include any corresponding sections or usage guidance (and there’s no other mention of Quark/QNN GPU later). Either add links/sections that explain how to run these workflows (e.g., which *.json.config to execute), or remove the bullets to avoid advertising unsupported steps.
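One way to keep an advertised workflow list honest is to read each workflow's JSON config and check what passes it actually declares. A minimal stdlib-Python sketch (the config structure here is assumed for illustration and is not taken from this PR; only the pass names come from the diff):

```python
import json

# Hypothetical workflow config of an assumed shape; in practice this text
# would be read from the *.json config file that the README points at.
config_text = """
{
  "input_model": {"type": "HfModel", "model_path": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"},
  "passes": {
    "conversion": {"type": "OpenVINOOptimumConversion"},
    "io_update": {"type": "OpenVINOIoUpdate"},
    "encapsulation": {"type": "OpenVINOEncapsulation"}
  }
}
"""

config = json.loads(config_text)

# Collect the pass types in declaration order (dicts preserve insertion
# order in modern Python), which is the order the README should document.
pass_types = [step["type"] for step in config["passes"].values()]
print(pass_types)
# → ['OpenVINOOptimumConversion', 'OpenVINOIoUpdate', 'OpenVINOEncapsulation']
```

A check like this could back each README bullet with the concrete config that implements it, addressing the reviewer's concern about unsupported steps.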