Enable native ModelOpt quantization support (3/3) #2
Open
Edwardf0t1 wants to merge 348 commits into zhiyu/modelopt-sglang-api-2
…nager` utility function (sgl-project#11586)
Signed-off-by: Alex Chi Z <iskyzh@gmail.com>
…` and add `minijinja-contrib` (sgl-project#11882)
Co-authored-by: Simo Lin <linsimo.mark@gmail.com>
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> Co-authored-by: 羽癫 <yudian.zy@antgroup.com> Co-authored-by: b8zhong <b8zhong@uwaterloo.ca>
Co-authored-by: guangyey <guangye.yu@intel.com> Co-authored-by: DiweiSun <105627594+DiweiSun@users.noreply.github.com>
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Signed-off-by: xintin <gaurav.verma@amd.com>
…1487) Co-authored-by: Jianwei Dong <1913953267@qq.com>
sgl-project#6572] (sgl-project#11416) Signed-off-by: ybyang <ybyang7@iflytek.com> Co-authored-by: YorkSu <york_su@qq.com>
Signed-off-by: zhengkezhou1 <madzhou1@gmail.com> Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
…#11902) Signed-off-by: Shangming Cai <csmthu@gmail.com>
Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>
…e_intermediate_size` / `weight_block_size_n` (sgl-project#11702) Signed-off-by: Kai-Hsun Chen <khchen@x.ai>
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Signed-off-by: Serge Panev <spanev@nvidia.com>
Original PR: sgl-project#10154
This PR shows only the diff between the second and third PRs of a three-part series that enables native ModelOpt quantization in SGLang.
Motivation
Modifications
Accuracy Tests
Benchmarking and Profiling
Checklist