1161 Commits

Author SHA1 Message Date
Aleksandr Malyshev
3053a22b33
fp8 kv cache support fix for torch.compile (#22758)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
2025-09-16 21:27:11 +00:00
Sage Moore
567939953b
[Core/DBO][1/N] Add Dual-Batch Overlap mechanism to VLLM (#23693)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-09-16 12:21:48 -04:00
Chih-Chieh Yang
73cfb3c5ee
[Model] Clean up and simplify Mamba2 Metadata Usage in both V0 and V1 (#24331)
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
2025-09-16 14:53:43 +00:00
Chen Bruce
7ea5c73ad7
[Feat][EPLB] A novel static EPLB placement strategy for MoE models. (#23745)
Signed-off-by: bruceszchen <bruceszchen@tencent.com>
Signed-off-by: Chen Bruce <bruceszchen@tencent.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Chen Bruce <cszwwdz@vip.qq.com>
Co-authored-by: lemon412 <lemon412@foxmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-16 10:55:16 +00:00
tomeras91
27fcfe7bcf
[Mamba] Support TP>1 with quantization for mamba2 mixer in case n_groups % tp_size == 0 (#24593)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Signed-off-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-16 10:51:01 +00:00
Jee Jee Li
04ad0dc275
[benchmark] Add triton version in the moe tuned config (#24769)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-16 14:10:54 +08:00
Saman A. Pour
238c4c1705
[QWEN NEXT] Fused MoE kernels Optimization configs (#24924)
Signed-off-by: Saman Keon <samanamp@outlook.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-16 13:06:03 +08:00
Gregory Shtrasberg
2891603efd
[ROCm][Bugfix] Fix the case where there's bias (#24895)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-09-15 20:05:12 -06:00
Kyle Sayers
a0b26701c9
[Transform] Deterministic Hadacore Transforms (#24106)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-09-15 12:59:31 -06:00
Rafael Marcelino Koike
b834b4cbf1
[USAGE] Improve error handling for weight initialization in Unquantized… (#20321)
Signed-off-by: Rafael Marcelino Koike <rafael.koike@oracle.com>
Signed-off-by: Rafael Koike <koike.rafael@gmail.com>
2025-09-15 16:45:49 +00:00
Didier Durand
4979eb79da
[Doc]: fix typos in various files (#24821)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-15 01:08:52 -07:00
Wentao Ye
fc2dbcda8b
[Perf] Fix DeepGEMM Contiguous Layout Issue, 5.5% Throughput Improvement (#24783)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-09-14 11:20:17 -04:00
Didier Durand
41ae4a1eab
[Doc]: fix typos in various files (#24798)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-13 00:43:33 -07:00
Elvir Crnčević
98229db244
[Kernels][DP/EP] Optimize Silu Kernel for R1 (#24054)
Signed-off-by: elvircrn <elvircrn@gmail.com>
2025-09-13 00:17:27 -07:00
Woosuk Kwon
5febdc8750
[Chore] Remove unused batched RoPE op & kernel (#24789)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-13 00:08:20 -07:00
Matthew Bonanni
7ba32aa60b
[Attention][FlashInfer] Enable FP8 FlashInfer (TRTLLM) MLA decode (#24705)
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
2025-09-12 15:45:53 -06:00
Elvir Crnčević
9f04d9d55f
[Qwen3-Next] MoE configs for H100 TP=1,2 and TP2/EP (#24739)
Signed-off-by: elvircrn <elvircrn@gmail.com>
2025-09-12 07:54:04 -07:00
Yan Ma
4d7c1d531b
[Bugfix] Fix MRoPE dispatch on XPU (#24724)
Signed-off-by: Yan Ma <yan.ma@intel.com>
2025-09-12 21:43:56 +08:00
Hyogeun Oh (오효근)
41f17bf290
[Docs] Fix warnings in mkdocs build (continued) (#24740)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
2025-09-12 06:43:15 -07:00
Didier Durand
bcb06d7baf
[Doc]: fix typos in various files (#24726)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-12 06:43:12 -07:00
Li, Jiang
7920de0a2a
[Bugfix] Fix MRoPE dispatch on CPU (#24712)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-09-12 04:56:31 +00:00
Jee Jee Li
12a8414d81
[Qwen3-Next] MoE configs for H20 TP=1,2,4,8 (#24707)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-12 10:06:26 +08:00
Tao He
880c741bb6
[Bugfix] fixes the causal_conv1d_update kernel update non-speculative decoding cases (#24680)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-09-11 18:16:43 -07:00
Wentao Ye
fcba05c435
[Bug] Fix Layer weight_block_size Assertion Issue (#24674)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-11 19:47:59 -04:00
Chen Zhang
f82f7a8990
[Qwen3-Next] MOE configs for H100 TP4 (#24699)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-09-11 15:45:52 -07:00
Michael Goin
c3aea10dc8
[Perf] Use upstream CUTLASS for SM90 Block FP8 kernel (#23280)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-11 15:43:14 -07:00
Vadim Gimpelson
7a70a71892
[Qwen3-Next] Add B200 MoE configs for Qwen3-next (#24698)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2025-09-11 15:34:58 -07:00
Woosuk Kwon
569bf1c9c0
[Qwen3-Next] MoE configs for H200 TP=1,2,4 (#24695)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-11 14:38:16 -07:00
Duncan Moss
074854b24f
[Kernel][B200] mxfp4 fused cutlass moe (#23696)
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-09-11 17:04:56 -04:00
Woosuk Kwon
c733bd5e87
[Qwen3-Next] Add MoE Config for H200 (#24688)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-11 12:40:15 -07:00
Wentao Ye
a892b259b4
[Doc] Remove Useless Comments (#24687)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-11 12:25:47 -07:00
co63oc
e26fef8397
fix some typos (#24616)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
2025-09-11 10:48:46 -07:00
Konrad Zawora
4aa23892d6
[Bugfix] Fix platform-specific routing in CustomOp implementations (#24444)
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
2025-09-11 17:15:01 +00:00
Tao He
e93f4cc9e3
Add the support for the qwen3 next model (a hybrid attention model). (#24526)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-11 15:32:09 +08:00
Jerry Zhang
2048c4e379
[torchao] Support quantization configs using module swap (#21982)
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
2025-09-10 23:53:24 -07:00
TaehyunKim
9bd831f501
[Model] New model support for Motif-1-Tiny (#23414)
Signed-off-by: ca1207 <ca1207zzz@gmail.com>
Signed-off-by: TaehyunKim <73943231+ca1207@users.noreply.github.com>
Co-authored-by: WyldeCat <skan1543@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-10 23:29:40 -07:00
Didier Durand
e2b1f863aa
[Doc]: fixing doc typos (#24635)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-10 23:19:28 -07:00
Saman A. Pour
3d1393f6fc
Kimi K2 Fused MoE kernels Optimization configs (#24597)
Signed-off-by: Saman Keon <samanamp@outlook.com>
2025-09-10 23:06:16 -07:00
Gregory Shtrasberg
9a161307f5
[torch.compile][ROCm][V1] Enable attention output FP8 fusion for V1 attention backends (#19767)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-09-10 13:59:55 -07:00
Ming Yang
4032949630
[Bugfix] Fix DeepEP config for DP4TP4 (#23619)
Signed-off-by: Ming Yang <minos.future@gmail.com>
2025-09-10 10:37:56 -07:00
tomeras91
08abfa78ec
[Bugfix] fix modelopt exclude_modules name mapping (#24178)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-09-10 10:20:46 -07:00
zzhxxx
736569da8d
[Platform] Custom ops support for LMhead and LogitsProcessor (#23564)
Signed-off-by: zzhx1 <zzh_201018@outlook.com>
2025-09-10 06:26:31 -07:00
Hyogeun Oh (오효근)
ccee371e86
[Docs] Fix warnings in mkdocs build (continued) (#24092)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-10 06:23:28 -07:00
RoadToNowhereX
c0bd6a684a
Fix Auto_Round Quatization Loading on SM75 and Lower GPUs (#24217)
Signed-off-by: RoadToNowhereX <37441177+RoadToNowhereX@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-10 06:22:31 -07:00
vllmellm
7c195d43da
[ROCm][Bugfix] Fix Aiter RMSNorm (#23412)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-09-10 21:08:03 +08:00
Wei
0efdb5c3ba
[gpt-oss] Cache permute indices for faster MXFP4 MoE layer loading (#24154)
Signed-off-by: Wei Wei <wwei6@meta.com>
2025-09-10 04:27:53 +00:00
bnellnm
b23fb78623
[Bugfix] Fix for 24530. Fix naive all2all shared expert overlap. (#24538) 2025-09-09 17:53:53 -07:00
youkaichao
1aa427fdc1
[Kernels] Add Flash Linear Attention Kernels (#24518)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-09-10 00:04:41 +08:00
wang.yuqi
19332c0479
[Model] Systematic support for fp32 head, pooling models part (#23810)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-09-09 07:29:50 -07:00
Tyler Michael Smith
955c624915
[Bugfix][Wide EP] Fix redundant work when using DeepEP, TP Attn, and EP MoE (#24134)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2025-09-08 19:01:51 -07:00