Michael Goin
fbd6523ac0
Refactor dense FP8 tensor/channel/block utils and add CT FP8 block ( #21404 )
2025-09-18 08:53:45 -04:00
Asaf Joseph Gardin
66072b36db
[Bugfix][Mamba] - Fix Conv State Kernel FP32 Support ( #24883 )
...
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>
2025-09-18 12:21:17 +00:00
Jee Jee Li
37970105fe
[Model] Improve Pooling Model ( #25149 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-18 11:04:21 +00:00
Punitvara
05b044e698
[Doc] Fix cross-reference warnings ( #25058 )
...
Signed-off-by: Punit Vara <punitvara@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-18 02:05:16 -07:00
YiwenC
9d8a2d86d2
[EPLB] Add EPLB support for hunyuan_v1 ( #23078 )
2025-09-18 04:51:35 +00:00
bnellnm
dc2979c585
[Kernels] Overlap shared experts with combine instead of dispatch ( #24254 )
...
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-09-18 12:10:21 +08:00
bnellnm
5963b98b46
[Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMethodBase subclasses ( #22537 )
...
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-09-17 17:43:31 -06:00
Tao He
dd6a910aac
[Bugfix][Qwen3-Next] fixes the varlen issue in qwen3-next's MTP implementation. ( #24957 )
...
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
2025-09-17 21:59:09 +08:00
haoyangli-amd
ca2d1925ef
[Rocm] [quantization] Fix quark ptpc moe and add test case ( #24649 )
...
Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com>
Co-authored-by: Haoyang Li <haoyang.li@amd.com>
2025-09-16 22:15:13 -07:00
Roger Wang
0f7acdd73c
[Model] Support Qwen3-VL Model Series ( #24727 )
...
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Huang Jie <92386084+JJJYmmm@users.noreply.github.com>
Co-authored-by: 松灵 <26085463+wulipc@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-17 05:01:04 +00:00
Tahsin Tunan
cef32104b4
[FP8] Extend per-token-group quantization support to QuantFP8 ( #24342 )
...
Signed-off-by: Tahsin Tunan <tahsintunan@gmail.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
2025-09-16 18:31:06 -07:00
Matthew Bonanni
d119fc8614
[CI][Bugfix] Fix failing Blackwell test ( #24993 )
...
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-09-16 15:55:02 -07:00
Michael Goin
dbebb7f812
[Perf] Reuse workspace for FP8+FP4 Marlin MoE ( #20500 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-16 15:45:10 -06:00
Aleksandr Malyshev
3053a22b33
fp8 kv cache support fix for torch.compile ( #22758 )
...
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
2025-09-16 21:27:11 +00:00
Sage Moore
567939953b
[Core/DBO][1/N] Add Dual-Batch Overlap mechanism to VLLM ( #23693 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-09-16 12:21:48 -04:00
Chih-Chieh Yang
73cfb3c5ee
[Model] Clean up and simplify Mamba2 Metadata Usage in both V0 and V1 ( #24331 )
...
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
2025-09-16 14:53:43 +00:00
Chen Bruce
7ea5c73ad7
[Feat][EPLB] A novel static EPLB placement strategy for MoE models. ( #23745 )
...
Signed-off-by: bruceszchen <bruceszchen@tencent.com>
Signed-off-by: Chen Bruce <bruceszchen@tencent.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Chen Bruce <cszwwdz@vip.qq.com>
Co-authored-by: lemon412 <lemon412@foxmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-16 10:55:16 +00:00
tomeras91
27fcfe7bcf
[Mamba] Support TP>1 with quantization for mamba2 mixer in case n_groups % tp_size == 0 ( #24593 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Signed-off-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-16 10:51:01 +00:00
Jee Jee Li
04ad0dc275
[benchmark] Add triton version in the moe tuned config ( #24769 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-16 14:10:54 +08:00
Saman A. Pour
238c4c1705
[QWEN NEXT] Fused MoE kernels Optimization configs ( #24924 )
...
Signed-off-by: Saman Keon <samanamp@outlook.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-16 13:06:03 +08:00
Gregory Shtrasberg
2891603efd
[ROCm][Bugfix] Fix the case where there's bias ( #24895 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-09-15 20:05:12 -06:00
Kyle Sayers
a0b26701c9
[Transform] Deterministic Hadacore Transforms ( #24106 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-09-15 12:59:31 -06:00
Rafael Marcelino Koike
b834b4cbf1
[USAGE] Improve error handling for weight initialization in Unquantized… ( #20321 )
...
Signed-off-by: Rafael Marcelino Koike <rafael.koike@oracle.com>
Signed-off-by: Rafael Koike <koike.rafael@gmail.com>
2025-09-15 16:45:49 +00:00
Didier Durand
4979eb79da
[Doc]: fix typos in various files ( #24821 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-15 01:08:52 -07:00
Wentao Ye
fc2dbcda8b
[Perf] Fix DeepGEMM Contiguous Layout Issue, 5.5% Throughput Improvement ( #24783 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-09-14 11:20:17 -04:00
Didier Durand
41ae4a1eab
[Doc]: fix typos in various files ( #24798 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-13 00:43:33 -07:00
Elvir Crnčević
98229db244
[Kernels][DP/EP] Optimize Silu Kernel for R1 ( #24054 )
...
Signed-off-by: elvircrn <elvircrn@gmail.com>
2025-09-13 00:17:27 -07:00
Woosuk Kwon
5febdc8750
[Chore] Remove unused batched RoPE op & kernel ( #24789 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-13 00:08:20 -07:00
Matthew Bonanni
7ba32aa60b
[Attention][FlashInfer] Enable FP8 FlashInfer (TRTLLM) MLA decode ( #24705 )
...
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
2025-09-12 15:45:53 -06:00
Elvir Crnčević
9f04d9d55f
[Qwen3-Next] MoE configs for H100 TP=1,2 and TP2/EP ( #24739 )
...
Signed-off-by: elvircrn <elvircrn@gmail.com>
2025-09-12 07:54:04 -07:00
Yan Ma
4d7c1d531b
[Bugfix] Fix MRoPE dispatch on XPU ( #24724 )
...
Signed-off-by: Yan Ma <yan.ma@intel.com>
2025-09-12 21:43:56 +08:00
Hyogeun Oh (오효근)
41f17bf290
[Docs] Fix warnings in mkdocs build (continued) ( #24740 )
...
Signed-off-by: Zerohertz <ohg3417@gmail.com>
2025-09-12 06:43:15 -07:00
Didier Durand
bcb06d7baf
[Doc]: fix typos in various files ( #24726 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-12 06:43:12 -07:00
Li, Jiang
7920de0a2a
[Bugfix] Fix MRoPE dispatch on CPU ( #24712 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-09-12 04:56:31 +00:00
Jee Jee Li
12a8414d81
[Qwen3-Next] MoE configs for H20 TP=1,2,4,8 ( #24707 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-12 10:06:26 +08:00
Tao He
880c741bb6
[Bugfix] fixes the causal_conv1d_update kernel update non-speculative decoding cases ( #24680 )
...
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-09-11 18:16:43 -07:00
Wentao Ye
fcba05c435
[Bug] Fix Layer weight_block_size Assertion Issue ( #24674 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-11 19:47:59 -04:00
Chen Zhang
f82f7a8990
[Qwen3-Next] MOE configs for H100 TP4 ( #24699 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-09-11 15:45:52 -07:00
Michael Goin
c3aea10dc8
[Perf] Use upstream CUTLASS for SM90 Block FP8 kernel ( #23280 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-11 15:43:14 -07:00
Vadim Gimpelson
7a70a71892
[Qwen3-Next] Add B200 MoE configs for Qwen3-next ( #24698 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2025-09-11 15:34:58 -07:00
Woosuk Kwon
569bf1c9c0
[Qwen3-Next] MoE configs for H200 TP=1,2,4 ( #24695 )
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-11 14:38:16 -07:00
Duncan Moss
074854b24f
[Kernel][B200] mxfp4 fused cutlass moe ( #23696 )
...
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-09-11 17:04:56 -04:00
Woosuk Kwon
c733bd5e87
[Qwen3-Next] Add MoE Config for H200 ( #24688 )
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-11 12:40:15 -07:00
Wentao Ye
a892b259b4
[Doc] Remove Useless Comments ( #24687 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-11 12:25:47 -07:00
co63oc
e26fef8397
fix some typos ( #24616 )
...
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
2025-09-11 10:48:46 -07:00
Konrad Zawora
4aa23892d6
[Bugfix] Fix platform-specific routing in CustomOp implementations ( #24444 )
...
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
2025-09-11 17:15:01 +00:00
Tao He
e93f4cc9e3
Add the support for the qwen3 next model (a hybrid attention model). ( #24526 )
...
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-11 15:32:09 +08:00
Jerry Zhang
2048c4e379
[torchao] Support quantization configs using module swap ( #21982 )
...
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
2025-09-10 23:53:24 -07:00
TaehyunKim
9bd831f501
[Model] New model support for Motif-1-Tiny ( #23414 )
...
Signed-off-by: ca1207 <ca1207zzz@gmail.com>
Signed-off-by: TaehyunKim <73943231+ca1207@users.noreply.github.com>
Co-authored-by: WyldeCat <skan1543@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-10 23:29:40 -07:00
Didier Durand
e2b1f863aa
[Doc]: fixing doc typos ( #24635 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-10 23:19:28 -07:00