Hashem Hashemi
a3a7828010
[ROCm] Add skinny gemm bias support for dtypes fp16,bf16,fp8 ( #24988 )
...
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
Signed-off-by: Hashem Hashemi <159079214+amd-hhashemi@users.noreply.github.com>
2025-09-23 14:31:45 -04:00
Burkhard Ringlein
100b630a60
[V1][Kernel] Add triton implementation for reshape_and_cache_flash ( #24503 )
...
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
Co-authored-by: Chih-Chieh Yang <chih.chieh.yang@ibm.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-23 12:52:40 -04:00
Isotr0py
b6a136b58c
[CI/Build] Fix disabled v1 attention backend selection test ( #25471 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-23 13:05:46 +00:00
Cyrus Leung
f92d952632
[V0 Deprecation] Remove MultiModalPlaceholderMap ( #25366 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-22 08:49:19 +00:00
Woosuk Kwon
bc6e542d9f
Remove V0 attention backends ( #25351 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-21 16:03:28 -07:00
Woosuk Kwon
52c2a8d4ad
[V0 Deprecation] Remove LLMEngine ( #25033 )
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-20 17:56:30 -07:00
Cyrus Leung
3d9a1d2de5
[V1] Support LLM.apply_model ( #18465 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-20 07:14:35 +00:00
Zhiyu
431535b522
Enable modelopt gemma3 nvfp4/fp8, make workflow more robust ( #22771 )
...
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-09-19 22:40:33 +00:00
qizixi
a2a5f79e09
Optimize triton unified attention performance for sliding window attention ( #24390 )
...
Signed-off-by: zixi-qi <qizixi@meta.com>
2025-09-19 13:07:26 -06:00
Isotr0py
cea91a32f2
[Kernel][Performance] Add Triton kernel for Qwen3-VL interleaved MRoPE ( #25055 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-19 10:27:49 +00:00
jvlunteren
01a583fea4
[Kernel] Decouple Tile Size from Block Size in Triton Unified Attention Kernel ( #21197 )
...
Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>
2025-09-18 14:27:01 +00:00
bnellnm
5963b98b46
[Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMethodBase subclasses ( #22537 )
...
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-09-17 17:43:31 -06:00
elvischenv
e6585ddb45
[Bugfix] Fix accuracy issue for silu_mul + nvfp4 quant fusion kernel ( #24833 )
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-17 16:37:23 -07:00
Michael Goin
087c6ffc92
[CI Bugfix] Fix failing test_invalid_env ( #25078 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-09-17 08:28:58 -07:00
Tahsin Tunan
cef32104b4
[FP8] Extend per-token-group quantization support to QuantFP8 ( #24342 )
...
Signed-off-by: Tahsin Tunan <tahsintunan@gmail.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
2025-09-16 18:31:06 -07:00
Woosuk Kwon
759ef49b15
Remove V0 Encoder-Decoder Support ( #24907 )
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-15 21:17:14 -07:00
Gregory Shtrasberg
2891603efd
[ROCm][Bugfix] Fix the case where there's bias ( #24895 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-09-15 20:05:12 -06:00
Kyle Sayers
a0b26701c9
[Transform] Deterministic Hadacore Transforms ( #24106 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-09-15 12:59:31 -06:00
Michael Goin
59d7ffc17f
[CI Failure] Fix test_flashinfer_cutlass_mxfp4_mxfp8_fused_moe ( #24750 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-09-13 07:29:19 +00:00
Elvir Crnčević
98229db244
[Kernels][DP/EP] Optimize Silu Kernel for R1 ( #24054 )
...
Signed-off-by: elvircrn <elvircrn@gmail.com>
2025-09-13 00:17:27 -07:00
Woosuk Kwon
5febdc8750
[Chore] Remove unused batched RoPE op & kernel ( #24789 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-13 00:08:20 -07:00
Matthew Bonanni
5fe643fc26
Add FLASHINFER_MLA to backend selector test ( #24753 )
...
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
2025-09-12 22:30:07 +00:00
Wenlong Wang
72fc8aa412
[Multi Modal] Add FA3 in VIT ( #24347 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
2025-09-12 21:27:24 +08:00
Michael Goin
c3aea10dc8
[Perf] Use upstream CUTLASS for SM90 Block FP8 kernel ( #23280 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-11 15:43:14 -07:00
Duncan Moss
074854b24f
[Kernel][B200] mxfp4 fused cutlass moe ( #23696 )
...
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-09-11 17:04:56 -04:00
co63oc
e26fef8397
fix some typos ( #24616 )
...
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
2025-09-11 10:48:46 -07:00
TaehyunKim
9bd831f501
[Model] New model support for Motif-1-Tiny ( #23414 )
...
Signed-off-by: ca1207 <ca1207zzz@gmail.com>
Signed-off-by: TaehyunKim <73943231+ca1207@users.noreply.github.com>
Co-authored-by: WyldeCat <skan1543@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-10 23:29:40 -07:00
Hanjie Qiu
dcb28a332b
[Kernel] Flashinfer MLA (trtllm-gen) decode kernel integration ( #21078 )
...
Signed-off-by: hjjq <hanjieq@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-09-10 15:31:10 -07:00
baonudesifeizhai
6cbd41909e
Feature/vit attention unification# 23880 ( #23978 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-10 06:10:14 -07:00
Wei
0efdb5c3ba
[gpt-oss] Cache permute indices for faster MXFP4 MoE layer loading ( #24154 )
...
Signed-off-by: Wei Wei <wwei6@meta.com>
2025-09-10 04:27:53 +00:00
Nick Hill
7e7db04310
[CI] Retry flaky fp8 cutlass mla tests ( #24536 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-09-09 20:33:10 -07:00
Didier Durand
46876dff32
[Doc]: fixing typos to improve docs ( #24480 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-08 23:06:04 -07:00
elvischenv
bba1042c6f
[Flashinfer] Support Flashinfer TRTLLM FP8-qkv BF16/FP16-out Attention Kernel ( #23647 )
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-09-08 20:53:07 -07:00
tomeras91
e041314184
[Bugfix] Fix mamba2 prefill chunking ( #23279 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Signed-off-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-08 11:42:41 +00:00
Ming Yang
86173ad593
[Kernel] Support decode context parallelism on Blackwell with CUTLASS MLA ( #24385 )
...
Signed-off-by: Ming Yang <minos.future@gmail.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-09-08 09:27:12 +08:00
elvischenv
e68dc2f014
[Bugfix] Fix unstable silu_mul+nvfp4 quant fusion test ( #24370 )
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-09-06 20:39:34 +00:00
Jee Jee Li
7555d6b34a
[Bugfix] Fix test_mixtral_moe ( #24371 )
2025-09-06 09:32:03 -07:00
elvischenv
adc3ddb430
[Bugfix][Misc] Fix silu_and_mul_nvfp4_quant issue and extract common utils for nvfp4 kernel source files ( #23727 )
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-09-04 14:25:45 -07:00
Lucas Wilkinson
402759d472
[Attention] FlashAttn MLA ( #14258 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
2025-09-04 02:47:59 -07:00
Li, Jiang
57b1ce94f7
[CPU] Refactor CPU unquantized linear ( #24150 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-09-04 14:28:45 +08:00
Matthew Bonanni
a742322092
[Attention] Blackwell FP8 MLA support with CUTLASS_MLA backend ( #23289 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-09-03 14:05:24 -04:00
bnellnm
e9b92dcd89
[Kernels] Overlap shared experts with send/recv ( #23273 )
...
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-09-03 12:35:18 -04:00
Didier Durand
d7e1e59972
[Doc]: fix typos in Python comments ( #24093 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-02 21:05:45 -07:00
co63oc
1bd007f234
fix some typos ( #24071 )
...
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
2025-09-02 20:44:50 -07:00
Michael Goin
e66ed3e675
[CI Failure] Skip failing nvfp4 silu test ( #23959 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-02 13:18:15 -04:00
Didier Durand
fad73be1a5
[Doc]: fix typos in Python comments ( #24077 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-02 02:38:55 -07:00
elvischenv
16a45b3a28
[NVIDIA] Support SiluMul + NVFP4 quant fusion ( #23671 )
...
Signed-off-by: jindih <jindih@nvidia.com>
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: jindih <jindih@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedic <lgovedic@redhat.com>
2025-08-28 19:36:50 +00:00
yzds
186aced5ff
[Kernel] cuda kernels for upcoming decode context parallel feature ( #23791 )
...
Co-authored-by: hongchao <hongchao@msh.team>
2025-08-28 15:29:11 +08:00
Wentao Ye
3af47c3cc6
[Feature] Add Hopper DeepGEMM E8M0 for DeepSeekV3.1 scale_fmt ( #23666 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-08-27 14:09:08 +00:00
Jiangyun Zhu
c37c0af990
[Misc] Fix comments in tests/kernels/quantization ( #23675 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-08-26 19:31:20 +00:00