jiahanc
|
34553b9d27
|
[Performance] Support FP8 flashinfer TRTLLM MOE on Qwen3 and Qwen-3next (#27492)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
|
2025-11-10 12:34:57 -05:00 |
|
Varun Sundar Rabindranath
|
b039bfda8f
|
[Bugfix] Fix persistent_masked_m_silu_mul_quant tests (#28366)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-11-10 09:21:52 -08:00 |
|
Cyrus Leung
|
d0e186c16f
|
[V0 Deprecation] Remove unused context_len and seq_len from M-RoPE (#28395)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-11 00:30:06 +08:00 |
|
vllmellm
|
f080a83511
|
[RFC][ROCm][AITER] Keep all AITER kernels in _aiter_ops class like _custom_ops and _ipex_ops (#24490)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-11-10 08:20:53 -08:00 |
|
caozuoba
|
40e2eeeb92
|
[Kernel] Optimization of the mm_k operator. (#28280)
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-11-10 16:03:46 +00:00 |
|
zejunchen-zejun
|
b06b9470ca
|
[Rocm][fused_moe][fp4] view weight to torch.float4_e2m1fn_x2 when running aiter fused moe for fp4 model (#27474)
Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>
|
2025-11-10 10:38:56 -05:00 |
|
TJian
|
4673e465ff
|
Add @tjtanaa to codeowner for ROCm and multi-modal (#28360)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-11-10 21:39:17 +08:00 |
|
Ferrebo
|
912744d066
|
[Fix] optimize visual token mask with caching and multi-token support (#28374)
Signed-off-by: Ferrebo <itachi971009@gmail.com>
Signed-off-by: kebo01 <kebo01@baidu.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-11-10 13:23:49 +00:00 |
|
Yu Jiaqi
|
15be507c86
|
[bugfix] fix siglip batch text output error (#28365)
Signed-off-by: piood <2477084691@qq.com>
|
2025-11-10 21:21:15 +08:00 |
|
Mark McLoughlin
|
6f7de33bed
|
[Metrics] Refactor LoRA state tracking (#26801)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-11-10 16:34:36 +08:00 |
|
Shinichi Hemmi
|
a98cc35c34
|
Restore PlaMo2 unit test as pfnet/plamo-2-1b now supports transformers >=4.56 (#28019)
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>
|
2025-11-10 06:50:02 +00:00 |
|
Lucas Wilkinson
|
e8697faf03
|
[V0 deprecation] Remove no longer used get_metadata_cls (#28370)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-11-10 14:32:09 +08:00 |
|
Xiake Sun
|
03fa4d3fb3
|
[Hardware][AMD][Model] Add Triton MoE tuning support and optimized configs for Qwen3 omni for MI308X (#28373)
Signed-off-by: Xiake Sun <xiake.sun@amd.com>
Signed-off-by: Xiake Sun <xisun@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-11-10 04:53:40 +00:00 |
|
Varun Sundar Rabindranath
|
6b2b9fd934
|
[CI] lora/test_mixtral.py : Add additional expected outputs due to flakiness (#28322)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-11-10 10:45:29 +08:00 |
|
JartX
|
c5f685b3ae
|
[ROCm][Platform] Add RX7900XTX device id in _ROCM_DEVICE_ID_NAME_MAP (#28279)
Signed-off-by: JartX <sagformas@epdcenter.es>
|
2025-11-09 23:09:36 +00:00 |
|
Jiangyun Zhu
|
c4768dcf47
|
[Kernel] Fix fused_gdn_gating (#28343)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-11-09 14:26:35 -07:00 |
|
Zhewen Li
|
a65a934ebe
|
[CI/Build] Temporary fix to LM Eval Small Models (#28324)
Signed-off-by: zhewenli <zhewenli@meta.com>
|
2025-11-09 21:08:38 +00:00 |
|
usberkeley
|
4a8d6bd168
|
Fix cu_num_generated_tokens slicing logic in LogprobsLists.slice() method (#28214)
Signed-off-by: Bradley <bradley.b.pitt@gmail.com>
|
2025-11-09 19:11:46 +00:00 |
|
Lucas Wilkinson
|
636efd10a5
|
[Core] Separate out attention metadata building logic from prepare inputs (#26764)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-11-09 13:51:43 -05:00 |
|
Nick Hill
|
289eb6c537
|
[Core] Simplify async KV output aggregation (#28327)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-09 09:44:13 -08:00 |
|
Nicolò Lucchesi
|
19d91ece4b
|
[CI] Fix flaky test_eagle_correctness test (#28364)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-11-09 16:04:59 +00:00 |
|
Jiangyun Zhu
|
7ae5a5fb11
|
[Misc] Add some comments in qwen3-next (#28267)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-11-08 23:59:24 -08:00 |
|
Yong Hoon Shin
|
de2b78305f
|
[ROCm] Add env to enable/disable aiter triton gemm (#28321)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-11-08 22:27:00 -08:00 |
|
Ning Xie
|
e5e9067e61
|
[Misc] fix typo and add detailed log (#28178)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-11-09 05:33:46 +00:00 |
|
yihong
|
3a7d580343
|
fix: close issue 28338 by fixed python version (#28339)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
|
2025-11-09 05:07:26 +00:00 |
|
Kevin H. Luu
|
05f8d69077
|
[chore] Move some wikimedia images to S3 (#28351)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
|
2025-11-09 01:58:26 +00:00 |
|
Mohammad Miadh Angkad
|
404d7a9d14
|
[Performance][gpt-oss] Revert gpt-oss max cudagraph size to 1024 (#28345)
Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu>
|
2025-11-08 15:50:10 -07:00 |
|
ElizaWszola
|
171133f929
|
[Bugfix] Fix test fused quant layernorm tests (#27865)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-11-08 14:31:33 -08:00 |
|
Cole Murray
|
32787d0644
|
Remove setuptools upper bound constraint (<80) (#28337)
Signed-off-by: Cole Murray <colemurray.cs@gmail.com>
|
2025-11-08 22:30:18 +00:00 |
|
Benjamin Chislett
|
975676d174
|
[Feat] Drop-in Torch CUDA Profiler (#27841)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2025-11-08 14:07:37 -08:00 |
|
Ev Lacey
|
77d702a22b
|
Enhance run_cluster.sh for multi-NIC support (#28328)
Signed-off-by: Ev Lacey <elacey@nvidia.com>
|
2025-11-08 22:04:16 +00:00 |
|
zhangsicheng5
|
2108a571d7
|
[DCP] Support dcp kv_cache interleave size > 1 (#26696)
Signed-off-by: zhangsicheng5 <zhangsicheng5@huawei.com>
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Signed-off-by: Qiu <qiuchunshuo@huawei.com>
Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com>
|
2025-11-09 04:45:27 +09:00 |
|
Andy Lo
|
47604137a2
|
[Bugfix] Spec decode + structured output + spec model max len edge case (#28298)
Signed-off-by: Andy Lo <andy@mistral.ai>
|
2025-11-08 19:44:25 +00:00 |
|
Robert Shaw
|
26990d25dc
|
[Bugfix] Update device name for H200 detection (#28349)
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2025-11-08 19:01:11 +00:00 |
|
Harry Mellor
|
d9ab1ad9d1
|
reasoning_content -> reasoning (#27752)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-08 12:15:08 +00:00 |
|
22quinn
|
608bb14462
|
[Attention] Remove max cudagraph size limit of 992 (#27840)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-11-07 22:33:27 -08:00 |
|
Xiaozhu Meng
|
4a36681f85
|
[flashinfer][fix] do not check nvcc availability when using pre-downloaded cubins (#27990)
Signed-off-by: Xiaozhu <mxz297@gmail.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
|
2025-11-07 22:25:21 -08:00 |
|
Abolfazl Shahbazi
|
d15afc1fd0
|
Refactor CPU/GPU extension targets for CMake build (#28026)
Signed-off-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
|
2025-11-08 14:17:35 +08:00 |
|
Isotr0py
|
934a9c3b79
|
[Model] Consolidate Deepseek-MoE implementation with DeepSeek-v2 (#28101)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-11-08 05:01:27 +00:00 |
|
gnovack
|
70af44fd10
|
[bugfix] support eagle with lora cudagraph specialization (#28318)
Signed-off-by: gnovack <gnovack@amazon.com>
|
2025-11-08 03:25:45 +00:00 |
|
Aurick Qiao
|
781f5ebf52
|
Bump arctic-inference requirement (#28174)
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-11-07 18:31:18 -08:00 |
|
Michael Goin
|
0852527647
|
[Perf][DeepSeek] Add sigmoid+bias fusion to fused_grouped_topk from TRTLLM (#28124)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-11-07 18:20:55 -08:00 |
|
Hamid Mukhtar
|
61d25dc44b
|
Update gpu.rocm.inc.md to add support for AMD Ryzen AI MAX / AI 300 Series (gfx1151, gfx1150) (#28308)
Signed-off-by: Hamid Mukhtar <15519013+hammmmy@users.noreply.github.com>
|
2025-11-08 02:09:21 +00:00 |
|
Xiaohong (Sean) Chen
|
d0c7792004
|
[Bugfix][LoRA][Spec Decode] Support LoRA with speculative decoding (#21068)
Signed-off-by: Sean Chen <xiaohong_chen1991@hotmail.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Danielle Robinson <dcmaddix@gmail.com>
Co-authored-by: Haipeng Li <li2haipeng@gmail.com>
Co-authored-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com>
|
2025-11-08 01:58:22 +00:00 |
|
Boyuan Feng
|
b158df2813
|
remove resolve_op_overloads and use splitting_ops directly (#28081)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-11-08 01:13:13 +00:00 |
|
Kunshang Ji
|
1aaecda078
|
[XPU] Enable Expert parallel for MoE models (#28263)
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-11-08 00:33:11 +00:00 |
|
Harry Mellor
|
811df41ee9
|
Update Flashinfer from v0.4.1 to v0.5.2 (#27952)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-07 16:24:42 -08:00 |
|
Nick Hill
|
67a2da890e
|
[PerfFix] Avoid separate thread for MP executor shm spin (take 2) (#28319)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-07 22:11:03 +00:00 |
|
Nick Hill
|
da786e339e
|
[Core] Rework handling of async scheduling config (#28250)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-07 20:01:23 +00:00 |
|
Benjamin Chislett
|
18903216f5
|
[Bugfix] Fix and add tests for GptOss reasoning parser (#28000)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2025-11-07 19:28:04 +00:00 |
|