Varun Sundar Rabindranath
b039bfda8f
[Bugfix] Fix persistent_masked_m_silu_mul_quant tests ( #28366 )
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-11-10 09:21:52 -08:00
Cyrus Leung
d0e186c16f
[V0 Deprecation] Remove unused context_len and seq_len from M-RoPE ( #28395 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-11 00:30:06 +08:00
vllmellm
f080a83511
[RFC][ROCm][AITER] Keep all AITER kernels in _aiter_ops class like _custom_ops and _ipex_ops ( #24490 )
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-10 08:20:53 -08:00
caozuoba
40e2eeeb92
[Kernel] Optimization of the mm_k operator. ( #28280 )
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-10 16:03:46 +00:00
zejunchen-zejun
b06b9470ca
[Rocm][fused_moe][fp4] view weight to torch.float4_e2m1fn_x2 when running aiter fused moe for fp4 model ( #27474 )
Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>
2025-11-10 10:38:56 -05:00
TJian
4673e465ff
Add @tjtanaa to codeowner for ROCm and multi-modal ( #28360 )
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-11-10 21:39:17 +08:00
Ferrebo
912744d066
[Fix] optimize visual token mask with caching and multi-token support ( #28374 )
Signed-off-by: Ferrebo <itachi971009@gmail.com>
Signed-off-by: kebo01 <kebo01@baidu.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-10 13:23:49 +00:00
Yu Jiaqi
15be507c86
[bugfix] fix siglip batch text output error ( #28365 )
Signed-off-by: piood <2477084691@qq.com>
2025-11-10 21:21:15 +08:00
Mark McLoughlin
6f7de33bed
[Metrics] Refactor LoRA state tracking ( #26801 )
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-11-10 16:34:36 +08:00
Shinichi Hemmi
a98cc35c34
Restore PlaMo2 unit test as pfnet/plamo-2-1b now supports transformers >=4.56 ( #28019 )
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>
2025-11-10 06:50:02 +00:00
Lucas Wilkinson
e8697faf03
[V0 deprecation] Remove no longer used get_metadata_cls ( #28370 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-11-10 14:32:09 +08:00
Xiake Sun
03fa4d3fb3
[Hardware][AMD][Model] Add Triton MoE tuning support and optimized configs for Qwen3 omni for MI308X ( #28373 )
Signed-off-by: Xiake Sun <xiake.sun@amd.com>
Signed-off-by: Xiake Sun <xisun@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-10 04:53:40 +00:00
Varun Sundar Rabindranath
6b2b9fd934
[CI] lora/test_mixtral.py : Add additional expected outputs due to flakiness ( #28322 )
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-11-10 10:45:29 +08:00
JartX
c5f685b3ae
[ROCm][Platform] Add RX7900XTX device id in _ROCM_DEVICE_ID_NAME_MAP ( #28279 )
Signed-off-by: JartX <sagformas@epdcenter.es>
2025-11-09 23:09:36 +00:00
Jiangyun Zhu
c4768dcf47
[Kernel] Fix fused_gdn_gating ( #28343 )
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-11-09 14:26:35 -07:00
Zhewen Li
a65a934ebe
[CI/Build] Temporary fix to LM Eval Small Models ( #28324 )
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-11-09 21:08:38 +00:00
usberkeley
4a8d6bd168
Fix cu_num_generated_tokens slicing logic in LogprobsLists.slice() method ( #28214 )
Signed-off-by: Bradley <bradley.b.pitt@gmail.com>
2025-11-09 19:11:46 +00:00
Lucas Wilkinson
636efd10a5
[Core] Separate out attention metadata building logic from prepare inputs ( #26764 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-11-09 13:51:43 -05:00
Nick Hill
289eb6c537
[Core] Simplify async KV output aggregation ( #28327 )
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-09 09:44:13 -08:00
Nicolò Lucchesi
19d91ece4b
[CI] Fix flaky test_eagle_correctness test ( #28364 )
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-11-09 16:04:59 +00:00
Jiangyun Zhu
7ae5a5fb11
[Misc] Add some comments in qwen3-next ( #28267 )
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-11-08 23:59:24 -08:00
Yong Hoon Shin
de2b78305f
[ROCm] Add env to enable/disable aiter triton gemm ( #28321 )
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-11-08 22:27:00 -08:00
Ning Xie
e5e9067e61
[Misc] fix typo and add detailed log ( #28178 )
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-11-09 05:33:46 +00:00
yihong
3a7d580343
fix: close issue 28338 by fixed python version ( #28339 )
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-11-09 05:07:26 +00:00
Kevin H. Luu
05f8d69077
[chore] Move some wikimedia images to S3 ( #28351 )
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
2025-11-09 01:58:26 +00:00
Mohammad Miadh Angkad
404d7a9d14
[Performance][gpt-oss] Revert gpt-oss max cudagraph size to 1024 ( #28345 )
Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu>
2025-11-08 15:50:10 -07:00
ElizaWszola
171133f929
[Bugfix] Fix test fused quant layernorm tests ( #27865 )
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-11-08 14:31:33 -08:00
Cole Murray
32787d0644
Remove setuptools upper bound constraint (<80) ( #28337 )
Signed-off-by: Cole Murray <colemurray.cs@gmail.com>
2025-11-08 22:30:18 +00:00
Benjamin Chislett
975676d174
[Feat] Drop-in Torch CUDA Profiler ( #27841 )
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2025-11-08 14:07:37 -08:00
Ev Lacey
77d702a22b
Enhance run_cluster.sh for multi-NIC support ( #28328 )
Signed-off-by: Ev Lacey <elacey@nvidia.com>
2025-11-08 22:04:16 +00:00
zhangsicheng5
2108a571d7
[DCP] Support dcp kv_cache interleave size > 1 ( #26696 )
Signed-off-by: zhangsicheng5 <zhangsicheng5@huawei.com>
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Signed-off-by: Qiu <qiuchunshuo@huawei.com>
Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com>
2025-11-09 04:45:27 +09:00
Andy Lo
47604137a2
[Bugfix] Spec decode + structured output + spec model max len edge case ( #28298 )
Signed-off-by: Andy Lo <andy@mistral.ai>
2025-11-08 19:44:25 +00:00
Robert Shaw
26990d25dc
[Bugfix] Update device name for H200 detection ( #28349 )
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-11-08 19:01:11 +00:00
Harry Mellor
d9ab1ad9d1
reasoning_content -> reasoning (#27752 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-08 12:15:08 +00:00
22quinn
608bb14462
[Attention] Remove max cudagraph size limit of 992 ( #27840 )
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-11-07 22:33:27 -08:00
Xiaozhu Meng
4a36681f85
[flashinfer][fix] do not check nvcc availability when using pre-downloaded cubins ( #27990 )
Signed-off-by: Xiaozhu <mxz297@gmail.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
2025-11-07 22:25:21 -08:00
Abolfazl Shahbazi
d15afc1fd0
Refactor CPU/GPU extension targets for CMake build ( #28026 )
Signed-off-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
2025-11-08 14:17:35 +08:00
Isotr0py
934a9c3b79
[Model] Consolidate Deepseek-MoE implementation with DeepSeek-v2 ( #28101 )
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2025-11-08 05:01:27 +00:00
gnovack
70af44fd10
[bugfix] support eagle with lora cudagraph specialization ( #28318 )
Signed-off-by: gnovack <gnovack@amazon.com>
2025-11-08 03:25:45 +00:00
Aurick Qiao
781f5ebf52
Bump arctic-inference requirement ( #28174 )
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-11-07 18:31:18 -08:00
Michael Goin
0852527647
[Perf][DeepSeek] Add sigmoid+bias fusion to fused_grouped_topk from TRTLLM ( #28124 )
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-11-07 18:20:55 -08:00
Hamid Mukhtar
61d25dc44b
Update gpu.rocm.inc.md to add support for AMD Ryzen AI MAX / AI 300 Series (gfx1151, gfx1150) ( #28308 )
Signed-off-by: Hamid Mukhtar <15519013+hammmmy@users.noreply.github.com>
2025-11-08 02:09:21 +00:00
Xiaohong (Sean) Chen
d0c7792004
[Bugfix][LoRA][Spec Decode] Support LoRA with speculative decoding ( #21068 )
Signed-off-by: Sean Chen <xiaohong_chen1991@hotmail.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Danielle Robinson <dcmaddix@gmail.com>
Co-authored-by: Haipeng Li <li2haipeng@gmail.com>
Co-authored-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com>
2025-11-08 01:58:22 +00:00
Boyuan Feng
b158df2813
remove resolve_op_overloads and use splitting_ops directly ( #28081 )
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-11-08 01:13:13 +00:00
Kunshang Ji
1aaecda078
[XPU] Enable Expert parallel for MoE models ( #28263 )
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-11-08 00:33:11 +00:00
Harry Mellor
811df41ee9
Update Flashinfer from v0.4.1 to v0.5.2 ( #27952 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-07 16:24:42 -08:00
Nick Hill
67a2da890e
[PerfFix] Avoid separate thread for MP executor shm spin (take 2) ( #28319 )
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-07 22:11:03 +00:00
Nick Hill
da786e339e
[Core] Rework handling of async scheduling config ( #28250 )
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-07 20:01:23 +00:00
Benjamin Chislett
18903216f5
[Bugfix] Fix and add tests for GptOss reasoning parser ( #28000 )
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2025-11-07 19:28:04 +00:00
Simon Mo
d0ceb38ae8
[Build] Fix release pipeline failing annotation ( #28272 )
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-07 10:06:45 -08:00