Varun Sundar Rabindranath
6b2b9fd934
[CI] lora/test_mixtral.py : Add additional expected outputs due to flakiness ( #28322 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-11-10 10:45:29 +08:00
JartX
c5f685b3ae
[ROCm][Platform] Add RX7900XTX device id in _ROCM_DEVICE_ID_NAME_MAP ( #28279 )
...
Signed-off-by: JartX <sagformas@epdcenter.es>
2025-11-09 23:09:36 +00:00
Jiangyun Zhu
c4768dcf47
[Kernel] Fix fused_gdn_gating ( #28343 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-11-09 14:26:35 -07:00
Zhewen Li
a65a934ebe
[CI/Build] Temporary fix to LM Eval Small Models ( #28324 )
...
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-11-09 21:08:38 +00:00
usberkeley
4a8d6bd168
Fix cu_num_generated_tokens slicing logic in LogprobsLists.slice() method ( #28214 )
...
Signed-off-by: Bradley <bradley.b.pitt@gmail.com>
2025-11-09 19:11:46 +00:00
Lucas Wilkinson
636efd10a5
[Core] Separate out attention metadata building logic from prepare inputs ( #26764 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-11-09 13:51:43 -05:00
Nick Hill
289eb6c537
[Core] Simplify async KV output aggregation ( #28327 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-09 09:44:13 -08:00
Nicolò Lucchesi
19d91ece4b
[CI] Fix flaky test_eagle_correctness test ( #28364 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-11-09 16:04:59 +00:00
Jiangyun Zhu
7ae5a5fb11
[Misc] Add some comments in qwen3-next ( #28267 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-11-08 23:59:24 -08:00
Yong Hoon Shin
de2b78305f
[ROCm] Add env to enable/disable aiter triton gemm ( #28321 )
...
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-11-08 22:27:00 -08:00
Ning Xie
e5e9067e61
[Misc] fix typo and add detailed log ( #28178 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-11-09 05:33:46 +00:00
yihong
3a7d580343
fix: close issue 28338 by fixed python version ( #28339 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-11-09 05:07:26 +00:00
Kevin H. Luu
05f8d69077
[chore] Move some wikimedia images to S3 ( #28351 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
2025-11-09 01:58:26 +00:00
Mohammad Miadh Angkad
404d7a9d14
[Performance][gpt-oss] Revert gpt-oss max cudagraph size to 1024 ( #28345 )
...
Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu>
2025-11-08 15:50:10 -07:00
ElizaWszola
171133f929
[Bugfix] Fix test fused quant layernorm tests ( #27865 )
...
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-11-08 14:31:33 -08:00
Cole Murray
32787d0644
Remove setuptools upper bound constraint (<80) ( #28337 )
...
Signed-off-by: Cole Murray <colemurray.cs@gmail.com>
2025-11-08 22:30:18 +00:00
Benjamin Chislett
975676d174
[Feat] Drop-in Torch CUDA Profiler ( #27841 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2025-11-08 14:07:37 -08:00
Ev Lacey
77d702a22b
Enhance run_cluster.sh for multi-NIC support ( #28328 )
...
Signed-off-by: Ev Lacey <elacey@nvidia.com>
2025-11-08 22:04:16 +00:00
zhangsicheng5
2108a571d7
[DCP] Support dcp kv_cache interleave size > 1 ( #26696 )
...
Signed-off-by: zhangsicheng5 <zhangsicheng5@huawei.com>
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Signed-off-by: Qiu <qiuchunshuo@huawei.com>
Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com>
2025-11-09 04:45:27 +09:00
Andy Lo
47604137a2
[Bugfix] Spec decode + structured output + spec model max len edge case ( #28298 )
...
Signed-off-by: Andy Lo <andy@mistral.ai>
2025-11-08 19:44:25 +00:00
Robert Shaw
26990d25dc
[Bugfix] Update device name for H200 detection ( #28349 )
...
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-11-08 19:01:11 +00:00
Harry Mellor
d9ab1ad9d1
reasoning_content -> reasoning (#27752 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-08 12:15:08 +00:00
22quinn
608bb14462
[Attention] Remove max cudagraph size limit of 992 ( #27840 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-11-07 22:33:27 -08:00
Xiaozhu Meng
4a36681f85
[flashinfer][fix] do not check nvcc availability when using pre-downloaded cubins ( #27990 )
...
Signed-off-by: Xiaozhu <mxz297@gmail.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
2025-11-07 22:25:21 -08:00
Abolfazl Shahbazi
d15afc1fd0
Refactor CPU/GPU extension targets for CMake build ( #28026 )
...
Signed-off-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
2025-11-08 14:17:35 +08:00
Isotr0py
934a9c3b79
[Model] Consolidate Deepseek-MoE implementation with DeepSeek-v2 ( #28101 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2025-11-08 05:01:27 +00:00
gnovack
70af44fd10
[bugfix] support eagle with lora cudagraph specialization ( #28318 )
...
Signed-off-by: gnovack <gnovack@amazon.com>
2025-11-08 03:25:45 +00:00
Aurick Qiao
781f5ebf52
Bump arctic-inference requirement ( #28174 )
...
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-11-07 18:31:18 -08:00
Michael Goin
0852527647
[Perf][DeepSeek] Add sigmoid+bias fusion to fused_grouped_topk from TRTLLM ( #28124 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-11-07 18:20:55 -08:00
Hamid Mukhtar
61d25dc44b
Update gpu.rocm.inc.md to add support for AMD Ryzen AI MAX / AI 300 Series (gfx1151, gfx1150) ( #28308 )
...
Signed-off-by: Hamid Mukhtar <15519013+hammmmy@users.noreply.github.com>
2025-11-08 02:09:21 +00:00
Xiaohong (Sean) Chen
d0c7792004
[Bugfix][LoRA][Spec Decode] Support LoRA with speculative decoding ( #21068 )
...
Signed-off-by: Sean Chen <xiaohong_chen1991@hotmail.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Danielle Robinson <dcmaddix@gmail.com>
Co-authored-by: Haipeng Li <li2haipeng@gmail.com>
Co-authored-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com>
2025-11-08 01:58:22 +00:00
Boyuan Feng
b158df2813
remove resolve_op_overloads and use splitting_ops directly ( #28081 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-11-08 01:13:13 +00:00
Kunshang Ji
1aaecda078
[XPU] Enable Expert parallel for MoE models ( #28263 )
...
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-11-08 00:33:11 +00:00
Harry Mellor
811df41ee9
Update Flashinfer from v0.4.1 to v0.5.2 ( #27952 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-07 16:24:42 -08:00
Nick Hill
67a2da890e
[PerfFix] Avoid separate thread for MP executor shm spin (take 2) ( #28319 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-07 22:11:03 +00:00
Nick Hill
da786e339e
[Core] Rework handling of async scheduling config ( #28250 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-07 20:01:23 +00:00
Benjamin Chislett
18903216f5
[Bugfix] Fix and add tests for GptOss reasoning parser ( #28000 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2025-11-07 19:28:04 +00:00
Simon Mo
d0ceb38ae8
[Build] Fix release pipeline failing annotation ( #28272 )
...
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-07 10:06:45 -08:00
youkaichao
155ad56d7b
[doc] add guide about the provided PTX was compiled with an unsupported toolchain ( #28305 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-11-08 00:26:34 +08:00
Fadi Arafeh
5fb4137c99
[README] Add Arm CPUs to the list of supported targets ( #28290 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-11-07 15:41:47 +00:00
Nicolò Lucchesi
68a72a5cc1
Revert "[PerfFix] Avoid separate thread for MP executor shm spin ( #28012 )" ( #28289 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-11-07 15:07:01 +00:00
Boyuan Feng
0f872b7977
[Log] update shm wait time msg ( #28255 )
2025-11-07 09:43:30 -05:00
Wentao Ye
4b1ff13221
[Feature] Default ignore_eos True for random dataset ( #28227 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-07 07:35:33 -05:00
Iceber Gu
e0d6b4a867
[CLI] add --max-tokens to vllm complete ( #28109 )
...
Signed-off-by: Iceber Gu <caiwei95@hotmail.com>
2025-11-07 12:21:40 +00:00
Pavani Majety
72b1c2ae2c
[Bugfix] Use latency MOE backend as default for Flashinfer and other misc fixes ( #27439 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-11-07 04:18:39 -08:00
Lukas Geiger
e0919f331d
[Core][MM] Add mechanism to configure multimodal fields which should stay on CPU ( #28168 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-11-07 12:14:29 +00:00
Kevin H. Luu
8e19d470af
[fix] Revert "fixing mm placeholder replacement issue with gemma3" ( #28285 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
2025-11-07 12:09:09 +00:00
Mengqing Cao
1958bda9b4
[Misc][Model][Refactor] Pass the prefix into Linear layers ( #28259 )
...
Signed-off-by: MengqingCao <cmq0113@163.com>
2025-11-07 19:38:38 +08:00
Zhang Xiangze
7bdb42b2f2
[CPU]Avoid repeated random sample compile ( #28260 )
...
Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com>
2025-11-07 11:03:57 +00:00
汪志鹏
315068eb4a
[FixBug]Aeala/ShareGPT_Vicuna_unfiltered marked as multimodal benchmark ( #28265 )
...
Signed-off-by: princepride <wangzhipeng628@gmail.com>
2025-11-07 09:35:22 +00:00