Isotr0py
934a9c3b79
[Model] Consolidate Deepseek-MoE implementation with DeepSeek-v2 (#28101)
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2025-11-08 05:01:27 +00:00
gnovack
70af44fd10
[bugfix] support eagle with lora cudagraph specialization (#28318)
...
Signed-off-by: gnovack <gnovack@amazon.com>
2025-11-08 03:25:45 +00:00
Aurick Qiao
781f5ebf52
Bump arctic-inference requirement (#28174)
...
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-11-07 18:31:18 -08:00
Michael Goin
0852527647
[Perf][DeepSeek] Add sigmoid+bias fusion to fused_grouped_topk from TRTLLM (#28124)
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-11-07 18:20:55 -08:00
Hamid Mukhtar
61d25dc44b
Update gpu.rocm.inc.md to add support for AMD Ryzen AI MAX / AI 300 Series (gfx1151, gfx1150) (#28308)
...
Signed-off-by: Hamid Mukhtar <15519013+hammmmy@users.noreply.github.com>
2025-11-08 02:09:21 +00:00
Xiaohong (Sean) Chen
d0c7792004
[Bugfix][LoRA][Spec Decode] Support LoRA with speculative decoding (#21068)
...
Signed-off-by: Sean Chen <xiaohong_chen1991@hotmail.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Danielle Robinson <dcmaddix@gmail.com>
Co-authored-by: Haipeng Li <li2haipeng@gmail.com>
Co-authored-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com>
2025-11-08 01:58:22 +00:00
Boyuan Feng
b158df2813
remove resolve_op_overloads and use splitting_ops directly (#28081)
...
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-11-08 01:13:13 +00:00
Kunshang Ji
1aaecda078
[XPU] Enable Expert parallel for MoE models (#28263)
...
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-11-08 00:33:11 +00:00
Harry Mellor
811df41ee9
Update Flashinfer from v0.4.1 to v0.5.2 (#27952)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-07 16:24:42 -08:00
Nick Hill
67a2da890e
[PerfFix] Avoid separate thread for MP executor shm spin (take 2) (#28319)
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-07 22:11:03 +00:00
Nick Hill
da786e339e
[Core] Rework handling of async scheduling config (#28250)
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-07 20:01:23 +00:00
Benjamin Chislett
18903216f5
[Bugfix] Fix and add tests for GptOss reasoning parser (#28000)
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2025-11-07 19:28:04 +00:00
Simon Mo
d0ceb38ae8
[Build] Fix release pipeline failing annotation (#28272)
...
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-07 10:06:45 -08:00
youkaichao
155ad56d7b
[doc] add guide about the provided PTX was compiled with an unsupported toolchain (#28305)
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-11-08 00:26:34 +08:00
Fadi Arafeh
5fb4137c99
[README] Add Arm CPUs to the list of supported targets (#28290)
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-11-07 15:41:47 +00:00
Nicolò Lucchesi
68a72a5cc1
Revert "[PerfFix] Avoid separate thread for MP executor shm spin (#28012)" (#28289)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-11-07 15:07:01 +00:00
Boyuan Feng
0f872b7977
[Log] update shm wait time msg (#28255)
2025-11-07 09:43:30 -05:00
Wentao Ye
4b1ff13221
[Feature] Default ignore_eos True for random dataset (#28227)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-07 07:35:33 -05:00
Iceber Gu
e0d6b4a867
[CLI] add --max-tokens to vllm complete (#28109)
...
Signed-off-by: Iceber Gu <caiwei95@hotmail.com>
2025-11-07 12:21:40 +00:00
Pavani Majety
72b1c2ae2c
[Bugfix] Use latency MOE backend as default for Flashinfer and other misc fixes (#27439)
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-11-07 04:18:39 -08:00
Lukas Geiger
e0919f331d
[Core][MM] Add mechanism to configure multimodal fields which should stay on CPU (#28168)
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-11-07 12:14:29 +00:00
Kevin H. Luu
8e19d470af
[fix] Revert "fixing mm placeholder replacement issue with gemma3" (#28285)
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
2025-11-07 12:09:09 +00:00
Mengqing Cao
1958bda9b4
[Misc][Model][Refactor] Pass the prefix into Linear layers (#28259)
...
Signed-off-by: MengqingCao <cmq0113@163.com>
2025-11-07 19:38:38 +08:00
Zhang Xiangze
7bdb42b2f2
[CPU]Avoid repeated random sample compile (#28260)
...
Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com>
2025-11-07 11:03:57 +00:00
汪志鹏
315068eb4a
[FixBug]Aeala/ShareGPT_Vicuna_unfiltered marked as multimodal benchmark (#28265)
...
Signed-off-by: princepride <wangzhipeng628@gmail.com>
2025-11-07 09:35:22 +00:00
Jialin Ouyang
ccd98b59c1
[Perf] Introduce FlattenLogprobs to store logprobs results to reduce GC overhead (#28171)
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-11-07 00:27:12 -08:00
Jee Jee Li
21b82f4ea2
[Kernel] LoRA triton kernels support PDL (#27402)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-07 08:05:48 +00:00
Copilot
a736e5ff77
[CI] Reduce Blackwell Fusion test runtime by filtering tests and only run all tests in nightly (#28074)
2025-11-07 15:58:16 +08:00
baonudesifeizhai
9da9208b20
[Bug] Fix missing token_ids for reasoning parser models in chat completions #28246 (#28256)
2025-11-07 07:31:58 +00:00
smit kadvani
11fd69dd54
[amd][gptoss] Perf gain because of block alignment (#28024)
...
Signed-off-by: Smit Kadvani <smit.kadvani@gmail.com>
Co-authored-by: Smit Shaileshbhai Kadvani <kadvani@meta.com>
2025-11-07 05:27:42 +00:00
Harry Mellor
c0a4b95d64
Fix issues from #28242 (#28257)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-07 04:23:17 +00:00
Alexis MacAskill
a47d94f18c
Add runai model streamer e2e test for GCS (#28079)
...
Signed-off-by: Alexis MacAskill <amacaskill@google.com>
2025-11-07 03:07:54 +00:00
Alex Brooks
e70fbc599b
[CI/Build] Loosen STT LoRA Translate Check (Flaky Test) (#28247)
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Alex Brooks <alex.brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-11-07 02:51:27 +00:00
Lucas Kabela
4bf56c79cc
[Multimodal][torch.compile] Add compilation config field for turning off ViT/MM compile (#28242)
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2025-11-07 00:16:03 +00:00
Junhong Liu
59b453eaa2
Speed up mm processor kwargs per request by spliting dynamic and static kwargs (#26483)
...
Signed-off-by: Junhong <liujunhong11@huawei.com>
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
Co-authored-by: Junhong <liujunhong11@huawei.com>
2025-11-07 07:51:28 +08:00
Eugene Khvedchenya
827e4237bc
Fix failing test for CRadio (#27738)
...
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: wang.yuqi <noooop@126.com>
2025-11-06 15:32:25 -08:00
Varun Sundar Rabindranath
ca6f755d24
[BugFix] Fix FusedMoELoRA + ModularKernel Integration (#28237)
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-11-06 22:53:30 +00:00
Matthew Bonanni
ca90f50304
[Test] Add non-MoE DP test coverage (#28235)
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-06 20:59:57 +00:00
Fang Han
da855b42d2
[Doc]: Make extraInit containers fully configurable in helm chart (#27497)
...
Signed-off-by: Fang Han <fhan0520@gmail.com>
2025-11-06 20:27:16 +00:00
Aleksandr Malyshev
449de9001a
[ROCm] triton fp8 kernel (#27058)
...
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
2025-11-06 14:46:44 -05:00
Vico Chu
d4aa65c998
[Chore] eliminate duplicated and unconditional object serialization in anthropic messages api (#27792)
...
Signed-off-by: Vico Chu <vico24826@gmail.com>
2025-11-06 19:09:19 +00:00
Julien Denize
7a8375f8a0
Add llama 4 scaling support (#28145)
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
2025-11-06 18:55:17 +00:00
Andy Lo
5e0c1fe69c
[Structured outputs] Upgrade llguidance to 1.3.0 (#28039)
...
Signed-off-by: Andy Lo <andy@mistral.ai>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-11-06 10:24:47 -08:00
Russell Bryant
4507a6dae4
CODEOWNERS: Add myself as reviewer on security docs (#28216)
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-11-06 17:39:42 +00:00
Roy Wang
d1dd5f53e4
[Frontend] Fix logging format when enable response logging (#28049)
...
Signed-off-by: esmeetu <jasonailu87@gmail.com>
2025-11-06 16:25:39 +00:00
StanHatko
e52e4da971
[HARDWARE][CPU] Add Option for Disabling Binding to Specific CPU Cores (#27953)
...
Signed-off-by: Stan Hatko <stan_hatko@live.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2025-11-06 23:47:11 +08:00
Milos Puzovic
2176778cd3
[Doc] Add Arm CPUs are on the list of supported targets in vLLM (#26018)
...
Signed-off-by: Milos Puzovic <milos.puzovic@arm.com>
2025-11-06 15:30:26 +00:00
Eric Yue
0370679ce9
[Kernel][Model] Tune fused_moe Triton configs for MiniMax-M2 on H100 (#28200)
...
Signed-off-by: minatoaquaMK2 <jiacheng.yue@foxmail.com>
2025-11-06 07:29:46 -08:00
Harry Mellor
8816e375d3
[Docs] Switch to directory style URLs (#28058)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-06 07:06:33 -08:00
Michael Goin
f32229293e
Disable nm-testing models with issues in CI (#28206)
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-11-06 06:19:07 -08:00