10518 Commits

Author SHA1 Message Date
CSWYF3634076
782505ed8e
[Model] Add reasoning_parser and tool_parser for Ernie45 thinking (#25027)
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
2025-10-13 15:55:20 +08:00
Jee Jee Li
98f30b8cba
[Model] Fix Skywork R1V mlp (#26673)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-12 22:42:17 -07:00
yihong
3cd36660f7
docs: wrong command in structured_outputs README (#26677)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-10-12 20:59:01 -07:00
yyzxw
46ad73955a
[FIX] Throwing an exception when the model does not support pool tasks (#25840) (#25855)
Signed-off-by: zxw <1020938856@qq.com>
Co-authored-by: wang.yuqi <noooop@126.com>
2025-10-12 20:56:21 -07:00
quanliu
41f3884438
[Bugfix][Core]Fix block table out-of-range issue in priority scheduling (#26661)
Signed-off-by: quanliu <18646313696@163.com>
2025-10-13 01:25:42 +00:00
bnellnm
60e419c1ee
[Misc] cache result of disable_inplace (#26666)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-10-13 00:17:50 +00:00
Michael Goin
7ef6052804
[CI/Build] Add tool to build vllm-tpu wheel (#19165)
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-10-12 16:25:40 -06:00
Huamin Li
4fca1a1bd2
[easy] fix pre commit error on trunk (#26665)
Signed-off-by: Huamin Li <3ericli@gmail.com>
2025-10-12 21:25:34 +00:00
Lukas Geiger
a6049be73c
[Models][Qwen3VL] Speedup fast_pos_embed_interpolate (#26647)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-10-13 01:20:07 +08:00
gjgjos
18ed7746ea
[Feature] Add support for naver/splade-v3 (BERT-based sparse embedding model) (#26339)
Signed-off-by: gjgjos <gjgjos@naver.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-10-12 17:00:52 +00:00
Harry Mellor
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00
Chendi.Xue
9bb38130cb
[Bugfix] Fix GPU_ID issue in test script (#26442)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
2025-10-12 11:39:05 +00:00
Jaya Yuan
b91d8db873
[Bugfix][DCP] Set default CUDAGraphMode to PIECEWISE for DCP (#26574)
Signed-off-by: FENP <32334296+FENP@users.noreply.github.com>
2025-10-12 09:58:38 +00:00
Isotr0py
045b396d09
[Bugfix][CI/Build] Fix failing Mteb CI (#26638)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-12 02:42:42 -07:00
wang.yuqi
76852017ea
[MISC] Rename the torch profiler filename as instance_id+rank_id for merging the Profiler results of each Rank (#25867)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-10-12 09:29:08 +00:00
Vadim Gimpelson
82e64c7a20
[PERF] [Qwen3-next] Speed up gated RMSNorm (#26207)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-12 08:27:50 +00:00
wang.yuqi
4ca204055e
Add @noooop to codeowner for pooling models (#26652)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-10-12 14:04:44 +08:00
Haisheng Chen
c5c8f5ea59
[EPLB] Support ernie4.5-moe (#22100)
Signed-off-by: Haisheng Chen <langzs335@outlook.com>
Signed-off-by: Haisheng Chen <60504847+HsChen-sys@users.noreply.github.com>
Signed-off-by: Haisheng Chen <hac048@ucsd.edu>
Co-authored-by: Haisheng Chen <langzs335@outlook.com>
2025-10-12 10:40:47 +08:00
Angela Yi
01653a917b
[compile] Fix inductor partition config (#26645)
Signed-off-by: angelayi <yiangela7@gmail.com>
2025-10-11 21:03:14 +00:00
Huamin Li
0cd103e7cb
CP: make correct_attn_out robust to 4‑D views and fix Triton arg binding (#26509)
Signed-off-by: Huamin Li <3ericli@gmail.com>
2025-10-11 20:50:57 +00:00
Cyrus Leung
5be7ca1b99
[Benchmark] Support Infinity API (#26641)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-12 01:45:32 +08:00
Jee Jee Li
f0a30a067b
[Bugfix] Fix qwen-moe packed_modules_mapping (#26634)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-11 15:21:33 +00:00
JJJYmmm
9d6cff3ede
[Bugfix][Qwen3VL] fix deepstack in qwen3vl (#26626)
Signed-off-by: liuye.hj <liuye.hj@alibaba-inc.com>
Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
Co-authored-by: liuye.hj <liuye.hj@alibaba-inc.com>
2025-10-11 05:58:33 -07:00
Angela Yi
a25f2adee9
[compile] Add patched_fused_scaled_matmul_reduce_scatter (#26604)
Signed-off-by: angelayi <yiangela7@gmail.com>
2025-10-11 05:44:43 -07:00
Chauncey
d0bed837ac
[Refactor]Reduce duplicate code in serving_chat (#26627)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-10-11 12:04:49 +00:00
muzian666
f7ee69868a
[CPU] fix the issue when the node is '-' cause json decode error. (#26562)
Signed-off-by: muzian666 <andylee_2001@163.com>
Co-authored-by: qingan.li <qingan.li@wizpresso.com>
2025-10-11 12:04:04 +00:00
Rahul Tuli
d2a71530c1
Add EAGLE-3 Speculative Decoding Support for Qwen3 MoE (#26485)
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
2025-10-11 10:14:41 +00:00
ihb2032
086609de64
fix(nix): Allow local oneDNN path to fix vLLM CPU build failure (#26401)
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
Signed-off-by: ihb2032 <1355790728@qq.com>
2025-10-11 09:12:16 +00:00
dsinghvi
727144bed1
[Refactor]: Use M-RoPE interface directly while defining model class instead of maintaining model specific M-RoPE implementation in mrope.py (#24172)
Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: wwl2755 <wangwenlong2755@gmail.com>
2025-10-11 07:21:04 +00:00
sangho.lee
55392bc879
[Bugfix][Multi Modal] Fix incorrect Molmo image processing (#26563)
Signed-off-by: sanghol <sanghol@allenai.org>
2025-10-10 22:28:23 -07:00
Roger Wang
ddaff2938e
[MM] Move Qwen3Omni MRoPE impl to model file (#26608)
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-10 22:17:24 -07:00
liuzhenwei
27ed39a347
[XPU] Upgrade NIXL to remove CUDA dependency (#26570)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
2025-10-11 05:15:23 +00:00
Nishidha Panpaliya
8f8474fbe3
[CI/Build] Fix ppc64le CPU build and tests (#22443)
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>
2025-10-11 13:04:42 +08:00
Chauncey
be067861c6
[Frontend] Improve the performance of is_reasoning_end (#25735)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-10-11 10:43:39 +08:00
Nick Hill
5bc26c438d
[BugFix] Make penalties and bad_words work with async scheduling (#26467)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-10 23:27:04 +00:00
Zhengxu Chen
eef921f45e
AOT Compilation for torch.compile (Bundled) (#24274)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
2025-10-10 19:02:11 -04:00
Bram Wasti
e317414ce1
Cache the environment variable check for batch invariance (#26510)
Signed-off-by: Bram Wasti <bwasti@meta.com>
2025-10-10 22:47:34 +00:00
Nick Hill
949cb0170d
[BugFix] Fix async scheduling + request preemption (#26385)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-10 20:29:57 +00:00
Vadim Gimpelson
e94cfd51da
[BUG] Qwen3-next MTP. Fix attn metadata build bug (#26564)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2025-10-10 14:59:03 -04:00
Harry Mellor
7c12763b24
Fix some typing issues found by mypy==1.18.2 (#26596)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-10 18:21:25 +00:00
Will Eaton
3b780a4bbb
Update CUDA architecture list in build pipeline for 12.9.1 wheels (#26592)
Signed-off-by: Will Eaton <wseaton@users.noreply.github.com>
2025-10-10 11:15:27 -07:00
Harry Mellor
30f78af147
Update pre-commit hook versions (#26591)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-10 17:03:44 +00:00
Xiong Wang
19a9b169bf
Add Qwen3-Omni moe thinker (#25550)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Xiong Wang <feizi.wx@alibaba-inc.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-10 17:00:56 +00:00
Roberto L. Castro
96ad65b7fe
[Transform] [Quantization] Add QuTLASS support to vLLM (#24440)
Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Signed-off-by: Andrei Panferov <andrei@panferov.org>
Co-authored-by: Andrei Panferov <andrei@panferov.org>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-10-10 09:43:40 -07:00
Shane A
8d2b8c0ff2
[Model] Add FlexOlmo model implementation (#24923)
Signed-off-by: Shane A <shanea@allenai.org>
2025-10-10 09:43:15 -07:00
Lukas Geiger
b2155ed317
[Model][Qwen3VL] Compute cu_seqlens on CPU to remove (#26496)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-10 09:42:17 -07:00
Chauncey
910abdbd08
[Bugfix] fixed top_logprobs: -1 does not appear to work as intended (#26470)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-10-11 00:41:17 +08:00
baonudesifeizhai
cddce79fda
[torch.compile] Make inductor partition rules respect splitting_ops #25691 (#25845)
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-10-10 16:35:28 +00:00
Mark McLoughlin
e519281920
[Metrics] Add test for multi-modal cache stats logging (#26588)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-10-10 16:00:50 +00:00
Elvir Crnčević
7b03584de8
Silu v2 (#25074)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: elvircrn <elvircrn@gmail.com>
Signed-off-by: Elvir Crnčević <elvircrn@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
2025-10-10 15:19:53 +00:00