Fardin Hoque
577c72a227
[CI Perf]Prune Tests in kernel/mamba ( #26538 )
...
Signed-off-by: Fardin Hoque <kfhfar@amazon.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-10-13 18:22:31 -04:00
Wentao Ye
314285d4f2
[CI] Fix mypy for vllm/distributed ( #26593 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-13 16:02:24 -04:00
wang.yuqi
d2a7938582
[Frontend][1/N] Improve all pooling task | Support FP16 Embedding Base64 (Still uses fp32 by default). ( #26414 )
...
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-10-13 19:06:43 +00:00
Alex Kogan
89342ce4c0
[Quantization] [Performance] Enable Marlin GEMM kernels for the calibration-free RTN-based quantization ( #26051 )
...
Signed-off-by: Alex Kogan <alex.kogan@oracle.com>
Signed-off-by: Alex Kogan <82225080+sakogan@users.noreply.github.com>
2025-10-13 18:52:54 +00:00
Yibo Cai
f89f599395
[CI][Release][Arm64]: Build arm64 release for gpu arch 8.9 ( #26698 )
2025-10-13 18:42:12 +00:00
Wentao Ye
e251e457c5
[Log] Optimize Startup Log ( #26601 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-14 02:06:57 +08:00
Cyrus Leung
afc47e4de7
[Model] Use merge_by_field_config for MM models (M-N) ( #26710 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-14 01:27:01 +08:00
Rahul Tuli
e3b90c1ba2
[Bugfix][Speculative Decoding] Extend Eagle quantization config fix to llama_eagle.py ( #26590 )
...
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
2025-10-13 17:17:13 +00:00
haoyangli-amd
134f70b3ed
[Bugfix][Rocm] fix qr error when different inp shape ( #25892 )
...
Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-10-13 10:04:21 -07:00
Sangyeon Cho
a1b2d658ee
[CI/Build] upgrade compressed-tensors to 0.12.2 to address LGPLv3 ( #26501 )
...
Signed-off-by: Sangyeon Cho <josang1204@gmail.com>
2025-10-13 12:58:33 -04:00
Aleksei Tsvetkov
5c7fe25491
[Misc] Separate prompt logging to debug ( #26713 )
...
Signed-off-by: Aleksei Tsvetkov <aitsvet@ya.ru>
2025-10-13 09:04:18 -07:00
Will Eaton
53c9a7cee2
[P/D] [NixlConnector] kv load recovery integration ( #26171 )
...
Signed-off-by: Will Eaton <weaton@redhat.com>
2025-10-13 08:48:04 -07:00
Michael Goin
0d21b9b51e
[UX] Speedup DeepGEMM warmup with heuristics ( #25619 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-10-13 07:59:27 -07:00
Anand Roy
10214b6935
[FEATURE]: Use pydantic validation in multimodal.py config ( #26629 )
...
Signed-off-by: Anand Roy <86306690+andycandy@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-13 07:56:59 -07:00
ihb2032
4a61950f4d
[Hardware][CPU] Disable torch.compile for RISC-V to prevent APIError ( #26693 )
...
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
Signed-off-by: ihb2032 <1355790728@qq.com>
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
2025-10-13 07:56:01 -07:00
Bram Wasti
3263799056
[unrevert] Add batch invariant kernel override for FlashInfer backend [2/n] ( #26373 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com>
Signed-off-by: Bram Wasti <bwasti@fb.com>
2025-10-13 10:24:53 -04:00
Isotr0py
8e67b2557a
[Bugfix] Fix out of bound index issue for Jina-embedding-v3 RoPE with cuda graph ( #26687 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-13 03:21:48 -07:00
Jialin Ouyang
4073c82c4e
[ResponseAPI] Simplify input/output message serialization ( #26620 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-10-13 09:59:15 +00:00
wang.yuqi
767c3ab869
[Model][0/N] Improve all pooling task | clean up ( #25817 )
...
Signed-off-by: wang.yuqi <noooop@126.com>
2025-10-13 16:44:50 +08:00
Harry Mellor
4f207c7174
Ignore large reformatting PRs in git blame ( #26690 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-13 01:20:47 -07:00
CSWYF3634076
782505ed8e
[Model] Add reasoning_parser and tool_parser for Ernie45 thinking ( #25027 )
...
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
2025-10-13 15:55:20 +08:00
Jee Jee Li
98f30b8cba
[Model] Fix Skywork R1V mlp ( #26673 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-12 22:42:17 -07:00
yihong
3cd36660f7
docs: wrong command in structured_outputs README ( #26677 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-10-12 20:59:01 -07:00
yyzxw
46ad73955a
[FIX] Throwing an exception when the model does not support pool tasks ( #25840 ) ( #25855 )
...
Signed-off-by: zxw <1020938856@qq.com>
Co-authored-by: wang.yuqi <noooop@126.com>
2025-10-12 20:56:21 -07:00
quanliu
41f3884438
[Bugfix][Core]Fix block table out-of-range issue in priority scheduling ( #26661 )
...
Signed-off-by: quanliu <18646313696@163.com>
2025-10-13 01:25:42 +00:00
bnellnm
60e419c1ee
[Misc] cache result of disable_inplace ( #26666 )
...
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-10-13 00:17:50 +00:00
Michael Goin
7ef6052804
[CI/Build] Add tool to build vllm-tpu wheel ( #19165 )
...
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-10-12 16:25:40 -06:00
Huamin Li
4fca1a1bd2
[easy] fix pre commit error on trunk ( #26665 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com>
2025-10-12 21:25:34 +00:00
Lukas Geiger
a6049be73c
[Models][Qwen3VL] Speedup fast_pos_embed_interpolate ( #26647 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-10-13 01:20:07 +08:00
gjgjos
18ed7746ea
[Feature] Add support for naver/splade-v3 (BERT-based sparse embedding model) ( #26339 )
...
Signed-off-by: gjgjos <gjgjos@naver.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-10-12 17:00:52 +00:00
Harry Mellor
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y ( #26633 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00
Chendi.Xue
9bb38130cb
[Bugfix] Fix GPU_ID issue in test script ( #26442 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
2025-10-12 11:39:05 +00:00
Jaya Yuan
b91d8db873
[Bugfix][DCP] Set default CUDAGraphMode to PIECEWISE for DCP ( #26574 )
...
Signed-off-by: FENP <32334296+FENP@users.noreply.github.com>
2025-10-12 09:58:38 +00:00
Isotr0py
045b396d09
[Bugfix][CI/Build] Fix failing Mteb CI ( #26638 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-12 02:42:42 -07:00
wang.yuqi
76852017ea
[MISC] Rename the torch profiler filename as instance_id+rank_id for merging the Profiler results of each Rank ( #25867 )
...
Signed-off-by: wang.yuqi <noooop@126.com>
2025-10-12 09:29:08 +00:00
Vadim Gimpelson
82e64c7a20
[PERF] [Qwen3-next] Speed up gated RMSNorm ( #26207 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-12 08:27:50 +00:00
wang.yuqi
4ca204055e
Add @noooop to codeowner for pooling models ( #26652 )
...
Signed-off-by: wang.yuqi <noooop@126.com>
2025-10-12 14:04:44 +08:00
Haisheng Chen
c5c8f5ea59
[EPLB] Support ernie4.5-moe ( #22100 )
...
Signed-off-by: Haisheng Chen <langzs335@outlook.com>
Signed-off-by: Haisheng Chen <60504847+HsChen-sys@users.noreply.github.com>
Signed-off-by: Haisheng Chen <hac048@ucsd.edu>
Co-authored-by: Haisheng Chen <langzs335@outlook.com>
2025-10-12 10:40:47 +08:00
Angela Yi
01653a917b
[compile] Fix inductor partition config ( #26645 )
...
Signed-off-by: angelayi <yiangela7@gmail.com>
2025-10-11 21:03:14 +00:00
Huamin Li
0cd103e7cb
CP: make correct_attn_out robust to 4‑D views and fix Triton arg binding ( #26509 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com>
2025-10-11 20:50:57 +00:00
Cyrus Leung
5be7ca1b99
[Benchmark] Support Infinity API ( #26641 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-12 01:45:32 +08:00
Jee Jee Li
f0a30a067b
[Bugfix] Fix qwen-moe packed_modules_mapping ( #26634 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-11 15:21:33 +00:00
JJJYmmm
9d6cff3ede
[Bugfix][Qwen3VL] fix deepstack in qwen3vl ( #26626 )
...
Signed-off-by: liuye.hj <liuye.hj@alibaba-inc.com>
Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
Co-authored-by: liuye.hj <liuye.hj@alibaba-inc.com>
2025-10-11 05:58:33 -07:00
Angela Yi
a25f2adee9
[compile] Add patched_fused_scaled_matmul_reduce_scatter ( #26604 )
...
Signed-off-by: angelayi <yiangela7@gmail.com>
2025-10-11 05:44:43 -07:00
Chauncey
d0bed837ac
[Refactor]Reduce duplicate code in serving_chat ( #26627 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-10-11 12:04:49 +00:00
muzian666
f7ee69868a
[CPU] fix the issue when the node is '-' cause json decode error. ( #26562 )
...
Signed-off-by: muzian666 <andylee_2001@163.com>
Co-authored-by: qingan.li <qingan.li@wizpresso.com>
2025-10-11 12:04:04 +00:00
Rahul Tuli
d2a71530c1
Add EAGLE-3 Speculative Decoding Support for Qwen3 MoE ( #26485 )
...
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
2025-10-11 10:14:41 +00:00
ihb2032
086609de64
fix(nix): Allow local oneDNN path to fix vLLM CPU build failure ( #26401 )
...
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
Signed-off-by: ihb2032 <1355790728@qq.com>
2025-10-11 09:12:16 +00:00
dsinghvi
727144bed1
[Refactor]: Use M-RoPE interface directly while defining model class instead of maintaining model specific M-RoPE implementation in mrope.py ( #24172 )
...
Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: wwl2755 <wangwenlong2755@gmail.com>
2025-10-11 07:21:04 +00:00
sangho.lee
55392bc879
[Bugfix][Multi Modal] Fix incorrect Molmo image processing ( #26563 )
...
Signed-off-by: sanghol <sanghol@allenai.org>
2025-10-10 22:28:23 -07:00