Max Wittig
fd85c9f426
[Bugfix][FE]: Always include usage with --enable-force-include-usage ( #20983 )
...
Signed-off-by: Max Wittig <max.wittig@siemens.com>
Signed-off-by: Antoine Auger <antoineauger@users.noreply.github.com>
Co-authored-by: Antoine Auger <antoineauger@users.noreply.github.com>
2025-10-14 09:17:39 +02:00
Ye (Charlotte) Qi
d32c611f45
[CI/Build] Use 127.0.0.1 instead of localhost in utils ( #26750 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-10-14 07:04:00 +00:00
CSWYF3634076
01ad27faff
[Model][Bugfix]fix ernie45 load failed due to ernie45 eplb code ( #26684 )
...
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
2025-10-14 06:55:23 +00:00
Ryan Li
481545b397
scheduler.py: Update the name of the default scheduler. ( #26758 )
...
Signed-off-by: Ryan Li <ryanli@ryanli.org>
2025-10-14 06:52:21 +00:00
Alexei-V-Ivanov-AMD
d3cc8427c0
[ci] Adding the test-amd.yaml for test definitions for the AMD backend. (alternative PR) ( #26718 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2025-10-13 23:10:23 -07:00
vllmellm
4821ac1b4d
[CI] [ROCm] Automate CC list for ROCm related issue ( #26753 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-10-14 13:57:26 +08:00
XiongfeiWei
4497c8f821
Fix lora tests failure in TPU CI due to the removal of LoRA bias ( #26723 )
...
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-10-14 13:04:23 +08:00
Michael Yao
2e36cdbe2b
[Docs] Add a start tag to build.inc.md ( #26747 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-10-13 21:51:55 -07:00
Maximilien de Bayser
fe3edb4cf0
Add support for the /rerank endpoint in vllm bench serve ( #26602 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-10-14 04:25:43 +00:00
Heng Guo
29350922c6
[Feature][Quantization] auto_round format add support for regex ( #24024 )
...
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: Heng Guo <heng.guo@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-14 03:03:16 +00:00
Varun Sundar Rabindranath
8ae169286f
[torch.compile] Unwrap fused_marlin_moe custom op ( #26739 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-10-14 02:22:16 +00:00
youkaichao
8a0af6a561
[build][torch.compile] upgrade depyf version ( #26702 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-10-14 10:12:09 +08:00
Jialin Ouyang
cfded80793
[Easy] Fix env type check errors from VLLM_DEBUG_LOG_API_SERVER_RESPONSE ( #26742 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-10-14 01:46:44 +00:00
Angela Yi
b59dd19b55
[compile] Enable sequence parallelism for full cuda graph without specifying compile sizes ( #26681 )
...
Signed-off-by: angelayi <yiangela7@gmail.com>
2025-10-13 18:15:34 -07:00
Michael Goin
3e051bda82
[UX] Replace VLLM_ALL2ALL_BACKEND with --all2all-backend ( #26732 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-10-13 18:12:52 -07:00
Lucia Fang
8317f72354
[Misc][DP] support customized aggregated logger for dp ( #24354 )
...
Signed-off-by: Lu Fang <fanglu@fb.com>
2025-10-13 17:45:59 -07:00
Maximilien de Bayser
d8bebb008a
Add tests for chunked prefill and prefix cache with causal pooling models ( #26526 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Ayush Singh <ayush1009208@gmail.com>
2025-10-14 07:45:04 +08:00
Jialin Ouyang
35bc22f23c
[ResponseAPI] Further polish message serialization and unit tests ( #26728 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-10-13 23:31:35 +00:00
Fardin Hoque
fa96fb9c70
Pruning kernel Core Tests ( #26727 )
...
Signed-off-by: Fardin Hoque <kfhfar@amazon.com>
2025-10-13 23:08:18 +00:00
Morrison Turnansky
e3fdb627d9
[FrontEnd] UNREVERT CompilationConfig overhaul ( #20283 ): deprecate use_inductor in favor of backend, simplify custom_ops ( #26502 )
...
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
2025-10-13 22:47:16 +00:00
Wentao Ye
7200a21cd1
[Bug] Fix Assertion error DeepEP/csrc/kernels/intranode.cu:928: 'false and Unsupported type' ( #26532 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-13 18:26:37 -04:00
Fardin Hoque
577c72a227
[CI Perf]Prune Tests in kernel/mamba ( #26538 )
...
Signed-off-by: Fardin Hoque <kfhfar@amazon.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-10-13 18:22:31 -04:00
Wentao Ye
314285d4f2
[CI] Fix mypy for vllm/distributed ( #26593 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-13 16:02:24 -04:00
wang.yuqi
d2a7938582
[Frontend][1/N] Improve all pooling task | Support FP16 Embedding Base64 (Still uses fp32 by default). ( #26414 )
...
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-10-13 19:06:43 +00:00
Alex Kogan
89342ce4c0
[Quantization] [Performance] Enable Marlin GEMM kernels for the calibration-free RTN-based quantization ( #26051 )
...
Signed-off-by: Alex Kogan <alex.kogan@oracle.com>
Signed-off-by: Alex Kogan <82225080+sakogan@users.noreply.github.com>
2025-10-13 18:52:54 +00:00
Yibo Cai
f89f599395
[CI][Release][Arm64]: Build arm64 release for gpu arch 8.9 ( #26698 )
2025-10-13 18:42:12 +00:00
Wentao Ye
e251e457c5
[Log] Optimize Startup Log ( #26601 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-14 02:06:57 +08:00
Cyrus Leung
afc47e4de7
[Model] Use merge_by_field_config for MM models (M-N) ( #26710 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-14 01:27:01 +08:00
Rahul Tuli
e3b90c1ba2
[Bugfix][Speculative Decoding] Extend Eagle quantization config fix to llama_eagle.py ( #26590 )
...
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
2025-10-13 17:17:13 +00:00
haoyangli-amd
134f70b3ed
[Bugfix][Rocm] fix qr error when different inp shape ( #25892 )
...
Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-10-13 10:04:21 -07:00
Sangyeon Cho
a1b2d658ee
[CI/Build] upgrade compressed-tensors to 0.12.2 to address LGPLv3 ( #26501 )
...
Signed-off-by: Sangyeon Cho <josang1204@gmail.com>
2025-10-13 12:58:33 -04:00
Aleksei Tsvetkov
5c7fe25491
[Misc] Separate prompt logging to debug ( #26713 )
...
Signed-off-by: Aleksei Tsvetkov <aitsvet@ya.ru>
2025-10-13 09:04:18 -07:00
Will Eaton
53c9a7cee2
[P/D] [NixlConnector] kv load recovery integration ( #26171 )
...
Signed-off-by: Will Eaton <weaton@redhat.com>
2025-10-13 08:48:04 -07:00
Michael Goin
0d21b9b51e
[UX] Speedup DeepGEMM warmup with heuristics ( #25619 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-10-13 07:59:27 -07:00
Anand Roy
10214b6935
[FEATURE]: Use pydantic validation in multimodal.py config ( #26629 )
...
Signed-off-by: Anand Roy <86306690+andycandy@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-13 07:56:59 -07:00
ihb2032
4a61950f4d
[Hardware][CPU] Disable torch.compile for RISC-V to prevent APIError ( #26693 )
...
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
Signed-off-by: ihb2032 <1355790728@qq.com>
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
2025-10-13 07:56:01 -07:00
Bram Wasti
3263799056
[unrevert] Add batch invariant kernel override for FlashInfer backend [2/n] ( #26373 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com>
Signed-off-by: Bram Wasti <bwasti@fb.com>
2025-10-13 10:24:53 -04:00
Isotr0py
8e67b2557a
[Bugfix] Fix out of bound index issue for Jina-embedding-v3 RoPE with cuda graph ( #26687 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-13 03:21:48 -07:00
Jialin Ouyang
4073c82c4e
[ResponseAPI] Simplify input/output message serialization ( #26620 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-10-13 09:59:15 +00:00
wang.yuqi
767c3ab869
[Model][0/N] Improve all pooling task | clean up ( #25817 )
...
Signed-off-by: wang.yuqi <noooop@126.com>
2025-10-13 16:44:50 +08:00
Harry Mellor
4f207c7174
Ignore large reformatting PRs in git blame ( #26690 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-13 01:20:47 -07:00
CSWYF3634076
782505ed8e
[Model] Add reasoning_parser and tool_parser for Ernie45 thinking ( #25027 )
...
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
2025-10-13 15:55:20 +08:00
Jee Jee Li
98f30b8cba
[Model] Fix Skywork R1V mlp ( #26673 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-12 22:42:17 -07:00
yihong
3cd36660f7
docs: wrong command in structured_outputs README ( #26677 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-10-12 20:59:01 -07:00
yyzxw
46ad73955a
[FIX] Throwing an exception when the model does not support pool tasks ( #25840 ) ( #25855 )
...
Signed-off-by: zxw <1020938856@qq.com>
Co-authored-by: wang.yuqi <noooop@126.com>
2025-10-12 20:56:21 -07:00
quanliu
41f3884438
[Bugfix][Core]Fix block table out-of-range issue in priority scheduling ( #26661 )
...
Signed-off-by: quanliu <18646313696@163.com>
2025-10-13 01:25:42 +00:00
bnellnm
60e419c1ee
[Misc] cache result of disable_inplace ( #26666 )
...
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-10-13 00:17:50 +00:00
Michael Goin
7ef6052804
[CI/Build] Add tool to build vllm-tpu wheel ( #19165 )
...
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-10-12 16:25:40 -06:00
Huamin Li
4fca1a1bd2
[easy] fix pre commit error on trunk ( #26665 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com>
2025-10-12 21:25:34 +00:00
Lukas Geiger
a6049be73c
[Models][Qwen3VL] Speedup fast_pos_embed_interpolate ( #26647 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-10-13 01:20:07 +08:00