Nick Hill
4aed506b65
[Core] Streamline some structured output related code ( #26737 )
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-14 23:27:44 +00:00
Boyuan Feng
a86b4c58e8
remove attn output view kernel ( #26680 )
Signed-off-by: Boyuan Feng <boyuan@meta.com>
Signed-off-by: Boyuan Feng <fby.1994@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-14 22:53:10 +00:00
Nick Hill
ff4810ba73
[Minor] Group async_scheduling related fields in model runner init ( #26736 )
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-14 14:46:37 -07:00
Nan Qin
9d6964926e
fix: response_format for completion ( #23212 )
Signed-off-by: Nan2018 <qinnanjoshua@gmail.com>
2025-10-14 21:23:22 +00:00
Dhruvil Bhatt
0e65818910
Added MoE configs for llama 4, H200 device with tp=4/8 tuning ( #26837 )
Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>
2025-10-14 14:21:03 -07:00
Jialin Ouyang
380f17527c
[Perf] Cache vllm.env.__getattr__ result to avoid recomputation ( #26146 )
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-10-14 17:03:21 -04:00
HDCharles
b92ab3deda
Notice for deprecation of AutoAWQ ( #26820 )
Signed-off-by: HDCharles <39544797+HDCharles@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-14 13:39:59 -07:00
Jialin Ouyang
acaa2c0a4a
[Core] Reuse empty block lists whenever possible in KVCacheBlocks to mitigate GC costs ( #24964 )
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-10-14 12:58:43 -07:00
Matthew Bonanni
82af928c41
[Attention][Spec Decode] FlashMLA spec decode support ( #26541 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-10-14 19:38:20 +00:00
Huamin Li
87efc681db
llama4_vision_rope: add HIP override to accept (q, k) and avoid (positions, q, k) mismatch ( #26790 )
Signed-off-by: Huamin Li <3ericli@gmail.com>
2025-10-14 11:54:12 -07:00
Michael Goin
c3a722fcb2
[CI Failure] Fix tests with missing TinyLlama-1.1B-Chat-v1.0-FP8-e2e ( #26816 )
Signed-off-by: mgoin <mgoin64@gmail.com>
v0.11.1rc1
2025-10-14 18:38:59 +00:00
Ze'ev Klapow
aba48f7db1
[Kernel][MoE] Add MoE tunings for GLM 4.6-FP8 and GLM 4.5 Air on NVidia B200 ( #26818 )
2025-10-14 11:20:39 -07:00
Michael Goin
04b5f9802d
[CI] Raise VLLM_MAX_SIZE_MB to 500 due to failing Build wheel - CUDA 12.9 ( #26722 )
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-10-14 10:52:05 -07:00
Reza Barazesh
efc8f7d814
Update coveragerc and add codecov.yml for path fixes ( #26435 )
Signed-off-by: Reza Barazesh <rezabarazesh@meta.com>
2025-10-14 09:45:06 -07:00
Wentao Ye
6d87a2838c
[Config] Remove Unused Environment Variable VLLM_DISABLE_PAD_FOR_CUDAGRAPH ( #26743 )
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-14 11:47:49 -04:00
wang.yuqi
e6cdbd6792
Revert "[issues template] Encourage the author implement their own ideas" ( #26814 )
2025-10-14 08:37:34 -07:00
Chauncey
df850c4912
[Feature][Responses API] Stream Function Call - harmony ( #24317 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-10-14 08:31:43 -07:00
Qier Li
720394de43
[KVConnector][Metrics] Aggregate scheduler-side KVConnectorStats ( #26046 )
Signed-off-by: Qier Li <kevin44036@gmail.com>
2025-10-14 14:38:07 +00:00
wang.yuqi
88a49745af
[issues template] Encourage the author implement their own ideas ( #26671 )
Signed-off-by: wang.yuqi <noooop@126.com>
2025-10-14 22:32:36 +08:00
Boyuan Feng
ca683a2a72
use combo kernel to fuse qk-norm and qk-rope ( #26682 )
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-10-14 09:40:59 -04:00
汪志鹏
e9f1b8c9e9
Adjusted the model order of the model registration file ( #26798 )
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
2025-10-14 13:26:11 +00:00
Jaya Yuan
ea97940d6c
[DCP] Support Decode Context Parallel (DCP) for GQA with FlashAttention ( #24864 )
Signed-off-by: yuanyongjie.yyj <yuanyongjie.yyj@antgroup.com>
Signed-off-by: FENP <32334296+FENP@users.noreply.github.com>
Signed-off-by: Jaya Yuan <yuanyongjie.yyj@antgroup.com>
2025-10-14 13:07:50 +00:00
Jee Jee Li
fdd32750f0
[CI/Build] Cleanup LoRA test ( #26752 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-14 12:06:35 +00:00
Vladislav Bronzov
c715ba3735
[Feature] Change vllm.py with pydantic validation ( #26726 )
Signed-off-by: Vladislav <vladislav.bronzov@gmail.com>
Signed-off-by: Vladislav Bronzov <58587565+VladOS95-cyber@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-14 12:00:54 +00:00
Cyrus Leung
9c4cb68339
[Chore] Remove SupportsV0Only interface and update supported models docs ( #26783 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-14 04:55:10 -07:00
Chauncey
780eb03d9b
[CI] Fix test_tool_id_kimi_k2 ( #26787 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-10-14 10:27:07 +00:00
Cyrus Leung
ef9676a1f1
[Doc] ruff format some Python examples ( #26767 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-14 03:21:53 -07:00
Harry Mellor
70b1b330e1
Don't allow typos to fix by default ( #26785 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-14 03:05:15 -07:00
Cyrus Leung
d1d063a588
[Chore] Use max_transformers_version for Qwen-VL test ( #26792 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-14 03:03:46 -07:00
Chendi.Xue
7e6edb1469
[NIXL][HeteroTP] Enable KV transfer from HND prefill to NHD decode ( #26556 )
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
2025-10-14 09:46:05 +00:00
Cyrus Leung
74704d4553
[Model] Use merge_by_field_config for MM models (O-P) ( #26776 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-14 09:42:45 +00:00
Cyrus Leung
d2f816d6ff
[Bugfix] Standardize merging multimodal embeddings ( #26771 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-14 09:36:21 +00:00
wangxiyuan
577d498212
[Plugin] Make plugin group clear ( #26757 )
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-10-14 07:49:59 +00:00
Max Wittig
fd85c9f426
[Bugfix][FE]: Always include usage with --enable-force-include-usage ( #20983 )
Signed-off-by: Max Wittig <max.wittig@siemens.com>
Signed-off-by: Antoine Auger <antoineauger@users.noreply.github.com>
Co-authored-by: Antoine Auger <antoineauger@users.noreply.github.com>
2025-10-14 09:17:39 +02:00
Ye (Charlotte) Qi
d32c611f45
[CI/Build] Use 127.0.0.1 instead of localhost in utils ( #26750 )
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-10-14 07:04:00 +00:00
CSWYF3634076
01ad27faff
[Model][Bugfix]fix ernie45 load failed due to ernie45 eplb code ( #26684 )
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
2025-10-14 06:55:23 +00:00
Ryan Li
481545b397
scheduler.py: Update the name of the default scheduler. ( #26758 )
Signed-off-by: Ryan Li <ryanli@ryanli.org>
2025-10-14 06:52:21 +00:00
Alexei-V-Ivanov-AMD
d3cc8427c0
[ci] Adding the test-amd.yaml for test definitions for the AMD backend. (alternative PR) ( #26718 )
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2025-10-13 23:10:23 -07:00
vllmellm
4821ac1b4d
[CI] [ROCm] Automate CC list for ROCm related issue ( #26753 )
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-10-14 13:57:26 +08:00
XiongfeiWei
4497c8f821
Fix lora tests failure in TPU CI due to the removal of LoRA bias ( #26723 )
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-10-14 13:04:23 +08:00
Michael Yao
2e36cdbe2b
[Docs] Add a start tag to build.inc.md ( #26747 )
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-10-13 21:51:55 -07:00
Maximilien de Bayser
fe3edb4cf0
Add support for the /rerank endpoint in vllm bench serve ( #26602 )
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-10-14 04:25:43 +00:00
Heng Guo
29350922c6
[Feature][Quantization] auto_round format add support for regex ( #24024 )
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: Heng Guo <heng.guo@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-14 03:03:16 +00:00
Varun Sundar Rabindranath
8ae169286f
[torch.compile] Unwrap fused_marlin_moe custom op ( #26739 )
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-10-14 02:22:16 +00:00
youkaichao
8a0af6a561
[build][torch.compile] upgrade depyf version ( #26702 )
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-10-14 10:12:09 +08:00
Jialin Ouyang
cfded80793
[Easy] Fix env type check errors from VLLM_DEBUG_LOG_API_SERVER_RESPONSE ( #26742 )
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-10-14 01:46:44 +00:00
Angela Yi
b59dd19b55
[compile] Enable sequence parallelism for full cuda graph without specifying compile sizes ( #26681 )
Signed-off-by: angelayi <yiangela7@gmail.com>
2025-10-13 18:15:34 -07:00
Michael Goin
3e051bda82
[UX] Replace VLLM_ALL2ALL_BACKEND with --all2all-backend ( #26732 )
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-10-13 18:12:52 -07:00
Lucia Fang
8317f72354
[Misc][DP] support customized aggregated logger for dp ( #24354 )
Signed-off-by: Lu Fang <fanglu@fb.com>
2025-10-13 17:45:59 -07:00
Maximilien de Bayser
d8bebb008a
Add tests for chunked prefill and prefix cache with causal pooling models ( #26526 )
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Ayush Singh <ayush1009208@gmail.com>
2025-10-14 07:45:04 +08:00