youkaichao
|
650b51f9f9
|
[doc] add Context Parallel Deployment doc (#26877)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-10-15 16:33:52 +08:00 |
|
Cyrus Leung
|
6256697997
|
[Doc] ruff format remaining Python examples (#26795)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-15 01:25:49 -07:00 |
|
Wentao Ye
|
71557a5f7c
|
[CI] Fix mypy for vllm/executor (#26845)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-15 01:23:33 -07:00 |
|
Zhewen Li
|
f3c378ffa7
|
[CI/Build] Add Qwen2.5-VL-7B-Instruct ChartQA Accuracy Tests in CI (#21810)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
Signed-off-by: zhewenli <zhewenli@meta.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
Co-authored-by: Ye (Charlotte) Qi <ye.charlotte.qi@gmail.com>
|
2025-10-15 08:09:56 +00:00 |
|
Yongye Zhu
|
f5ed68ef63
|
[Deepseek-V3.2][Kernel] Integrate cuda indexer k cache gather (#26456)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
|
2025-10-15 16:05:01 +08:00 |
|
Angela Yi
|
efdef57b1f
|
[bugfix] Lazy import cv2 (#26869)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2025-10-15 07:47:50 +00:00 |
|
Cyrus Leung
|
b8a4572157
|
[Misc] Use helper function to generate dummy messages in OpenAI MM tests (#26875)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-15 07:17:37 +00:00 |
|
Mengqing Cao
|
302ef403a2
|
[DSA][MLA] Tiny refactor on DeepSeek to make it reusable for different backends (#26656)
Signed-off-by: MengqingCao <cmq0113@163.com>
|
2025-10-15 00:16:44 -07:00 |
|
sangho.lee
|
8865da157b
|
[Bugfix][Multi Modal] Fix incorrect Molmo token processing (#26873)
Signed-off-by: sanghol <sanghol@allenai.org>
|
2025-10-15 07:13:59 +00:00 |
|
Boyuan Feng
|
f0862eae43
|
[Graph Partition] pass tests for decorator (#26831)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-10-15 06:39:48 +00:00 |
|
Isotr0py
|
8c851f6d04
|
[Bugfix] Fix qwen3-omni audio truncation issue (#26815)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-10-15 05:38:36 +00:00 |
|
Angela Yi
|
7cfa420f49
|
[BugFix] Patch inductor partitioning logic (#26735)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2025-10-15 05:04:32 +00:00 |
|
rongfu.leng
|
a27b288e4a
|
[Feature] default --extra-body param to disable thinking in vllm bench serve (#26784)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-10-15 04:23:44 +00:00 |
|
zhrrr
|
e471d7ca7e
|
[CI/Build][Bugfix] fix qutlass cmake error when set QUTLASS_SRC_DIR (#26773)
Signed-off-by: izhuhaoran <izhuhaoran@qq.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-10-15 04:09:44 +00:00 |
|
Michael Yao
|
c43ca8259e
|
[Docs] Move build.inc into arm.inc (#26862)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-10-14 20:35:08 -07:00 |
|
Tao Hui
|
85a65e7f51
|
[Model] Add DeepSeek-V3.1 reasoning parser (split from PR #24972) (#25589)
Signed-off-by: taohui <taohui3@gmail.com>
Signed-off-by: Tao Hui <taohui3@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2025-10-15 11:09:52 +08:00 |
|
kourosh hakhamaneshi
|
a2986b3e33
|
[Bugfix] Fixes prefix-repetition benchmark script (#26828)
Signed-off-by: Kourosh Hakhamaneshi <Kourosh@anyscale.com>
|
2025-10-15 02:54:43 +00:00 |
|
Morrison Turnansky
|
96b9aa5aa0
|
[Frontend][torch.compile] CompilationConfig Overhaul (#20283): name change compilation level to compilation mode, deprecation compilation level (#26355)
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-10-15 02:51:16 +00:00 |
|
Michael Goin
|
e66d787bce
|
Disable FlashInfer sampler by default (#26859)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-10-15 02:35:18 +00:00 |
|
Chendi.Xue
|
bfad142e25
|
[BUGFIX][NIXL] quick fix for 'assert self.connector_worker is not None' in get_kv_connector_stats (#26851)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
|
2025-10-15 02:33:25 +00:00 |
|
Zhikaiiii
|
9354660036
|
[Bugfix]fix Qwen3 xml tool parser (#26345)
Signed-off-by: Zhikaiiii <1658973216@qq.com>
|
2025-10-15 09:50:30 +08:00 |
|
Jialin Ouyang
|
07ca70af8d
|
[Core][Easy] Use envs.__getattr__ for all Unify to environment variable access (#26810)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-10-15 01:41:18 +00:00 |
|
Luka Govedič
|
2dcd12d357
|
[torch.compile] Fix tests for torch==2.9 inductor partition (#26116)
Signed-off-by: ProExpertProg <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
|
2025-10-14 19:55:02 -04:00 |
|
Tyler Michael Smith
|
579d2e5458
|
[WideEP][P/D] Add usage stats for DP+EP and KV Connector (#26836)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2025-10-14 23:51:54 +00:00 |
|
Ye Hu
|
0512c04aee
|
[frontend][gptoss] Add per turn stats into Harmony Context (#25061)
Signed-off-by: lacora <hyelacora@gmail.com>
Co-authored-by: Ye Hu <yehu@fb.com>
|
2025-10-14 16:48:13 -07:00 |
|
Michael Goin
|
7e0ef4084a
|
[CI Failure] Fix torchao dep failure for Quantization Test (#26824)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-10-14 16:41:43 -07:00 |
|
Nick Hill
|
4aed506b65
|
[Core] Streamline some structured output related code (#26737)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-10-14 23:27:44 +00:00 |
|
Boyuan Feng
|
a86b4c58e8
|
remove attn output view kernel (#26680)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
Signed-off-by: Boyuan Feng <fby.1994@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-14 22:53:10 +00:00 |
|
Nick Hill
|
ff4810ba73
|
[Minor] Group async_scheduling related fields in model runner init (#26736)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-10-14 14:46:37 -07:00 |
|
Nan Qin
|
9d6964926e
|
fix: response_format for completion (#23212)
Signed-off-by: Nan2018 <qinnanjoshua@gmail.com>
|
2025-10-14 21:23:22 +00:00 |
|
Dhruvil Bhatt
|
0e65818910
|
Added MoE configs for llama 4, H200 device with tp=4/8 tuning (#26837)
Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>
|
2025-10-14 14:21:03 -07:00 |
|
Jialin Ouyang
|
380f17527c
|
[Perf] Cache vllm.env.__getattr__ result to avoid recomputation (#26146)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-10-14 17:03:21 -04:00 |
|
HDCharles
|
b92ab3deda
|
Notice for deprecation of AutoAWQ (#26820)
Signed-off-by: HDCharles <39544797+HDCharles@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-14 13:39:59 -07:00 |
|
Jialin Ouyang
|
acaa2c0a4a
|
[Core] Reuse empty block lists whenever possible in KVCacheBlocks to mitigate GC costs (#24964)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-10-14 12:58:43 -07:00 |
|
Matthew Bonanni
|
82af928c41
|
[Attention][Spec Decode] FlashMLA spec decode support (#26541)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-10-14 19:38:20 +00:00 |
|
Huamin Li
|
87efc681db
|
llama4_vision_rope: add HIP override to accept (q, k) and avoid (positions, q, k) mismatch (#26790)
Signed-off-by: Huamin Li <3ericli@gmail.com>
|
2025-10-14 11:54:12 -07:00 |
|
Michael Goin
|
c3a722fcb2
|
[CI Failure] Fix tests with missing TinyLlama-1.1B-Chat-v1.0-FP8-e2e (#26816)
Signed-off-by: mgoin <mgoin64@gmail.com>
v0.11.1rc1
|
2025-10-14 18:38:59 +00:00 |
|
Ze'ev Klapow
|
aba48f7db1
|
[Kernel][MoE] Add MoE tunings for GLM 4.6-FP8 and GLM 4.5 Air on NVidia B200 (#26818)
|
2025-10-14 11:20:39 -07:00 |
|
Michael Goin
|
04b5f9802d
|
[CI] Raise VLLM_MAX_SIZE_MB to 500 due to failing Build wheel - CUDA 12.9 (#26722)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-10-14 10:52:05 -07:00 |
|
Reza Barazesh
|
efc8f7d814
|
Update coveragerc and add codecov.yml for path fixes (#26435)
Signed-off-by: Reza Barazesh <rezabarazesh@meta.com>
|
2025-10-14 09:45:06 -07:00 |
|
Wentao Ye
|
6d87a2838c
|
[Config] Remove Unused Environment Variable VLLM_DISABLE_PAD_FOR_CUDAGRAPH (#26743)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-14 11:47:49 -04:00 |
|
wang.yuqi
|
e6cdbd6792
|
Revert "[issues template] Encourage the author implement their own ideas" (#26814)
|
2025-10-14 08:37:34 -07:00 |
|
Chauncey
|
df850c4912
|
[Feature][Responses API] Stream Function Call - harmony (#24317)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-10-14 08:31:43 -07:00 |
|
Qier Li
|
720394de43
|
[KVConnector][Metrics] Aggregate scheduler-side KVConnectorStats (#26046)
Signed-off-by: Qier Li <kevin44036@gmail.com>
|
2025-10-14 14:38:07 +00:00 |
|
wang.yuqi
|
88a49745af
|
[issues template] Encourage the author implement their own ideas (#26671)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-10-14 22:32:36 +08:00 |
|
Boyuan Feng
|
ca683a2a72
|
use combo kernel to fuse qk-norm and qk-rope (#26682)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-10-14 09:40:59 -04:00 |
|
汪志鹏
|
e9f1b8c9e9
|
Adjusted the model order of the model registration file (#26798)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
|
2025-10-14 13:26:11 +00:00 |
|
Jaya Yuan
|
ea97940d6c
|
[DCP] Support Decode Context Parallel (DCP) for GQA with FlashAttention (#24864)
Signed-off-by: yuanyongjie.yyj <yuanyongjie.yyj@antgroup.com>
Signed-off-by: FENP <32334296+FENP@users.noreply.github.com>
Signed-off-by: Jaya Yuan <yuanyongjie.yyj@antgroup.com>
|
2025-10-14 13:07:50 +00:00 |
|
Jee Jee Li
|
fdd32750f0
|
[CI/Build] Cleanup LoRA test (#26752)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-10-14 12:06:35 +00:00 |
|
Vladislav Bronzov
|
c715ba3735
|
[Feature] Change vllm.py with pydantic validation (#26726)
Signed-off-by: Vladislav <vladislav.bronzov@gmail.com>
Signed-off-by: Vladislav Bronzov <58587565+VladOS95-cyber@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-14 12:00:54 +00:00 |
|