12027 Commits

Author SHA1 Message Date
Lucas Wilkinson
0044c4038c
[BugFix][DeepSeek-V3.2] Fix backend selection logic for Blackwell (#30195) 2025-12-07 10:53:51 -05:00
Isotr0py
b952f4d3c3
[v1] Add PrefixLM support to FlexAttention backend (#27938)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-07 15:51:36 +00:00
Wentao Ye
541a2ef892
[Perf] Deepgemm fused layout kernel for activations, 4.3% throughput improvement, 10.7% TTFT improvement. (#29546)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-07 20:31:14 +08:00
Jee Jee Li
b0f4866a77
[CI/Build]Temporary workaround for test_default_mm_loras timeout (#30202)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-07 20:27:11 +08:00
Jinzhen Lin
879ddb09c3
[Kernel][MoE] optimize moe_align_block_size (#29642)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-07 01:58:47 -08:00
Yifan Qiao
1b0482b9d1
[Misc][Core] Remove unused req_index increment in scheduler (#30176)
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
2025-12-07 08:39:21 +00:00
Cyrus Leung
e83b7e379c
Revert "[Renderer] Separate out RendererConfig from ModelConfig (#30145)" (#30199) 2025-12-07 00:00:22 -08:00
Cyrus Leung
27f4c2fd46
[Renderer] Separate out RendererConfig from ModelConfig (#30145)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-06 23:15:42 -08:00
Luke
a49d813fa8
Lazy loading to avoid importing all files (#29716)
Signed-off-by: Luke <yq0536@gmail.com>
2025-12-07 07:13:14 +00:00
Wentao Ye
17eb25e327
[Perf] Enable cuda graph for deepepHT, 5.3% throughput improvement, 4.4% TTFT improvement (#29558)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-07 04:44:50 +00:00
jeremyteboul
dce6d229f7
Support multiple image/audio embeddings per requests (#29988)
Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com>
Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>
2025-12-07 04:34:24 +00:00
Yanan Cao
cbedb703cc
[Frontend] Remove confusing -O.xx flag error (#30169)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2025-12-07 02:53:42 +00:00
AuruTus
8d3da4c79d
[MISC]: change NIXL compatibility hash logging level to debug (#30182) 2025-12-07 00:21:03 +00:00
Andrew Xia
421125d03a
[ez] move harmony utils to parser folder (#30117)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-12-06 17:34:34 -05:00
Cyrus Leung
671427efbf
[Model] Move multimodal_cpu_fields definition to field config (#30181)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-06 13:40:02 +00:00
Viacheslav
21bb323542
Gigachat 3 tool parser and tests (#29905)
Signed-off-by: Viacheslav Barinov <viacheslav.teh@gmail.com>
2025-12-06 12:04:14 +00:00
Chukwuma Nwaugha
17a9abec2b
simplify requires_files list creation (#29656)
Signed-off-by: Chukwuma Nwaugha <nwaughac@gmail.com>
2025-12-06 09:42:41 +00:00
Ye (Charlotte) Qi
92c35abb24
[Misc] Fix circular import in vllm.transformers_utils.config (#30179)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-12-06 09:24:03 +00:00
Yu Jiaqi
43e7593031
Support tokenization_kwargs override (#29794)
Signed-off-by: piood <2477084691@qq.com>
2025-12-06 09:12:53 +00:00
Cyrus Leung
c46b932df2
[Chore] Deprecate SupportsMultiModal.merge_by_field_config (#30170)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-06 07:57:28 +00:00
redwrasse
6476382384
prefix caching design doc sha256 now default (#29261)
Signed-off-by: redwrasse <mail@redwrasse.io>
2025-12-06 07:39:56 +00:00
kx
d6aeaddf4a
[bugfix] fix type[AttentionBackend] bug in kv_connector_base_v1 (#30051)
Signed-off-by: 01267596 <xiongkai123@cmbchina.com>
Co-authored-by: 01267596 <xiongkai123@cmbchina.com>
2025-12-06 07:11:31 +00:00
Woosuk Kwon
a238cbd89d
[Model Runner V2] Support min-p sampling (#30171)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-12-05 21:42:47 -08:00
Nick Hill
4026ae31e9
[Misc] Move disable_nccl_for_dp_synchronization init logic into VllmConfig (#30161)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-12-05 20:59:04 -08:00
rasmith
b12f4a9830
[CI/Build][AMD] Use ROCM_ATTN instead of FLASH_ATTN test for test_register_kv_caches for ROCm and update test for TRITON_ATTN (#29985)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2025-12-05 20:57:38 -08:00
Rohan Potdar
40a046cd82
[Bugfix]: Fix TokenizerLike interface (#30009)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
2025-12-05 20:56:40 -08:00
Peter Salas
e858bc4d14
[Model] Add support for transformer-based Ultravox v0.7 projector (#30089)
Signed-off-by: Peter Salas <peter@fixie.ai>
2025-12-05 20:55:43 -08:00
Dongjie Zou
e3fbb6f152
fix#30092 Kimi-Linear model loading failure with missing indexer_rotary_emb (#30093)
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
2025-12-05 20:55:09 -08:00
yuttian1
c4d62618ca
Fix AWQ MoE marlin check issue in marlin_utils.py for AMD backend (#30102)
Signed-off-by: yuttian1 <yuttian@amd.com>
2025-12-05 20:54:38 -08:00
rasmith
62079d8600
[CI/Build][AMD] Skip marlin, machete, and hadacore tests since these require _C functions not defined for ROCm (#30109)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-12-06 12:54:17 +08:00
Harry Mellor
bf4a901af9
Better error when world size is larger than node and distributed_executor_backend is not set (#30140)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-05 20:53:52 -08:00
Samuel Shen
7e31c3a3f6
[CI]: Remove unnecessary imports from test_lmache_integration (#30157)
Signed-off-by: Samuel Shen <slshen@uchicago.edu>
Co-authored-by: Samuel Shen <slshen@uchicago.edu>
2025-12-06 12:53:34 +08:00
rasmith
dc839ad03d
[CI/Build][AMD][Quantization] Fix test_int8_kernel.py by updating int8_utils to use hip.libdevice.round (#30151)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-12-05 20:52:11 -08:00
Deboleina
02a4169193
[Tests] Tool call tests for openai/gpt-oss-20b (#26237)
Signed-off-by: Debolina Roy <debroy@redhat.com>
2025-12-05 19:03:29 -08:00
Wentao Ye
7b5575fa7d
[Bug] Fix vLLM config is not set error (#29999)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-05 16:42:12 -05:00
Bangsheng Tang
77e4472809
let draft model follow target model's config_format (#30152) 2025-12-05 13:33:42 -08:00
Divakar Verma
962d703818
[Bugfix][llama4_eagle] Fix missing 'lm_head' attribute (#29926)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-12-05 19:57:26 +00:00
Nicolò Lucchesi
e23ca3a0e8
[CI] Re-use whisper_client for all tests (#30148)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-05 19:47:37 +00:00
Russell Bryant
3633035a3f
[Misc] Rename CohereForAI references to CohereLabs (#30147)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-12-05 19:41:40 +00:00
Nicolò Lucchesi
bff78310d9
[Enc-Dec] Fix OOT tokenizer issue (#30144)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-05 19:23:33 +00:00
Tova Movshovitz
adb315060c
[KVConnector][Feature] Support KV connector cache reset via /reset_prefix_cache (#27170)
Signed-off-by: tovam <tovam@pliops.com>
Signed-off-by: Tova Movshovitz <tovam@pliops.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-05 18:33:26 +00:00
Ilya Markov
4e26d3b09e
[Compile] Conditional compilation. Introduce compile_ranges (#24252)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Luka Govedič <luka.govedic@gmail.com>
Signed-off-by: ProExpertProg <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
2025-12-05 18:17:32 +00:00
Matthew Bonanni
66e674cdd5
[Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
2025-12-05 09:48:43 -08:00
Mark McLoughlin
dff0a2b394
[NIXL] Add remote_request_id to kv_transfer_params (#29665)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-12-05 09:43:48 -08:00
Nick Hill
dc264bcea1
[BugFix] Eagerly abort cancelled final-step requests (#29987)
Currently, when requests are cancelled while executing their final
step, "completion" is handled based on normal stop processing (e.g.
length or stop token), so the abort has no effect. This is typically
not a problem, but when a kv connector is involved it thinks the
request completed successfully rather than being aborted.

This is problematic for disaggregated prefill which will free kv
cache blocks if the request was aborted but not if it completed
successfully—since the cancelled request will never be sent to
the decode side, kv cache blocks remain pinned until the fall-back
timeout expires. The problem is exacerbated when many requests
are cancelled and/or there are large prefills whose forward pass
takes a long time (since the window is bigger).

This PR fixes the problem by processing pending aborts
immediately prior to processing model output each step; we process
only aborts, not new requests, since it's preferable for latency to
process model outputs before new incoming requests.
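
The ordering described above can be illustrated with a toy sketch.
This is not the vLLM implementation: ToyEngine, its inbox queue, and
the freed list are hypothetical names invented for this example. It
only shows pending aborts being drained right before model output is
handled, while new requests stay queued until later.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    req_id: str
    max_tokens: int
    output: list = field(default_factory=list)
    status: str = "running"  # "running" | "finished" | "aborted"


class ToyEngine:
    """Hypothetical engine loop; all names here are illustrative only."""

    def __init__(self):
        self.inbox = deque()  # items: ("add", Request) or ("abort", req_id)
        self.running = {}     # req_id -> Request
        self.freed = []       # ids whose KV blocks were released on abort

    def _drain_aborts(self):
        # Handle only aborts; keep everything else queued in arrival order.
        kept = deque()
        while self.inbox:
            kind, payload = self.inbox.popleft()
            if kind == "abort":
                req = self.running.pop(payload, None)
                if req is not None:  # aborts for unknown ids are dropped
                    req.status = "aborted"
                    self.freed.append(payload)
            else:
                kept.append((kind, payload))
        self.inbox = kept

    def process_output(self, sampled):
        # Aborts are applied first, so a request cancelled during its
        # final step is reported as aborted rather than finished.
        self._drain_aborts()
        finished = []
        for req_id, token in sampled.items():
            req = self.running.get(req_id)
            if req is None:
                continue  # aborted while the forward pass was running
            req.output.append(token)
            if len(req.output) >= req.max_tokens:
                req.status = "finished"
                del self.running[req_id]
                finished.append(req_id)
        return finished


engine = ToyEngine()
a = Request("a", max_tokens=1)
engine.running["a"] = a
# The client cancels "a" while its final step is executing, and a new
# request "b" arrives in the same window.
engine.inbox.append(("abort", "a"))
engine.inbox.append(("add", Request("b", max_tokens=4)))
finished = engine.process_output({"a": 7})
```

In this sketch "a" ends up aborted with its blocks listed in freed,
even though the sampled token would otherwise have completed it, and
the "add" for "b" is still waiting in the inbox.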

Fixes #26400.

Signed-off-by: Nick Hill <nhill@redhat.com>
2025-12-05 17:28:32 +00:00
Nicolò Lucchesi
78c44fd722
[NIXL] Small cleanup of unused variables (#29618)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-05 18:17:36 +01:00
Angela Yi
e7296b08da
[bugfix] Pass globals to aot_compiled function (#29428)
Signed-off-by: angelayi <yiangela7@gmail.com>
2025-12-05 16:54:26 +00:00
Andrew Xia
da7bc54ea8
[responsesAPI][5] ResponsesParser with tools for full MCP python loop (#29798)
Signed-off-by: Andrew Xia <axia@fb.com>
Signed-off-by: Andrew Xia <axia@meta.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-12-05 11:11:50 -05:00
Mark McLoughlin
949a6a19d2
[NIXL] Add compatibility checking to NIXL KV connector handshake (#29503)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-12-05 15:52:45 +01:00
Alec S
2c174420f5
Reduce validation to a warning (#28749)
Signed-off-by: Alec Solder <alecs@fb.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Alec Solder <alecs@fb.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-05 14:02:49 +00:00