12045 Commits

Author SHA1 Message Date
Laith Sakka
87aee9ed2b
Add evaluate_guards option to DynamicShapesConfig (#27432)
Signed-off-by: Laith Sakka <lsakka@meta.com>
2025-12-08 10:46:15 -05:00
Daniel Cámpora
184076c3fe
[DeepSeek v3.2] Make top-k work for any logit values. (#27568)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-08 06:55:58 -08:00
Ye (Charlotte) Qi
eb1051fb95
[ROCm] Guard group quant RMS norm fusion patterns (#30239)
2025-12-08 14:44:48 +00:00
Jee Jee Li
80433e225e
[LoRA] Reduce the loading time of MoE LoRA (#30243)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-08 13:29:47 +00:00
Harry Mellor
5c2433a6f3
Add tip for mypy and markdownlint to the pre-commit comment (#30259)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-08 13:11:51 +00:00
Simon Mo
77072e93b3
[docs] governance documents (#24801)
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-12-08 12:06:20 +00:00
wang.yuqi
2e660c2434
[Frontend] Binary embedding response does not return metadata by setting encoding_format to bytes_only. (#30249)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-08 12:01:21 +00:00
Shiming Zhang
408cf42f67
[CI] Prevents triggering of an inactive issue/PR check for forked repository. (#29654)
Signed-off-by: Shiming Zhang <wzshiming@hotmail.com>
2025-12-08 10:29:14 +00:00
wang.yuqi
9e77ffca3f
[Model][7/N] Improve all pooling task | Deprecation as_reward_model. Extract hidden states prefer using new multi-vector retrieval API (#26686)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-08 08:10:09 +00:00
Dazhi Jiang
bcb6f5947f
[Perf] Remove sync point in vit torch sdpa attn backend (#30232)
Signed-off-by: Dazhi Jiang <dazhi_jiang@163.com>
2025-12-08 07:12:42 +00:00
Zhiyu
cd00c443d2
[Misc] Rename TensorRT Model Optimizer to Model Optimizer (#30091)
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
2025-12-08 07:05:27 +00:00
Jiangyun Zhu
d143271234
[Bugfix] fix fuse_allreduce_rms when tp =1 (#30178)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-12-08 06:43:47 +00:00
Zhiwei
c6df05ebb4
[ROCm] [Fused Moe EP] Use binary expert mask for aiter fused moe kernel (#29773)
Signed-off-by: ZhiweiYan-96 <zhiwei.yan@amd.com>
2025-12-08 05:23:46 +00:00
Nick Hill
d726a7b0ed
[BugFix] Unblock use of LoRA with data parallel mode (#30220)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-12-08 12:21:05 +08:00
Zhijian Jiang
344b50d525
Address comment to mergify.yml in #30117 (#30219)
Signed-off-by: Zhijian Jiang <Zhijian.Jiang@outlook.com>
2025-12-08 11:26:25 +08:00
Andrew Xia
735284ed86
[responsesAPI][7] Browser, Container MCP tools for non harmony models (#29989)
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-08 10:04:03 +08:00
daniel-salib
444f0e3f33
[Frontend] Add MCP type support infrastructure to Responses API (#30054)
Signed-off-by: Daniel Salib <danielsalib@meta.com>
2025-12-08 10:02:52 +08:00
ElizaWszola
af0444bf40
[Performance] Fused blockwise quant RMS norm (#27883)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
2025-12-07 16:38:04 +00:00
Lucas Wilkinson
0044c4038c
[BugFix][DeepSeek-V3.2] Fix backend selection logic for Blackwell (#30195)
2025-12-07 10:53:51 -05:00
Isotr0py
b952f4d3c3
[v1] Add PrefixLM support to FlexAttention backend (#27938)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-07 15:51:36 +00:00
Wentao Ye
541a2ef892
[Perf] Deepgemm fused layout kernel for activations, 4.3% throughput improvement, 10.7% TTFT improvement. (#29546)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-07 20:31:14 +08:00
Jee Jee Li
b0f4866a77
[CI/Build]Temporary workaround for test_default_mm_loras timeout (#30202)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-07 20:27:11 +08:00
Jinzhen Lin
879ddb09c3
[Kernel][MoE] optimize moe_align_block_size (#29642)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-07 01:58:47 -08:00
Yifan Qiao
1b0482b9d1
[Misc][Core] Remove unused req_index increment in scheduler (#30176)
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
2025-12-07 08:39:21 +00:00
Cyrus Leung
e83b7e379c
Revert "[Renderer] Separate out RendererConfig from ModelConfig (#30145)" (#30199)
2025-12-07 00:00:22 -08:00
Cyrus Leung
27f4c2fd46
[Renderer] Separate out RendererConfig from ModelConfig (#30145)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-06 23:15:42 -08:00
Luke
a49d813fa8
Lazy loading to avoid importing all files (#29716)
Signed-off-by: Luke <yq0536@gmail.com>
2025-12-07 07:13:14 +00:00
Wentao Ye
17eb25e327
[Perf] Enable cuda graph for deepepHT, 5.3% throughput improvement, 4.4% TTFT improvement (#29558)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-07 04:44:50 +00:00
jeremyteboul
dce6d229f7
Support multiple image/audio embeddings per requests (#29988)
Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com>
Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>
2025-12-07 04:34:24 +00:00
Yanan Cao
cbedb703cc
[Frontend] Remove confusing -O.xx flag error (#30169)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2025-12-07 02:53:42 +00:00
AuruTus
8d3da4c79d
[MISC]: change NIXL compatibility hash logging level to debug (#30182)
2025-12-07 00:21:03 +00:00
Andrew Xia
421125d03a
[ez] move harmony utils to parser folder (#30117)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-12-06 17:34:34 -05:00
Cyrus Leung
671427efbf
[Model] Move multimodal_cpu_fields definition to field config (#30181)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-06 13:40:02 +00:00
Viacheslav
21bb323542
Gigachat 3 tool parser and tests (#29905)
Signed-off-by: Viacheslav Barinov <viacheslav.teh@gmail.com>
2025-12-06 12:04:14 +00:00
Chukwuma Nwaugha
17a9abec2b
simplify requires_files list creation (#29656)
Signed-off-by: Chukwuma Nwaugha <nwaughac@gmail.com>
2025-12-06 09:42:41 +00:00
Ye (Charlotte) Qi
92c35abb24
[Misc] Fix circular import in vllm.transformers_utils.config (#30179)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-12-06 09:24:03 +00:00
Yu Jiaqi
43e7593031
Support tokenization_kwargs override (#29794)
Signed-off-by: piood <2477084691@qq.com>
2025-12-06 09:12:53 +00:00
Cyrus Leung
c46b932df2
[Chore] Deprecate SupportsMultiModal.merge_by_field_config (#30170)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-06 07:57:28 +00:00
redwrasse
6476382384
prefix caching design doc sha256 now default (#29261)
Signed-off-by: redwrasse <mail@redwrasse.io>
2025-12-06 07:39:56 +00:00
kx
d6aeaddf4a
[bugfix] fix type[AttentionBackend] bug in kv_connector_base_v1 (#30051)
Signed-off-by: 01267596 <xiongkai123@cmbchina.com>
Co-authored-by: 01267596 <xiongkai123@cmbchina.com>
2025-12-06 07:11:31 +00:00
Woosuk Kwon
a238cbd89d
[Model Runner V2] Support min-p sampling (#30171)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-12-05 21:42:47 -08:00
Nick Hill
4026ae31e9
[Misc] Move disable_nccl_for_dp_synchronization init logic into VllmConfig (#30161)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-12-05 20:59:04 -08:00
rasmith
b12f4a9830
[CI/Build][AMD] Use ROCM_ATTN instead of FLASH_ATTN test for test_register_kv_caches for ROCm and update test for TRITON_ATTN (#29985)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2025-12-05 20:57:38 -08:00
Rohan Potdar
40a046cd82
[Bugfix]: Fix TokenizerLike interface (#30009)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
2025-12-05 20:56:40 -08:00
Peter Salas
e858bc4d14
[Model] Add support for transformer-based Ultravox v0.7 projector (#30089)
Signed-off-by: Peter Salas <peter@fixie.ai>
2025-12-05 20:55:43 -08:00
Dongjie Zou
e3fbb6f152
fix#30092 Kimi-Linear model loading failure with missing indexer_rotary_emb (#30093)
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
2025-12-05 20:55:09 -08:00
yuttian1
c4d62618ca
Fix AWQ MoE marlin check issue in marlin_utils.py for AMD backend (#30102)
Signed-off-by: yuttian1 <yuttian@amd.com>
2025-12-05 20:54:38 -08:00
rasmith
62079d8600
[CI/Build][AMD] Skip marlin, machete, and hadacore tests since these require _C functions not defined for ROCm (#30109)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-12-06 12:54:17 +08:00
Harry Mellor
bf4a901af9
Better error when world size is larger than node and distributed_executor_backend is not set (#30140)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-05 20:53:52 -08:00
Samuel Shen
7e31c3a3f6
[CI]: Remove unnecessary imports from test_lmache_integration (#30157)
Signed-off-by: Samuel Shen <slshen@uchicago.edu>
Co-authored-by: Samuel Shen <slshen@uchicago.edu>
2025-12-06 12:53:34 +08:00