Johnny Yang
3ecabd06ee
Fix tpu-inference platform path (#29554)
Signed-off-by: Johnny Yang <johnnyyang@google.com>
2025-11-26 23:25:21 -08:00
Jee Jee Li
c069086b9c
[Bugfix] Fix getting device for MoE LoRA (#29475)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-26 23:16:07 -08:00
Woosuk Kwon
11ea5ec1ff
[Model Runner V2] Refactor CudaGraphManager (#29583)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-26 21:37:59 -08:00
Fadi Arafeh
ecb1952378
[cpu][fix] Fix Arm CI tests (#29552)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-11-27 13:09:41 +08:00
TJian
da8e1a1bf9
[DOC] Add vLLM Bangkok Meetup info (#29561)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-11-27 04:42:50 +00:00
Woosuk Kwon
ee80aee1ca
[Model Runner V2] Minor cleanup for build_attn_metadata (#29576)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-26 20:10:12 -08:00
Woosuk Kwon
0aeb698b77
[Model Runner V2] Minor code cleanup (#29570)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-26 19:47:17 -08:00
Louie Tsai
9bb33c8919
add xpu supported model and model id for cpu (#29380)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
2025-11-27 11:30:50 +08:00
Jinzhen Lin
a67dec7cba
[Bugfix] fix IMA issue in certain cases of the moe marlin kernel (#28619)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-11-26 19:02:21 -08:00
Matthew Bonanni
77740191de
[Attention][Async] Eliminate seq_lens_cpu in FlashAttention metadata building with DCP > 1 (#29449)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-26 18:48:43 -08:00
HDCharles
df01eda4dc
[Bugfix] Make compressed-tensors MoEs respect ignored layers (#28878)
Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com>
2025-11-26 21:35:13 -05:00
Johnny Yang
ba1fcd84a7
[TPU] add tpu_inference (#27277)
Signed-off-by: Johnny Yang <johnnyyang@google.com>
2025-11-26 14:46:36 -08:00
Lucas Wilkinson
56539cddac
[Core] Refactor padding logic and pad for CUDA graphs before attention metadata building (#28579)
2025-11-26 14:07:13 -05:00
Matthew Bonanni
430dd4d9eb
[Attention] Remove imports from vllm/attention/__init__.py (#29342)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-26 10:53:15 -07:00
Alec
c4c0354eec
[CI/Build] allow user modify pplx and deepep ref by ENV or command line (#29131)
Signed-off-by: alec-flowers <aflowers@nvidia.com>
2025-11-26 17:41:16 +00:00
HDCharles
e603129505
[refactor] CTConfig methods to static/class methods (#28870)
Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-26 17:21:58 +00:00
Wentao Ye
0b0aa874e8
[Perf] Optimize batch invariant BMM, 18.1% Throughput improvement, 10.7% TTFT improvement (#29345)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-26 09:38:52 -07:00
Huamin Li
70d5953f82
Revert "[Bugfix] Fix GPT-OSS AR+NORM fusion (#28841)" (#29483)
Signed-off-by: Huamin Li <3ericli@gmail.com>
2025-11-26 22:27:26 +08:00
yxt
3650a74ed8
Optimize the wording of the document and unify the terminology and th… (#29491)
2025-11-26 05:16:12 -08:00
Yejing Lai
bb706d6048
Fix TeleChatForCausalLM not register issue (#29473)
Signed-off-by: Lai, Yejing <yejing.lai@intel.com>
2025-11-26 05:15:00 -08:00
Cyrus Leung
e30859dff3
[Bugfix] Fix handling of image embeds in models (#29480)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-26 05:00:15 -08:00
Roger Wang
452a7c9f7c
[Misc] Allow LM only loading for Pixtral (#29451)
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-11-26 05:00:00 -08:00
Pleaplusone
d9d342d214
[Performance][MLA][ROCm] Remove redundant D2D copy in deepseek (#27457)
Signed-off-by: ganyi <ygan@amd.com>
2025-11-26 12:45:28 +08:00
Xin Yang
53d7f1f601
[Kernel] Use pre-allocated output buffer for triton kernel fused_experts (#29219)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2025-11-26 10:21:00 +08:00
dependabot[bot]
c5ee430328
Bump actions/checkout from 4 to 6 (#29293)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-26 01:57:08 +00:00
Michael Goin
8d6a89dffd
[UX] Suppress gloo log spam (#29250)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-11-25 17:19:35 -08:00
George D. Torres
56531b79cc
[Misc] Add backup hash algorithm for FIPS constrained environments (#28795)
Signed-off-by: George D. Torres <gdavtor@gmail.com>
Signed-off-by: George D. Torres <41129492+geodavic@users.noreply.github.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-11-26 00:50:22 +00:00
Xieyang Xu
12866af748
dummy run corner case (#29433)
2025-11-26 00:20:35 +00:00
Lucia Fang
d8819c88eb
fix assertion for single world use case (uni) (#29429)
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
2025-11-26 00:14:23 +00:00
Andrey Khalyavin
de75b0bb70
[BugFix] Fix initialization of draft model. (#29319)
Signed-off-by: Andrey Khalyavin <halyavin@yandex-team.ru>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2025-11-25 18:45:58 -05:00
Michael Goin
7df0289782
Change warning logs to debug for unimplemented MXFP4 Linear/Attention (#29441)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-11-25 22:52:31 +00:00
Zhengxu Chen
0abc79482a
[caching] Add enable_prompt_embeds and cpu_offload_gb to compile hashes. (#29435)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
2025-11-25 21:46:41 +00:00
Nick Hill
4e57c6587f
[Core] Support logprobs with spec decode + async scheduling (#29223)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-25 12:55:24 -08:00
Ilya Markov
e7d776273d
[Compile] Refactor. Move PostGradPassManager out of Compilation config (#29340)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
2025-11-25 19:58:56 +00:00
Eldar Kurtić
c32a18cbe7
Attempt to fix GPU OOM in a spec-decoding test (#29419)
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
2025-11-25 14:23:36 -05:00
Andrew Xia
b07555d26f
[responsesAPI][2] parse ResponseFunctionToolCallOutputItem (#29383)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-11-25 10:27:26 -08:00
Harry Mellor
0353d2e162
Fix RoPE related failures in Transformers nightly tests (#29333)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-25 16:23:45 +00:00
Harry Mellor
a1f2676879
Scheduled removal of override_pooler_config and disable_log_requests (#29402)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-25 16:08:57 +00:00
Yifan Qiao
48ddb02b79
[Hybrid Allocator] Support KV cache groups with different block_size (#29143)
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
2025-11-25 10:30:57 -05:00
Michael Goin
e502098643
[Kernel] Add NVFP4 MoE CUTLASS support for SM120 (#29242)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2025-11-25 06:59:07 -08:00
Michael Goin
dbc3d9991a
[UX] Put CUDA attention backend selection log into one line (#29337)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-11-25 06:46:18 -08:00
Injae Ryou
794029f012
[Feature]: Improve GGUF loading from HuggingFace user experience like repo_id:quant_type (#29137)
Signed-off-by: Injae Ryou <injaeryou@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-25 14:28:53 +00:00
Eldar Kurtić
0231ce836a
Revert back to torch.equal over torch.allclose from #28819 (#29086)
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
2025-11-25 14:23:38 +00:00
Thomas Parnell
516c3f7847
[Bugfix] Fix logic for choosing default prefix caching setting (#29393)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-11-25 14:05:10 +00:00
Harry Mellor
51fc9e017a
Scheduled removal of CompilationConfig.use_inductor (#29323)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-25 12:55:42 +00:00
Harry Mellor
bf0c75cd4f
Make Transformers Nightly tests soft-fail and enable all tests (#29401)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-25 12:41:15 +00:00
Roger Wang
c2c661af9b
[Bugfix] Fix overallocation in MM profiling (#29386)
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-11-25 12:38:36 +00:00
Nicolò Lucchesi
798e87db5c
[Core] Generalize Encoder-Decoder seq_lens computation to avoid Whisper hardcoded logic (#29268)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
2025-11-25 11:32:11 +00:00
wang.yuqi
de6889946b
[Misc] Suppress log outputs when constructing the default vllm config. (#29291)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-25 03:00:44 -08:00
wang.yuqi
7a80b01889
[CI] Resettle pooling entrypoints tests. (#29370)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-11-25 10:39:10 +00:00