Woosuk Kwon
c11d1e6781
optimize spec
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-31 16:40:54 -07:00
Woosuk Kwon
e696f78e05
minor
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-31 13:29:58 -07:00
Woosuk Kwon
efcb786d52
merge
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-31 10:44:36 -07:00
Didier Durand
9701352e4b
[Doc]: fix typos in Python comments (#24001)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-08-31 08:21:59 +00:00
Andy Lo
038e9be4eb
[LoRA] Much faster startup when LoRA is enabled (#23777)
Signed-off-by: Andy Lo <andy@mistral.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-30 15:37:39 +00:00
wang.yuqi
d9e00dbd1f
[Performance] V1 Classify Models E2E Performance Optimization (#23541)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-08-29 03:12:32 -07:00
Woosuk Kwon
9ee9d0e274
fix
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-28 15:02:07 -07:00
Yong Hoon Shin
cb293f6a79
[V1] Enable prefill optimization for Gemma3n (#22628)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-08-28 14:54:30 -07:00
Woosuk Kwon
405578121c
minor
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-28 13:19:10 -07:00
Woosuk Kwon
19c0dfc469
minor
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-28 13:08:07 -07:00
Woosuk Kwon
e451045a66
fix
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-28 12:55:13 -07:00
Woosuk Kwon
efba25e21a
minor
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-28 12:39:15 -07:00
Woosuk Kwon
e570b0a4de
merge
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-27 21:45:11 -07:00
Wentao Ye
321938e9ac
[Feature] Add VLLM_DISABLE_PAD_FOR_CUDAGRAPH to Avoid Hang Issue (#23595)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-27 21:52:24 +00:00
Hyogeun Oh (오효근)
4e4d017b6f
[Docs] Fix warnings in mkdocs build (continued) (#23743)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
Signed-off-by: Hyogeun Oh (오효근) <ohg3417@gmail.com>
2025-08-27 17:17:29 +00:00
Cyrus Leung
52883ed084
[Model] Merge SupportsMultiModalWithRawInput with SupportsMultiModal (#23749)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-27 10:01:50 -07:00
Woosuk Kwon
04ff1e43fb
[Misc] Move CpuGpuBuffer to vllm/v1/utils.py (#23728)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-27 03:25:00 -07:00
Cyrus Leung
69244e67e6
[Core] Use key-only cache for BaseMultiModalProcessor (#23018)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-27 14:19:13 +08:00
Li, Jiang
9b0187003e
[Bugfix] Fix cuda event usage with CPU model runner (#23643)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-08-26 17:10:42 +00:00
Chen Zhang
2b4fc9bd9b
Support FlashAttention Backend for Hybrid SSM Models (#23299)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-26 12:41:52 +00:00
Zijing Liu
b395b3b0a3
[Disagg][Perf] Use CUDA event sync instead of blocking tolist to avoid unintentional copy ops blocking across different CUDA streams, improving disagg TTIT/TTFT (#22760)
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
Signed-off-by: Zijing Liu <liuzijing2014@users.noreply.github.com>
2025-08-25 21:06:00 -07:00
Woosuk Kwon
a851aaa0fc
simplify
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-25 09:23:05 -07:00
Woosuk Kwon
b1d52734f7
fix
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-25 08:55:12 -07:00
Woosuk Kwon
65f93694be
merge
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-25 08:54:32 -07:00
Woosuk Kwon
0ff902f3b4
[Refactor] Refactor persistent buffers with CpuGpuBuffer (#23515)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-25 08:44:48 -07:00
Ayush Satyam
5c4b6e66fe
[Attention] Unify mamba and attention backend selection (#23171)
Signed-off-by: Ayush Satyam <ayushsatyam146@gmail.com>
2025-08-25 09:09:36 +00:00
Chenguang Zheng
d765cf01fe
[Core][Multimodal] Track encode cache entries by mm_hash and enable embedding sharing between requests (#22711)
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-08-25 00:41:17 -07:00
Woosuk Kwon
a1e3745150
wip
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-24 18:36:18 -07:00
Woosuk Kwon
ad78868450
[Misc] Remove unused slot_mapping buffer (#23502)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-24 14:03:36 -07:00
Woosuk Kwon
48bca9a109
merge
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-23 11:30:29 -07:00
Nick Hill
c80c53a30f
[BugFix] Fix batch updates for pooling models (#23398)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-08-23 08:20:41 +08:00
Woosuk Kwon
808d2e9aa0
[Misc] Move M-RoPE init logic to _init_mrope_positions (#23422)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-22 03:07:22 -07:00
Woosuk Kwon
79e5eb3643
wip
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-22 01:37:43 -07:00
Chen Zhang
17373dcd93
[Attention] Refactor AttentionMetadata Preparation for Encoder-only Models (#23154)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-22 05:05:59 +00:00
Woosuk Kwon
c472982746
merge
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-21 21:40:44 -07:00
Nick Hill
603fbbbce0
[Misc] Misc code cleanup/simplification (#23304)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-08-21 17:22:55 +00:00
wang.yuqi
d70a16625d
[Performance] V1 Pooling Models E2E Performance Optimization (#23162)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-08-21 13:26:09 +00:00
Cyrus Leung
0c6e40bbaa
[Refactor] Simplify code for MM budget (#23310)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-21 08:00:16 +00:00
Woosuk Kwon
b029de9902
[Optimization] Make new_block_ids None if empty (#23262)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-08-20 18:25:56 -07:00
rongfu.leng
4fbda0b20c
[Feature] use --eplb_config to set eplb param (#20562)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: rongfu.leng <lenronfu@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-20 14:07:28 -07:00
Woosuk Kwon
d6d13bd49e
[Misc] Add max_seq_len to CommonAttentionMetadata (#23216)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-20 09:05:29 -07:00
Woosuk Kwon
40f26734b9
[Misc] Fix seq_lens for graph capture (#23175)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-19 03:58:16 -07:00
Woosuk Kwon
21bcc8263f
[Misc] Avoid accessing req_ids inside a loop (#23159)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-19 09:39:38 +00:00
Woosuk Kwon
c9b38be8aa
[Spec Decode] Make propose_draft_token_ids non-blocking for lower TTFT (#23041)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-18 17:20:38 -07:00
Woosuk Kwon
0dd3f4f5ab
[Misc] Minor refactoring for prepare_inputs (#23116)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-18 16:58:05 -07:00
Cyrus Leung
27e8d1ea3e
[Refactor] Define MultiModalKwargsItems separate from MultiModalKwargs (#23053)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-18 09:52:00 +00:00
Woosuk Kwon
699bd7928e
Merge branch 'main' into woosuk/input-prep
2025-08-17 19:28:38 -07:00
Woosuk Kwon
8ea0c2753a
[Misc] Minor code cleanup for _get_prompt_logprobs_dict (#23064)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-17 18:16:03 -07:00
Calvin Chen
21e39436c8
[XPU] fix xpu to set cudagraph batch sizes (#23044)
Signed-off-by: calvin chen <wen.chen@dynamia.ai>
2025-08-17 21:45:42 +00:00
Woosuk Kwon
33a3a26ca5
wip
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-17 14:38:24 -07:00