Qiu
|
2fd893b4ce
|
[Feature] Prefill Context Parallel (PCP) basic support (#28718)
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com>
Signed-off-by: LookAround <lixushi@huawei.com>
Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com>
Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: FENP <yuanyongjie.yyj@antgroup.com>
Co-authored-by: LookAround <lixushi@huawei.com>
Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com>
Co-authored-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>
|
2025-11-19 15:52:44 -05:00 |
|
Yeshwanth N
|
71b1c8b667
|
[Chore]:Extract math and argparse utilities to separate modules (#27188)
Signed-off-by: Yeshwanth Surya <yeshsurya@gmail.com>
Signed-off-by: Yeshwanth N <yeshsurya@gmail.com>
Signed-off-by: yeshsurya <yeshsurya@gmail.com>
|
2025-10-26 04:03:32 -07:00 |
|
Wentao Ye
|
52efc34ebf
|
[Log] Optimize Startup Log (#26740)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-24 19:27:04 -04:00 |
|
Andrew Sansom
|
ff93cc8c84
|
[CORE] Support Prefix Caching with Prompt Embeds (#27219)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
|
2025-10-22 22:18:07 -07:00 |
|
Sage
|
1651003c35
|
[Prefix Cache] Use LoRA name for consistent KV-cache block hashing (#27211)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
|
2025-10-22 18:13:03 +00:00 |
|
dongbo910220
|
8a297115e2
|
[Chore] Separate out hashing utilities from vllm.utils (#27151)
Signed-off-by: dongbo910220 <1275604947@qq.com>
|
2025-10-19 11:09:38 +08:00 |
|
iAmir97
|
1d165d6d85
|
[Chore] Separate out vllm.utils.mem_utils (#27143)
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com>
Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com>
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-18 10:06:59 +00:00 |
|
Harry Mellor
|
6c9fdbf725
|
[Docs] Replace rst style double-backtick with md single-backtick (#27091)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-17 02:47:34 -07:00 |
|
Harry Mellor
|
8fcaaf6a16
|
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-12 09:51:31 -07:00 |
|
Cyrus Leung
|
ad430a67ca
|
[Metrics] Log multi-modal cache stats and fix reset (#26285)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-10 01:45:55 -07:00 |
|
Grant Holmes (Ren)
|
d100d78eb3
|
Optimize KV cache distribution for asymmetric pipeline parallelism (#25164)
Signed-off-by: gholmes829 <g.holmes429@gmail.com>
|
2025-10-07 09:20:30 +00:00 |
|
Harry Mellor
|
d6953beb91
|
Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-05 07:06:22 -07:00 |
|
Jialin Ouyang
|
201c971e96
|
[Perf][Easy] Early stop in request_block_hasher (#26112)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-10-05 16:46:03 +08:00 |
|
Yongye Zhu
|
fa7e254a7f
|
[New Model] DeepSeek-V3.2 (Rebased to Main) (#25896)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Xiaozhu Meng <mxz297@gmail.com>
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
|
2025-09-30 17:14:41 +08:00 |
|
Wenlong Wang
|
032d661d27
|
[Docs] Fix warnings in mkdocs build (continued) (#25042)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
|
2025-09-20 11:45:18 +00:00 |
|
Chen Zhang
|
9607d5eb44
|
[Hybrid Allocator] Support full attention with different hidden size (#25101)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-09-19 23:43:59 -07:00 |
|
Jialin Ouyang
|
2506ce5189
|
[Core][Prefix Hash] Fix prefix hash metrics sliding window maintainance (#24990)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-09-19 12:22:53 -06:00 |
|
lianyibo
|
faa7a5daac
|
[Bugfix] Fix unable to run encoder model when disable_hybrid_kv_cache_manager is true (#24571)
Signed-off-by: lianyibo <lianyibo1@kunlunit.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
|
2025-09-16 17:36:58 +00:00 |
|
Ning Xie
|
59e17dd4a0
|
[Misc] rename interval to max_recent_requests (#24229)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-09-15 09:18:42 +00:00 |
|
Ning Xie
|
3f3313981c
|
[kv cache] update num_free_blocks in the end (#24228)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-09-15 05:15:12 +00:00 |
|
Chen Zhang
|
8e5cdcda4e
|
[Hybrid Allocator] Support Pipeline Parallel (#23974)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-09-14 15:55:17 -07:00 |
|
Flora Feng
|
0377802c20
|
[Multimodal] Remove legacy multimodal fields in favor of MultiModalFeatureSpec (#24548)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2025-09-12 21:42:23 +08:00 |
|
Zebing Lin
|
82dfb12e52
|
[Core] Use sha256 bytes instead of BlockHash to reduce GC overhead (#23673)
Signed-off-by: linzebing <linzebing1995@gmail.com>
|
2025-09-08 21:34:37 -07:00 |
|
yzds
|
ac201a0eaf
|
[Feature] Support Decode Context Parallel (DCP) for MLA (#23734)
Signed-off-by: hongchao <hongchao@msh.team>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: hongchao <hongchao@msh.team>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-09-06 13:24:05 +08:00 |
|
co63oc
|
1bd007f234
|
fix some typos (#24071)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
|
2025-09-02 20:44:50 -07:00 |
|
Ning Xie
|
5438967fbc
|
[Misc] add hash_function doc string (#24014)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-08-31 23:11:20 -07:00 |
|
Or Ozeri
|
c280066f9d
|
[v1] Move block_hashes from KVCacheManager to Request.block_hashes (#19728)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2025-08-15 16:52:52 -07:00 |
|
Cyrus Leung
|
139d155781
|
[Frontend] Use engine argument to control MM cache size (#22441)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-07 09:47:10 -07:00 |
|
Cyrus Leung
|
766bc8162c
|
[Core] Store only the keys for multi-modal data in P0 (#22198)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-07 01:45:04 -07:00 |
|
Zebing Lin
|
e0f63e4a35
|
[Core] Avoid repeated len(block_token_ids) check in hash_request_tokens (#21781)
Signed-off-by: linzebing <linzebing1995@gmail.com>
|
2025-08-01 00:23:29 -07:00 |
|
Ruixiang Tan
|
8f4a1c9a04
|
[Misc] Improve code readability of KVCacheManager (#21673)
Signed-off-by: tanruixiang <tanruixiang0104@gmail.com>
Signed-off-by: Ruixiang Tan <819464715@qq.com>
Signed-off-by: GitHub <noreply@github.com>
|
2025-07-30 07:20:43 -07:00 |
|
Chen Zhang
|
755fa8b657
|
[KVCache] Make KVCacheSpec hashable (#21791)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-07-29 19:58:29 +08:00 |
|
Raushan Turganbay
|
f38ee34a0a
|
[feat] Enable mm caching for transformers backend (#21358)
Signed-off-by: raushan <raushan@huggingface.co>
|
2025-07-22 08:18:46 -07:00 |
|
Jialin Ouyang
|
ed25054577
|
[Core] Introduce popleft_n and append_n in FreeKVCacheBlockQueue to further optimize block_pool (#21222)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-07-22 06:17:47 -07:00 |
|
Lucia Fang
|
9a9fda1423
|
[Core] Support Local Chunked Attention for Hybrid KV Cache (#19351)
Signed-off-by: Lucia Fang <fanglu@fb.com>
Signed-off-by: Lu Fang <fanglu@meta.com>
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lu Fang <fanglu@meta.com>
|
2025-07-18 20:48:38 -07:00 |
|
JialinOuyang-Meta
|
0f199f197b
|
[Core] Avoid KVCacheBlock.__eq__ invocations in FreeKVCacheBlockQueue (#21005)
Signed-off-by: Jialin Ouyang <jialino@meta.com>
|
2025-07-18 12:34:40 -07:00 |
|
Christian Pinto
|
4ffd963fa0
|
[v1][core] Support for attention free models (#20811)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
|
2025-07-15 14:20:01 +00:00 |
|
Maroon Ayoub
|
66f6fbd393
|
[Prefix Cache] Add reproducible prefix-cache block hashing using SHA-256 + CBOR (64bit) (#20511)
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
|
2025-07-14 02:45:31 +00:00 |
|
Thomas Parnell
|
2f35a022e6
|
Enable V1 for Hybrid SSM/Attention Models (#20016)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
|
2025-07-04 17:46:53 +00:00 |
|
jmswen
|
c9280e6346
|
[Bugfix] Respect num-gpu-blocks-override in v1 (#19503)
Signed-off-by: Jon Swenson <jmswen@gmail.com>
|
2025-06-12 11:00:23 +00:00 |
|
Chen Zhang
|
f8a1a2d108
|
[v1] Hybrid Memory Allocator (#17996)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-05 20:47:09 -07:00 |
|
Chen Zhang
|
a8da78eac9
|
[Bugfix] Max concurrency estimation and check_enough_kv_cache_memory for models with sliding window layers (#19029)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-04 00:14:06 +00:00 |
|
Chen Zhang
|
b5fd9506c1
|
[Bugfix] get_num_blocks_to_allocate with null_block (#19031)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-03 15:30:55 -07:00 |
|
Simon Mo
|
02f0c7b220
|
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-06-03 11:20:17 -07:00 |
|
Chen Zhang
|
f32fcd9444
|
[v1][KVCacheManager] Rename BlockHashType to BlockHash (#19015)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-03 08:01:48 +00:00 |
|
Yong Hoon Shin
|
1e123529d7
|
[Misc] Fix estimated max model len msg (#18966)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-05-31 16:43:44 +08:00 |
|
Chen Zhang
|
6550114c9c
|
[v1] Redo "Support multiple KV cache groups in GPU model runner (#17945)" (#18593)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-05-23 09:39:47 -07:00 |
|
Mark McLoughlin
|
bb0a311213
|
Revert "[v1] Support multiple KV cache groups in GPU model runner (#17945) (#18459)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-05-21 10:25:23 -07:00 |
|
Chen Zhang
|
e60f550b38
|
[v1] Support multiple KV cache groups in GPU model runner (#17945)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-05-14 18:54:54 -07:00 |
|
Marko Rosenmueller
|
77073c77bc
|
[Core] Prevent side-channel attacks via cache salting (#17045)
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
|
2025-04-30 20:27:21 +08:00 |
|