16 Commits

Author SHA1 Message Date
Matthew Bonanni
fc1d8be3dc
[Attention] Update attention imports (#29540)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-27 11:19:09 -05:00
Matthew Bonanni
430dd4d9eb
[Attention] Remove imports from vllm/attention/__init__.py (#29342)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-26 10:53:15 -07:00
Or Ozeri
647464719b
[KVConnector][Core] Support cross-layer KV blocks (#27743)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-11-20 19:09:59 +01:00
Or Ozeri
c0c2dd1e0b
[BugFix] kv_offloading: Fix bug in loading of partial cpu blocks (#28951)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-20 18:55:10 +08:00
Kunshang Ji
2a2d5d2780
Replace torch.cuda.Event with torch.Event for better hardware compatibility (#26985)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-11-18 11:34:36 -08:00
alberto
bac904565f
Implement ARC KV cache eviction policy for CPU offloader (#27039)
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: alberto <aperdomo@redhat.com>
Co-authored-by: Or Ozeri <or@ozery.com>
2025-11-12 09:51:39 -08:00
Zhewen Li
0fe0140408
[KV offload] Enable CPU KV offload on CUDA alike Platforms (#27770)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-10-30 22:10:29 +08:00
Jonathan Chen
ca76486a16
[Chore] Separate out vllm.utils.platform_utils.py (#27374)
Signed-off-by: Jonathan <chenleejonathan@gmail.com>
2025-10-23 19:08:06 +00:00
Harry Mellor
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00
Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
Harry Mellor
8c853050e7
[Docs] Enable fail_on_warning for the docs build in CI (#25580)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-24 19:30:33 +00:00
Or Ozeri
8db2939289
[KV offload][5/N] Add CPUOffloadingSpec (#24251)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-09-22 12:30:36 -07:00
Or Ozeri
c59a0eca42
[KV offload][4/N] Offloading KV connector (#22595)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-09-19 19:07:17 +00:00
Or Ozeri
7ac67ea525
[KV offload][3/N] Add worker-side CPU support (#21448)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-09-19 09:53:45 -07:00
Or Ozeri
9d1c50a5ac
[KV offload][2/N] Introduce LRU-based CPU offloading management (#20075)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-09-19 00:20:51 +00:00
Or Ozeri
a53ad626d6
[KV offload][1b/N] rename offloading to kv_offload (#25191)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-09-18 20:53:52 +00:00