5 Commits

Author SHA1 Message Date
Chen Zhang
f0d610a8ae
[v1][KVCacheManager] Avoid full cache hit by controlling max_length (#17999)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-05-13 06:50:38 +00:00
Chen Zhang
ca66a1674c
[v1] Rename specialized_manager.py to single_type_kv_cache_manager.py (#17946)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-05-10 16:14:12 -07:00
Chen Zhang
200da9a517
[v1] Move block management logic from KVCacheManager to SpecializedManager (#17474)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-05-09 15:25:34 +00:00
Chen Zhang
81ecf425f0
[v1][Spec Decode] Make sliding window compatible with eagle prefix caching (#17398)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-04-30 18:25:53 +00:00
Chen Zhang
3a5f0afcd2
[V1] Implement sliding window attention in kv_cache_manager (#14097)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-04-01 00:33:17 -07:00