xinyun / vllm
Mirror of https://git.datalinker.icu/vllm-project/vllm.git (synced 2026-04-21 05:47:02 +08:00)
Path: vllm / vllm / v1 / core

History
Latest commit: 6825d9a998 by Woosuk Kwon
[BugFix][Spec Decode] Improve Prefix Caching Logic in Speculative Decoding (#18668)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-05-24 17:33:46 -07:00
sched/                            [BugFix][Spec Decode] Improve Prefix Caching Logic in Speculative Decoding (#18668)  2025-05-24 17:33:46 -07:00
__init__.py                       [V1] Implement vLLM V1 [1/N] (#9289)  2024-10-22 01:24:07 -07:00
block_pool.py                     [V1][Metrics] add support for kv event publishing (#16750)  2025-04-30 07:44:45 -07:00
encoder_cache_manager.py          Enforce valid max_num_batched_tokens when disable_chunked_mm_input=True (#16447)  2025-04-11 08:09:52 +00:00
kv_cache_manager.py               [BugFix][Spec Decode] Improve Prefix Caching Logic in Speculative Decoding (#18668)  2025-05-24 17:33:46 -07:00
kv_cache_utils.py                 [v1] Redo "Support multiple KV cache groups in GPU model runner (#17945)" (#18593)  2025-05-23 09:39:47 -07:00
single_type_kv_cache_manager.py   [v1][KVCacheManager] Avoid full cache hit by controlling max_length (#17999)  2025-05-13 06:50:38 +00:00