vllm/core at 2dbe8c07744cd5b7531c191a734a613f8b797e65 - vllm - 丝路新云 code repository

xinyun/vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-05-24 04:57:52 +08:00

Nick Hill 2dbe8c0774

[Perf] API-server scaleout with many-to-many server-engine comms (#17546)

2025-05-30 08:17:00 -07:00

__init__.py

[V1] Implement vLLM V1 [1/N] (#9289)

2024-10-22 01:24:07 -07:00

block_pool.py

[V1][Metrics] add support for kv event publishing (#16750)

2025-04-30 07:44:45 -07:00

encoder_cache_manager.py

Enforce valid max_num_batched_tokens when disable_chunked_mm_input=True (#16447)

2025-04-11 08:09:52 +00:00

kv_cache_manager.py

[BugFix][Spec Decode] Improve Prefix Caching Logic in Speculative Decoding (#18668)

2025-05-24 17:33:46 -07:00

kv_cache_utils.py

[v1] Redo "Support multiple KV cache groups in GPU model runner (#17945)" (#18593)

2025-05-23 09:39:47 -07:00

single_type_kv_cache_manager.py

[v1][KVCacheManager] Avoid full cache hit by controlling max_length (#17999)

2025-05-13 06:50:38 +00:00