Core Features:
- Add pin_prefix parameter to SamplingParams for per-request prefix pinning (usage sketch below)
- Implement pinned prefix caching in the V1 engine KVCacheManager
- Add pinned_prefix_cap_ratio (default 0.2) to bound memory used by pinned blocks (cap check sketched below)
- Add enable_pinned_prefix global gate for conservative rollouts
- Protect pinned blocks from LRU eviction in BlockPool (eviction sketch below)

Bug Fixes:
- Fix multi-group budget bug with a round-robin pinning strategy (sketched below)
- Ensure the global cap is never exceeded, even with multiple KV cache groups
- Use the logical pinned depth (minimum across groups) for accurate reporting

Management APIs:
- Add HTTP endpoint POST /unpin_all_pinned_prefixes for memory reclamation (usage sketch below)
- Implement the complete call chain: API -> AsyncLLM -> EngineCore -> Scheduler -> KVCacheManager
- Remove per-request unpin to keep the API surface minimal

Code Quality:
- Replace a manual @field_validator with Field(ge=0, le=1) for cleaner validation (sketched below)
- Add comprehensive test coverage (unit, integration, and E2E)
- Add test_multi_group_prefix_pinning_respects_global_cap() for multi-group validation
- Add test_unpin_all_pinned_prefixes_clears_pool() for unpin API validation

Resolves: #23083

Signed-off-by: dongbo910220 <1275604947@qq.com>
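
A minimal usage sketch for the per-request opt-in. It assumes pin_prefix is a boolean flag on SamplingParams and that the enable_pinned_prefix gate is on; the exact parameter type and any server-side configuration are defined by this change and not shown here:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")

# A long system prompt shared by many requests; pinning its KV blocks
# keeps them resident instead of competing in normal LRU eviction.
shared_prefix = "You are a helpful assistant. Answer concisely.\n"

params = SamplingParams(
    max_tokens=64,
    pin_prefix=True,  # assumed boolean opt-in added by this change
)
out = llm.generate([shared_prefix + "What is vLLM?"], params)
print(out[0].outputs[0].text)
```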
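
The cap ratio bounds how much of the block pool pinning may consume. A standalone sketch of the kind of admission check this implies, assuming the ratio is applied against the total number of KV cache blocks (the real check lives in the KVCacheManager and may differ in detail):

```python
def may_pin(num_pinned_blocks: int, num_new_blocks: int,
            total_blocks: int, cap_ratio: float = 0.2) -> bool:
    """Return True if pinning num_new_blocks stays under the global cap.

    cap_ratio mirrors pinned_prefix_cap_ratio: at most this fraction of
    the block pool may ever be pinned at once.
    """
    return num_pinned_blocks + num_new_blocks <= int(total_blocks * cap_ratio)
```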
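
How pinned blocks survive eviction, as an illustrative standalone sketch (this is not vLLM's BlockPool; Block and the pool structure here are simplified stand-ins):

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class Block:
    block_id: int
    pinned: bool = False


class LRUPool:
    """Toy LRU pool whose eviction scan skips pinned blocks."""

    def __init__(self) -> None:
        self._lru: OrderedDict[int, Block] = OrderedDict()

    def touch(self, block: Block) -> None:
        # Move (or insert) the block to the most-recently-used position.
        self._lru.pop(block.block_id, None)
        self._lru[block.block_id] = block

    def evict_one(self) -> Block | None:
        # Scan from least to most recently used; pinned blocks are
        # never candidates, which is the protection this change adds.
        for block_id, block in list(self._lru.items()):
            if not block.pinned:
                del self._lru[block_id]
                return block
        return None  # everything resident is pinned; caller must handle
```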
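
A sketch of the round-robin budget fix, under the assumption that each KV cache group contributes a list of candidate block IDs and pinning advances one depth level at a time across all groups. The real data structures differ; this only shows why the global cap cannot be exceeded and where the min-across-groups depth comes from:

```python
def pin_round_robin(groups: list[list[int]],
                    budget: int) -> tuple[list[list[int]], int]:
    """Pin blocks level by level across groups, never exceeding budget."""
    pinned: list[list[int]] = [[] for _ in groups]
    used = depth = 0
    while True:
        # All blocks at this depth, one per group that still has blocks.
        level = [(i, g[depth]) for i, g in enumerate(groups) if depth < len(g)]
        if not level or used + len(level) > budget:
            break  # stop before the global cap is exceeded, even mid-level
        for i, block_id in level:
            pinned[i].append(block_id)
        used += len(level)
        depth += 1
    # Logical pinned depth: a prefix is only fully pinned up to the
    # shallowest group's depth, hence the minimum across groups.
    logical_depth = min((len(p) for p in pinned), default=0)
    return pinned, logical_depth
```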
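
Calling the new management endpoint against a running API server; host and port are placeholders, and the response body shape is not assumed here:

```python
import requests

resp = requests.post("http://localhost:8000/unpin_all_pinned_prefixes")
resp.raise_for_status()
# All pinned prefix blocks are released back to normal LRU management,
# so their memory becomes reclaimable.
print(resp.status_code)
```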
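
The validation simplification, shown on an illustrative pydantic model (the model name is a stand-in; only the Field constraint is the point). The declarative ge/le bounds reject out-of-range values at construction time, replacing a handwritten @field_validator:

```python
from pydantic import BaseModel, Field


class CacheConfig(BaseModel):
    # Equivalent to a manual validator enforcing 0 <= value <= 1.
    pinned_prefix_cap_ratio: float = Field(default=0.2, ge=0, le=1)


CacheConfig(pinned_prefix_cap_ratio=0.2)    # ok
# CacheConfig(pinned_prefix_cap_ratio=1.5)  # raises ValidationError
```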