608 Commits

Author SHA1 Message Date
dongbo910220
f9bd10566c
Merge b0cde8866e7df7abd08d42d12abe5af60f2b30a2 into 3125d7995020f4ae1b7777e41af6c3a6e714cb1c 2025-10-17 19:05:50 +00:00
Nicolò Lucchesi
2ba60ec7fe
[CI] Nixl integration tests (#27010)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-10-17 07:13:31 -07:00
dongbo910220
b0cde8866e feat(v1): Implement pinned prefix caching with global unpin API
Core Features:
- Add pin_prefix parameter to SamplingParams for per-request prefix pinning
- Implement pinned prefix caching in V1 engine KVCacheManager
- Add pinned_prefix_cap_ratio (default 0.2) to control memory usage
- Add enable_pinned_prefix global gate for conservative rollouts
- Protect pinned blocks from LRU eviction in BlockPool

Bug Fixes:
- Fix multi-group budget bug with round-robin pinning strategy
- Ensure global cap is never exceeded even with multiple KV cache groups
- Use logical pinned depth (min across groups) for accurate reporting

Management APIs:
- Add HTTP endpoint POST /unpin_all_pinned_prefixes for memory reclamation
- Implement complete call chain: API -> AsyncLLM -> EngineCore -> Scheduler -> KVCacheManager
- Remove per-request unpin to keep API surface minimal

Code Quality:
- Replace manual @field_validator with Field(ge=0, le=1) for cleaner validation
- Add comprehensive test coverage (unit + integration + E2E)
- Add test_multi_group_prefix_pinning_respects_global_cap() for multi-group validation
- Add test_unpin_all_pinned_prefixes_clears_pool() for unpin API validation

Resolves: #23083
Signed-off-by: dongbo910220 <1275604947@qq.com>
2025-10-17 19:38:21 +08:00
Jee Jee Li
fec2b341ad
[Kernel] Lazy import FlashInfer (#26977) 2025-10-17 04:48:18 +00:00
Cyrus Leung
4d4d6bad19
[Chore] Separate out vllm.utils.importlib (#27022)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-17 00:48:59 +00:00
Bram Wasti
b2f78cbad4
[small][batch invariance] Rename the env and internal flags to simplify usage (#26855)
Signed-off-by: Bram Wasti <bwasti@meta.com>
2025-10-16 21:40:25 +00:00
Wentao Ye
2ed8b6b3d0
[Bug] Fix batch invariant test has to is (#27032)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-16 19:45:14 +00:00
Tahsin Tunan
43721bc67f
[CI] Replace large models with tiny alternatives in tests (#24057)
Signed-off-by: Tahsin Tunan <tahsintunan@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-16 15:51:27 +01:00
Bram Wasti
7d8975de84
Deepseek-v3 Batch Invariant on 8xH100 (#26609)
Signed-off-by: Bram Wasti <bwasti@meta.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-10-15 22:06:02 -07:00
Cyrus Leung
f93e348010
[Misc] Remove isort and yapf ignores (#26888)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-15 12:09:03 +00:00
Morrison Turnansky
96b9aa5aa0
[Frontend][torch.compile] CompilationConfig Overhaul (#20283): name change compilation level to compilation mode, deprecation compilation level (#26355)
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-10-15 02:51:16 +00:00
Nick Hill
4aed506b65
[Core] Streamline some structured output related code (#26737)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-14 23:27:44 +00:00
Matthew Bonanni
82af928c41
[Attention][Spec Decode] FlashMLA spec decode support (#26541)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-10-14 19:38:20 +00:00
Qier Li
720394de43
[KVConnector][Metrics] Aggregate scheduler-side KVConnectorStats (#26046)
Signed-off-by: Qier Li <kevin44036@gmail.com>
2025-10-14 14:38:07 +00:00
Chendi.Xue
7e6edb1469
[NIXL][HeteroTP] Enable KV transfer from HND prefill to NHD decode (#26556)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
2025-10-14 09:46:05 +00:00
Lucia Fang
8317f72354
[Misc][DP] support customized aggregated logger for dp (#24354)
Signed-off-by: Lu Fang <fanglu@fb.com>
2025-10-13 17:45:59 -07:00
Maximilien de Bayser
d8bebb008a
Add tests for chunked prefill and prefix cache with causal pooling models (#26526)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Ayush Singh <ayush1009208@gmail.com>
2025-10-14 07:45:04 +08:00
Will Eaton
53c9a7cee2
[P/D] [NixlConnector] kv load recovery integration (#26171)
Signed-off-by: Will Eaton <weaton@redhat.com>
2025-10-13 08:48:04 -07:00
Bram Wasti
3263799056
[unrevert] Add batch invariant kernel override for FlashInfer backend [2/n] (#26373)
Signed-off-by: Bram Wasti <bwasti@meta.com>
Signed-off-by: Bram Wasti <bwasti@fb.com>
2025-10-13 10:24:53 -04:00
Harry Mellor
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00
Chendi.Xue
9bb38130cb
[Bugfix] Fix GPU_ID issue in test script (#26442)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
2025-10-12 11:39:05 +00:00
Nick Hill
5bc26c438d
[BugFix] Make penalties and bad_words work with async scheduling (#26467)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-10 23:27:04 +00:00
Nick Hill
949cb0170d
[BugFix] Fix async scheduling + request preemption (#26385)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-10 20:29:57 +00:00
Harry Mellor
7c12763b24
Fix some typing issues found by mypy==1.18.2 (#26596)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-10 18:21:25 +00:00
Mark McLoughlin
784c231151
[NIXL] Ignore abort on already-finished request (#25067)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-10-10 12:21:56 +02:00
Chen Zhang
606b00e80f
[bugfix][DCP] fix block_size of hash in DCP prefix caching (#26296)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-10-10 03:02:49 -07:00
Cyrus Leung
ad430a67ca
[Metrics] Log multi-modal cache stats and fix reset (#26285)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-10 01:45:55 -07:00
Cyrus Leung
4bdf7ac593
[Bugfix] Fix SHM cache initialization (#26427)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-09 02:48:04 -07:00
Cyrus Leung
dc7976dd9f
[Misc] Upgrade more code to Python 3.10 (#26463)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-09 10:43:53 +01:00
Zhiyuan Li
d24cf322e1
[Hybrid]: Decouple Kernel Block Size from KV Page Size (#24486)
Signed-off-by: lizhiyuan <uniartisan2017@gmail.com>
Signed-off-by: Zhiyuan Li <uniartisan2017@gmail.com>
2025-10-08 23:43:39 -07:00
Qier Li
d17f0fbf30
[Core][KVConnector] Propagate all tokens on resumed preemptions (#24926)
Signed-off-by: Qier Li <kevin44036@gmail.com>
Co-authored-by: Qier Li <qier@fb.com>
2025-10-09 14:43:31 +08:00
Elaine Zhao
f08919b7d1
[Bugfix] Respect min_tokens in scheduler stop check (#26317)
Signed-off-by: Elaine Zhao <elaineyz@amazon.com>
2025-10-08 14:08:24 -07:00
Matthew Bonanni
76879cc160
[Attention] Implement universal BACKEND_MAP (#25900)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-10-08 12:00:25 -07:00
Wentao Ye
4ba8875749
[Bug] Fix Test in Batch Invariant (#26128)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-08 10:13:47 -07:00
Harry Mellor
2f99f2f506
Tidy vllm/config/__init__.py to only add classes and functions (#26405)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-08 07:10:00 -07:00
Lucas Wilkinson
f80e7866c0
[Misc] Clean up cruft from previous FlashMLA sparse implementation (#26125)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-10-08 10:09:34 +08:00
Thomas Parnell
31a4b3e6c4
Revert #24446 and #26168 (#26332)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-10-07 16:38:19 -06:00
Sergei Skvortsov
6ebaf43ee4
[V1] Logit processors for rejection sampler (#19482)
Signed-off-by: southfreebird <yvorott@gmail.com>
Signed-off-by: Sergei Skvortsov <sergeyskv@nebius.com>
Signed-off-by: Sergei Skvortsov <yvorott@gmail.com>
Co-authored-by: Sergei Skvortsov <sergeyskv@nebius.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-10-07 13:02:49 -07:00
Cyrus Leung
1e4ecca1d0
[V0 Deprecation] Remove VLLM_USE_V1 from tests (#26341)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-07 15:42:31 +00:00
Grant Holmes (Ren)
d100d78eb3
Optimize KV cache distribution for asymmetric pipeline parallelism (#25164)
Signed-off-by: gholmes829 <g.holmes429@gmail.com>
2025-10-07 09:20:30 +00:00
Sage Moore
2111b4643c
[Core] Simplify the Dp padding/should ubatch coordination logic (#25768)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-10-07 01:57:49 +00:00
Yannick Schnider
6431be808f
[Tests] conftest: Extending VllmRunner and HfRunner to accept token_ids as input (#26295)
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-06 17:19:34 +00:00
Matthew Bonanni
4727a8afa7
[Attention] Remove unused reorder_batch method (#24463)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-10-06 13:13:39 -04:00
Cyrus Leung
391612e78b
[Frontend] Consolidate tokenizer init code (#26276)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-06 09:34:52 +00:00
Harry Mellor
6c04638214
Fix per file ruff ignores related to line length (#26262)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-06 05:12:40 +00:00
Harry Mellor
1c0c68202c
Fix per file ruff ignores related to typing (#26254)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 16:37:55 +00:00
Harry Mellor
557b2e961d
Remove all cases of fmt: on/off (#26253)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 09:18:14 -07:00
Harry Mellor
4e256cadc2
Remove all references to yapf as it's no longer used (#26251)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 09:18:11 -07:00
Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
22quinn
78c1d5bfd2
[Easy] Add str repr for IterationStats (#26232)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-10-05 05:00:21 +00:00