xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-26 00:09:40 +08:00

Author	SHA1	Message	Date
Wentao Ye	d9417096d1	[Feature] Batch invariant: Enable `TRITON_MLA` without prefix-caching (#29125 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-08 19:31:57 -05:00
Victor Ziliang Peng	f1599ca55d	feat(metrics): Add prefill KV compute metric excluding cached tokens (#30189 ) Signed-off-by: Ziliang Peng <ziliang@character.ai>	2025-12-09 00:08:48 +00:00
Ming Yang	60d17251c9	[Disagg] Support large batch size in proxy server and update NixlConnector doc for DP (#28782 ) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-12-09 00:01:08 +00:00
Charlie Fu	6af70e11a0	[ROCm][CI] Fix test_max_len.py for Rocm (#29916 ) Signed-off-by: charlifu <charlifu@amd.com> Signed-off-by: Charlie Fu <Charlie.Fu@amd.com>	2025-12-08 16:58:30 -05:00
Cyrus Leung	e83b7e379c	Revert "[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 )" (#30199 )	2025-12-07 00:00:22 -08:00
Cyrus Leung	27f4c2fd46	[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-06 23:15:42 -08:00
Cyrus Leung	671427efbf	[Model] Move `multimodal_cpu_fields` definition to field config (#30181 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-06 13:40:02 +00:00
rasmith	b12f4a9830	[CI/Build][AMD] Use ROCM_ATTN instead of FLASH_ATTN test for test_register_kv_caches for ROCm and update test for TRITON_ATTN (#29985 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2025-12-05 20:57:38 -08:00
Samuel Shen	7e31c3a3f6	[CI]: Remove unnecessary imports from test_lmache_integration (#30157 ) Signed-off-by: Samuel Shen <slshen@uchicago.edu> Co-authored-by: Samuel Shen <slshen@uchicago.edu>	2025-12-06 12:53:34 +08:00
Divakar Verma	962d703818	[Bugfix][llama4_eagle] Fix missing 'lm_head' attribute (#29926 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-12-05 19:57:26 +00:00
Matthew Bonanni	66e674cdd5	[Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2025-12-05 09:48:43 -08:00
Mark McLoughlin	dff0a2b394	[NIXL] Add remote_request_id to kv_transfer_params (#29665 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-12-05 09:43:48 -08:00
Nick Hill	dc264bcea1	[BugFix] Eagerly abort cancelled final-step requests (#29987 ) Currently, when requests are cancelled while executing their final step, "completion" is handled based on normal stop processing (e.g. length or stop token), so the abort has no effect. This is typically not a problem, but when a kv connector is involved it thinks the request completed successfully rather than being aborted. This is problematic for disaggregated prefill which will free kv cache blocks if the request was aborted but not if it completed successfully—since the cancelled request will never be sent to the decode side, kv cache blocks remain pinned until the fall-back timeout expires. The problem is exacerbated when many requests are cancelled and/or there are large prefills whose forward pass takes a long time (since the window is bigger). This PR fixes the problem by processing pending aborts immediately prior to processing model output each step; we process only aborts, not new requests, since it's preferable for latency to process model outputs before new incoming requests. Fixes #26400. Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-05 17:28:32 +00:00
Nicolò Lucchesi	78c44fd722	[NIXL] Small cleanup of unused variables (#29618 ) Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-05 18:17:36 +01:00
Mark McLoughlin	949a6a19d2	[NIXL] Add compatibility checking to NIXL KV connector handshake (#29503 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-12-05 15:52:45 +01:00
Alec S	65ee97288a	[BugFix] Adding env variable to disable async grammar compilation (#29996 ) Signed-off-by: Alec Solder <alecs@fb.com> Signed-off-by: Alec S <10566873+alecsolder@users.noreply.github.com> Co-authored-by: Alec Solder <alecs@fb.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-12-05 00:49:37 -08:00
rasmith	feecba09af	[CI/Build][AMD] Use float16 in test_reset_prefix_cache_e2e to avoid accuracy issues (#29997 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-05 08:42:25 +00:00
Lucas Wilkinson	c8ab988b15	[BugFix] Fix DBO assert `assert B_block_table == B_q` (#29933 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-12-04 14:48:54 -05:00
Doug Smith	5b4b42c0b6	Mark DBO test as flaky on b200 for Distributed B200 test (#29913 ) Signed-off-by: dougbtv <dosmith@redhat.com>	2025-12-04 10:38:03 -05:00
rasmith	f2f4cea6cc	[CI/Build][AMD] Skip test on test_hybrid_attention_mamba_tensor_shapes on ROCm, requires FLASHINFER (#29995 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-04 09:30:22 +00:00
Mark McLoughlin	899e2ef558	[Core] Fix standalone runs of test_reset_prefix_cache_e2e (#29899 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-12-04 16:22:03 +08:00
Charlie Fu	9aa33a74b0	[Rocm][CI] Fix test_speculator_eagle3 by skipping the CompressedTensorw4a16 Model (#30001 ) Signed-off-by: charlifu <charlifu@amd.com> Co-authored-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com>	2025-12-04 07:52:28 +00:00
Cyrus Leung	9ae2f60374	[Misc] Various cleanups for MM input processing (#29970 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-04 06:22:20 +00:00
Micah Williamson	d1f7392c5f	[ROCm][CI] Fix v1/logits_processors failure on ROCm (#29927 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2025-12-04 01:17:07 +08:00
Lumis Chen	9bcf92295a	[Core] Add xxHash as a high-performance hash option for accelerating prefix caching (#29163 ) Signed-off-by: LuminolT <lumischen01@gmail.com> Signed-off-by: Lumis Chen <lumischen01@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-12-03 16:06:57 +00:00
rasmith	5aa9b09040	[CI/Build][AMD] Skip test_shared_storage_connector_hashes in test_shared_storage_connector.py due to hipErrorLaunchFailure when calling .cpu() (#29839 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-03 22:56:35 +08:00
Micah Williamson	c014de1ec7	[ROCm][CI] Fix test_cudagraph_mode.py Failure For AMD CI (#29808 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2025-12-02 22:54:36 +00:00
Chauncey	0a9caca9f5	[Bugfix] fix --scheduling-policy=priority & n>1 crashes engine (#29764 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-12-02 22:42:28 +00:00
Divakar Verma	afb1e5b380	[CI][ROCm][tests/v1/e2e] Fix multiprocessing launch for the test (#29123 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-12-02 20:46:10 +00:00
Harry Mellor	951445a52d	Remove default values from `InitVar`s so that they're not stored (#29859 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-02 12:16:37 +00:00
杰兮	48d15a32aa	[CI] Fix Bad_words test for tokenizer encode/decode asymmetry (#28193 ) Signed-off-by: zhyajie <yajizhan@amd.com> Co-authored-by: zhyajie <yajizhan@amd.com>	2025-12-02 00:02:12 -08:00
Cyrus Leung	653591d5e7	[Chore] Move tokenizer initialization methods (#29793 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-02 13:33:37 +08:00
Divakar Verma	e2fbfc955e	[CI][AMD] spec_decode:eagle skip FLASH_ATTN for deepseek on ROCm (#29827 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-12-02 05:27:46 +00:00
Divakar Verma	a690fb5bd6	[CI][ROCm] Fix test_correctness_sliding_window (#29243 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-02 04:53:27 +00:00
usberkeley	81fe3f82af	[BugFix] Fix index error in ngram_proposer (#29779 ) Signed-off-by: Bradley <bradley.b.pitt@gmail.com>	2025-12-02 04:48:11 +00:00
Zhuohan Li	d0cd728907	[Core] Support reseting all running requests' KV while calling `reset_prefix_cache` (#28827 ) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-12-02 02:25:05 +00:00
Nick Hill	44822d7ff2	[BugFix] Preserve spec decoding uniform decode when scheduling (#29759 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-01 17:15:52 -08:00
shivampr	cabc77cc86	[Core][Observability] Add KV cache residency metrics (#27793 ) Introduces three new Prometheus histograms for fine-grained observability of KV cache residency behavior: `vllm:kv_block_lifetime_seconds` (total lifetime from allocation to free); `vllm:kv_block_idle_before_evict_seconds` (idle duration before eviction); `vllm:kv_block_reuse_gap_seconds` (time between consecutive reuses of the same block). These metrics help operators analyze KV cache efficiency, reuse patterns, and eviction timing beyond simple utilization rates. Implementation uses monotonic timestamps for accuracy, 1% sampling for minimal overhead (~48 bytes/block), and is fully thread-safe with zero runtime cost when disabled. Two new runtime flags are introduced: `--kv-cache-metrics` (enable KV cache residency metrics) and `--kv-cache-metrics-sample` (control sampling ratio, default: 0.01). Signed-off-by: Shivam <shivamprasad91@gmail.com>	2025-12-01 18:27:53 +00:00
Marcin Ostrowski	5cfa967efa	[Bugfix] TypeError: 'NoneType' object is not callable (#29414 ) Signed-off-by: Marcin Ostrowski <marcinx.ostrowski@intel.com>	2025-12-01 13:16:44 +00:00
Isotr0py	b95db244ee	[v1] Add real sliding window calculation to FlexAttention direct BlockMask building (#26015 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com> Co-authored-by: baonudesifeizhai <baonudesifeizhai@gmail.com>	2025-12-01 13:12:51 +00:00
Cyrus Leung	f0a28bf661	[Misc] Unify tokenizer registration (#29767 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-01 11:34:58 +00:00
Cyrus Leung	34a984274e	[Misc] Refactor tokenizer interface (#29693 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-29 04:02:21 -08:00
Lucas Wilkinson	e23f665d83	[BugFix] Fix DBO failing with TypeError: 'NoneType' object is not iterable (#29698 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-28 20:19:01 -08:00
Benjamin Chislett	1986de1375	[Perf] Optimize EAGLE prepare_inputs_padded with triton kernels (#28597 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-11-28 22:25:05 +00:00
Cyrus Leung	8d9338fae4	[Chore] Rename `Processor` to `InputProcessor` (#29682 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-28 09:35:41 -08:00
Nick Hill	8e7a891602	[BugFix] Fix spec decoding max_tokens scheduling perf issue (#29542 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-28 20:52:23 +08:00
EanWang211123	37b15e97e8	[Multimodal][Speculative Decoding]Eagle3 mm support, enablement on qwen3vl (#29594 ) Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: EanWang211123 <wangyiheng@sangfor.com.cn> Co-authored-by: Louie Tsai <louie.tsai@intel.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-11-27 22:05:45 -08:00
maang-h	c7ba1f6bc7	[BugFix] Fix ValueError in NewRequestData repr methods (#29392 ) Signed-off-by: maang <maang_h@163.com>	2025-11-28 13:42:30 +08:00
Matthew Bonanni	fc1d8be3dc	[Attention] Update attention imports (#29540 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-11-27 11:19:09 -05:00
Ryan Rock	bab438ff3e	[CI/Build] Skip ray tests on ROCm (#29556 ) Signed-off-by: Ryan Rock <ryan.rock@amd.com>	2025-11-27 07:01:37 -08:00
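
The KV cache residency metrics commit (cabc77cc86) describes sampled, monotonic-clock timing of block lifetime, reuse gap, and idle time before eviction. The sketch below is only an illustration of that idea under stated assumptions; it is not the code from that commit. The class `BlockResidencyTracker`, its method names, and the plain Python lists standing in for Prometheus histograms are all hypothetical, and free and eviction are treated as a single event for brevity.

```python
import random
import time


class BlockResidencyTracker:
    """Hypothetical sampled tracker for KV block residency (not vLLM's code)."""

    def __init__(self, sample_rate=0.01, clock=time.monotonic, rng=None):
        self.sample_rate = sample_rate      # fraction of blocks to track
        self.clock = clock                  # monotonic clock, injectable for tests
        self.rng = rng or random.Random()
        self._alloc = {}                    # block_id -> allocation timestamp
        self._last_access = {}              # block_id -> last access timestamp
        # Plain lists stand in for histogram observations.
        self.lifetimes = []
        self.reuse_gaps = []
        self.idle_before_evict = []

    def on_alloc(self, block_id):
        # Decide once, at allocation, whether this block is sampled.
        if self.rng.random() < self.sample_rate:
            now = self.clock()
            self._alloc[block_id] = now
            self._last_access[block_id] = now

    def on_access(self, block_id):
        # Unsampled blocks cost one dictionary lookup and nothing else.
        if block_id in self._alloc:
            now = self.clock()
            self.reuse_gaps.append(now - self._last_access[block_id])
            self._last_access[block_id] = now

    def on_evict(self, block_id):
        start = self._alloc.pop(block_id, None)
        if start is None:
            return
        now = self.clock()
        self.lifetimes.append(now - start)
        self.idle_before_evict.append(now - self._last_access.pop(block_id))
```

Sampling at allocation time keeps per-block state only for the tracked fraction of blocks, which is one way to bound the memory overhead that the commit message mentions.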

801 Commits