xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-22 17:45:01 +08:00

Author	SHA1	Message	Date
Harry Mellor	8781cd6b88	Add Eagle and Eagle3 support to Transformers modeling backend (#30340 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-11 17:02:10 +00:00
Martin Hickey	f4417f8449	[KVConnector] Add KV events to KV Connectors (#28309 ) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>	2025-12-11 15:30:29 +01:00
Wentao Ye	d6464f2679	[Chore] Fix torch precision warning (#30428 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-11 04:05:56 +00:00
shivampr	8580919ac3	[Bugfix] fix confusing OOM errors during v1 init (#28051 ) Signed-off-by: Shivam <shivamprasad91@gmail.com> Signed-off-by: shivampr <shivampr.dev@gmail.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2025-12-10 23:17:41 +00:00
Will Eaton	a9e4106f28	[P/D] KV Load Failure Recovery/Abort Configuration (#26813 ) Signed-off-by: Will Eaton <weaton@redhat.com> Signed-off-by: Will Eaton <me@wseaton.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-12-10 11:00:52 -08:00
Andreas Karatzas	ed7af3178a	[ROCm][CI] Attempt to fix the failures under a subgroup of the e2e the test group (#29358 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Micah Williamson <micah.williamson@amd.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com>	2025-12-10 05:33:13 +00:00
Micah Williamson	7d80c73d42	[CI] Reduce Flakiness For test_spec_decode.py::test_suffix_decoding_acceptance (#30367 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2025-12-10 02:35:49 +00:00
Lucas Wilkinson	abe93bce59	[Attention] Make seq_lens_cpu optional in CommonAttentionMetadata to enable true async spec-decode (#29624 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-12-09 17:18:10 -08:00
Benjamin Chislett	e858bfe051	[Cleanup] Refactor profiling env vars into a CLI config (#29912 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-09 13:29:33 -05:00
Wentao Ye	83319b44c2	[Compile] Fix torch warning `TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled` (#29897 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-09 10:40:37 -05:00
Lucas Wilkinson	56037dfa2f	[BugFix] Fix `assert batch_descriptor.num_tokens == num_tokens_padded` (#30173 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-12-09 10:36:12 -05:00
quanliu	5dcd593baf	[Feature] Batch-Invariant Support for FA2 and LoRA (#30018 ) Signed-off-by: quanliu <18646313696@163.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-12-09 10:01:38 -05:00
Hubert de La Jonquiere	c72ea10723	[Structured Output][Reasoning] Improves decoding throughput for models using single-token reasoning endings. (#30056 )	2025-12-09 18:54:08 +08:00
Micah Williamson	aeb82b1930	[CI] Fix Flaky test_eagle_max_len Test (#30306 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2025-12-09 07:33:34 +00:00
Lucas Wilkinson	aed846917f	[Attention] Make `split_decodes_and_prefills(..., require_uniform=True)` support padding (#29644 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-12-09 07:24:01 +00:00
Or Ozeri	4c6fd25880	kv_transfer: Rename the shared storage connectors (#30201 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2025-12-08 20:46:09 -08:00
Wentao Ye	d9417096d1	[Feature] Batch invariant: Enable `TRITON_MLA` without prefix-caching (#29125 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-08 19:31:57 -05:00
Victor Ziliang Peng	f1599ca55d	feat(metrics): Add prefill KV compute metric excluding cached tokens (#30189 ) Signed-off-by: Ziliang Peng <ziliang@character.ai>	2025-12-09 00:08:48 +00:00
Ming Yang	60d17251c9	[Disagg] Support large batch size in proxy server and update NixlConnector doc for DP (#28782 ) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-12-09 00:01:08 +00:00
Charlie Fu	6af70e11a0	[ROCm][CI] Fix test_max_len.py for Rocm (#29916 ) Signed-off-by: charlifu <charlifu@amd.com> Signed-off-by: Charlie Fu <Charlie.Fu@amd.com>	2025-12-08 16:58:30 -05:00
Cyrus Leung	e83b7e379c	Revert "[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 )" (#30199 )	2025-12-07 00:00:22 -08:00
Cyrus Leung	27f4c2fd46	[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-06 23:15:42 -08:00
Cyrus Leung	671427efbf	[Model] Move `multimodal_cpu_fields` definition to field config (#30181 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-06 13:40:02 +00:00
rasmith	b12f4a9830	[CI/Build][AMD] Use ROCM_ATTN instead of FLASH_ATTN test for test_register_kv_caches for ROCm and update test for TRITON_ATTN (#29985 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2025-12-05 20:57:38 -08:00
Samuel Shen	7e31c3a3f6	[CI]: Remove unnecessary imports from test_lmache_integration (#30157 ) Signed-off-by: Samuel Shen <slshen@uchicago.edu> Co-authored-by: Samuel Shen <slshen@uchicago.edu>	2025-12-06 12:53:34 +08:00
Divakar Verma	962d703818	[Bugfix][llama4_eagle] Fix missing 'lm_head' attribute (#29926 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-12-05 19:57:26 +00:00
Matthew Bonanni	66e674cdd5	[Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2025-12-05 09:48:43 -08:00
Mark McLoughlin	dff0a2b394	[NIXL] Add remote_request_id to kv_transfer_params (#29665 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-12-05 09:43:48 -08:00
Nick Hill	dc264bcea1	[BugFix] Eagerly abort cancelled final-step requests (#29987 ) Currently, when requests are cancelled while executing their final step, "completion" is handled based on normal stop processing (e.g. length or stop token), so the abort has no effect. This is typically not a problem, but when a kv connector is involved it thinks the request completed successfully rather than being aborted. This is problematic for disaggregated prefill which will free kv cache blocks if the request was aborted but not if it completed successfully—since the cancelled request will never be sent to the decode side, kv cache blocks remain pinned until the fall-back timeout expires. The problem is exacerbated when many requests are cancelled and/or there are large prefills whose forward pass takes a long time (since the window is bigger). This PR fixes the problem by processing pending aborts immediately prior to processing model output each step; we process only aborts, not new requests, since it's preferable for latency to process model outputs before new incoming requests. Fixes #26400. Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-05 17:28:32 +00:00
Nicolò Lucchesi	78c44fd722	[NIXL] Small cleanup of unused variables (#29618 ) Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-05 18:17:36 +01:00
Mark McLoughlin	949a6a19d2	[NIXL] Add compatibility checking to NIXL KV connector handshake (#29503 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-12-05 15:52:45 +01:00
Alec S	65ee97288a	[BugFix] Adding env variable to disable async grammar compilation (#29996 ) Signed-off-by: Alec Solder <alecs@fb.com> Signed-off-by: Alec S <10566873+alecsolder@users.noreply.github.com> Co-authored-by: Alec Solder <alecs@fb.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-12-05 00:49:37 -08:00
rasmith	feecba09af	[CI/Build][AMD] Use float16 in test_reset_prefix_cache_e2e to avoid accuracy issues (#29997 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-05 08:42:25 +00:00
Lucas Wilkinson	c8ab988b15	[BugFix] Fix DBO assert `assert B_block_table == B_q` (#29933 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-12-04 14:48:54 -05:00
Doug Smith	5b4b42c0b6	Mark DBO test as flaky on b200 for Distributed B200 test (#29913 ) Signed-off-by: dougbtv <dosmith@redhat.com>	2025-12-04 10:38:03 -05:00
rasmith	f2f4cea6cc	[CI/Build][AMD] Skip test on test_hybrid_attention_mamba_tensor_shapes on ROCm, requires FLASHINFER (#29995 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-04 09:30:22 +00:00
Mark McLoughlin	899e2ef558	[Core] Fix standalone runs of test_reset_prefix_cache_e2e (#29899 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-12-04 16:22:03 +08:00
Charlie Fu	9aa33a74b0	[Rocm][CI] Fix test_speculator_eagle3 by skipping the CompressedTensorw4a16 Model (#30001 ) Signed-off-by: charlifu <charlifu@amd.com> Co-authored-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com>	2025-12-04 07:52:28 +00:00
Cyrus Leung	9ae2f60374	[Misc] Various cleanups for MM input processing (#29970 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-04 06:22:20 +00:00
Micah Williamson	d1f7392c5f	[ROCm][CI] Fix v1/logits_processors failure on ROCm (#29927 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2025-12-04 01:17:07 +08:00
Lumis Chen	9bcf92295a	[Core] Add xxHash as a high-performance hash option for accelerating prefix caching (#29163 ) Signed-off-by: LuminolT <lumischen01@gmail.com> Signed-off-by: Lumis Chen <lumischen01@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-12-03 16:06:57 +00:00
rasmith	5aa9b09040	[CI/Build][AMD] Skip test_shared_storage_connector_hashes in test_shared_storage_connector.py due to hipErrorLaunchFailure when calling .cpu() (#29839 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-03 22:56:35 +08:00
Micah Williamson	c014de1ec7	[ROCm][CI] Fix test_cudagraph_mode.py Failure For AMD CI (#29808 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2025-12-02 22:54:36 +00:00
Chauncey	0a9caca9f5	[Bugfix] fix --scheduling-policy=priority & n>1 crashes engine (#29764 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-12-02 22:42:28 +00:00
Divakar Verma	afb1e5b380	[CI][ROCm][tests/v1/e2e] Fix multiprocessing launch for the test (#29123 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-12-02 20:46:10 +00:00
Harry Mellor	951445a52d	Remove default values from `InitVar`s so that they're not stored (#29859 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-02 12:16:37 +00:00
杰兮	48d15a32aa	[CI] Fix Bad_words test for tokenizer encode/decode asymmetry (#28193 ) Signed-off-by: zhyajie <yajizhan@amd.com> Co-authored-by: zhyajie <yajizhan@amd.com>	2025-12-02 00:02:12 -08:00
Cyrus Leung	653591d5e7	[Chore] Move tokenizer initialization methods (#29793 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-02 13:33:37 +08:00
Divakar Verma	e2fbfc955e	[CI][AMD] spec_decode:eagle skip FLASH_ATTN for deepseek on ROCm (#29827 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-12-02 05:27:46 +00:00
Divakar Verma	a690fb5bd6	[CI][ROCm] Fix test_correctness_sliding_window (#29243 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-02 04:53:27 +00:00


817 Commits