3819 Commits

Author SHA1 Message Date
Benjamin Chislett
e858bfe051
[Cleanup] Refactor profiling env vars into a CLI config (#29912)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-09 13:29:33 -05:00
Ilya Markov
0b6a8a304c
[BugFix] Fix undetected failing tests (#30277)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
2025-12-09 17:57:55 +00:00
Wentao Ye
83319b44c2
[Compile] Fix torch warning "TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled" (#29897)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-09 10:40:37 -05:00
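For context, this PyTorch warning appears on Ampere-or-newer GPUs when TF32 tensor cores could accelerate float32 matmuls but have not been enabled. A minimal sketch of the standard PyTorch toggles that opt in and silence the warning (shown for illustration only; not necessarily the approach taken in #29897):
```python
import torch

# Opt in to TensorFloat-32 for float32 matmuls (Ampere+ GPUs).
# "high" allows TF32; "highest" keeps full FP32 precision.
torch.set_float32_matmul_precision("high")

# Equivalent lower-level switches for cuBLAS and cuDNN.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
```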
Lucas Wilkinson
56037dfa2f
[BugFix] Fix assert batch_descriptor.num_tokens == num_tokens_padded (#30173)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-09 10:36:12 -05:00
quanliu
5dcd593baf
[Feature] Batch-Invariant Support for FA2 and LoRA (#30018)
Signed-off-by: quanliu <18646313696@163.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-12-09 10:01:38 -05:00
Julien Denize
5c213d2899
[BUGFIX] Mistral tool call parser v11+ (#30332)
Signed-off-by: juliendenize <julien.denize@mistral.ai>
2025-12-09 14:55:38 +00:00
Hubert de La Jonquiere
c72ea10723
[Structured Output][Reasoning] Improves decoding throughput for models using single-token reasoning endings. (#30056)
2025-12-09 18:54:08 +08:00
Jaya Yuan
67475a6e81
[DCP][Bugfix][CI] Fix accuracy issue of DCP when using FLASH_ATTN_MLA (#30309)
Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com>
2025-12-09 08:22:14 +00:00
Micah Williamson
aeb82b1930
[CI] Fix Flaky test_eagle_max_len Test (#30306)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-12-09 07:33:34 +00:00
Lucas Wilkinson
aed846917f
[Attention] Make split_decodes_and_prefills(..., require_uniform=True) support padding (#29644)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
2025-12-09 07:24:01 +00:00
Or Ozeri
4c6fd25880
kv_transfer: Rename the shared storage connectors (#30201)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-12-08 20:46:09 -08:00
czhu-cohere
f6227c22ab
[Kernel]Support W4A8 Grouped GEMM on Hopper (#29691)
Signed-off-by: czhu-cohere <conway.zhu@cohere.com>
2025-12-08 19:29:06 -08:00
gnovack
ea657f2078
LoRA MoE Align Improvements (#29257)
Signed-off-by: gnovack <gnovack@amazon.com>
2025-12-09 10:35:16 +08:00
Yanan Cao
7b35011ad1
Mark qwen2_5_vl as xfail (#30283)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2025-12-09 01:14:10 +00:00
Wentao Ye
d9417096d1
[Feature] Batch invariant: Enable TRITON_MLA without prefix-caching (#29125)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-08 19:31:57 -05:00
Victor Ziliang Peng
f1599ca55d
feat(metrics): Add prefill KV compute metric excluding cached tokens (#30189)
Signed-off-by: Ziliang Peng <ziliang@character.ai>
2025-12-09 00:08:48 +00:00
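For illustration only (the function and field names below are hypothetical, not the PR's actual metric), the idea is to count just the prompt tokens whose KV had to be computed during prefill, subtracting prefix-cache hits:
```python
def prefill_kv_compute_tokens(prompt_tokens: int, cached_tokens: int) -> int:
    """Prompt tokens whose KV must be computed, excluding prefix-cache hits."""
    return max(prompt_tokens - cached_tokens, 0)


# Example: a 1024-token prompt with 768 tokens already in the prefix cache
# needs KV computation for only 256 tokens.
assert prefill_kv_compute_tokens(1024, 768) == 256
```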
Ming Yang
60d17251c9
[Disagg] Support large batch size in proxy server and update NixlConnector doc for DP (#28782)
Signed-off-by: Ming Yang <minos.future@gmail.com>
2025-12-09 00:01:08 +00:00
Charlie Fu
6af70e11a0
[ROCm][CI] Fix test_max_len.py for ROCm (#29916)
Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: Charlie Fu <Charlie.Fu@amd.com>
2025-12-08 16:58:30 -05:00
roikoren755
ae0f69b16a
Add SpecDec support to selective_state_update (#29488)
Signed-off-by: Roi Koren <roik@nvidia.com>
2025-12-08 16:45:18 -05:00
Jee Jee Li
67312cad11
[Misc] Split the LoRA code (#30253)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-09 00:59:31 +08:00
Laith Sakka
87aee9ed2b
Add evaluate_guards option to DynamicShapesConfig (#27432)
Signed-off-by: Laith Sakka <lsakka@meta.com>
2025-12-08 10:46:15 -05:00
Daniel Cámpora
184076c3fe
[DeepSeek v3.2] Make top-k work for any logit values. (#27568)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-08 06:55:58 -08:00
wang.yuqi
2e660c2434
[Frontend] Binary embedding response does not return metadata when encoding_format is set to bytes_only. (#30249)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-08 12:01:21 +00:00
wang.yuqi
9e77ffca3f
[Model][7/N] Improve all pooling tasks | Deprecate as_reward_model; prefer the new multi-vector retrieval API for extracting hidden states (#26686)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-08 08:10:09 +00:00
daniel-salib
444f0e3f33
[Frontend] Add MCP type support infrastructure to Responses API (#30054)
Signed-off-by: Daniel Salib <danielsalib@meta.com>
2025-12-08 10:02:52 +08:00
ElizaWszola
af0444bf40
[Performance] Fused blockwise quant RMS norm (#27883)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
2025-12-07 16:38:04 +00:00
Isotr0py
b952f4d3c3
[v1] Add PrefixLM support to FlexAttention backend (#27938)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-07 15:51:36 +00:00
Jee Jee Li
b0f4866a77
[CI/Build]Temporary workaround for test_default_mm_loras timeout (#30202)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-07 20:27:11 +08:00
Jinzhen Lin
879ddb09c3
[Kernel][MoE] optimize moe_align_block_size (#29642)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-07 01:58:47 -08:00
Cyrus Leung
e83b7e379c
Revert "[Renderer] Separate out RendererConfig from ModelConfig (#30145)" (#30199) 2025-12-07 00:00:22 -08:00
Cyrus Leung
27f4c2fd46
[Renderer] Separate out RendererConfig from ModelConfig (#30145)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-06 23:15:42 -08:00
Wentao Ye
17eb25e327
[Perf] Enable cuda graph for deepepHT, 5.3% throughput improvement, 4.4% TTFT improvement (#29558)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-07 04:44:50 +00:00
jeremyteboul
dce6d229f7
Support multiple image/audio embeddings per request (#29988)
Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com>
Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>
2025-12-07 04:34:24 +00:00
Yanan Cao
cbedb703cc
[Frontend] Remove confusing -O.xx flag error (#30169)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2025-12-07 02:53:42 +00:00
Andrew Xia
421125d03a
[ez] move harmony utils to parser folder (#30117)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-12-06 17:34:34 -05:00
Cyrus Leung
671427efbf
[Model] Move multimodal_cpu_fields definition to field config (#30181)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-06 13:40:02 +00:00
Viacheslav
21bb323542
Gigachat 3 tool parser and tests (#29905)
Signed-off-by: Viacheslav Barinov <viacheslav.teh@gmail.com>
2025-12-06 12:04:14 +00:00
Yu Jiaqi
43e7593031
Support tokenization_kwargs override (#29794)
Signed-off-by: piood <2477084691@qq.com>
2025-12-06 09:12:53 +00:00
rasmith
b12f4a9830
[CI/Build][AMD] Use ROCM_ATTN instead of FLASH_ATTN test for test_register_kv_caches for ROCm and update test for TRITON_ATTN (#29985)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2025-12-05 20:57:38 -08:00
rasmith
62079d8600
[CI/Build][AMD] Skip marlin, machete, and hadacore tests since these require _C functions not defined for ROCm (#30109)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-12-06 12:54:17 +08:00
Samuel Shen
7e31c3a3f6
[CI]: Remove unnecessary imports from test_lmache_integration (#30157)
Signed-off-by: Samuel Shen <slshen@uchicago.edu>
Co-authored-by: Samuel Shen <slshen@uchicago.edu>
2025-12-06 12:53:34 +08:00
Deboleina
02a4169193
[Tests] Tool call tests for openai/gpt-oss-20b (#26237)
Signed-off-by: Debolina Roy <debroy@redhat.com>
2025-12-05 19:03:29 -08:00
Divakar Verma
962d703818
[Bugfix][llama4_eagle] Fix missing 'lm_head' attribute (#29926)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-12-05 19:57:26 +00:00
Nicolò Lucchesi
e23ca3a0e8
[CI] Re-use whisper_client for all tests (#30148)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-05 19:47:37 +00:00
Russell Bryant
3633035a3f
[Misc] Rename CohereForAI references to CohereLabs (#30147)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-12-05 19:41:40 +00:00
Ilya Markov
4e26d3b09e
[Compile] Conditional compilation. Introduce compile_ranges (#24252)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Luka Govedič <luka.govedic@gmail.com>
Signed-off-by: ProExpertProg <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
2025-12-05 18:17:32 +00:00
Matthew Bonanni
66e674cdd5
[Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
2025-12-05 09:48:43 -08:00
Mark McLoughlin
dff0a2b394
[NIXL] Add remote_request_id to kv_transfer_params (#29665)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-12-05 09:43:48 -08:00
Nick Hill
dc264bcea1
[BugFix] Eagerly abort cancelled final-step requests (#29987)
Currently, when requests are cancelled while executing their final
step, "completion" is handled based on normal stop processing (e.g.
length or stop token), so the abort has no effect. This is typically
not a problem, but when a kv connector is involved it thinks the
request completed successfully rather than being aborted.

This is problematic for disaggregated prefill, which frees kv
cache blocks if the request was aborted but not if it completed
successfully. Since the cancelled request will never be sent to
the decode side, its kv cache blocks remain pinned until the
fall-back timeout expires. The problem is exacerbated when many
requests are cancelled and/or there are large prefills whose
forward pass takes a long time (since the window is bigger).

This PR fixes the problem by processing pending aborts
immediately prior to processing model output each step; we process
only aborts, not new requests, since it's preferable for latency to
process model outputs before new incoming requests.

Fixes #26400.

Signed-off-by: Nick Hill <nhill@redhat.com>
2025-12-05 17:28:32 +00:00
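A minimal, self-contained sketch of the ordering this commit message describes (all names here are hypothetical, not vLLM's actual scheduler API): drain pending aborts before consuming the step's model output, so a request cancelled during its final step finishes as aborted rather than completed.
```python
from queue import Empty, Queue


class ToyScheduler:
    """Toy loop illustrating abort-before-output ordering (hypothetical API)."""

    def __init__(self) -> None:
        self.pending_aborts: Queue[str] = Queue()  # request ids cancelled by clients
        self.running: dict[str, list[str]] = {}    # request id -> generated tokens

    def step(self, model_output: dict[str, str]) -> None:
        # 1) Process pending aborts first, so a request cancelled while its
        #    final step was executing finishes as "aborted", not "completed".
        while True:
            try:
                req_id = self.pending_aborts.get_nowait()
            except Empty:
                break
            if req_id in self.running:
                self.finish(req_id, status="aborted")

        # 2) Only then apply the model output and normal stop processing.
        for req_id, token in model_output.items():
            if req_id in self.running:
                self.running[req_id].append(token)
                if token == "<eos>":
                    self.finish(req_id, status="completed")

        # 3) New incoming requests would be admitted after this point, since
        #    (as the message notes) output latency takes priority over admission.

    def finish(self, req_id: str, status: str) -> None:
        # A kv connector hook here would free blocks for "aborted" requests
        # instead of holding them for transfer to the decode side.
        self.running.pop(req_id, None)
        print(f"{req_id} finished: {status}")
```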
Nicolò Lucchesi
78c44fd722
[NIXL] Small cleanup of unused variables (#29618)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-05 18:17:36 +01:00