xinyun/vllm - vllm - 丝路新云 - Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-08 03:29:06 +08:00

Author	SHA1	Message	Date
Lucas Wilkinson	0044c4038c	[BugFix][DeepSeek-V3.2] Fix backend selection logic for Blackwell (#30195 )	2025-12-07 10:53:51 -05:00
Isotr0py	b952f4d3c3	[v1] Add PrefixLM support to FlexAttention backend (#27938 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-12-07 15:51:36 +00:00
Wentao Ye	541a2ef892	[Perf] Deepgemm fused layout kernel for activations, 4.3% throughput improvement, 10.7% TTFT improvement. (#29546 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-07 20:31:14 +08:00
Jee Jee Li	b0f4866a77	[CI/Build]Temporary workaround for test_default_mm_loras timeout (#30202 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-12-07 20:27:11 +08:00
Jinzhen Lin	879ddb09c3	[Kernel][MoE] optimize `moe_align_block_size` (#29642 ) Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-12-07 01:58:47 -08:00
Yifan Qiao	1b0482b9d1	[Misc][Core] Remove unused `req_index` increment in scheduler (#30176 ) Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>	2025-12-07 08:39:21 +00:00
Cyrus Leung	e83b7e379c	Revert "[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 )" (#30199 )	2025-12-07 00:00:22 -08:00
Cyrus Leung	27f4c2fd46	[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-06 23:15:42 -08:00
Luke	a49d813fa8	Lazy loading to avoid importing all files (#29716 ) Signed-off-by: Luke <yq0536@gmail.com>	2025-12-07 07:13:14 +00:00
Wentao Ye	17eb25e327	[Perf] Enable cuda graph for deepepHT, 5.3% throughput improvement, 4.4% TTFT improvement (#29558 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-07 04:44:50 +00:00
jeremyteboul	dce6d229f7	Support multiple image/audio embeddings per requests (#29988 ) Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com> Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>	2025-12-07 04:34:24 +00:00
Yanan Cao	cbedb703cc	[Frontend] Remove confusing -O.xx flag error (#30169 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2025-12-07 02:53:42 +00:00
AuruTus	8d3da4c79d	[MISC]: change NIXL compatibility hash logging level to debug (#30182 )	2025-12-07 00:21:03 +00:00
Andrew Xia	421125d03a	[ez] move harmony utils to parser folder (#30117 ) Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com>	2025-12-06 17:34:34 -05:00
Cyrus Leung	671427efbf	[Model] Move `multimodal_cpu_fields` definition to field config (#30181 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-06 13:40:02 +00:00
Viacheslav	21bb323542	Gigachat 3 tool parser and tests (#29905 ) Signed-off-by: Viacheslav Barinov <viacheslav.teh@gmail.com>	2025-12-06 12:04:14 +00:00
Chukwuma Nwaugha	17a9abec2b	simplify requires_files list creation (#29656 ) Signed-off-by: Chukwuma Nwaugha <nwaughac@gmail.com>	2025-12-06 09:42:41 +00:00
Ye (Charlotte) Qi	92c35abb24	[Misc] Fix circular import in vllm.transformers_utils.config (#30179 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-12-06 09:24:03 +00:00
Yu Jiaqi	43e7593031	Support tokenization_kwargs override (#29794 ) Signed-off-by: piood <2477084691@qq.com>	2025-12-06 09:12:53 +00:00
Cyrus Leung	c46b932df2	[Chore] Deprecate `SupportsMultiModal.merge_by_field_config` (#30170 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-06 07:57:28 +00:00
redwrasse	6476382384	prefix caching design doc sha256 now default (#29261 ) Signed-off-by: redwrasse <mail@redwrasse.io>	2025-12-06 07:39:56 +00:00
kx	d6aeaddf4a	[bugfix] fix type[AttentionBackend] bug in kv_connector_base_v1 (#30051 ) Signed-off-by: 01267596 <xiongkai123@cmbchina.com> Co-authored-by: 01267596 <xiongkai123@cmbchina.com>	2025-12-06 07:11:31 +00:00
Woosuk Kwon	a238cbd89d	[Model Runner V2] Support min-p sampling (#30171 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-12-05 21:42:47 -08:00
Nick Hill	4026ae31e9	[Misc] Move `disable_nccl_for_dp_synchronization` init logic into `VllmConfig` (#30161 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-05 20:59:04 -08:00
rasmith	b12f4a9830	[CI/Build][AMD] Use ROCM_ATTN instead of FLASH_ATTN test for test_register_kv_caches for ROCm and update test for TRITON_ATTN (#29985 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2025-12-05 20:57:38 -08:00
Rohan Potdar	40a046cd82	[Bugfix]: Fix `TokenizerLike` interface (#30009 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2025-12-05 20:56:40 -08:00
Peter Salas	e858bc4d14	[Model] Add support for transformer-based Ultravox v0.7 projector (#30089 ) Signed-off-by: Peter Salas <peter@fixie.ai>	2025-12-05 20:55:43 -08:00
Dongjie Zou	e3fbb6f152	fix#30092 Kimi-Linear model loading failure with missing indexer_rotary_emb (#30093 ) Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>	2025-12-05 20:55:09 -08:00
yuttian1	c4d62618ca	Fix AWQ MoE marlin check issue in marlin_utils.py for AMD backend (#30102 ) Signed-off-by: yuttian1 <yuttian@amd.com>	2025-12-05 20:54:38 -08:00
rasmith	62079d8600	[CI/Build][AMD] Skip marlin, machete, and hadacore tests since these require _C functions not defined for ROCm (#30109 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-06 12:54:17 +08:00
Harry Mellor	bf4a901af9	Better error when world size is larger than node and `distributed_executor_backend` is not set (#30140 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-05 20:53:52 -08:00
Samuel Shen	7e31c3a3f6	[CI]: Remove unnecessary imports from test_lmache_integration (#30157 ) Signed-off-by: Samuel Shen <slshen@uchicago.edu> Co-authored-by: Samuel Shen <slshen@uchicago.edu>	2025-12-06 12:53:34 +08:00
rasmith	dc839ad03d	[CI/Build][AMD][Quantization] Fix test_int8_kernel.py by updating int8_utils to use hip.libdevice.round (#30151 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-05 20:52:11 -08:00
Deboleina	02a4169193	[Tests] Tool call tests for openai/gpt-oss-20b (#26237 ) Signed-off-by: Debolina Roy <debroy@redhat.com>	2025-12-05 19:03:29 -08:00
Wentao Ye	7b5575fa7d	[Bug] Fix vLLM config is not set error (#29999 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-05 16:42:12 -05:00
Bangsheng Tang	77e4472809	let draft model follow target model's config_format (#30152 )	2025-12-05 13:33:42 -08:00
Divakar Verma	962d703818	[Bugfix][llama4_eagle] Fix missing 'lm_head' attribute (#29926 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-12-05 19:57:26 +00:00
Nicolò Lucchesi	e23ca3a0e8	[CI] Re-use whisper_client for all tests (#30148 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-12-05 19:47:37 +00:00
Russell Bryant	3633035a3f	[Misc] Rename CohereForAI references to CohereLabs (#30147 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-12-05 19:41:40 +00:00
Nicolò Lucchesi	bff78310d9	[Enc-Dec] Fix OOT tokenizer issue (#30144 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-12-05 19:23:33 +00:00
Tova Movshovitz	adb315060c	[KVConnector][Feature] Support KV connector cache reset via /reset_prefix_cache (#27170 ) Signed-off-by: tovam <tovam@pliops.com> Signed-off-by: Tova Movshovitz <tovam@pliops.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-05 18:33:26 +00:00
Ilya Markov	4e26d3b09e	[Compile] Conditional compilation. Introduce compile_ranges (#24252 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: ilmarkov <markovilya197@gmail.com> Signed-off-by: Luka Govedič <luka.govedic@gmail.com> Signed-off-by: ProExpertProg <lgovedic@redhat.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Luka Govedič <luka.govedic@gmail.com>	2025-12-05 18:17:32 +00:00
Matthew Bonanni	66e674cdd5	[Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2025-12-05 09:48:43 -08:00
Mark McLoughlin	dff0a2b394	[NIXL] Add remote_request_id to kv_transfer_params (#29665 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-12-05 09:43:48 -08:00
Nick Hill	dc264bcea1	[BugFix] Eagerly abort cancelled final-step requests (#29987 ) Currently, when requests are cancelled while executing their final step, "completion" is handled based on normal stop processing (e.g. length or stop token), so the abort has no effect. This is typically not a problem, but when a kv connector is involved it thinks the request completed successfully rather than being aborted. This is problematic for disaggregated prefill which will free kv cache blocks if the request was aborted but not if it completed successfully—since the cancelled request will never be sent to the decode side, kv cache blocks remain pinned until the fall-back timeout expires. The problem is exacerbated when many requests are cancelled and/or there are large prefills whose forward pass takes a long time (since the window is bigger). This PR fixes the problem by processing pending aborts immediately prior to processing model output each step; we process only aborts, not new requests, since it's preferable for latency to process model outputs before new incoming requests. Fixes #26400. Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-05 17:28:32 +00:00
Nicolò Lucchesi	78c44fd722	[NIXL] Small cleanup of unused variables (#29618 ) Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-05 18:17:36 +01:00
Angela Yi	e7296b08da	[bugfix] Pass globals to aot_compiled function (#29428 ) Signed-off-by: angelayi <yiangela7@gmail.com>	2025-12-05 16:54:26 +00:00
Andrew Xia	da7bc54ea8	[responsesAPI][5] ResponsesParser with tools for full MCP python loop (#29798 ) Signed-off-by: Andrew Xia <axia@fb.com> Signed-off-by: Andrew Xia <axia@meta.com> Co-authored-by: Andrew Xia <axia@fb.com>	2025-12-05 11:11:50 -05:00
Mark McLoughlin	949a6a19d2	[NIXL] Add compatibility checking to NIXL KV connector handshake (#29503 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-12-05 15:52:45 +01:00
Alec S	2c174420f5	Reduce validation to a warning (#28749 ) Signed-off-by: Alec Solder <alecs@fb.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Alec Solder <alecs@fb.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-05 14:02:49 +00:00

12027 Commits
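
Note on dc264bcea1 (#29987): the commit message describes processing pending aborts immediately before model output each step, while leaving new requests until after the output is handled. A minimal sketch of that ordering follows. It uses a toy engine: `ToyEngine`, its fields, and its method names are hypothetical illustrations and are not vLLM's actual classes or APIs.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyEngine:
    """Hypothetical stand-in for an engine loop; not vLLM's real scheduler."""

    running: set = field(default_factory=set)
    pending_aborts: deque = field(default_factory=deque)
    pending_new: deque = field(default_factory=deque)
    # Maps request id -> "aborted" or "stopped".
    finished: dict = field(default_factory=dict)

    def _drain_aborts(self):
        # Only aborts are handled here; new requests stay queued.
        while self.pending_aborts:
            req_id = self.pending_aborts.popleft()
            if req_id in self.running:
                self.running.discard(req_id)
                self.finished[req_id] = "aborted"

    def process_step_output(self, final_step_ids):
        # 1. Apply pending aborts first, so a request cancelled during its
        #    final step is recorded as aborted rather than completed.
        self._drain_aborts()
        # 2. Apply normal stop handling to whatever is still running.
        for req_id in final_step_ids:
            if req_id in self.running:
                self.running.discard(req_id)
                self.finished[req_id] = "stopped"
        # 3. Admit new requests only after model output has been processed.
        while self.pending_new:
            self.running.add(self.pending_new.popleft())


engine = ToyEngine(running={"a", "b"})
engine.pending_aborts.append("a")   # "a" is cancelled during its final step
engine.pending_new.append("c")
engine.process_step_output(["a", "b"])
print(engine.finished)
```

In this sketch, request "a" ends up marked as aborted even though it also reached its final step, which is the distinction the commit message says a KV connector needs in order to free blocks promptly.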