xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-05-16 12:55:42 +08:00

Author	SHA1	Message	Date
rasmith	dc839ad03d	[CI/Build][AMD][Quantization] Fix test_int8_kernel.py by updating int8_utils to use hip.libdevice.round (#30151 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-05 20:52:11 -08:00
Wentao Ye	7b5575fa7d	[Bug] Fix vLLM config is not set error (#29999 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-05 16:42:12 -05:00
Bangsheng Tang	77e4472809	let draft model follow target model's config_format (#30152 )	2025-12-05 13:33:42 -08:00
Divakar Verma	962d703818	[Bugfix][llama4_eagle] Fix missing 'lm_head' attribute (#29926 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-12-05 19:57:26 +00:00
Nicolò Lucchesi	bff78310d9	[Enc-Dec] Fix OOT tokenizer issue (#30144 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-12-05 19:23:33 +00:00
Tova Movshovitz	adb315060c	[KVConnector][Feature] Support KV connector cache reset via /reset_prefix_cache (#27170 ) Signed-off-by: tovam <tovam@pliops.com> Signed-off-by: Tova Movshovitz <tovam@pliops.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-05 18:33:26 +00:00
Ilya Markov	4e26d3b09e	[Compile] Conditional compilation. Introduce compile_ranges (#24252 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: ilmarkov <markovilya197@gmail.com> Signed-off-by: Luka Govedič <luka.govedic@gmail.com> Signed-off-by: ProExpertProg <lgovedic@redhat.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Luka Govedič <luka.govedic@gmail.com>	2025-12-05 18:17:32 +00:00
Matthew Bonanni	66e674cdd5	[Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2025-12-05 09:48:43 -08:00
Mark McLoughlin	dff0a2b394	[NIXL] Add remote_request_id to kv_transfer_params (#29665 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-12-05 09:43:48 -08:00
Nick Hill	dc264bcea1	[BugFix] Eagerly abort cancelled final-step requests (#29987 ) Currently, when requests are cancelled while executing their final step, "completion" is handled based on normal stop processing (e.g. length or stop token), so the abort has no effect. This is typically not a problem, but when a kv connector is involved it thinks the request completed successfully rather than being aborted. This is problematic for disaggregated prefill which will free kv cache blocks if the request was aborted but not if it completed successfully—since the cancelled request will never be sent to the decode side, kv cache blocks remain pinned until the fall-back timeout expires. The problem is exacerbated when many requests are cancelled and/or there are large prefills whose forward pass takes a long time (since the window is bigger). This PR fixes the problem by processing pending aborts immediately prior to processing model output each step; we process only aborts, not new requests, since it's preferable for latency to process model outputs before new incoming requests. Fixes #26400. Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-05 17:28:32 +00:00
Nicolò Lucchesi	78c44fd722	[NIXL] Small cleanup of unused variables (#29618 ) Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-05 18:17:36 +01:00
Angela Yi	e7296b08da	[bugfix] Pass globals to aot_compiled function (#29428 ) Signed-off-by: angelayi <yiangela7@gmail.com>	2025-12-05 16:54:26 +00:00
Andrew Xia	da7bc54ea8	[responsesAPI][5] ResponsesParser with tools for full MCP python loop (#29798 ) Signed-off-by: Andrew Xia <axia@fb.com> Signed-off-by: Andrew Xia <axia@meta.com> Co-authored-by: Andrew Xia <axia@fb.com>	2025-12-05 11:11:50 -05:00
Mark McLoughlin	949a6a19d2	[NIXL] Add compatibility checking to NIXL KV connector handshake (#29503 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-12-05 15:52:45 +01:00
Alec S	2c174420f5	Reduce validation to a warning (#28749 ) Signed-off-by: Alec Solder <alecs@fb.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Alec Solder <alecs@fb.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-05 14:02:49 +00:00
Yi Liu	0d8a7d8a26	[Compressed Tensors] Add XPU `wNa16` support (#29484 ) Signed-off-by: yiliu30 <yi4.liu@intel.com>	2025-12-05 22:02:09 +08:00
Max Hu	c2894d3883	[Feature] Add Layer-wise NVTX Support (#29990 ) Signed-off-by: Max Hu <hyoung2991@gmail.com> Signed-off-by: Max Hu <maxhu@nvidia.com> Co-authored-by: Max Hu <maxhu@nvidia.com>	2025-12-05 11:20:07 +00:00
Zhiwei	3628bcaaf2	[ROCm][MXFP4] Infer w4a4 quant method in rocm aiter fused moe (#29775 ) Signed-off-by: ZhiweiYan-96 <zhiwei.yan@amd.com>	2025-12-05 11:01:16 +00:00
strinczer	b73b158ab0	[Bugfix] Fix parse_output_message crash on commentary with no recipient (#29972 ) Signed-off-by: Shai Trinczer <strinczer@icloud.com> Signed-off-by: strinczer <strinczer@icloud.com>	2025-12-05 10:51:12 +00:00
Ning Xie	7ae13c66ba	[typing] fix type (#29964 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-12-05 10:46:08 +00:00
Ming Yang	f16356fe36	[bench] Support common prefix len config (for decode-only bench) (#29934 ) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-12-05 10:26:52 +00:00
Alec S	65ee97288a	[BugFix] Adding env variable to disable async grammar compilation (#29996 ) Signed-off-by: Alec Solder <alecs@fb.com> Signed-off-by: Alec S <10566873+alecsolder@users.noreply.github.com> Co-authored-by: Alec Solder <alecs@fb.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-12-05 00:49:37 -08:00
Yanan Cao	62b3333448	[Frontend] Remove deprecated -O.xx flag (#29991 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2025-12-05 00:47:22 -08:00
amitz-nv	6038b1b04b	[Frontend][Model] Add 'float16' to possible mamba cache dtype values, override mamba SSM cache dtype value for NemotronH (#29978 ) Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>	2025-12-05 00:34:33 -08:00
Jingchun Gao	d698bb382d	[Bugfix] Correct num_q_heads on DCP for Flashinfer backends (#29487 ) Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com> Signed-off-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com> Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com>	2025-12-05 05:54:31 +00:00
Laith Sakka	5867819eaf	Do not guard during noop elimination pass (#30095 ) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-12-05 04:10:12 +00:00
Qiu	0098a6e3da	[PCP&DCP] move CUDAGraph check for PCP&DCP to the check func of platforms (#29952 ) Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-04 21:40:51 -05:00
Hubert de La Jonquiere	befb59e5b1	[Model] Add Holo2 reasoning parser (#30048 ) Signed-off-by: hdlj-h <hubert@hcompany.ai>	2025-12-05 10:38:45 +08:00
Alexander Matveev	4470ee2f90	[Perf] Enable separate shared_experts stream only for CUDA (#30085 ) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-12-05 00:03:17 +00:00
Laith Sakka	1f0d184590	[aot_compile]change VLLM backend to read fake args from example_value (#29104 ) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-12-04 17:33:45 -05:00
Lucas Wilkinson	c8ab988b15	[BugFix] Fix DBO assert `assert B_block_table == B_q` (#29933 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-12-04 14:48:54 -05:00
Peng-YM	48a5fff66e	[Bugfix] Missing tokens in `return_token_ids` when tool parsers is enabled in streaming mode (#29074 ) Signed-off-by: Peng-YM <1048217874pengym@gmail.com>	2025-12-04 19:09:39 +00:00
Mercykid-bash	1119f6e47a	Abstract eplb algo (#26471 ) Signed-off-by: Che Ruan <cr623@ic.ac.uk> Signed-off-by: mengxingkongzhouhan <117415539+mengxingkongzhouhan@users.noreply.github.com> Signed-off-by: Mercykid-bash <ruanche0218@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Che Ruan <cr623@ic.ac.uk> Co-authored-by: mengxingkongzhouhan <117415539+mengxingkongzhouhan@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-04 19:09:09 +00:00
Harry Mellor	e10c84e06a	Access `partial_rotary_factor` from `rope_parameters` (#29966 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-04 18:42:49 +00:00
Kuntai Du	ece2825a29	[KVConnector] Remove v0-related kv connector components such as kv pipe and kv lookup buffer (#29705 ) Signed-off-by: KuntaiDu <kuntai@uchicago.edu>	2025-12-04 18:20:48 +00:00
Jee Jee Li	652ba93da3	[Bugfix] Fix FP8 MoE LoRA (#29890 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-12-04 18:17:49 +00:00
Tao Yun	6dcb07f676	support qwen3-vl handle requests with embeddings (#30037 ) Signed-off-by: taoyun <1069423820@qq.com> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-12-04 17:34:06 +00:00
Cyrus Leung	b286a311c2	[Chore] Deprecate `merge_by_field_config` arg (#30035 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-04 17:21:24 +00:00
Woosuk Kwon	cc050558f4	[Model Runner V2] Implement get_num_sampled_and_rejected kernel (#30029 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-12-04 07:19:42 -08:00
Harry Mellor	5c32a06a04	Use Transformers v5 RoPE standardisation and validation (#30046 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-04 14:54:28 +00:00
Yongtao Huang	dd97e047e0	Fix broken multiline assert in `LoRAModelManager.register_module` (#30032 ) Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com>	2025-12-04 22:04:42 +08:00
Harry Mellor	9998ea5b57	Delete HF version of Phi 4 MM (#30049 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-04 13:44:50 +00:00
wang.yuqi	74c4d80c6c	[Model][6/N] Improve all pooling task | Support chunked prefill with ALL pooling (#27145 ) Signed-off-by: wang.yuqi <noooop@126.com> Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-04 13:44:15 +00:00
Chauncey	6796ce8bdb	[Bugfix] Fix the issue with interleaved thinking when using streaming (#30033 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Signed-off-by: Chauncey <chaunceyjiang@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-12-04 11:11:59 +00:00
Andreas Karatzas	e96a6a6dca	[ROCm][CI][Bugfix] Fixing the `Multi-Modal Models Test (Extended) 1` group (#30013 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2025-12-04 11:00:16 +00:00
Noa Neria	6366c098d7	Validating Runai Model Streamer Integration with S3 Object Storage (#29320 ) Signed-off-by: Noa Neria <noa@run.ai>	2025-12-04 18:04:43 +08:00
dtc	842aba501d	[P/D] Introduce Mooncake Transfer Engine as kv_connector (#24718 ) Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com> Signed-off-by: dtc <dtcccc@linux.alibaba.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>	2025-12-04 09:51:36 +00:00
Arpit Khandelwal	dfdda96747	[Core] Remove forced None assignment for deprecated PassConfig flags (#29994 ) Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-04 09:15:04 +00:00
Xu Wenqing	ffdd18111b	Add DeepSeek-V3.2 tool parser. (#29848 ) Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>	2025-12-04 08:46:34 +00:00
Ye (Charlotte) Qi	b8a6ae4158	[ROCm] add fallback for aiter fp8 decode mla (#30005 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-12-04 08:45:57 +00:00


8476 Commits