xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-22 23:57:19 +08:00

Author	SHA1	Message	Date
Andrew Xia	421125d03a	[ez] move harmony utils to parser folder (#30117 ) Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com>	2025-12-06 17:34:34 -05:00
Cyrus Leung	671427efbf	[Model] Move `multimodal_cpu_fields` definition to field config (#30181 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-06 13:40:02 +00:00
Viacheslav	21bb323542	Gigachat 3 tool parser and tests (#29905 ) Signed-off-by: Viacheslav Barinov <viacheslav.teh@gmail.com>	2025-12-06 12:04:14 +00:00
Chukwuma Nwaugha	17a9abec2b	simplify requires_files list creation (#29656 ) Signed-off-by: Chukwuma Nwaugha <nwaughac@gmail.com>	2025-12-06 09:42:41 +00:00
Ye (Charlotte) Qi	92c35abb24	[Misc] Fix circular import in vllm.transformers_utils.config (#30179 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-12-06 09:24:03 +00:00
Yu Jiaqi	43e7593031	Support tokenization_kwargs override (#29794 ) Signed-off-by: piood <2477084691@qq.com>	2025-12-06 09:12:53 +00:00
Cyrus Leung	c46b932df2	[Chore] Deprecate `SupportsMultiModal.merge_by_field_config` (#30170 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-06 07:57:28 +00:00
redwrasse	6476382384	prefix caching design doc sha256 now default (#29261 ) Signed-off-by: redwrasse <mail@redwrasse.io>	2025-12-06 07:39:56 +00:00
kx	d6aeaddf4a	[bugfix] fix type[AttentionBackend] bug in kv_connector_base_v1 (#30051 ) Signed-off-by: 01267596 <xiongkai123@cmbchina.com> Co-authored-by: 01267596 <xiongkai123@cmbchina.com>	2025-12-06 07:11:31 +00:00
Woosuk Kwon	a238cbd89d	[Model Runner V2] Support min-p sampling (#30171 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-12-05 21:42:47 -08:00
Nick Hill	4026ae31e9	[Misc] Move `disable_nccl_for_dp_synchronization` init logic into `VllmConfig` (#30161 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-05 20:59:04 -08:00
rasmith	b12f4a9830	[CI/Build][AMD] Use ROCM_ATTN instead of FLASH_ATTN test for test_register_kv_caches for ROCm and update test for TRITON_ATTN (#29985 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2025-12-05 20:57:38 -08:00
Rohan Potdar	40a046cd82	[Bugfix]: Fix `TokenizerLike` interface (#30009 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2025-12-05 20:56:40 -08:00
Peter Salas	e858bc4d14	[Model] Add support for transformer-based Ultravox v0.7 projector (#30089 ) Signed-off-by: Peter Salas <peter@fixie.ai>	2025-12-05 20:55:43 -08:00
Dongjie Zou	e3fbb6f152	fix#30092 Kimi-Linear model loading failure with missing indexer_rotary_emb (#30093 ) Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>	2025-12-05 20:55:09 -08:00
yuttian1	c4d62618ca	Fix AWQ MoE marlin check issue in marlin_utils.py for AMD backend (#30102 ) Signed-off-by: yuttian1 <yuttian@amd.com>	2025-12-05 20:54:38 -08:00
rasmith	62079d8600	[CI/Build][AMD] Skip marlin, machete, and hadacore tests since these require _C functions not defined for ROCm (#30109 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-06 12:54:17 +08:00
Harry Mellor	bf4a901af9	Better error when world size is larger than node and `distributed_executor_backend` is not set (#30140 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-05 20:53:52 -08:00
Samuel Shen	7e31c3a3f6	[CI]: Remove unnecessary imports from test_lmache_integration (#30157 ) Signed-off-by: Samuel Shen <slshen@uchicago.edu> Co-authored-by: Samuel Shen <slshen@uchicago.edu>	2025-12-06 12:53:34 +08:00
rasmith	dc839ad03d	[CI/Build][AMD][Quantization] Fix test_int8_kernel.py by updating int8_utils to use hip.libdevice.round (#30151 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-05 20:52:11 -08:00
Deboleina	02a4169193	[Tests] Tool call tests for openai/gpt-oss-20b (#26237 ) Signed-off-by: Debolina Roy <debroy@redhat.com>	2025-12-05 19:03:29 -08:00
Wentao Ye	7b5575fa7d	[Bug] Fix vLLM config is not set error (#29999 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-05 16:42:12 -05:00
Bangsheng Tang	77e4472809	let draft model follow target model's config_format (#30152 )	2025-12-05 13:33:42 -08:00
Divakar Verma	962d703818	[Bugfix][llama4_eagle] Fix missing 'lm_head' attribute (#29926 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-12-05 19:57:26 +00:00
Nicolò Lucchesi	e23ca3a0e8	[CI] Re-use whisper_client for all tests (#30148 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-12-05 19:47:37 +00:00
Russell Bryant	3633035a3f	[Misc] Rename CohereForAI references to CohereLabs (#30147 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-12-05 19:41:40 +00:00
Nicolò Lucchesi	bff78310d9	[Enc-Dec] Fix OOT tokenizer issue (#30144 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-12-05 19:23:33 +00:00
Tova Movshovitz	adb315060c	[KVConnector][Feature] Support KV connector cache reset via /reset_prefix_cache (#27170 ) Signed-off-by: tovam <tovam@pliops.com> Signed-off-by: Tova Movshovitz <tovam@pliops.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-05 18:33:26 +00:00
Ilya Markov	4e26d3b09e	[Compile] Conditional compilation. Introduce compile_ranges (#24252 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: ilmarkov <markovilya197@gmail.com> Signed-off-by: Luka Govedič <luka.govedic@gmail.com> Signed-off-by: ProExpertProg <lgovedic@redhat.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Luka Govedič <luka.govedic@gmail.com>	2025-12-05 18:17:32 +00:00
Matthew Bonanni	66e674cdd5	[Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2025-12-05 09:48:43 -08:00
Mark McLoughlin	dff0a2b394	[NIXL] Add remote_request_id to kv_transfer_params (#29665 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-12-05 09:43:48 -08:00
Nick Hill	dc264bcea1	[BugFix] Eagerly abort cancelled final-step requests (#29987 ) Currently, when requests are cancelled while executing their final step, "completion" is handled based on normal stop processing (e.g. length or stop token), so the abort has no effect. This is typically not a problem, but when a kv connector is involved it thinks the request completed successfully rather than being aborted. This is problematic for disaggregated prefill which will free kv cache blocks if the request was aborted but not if it completed successfully—since the cancelled request will never be sent to the decode side, kv cache blocks remain pinned until the fall-back timeout expires. The problem is exacerbated when many requests are cancelled and/or there are large prefills whose forward pass takes a long time (since the window is bigger). This PR fixes the problem by processing pending aborts immediately prior to processing model output each step; we process only aborts, not new requests, since it's preferable for latency to process model outputs before new incoming requests. Fixes #26400. Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-05 17:28:32 +00:00
Nicolò Lucchesi	78c44fd722	[NIXL] Small cleanup of unused variables (#29618 ) Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-05 18:17:36 +01:00
Angela Yi	e7296b08da	[bugfix] Pass globals to aot_compiled function (#29428 ) Signed-off-by: angelayi <yiangela7@gmail.com>	2025-12-05 16:54:26 +00:00
Andrew Xia	da7bc54ea8	[responsesAPI][5] ResponsesParser with tools for full MCP python loop (#29798 ) Signed-off-by: Andrew Xia <axia@fb.com> Signed-off-by: Andrew Xia <axia@meta.com> Co-authored-by: Andrew Xia <axia@fb.com>	2025-12-05 11:11:50 -05:00
Mark McLoughlin	949a6a19d2	[NIXL] Add compatibility checking to NIXL KV connector handshake (#29503 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-12-05 15:52:45 +01:00
Alec S	2c174420f5	Reduce validation to a warning (#28749 ) Signed-off-by: Alec Solder <alecs@fb.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Alec Solder <alecs@fb.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-05 14:02:49 +00:00
Yi Liu	0d8a7d8a26	[Compressed Tensors] Add XPU `wNa16` support (#29484 ) Signed-off-by: yiliu30 <yi4.liu@intel.com>	2025-12-05 22:02:09 +08:00
Elham	9843e332da	[CPU][Perf] Add fast vectorized exp impl from Arm Optimized Routines (#30068 ) Signed-off-by: Ubuntu <ubuntu@ip-10-252-30-150.eu-west-1.compute.internal> Signed-off-by: Elham Harirpoush <elham.harirpoush@arm.com> Co-authored-by: Ubuntu <ubuntu@ip-10-252-30-150.eu-west-1.compute.internal>	2025-12-05 13:09:20 +00:00
Harry Mellor	b7d85cf25c	[CI] Have pre-commit comment on a PR if pre-commit was not used (#30077 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-05 13:03:45 +00:00
Max Hu	c2894d3883	[Feature] Add Layer-wise NVTX Support (#29990 ) Signed-off-by: Max Hu <hyoung2991@gmail.com> Signed-off-by: Max Hu <maxhu@nvidia.com> Co-authored-by: Max Hu <maxhu@nvidia.com>	2025-12-05 11:20:07 +00:00
Zhiwei	3628bcaaf2	[ROCm][MXFP4] Infer w4a4 quant method in rocm aiter fused moe (#29775 ) Signed-off-by: ZhiweiYan-96 <zhiwei.yan@amd.com>	2025-12-05 11:01:16 +00:00
strinczer	b73b158ab0	[Bugfix] Fix parse_output_message crash on commentary with no recipient (#29972 ) Signed-off-by: Shai Trinczer <strinczer@icloud.com> Signed-off-by: strinczer <strinczer@icloud.com>	2025-12-05 10:51:12 +00:00
Ning Xie	7ae13c66ba	[typing] fix type (#29964 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-12-05 10:46:08 +00:00
Ming Yang	f16356fe36	[bench] Support common prefix len config (for decode-only bench) (#29934 ) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-12-05 10:26:52 +00:00
Alec S	65ee97288a	[BugFix] Adding env variable to disable async grammar compilation (#29996 ) Signed-off-by: Alec Solder <alecs@fb.com> Signed-off-by: Alec S <10566873+alecsolder@users.noreply.github.com> Co-authored-by: Alec Solder <alecs@fb.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-12-05 00:49:37 -08:00
Yanan Cao	62b3333448	[Frontend] Remove deprecated -O.xx flag (#29991 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2025-12-05 00:47:22 -08:00
rasmith	feecba09af	[CI/Build][AMD] Use float16 in test_reset_prefix_cache_e2e to avoid accuracy issues (#29997 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-05 08:42:25 +00:00
amitz-nv	6038b1b04b	[Frontend][Model] Add 'float16' to possible mamba cache dtype values, override mamba SSM cache dtype value for NemotronH (#29978 ) Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>	2025-12-05 00:34:33 -08:00
Tiger Xu / Zhonghu Xu	60a66ea2dc	[DOC]: Add kthena to integrations (#29931 ) Signed-off-by: Zhonghu Xu <xuzhonghu@huawei.com>	2025-12-05 08:11:03 +00:00

12014 Commits