xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-04-01 13:27:05 +08:00

Author	SHA1	Message	Date
jiangkuaixue123	92d38e41c8	Merge e7254d8994a4caf49e4cd08b604657b7ee8ae418 into 254f6b986720c92ddf97fbb1a6a6465da8e87e29	2025-12-25 00:06:54 +00:00
Michael Goin	8ee90c83f8	Add `--max-model-len auto` to auto-fit context to available memory (#29431 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-12-23 21:37:14 -08:00
Nick Hill	45c0526ac9	[BugFix] Handle errors when preprocessing added requests (#30895 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-19 01:29:11 +00:00
Alec	62be3670cb	[BugFix] Add sleep to fix tight loop and release GIL (#29476 ) Signed-off-by: alec-flowers <aflowers@nvidia.com> Signed-off-by: Alec <35311602+alec-flowers@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-12-18 09:52:55 -08:00
jiangkuaixue123	36f9c3d6b5	add log Signed-off-by: jiangkuaixue123 <jiangxiaozhou111@163.com>	2025-12-16 15:49:36 +08:00
jiangkuaixue123	eb2355c600	ffn server use vllm serve and dp Signed-off-by: jiangkuaixue123 <jiangxiaozhou111@163.com>	2025-12-16 15:49:36 +08:00
Jialin Ouyang	9f042ba26b	[Perf] Enable environment cache in EngineCore to enable the feature for UniProcExecutor as well (#29289 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-12-10 14:13:01 -05:00
Tova Movshovitz	adb315060c	[KVConnector][Feature] Support KV connector cache reset via /reset_prefix_cache (#27170 ) Signed-off-by: tovam <tovam@pliops.com> Signed-off-by: Tova Movshovitz <tovam@pliops.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-05 18:33:26 +00:00
Nick Hill	dc264bcea1	[BugFix] Eagerly abort cancelled final-step requests (#29987 ) Currently, when requests are cancelled while executing their final step, "completion" is handled based on normal stop processing (e.g. length or stop token), so the abort has no effect. This is typically not a problem, but when a kv connector is involved it thinks the request completed successfully rather than being aborted. This is problematic for disaggregated prefill which will free kv cache blocks if the request was aborted but not if it completed successfully—since the cancelled request will never be sent to the decode side, kv cache blocks remain pinned until the fall-back timeout expires. The problem is exacerbated when many requests are cancelled and/or there are large prefills whose forward pass takes a long time (since the window is bigger). This PR fixes the problem by processing pending aborts immediately prior to processing model output each step; we process only aborts, not new requests, since it's preferable for latency to process model outputs before new incoming requests. Fixes #26400. Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-05 17:28:32 +00:00
Zhuohan Li	d0cd728907	[Core] Support reseting all running requests' KV while calling `reset_prefix_cache` (#28827 ) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-12-02 02:25:05 +00:00
wang.yuqi	f4b76056ee	Improve enable chunked_prefill & prefix_caching logic. (#26623 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-27 22:05:48 -08:00
Jialin Ouyang	537cc635c7	[GC Debugger] Simply and improve GC Debugger Utils (#29029 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-20 00:10:22 +00:00
Qiu	2fd893b4ce	[Feature] Prefill Context Parallel (PCP) basic support (#28718 ) Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com> Signed-off-by: LookAround <lixushi@huawei.com> Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com> Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com> Co-authored-by: FENP <yuanyongjie.yyj@antgroup.com> Co-authored-by: LookAround <lixushi@huawei.com> Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com> Co-authored-by: zhenwenqi2024 <zhenwenqi_2022@qq.com> Co-authored-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>	2025-11-19 15:52:44 -05:00
Nick Hill	da8dadf68b	[Minor] Rename `ec_producer` field to `is_ec_producer` (#28884 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-18 17:26:07 +00:00
Nick Hill	439368496d	[BugFix] Fix PP/async scheduling with pooling models (#28899 ) Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-18 00:20:45 -08:00
Nick Hill	7765e5ba75	[BugFix] Fix PP performance and PP kv connector output regression (#28768 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-17 14:08:50 -08:00
Ronald	d8874c61a5	[Core] Async Scheduling X Spec Decoding Compatibility (#24799 ) Signed-off-by: Ronald1995 <ronaldautomobile@163.com> Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-11-17 12:16:20 -08:00
Cyrus Leung	e2741f6cbc	[Chore] Rename `SchedulerConfig.chunked_prefill_enabled` (#28735 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-14 18:39:57 +00:00
Jialin Ouyang	b30372cbd0	[Perf] Move gc.freeze logic from EngineCoreProc to EngineCore for better coverage (#27896 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-10 15:34:18 -08:00
Wei Wei	bf6a3d0ff5	[Misc] Add more scoping for improved trace (#28329 ) Signed-off-by: Wei Wei <wwei6@meta.com>	2025-11-10 21:03:21 +00:00
Nick Hill	da786e339e	[Core] Rework handling of async scheduling config (#28250 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-07 20:01:23 +00:00
Nick Hill	0cdbe7b744	[Core] Async scheduling + structured outputs compatibility (#26866 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-01 00:35:04 +00:00
GuanLuo	d6517be3cd	[Bugfix] Missing NIXL metadata for handshake initialization if instance spans multi-node (#26338 ) Signed-off-by: Guan Luo <gluo@nvidia.com> Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com> Signed-off-by: Guan Luo <41310872+GuanLuo@users.noreply.github.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2025-10-31 10:16:00 -07:00
Wentao Ye	c01f6e525f	[CI] Fix mypy for `vllm/v1/core` and `vllm/v1/engine` (#27108 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-30 11:32:17 +00:00
Wentao Ye	52efc34ebf	[Log] Optimize Startup Log (#26740 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-24 19:27:04 -04:00
dongbo910220	a0003b56b0	[Chore] Separate out system utilities from vllm.utils (#27201 ) Signed-off-by: dongbo910220 <1275604947@qq.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-10-22 20:25:25 +00:00
Nicolò Lucchesi	4dfdb821c8	[P/D] Dynamic `kv_output_aggregator` collect size (#26734 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-10-22 18:07:58 +02:00
Nick Hill	647214f3d5	[V0 Deprecation] Remove V0 executors (#27142 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-21 11:09:37 -07:00
iAmir97	7a6c8c3fa1	[Chore] Separate out `vllm.utils.network_utils` (#27164 ) Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com> Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com>	2025-10-19 03:06:32 -07:00
dongbo910220	8a297115e2	[Chore] Separate out hashing utilities from vllm.utils (#27151 ) Signed-off-by: dongbo910220 <1275604947@qq.com>	2025-10-19 11:09:38 +08:00
Nick Hill	fe3b9372ad	[Core] Change `execute_model_with_error_logging()` to be a ctx manager (#27060 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-17 11:45:32 +08:00
Cyrus Leung	4d4d6bad19	[Chore] Separate out `vllm.utils.importlib` (#27022 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-17 00:48:59 +00:00
Jialin Ouyang	380f17527c	[Perf] Cache vllm.env.__getattr__ result to avoid recomputation (#26146 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-10-14 17:03:21 -04:00
Wentao Ye	e251e457c5	[Log] Optimize Startup Log (#26601 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-14 02:06:57 +08:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x | None` and `Union[x, y]` to `x | y` (#26633 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00
Chen Zhang	606b00e80f	[bugfix][DCP] fix block_size of hash in DCP prefix caching (#26296 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-10-10 03:02:49 -07:00
Cyrus Leung	ad430a67ca	[Metrics] Log multi-modal cache stats and fix reset (#26285 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-10 01:45:55 -07:00
Ayush Satyam	5e65d6b2ad	fix[DP][v1]: Prevent hangs from mismatched worker configurations (#26218 ) Signed-off-by: Ayush Satyam <ayushsatyam146@gmail.com>	2025-10-07 22:55:08 -07:00
Harry Mellor	d6953beb91	Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 07:06:22 -07:00
Jialin Ouyang	c216119d64	[Core] GC Debug callback (#24829 ) Signed-off-by: Jialin Ouyang <jialino@meta.com> Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Co-authored-by: Jialin Ouyang <jialino@meta.com>	2025-09-27 17:53:31 +00:00
Chen Zhang	9607d5eb44	[Hybrid Allocator] Support full attention with different hidden size (#25101 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-09-19 23:43:59 -07:00
Chao Lei	8de261b04a	[P/D]`kv_output_aggregator` support P TP > D TP (#23917 ) Signed-off-by: LCAIZJ <leichao139636@163.com> Co-authored-by: leichao.lc <leichao.lc@antgroup.com>	2025-09-15 11:36:06 +02:00
Chen Zhang	8e5cdcda4e	[Hybrid Allocator] Support Pipeline Parallel (#23974 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-09-14 15:55:17 -07:00
Nick Hill	4fdd6f5cbf	[Core] Support async scheduling with uniproc executor (#24219 ) Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Ronald1995 <ronaldautomobile@163.com> Co-authored-by: Ronald1995 <ronaldautomobile@163.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-09-12 16:34:28 -07:00
dongluw	a5b84f1cbf	[Core] Shared memory based object store for Multimodal data caching and IPC (#20452 ) Signed-off-by: donglu <donglu@cohere.com>	2025-09-12 07:54:17 -07:00
22quinn	0cdd213641	[Misc] Improve Worker process title and logging prefix (#22205 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-09-08 21:43:48 -07:00
Didier Durand	35bf193864	[Doc]: fix typos in Python comments (#24294 ) Signed-off-by: Didier Durand <durand.didier@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-05 19:41:12 -07:00
Nick Hill	e41a0fa377	[Perf] Freeze core engine proc heap after init (#24008 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-09-04 22:55:23 +08:00
Nick Hill	d90d8eb674	[BugFix] Async scheduling and PP compatibility with DP (#23770 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-08-29 08:17:27 -07:00
Flora Feng	69f46359dd	[Multimodal] Consolidate mm inputs into MultiModalFeatureSpec (#23779 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2025-08-29 18:36:57 +08:00

170 Commits