xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-23 07:05:01 +08:00

Author	SHA1	Message	Date
shivampr	8580919ac3	[Bugfix] fix confusing OOM errors during v1 init (#28051 ) Signed-off-by: Shivam <shivamprasad91@gmail.com> Signed-off-by: shivampr <shivampr.dev@gmail.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2025-12-10 23:17:41 +00:00
Or Ozeri	4c6fd25880	kv_transfer: Rename the shared storage connectors (#30201 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2025-12-08 20:46:09 -08:00
Cyrus Leung	e83b7e379c	Revert "[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 )" (#30199 )	2025-12-07 00:00:22 -08:00
Cyrus Leung	27f4c2fd46	[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-06 23:15:42 -08:00
Nick Hill	dc264bcea1	[BugFix] Eagerly abort cancelled final-step requests (#29987 ) Currently, when requests are cancelled while executing their final step, "completion" is handled based on normal stop processing (e.g. length or stop token), so the abort has no effect. This is typically not a problem, but when a kv connector is involved it thinks the request completed successfully rather than being aborted. This is problematic for disaggregated prefill which will free kv cache blocks if the request was aborted but not if it completed successfully—since the cancelled request will never be sent to the decode side, kv cache blocks remain pinned until the fall-back timeout expires. The problem is exacerbated when many requests are cancelled and/or there are large prefills whose forward pass takes a long time (since the window is bigger). This PR fixes the problem by processing pending aborts immediately prior to processing model output each step; we process only aborts, not new requests, since it's preferable for latency to process model outputs before new incoming requests. Fixes #26400. Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-05 17:28:32 +00:00
Lumis Chen	9bcf92295a	[Core] Add xxHash as a high-performance hash option for accelerating prefix caching (#29163 ) Signed-off-by: LuminolT <lumischen01@gmail.com> Signed-off-by: Lumis Chen <lumischen01@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-12-03 16:06:57 +00:00
Harry Mellor	951445a52d	Remove default values from `InitVar`s so that they're not stored (#29859 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-02 12:16:37 +00:00
Cyrus Leung	34a984274e	[Misc] Refactor tokenizer interface (#29693 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-29 04:02:21 -08:00
Cyrus Leung	8d9338fae4	[Chore] Rename `Processor` to `InputProcessor` (#29682 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-28 09:35:41 -08:00
Cyrus Leung	e2741f6cbc	[Chore] Rename `SchedulerConfig.chunked_prefill_enabled` (#28735 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-14 18:39:57 +00:00
elvischenv	5d6ce2b960	[Perf] Support stream interval for reducing host overhead (#27869 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-11-13 13:21:25 -05:00
Jialin Ouyang	a1d3866dda	[n-gen] DO NOT repeatedly return finished child requests (#28591 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-13 03:36:07 +00:00
Chenguang Zheng	4ccffe561f	[Core] Encoder separation for Encode-Prefill-Decode Disaggregation (#25233 ) Signed-off-by: n00909098 <nguyen.kha.long@huawei.com> Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com> Signed-off-by: herotai214 <herotai214@gmail.com> Signed-off-by: Khuong Le <khuong.le.manh@huawei.com> Signed-off-by: Khuong Le <lemanhkhuong2611@gmail.com> Co-authored-by: n00909098 <nguyen.kha.long@huawei.com> Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com> Co-authored-by: herotai214 <herotai214@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Khuong Le <khuong.le.manh@huawei.com> Co-authored-by: Khuong Le <lemanhkhuong2611@gmail.com>	2025-11-11 18:58:33 -08:00
Jialin Ouyang	4228be7959	[Perf] Use np.ndarray instead of list[list[int]] to reduce GC overhead (#28245 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-11 10:28:47 -08:00
Mark McLoughlin	6f7de33bed	[Metrics] Refactor LoRA state tracking (#26801 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-11-10 16:34:36 +08:00
Nick Hill	da786e339e	[Core] Rework handling of async scheduling config (#28250 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-07 20:01:23 +00:00
Zhewen Li	0b8e871e5e	[CI/Build] Fix `test_defaults_with_usage_context` in AMD CI (#27926 ) Signed-off-by: zhewenli <zhewenli@meta.com>	2025-11-05 15:40:24 -08:00
wangxiyuan	428bc7bf1c	[V0 deprecation] Remove VLLM_USE_V1 usage in most modules (#27955 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-04 20:51:16 -08:00
Nick Hill	0cdbe7b744	[Core] Async scheduling + structured outputs compatibility (#26866 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-01 00:35:04 +00:00
Yeshwanth N	71b1c8b667	[Chore]:Extract math and argparse utilities to separate modules (#27188 ) Signed-off-by: Yeshwanth Surya <yeshsurya@gmail.com> Signed-off-by: Yeshwanth N <yeshsurya@gmail.com> Signed-off-by: yeshsurya <yeshsurya@gmail.com>	2025-10-26 04:03:32 -07:00
Nick Hill	647214f3d5	[V0 Deprecation] Remove V0 executors (#27142 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-21 11:09:37 -07:00
Isotr0py	6ac5e06f7c	[Chore] Clean up pytorch helper functions in `vllm.utils` (#26908 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: isotr0py <2037008807@qq.com>	2025-10-18 09:48:22 -07:00
Tahsin Tunan	43721bc67f	[CI] Replace large models with tiny alternatives in tests (#24057 ) Signed-off-by: Tahsin Tunan <tahsintunan@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-16 15:51:27 +01:00
Cyrus Leung	f93e348010	[Misc] Remove `isort` and `yapf` ignores (#26888 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-15 12:09:03 +00:00
Lucia Fang	8317f72354	[Misc][DP] support customized aggregated logger for dp (#24354 ) Signed-off-by: Lu Fang <fanglu@fb.com>	2025-10-13 17:45:59 -07:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x | None` and `Union[x, y]` to `x | y` (#26633 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00
Cyrus Leung	4bdf7ac593	[Bugfix] Fix SHM cache initialization (#26427 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-09 02:48:04 -07:00
Cyrus Leung	1e4ecca1d0	[V0 Deprecation] Remove `VLLM_USE_V1` from tests (#26341 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-07 15:42:31 +00:00
Cyrus Leung	391612e78b	[Frontend] Consolidate tokenizer init code (#26276 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-06 09:34:52 +00:00
Harry Mellor	1c0c68202c	Fix per file ruff ignores related to typing (#26254 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 16:37:55 +00:00
Harry Mellor	557b2e961d	Remove all cases of `fmt: on/off` (#26253 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 09:18:14 -07:00
Harry Mellor	d6953beb91	Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 07:06:22 -07:00
Yongye Zhu	fa7e254a7f	[New Model] DeepSeek-V3.2 (Rebased to Main) (#25896 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com> Signed-off-by: Lucia Fang <fanglu@meta.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com> Co-authored-by: Lucia Fang <fanglu@meta.com> Co-authored-by: NickLucche <nlucches@redhat.com> Co-authored-by: Siyuan Fu <siyuanf@nvidia.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Xiaozhu Meng <mxz297@gmail.com> Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>	2025-09-30 17:14:41 +08:00
wang.yuqi	fe6b19c314	[Bugfix] Properly abort pooling request. (#25734 ) Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-09-26 05:47:34 -07:00
Woosuk Kwon	0ff8ebb2d7	[V0 Deprecation] Remove async_output_proc, preemption mode, delay factor (#25334 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-21 08:52:32 -07:00
Woosuk Kwon	26e673fe93	[V0 Deprecation] Remove V0 Sequence class & Sampler (#25332 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>	2025-09-21 08:52:15 -07:00
Nick Hill	535d80056b	[Misc] Support more collective_rpc return types (#25294 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-09-20 02:02:38 +00:00
Aaron Pham	29283e8976	[Chore] Cleanup guided namespace, move to structured outputs config (#22772 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-18 09:20:27 +00:00
Zhuohan Li	6c47f6bfa4	[Core] Remove tokenizer group in vLLM (#24078 ) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>	2025-09-17 08:42:59 +00:00
Harry Mellor	c4afdb69cc	Move `MultiModalConfig` from `config/__init__.py` to `config/multimodal.py` (#24659 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-15 17:43:16 +00:00
Nick Hill	4fdd6f5cbf	[Core] Support async scheduling with uniproc executor (#24219 ) Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Ronald1995 <ronaldautomobile@163.com> Co-authored-by: Ronald1995 <ronaldautomobile@163.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-09-12 16:34:28 -07:00
Chenheli Hua	009d689b0c	[Core] Simplify and unify mm uuid handling & auto-generated mm hash overrides processing. (#24271 ) Signed-off-by: Chenheli Hua <huachenheli@outlook.com>	2025-09-09 21:36:09 -07:00
Zebing Lin	82dfb12e52	[Core] Use sha256 bytes instead of BlockHash to reduce GC overhead (#23673 ) Signed-off-by: linzebing <linzebing1995@gmail.com>	2025-09-08 21:34:37 -07:00
Seiji Eicher	60b755cbcb	[Misc] Have AsyncLLM `custom_stat_loggers` extend default logger list (#20952 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com> Signed-off-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-09-04 14:25:30 -07:00
Roger Wang	749be00a98	[Core][Multimodal] Allow passing `multi_modal_uuids` as multimodal identifiers. (#23394 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-08-30 18:01:22 -07:00
Nick Hill	d90d8eb674	[BugFix] Async scheduling and PP compatibility with DP (#23770 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-08-29 08:17:27 -07:00
Flora Feng	69f46359dd	[Multimodal] Consolidate mm inputs into MultiModalFeatureSpec (#23779 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2025-08-29 18:36:57 +08:00
Nick Hill	ad0297d113	[Misc] Support passing multiple request ids at once to `AsyncLLM.abort()` (#22944 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-08-15 17:00:36 -07:00
Nick Hill	b9dc9d2607	[BugFix] Handle case where async utility call is cancelled (#22996 ) Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Yinghai Lu <yinghai@thinkingmachines.ai>	2025-08-15 17:38:42 -06:00
Nick Hill	ebcce2cd36	[Core] Return final response for aborted requests from `AsyncLLM.generate` (#22283 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-08-14 14:49:02 -07:00

137 Commits