xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-04 18:07:15 +08:00

Author	SHA1	Message	Date
Matthew Bonanni	4c23690f43	[Attention] FlashAttention ViT support, make default backend (#28763 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-11-18 20:06:21 -08:00
Jialin Ouyang	40b6b38f2c	[Core] Switch Flat logprob control from environment variable to SamplingParams (#28914 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-11-19 02:10:02 +00:00
Kunshang Ji	2a2d5d2780	Replace `torch.cuda.Event` with `torch.Event` for better hardware compatibility (#26985 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-11-18 11:34:36 -08:00
vllmellm	0af3d4f0df	[FEAT] [AITER] [ROCm] integrate aiter sampling ops (#26084 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-11-18 17:28:34 +00:00
Nick Hill	da8dadf68b	[Minor] Rename `ec_producer` field to `is_ec_producer` (#28884 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-18 17:26:07 +00:00
Luciano Martins	c2612371ad	[Model] Add Gemma3 GGUF multimodal support (#27772 ) Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-18 08:56:29 -08:00
Nick Hill	439368496d	[BugFix] Fix PP/async scheduling with pooling models (#28899 ) Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-18 00:20:45 -08:00
Zhuohan Li	552cac95b5	[Misc] Fix wrong comment in scheduler (#28880 ) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>	2025-11-17 15:32:22 -08:00
Bangsheng Tang	61485844fc	[BugFix] Corner case that could cause out-of-sync with external launcher mode and dp >1 (#28774 )	2025-11-17 15:22:11 -08:00
Nick Hill	7765e5ba75	[BugFix] Fix PP performance and PP kv connector output regression (#28768 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-17 14:08:50 -08:00
Ronald	d8874c61a5	[Core] Async Scheduling X Spec Decoding Compatibility (#24799 ) Signed-off-by: Ronald1995 <ronaldautomobile@163.com> Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-11-17 12:16:20 -08:00
Lucas Wilkinson	64e39d667c	[BugFix] Temporary fix for IMA with MTP = 2 and full-cg (#28315 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-17 09:41:22 -05:00
Jae-Won Chung	d4acf518d0	[Metrics] Fix KV cache usage percent metric multiproc (#28792 ) The `vllm:kv_cache_usage_perc` Gauge metric is missing `multiprocess_mode="mostrecent"` and ends up returning ``` vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="277"} 0.0 vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="275"} 0.0 vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="273"} 0.6530455880475035 ... ``` The deprecated `vllm:gpu_cache_usage_perc` Gauge metric has `multiprocess_mode="mostrecent"`. Signed-off-by: Jae-Won Chung <jwnchung@umich.edu>	2025-11-17 09:54:15 +00:00
Li, Jiang	577bb34fff	[CPU][Bugfix] Fix _to_list in CPU model runner (#28824 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-11-17 07:47:24 +00:00
Xiake Sun	60e089f0b9	[ROCm][Qwen3-32B] Fix AITER MHA accuracy issue cause by #25763 (#28670 ) Signed-off-by: Xiake Sun <xiake.sun@amd.com>	2025-11-16 20:52:11 -08:00
Nick Hill	80b6080ddc	[BugFix] Fix async scheduling + chunked prefill + preemption (#28787 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-17 06:46:46 +08:00
Lucia Fang	b316ac6589	[V1] Support MP Executor for multi node distributed inference (#23691 ) Signed-off-by: Lu Fang <fanglu@fb.com> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Signed-off-by: Lucia Fang <fanglu@fb.com> Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-11-16 09:01:21 +00:00
wang.yuqi	a55b64635c	[Model] Allow users to control skip reading cache per request. (#28194 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com>	2025-11-16 00:04:50 -08:00
Lucas Wilkinson	be263f7645	[BugFix] Fix `AssertionError: DCP not support reorder_batch_threshold > 1 now.` (#28751 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-15 22:35:06 +00:00
Didier Durand	2bb4435cb7	[Doc]: fix typos in various files (#28567 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-11-15 19:27:50 +00:00
Eldar Kurtić	e439c784fa	Add support for Eagle with separate lm-head and embed_tokens layers (#28549 ) Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>	2025-11-15 06:12:02 -08:00
tingtinggithub	cb15ee28db	Allow Gemma3 to take image embeddings (#28483 ) Signed-off-by: tingtinggithub <streamttt@gmail.com>	2025-11-15 04:18:08 -08:00
Cyrus Leung	638e4196d1	[Misc] Make `SchedulerConfig.max_model_len` init-only (#28733 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-15 01:59:31 -08:00
Zhuohan Li	dd6ac1c2bb	[RL] [V1] Remove unused device argument from reset_kv_cache (#28766 ) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>	2025-11-14 23:59:42 -08:00
Cyrus Leung	98b4d389ed	[Redo] #26368 (#28771 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-14 22:47:41 -08:00
Nick Hill	ac86bff8cb	Revert "[Core] Performance: Use list[np.ndarray] instead of list[list… (#28773 )	2025-11-14 20:24:00 -08:00
Jialin Ouyang	186352b270	[Core] Performance: Use list[np.ndarray] instead of list[list[int]] for output tokens for GC optimization (#26368 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-14 16:04:04 -08:00
rasmith	ba041d980b	[Log] Save profiler results to file instead of stdout (#28144 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-11-14 23:26:39 +00:00
Laith Sakka	2e0ad629b0	Avoid bytecode hook and simplify TorchCompileWrapperWithCustomDipatch (#25110 ) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-11-14 14:11:10 -08:00
Cyrus Leung	e2741f6cbc	[Chore] Rename `SchedulerConfig.chunked_prefill_enabled` (#28735 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-14 18:39:57 +00:00
Nicolò Lucchesi	6f1e7f7226	[DisaggEverything] Tokens in<>out `/generate` endpoint (#24261 ) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-14 09:58:01 -07:00
Nicolò Lucchesi	96b23b8e3b	[Bugfix][Nixl] Fix kernel physical<>logical block_size issue (#28677 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-11-14 22:40:05 +08:00
Lucas Wilkinson	db56a59970	[BugFix] Fix FA3 IMA with FULL_AND_PIECEWISE and cascade attention (default) (#28702 )	2025-11-14 12:19:22 +00:00
Yong Hoon Shin	9324e10275	Fix KV sharing fast prefill with cudagraph enabled (#28537 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-14 11:53:42 +00:00
Jingchun Gao	4516d44b7f	[DCP] Support Decode Context Parallel (DCP) for GQA with Flashinfer (#25438 ) Signed-off-by: gaojc <1055866782@qq.com> Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com> Signed-off-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com> Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Co-authored-by: gaojingchun (A) <g00955623@china.huawei.com> Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com> Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com>	2025-11-14 11:24:10 +00:00
lyn610	ecf8230d4d	[Metrics] Log number of preempted requests (#28522 ) Add tracking and periodic logging for the number of preempted requests in the metrics logger. This helps monitor system behavior under load. Signed-off-by: Yining Liu <610lyn@gmail.com>	2025-11-14 09:47:45 +00:00
Nick Hill	bc3e43069a	[BugFix] Fix multi-modal async scheduling race condition (#28706 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-14 01:11:13 -08:00
Yan Ma	529cea343d	use default CCL_ZE_IPC_EXCHANGE (#28700 ) Signed-off-by: Yan Ma <yan.ma@intel.com>	2025-11-14 16:55:29 +08:00
Cyrus Leung	01bea115c4	[Misc] Remove `warn_for_unimplemented_methods` (#28613 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-14 11:10:10 +08:00
Wentao Ye	e64011f29a	[CI] Bug: Fix ci entrypoint pooling (#28684 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-13 14:19:35 -08:00
Qiu	968060c15a	[bugfix] correct local_chunk_len for DCP in reorg_kvcache with long context (#28526 ) Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-13 11:29:22 -08:00
elvischenv	5d6ce2b960	[Perf] Support stream interval for reducing host overhead (#27869 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-11-13 13:21:25 -05:00
Matthew Bonanni	f9f3b596f3	[Attention][Bugfix] Fix FA sink support (#28660 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-11-13 13:20:01 -05:00
Yannick Schnider	119c4927b3	[Bugfix] Fix validate model input for decoder models (#27099 ) Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com> Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-11-13 10:18:47 -08:00
Huamin Li	07a606aa7e	[CI Failure] Fix backend selection for encoder-only models (#28534 ) Signed-off-by: Huamin Li <3ericli@gmail.com>	2025-11-13 10:11:27 -05:00
Pleaplusone	8da2f28f53	[ROCm][BugFix]Fix `get_cu_count` in rocm_aiter_fa.py (#28618 ) Signed-off-by: ganyi <ygan@amd.com>	2025-11-13 14:18:20 +00:00
tjandy98	4504e8029b	[Bugfix] Prevent crash on empty grammar string (#28210 ) Signed-off-by: tjandy98 <3953059+tjandy98@users.noreply.github.com>	2025-11-13 06:42:29 +00:00
Pleaplusone	ca00b1bfc6	[ROCm][BugFix] Remove the usage of `device_info` from aiter (#28383 ) Signed-off-by: ganyi <ygan@amd.com>	2025-11-12 21:43:42 -08:00
Jialin Ouyang	a1d3866dda	[n-gen] DO NOT repeatedly return finished child requests (#28591 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-13 03:36:07 +00:00
Harry Mellor	97d1c99302	Rename clashing method names for vLLM model protocol (#27583 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-12 19:14:33 -08:00

1657 Commits