xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-22 16:27:15 +08:00

Author	SHA1	Message	Date
Nick Hill	ad0297d113	[Misc] Support passing multiple request ids at once to `AsyncLLM.abort()` (#22944) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-08-15 17:00:36 -07:00
Yong Hoon Shin	3e2f7985a2	Support multiple attention groups for KV sharing (#22672) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-08-15 16:54:10 -07:00
Or Ozeri	c280066f9d	[v1] Move block_hashes from KVCacheManager to Request.block_hashes (#19728) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2025-08-15 16:52:52 -07:00
Nick Hill	b9dc9d2607	[BugFix] Handle case where async utility call is cancelled (#22996) Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Yinghai Lu <yinghai@thinkingmachines.ai>	2025-08-15 17:38:42 -06:00
fhl2000	74f441f4b5	[Core] Allow full cudagraph with separate attention routines and orthogonal to compilation, add support for FA2 and FlashInfer (#20059) Signed-off-by: fhl <2410591650@qq.com> Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2025-08-15 10:01:39 -04:00
Thomas Parnell	75531a6c13	[V1] [Hybrid] Support using float32 for state in Hybrid Models (Mamba2, Mamba1, Minimax) (#22928) Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com> Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Daniel Afrimi <danielafrimi8@gmail.com> Co-authored-by: Burkhard Ringlein <ngl@zurich.ibm.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2025-08-15 12:57:06 +00:00
Asaf Joseph Gardin	3d232dbd19	[Mamba] - refactor: Renamed mamba_attn to mamba2_attn (#22818) Signed-off-by: asafg <asafg@ai21.com> Co-authored-by: asafg <asafg@ai21.com>	2025-08-15 06:38:05 +00:00
Nick Hill	ae05a6d83d	[BugFix] Fix port lookup in internal DP LB tests (#22252) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-08-15 11:17:11 +08:00
Nick Hill	ebcce2cd36	[Core] Return final response for aborted requests from `AsyncLLM.generate` (#22283) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-08-14 14:49:02 -07:00
Lucas Wilkinson	b8ff05361a	[CI] Temporarily disable flaky test (#22930) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-08-14 19:59:16 +00:00
Jialin Ouyang	31a500c86f	[Core] [N-gram SD Optimization][1/n] Propose tokens with a single KMP (#22437) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-08-13 14:44:06 -07:00
Cyrus Leung	b4b78d6317	[CI/Build] Fix param mismatch in `test_eagle_correctness` (#22847) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-13 10:55:25 -07:00
Nicolò Lucchesi	12817a8ac7	[CI] Fix `tests/v1/e2e/test_kv_sharing_fast_prefill.py` import on test (#22815) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-08-13 10:35:50 -07:00
Cyrus Leung	19b927e52d	[Core] Use individual MM items in P0/P1 cache and model runner (#22570) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-13 07:18:07 -07:00
Nicolò Lucchesi	6b794c756c	[Nixl][CI] Fix tests (#22806) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-08-13 06:03:53 -07:00
Giancarlo Delfin	d94e3026de	[V1] Add tree drafting tests for eagle spec decoding (#22705) Signed-off-by: Giancarlo Delfin <gdelfin@meta.com>	2025-08-13 04:11:28 -07:00
Woosuk Kwon	71683ca6f6	[V0 Deprecation] Remove multi-step scheduling (#22138) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>	2025-08-12 20:18:39 -07:00
Nicolò Lucchesi	422f22e012	[CI][Nixl] Check kv cache layout during handshake (#22745) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-08-12 12:53:52 -07:00
Nicolò Lucchesi	3d9d40efde	[Bugfix][CI] Fix `test_remote_decode_lifecycle.py::test_short_prompt_lifecycle` (#22727) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-08-12 07:30:17 -07:00
phantomlei	bc8372efc3	[Bugfix] Fix erroneous randomly generated cases in bad word testing (#22170) Signed-off-by: phantomlei <phantomlei3@gmail.com>	2025-08-12 02:03:22 -07:00
Michael Goin	93d0652433	[CI] Increase timeout for test_completion_with_image_embeds (#22670) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-11 20:31:36 -07:00
TJian	65abe111a3	[CI] Skip Tree Attn Test in `test_max_len.py` to unblock CI (#22664) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-08-11 10:36:05 -07:00
22quinn	807d21b80d	[BugFix] [Spec Decode] Remove LlamaForCausalLMEagle3 to fix CI (#22611) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-08-11 10:31:36 -07:00
GuanLuo	16fb668b61	fix: NIXL connector transfers partial block to pass full multi-modal context (#21074) Signed-off-by: GuanLuo <gluo@nvidia.com>	2025-08-11 09:40:55 -07:00
Nick Hill	5898b135ab	[BugFix] Fix KVConnectorOutput TPU breakage (#22598) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-08-10 19:33:48 -07:00
Chengji Yao	2a84fb422f	[TPU] kv cache update kernel doesn't need to be padded slices to multiple of num_slices_per_block (#22394) Signed-off-by: Chengji Yao <chengjiyao@gmail.com> Co-authored-by: Chengji Yao <chengjiyao@gmail.com>	2025-08-09 20:49:04 -07:00
Le Chen	3d7363e61c	[Config] add "qwen" as a native eagle3 target supported model (#22333) Signed-off-by: lechen <lecself@163.com> Signed-off-by: LeChen <lecself@163.com>	2025-08-09 20:21:05 -07:00
Kyuyeun Kim	9a0c5ded5a	[TPU] Add support for online w8a8 quantization (#22425) Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>	2025-08-08 23:12:54 -07:00
Chauncey	17eaaef595	[Bugfix] Fix RuntimeError: Index put requires the source and destination dtypes match (#22065) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-08-07 19:20:21 -07:00
TJian	1ee5ead5f8	[ROCm] [V1] [SpecDec] Enable Speculative Decoding on ROCm V1 Engine (#21496) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-08-07 19:13:17 -07:00
Harry Mellor	7e3a8dc906	Remove `from_dict` from `SpeculativeConfig` (#22451) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-07 10:13:04 -07:00
Chen Zhang	4815b00f54	[gpt-oss] Generate ResponseOutputItem from Harmony Message (#22410) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-08-07 08:33:25 -07:00
Michael Goin	a00d8b236f	Use float32 for test_completion.py (#22385) Signed-off-by: Michael Goin <mgoin64@gmail.com>	2025-08-07 11:07:47 +08:00
Lucas Wilkinson	1dc8a70b6d	[Attention] Support multiple attention metadata builders per kv_cache_spec + proper local attention no hybrid kv cache fix (#21588) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-08-06 18:40:52 -07:00
Asaf Joseph Gardin	46a13949d5	[v1] - Mamba1 Attention Metadata (#21249) Signed-off-by: asafg <asafg@ai21.com> Co-authored-by: asafg <asafg@ai21.com>	2025-08-06 17:03:42 -07:00
Giancarlo Delfin	469b3ffaaa	[V1] port xformers backend to v1 (#21342) Signed-off-by: Giancarlo Delfin <gdelfin@meta.com>	2025-08-05 10:04:46 -07:00
Nicolò Lucchesi	0c275ad5ad	[V0 Deprecation][TPU] Remove V1 flag check from tests (#22248) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-08-05 06:53:23 -07:00
Giancarlo Delfin	5ea71ff46f	[V1] reduce block size for tree attention correctness test to fix 'ou… (#22207) Signed-off-by: Giancarlo Delfin <gdelfin@meta.com>	2025-08-04 19:11:06 -07:00
Woosuk Kwon	7175817637	Revert "[Bugfix] V1 Fix the cursor leakage issue during request scheduling." (#22223)	2025-08-04 18:37:06 -07:00
PiteXChen	2dffac464c	[Bugfix] V1 Fix the cursor leakage issue during request scheduling. (#21173) Signed-off-by: CLFutureX <775523362@qq.com>	2025-08-04 18:34:10 -07:00
22quinn	54de71d0df	[Sampler] Support returning all logprobs or logits (#21792) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-08-04 03:04:12 -07:00
Tyler Michael Smith	8ecb3e9e93	[CI Bugfix] Fix wNa16 kernel not found for test_shared_storage_connector_hashes (#22163) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-08-03 22:19:04 -07:00
Giancarlo Delfin	aa7012eb6d	Add tree attention backend for v1 (part 1) (#20401) Signed-off-by: Giancarlo Delfin <gdelfin@meta.com>	2025-08-03 22:13:26 -07:00
Abirdcfly	0d7db16a92	[PD] add test for chat completions endpoint (#21925) Signed-off-by: Abirdcfly <fp544037857@gmail.com>	2025-08-03 19:57:03 -07:00
Woosuk Kwon	6d98843b31	[Responses API] Disable response store by default (#22137) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-08-03 04:04:21 -07:00
David Ben-David	aefeea0fde	[V1] [P/D] Refactor KV Connector Path (#21980) Signed-off-by: David Ben-David <davidb@pliops.com> Co-authored-by: David Ben-David <davidb@pliops.com>	2025-08-03 04:03:40 -07:00
Roger Wang	067c34a155	docs: remove deprecated disable-log-requests flag (#22113) Signed-off-by: Roger Wang <hey@rogerw.me>	2025-08-02 00:19:48 -07:00
Yong Hoon Shin	8564dc9448	Fix test_kv_sharing_fast_prefill flakiness (#22038) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-08-01 23:55:34 -07:00
Sage Moore	0edaf752d7	[Attention][DBO] Add support for "splitting" the CommonAttentionMetadata (#21153) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-08-01 19:47:53 -07:00
rongfu.leng	b879ecd6e2	[Bugfix] fix when skip tokenizer init (#21922) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-08-01 10:09:36 -07:00

424 Commits