xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-23 20:07:22 +08:00

Author	SHA1	Message	Date
Roger Wang	b5d34af328	[Bugfix] Fix scheduling when repeated images in one request (#23544 ) Signed-off-by: Roger Wang <hey@rogerw.me> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.me> Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>	2025-08-26 09:46:28 +00:00
Driss Guessous	e0329ed4b4	Updates to Flex + VLLm integration (#21416 ) Signed-off-by: drisspg <drisspguessous@gmail.com>	2025-08-25 09:32:42 -04:00
Ayush Satyam	5c4b6e66fe	[Attention] Unify mamba and attention backend selection (#23171 ) Signed-off-by: Ayush Satyam <ayushsatyam146@gmail.com>	2025-08-25 09:09:36 +00:00
Chenguang Zheng	d765cf01fe	[Core][Multimodal] Track encode cache entries by mm_hash and enable embedding sharing between requests (#22711 ) Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-08-25 00:41:17 -07:00
Noam Gat	39971db3aa	Frontend: Adding LM Format Enforcer support to V1 engine (#22564 ) Signed-off-by: Noam Gat <noamgat@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-08-24 19:31:22 -07:00
Yong Hoon Shin	b6d7d34fc6	Add unit tests for batched guided and non-guided requests (#23389 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-08-22 10:31:24 -07:00
Flora Feng	53415653ff	[P/D][Nixl] Make kv cache register compatible with hybrid memory allocator (#23079 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2025-08-21 22:30:48 -07:00
Chen Zhang	17373dcd93	[Attention] Refactor AttentionMetadata Preparation for Encoder-only Models (#23154 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-08-22 05:05:59 +00:00
Arjun Reddy	111692bb8c	[CI] Add end-to-end V1 min_tokens test coverage (#22495 ) Signed-off-by: Arjun Reddy <189282188+arjunbreddy22@users.noreply.github.com> Co-authored-by: Arjun Reddy <189282188+arjunbreddy22@users.noreply.github.com>	2025-08-21 22:04:07 -06:00
Cyrus Leung	8896eb72eb	[Deprecation] Remove `prompt_token_ids` arg fallback in `LLM.generate` and `LLM.embed` (#18800 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-22 10:56:57 +08:00
22quinn	480bdf5a7b	[Core] Support custom executor qualname (#23314 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-08-22 09:40:54 +08:00
Kebe	5368f76855	[Feature][Responses API] Support logprobs(non-stream) (#23319 ) Signed-off-by: Kebe <mail@kebe7jun.com>	2025-08-21 23:09:16 +00:00
22quinn	f571ff8eb6	[Sampler] Support returning final logprobs (#22387 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-08-20 21:28:32 -07:00
Matthew Bonanni	10cc12ba66	Feature/mla tests (#23195 ) Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-08-20 21:46:47 +00:00
Woosuk Kwon	d6d13bd49e	[Misc] Add max_seq_len to CommonAttentionMetadata (#23216 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-08-20 09:05:29 -07:00
Xin Yang	83e69a09d6	[Model] Support deepseek with eagle (#21086 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2025-08-20 19:01:31 +08:00
Woosuk Kwon	c9b38be8aa	[Spec Decode] Make `propose_draft_token_ids` non-blocking for lower TTFT (#23041 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-08-18 17:20:38 -07:00
Cyrus Leung	27e8d1ea3e	[Refactor] Define MultiModalKwargsItems separate from MultiModalKwargs (#23053 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-18 09:52:00 +00:00
Cyrus Leung	5c32143b9d	[Refactor] Defer tensor data construction in MultiModalKwargs (#23030 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-16 21:05:50 -07:00
afeldman-nm	bf7f470b22	[V1] Logits processors extensibility (#19912 ) Signed-off-by: Andrew Feldman <afeldman@redhat.com> Signed-off-by: Andrew Feldman <afeld2012@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Andrew Feldman <afeld2012@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-16 12:59:17 -07:00
Cyrus Leung	4dff91c93d	[Refactor] Allow optional MultiModalKwargsItem in IPC (#23022 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-16 11:30:49 +00:00
Nick Hill	ad0297d113	[Misc] Support passing multiple request ids at once to `AsyncLLM.abort()` (#22944 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-08-15 17:00:36 -07:00
Yong Hoon Shin	3e2f7985a2	Support multiple attention groups for KV sharing (#22672 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-08-15 16:54:10 -07:00
Or Ozeri	c280066f9d	[v1] Move block_hashes from KVCacheManager to Request.block_hashes (#19728 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2025-08-15 16:52:52 -07:00
Nick Hill	b9dc9d2607	[BugFix] Handle case where async utility call is cancelled (#22996 ) Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Yinghai Lu <yinghai@thinkingmachines.ai>	2025-08-15 17:38:42 -06:00
fhl2000	74f441f4b5	[Core] Allow full cudagraph with separate attention routines and orthogonal to compilation, add support for FA2 and FlashInfer (#20059 ) Signed-off-by: fhl <2410591650@qq.com> Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2025-08-15 10:01:39 -04:00
Thomas Parnell	75531a6c13	[V1] [Hybrid] Support using float32 for state in Hybrid Models (Mamba2, Mamba1, Minimax) (#22928 ) Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com> Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Daniel Afrimi <danielafrimi8@gmail.com> Co-authored-by: Burkhard Ringlein <ngl@zurich.ibm.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2025-08-15 12:57:06 +00:00
Asaf Joseph Gardin	3d232dbd19	[Mamba] - refactor: Renamed mamba_attn to mamba2_attn (#22818 ) Signed-off-by: asafg <asafg@ai21.com> Co-authored-by: asafg <asafg@ai21.com>	2025-08-15 06:38:05 +00:00
Nick Hill	ae05a6d83d	[BugFix] Fix port lookup in internal DP LB tests (#22252 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-08-15 11:17:11 +08:00
Nick Hill	ebcce2cd36	[Core] Return final response for aborted requests from `AsyncLLM.generate` (#22283 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-08-14 14:49:02 -07:00
Lucas Wilkinson	b8ff05361a	[CI] Temporarily disable flaky test (#22930 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-08-14 19:59:16 +00:00
Jialin Ouyang	31a500c86f	[Core] [N-gram SD Optimization][1/n] Propose tokens with a single KMP (#22437 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-08-13 14:44:06 -07:00
Cyrus Leung	b4b78d6317	[CI/Build] Fix param mismatch in `test_eagle_correctness` (#22847 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-13 10:55:25 -07:00
Nicolò Lucchesi	12817a8ac7	[CI] Fix `tests/v1/e2e/test_kv_sharing_fast_prefill.py` import on test (#22815 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-08-13 10:35:50 -07:00
Cyrus Leung	19b927e52d	[Core] Use individual MM items in P0/P1 cache and model runner (#22570 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-13 07:18:07 -07:00
Nicolò Lucchesi	6b794c756c	[Nixl][CI] Fix tests (#22806 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-08-13 06:03:53 -07:00
Giancarlo Delfin	d94e3026de	[V1] Add tree drafting tests for eagle spec decoding (#22705 ) Signed-off-by: Giancarlo Delfin <gdelfin@meta.com>	2025-08-13 04:11:28 -07:00
Woosuk Kwon	71683ca6f6	[V0 Deprecation] Remove multi-step scheduling (#22138 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>	2025-08-12 20:18:39 -07:00
Nicolò Lucchesi	422f22e012	[CI][Nixl] Check kv cache layout during handshake (#22745 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-08-12 12:53:52 -07:00
Nicolò Lucchesi	3d9d40efde	[Bugfix][CI] Fix `test_remote_decode_lifecycle.py::test_short_prompt_lifecycle` (#22727 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-08-12 07:30:17 -07:00
phantomlei	bc8372efc3	[Bugfix] Fix erroneous randomly generated cases in bad word testing (#22170 ) Signed-off-by: phantomlei <phantomlei3@gmail.com>	2025-08-12 02:03:22 -07:00
Michael Goin	93d0652433	[CI] Increase timeout for test_completion_with_image_embeds (#22670 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-11 20:31:36 -07:00
TJian	65abe111a3	[CI] Skip Tree Attn Test in `test_max_len.py` to unblock CI (#22664 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-08-11 10:36:05 -07:00
22quinn	807d21b80d	[BugFix] [Spec Decode] Remove LlamaForCausalLMEagle3 to fix CI (#22611 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-08-11 10:31:36 -07:00
GuanLuo	16fb668b61	fix: NIXL connector transfers partial block to pass full multi-modal context (#21074 ) Signed-off-by: GuanLuo <gluo@nvidia.com>	2025-08-11 09:40:55 -07:00
Nick Hill	5898b135ab	[BugFix] Fix KVConnectorOutput TPU breakage (#22598 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-08-10 19:33:48 -07:00
Chengji Yao	2a84fb422f	[TPU] kv cache update kernel doesn't need to be padded slices to multiple of num_slices_per_block (#22394 ) Signed-off-by: Chengji Yao <chengjiyao@gmail.com> Co-authored-by: Chengji Yao <chengjiyao@gmail.com>	2025-08-09 20:49:04 -07:00
Le Chen	3d7363e61c	[Config] add "qwen" as a native eagle3 target supported model (#22333 ) Signed-off-by: lechen <lecself@163.com> Signed-off-by: LeChen <lecself@163.com>	2025-08-09 20:21:05 -07:00
Kyuyeun Kim	9a0c5ded5a	[TPU] Add support for online w8a8 quantization (#22425 ) Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>	2025-08-08 23:12:54 -07:00
Chauncey	17eaaef595	[Bugfix] Fix RuntimeError: Index put requires the source and destination dtypes match (#22065 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-08-07 19:20:21 -07:00

445 Commits