xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-21 01:35:47 +08:00

Author	SHA1	Message	Date
Nick Hill	58e61e56b7	[Test] Rework e2e async scheduling tests (#28744 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-14 16:01:09 -08:00
Laith Sakka	2e0ad629b0	Avoid bytecode hook and simplify TorchCompileWrapperWithCustomDipatch (#25110 ) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-11-14 14:11:10 -08:00
Cyrus Leung	e2741f6cbc	[Chore] Rename `SchedulerConfig.chunked_prefill_enabled` (#28735 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-14 18:39:57 +00:00
Yong Hoon Shin	9324e10275	Fix KV sharing fast prefill with cudagraph enabled (#28537 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-14 11:53:42 +00:00
Yannick Schnider	119c4927b3	[Bugfix] Fix validate model input for decoder models (#27099 ) Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com> Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-11-13 10:18:47 -08:00
Nicolò Lucchesi	19d91ece4b	[CI] Fix flaky `test_eagle_correctness` test (#28364 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-11-09 16:04:59 +00:00
Xiaohong (Sean) Chen	d0c7792004	[Bugfix][LoRA][Spec Decode] Support LoRA with speculative decoding (#21068 ) Signed-off-by: Sean Chen <xiaohong_chen1991@hotmail.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Danielle Robinson <dcmaddix@gmail.com> Co-authored-by: Haipeng Li <li2haipeng@gmail.com> Co-authored-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com>	2025-11-08 01:58:22 +00:00
Nick Hill	938a81692e	[AsyncScheduling] Don't schedule past request max_tokens (#27922 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-04 17:06:28 +00:00
Aurick Qiao	2c19d96777	[Spec Decode] Integrate Suffix Decoding from Arctic Inference (#25784 ) Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>	2025-11-03 09:23:31 -08:00
Rémi Delacourt	cec7c28833	[Bugfix] Padded Eagle Specdec with Chunked Prefill (#26263 ) Signed-off-by: Rémi Delacourt <remi@mistral.ai> Signed-off-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com> Signed-off-by: remi <remi@mistral.ai> Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>	2025-11-03 02:22:46 -05:00
Nick Hill	0cdbe7b744	[Core] Async scheduling + structured outputs compatibility (#26866 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-01 00:35:04 +00:00
Dipika Sikka	413ef7a3b4	[Speculators] Move tests + fix integration (#27308 ) Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Rahul Tuli <rtuli@redhat.com> Signed-off-by: rahul-tuli <rtuli@redhat.com> Co-authored-by: Rahul Tuli <rtuli@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-10-29 00:54:21 -07:00
Nick Hill	4fe5895361	[AsyncScheduling] Make async overlap work with logprobs (#27615 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-28 22:35:54 +00:00
Huy Do	becb7de40b	Update PyTorch to 2.9.0+cu129 (#24994 ) Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-21 17:20:18 -04:00
Bram Wasti	b2f78cbad4	[small][batch invariance] Rename the env and internal flags to simplify usage (#26855 ) Signed-off-by: Bram Wasti <bwasti@meta.com>	2025-10-16 21:40:25 +00:00
Morrison Turnansky	96b9aa5aa0	[Frontend][torch.compile] CompilationConfig Overhaul (#20283 ): name change compilation level to compilation mode, deprecation compilation level (#26355 ) Signed-off-by: morrison-turnansky <mturnans@redhat.com> Signed-off-by: Morrison Turnansky <mturnans@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-15 02:51:16 +00:00
Maximilien de Bayser	d8bebb008a	Add tests for chunked prefill and prefix cache with causal pooling models (#26526 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Ayush Singh <ayush1009208@gmail.com>	2025-10-14 07:45:04 +08:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x | None` and `Union[x, y]` to `x | y` (#26633 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00
Nick Hill	5bc26c438d	[BugFix] Make penalties and bad_words work with async scheduling (#26467 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-10 23:27:04 +00:00
Nick Hill	949cb0170d	[BugFix] Fix async scheduling + request preemption (#26385 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-10 20:29:57 +00:00
Thomas Parnell	31a4b3e6c4	Revert #24446 and #26168 (#26332 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-10-07 16:38:19 -06:00
Cyrus Leung	1e4ecca1d0	[V0 Deprecation] Remove `VLLM_USE_V1` from tests (#26341 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-07 15:42:31 +00:00
Yannick Schnider	6431be808f	[Tests] conftest: Extending VllmRunner and HfRunner to accept token_ids as input (#26295 ) Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com> Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-06 17:19:34 +00:00
Harry Mellor	d6953beb91	Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 07:06:22 -07:00
Yannick Schnider	f05fea1f5e	[Core] Enable decode of context length equal to max model length (#26168 ) Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>	2025-10-04 09:59:26 +00:00
Yannick Schnider	8ee846c27c	[Bugfix] Re-enable prefill of max model length (#24446 ) Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>	2025-10-03 14:13:34 +02:00
WeiQing Chen	f1d53d150c	[Multimodal][Speculative Decoding]Eagle Eagle3 mm support, enablement on qwen2.5vl (#22872 ) Signed-off-by: Junhong <liujunhong11@huawei.com> Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com> Co-authored-by: Junhong <liujunhong11@huawei.com> Co-authored-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com>	2025-09-27 03:35:47 +00:00
Jonas M. Kübler	6f5c0931c1	[Spec decode] automatically disable mm for text-only draft models (#25667 ) Signed-off-by: Jonas Kuebler <kuebj@amazon.com>	2025-09-27 08:10:21 +08:00
qizixi	c70ac4b8ff	[spec decode] Consolidate speculative decode method name for MTP (#25232 ) Signed-off-by: zixi-qi <qizixi@meta.com>	2025-09-26 22:27:05 +00:00
Matthew Bonanni	3468f17ebe	[V0 deprecation] Remove _VLLM_V1 suffixes from attention backend names (#25489 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>	2025-09-25 17:37:50 +00:00
Woosuk Kwon	eb68c2dcd9	[CI] Revert back prepare_prompts and check_answers (#25087 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-17 11:03:16 -07:00
Wenlong Wang	cfa3234a5b	[CI][Spec Decode] Adjust threshold for flaky ngram spec decoding test again (#24771 ) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>	2025-09-13 15:45:11 +08:00
co63oc	3144d90217	fix some typos (#24167 ) Signed-off-by: co63oc <co63oc@users.noreply.github.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-09-10 06:21:23 -07:00
Wenlong Wang	53b42f4102	[BugFix][Spec Decode] Fix out-of-range index triggered by eagle3; re-enable test for LlamaForCausalLMEagle3 (#24392 ) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>	2025-09-09 21:24:23 -07:00
Nick Hill	83dd28aae4	[CI] Adjust threshold for flaky ngram spec decoding test (#24528 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-09-09 21:07:33 -07:00
Didier Durand	d7e1e59972	[Doc]: fix typos in Python comments (#24093 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-09-02 21:05:45 -07:00
Yong Hoon Shin	8c3e199998	Revert gemma3n fast prefill changes (#23897 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-08-29 12:16:57 -07:00
Yong Hoon Shin	cb293f6a79	[V1] Enable prefill optimization for Gemma3n (#22628 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-08-28 14:54:30 -07:00
Arjun Reddy	111692bb8c	[CI] Add end-to-end V1 min_tokens test coverage (#22495 ) Signed-off-by: Arjun Reddy <189282188+arjunbreddy22@users.noreply.github.com> Co-authored-by: Arjun Reddy <189282188+arjunbreddy22@users.noreply.github.com>	2025-08-21 22:04:07 -06:00
Xin Yang	83e69a09d6	[Model] Support deepseek with eagle (#21086 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2025-08-20 19:01:31 +08:00
Lucas Wilkinson	b8ff05361a	[CI] Temporarily disable flaky test (#22930 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-08-14 19:59:16 +00:00
Cyrus Leung	b4b78d6317	[CI/Build] Fix param mismatch in `test_eagle_correctness` (#22847 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-13 10:55:25 -07:00
Nicolò Lucchesi	12817a8ac7	[CI] Fix `tests/v1/e2e/test_kv_sharing_fast_prefill.py` import on test (#22815 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-08-13 10:35:50 -07:00
22quinn	807d21b80d	[BugFix] [Spec Decode] Remove LlamaForCausalLMEagle3 to fix CI (#22611 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-08-11 10:31:36 -07:00
Le Chen	3d7363e61c	[Config] add "qwen" as a native eagle3 target supported model (#22333 ) Signed-off-by: lechen <lecself@163.com> Signed-off-by: LeChen <lecself@163.com>	2025-08-09 20:21:05 -07:00
TJian	1ee5ead5f8	[ROCm] [V1] [SpecDec] Enable Speculative Decoding on ROCm V1 Engine (#21496 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-08-07 19:13:17 -07:00
Yong Hoon Shin	8564dc9448	Fix test_kv_sharing_fast_prefill flakiness (#22038 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-08-01 23:55:34 -07:00
zhiweiz	9e0726e5bf	[Meta] Official Eagle mm support, first enablement on llama4 (#20788 ) Signed-off-by: morgendave <morgendave@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.me>	2025-07-31 10:35:07 -07:00
Yong Hoon Shin	ad510309ee	Override attention metadata for fast prefill in some KV sharing setups (#21590 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-07-30 08:54:15 -07:00
Chen Zhang	755fa8b657	[KVCache] Make KVCacheSpec hashable (#21791 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-07-29 19:58:29 +08:00


67 Commits