Nick Hill
|
58e61e56b7
|
[Test] Rework e2e async scheduling tests (#28744)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-14 16:01:09 -08:00 |
|
Laith Sakka
|
2e0ad629b0
|
Avoid bytecode hook and simplify TorchCompileWrapperWithCustomDipatch (#25110)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2025-11-14 14:11:10 -08:00 |
|
Cyrus Leung
|
e2741f6cbc
|
[Chore] Rename SchedulerConfig.chunked_prefill_enabled (#28735)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-14 18:39:57 +00:00 |
|
Yong Hoon Shin
|
9324e10275
|
Fix KV sharing fast prefill with cudagraph enabled (#28537)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-14 11:53:42 +00:00 |
|
Yannick Schnider
|
119c4927b3
|
[Bugfix] Fix validate model input for decoder models (#27099)
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-11-13 10:18:47 -08:00 |
|
Nicolò Lucchesi
|
19d91ece4b
|
[CI] Fix flaky test_eagle_correctness test (#28364)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-11-09 16:04:59 +00:00 |
|
Xiaohong (Sean) Chen
|
d0c7792004
|
[Bugfix][LoRA][Spec Decode] Support LoRA with speculative decoding (#21068)
Signed-off-by: Sean Chen <xiaohong_chen1991@hotmail.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Danielle Robinson <dcmaddix@gmail.com>
Co-authored-by: Haipeng Li <li2haipeng@gmail.com>
Co-authored-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com>
|
2025-11-08 01:58:22 +00:00 |
|
Nick Hill
|
938a81692e
|
[AsyncScheduling] Don't schedule past request max_tokens (#27922)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-04 17:06:28 +00:00 |
|
Aurick Qiao
|
2c19d96777
|
[Spec Decode] Integrate Suffix Decoding from Arctic Inference (#25784)
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>
|
2025-11-03 09:23:31 -08:00 |
|
Rémi Delacourt
|
cec7c28833
|
[Bugfix] Padded Eagle Specdec with Chunked Prefill (#26263)
Signed-off-by: Rémi Delacourt <remi@mistral.ai>
Signed-off-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com>
Signed-off-by: remi <remi@mistral.ai>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>
|
2025-11-03 02:22:46 -05:00 |
|
Nick Hill
|
0cdbe7b744
|
[Core] Async scheduling + structured outputs compatibility (#26866)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-01 00:35:04 +00:00 |
|
Dipika Sikka
|
413ef7a3b4
|
[Speculators] Move tests + fix integration (#27308)
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
Signed-off-by: rahul-tuli <rtuli@redhat.com>
Co-authored-by: Rahul Tuli <rtuli@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2025-10-29 00:54:21 -07:00 |
|
Nick Hill
|
4fe5895361
|
[AsyncScheduling] Make async overlap work with logprobs (#27615)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-10-28 22:35:54 +00:00 |
|
Huy Do
|
becb7de40b
|
Update PyTorch to 2.9.0+cu129 (#24994)
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-10-21 17:20:18 -04:00 |
|
Bram Wasti
|
b2f78cbad4
|
[small][batch invariance] Rename the env and internal flags to simplify usage (#26855)
Signed-off-by: Bram Wasti <bwasti@meta.com>
|
2025-10-16 21:40:25 +00:00 |
|
Morrison Turnansky
|
96b9aa5aa0
|
[Frontend][torch.compile] CompilationConfig Overhaul (#20283): name change compilation level to compilation mode, deprecation compilation level (#26355)
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-10-15 02:51:16 +00:00 |
|
Maximilien de Bayser
|
d8bebb008a
|
Add tests for chunked prefill and prefix cache with causal pooling models (#26526)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Ayush Singh <ayush1009208@gmail.com>
|
2025-10-14 07:45:04 +08:00 |
|
Harry Mellor
|
8fcaaf6a16
|
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-12 09:51:31 -07:00 |
|
Nick Hill
|
5bc26c438d
|
[BugFix] Make penalties and bad_words work with async scheduling (#26467)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-10-10 23:27:04 +00:00 |
|
Nick Hill
|
949cb0170d
|
[BugFix] Fix async scheduling + request preemption (#26385)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-10-10 20:29:57 +00:00 |
|
Thomas Parnell
|
31a4b3e6c4
|
Revert #24446 and #26168 (#26332)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-10-07 16:38:19 -06:00 |
|
Cyrus Leung
|
1e4ecca1d0
|
[V0 Deprecation] Remove VLLM_USE_V1 from tests (#26341)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-07 15:42:31 +00:00 |
|
Yannick Schnider
|
6431be808f
|
[Tests] conftest: Extending VllmRunner and HfRunner to accept token_ids as input (#26295)
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-06 17:19:34 +00:00 |
|
Harry Mellor
|
d6953beb91
|
Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-05 07:06:22 -07:00 |
|
Yannick Schnider
|
f05fea1f5e
|
[Core] Enable decode of context length equal to max model length (#26168)
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
|
2025-10-04 09:59:26 +00:00 |
|
Yannick Schnider
|
8ee846c27c
|
[Bugfix] Re-enable prefill of max model length (#24446)
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
|
2025-10-03 14:13:34 +02:00 |
|
WeiQing Chen
|
f1d53d150c
|
[Multimodal][Speculative Decoding]Eagle Eagle3 mm support, enablement on qwen2.5vl (#22872)
Signed-off-by: Junhong <liujunhong11@huawei.com>
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
Co-authored-by: Junhong <liujunhong11@huawei.com>
Co-authored-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com>
|
2025-09-27 03:35:47 +00:00 |
|
Jonas M. Kübler
|
6f5c0931c1
|
[Spec decode] automatically disable mm for text-only draft models (#25667)
Signed-off-by: Jonas Kuebler <kuebj@amazon.com>
|
2025-09-27 08:10:21 +08:00 |
|
qizixi
|
c70ac4b8ff
|
[spec decode] Consolidate speculative decode method name for MTP (#25232)
Signed-off-by: zixi-qi <qizixi@meta.com>
|
2025-09-26 22:27:05 +00:00 |
|
Matthew Bonanni
|
3468f17ebe
|
[V0 deprecation] Remove _VLLM_V1 suffixes from attention backend names (#25489)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
|
2025-09-25 17:37:50 +00:00 |
|
Woosuk Kwon
|
eb68c2dcd9
|
[CI] Revert back prepare_prompts and check_answers (#25087)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-17 11:03:16 -07:00 |
|
Wenlong Wang
|
cfa3234a5b
|
[CI][Spec Decode] Adjust threshold for flaky ngram spec decoding test again (#24771)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
|
2025-09-13 15:45:11 +08:00 |
|
co63oc
|
3144d90217
|
fix some typos (#24167)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
|
2025-09-10 06:21:23 -07:00 |
|
Wenlong Wang
|
53b42f4102
|
[BugFix][Spec Decode] Fix out-of-range index triggered by eagle3; re-enable test for LlamaForCausalLMEagle3 (#24392)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
|
2025-09-09 21:24:23 -07:00 |
|
Nick Hill
|
83dd28aae4
|
[CI] Adjust threshold for flaky ngram spec decoding test (#24528)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-09-09 21:07:33 -07:00 |
|
Didier Durand
|
d7e1e59972
|
[Doc]: fix typos in Python comments (#24093)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-09-02 21:05:45 -07:00 |
|
Yong Hoon Shin
|
8c3e199998
|
Revert gemma3n fast prefill changes (#23897)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-08-29 12:16:57 -07:00 |
|
Yong Hoon Shin
|
cb293f6a79
|
[V1] Enable prefill optimization for Gemma3n (#22628)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-08-28 14:54:30 -07:00 |
|
Arjun Reddy
|
111692bb8c
|
[CI] Add end-to-end V1 min_tokens test coverage (#22495)
Signed-off-by: Arjun Reddy <189282188+arjunbreddy22@users.noreply.github.com>
Co-authored-by: Arjun Reddy <189282188+arjunbreddy22@users.noreply.github.com>
|
2025-08-21 22:04:07 -06:00 |
|
Xin Yang
|
83e69a09d6
|
[Model] Support deepseek with eagle (#21086)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2025-08-20 19:01:31 +08:00 |
|
Lucas Wilkinson
|
b8ff05361a
|
[CI] Temporarily disable flaky test (#22930)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-08-14 19:59:16 +00:00 |
|
Cyrus Leung
|
b4b78d6317
|
[CI/Build] Fix param mismatch in test_eagle_correctness (#22847)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-13 10:55:25 -07:00 |
|
Nicolò Lucchesi
|
12817a8ac7
|
[CI] Fix tests/v1/e2e/test_kv_sharing_fast_prefill.py import on test (#22815)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-08-13 10:35:50 -07:00 |
|
22quinn
|
807d21b80d
|
[BugFix] [Spec Decode] Remove LlamaForCausalLMEagle3 to fix CI (#22611)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-08-11 10:31:36 -07:00 |
|
Le Chen
|
3d7363e61c
|
[Config] add "qwen" as a native eagle3 target supported model (#22333)
Signed-off-by: lechen <lecself@163.com>
Signed-off-by: LeChen <lecself@163.com>
|
2025-08-09 20:21:05 -07:00 |
|
TJian
|
1ee5ead5f8
|
[ROCm] [V1] [SpecDec] Enable Speculative Decoding on ROCm V1 Engine (#21496)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-08-07 19:13:17 -07:00 |
|
Yong Hoon Shin
|
8564dc9448
|
Fix test_kv_sharing_fast_prefill flakiness (#22038)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-08-01 23:55:34 -07:00 |
|
zhiweiz
|
9e0726e5bf
|
[Meta] Official Eagle mm support, first enablement on llama4 (#20788)
Signed-off-by: morgendave <morgendave@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.me>
|
2025-07-31 10:35:07 -07:00 |
|
Yong Hoon Shin
|
ad510309ee
|
Override attention metadata for fast prefill in some KV sharing setups (#21590)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-07-30 08:54:15 -07:00 |
|
Chen Zhang
|
755fa8b657
|
[KVCache] Make KVCacheSpec hashable (#21791)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-07-29 19:58:29 +08:00 |
|