| Author | Commit | Message | Date |
|---|---|---|---|
| Cyrus Leung | 19b927e52d | [Core] Use individual MM items in P0/P1 cache and model runner (#22570); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2025-08-13 07:18:07 -07:00 |
| Nicolò Lucchesi | 6b794c756c | [Nixl][CI] Fix tests (#22806); Signed-off-by: NickLucche <nlucches@redhat.com> | 2025-08-13 06:03:53 -07:00 |
| Giancarlo Delfin | d94e3026de | [V1] Add tree drafting tests for eagle spec decoding (#22705); Signed-off-by: Giancarlo Delfin <gdelfin@meta.com> | 2025-08-13 04:11:28 -07:00 |
| Woosuk Kwon | 71683ca6f6 | [V0 Deprecation] Remove multi-step scheduling (#22138); Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>; Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai> | 2025-08-12 20:18:39 -07:00 |
| Nicolò Lucchesi | 422f22e012 | [CI][Nixl] Check kv cache layout during handshake (#22745); Signed-off-by: NickLucche <nlucches@redhat.com> | 2025-08-12 12:53:52 -07:00 |
| Nicolò Lucchesi | 3d9d40efde | [Bugfix][CI] Fix test_remote_decode_lifecycle.py::test_short_prompt_lifecycle (#22727); Signed-off-by: NickLucche <nlucches@redhat.com> | 2025-08-12 07:30:17 -07:00 |
| phantomlei | bc8372efc3 | [Bugfix] Fix erroneous randomly generated cases in bad word testing (#22170); Signed-off-by: phantomlei <phantomlei3@gmail.com> | 2025-08-12 02:03:22 -07:00 |
| Michael Goin | 93d0652433 | [CI] Increase timeout for test_completion_with_image_embeds (#22670); Signed-off-by: mgoin <mgoin64@gmail.com> | 2025-08-11 20:31:36 -07:00 |
| TJian | 65abe111a3 | [CI] Skip Tree Attn Test in test_max_len.py to unblock CI (#22664); Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> | 2025-08-11 10:36:05 -07:00 |
| 22quinn | 807d21b80d | [BugFix] [Spec Decode] Remove LlamaForCausalLMEagle3 to fix CI (#22611); Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com> | 2025-08-11 10:31:36 -07:00 |
| GuanLuo | 16fb668b61 | fix: NIXL connector transfers partial block to pass full multi-modal context (#21074); Signed-off-by: GuanLuo <gluo@nvidia.com> | 2025-08-11 09:40:55 -07:00 |
| Nick Hill | 5898b135ab | [BugFix] Fix KVConnectorOutput TPU breakage (#22598); Signed-off-by: Nick Hill <nhill@redhat.com> | 2025-08-10 19:33:48 -07:00 |
| Chengji Yao | 2a84fb422f | [TPU] kv cache update kernel doesn't need to be padded slices to multiple of num_slices_per_block (#22394); Signed-off-by: Chengji Yao <chengjiyao@gmail.com>; Co-authored-by: Chengji Yao <chengjiyao@gmail.com> | 2025-08-09 20:49:04 -07:00 |
| Le Chen | 3d7363e61c | [Config] add "qwen" as a native eagle3 target supported model (#22333); Signed-off-by: lechen <lecself@163.com>; Signed-off-by: LeChen <lecself@163.com> | 2025-08-09 20:21:05 -07:00 |
| Kyuyeun Kim | 9a0c5ded5a | [TPU] Add support for online w8a8 quantization (#22425); Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com> | 2025-08-08 23:12:54 -07:00 |
| Chauncey | 17eaaef595 | [Bugfix] Fix RuntimeError: Index put requires the source and destination dtypes match (#22065); Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> | 2025-08-07 19:20:21 -07:00 |
| TJian | 1ee5ead5f8 | [ROCm] [V1] [SpecDec] Enable Speculative Decoding on ROCm V1 Engine (#21496); Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> | 2025-08-07 19:13:17 -07:00 |
| Harry Mellor | 7e3a8dc906 | Remove from_dict from SpeculativeConfig (#22451); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-08-07 10:13:04 -07:00 |
| Chen Zhang | 4815b00f54 | [gpt-oss] Generate ResponseOutputItem from Harmony Message (#22410); Signed-off-by: Chen Zhang <zhangch99@outlook.com> | 2025-08-07 08:33:25 -07:00 |
| Michael Goin | a00d8b236f | Use float32 for test_completion.py (#22385); Signed-off-by: Michael Goin <mgoin64@gmail.com> | 2025-08-07 11:07:47 +08:00 |
| Lucas Wilkinson | 1dc8a70b6d | [Attention] Support multiple attention metadata builders per kv_cache_spec + proper local attention no hybrid kv cache fix (#21588); Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> | 2025-08-06 18:40:52 -07:00 |
| Asaf Joseph Gardin | 46a13949d5 | [v1] - Mamba1 Attention Metadata (#21249); Signed-off-by: asafg <asafg@ai21.com>; Co-authored-by: asafg <asafg@ai21.com> | 2025-08-06 17:03:42 -07:00 |
| Giancarlo Delfin | 469b3ffaaa | [V1] port xformers backend to v1 (#21342); Signed-off-by: Giancarlo Delfin <gdelfin@meta.com> | 2025-08-05 10:04:46 -07:00 |
| Nicolò Lucchesi | 0c275ad5ad | [V0 Deprecation][TPU] Remove V1 flag check from tests (#22248); Signed-off-by: NickLucche <nlucches@redhat.com> | 2025-08-05 06:53:23 -07:00 |
| Giancarlo Delfin | 5ea71ff46f | [V1] reduce block size for tree attention correctness test to fix 'ou… (#22207); Signed-off-by: Giancarlo Delfin <gdelfin@meta.com> | 2025-08-04 19:11:06 -07:00 |
| Woosuk Kwon | 7175817637 | Revert "[Bugfix] V1 Fix the cursor leakage issue during request scheduling." (#22223) | 2025-08-04 18:37:06 -07:00 |
| PiteXChen | 2dffac464c | [Bugfix] V1 Fix the cursor leakage issue during request scheduling. (#21173); Signed-off-by: CLFutureX <775523362@qq.com> | 2025-08-04 18:34:10 -07:00 |
| 22quinn | 54de71d0df | [Sampler] Support returning all logprobs or logits (#21792); Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com> | 2025-08-04 03:04:12 -07:00 |
| Tyler Michael Smith | 8ecb3e9e93 | [CI Bugfix] Fix wNa16 kernel not found for test_shared_storage_connector_hashes (#22163); Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> | 2025-08-03 22:19:04 -07:00 |
| Giancarlo Delfin | aa7012eb6d | Add tree attention backend for v1 (part 1) (#20401); Signed-off-by: Giancarlo Delfin <gdelfin@meta.com> | 2025-08-03 22:13:26 -07:00 |
| Abirdcfly | 0d7db16a92 | [PD] add test for chat completions endpoint (#21925); Signed-off-by: Abirdcfly <fp544037857@gmail.com> | 2025-08-03 19:57:03 -07:00 |
| Woosuk Kwon | 6d98843b31 | [Responses API] Disable response store by default (#22137); Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2025-08-03 04:04:21 -07:00 |
| David Ben-David | aefeea0fde | [V1] [P/D] Refactor KV Connector Path (#21980); Signed-off-by: David Ben-David <davidb@pliops.com>; Co-authored-by: David Ben-David <davidb@pliops.com> | 2025-08-03 04:03:40 -07:00 |
| Roger Wang | 067c34a155 | docs: remove deprecated disable-log-requests flag (#22113); Signed-off-by: Roger Wang <hey@rogerw.me> | 2025-08-02 00:19:48 -07:00 |
| Yong Hoon Shin | 8564dc9448 | Fix test_kv_sharing_fast_prefill flakiness (#22038); Signed-off-by: Yong Hoon Shin <yhshin@meta.com> | 2025-08-01 23:55:34 -07:00 |
| Sage Moore | 0edaf752d7 | [Attention][DBO] Add support for "splitting" the CommonAttentionMetadata (#21153); Signed-off-by: Sage Moore <sage@neuralmagic.com> | 2025-08-01 19:47:53 -07:00 |
| rongfu.leng | b879ecd6e2 | [Bugfix] fix when skip tokenizer init (#21922); Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io> | 2025-08-01 10:09:36 -07:00 |
| Harry Mellor | 2d7b09b998 | Deprecate --disable-log-requests and replace with --enable-log-requests (#21739); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-08-01 17:16:37 +01:00 |
| Yong Hoon Shin | 71470bc4af | [Misc] Add unit tests for chunked local attention (#21692); Signed-off-by: Yong Hoon Shin <yhshin@meta.com> | 2025-07-31 11:39:16 -07:00 |
| zhiweiz | 9e0726e5bf | [Meta] Official Eagle mm support, first enablement on llama4 (#20788); Signed-off-by: morgendave <morgendave@gmail.com>; Co-authored-by: Roger Wang <hey@rogerw.me> | 2025-07-31 10:35:07 -07:00 |
| Nick Hill | 5daffe7cf6 | [BugFix] Fix case where collective_rpc returns None (#22006); Signed-off-by: Nick Hill <nhill@redhat.com> | 2025-07-31 12:51:37 +00:00 |
| Michael Goin | 055bd3978e | [CI Bugfix] Fix CI OOM for test_shared_storage_connector_hashes (#21973); Signed-off-by: mgoin <mgoin64@gmail.com> | 2025-07-31 11:45:29 +08:00 |
| Zebing Lin | ca9e2be3ed | [Core] Move EngineCoreRequest to Request conversion out of EngineCore (#21627); Signed-off-by: linzebing <linzebing1995@gmail.com> | 2025-07-30 15:00:54 -07:00 |
| Nick Hill | 56bd537dde | [Misc] Support more collective_rpc return types (#21845); Signed-off-by: Nick Hill <nhill@redhat.com> | 2025-07-30 10:20:20 -07:00 |
| Chenguang Zheng | 4904e53c32 | [Bugfix] SharedStorage Connector for V1 PD multimodal (#21611); Signed-off-by: fake0fan <645327136@qq.com>; Signed-off-by: herotai214 <herotai214@gmail.com>; Co-authored-by: herotai214 <herotai214@gmail.com> | 2025-07-30 09:18:37 -07:00 |
| 633WHU | 5c765aec65 | [Bugfix] Fix TypeError in scheduler when comparing mixed request_id types (#21816); Signed-off-by: chiliu <chiliu@paypal.com>; Co-authored-by: chiliu <chiliu@paypal.com> | 2025-07-30 08:54:44 -07:00 |
| Yong Hoon Shin | ad510309ee | Override attention metadata for fast prefill in some KV sharing setups (#21590); Signed-off-by: Yong Hoon Shin <yhshin@meta.com> | 2025-07-30 08:54:15 -07:00 |
| Ruixiang Tan | 8f4a1c9a04 | [Misc] Improve code readability of KVCacheManager (#21673); Signed-off-by: tanruixiang <tanruixiang0104@gmail.com>; Signed-off-by: Ruixiang Tan <819464715@qq.com>; Signed-off-by: GitHub <noreply@github.com> | 2025-07-30 07:20:43 -07:00 |
| Chen Zhang | 555e7225bc | [v1][attention] Support Hybrid Allocator + FlashInfer (#21412); Signed-off-by: Chen Zhang <zhangch99@outlook.com> | 2025-07-30 01:45:29 +00:00 |
| Chen Zhang | 755fa8b657 | [KVCache] Make KVCacheSpec hashable (#21791); Signed-off-by: Chen Zhang <zhangch99@outlook.com> | 2025-07-29 19:58:29 +08:00 |