Nick Hill
e38e96a3c0
[Tests] Harden DP tests (#21508)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-25 02:27:24 -07:00
Chengji Yao
40d86ee412
[TPU][Bugfix] fix OOM issue in CI test (#21550)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-07-24 23:01:53 -07:00
QiliangCui
e0be2c4d09
[TPU][Test] Temporarily suspend this MoE model in test_basic.py. (#21560)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-07-24 20:44:50 -07:00
Nick Hill
9c8b2c2a8a
[DP] Support api-server-count > 0 in hybrid DP LB mode (#21510)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-24 20:18:16 -07:00
Juncheng Gu
6066284914
[P/D] Support CPU Transfer in NixlConnector (#18293)
Signed-off-by: Juncheng Gu <juncgu@gmail.com>
Signed-off-by: Richard Liu <ricliu@google.com>
Co-authored-by: Richard Liu <39319471+richardsliu@users.noreply.github.com>
Co-authored-by: Richard Liu <ricliu@google.com>
2025-07-24 17:58:42 +01:00
Rui Qiao
1e9ea8e69d
[P/D] Move FakeNixlWrapper to test dir (#21328)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-07-24 08:53:45 -07:00
Lucas Wilkinson
61b8cea3b4
[Attention] Optimize FlashInfer MetadataBuilder Build call (#21137)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-07-24 03:21:46 -07:00
Zhou Fang
fc5f756db4
[v1][Core] Clean up usages of SpecializedManager (#21407)
Signed-off-by: Zhou Fang <fang.github@gmail.com>
2025-07-24 00:40:11 -07:00
Chengji Yao
e74bfc70e4
[TPU][Bugfix] fix moe layer (#21340)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2025-07-24 00:38:39 -07:00
Robert Shaw
d5b981f8b1
[DP] Internal Load Balancing Per Node [one-pod-per-node] (#21238)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-07-23 20:57:32 -07:00
22quinn
5c9b807b34
[Core] Add reload_weights RPC method (#20096)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-07-23 14:24:52 -07:00
Nick Hill
316b1bf706
[Tests] Add tests for headless internal DP LB (#21450)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-23 07:49:25 -07:00
Lu Fang
accac82928
[Sampler] Introduce logprobs mode for logging (#21398)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-07-23 01:39:25 -07:00
Jialin Ouyang
a1f3610fc6
[Core] Add basic unit test for maybe_evict_cached_block (#21400)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-07-23 00:02:02 -07:00
Jialin Ouyang
ed25054577
[Core] Introduce popleft_n and append_n in FreeKVCacheBlockQueue to further optimize block_pool (#21222)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-07-22 06:17:47 -07:00
Thomas Parnell
488d8a986a
[V1] [Hybrid] Add new test to verify that hybrid views into KVCacheTensor are compatible (#21300)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-07-21 23:31:18 -07:00
Robert Shaw
29d1ffc5b4
[DP] Fix Prometheus Logging (#21257)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2025-07-21 09:11:35 -07:00
Ning Xie
d97841078b
[Misc] unify variable for LLM instance (#20996)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-07-21 12:18:33 +01:00
Jiayi Yan
7ba34b1241
[bugfix] fix syntax warning caused by backslash (#21251)
2025-07-20 17:12:10 +00:00
Seiji Eicher
d1fb65bde3
Enable v1 metrics tests (#20953)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
2025-07-20 03:22:02 +00:00
Chengji Yao
3a1d8940ae
[TPU] support fp8 kv cache quantization (#19292)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-07-20 03:01:00 +00:00
kourosh hakhamaneshi
9f414a12ad
[BugFix] Make PD work with Ray (#21072)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
2025-07-19 08:46:50 -07:00
Woosuk Kwon
dd572c0ab3
[V0 Deprecation] Remove V0 Spec Decode workers (#21152)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-18 21:47:50 -07:00
Lucia Fang
9a9fda1423
[Core] Support Local Chunked Attention for Hybrid KV Cache (#19351)
Signed-off-by: Lucia Fang <fanglu@fb.com>
Signed-off-by: Lu Fang <fanglu@meta.com>
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lu Fang <fanglu@meta.com>
2025-07-18 20:48:38 -07:00
JialinOuyang-Meta
0f199f197b
[Core] Avoid KVCacheBlock.__eq__ invocations in FreeKVCacheBlockQueue (#21005)
Signed-off-by: Jialin Ouyang <jialino@meta.com>
2025-07-18 12:34:40 -07:00
Chauncey
fdc5b43d20
[Bugfix]: Fix final_res_batch list index out of range error (#21055)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-07-17 00:29:09 -07:00
David Ben-David
4fcef49ec4
[V1] [KVConnector] Fix MultiprocExecutor worker output aggregation (#21048)
Signed-off-by: David Ben-David <davidb@pliops.com>
Co-authored-by: David Ben-David <davidb@pliops.com>
2025-07-17 13:29:45 +08:00
Lucas Wilkinson
76b494444f
[Attention] Refactor attention metadata builder interface (#20466)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-07-17 04:44:25 +00:00
zhiweiz
c11013db8b
[Meta] Llama4 EAGLE Support (#20591)
Signed-off-by: qizixi <qizixi@meta.com>
Co-authored-by: qizixi <qizixi@meta.com>
2025-07-15 21:14:15 -07:00
Peter Pan
1eb2b9c102
[CI] update typos config for CI pre-commit and fix some spells (#20919)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
2025-07-15 21:12:40 -07:00
Chauncey
34cda778a0
[Frontend] OpenAI Responses API supports input image (#20975)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-07-15 18:59:36 -06:00
Woosuk Kwon
d4d309409f
Implement Async Scheduling (#19970)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-14 23:01:46 -07:00
XiongfeiWei
d4170fad39
Use w8a8 quantized matmul Pallas kernel (#19170)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-07-15 03:06:33 +00:00
wangxiyuan
1e9438e0b0
[MISC] Move bind_kv_cache to worker module (#20900)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-14 09:40:00 +00:00
Maroon Ayoub
66f6fbd393
[Prefix Cache] Add reproducible prefix-cache block hashing using SHA-256 + CBOR (64bit) (#20511)
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
2025-07-14 02:45:31 +00:00
22quinn
8632e831ba
[Core] Add update_config RPC method (#20095)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-07-14 00:49:18 +00:00
Woosuk Kwon
f45a332886
[Sched] Enhance the logic to remove stopped requests from queues (#20739)
2025-07-12 15:33:13 -07:00
Alexander Matveev
5b032352cc
[Attention] MLA - Flashinfer Ragged Prefill (#20034)
2025-07-10 20:17:47 -07:00
Nathan Hoos
d6902ce79f
[V0][V1][Core] Add outlines integration for V1, and update V0 integration. (#15975)
Signed-off-by: Nathan Hoos <thwackyy.y@gmail.com>
2025-07-10 15:30:26 -04:00
Yiming
cd587c93ef
[BugFix]: Properly set engine_id when using multi connector (#19487)
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: leiyiming <leiyiming@kingsoft.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-07-09 20:32:44 +00:00
Chengji Yao
eb58f5953d
[TPU][Bugfix] fix test_pallas (#20666)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-07-09 09:32:48 -07:00
Dmitry Rogozhkin
e760fcef22
[XPU] Use spawn with XPU multiprocessing (#20649)
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
2025-07-09 00:34:28 -07:00
QiliangCui
d8ee5a2ca4
[TPU][Bugfix] disable phi-3 test (#20632)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-07-08 23:14:26 +00:00
Nicolò Lucchesi
71d1d75b7a
[PD][Nixl] Remote consumer READ timeout for clearing request blocks (#20139)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-07-08 08:56:40 +01:00
Chauncey
93b9d9f499
[Bugfix]: Fix messy code when using logprobs (#19209)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-07-08 11:02:15 +08:00
Woosuk Kwon
462b269280
Implement OpenAI Responses API [1/N] (#20504)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-06 18:32:13 -07:00
Isotr0py
32c9be2200
[v1] Re-add fp32 support to v1 engine through FlexAttention (#19754)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-07-05 09:41:10 +00:00
Thomas Parnell
2f35a022e6
Enable V1 for Hybrid SSM/Attention Models (#20016)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
2025-07-04 17:46:53 +00:00
Jee Jee Li
1caca5a589
[Misc] Add SPDX-FileCopyrightText (#20428)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-04 07:40:42 +00:00
Aaron Pham
4a98edff1f
[Structured Outputs][V1] Skipping with models doesn't contain tokenizers (#20365)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-07-04 15:05:49 +08:00