Robert Shaw
|
c56ed8bb0e
|
[Bugfix][Nixl] Fix full prefix cache hit bug (#18632)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-06-05 02:07:32 +00:00 |
|
Nicolò Lucchesi
|
b2fac67130
|
[P/D] Heterogeneous TP (#18833)
Signed-off-by: nicklucche <nlucches@redhat.com>
|
2025-06-04 23:25:34 +00:00 |
|
Siyuan Liu
|
7ee2590478
|
[TPU] Update dynamo dump file name in compilation test (#19108)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
|
2025-06-04 16:13:43 -04:00 |
|
jmswen
|
c8dcc15921
|
Allow AsyncLLMEngine.generate to target a specific DP rank (#19102)
Signed-off-by: Jon Swenson <jmswen@gmail.com>
|
2025-06-04 08:26:47 -07:00 |
|
Seiji Eicher
|
2669a0d7b5
|
Fix ValueError: Missing value for tag key(s): model_name,engine. (#19113)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2025-06-04 17:10:45 +08:00 |
|
Siyuan Liu
|
8e972d9c44
|
[TPU] Skip hanging tests (#19115)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
|
2025-06-04 01:43:00 -07:00 |
|
Yan Ru Pei
|
b712be98c7
|
feat: add data parallel rank to KVEventBatch (#18925)
|
2025-06-03 17:14:20 -07:00 |
|
Chen Zhang
|
a8da78eac9
|
[Bugfix] Max concurrency estimation and check_enough_kv_cache_memory for models with sliding window layers (#19029)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-04 00:14:06 +00:00 |
|
Chen Zhang
|
b5fd9506c1
|
[Bugfix] get_num_blocks_to_allocate with null_block (#19031)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-03 15:30:55 -07:00 |
|
Chen Zhang
|
6cac54f4d1
|
[v1] Re-init input batch for multiple kv cache groups (#18654)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-03 21:41:36 +00:00 |
|
Yong Hoon Shin
|
bdf13965ab
|
[V1] Support cross-layer KV sharing (#18212)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-06-03 20:33:07 +00:00 |
|
Simon Mo
|
02f0c7b220
|
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-06-03 11:20:17 -07:00 |
|
Chen Zhang
|
f32fcd9444
|
[v1][KVCacheManager] Rename BlockHashType to BlockHash (#19015)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-03 08:01:48 +00:00 |
|
Rui Qiao
|
bdce64f236
|
[V1] Support DP with Ray (#18779)
|
2025-06-02 21:15:13 -07:00 |
|
Siyuan Liu
|
9112b443a0
|
[Hardware][TPU] Initial support of model parallelism with single worker using SPMD (#18011)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Co-authored-by: Hossein Sarshar <hossein.sarshar@gmail.com>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
|
2025-06-03 00:06:20 +00:00 |
|
22quinn
|
9760fd8f6a
|
[Core] Support inplace model weights loading (#18745)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-06-02 17:38:50 +08:00 |
|
Nick Hill
|
2dbe8c0774
|
[Perf] API-server scaleout with many-to-many server-engine comms (#17546)
|
2025-05-30 08:17:00 -07:00 |
|
Carol Zheng
|
fba02e3bd1
|
[Bugfix][TPU] Fix tpu model runner testcase failure (#18810)
Signed-off-by: Carol Zheng <cazheng@google.com>
|
2025-05-30 18:04:03 +08:00 |
|
Nick Hill
|
d1d61f3351
|
[BugFix] Make DP work with connector-delayed new requests (#18559)
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Will Eaton <weaton@redhat.com>
|
2025-05-29 18:04:18 +00:00 |
|
Nicolò Lucchesi
|
32ce3cf7c9
|
[V1] Allocate kv_cache with stride order for V1 (#18775)
Signed-off-by: nicklucche <nlucches@redhat.com>
|
2025-05-29 17:54:16 +00:00 |
|
Mark McLoughlin
|
06a0338015
|
[V1][Metrics] Add API for accessing in-memory Prometheus metrics (#17010)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-05-27 09:37:06 +00:00 |
|
qizixi
|
c1e4a4052d
|
[V1][Spec Decode] Support multi-layer eagle draft model (#18030)
Signed-off-by: qizixi <qizixi@meta.com>
|
2025-05-24 09:45:34 +00:00 |
|
qizixi
|
d55e446d13
|
[V1][Spec Decode] Small refactors to improve eagle bookkeeping performance (#18424)
Signed-off-by: qizixi <qizixi@meta.com>
|
2025-05-24 06:51:22 +00:00 |
|
Robert Shaw
|
2b10ba7491
|
[Bugfix][Nixl] Fix Preemption Bug (#18631)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-05-23 23:30:16 +00:00 |
|
Feng XiaoLong
|
4fc1bf813a
|
[Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking (#18454)
Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com>
Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com>
|
2025-05-23 16:16:26 -07:00 |
|
Chen Zhang
|
6550114c9c
|
[v1] Redo "Support multiple KV cache groups in GPU model runner (#17945)" (#18593)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-05-23 09:39:47 -07:00 |
|
Chauncey
|
b046cf792d
|
[Feature][V1]: supports cached_tokens in response usage (#18149)
Co-authored-by: simon-mo <xmo@berkeley.edu>
|
2025-05-23 01:41:03 -07:00 |
|
lkchen
|
e44d8ce8c7
|
[Bugfix] Set KVTransferConfig.engine_id in post_init (#18576)
Signed-off-by: Linkun Chen <github@lkchen.net>
|
2025-05-23 02:54:42 +00:00 |
|
Mark McLoughlin
|
c6b636f9fb
|
[V1][Spec Decoding] Use model_loader.get_model() to load models (#18273)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-05-23 02:05:44 +00:00 |
|
rasmith
|
46791e1b4b
|
[AMD] [P/D] Compute num gpus for ROCm correctly in run_accuracy_test.sh (#18568)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2025-05-22 18:45:35 -07:00 |
|
Harry Mellor
|
ca86a7cf6e
|
[CI/Build] Update bamba test model location (#18544)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-22 06:01:07 -07:00 |
|
Jee Jee Li
|
db5a29ba19
|
[Bugfix] Fix LoRA test (#18518)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-05-21 21:48:53 -07:00 |
|
Mark McLoughlin
|
bb0a311213
|
Revert "[v1] Support multiple KV cache groups in GPU model runner (#17945)" (#18459)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-05-21 10:25:23 -07:00 |
|
Bowen Wang
|
7fdfa01530
|
[Sampler] Adapt to FlashInfer 0.2.3 sampler API (#15777)
Signed-off-by: Bowen Wang <abmfy@icloud.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-05-16 15:14:03 -07:00 |
|
Seiji Eicher
|
541817670c
|
[Misc] Add Ray Prometheus logger to V1 (#17925)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2025-05-16 01:02:42 -07:00 |
|
Lucia Fang
|
8795eb9975
|
[Bugfix] Fix test_eagle test (#18223)
Signed-off-by: Lucia Fang <fanglu@fb.com>
|
2025-05-15 15:59:42 -07:00 |
|
David Xia
|
de71fec81b
|
[CI] don't skip fixed test_kv_cache_events() (#18183)
Signed-off-by: David Xia <david@davidxia.com>
|
2025-05-14 23:17:16 -07:00 |
|
Ning Xie
|
420caf7557
|
[UT] Add ut for none hash (#17892)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-05-15 13:28:11 +08:00 |
|
Mark McLoughlin
|
65334ef3b9
|
[V1][Metrics] Remove unused code (#18158)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-05-14 20:13:17 -07:00 |
|
Chen Zhang
|
e60f550b38
|
[v1] Support multiple KV cache groups in GPU model runner (#17945)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-05-14 18:54:54 -07:00 |
|
Michael Goin
|
2142035b51
|
[V1] Support multiple kv connectors (#17564)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-05-14 16:28:02 -07:00 |
|
Russell Bryant
|
78aa341d12
|
[CI] Fix race condition in test_kv_cache_events test (#18169)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-05-14 16:27:48 -07:00 |
|
Aaron Pham
|
2fc9075b82
|
[V1] Structured Outputs + Thinking compatibility (#16577)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
|
2025-05-14 15:45:24 -07:00 |
|
Robert Shaw
|
856865008e
|
[CI] Disable Failing Tests (#18165)
|
2025-05-14 13:49:56 -07:00 |
|
Nick Hill
|
59dd311cf5
|
[KVConnector] Keep KVTransferParams as a dict (#18033)
|
2025-05-14 08:05:57 -07:00 |
|
Chen Zhang
|
f2ae883b67
|
[v1][KVCacheManager] pass num_new_computed_tokens to kv cache manager (#18001)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-05-13 19:09:39 -07:00 |
|
Nick Hill
|
55aa7af994
|
[V1] DP scale-out (2/N): Decouple engine process management and comms (#15977)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-05-13 10:48:21 -07:00 |
|
Chen Zhang
|
f0d610a8ae
|
[v1][KVCacheManager] Avoid full cache hit by controlling max_length (#17999)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-05-13 06:50:38 +00:00 |
|
Chauncey
|
dc1a821768
|
[Feature][V1] Support tool_choice: required when using Xgrammar as the StructuredOutputBackend. (#17845)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-05-12 23:01:31 -07:00 |
|
wwl2755
|
dc9905368d
|
[V1][Spec Decode] Eagle unit tests (#17350)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
|
2025-05-12 23:01:17 +00:00 |
|