Cyrus Leung
43c4f3d77c
[Misc] Begin deprecation of get_tensor_model_*_group (#22494)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-08 01:11:54 -07:00
Shu Wang
b2c8ce57c6
Fix Flashinfer CUTLASS MOE Allgather (#21963)
...
Signed-off-by: Shu Wang <shuw@nvidia.com>
2025-08-07 19:18:25 -07:00
WeiQing Chen
4be02a3776
[Bugfix] EPLB load statistics problem (#22167)
...
Signed-off-by: ycyaw66 <497410282@qq.com>
Signed-off-by: David Chen <530634352@qq.com>
Co-authored-by: ycyaw66 <497410282@qq.com>
2025-08-07 04:07:54 +00:00
Ning Xie
74333ae2f6
[Misc] correct static type check for GroupCoordinator (#21946)
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-08-05 03:17:46 -07:00
Ning Xie
bd3db7f469
[Misc] log more detailed message for ensure_model_parallel_initialized (#22144)
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-08-04 19:36:55 -07:00
Ning Xie
29b97c0995
[Doc] add backend to doc string of initialize_model_parallel (#22142)
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-08-04 19:36:20 -07:00
lkchen
f4f4e7ef27
[V0 deprecation][P/D] Deprecate v0 KVConnectorBase code (1/2) (#21785)
...
Signed-off-by: Linkun Chen <github@lkchen.net>
2025-08-04 19:11:33 -07:00
Ning Xie
c2e75b3c11
remove duplicate code within cleanup_dist_env_and_memory (#22147)
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-08-03 20:03:58 -07:00
David Ben-David
aefeea0fde
[V1] [P/D] Refactor KV Connector Path (#21980)
...
Signed-off-by: David Ben-David <davidb@pliops.com>
Co-authored-by: David Ben-David <davidb@pliops.com>
2025-08-03 04:03:40 -07:00
Ning Xie
7de45db9a5
[Misc] update doc comment for send (#22026)
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-08-03 00:55:20 -07:00
Rui Qiao
d331759488
Introduce RayPPCommunicator for ray-based PP (#21660)
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-08-01 11:50:58 -07:00
wxsm
f4135232b9
feat(distributed): add get_required_kvcache_layout class method to kv connector api (#20433)
...
Signed-off-by: wxsm <wxsms@foxmail.com>
2025-07-30 16:41:51 +00:00
Chenguang Zheng
4904e53c32
[Bugfix] SharedStorage Connector for V1 PD multimodal (#21611)
...
Signed-off-by: fake0fan <645327136@qq.com>
Signed-off-by: herotai214 <herotai214@gmail.com>
Co-authored-by: herotai214 <herotai214@gmail.com>
2025-07-30 09:18:37 -07:00
Calvin Chen
e18f085103
skip fusedmoe layer for start_load_kv (#21378)
...
Signed-off-by: calvin chen <wen.chen@dynamia.ai>
2025-07-28 18:59:44 -07:00
Kuntai Du
b18b417fbf
Revert "[V1] Exception Handling when Loading KV Cache from Remote Store" (#21778)
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
2025-07-28 20:15:18 +00:00
Nick Hill
7d44c691b0
[P/D] Log warnings related to prefill KV expiry (#21753)
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-28 18:40:53 +00:00
Adeline
15a72ac478
[V1] Exception Handling when Loading KV Cache from Remote Store (#21534)
...
Signed-off-by: liuyumoye <adeline_ly2023@outlook.com>
Co-authored-by: liuyumoye <adeline_ly2023@outlook.com>
2025-07-27 20:34:17 -07:00
WeiQing Chen
97d6c30cc9
[BugFix] Fix shared storage connector load kv only load attention layer (#21428)
...
Signed-off-by: David Chen <530634352@qq.com>
2025-07-26 07:07:40 -07:00
Juncheng Gu
6066284914
[P/D] Support CPU Transfer in NixlConnector (#18293)
...
Signed-off-by: Juncheng Gu <juncgu@gmail.com>
Signed-off-by: Richard Liu <ricliu@google.com>
Co-authored-by: Richard Liu <39319471+richardsliu@users.noreply.github.com>
Co-authored-by: Richard Liu <ricliu@google.com>
2025-07-24 17:58:42 +01:00
Rui Qiao
1e9ea8e69d
[P/D] Move FakeNixlWrapper to test dir (#21328)
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-07-24 08:53:45 -07:00
Li, Jiang
a15a50fc17
[CPU] Enable shared-memory based pipeline parallel for CPU backend (#21289)
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-07-21 09:07:08 -07:00
kourosh hakhamaneshi
9f414a12ad
[BugFix] Make PD work with Ray (#21072)
...
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
2025-07-19 08:46:50 -07:00
Rui Qiao
217937221b
Elastic Expert Parallel Initial Support (#20775)
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-07-18 17:46:09 -07:00
Woosuk Kwon
4de7146351
[V0 deprecation] Remove V0 HPU backend (#21131)
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-17 16:37:36 -07:00
Zhonghua Deng
8a4e5c5f3c
[V1][P/D]Enhance Performance and code readability for P2pNcclConnector (#20906)
...
Signed-off-by: Abatom <abzhonghua@gmail.com>
2025-07-16 22:13:00 -07:00
Trevor Morris
a8593237c0
Add pynccl all-gatherv and reducescatterv (#20154)
...
Signed-off-by: Trevor Morris <tmorris@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-11 18:59:23 -07:00
Varun Sundar Rabindranath
53fa457391
[Misc] Add unit tests for MoE ModularKernel combinations + Profiling utility (#20449)
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-07-11 07:51:46 -07:00
Nick Hill
574ad60db9
[KVConnector] Always call connector clear_metadata() at end of step (#20756)
...
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: David Ben-David <sdavidbd@gmail.com>
2025-07-10 22:37:27 +01:00
Or Ozeri
cc876d0f29
[KVConnector] Aggregate finished requests on the scheduler (#19555)
...
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-07-10 09:22:18 +01:00
Yiming
cd587c93ef
[BugFix]: Properly set engine_id when using multi connector (#19487)
...
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: leiyiming <leiyiming@kingsoft.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-07-09 20:32:44 +00:00
Liangliang Ma
a3e4e85ece
[XPU][CI] enhance xpu test support (#20652)
...
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
Co-authored-by: zhenwei-intel <zhenweiliu@habana.ai>
2025-07-09 16:53:09 +00:00
Nicolò Lucchesi
71d1d75b7a
[PD][Nixl] Remote consumer READ timeout for clearing request blocks (#20139)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-07-08 08:56:40 +01:00
Jee Jee Li
1caca5a589
[Misc] Add SPDX-FileCopyrightText (#20428)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-04 07:40:42 +00:00
Nicolò Lucchesi
8d775dd30a
[Misc] Fix Unable to detect current VLLM config. Defaulting to NHD kv cache layout warning (#20400)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-07-03 14:56:09 -07:00
Ning Xie
1dba2c4ebe
[Misc] adjust for ipv6 for mookcacke url parse (#20107)
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-07-03 20:27:17 +00:00
Woosuk Kwon
7f280d69c9
[Optimization] Cache sampled token ids in model runner (#20291)
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-01 11:01:31 -07:00
Nicolò Lucchesi
650d5dbd04
[Misc] Minor refactor of NIXL background handshake (#20068)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-07-01 12:40:14 +01:00
Michael Goin
be250bbc67
[V1] Only print cudagraph tqdm on rank 0 with is_global_first_rank (#19516)
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-01 06:02:09 +00:00
Zhonghua Deng
ded1fb635b
[Bugfix][V1][P/D]Fix the issue of occasional garbled output for P2pNcclConnector (#20263)
...
Signed-off-by: Abatom <abzhonghua@gmail.com>
2025-06-30 16:45:14 -07:00
Woosuk Kwon
2863befce3
[Optimization] Use Shared CachedRequestData Instance Across All Requests (#20232)
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-06-30 09:07:50 -07:00
Wentao Ye
4d36693687
[Refactor] Create a function util and cache the results for has_deepgemm, has_deepep, has_pplx (#20187)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-06-28 22:06:38 +00:00
li haoyang
0740e29b66
[Feature] add quick all reduce (#19744)
...
Signed-off-by: ilmarkov <imarkov@redhat.com>
Signed-off-by: Haoyang Li <Haoyang.Li@amd.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-06-26 20:54:24 -07:00
Bowen Wang
e9fd658a73
[Feature] Expert Parallelism Load Balancer (EPLB) (#18343)
...
Signed-off-by: Bowen Wang <abmfy@icloud.com>
2025-06-26 15:30:21 -07:00
Nicolò Lucchesi
2582683566
[PD] Skip tp_size exchange with rank0 (#19413)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-06-25 20:04:39 -07:00
Nick Hill
55c65ab495
[P/D] Avoid stranding blocks in P when aborted in D's waiting queue (#19223)
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-06-25 15:19:44 -07:00
Nick Hill
c40692bf9a
[Misc] Add parallel state node_count function (#20045)
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-06-25 13:38:53 -07:00
lkchen
91f7d9d0b6
[P/D] Asynchronously do _nixl_handshake (#19836)
...
Signed-off-by: Linkun Chen <github@lkchen.net>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-06-24 12:46:10 -07:00
lkchen
d0132f025d
[Misc] Add type alias ReqId and EngineId for better readability (#19880)
...
Signed-off-by: Linkun Chen <github@lkchen.net>
2025-06-23 12:57:57 -07:00
lkchen
1bcd15edc7
[BugFix][P/D] Fix for cases where _recving_transfers can be cleaned up when *all* transfer done (#19874)
...
Signed-off-by: Linkun Chen <github@lkchen.net>
2025-06-22 22:41:53 -07:00
Nicolò Lucchesi
2ebff5b77c
[P/D][NixlConnector] Support tp_size > num_kv_heads deployments (#19691)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-06-22 22:41:50 -07:00