Jani Monoses
|
9c485d9e25
|
[Core] Free CPU pinned memory on environment cleanup (#10477)
|
2025-01-21 11:56:41 -08:00 |
|
shangmingc
|
df450aa567
|
[Bugfix] Fix num_heads value for simple connector when tp enabled (#12074)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-01-20 02:56:43 +00:00 |
|
youkaichao
|
ad34c0df0f
|
[core] platform agnostic executor via collective_rpc (#11256)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-01-15 13:45:21 +08:00 |
|
youkaichao
|
310aca88c9
|
[perf]fix current stream (#11870)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-01-09 07:18:21 +00:00 |
|
Harry Mellor
|
aba8d6ee00
|
[Doc] Move examples into categories (#11840)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-01-08 13:09:53 +00:00 |
|
XiaobingZhang
|
e512f76a89
|
fix init error for MessageQueue when n_local_reader is zero (#11768)
|
2025-01-07 06:12:48 +00:00 |
|
cennn
|
9e764e7b10
|
[distributed] remove pynccl's redundant change_state (#11749)
|
2025-01-06 09:05:48 +08:00 |
|
cennn
|
635b897246
|
[distributed] remove pynccl's redundant stream (#11744)
|
2025-01-05 23:09:11 +08:00 |
|
Yan Burman
|
300acb8347
|
[Core][Bugfix] Use correct device to initialize GPU data during CUDA-graph-capture (#11233)
Signed-off-by: Yan Burman <yanburman@users.noreply.github.com>
Signed-off-by: Ido Asraff <idoa@atero.ai>
|
2025-01-04 14:50:16 +08:00 |
|
youkaichao
|
b12e87f942
|
[platforms] enable platform plugins (#11602)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-30 20:24:45 +08:00 |
|
Kuntai Du
|
faef77c0d6
|
[Misc] KV cache transfer connector registry (#11481)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
|
2024-12-29 16:08:09 +00:00 |
|
shangmingc
|
d263bd9df7
|
[Core] Support disaggregated prefill with Mooncake Transfer Engine (#10884)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2024-12-15 21:28:18 +00:00 |
|
shangmingc
|
db6c264a1e
|
[Bugfix] Fix value unpack error of simple connector for KVCache transfer. (#11058)
Signed-off-by: ShangmingCai <csmthu@gmail.com>
|
2024-12-12 21:19:17 +00:00 |
|
youkaichao
|
62de37a38e
|
[core][distributed] initialization from StatelessProcessGroup (#10986)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-12 09:04:19 +00:00 |
|
Tyler Michael Smith
|
28b3a1c7e5
|
[V1] Multiprocessing Tensor Parallel Support for v1 (#9856)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-12-10 06:28:14 +00:00 |
|
youkaichao
|
21fe7b481a
|
[core][distributed] add pynccl broadcast (#10843)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-03 04:53:23 +00:00 |
|
Kuntai Du
|
0590ec3fd9
|
[Core] Implement disagg prefill by StatelessProcessGroup (#10502)
This PR provides initial support for single-node disaggregated prefill in the 1P1D scenario.
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Co-authored-by: ApostaC <yihua98@uchicago.edu>
Co-authored-by: YaoJiayi <120040070@link.cuhk.edu.cn>
|
2024-12-01 19:01:00 -06:00 |
|
Sage Moore
|
9a88f89799
|
custom allreduce + torch.compile (#10121)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2024-11-25 22:00:16 -08:00 |
|
Tyler Michael Smith
|
978b39744b
|
[Misc] Add pynccl wrappers for all_gather and reduce_scatter (#9432)
|
2024-11-22 22:14:03 -05:00 |
|
youkaichao
|
0d4ea3fb5c
|
[core][distributed] use tcp store directly (#10275)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-12 17:36:08 -08:00 |
|
youkaichao
|
e6de9784d2
|
[core][distributed] add stateless process group (#10216)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-11 09:02:14 -08:00 |
|
Yan Ma
|
f10797c0ce
|
[Bugfix][XPU] Fix xpu tp by introducing XpuCommunicator (#10144)
Signed-off-by: yan ma <yan.ma@intel.com>
|
2024-11-08 09:41:03 +00:00 |
|
Hanzhi Zhou
|
6192e9b8fe
|
[Core][Distributed] Refactor ipc buffer init in CustomAllreduce (#10030)
Signed-off-by: Hanzhi Zhou <hanzhi713@gmail.com>
|
2024-11-06 23:50:47 -08:00 |
|
youkaichao
|
719c1ca468
|
[core][distributed] add stateless_init_process_group (#10072)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-06 16:42:09 -08:00 |
|
Russell Bryant
|
098f94de42
|
[CI/Build] Drop Python 3.8 support (#10038)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-06 14:31:01 +00:00 |
|
Konrad Zawora
|
a02a50e6e5
|
[Hardware][Intel-Gaudi] Add Intel Gaudi (HPU) inference backend (#6143)
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Signed-off-by: Bob Zhu <bob.zhu@intel.com>
Signed-off-by: zehao-intel <zehao.huang@intel.com>
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>
Co-authored-by: Michal Adamczyk <madamczyk@habana.ai>
Co-authored-by: Marceli Fylcek <mfylcek@habana.ai>
Co-authored-by: Himangshu Lahkar <49579433+hlahkar@users.noreply.github.com>
Co-authored-by: Vivek Goel <vgoel@habana.ai>
Co-authored-by: yuwenzho <yuwen.zhou@intel.com>
Co-authored-by: Dominika Olszewska <dolszewska@habana.ai>
Co-authored-by: barak goldberg <149692267+bgoldberg-habana@users.noreply.github.com>
Co-authored-by: Michal Szutenberg <37601244+szutenberg@users.noreply.github.com>
Co-authored-by: Jan Kaniecki <jkaniecki@habana.ai>
Co-authored-by: Agata Dobrzyniewicz <160237065+adobrzyniewicz-habana@users.noreply.github.com>
Co-authored-by: Krzysztof Wisniewski <kwisniewski@habana.ai>
Co-authored-by: Dudi Lester <160421192+dudilester@users.noreply.github.com>
Co-authored-by: Ilia Taraban <tarabanil@gmail.com>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai>
Co-authored-by: Jakub Maksymczuk <jmaksymczuk@habana.ai>
Co-authored-by: Tomasz Zielinski <85164140+tzielinski-habana@users.noreply.github.com>
Co-authored-by: Sun Choi <schoi@habana.ai>
Co-authored-by: Iryna Boiko <iboiko@habana.ai>
Co-authored-by: Bob Zhu <41610754+czhu15@users.noreply.github.com>
Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com>
Co-authored-by: Zehao Huang <zehao.huang@intel.com>
Co-authored-by: Andrzej Kotłowski <Andrzej.Kotlowski@intel.com>
Co-authored-by: Yan Tomsinsky <73292515+Yantom1@users.noreply.github.com>
Co-authored-by: Nir David <ndavid@habana.ai>
Co-authored-by: Yu-Zhou <yu.zhou@intel.com>
Co-authored-by: Ruheena Suhani Shaik <rsshaik@habana.ai>
Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>
Co-authored-by: Marcin Swiniarski <mswiniarski@habana.ai>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>
Co-authored-by: Jacek Czaja <jczaja@habana.ai>
Co-authored-by: Yuan <yuan.zhou@outlook.com>
|
2024-11-06 01:09:10 -08:00 |
|
Aaron Pham
|
21063c11c7
|
[CI/Build] drop support for Python 3.8 EOL (#8464)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2024-11-06 07:11:55 +00:00 |
|
youkaichao
|
4be3a45158
|
[distributed] add function to create ipc buffers directly (#10064)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-05 22:35:03 -08:00 |
|
Tyler Michael Smith
|
04bbf38e05
|
[Core] Use os.sched_yield in ShmRingBuffer instead of time.sleep (#9994)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-11-05 01:08:21 +00:00 |
|
youkaichao
|
96e0c9cbbd
|
[torch.compile] directly register custom op (#9896)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-10-31 21:56:09 -07:00 |
|
Yan Ma
|
04a3ae0aca
|
[Bugfix] Fix multi nodes TP+PP for XPU (#8884)
Signed-off-by: YiSheng5 <syhm@mail.ustc.edu.cn>
Signed-off-by: yan ma <yan.ma@intel.com>
Co-authored-by: YiSheng5 <syhm@mail.ustc.edu.cn>
|
2024-10-29 21:34:45 -07:00 |
|
youkaichao
|
1ab6f6b4ad
|
[core][distributed] fix custom allreduce in pytorch 2.5 (#9815)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-10-29 17:06:24 -07:00 |
|
Yongzao
|
ad6f78053e
|
[torch.compile] expanding support and fix allgather compilation (#9637)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2024-10-24 01:32:15 -07:00 |
|
wangshuai09
|
3ddbe25502
|
[Hardware][CPU] using current_platform.is_cpu (#9536)
|
2024-10-22 00:50:43 -07:00 |
|
Cody Yu
|
d11bf435a0
|
[MISC] Consolidate cleanup() and refactor offline_inference_with_prefix.py (#9510)
|
2024-10-18 14:30:55 -07:00 |
|
youkaichao
|
663874e048
|
[torch.compile] improve allreduce registration (#9061)
|
2024-10-04 16:43:50 -07:00 |
|
youkaichao
|
18e60d7d13
|
[misc][distributed] add VLLM_SKIP_P2P_CHECK flag (#8911)
|
2024-09-27 14:27:56 -07:00 |
|
Russell Bryant
|
b05f5c9238
|
[Core] Allow IPv6 in VLLM_HOST_IP with zmq (#8575)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-09-23 12:15:41 -07:00 |
|
Kunshang Ji
|
d4bf085ad0
|
[MISC] add support custom_op check (#8557)
Co-authored-by: youkaichao <youkaichao@126.com>
|
2024-09-20 19:03:55 -07:00 |
|
Russell Bryant
|
d65798f78c
|
[Core] zmq: bind only to 127.0.0.1 for local-only usage (#8543)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-09-18 16:10:27 +00:00 |
|
Cyrus Leung
|
6ffa3f314c
|
[CI/Build] Avoid CUDA initialization (#8534)
|
2024-09-18 10:38:11 +00:00 |
|
youkaichao
|
99aa4eddaf
|
[torch.compile] register allreduce operations as custom ops (#8526)
|
2024-09-16 22:57:57 -07:00 |
|
Richard Liu
|
2148441fd3
|
[TPU] Support single and multi-host TPUs on GKE (#7613)
|
2024-08-30 00:27:40 -07:00 |
|
youkaichao
|
05826c887b
|
[misc] fix custom allreduce p2p cache file generation (#7853)
|
2024-08-26 15:02:25 -07:00 |
|
youkaichao
|
d95cc0a55c
|
[core][misc] update libcudart finding (#7620)
Co-authored-by: cjackal <44624812+cjackal@users.noreply.github.com>
|
2024-08-16 23:01:35 -07:00 |
|
bnellnm
|
e680349994
|
[Bugfix] Fix custom_ar support check (#7617)
|
2024-08-16 19:05:49 -07:00 |
|
Woosuk Kwon
|
59edd0f134
|
[Bugfix][CI] Import ray under guard (#7486)
|
2024-08-13 17:12:58 -07:00 |
|
Woosuk Kwon
|
a08df8322e
|
[TPU] Support multi-host inference (#7457)
|
2024-08-13 16:31:20 -07:00 |
|
Cyrus Leung
|
7025b11d94
|
[Bugfix] Fix weight loading for Chameleon when TP>1 (#7410)
|
2024-08-13 05:33:41 +00:00 |
|
youkaichao
|
639159b2a6
|
[distributed][misc] add specialized method for cuda platform (#7249)
|
2024-08-07 08:54:52 -07:00 |
|