youkaichao
|
cc466a3290
|
[Core][Distributed] support cpu&device in broadcast tensor dict (#4660)
[Core][Distributed] support both cpu and device tensor in broadcast tensor dict (#4660)
|
2024-05-07 19:34:47 -07:00 |
|
youkaichao
|
63575bc2e1
|
[Core][Optimization] change python dict to pytorch tensor (#4607)
|
2024-05-06 21:30:27 -07:00 |
|
youkaichao
|
344a5d0c33
|
[Core][Distributed] enable allreduce for multiple tp groups (#4566)
|
2024-05-02 17:32:33 -07:00 |
|
youkaichao
|
5b8a7c1cb0
|
[Misc] centralize all usage of environment variables (#4548)
|
2024-05-02 11:13:25 -07:00 |
|
youkaichao
|
2a85f93007
|
[Core][Distributed] enable multiple tp group (#4512)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
|
2024-05-02 04:28:21 +00:00 |
|
youkaichao
|
6ef09b08f8
|
[Core][Distributed] fix pynccl del error (#4508)
|
2024-05-01 15:23:06 -07:00 |
|
youkaichao
|
f4f921b7f1
|
[Core][Distributed] use cpu group to broadcast metadata in cpu (#4444)
|
2024-04-29 13:52:22 -07:00 |
|
SangBin Cho
|
a88081bf76
|
[CI] Disable non-lazy string operation on logging (#4326)
Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>
|
2024-04-26 00:16:58 -07:00 |
|
youkaichao
|
3cd9b5bb2d
|
[Core][Distributed] use existing torch.cuda.device (#4318)
[Core][Distributed] use existing torch.cuda.device context manager (#4318)
|
2024-04-24 09:00:20 -07:00 |
|
youkaichao
|
91f50a6fe2
|
[Core][Distributed] use cpu/gloo to initialize pynccl (#4248)
|
2024-04-23 18:32:19 -07:00 |
|
SangBin Cho
|
0ae11f78ab
|
[Mypy] Part 3 fix typing for nested directories for most of directory (#4161)
|
2024-04-22 21:32:44 -07:00 |
|
youkaichao
|
747b1a7147
|
[Core][Distributed] fix _is_full_nvlink detection (#4233)
|
2024-04-21 23:04:16 -07:00 |
|
Adam Tilghman
|
8f9c28fd40
|
[Bugfix] Fix CustomAllreduce nvlink topology detection (#3974)
[Bugfix] Fix CustomAllreduce pcie nvlink topology detection (#3974) (#4159)
|
2024-04-18 15:32:47 -07:00 |
|
youkaichao
|
6dc1fc9cfe
|
[Core] nccl integrity check and test (#4155)
[Core] Add integrity check during initialization; add test for it (#4155)
|
2024-04-17 22:28:52 -07:00 |
|
youkaichao
|
2cd6b4f362
|
[Core] avoid too many cuda context by caching p2p test (#4021)
|
2024-04-13 23:40:21 -07:00 |
|
youkaichao
|
98afde19fc
|
[Core][Distributed] improve logging for init dist (#4042)
|
2024-04-13 07:12:53 -07:00 |
|
SangBin Cho
|
09473ee41c
|
[mypy] Add mypy type annotation part 1 (#4006)
|
2024-04-12 14:35:50 -07:00 |
|
youkaichao
|
559eb852f8
|
[Core] init_distributed_environment align with init_process_group(#4014)
[Core][Distributed] make init_distributed_environment compatible with init_process_group (#4014)
|
2024-04-11 14:00:48 -07:00 |
|
SangBin Cho
|
67b4221a61
|
[Core][5/N] Fully working chunked prefill e2e (#3884)
|
2024-04-10 17:56:48 -07:00 |
|
youkaichao
|
63e7176f26
|
[Core][Refactor] move parallel_utils into vllm/distributed (#3950)
[WIP][Core][Refactor] move vllm/model_executor/parallel_utils into vllm/distributed and vllm/device_communicators (#3950)
|
2024-04-10 15:33:30 -07:00 |
|