xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-29 09:07:13 +08:00

Author	SHA1	Message	Date
xwjiang2010	b90d8cd832	[Distributed] Make it clear that % should not be in tensor dict keys. (#5927) Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>	2024-06-28 15:20:22 +00:00
xwjiang2010	74d55c065b	[VLM][BugFix] Make sure that `multi_modal_kwargs` can broadcast properly with ring buffer. (#5905) Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-06-28 07:29:13 +00:00
xwjiang2010	d12af207d2	[VLM][Bugfix] Make sure that `multi_modal_kwargs` is broadcasted properly (#5880) Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>	2024-06-27 15:15:24 +08:00
youkaichao	515080ad2f	[bugfix][distributed] fix shm broadcast when the queue size is full (#5801)	2024-06-25 21:56:02 -07:00
Matt Wong	dd793d1de5	[Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes (#5422)	2024-06-25 15:56:15 -07:00
Woo-Yeon Lee	2ce5d6688b	[Speculative Decoding] Support draft model on different tensor-parallel size than target model (#5414)	2024-06-25 09:56:06 +00:00
Murali Andoorveedu	5d4d90536f	[Distributed] Add send and recv helpers (#5719)	2024-06-23 14:42:28 -07:00
youkaichao	832ea88fcb	[core][distributed] improve shared memory broadcast (#5754)	2024-06-22 10:00:43 -07:00
youkaichao	d9a252bc8e	[Core][Distributed] add shm broadcast (#5399) Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>	2024-06-21 05:12:35 +00:00
youkaichao	6c5b7af152	[distributed][misc] use fork by default for mp (#5669)	2024-06-20 17:06:34 -07:00
youkaichao	db5ec52ad7	[bugfix][distributed] improve p2p capability test (#5612) [bugfix][distributed] do not error if two processes do not agree on p2p capability (#5612)	2024-06-18 07:21:05 +00:00
Kunshang Ji	728c4c8a06	[Hardware][Intel GPU] Add Intel GPU(XPU) inference backend (#3814) Co-authored-by: Jiang Li <jiang1.li@intel.com> Co-authored-by: Abhilash Majumder <abhilash.majumder@intel.com> Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>	2024-06-17 11:01:25 -07:00
Cyrus Leung	0e9164b40a	[mypy] Enable type checking for test directory (#5017)	2024-06-15 04:45:31 +00:00
youkaichao	f5bb85b435	[Core][Distributed] improve p2p cache generation (#5528)	2024-06-14 14:47:45 -07:00
youkaichao	d1c3d7d139	[misc][distributed] fix benign error in `is_in_the_same_node` (#5512)	2024-06-14 10:59:28 -07:00
Antoni Baum	50eed24d25	Add `cuda_device_count_stateless` (#5473)	2024-06-13 16:06:49 -07:00
youkaichao	ea3890a5f0	[Core][Distributed] code deduplication in tp&pp with coordinator(#5293) [Core][Distributed] add coordinator to reduce code duplication in tp and pp (#5293)	2024-06-12 17:27:08 -07:00
youkaichao	c4bd03c7c5	[Core][Distributed] add same-node detection (#5369)	2024-06-11 10:53:59 -07:00
youkaichao	c81da5f56d	[misc][typo] fix typo (#5372)	2024-06-10 09:51:02 +00:00
bnellnm	5467ac3196	[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047)	2024-06-09 16:23:30 -04:00
youkaichao	594392d27a	[Core][Distributed] improve p2p access check (#4992)	2024-05-29 11:29:07 +00:00
youkaichao	5bd3c65072	[Core][Optimization] remove vllm-nccl (#5091)	2024-05-29 05:13:52 +00:00
Murali Andoorveedu	5eda2ea02a	[Core][1/N] Support send/recv in PyNCCL Groups (#4988) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>	2024-05-23 09:54:48 -07:00
youkaichao	e08188081b	[Core][Distributed] remove graph mode function (#4818)	2024-05-16 10:59:52 -07:00
Cody Yu	973617ae02	[Speculative decoding][Re-take] Enable TP>1 speculative decoding (#4840) Co-authored-by: Cade Daniel <edacih@gmail.com> Co-authored-by: Cade Daniel <cade@anyscale.com>	2024-05-16 00:53:51 -07:00
youkaichao	702bee461f	[Core][Distributed] refactor custom allreduce to support multiple tp groups (#4754)	2024-05-12 17:47:59 -07:00
youkaichao	4e12131089	[Core][Test] fix function name typo in custom allreduce (#4750)	2024-05-10 15:14:40 -07:00
youkaichao	208b71bcc1	[Core][Distributed] refactor pynccl (#4591) [Core][Distributed] refactor pynccl to hold multiple communicators (#4591)	2024-05-09 19:48:43 -07:00
youkaichao	cc466a3290	[Core][Distributed] support cpu&device in broadcast tensor dict (#4660) [Core][Distributed] support both cpu and device tensor in broadcast tensor dict (#4660)	2024-05-07 19:34:47 -07:00
youkaichao	63575bc2e1	[Core][Optimization] change python dict to pytorch tensor (#4607)	2024-05-06 21:30:27 -07:00
youkaichao	344a5d0c33	[Core][Distributed] enable allreduce for multiple tp groups (#4566)	2024-05-02 17:32:33 -07:00
youkaichao	5b8a7c1cb0	[Misc] centralize all usage of environment variables (#4548)	2024-05-02 11:13:25 -07:00
youkaichao	2a85f93007	[Core][Distributed] enable multiple tp group (#4512) Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>	2024-05-02 04:28:21 +00:00
youkaichao	6ef09b08f8	[Core][Distributed] fix pynccl del error (#4508)	2024-05-01 15:23:06 -07:00
youkaichao	f4f921b7f1	[Core][Distributed] use cpu group to broadcast metadata in cpu (#4444)	2024-04-29 13:52:22 -07:00
SangBin Cho	a88081bf76	[CI] Disable non-lazy string operation on logging (#4326) Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>	2024-04-26 00:16:58 -07:00
youkaichao	3cd9b5bb2d	[Core][Distributed] use existing torch.cuda.device (#4318) [Core][Distributed] use existing torch.cuda.device context manager (#4318)	2024-04-24 09:00:20 -07:00
youkaichao	91f50a6fe2	[Core][Distributed] use cpu/gloo to initialize pynccl (#4248)	2024-04-23 18:32:19 -07:00
SangBin Cho	0ae11f78ab	[Mypy] Part 3 fix typing for nested directories for most of directory (#4161)	2024-04-22 21:32:44 -07:00
youkaichao	747b1a7147	[Core][Distributed] fix _is_full_nvlink detection (#4233)	2024-04-21 23:04:16 -07:00
Adam Tilghman	8f9c28fd40	[Bugfix] Fix CustomAllreduce nvlink topology detection (#3974) [Bugfix] Fix CustomAllreduce pcie nvlink topology detection (#3974) (#4159)	2024-04-18 15:32:47 -07:00
youkaichao	6dc1fc9cfe	[Core] nccl integrity check and test (#4155) [Core] Add integrity check during initialization; add test for it (#4155)	2024-04-17 22:28:52 -07:00
youkaichao	2cd6b4f362	[Core] avoid too many cuda context by caching p2p test (#4021)	2024-04-13 23:40:21 -07:00
youkaichao	98afde19fc	[Core][Distributed] improve logging for init dist (#4042)	2024-04-13 07:12:53 -07:00
SangBin Cho	09473ee41c	[mypy] Add mypy type annotation part 1 (#4006)	2024-04-12 14:35:50 -07:00
youkaichao	559eb852f8	[Core] init_distributed_environment align with init_process_group(#4014) [Core][Distributed] make init_distributed_environment compatible with init_process_group (#4014)	2024-04-11 14:00:48 -07:00
SangBin Cho	67b4221a61	[Core][5/N] Fully working chunked prefill e2e (#3884)	2024-04-10 17:56:48 -07:00
youkaichao	63e7176f26	[Core][Refactor] move parallel_utils into vllm/distributed (#3950) [WIP][Core][Refactor] move vllm/model_executor/parallel_utils into vllm/distributed and vllm/device_communicators (#3950)	2024-04-10 15:33:30 -07:00

48 Commits