128 Commits

Author SHA1 Message Date
Wallas Henrique
8baf85e4e9
[Doc] Compatibility matrix for mutual exclusive features (#8512)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
2024-10-11 11:18:50 -07:00
AlpinDale
0b5b5d767e
[Frontend] Log the maximum supported concurrency (#8831) 2024-10-09 00:03:14 -07:00
Sergey Shlyapnikov
f58d4fccc9
[OpenVINO] Enable GPU support for OpenVINO vLLM backend (#8192) 2024-10-02 17:50:01 -04:00
Russell Bryant
d1537039ce
[Core] Improve choice of Python multiprocessing method (#8823)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: youkaichao <youkaichao@126.com>
2024-09-29 09:17:07 +08:00
David Newman
3368c3ab36
[Bugfix] Ray 2.9.x doesn't expose available_resources_per_node (#8767)
Signed-off-by: darthhexx <darthhexx@gmail.com>
2024-09-25 00:52:26 -07:00
Alexander Matveev
7c7714d856
[Core][Bugfix][Perf] Introduce MQLLMEngine to avoid asyncio OH (#8157)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-09-18 13:56:58 +00:00
Rui Qiao
cbdb252259
[Misc] Limit to ray[adag] 2.35 to avoid backward incompatible change (#8509)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2024-09-17 00:06:26 -07:00
Nick Hill
acd5511b6d
[BugFix] Fix clean shutdown issues (#8492) 2024-09-16 09:33:46 -07:00
Woosuk Kwon
50e9ec41fc
[TPU] Implement multi-step scheduling (#8489) 2024-09-14 16:58:31 -07:00
Li, Jiang
0b952af458
[Hardware][Intel] Support compressed-tensor W8A8 for CPU backend (#7257) 2024-09-11 09:46:46 -07:00
William Lin
1d5e397aa4
[Core/Bugfix] pass VLLM_ATTENTION_BACKEND to ray workers (#8172) 2024-09-10 23:46:08 +00:00
William Lin
12dd715807
[misc] [doc] [frontend] LLM torch profiler support (#7943) 2024-09-06 17:48:48 -07:00
Rui Qiao
de80783b69
[Misc] Use ray[adag] dependency instead of cuda (#7938) 2024-09-06 09:18:35 -07:00
Woosuk Kwon
e2b2aa5a0f
[TPU] Align worker index with node boundary (#7932) 2024-09-01 23:09:46 -07:00
Richard Liu
2148441fd3
[TPU] Support single and multi-host TPUs on GKE (#7613) 2024-08-30 00:27:40 -07:00
afeldman-nm
428dd1445e
[Core] Logprobs support in Multi-step (#7652) 2024-08-29 19:19:08 -07:00
youkaichao
f52a43a8b9
[ci][test] fix pp test failure (#7945) 2024-08-28 01:27:07 -07:00
Kunshang Ji
076169f603
[Hardware][Intel GPU] Add intel GPU pipeline parallel support. (#7810) 2024-08-27 10:07:02 -07:00
Megha Agarwal
2eedede875
[Core] Asynchronous Output Processor (#7049)
Co-authored-by: Alexander Matveev <alexm@neuralmagic.com>
2024-08-26 20:53:20 -07:00
omrishiv
760e9f71a8
[Bugfix] neuron: enable tensor parallelism (#7562)
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
2024-08-26 15:13:13 -07:00
Kunshang Ji
fc5ebbd1d3
[Hardware][Intel GPU] refactor xpu_model_runner for tp (#7712) 2024-08-22 20:06:54 -07:00
SangBin Cho
c01a6cb231
[Ray backend] Better error when pg topology is bad. (#7584)
Co-authored-by: youkaichao <youkaichao@126.com>
2024-08-22 17:44:25 -07:00
youkaichao
7eebe8ccaa
[distributed][misc] error on same VLLM_HOST_IP setting (#7756) 2024-08-21 16:25:34 -07:00
Antoni Baum
66a9e713a7
[Core] Pipe worker_class_fn argument in Executor (#7707) 2024-08-21 00:37:39 +00:00
Kunshang Ji
c42590f97a
[Hardware] [Intel GPU] refactor xpu worker/executor (#7686) 2024-08-20 09:54:10 -07:00
Kunshang Ji
b6f99a6ffe
[Core] Refactor executor classes to make it easier to inherit GPUExecutor (#7673)
2024-08-20 00:56:50 -07:00
William Lin
47b65a5508
[core] Multi Step Scheduling (#7000)
Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>
2024-08-19 13:52:13 -07:00
SangBin Cho
ff7ec82c4d
[Core] Optimize SPMD architecture with delta + serialization optimization (#7109) 2024-08-18 17:57:20 -07:00
Roger Wang
bbf55c4805
[VLM] Refactor MultiModalConfig initialization and profiling (#7530) 2024-08-17 13:30:55 -07:00
omrishiv
9c1f78d5d6
[Bugfix] update neuron for version > 0.5.0 (#7175)
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-08-15 09:44:14 -07:00
youkaichao
4d2dc5072b
[hardware] unify usage of is_tpu to current_platform.is_tpu() (#7102) 2024-08-13 00:16:42 -07:00
Rui Qiao
198d6a2898
[Core] Shut down aDAG workers with clean async llm engine exit (#7224)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2024-08-12 17:57:16 -07:00
Cyrus Leung
4ddc4743d7
[Core] Consolidate GB constant and enable float GB arguments (#7416) 2024-08-12 14:14:14 -07:00
Mahesh Keralapura
933790c209
[Core] Add span metrics for model_forward, scheduler and sampler time (#7089) 2024-08-09 13:55:13 -07:00
Rui Qiao
22e718ff1a
[Misc] Revive to use loopback address for driver IP (#7091)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2024-08-02 15:50:00 -07:00
Rui Qiao
05308891e2
[Core] Pipeline parallel with Ray ADAG (#6837)
Support pipeline-parallelism with Ray accelerated DAG.

Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2024-08-02 13:55:40 -07:00
youkaichao
660dea1235
[cuda][misc] remove error_on_invalid_device_count_status (#7069) 2024-08-02 00:14:21 -07:00
Travis Johnson
593e79e733
[Bugfix] Use torch.set_num_threads() to configure parallelism in multiproc_gpu_executor (#6802)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2024-07-26 22:15:20 -07:00
Woosuk Kwon
52f07e3dec
[Hardware][TPU] Implement tensor parallelism with Ray (#5871) 2024-07-26 20:54:27 -07:00
Li, Jiang
3bbb4936dc
[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125) 2024-07-26 13:50:10 -07:00
Woosuk Kwon
aa4867791e
[Misc][TPU] Support TPU in initialize_ray_cluster (#6812) 2024-07-26 19:39:49 +00:00
Anthony Platanios
084a01fd35
[Bugfix] [Easy] Fixed a bug in the multiprocessing GPU executor. (#6770) 2024-07-25 21:25:35 -07:00
SangBin Cho
1adddb14bf
[Core] Fix ray forward_dag error mssg (#6792) 2024-07-25 16:53:25 -07:00
Antoni Baum
7bd82002ae
[Core] Allow specifying custom Executor (#6557) 2024-07-20 01:25:06 +00:00
Nick Hill
b5672a112c
[Core] Multiprocessing Pipeline Parallel support (#6130)
Co-authored-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-18 19:15:52 -07:00
Rui Qiao
61e592747c
[Core] Introduce SPMD worker execution using Ray accelerated DAG (#6032)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
2024-07-17 22:27:09 -07:00
Murali Andoorveedu
5fa6e9876e
[Bugfix] Fix for multinode crash on 4 PP (#6495)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-17 08:25:10 +00:00
youkaichao
09c2eb85dd
[ci][distributed] add pipeline parallel correctness test (#6410) 2024-07-16 15:44:22 -07:00
Thomas Parnell
eaec4b9153
[Bugfix] Add custom Triton cache manager to resolve MoE MP issue (#6140)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Chih-Chieh-Yang <chih.chieh.yang@ibm.com>
2024-07-15 10:12:47 -07:00
youkaichao
41708e5034
[ci] try to add multi-node tests (#6280)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-12 21:51:48 -07:00