128 Commits

Author SHA1 Message Date
Wallas Henrique
8baf85e4e9
[Doc] Compatibility matrix for mutual exclusive features (#8512)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
2024-10-11 11:18:50 -07:00
AlpinDale
0b5b5d767e
[Frontend] Log the maximum supported concurrency (#8831) 2024-10-09 00:03:14 -07:00
Sergey Shlyapnikov
f58d4fccc9
[OpenVINO] Enable GPU support for OpenVINO vLLM backend (#8192) 2024-10-02 17:50:01 -04:00
Russell Bryant
d1537039ce
[Core] Improve choice of Python multiprocessing method (#8823)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: youkaichao <youkaichao@126.com>
2024-09-29 09:17:07 +08:00
David Newman
3368c3ab36
[Bugfix] Ray 2.9.x doesn't expose available_resources_per_node (#8767)
Signed-off-by: darthhexx <darthhexx@gmail.com>
2024-09-25 00:52:26 -07:00
Alexander Matveev
7c7714d856
[Core][Bugfix][Perf] Introduce MQLLMEngine to avoid asyncio OH (#8157)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-09-18 13:56:58 +00:00
Rui Qiao
cbdb252259
[Misc] Limit to ray[adag] 2.35 to avoid backward incompatible change (#8509)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2024-09-17 00:06:26 -07:00
Nick Hill
acd5511b6d
[BugFix] Fix clean shutdown issues (#8492) 2024-09-16 09:33:46 -07:00
Woosuk Kwon
50e9ec41fc
[TPU] Implement multi-step scheduling (#8489) 2024-09-14 16:58:31 -07:00
Li, Jiang
0b952af458
[Hardware][Intel] Support compressed-tensor W8A8 for CPU backend (#7257) 2024-09-11 09:46:46 -07:00
William Lin
1d5e397aa4
[Core/Bugfix] pass VLLM_ATTENTION_BACKEND to ray workers (#8172) 2024-09-10 23:46:08 +00:00
William Lin
12dd715807
[misc] [doc] [frontend] LLM torch profiler support (#7943) 2024-09-06 17:48:48 -07:00
Rui Qiao
de80783b69
[Misc] Use ray[adag] dependency instead of cuda (#7938) 2024-09-06 09:18:35 -07:00
Woosuk Kwon
e2b2aa5a0f
[TPU] Align worker index with node boundary (#7932) 2024-09-01 23:09:46 -07:00
Richard Liu
2148441fd3
[TPU] Support single and multi-host TPUs on GKE (#7613) 2024-08-30 00:27:40 -07:00
afeldman-nm
428dd1445e
[Core] Logprobs support in Multi-step (#7652) 2024-08-29 19:19:08 -07:00
youkaichao
f52a43a8b9
[ci][test] fix pp test failure (#7945) 2024-08-28 01:27:07 -07:00
Kunshang Ji
076169f603
[Hardware][Intel GPU] Add intel GPU pipeline parallel support. (#7810) 2024-08-27 10:07:02 -07:00
Megha Agarwal
2eedede875
[Core] Asynchronous Output Processor (#7049)
Co-authored-by: Alexander Matveev <alexm@neuralmagic.com>
2024-08-26 20:53:20 -07:00
omrishiv
760e9f71a8
[Bugfix] neuron: enable tensor parallelism (#7562)
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
2024-08-26 15:13:13 -07:00
Kunshang Ji
fc5ebbd1d3
[Hardware][Intel GPU] refactor xpu_model_runner for tp (#7712) 2024-08-22 20:06:54 -07:00
SangBin Cho
c01a6cb231
[Ray backend] Better error when pg topology is bad. (#7584)
Co-authored-by: youkaichao <youkaichao@126.com>
2024-08-22 17:44:25 -07:00
youkaichao
7eebe8ccaa
[distributed][misc] error on same VLLM_HOST_IP setting (#7756) 2024-08-21 16:25:34 -07:00
Antoni Baum
66a9e713a7
[Core] Pipe worker_class_fn argument in Executor (#7707) 2024-08-21 00:37:39 +00:00
Kunshang Ji
c42590f97a
[Hardware] [Intel GPU] refactor xpu worker/executor (#7686) 2024-08-20 09:54:10 -07:00
Kunshang Ji
b6f99a6ffe
[Core] Refactor executor classes to make it easier to inherit GPUExecutor (#7673)
2024-08-20 00:56:50 -07:00
William Lin
47b65a5508
[core] Multi Step Scheduling (#7000)
Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>
2024-08-19 13:52:13 -07:00
SangBin Cho
ff7ec82c4d
[Core] Optimize SPMD architecture with delta + serialization optimization (#7109) 2024-08-18 17:57:20 -07:00
Roger Wang
bbf55c4805
[VLM] Refactor MultiModalConfig initialization and profiling (#7530) 2024-08-17 13:30:55 -07:00
omrishiv
9c1f78d5d6
[Bugfix] update neuron for version > 0.5.0 (#7175)
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-08-15 09:44:14 -07:00
youkaichao
4d2dc5072b
[hardware] unify usage of is_tpu to current_platform.is_tpu() (#7102) 2024-08-13 00:16:42 -07:00
Rui Qiao
198d6a2898
[Core] Shut down aDAG workers with clean async llm engine exit (#7224)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2024-08-12 17:57:16 -07:00
Cyrus Leung
4ddc4743d7
[Core] Consolidate GB constant and enable float GB arguments (#7416) 2024-08-12 14:14:14 -07:00
Mahesh Keralapura
933790c209
[Core] Add span metrics for model_forward, scheduler and sampler time (#7089) 2024-08-09 13:55:13 -07:00
Rui Qiao
22e718ff1a
[Misc] Revive to use loopback address for driver IP (#7091)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2024-08-02 15:50:00 -07:00
Rui Qiao
05308891e2
[Core] Pipeline parallel with Ray ADAG (#6837)
Support pipeline-parallelism with Ray accelerated DAG.

Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2024-08-02 13:55:40 -07:00
youkaichao
660dea1235
[cuda][misc] remove error_on_invalid_device_count_status (#7069) 2024-08-02 00:14:21 -07:00
Travis Johnson
593e79e733
[Bugfix] Use torch.set_num_threads() to configure parallelism in multiproc_gpu_executor (#6802)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2024-07-26 22:15:20 -07:00
Woosuk Kwon
52f07e3dec
[Hardware][TPU] Implement tensor parallelism with Ray (#5871) 2024-07-26 20:54:27 -07:00
Li, Jiang
3bbb4936dc
[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125) 2024-07-26 13:50:10 -07:00
Woosuk Kwon
aa4867791e
[Misc][TPU] Support TPU in initialize_ray_cluster (#6812) 2024-07-26 19:39:49 +00:00
Anthony Platanios
084a01fd35
[Bugfix] [Easy] Fixed a bug in the multiprocessing GPU executor. (#6770) 2024-07-25 21:25:35 -07:00
SangBin Cho
1adddb14bf
[Core] Fix ray forward_dag error mssg (#6792) 2024-07-25 16:53:25 -07:00
Antoni Baum
7bd82002ae
[Core] Allow specifying custom Executor (#6557) 2024-07-20 01:25:06 +00:00
Nick Hill
b5672a112c
[Core] Multiprocessing Pipeline Parallel support (#6130)
Co-authored-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-18 19:15:52 -07:00
Rui Qiao
61e592747c
[Core] Introduce SPMD worker execution using Ray accelerated DAG (#6032)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
2024-07-17 22:27:09 -07:00
Murali Andoorveedu
5fa6e9876e
[Bugfix] Fix for multinode crash on 4 PP (#6495)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-17 08:25:10 +00:00
youkaichao
09c2eb85dd
[ci][distributed] add pipeline parallel correctness test (#6410) 2024-07-16 15:44:22 -07:00
Thomas Parnell
eaec4b9153
[Bugfix] Add custom Triton cache manager to resolve MoE MP issue (#6140)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Chih-Chieh-Yang <chih.chieh.yang@ibm.com>
2024-07-15 10:12:47 -07:00
youkaichao
41708e5034
[ci] try to add multi-node tests (#6280)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-12 21:51:48 -07:00