omrishiv | 760e9f71a8 | [Bugfix] neuron: enable tensor parallelism (#7562); Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com> | 2024-08-26 15:13:13 -07:00 |
youkaichao | 05826c887b | [misc] fix custom allreduce p2p cache file generation (#7853) | 2024-08-26 15:02:25 -07:00 |
Dipika Sikka | dd9857f5fa | [Misc] Update gptq_marlin_24 to use vLLMParameters (#7762); Co-authored-by: Michael Goin <michael@neuralmagic.com> | 2024-08-26 17:44:54 -04:00 |
Dipika Sikka | 665304092d | [Misc] Update qqq to use vLLMParameters (#7805) | 2024-08-26 13:16:15 -06:00 |
Cody Yu | 2deb029d11 | [Performance][BlockManagerV2] Mark prefix cache block as computed after schedule (#7822) | 2024-08-26 11:24:53 -07:00 |
Cyrus Leung | 029c71de11 | [CI/Build] Avoid downloading all HF files in RemoteOpenAIServer (#7836) | 2024-08-26 05:31:10 +00:00 |
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 | 0b769992ec | [Bugfix]: Use float32 for base64 embedding (#7855); Signed-off-by: Hollow Man <hollowman@opensuse.org> | 2024-08-26 03:16:38 +00:00 |
Nick Hill | 1856aff4d6 | [Spec Decoding] Streamline batch expansion tensor manipulation (#7851) | 2024-08-25 15:45:14 -07:00 |
youkaichao | 70c094ade6 | [misc][cuda] improve pynvml warning (#7852) | 2024-08-25 14:30:09 -07:00 |
Isotr0py | 8aaf3d5347 | [Model][VLM] Support multi-images inputs for Phi-3-vision models (#7783) | 2024-08-25 11:51:20 +00:00 |
zifeitong | 80162c44b1 | [Bugfix] Fix Phi-3v crash when input images are of certain sizes (#7840) | 2024-08-24 18:16:24 -07:00 |
youkaichao | 7d9ffa2ae1 | [misc][core] lazy import outlines (#7831) | 2024-08-24 00:51:38 -07:00 |
Tyler Rockwood | d81abefd2e | [Frontend] add json_schema support from OpenAI protocol (#7654) | 2024-08-23 23:07:24 -07:00 |
Pooya Davoodi | 8da48e4d95 | [Frontend] Publish Prometheus metrics in run_batch API (#7641) | 2024-08-23 23:04:22 -07:00 |
Pooya Davoodi | 6885fde317 | [Bugfix] Fix run_batch logger (#7640) | 2024-08-23 13:58:26 -07:00 |
Alexander Matveev | 9db93de20c | [Core] Add multi-step support to LLMEngine (#7789) | 2024-08-23 12:45:53 -07:00 |
Simon Mo | 09c7792610 | Bump version to v0.5.5 (#7823) | 2024-08-23 11:35:33 -07:00 |
Dipika Sikka | f1df5dbfd6 | [Misc] Update marlin to use vLLMParameters (#7803) | 2024-08-23 14:30:52 -04:00 |
Maximilien de Bayser | e25fee57c2 | [BugFix] Fix server crash on empty prompt (#7746); Signed-off-by: Max de Bayser <mbayser@br.ibm.com> | 2024-08-23 13:12:44 +00:00 |
Jie Fu (傅杰) | faeddb565d | [misc] Add Torch profiler support for CPU-only devices (#7806) | 2024-08-23 05:46:25 +00:00 |
Kunshang Ji | fc5ebbd1d3 | [Hardware][Intel GPU] refactor xpu_model_runner for tp (#7712) | 2024-08-22 20:06:54 -07:00 |
SangBin Cho | c01a6cb231 | [Ray backend] Better error when pg topology is bad. (#7584); Co-authored-by: youkaichao <youkaichao@126.com> | 2024-08-22 17:44:25 -07:00 |
Joe Runde | b903e1ba7f | [Frontend] error suppression cleanup (#7786); Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> | 2024-08-22 21:50:21 +00:00 |
Siyuan Liu | a152246428 | [Misc] fix typo in triton import warning (#7794) | 2024-08-22 13:51:23 -07:00 |
Michael Goin | 15310b5101 | [Bugfix] Use LoadFormat values for vllm serve --load-format (#7784) | 2024-08-22 11:37:08 -07:00 |
Travis Johnson | cc0eaf12b1 | [Bugfix] spec decode handle None entries in topk args in create_sequence_group_output (#7232); Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> | 2024-08-22 09:33:48 -04:00 |
Dipika Sikka | 955b5191c9 | [Misc] update fp8 to use vLLMParameter (#7437) | 2024-08-22 08:36:18 -04:00 |
Flex Wang | 4f419c00a6 | Fix ShardedStateLoader for vllm fp8 quantization (#7708) | 2024-08-22 08:25:04 -04:00 |
Abhinav Goyal | a3fce56b88 | [Speculative Decoding] EAGLE Implementation with Top-1 proposer (#6830) | 2024-08-22 02:42:24 -07:00 |
Woosuk Kwon | b3856bef7d | [Misc] Use torch.compile for GemmaRMSNorm (#7642) | 2024-08-22 01:14:13 -07:00 |
Woosuk Kwon | eeee1c3b1a | [TPU] Avoid initializing TPU runtime in is_tpu (#7763) | 2024-08-21 21:31:49 -07:00 |
Michael Goin | aae74ef95c | Revert "[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel (#7527)" (#7764) | 2024-08-22 03:42:14 +00:00 |
Joe Runde | cde9183b40 | [Bug][Frontend] Improve ZMQ client robustness (#7443); Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> | 2024-08-22 02:18:11 +00:00 |
zifeitong | df1a21131d | [Model] Fix Phi-3.5-vision-instruct 'num_crops' issue (#7710) | 2024-08-22 09:36:24 +08:00 |
youkaichao | 7eebe8ccaa | [distributed][misc] error on same VLLM_HOST_IP setting (#7756) | 2024-08-21 16:25:34 -07:00 |
Dipika Sikka | 8678a69ab5 | [Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel (#7527); Co-authored-by: ElizaWszola <eliza@neuralmagic.com> | 2024-08-21 16:17:10 -07:00 |
Peter Salas | 1ca0d4f86b | [Model] Add UltravoxModel and UltravoxConfig (#7615) | 2024-08-21 22:49:39 +00:00 |
William Lin | dd53c4b023 | [misc] Add Torch profiler support (#7451); Co-authored-by: Cody Yu <hao.yu.cody@gmail.com> | 2024-08-21 15:39:26 -07:00 |
Robert Shaw | 970dfdc01d | [Frontend] Improve Startup Failure UX (#7716) | 2024-08-21 19:53:01 +00:00 |
William Lin | 91f4522cbf | [multi-step] Raise error if not using async engine (#7703) | 2024-08-21 11:49:19 -07:00 |
Robert Shaw | f7e3b0c5aa | [Bugfix][Frontend] Fix Issues Under High Load With zeromq Frontend (#7394); Co-authored-by: Nick Hill <nickhill@us.ibm.com> | 2024-08-21 13:34:14 -04:00 |
Brian Li | d3c002eadc | [Bugfix] chat method add_generation_prompt param (#7734) | 2024-08-21 17:33:35 +00:00 |
Nick Hill | 9b73a2f498 | [Spec Decoding] Use target model max length as default for draft model (#7706) | 2024-08-22 00:23:22 +08:00 |
Isotr0py | 6925cdbeea | [Bugfix][Hardware][CPU] Fix mm_limits initialization for CPU backend (#7735) | 2024-08-21 16:23:03 +00:00 |
LI MOU | 53328d7536 | [BUG] fix crash on flashinfer backend with cudagraph disabled, when attention group_size not in [1,2,4,8] (#7509) | 2024-08-21 08:54:31 -07:00 |
Nick Hill | c75363fbc0 | [BugFix] Avoid premature async generator exit and raise all exception variations (#7698) | 2024-08-21 11:45:55 -04:00 |
Cyrus Leung | baaedfdb2d | [mypy] Enable following imports for entrypoints (#7248); Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>; Co-authored-by: Fei <dfdfcai4@gmail.com> | 2024-08-20 23:28:21 -07:00 |
Isotr0py | 12e1c65bc9 | [Model] Add AWQ quantization support for InternVL2 model (#7187) | 2024-08-20 23:18:57 -07:00 |
youkaichao | b74a125800 | [ci] try to log process using the port to debug the port usage (#7711) | 2024-08-20 17:41:12 -07:00 |
Antoni Baum | 66a9e713a7 | [Core] Pipe worker_class_fn argument in Executor (#7707) | 2024-08-21 00:37:39 +00:00 |