Cyrus Leung
3f674a49b5
[VLM][Core] Support profiling with multiple multi-modal inputs per prompt (#7126)
2024-08-14 17:55:42 +00:00

Wallas Henrique
70b746efcf
[Misc] Deprecation Warning when setting --engine-use-ray (#7424)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: youkaichao <youkaichao@126.com>
2024-08-14 09:44:27 -07:00

jack
67d115db08
[Bugfix][Frontend] Disable embedding API for chat models (#7504)
Co-authored-by: jack <jack@alex>
2024-08-14 09:15:19 -07:00

youkaichao
d3d9cb6e4b
[ci] fix model tests (#7507)
2024-08-14 01:01:43 -07:00

Chang Su
c134a46402
Fix empty output when temp is too low (#2937)
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2024-08-14 05:31:44 +00:00

youkaichao
199adbb7cf
[doc] update test script to include cudagraph (#7501)
2024-08-13 21:52:58 -07:00

Cyrus Leung
dd164d72f3
[Bugfix][Docs] Update list of mock imports (#7493)
2024-08-13 20:37:30 -07:00

youkaichao
ea49e6a3c8
[misc][ci] fix cpu test with plugins (#7489)
2024-08-13 19:27:46 -07:00

Jee Jee Li
97992802f3
[CI/Build]Reduce the time consumption for LoRA tests (#7396)
2024-08-13 17:27:29 -07:00

Woosuk Kwon
59edd0f134
[Bugfix][CI] Import ray under guard (#7486)
2024-08-13 17:12:58 -07:00

Woosuk Kwon
a08df8322e
[TPU] Support multi-host inference (#7457)
2024-08-13 16:31:20 -07:00

youkaichao
16422ea76f
[misc][plugin] add plugin system implementation (#7426)
2024-08-13 16:24:17 -07:00

Kyle Sayers
373538f973
[Misc] compressed-tensors code reuse (#7277)
2024-08-13 19:05:15 -04:00

youkaichao
33e5d7e6b6
[frontend] spawn engine process from api server process (#7484)
2024-08-13 15:40:17 -07:00

Simon Mo
c5c7768264
Announce NVIDIA Meetup (#7483)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-08-13 14:28:36 -07:00

Dipika Sikka
b1e5afc3e7
[Misc] Update awq and awq_marlin to use vLLMParameters (#7422)
2024-08-13 17:08:20 -04:00

Dipika Sikka
d3bdfd3ab9
[Misc] Update Fused MoE weight loading (#7334)
2024-08-13 14:57:45 -04:00

Dipika Sikka
fb377d7e74
[Misc] Update gptq_marlin to use new vLLMParameters (#7281)
2024-08-13 14:30:11 -04:00

Dipika Sikka
181abbc27d
[Misc] Update LM Eval Tolerance (#7473)
2024-08-13 14:28:14 -04:00

Peter Salas
00c3d68e45
[Frontend][Core] Add plumbing to support audio language models (#7446)
2024-08-13 17:39:33 +00:00

Woosuk Kwon
e20233d361
Revert "[Doc] Update supported_hardware.rst (#7276)" (#7467)
2024-08-13 01:37:08 -07:00

Woosuk Kwon
d6e634f3d7
[TPU] Suppress import custom_ops warning (#7458)
2024-08-13 00:30:30 -07:00

youkaichao
4d2dc5072b
[hardware] unify usage of is_tpu to current_platform.is_tpu() (#7102)
2024-08-13 00:16:42 -07:00

Cyrus Leung
7025b11d94
[Bugfix] Fix weight loading for Chameleon when TP>1 (#7410)
2024-08-13 05:33:41 +00:00

Kevin H. Luu
5469146bcc
[ci] Remove fast check cancel workflow (#7455)
2024-08-12 21:19:51 -07:00

Andrew Wang
97a6be95ba
[Misc] improve logits processors logging message (#7435)
2024-08-13 02:29:34 +00:00

Cyrus Leung
9ba85bc152
[mypy] Misc. typing improvements (#7417)
2024-08-13 09:20:20 +08:00

Rui Qiao
198d6a2898
[Core] Shut down aDAG workers with clean async llm engine exit (#7224)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2024-08-12 17:57:16 -07:00

Daniele
774cd1d3bf
[CI/Build] bump minimum cmake version (#6999)
2024-08-12 16:29:20 -07:00

sasha0552
91294d56e1
[Bugfix] Handle PackageNotFoundError when checking for xpu version (#7398)
2024-08-12 16:07:20 -07:00

jon-chuang
a046f86397
[Core/Bugfix] Add FP8 K/V Scale and dtype conversion for prefix/prefill Triton Kernel (#7208)
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-08-12 22:47:41 +00:00

Cyrus Leung
4ddc4743d7
[Core] Consolidate GB constant and enable float GB arguments (#7416)
2024-08-12 14:14:14 -07:00

Lucas Wilkinson
6aa33cb2dd
[Misc] Use scalar type to dispatch to different gptq_marlin kernels (#7323)
2024-08-12 14:40:13 -04:00

Kevin H. Luu
1137f343aa
[ci] Cancel fastcheck when PR is ready (#7433)
Signed-off-by: kevin <kevin@anyscale.com>
2024-08-12 10:59:14 -07:00

Kevin H. Luu
9b3e2edd30
[ci] Cancel fastcheck run when PR is marked ready (#7427)
Signed-off-by: kevin <kevin@anyscale.com>
2024-08-12 10:56:52 -07:00

Kevin H. Luu
65950e8f58
[ci] Entrypoints run upon changes in vllm/ (#7423)
Signed-off-by: kevin <kevin@anyscale.com>
2024-08-12 10:18:03 -07:00

Woosuk Kwon
cfba4def5d
[Bugfix] Fix logit soft cap in flash-attn backend (#7425)
2024-08-12 09:58:28 -07:00

Daniele
d2bc4510a4
[CI/Build] bump Dockerfile.neuron image base, use public ECR (#6832)
2024-08-12 09:53:35 -07:00

Cyrus Leung
24154f8618
[Frontend] Disallow passing model as both argument and option (#7347)
2024-08-12 12:58:34 +00:00

Roger Wang
e6e42e4b17
[Core][VLM] Support image embeddings as input (#6613)
2024-08-12 16:16:06 +08:00

Lily Liu
ec2affa8ae
[Kernel] Flashinfer correctness fix for v0.1.3 (#7319)
2024-08-12 07:59:17 +00:00

Roger Wang
86ab567bae
[CI/Build] Minor refactoring for vLLM assets (#7407)
2024-08-12 02:41:52 +00:00

Simon Mo
f020a6297e
[Docs] Update readme (#7316)
2024-08-11 17:13:37 -07:00

youkaichao
6c8e595710
[misc] add commit id in collect env (#7405)
2024-08-11 15:40:48 -07:00

tomeras91
02b1988b9f
[Doc] building vLLM with VLLM_TARGET_DEVICE=empty (#7403)
2024-08-11 14:38:17 -07:00

tomeras91
386087970a
[CI/Build] build on empty device for better dev experience (#4773)
2024-08-11 13:09:44 -07:00

William Lin
c08e2b3086
[core] [2/N] refactor worker_base input preparation for multi-step (#7387)
2024-08-11 08:50:08 -07:00

Noam Gat
4fb7b52a2c
Updating LM Format Enforcer version to v0.10.6 (#7189)
2024-08-11 08:11:50 -04:00

Woosuk Kwon
90bab18f24
[TPU] Use mark_dynamic to reduce compilation time (#7340)
2024-08-10 18:12:22 -07:00

Isotr0py
4c5d8e8ea9
[Bugfix] Fix phi3v batch inference when images have different aspect ratio (#7392)
2024-08-10 16:19:33 +00:00