Alex Brooks
|
9da25a88aa
|
[MODEL] Qwen Multimodal Support (Qwen-VL / Qwen-VL-Chat) (#8029)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-09-05 12:48:10 +00:00 |
|
manikandan.tm@zucisystems.com
|
8685ba1a1e
|
Inclusion of InternVLChatModel In PP_SUPPORTED_MODELS(Pipeline Parallelism) (#7860)
|
2024-09-05 11:33:37 +00:00 |
|
Cyrus Leung
|
288a938872
|
[Doc] Indicate more information about supported modalities (#8181)
|
2024-09-05 10:51:53 +00:00 |
|
Elfie Guo
|
e39ebf5cf5
|
[Core/Bugfix] Add query dtype as per FlashInfer API requirements. (#8173)
|
2024-09-05 05:12:26 +00:00 |
|
Kevin H. Luu
|
ba262c4e5a
|
[ci] Mark LoRA test as soft-fail (#8160)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-09-04 20:33:12 -07:00 |
|
Woosuk Kwon
|
4624d98dbd
|
[Misc] Clean up RoPE forward_native (#8076)
|
2024-09-04 20:31:48 -07:00 |
|
William Lin
|
1afc931987
|
[bugfix] >1.43 constraint for openai (#8169)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-09-04 17:35:36 -07:00 |
|
Maureen McElaney
|
e01c2beb7d
|
[Doc] [Misc] Create CODE_OF_CONDUCT.md (#8161)
|
2024-09-04 16:50:13 -07:00 |
|
Simon Mo
|
32e7db2536
|
Bump version to v0.6.0 (#8166)
v0.6.0
|
2024-09-04 16:34:27 -07:00 |
|
Harsha vardhan manoj Bikki
|
008cf886c9
|
[Neuron] Adding support for adding/overriding neuron configuration a… (#8062)
Co-authored-by: Harsha Bikki <harbikh@amazon.com>
|
2024-09-04 16:33:43 -07:00 |
|
Cody Yu
|
77d9e514a2
|
[MISC] Replace input token throughput with total token throughput (#8164)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-09-04 20:23:22 +00:00 |
|
Kyle Mistele
|
e02ce498be
|
[Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649)
Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com>
Co-authored-by: Kyle Mistele <kyle@constellate.ai>
|
2024-09-04 13:18:13 -07:00 |
|
Woosuk Kwon
|
561d6f8077
|
[CI] Change test input in Gemma LoRA test (#8163)
|
2024-09-04 13:05:50 -07:00 |
|
alexeykondrat
|
d1dec64243
|
[CI/Build][ROCm] Enabling LoRA tests on ROCm (#7369)
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2024-09-04 11:57:54 -07:00 |
|
Cody Yu
|
2ad2e5608e
|
[MISC] Consolidate FP8 kv-cache tests (#8131)
|
2024-09-04 18:53:25 +00:00 |
|
wnma
|
d3311562fb
|
[Bugfix] remove post_layernorm in siglip (#8106)
|
2024-09-04 18:55:37 +08:00 |
|
TimWang
|
ccd7207191
|
chore: Update check-wheel-size.py to read MAX_SIZE_MB from env (#8103)
|
2024-09-03 23:17:05 -07:00 |
|
Cyrus Leung
|
855c262a6b
|
[Frontend] Multimodal support in offline chat (#8098)
|
2024-09-04 05:22:17 +00:00 |
|
Peter Salas
|
2be8ec6e71
|
[Model] Add Ultravox support for multiple audio chunks (#7963)
|
2024-09-04 04:38:21 +00:00 |
|
Dipika Sikka
|
e16fa99a6a
|
[Misc] Update fbgemmfp8 to use vLLMParameters (#7972)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-09-03 20:12:41 -06:00 |
|
Woosuk Kwon
|
61f4a93d14
|
[TPU][Bugfix] Use XLA rank for persistent cache path (#8137)
|
2024-09-03 18:35:33 -07:00 |
|
Nick Hill
|
d4db9f53c8
|
[Benchmark] Add --async-engine option to benchmark_throughput.py (#7964)
|
2024-09-03 20:57:41 -04:00 |
|
Dipika Sikka
|
2188a60c7e
|
[Misc] Update GPTQ to use vLLMParameters (#7976)
|
2024-09-03 17:21:44 -04:00 |
|
Simon Mo
|
dc0b6066ab
|
[CI] Change PR reminder to avoid at-mentions (#8134)
|
2024-09-03 14:11:42 -07:00 |
|
Woosuk Kwon
|
0af3abe3d3
|
[TPU][Bugfix] Fix next_token_ids shape (#8128)
|
2024-09-03 13:29:24 -07:00 |
|
Kevin H. Luu
|
f1575dc99f
|
[ci] Fix GHA workflow (#8129)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-09-03 13:25:09 -07:00 |
|
tomeras91
|
c02638efb3
|
[CI/Build] make pip install vllm work in macos (for import only) (#8118)
|
2024-09-03 12:37:08 -07:00 |
|
Antoni Baum
|
652c83b697
|
[Misc] Raise a more informative exception in add/remove_logger (#7750)
|
2024-09-03 12:28:25 -07:00 |
|
Alexander Matveev
|
6d646d08a2
|
[Core] Optimize Async + Multi-step (#8050)
|
2024-09-03 18:50:29 +00:00 |
|
Kevin H. Luu
|
95a178f861
|
[CI] Only PR reviewers/committers can trigger CI on PR (#8124)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-09-03 11:32:27 -07:00 |
|
Cody Yu
|
bd852f2a8b
|
[Performance] Enable chunked prefill and prefix caching together (#8120)
Co-authored-by: Tao He <sighingnow@gmail.com>
Co-authored-by: Juelianqvq <Juelianqvq@noreply.github.com>
|
2024-09-03 10:49:18 -07:00 |
|
Isotr0py
|
ec266536b7
|
[Bugfix][VLM] Add fallback to SDPA for ViT model running on CPU backend (#8061)
|
2024-09-03 21:37:52 +08:00 |
|
Woosuk Kwon
|
0fbc6696c2
|
[Bugfix] Fix single output condition in output processor (#7881)
|
2024-09-02 20:35:42 -07:00 |
|
wang.yuqi
|
6e36f4fa6c
|
improve chunked prefill performance
[Bugfix] Fix #7592 vllm 0.5.4 enable_chunked_prefill throughput is slightly lower than 0.5.3~0.5.0. (#7874)
|
2024-09-02 14:20:12 -07:00 |
|
Isotr0py
|
dd2a6a82e3
|
[Bugfix] Fix internlm2 tensor parallel inference (#8055)
|
2024-09-02 23:48:56 +08:00 |
|
Isotr0py
|
4ca65a9763
|
[Core][Bugfix] Accept GGUF model without .gguf extension (#8056)
|
2024-09-02 08:43:26 -04:00 |
|
Woosuk Kwon
|
e2b2aa5a0f
|
[TPU] Align worker index with node boundary (#7932)
|
2024-09-01 23:09:46 -07:00 |
|
Lily Liu
|
e6a26ed037
|
[SpecDecode][Kernel] Flashinfer Rejection Sampling (#7244)
|
2024-09-01 21:23:29 -07:00 |
|
Shawn Tan
|
f8d60145b4
|
[Model] Add Granite model (#7436)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2024-09-01 18:37:18 -07:00 |
|
Roger Wang
|
5b86b19954
|
[Misc] Optional installation of audio related packages (#8063)
|
2024-09-01 14:46:57 -07:00 |
|
Roger Wang
|
5231f0898e
|
[Frontend][VLM] Add support for multiple multi-modal items (#8049)
|
2024-08-31 16:35:53 -07:00 |
|
Robert Shaw
|
8423aef4c8
|
[BugFix][Core] Multistep Fix Crash on Request Cancellation (#8059)
|
2024-08-31 19:44:03 +00:00 |
|
Nicolò Lucchesi
|
4f5d8446ed
|
[Bugfix] Fix ModelScope models in v0.5.5 (#8037)
|
2024-08-31 00:27:58 -07:00 |
|
Cyrus Leung
|
d05f0a9db2
|
[Bugfix] Fix import error in Phi-3.5-MoE (#8052)
|
2024-08-30 22:26:55 -07:00 |
|
Pavani Majety
|
622f8abff8
|
[Bugfix] bugfix and add model test for flashinfer fp8 kv cache. (#8013)
|
2024-08-30 22:18:50 -07:00 |
|
Wenxiang
|
1248e8506a
|
[Model] Adding support for MSFT Phi-3.5-MoE (#7729)
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Zeqi Lin <zelin@microsoft.com>
Co-authored-by: Zeqi Lin <Zeqi.Lin@microsoft.com>
|
2024-08-30 13:42:57 -06:00 |
|
Woosuk Kwon
|
2684efc467
|
[TPU][Bugfix] Fix tpu type api (#8035)
|
2024-08-30 09:01:26 -07:00 |
|
Kaunil Dhruv
|
058344f89a
|
[Frontend]-config-cli-args (#7737)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Kaunil Dhruv <kaunil_dhruv@intuit.com>
|
2024-08-30 08:21:02 -07:00 |
|
Cyrus Leung
|
98cef6a227
|
[Core] Increase default max_num_batched_tokens for multimodal models (#8028)
|
2024-08-30 08:20:34 -07:00 |
|
Jungho Christopher Cho
|
f97be32d1d
|
[VLM][Model] TP support for ViTs (#7186)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-08-30 08:19:27 -07:00 |
|