Russell Bryant
776dbd74f1
[CI/Build] mypy: Resolve some errors from checking vllm/engine ( #9267 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-10-16 22:55:59 +00:00
Junhao Li
5b8a1fde84
[Model][Bugfix] Add FATReLU activation and support for openbmb/MiniCPM-S-1B-sft ( #9396 )
2024-10-16 16:40:24 +00:00
Mor Zusman
fb60ae9b91
[Kernel][Model] Improve continuous batching for Jamba and Mamba ( #9189 )
2024-10-16 12:12:43 -04:00
Isotr0py
cf1d62a644
[Model] Support SDPA attention for Molmo vision backbone ( #9410 )
2024-10-16 11:52:01 +00:00
Cyrus Leung
cee711fdbb
[Core] Rename input data types ( #8688 )
2024-10-16 10:49:37 +00:00
Cyrus Leung
7abba39ee6
[Model] VLM2Vec, the first multimodal embedding model in vLLM ( #9303 )
2024-10-16 14:31:00 +08:00
Cyrus Leung
7e7eae338d
[Misc] Standardize RoPE handling for Qwen2-VL ( #9250 )
2024-10-16 13:56:17 +08:00
Reza Salehi
ed920135c8
[Bugfix] Molmo text-only input bug fix ( #9397 )
...
Co-authored-by: sanghol <sanghol@allenai.org>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-10-16 04:56:09 +00:00
Michael Goin
22f8a69549
[Misc] Directly use compressed-tensors for checkpoint definitions ( #8909 )
...
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-15 15:40:25 -07:00
hhzhang16
55e081fbad
[Bugfix] Update InternVL input mapper to support image embeds ( #9351 )
2024-10-14 21:29:19 -07:00
Tyler Michael Smith
169b530607
[Bugfix] Clean up some cruft in mamba.py ( #9343 )
2024-10-15 00:24:25 +00:00
Xiang Xu
f0fe4fe86d
[Model] Make llama3.2 support multiple and interleaved images ( #9095 )
2024-10-14 15:24:26 -07:00
Reza Salehi
dfe43a2071
[Model] Molmo vLLM Integration ( #9016 )
...
Co-authored-by: sanghol <sanghol@allenai.org>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-10-14 07:56:24 -07:00
Jee Jee Li
250e26a63e
[Bugfix]Fix MiniCPM's LoRA bug ( #9286 )
2024-10-12 09:36:47 -07:00
sixgod
6cf1167c1a
[Model] Add GLM-4v support and meet vllm==0.6.2 ( #9242 )
2024-10-11 17:36:13 +00:00
Burkhard Ringlein
f710090d8e
[Kernel] adding fused moe kernel config for L40S TP4 ( #9245 )
2024-10-11 08:54:22 -07:00
Tyler Michael Smith
7342a7d7f8
[Model] Support Mamba ( #6484 )
2024-10-11 15:40:06 +00:00
Cyrus Leung
e808156f30
[Misc] Collect model support info in a single process per model ( #9233 )
2024-10-11 11:08:11 +00:00
youkaichao
cbc2ef5529
[misc] hide best_of from engine ( #9261 )
...
Co-authored-by: Brendan Wong <bjwpokemon@gmail.com>
2024-10-10 21:30:44 -07:00
youkaichao
e00c094f15
[torch.compile] generic decorators ( #9258 )
2024-10-10 15:54:23 -07:00
youkaichao
e4d652ea3e
[torch.compile] integration with compilation control ( #9058 )
2024-10-10 12:39:36 -07:00
whyiug
04de9057ab
[Model] support input image embedding for minicpmv ( #9237 )
2024-10-10 15:00:47 +00:00
Isotr0py
07c11cf4d4
[Bugfix] Fix lm_head weights tying with lora for llama ( #9227 )
2024-10-10 21:11:56 +08:00
youkaichao
de895f1697
[misc] improve model support check in another process ( #9208 )
2024-10-09 21:58:27 -07:00
Li, Jiang
ca77dd7a44
[Hardware][CPU] Support AWQ for CPU backend ( #7515 )
2024-10-09 10:28:08 -06:00
Cyrus Leung
8bfaa4e31e
[Bugfix] fix composite weight loading and EAGLE weight loading ( #9160 )
2024-10-09 00:36:55 -07:00
Hui Liu
cdc72e3c80
[Model] Remap FP8 kv_scale in CommandR and DBRX ( #9174 )
2024-10-09 06:43:06 +00:00
chenqianfzh
2f4117c38e
support bitsandbytes quantization with more models ( #9148 )
2024-10-08 19:52:19 -06:00
Cyrus Leung
151ef4efd2
[Model] Support NVLM-D and fix QK Norm in InternViT ( #9045 )
...
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2024-10-07 11:55:12 +00:00
Isotr0py
f19da64871
[Core] Refactor GGUF parameters packing and forwarding ( #8859 )
2024-10-07 10:01:46 +00:00
Cyrus Leung
8c6de96ea1
[Model] Explicit interface for vLLM models and support OOT embedding models ( #9108 )
2024-10-07 06:10:35 +00:00
youkaichao
18b296fdb2
[core] remove beam search from the core ( #9105 )
2024-10-07 05:47:04 +00:00
Cyrus Leung
b22b798471
[Model] PP support for embedding models and update docs ( #9090 )
...
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-10-06 16:35:27 +08:00
Xin Yang
15986f598c
[Model] Support Gemma2 embedding model ( #9004 )
2024-10-05 06:57:05 +00:00
hhzhang16
53b3a33027
[Bugfix] Fixes Phi3v & Ultravox Multimodal EmbeddingInputs ( #8979 )
2024-10-04 22:05:37 -07:00
Chongming Ni
cc90419e89
[Hardware][Neuron] Add on-device sampling support for Neuron ( #8746 )
...
Co-authored-by: Ashraf Mahgoub <ashymahg@amazon.com>
2024-10-04 16:42:20 -07:00
ElizaWszola
05d686432f
[Kernel] Zero point support in fused MarlinMoE kernel + AWQ Fused MoE ( #8973 )
...
Co-authored-by: Dipika <dipikasikka1@gmail.com>
Co-authored-by: Dipika Sikka <ds3822@columbia.edu>
2024-10-04 12:34:44 -06:00
Roger Wang
26aa325f4f
[Core][VLM] Test registration for OOT multimodal models ( #8717 )
...
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-04 10:38:25 -07:00
Prashant Gupta
9ade8bbc8d
[Model] add a bunch of supported lora modules for mixtral ( #9008 )
...
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
2024-10-04 16:24:40 +00:00
whyiug
3d826d2c52
[Bugfix] Reshape the dimensions of the input image embeddings in Qwen2VL ( #9071 )
2024-10-04 14:34:58 +00:00
Cyrus Leung
0e36fd4909
[Misc] Move registry to its own file ( #9064 )
2024-10-04 10:01:37 +00:00
Murali Andoorveedu
0f6d7a9a34
[Models] Add remaining model PP support ( #7168 )
...
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Signed-off-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-04 10:56:58 +08:00
Domen Vreš
2838d6b38e
[Bugfix] Weight loading fix for OPT model ( #9042 )
...
Co-authored-by: dvres <dvres@fri.uni-lj.si>
2024-10-03 19:53:29 -04:00
Divakar Verma
01843c89b8
[Misc] log when using default MoE config ( #8971 )
2024-10-03 04:31:07 +00:00
Shawn Tan
19f0d25796
[Model] Adding Granite MoE. ( #8206 )
...
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-10-03 09:33:57 +08:00
Sergey Shlyapnikov
f58d4fccc9
[OpenVINO] Enable GPU support for OpenVINO vLLM backend ( #8192 )
2024-10-02 17:50:01 -04:00
Lily Liu
1570203864
[Spec Decode] (1/2) Remove batch expansion ( #8839 )
2024-10-01 16:04:42 -07:00
Cyrus Leung
4f341bd4bf
[Doc] Update list of supported models ( #8987 )
2024-10-02 00:35:39 +08:00
Alex Brooks
1fe0a4264a
[Bugfix] Fix Token IDs Reference for MiniCPM-V When Images are Provided With No Placeholders ( #8991 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2024-10-01 09:52:44 +00:00
Isotr0py
bc4eb65b54
[Bugfix] Fix Fuyu tensor parallel inference ( #8986 )
2024-10-01 17:51:41 +08:00