Divakar Verma
9dc7c6c7f3
[dbrx] refactor dbrx experts to extend FusedMoe class ( #8518 )
2024-09-21 15:09:39 -06:00
Cyrus Leung
5e85f4f82a
[VLM] Use SequenceData.from_token_counts to create dummy data ( #8687 )
2024-09-20 23:28:56 -07:00
zyddnys
0f961b3ce9
[Bugfix] Fix incorrect llava next feature size calculation ( #8496 )
2024-09-20 22:48:32 +00:00
Niklas Muennighoff
3b63de9353
[Model] Add OLMoE ( #7922 )
2024-09-20 09:31:41 -07:00
Amit Garg
18ae428a0d
[Bugfix] Fix Phi3.5 mini and MoE LoRA inference ( #8571 )
2024-09-20 08:54:02 +08:00
Geun, Lim
e18749ff09
[Model] Support Solar Model ( #8386 )
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-09-18 11:04:00 -06:00
Aaron Pham
9d104b5beb
[CI/Build] Update Ruff version ( #8469 )
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-09-18 11:00:56 +00:00
Cyrus Leung
6ffa3f314c
[CI/Build] Avoid CUDA initialization ( #8534 )
2024-09-18 10:38:11 +00:00
Joe Runde
98f9713399
[Bugfix] Fix TP > 1 for new granite ( #8544 )
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-09-17 23:17:08 +00:00
sroy745
1009e93c5d
[Encoder decoder] Add cuda graph support during decoding for encoder-decoder models ( #7631 )
2024-09-17 07:35:01 -07:00
Chris
3724d5f6b5
[Bugfix][Model] Fix Python 3.8 compatibility in Pixtral model by updating type annotations ( #8490 )
2024-09-15 04:20:05 +00:00
ywfang
8a0cf1ddc3
[Model] support minicpm3 ( #8297 )
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-14 14:50:26 +00:00
Jee Jee Li
06311e2956
[Misc] Skip loading extra bias for Qwen2-VL GPTQ-Int8 ( #8442 )
2024-09-13 07:58:28 +00:00
Wenxiang
a480939e8e
[Bugfix] Fix weight loading issue by rename variable. ( #8293 )
2024-09-12 19:25:00 -04:00
Patrick von Platen
d31174a4e1
[Hotfix][Pixtral] Fix multiple images bugs ( #8415 )
2024-09-12 15:21:51 -07:00
Roger Wang
c16369455f
[Hotfix][Core][VLM] Disable chunked prefill by default and prefix caching for multimodal models ( #8425 )
2024-09-12 14:06:51 -07:00
Alex Brooks
c6202daeed
[Model] Support multiple images for qwen-vl ( #8247 )
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-12 10:10:54 -07:00
Isotr0py
e56bf27741
[Bugfix] Fix InternVL2 inference with various num_patches ( #8375 )
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-12 10:10:35 -07:00
Blueyo0
1bf2dd9df0
[Gemma2] add bitsandbytes support for Gemma2 ( #8338 )
2024-09-11 21:53:12 -07:00
Patrick von Platen
d394787e52
Pixtral ( #8377 )
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-09-11 14:41:55 -07:00
bnellnm
73202dbe77
[Kernel][Misc] register ops to prevent graph breaks ( #6917 )
Co-authored-by: Sage Moore <sage@neuralmagic.com>
2024-09-11 12:52:19 -07:00
Yang Fan
3b7fea770f
[Model][VLM] Add Qwen2-VL model support ( #7905 )
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-11 09:31:19 -07:00
Yangshen⚡Deng
6a512a00df
[model] Support for Llava-Next-Video model ( #7559 )
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-09-10 22:21:36 -07:00
Isotr0py
1230263e16
[Bugfix] Fix InternVL2 vision embeddings process with pipeline parallel ( #8299 )
2024-09-11 10:11:01 +08:00
Jee Jee Li
e497b8aeff
[Misc] Skip loading extra bias for Qwen2-MOE GPTQ models ( #8329 )
2024-09-10 20:59:19 -04:00
Cyrus Leung
da1a844e61
[Bugfix] Fix missing post_layernorm in CLIP ( #8155 )
2024-09-10 08:22:50 +00:00
Dipika Sikka
6cd5e5b07e
[Misc] Fused MoE Marlin support for GPTQ ( #8217 )
2024-09-09 23:02:52 -04:00
Vladislav Kruglikov
f9b4a2d415
[Bugfix] Correct adapter usage for cohere and jamba ( #8292 )
2024-09-09 11:20:46 -07:00
Isotr0py
36bf8150cc
[Model][VLM] Decouple weight loading logic for Paligemma ( #8269 )
2024-09-07 17:45:44 +00:00
Isotr0py
e807125936
[Model][VLM] Support multi-images inputs for InternVL2 models ( #8201 )
2024-09-07 16:38:23 +08:00
Cyrus Leung
2f707fcb35
[Model] Multi-input support for LLaVA ( #8238 )
2024-09-07 02:57:24 +00:00
Patrick von Platen
29f49cd6e3
[Model] Allow loading from original Mistral format ( #8168 )
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-09-06 17:02:05 -06:00
Alex Brooks
9da25a88aa
[MODEL] Qwen Multimodal Support (Qwen-VL / Qwen-VL-Chat) ( #8029 )
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-05 12:48:10 +00:00
manikandan.tm@zucisystems.com
8685ba1a1e
Inclusion of InternVLChatModel In PP_SUPPORTED_MODELS(Pipeline Parallelism) ( #7860 )
2024-09-05 11:33:37 +00:00
wnma
d3311562fb
[Bugfix] remove post_layernorm in siglip ( #8106 )
2024-09-04 18:55:37 +08:00
Peter Salas
2be8ec6e71
[Model] Add Ultravox support for multiple audio chunks ( #7963 )
2024-09-04 04:38:21 +00:00
Isotr0py
ec266536b7
[Bugfix][VLM] Add fallback to SDPA for ViT model running on CPU backend ( #8061 )
2024-09-03 21:37:52 +08:00
Isotr0py
dd2a6a82e3
[Bugfix] Fix internlm2 tensor parallel inference ( #8055 )
2024-09-02 23:48:56 +08:00
Shawn Tan
f8d60145b4
[Model] Add Granite model ( #7436 )
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-09-01 18:37:18 -07:00
Roger Wang
5b86b19954
[Misc] Optional installation of audio related packages ( #8063 )
2024-09-01 14:46:57 -07:00
Cyrus Leung
d05f0a9db2
[Bugfix] Fix import error in Phi-3.5-MoE ( #8052 )
2024-08-30 22:26:55 -07:00
Wenxiang
1248e8506a
[Model] Adding support for MSFT Phi-3.5-MoE ( #7729 )
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Zeqi Lin <zelin@microsoft.com>
Co-authored-by: Zeqi Lin <Zeqi.Lin@microsoft.com>
2024-08-30 13:42:57 -06:00
Jungho Christopher Cho
f97be32d1d
[VLM][Model] TP support for ViTs ( #7186 )
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-08-30 08:19:27 -07:00
Cyrus Leung
afd39a4511
[Bugfix] Fix import error in Exaone model ( #8034 )
2024-08-30 08:03:28 -07:00
Yohan Na
dc13e99348
[MODEL] add Exaone model support ( #7819 )
2024-08-29 23:34:20 -07:00
afeldman-nm
428dd1445e
[Core] Logprobs support in Multi-step ( #7652 )
2024-08-29 19:19:08 -07:00
Peter Salas
74d5543ec5
[VLM][Core] Fix exceptions on ragged NestedTensors ( #7974 )
2024-08-29 03:24:31 +00:00
Mor Zusman
fdd9daafa3
[Kernel/Model] Migrate mamba_ssm and causal_conv1d kernels to vLLM ( #7651 )
2024-08-28 15:06:52 -07:00
Cyrus Leung
ef9baee3c5
[Bugfix][VLM] Fix incompatibility between #7902 and #7230 ( #7948 )
2024-08-28 08:11:18 -07:00
Peter Salas
fab5f53e2d
[Core][VLM] Stack multimodal tensors to represent multiple images within each prompt ( #7902 )
2024-08-28 01:53:56 +00:00