Rui Qiao
9832e5572a
[V1] Unify VLLM_ENABLE_V1_MULTIPROCESSING handling in RayExecutor ( #11472 )
2024-12-24 19:49:46 -08:00
Cyrus Leung
3f3e92e1f2
[Model] Automatic conversion of classification and reward models ( #11469 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-24 18:22:22 +00:00
Jee Jee Li
196c34b0ac
[Misc] Move weights mapper ( #11443 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-24 13:05:25 +00:00
Jee Jee Li
b1b1038fbd
[Bugfix] Fix Qwen2-VL LoRA weight loading ( #11430 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-24 09:56:10 +00:00
Cyrus Leung
9edca6bf8f
[Frontend] Online Pooling API ( #11457 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-24 17:54:30 +08:00
Rui Qiao
a491d6f535
[V1] TP Ray executor ( #11107 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2024-12-23 23:00:12 +00:00
Michael Goin
63afbe9215
[CI] Expand OpenAI test_chat.py guided decoding tests ( #11048 )
...
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-12-23 18:35:38 +00:00
Dipika Sikka
8cef6e02dc
[Misc] add w8a8 asym models ( #11075 )
2024-12-23 13:33:20 -05:00
Michael Goin
5bfb30a529
[Bugfix] Fix CFGGuide and use outlines for grammars that can't convert to GBNF ( #11389 )
...
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-12-23 23:06:20 +08:00
Jason T. Greene
f1d1bf6288
[Bugfix] Fix fully sharded LoRAs with Mixtral ( #11390 )
...
Signed-off-by: Jason Greene <jason.greene@redhat.com>
2024-12-22 23:25:10 +08:00
Roger Wang
29c748930e
[CI] Fix flaky entrypoint tests ( #11403 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
2024-12-21 21:08:44 -08:00
omer-dayan
995f56236b
[Core] Loading model from S3 using RunAI Model Streamer as optional loader ( #10192 )
...
Signed-off-by: OmerD <omer@run.ai>
2024-12-20 16:46:24 +00:00
Wallas Henrique
86c2d8fd1c
[Bugfix] Fix spec decoding when seed is none in a batch ( #10863 )
...
Signed-off-by: Wallas Santos <wallashss@ibm.com>
2024-12-20 05:15:31 +00:00
Isotr0py
e24113a8fe
[Model] Refactor Qwen2-VL to use merged multimodal processor ( #11258 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-19 16:28:00 +00:00
Yehoshua Cohen
6c7f881541
[Model] Add JambaForSequenceClassification model ( #10860 )
...
Signed-off-by: Yehoshua Cohen <yehoshuaco@ai21.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Yehoshua Cohen <yehoshuaco@ai21.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-19 22:48:06 +08:00
Yanyi Liu
5aef49806d
[Feature] Add load generation config from model ( #11164 )
...
Signed-off-by: liuyanyi <wolfsonliu@163.com>
Signed-off-by: Yanyi Liu <wolfsonliu@163.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2024-12-19 10:50:38 +00:00
Cyrus Leung
6142ef0ada
[VLM] Merged multimodal processor for Qwen2-Audio ( #11303 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-19 06:14:17 +00:00
Michael Goin
a30482f054
[CI] Expand test_guided_generate to test all backends ( #11313 )
...
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-12-19 04:00:38 +00:00
Travis Johnson
17ca964273
[Model] IBM Granite 3.1 ( #11307 )
...
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2024-12-19 11:27:24 +08:00
Tyler Michael Smith
5a9da2e6e9
[Bugfix][Build/CI] Fix sparse CUTLASS compilation on CUDA [12.0, 12.2) ( #11311 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-12-19 02:43:30 +00:00
Joe Runde
ca5f54a9b9
[Bugfix] fix minicpmv test ( #11304 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-12-18 10:34:26 -08:00
Isotr0py
996aa70f00
[Bugfix] Fix broken phi3-v mm_processor_kwargs tests ( #11263 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-12-18 10:16:40 -08:00
Dipika Sikka
60508ffda9
[Kernel]: Cutlass 2:4 Sparsity + FP8/Int8 Quant Support ( #10995 )
...
Co-authored-by: Faraz Shahsavan <faraz.shahsavan@gmail.com>
Co-authored-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
2024-12-18 09:57:16 -05:00
Wallas Henrique
8b79f9e107
[Bugfix] Fix guided decoding with tokenizer mode mistral ( #11046 )
2024-12-17 22:34:08 -08:00
Cody Yu
bf8717ebae
[V1] Prefix caching for vision language models ( #11187 )
...
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2024-12-17 16:37:59 -08:00
Michael Goin
c77eb8a33c
[Bugfix] Set temperature=0.7 in test_guided_choice_chat ( #11264 )
2024-12-17 16:34:06 -08:00
Joe Runde
2d1b9baa8f
[Bugfix] Fix request cancellation without polling ( #11190 )
2024-12-17 12:26:32 -08:00
kYLe
66d4b16724
[Frontend] Add OpenAI API support for input_audio ( #11027 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-16 22:09:58 -08:00
Michael Goin
0064f697d3
[CI] Add test case with JSON schema using references + use xgrammar by default with OpenAI parse ( #10935 )
...
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-12-17 11:39:58 +08:00
youkaichao
551603feff
[core] overhaul memory profiling and fix backward compatibility ( #10511 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-16 13:32:25 -08:00
Isotr0py
d927dbcd88
[Model] Refactor Ultravox to use merged input processor ( #11198 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-12-16 10:09:53 +00:00
Jani Monoses
bddbbcb132
[Model] Support Cohere2ForCausalLM (Cohere R7B) ( #11203 )
2024-12-16 09:56:19 +00:00
Cyrus Leung
b10609e6a1
[Misc] Clean up multi-modal processor ( #11207 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-15 06:30:28 +00:00
Cyrus Leung
93abf23a64
[VLM] Fully dynamic prompt replacement in merged input processor ( #11199 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-14 17:52:18 +00:00
Brad Hilton
9c3dadd1c9
[Frontend] Add logits_processors as an extra completion argument ( #11150 )
...
Signed-off-by: Brad Hilton <brad.hilton.nw@gmail.com>
2024-12-14 16:46:42 +00:00
Cyrus Leung
0920ab9131
[Doc] Reorganize online pooling APIs ( #11172 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-14 00:22:22 +08:00
Sungjae Lee
c31d4a57a6
[Core] support LoRA and prompt adapter in content-based hashing for Block Manager v2 prefix caching ( #8240 )
2024-12-13 07:51:25 -08:00
Cyrus Leung
eeec9e3390
[Frontend] Separate pooling APIs in offline inference ( #11129 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-13 10:40:07 +00:00
youkaichao
be39e3cd18
[core] clean up cudagraph batchsize padding logic ( #10996 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-13 06:57:50 +00:00
Pooya Davoodi
1efce68605
[Bugfix] Use runner_type instead of task in GritLM ( #11144 )
...
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
2024-12-13 04:09:53 +00:00
Luka Govedič
30870b4f66
[torch.compile] Dynamic fp8 + rms_norm fusion ( #10906 )
...
Signed-off-by: luka <luka@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-12-13 03:19:23 +00:00
Cody Yu
78ed8f57d8
[Misc][V1] Fix type in v1 prefix caching ( #11151 )
2024-12-13 00:57:40 +00:00
Jiaxin Shan
85362f028c
[Misc][LoRA] Ensure Lora Adapter requests return adapter name ( #11094 )
...
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-12 09:25:16 +00:00
youkaichao
62de37a38e
[core][distributed] initialization from StatelessProcessGroup ( #10986 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-12 09:04:19 +00:00
Pooya Davoodi
1da8f0e1dd
[Model] Add support for embedding model GritLM ( #10816 )
...
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
2024-12-12 06:39:16 +00:00
Alexander Matveev
4e11683368
[V1] VLM preprocessor hashing ( #11020 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Alexander Matveev <alexm@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-12-12 00:55:30 +00:00
Cyrus Leung
d1e21a979b
[CI/Build] Split up VLM tests ( #11083 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-12 06:18:16 +08:00
Cyrus Leung
8f10d5e393
[Misc] Split up pooling tasks ( #10820 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-11 01:28:00 -08:00
Cyrus Leung
2e33fe4191
[CI/Build] Check transformers v4.47 ( #10991 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-11 05:02:02 +00:00
Mor Zusman
ffa48c9146
[Model] PP support for Mamba-like models ( #10992 )
...
Signed-off-by: mzusman <mor.zusmann@gmail.com>
2024-12-10 21:53:37 -05:00