Cyrus Leung | 7af553ea30 | 2024-12-27 19:21:23 +08:00
[Misc] Abstract the logic for reading and writing media content (#11527)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Robert Shaw | 46d4359450 | 2024-12-26 18:49:16 -08:00
[CI] Fix broken CI (#11543)

Woosuk Kwon | 371d04d39b | 2024-12-27 09:32:38 +09:00
[V1] Use FlashInfer Sampling Kernel for Top-P & Top-K Sampling (#11394)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Michael Goin | 2072924d14 | 2024-12-26 15:33:30 -08:00
[Model] [Quantization] Support deepseek_v3 w8a8 fp8 block-wise quantization (#11523)
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: simon-mo <simon.mo@hey.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: HandH1998 <1335248067@qq.com>

Cyrus Leung | eec906d811 | 2024-12-26 13:12:51 +00:00
[Misc] Add placeholder module (#11501)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

sroy745 | dcb1a944d4 | 2024-12-26 19:02:58 +09:00
[V1] Adding min tokens/repetition/presence/frequence penalties to V1 sampler (#10681)
Signed-off-by: Sourashis Roy <sroy@roblox.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Jee Jee Li | aa25985bd1 | 2024-12-26 15:52:48 +08:00
[Misc][LoRA] Fix LoRA weight mapper (#11495)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Cyrus Leung | 51a624bf02 | 2024-12-26 04:23:20 +00:00
[Misc] Move some multimodal utils to modality-specific modules (#11494)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Jiaxin Shan | fc601665eb | 2024-12-25 06:58:48 +00:00
[Misc] Update disaggregation benchmark scripts and test logs (#11456)
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>

Rui Qiao | 9832e5572a | 2024-12-24 19:49:46 -08:00
[V1] Unify VLLM_ENABLE_V1_MULTIPROCESSING handling in RayExecutor (#11472)

Cyrus Leung | 3f3e92e1f2 | 2024-12-24 18:22:22 +00:00
[Model] Automatic conversion of classification and reward models (#11469)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Jee Jee Li | 196c34b0ac | 2024-12-24 13:05:25 +00:00
[Misc] Move weights mapper (#11443)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Jee Jee Li | b1b1038fbd | 2024-12-24 09:56:10 +00:00
[Bugfix] Fix Qwen2-VL LoRA weight loading (#11430)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Cyrus Leung | 9edca6bf8f | 2024-12-24 17:54:30 +08:00
[Frontend] Online Pooling API (#11457)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Rui Qiao | a491d6f535 | 2024-12-23 23:00:12 +00:00
[V1] TP Ray executor (#11107)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>

Michael Goin | 63afbe9215 | 2024-12-23 18:35:38 +00:00
[CI] Expand OpenAI test_chat.py guided decoding tests (#11048)
Signed-off-by: mgoin <michael@neuralmagic.com>

Dipika Sikka | 8cef6e02dc | 2024-12-23 13:33:20 -05:00
[Misc] add w8a8 asym models (#11075)

Michael Goin | 5bfb30a529 | 2024-12-23 23:06:20 +08:00
[Bugfix] Fix CFGGuide and use outlines for grammars that can't convert to GBNF (#11389)
Signed-off-by: mgoin <michael@neuralmagic.com>

Jason T. Greene | f1d1bf6288 | 2024-12-22 23:25:10 +08:00
[Bugfix] Fix fully sharded LoRAs with Mixtral (#11390)
Signed-off-by: Jason Greene <jason.greene@redhat.com>

Roger Wang | 29c748930e | 2024-12-21 21:08:44 -08:00
[CI] Fix flaky entrypoint tests (#11403)
Signed-off-by: Roger Wang <ywang@roblox.com>

omer-dayan | 995f56236b | 2024-12-20 16:46:24 +00:00
[Core] Loading model from S3 using RunAI Model Streamer as optional loader (#10192)
Signed-off-by: OmerD <omer@run.ai>

Wallas Henrique | 86c2d8fd1c | 2024-12-20 05:15:31 +00:00
[Bugfix] Fix spec decoding when seed is none in a batch (#10863)
Signed-off-by: Wallas Santos <wallashss@ibm.com>

Isotr0py | e24113a8fe | 2024-12-19 16:28:00 +00:00
[Model] Refactor Qwen2-VL to use merged multimodal processor (#11258)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>

Yehoshua Cohen | 6c7f881541 | 2024-12-19 22:48:06 +08:00
[Model] Add JambaForSequenceClassification model (#10860)
Signed-off-by: Yehoshua Cohen <yehoshuaco@ai21.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Yehoshua Cohen <yehoshuaco@ai21.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>

Yanyi Liu | 5aef49806d | 2024-12-19 10:50:38 +00:00
[Feature] Add load generation config from model (#11164)
Signed-off-by: liuyanyi <wolfsonliu@163.com>
Signed-off-by: Yanyi Liu <wolfsonliu@163.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

Cyrus Leung | 6142ef0ada | 2024-12-19 06:14:17 +00:00
[VLM] Merged multimodal processor for Qwen2-Audio (#11303)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Michael Goin | a30482f054 | 2024-12-19 04:00:38 +00:00
[CI] Expand test_guided_generate to test all backends (#11313)
Signed-off-by: mgoin <michael@neuralmagic.com>

Travis Johnson | 17ca964273 | 2024-12-19 11:27:24 +08:00
[Model] IBM Granite 3.1 (#11307)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>

Tyler Michael Smith | 5a9da2e6e9 | 2024-12-19 02:43:30 +00:00
[Bugfix][Build/CI] Fix sparse CUTLASS compilation on CUDA [12.0, 12.2) (#11311)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

Joe Runde | ca5f54a9b9 | 2024-12-18 10:34:26 -08:00
[Bugfix] fix minicpmv test (#11304)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>

Isotr0py | 996aa70f00 | 2024-12-18 10:16:40 -08:00
[Bugfix] Fix broken phi3-v mm_processor_kwargs tests (#11263)
Signed-off-by: Isotr0py <2037008807@qq.com>

Dipika Sikka | 60508ffda9 | 2024-12-18 09:57:16 -05:00
[Kernel]: Cutlass 2:4 Sparsity + FP8/Int8 Quant Support (#10995)
Co-authored-by: Faraz Shahsavan <faraz.shahsavan@gmail.com>
Co-authored-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>

Wallas Henrique | 8b79f9e107 | 2024-12-17 22:34:08 -08:00
[Bugfix] Fix guided decoding with tokenizer mode mistral (#11046)

Cody Yu | bf8717ebae | 2024-12-17 16:37:59 -08:00
[V1] Prefix caching for vision language models (#11187)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>

Michael Goin | c77eb8a33c | 2024-12-17 16:34:06 -08:00
[Bugfix] Set temperature=0.7 in test_guided_choice_chat (#11264)

Joe Runde | 2d1b9baa8f | 2024-12-17 12:26:32 -08:00
[Bugfix] Fix request cancellation without polling (#11190)

kYLe | 66d4b16724 | 2024-12-16 22:09:58 -08:00
[Frontend] Add OpenAI API support for input_audio (#11027)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>

Michael Goin | 0064f697d3 | 2024-12-17 11:39:58 +08:00
[CI] Add test case with JSON schema using references + use xgrammar by default with OpenAI parse (#10935)
Signed-off-by: mgoin <michael@neuralmagic.com>

youkaichao | 551603feff | 2024-12-16 13:32:25 -08:00
[core] overhaul memory profiling and fix backward compatibility (#10511)
Signed-off-by: youkaichao <youkaichao@gmail.com>

Isotr0py | d927dbcd88 | 2024-12-16 10:09:53 +00:00
[Model] Refactor Ultravox to use merged input processor (#11198)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

Jani Monoses | bddbbcb132 | 2024-12-16 09:56:19 +00:00
[Model] Support Cohere2ForCausalLM (Cohere R7B) (#11203)

Cyrus Leung | b10609e6a1 | 2024-12-15 06:30:28 +00:00
[Misc] Clean up multi-modal processor (#11207)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Cyrus Leung | 93abf23a64 | 2024-12-14 17:52:18 +00:00
[VLM] Fully dynamic prompt replacement in merged input processor (#11199)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Brad Hilton | 9c3dadd1c9 | 2024-12-14 16:46:42 +00:00
[Frontend] Add logits_processors as an extra completion argument (#11150)
Signed-off-by: Brad Hilton <brad.hilton.nw@gmail.com>

Cyrus Leung | 0920ab9131 | 2024-12-14 00:22:22 +08:00
[Doc] Reorganize online pooling APIs (#11172)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Sungjae Lee | c31d4a57a6 | 2024-12-13 07:51:25 -08:00
[Core] support LoRA and prompt adapter in content-based hashing for Block Manager v2 prefix caching (#8240)

Cyrus Leung | eeec9e3390 | 2024-12-13 10:40:07 +00:00
[Frontend] Separate pooling APIs in offline inference (#11129)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

youkaichao | be39e3cd18 | 2024-12-13 06:57:50 +00:00
[core] clean up cudagraph batchsize padding logic (#10996)
Signed-off-by: youkaichao <youkaichao@gmail.com>

Pooya Davoodi | 1efce68605 | 2024-12-13 04:09:53 +00:00
[Bugfix] Use runner_type instead of task in GritLM (#11144)
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>

Luka Govedič | 30870b4f66 | 2024-12-13 03:19:23 +00:00
[torch.compile] Dynamic fp8 + rms_norm fusion (#10906)
Signed-off-by: luka <luka@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>