3762 Commits

Author SHA1 Message Date
Cyrus Leung
d1e21a979b
[CI/Build] Split up VLM tests (#11083)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-12 06:18:16 +08:00
Rui Qiao
72ff3a9686
[core] Bump ray to use _overlap_gpu_communication in compiled graph tests (#10410)
Signed-off-by: Rui Qiao <ubuntu@ip-172-31-15-128.us-west-2.compute.internal>
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Rui Qiao <ubuntu@ip-172-31-15-128.us-west-2.compute.internal>
2024-12-11 11:36:35 -08:00
youkaichao
66aaa7722d
[torch.compile] remove graph logging in ci (#11110)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-11 10:59:50 -08:00
Woosuk Kwon
d643c2aba1
[V1] Use input_ids as input for text-only models (#11032)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-12-11 10:49:23 -08:00
youkaichao
91642db952
[torch.compile] use depyf to dump torch.compile internals (#10972)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-11 10:43:05 -08:00
bingps
fd22220687
[Doc] Installed version of llmcompressor for int8/fp8 quantization (#11103)
Signed-off-by: Guangda Liu <bingps@users.noreply.github.com>
Co-authored-by: Guangda Liu <bingps@users.noreply.github.com>
2024-12-11 15:43:24 +00:00
hissu-hyvarinen
b2f775456e
[CI/Build] Enable prefix caching test for AMD (#11098)
Signed-off-by: Hissu Hyvarinen <hissu.hyvarinen@amd.com>
2024-12-11 15:23:37 +00:00
Cyrus Leung
cad5c0a6ed
[Doc] Update docs to refer to pooling models (#11093)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-11 13:36:27 +00:00
Cyrus Leung
8f10d5e393
[Misc] Split up pooling tasks (#10820)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-11 01:28:00 -08:00
Rafael Vasquez
40766ca1b8
[Bugfix]: Clamp -inf logprob values in prompt_logprobs (#11073)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2024-12-11 01:27:39 -08:00
B-201
2e32f5d28d
[Bugfix] Fix Idefics3 fails during multi-image inference (#11080)
Signed-off-by: B-201 <Joy25810@foxmail.com>
2024-12-11 01:27:07 -08:00
Russell Bryant
61b1d2f6ae
[Core] v1: Use atexit to handle engine core client shutdown (#11076)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-12-11 01:26:36 -08:00
Kevin H. Luu
9974fca047
[ci/build] Fix entrypoints test and pin outlines version (#11088)
2024-12-11 01:01:53 -08:00
Kevin H. Luu
3fb4b4f163
[ci/build] Fix AMD CI dependencies (#11087)
2024-12-11 00:39:53 -08:00
Cyrus Leung
2e33fe4191
[CI/Build] Check transformers v4.47 (#10991)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-11 05:02:02 +00:00
Maximilien de Bayser
e39400a4b6
Fix streaming for granite tool call when <|tool_call|> is present (#11069)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2024-12-11 04:51:40 +00:00
Mor Zusman
ffa48c9146
[Model] PP support for Mamba-like models (#10992)
Signed-off-by: mzusman <mor.zusmann@gmail.com>
2024-12-10 21:53:37 -05:00
Aurick Qiao
d5c5154fcf
[Misc] LoRA + Chunked Prefill (#9057)
2024-12-11 10:09:20 +08:00
Tyler Michael Smith
9a93973708
[Bugfix] Fix Mamba multistep (#11071)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-12-11 00:16:22 +00:00
Woosuk Kwon
134810b3d9
[V1][Bugfix] Always set enable_chunked_prefill = True for V1 (#11061)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-12-10 14:41:23 -08:00
youkaichao
75f89dc44c
[torch.compile] add a flag to track batchsize statistics (#11059)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-10 12:40:52 -08:00
Russell Bryant
e739194926
[Core] Update to outlines >= 0.1.8 (#10576)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-12-10 12:08:16 -08:00
Flávia Béo
250ee65d72
[BUG] Remove token param #10921 (#11022)
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
2024-12-10 17:38:15 +00:00
Joe Runde
9b9cef3145
[Bugfix] Backport request id validation to v0 (#11036)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-12-10 16:38:23 +00:00
Jee Jee Li
d05f88679b
[Misc][LoRA] Add PEFTHelper for LoRA (#11003)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-10 11:12:01 +00:00
Travis Johnson
beb16b2c81
[Bugfix] Handle <|tool_call|> token in granite tool parser (#11039)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2024-12-10 10:27:11 +00:00
Maxime Fournioux
fe2e10c71b
Add example of helm chart for vllm deployment on k8s (#9199)
Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>
2024-12-10 09:19:27 +00:00
Gene Der Su
82c73fd510
[Bugfix] cuda error running llama 3.2 (#11047)
2024-12-10 07:41:11 +00:00
Diego Marinho
bfd610430c
Update README.md (#11034)
2024-12-09 23:08:10 -08:00
Jeff Cook
e35879c276
[Bugfix] Fix xgrammar failing to read a vocab_size from LlavaConfig on PixtralHF. (#11043)
2024-12-10 14:54:22 +08:00
youkaichao
ebf778061d
monitor metrics of tokens per step using cudagraph batchsizes (#11031)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-09 22:35:36 -08:00
Tyler Michael Smith
28b3a1c7e5
[V1] Multiprocessing Tensor Parallel Support for v1 (#9856)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-12-10 06:28:14 +00:00
Patrick von Platen
bc192a2b09
[Pixtral] Improve loading (#11040)
2024-12-10 06:09:32 +00:00
Joe Runde
980ad394a8
[Frontend] Use request id from header (#10968)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-12-10 13:46:29 +08:00
Cyrus Leung
391d7b2763
[Bugfix] Fix usage of deprecated decorator (#11025)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-10 13:45:47 +08:00
Isotr0py
d1f6d1c8af
[Model] Add has_weight to RMSNorm and re-enable weights loading tracker for Mamba (#10739)
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-12-10 10:23:07 +08:00
Michael Goin
6d525288c1
[Docs] Add dedicated tool calling page to docs (#10554)
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-12-09 20:15:34 -05:00
Woosuk Kwon
6faec54505
[V1] Do not store None in self.generators (#11038)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-12-09 15:08:19 -08:00
Richard Liu
5ed5d5f128
Build tpu image in release pipeline (#10936)
Signed-off-by: Richard Liu <ricliu@google.com>
Co-authored-by: Kevin H. Luu <kevin@anyscale.com>
2024-12-09 23:07:48 +00:00
Gregory Shtrasberg
b63ba84832
[ROCm][bugfix] speculative decoding worker class (#11035)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2024-12-09 14:00:29 -08:00
xendo
9c6459e4cb
[Neuron] Upgrade neuron to 2.20.2 (#11016)
Signed-off-by: Jerzy Zagorski <jzagorsk@amazon.com>
Co-authored-by: Jerzy Zagorski <jzagorsk@amazon.com>
2024-12-09 13:53:24 -08:00
youkaichao
1a2f8fb828
[v1] fix use compile sizes (#11000)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-09 13:47:24 -08:00
Konrad Zawora
cbcbdb1ceb
[Bugfix][Hardware][Gaudi] Bump vllm_hpu_extension version (#11028)
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
2024-12-09 13:21:06 -08:00
Isotr0py
a811dd6608
[Model] merged input processor for Phi-3-Vision models (#10977)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-12-09 12:55:10 -08:00
Jee Jee Li
ca871491ed
[Misc][LoRA] Abstract PunicaWrapper (#10955)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-09 12:54:44 -08:00
Woosuk Kwon
3b61cb450d
[V1] Further reduce CPU overheads in flash-attn (#10989)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-12-09 12:38:46 -08:00
Kevin H. Luu
edc4fa3188
[ci/build] Recompile CI dependencies list with Python 3.12 (#11013)
Signed-off-by: kevin <kevin@anyscale.com>
2024-12-09 11:46:58 -08:00
Varun Sundar Rabindranath
25b79d9fd3
[V1] Input Batch Relocation (#10962)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-12-09 09:33:41 -08:00
wangxiyuan
aea2fc38c3
[Platform] Move async output check to platform (#10768)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2024-12-09 17:24:46 +00:00
Russell Bryant
e691b26f6f
[Core] Require xgrammar >= 0.1.6 (#11021)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-12-09 16:44:27 +00:00