184 Commits

Author SHA1 Message Date
Nick Hill
7df331c66b
[BugFix] Fix chunked prompt logprobs + preemption (#29071) 2025-11-22 16:07:18 -05:00
Lucas Wilkinson
30d6466238
[BugFix] Fix Eagle IndexError: list index out of range for even num_speculative_tokens (#29102)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-11-22 00:47:05 +00:00
Varun Sundar Rabindranath
fe1cd7704d
[Performance][B200] silu_mul_quant: pack scales in int32 (#28358)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-11-13 10:16:55 -08:00
wangxiyuan
428bc7bf1c
[V0 deprecation] Remove VLLM_USE_V1 usage in most modules (#27955)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-11-04 20:51:16 -08:00
Nick Hill
0cdbe7b744
[Core] Async scheduling + structured outputs compatibility (#26866)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-01 00:35:04 +00:00
Nick Hill
4fe5895361
[AsyncScheduling] Make async overlap work with logprobs (#27615)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-28 22:35:54 +00:00
Cyrus Leung
d31f7844f8
[Misc] Move utils to avoid conflicts with stdlib, and move tests (#27169)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-19 05:20:55 -07:00
Isotr0py
6ac5e06f7c
[Chore] Clean up pytorch helper functions in vllm.utils (#26908)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: isotr0py <2037008807@qq.com>
2025-10-18 09:48:22 -07:00
Luka Govedič
bd7157a071
[torch.compile] Enable attention and allreduce fusion without custom ops enabled (#24604)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-10-17 08:10:23 -06:00
Cyrus Leung
d2740fafbf
[Chore] Separate out vllm.utils.collections (#26990)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-16 08:35:35 +00:00
wangxiyuan
8f4b313c37
[Misc] rename torch_dtype to dtype (#26695)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-10-15 12:11:48 +00:00
wang.yuqi
f54f85129e
[Model][2/N] Improve all pooling task | Support multi-vector retrieval (#25370)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-10-15 11:14:41 +00:00
Harry Mellor
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00
Yannick Schnider
6431be808f
[Tests] conftest: Extending VllmRunner and HfRunner to accept token_ids as input (#26295)
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-06 17:19:34 +00:00
Thomas Parnell
d3c84297c3
[CI] Add comment about the single cudagraph capture size that is used (#26252) 2025-10-06 02:35:37 +00:00
Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
Cyrus Leung
b7e8e4e6be
[Bugfix] Always apply MM processor even when no MM items are passed (#26240)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-05 10:10:20 +00:00
Yannick Schnider
5446ad1d24
[test utils] correct wrong typing (#26159)
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
2025-10-03 02:11:49 -07:00
Thomas Parnell
be8921fbba
Change size of single CUDA graph for CI to 4 (#26089)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-10-02 14:14:28 +00:00
Harry Mellor
a332b84578
[CI] Only capture a single CUDA graph size in CI by default (#25951)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-01 10:03:44 +01:00
Isotr0py
27ec3c78f3
[CI/Build] Fix v1 OOT registration test (#25547)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-24 08:03:13 +00:00
Woosuk Kwon
26e673fe93
[V0 Deprecation] Remove V0 Sequence class & Sampler (#25332)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-21 08:52:15 -07:00
Woosuk Kwon
52c2a8d4ad
[V0 Deprecation] Remove LLMEngine (#25033)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-20 17:56:30 -07:00
Cyrus Leung
bef180f009
[V0 Deprecation] Enable the remaining multimodal tests in V1 (#25307)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-20 17:50:58 +00:00
Cyrus Leung
3d9a1d2de5
[V1] Support LLM.apply_model (#18465)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-20 07:14:35 +00:00
Harry Mellor
aed16879a9
Move ModelConfig from config/__init__.py to config/model.py (#25252)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-19 16:22:33 +00:00
Isotr0py
f2718d2948
[Misc] Cleanup test conftest for deprecated encoder-decoder models (#25231)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-19 07:44:56 +00:00
Nick Hill
4db4426404
[CI] Fail subprocess tests with root-cause error (#23795)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-09-10 13:53:21 -07:00
dsinghvi
70549c1245
[CI/Build] Serve images used by multimodal tests through local HTTP Server (#23907)
Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-09-03 16:13:11 +08:00
Christian Pinto
1cb39dbcdd
[Misc] IO Processor plugins for pooling models (#22820)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
2025-08-31 23:07:12 -07:00
wang.yuqi
11a7fafaa8
[New Model]: Support GteNewModelForSequenceClassification (#23524)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-08-28 15:36:42 +08:00
Kyle Sayers
22feac8e95
[Transform] [Quantization] Add transforms to compressed tensors (#22486) 2025-08-28 02:43:48 -04:00
Chen Zhang
142ac08030
[Frontend] Optimize beam search performance by limiting concurrency (#23599)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-27 04:59:14 +00:00
wang.yuqi
f856c33ce9
[Model] Add multi_label_classification support (#23173)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-08-19 12:54:30 +00:00
Isotr0py
3dddbf1f25
[Misc] Add tensor schema test coverage for multimodal models (#21754)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-08-03 00:52:14 -07:00
wang.yuqi
65f311ce59
[Frontend] Add LLM.reward specific to reward models (#21720)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-07-29 20:56:03 -07:00
Cyrus Leung
86ae693f20
[Deprecation][2/N] Replace --task with --runner and --convert (#21470)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-27 19:42:40 -07:00
Maximilien de Bayser
1cd6eaba54
Support encoder-only models without KV-Cache (#21270)
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-07-26 21:09:52 +08:00
Ning Xie
d97841078b
[Misc] unify variable for LLM instance (#20996)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-07-21 12:18:33 +01:00
Patrick von Platen
cfbcb9ed87
[Voxtral] Add more tests (#21010)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-15 21:11:49 -07:00
Liangliang Ma
a3e4e85ece
[XPU][CI] enhance xpu test support (#20652)
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
Co-authored-by: zhenwei-intel <zhenweiliu@habana.ai>
2025-07-09 16:53:09 +00:00
Liangliang Ma
a0389e0554
[UT][intel GPU] use current_platform instead of device hardcode in v1 tests (#20169)
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
2025-07-02 09:06:04 +08:00
Isotr0py
61f4fc5dc6
[Bugfix][v1] Fix step pooler implementation and step pooling usage in v1 (#19956)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-23 18:38:06 +00:00
Isotr0py
6021999573
[Minor] Allow redirecting model path for HfRunner in test (#19795)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-18 23:04:10 -07:00
Maximilien de Bayser
799397ee4f
Support embedding models in V1 (#16188)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-06-18 21:36:33 -07:00
wang.yuqi
f40f763f12
[CI] Add mteb testing for rerank models (#19344) 2025-06-16 01:36:43 -07:00
Simon Mo
02f0c7b220
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
Cyrus Leung
6aa8f9a4e7
[Core] Rework dtype resolution (#18751)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-06-01 11:04:23 +08:00
Isotr0py
c9479b2920
[Bugfix] Fix the failing gte embedding test (#18720)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-05-29 07:39:25 -07:00
Nan Qin
9609327fa4
[Core] [Bugfix]: tensor parallel with prompt embeds (#18171)
Signed-off-by: Nan2018 <nan@protopia.ai>
Co-authored-by: Andrew Sansom <andrew@protopia.ai>
2025-05-19 20:21:27 -07:00