Sage Moore
9b5913ed10
Merge branch 'main' of https://github.com/neuralmagic/vllm into sage/dbo-eager-decode-only
2025-07-09 15:51:12 +00:00
Ricardo Decal
b91cb3fa5c
[Docs] Improve documentation for Deepseek R1 on Ray Serve LLM (#20601)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
2025-07-08 02:09:06 -07:00
Sanger Steel
72d14d0eed
[Frontend] [Core] Integrate Tensorizer in to S3 loading machinery, allow passing arbitrary arguments during save/load (#19619)
Signed-off-by: Sanger Steel <sangersteel@gmail.com>
Co-authored-by: Eta <esyra@coreweave.com>
2025-07-07 22:47:43 -07:00
Ricardo Decal
e60d422f19
[Docs] Improve docstring for ray data llm example (#20597)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
2025-07-07 20:06:26 -07:00
wang.yuqi
110df74332
[Model][Last/4] Automatic conversion of CrossEncoding model (#19675)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-07-07 14:46:04 +00:00
Cyrus Leung
9fb52e523a
[V1] Support any head size for FlexAttention backend (#20467)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-06 09:54:36 -07:00
Woosuk Kwon
e202dd2736
[V0 deprecation] Remove V0 CPU/XPU/TPU backends (#20412)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2025-07-06 08:48:13 -07:00
Flora Feng
fe1e924811
[Frontend] Support image object in llm.chat (#19635)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: Flora Feng <4florafeng@gmail.com>
2025-07-06 06:47:13 +00:00
Jee Jee Li
1caca5a589
[Misc] Add SPDX-FileCopyrightText (#20428)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-04 07:40:42 +00:00
Reid
a7bab0c9e5
[Misc] small update (#20462)
Signed-off-by: reidliu41 <reid201711@gmail.com>
2025-07-03 20:33:44 -07:00
汪志鹏
25950dca9b
Add ignore consolidated file in mistral example code (#20420)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
2025-07-04 02:55:07 +00:00
Sage Moore
9b7edc0343
cleanup data_parallel.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:02:12 +00:00
Sage Moore
be2e1632fd
delete basic-ub.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:01:01 +00:00
Reid
359200f6ac
[doc] fix link (#20417)
Signed-off-by: reidliu41 <reid201711@gmail.com>
2025-07-03 00:21:57 -07:00
qscqesze
363528de27
[Feature] Support MiniMax-M1 function calls features (#20297)
Signed-off-by: QscQ <qscqesze@gmail.com>
Signed-off-by: qingjun <qingjun@minimaxi.com>
2025-07-03 06:48:27 +00:00
Sage Moore
0e499c4f4d
first round of cleanups
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 21:11:28 +00:00
Sage Moore
0767d9863f
fix data_parallel.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 19:25:59 +00:00
Sage Moore
c0efbbb5de
misc changes
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 16:56:30 +00:00
Lucas Wilkinson
f7a3ee0ea1
Merge remote-tracking branch 'origin/main' into lwilkinson/attn-slicing
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-07-02 16:52:19 +00:00
Kwai-Keye
8452946c06
[Model][VLM] Support Keye-VL-8B-Preview (#20126)
Signed-off-by: Kwai-Keye <Keye@kuaishou.com>
2025-07-01 23:35:04 -07:00
Nicolò Lucchesi
314af8617c
[Docs] Update transcriptions API to use openai client with stream=True (#20271)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-07-01 15:47:13 +00:00
Yuxuan Zhang
ed70f3c64f
Add GLM4.1V model (Draft) (#19331)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-07-01 12:48:26 +00:00
Kuntai Du
92ee7baaf9
[Example] add one-click runnable example for P2P NCCL XpYd (#20246)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
2025-06-30 21:03:55 -07:00
Woosuk Kwon
7151f92241
[Misc] Fix spec decode example (#20296)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-06-30 21:01:48 -07:00
Sage Moore
d833982e48
random push
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-30 17:08:51 +00:00
Woosuk Kwon
2965c99c86
[Spec Decode] Clean up spec decode example (#20240)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-06-30 08:28:13 -07:00
Sage Moore
4672c72f44
capture works replay does not
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-28 19:14:48 +00:00
Wentao Ye
d45417b804
fix ci issue distributed 4 gpu test (#20204)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-06-27 22:50:00 -07:00
Ekagra Ranjan
9502c38138
[Benchmark][Bug] Fix multiple bugs in bench and add args to spec_decode offline (#20083)
2025-06-25 22:06:27 -07:00
Nicolò Lucchesi
e795d723ed
[Frontend] Add /v1/audio/translations OpenAI API endpoint (#19615)
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-06-25 17:54:14 +00:00
Reid
26d34eb67e
refactor example - qwen3_reranker (#19847)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-24 14:03:20 +00:00
Lukas Geiger
c3649e4fee
[Docs] Fix syntax highlighting of shell commands (#19870)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-06-23 17:59:09 +00:00
Reid
b82e0f82cb
[doc] use MkDocs collapsible blocks - supplement (#19973)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-23 10:54:16 +00:00
汪志鹏
c3bf9bad11
[New model support]Support Tarsier2 (#19887)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
2025-06-21 04:01:51 +00:00
Reid
e384f2f108
[Misc] refactor example - openai_transcription_client (#19851)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-20 08:02:21 +00:00
Reid
089a306f19
[Misc] update cuda version (#19526)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-20 07:25:15 +00:00
Zuxin
1d0ae26c85
Add xLAM tool parser support (#17148)
2025-06-19 14:26:41 +08:00
Maximilien de Bayser
799397ee4f
Support embedding models in V1 (#16188)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-06-18 21:36:33 -07:00
Sage Moore
0889f66297
Merge branch 'main' of https://github.com/neuralmagic/vllm into lwilkinson/attn-slicing
2025-06-18 13:56:24 +00:00
Zhonghua Deng
eccdc8318c
[V1][P/D] An native implementation of xPyD based on P2P NCCL (#18242)
Signed-off-by: Abatom <abzhonghua@gmail.com>
2025-06-18 06:32:36 +00:00
Isotr0py
aed8468642
[Doc] Add missing llava family multi-image examples (#19698)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-17 07:05:21 +00:00
Navanit Dubey
3e7506975c
[DOC] Add reasoning capability to vLLM streamlit code (#19557)
2025-06-16 07:09:12 -04:00
Aaron Pham
7b3c9ff91d
[Doc] uses absolute links for structured outputs (#19582)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-06-13 03:35:17 +00:00
Aaron Pham
dba68f9159
[Doc] Unify structured outputs examples (#18196)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-06-12 22:50:31 +00:00
Ekagra Ranjan
017ef648e9
[Spec Decode][Benchmark] Generalize spec decode offline benchmark to more methods and datasets (#18847)
2025-06-12 10:30:56 -07:00
niu_he
dff680001d
Fix typo (#19525)
Signed-off-by: 2niuhe <carlton2tang@gmail.com>
2025-06-12 09:24:45 +00:00
runzhen
943ffa5703
[Bugfix] Update the example code, make it work with the latest lmcache (#19453)
Signed-off-by: Runzhen Wang <wangrunzhen@gmail.com>
2025-06-11 12:42:20 +00:00
wang.yuqi
3952731e8f
[New Model]: Support Qwen3 Embedding & Reranker (#19260)
2025-06-10 20:07:30 -07:00
Reid
6b1391ca7e
[Misc] refactor neuron_multimodal and profiling (#19397)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-10 06:12:42 +00:00
Sage Moore
642bf2dd8b
Merge branch 'main' of https://github.com/neuralmagic/vllm into lwilkinson/attn-slicing
2025-06-08 18:02:06 +00:00