483 Commits

Author SHA1 Message Date
Sage Moore
0889f66297 Merge branch 'main' of https://github.com/neuralmagic/vllm into lwilkinson/attn-slicing 2025-06-18 13:56:24 +00:00
Isotr0py
aed8468642
[Doc] Add missing llava family multi-image examples (#19698)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-17 07:05:21 +00:00
Navanit Dubey
3e7506975c
[DOC] Add reasoning capability to vLLM streamlit code (#19557) 2025-06-16 07:09:12 -04:00
Aaron Pham
7b3c9ff91d
[Doc] uses absolute links for structured outputs (#19582)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-06-13 03:35:17 +00:00
Aaron Pham
dba68f9159
[Doc] Unify structured outputs examples (#18196)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-06-12 22:50:31 +00:00
Ekagra Ranjan
017ef648e9
[Spec Decode][Benchmark] Generalize spec decode offline benchmark to more methods and datasets (#18847) 2025-06-12 10:30:56 -07:00
niu_he
dff680001d
Fix typo (#19525)
Signed-off-by: 2niuhe <carlton2tang@gmail.com>
2025-06-12 09:24:45 +00:00
runzhen
943ffa5703
[Bugfix] Update the example code, make it work with the latest lmcache (#19453)
Signed-off-by: Runzhen Wang <wangrunzhen@gmail.com>
2025-06-11 12:42:20 +00:00
wang.yuqi
3952731e8f
[New Model]: Support Qwen3 Embedding & Reranker (#19260) 2025-06-10 20:07:30 -07:00
Reid
6b1391ca7e
[Misc] refactor neuron_multimodal and profiling (#19397)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-10 06:12:42 +00:00
Sage Moore
642bf2dd8b Merge branch 'main' of https://github.com/neuralmagic/vllm into lwilkinson/attn-slicing 2025-06-08 18:02:06 +00:00
Reid
122cdca5f6
[Misc] refactor context extension (#19246)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-07 05:13:21 +00:00
Sage Moore
f8848bb201 misc fixes. lm_eval still gets a wrong answer but it no longer hangs
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-04 22:46:18 +00:00
jmswen
c8dcc15921
Allow AsyncLLMEngine.generate to target a specific DP rank (#19102)
Signed-off-by: Jon Swenson <jmswen@gmail.com>
2025-06-04 08:26:47 -07:00
Xu Wenqing
02658c2dfe
Add DeepSeek-R1-0528 function call chat template (#18874)
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
2025-06-04 13:24:18 +00:00
汪志鹏
3336c8cfbe
Fix #19130 (#19132)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
2025-06-04 01:42:06 -07:00
Calvin Chen
8d646c2e53
[Cleanup][v1]:remote guided-decoding-backend for example (#19059)
Signed-off-by: calvin chen <120380290@qq.com>
2025-06-04 04:23:26 +00:00
Jiaxin Shan
abd7df2fca
[Misc] Fix path and python alias errors in disagg_prefill exmaples (#18919) 2025-06-03 17:15:18 -07:00
Sage Moore
2e3484c237 debugging
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-03 19:25:01 +00:00
Simon Mo
02f0c7b220
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
汪志鹏
1282bd812e
Add tarsier model support (#18985)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
2025-06-03 13:13:13 +08:00
Sage Moore
18e7d6c7b8 Merge branch 'main' of https://github.com/neuralmagic/vllm into lwilkinson/attn-slicing 2025-06-03 00:52:39 +00:00
Siyuan Liu
9112b443a0
[Hardware][TPU] Initial support of model parallelism with single worker using SPMD (#18011)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Co-authored-by: Hossein Sarshar <hossein.sarshar@gmail.com>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
2025-06-03 00:06:20 +00:00
Calvin Chen
c57d577e8d
add an absolute path for run.sh (#18258)
Signed-off-by: calvin chen <120380290@qq.com>
2025-06-02 19:38:23 +00:00
Sage Moore
8332924320 dp format
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 19:15:23 +00:00
Sage Moore
8ea80fca4a revert offline_inference/basic.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 18:05:48 +00:00
Sage Moore
21d9529a79 revert offline_inference/basic.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-02 18:05:26 +00:00
Nick Hill
9a1b9b99d7
[BugFix] Fix multi-node offline data-parallel (#18981)
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Yizhou Liu <liu_yizhou@outlook.com>
2025-05-31 08:34:52 -07:00
Satyajith Chilappagari
2a50ef5760
[Neuron] Add Multi-Modal model support for Neuron (#18921)
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
Co-authored-by: Ashraf Mahgoub <ashymahg@amazon.com>
Co-authored-by: Rohith Nallamaddi <nalrohit@amazon.com>
Co-authored-by: FeliciaLuo <luof@amazon.com>
Co-authored-by: Elaine Zhao <elaineyz@amazon.com>
2025-05-31 10:39:11 +00:00
Sage Moore
62da375465 more fixes 2025-05-30 21:17:06 +00:00
Mark McLoughlin
0e98964e94
[V1][Metrics] Remove metrics that were deprecated in 0.8 (#18837)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-05-28 18:54:12 +00:00
Reid
435fa95444
[Frontend] add run batch to CLI (#18804)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-28 07:08:57 -07:00
wang.yuqi
3e9ce609bd
[Bugfix] Fix nomic max_model_len (#18755) 2025-05-27 20:29:53 -07:00
Mark McLoughlin
06a0338015
[V1][Metrics] Add API for accessing in-memory Prometheus metrics (#17010)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-05-27 09:37:06 +00:00
Reid
fc6d0c290f
[Misc] improve docs (#18734)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-27 07:07:01 +00:00
Cyrus Leung
753944fa9b
[Doc] Update reproducibility doc and example (#18741)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-27 07:03:13 +00:00
Harry Mellor
27bebcd897
Convert examples to ruff-format (#18400)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-26 16:57:54 +00:00
Cyrus Leung
82e2339b06
[Doc] Move examples and further reorganize user guide (#18666)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-26 07:38:04 -07:00
AlexZhao
8820821b59
[Misc] Fixed the abnormally high TTFT issue in the PD disaggregation example (#18644)
Signed-off-by: zhaohaidao <zhaohaidao2008@hotmail.com>
Signed-off-by: zhaohaiyuan <zhaohaiyuan@xiaohongshu.com>
Co-authored-by: zhaohaiyuan <zhaohaiyuan@xiaohongshu.com>
2025-05-26 13:51:27 +08:00
Isotr0py
75f81750f3
[VLM] Initialize video input support for InternVL models (#18499)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-05-25 04:51:25 +00:00
Feng XiaoLong
4fc1bf813a
[Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking (#18454)
Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com>
Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com>
2025-05-23 16:16:26 -07:00
Lucas Wilkinson
18bf91e6a8 wip
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-23 03:31:49 +00:00
Chenheli Hua
04eb88dc80
Re-submit: Fix: Proper RGBA -> RGB conversion for PIL images. (#18569)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
2025-05-23 01:59:18 +00:00
Sanger Steel
c32e249a23
[Frontend] [Core] Add Tensorizer support for V1, LoRA adapter serialization and deserialization (#17926)
Signed-off-by: Sanger Steel <sangersteel@gmail.com>
2025-05-22 18:44:18 -07:00
Kai Wu
c91fe7b1b9
[Frontend][Bug Fix] Update llama4 pythonic jinja template and llama4_pythonic parser (#17917)
Signed-off-by: Kai Wu <kaiwu@meta.com>
2025-05-22 16:44:08 -07:00
Lucas Wilkinson
9c60a6299d tp1 working multistream tp > 1 broken
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:36 +00:00
Lucas Wilkinson
04f11d97a0 working but only on the same stream
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:36 +00:00
Lucas Wilkinson
ffb740ae95 manually manage stream
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:36 +00:00
Sage Moore
020269c4c5 added multhreading support
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-05-22 20:51:36 +00:00
Lucas Wilkinson
9ccfd094ff fix dummy mode
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:35 +00:00