Lain
|
e23564cb70
|
use ceil_div in cutlass block scaling shape check (#17918)
|
2025-05-16 03:02:58 -07:00 |
|
Isotr0py
|
390ec88905
|
[Misc] Consolidate Audio tests into multimodal common generation tests (#18214)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-05-16 09:18:08 +00:00 |
|
Seiji Eicher
|
541817670c
|
[Misc] Add Ray Prometheus logger to V1 (#17925)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2025-05-16 01:02:42 -07:00 |
|
Vadim Gimpelson
|
67da5720d4
|
[PERF] Speed up Qwen2.5-VL model by speeding up rotary position embedding (#17973)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai>
|
2025-05-15 23:31:02 -07:00 |
|
David Xia
|
5c04bb8b86
|
[doc] fix multimodal example script (#18089)
Signed-off-by: David Xia <david@davidxia.com>
|
2025-05-16 06:05:34 +00:00 |
|
Lucia Fang
|
3d2779c29a
|
[Feature] Support Pipeline Parallelism in torchrun SPMD offline inference for V1 (#17827)
Signed-off-by: Lucia Fang <fanglu@fb.com>
|
2025-05-15 22:28:27 -07:00 |
|
Will Eaton
|
6b31c84aff
|
Throw better error for when running into k8s service discovery issue (#18209)
Signed-off-by: Will Eaton <weaton@redhat.com>
|
2025-05-15 21:07:28 -07:00 |
|
Harry Mellor
|
b18201fe06
|
Allow users to pass arbitrary JSON keys from CLI (#18208)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-15 21:05:34 -07:00 |
|
Sky Lee
|
f4937a51c1
|
[Model] vLLM v1 supports Medusa (#17956)
Signed-off-by: lisiqi23 <lisiqi23@xiaomi.com>
Signed-off-by: skylee-01 <497627264@qq.com>
Co-authored-by: lisiqi23 <lisiqi23@xiaomi.com>
|
2025-05-15 21:05:31 -07:00 |
|
kliuae
|
ee659e3b60
|
[Bugfix][ROCm] Use chunked_prefill_paged_decode as fallback for V1 attention on ROCm (#18093)
Signed-off-by: kf <kuanfu.liu@embeddedllm.com>
|
2025-05-15 19:30:17 -07:00 |
|
Lucas Wilkinson
|
4e1c6a0264
|
[Bugfix] fix rotary embedding test for _get_padded_tensor_shape (#18229)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-05-16 01:32:45 +00:00 |
|
Lucas Wilkinson
|
c7852a6d9b
|
[Build] Allow shipping PTX on a per-file basis (#18155)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-05-15 16:41:55 -07:00 |
|
Lucia Fang
|
8795eb9975
|
[Bugfix] Fix test_eagle test (#18223)
Signed-off-by: Lucia Fang <fanglu@fb.com>
|
2025-05-15 15:59:42 -07:00 |
|
Alexei-V-Ivanov-AMD
|
0b34593017
|
Adding "AMD: Tensorizer Test" to amdproduction. (#18216)
|
2025-05-15 11:01:25 -07:00 |
|
Nicolò Lucchesi
|
e3f3aee6f4
|
[Misc] Avoid cuda graph log when sizes still match (#18202)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-05-15 09:59:38 -07:00 |
|
TJian
|
92540529c0
|
[Bugfix] [ROCm]: Remove assertion logic when using AITER fused moe in unquantizedMethod to reenable LLama4 BF16 (#18205)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-05-15 09:53:18 -07:00 |
|
Zhonghua Deng
|
fadb8d5c2d
|
[Bugfix]Change the exception thrown by call_hf_processor from RuntimeError to ValueError (#18181)
Signed-off-by: Abatom <abzhonghua@gmail.com>
|
2025-05-15 09:01:47 -07:00 |
|
Sebastian Schoennenbeck
|
2aa5470ac5
|
[Frontend] Fix chat template content format detection (#18190)
Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com>
|
2025-05-15 09:00:21 -07:00 |
|
Harry Mellor
|
51ff154639
|
Improve examples rendering in docs and GitHub (#18203)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-15 15:57:49 +00:00 |
|
Alexei-V-Ivanov-AMD
|
566ec04c3d
|
Adding "Basic Models Test" and "Multi-Modal Models Test (Extended) 3" in AMD Pipeline (#18106)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-05-15 08:49:23 -07:00 |
|
Thomas Parnell
|
01c22335ba
|
[Kernel] [V1] Fix performance regression for triton unified attention (#18161)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-05-15 06:39:00 -07:00 |
|
hustxiayang
|
451da4bcbd
|
add tools into TokenizeChatRequest (#18187)
Signed-off-by: yangxia <yangxiast@gmail.com>
|
2025-05-15 04:01:49 -07:00 |
|
Harry Mellor
|
07ad27121f
|
Update deprecated type hinting in model_loader (#18130)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-15 04:00:21 -07:00 |
|
omahs
|
a9944aabfa
|
fix: typos (#18151)
Signed-off-by: omahs <73983677+omahs@users.noreply.github.com>
|
2025-05-15 02:16:15 -07:00 |
|
Russell Bryant
|
a8f5aec20a
|
[V1] Update zmq socket creation in nixl connector (#18148)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-05-14 23:17:57 -07:00 |
|
David Xia
|
de71fec81b
|
[CI] don't skip fixed test_kv_cache_events() (#18183)
Signed-off-by: David Xia <david@davidxia.com>
|
2025-05-14 23:17:16 -07:00 |
|
Mengqing Cao
|
70f8b96724
|
[Bugfix] Fix FusedMoEPrepareAndFinalize for cuda-disalike backends (#18178)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
|
2025-05-14 23:16:31 -07:00 |
|
inkcherry
|
dd2a94596a
|
[Model] Allow the use of sliding window in Qwen2 (#17772)
Signed-off-by: inkcherry <mingzhi.liu@intel.com>
|
2025-05-14 22:29:38 -07:00 |
|
Ning Xie
|
420caf7557
|
[UT] Add ut for none hash (#17892)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-05-15 13:28:11 +08:00 |
|
Chenheli Hua
|
4f07a64075
|
Support custom implementations of VideoLoader backends. (#18091)
|
2025-05-15 13:26:49 +08:00 |
|
Thomas Parnell
|
e6b8e65d2d
|
[Bugfix] Fix fp8 tests for triton_unified_attention for Triton 3.3 (#18013)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-05-15 13:26:34 +08:00 |
|
Harry Mellor
|
26d0419309
|
Update deprecated type hinting in models (#18132)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-14 22:06:50 -07:00 |
|
Luka Govedič
|
83f74c698f
|
[Fix][ROCm] Enforce eager for all encoder-decoder models on ROCm (#18154)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
|
2025-05-14 22:04:43 -07:00 |
|
Reid
|
2dff093574
|
[Misc] add lobe-chat support (#18177)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-15 05:02:23 +00:00 |
|
Aaron Pham
|
afe3236e90
|
[Chore] astral's ty (#18116)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-05-15 05:00:43 +00:00 |
|
Mark McLoughlin
|
65334ef3b9
|
[V1][Metrics] Remove unused code (#18158)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-05-14 20:13:17 -07:00 |
|
Chen Zhang
|
e60f550b38
|
[v1] Support multiple KV cache groups in GPU model runner (#17945)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-05-14 18:54:54 -07:00 |
|
David Xia
|
f25e0d1125
|
[Bugfix]: make most of test_openai_schema.py pass (#17664)
|
2025-05-14 17:04:35 -07:00 |
|
Andrey Talman
|
09f106a91e
|
Upload vllm index for the rc builds (#18173)
|
2025-05-14 16:35:56 -07:00 |
|
Michael Goin
|
2142035b51
|
[V1] Support multiple kv connectors (#17564)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-05-14 16:28:02 -07:00 |
|
Russell Bryant
|
78aa341d12
|
[CI] Fix race condition in test_kv_cache_events test (#18169)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-05-14 16:27:48 -07:00 |
|
Jerry Zhang
|
7974736740
|
Add support for loading torchao models with AOPerModuleConfig (#17826)
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
|
2025-05-14 16:24:59 -07:00 |
|
Aaron Pham
|
2fc9075b82
|
[V1] Structured Outputs + Thinking compatibility (#16577)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
|
2025-05-14 15:45:24 -07:00 |
|
Lucas Wilkinson
|
d93c976a0d
|
[Kernel] Have rotary embeddings support tensors (#18046)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-05-14 15:43:55 -07:00 |
|
David Xia
|
749f792553
|
[Frontend] decrease import time of vllm.multimodal (#18031)
Co-authored-by: Aaron Pham <Aaronpham0103@gmail.com>
|
2025-05-14 15:43:32 -07:00 |
|
Robert Shaw
|
856865008e
|
[CI] Disable Failing Tests (#18165)
|
2025-05-14 13:49:56 -07:00 |
|
bnellnm
|
f9c069c85e
|
Modularize fused experts and integrate PPLX kernels (#15956)
|
2025-05-14 13:11:54 -07:00 |
|
Ekagra Ranjan
|
418d2f8bfb
|
[V1][Spec Decode] Share input embedding of target model with EAGLE draft model to free ~1GB for llama 3 model (#17326)
Co-authored-by: root <root@ekagra-8xh100.us-east5-a.c.serving-efficiency-poc.internal>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-05-14 12:31:46 -07:00 |
|
Chen Zhang
|
964472b966
|
[Doc] Update prefix cache metrics to counting tokens (#18138)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-05-14 15:23:30 +00:00 |
|
Nick Hill
|
59dd311cf5
|
[KVConnector] Keep KVTransferParams as a dict (#18033)
|
2025-05-14 08:05:57 -07:00 |
|