Aaron Pham
afe3236e90
[Chore] astral's ty (#18116)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-05-15 05:00:43 +00:00

Mark McLoughlin
65334ef3b9
[V1][Metrics] Remove unused code (#18158)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-05-14 20:13:17 -07:00

Chen Zhang
e60f550b38
[v1] Support multiple KV cache groups in GPU model runner (#17945)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-05-14 18:54:54 -07:00

David Xia
f25e0d1125
[Bugfix]: make most of test_openai_schema.py pass (#17664)
2025-05-14 17:04:35 -07:00

Andrey Talman
09f106a91e
Upload vllm index for the rc builds (#18173)
2025-05-14 16:35:56 -07:00

Michael Goin
2142035b51
[V1] Support multiple kv connectors (#17564)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-05-14 16:28:02 -07:00

Russell Bryant
78aa341d12
[CI] Fix race condition in test_kv_cache_events test (#18169)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-05-14 16:27:48 -07:00

Jerry Zhang
7974736740
Add support for loading torchao models with AOPerModuleConfig (#17826)
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
2025-05-14 16:24:59 -07:00

Aaron Pham
2fc9075b82
[V1] Structured Outputs + Thinking compatibility (#16577)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-05-14 15:45:24 -07:00

Lucas Wilkinson
d93c976a0d
[Kernel] Have rotary embeddings support tensors (#18046)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-05-14 15:43:55 -07:00

David Xia
749f792553
[Frontend] decrease import time of vllm.multimodal (#18031)
Co-authored-by: Aaron Pham <Aaronpham0103@gmail.com>
2025-05-14 15:43:32 -07:00

Robert Shaw
856865008e
[CI] Disable Failing Tests (#18165)
2025-05-14 13:49:56 -07:00

bnellnm
f9c069c85e
Modularize fused experts and integrate PPLX kernels (#15956)
2025-05-14 13:11:54 -07:00

Ekagra Ranjan
418d2f8bfb
[V1][Spec Decode] Share input embedding of target model with EAGLE draft model to free ~1GB for llama 3 model (#17326)
Co-authored-by: root <root@ekagra-8xh100.us-east5-a.c.serving-efficiency-poc.internal>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-05-14 12:31:46 -07:00

Chen Zhang
964472b966
[Doc] Update prefix cache metrics to counting tokens (#18138)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-05-14 15:23:30 +00:00

Nick Hill
59dd311cf5
[KVConnector] Keep KVTransferParams as a dict (#18033)
2025-05-14 08:05:57 -07:00

Cyrus Leung
d066e52013
[Bugfix] Fix chat utils tests (#18139)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-14 05:38:21 -07:00

Harry Mellor
c8ea982d9b
Update deprecated type hinting in platform, plugins, triton_utils, vllm_flash_attn (#18129)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-14 05:28:16 -07:00

Harry Mellor
dc372b9c8a
Update deprecated type hinting in vllm/device_allocator and vllm/distributed (#18126)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-14 04:07:57 -07:00

Harry Mellor
9b5b39b650
Update deprecated type hinting in vllm/lora (#18128)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-14 03:57:59 -07:00

Reid
9ccc6ded42
[doc] add missing import (#18133)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-14 10:57:34 +00:00

Cyrus Leung
d62a076e84
[Model] GritLM supports other attention backends (#18109)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-14 03:33:19 -07:00

Jee Jee Li
259127f8b8
[Bugfix] Fix LoRA test (#18123)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-05-14 10:25:47 +00:00

TJian
612c2edb4f
[FEAT] [ROCm]: Add AITER CK 2 Stages MoE support (#17110)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-05-14 03:03:11 -07:00

Andrzej Kotłowski
38fe728d60
[Bugfix] Fix QKVCrossParallelLinear::sync_weight_attrs for PyTorch compile (#17844)
Signed-off-by: Andrzej Kotłowski <akotlowski@habana.ai>
2025-05-14 09:39:51 +00:00

rongfu.leng
82e7f9bb03
[Misc] replace does not exist model (#18119)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-05-14 02:13:47 -07:00

Jee Jee Li
63dc3426e0
[Model] Add packed_modules_mapping for Qwen3-MOE (#18118)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-05-14 02:13:19 -07:00

Cyrus Leung
8f5dc41481
[Bugfix] Fix entrypoints audio test failure (#18111)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-14 09:08:07 +00:00

wang.yuqi
63ad622233
[New Model]: support GTE NewModel (#17986)
2025-05-14 01:31:31 -07:00

majianpeng
e7ef61c1f0
[Bugfix][Example] make lmcache v0 work. (#18051)
Signed-off-by: Ma, Jianpeng <jianpeng.ma@intel.com>
2025-05-13 23:43:44 -07:00

Jinzhen Lin
d4154c35a2
[Bugfix] fix moe marlin topk_weight loading (#18080)
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-05-13 23:31:57 -07:00

lkchen
6685890d11
[Fix] Move "model_config" as keyword args in chat_utils.py (#18098)
Signed-off-by: Linkun <github@lkchen.net>
2025-05-13 23:27:26 -07:00

Ecthlion_zyy
33011318c2
Fix broken example: examples/offline_inference/profiling at scheduler_config (#18117)
2025-05-13 23:19:14 -07:00

qli88
4f8b373225
[BugFix][AMD] Compatible patch for AITER lib after 04/20 (#17912)
Signed-off-by: Qiang Li <qiang.li2@amd.com>
2025-05-13 23:05:20 -07:00

Charlie Fu
7b2f28deba
[AMD][torch.compile] Enable silu+fp8_quant fusion for rocm (#18082)
Signed-off-by: charlifu <charlifu@amd.com>
2025-05-13 22:13:56 -07:00

vllmellm
2d912fb66f
[FEAT] [ROCm] [V1]: Add AITER biased group topk for DeepSeekV3 (#17955)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-05-13 22:03:47 -07:00

Michael Goin
12e6c0b41c
[Bugfix][V1] Fix FlashInfer V1 backend using the wrong VllmConfig (#18086)
2025-05-13 20:36:17 -07:00

Michael Goin
9a2a6357de
[Bugfix] Fix FP8 Marlin MoE and enable for compressed-tensors models (#18026)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-05-13 19:48:33 -07:00

youkaichao
6266c57bae
[core][distributed] add ep group and all2all interface (#18077)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-05-14 10:46:49 +08:00

Jon Gill
754b699cbe
[Bug]: Fix S3 model/tokenizer path resolution (#18083)
Signed-off-by: Jon Gill <jon@yurts.ai>
2025-05-13 19:34:17 -07:00

Roger Wang
6e27c6d86b
[Misc] Remove unused numpy tensor (#18084)
Signed-off-by: Roger Wang <hey@rogerw.me>
2025-05-13 19:33:40 -07:00

Nick Hill
d5af47a149
[P/D] Add some more debug logs to NixlConnector (#18102)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-05-13 19:33:03 -07:00

Pavani Majety
65f0f74b66
[Hardware/NVIDIA/Modelopt] Fix modelopt forward method for v1 torch.compile (#18101)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-05-13 19:33:00 -07:00

Luka Govedič
176a95c670
[Fix] Support CUDAGraph capture for encoder-decoder on ROCm (#18104)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
2025-05-13 19:31:42 -07:00

Chen Zhang
f2ae883b67
[v1][KVCacheManager] pass num_new_computed_tokens to kv cache manager (#18001)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-05-13 19:09:39 -07:00

vllmellm
40de1ef455
[FEAT] [ROCm]: Add AITER Block-Scaled GEMM Feature (#14968)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-05-13 19:08:20 -07:00

Russell Bryant
0189a65a2e
[Docs] Expand security doc with firewall info (#18081)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-05-13 19:36:00 +00:00

Nick Hill
55aa7af994
[V1] DP scale-out (2/N): Decouple engine process management and comms (#15977)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-05-13 10:48:21 -07:00

Harry Mellor
0b217da646
Update deprecated type hinting in vllm/adapter_commons (#18073)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-13 08:32:51 -07:00

Harry Mellor
19324d660c
Update deprecated type hinting in vllm/compilation (#18072)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-13 08:32:48 -07:00