7395 Commits

Author SHA1 Message Date
Lukas Geiger
3b523e38d9
[Core] Do not copy array during hashing (#19484)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-06-18 15:36:55 -07:00
afeldman-nm
16c16301c8
Disable "Forbid direct 'import triton'" check for vllm/triton_utils/importing.py in an extensible way (#19783)
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
2025-06-18 15:08:00 -07:00
Nathan Weinberg
9206d0ff01
docs: fix Slack bulletpoint in README (#19811)
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-06-18 20:47:08 +00:00
Chen Zhang
a89209b78d
[v1] Support mamba2 (#19327)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-06-18 20:34:15 +00:00
Russell Bryant
ffacb222cb
[Docs] Add Huzaifa Sidhpurwala to vuln mgmt team doc (#19808)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-06-18 20:22:28 +00:00
Chauncey
12575cfa7a
[Bugfix] fix RAY_CGRAPH_get_timeout is not set successfully (#19725)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-06-18 10:26:16 -07:00
Zzz9990
8b6e1d639c
[Hardware][AMD] integrate aiter chunked prefill into vllm (#18596)
Signed-off-by: fsx950223 <fsx950223@outlook.com>
Signed-off-by: charlifu <charlifu@amd.com>
Co-authored-by: fsx950223 <fsx950223@outlook.com>
Co-authored-by: charlifu <charlifu@amd.com>
2025-06-18 08:46:51 -07:00
Lu Fang
735a9de71f
[Qwen] Add tagging rule for Qwen related PRs (#19799)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-06-18 14:26:43 +00:00
wangxiyuan
257ab95439
[Platform] Allow platform use V1 Engine by default (#19792)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-06-18 13:03:36 +00:00
Reid
cca91a7a10
[doc] fix the incorrect label (#19787)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-18 10:30:58 +00:00
Woosuk Kwon
f04d604567
[Minor] Zero-initialize attn output buffer (#19784)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-06-18 06:59:27 +00:00
afeldman-nm
19a53b2783
[V1] Decouple GPU and TPU InputBatch (#19778)
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
2025-06-18 06:38:13 +00:00
Zhonghua Deng
eccdc8318c
[V1][P/D] An native implementation of xPyD based on P2P NCCL (#18242)
Signed-off-by: Abatom <abzhonghua@gmail.com>
2025-06-18 06:32:36 +00:00
Russell Bryant
5f52a84685
[V1] Add API docs for EncoderCacheManager (#19294)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-06-18 13:37:01 +08:00
lkchen
d4629dc43f
[Misc] Add __str__ for RequestStatus (#19780)
Signed-off-by: Linkun Chen <github@lkchen.net>
2025-06-18 03:03:01 +00:00
Ning Xie
6e9cc73f67
[MISC] correct DeviceConfig device field static type analysis (#19699)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-06-17 17:21:50 -07:00
Ning Xie
c53711bd63
[MISC] correct copy_blocks src_to_dists param type (#19696)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-06-17 17:21:06 -07:00
Chenyaaang
dac8cc49f4
[TPU] Update torch version to include paged attention kernel change (#19706)
Signed-off-by: Chenyaaang <chenyangli@google.com>
2025-06-17 22:24:49 +00:00
Charlie Fu
a44b1c951d
[Feature][ROCm] Add full graph capture support for TritonAttentionBackend (#19158)
Signed-off-by: charlifu <charlifu@amd.com>
2025-06-17 17:03:06 -04:00
Michael Goin
b447624ee3
[Bugfix] Fix faulty triton importing logic when using Ray for DP (#19734)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-17 20:59:29 +00:00
Jiayi Yao
cda92307c1
[Misc] Update lmcache connector with the latest connector apis (#19441)
Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn>
2025-06-17 19:57:54 +00:00
Michael Goin
bf57ccc5c2
Remove sm120 arch from sm100 cutlass kernel arch list (#19716)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-17 11:49:39 -07:00
Wentao Ye
ffb2cd6b54
[Perf] Optimize moe_align_block_size CUDA kernel (#19572)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-06-17 11:49:26 -07:00
Isotr0py
ca94d7fa00
[Bugfix] Update multimodel models mapping to fit new checkpoint after Transformers v4.52 (#19151)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-17 15:58:38 +00:00
CYJiang
5a1c2e15d8
[Mis] remove duplicate engine status checks (#19647)
Signed-off-by: googs1025 <googs1025@gmail.com>
2025-06-17 08:17:38 -07:00
Nicolò Lucchesi
4c8f64faa7
[V1][Kernel] Flashinfer HND KV cache layout (#19280)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-06-17 09:09:22 -04:00
David Xia
93aee29fdb
[doc] split "Other AI Accelerators" tabs (#19708) 2025-06-17 22:05:29 +09:00
Reid
154d063b9f
[doc][mkdocs] Add edit button to documentation (#19637)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-17 11:10:31 +00:00
jvlunteren
ccd7c05089
[Kernel] Add Split-KV Support to Unified Triton Attention Kernel (#19152)
Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>
2025-06-17 10:45:07 +00:00
Huy Do
c48c6c4008
Add a doc on how to update PyTorch version (#19705) 2025-06-17 18:10:37 +08:00
Isotr0py
aed8468642
[Doc] Add missing llava family multi-image examples (#19698)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-17 07:05:21 +00:00
quanliu
5c76b9cdaf
[Core] add remove_seq_from_computed_blocks_tracker to BlockSpaceManager (#19686)
Signed-off-by: 刘全 <quan.liu2@dbappsecurity.com.cn>
Co-authored-by: 刘全 <quan.liu2@dbappsecurity.com.cn>
2025-06-17 04:40:58 +00:00
Driss Guessous
ddfed314f9
Fixes IMA for TP w/ flex-attention (#19712)
Signed-off-by: drisspg <drisspguessous@gmail.com>
2025-06-17 04:01:50 +00:00
Di Liu
5b3ad5ecf2
[DOC] fix doc typos (#19600)
Signed-off-by: Di Liu <liu-di@sjtu.edu.cn>
2025-06-17 11:34:53 +08:00
nguyenhoangthuan99
ede5c4ebdf
[Frontend] add chunking audio for > 30s audio (#19597)
Signed-off-by: nguyenhoangthuan99 <thuanhppro12@gmail.com>
2025-06-17 11:34:00 +08:00
Lucas Wilkinson
07334959d8
[Wheel Size] Only build FA2 8.0+PTX (#19336) 2025-06-17 12:32:49 +09:00
David Xia
119f683949
[doc] add project flag to gcloud TPU command (#19664)
Signed-off-by: David Xia <david@davidxia.com>
2025-06-17 01:00:09 +00:00
Conroy Cheers
0860087aff
[Fix] Fall back to Gloo when NCCL backend is unavailable (#19641)
Signed-off-by: conroy-cheers <conroy@corncheese.org>
2025-06-17 08:42:14 +08:00
Dipika Sikka
6bc7b57315
[Quantization] Remove FP4 emulation; Fall-back to marlin for device < 100 (#19563) 2025-06-16 17:33:51 -04:00
Russell Bryant
90f9c2eb5c
[V1] Change return type on get_multimodal_embeddings() (#19446)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-06-16 13:32:15 -04:00
qscqesze
387bdf0ab9
[Model] Add support for MiniMaxM1ForCausalLM (shares architecture with MiniMaxText01ForCausalLM) (#19677)
Signed-off-by: QscQ <qscqesze@gmail.com>
2025-06-16 09:47:14 -07:00
bnellnm
5e5baa91aa
[Kernels] Use empty for modular MoE workspaces (#19667)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-06-16 14:58:01 +00:00
Chauncey
836d4ce140
[Bugfix] fix missing 'finish_reason': null in streaming chat (#19662)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-06-16 14:10:39 +00:00
Ning Xie
c3fec47bb7
[MISC] bump huggingface_hub pkg to 0.33.0 (#19547)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-06-16 05:22:28 -07:00
Isotr0py
1173804dca
[Bugfix] Fix TP inference for Flex attention backend (#19657)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-16 11:21:37 +00:00
Shawn Tan
4d5424029b
[Feature]:Allow for Granite MoE Hybrid models with _only_ shared experts. (#19652)
Signed-off-by: Shawn Tan <shawntan@ibm.com>
2025-06-16 11:14:18 +00:00
Navanit Dubey
3e7506975c
[DOC] Add reasoning capability to vLLM streamlit code (#19557) 2025-06-16 07:09:12 -04:00
Nick Hill
ee35e96ac3
[BugFix] Don't catch BaseException when dumping execute_model errors (#19626)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-06-16 11:01:08 +00:00
Szymon Ożóg
dec66d253b
[Kernel] GGUF MMVQ kernel for multiple input vectors (#18754)
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
2025-06-16 17:33:26 +08:00
Russell Bryant
8d120701fd
[Docs] Move multiproc doc to v1 dir (#19651)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-06-16 09:10:12 +00:00