6011 Commits

Author SHA1 Message Date
Harry Mellor
a9138e85b1
Fix OOT registration test (#17099)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-24 04:44:12 -07:00
Harry Mellor
0a05ed57e6
Simplify TokenizerGroup (#16790)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-24 04:43:56 -07:00
Michael Goin
14288d1332
Disable enforce_eager for V1 TPU sampler and structured output tests (#17016)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-24 02:50:09 -07:00
Woosuk Kwon
b411418ff0
[Chore] Remove Sampler from Model Code (#17084)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-04-24 02:49:33 -07:00
omer-dayan
2bc0f72ae5
Add docs for runai_streamer_sharded (#17093)
Signed-off-by: Omer Dayan (SW-GPU) <omer@run.ai>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-04-24 01:03:21 -07:00
Reid
9c1244de57
[doc] update to hyperlink (#17096)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-24 00:58:08 -07:00
Reid
db2f8d915c
[V1] Update structured output (#16812)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-23 23:57:17 -07:00
张宇
6167c0e5d2
[Bugfix][Core] add seq_id_to_seq_group clearing to avoid memory leak when s… (#16472)
Signed-off-by: 开哲 <kaizhe.zy@alibaba-inc.com>
Co-authored-by: 开哲 <kaizhe.zy@alibaba-inc.com>
2025-04-24 11:25:37 +08:00
Areeb Syed
ed2e464653
Addendum Fix to support FIPS enabled machines with MD5 hashing (#17043)
Signed-off-by: sydarb <areebsyed237@gmail.com>
2025-04-23 19:55:00 -07:00
Harry Mellor
2c8ed8ee48
More informative error when using Transformers backend (#16988)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-23 19:54:03 -07:00
Michael Goin
ed50f46641
[Bugfix] Enable V1 usage stats (#16986)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-04-23 19:54:00 -07:00
Woosuk Kwon
46e678bcff
[Minor] Use larger batch sizes for A100/B100/B200/MI300x (#17073)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-04-23 19:18:59 -07:00
Chen Xia
6b2427f995
[Quantization]add prefix for commandA quantized model (#17017)
2025-04-23 17:32:40 -07:00
Sangyeon Cho
b07d741661
[CI/Build] workaround for CI build failure (#17070)
Signed-off-by: csy1204 <josang1204@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-04-23 16:14:18 -07:00
Woosuk Kwon
41fb013d29
[V1][Spec Decode] Always use argmax for sampling draft tokens (#16899)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-04-23 14:57:43 -07:00
Yong Hoon Shin
32d4b669d0
[BugFix][V1] Fix int32 token index overflow when preparing input ids (#16806)
2025-04-23 12:12:35 -07:00
Travis Johnson
3cde34a4a4
[Frontend] Support guidance:no-additional-properties for compatibility with xgrammar (#15949)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2025-04-23 18:34:41 +00:00
Harry Mellor
bdb3660312
Use @property and private field for data_parallel_rank_local (#17053)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-23 08:50:08 -07:00
Harry Mellor
f3a21e9c68
CacheConfig.block_size should always be int when used (#17052)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-23 08:50:05 -07:00
Harry Mellor
8e630d680e
Improve Transformers backend model loading QoL (#17039)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-23 07:33:51 -07:00
Russell Bryant
af869f6dff
[CI] Update structured-output label automation (#17055)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-04-23 07:33:14 -07:00
Harry Mellor
53c0fa1e25
Ensure that pid passed to kill_process_tree is int for mypy (#17051)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-23 07:32:26 -07:00
Michael Yao
f7912cba3d
[Doc] Add top anchor and a note to quantization/bitblas.md (#17042)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-04-23 07:32:16 -07:00
Michael Goin
6317a5174a
Categorize tests/kernels/ based on kernel type (#16799)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-23 09:21:07 -04:00
Michael Goin
aa72d9a4ea
Mistral-format support for compressed-tensors (#16803)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-23 08:46:23 -04:00
Russell Bryant
ce17db8085
[CI] Run v1/test_serial_utils.py in CI (#16996)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-04-23 01:13:34 -07:00
Chauncey
8c87a9ad46
[Bugfix] Fix AssertionError: skip_special_tokens=False is not supported for Mistral tokenizers (#16964)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-04-23 07:24:09 +00:00
huafeng
ec69124eb4
[Misc] Improve readability of get_open_port function. (#17024)
Signed-off-by: gitover22 <qidizou88@gmail.com>
2025-04-23 06:16:53 +00:00
Lucas Wilkinson
d0da99fb70
[BugFix] llama4 fa3 fix - RuntimeError: scheduler_metadata must have shape (metadata_size) (#16998)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-04-22 21:49:24 -07:00
Nick Hill
b2f195c429
[V1] Avoid socket errors during shutdown when requests are in in-flight (#16807)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-04-23 12:36:29 +08:00
vllmellm
047797ef90
[Bugfix] Triton FA function takes no keyword arguments (#16902)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-04-22 21:35:24 -07:00
Reid
eb8ef4224d
[doc] add download path tips (#17013)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-23 04:06:30 +00:00
Chendi.Xue
56a735261c
[INTEL-HPU][v0] Port delayed sampling to upstream (#16949)
Signed-off-by: Michal Adamczyk <michal.adamczyk@intel.com>
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: Michal Adamczyk <madamczyk@habana.ai>
2025-04-22 20:14:11 -07:00
youkaichao
e1cf90e099
[misc] tune some env vars for GB200 (#16992)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-04-23 10:59:48 +08:00
Chauncey
6bc1e30ef9
Revert "[Misc] Add S3 environment variables for better support of MinIO." (#17021)
2025-04-22 19:22:29 -07:00
vllmellm
7e081ba7ca
[BugFix] Revert ROCm Custom Paged Attention Env Flag Check (#17022)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-04-22 19:17:48 -07:00
Nick Hill
1e013fa388
[V1][DP] More robust DP/EP dummy request coordination (#16277)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-04-22 19:12:15 -07:00
Aleksandr Malyshev
bc7c4d206b
[Kernel][ROCM] Upstream prefix prefill speed up for vLLM V1 (#13305)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu>
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com>
Signed-off-by: maleksan85 <maleksan@amd.com>
Signed-off-by: <>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: qli88 <qiang.li2@amd.com>
Co-authored-by: root <root@banff-cyxtera-s65-4.amd.com>
2025-04-22 19:11:56 -07:00
Yang Wang
f67e9e9f22
add Dockerfile build vllm against torch nightly (#16936)
Signed-off-by: Yang Wang <elainewy@meta.com>
2025-04-22 19:08:27 -07:00
Guillaume Calmettes
36fe78769f
[Bugfix] validate urls object for multimodal content parts (#16990)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
2025-04-23 09:43:06 +08:00
Chenyaaang
83d933718c
[Core][V1][TPU] Enable structured decoding on TPU V1 (#16499)
Signed-off-by: Chenyaaang <chenyangli@google.com>
2025-04-22 18:05:23 -06:00
Nick Hill
5175b884f7
[BugFix] Remove default multiproc executor collective_rpc timeout (#17000)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-04-22 23:27:14 +00:00
Alexei-V-Ivanov-AMD
5536b30a4c
Fencing Kernels Tests for enabling on AMD (#16929)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2025-04-22 09:32:40 -07:00
Richard Zou
7f58fb9718
Add assertion for no objects while hashing hf_config (#16930)
Signed-off-by: rzou <zou3519@gmail.com>
2025-04-22 09:32:22 -07:00
vllmellm
30bc3e0f66
[FEAT][ROCm]: Support AITER MLA (#15893)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: qli88 <qiang.li2@amd.com>
2025-04-22 09:31:13 -07:00
Reid
f34410715f
[frontend] enhance tool_calls type check (#16882)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-22 15:40:24 +00:00
Chauncey
68d4c33202
[Misc] Add S3 environment variables for better support of MinIO. (#16977)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-04-22 14:27:36 +00:00
Zhengyuan Su (苏政渊)
f961d7f6ef
[BugFix] Pass in correct VLLM config in FlashInfer backend (#13207) (#16973)
Signed-off-by: 苏政渊 <suzhengyuan@moonshot.cn>
Co-authored-by: 苏政渊 <suzhengyuan@moonshot.cn>
2025-04-22 06:44:10 -07:00
Harry Mellor
d059110498
Improve configs - SpeculativeConfig (#16971)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-22 12:55:36 +00:00
Yang Fan
571e8dd65e
[Bugfix] Fix distributed bug again in Qwen2.5-VL & Qwen2.5-Omni (#16974)
Signed-off-by: fyabc <suyang.fy@alibaba-inc.com>
2025-04-22 12:23:17 +00:00