4666 Commits

Author SHA1 Message Date
Woosuk Kwon
4fb8142a0e
[V1][PP] Enable true PP with Ray executor (#13472)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-18 09:15:32 -08:00
Daniele
a02c86b4dd
[CI/Build] migrate static project metadata from setup.py to pyproject.toml (#8772)
2025-02-18 08:02:49 -08:00
Liangfu Chen
3809458456
[Bugfix] Fix invalid rotary embedding unit test (#13431)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
2025-02-18 11:52:03 +00:00
zifeitong
d3231cb436
[Bugfix] Handle content type with optional parameters (#13383)
Signed-off-by: Zifei Tong <zifeitong@gmail.com>
2025-02-18 11:29:13 +00:00
Cyrus Leung
435b502a6e
[ROCm] Make amdsmi import optional for other platforms (#13460)
2025-02-18 03:15:56 -08:00
Isotr0py
29fc5772c4
[Bugfix] Remove noisy error logging during local model loading (#13458)
2025-02-18 03:15:48 -08:00
Harry Mellor
2358ca527b
[Doc]: Improve feature tables (#13224)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-18 18:52:39 +08:00
Isotr0py
8cf97f8661
[Bugfix] Fix failing transformers dynamic module resolving with spawn multiproc method (#13403)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-02-18 10:25:53 +00:00
Yuan Tang
e2603fefb8
[Bugfix] Ensure LoRA path from the request can be included in err msg (#13450)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-18 16:19:15 +08:00
Michael Goin
b53d79983c
Add outlines fallback when JSON schema has enum (#13449)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-02-18 06:49:41 +00:00
Woosuk Kwon
9915912f7f
[V1][PP] Fix & Pin Ray version in requirements-cuda.txt (#13436)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-17 21:58:06 -08:00
Kyle Sayers
d1b649f1ef
[Quant] Aria SupportsQuant (#13416)
2025-02-17 21:51:09 -08:00
youkaichao
ac19b519ed
[core] fix sleep mode in pytorch 2.6 (#13456)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-18 13:48:10 +08:00
Yuan Tang
a1074b3efe
[Bugfix] Only print out chat template when supplied (#13444)
2025-02-17 21:43:31 -08:00
Kyle Sayers
00294e1bc6
[Quant] Arctic SupportsQuant (#13366)
2025-02-17 21:35:09 -08:00
Kyle Sayers
88787bce1d
[Quant] Molmo SupportsQuant (#13336)
2025-02-17 21:34:47 -08:00
youkaichao
932b51cedd
[v1] fix parallel config rank (#13445)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-18 12:33:45 +08:00
Divakar Verma
7c7adf81fc
[ROCm] fix get_device_name for rocm (#13438)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-02-18 04:07:12 +00:00
Isotr0py
67ef8f666a
[Model] Enable quantization support for transformers backend (#12960)
2025-02-17 19:52:47 -08:00
Harry Mellor
efbe854448
[Misc] Remove dangling references to SamplingType.BEAM (#13402)
2025-02-17 19:52:35 -08:00
Tyler Michael Smith
b3942e157e
[Bugfix][CI][V1] Work around V1 + CUDA Graph + torch._scaled_mm fallback issue (#13425)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-02-18 00:32:48 +00:00
Woosuk Kwon
cd4a72a28d
[V1][Spec decode] Move drafter to model runner (#13363)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-17 15:40:12 -08:00
Cody Yu
6ac485a953
[V1][PP] Fix intermediate tensor values (#13417)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2025-02-17 13:37:45 -08:00
Woosuk Kwon
4c21ce9eba
[V1] Get input tokens from scheduler (#13339)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-17 11:01:07 -08:00
r.4ntix
ce77eb9410
[Bugfix] Fix VLLM_USE_MODELSCOPE issue (#13384)
2025-02-17 14:22:01 +00:00
Yan Ma
30513d1cb6
[Bugfix] fix xpu communicator (#13368)
Signed-off-by: yan ma <yan.ma@intel.com>
2025-02-17 20:59:18 +08:00
Tyler Michael Smith
1f69c4a892
[Model] Support Mamba2 (Codestral Mamba) (#9292)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
2025-02-17 20:17:50 +08:00
Cyrus Leung
7b623fca0b
[VLM] Check required fields before initializing field config in DictEmbeddingItems (#13380)
2025-02-17 01:36:07 -08:00
Mengqing Cao
238dfc8ac3
[MISC] tiny fixes (#13378)
2025-02-17 00:57:13 -08:00
Huy Do
45186834a0
Run v1 benchmark and integrate with PyTorch OSS benchmark database (#13068)
Signed-off-by: Huy Do <huydhn@gmail.com>
2025-02-17 08:16:32 +00:00
yankooo
f857311d13
Fix spelling error in index.md (#13369)
2025-02-17 06:53:20 +00:00
shangmingc
46cdd59577
[Feature][Spec Decode] Simplify the use of Eagle Spec Decode (#12304)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-02-16 19:32:26 -08:00
Jee Jee Li
2010f04c17
[V1][Misc] Avoid unnecessary log output (#13289)
2025-02-16 19:26:24 -08:00
Woosuk Kwon
69e1d23e1e
[V1][BugFix] Clean up rejection sampler & Fix warning msg (#13362)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-16 12:25:29 -08:00
Isotr0py
d67cc21b78
[Bugfix][Platform][CPU] Fix cuda platform detection on CPU backend edge case (#13358)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-02-16 18:55:27 +00:00
Woosuk Kwon
e18227b04a
[V1][PP] Cache Intermediate Tensors (#13353)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-16 10:02:27 -08:00
Woosuk Kwon
7b89386553
[V1][BugFix] Add __init__.py to v1/spec_decode/ (#13359)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-16 09:39:08 -08:00
da833b0aee
[Docs] Change myenv to vllm. Update python_env_setup.inc.md (#13325)
2025-02-16 16:04:21 +00:00
Cyrus Leung
5d2965b7d7
[Bugfix] Fix 2 Node and Spec Decode tests (#13341)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-02-16 22:20:22 +08:00
youkaichao
a0231b7c25
[platform] add base class for communicators (#13208)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-16 22:14:22 +08:00
youkaichao
124776ebd5
[ci] skip failed tests for flashinfer (#13352)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-16 22:09:15 +08:00
Roger Wang
b7d309860e
[V1] Update doc and examples for H2O-VL (#13349)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-02-16 10:35:54 +00:00
wchen61
dc0f7ccf8b
[BugFix] Enhance test_pos_encoding to support execution on multi-devices (#13187)
Signed-off-by: wchen61 <wchen61@foxmail.com>
2025-02-16 08:59:49 +00:00
Michael Goin
d3d547e057
[Bugfix] Pin xgrammar to 0.1.11 (#13338)
2025-02-15 19:42:25 -08:00
Kyle Sayers
12913d17ba
[Quant] Add SupportsQuant to phi3 and clip (#13104)
2025-02-15 19:28:33 -08:00
Lily Liu
80f63a3966
[V1][Spec Decode] Ngram Spec Decode (#12193)
Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
2025-02-15 18:05:11 -08:00
Cyrus Leung
367cb8ce8c
[Doc] [2/N] Add Fuyu E2E example for multimodal processor (#13331)
2025-02-15 07:06:23 -08:00
youkaichao
54ed913f34
[ci/build] update flashinfer (#13323)
2025-02-15 05:33:13 -08:00
Cody Yu
9206b3d7ec
[V1][PP] Run engine busy loop with batch queue (#13064)
2025-02-15 03:59:01 -08:00
rasmith
ed0de3e4b8
[AMD] [Model] DeepSeek tunings (#13199)
2025-02-15 03:58:09 -08:00