6796 Commits

Author SHA1 Message Date
Pooya Davoodi
185cc19f92
[Frontend] Optionally remove memory buffer used for uploading to URLs in run_batch (#12927)
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
2025-02-14 08:22:42 +00:00
Alexander Matveev
45f90bcbba
[WIP] TPU V1 Support Refactored (#13049) 2025-02-14 00:21:53 -08:00
Kero Liang
b0ccfc565a
[Bugfix][V1] GPUModelRunner._update_states should return True when there is a finished request in batch (#13126) 2025-02-13 22:39:20 -08:00
Sage Moore
ba59b78a9c
[ROCm][V1] Add initial ROCm support to V1 (#12790) 2025-02-13 22:21:50 -08:00
Varun Sundar Rabindranath
cbc40128eb
[V1] LoRA - Enable Serving Usecase (#12883)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-02-14 14:21:12 +08:00
Michael Goin
f0b2da72a8
Expand MLA to support most types of quantization (#13181) 2025-02-13 22:19:22 -08:00
Harry Mellor
f2b20fe491
Consolidate Llama model usage in tests (#13094) 2025-02-13 22:18:03 -08:00
Wang Ran (汪然)
40932d7a05
[Misc] Remove redundant statements in scheduler.py (#13229) 2025-02-13 22:07:25 -08:00
XiaobingZhang
84683fa271
[Bugfix] Offline example of disaggregated prefill (#13214) 2025-02-13 20:20:47 -08:00
Tyler Michael Smith
067678262a
[Bugfix][CI] Inherit codespell settings from pyproject.toml in the pre-commit-config (#13237) 2025-02-13 20:19:43 -08:00
Tyler Michael Smith
09545c0a94
[Bugfix/CI] Turn test_compressed_tensors_2of4_sparse back on (#13250) 2025-02-13 20:19:25 -08:00
Roger Wang
dd5ede4440
[V1] Consolidate MM cache size to vllm.envs (#13239) 2025-02-13 20:19:03 -08:00
Jinzhen Lin
8c32b08a86
[Kernel] Fix awq error when n is not divisible by 128 (#13227) 2025-02-13 20:07:05 -08:00
Gregory Shtrasberg
410886950a
[ROCm] Avoid using the default stream on ROCm (#13238)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-02-14 09:29:26 +08:00
Harry Mellor
e38be640e6
Revert "Add label if pre-commit passes" (#13242) 2025-02-13 16:12:32 -08:00
Tyler Michael Smith
c1e37bf71b
[Kernel][Bugfix] Refactor and Fix CUTLASS 2:4 Sparse Kernels (#13198)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-02-14 00:01:14 +00:00
Michael Goin
2344192a55
Optimize moe_align_block_size for deepseek_v3 (#12850)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-02-13 18:43:37 -05:00
Harry Mellor
bffddd9a05
Add label if pre-commit passes (#12527)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-13 20:51:30 +00:00
Nicolò Lucchesi
d84cef76eb
[Frontend] Add /v1/audio/transcriptions OpenAI API endpoint (#12909) 2025-02-13 07:23:45 -08:00
Vaibhav Jain
37dfa60037
[Bugfix] Missing Content Type returns 500 Internal Server Error (#13193) 2025-02-13 06:52:22 -08:00
Cyrus Leung
1bc3b5e71b
[VLM] Separate text-only and vision variants of the same model architecture (#13157) 2025-02-13 06:19:15 -08:00
02ed8a1fbe
[Misc] Qwen2.5-VL Optimization (#13155) 2025-02-13 06:17:57 -08:00
Aoyu
2092a6fa7d
[V1][Core] Add worker_base for v1 worker (#12816)
Signed-off-by: Aoyu <aoyuzhan@amazon.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Aoyu <aoyuzhan@amazon.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-02-13 20:35:18 +08:00
Cyrus Leung
c9d3ecf016
[VLM] Merged multi-modal processor for Molmo (#12966) 2025-02-13 04:34:00 -08:00
Roger Wang
fdcf64d3c6
[V1] Clarify input processing and multimodal feature caching logic (#13211) 2025-02-13 03:43:24 -08:00
Russell Bryant
578087e56c
[Frontend] Pass pre-created socket to uvicorn (#13113) 2025-02-13 00:51:46 -08:00
Isotr0py
fa253f1a70
[VLM] Remove input processor from clip and siglip (#13165) 2025-02-13 00:31:37 -08:00
Rui Qiao
9605c1256e
[V1][core] Implement pipeline parallel on Ray (#12996) 2025-02-13 08:02:46 +00:00
Russell Bryant
0ccd8769fb
[CI/Build] Allow ruff to auto-fix some issues (#13180)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-13 07:45:38 +00:00
Daniel Han
cb944d5818
Allow Unsloth Dynamic 4bit BnB quants to work (#12974) 2025-02-12 23:13:08 -08:00
Russell Bryant
d46d490c27
[Frontend] Move CLI code into vllm.cmd package (#12971) 2025-02-12 23:12:21 -08:00
LikeSundayLikeRain
04f50ad9d1
[Bugfix] deepseek_r1_reasoning_parser put reason content in wrong field in certain edge case (#13097) 2025-02-12 23:11:26 -08:00
Cody Yu
60c68df6d1
[Build] Automatically use the wheel of the base commit with Python-only build (#13178) 2025-02-12 23:10:28 -08:00
Lu Fang
009439caeb
Simplify logic of locating CUDART so file path (#13203)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-02-13 13:52:41 +08:00
Isotr0py
bc55d13070
[VLM] Implement merged multimodal processor for Mllama (#11427) 2025-02-12 20:26:21 -08:00
Michael Goin
d88c8666a1
[Bugfix][Example] Fix GCed profiling server for TPU (#12792)
Signed-off-by: mgoin <michael@neuralmagic.com>
2025-02-13 11:52:11 +08:00
Kaixi Hou
4fc5c23bb6
[NVIDIA] Support nvfp4 quantization (#12784) 2025-02-12 19:51:51 -08:00
Kevin H. Luu
9f9704dca6
[perf-benchmark] cleanup unused Docker images and volumes in H100 benchmark instance (#12706) 2025-02-12 19:51:33 -08:00
Russell Bryant
8eafe5eaea
[CI/Build] Ignore ruff warning up007 (#13182)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-13 11:48:31 +08:00
Murali Andoorveedu
4c0d93f4b2
[V1][Bugfix] Copy encoder input ids to fix set iteration issue during VLM abort (#13173)
Signed-off-by: andoorve <37849411+andoorve@users.noreply.github.com>
2025-02-12 12:58:11 -08:00
Michael Goin
14b7899d10
[CI] Fix failing FP8 cpu offload test (#13170)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-02-12 19:16:06 +00:00
Michael Goin
09972e716c
[Bugfix] Allow fallback to AWQ from AWQMarlin at per-layer granularity (#13119) 2025-02-12 09:19:53 -08:00
Qubitium-ModelCloud
36a08630e8
[CORE] [QUANT] Support for GPTQModel's dynamic quantization per module override/control (#7086) 2025-02-12 09:19:43 -08:00
Russell Bryant
2c2b560f48
[CI/Build] Use mypy matcher for pre-commit CI job (#13162)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-12 17:12:22 +00:00
Lu Fang
042c3419fa
Introduce VLLM_CUDART_SO_PATH to allow users to specify the .so path (#12998)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-02-12 09:06:13 -08:00
Jee Jee Li
82cabf53a3
[Misc] Delete unused LoRA modules (#13151) 2025-02-12 08:58:24 -08:00
Rafael Vasquez
314cfade02
[Frontend] Generate valid tool call IDs when using tokenizer-mode=mistral (#12332) 2025-02-12 08:29:56 -08:00
Cyrus Leung
985b4a2b19
[Bugfix] Fix num video tokens calculation for Qwen2-VL (#13148)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-02-12 11:55:23 +00:00
bnellnm
f4d97e4fc2
[Bug] [V1] Try fetching stop_reason from EngineOutput before checking the request (#13108) 2025-02-12 02:39:16 -08:00
Shiyan Deng
f1042e86f0
[Misc] AMD Build Improvements (#12923) 2025-02-12 02:36:10 -08:00