Cyrus Leung | 1bc3b5e71b | [VLM] Separate text-only and vision variants of the same model architecture (#13157) | 2025-02-13 06:19:15 -08:00 |
燃 | 02ed8a1fbe | [Misc] Qwen2.5-VL Optimization (#13155) | 2025-02-13 06:17:57 -08:00 |
Aoyu | 2092a6fa7d | [V1][Core] Add worker_base for v1 worker (#12816); Signed-off-by: Aoyu <aoyuzhan@amazon.com>; Signed-off-by: youkaichao <youkaichao@gmail.com>; Co-authored-by: Aoyu <aoyuzhan@amazon.com>; Co-authored-by: youkaichao <youkaichao@gmail.com> | 2025-02-13 20:35:18 +08:00 |
Cyrus Leung | c9d3ecf016 | [VLM] Merged multi-modal processor for Molmo (#12966) | 2025-02-13 04:34:00 -08:00 |
Roger Wang | fdcf64d3c6 | [V1] Clarify input processing and multimodal feature caching logic (#13211) | 2025-02-13 03:43:24 -08:00 |
Russell Bryant | 578087e56c | [Frontend] Pass pre-created socket to uvicorn (#13113) | 2025-02-13 00:51:46 -08:00 |
Isotr0py | fa253f1a70 | [VLM] Remove input processor from clip and siglip (#13165) | 2025-02-13 00:31:37 -08:00 |
Rui Qiao | 9605c1256e | [V1][core] Implement pipeline parallel on Ray (#12996) | 2025-02-13 08:02:46 +00:00 |
Russell Bryant | 0ccd8769fb | [CI/Build] Allow ruff to auto-fix some issues (#13180); Signed-off-by: Russell Bryant <rbryant@redhat.com> | 2025-02-13 07:45:38 +00:00 |
Daniel Han | cb944d5818 | Allow Unsloth Dynamic 4bit BnB quants to work (#12974) | 2025-02-12 23:13:08 -08:00 |
Russell Bryant | d46d490c27 | [Frontend] Move CLI code into vllm.cmd package (#12971) | 2025-02-12 23:12:21 -08:00 |
LikeSundayLikeRain | 04f50ad9d1 | [Bugfix] deepseek_r1_reasoning_parser put reason content in wrong field in certain edge case (#13097) | 2025-02-12 23:11:26 -08:00 |
Cody Yu | 60c68df6d1 | [Build] Automatically use the wheel of the base commit with Python-only build (#13178) | 2025-02-12 23:10:28 -08:00 |
Lu Fang | 009439caeb | Simplify logic of locating CUDART so file path (#13203); Signed-off-by: Lu Fang <lufang@fb.com> | 2025-02-13 13:52:41 +08:00 |
Isotr0py | bc55d13070 | [VLM] Implement merged multimodal processor for Mllama (#11427) | 2025-02-12 20:26:21 -08:00 |
Michael Goin | d88c8666a1 | [Bugfix][Example] Fix GCed profiling server for TPU (#12792); Signed-off-by: mgoin <michael@neuralmagic.com> | 2025-02-13 11:52:11 +08:00 |
Kaixi Hou | 4fc5c23bb6 | [NVIDIA] Support nvfp4 quantization (#12784) | 2025-02-12 19:51:51 -08:00 |
Kevin H. Luu | 9f9704dca6 | [perf-benchmark] cleanup unused Docker images and volumes in H100 benchmark instance (#12706) | 2025-02-12 19:51:33 -08:00 |
Russell Bryant | 8eafe5eaea | [CI/Build] Ignore ruff warning up007 (#13182); Signed-off-by: Russell Bryant <rbryant@redhat.com> | 2025-02-13 11:48:31 +08:00 |
Murali Andoorveedu | 4c0d93f4b2 | [V1][Bugfix] Copy encoder input ids to fix set iteration issue during VLM abort (#13173); Signed-off-by: andoorve <37849411+andoorve@users.noreply.github.com> | 2025-02-12 12:58:11 -08:00 |
Michael Goin | 14b7899d10 | [CI] Fix failing FP8 cpu offload test (#13170); Signed-off-by: mgoin <mgoin64@gmail.com> | 2025-02-12 19:16:06 +00:00 |
Michael Goin | 09972e716c | [Bugfix] Allow fallback to AWQ from AWQMarlin at per-layer granularity (#13119) | 2025-02-12 09:19:53 -08:00 |
Qubitium-ModelCloud | 36a08630e8 | [CORE] [QUANT] Support for GPTQModel's dynamic quantization per module override/control (#7086) | 2025-02-12 09:19:43 -08:00 |
Russell Bryant | 2c2b560f48 | [CI/Build] Use mypy matcher for pre-commit CI job (#13162); Signed-off-by: Russell Bryant <rbryant@redhat.com> | 2025-02-12 17:12:22 +00:00 |
Lu Fang | 042c3419fa | Introduce VLLM_CUDART_SO_PATH to allow users specify the .so path (#12998); Signed-off-by: Lu Fang <lufang@fb.com> | 2025-02-12 09:06:13 -08:00 |
Jee Jee Li | 82cabf53a3 | [Misc] Delete unused LoRA modules (#13151) | 2025-02-12 08:58:24 -08:00 |
Rafael Vasquez | 314cfade02 | [Frontend] Generate valid tool call IDs when using tokenizer-mode=mistral (#12332) | 2025-02-12 08:29:56 -08:00 |
Cyrus Leung | 985b4a2b19 | [Bugfix] Fix num video tokens calculation for Qwen2-VL (#13148); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2025-02-12 11:55:23 +00:00 |
bnellnm | f4d97e4fc2 | [Bug] [V1] Try fetching stop_reason from EngineOutput before checking the request (#13108) | 2025-02-12 02:39:16 -08:00 |
Shiyan Deng | f1042e86f0 | [Misc] AMD Build Improvements (#12923) | 2025-02-12 02:36:10 -08:00 |
Maximilien de Bayser | 7c4033acd4 | Further reduce the HTTP calls to huggingface.co (#13107) | 2025-02-12 02:34:09 -08:00 |
dependabot[bot] | d59def4730 | Bump actions/setup-python from 5.3.0 to 5.4.0 (#12672) | 2025-02-12 16:41:22 +08:00 |
dependabot[bot] | 0c7d9effce | Bump helm/chart-testing-action from 2.6.1 to 2.7.0 (#12463) | 2025-02-12 16:41:06 +08:00 |
dependabot[bot] | dd3b4a01f8 | Bump actions/stale from 9.0.0 to 9.1.0 (#12462) | 2025-02-12 00:40:25 -08:00 |
dependabot[bot] | a0597c6b75 | Bump helm/kind-action from 1.10.0 to 1.12.0 (#11612) | 2025-02-12 00:40:19 -08:00 |
Lingfan Yu | e92694b6fe | [Neuron][Kernel] Support Longer Sequences in NKI-based Flash PagedAttention and Improve Efficiency (#12921); Signed-off-by: Lingfan Yu <lingfany@amazon.com> | 2025-02-11 21:12:37 -08:00 |
Kevin H. Luu | 842b0fd402 | [ci] Add more source file dependencies for some tests (#13123); Signed-off-by: <>; Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal> | 2025-02-11 20:38:10 -08:00 |
Christian Pinto | 974dfd4971 | [Model] IBM/NASA Prithvi Geospatial model (#12830) | 2025-02-11 20:34:30 -08:00 |
Keyun Tong | 3ee696a63d | [RFC][vllm-API] Support tokenizer registry for customized tokenizer in vLLM (#12518); Signed-off-by: Keyun Tong <tongkeyun@gmail.com> | 2025-02-12 12:25:58 +08:00 |
Russell Bryant | 72c2b68dc9 | [Misc] Move pre-commit suggestion back to the end (#13114); Signed-off-by: Russell Bryant <rbryant@redhat.com> | 2025-02-11 22:34:16 +00:00 |
Yuan Tang | 14ecab5be2 | [Bugfix] Guided decoding falls back to outlines when fails to import xgrammar (#12976); Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> | 2025-02-11 18:17:44 +00:00 |
Harry Mellor | deb6c1c6b4 | [Doc] Improve OpenVINO installation doc (#13102); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-02-11 18:02:46 +00:00 |
Li, Jiang | 565c1efa65 | [CI/Build][Bugfix] Fix CPU backend default threads num (#13077) | 2025-02-11 16:55:56 +00:00 |
Szymon Ożóg | 2b25b7d2e1 | Fix initializing GGUF weights for ColumnParallelLinear when using tensor parallel > 1 (#13023) | 2025-02-11 08:38:48 -08:00 |
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 | 6c4dbe23eb | [BugFix] Pop instead of del CUDA_VISIBLE_DEVICES (#12962); Signed-off-by: Hollow Man <hollowman@opensuse.org> | 2025-02-12 00:21:50 +08:00 |
MoonRide303 | 21f5d50fa5 | [Bugfix] Do not use resource module on Windows (#12858) (#13029) | 2025-02-11 08:21:18 -08:00 |
Jewon Lee | bf3e05215c | [Misc] Fix typo at comments at metrics.py (#13024) | 2025-02-11 08:20:37 -08:00 |
Harry Mellor | ad9776353e | Set torch_dtype in TransformersModel (#13088); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-02-11 23:51:19 +08:00 |
Mark McLoughlin | 75e6e14516 | [V1][Metrics] Add several request timing histograms (#12644); Signed-off-by: Mark McLoughlin <markmc@redhat.com> | 2025-02-11 10:14:00 -05:00 |
மனோஜ்குமார் பழனிச்சாமி | 110f59a33e | [Bugfix] fix flaky test (#13089); Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com> | 2025-02-11 14:41:20 +00:00 |