Russell Bryant
|
ec54d73c31
|
[CI] Fix test_collective_rpc (#17858)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-05-08 16:47:12 +00:00 |
|
Jee Jee Li
|
a944f8ede7
|
[Misc] Delete LoRA-related redundancy code (#17841)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-05-08 06:02:21 -07:00 |
|
Cyrus Leung
|
015815fe01
|
[Bugfix] use_fast failing to be propagated to Qwen2-VL image processor (#17838)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-08 05:39:21 -07:00 |
|
Harry Mellor
|
e4ca6e3a99
|
Fix transient dependency error in docs build (#17848)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-08 03:42:03 -07:00 |
|
Reid
|
53d0cb7423
|
[Misc] add chatbox integration (#17828)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-08 10:05:26 +00:00 |
|
Lu Fang
|
f50dcb7c21
|
[Easy] Eliminate c10::optional usage in vllm/csrc (#17819)
|
2025-05-08 03:05:10 -07:00 |
|
Cyrus Leung
|
a1e19b635d
|
[Doc] Fix a typo in the file name (#17836)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-08 18:04:18 +08:00 |
|
fxmarty-amd
|
bb239a730f
|
[Bugfix] Fix quark fp8 format loading on AMD GPUs (#12612)
Signed-off-by: Felix Marty <felmarty@amd.com>
Signed-off-by: kewang2 <kewang2@amd.com>
Co-authored-by: kewang2 <kewang2@amd.com>
|
2025-05-08 02:53:53 -07:00 |
|
Jevin Jiang
|
a463555dee
|
[TPU] Fix the test_sampler (#17820)
|
2025-05-08 05:51:33 -04:00 |
|
Rick Yuan
|
ca04b97c93
|
[Bugfix] Fix tool call template validation for Mistral models (#17644)
Signed-off-by: Rick Yuan <yuan821120@gmail.com>
Signed-off-by: RIck Yuan <yuan821120@gmail.com>
Co-authored-by: Aaron Pham <Aaronpham0103@gmail.com>
|
2025-05-08 09:47:19 +00:00 |
|
xsank
|
0a9bbaa104
|
[Misc] support model prefix & add deepseek vl2 tiny fused moe config (#17763)
Signed-off-by: 唯勤 <xsank.mz@alibaba-inc.com>
Co-authored-by: 唯勤 <xsank.mz@alibaba-inc.com>
|
2025-05-08 07:50:22 +00:00 |
|
Qiong Zhou Huang
|
39956efb3f
|
[Bugfix] Fix bad words for Mistral models (#17753)
Signed-off-by: Qiong Zhou Huang <qiong@phonic.co>
|
2025-05-07 23:32:10 -07:00 |
|
Ximingwang-09
|
597051e56f
|
[Qwen3]add qwen3-235b-bf16 fused moe config on A100 (#17715)
|
2025-05-07 23:09:32 -07:00 |
|
Cyrus Leung
|
96722aa81d
|
[Frontend] Chat template fallbacks for multimodal models (#17805)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-07 23:05:54 -07:00 |
|
Agata Dobrzyniewicz
|
843b222723
|
[Hardware][Intel-Gaudi] Support Automatic Prefix Caching on HPU (#17648)
Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
|
2025-05-07 22:37:03 -07:00 |
|
Akash kaothalkar
|
e515668edf
|
[Hardware][Power] Enable compressed tensor W8A8 INT8 quantization for POWER (#17153)
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-05-07 22:35:03 -07:00 |
|
Hashem Hashemi
|
5a499e70d5
|
[Kernel][Hardware][AMD] Bf16 mfma opt for ROCm skinny GEMMs (#17071)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
Signed-off-by: charlifu <charlifu@amd.com>
Co-authored-by: charlifu <charlifu@amd.com>
|
2025-05-07 22:34:49 -07:00 |
|
Russell Bryant
|
6930a41116
|
[V1] Add VLLM_ALLOW_INSECURE_SERIALIZATION env var (#17490)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-05-08 13:34:02 +08:00 |
|
Harry Mellor
|
998eea4a0e
|
Only log non-default CLI args for online serving (#17803)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-07 22:33:29 -07:00 |
|
Mikhail Podvitskii
|
c747d84576
|
[Installation] OpenTelemetry version update (#17771)
Signed-off-by: Mikhail Podvitskii <podvitskiymichael@gmail.com>
|
2025-05-07 22:32:49 -07:00 |
|
Vadim Markovtsev
|
b2da14a05a
|
Improve exception reporting in MP engine (#17800)
Signed-off-by: Vadim Markovtsev <vadim@poolside.ai>
|
2025-05-08 05:32:39 +00:00 |
|
Chanh Nguyen
|
7ea2adb802
|
[Core] Support full cuda graph in v1 (#16072)
Signed-off-by: Chanh Nguyen <cnguyen@linkedin.com>
Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com>
|
2025-05-07 22:30:15 -07:00 |
|
Nick Hill
|
3d13ca0e24
|
[BugFix] Fix --disable-log-stats in V1 server mode (#17600)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-05-08 04:08:15 +00:00 |
|
Harry Mellor
|
66ab3b13c9
|
Don't call the venv vllm (#17810)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-08 04:06:39 +00:00 |
|
Aaron Pham
|
a8238bbdb0
|
[Chore][Doc] uses model id determined from OpenAI client (#17815)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-05-08 01:48:57 +00:00 |
|
Wallas Henrique
|
d43f914d42
|
[Core][Feature] Input metadata dump on crash (#13407)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
|
2025-05-07 22:15:09 +00:00 |
|
Nick Hill
|
ed5272cf21
|
[BugFix] Avoid secondary missing MultiprocExecutor.workers error (#17811)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-05-07 21:55:04 +00:00 |
|
Akshat Tripathi
|
c20ef40fd0
|
[Hardware][TPU][V1] Multi-LoRA implementation for the V1 TPU backend (#14238)
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
|
2025-05-07 16:28:47 -04:00 |
|
Bowen Bao
|
db593aa67f
|
[Quantization] Quark MXFP4 format loading (#16943)
|
2025-05-07 15:05:05 -04:00 |
|
Isotr0py
|
f98e307588
|
[Bugfix] Fix missing lora name mapping for lora without prefix (#17793)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-05-07 16:17:12 +00:00 |
|
Harry Mellor
|
646a31e51e
|
Fix and simplify deprecated=True CLI kwarg (#17781)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-07 16:51:06 +01:00 |
|
Isotr0py
|
be8ff88e66
|
[Bugfix] Fix Video IO error for short video (#17791)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-05-07 15:36:06 +00:00 |
|
Christian Heimes
|
1a6af1453d
|
Only depend on importlib-metadata for Python < 3.10 (#17776)
Signed-off-by: Christian Heimes <christian@python.org>
|
2025-05-07 07:51:06 -07:00 |
|
Gregory Shtrasberg
|
32aa74c09c
|
[ROCm][FP8][Kernel] FP8 quantization fused into Custom Paged Attention (#17139)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-05-07 07:12:35 -07:00 |
|
Reid
|
7377dd0307
|
[doc] update the issue link (#17782)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-07 20:29:05 +08:00 |
|
Yong Hoon Shin
|
98c89e16ff
|
Make key optional for rotary embedding (#17566)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-05-07 00:11:46 -07:00 |
|
Yong Hoon Shin
|
324a3119b0
|
Fix test_memory_usage_no_spec (#17754)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-05-07 00:10:33 -07:00 |
|
Cyrus Leung
|
8a15c2603a
|
[Frontend] Add missing chat templates for various MLLMs (#17758)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-07 00:10:01 -07:00 |
|
Satyajith Chilappagari
|
043e4c4955
|
Add NeuronxDistributedInference support, Speculative Decoding, Dynamic on-device sampling (#16357)
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
Co-authored-by: Aaron Dou <yzdou@amazon.com>
Co-authored-by: Shashwat Srijan <sssrijan@amazon.com>
Co-authored-by: Chongming Ni <chongmni@amazon.com>
Co-authored-by: Amulya Ballakur <amulyaab@amazon.com>
Co-authored-by: Patrick Lange <patlange@amazon.com>
Co-authored-by: Elaine Zhao <elaineyz@amazon.com>
Co-authored-by: Lin Lin Pan <tailinpa@amazon.com>
Co-authored-by: Navyadhara Gogineni <navyadha@amazon.com>
Co-authored-by: Yishan McNabb <yishanm@amazon.com>
Co-authored-by: Mrinal Shukla <181322398+mrinalks@users.noreply.github.com>
|
2025-05-07 00:07:30 -07:00 |
|
Jee Jee Li
|
ba7703e659
|
[Misc] Remove qlora_adapter_name_or_path (#17699)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-05-06 23:10:37 -07:00 |
|
Wanrui Dai
|
f80ae5bdcf
|
[Kernel] Use fused rmsnorm for some models like qwen3 series (#17735)
Signed-off-by: evian <eviantai@u.nus.edu>
Co-authored-by: evian <eviantai@u.nus.edu>
|
2025-05-06 23:10:02 -07:00 |
|
Szymon Ożóg
|
1a45a61387
|
[Kernel] GGUF MoeVec kernel (#16780)
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-05-06 23:07:23 -07:00 |
|
Isotr0py
|
c3e9d5060e
|
[Misc] Use apply_rotary_emb from vllm_flash_attn for Qwen2-VL vision RoPE (#17726)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-05-07 04:51:33 +00:00 |
|
Jee Jee Li
|
822de7fb94
|
[Misc] Split model loader (#17712)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-05-07 12:42:26 +08:00 |
|
Woosuk Kwon
|
8d84d836d1
|
[BugFix][Spec Decode] Fix hidden size mismatch between target and eagle head (#17740)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-05-06 19:51:26 -07:00 |
|
Michael Goin
|
950b71186f
|
Replace lm-eval bash script with pytest and use enforce_eager for faster CI (#17717)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-06 18:00:10 -07:00 |
|
Michael Goin
|
e50a1f1a9c
|
[TPU] Add kernel test for moe_pallas (#17496)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
|
2025-05-06 17:59:57 -07:00 |
|
Michael Goin
|
a17cef70ea
|
Removed unused marlin cuda code (#17684)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-06 17:59:47 -07:00 |
|
Chih-Chieh Yang
|
18dd5e01f2
|
[Model] Mamba2 causal conv1d Refactor to Split Prefill and Decode Requests for Corresponding Kernels (#17146)
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
|
2025-05-06 17:59:30 -07:00 |
|
Yang Wang
|
6de3e13413
|
Add logging for torch nightly version (#17669)
Signed-off-by: Yang Wang <elainewy@meta.com>
|
2025-05-07 00:45:51 +00:00 |
|