vllmellm
|
77b6e74fe2
|
[ROCm] Remove unnecessary assertion of max_model_len in ROCM_AITER_MLA attention backend. (#18938)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-05-29 22:33:17 -07:00 |
|
H
|
5acf828d99
|
[docs] fix: fix markdown syntax (#18927)
|
2025-05-30 05:20:48 +00:00 |
|
iLeGend
|
3987e2ae96
|
[Model] Use AutoWeightsLoader for mamba2 (#18918)
Signed-off-by: iLeGend <824040212@qq.com>
|
2025-05-30 04:50:10 +00:00 |
|
Chauncey
|
77164dad5e
|
[Bugfix] Consistent ascii handling in tool parsers (#18883)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-05-30 04:44:43 +00:00 |
|
Bill Nell
|
95c40f9b09
|
hacks
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-05-30 02:33:58 +00:00 |
|
Wenhua Cheng
|
3de3eadf5b
|
improve the robustness of parsing vlms config in AutoRound (#18894)
Signed-off-by: wenhuach21 <wenhua.cheng@intel.com>
|
2025-05-29 19:24:47 -07:00 |
|
Carol Zheng
|
3132290a14
|
[TPU][CI/CD] Clean up docker for TPU tests. (#18926)
Signed-off-by: Carol Zheng <cazheng@google.com>
|
2025-05-30 10:24:19 +08:00 |
|
Cyrus Leung
|
1aa2f81b43
|
[Misc] Update type annotation for rotary embedding base (#18914)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-30 10:17:01 +08:00 |
|
Michael Goin
|
d54af615d5
|
[Bugfix] Fix PP default fallback behavior for V1 (#18915)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-30 10:13:17 +08:00 |
|
Bill Nell
|
a0efd3106c
|
hack fix MoEConfig.quant_dtype
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-05-30 02:08:21 +00:00 |
|
Bill Nell
|
e69879996f
|
re-enable cudagraph+torch.compile
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-05-30 00:12:54 +00:00 |
|
Chengji Yao
|
a1cc9f33a3
|
[TPU] remove transpose ops in moe kernel (#18923)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-05-29 23:00:11 +00:00 |
|
Richard Zou
|
a521ef06e5
|
Use standalone_compile by default in torch >= 2.8.0 (#18846)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-05-30 06:41:58 +08:00 |
|
Bill Nell
|
922165cba3
|
fp8 + pplx tests + fixes
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-05-29 21:25:33 +00:00 |
|
Bill Nell
|
12ea698498
|
pplx + fp8 test
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-05-29 18:50:37 +00:00 |
|
Will Eaton
|
64eaf5fe05
|
[P/D] NixlConnector DP fixes (#18903)
Signed-off-by: Will Eaton <weaton@redhat.com>
|
2025-05-29 18:08:40 +00:00 |
|
Nick Hill
|
d1d61f3351
|
[BugFix] Make DP work with connector-delayed new requests (#18559)
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Will Eaton <weaton@redhat.com>
|
2025-05-29 18:04:18 +00:00 |
|
Nicolò Lucchesi
|
32ce3cf7c9
|
[V1] Allocate kv_cache with stride order for V1 (#18775)
Signed-off-by: nicklucche <nlucches@redhat.com>
|
2025-05-29 17:54:16 +00:00 |
|
CYJiang
|
d58f9c7f7a
|
[Misc] Remove duplicate init for self.vllm_config (#18896)
Signed-off-by: googs1025 <googs1025@gmail.com>
|
2025-05-29 17:26:07 +00:00 |
|
Cyrus Leung
|
c29034037d
|
[Deprecation] Disallow pos-args other than model when initializing LLM (#18802)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-29 09:36:58 -07:00 |
|
Gregory Shtrasberg
|
1b7cfd5a36
|
[ROCm][V0][Attention] Revert to the previous FA triton kernel (#18226)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-05-29 12:13:18 -04:00 |
|
Gregory Shtrasberg
|
da4b69d0b4
|
[Attention][V1] Toggle for v1 attention backend (#18275)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-05-29 10:48:24 -04:00 |
|
Isotr0py
|
c9479b2920
|
[Bugfix] Fix the failing gte embedding test (#18720)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-05-29 07:39:25 -07:00 |
|
Hyogeun Oh (오효근)
|
6f2909405e
|
[Doc] Fix codeblocks formatting in LoRA adapters documentation (#18907)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
|
2025-05-29 07:38:55 -07:00 |
|
Duyi-Wang
|
b169d5f7b6
|
[Misc][Tools][Benchmark] Add benchmark_serving supports for llama.cpp. (#18692)
Signed-off-by: Duyi-Wang <duyi.wang@intel.com>
|
2025-05-29 20:02:08 +08:00 |
|
Chenyaaang
|
f8977c233f
|
Fix an error in dummy weight loading for quantization models (#18855)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2025-05-29 03:07:20 -07:00 |
|
Luka Govedič
|
f274581f44
|
[BugFix] Update pydantic to fix error on python 3.10 (#18852)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-05-29 03:05:46 -07:00 |
|
Lukas Geiger
|
0b1447f890
|
[Bugfix] Ensure tensors are contiguous during serialisation (#18860)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-05-29 03:05:20 -07:00 |
|
Nicolò Lucchesi
|
24d0ef8970
|
[Misc] Replace TODO in serving transcription (#18895)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-05-29 02:58:14 -07:00 |
|
Jee Jee Li
|
7fcfd954ff
|
[Bugfix] Fix misleading information in the documentation (#18845)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-05-29 02:54:14 -07:00 |
|
Reid
|
e740d07f07
|
[doc] add CLI doc (#18871)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-29 09:51:36 +00:00 |
|
Michael Yao
|
a652e71dd0
|
[Doc] Remove redundant spaces from compatibility_matrix.md (#18891)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-05-29 02:51:20 -07:00 |
|
Jee Jee Li
|
34d6c447c4
|
[LoRA] Add LoRA support for InternVL (#18842)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-05-29 08:46:24 +00:00 |
|
Satyajith Chilappagari
|
972eddf7c9
|
[Neuron] Add multi-LoRA support for Neuron. (#18284)
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
|
2025-05-29 16:41:22 +08:00 |
|
Brent Salisbury
|
fd7bb88d72
|
Fixes a dead link in nightly benchmark readme (#18856)
Signed-off-by: Brent Salisbury <bsalisbu@redhat.com>
|
2025-05-29 04:41:39 +00:00 |
|
Yikun Jiang
|
3c49dbdd03
|
Skip device and quant Pydantic validation to make plugin device work (#18843)
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
|
2025-05-28 20:12:30 -07:00 |
|
aws-elaineyz
|
1661a9c28f
|
[Doc][Neuron] Update documentation for Neuron (#18868)
Signed-off-by: Elaine Zhao <elaineyz@amazon.com>
|
2025-05-28 19:44:01 -07:00 |
|
Chengji Yao
|
8e882ffdc0
|
[Bugfix][TPU] fix moe custom kernel import (#18853)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-05-28 19:34:19 -07:00 |
|
Richard Zou
|
26b4fa45be
|
Add ability to use CUDAGraphs with use_inductor=False (#17345)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-05-29 10:16:52 +08:00 |
|
Maximilien de Bayser
|
515b413ebf
|
Prevent the cross-encoder logic from being applied to classification tasks (#18838)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-05-28 19:16:17 -07:00 |
|
Bill Nell
|
caca0b718a
|
fixes
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-05-29 02:08:22 +00:00 |
|
Bill Nell
|
d86e3f0172
|
lint
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-05-28 23:40:56 +00:00 |
|
Bill Nell
|
3ca8322b74
|
lint
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-05-28 23:40:56 +00:00 |
|
Bill Nell
|
03b41b6cad
|
fix merge
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-05-28 23:40:56 +00:00 |
|
Bill Nell
|
cad6447664
|
fix
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-05-28 23:40:56 +00:00 |
|
Bill Nell
|
c169b05541
|
merge
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-05-28 23:40:56 +00:00 |
|
Bill Nell
|
468d16654a
|
cleanup quantization
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-05-28 23:40:53 +00:00 |
|
Bill Nell
|
909f234faa
|
stuff
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-05-28 23:40:27 +00:00 |
|
Bill Nell
|
f8510587c2
|
tests + fix
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-05-28 23:40:27 +00:00 |
|
Bill Nell
|
9cfebf51ba
|
basic working test
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-05-28 23:40:27 +00:00 |
|