bk-201
|
f3a55ff958
|
fix mm_hash
Signed-off-by: bk-201 <joy25810@foxmail.com>
|
2025-12-22 13:53:52 +00:00 |
|
Jee Jee Li
|
8aedddd546
|
Merge branch 'main' into mlm-full-lora-support
|
2025-12-21 19:30:22 +08:00 |
|
bk-201
|
fa6dd85421
|
fix
Signed-off-by: bk-201 <joy25810@foxmail.com>
|
2025-12-21 04:25:59 +00:00 |
|
bk-201
|
81b5ace128
|
revert lora_kwargs change
Signed-off-by: bk-201 <joy25810@foxmail.com>
|
2025-12-21 04:14:11 +00:00 |
|
bk-201
|
20402090b8
|
move mm-token-functions to model
Signed-off-by: bk-201 <joy25810@foxmail.com>
|
2025-12-21 03:34:32 +00:00 |
|
Chauncey
|
bb80f69bc9
|
add aarnphm and chaunceyjiang to the new tool_parser directory (#31088)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-12-21 03:24:34 +00:00 |
|
B-201
|
a3a8fc1fd0
|
Merge pull request #12 from Anexdeus/mlm-full-lora-support
Extended SupportsMultiModal
|
2025-12-21 11:02:44 +08:00 |
|
汪志鹏
|
3e92b2b7ac
|
[BugFix]fix gpt-oss v1/completions response bug (#30608)
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: bbrowning <bbrownin@redhat.com>
|
2025-12-21 10:39:31 +08:00 |
|
Jinzhen Lin
|
7c73ceb581
|
[Quantization] add marlin w4a8/w8a8 check (#31061)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
|
2025-12-20 21:58:11 +00:00 |
|
Lucas Wilkinson
|
ae0770fa6b
|
[CI] Fix H200 Distributed test (#31054)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-12-20 16:48:49 -05:00 |
|
Jinzhen Lin
|
ee52d9901d
|
[Quantization] support logical_widths for fp8 marlin (#30962)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-12-20 12:02:57 -08:00 |
|
Anexdeus
|
86c6c5cf00
|
removed get_allowed_mm_limits() from models
|
2025-12-20 21:56:07 +03:00 |
|
baonudesifeizhai
|
54c8924384
|
[MoE Refactor][5/N] Isolate zero expert to LongCatFlash (#28891)
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Signed-off-by: Dongjie Zou <85092850+baonudesifeizhai@users.noreply.github.com>
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robertgshaw2@gmail.com>
|
2025-12-20 18:22:04 +00:00 |
|
Anexdeus
|
2b03137fca
|
Merge branch 'mlm-full-lora-support' of https://github.com/jeejeelee/vllm into mlm-full-lora-support
|
2025-12-20 20:40:31 +03:00 |
|
bk-201
|
cb72a0ef01
|
fix pre-commit
Signed-off-by: bk-201 <joy25810@foxmail.com>
|
2025-12-20 16:36:13 +00:00 |
|
bk-201
|
68116edfe2
|
fix bug
Signed-off-by: bk-201 <joy25810@foxmail.com>
|
2025-12-20 16:20:12 +00:00 |
|
Anexdeus
|
c6831e793d
|
extended SupportsMultiModal
|
2025-12-20 17:22:41 +03:00 |
|
Yan Ma
|
560ae9638c
|
[XPU] enable fp8 online streaming quantization (#30944)
Signed-off-by: Yan Ma <yan.ma@intel.com>
|
2025-12-20 13:45:27 +00:00 |
|
Anexdeus
|
cd32aeadfa
|
Merge branch 'jeejeelee:mlm-full-lora-support' into mlm-full-lora-support
|
2025-12-20 15:29:40 +03:00 |
|
Anexdeus
|
d525556a25
|
Revert the mixin changes
|
2025-12-20 13:31:53 +03:00 |
|
Jeffrey Wang
|
1501a4070e
|
[Bugfix] Read truncate_prompt_tokens from pooling_params in AsyncLLM.encode() (#31013)
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
|
2025-12-20 10:29:31 +00:00 |
|
Anexdeus
|
b03d1a04a8
|
added ProcessingInfoMixin for QwenVL series models
|
2025-12-20 12:29:46 +03:00 |
|
Jee Jee Li
|
e5ba472ae2
|
Merge branch 'main' into mlm-full-lora-support
|
2025-12-20 15:19:28 +08:00 |
|
bk-201
|
4c2e95ad56
|
correct f-string formatting
Signed-off-by: bk-201 <joy25810@foxmail.com>
|
2025-12-20 06:23:33 +00:00 |
|
bk-201
|
9c9950c080
|
fix
Signed-off-by: bk-201 <joy25810@foxmail.com>
|
2025-12-20 04:05:59 +00:00 |
|
Lucas Wilkinson
|
ff2168bca3
|
[CI] Fix fixture 'siglip_attention_config' not found (#31053)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-12-20 03:46:15 +00:00 |
|
Gregory Shtrasberg
|
0be149524c
|
[ROCm][CI/Build] Update ROCm dockerfiles (#30991)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-12-20 03:19:12 +00:00 |
|
Jee Jee Li
|
d053aa73e1
|
Fix
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-12-20 01:47:11 +00:00 |
|
zejunchen-zejun
|
d52c5096d7
|
[Bugfix] fix the alias bug of AttentionBackendEnum when register CUSTOM attention backend to vllm (#30869)
Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>
|
2025-12-20 09:03:35 +08:00 |
|
Jee Jee Li
|
463074fac8
|
Merge branch 'main' into mlm-full-lora-support
|
2025-12-20 08:25:41 +08:00 |
|
Yuxuan Zhang
|
8a7a414374
|
GLM-4.7 Tool Parser and Doc Update (#30876)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
|
2025-12-20 00:09:58 +00:00 |
|
Robert Shaw
|
95befecc18
|
[MoE Refactor][2/N] Use Modular Kernels for Fp8 (#30825)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2025-12-19 23:36:38 +00:00 |
|
Wentao Ye
|
4cf9429897
|
[Bug] Fix error 'Dynamo failed to run FX node with fake tensors for Deepseek V3.2 (#31046)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-12-19 23:31:31 +00:00 |
|
Robert Shaw
|
83a317f650
|
[MoE Refactor][3/N] Deprecate cutlass block quant fp8 (b200) (#30990)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2025-12-19 13:09:54 -08:00 |
|
Lucas Wilkinson
|
5f6477d1d0
|
[BugFix] Fix TypeError: unhashable type: 'dict' when serving deepseek32 (#30924)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-12-19 16:07:54 -05:00 |
|
Wentao Ye
|
3bd8335bd0
|
[Refactor] Refactor for DeepGemmQuantScaleFMT using cache (#30898)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-12-19 13:50:39 -07:00 |
|
Seiji Eicher
|
1ab5213531
|
Make engine core client handshake timeout configurable (#27444)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2025-12-19 20:38:30 +00:00 |
|
Zhonghua Deng
|
969bbc7c61
|
[Model] Add MiMo-V2-Flash support (#30836)
Signed-off-by: Abatom <abzhonghua@gmail.com>
Signed-off-by: Jumiar <liuanqim10@126.com>
Signed-off-by: Zyann7 <zyann7@outlook.com>
Co-authored-by: Jumiar <liuanqim10@126.com>
Co-authored-by: Zyann7 <zyann7@outlook.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-12-19 17:17:03 +00:00 |
|
bk-201
|
764aa45140
|
fix bug
Signed-off-by: bk-201 <joy25810@foxmail.com>
|
2025-12-19 16:57:25 +00:00 |
|
Andrey Talman
|
268a972c62
|
Update Pytorch version update docs (#30982)
|
2025-12-19 16:08:53 +00:00 |
|
Jinzhen Lin
|
5fbfa8d9ef
|
[Quantization] fix marlin w8a8 check (#30961)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
|
2025-12-19 07:33:22 -08:00 |
|
Shanshan Shen
|
23a1946e3b
|
[CustomOp][Refactor] Extract common methods for ApplyRotaryEmb CustomOp (#31021)
Signed-off-by: shen-shanshan <467638484@qq.com>
|
2025-12-19 22:16:09 +08:00 |
|
Thomas Parnell
|
b5545d9d5c
|
[Bugfix] [Kernel] Triton attention kernels: mask out V blocks that fall outside sliding window (#30887)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-12-19 21:39:54 +08:00 |
|
Nishidha Panpaliya
|
bd2b52fc2d
|
[CPU][Bugfix] Fix ppc64le CPU build (#30871)
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>
|
2025-12-19 12:26:35 +00:00 |
|
Li, Jiang
|
420ba2dbb6
|
Enable aarch64 CPU performance benchmarks (#26494)
Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com>
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: Ioana Ghiban <ioana.ghiban@arm.com>
Co-authored-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2025-12-19 12:16:18 +00:00 |
|
Marko Rosenmueller
|
455949675d
|
[Frontend][Bug] allow tool calls in analysis channel (#28139)
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2025-12-19 10:47:44 +00:00 |
|
lif
|
086b96339f
|
[Bugfix] Add validation for tool requests when tool_parser is unavailable (#30613)
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
|
2025-12-19 18:23:28 +08:00 |
|
Jinzhen Lin
|
9187de9fac
|
[Quantization] enable compressed-tensors marlin support for turing (2) (#31008)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
|
2025-12-19 08:56:35 +00:00 |
|
Isotr0py
|
ac1c934276
|
[Bugfix] Fix incorrect tiles creation for mm prefix triton attention (#30974)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-12-19 16:00:33 +08:00 |
|
Wenqi Glantz
|
4924ac582c
|
Add hidden dimension validation for multimodal embedding inputs (#30968)
Signed-off-by: Wenqi Glantz <wglantz@nvidia.com>
|
2025-12-19 07:59:36 +00:00 |
|